🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.
MPL-2.0 License
This is the 1.4.0 release of Coqui STT, the deep learning toolkit for speech-to-text. In accordance with semantic versioning, this version is backwards compatible with previous 1.x versions. The compatibility guarantees of our semantic versioning cover the deployment APIs: the C API and all official language bindings (Python, Node.JS/ElectronJS, and Java/Android). You can get started with Coqui STT 1.4.0 by following the steps in our documentation.
Compatible pre-trained models are available in the Coqui Model Zoo.
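For reference, transcription with the Python bindings follows this shape. This is a minimal sketch, assuming the `stt` package from PyPI; the model, scorer, and audio file names are placeholders for files downloaded from the Model Zoo:

```python
import wave

import numpy as np


def transcribe(model_path: str, scorer_path: str, wav_path: str) -> str:
    """Transcribe a 16 kHz, 16-bit mono WAV file with Coqui STT."""
    from stt import Model  # pip install stt

    model = Model(model_path)
    model.enableExternalScorer(scorer_path)

    # Read raw PCM frames from the WAV file and view them as 16-bit samples.
    with wave.open(wav_path, "rb") as wav:
        frames = wav.readframes(wav.getnframes())
    audio = np.frombuffer(frames, dtype=np.int16)

    return model.stt(audio)


# Example (placeholder file names):
# print(transcribe("model.tflite", "large-vocab.scorer", "audio/sample.wav"))
```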
We also include example audio files, which can be used to test the engine, as well as checkpoint files for the English model (identical to the 1.0.0 checkpoint and provided here for convenience):
coqui-stt-1.4.0-checkpoint.tar.gz
The checkpoints are under the Apache 2.0 license and can be used as the basis for further fine-tuning. Finally, this release also includes a source code tarball:
The tarball is under the MPL-2.0 license. Note that it is provided for archival purposes only, since GitHub's automatically generated tarballs do not include submodules. For usage and development with the source code, clone the repository with Git, following our documentation.
Added experimental WebAssembly support
With the new WASM package you can deploy Coqui STT directly in the browser.
Added ARMv7 and AArch64 Python wheels for Python 3.7 and 3.9
Migrated .NET bindings to .NET Framework 4.8
Rewritten audio processing logic in iOS demo app
Documentation is available on stt.readthedocs.io.
We’d also like to thank all the members of our Gitter chat room who have been helping to shape this release!
Published by github-actions[bot] about 2 years ago
This is the 1.3.0 release of Coqui STT, the deep learning toolkit for speech-to-text. In accordance with semantic versioning, this version is backwards compatible with previous 1.x versions. The compatibility guarantees of our semantic versioning cover the deployment APIs: the C API and all official language bindings (Python, Node.JS/ElectronJS, and Java/Android). You can get started today with Coqui STT 1.3.0 by following the steps in our documentation.
Compatible pre-trained models are available in the Coqui Model Zoo.
We also include example audio files, which can be used to test the engine, as well as checkpoint files for the English model (identical to the 1.0.0 checkpoint and provided here for convenience):
coqui-stt-1.3.0-checkpoint.tar.gz
The checkpoints are under the Apache 2.0 license and can be used as the basis for further fine-tuning. Finally, this release also includes a source code tarball:
The tarball is under the MPL-2.0 license. Note that it is provided for archival purposes only, since GitHub's automatically generated tarballs do not include submodules. For usage and development with the source code, clone the repository with Git, following our documentation.
Added new experimental APIs for loading Coqui STT models from memory buffers
This allows loading models without writing them to disk first, which can be useful for dynamic model loading as well as for handling packaging in mobile platforms
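As an illustrative sketch only: the experimental buffer-loading entry point lives in the C API, and how (or whether) each binding exposes it may differ, so `Model.from_buffer` below is a hypothetical name, not a documented method. Check the 1.3.0 API reference for the actual surface:

```python
# Sketch: loading a model from an in-memory buffer instead of a file path.
# `Model.from_buffer` is a HYPOTHETICAL name used for illustration only;
# consult the 1.3.0 API reference for the real experimental entry point.


def read_packed_model(path: str) -> bytes:
    # e.g. bytes extracted from an app bundle or fetched over the network
    with open(path, "rb") as f:
        return f.read()


def load_model_from_bytes(model_bytes: bytes):
    from stt import Model  # pip install stt

    # Hypothetical buffer-based constructor; the stable, non-experimental
    # path remains Model("model.tflite").
    return Model.from_buffer(model_bytes)
```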
Added ElectronJS 16 support
Rewritten audio processing logic in iOS demo app
Added pre-built binaries for iOS/Swift bindings in CI
With these two changes we're hoping to get more feedback from iOS developers on our Swift bindings and pre-built STT frameworks - how can we best package and distribute the bindings so that it feels native to Swift/iOS developers? If you have any feedback, join our Gitter room!
Extended the Multilingual LibriSpeech importer to support all languages in the dataset
Supported languages: English, German, Dutch, French, Spanish, Italian, Portuguese, Polish
Exposed full metadata information for decoded samples when using the coqui_stt_ctcdecoder Python package
This gives training code access to all the information returned by the decoder, so experimenting with new model architectures no longer requires adapting the C++ inference library to test your changes.
Added initial support for Apple Silicon in our pre-built binaries
The pre-built C/C++ libraries are universal binaries; the language bindings will be updated soon
Added support for FLAC files in training
Documentation is available on stt.readthedocs.io.
We’d also like to thank all the members of our Gitter chat room who have been helping to shape this release!
Published by github-actions[bot] over 2 years ago
This is the 1.2.0 release of Coqui STT, the deep learning toolkit for speech-to-text. In accordance with semantic versioning, this version is backwards compatible with previous 1.x versions. The compatibility guarantees of our semantic versioning cover the deployment APIs: the C API and all official language bindings (Python, Node.JS/ElectronJS, and Java/Android). You can get started today with Coqui STT 1.2.0 by following the steps in our documentation.
Compatible pre-trained models are available in the Coqui Model Zoo.
We also include example audio files, which can be used to test the engine, as well as checkpoint files for the English model (identical to the 1.0.0 checkpoint and provided here for convenience):
coqui-stt-1.2.0-checkpoint.tar.gz
The checkpoints are under the Apache 2.0 license and can be used as the basis for further fine-tuning. Finally, this release also includes a source code tarball:
The tarball is under the MPL-2.0 license. Note that it is provided for archival purposes only, since GitHub's automatically generated tarballs do not include submodules. For usage and development with the source code, clone the repository with Git, following our documentation.
Documentation is available on stt.readthedocs.io.
We’d also like to thank all the members of our Gitter chat room who have been helping to shape this release!
Published by github-actions[bot] almost 3 years ago
This is the 1.1.0 release of Coqui STT, the deep learning toolkit for speech-to-text. In accordance with semantic versioning, this version is not completely backwards compatible with previous versions. The compatibility guarantees of our semantic versioning cover the deployment APIs: the C API and all official language bindings (Python, Node.JS/ElectronJS, and Java/Android). You can get started today with Coqui STT 1.1.0 by following the steps in our documentation.
Compatible pre-trained models are available in the Coqui Model Zoo.
We also include example audio files, which can be used to test the engine, as well as checkpoint files for the English model (identical to the 1.0.0 checkpoint and provided here for convenience):
coqui-stt-1.1.0-checkpoint.tar.gz
The checkpoints are under the Apache 2.0 license and can be used as the basis for further fine-tuning. Finally, this release also includes a source code tarball:
The tarball is under the MPL-2.0 license. Note that it is provided for archival purposes only, since GitHub's automatically generated tarballs do not include submodules. For usage and development with the source code, clone the repository with Git, following our documentation.
Documentation is available on stt.readthedocs.io.
We’d also like to thank all the members of our Gitter chat room who have been helping to shape this release!
Published by github-actions[bot] almost 3 years ago
This is the 1.0.0 release of Coqui STT, the deep learning toolkit for speech-to-text. In accordance with semantic versioning, this version is not completely backwards compatible with previous versions. The compatibility guarantees of our semantic versioning cover the inference APIs: the C API and all official language bindings (Python, Node.JS/ElectronJS, and Android). You can get started today with Coqui STT 1.0.0 by following the steps in our documentation.
This release includes pre-trained English models, available in the Coqui Model Zoo:
all under the Apache 2.0 license.
The acoustic models were trained on American English data with synthetic noise augmentation. The model achieves a 4.5% word error rate on the LibriSpeech clean test corpus and 13.6% word error rate on the LibriSpeech other test corpus with the largest release language model.
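For context, word error rate is the word-level edit distance between a reference transcript and the hypothesis, divided by the number of reference words. A self-contained sketch:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits to turn the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution or match
            )
    return dp[len(ref)][len(hyp)] / len(ref)


print(wer("the cat sat on the mat", "the cat sat on mat"))  # one deletion over 6 words
```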
Note that the model currently performs best in low-noise environments with clear recordings. This does not mean the model cannot be used outside of these conditions, but accuracy may be lower. Some users may need to fine-tune the model further to meet their intended use case.
We also include example audio files, which can be used to test the engine, as well as checkpoint files for the English model:
coqui-stt-1.0.0-checkpoint.tar.gz
The checkpoints are under the Apache 2.0 license and can be used as the basis for further fine-tuning. Finally, this release also includes a source code tarball:
The tarball is under the MPL-2.0 license. Note that it is provided for archival purposes only, since GitHub's automatically generated tarballs do not include submodules. For usage and development with the source code, clone the repository with Git, following our documentation.
The hyperparameters used to train the model are useful for fine-tuning, so we document them here, along with the training regimen and the hardware used (a server with 8 NVIDIA A100 GPUs, each with 40 GB of VRAM). The full training configuration in JSON format is available here.
The datasets used were:
The optimal lm_alpha and lm_beta values with respect to Common Voice 7.0 (custom Coqui splits) and a large-vocabulary language model:
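At inference time, tuned decoder weights like these are applied after loading an external scorer. A hedged sketch, assuming the `stt` Python package and its `setScorerAlphaBeta` method; the alpha/beta numbers below are placeholders, not the tuned values published with this release:

```python
def load_model_with_scorer(model_path: str, scorer_path: str,
                           lm_alpha: float, lm_beta: float):
    """Load a model and scorer, applying tuned decoder weights."""
    from stt import Model  # pip install stt

    model = Model(model_path)
    model.enableExternalScorer(scorer_path)
    # lm_alpha weights the language model score during beam search;
    # lm_beta weights word insertions.
    model.setScorerAlphaBeta(lm_alpha, lm_beta)
    return model


# PLACEHOLDER values for illustration; substitute the tuned values
# published alongside the release.
EXAMPLE_ALPHA, EXAMPLE_BETA = 0.9, 1.2
```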
Documentation is available on stt.readthedocs.io.
We’d also like to thank all the members of our Gitter chat room who have been helping to shape this release!
Published by github-actions[bot] about 3 years ago