Simple Python library, distributed via binary wheels with few direct dependencies, for easily using wav2vec 2.0 models for speech recognition.
AGPL-3.0 License
Beta Software
Requirements: cffi, numpy
Models:
Model | Download Size |
---|---|
Facebook Wav2Vec2 2.0 Base (960h) | 360 MB |
Facebook Wav2Vec2 2.0 Large (960h) | 1.18 GB |
Facebook Wav2Vec2 2.0 Large LV60 (960h) | 1.18 GB |
Facebook Wav2Vec2 2.0 Large LV60 Self (960h) | 1.18 GB |
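import wave
from wav2vec2_stt import Wav2Vec2STT

# 'model_dir' is the path to a model directory (see the Models table)
decoder = Wav2Vec2STT('model_dir')

# Read the raw audio frames; the context manager closes the file afterwards
with wave.open('tests/test.wav', 'rb') as wav_file:
    wav_samples = wav_file.readframes(wav_file.getnframes())

assert decoder.decode(wav_samples).strip().lower() == 'it depends on the context'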
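The snippet above hands the raw frames of a WAV file straight to `decode`. The sketch below wraps that read step in a helper that also checks the audio format first. The helper name `read_wav_frames` is hypothetical, and the expected format (16 kHz, mono, 16-bit) is an assumption that this README does not state; adjust the defaults to whatever your model expects.

```python
import wave


def read_wav_frames(path, expected_rate=16000, expected_channels=1, expected_width=2):
    """Return the raw PCM frames of a WAV file as bytes.

    The expected_* defaults (16 kHz, mono, 16-bit) are assumptions, not
    documented requirements of wav2vec2_stt.
    """
    with wave.open(path, 'rb') as wav_file:
        actual = (
            wav_file.getframerate(),
            wav_file.getnchannels(),
            wav_file.getsampwidth(),
        )
        expected = (expected_rate, expected_channels, expected_width)
        if actual != expected:
            raise ValueError(
                f"{path}: (rate, channels, sample width) is {actual}, expected {expected}"
            )
        # Read every frame in the file in one call
        return wav_file.readframes(wav_file.getnframes())
```

With a `decoder` created as above, this would be used as `decoder.decode(read_wav_frames('tests/test.wav'))`.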
Also includes a simple CLI for recognizing WAV files:
$ python -m wav2vec2_stt decode model test.wav
IT DEPENDS ON THE CONTEXT
$ python -m wav2vec2_stt decode model test.wav test.wav
IT DEPENDS ON THE CONTEXT
IT DEPENDS ON THE CONTEXT
$ python -m wav2vec2_stt -h
usage: python -m wav2vec2_stt [-h] {decode} ...
positional arguments:
{decode} sub-command
decode decode one or more WAV files
optional arguments:
-h, --help show this help message and exit
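The CLI shown above can also be driven from a script. The following is a minimal sketch, not part of the library: `build_decode_command` and `decode_files` are hypothetical helper names. It assumes the CLI prints exactly one transcript line per input file, in order, as in the sample session, and nothing else on standard output.

```python
import subprocess
import sys


def build_decode_command(model_dir, wav_paths):
    """Build the argv list for `python -m wav2vec2_stt decode MODEL WAV...`."""
    if not wav_paths:
        raise ValueError("at least one WAV file is required")
    # sys.executable keeps the call inside the current Python environment
    return [sys.executable, '-m', 'wav2vec2_stt', 'decode', model_dir, *wav_paths]


def decode_files(model_dir, wav_paths):
    """Run the CLI and return its output lines.

    Assumption: one transcript line per input file, in input order.
    """
    result = subprocess.run(
        build_decode_command(model_dir, wav_paths),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout.splitlines()
```

Under that assumption, `decode_files('model', ['test.wav', 'test.wav'])` would return two transcript strings.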
Recommended installation via wheel from pip (requires a recent version of pip):
python -m pip install wav2vec2_stt
See setup.py for more details on building it yourself.
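After installing, one quick way to confirm that the package is visible to the current interpreter is to query its metadata. This is a sketch using the standard library's `importlib.metadata` (Python 3.8+). The helper name `installed_version` is hypothetical, and it assumes the distribution is registered under the same name used in the pip command above.

```python
from importlib import metadata


def installed_version(dist_name='wav2vec2_stt'):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == '__main__':
    version = installed_version()
    print('wav2vec2_stt version:', version if version else 'not installed')
```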
This project is licensed under the GNU Affero General Public License v3 (AGPL-3.0-or-later). See the LICENSE file for details. If this license is problematic for you, please contact me.