Speech to text using whisper, used in....
GPL-3.0 License
CUDA is highly recommended for this! CPU is about 3x slower. Read more about speed comparison here: https://github.com/guillaumekln/faster-whisper#benchmark
python -m venv venv
source venv/bin/activate # Linux/macOS, or...
venv\Scripts\activate # for Windows
pip install -r requirements.txt
Head over to pytorch.org, select:
Then run the given command to install pytorch.
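Once PyTorch is installed, it can be worth confirming that it actually sees a GPU before starting a long transcription. The following is a minimal sketch, not part of this repository: the function name describe_torch_device is hypothetical, and it only relies on the standard torch.cuda.is_available() and torch.cuda.get_device_name() calls.

```python
import importlib.util


def describe_torch_device() -> str:
    """Return a short description of the device PyTorch would use.

    Hypothetical helper for illustration; not part of this project.
    """
    # Avoid an ImportError if PyTorch has not been installed yet.
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"

    import torch

    if torch.cuda.is_available():
        # Index 0 is the first visible CUDA device.
        return f"cuda: {torch.cuda.get_device_name(0)}"
    return "cpu"


if __name__ == "__main__":
    print(describe_torch_device())
```

If this prints "cpu" on a machine with an NVIDIA GPU, the installed PyTorch build is probably the CPU-only variant and the install command from pytorch.org should be re-checked.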
Copy SAMPLE_config.json to config.json and change the API endpoints.
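The copy step can also be scripted so that an existing config.json is never overwritten. This is a rough sketch under the assumption that both files live in the current directory; the helper name ensure_config is made up for illustration and does not exist in this repository.

```python
import shutil
from pathlib import Path


def ensure_config(sample: str = "SAMPLE_config.json",
                  target: str = "config.json") -> bool:
    """Copy the sample config to the target path unless the target exists.

    Returns True if a copy was made, False if the target was already there.
    Hypothetical helper; the file names come from the README.
    """
    target_path = Path(target)
    if target_path.exists():
        # Keep any endpoints the user has already edited.
        return False
    shutil.copyfile(Path(sample), target_path)
    return True


if __name__ == "__main__":
    if Path("SAMPLE_config.json").exists():
        created = ensure_config()
        print("created config.json" if created else "config.json already exists")
    else:
        print("SAMPLE_config.json not found in the current directory")
```

The API endpoints still have to be edited by hand afterwards.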
Make sure to have ffmpeg installed.
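A quick way to verify that ffmpeg is reachable is to look it up on the PATH. The snippet below is only a sketch using the standard library's shutil.which; the function name find_ffmpeg is hypothetical, and the project itself may locate ffmpeg differently.

```python
import shutil
from typing import Optional


def find_ffmpeg(name: str = "ffmpeg") -> Optional[str]:
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_ffmpeg()
    if path:
        print(f"ffmpeg found at {path}")
    else:
        print("ffmpeg not found on PATH")
```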
Running the main script is usually all you need. Transcripts are saved in transcripts/.
python main.py -e prod transcribe
The large Whisper model (the default) gives the best speech-to-text quality and requires ~6 GB of GPU memory. Use python main.py transcribe -h to see all available models.
usage: Wubbl0rz Archiv Transcribe [-h] [-c CONFIG] -e {prod,dev} [-o OUTPUT]
                                  {transcribe,post} ...

positional arguments:
  {transcribe,post}     Available commands
    transcribe          Run whisper to transcribe vods to text
    post                Post available transcriptions

options:
  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
                        Path to config.json
  -e {prod,dev}, --environment {prod,dev}
                        Target environment
  -o OUTPUT, --output OUTPUT
                        Output directory for transcripts