Transcribe any audio or video file. Edit and view your transcripts in a standalone HTML editor.
MIT License
sudo apt install ffmpeg
conda create --name transcribo python=3.10
conda activate transcribo
nvcc --version
conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=11.8 -c pytorch -c nvidia
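If you want to compare the CUDA toolkit reported by nvcc --version with the pytorch-cuda=11.8 pin in the conda command above, you can script the check. The following is a minimal sketch and is not part of Transcribo. The function names are hypothetical. It assumes nvcc prints a line of the form "Cuda compilation tools, release 11.8, V11.8.89", and it assumes that comparing major.minor versions is the check you want; this README does not state a compatibility rule.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# Taken from "pytorch-cuda=11.8" in the conda install command.
PYTORCH_CUDA_PIN = (11, 8)


def parse_nvcc_release(output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from nvcc output, or None if no 'release X.Y' is found."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def local_cuda_release() -> Optional[Tuple[int, int]]:
    """Run `nvcc --version` if nvcc is on PATH and parse its release number."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True, check=False)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    release = local_cuda_release()
    if release is None:
        print("nvcc not found, or its output was not recognised")
    elif release != PYTORCH_CUDA_PIN:
        print(f"Toolkit reports CUDA {release[0]}.{release[1]}; the conda command pins pytorch-cuda=11.8")
    else:
        print("Toolkit release matches the pytorch-cuda=11.8 pin")
```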
pip install -r requirements.txt
pip uninstall onnxruntime
pip install --force-reinstall onnxruntime-gpu
pip install --force-reinstall -v "numpy==1.26.3"
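After the three pip commands above, the CPU package onnxruntime should no longer be installed, onnxruntime-gpu should be installed, and numpy should be pinned to 1.26.3. A minimal sketch that checks this with the standard library's importlib.metadata is shown below. The helper names are hypothetical, and the script only inspects installed distributions; it does not test whether the GPU is actually usable.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import List, Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def check_environment() -> List[str]:
    """Return a list of problems; an empty list means the pins look as intended."""
    problems: List[str] = []
    # onnxruntime and onnxruntime-gpu provide the same module, so the CPU build should be gone.
    if installed_version("onnxruntime") is not None:
        problems.append("the CPU package 'onnxruntime' is still installed")
    if installed_version("onnxruntime-gpu") is None:
        problems.append("'onnxruntime-gpu' is not installed")
    numpy_version = installed_version("numpy")
    if numpy_version is None:
        problems.append("numpy is not installed")
    elif numpy_version != "1.26.3":
        problems.append(f"numpy {numpy_version} is installed, expected 1.26.3")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("Environment looks as intended" if not issues else "\n".join(issues))
```

Run it inside the activated transcribo environment so that it inspects the right packages.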
Create a .env file and add your access token. See the file .env_example.
HF_AUTH_TOKEN = ...
Set the remaining variables from .env_example in your .env file for your specific configuration. Make sure that your .env file is in your .gitignore.
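The .env file holds one KEY = VALUE entry per line, as in the HF_AUTH_TOKEN line above. As an illustration of that format, the sketch below reads such a file with the standard library alone. It is not Transcribo's own loader: parse_env_text and load_env are hypothetical names, and the rules for comments and quotes are assumptions.

```python
from pathlib import Path
from typing import Dict


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY = VALUE lines; blank lines and lines starting with # are skipped."""
    values: Dict[str, str] = {}
    for lineno, raw_line in enumerate(text.splitlines(), start=1):
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: expected KEY = VALUE, got {raw_line!r}")
        # Spaces around "=" are allowed; surrounding quotes are dropped (an assumption).
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_env(path: str = ".env") -> Dict[str, str]:
    """Read a .env file and require a non-empty HF_AUTH_TOKEN."""
    values = parse_env_text(Path(path).read_text(encoding="utf-8"))
    if not values.get("HF_AUTH_TOKEN"):
        raise ValueError(f"HF_AUTH_TOKEN is missing or empty in {path}")
    return values
```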
To run the application with Docker Compose:
docker-compose up -d --build
Alternatively, start the worker and frontend scripts yourself, each in its own tmux session:
tmux new -s transcribe_worker
conda activate transcribo
python worker.py
Detach from the session with CTRL-B and then D.
tmux new -s transcribe_frontend
conda activate transcribo
python main.py
To reattach to the sessions later, use tmux attach -t transcribe_worker and tmux attach -t transcribe_frontend.
On Windows, see the batch files run_gui.bat, run_transcribo.bat and run_worker.bat.
Variable | Description |
---|---|
ONLINE | Boolean. If TRUE, exposes the frontend on your network. For HTTPS, you must provide an SSL certificate and key file. See the NiceGUI documentation for more information. |
SSL_CERTFILE | String. The file path to the SSL cert file |
SSL_KEYFILE | String. The file path to the SSL key file |
STORAGE_SECRET | String. Secret key for cookie-based identification of users |
ROOT | String. Path to the directory containing main.py and worker.py. |
WINDOWS | Boolean. Set TRUE if you are running this application on Windows. |
DEVICE | String. 'cuda' if you are using a GPU, 'cpu' otherwise. |
ADDITIONAL_SPEAKERS | Integer. Number of additional speakers provided in the editor. |
BATCH_SIZE | Integer. Batch size for Whisper inference. Recommended batch size is 4 with 8GB VRAM and 32 with 16GB VRAM. |
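The table above assigns a type to each variable. The following sketch shows one way to turn the raw strings from a .env file into those types and to apply the BATCH_SIZE recommendation. It is hypothetical and not Transcribo's configuration code. Its assumptions are that any capitalisation of "true" counts as TRUE, that SSL_CERTFILE and SSL_KEYFILE must be set together, and that the two recommended batch sizes can be treated as VRAM thresholds.

```python
from typing import Dict, Optional, Union

# Variable types as listed in the configuration table.
BOOLEAN_KEYS = ("ONLINE", "WINDOWS")
INTEGER_KEYS = ("ADDITIONAL_SPEAKERS", "BATCH_SIZE")
ConfigValue = Union[str, bool, int]


def coerce_config(raw: Dict[str, str]) -> Dict[str, ConfigValue]:
    """Convert raw .env strings to the types in the table (hypothetical helper)."""
    config: Dict[str, ConfigValue] = dict(raw)
    for key in BOOLEAN_KEYS:
        if key in raw:
            # Assumption: "TRUE", "True" and "true" all count as True.
            config[key] = raw[key].strip().upper() == "TRUE"
    for key in INTEGER_KEYS:
        if key in raw:
            config[key] = int(raw[key])  # raises ValueError for non-integers
    device = raw.get("DEVICE")
    if device is not None and device not in ("cuda", "cpu"):
        raise ValueError(f"DEVICE must be 'cuda' or 'cpu', got {device!r}")
    # Assumption: HTTPS needs both SSL files, so a half-configured pair is rejected.
    if bool(raw.get("SSL_CERTFILE")) != bool(raw.get("SSL_KEYFILE")):
        raise ValueError("set both SSL_CERTFILE and SSL_KEYFILE, or neither")
    return config


def recommended_batch_size(vram_gb: float) -> Optional[int]:
    """Map GPU memory to the BATCH_SIZE values recommended in the table.

    The table gives two data points (4 with 8 GB, 32 with 16 GB); using them as
    thresholds is an assumption, and below 8 GB no value is recommended.
    """
    if vram_gb >= 16:
        return 32
    if vram_gb >= 8:
        return 4
    return None


if __name__ == "__main__":
    example = {"ONLINE": "FALSE", "DEVICE": "cuda", "BATCH_SIZE": "4", "ADDITIONAL_SPEAKERS": "4"}
    print(coerce_config(example))
    print(recommended_batch_size(8))
```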
This application provides advanced transcription capabilities for confidential audio and video files using the state-of-the-art Whisper v3 large model (non-quantized). It offers top-tier transcription quality without licensing or usage fees, even for Swiss German.
This project is a collaborative effort by the following people from the cantonal administration of Zurich:
Please share your feedback and let us know how you use the app in your institution. You can write an email or share your ideas by opening an issue or a pull request.
Please note that we use Ruff with default settings for linting and code formatting.
This transcription software (the Software) incorporates the open-source model Whisper Large v3 (the Model) and has been developed according to and with the intent to be used under Swiss law. Please be aware that the EU Artificial Intelligence Act (EU AI Act) may, under certain circumstances, be applicable to your use of the Software. You are solely responsible for ensuring that your use of the Software as well as of the underlying Model complies with all applicable local, national and international laws and regulations. By using this Software, you acknowledge and agree (a) that it is your responsibility to assess which laws and regulations, in particular regarding the use of AI technologies, are applicable to your intended use and to comply therewith, and (b) that you will hold us harmless from any action, claims, liability or loss in respect of your use of the Software.