A real-time speech-to-text transcriber using the Whisper model, designed for efficiency and ease of use in the console. This tool leverages the faster_whisper library and Rich to provide a seamless user experience for transcribing audio inputs on the fly.
MIT License
Whisper is a state-of-the-art model for automatic speech recognition (ASR). This project utilizes the Whisper model and provides a practical interface for capturing live audio input, transcribing it, and displaying the results in real time. It's designed to be flexible, allowing the user to choose the language of transcription and offering a buffer system to handle continuous speech.
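The capture, transcribe, and display flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names make_whisper_transcriber and run_pipeline are hypothetical, the "base" model size and CPU/int8 settings are assumptions, and audio is assumed to arrive as a sequence of file paths rather than from a live microphone.

```python
from typing import Callable, Iterable, Iterator, List


def make_whisper_transcriber(
    model_size: str = "base", language_code: str = "en"
) -> Callable[[str], Iterator[str]]:
    """Return a function mapping an audio file path to transcribed segment texts.

    Hypothetical helper: model size and compute settings are assumptions.
    """
    # Imported lazily so the rest of this sketch runs without the dependency.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")

    def transcribe(audio_path: str) -> Iterator[str]:
        # transcribe() returns a lazy iterator of segments plus audio info.
        segments, _info = model.transcribe(audio_path, language=language_code)
        for segment in segments:
            yield segment.text.strip()

    return transcribe


def run_pipeline(
    audio_chunks: Iterable[str],
    transcribe: Callable[[str], Iterable[str]],
    display: Callable[[str], None] = print,
) -> List[str]:
    """Transcribe each captured chunk in order and display text as it arrives."""
    transcript: List[str] = []
    for chunk in audio_chunks:
        for text in transcribe(chunk):
            if text:  # skip empty segments
                display(text)
                transcript.append(text)
    return transcript
```

Passing the transcriber in as a plain callable keeps the loop independent of the model, so it can be exercised with a stand-in function before any model is downloaded.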
To install and run this project, follow these steps:
Clone the repo:
git clone https://github.com/nexuslux/Realtime-Whisper-Console-Transcriber
cd Realtime-Whisper-Console-Transcriber
Set up a virtual environment (optional but recommended):
python -m venv venv
source venv/bin/activate # On Windows, use `venv\Scripts\activate`
Install required dependencies:
pip install faster_whisper speechrecognition rich
Run the script:
python transcribe.py
Follow the prompts:
Start speaking or playing audio:
Stop listening:
Press CTRL + C to stop the transcription process.
To start transcribing, run:
python transcribe.py
After this you will be asked to enter the main language.
• Enter the language code: en
• Start speaking. The application will display transcribed text in the console.
• End the session with CTRL + C. The output will be saved to a text file in the Downloads folder.
Customization
You can customize the following parameters in the script:
• buffer_size: Number of segments to buffer before displaying the transcription.
• language_code: Set your preferred default language code for transcription.
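To make the buffer_size behaviour and the save-on-exit step concrete, here is a small sketch of how they might fit together. It is an illustration under stated assumptions, not the script's actual implementation: SegmentBuffer, save_transcript, run_session, and the transcript.txt filename are all hypothetical, and the Downloads folder is assumed to live under the user's home directory.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Optional


class SegmentBuffer:
    """Collect segments and display them once buffer_size have arrived."""

    def __init__(self, buffer_size: int = 3,
                 display: Callable[[str], None] = print) -> None:
        if buffer_size < 1:
            raise ValueError("buffer_size must be at least 1")
        self.buffer_size = buffer_size
        self.display = display
        self._pending: List[str] = []
        self.history: List[str] = []  # every line displayed so far

    def add(self, text: str) -> None:
        self._pending.append(text)
        if len(self._pending) >= self.buffer_size:
            self.flush()

    def flush(self) -> None:
        """Display whatever is pending, even if the buffer is not full."""
        if not self._pending:
            return
        line = " ".join(self._pending)
        self.display(line)
        self.history.append(line)
        self._pending.clear()


def save_transcript(lines: Iterable[str], directory: Optional[Path] = None,
                    filename: str = "transcript.txt") -> Path:
    """Write the transcript to a text file (hypothetical filename)."""
    # Assumption: the Downloads folder sits directly under the home directory.
    target_dir = directory if directory is not None else Path.home() / "Downloads"
    target_dir.mkdir(parents=True, exist_ok=True)
    path = target_dir / filename
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def run_session(texts: Iterable[str], buffer: SegmentBuffer,
                directory: Optional[Path] = None) -> Path:
    """Feed segments into the buffer until the input ends or CTRL + C is pressed."""
    try:
        for text in texts:
            buffer.add(text)
    except KeyboardInterrupt:
        pass  # CTRL + C ends the session; fall through to save
    finally:
        buffer.flush()  # show any partial buffer before saving
    return save_transcript(buffer.history, directory)
```

A larger buffer_size produces fewer, longer lines at the cost of more delay before text appears. Flushing on exit ensures that segments still waiting in the buffer are not lost when the session is interrupted.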