mlx_speech2text

Audio transcription using MLX Whisper with VAD-based silence removal

MIT License


Abstract

Transcription for Apple Silicon.

The audio is first segmented into small chunks. For each chunk, the silent parts are removed to produce a shorter clip, and text is then extracted from that clip.
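The chunk, trim, and transcribe flow can be sketched roughly as follows. This is a minimal illustration, not the code in `speech2text.py`: the function names, the 30-second chunk length, and the frame size are all hypothetical, a plain energy threshold stands in for a real VAD model, and `transcribe` is a caller-supplied placeholder for whatever Whisper call is actually used.

```python
from typing import Callable, List, Sequence

SAMPLE_RATE = 16_000  # assumes 16 kHz input, as produced in the Run section


def split_into_chunks(samples: Sequence[float], chunk_len: int) -> List[List[float]]:
    """Divide the audio into consecutive chunks of at most chunk_len samples."""
    return [list(samples[i:i + chunk_len]) for i in range(0, len(samples), chunk_len)]


def remove_silence(
    chunk: Sequence[float], frame_len: int = 160, threshold: float = 0.01
) -> List[float]:
    """Keep only frames whose mean absolute amplitude reaches the threshold.

    Assumption: this simple energy gate is a stand-in for a real VAD model.
    """
    voiced: List[float] = []
    for i in range(0, len(chunk), frame_len):
        frame = chunk[i:i + frame_len]
        if sum(abs(s) for s in frame) / len(frame) >= threshold:
            voiced.extend(frame)
    return voiced


def transcribe_all(
    samples: Sequence[float],
    transcribe: Callable[[List[float]], str],  # hypothetical speech-to-text callback
    chunk_seconds: float = 30.0,
) -> List[str]:
    """Chunk the audio, strip silence from each chunk, and transcribe the rest."""
    chunk_len = int(chunk_seconds * SAMPLE_RATE)
    texts: List[str] = []
    for chunk in split_into_chunks(samples, chunk_len):
        voiced = remove_silence(chunk)
        if voiced:  # skip chunks that are entirely silent
            texts.append(transcribe(voiced))
    return texts
```

Removing silence before transcription shortens the audio passed to the model, and fully silent chunks are skipped so the callback is never invoked on empty input.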

Install

$ git clone https://github.com/mbotsu/mlx_speech2text.git
$ cd mlx_speech2text
$ pip install -r requirements.txt

Run

# convert the input to a 16 kHz WAV file
$ ffmpeg -i input.mp4 -ar 16000 out.wav

# run the transcription
$ python speech2text.py -i out.wav -o track -v
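
Before running the script, it can be useful to confirm that the conversion produced a 16 kHz file. The helper below is a small sketch using Python's standard `wave` module; the function name `is_16khz_wav` is hypothetical and not part of this project.

```python
import wave


def is_16khz_wav(path: str) -> bool:
    """Return True if the WAV file at path has a 16 kHz sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate() == 16_000
```

For example, `is_16khz_wav("out.wav")` should return `True` for a file written by the `ffmpeg` command above.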
