Whisper-Finetune

This repository contains code for fine-tuning the Whisper speech-to-text model. It utilizes Weights & Biases (wandb) for logging metrics and storing models. Key features include:

Timestamp training
Prompt training
Stochastic depth implementation for improved model generalization
Correct implementation of SpecAugment for robust audio data augmentation
Checkpointing functionality to save and resume training progress, crucial for handling long-running experiments and potential interruptions
Integration with Weights & Biases (wandb) for experiment tracking and model versioning
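The two regularizers in the list can be illustrated with a short NumPy sketch: SpecAugment-style frequency and time masking on a log-mel spectrogram, and stochastic depth (drop path) on a residual branch. This is a generic illustration, not this repository's implementation. The function names `spec_augment` and `drop_path`, the default mask counts and widths, and filling masked cells with the mean are all assumptions. Time warping from the original SpecAugment is omitted.

```python
import numpy as np


def spec_augment(mel, num_freq_masks=2, max_freq_width=27,
                 num_time_masks=2, max_time_width=100, rng=None):
    """Return a copy of a log-mel spectrogram (n_mels, n_frames) with random
    frequency bands and time spans masked out (SpecAugment-style masking,
    without time warping). Defaults are illustrative assumptions."""
    rng = np.random.default_rng() if rng is None else rng
    out = np.array(mel, dtype=np.float32, copy=True)
    n_mels, n_frames = out.shape
    fill = out.mean()  # assumption: masked cells are set to the mean value
    for _ in range(num_freq_masks):
        # Mask `width` consecutive mel bins across all frames
        width = int(rng.integers(0, min(max_freq_width, n_mels) + 1))
        start = int(rng.integers(0, n_mels - width + 1))
        out[start:start + width, :] = fill
    for _ in range(num_time_masks):
        # Mask `width` consecutive frames across all mel bins
        width = int(rng.integers(0, min(max_time_width, n_frames) + 1))
        start = int(rng.integers(0, n_frames - width + 1))
        out[:, start:start + width] = fill
    return out


def drop_path(branch, drop_prob, training, rng=None):
    """Stochastic depth: during training, zero the residual branch for each
    sample in the batch with probability drop_prob and rescale the kept
    samples by 1 / (1 - drop_prob) so the expected value is unchanged."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return branch
    rng = np.random.default_rng() if rng is None else rng
    keep_prob = 1.0 - drop_prob
    # One keep/drop decision per sample, broadcast over the remaining axes
    mask_shape = (branch.shape[0],) + (1,) * (branch.ndim - 1)
    keep = rng.random(mask_shape) < keep_prob
    return branch * keep / keep_prob
```

In a residual block, stochastic depth would be used as `x = x + drop_path(block(x), drop_prob, training)`, so a dropped sample simply skips that block. SpecAugment would only be applied to training batches, for example to Whisper's 80-bin, 3000-frame input features.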

Installation

Clone the repository:

```
git clone https://github.com/i4ds/whisper-finetune.git
cd whisper-finetune
```

Create and activate a virtual environment (strongly recommended) with Python 3.9.*, and make sure a Rust compiler is available.
Install the package in editable mode:
```
pip install -e .
```

Data

For data preparation, see https://github.com/i4Ds/whisper-prep. The training data is passed to the script as a 🤗 Datasets dataset.
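Timestamp training and prompt training both depend on how each example's transcript is laid out as a Whisper decoder sequence. The sketch below writes Whisper's special tokens (as defined in the openai/whisper tokenizer) as plain strings instead of token IDs. The `Segment` class and the `timestamp_token` and `build_decoder_sequence` helpers are hypothetical and are not this repository's code.

```python
from dataclasses import dataclass
from typing import List, Optional

# Whisper timestamp tokens are spaced 20 ms apart (<|0.00|> ... <|30.00|>)
TIME_PRECISION = 0.02


@dataclass
class Segment:
    start: float  # seconds from the start of the 30 s audio window
    end: float
    text: str


def timestamp_token(seconds: float) -> str:
    """Quantize a time to the 20 ms grid and render it as a timestamp token."""
    steps = round(seconds / TIME_PRECISION)
    return f"<|{steps * TIME_PRECISION:.2f}|>"


def build_decoder_sequence(segments: List[Segment], language: str = "en",
                           prompt: Optional[str] = None,
                           with_timestamps: bool = True) -> List[str]:
    """Lay out one training example as Whisper decoder tokens (as strings)."""
    seq: List[str] = []
    if prompt:
        # Prompt training: previous-context text goes before <|startoftranscript|>
        seq += ["<|startofprev|>", prompt]
    seq += ["<|startoftranscript|>", f"<|{language}|>", "<|transcribe|>"]
    if with_timestamps:
        # Timestamp training: each segment is wrapped in start/end tokens
        for seg in segments:
            seq += [timestamp_token(seg.start), seg.text, timestamp_token(seg.end)]
    else:
        seq += ["<|notimestamps|>", "".join(seg.text for seg in segments)]
    seq.append("<|endoftext|>")
    return seq
```

For example, `build_decoder_sequence([Segment(0.0, 2.5, " Hello")], prompt=" earlier context")` gives `['<|startofprev|>', ' earlier context', '<|startoftranscript|>', '<|en|>', '<|transcribe|>', '<|0.00|>', ' Hello', '<|2.50|>', '<|endoftext|>']`. In a real pipeline each string would be tokenized to IDs, and implementations commonly exclude the prompt positions from the loss.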

Usage

Create a configuration file (see examples in configs/*.yaml)

Run the fine-tuning script:

```
python src/whisper_finetune/scripts/finetune.py --config configs/large-cv-srg-sg-corpus.yaml
```
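Fine-tuning runs can be long, which is why checkpointing matters: an interrupted run should continue from its last saved step instead of starting over. The following is a minimal, self-contained sketch of that save-and-resume pattern, not this repository's implementation. It stores state as JSON for illustration, whereas a real checkpoint would hold model and optimizer tensors (for example via `torch.save`). The names `save_checkpoint`, `load_checkpoint` and `train` are hypothetical.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to a temp file, then atomically replace the checkpoint,
    so a crash mid-write never leaves a corrupt checkpoint behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, path)  # atomic rename on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path):
    """Return the saved state, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)


def train(total_steps, ckpt_path, save_every=100):
    """Hypothetical training loop that resumes from the last checkpoint."""
    state = load_checkpoint(ckpt_path) or {"step": 0, "loss_history": []}
    for step in range(state["step"], total_steps):
        loss = 1.0 / (step + 1)  # placeholder for a real optimization step
        state["loss_history"].append(loss)
        state["step"] = step + 1
        if state["step"] % save_every == 0 or state["step"] == total_steps:
            save_checkpoint(ckpt_path, state)
    return state
```

Calling `train(150, path)` and later `train(250, path)` continues from step 150 rather than redoing the first 150 steps.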

Deployment

We suggest using faster-whisper for deployment. To convert your fine-tuned model for use with faster-whisper, you can use the script at src/whisper_finetune/scripts/convert_c2t.py.
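Once converted, the model can be loaded with faster-whisper's `WhisperModel`. A minimal sketch, assuming the conversion script produces a CTranslate2 model directory (the format faster-whisper loads) and that a CUDA GPU is available; on CPU, pass `device="cpu"` and `compute_type="int8"` instead. The helper name `transcribe_file` is hypothetical.

```python
def transcribe_file(model_dir, audio_path, device="cuda",
                    compute_type="float16", language=None):
    """Transcribe one audio file with a CTranslate2-converted Whisper model."""
    # Imported inside the function so this module loads without faster-whisper
    from faster_whisper import WhisperModel

    model = WhisperModel(model_dir, device=device, compute_type=compute_type)
    # language=None lets faster-whisper detect the language itself
    segments, _info = model.transcribe(audio_path, beam_size=5, language=language)
    # `segments` is a lazy generator: decoding happens while it is consumed
    return [(seg.start, seg.end, seg.text) for seg in segments]
```

A call such as `transcribe_file("path/to/converted-model", "audio.wav")` (both paths are placeholders) returns a list of `(start, end, text)` tuples. For production serving, the model would be loaded once and reused across requests rather than reloaded per file.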

Transcription quality can be improved further by serving requests with WhisperX.

Configuration

Modify the YAML files in the configs/ directory to customize your fine-tuning process. Refer to the existing configuration files for examples of available options.

Thank you

The starting point of this repository was the excellent repository by Jumon at https://github.com/jumon/whisper-finetuning

Contributing

We welcome contributions! Please feel free to submit a Pull Request.

Support

If you encounter any problems, please file an issue along with a detailed description.

Maintainer

Vincenzo Timmel

Developers

Vincenzo Timmel
Claudio Paonessa

License

This project is licensed under the MIT License - see the LICENSE file for details.

