Transcribe-Translate

Documentation

[!WARNING] Documentation is currently under development

You can access the project documentation at [GitHub Pages].

Host requirements

Docker: [Installation Guide]
Docker Compose: [Installation Guide]
Compatible with Linux and Windows hosts
Ensure ports 3000 and 8000 are not already in use
The project can be run on either CPU or GPU
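
One way to confirm that ports 3000 and 8000 are free before starting is sketched below. It assumes a bash shell (on Windows, for example WSL or Git Bash), because it relies on bash's `/dev/tcp` pseudo-device. The function names `port_in_use` and `check_ports` are illustrative and not part of this project. The check only probes 127.0.0.1, so a service bound to a different interface would not be detected.

```shell
# Succeeds (exit 0) when something accepts TCP connections on the given local port.
# Relies on bash's /dev/tcp pseudo-device, so no extra tools are required.
port_in_use() {
  local port="$1"
  # The subshell tries to open a connection; the descriptor closes when it exits.
  (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null
}

# Print one status line per port; return non-zero if any port is taken.
check_ports() {
  local port status=0
  for port in "$@"; do
    if port_in_use "$port"; then
      echo "port ${port}: in use"
      status=1
    else
      echo "port ${port}: free"
    fi
  done
  return "$status"
}
```

Calling `check_ports 3000 8000` prints one line per port and returns a non-zero status if either port already has a listener.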

Model requirements

The following table outlines the recommended hardware requirements for each Whisper model based on typical usage scenarios. Please ensure that your system meets or exceeds these specifications for optimal performance.

| Model | Size (GB) | Minimum RAM (GB) | Recommended RAM (GB) | GPU Memory (VRAM) (GB) | Notes |
|---|---|---|---|---|---|
| `tiny` | ~0.07 | 2 | 4 | 1 | Suitable for lightweight tasks and low resource usage. |
| `base` | ~0.14 | 4 | 6 | 2 | Good for basic transcription and smaller workloads. |
| `small` | ~0.46 | 6 | 8 | 4 | Ideal for moderate tasks, offering a balance between performance and accuracy. |
| `medium` | ~1.5 | 8 | 12 | 8 | Recommended for larger tasks with higher accuracy demands. |
| `large-v2` | ~2.88 | 10 | 16 | 10 | Best for high-performance tasks and large-scale transcription. |
| `large-v3` | ~2.88 | 12 | 16+ | 10+ | Highest accuracy and resource usage. Ideal for GPU-accelerated environments. |

[!TIP] For models running on GPU, using CUDA-enabled GPUs with sufficient VRAM is recommended to significantly improve performance. CPU-based inference may require additional RAM and processing time.
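
As a rough aid for reading the table, the sketch below maps an amount of GPU memory (in whole GB) to the largest model whose VRAM figure fits. The thresholds are copied from the table above. The function name `largest_model_for_vram` is hypothetical, and returning `large-v2` rather than `large-v3` at 10 GB or more is a conservative assumption, since the table lists `large-v3` as "10+".

```shell
# Map available VRAM (whole GB) to the largest model from the table that fits.
# Thresholds come from the "GPU Memory (VRAM) (GB)" column.
largest_model_for_vram() {
  local vram_gb="$1"
  if   [ "$vram_gb" -ge 10 ]; then echo "large-v2"  # large-v3 is listed as 10+ and may need more
  elif [ "$vram_gb" -ge 8 ];  then echo "medium"
  elif [ "$vram_gb" -ge 4 ];  then echo "small"
  elif [ "$vram_gb" -ge 2 ];  then echo "base"
  elif [ "$vram_gb" -ge 1 ];  then echo "tiny"
  else
    # Less than 1 GB: none of the listed VRAM figures fit.
    echo "none"
    return 1
  fi
}
```

For example, `largest_model_for_vram 8` prints `medium`. On NVIDIA hardware, total memory can be read with `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits`, which reports MiB, so divide by 1024 before passing the value in.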

[!WARNING] By default, the base, base.en, and large-v3 models are loaded. Models can be configured in backend/Dockerfile. However, the base model must not be removed, because it is statically configured as the default model.

Supported formats

Import Options:

Audio: .mp3, .wav, .flac, .m4a, etc.
Video: .mp4, .mkv, .avi, .mov, etc.

Export Options:

Users can export the results in .txt, .json, .srt, or .vtt formats.

Usage

[!NOTE] The project runs on GPU by default. To run on CPU, use docker-compose.cpu.yml instead.

Clone this repository and navigate to the project folder:

git clone https://github.com/NotYuSheng/Transcribe-Translate.git
cd Transcribe-Translate

Configure the frontend/.env file:

# IMPORTANT: Replace "localhost" with the server's IP address where the backend is running
REACT_APP_BACKEND_URL=http://localhost:8000

Build the Docker images:

docker-compose build

Run the images:

docker-compose up -d
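
For the CPU variant mentioned in the note at the top of this section, Compose's standard `-f` flag selects an alternative file. The wrapper below is a minimal sketch: `compose_cpu` and the `COMPOSE_BIN` override are illustrative names, not part of this project, and the sketch assumes docker-compose.cpu.yml sits in the project root.

```shell
# Run any docker-compose subcommand against the CPU Compose file.
# COMPOSE_BIN is a hypothetical override, e.g. for setups that use a different Compose binary.
compose_cpu() {
  "${COMPOSE_BIN:-docker-compose}" -f docker-compose.cpu.yml "$@"
}
```

With this defined, `compose_cpu build` followed by `compose_cpu up -d` mirrors the two steps above for a CPU-only host.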

Access the web page from the host:

<host-ip>:3000

API calls to the Whisper server can be made to the address below (refer to <host-ip>:8000/docs for more information):

<host-ip>:8000
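
Before sending real requests, it can help to confirm that the backend answers at all. The sketch below only builds the base URL and probes the /docs page mentioned above. `HOST_IP`, `backend_url`, and `backend_is_up` are illustrative names, and no specific transcription endpoint is assumed here; consult /docs for the actual routes.

```shell
# HOST_IP is a placeholder; set it to the address of the machine running the containers.
HOST_IP="${HOST_IP:-localhost}"

# Build a backend URL from an optional path, e.g. backend_url /docs
backend_url() {
  echo "http://${HOST_IP}:8000${1:-}"
}

# Succeeds when the interactive API docs page responds within 5 seconds.
backend_is_up() {
  curl --silent --fail --output /dev/null --max-time 5 "$(backend_url /docs)"
}
```

For example, `backend_is_up && echo "backend reachable"` prints the message only when the server responds.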

Additional Notes

[!CAUTION] The project is intended to be used on a local network by trusted users. No rate limit is configured, so the project is vulnerable to request floods. Consider adding rate limiting with slowapi if this is unacceptable.

[!TIP] For transcribing English input, the .en versions of the models are recommended.
