Transcribe-Translate
Transcribe & Translate v1.0.0 - Initial Release
Published by NotYuSheng about 1 month ago
Release Date: 7 September 2024
Overview
This is the initial release of Transcribe & Translate, an open-source project that allows users to:
- Transcribe and translate audio and video files.
- Detect the language of uploaded media automatically.
- Export transcriptions and translations in multiple formats (TXT, JSON, SRT, VTT).
- View transcriptions and translations with timestamps.
This release marks v1.0.0. It supports Whisper models for high-quality transcription and translation, with an easy-to-use React frontend and a FastAPI backend. The application is fully containerized with Docker, making it easy to deploy.
Key Features
Transcription
- Supported Media Types: Audio (MP3, WAV) and Video (MP4, MKV, AVI).
- Model Selection: Choose from multiple preloaded Whisper models (base, base.en, large) to handle various transcription and translation needs.
- Automatic Language Detection: The app automatically detects the language of the media file if not specified by the user.
- Timestamps: Transcriptions are displayed with precise start and end timestamps.
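As a rough illustration of the behaviour listed above, the sketch below checks an upload's extension against the supported media types and builds the keyword arguments for a transcription call, leaving the language out when the user did not choose one. The names SUPPORTED_EXTENSIONS, is_supported and build_transcribe_options are hypothetical and not taken from the project. The only assumption about Whisper is that the openai-whisper transcribe method accepts task and language keywords and detects the language itself when none is given.

```python
from pathlib import Path
from typing import Optional

# Extensions taken from the "Supported Media Types" item; the constant name is hypothetical.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".mp4", ".mkv", ".avi"}


def is_supported(filename: str) -> bool:
    """Return True if the file extension is one of the listed media types."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def build_transcribe_options(task: str = "transcribe",
                             language: Optional[str] = None) -> dict:
    """Build keyword arguments for a Whisper transcription call.

    When no language is given, the key is left out so that Whisper
    falls back to its own language detection.
    """
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unknown task: {task!r}")
    options = {"task": task}
    if language:
        options["language"] = language
    return options


# Assumed usage with the openai-whisper package (not executed here):
#   model = whisper.load_model("base")
#   result = model.transcribe("talk.mp4", **build_transcribe_options())
#   result["segments"] then holds entries with "start", "end" and "text".
```

Keeping the option building separate from the model call makes the "language not specified" case easy to check without loading a model.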
Translation
- Multilingual Support: Translate media into multiple languages with Whisper's powerful translation capabilities.
- Automatic Source Language Detection: If no input language is provided, the app detects the source language automatically.
- Side-by-Side View: When translating, view both the original transcription and its translation side by side.
Export Options
- Export your transcription or translation into the following formats:
  - TXT: Simple plain text format.
  - JSON: Structured data with timestamps.
  - SRT: Subtitle format with time codes.
  - VTT: Web Video Text Tracks format for video captioning.
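The four formats differ mainly in how timestamps are written. The following is a minimal sketch of such exporters, assuming each segment is a dictionary with start and end times in seconds and a text field (the shape of Whisper's segment output). The function names are hypothetical, and the project's own exporters may be structured differently.

```python
import json


def format_timestamp(seconds: float, separator: str) -> str:
    """Format seconds as HH:MM:SS<sep>mmm (SRT uses ',', WebVTT uses '.')."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{millis:03d}"


def to_txt(segments) -> str:
    """Plain text: one segment per line, no timestamps."""
    return "\n".join(seg["text"].strip() for seg in segments) + "\n"


def to_json(segments) -> str:
    """Structured data: the segments, timestamps included."""
    return json.dumps(segments, ensure_ascii=False, indent=2)


def to_srt(segments) -> str:
    """SubRip: numbered cues, comma before the milliseconds."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"], ",")
        end = format_timestamp(seg["end"], ",")
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def to_vtt(segments) -> str:
    """WebVTT: 'WEBVTT' header, dot before the milliseconds."""
    cues = ["WEBVTT\n"]
    for seg in segments:
        start = format_timestamp(seg["start"], ".")
        end = format_timestamp(seg["end"], ".")
        cues.append(f"{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(cues)
```

SRT and VTT share everything except the header, the cue numbering and the millisecond separator, which is why a single timestamp helper covers both.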
Loading Indicator
- Real-time feedback with loading animations during transcription or translation, along with an elapsed time display once the process completes.
Dynamic Frontend
- The frontend dynamically loads available Whisper models from the backend.
- Provides media preview (video/audio) directly in the browser.
- User-friendly layout with responsive design for different screen sizes.
Dockerized for Easy Deployment
- The project is containerized with Docker, allowing for straightforward setup and deployment.
- Nginx is used to serve the frontend, and FastAPI for the backend.
Installation & Setup
Prerequisites
- Docker and Docker Compose installed.
Steps to Run the Project Locally
- Clone the repository:
  git clone https://github.com/your-repo/transcribe-translate-app.git
  cd transcribe-translate-app
- Build and start the Docker containers:
  docker-compose up --build
- Access the app in your browser:
  http://localhost:3000
- The backend API will run on:
  http://localhost:8000
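Because the first build and the model download can take a while, it may help to poll both addresses until they respond before opening the browser. The helper below is a small sketch using only the Python standard library. The function names, retry count and delay are assumptions chosen for illustration and are not part of the project.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just not with a 2xx status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure and similar.
        return False


def wait_until_up(url: str, attempts: int = 30, delay: float = 2.0,
                  probe: Callable[[str], bool] = is_up) -> bool:
    """Probe the URL up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


# Example usage once the containers have been started:
#   for url in ("http://localhost:3000", "http://localhost:8000"):
#       print(url, "up" if wait_until_up(url) else "not reachable")
```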
Whisper Models
The app downloads and uses pre-trained Whisper models (such as base, base.en, and large) for transcription and translation. These models are stored in a Docker volume for persistent storage and efficient use.
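To see which models are already present in the volume, one option is to list the checkpoint files in the directory where the volume is mounted. The sketch below assumes that each downloaded model is stored as a single file ending in .pt, which is how the openai-whisper package saves checkpoints. The function name and the /models path in the usage comment are hypothetical, and a checkpoint's file name may differ from the name used to request it (for example, a versioned file for large).

```python
from pathlib import Path
from typing import List


def list_cached_models(model_dir: str) -> List[str]:
    """Return the sorted names of '.pt' checkpoints found in model_dir.

    A missing directory yields an empty list rather than an error,
    which matches the state before the first download.
    """
    root = Path(model_dir)
    if not root.is_dir():
        return []
    return sorted(p.stem for p in root.iterdir()
                  if p.is_file() and p.suffix == ".pt")


# Example usage, assuming the volume is mounted at /models (hypothetical path):
#   print(list_cached_models("/models"))
```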
Known Issues
- Performance on Large Files: The application may take a while to process large media files, especially with the larger Whisper models.
- Model Download Time: On the first run, downloading the Whisper models can take a while depending on your internet connection.
Future Enhancements
- Additional Language Models: Adding more language models for extended support.
- Batch Processing: Implementing the ability to transcribe or translate multiple files at once.
- UI Improvements: Further improving the responsiveness and design of the frontend.
- More Export Formats: Adding support for additional export formats like CSV and PDF.
Contributors
- Ong Yu Sheng - Full Stack Developer
Acknowledgments
This project uses OpenAI's Whisper for transcription and translation services. We extend our gratitude to the open-source community for contributing to these fantastic tools.