Code for the INTERSPEECH 2023 paper "Learning When to Speak: Latency and Quality Trade-offs for Simultaneous Speech-to-Speech Translation with Offline Models"
Inference and training library for high-quality TTS models.
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
A PyTorch implementation for the ZeroSpeech 2019 challenge.
Implementation of NaturalSpeech 2, Zero-shot Speech and Singing Synthesizer, in PyTorch
We provide a PyTorch implementation of the paper Voice Separation with an Unknown Number of Multi...
A Web UI for easy subtitle generation using the Whisper model.
WhisperPlus: Advancing Speech-to-Text Processing 🚀
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to supp...
Text-to-Audio/Music Generation