Multilingual Automatic Speech Recognition with word-level timestamps and confidence
AGPL-3.0 License
WaveNet vocoder
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Suppo...
Official Implementation of Mockingjay in PyTorch
Joint CTC-Attention End-to-end Speech Recognition - PyTorch Implementation (Deep Learning for Hum...
Foundational model for human-like, expressive TTS
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)
A PyTorch-based Speech Toolkit
A Web UI for easy subtitle generation using the Whisper model.
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
A flexible package for multimodal-deep-learning to combine tabular data with text and images usin...
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training w...
An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
An unofficial PyTorch implementation of the audio LM VALL-E
An Open Source text-to-speech system built by inverting Whisper.