Multilingual Automatic Speech Recognition with word-level timestamps and confidence
AGPL-3.0 License
An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
Foundational model for human-like, expressive TTS
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Suppo...
Joint CTC-Attention End-to-end Speech Recognition - PyTorch Implementation (Deep Learning for Hum...
An unofficial PyTorch implementation of the audio LM VALL-E
An Open Source text-to-speech system built by inverting Whisper.
Official Implementation of Mockingjay in PyTorch
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)
A flexible package for multimodal-deep-learning to combine tabular data with text and images usin...
A PyTorch-based Speech Toolkit
A Web UI for easy subtitle generation using the Whisper model.
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training w...
WaveNet vocoder