Linguistic processing for Common Voice
AGPL-3.0 License
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to supp...
VILA - a multi-image visual language model with training, inference and evaluation recipe, deploy...
Multilingual Voice Understanding Model
Digital Avatar Conversational System - Linly-Talker. 😄✨ Linly-Talker is an intelligent AI system ...
Speech-to-Text-WaveNet: End-to-end sentence-level English speech recognition based on DeepMind's...
An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available in h...
Core Engine of Singing Voice Conversion & Singing Voice Clone
Finetune VITS and MMS using HuggingFace's tools
Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)
KhanomTan TTS (ขนมตาล) is an open-source Thai text-to-speech model that supports multilingual spe...
As little as 1 minute of voice data is enough to train a good TTS model (few-shot voice cloning)
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
The speech synthesis engine of VOICEVOX, a free, medium-quality text-to-speech application
Multi-lingual large voice generation model, providing inference, training and deployment full-sta...