EasyTTS

Ready-to-use Multilingual Text-To-Speech (TTS) package.


EasyTTS is an open-source and ready-to-use Multilingual Text-To-Speech (TTS) package.

The goal is to simplify the use of state-of-the-art text-to-speech models for a variety of languages (French, English, ...).

EasyTTS is currently in beta.

Quick installation

EasyTTS is constantly evolving. New features, tutorials, and documentation will appear over time. EasyTTS can be installed via PyPI for rapid use of the standard library, while a local installation suits users who want to run experiments and modify/customize the toolkit. EasyTTS supports both CPU and GPU computation; please note that CUDA must be properly installed to use GPUs.
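If you plan to use a GPU, the snippet below is a minimal sketch for checking that CUDA is visible from your Python environment. Which deep-learning backend your install pulls in is an assumption here, so both checks are guarded and you can run whichever matches your setup.

# Minimal sketch: confirm that a GPU is visible from Python.
# The backend framework is an assumption; run the check matching your environment.
try:
    import torch
    print("CUDA available (PyTorch):", torch.cuda.is_available())
except ImportError:
    print("PyTorch not installed")

try:
    import tensorflow as tf
    print("GPUs visible (TensorFlow):", tf.config.list_physical_devices("GPU"))
except ImportError:
    print("TensorFlow not installed")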

Anaconda setup

conda create --name EasyTTS python=3.7 -y
conda activate EasyTTS
pip install git+https://github.com/repodiac/german_transliterate

More information on managing environments with Anaconda can be found in the conda cheat sheet.

Install via PyPI

Once you have created your Python environment (Python 3.7+) you can simply type:

pip install EasyTTS
pip install git+https://github.com/repodiac/german_transliterate
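As a quick sanity check that the installation succeeded, you can reuse the import path from the usage example further below; it should import without errors:

# Sanity check: the inference class should be importable after installation
from EasyTTS.inference.TTS import TTS
print(TTS)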

Install with GitHub

Once you have created your Python environment (Python 3.7+) you can simply type:

git clone https://github.com/qanastek/EasyTTS.git
cd EasyTTS
pip install -r requirements.txt
pip install --editable .

Any modification made to the EasyTTS package takes effect immediately, since it was installed with the --editable flag.
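A quick way to confirm the editable install is to check that the imported package resolves to your local clone (the exact path will differ on your machine):

import EasyTTS
# With an --editable install, this path should point inside the cloned EasyTTS/ directory,
# so local changes are used the next time the package is imported.
print(EasyTTS.__file__)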

Example Usage

import soundfile as sf
from EasyTTS.inference.TTS import TTS

tts = TTS(lang="fr") # Instantiate the model for your language
audio = tts.predict(text="Bonjour à tous") # Make a prediction

sf.write('./audio_pip.wav', audio, 22050, "PCM_16") # Save output in .WAV file
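Building on the same API, here is a small sketch that synthesizes several sentences and saves one .WAV file per language; the sentences and output file names are illustrative only.

import soundfile as sf
from EasyTTS.inference.TTS import TTS

# Illustrative (language, sentence) pairs; use any language supported by EasyTTS
sentences = {
    "fr": "Bonjour à tous",
    "en": "We shall go on to the end.",
}

for lang, text in sentences.items():
    tts = TTS(lang=lang)            # Instantiate the model for this language
    audio = tts.predict(text=text)  # Synthesize the sentence
    sf.write(f"./audio_{lang}.wav", audio, 22050, "PCM_16")  # Save as 16-bit PCM WAV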

Audio Samples

Sentence | Language | Audio File
Comme le capitaine prononçait ces mots, un éclair illumina les ondes de l'Atlantique, puis une détonation se fit entendre et deux boulets ramés balayèrent le pont de l'Alcyon. | FR | audio_fr.wav
We shall not flag or fail. We shall go on to the end... we shall never surrender. | EN | audio_en.wav

Model architectures

  1. Tacotron 2 (from Google Research & University of California, Berkeley) released with the paper Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions, by Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis and Yonghui Wu.

Datasets used

  1. SynPaFlex (from IRISA, LLF (Laboratoire de Linguistique Formelle de Nantes), LIUM (Le Mans Université) and ATILF (Analyse et Traitement Informatique de la Langue Française)) released with the paper SynPaFlex-Corpus: An Expressive French Audiobooks Corpus Dedicated to Expressive Speech Synthesis, by Aghilas Sini, Damien Lolive, Gaëlle Vidal, Marie Tahon and Élisabeth Delais-Roussarie.

Build PyPI package

Build: python setup.py sdist bdist_wheel

Upload: twine upload dist/*
