FastSpeech with SqueezeWave vocoder in PyTorch: very fast inference on CPU
MIT License
An implementation of FastSpeech based on PyTorch.
Preprocessing:

1. Put the dataset in the `data` directory.
2. Unzip `alignments.zip`.
3. Put the pretrained SqueezeWave model in `squeezewave/pretrained_model`.
4. Run `python preprocess.py`.

If you want to calculate the alignments yourself, do not unzip `alignments.zip`; instead, put the Nvidia pretrained Tacotron2 model in `Tacotron2/pretrained_model`.
Training:

Run `python train.py`.
Synthesis:

Run `python synthesis.py "write your TTS Here"`.
Example 1 (Intel® Core™ i5-6300U CPU, pinned to a single core):

    taskset --cpu-list 1 python3 synthesis.py "Fastspeech with Squeezewave vocoder in pytorch , very fast inference on cpu"
    Speech synthesis time: 1.7220683097839355
soxi output:

    Input File     : 'results/Fastspeech with Squeezewave vocoder in pytorch , very fast inference on cpu_112000_squeezewave.wav'
    Channels       : 1
    Sample Rate    : 22050
    Precision      : 16-bit
    Duration       : 00:00:05.96 = 131328 samples ~ 446.694 CDDA sectors
    File Size      : 263k
    Bit Rate       : 353k
    Sample Encoding: 16-bit Signed Integer PCM

Approximately 6 seconds of audio generated in 1.72 seconds on a single CPU.
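As a quick sanity check (this snippet is illustrative and not part of the repo), the reported duration can be recomputed from the sample count and sample rate shown in the soxi output:

```python
# Duration of the synthesized wav, derived from the soxi fields above:
# 131328 samples at a 22050 Hz sample rate.
samples = 131328
sample_rate = 22050
duration_s = samples / sample_rate
print(f"{duration_s:.2f} s")  # 5.96 s, matching the reported 00:00:05.96
```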
Example 2:

    taskset --cpu-list 0 python3 synthesis.py "How are you"
    Speech synthesis time: 0.3431851863861084

soxi output:

    Input File     : 'results/How are you _112000_squeezewave.wav'
    Channels       : 1
    Sample Rate    : 22050
    Precision      : 16-bit
    Duration       : 00:00:00.85 = 18688 samples ~ 63.5646 CDDA sectors
    File Size      : 37.4k
    Bit Rate       : 353k
    Sample Encoding: 16-bit Signed Integer PCM

0.85 seconds of audio generated in 0.34 seconds on a single CPU.
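The two timed runs above work out to the following real-time factors (synthesis time divided by audio duration); this helper script is a sketch for illustration, not part of the repo:

```python
# Real-time factor (RTF) for the two benchmark runs above.
# RTF < 1.0 means the model synthesizes faster than real time.
runs = [
    ("example 1", 1.7220683097839355, 5.96),  # (name, synthesis time s, audio duration s)
    ("example 2", 0.3431851863861084, 0.85),
]
for name, synth_time, audio_dur in runs:
    print(f"{name}: RTF = {synth_time / audio_dur:.2f}")
# example 1: RTF = 0.29
# example 2: RTF = 0.40
```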
Synthesized audio is saved in the `results` directory.