Pre Processing Pipeline for Azure Custom Speech service
MIT License
This process correlates the official transcription of an audio file with the text recognised by the Speech to Text Cognitive Service and generates a transcription file to be fed into the Custom Speech API. This enables training on individual speakers with distinct accents and also allows acoustic models to be applied to filter out unwanted noise.
We correlate the recognised text with the official transcription using NLP semantic search and fuzzy matching, with regular expressions as a last resort. Once a custom model has been trained on a single transcription file, we use that model from the Custom Speech service for recognition, which is more accurate and in turn generates higher-accuracy transcription files.
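As a rough illustration of the fuzzy matching and regular expression steps, here is a minimal sketch. It is not the pipeline's actual code: it uses the standard library's `difflib` as a stand-in for whichever fuzzy matching library the project uses, it leaves out the semantic search step, and the function names `fuzzy_score`, `best_window` and `regex_fallback` are hypothetical.

```python
import re
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def fuzzy_score(a: str, b: str) -> float:
    """Similarity of two strings on a 0-100 scale (difflib stand-in)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_window(recognised: str, official_words: List[str]) -> Tuple[float, str]:
    """Slide a window the size of the recognised phrase over the official
    transcript and return the best-scoring (score, text) pair."""
    size = len(recognised.split())
    best = (0.0, "")
    for start in range(max(1, len(official_words) - size + 1)):
        candidate = " ".join(official_words[start:start + size])
        score = fuzzy_score(recognised, candidate)
        if score > best[0]:
            best = (score, candidate)
    return best


def regex_fallback(recognised: str, official: str) -> Optional[str]:
    """Hypothetical last resort: anchor on the first and last recognised
    words and accept the shortest span of official text between them."""
    words = recognised.split()
    if len(words) < 2:
        return None
    pattern = re.escape(words[0]) + r".*?" + re.escape(words[-1])
    match = re.search(pattern, official, flags=re.IGNORECASE)
    return match.group(0) if match else None


# Made-up example text, not taken from the real transcription files.
official = "Welcome to the final in Rome Stephens to serve first today"
score, text = best_window("stevens to serve first", official.split())
print(round(score), text)
print(regex_fallback("Stephens first", official))
```

The window search recovers the official spelling of a misrecognised name when the rest of the phrase lines up, and the regular expression only comes into play when no window scores well enough.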
pip install -r requirements.txt
python -m spacy download en_core_web_lg
WORD_SLIDE = 20
WORD_PAD = 20
SUBSTRING_PAD = 3
TRANSCRIBED_FILE = '/Users/shanepeckham/sources/video/File/11_WTA_ROM_STEPvGARC_2018/11_WTA_ROM_STEPvGARC_2018.txt'
AUDIO_PROCESSED_FILE = '/Users/shanepeckham/sources/video/File/11_WTA_ROM_STEPvGARC_2018/11_WTA_ROM_STEPvGARC_OFFSET.txt'
GENERATED_TRANSCRIPT_FILE = '/Users/shanepeckham/sources/video/Results2/11_WTA_ROM_STEPvGARC_TRANSCRIPT_OUTPUT.txt'
PASS2_THRESHOLD = 80
PASS3_THRESHOLD = 72
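One plausible reading of the two thresholds is that they are minimum fuzzy match scores on a 0 to 100 scale, with each later pass accepting a looser match than the one before. The sketch below shows that idea only. It assumes an exact-match first pass, uses `difflib` in place of the project's fuzzy matcher, and the helper names `similarity` and `classify_match` are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Optional

PASS2_THRESHOLD = 80  # assumed: minimum score accepted on the second pass
PASS3_THRESHOLD = 72  # assumed: looser minimum score for the third pass


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity on a 0-100 scale (difflib stand-in)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def classify_match(recognised: str, candidate: str) -> Optional[str]:
    """Return the name of the first pass that would accept this pair,
    or None if no pass accepts it."""
    if recognised.strip().lower() == candidate.strip().lower():
        return "pass1"  # assumed exact-match pass
    score = similarity(recognised, candidate)
    if score >= PASS2_THRESHOLD:
        return "pass2"
    if score >= PASS3_THRESHOLD:
        return "pass3"
    return None


# Made-up phrases for illustration.
print(classify_match("game set match", "Game set match"))
print(classify_match("stevens to serve first", "Stephens to serve first"))
print(classify_match("completely different", "Stephens to serve first"))
```

Lowering a threshold lets more recognised phrases find a partner in the official transcription, at the cost of more wrong pairings in the generated file.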