Python Kaldi speech recognition with grammars that can be set active/inactive dynamically at decode-time
Python package developed to enable context-based command & control of computer applications, as in the Dragonfly speech recognition framework, using the Kaldi automatic speech recognition engine.
Normally, Kaldi decoding graphs are monolithic, require expensive up-front off-line compilation, and are static during decoding. Kaldi's new grammar framework allows multiple independent grammars with nonterminals to be compiled separately and stitched together dynamically at decode-time, but all the grammars are always active and capable of being recognized.
This project extends that to allow each grammar/rule to be independently marked as active/inactive dynamically on a per-utterance basis (set at the beginning of each utterance). Dragonfly is then capable of activating only the appropriate grammars for the current environment, resulting in increased accuracy due to fewer possible recognitions. Furthermore, the dictation grammar can be shared between all the command grammars, which can be compiled quickly without needing to include large-vocabulary dictation directly.
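The per-utterance activation described above can be pictured with a small sketch. This is not KaldiAG or Dragonfly code: the `Rule` class, the rule names, and the `active_mask` helper are all hypothetical, and the "environment" is reduced to the name of the foreground application.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in for a separately compiled command grammar."""
    name: str
    app: Optional[str] = None  # None means the rule applies in every application


def active_mask(rules: List[Rule], foreground_app: str) -> List[bool]:
    """Return one flag per rule, computed once at the start of an utterance."""
    return [rule.app is None or rule.app == foreground_app for rule in rules]


rules = [
    Rule("global_commands"),
    Rule("editor_commands", app="editor"),
    Rule("browser_commands", app="browser"),
]

# Only rules matching the current environment remain candidates for recognition.
mask = active_mask(rules, foreground_app="editor")
print(mask)  # [True, True, False]
```

With fewer rules active, fewer recognitions are possible for a given utterance, which is the source of the accuracy gain mentioned above.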
See the Changelog for the latest updates.
HCLG.fst file, or KaldiAG's included pre-trained dictation model.
kaldi branch of my fork, and has been merged as of Dragonfly v0.15.0.
Want to get started quickly & easily on Windows? Available under project releases:
kaldi-dragonfly-winpython: A self-contained, portable, batteries-included (python & libraries & model) distribution of kaldi-active-grammar + dragonfly2. Just unzip and run!
kaldi-dragonfly-winpython-dev: [more recent development version] A self-contained, portable, batteries-included (python & libraries & model) distribution of kaldi-active-grammar + dragonfly2. Just unzip and run!
kaldi-caster-winpython-dev: [more recent development version] A self-contained, portable, batteries-included (python & libraries & model) distribution of kaldi-active-grammar + dragonfly2 + caster. Just unzip and run!
Otherwise...
Requirements:
Installation:
pip install 'dragonfly2[kaldi]' to install all necessary packages. See the dragonfly documentation for details on installation, plus how to define grammars and actions.
pip install kaldi-active-grammar
g2p_en package with pip install 'kaldi-active-grammar[g2p_en]'
v3.0.0.
requests package with pip install 'kaldi-active-grammar[online]' AND pass allow_online_pronunciations=True to Compiler.add_word() or Model.add_word()
pip install kaldi-active-grammar (directly or indirectly), not python setup.py install, in order to get the required binaries.
pip (to at least 19.0+) by executing python -m pip install --upgrade pip, to support the required python binary wheel package.
The code execution cannot proceed because VCRUNTIME140.dll was not found. (or similar)
.tmp directory, and re-running.
user_lexicon.txt file before deleting, to put in the new model directory.)
import logging; logging.basicConfig(level=1) at the top of your main/loader file to enable full debugging logging.
Formal documentation is somewhat lacking currently. To see example usage, examine:
wav file.
The KaldiAG API is fairly low level, but basically: you define a set of grammar rules, then send in audio data, along with a bit mask of which rules are active at the beginning of each utterance, and receive back the recognized rule and text. The easy way is to go through Dragonfly, which makes it easy to define the rules, contexts, and actions.
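The shape of that exchange (rules in, an activity mask per utterance, recognized rule and text out) can be sketched with a toy stand-in. Nothing below is the KaldiAG API: `RULES`, `recognize`, and the phrases are hypothetical, and the "audio" is already text so that the example runs without a model.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical rules: each name is paired with the phrases it accepts.
RULES = [
    ("window_commands", {"close window", "next tab"}),
    ("editor_commands", {"save file", "close window"}),
]


def recognize(text: str, active: Sequence[bool]) -> Optional[Tuple[str, str]]:
    """Toy decoder: return (rule name, text) for the first active rule that matches."""
    if len(active) != len(RULES):
        raise ValueError("the mask needs exactly one flag per rule")
    for (name, phrases), is_active in zip(RULES, active):
        if is_active and text in phrases:
            return name, text
    return None  # no active rule accepts this utterance


# The mask is chosen at the start of each utterance and may differ every time.
print(recognize("close window", [False, True]))  # ('editor_commands', 'close window')
print(recognize("next tab", [False, True]))      # None
```

The same phrase can resolve to a different rule, or to nothing, depending only on the mask supplied for that utterance.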
python -m pip install -r requirements-build.txt
python setup.py bdist_wheel (see CMakeLists.txt for details)
build-windows section of the manifest.
Issues, suggestions, and feature requests are welcome & encouraged. Pull requests are considered, but project structure is in flux.
Donations are appreciated to encourage development.
This project is licensed under the GNU Affero General Public License v3 (AGPL-3.0-or-later). See the LICENSE.txt file for details. If this license is problematic for you, please contact me.