A low-footprint, GPU-accelerated speech-to-text Python package for the JetPack 5 era, backed by an optimized ONNX graph
MIT License
Coming soon
Right now, getting started is as simple as a pip install, either from the repository root or directly from the upstream repo:
pip install .
# or
pip install git+https://github.com/rhysdg/whisper-onnx-python.git
For JetPack 5 support with Python 3.11, run the installation script first to grab a pre-built onnxruntime-gpu wheel for aarch64 and a few extra dependencies:
sh jetson_install.sh
pip install .
Currently, usage closely follows the official package, but with a trt switch (currently being debugged, so trt=False is recommended for now). transcribe expects either an audio file path or a numpy array:
import numpy as np
import whisper

args = {
    "language": "English",
    "name": "small.en",
    "precision": "fp32",
    "disable_cupy": False,
}

temperature = tuple(np.arange(0, 1.0 + 1e-6, 0.2))

model = whisper.load_model(trt=False, **args)
result = model.transcribe(
    'data/test.wav',
    temperature=temperature,
    **args
)
You can also find an example voice transcription assistant at examples/example_assistant.py
python examples/example_assistant.py
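Since transcribe also accepts a numpy array, you can decode audio yourself and skip the file path. A minimal sketch using only the standard-library wave module (load_wav is a hypothetical helper, not part of this package; it assumes 16-bit PCM mono input, the format Whisper models expect at 16 kHz):

```python
import wave
import numpy as np

def load_wav(path):
    """Read a 16-bit PCM mono WAV file into a float32 array in [-1, 1]."""
    with wave.open(path, "rb") as wf:
        assert wf.getsampwidth() == 2, "expects 16-bit PCM"
        assert wf.getnchannels() == 1, "expects mono audio"
        frames = wf.readframes(wf.getnframes())
    # int16 samples scaled to the [-1, 1] float range Whisper expects
    return np.frombuffer(frames, dtype=np.int16).astype(np.float32) / 32768.0
```

The resulting array can then be passed in place of the path, e.g. model.transcribe(load_wav('data/test.wav'), temperature=temperature, **args).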
Ubuntu 22.04 - RTX 3080, 8-core, Python 3.11 - passing
AGX Xavier, JetPack 5.1.3, Python 3.11 - passing
CI/CD will be expanded as we go - all general instantiation tests pass so far.