Pybind11 bindings for whisper.cpp
Install with pip:
pip install whispercpp
NOTE: On platforms without a prebuilt wheel, installation sets up a hermetic toolchain, so you don't have to set anything up yourself to install the Python package. This makes the install take a bit longer. Pass -vv to pip to see the progress.
To use the latest version, install from source:
pip install git+https://github.com/aarnphm/whispercpp.git -vv
For local setup, initialize all submodules:
git submodule update --init --recursive
Build the wheel:
# Option 1: using pypa/build
python3 -m build -w
# Option 2: using bazel
./tools/bazel build //:whispercpp_wheel
Install the wheel:
# Option 1: via pypa/build
pip install dist/*.whl
# Option 2: using bazel
pip install $(./tools/bazel info bazel-bin)/*.whl
The binding provides a Whisper class:
from whispercpp import Whisper
w = Whisper.from_pretrained("tiny.en")
Currently, the inference API is provided via transcribe:
w.transcribe(np.ones((1, 16000)))
You can use any of your favorite audio libraries (ffmpeg, librosa, or whispercpp.api.load_wav_file) to load audio files into a Numpy array, then pass it to transcribe:
import ffmpeg
import numpy as np

sample_rate = 16000  # Whisper models expect 16 kHz audio

try:
    # Decode to mono 16-bit PCM at sample_rate and capture the raw bytes from stdout.
    y, _ = (
        ffmpeg.input("/path/to/audio.wav", threads=0)
        .output("-", format="s16le", acodec="pcm_s16le", ac=1, ar=sample_rate)
        .run(
            cmd=["ffmpeg", "-nostdin"], capture_stdout=True, capture_stderr=True
        )
    )
except ffmpeg.Error as e:
    raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e

# Scale int16 samples to float32 in [-1, 1).
arr = np.frombuffer(y, np.int16).flatten().astype(np.float32) / 32768.0

w.transcribe(arr)
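If ffmpeg is not available, the same int16-to-float32 scaling can be sketched with the standard library alone. The helpers below (`load_pcm16_mono` and `write_test_tone`) are hypothetical names used for illustration and are not part of whispercpp. The sketch assumes a 16-bit mono PCM WAV file and does no resampling.

```python
import array
import math
import os
import sys
import tempfile
import wave
from typing import Tuple


def load_pcm16_mono(path: str) -> Tuple[array.array, int]:
    """Read a 16-bit mono WAV file; return (float32 samples in [-1, 1), sample rate)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM")
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    pcm = array.array("h")
    pcm.frombytes(raw)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    # Same scaling as the ffmpeg example: divide int16 values by 32768.
    return array.array("f", (s / 32768.0 for s in pcm)), rate


def write_test_tone(path: str, rate: int = 16000, seconds: float = 1.0) -> None:
    """Write a 440 Hz sine tone at half amplitude so the loader has input to read."""
    n = int(rate * seconds)
    pcm = array.array(
        "h",
        (int(0.5 * 32767 * math.sin(2 * math.pi * 440.0 * i / rate)) for i in range(n)),
    )
    if sys.byteorder == "big":
        pcm.byteswap()
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(pcm.tobytes())


tone_path = os.path.join(tempfile.mkdtemp(), "tone.wav")
write_test_tone(tone_path)
samples, rate = load_pcm16_mono(tone_path)
print(len(samples), rate)
```

The returned `array.array("f")` exposes a float32 buffer, so `np.frombuffer(samples, dtype=np.float32)` should give an array suitable for transcribe, provided the file is already sampled at 16 kHz.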
You can also use the transcribe_from_file method for convenience:
w.transcribe_from_file("/path/to/audio.wav")
The Pybind11 bindings support all of the features of whisper.cpp and take inspiration from whisper-rs.
The binding can also be used via api:
from whispercpp import api
# Binding directly from whisper.cpp
See DEVELOPMENT.md
Whisper
Whisper.from_pretrained(model_name: str) -> Whisper
Load a pre-trained model from the local cache, or download and cache it if needed. Also supports loading a custom ggml model from a local path passed as model_name.
w = Whisper.from_pretrained("tiny.en")
w = Whisper.from_pretrained("/path/to/model.bin")
The model will be saved to $XDG_DATA_HOME/whispercpp, or to ~/.local/share/whispercpp if the environment variable is not set.
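The lookup rule above can be sketched as follows. `model_cache_dir` is a hypothetical helper written for illustration, not a function exported by whispercpp. Treating an empty `XDG_DATA_HOME` the same as an unset one is an assumption taken from the XDG specification, not from the library.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def model_cache_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve the directory where models are expected to be cached."""
    env = os.environ if env is None else env
    xdg_data_home = env.get("XDG_DATA_HOME")
    if xdg_data_home:
        base = Path(xdg_data_home)
    else:
        # Fallback used when the environment variable is not set.
        base = Path.home() / ".local" / "share"
    return base / "whispercpp"


print(model_cache_dir())
```

Passing a mapping instead of reading `os.environ` directly makes it easy to check both branches without changing the real environment.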
Whisper.transcribe(arr: NDArray[np.float32], num_proc: int = 1)
Runs transcription on a given Numpy array. This calls full from whisper.cpp. If num_proc is greater than 1, it uses full_parallel instead.
w.transcribe(np.ones((1, 16000)))
To transcribe from a WAV file, use transcribe_from_file:
w.transcribe_from_file("/path/to/audio.wav")
Whisper.stream_transcribe(*, length_ms: int=..., device_id: int=..., num_proc: int=...) -> Iterator[str]
[EXPERIMENTAL] Streaming transcription. This calls stream_ from whisper.cpp. The transcription is yielded as soon as it's available.
See stream.py for an example.
Note: device_id is the index of the audio device. You can use whispercpp.api.available_audio_devices to get the list of available audio devices.
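Because stream_transcribe returns an iterator of strings, the text can be handled piece by piece as it arrives. The sketch below shows that consumption pattern only: `fake_stream` is a hypothetical stand-in that yields fixed text so the example runs without a microphone, and `collect` is an illustrative helper, not part of whispercpp.

```python
from typing import Iterable, Iterator, List


def fake_stream() -> Iterator[str]:
    """Hypothetical stand-in for w.stream_transcribe(...); yields fixed text."""
    yield " hello"
    yield " world"
    yield " again"


def collect(segments: Iterable[str], max_segments: int) -> List[str]:
    """Print each piece of text as it arrives and stop after max_segments pieces."""
    collected: List[str] = []
    for text in segments:
        print(text, flush=True)
        collected.append(text.strip())
        if len(collected) >= max_segments:
            break
    return collected


result = collect(fake_stream(), max_segments=2)
```

With a real model, the first argument would be the iterator returned by `w.stream_transcribe(...)`, called with the keyword arguments shown in the signature above.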
api
api is a direct binding to whisper.cpp, with an API similar to whisper-rs.
api.Context
This class is a wrapper around whisper_context
from whispercpp import api
ctx = api.Context.from_file("/path/to/saved_weight.bin")
Note: The context can also be accessed from the Whisper class via w.context
api.Params
This class is a wrapper around whisper_params
from whispercpp import api
params = api.Params()
Note: The params can also be accessed from the Whisper class via w.params
whispercpp.py. There are a few key differences here:
The difference is that whispercpp uses Pybind11.
whispercpp.py and whispercpp are mutually exclusive, as both use the whispercpp namespace.
whispercpp provides APIs similar to whisper-rs, with two entry points (from_pretrained and transcribe) to quickly use whisper.cpp in Python.
whispercpp doesn't pollute your $HOME directory; rather, it follows the XDG data directory convention ($XDG_DATA_HOME).
Using cdll and ctypes and be done with it?
See examples for more information