silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

MIT License

Downloads: 31.1K
Stars: 4.2K
Committers: 36

silero-vad - Update the PIP package

Published by snakers4 16 days ago

A tag to upload the new pip package.

silero-vad - Minor fixes

Published by snakers4 16 days ago

Full Changelog: https://github.com/snakers4/silero-vad/compare/v5.1...v5.1.1

silero-vad - v5.1 (Latest Release)

Published by snakers4 4 months ago

Experimental PIP package release

  • Experimental pip-package release;
  • Community PRs to update the examples;

Full Changelog: https://github.com/snakers4/silero-vad/compare/v5.0...v5.1

silero-vad - Finally, V5 is here, 3x faster, supporting 6000+ languages!

Published by snakers4 4 months ago

Performance and Model Size

  • 3x faster inference for TorchScript, 10% faster inference for ONNX;
  • Now TorchScript is as fast as ONNX;
  • Model size is 2x larger, 2MB vs. 1MB;

Quality

  • The VAD supports more than 6,000 languages now;
  • Significantly more robust on noisy data;
  • Overall 5-7% quality increase on clean data;
  • Quality difference for 8 kHz and 16 kHz is negligible now;
  • Quality difference between window sizes is negligible, so the window size parameter was deprecated;
  • Added benchmarks on 9 unique datasets (2 private) and one holistic multi-domain dataset;

Changes and deprecations

  • ONNX opset 16;
  • window_size_samples is deprecated: the VAD now works only with a fixed-size window;
  • VAD now works with 8 kHz and 16 kHz sample rates, only with fixed 256 and 512 sample windows respectively;
  • Slightly changed internal logic, now some context (part of previous chunk) is passed along with the current chunk;
  • Sample rates that are a multiple of 16 kHz are still supported;
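
The fixed-window rule above can be sketched in plain Python. This is only an illustration, not the library's own code: the names WINDOW_SAMPLES, window_size_for and iter_windows are hypothetical, and zero-padding the final window is an assumption about how a caller might handle a short tail.

```python
from typing import Iterator, List, Sequence

# Fixed window sizes stated in the V5 notes: 256 samples at 8 kHz, 512 at 16 kHz.
WINDOW_SAMPLES = {8000: 256, 16000: 512}


def window_size_for(sample_rate: int) -> int:
    """Return the fixed V5 window size for a supported sample rate."""
    try:
        return WINDOW_SAMPLES[sample_rate]
    except KeyError:
        raise ValueError(f"unsupported sample rate: {sample_rate}") from None


def iter_windows(wav: Sequence[float], sample_rate: int) -> Iterator[List[float]]:
    """Yield consecutive fixed-size windows of `wav`.

    Assumption: the last window is zero-padded to the full size.
    """
    size = window_size_for(sample_rate)
    for start in range(0, len(wav), size):
        chunk = list(wav[start:start + size])
        # Pad the tail so every window has exactly `size` samples.
        chunk.extend([0.0] * (size - len(chunk)))
        yield chunk
```

For example, list(iter_windows([0.1] * 1000, 16000)) gives two windows of 512 samples each, the second one padded with zeros.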

silero-vad - New V4 VAD Released

Published by snakers4 almost 2 years ago

New V4 VAD Released

  • Improved quality
  • Improved performance
  • Both 8k and 16k sampling rates are now supported by the ONNX model
  • Batching is now supported by the ONNX model
  • Added audio_forward method for one-line processing of one or more audio files without postprocessing
  • Hotfix applied - wrong model was uploaded
  • Minor hotfix re. PyTorch version

silero-vad - New V3 ONNX VAD Released

Published by snakers4 almost 3 years ago

We finally were able to port a model to ONNX:

  • Compact model (~100k params);
  • Both PyTorch and ONNX models are not quantized;
  • Same quality model as the latest best PyTorch release;
  • Only 16 kHz is available for now (ONNX has some issues with if-statements and/or tracing vs. scripting, and fails with cryptic errors);
  • In our tests, ONNX is 2-3x faster than PyTorch on short audio chunks (the gap narrows with larger batches or long audio files);
  • Audio examples and non-core models moved out of the repo to save space;

silero-vad - New V3 Silero VAD is Already Here

Published by snakers4 almost 3 years ago

Main changes

  • One VAD to rule them all! New model includes the functionality of the previous ones with improved quality and speed!
  • Flexible sampling rate, 8000 Hz and 16000 Hz are supported;
  • Flexible chunk size, minimum chunk size is just 30 milliseconds!
  • 100k parameters;
  • GPU and batching are supported;
  • Radically simplified examples;

Migration

Please see the new examples.

The new get_speech_timestamps is a simplified and unified replacement for the old, deprecated get_speech_ts and get_speech_ts_adaptive methods.

speech_timestamps = get_speech_timestamps(wav, model, sampling_rate=16000)
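
As a rough illustration of working with the result, the sketch below converts timestamps to seconds. It assumes, without confirmation from these notes, that the result is a list of dicts with integer 'start' and 'end' sample indices; the helper name timestamps_to_seconds is hypothetical and not part of the library.

```python
from typing import Dict, List


def timestamps_to_seconds(
    timestamps: List[Dict[str, int]],
    sampling_rate: int = 16000,
    ndigits: int = 2,
) -> List[Dict[str, float]]:
    """Convert sample-index timestamps to seconds.

    Assumption: each item has integer 'start' and 'end' keys in samples.
    """
    return [
        {
            "start": round(ts["start"] / sampling_rate, ndigits),
            "end": round(ts["end"] / sampling_rate, ndigits),
        }
        for ts in timestamps
    ]
```

With the default 16000 Hz rate, a segment from sample 16000 to sample 48000 maps to 1.0 and 3.0 seconds.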

The new VADIterator class serves as an example for streaming tasks, replacing the old, deprecated VADiterator and VADiteratorAdaptive.

vad_iterator = VADIterator(model)
window_size_samples = 1536  # chunk length in samples

# Feed the audio to the iterator one chunk at a time.
for i in range(0, len(wav), window_size_samples):
    speech_dict = vad_iterator(wav[i : i + window_size_samples], return_seconds=True)
    if speech_dict:
        print(speech_dict, end=' ')
vad_iterator.reset_states()  # reset the model states once the audio is done
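
The loop above prints events as they arrive. A minimal sketch of collecting them into segments follows, assuming each event is either empty or a dict carrying a 'start' or an 'end' key; that shape is an assumption here, and pair_events is a hypothetical helper rather than a library function.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def pair_events(
    events: Iterable[Optional[Dict[str, float]]],
) -> List[Tuple[float, Optional[float]]]:
    """Pair streamed 'start' and 'end' events into (start, end) segments.

    Assumption: an event is None/empty, {'start': t}, or {'end': t}.
    """
    segments: List[Tuple[float, Optional[float]]] = []
    current_start: Optional[float] = None
    for event in events:
        if not event:
            continue  # no speech boundary in this chunk
        if "start" in event:
            current_start = event["start"]
        if "end" in event and current_start is not None:
            segments.append((current_start, event["end"]))
            current_start = None
    if current_start is not None:
        # Speech was still open when the stream ended.
        segments.append((current_start, None))
    return segments
```

In the streaming loop, the print call could be replaced by appending speech_dict to a list, which is then passed to pair_events once the audio is exhausted.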

silero-vad - V2 Legacy Release for History

Published by snakers4 almost 3 years ago

This is a technical tag, so that users who do not want to use newer models can just check out this tag.

Package Rankings
Top 6.32% on Proxy.golang.org
Top 35.98% on Pypi.org