WhisperKit | macOS Ecosystem Directory

WhisperKit - v0.7.1 Latest Release

Published by ZachNagengast 5 months ago

Hotfix for shouldEarlyStop logic

What's Changed

Ensures early stopping flag on TextDecoder is always reset at the beginning of a new loop

Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.7.0...v0.7.1

WhisperKit - v0.7.0

Published by ZachNagengast 5 months ago

This is a very exciting release because we're seeing yet another massive speedup in offline throughput thanks to VAD based chunking 🚀

Highlights

Energy VAD based chunking 🗣️ @jkrukowski
- There is a new decoding option called chunkingStrategy which can significantly speed up your single file transcriptions with minimal WER downsides.
- It finds a clip point in the middle of the longest silence (lowest audio energy) in the last 15s of a 30s window and uses it to split the audio ahead of time, so the chunks can be decoded asynchronously in parallel.
- Here's a video of it in action, comparing the .none chunking strategy with .vad:

https://github.com/argmaxinc/WhisperKit/assets/1981179/0f865caa-3a08-412e-a0bf-080ec16a439a
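
The clip-point search described above can be sketched in plain Swift. This is a minimal illustration, not WhisperKit's implementation: the function name `clipPoint`, the `frameLength` and `energyThreshold` parameters, and the use of a fixed threshold to decide what counts as silence are all assumptions made for the example.

```swift
// Hypothetical helper (not WhisperKit's API).
// Looks only at the second half of `window`, splits it into frames, and
// returns the sample index in the middle of the longest run of frames whose
// mean squared amplitude is below `energyThreshold`.
// Returns nil when no frame in the second half is that quiet.
func clipPoint(in window: [Float], frameLength: Int, energyThreshold: Float) -> Int? {
    precondition(frameLength > 0, "frameLength must be positive")

    let searchStart = window.count / 2
    var best: (start: Int, end: Int)? = nil  // sample range of the longest silent run
    var runStart: Int? = nil                 // start of the silent run in progress

    var frameStart = searchStart
    while frameStart < window.count {
        let frameEnd = min(frameStart + frameLength, window.count)

        // Mean squared amplitude of the current frame.
        var sum: Float = 0
        for i in frameStart..<frameEnd {
            sum += window[i] * window[i]
        }
        let energy = sum / Float(frameEnd - frameStart)

        if energy < energyThreshold {
            // Extend the current silent run, or start a new one.
            let start = runStart ?? frameStart
            runStart = start
            let currentLength = frameEnd - start
            let bestLength = best.map { $0.end - $0.start } ?? 0
            if currentLength > bestLength {
                best = (start: start, end: frameEnd)
            }
        } else {
            runStart = nil
        }
        frameStart = frameEnd
    }

    guard let run = best else { return nil }
    return (run.start + run.end) / 2
}
```

With 16 kHz audio, a 30s window would hold 480,000 samples, so the search would cover the last 240,000. Once clip points are known for every window, each resulting chunk can be decoded independently.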

Detect language helper:
- You can now call detectLanguage on the main whisperKit object with just an audio path as input. It returns a language code and probability as a tuple, with minimal logging and timing overhead.
- Example:

let whisperKit = try await WhisperKit()
let (language, probs) = try await whisperKit.detectLanguage(audioPath: "your/audio/path/spanish.wav")
print(language) // "es"
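
As a rough picture of how a language code and a probability could be derived from model scores, the sketch below applies a softmax to per-language scores and picks the largest. The function name `mostLikelyLanguage` and the dictionary input are hypothetical; WhisperKit's internals may differ.

```swift
import Foundation

// Hypothetical helper (not WhisperKit's API).
// Converts raw per-language scores into the most likely language code and
// its softmax probability. Returns nil for an empty input.
func mostLikelyLanguage(from logits: [String: Float]) -> (language: String, probability: Float)? {
    guard let maxLogit = logits.values.max() else { return nil }

    // Subtract the maximum before exponentiating for numerical stability.
    let exps = logits.mapValues { exp($0 - maxLogit) }
    let total = exps.values.reduce(0, +)

    guard let best = exps.max(by: { $0.value < $1.value }) else { return nil }
    return (best.key, best.value / total)
}
```

Calling it with `["es": 2.0, "en": 0.0]` selects `"es"`, since the larger score always maps to the larger probability.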

WhisperKit via Expo @seb-sep
- For anyone who has been wanting to use WhisperKit in React Native, @seb-sep maintains a repo that makes it easy and has set up an automation that updates it with each new WhisperKit release. Check it out here: https://github.com/seb-sep/whisper-kit-expo
Bug fixes and enhancements:
- @jiangdi0924 and @fengcunhan contributed some nice fixes in this release with #136 and #138 (see below)
- Also moved the decoding progress callback to be fully async so that it doesn't block the decoder thread

What's Changed

Fix language detection by @jkrukowski in https://github.com/argmaxinc/WhisperKit/pull/133
Fix the reset operation exception in transcribeFile in the Demo. by @jiangdi0924 in https://github.com/argmaxinc/WhisperKit/pull/136
gh action for making pr to whisper-kit-expo on whisperkit release by @seb-sep in https://github.com/argmaxinc/WhisperKit/pull/137
add reStartRecordingLive function by @fengcunhan in https://github.com/argmaxinc/WhisperKit/pull/138
Added @_disfavoredOverload for deprecated methods by @jkrukowski in https://github.com/argmaxinc/WhisperKit/pull/143
VAD audio chunking by @jkrukowski in https://github.com/argmaxinc/WhisperKit/pull/135
Async Progress Callback by @ZachNagengast in https://github.com/argmaxinc/WhisperKit/pull/145
Detect language helper by @ZachNagengast in https://github.com/argmaxinc/WhisperKit/pull/146

New Contributors

@jiangdi0924 made their first contribution in https://github.com/argmaxinc/WhisperKit/pull/136
@seb-sep made their first contribution in https://github.com/argmaxinc/WhisperKit/pull/137
@fengcunhan made their first contribution in https://github.com/argmaxinc/WhisperKit/pull/138

Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.6.1...v0.7.0

WhisperKit - v0.6.1

Published by ZachNagengast 6 months ago

Smaller patch release with some nice improvements and two new contributors 🙌

Highlights

Tokenizer no longer requires a HubApi request to succeed if the files are already downloaded
- This was a big request from the community and should enable offline transcription as long as everything is downloaded already
- Also made the function public so you can bundle the tokenizer with the app along with the model files
@smpanaro found a really nice speedup across the board by using IOSurface backed MLMultiArrays
- Especially noticeable on older devices
General cleanup, including a nice bug fix from @couche1 when streaming via the CLI

What's Changed

Memory and Latency Regression Tests by @Abhinay1997 in https://github.com/argmaxinc/WhisperKit/pull/99
- @Abhinay1997 is building out this regression test suite so we can be sure we're always shipping code that has the same or better speed, accuracy, memory, etc
Fix audio file requirement for streaming mode by @couche1 in https://github.com/argmaxinc/WhisperKit/pull/121
Use IOSurface-backed MLMultiArrays for float16 by @smpanaro in https://github.com/argmaxinc/WhisperKit/pull/130
Cleanup by @ZachNagengast in https://github.com/argmaxinc/WhisperKit/pull/132

New Contributors

@couche1 made their first contribution in https://github.com/argmaxinc/WhisperKit/pull/121
@smpanaro made their first contribution in https://github.com/argmaxinc/WhisperKit/pull/130

Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.6.0...v0.6.1

WhisperKit - v0.6.0

Published by ZachNagengast 6 months ago

Highlights

Async batch transcription is here 🎉 contributed by @jkrukowski
- With this release, you can now transcribe multiple audio files simultaneously, fully utilizing the new async prediction APIs released with iOS 17/macOS 14 (see the related WWDC video).
- New interface with audioPaths input:
```
let audioPaths = [
    "/path/to/file1.wav",
    "/path/to/file2.wav"
]
let whisperKit = try await WhisperKit()
let transcriptionResults: [[TranscriptionResult]?] = await whisperKit.transcribe(audioPaths: audioPaths)
```
- You can also use it via the CLI using the new argument --audio-folder "path/to/folder/"
- Future work will be chunking up single files to significantly speed up long-form transcription
- Note that this entails breaking changes and deprecations, see below for the full upgrade guide.
Several bug fixes, accuracy improvements, and quality of life upgrades by @hewigovens @shawiz and @jkrukowski
- Every issue raised and PR merged from the community helps make WhisperKit better every release, thank you and keep them coming! 🙏
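
One way to picture batch transcription is as one concurrent task per file, with results collected back in input order. The sketch below shows that pattern with Swift's structured concurrency only. `fakeTranscribe` and `transcribeAll` are hypothetical stand-ins and do not call WhisperKit.

```swift
import Foundation

// Hypothetical stand-in for a per-file transcription call (not WhisperKit's API).
func fakeTranscribe(_ path: String) async -> String {
    // Simulate some work so that the tasks can overlap.
    try? await Task.sleep(nanoseconds: 10_000_000)
    return "transcript of \(path)"
}

// Runs one child task per path and returns the results in input order,
// regardless of the order in which the tasks finish.
func transcribeAll(_ paths: [String]) async -> [String] {
    await withTaskGroup(of: (Int, String).self) { group in
        for (index, path) in paths.enumerated() {
            group.addTask {
                (index, await fakeTranscribe(path))
            }
        }

        var results = [String](repeating: "", count: paths.count)
        for await (index, text) in group {
            results[index] = text
        }
        return results
    }
}
```

Carrying the index alongside each result is what keeps the output aligned with `paths`, since child tasks may complete in any order.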

⚠️ Upgrade Guide

We aim to minimize breaking changes, so with this update we added deprecation flags for a few changed interfaces. The deprecated versions will be removed later, but for now they are still usable and will not cause build errors. There are some breaking changes for lower-level and newer methods, so if you do notice build errors, see the full guide below.

API changes

Deprecations

`WhisperKit`

Deprecated

public func transcribe(
    audioPath: String,
    decodeOptions: DecodingOptions? = nil,
    callback: TranscriptionCallback = nil
) async throws -> TranscriptionResult?

use instead

public func transcribe(
    audioPath: String,
    decodeOptions: DecodingOptions? = nil,
    callback: TranscriptionCallback = nil
) async throws -> [TranscriptionResult]

Deprecated

public func transcribe(
    audioArray: [Float],
    decodeOptions: DecodingOptions? = nil,
    callback: TranscriptionCallback = nil
) async throws -> TranscriptionResult?

use instead

public func transcribe(
    audioArray: [Float],
    decodeOptions: DecodingOptions? = nil,
    callback: TranscriptionCallback = nil
) async throws -> [TranscriptionResult]

`TextDecoding`

Deprecated

func decodeText(
    from encoderOutput: MLMultiArray,
    using decoderInputs: DecodingInputs,
    sampler tokenSampler: TokenSampling,
    options decoderOptions: DecodingOptions,
    callback: ((TranscriptionProgress) -> Bool?)?
) async throws -> [DecodingResult]

use instead

func decodeText(
    from encoderOutput: MLMultiArray,
    using decoderInputs: DecodingInputs,
    sampler tokenSampler: TokenSampling,
    options decoderOptions: DecodingOptions,
    callback: ((TranscriptionProgress) -> Bool?)?
) async throws -> DecodingResult

Deprecated

func detectLanguage(
    from encoderOutput: MLMultiArray,
    using decoderInputs: DecodingInputs,
    sampler tokenSampler: TokenSampling,
    options: DecodingOptions,
    temperature: FloatType
) async throws -> [DecodingResult]

use instead

func detectLanguage(
    from encoderOutput: MLMultiArray,
    using decoderInputs: DecodingInputs,
    sampler tokenSampler: TokenSampling,
    options: DecodingOptions,
    temperature: FloatType
) async throws -> DecodingResult
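
The deprecated and replacement methods above differ only in return type. A sketch of how two such overloads can coexist in Swift, using `@available(*, deprecated)` together with `@_disfavoredOverload` so that untyped call sites resolve to the new version, is shown below. `DemoResult` and `DemoTranscriber` are hypothetical stand-ins, and the methods are synchronous to keep the example small.

```swift
// Hypothetical stand-ins for illustration (not WhisperKit's types).
struct DemoResult: Equatable {
    let text: String
}

struct DemoTranscriber {
    // New interface: always returns an array of results.
    func transcribe(audioPath: String) -> [DemoResult] {
        [DemoResult(text: "transcript of \(audioPath)")]
    }

    // Old interface, kept so existing call sites still compile (with a warning).
    // @_disfavoredOverload makes the compiler prefer the new overload whenever
    // the call site does not pin down the return type.
    @available(*, deprecated, message: "Use the overload that returns [DemoResult]")
    @_disfavoredOverload
    func transcribe(audioPath: String) -> DemoResult? {
        // The type annotation selects the new overload; keep only the first result.
        let results: [DemoResult] = transcribe(audioPath: audioPath)
        return results.first
    }
}
```

A call such as `let results = DemoTranscriber().transcribe(audioPath: "a.wav")` picks the array-returning overload, while code that explicitly expects `DemoResult?` keeps compiling against the deprecated one.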

Breaking changes

removed Transcriber protocol

`AudioProcessing`

static func loadAudio(fromPath audioFilePath: String) -> AVAudioPCMBuffer?

becomes

static func loadAudio(fromPath audioFilePath: String) throws -> AVAudioPCMBuffer

`AudioStreamTranscriber`

public init(
    audioProcessor: any AudioProcessing, 
    transcriber: any Transcriber, 
    decodingOptions: DecodingOptions, 
    requiredSegmentsForConfirmation: Int = 2, 
    silenceThreshold: Float = 0.3, 
    compressionCheckWindow: Int = 20, 
    useVAD: Bool = true, 
    stateChangeCallback: AudioStreamTranscriberCallback?
)

becomes

public init(
    audioEncoder: any AudioEncoding,
    featureExtractor: any FeatureExtracting,
    segmentSeeker: any SegmentSeeking,
    textDecoder: any TextDecoding,
    tokenizer: any WhisperTokenizer,
    audioProcessor: any AudioProcessing,
    decodingOptions: DecodingOptions,
    requiredSegmentsForConfirmation: Int = 2,
    silenceThreshold: Float = 0.3,
    compressionCheckWindow: Int = 20,
    useVAD: Bool = true,
    stateChangeCallback: AudioStreamTranscriberCallback?
)

`TextDecoding`

func prepareDecoderInputs(withPrompt initialPrompt: [Int]) -> DecodingInputs?

becomes

func prepareDecoderInputs(withPrompt initialPrompt: [Int]) throws -> DecodingInputs
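
Both loadAudio and prepareDecoderInputs move from returning an optional to throwing. A sketch of how call sites might adapt follows. The loader, the error type, and the in-memory `knownFiles` dictionary are hypothetical and only stand in for a real throwing API.

```swift
// Hypothetical error type and loader (not WhisperKit's API).
enum DemoAudioError: Error, Equatable {
    case fileNotFound(String)
}

// New style: failure is reported by throwing instead of returning nil.
func loadSamples(fromPath path: String, knownFiles: [String: [Float]]) throws -> [Float] {
    guard let samples = knownFiles[path] else {
        throw DemoAudioError.fileNotFound(path)
    }
    return samples
}

// Call site option 1: handle the error explicitly with do/catch.
func sampleCount(path: String, knownFiles: [String: [Float]]) -> Int {
    do {
        return try loadSamples(fromPath: path, knownFiles: knownFiles).count
    } catch {
        print("Could not load \(path): \(error)")
        return 0
    }
}

// Call site option 2: `try?` restores the old optional-returning behavior.
func samplesOrNil(path: String, knownFiles: [String: [Float]]) -> [Float]? {
    try? loadSamples(fromPath: path, knownFiles: knownFiles)
}
```

Code that previously used `guard let` on the optional result can switch to `try?` for a minimal change, or to `do`/`catch` when the failure reason matters.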

What's Changed

Add microphoneUnavailable error by @hewigovens in https://github.com/argmaxinc/WhisperKit/pull/113
Improve token timestamps and language detection by @ZachNagengast in https://github.com/argmaxinc/WhisperKit/pull/114
Respect skipSpecialTokens option in the decodingCallback function by @shawiz in https://github.com/argmaxinc/WhisperKit/pull/115
Disallow invalid --language values by @jkrukowski in https://github.com/argmaxinc/WhisperKit/pull/116
Run tests in parallel on CI by @jkrukowski in https://github.com/argmaxinc/WhisperKit/pull/117
Async batch predictions by @jkrukowski in https://github.com/argmaxinc/WhisperKit/pull/107

New Contributors

@hewigovens made their first contribution in https://github.com/argmaxinc/WhisperKit/pull/113
@shawiz made their first contribution in https://github.com/argmaxinc/WhisperKit/pull/115

Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.5.0...v0.6.0

WhisperKit - v0.5.0

Published by ZachNagengast 7 months ago

This is a HUGE release with some great new features and fixes 🙌

Highlights

Timestamp logits filter by @jkrukowski
- Significantly increases the number of timestamp tokens in a particular window, which helps a lot with segmentation
- This is on by default but can be disabled using the decoding option withoutTimestamps: true
Language detection by @Abhinay1997
- New function on the TextDecoding protocol which runs a single forward pass and reads the language logits to find the most likely language for the input audio
- Enabled by default for decoding options where usePrefillPrompt: false and language: nil, as long as the model is not English-only.
First token log prob thresholds fallback check by @jkrukowski
- This feature is not in the original OpenAI implementation but helps reduce hallucinations quite a bit.
- Often, fallbacks due to the log prob threshold are immediately identifiable by the first token, so this reduces the number of forward passes needed to move to a higher temperature
Distil whisper support
- Recently distil-large-v3 was released which massively speeds up predictions at minimal quality loss. We've converted and optimized 4 distil models to use in WhisperKit on CoreML, they're really fast!
  - distil-large-v3
  - distil-large-v3_594MB
  - distil-large-v3_turbo
  - distil-large-v3_turbo_600MB
- Note that these do not yet have word timestamp alignment heads, so can't be used with wordTimestamps: true
- It can be run via CLI as well:
  - swift run whisperkit-cli transcribe --model-prefix "distil" --model "large-v3_turbo_600MB" --verbose --audio-path ~/your_audio.wav
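
The first-token log prob check in the highlights above can be illustrated with a small simulation. It is a sketch under stated assumptions: each `Attempt` carries precomputed per-token log probabilities in place of a real decoder, one forward pass is counted per token, and the names and thresholds are hypothetical rather than WhisperKit's.

```swift
// Hypothetical types for illustration (not WhisperKit's API).
// `tokenLogProbs` stands in for what a decoder would produce at `temperature`.
struct Attempt {
    let temperature: Float
    let tokenLogProbs: [Float]
}

enum Outcome: Equatable {
    case accepted(temperature: Float, forwardPasses: Int)
    case exhausted(forwardPasses: Int)
}

// Tries each attempt in order. If the first token's log prob is already below
// `firstTokenThreshold`, the attempt is abandoned after a single forward pass.
// Otherwise the full sequence is decoded and checked against `averageThreshold`.
func decodeWithFallback(
    attempts: [Attempt],
    firstTokenThreshold: Float,
    averageThreshold: Float
) -> Outcome {
    var passes = 0
    for attempt in attempts {
        guard let first = attempt.tokenLogProbs.first else { continue }

        passes += 1  // forward pass for the first token
        if first < firstTokenThreshold {
            continue  // early fallback: skip the remaining tokens
        }

        passes += attempt.tokenLogProbs.count - 1  // remaining tokens
        let average = attempt.tokenLogProbs.reduce(0, +) / Float(attempt.tokenLogProbs.count)
        if average >= averageThreshold {
            return .accepted(temperature: attempt.temperature, forwardPasses: passes)
        }
    }
    return .exhausted(forwardPasses: passes)
}
```

In this simulation, an attempt whose first token is already below the threshold costs one forward pass instead of one per token before the next temperature is tried.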

⚠️ Experimental new stream mode

We added an experimental new mode for streaming in WhisperAX called "Eager streaming mode". We're still refining this feature, but we think it can soon be a great way to do real-time transcription with Whisper. Give it a try in TestFlight or take a look at the code and let us know how it can be improved.

Recommended settings for the best performance for this iteration are:

Max tokens per loop < 100
Max fallback count < 2
Prompt and cache prefill true

Looking for feedback on:

Token confirmation numbers that work well
Model, device, and settings combinations that work well

https://github.com/argmaxinc/WhisperKit/assets/1981179/0a88ca34-3a0e-4ff5-9829-9f980a4661ea

What's Changed

CLI Task Handling in https://github.com/argmaxinc/WhisperKit/pull/85
Added TimestampRulesFilter implementation by @jkrukowski in https://github.com/argmaxinc/WhisperKit/pull/45
Support distil whisper models in https://github.com/argmaxinc/WhisperKit/pull/88
Language Detection by @Abhinay1997 in https://github.com/argmaxinc/WhisperKit/pull/78
Tokenizer refactor, tests cleanup by @jkrukowski in https://github.com/argmaxinc/WhisperKit/pull/87
First token logProb thresholding by @jkrukowski in https://github.com/argmaxinc/WhisperKit/pull/90
[#93] Add missing settings to decoding options by @cgfarmer4 in https://github.com/argmaxinc/WhisperKit/pull/94
"Eager" streaming mode via word timestamps in https://github.com/argmaxinc/WhisperKit/pull/95

New Contributors

@Abhinay1997 made their first contribution in https://github.com/argmaxinc/WhisperKit/pull/78

Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.4.1...v0.5.0

WhisperKit - v0.4.1

Published by ZachNagengast 7 months ago

v0.4.0 was our first release on Homebrew, and this will be our first automated update to the formula, huge props to @jkrukowski for his contributions on this.

What's Changed

Homebrew github action, updated readme by @jkrukowski in https://github.com/argmaxinc/WhisperKit/pull/79
Fix setupModels error handling by @finnvoor in https://github.com/argmaxinc/WhisperKit/pull/80
Use GPU for audio encoder on macOS 13 by @ZachNagengast in https://github.com/argmaxinc/WhisperKit/pull/83
- Some of the models were having issues with ANE on macOS 13 / iOS 16, which this resolves by defaulting to GPU on those.

Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.4.0...v0.4.1

WhisperKit - v0.4.0

Published by ZachNagengast 7 months ago

Lots of nice fixes in this release!

⚠️ Breaking change

We had to rename the CLI entry point in preparation for Homebrew distribution. Here is how to use it now:
Old:
swift run transcribe --audio-path path/to/your/audio.mp3
New:
swift run whisperkit-cli transcribe --audio-path path/to/your/audio.mp3

What's Changed

skip functional tests for models that are not downloaded. by @metropol in https://github.com/argmaxinc/WhisperKit/pull/48
Fix crash with mic device sample rate mismatch by @ZachNagengast in https://github.com/argmaxinc/WhisperKit/pull/69
WhisperKit CLI cleanup by @jkrukowski in https://github.com/argmaxinc/WhisperKit/pull/68
Add Progress to WhisperKit by @finnvoor in https://github.com/argmaxinc/WhisperKit/pull/71
Updated swift-transformers and tokenizer changes by @jkrukowski in https://github.com/argmaxinc/WhisperKit/pull/72
Updated swift-transformers, do not use background url session in CLI by @jkrukowski in https://github.com/argmaxinc/WhisperKit/pull/74
Add pre-merge and pre-release tests by @ZachNagengast in https://github.com/argmaxinc/WhisperKit/pull/76

New Contributors

@metropol made their first contribution in https://github.com/argmaxinc/WhisperKit/pull/48

Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.3.3...v0.4.0

WhisperKit - v0.3.3

Published by ZachNagengast 7 months ago

What's Changed

Some great contributions in this patch:

Expose downloadBase in WhisperKit init by @finnvoor in https://github.com/argmaxinc/WhisperKit/pull/57
- Convenience for managing model files
Add audio device selector to transcribe + take a stab at Delete/Retry models by @cgfarmer4 in https://github.com/argmaxinc/WhisperKit/pull/54
- Extends example app functionality
Issue - 42 WhisperKit support simulator fixed by @bharat9806 in https://github.com/argmaxinc/WhisperKit/pull/52
- Fixes a couple of bugs that show up during development on simulators (and fixes a decoding bug, #63, in the process).

New Contributors

@bharat9806 made their first contribution in https://github.com/argmaxinc/WhisperKit/pull/52

Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.3.2...v0.3.3

WhisperKit - v0.3.2

Published by ZachNagengast 8 months ago

What's Changed

Fixed Conformance of 'Float16' warning by @jkrukowski in https://github.com/argmaxinc/WhisperKit/pull/58
Fix memory leak from non-async MLModel prediction by @finnvoor in https://github.com/argmaxinc/WhisperKit/pull/56

With these our build warnings are now down to 0 🎉

Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.3.1...v0.3.2

WhisperKit - v0.3.1

Published by ZachNagengast 8 months ago

What's Changed

macOS 13 & iOS 16 support in https://github.com/argmaxinc/WhisperKit/pull/40
- We have made WhisperKit available on older OS versions based on community feedback.
- Please note that macOS 13 and iOS 16 performance will be degraded in terms of prediction latency, compile time, and peak memory consumption.
- We have tested and recommend using tiny and base variants on devices with these older OS versions for a stable user experience.
- If you run into any output correctness issues, please switch to using cpuAndGPU compute units (from the default of cpuAndNeuralEngine) via the ModelComputeOptions init parameter.
- As always, if you notice any irregularities, please post an issue here for us to follow up on.
Implement selecting input device by @cgfarmer4 in https://github.com/argmaxinc/WhisperKit/pull/51
- Thanks to @cgfarmer4, macOS users can now select their preferred microphone, not just the default one. Check out @cgfarmer4's fantastic feature walkthrough, and dive into the fully implemented sample code in the WhisperAX example app to see it in action!

New Contributors

@eltociear made their first contribution in https://github.com/argmaxinc/WhisperKit/pull/43
@cgfarmer4 made their first contribution in https://github.com/argmaxinc/WhisperKit/pull/51

Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.3.0...v0.3.1

WhisperKit - v0.3.0

Published by ZachNagengast 8 months ago

What's Changed

Word Timestamp support in https://github.com/argmaxinc/WhisperKit/pull/38
- You can now generate word level timestamps with the new decoding option wordTimestamps: true or via the cli with --word-timestamps
- They are included on each TranscriptionSegment in a new words parameter
- Following up with demo code and example app integrations in a later release
- Example json output: https://gist.github.com/ZachNagengast/f36a751bc68a3b5f2c41ada8bcc33746
- Check out this example video from @finnvoor showing it in action:

https://github.com/argmaxinc/WhisperKit/assets/8284016/3bfc1b79-8e01-4e2b-bd14-ecd86ca49d57

Allow setting a downloadBase so downloaded models are not forced into the user's Documents folder by @jordibruin in https://github.com/argmaxinc/WhisperKit/pull/34
Streaming Microphone for CLI by @jkrukowski in https://github.com/argmaxinc/WhisperKit/pull/35

New Contributors

@jordibruin made their first contribution in https://github.com/argmaxinc/WhisperKit/pull/34

Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.2.1...v0.3.0

WhisperKit - v0.2.1

Published by ZachNagengast 8 months ago

What's Changed

Added implementation for SuppressBlankFilter by @jkrukowski in https://github.com/argmaxinc/WhisperKit/pull/18
- Also includes a performance improvement for the common LogitFilter operation for filling in -infinity probability.
Fixed issue with swift package dependencies that point to commit hashes #21 reported by @sleeper

Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.2.0...v0.2.1

WhisperKit - v0.2.0

Published by ZachNagengast 8 months ago

What's Changed

watchOS example & downloading improvements https://github.com/argmaxinc/WhisperKit/pull/20

You can now try out our watchOS example on any Series 9 or Ultra 2 Apple Watch. To build to it, just change the target in the WhisperAX example app.

Supported models are:

base
base.en
tiny
tiny.en


In addition to the watchOS example app, this version includes a fix for downloading models when there is a partial download already in the filesystem. This includes the following changes:

New init parameter download to allow/disallow downloading if modelFolder is nil (default true)
- This is particularly useful if you want to initialize an "empty" WhisperKit object
modelFolder is now an optional
Breaking change: load has been renamed to download for clarity, we will keep such changes rare moving forward

Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.1.2...v0.2.0

WhisperKit - v0.1.2

Published by ZachNagengast 9 months ago

What's Changed

Added implementation for SuppressTokensFilter by @jkrukowski in https://github.com/argmaxinc/WhisperKit/pull/14

You can now use the SuppressTokensFilter protocol via the decoding options:

let options = DecodingOptions(
    supressTokens: [220, 50257] // array of tokens you want to suppress
)
let transcribeResult = try await whisperKit.transcribe(audioPath: path, decodeOptions: options)
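
Conceptually, suppressing tokens means forcing their scores to negative infinity before sampling so they can never be selected. The sketch below shows that idea on a plain `[Float]`. The free function `suppressTokens(in:suppressed:)` is hypothetical and is not the filter shipped in WhisperKit.

```swift
// Hypothetical helper (not WhisperKit's API).
// Returns a copy of `logits` in which every listed token index is set to
// negative infinity. Indices outside the array bounds are ignored.
func suppressTokens(in logits: [Float], suppressed: [Int]) -> [Float] {
    var filtered = logits
    for token in suppressed where filtered.indices.contains(token) {
        filtered[token] = -.infinity
    }
    return filtered
}
```

After filtering, a greedy argmax or a softmax-based sampler assigns the suppressed tokens zero chance of being picked.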

Fixes and cleanup from early feedback by @ZachNagengast in https://github.com/argmaxinc/WhisperKit/pull/15
- New Makefile command: make download-model MODEL=tiny to download only the specified model instead of the entire model repo
- This release also includes the new macOS 14 github runner for CI.

New Contributors

@jkrukowski made their first contribution in https://github.com/argmaxinc/WhisperKit/pull/14

Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.1.1...v0.1.2

WhisperKit - v0.1.1

Published by ZachNagengast 9 months ago

What's Changed

Fix broken Hugging Face link by @thenameless7741 in https://github.com/argmaxinc/WhisperKit/pull/1
Fix memory leak by @finnvoor in https://github.com/argmaxinc/WhisperKit/pull/8
Updated to semantic versioning for dependency swift-transformers