Swift native on-device speech recognition with Whisper for Apple Silicon
MIT License
Hotfix for shouldEarlyStop logic
Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.7.0...v0.7.1
Published by ZachNagengast 5 months ago
This is a very exciting release because we're seeing yet another massive speedup in offline throughput thanks to VAD based chunking 🚀
The chunkingStrategy option can significantly speed up your single-file transcriptions with minimal WER downsides.
Comparison of the .none chunking strategy with .vad:
https://github.com/argmaxinc/WhisperKit/assets/1981179/0f865caa-3a08-412e-a0bf-080ec16a439a
detectLanguage can now be called with just an audio path as input from the main whisperKit object. It returns a simple language code and probability as a tuple, and has minimal logging/timing.
let whisperKit = try await WhisperKit()
let (language, probs) = try await whisperKit.detectLanguage(audioPath: "your/audio/path/spanish.wav")
print(language) // "es"
@_disfavoredOverload for deprecated methods by @jkrukowski in https://github.com/argmaxinc/WhisperKit/pull/143
Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.6.1...v0.7.0
Published by ZachNagengast 6 months ago
Smaller patch release with some nice improvements and two new contributors 🙌
Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.6.0...v0.6.1
Published by ZachNagengast 6 months ago
audioPaths input:
let audioPaths = [
    "/path/to/file1.wav",
    "/path/to/file2.wav"
]
let whisperKit = try await WhisperKit()
let transcriptionResults: [[TranscriptionResult]?] = await whisperKit.transcribe(audioPaths: audioPaths)
--audio-folder "path/to/folder/"
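Because the batch call returns one optional entry per input path, callers will usually want to pair each path with its result and handle the missing ones. The sketch below shows one way to do that. It uses a hypothetical StubResult stand-in (with an assumed text field) so that it runs without the WhisperKit package or any models, and it assumes results come back in the same order as the input paths.

```swift
// Hypothetical stand-in for WhisperKit's TranscriptionResult; only a `text` field is assumed.
struct StubResult {
    let text: String
}

/// Pairs each input path with the joined text of its results.
/// A nil entry (no result for that file) maps to a nil text.
/// Assumes `results` is in the same order as `paths`.
func textByPath(paths: [String], results: [[StubResult]?]) -> [(path: String, text: String?)] {
    return zip(paths, results).map { pair -> (path: String, text: String?) in
        let (path, fileResults) = pair
        let text = fileResults?.map { $0.text }.joined(separator: " ")
        return (path: path, text: text)
    }
}

// Made-up data for illustration only.
let paths = ["/path/to/file1.wav", "/path/to/file2.wav"]
let stubResults: [[StubResult]?] = [[StubResult(text: "hello"), StubResult(text: "world")], nil]
for entry in textByPath(paths: paths, results: stubResults) {
    print(entry.path, entry.text ?? "<no result>")
}
```

With the real API, the same pairing would take the array returned by transcribe(audioPaths:) in place of stubResults.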
We aim to minimize breaking changes, so with this update we added a few deprecation flags for changed interfaces. The deprecated interfaces will be removed later, but for now they are still usable and will not throw build errors. There are some breaking changes for lower-level and newer methods, so if you do notice build errors, see the full guide below.
WhisperKit
Deprecated
public func transcribe(
    audioPath: String,
    decodeOptions: DecodingOptions? = nil,
    callback: TranscriptionCallback = nil
) async throws -> TranscriptionResult?
Use instead
public func transcribe(
    audioPath: String,
    decodeOptions: DecodingOptions? = nil,
    callback: TranscriptionCallback = nil
) async throws -> [TranscriptionResult]
Deprecated
public func transcribe(
    audioArray: [Float],
    decodeOptions: DecodingOptions? = nil,
    callback: TranscriptionCallback = nil
) async throws -> TranscriptionResult?
Use instead
public func transcribe(
    audioArray: [Float],
    decodeOptions: DecodingOptions? = nil,
    callback: TranscriptionCallback = nil
) async throws -> [TranscriptionResult]
TextDecoding
Deprecated
func decodeText(
    from encoderOutput: MLMultiArray,
    using decoderInputs: DecodingInputs,
    sampler tokenSampler: TokenSampling,
    options decoderOptions: DecodingOptions,
    callback: ((TranscriptionProgress) -> Bool?)?
) async throws -> [DecodingResult]
Use instead
func decodeText(
    from encoderOutput: MLMultiArray,
    using decoderInputs: DecodingInputs,
    sampler tokenSampler: TokenSampling,
    options decoderOptions: DecodingOptions,
    callback: ((TranscriptionProgress) -> Bool?)?
) async throws -> DecodingResult
Deprecated
func detectLanguage(
    from encoderOutput: MLMultiArray,
    using decoderInputs: DecodingInputs,
    sampler tokenSampler: TokenSampling,
    options: DecodingOptions,
    temperature: FloatType
) async throws -> [DecodingResult]
Use instead
func detectLanguage(
    from encoderOutput: MLMultiArray,
    using decoderInputs: DecodingInputs,
    sampler tokenSampler: TokenSampling,
    options: DecodingOptions,
    temperature: FloatType
) async throws -> DecodingResult
Transcriber protocol
AudioProcessing
static func loadAudio(fromPath audioFilePath: String) -> AVAudioPCMBuffer?
becomes
static func loadAudio(fromPath audioFilePath: String) throws -> AVAudioPCMBuffer
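Call sites that relied on the old optional return now need to handle a thrown error instead. The pattern is sketched below with a hypothetical stand-in loader, so the snippet runs on its own. The names loadSamples and AudioLoadError are illustrative and are not WhisperKit API. The do/catch form surfaces the error, while try? recovers the previous nil-on-failure behavior.

```swift
// Hypothetical error and loader standing in for a throwing API such as loadAudio(fromPath:).
enum AudioLoadError: Error {
    case fileNotFound(String)
}

func loadSamples(fromPath path: String) throws -> [Float] {
    // Illustrative rule only: treat an empty path as a missing file.
    guard !path.isEmpty else { throw AudioLoadError.fileNotFound(path) }
    return [0.0, 0.1, -0.1]
}

// New style: handle the failure explicitly.
do {
    let samples = try loadSamples(fromPath: "audio.wav")
    print("Loaded \(samples.count) samples")
} catch {
    print("Failed to load audio: \(error)")
}

// Old style: `try?` converts a thrown error back into nil.
let maybeSamples = try? loadSamples(fromPath: "")
print(maybeSamples == nil ? "No audio loaded" : "Audio loaded")
```

Using try? keeps a migration small, but it discards the reason for the failure, so do/catch is usually the better choice where the error can be reported.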
AudioStreamTranscriber
public init(
    audioProcessor: any AudioProcessing,
    transcriber: any Transcriber,
    decodingOptions: DecodingOptions,
    requiredSegmentsForConfirmation: Int = 2,
    silenceThreshold: Float = 0.3,
    compressionCheckWindow: Int = 20,
    useVAD: Bool = true,
    stateChangeCallback: AudioStreamTranscriberCallback?
)
becomes
public init(
    audioEncoder: any AudioEncoding,
    featureExtractor: any FeatureExtracting,
    segmentSeeker: any SegmentSeeking,
    textDecoder: any TextDecoding,
    tokenizer: any WhisperTokenizer,
    audioProcessor: any AudioProcessing,
    decodingOptions: DecodingOptions,
    requiredSegmentsForConfirmation: Int = 2,
    silenceThreshold: Float = 0.3,
    compressionCheckWindow: Int = 20,
    useVAD: Bool = true,
    stateChangeCallback: AudioStreamTranscriberCallback?
)
TextDecoding
func prepareDecoderInputs(withPrompt initialPrompt: [Int]) -> DecodingInputs?
becomes
func prepareDecoderInputs(withPrompt initialPrompt: [Int]) throws -> DecodingInputs
microphoneUnavailable error by @hewigovens in https://github.com/argmaxinc/WhisperKit/pull/113
--language values by @jkrukowski in https://github.com/argmaxinc/WhisperKit/pull/116
Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.5.0...v0.6.0
Published by ZachNagengast 7 months ago
This is a HUGE release with some great new features and fixes 🙌
withoutTimestamps: true
Language detection on the TextDecoding protocol, which runs a single forward pass and reads the language logits to find the most likely language for the input audio.
usePrefilPrompt: false and the language: nil and it is not an English-only model.
wordTimestamps: true
swift run whisperkit-cli transcribe --model-prefix "distil" --model "large-v3_turbo_600MB" --verbose --audio-path ~/your_audio.wav
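The language detection step mentioned above (read the language logits from one forward pass, then pick the most likely language) can be sketched independently of WhisperKit. The function below is an illustrative assumption, not the library's implementation. It takes a hypothetical dictionary from language code to logit, applies a softmax, and returns the top language with its probability.

```swift
import Foundation

/// Returns the most likely language and its softmax probability,
/// or nil when no logits are provided.
func mostLikelyLanguage(from logits: [String: Float]) -> (language: String, probability: Float)? {
    guard let maxLogit = logits.values.max() else { return nil }
    // Subtract the max logit before exponentiating for numerical stability.
    let exps = logits.mapValues { exp($0 - maxLogit) }
    let total = exps.values.reduce(0, +)
    guard let best = exps.max(by: { $0.value < $1.value }) else { return nil }
    return (language: best.key, probability: best.value / total)
}

// Made-up logits for illustration only.
let languageLogits: [String: Float] = ["en": 1.2, "es": 4.7, "fr": 0.3]
if let detected = mostLikelyLanguage(from: languageLogits) {
    print(detected.language, detected.probability)
}
```

Returning the probability alongside the code lets a caller fall back to a default language when the top candidate is not confident enough.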
We added an experimental new mode for streaming in WhisperAX called "Eager streaming mode". We're still refining this feature, but we think it can soon be a great way to do real-time transcription with Whisper. Give it a try in TestFlight or take a look at the code and let us know how it can be improved.
Recommended settings for the best performance for this iteration are:
Looking for feedback on:
https://github.com/argmaxinc/WhisperKit/assets/1981179/0a88ca34-3a0e-4ff5-9829-9f980a4661ea
Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.4.1...v0.5.0
Published by ZachNagengast 7 months ago
v0.4.0 was our first release on Homebrew, and this will be our first automated update to the formula. Huge props to @jkrukowski for his contributions on this.
Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.4.0...v0.4.1
Published by ZachNagengast 7 months ago
Lots of nice fixes in this release!
We had to rename the CLI entry point in preparation for Homebrew distribution. Here is how to use it now:
Old:
swift run transcribe --audio-path path/to/your/audio.mp3
New:
swift run whisperkit-cli transcribe --audio-path path/to/your/audio.mp3
Progress to WhisperKit by @finnvoor in https://github.com/argmaxinc/WhisperKit/pull/71
Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.3.3...v0.4.0
Published by ZachNagengast 7 months ago
Some great contributions in this patch:
Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.3.2...v0.3.3
Published by ZachNagengast 8 months ago
With these our build warnings are now down to 0 🎉
Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.3.1...v0.3.2
Published by ZachNagengast 8 months ago
macOS 13 & iOS 16 support in https://github.com/argmaxinc/WhisperKit/pull/40
The tiny and base variants are recommended on devices with these older OS versions for a stable user experience.
cpuAndGPU compute units can be selected (from the default of cpuAndNeuralEngine) via the ModelComputeOptions init parameter.
Implement selecting input device by @cgfarmer4 in https://github.com/argmaxinc/WhisperKit/pull/51
Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.3.0...v0.3.1
Published by ZachNagengast 8 months ago
wordTimestamps: true or via the CLI with --word-timestamps
TranscriptionSegment in a new words parameter
https://github.com/argmaxinc/WhisperKit/assets/8284016/3bfc1b79-8e01-4e2b-bd14-ecd86ca49d57
Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.2.1...v0.3.0
Published by ZachNagengast 8 months ago
-infinity probability.
Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.2.0...v0.2.1
Published by ZachNagengast 8 months ago
You can now try out our watchOS example on any Series 9 or Ultra 2 Apple Watch. To build to it, just change the target in the WhisperAX example app:
Supported models are:
In addition to the watchOS example app, this version includes a fix for downloading models when there is a partial download already in the filesystem. This includes the following changes:
download to allow/disallow downloading if modelFolder is nil (default true)
modelFolder is now an optional
load has been renamed to download for clarity; we will keep such changes rare moving forward
Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.1.2...v0.2.0
Published by ZachNagengast 9 months ago
SuppressTokensFilter protocol via the decoding options:
let options = DecodingOptions(
    supressTokens: [220, 50257] // array of tokens you want to suppress
)
let transcribeResult = try await whisperKit.transcribe(audioPath: path, decodeOptions: options)
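Token suppression is commonly implemented by forcing the logits of unwanted tokens to negative infinity before sampling, so that they can never be selected. The sketch below shows that general idea on a plain array of logits. It is an assumption about the technique, not WhisperKit's SuppressTokensFilter implementation, and the function names are hypothetical.

```swift
/// Returns a copy of `logits` with the given token ids set to -infinity.
/// Ids outside the valid index range are ignored.
func suppressing(tokens: [Int], in logits: [Float]) -> [Float] {
    var filtered = logits
    for token in tokens where filtered.indices.contains(token) {
        filtered[token] = -Float.infinity
    }
    return filtered
}

/// Greedy pick: index of the largest logit, or nil for an empty array.
func argmax(_ logits: [Float]) -> Int? {
    return logits.indices.max(by: { logits[$0] < logits[$1] })
}

// Made-up logits over a 5-token vocabulary; token 99 is out of range and is skipped.
let logits: [Float] = [0.1, 2.5, 0.3, 1.9, -0.4]
let filtered = suppressing(tokens: [1, 99], in: logits)
print(argmax(logits) ?? -1, argmax(filtered) ?? -1)
```

Working on a copy keeps the original logits available, which is convenient when several filters are compared or applied in sequence.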
make download-model MODEL=tiny to download only the specified model instead of the entire model repo
Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.1.1...v0.1.2
Published by ZachNagengast 9 months ago
swift-transformers
Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.1.0...v0.1.1
Published by ZachNagengast 9 months ago
Initial release 🎉