dsnote

Speech Note Linux app. Note taking, reading and translating with offline Speech to Text, Text to Speech and Machine translation.

MPL-2.0 License

Stars
510

dsnote - Speech Note 4.6.1 Latest Release

Published by mkiol 2 months ago

Linux Desktop

Changes:

  • General
    • Fix: The application failed to start when the processor did not support the required CPU extension.
  • User Interface
    • Swedish translation has been updated.
  • Accessibility
    • Fix: Special keyboard keys could not be used as keyboard shortcuts. Examples: 'Favorites', 'Launch Mail', 'Refresh', 'Home Page', 'Calculator' and many more.
  • Translator
    • New models: English to Latvian, English to Danish, English to Croatian, English to Slovenian, Indonesian to English, Romanian to English
    • Updated models: English to Hungarian, Czech to English, Greek to English

Sailfish OS

Changes:

  • User Interface
    • Swedish translation has been updated.
  • Translator
    • New models: English to Latvian, English to Danish, English to Croatian, English to Slovenian, Indonesian to English, Romanian to English
    • Updated models: English to Hungarian, Czech to English, Greek to English
dsnote - Speech Note 4.6.0

Published by mkiol 3 months ago

Linux Desktop

Changes:

  • User Interface
    • Speech Note has been translated into Norwegian.
    • Grouped models. Models that provide multiple sub-models (for example, TTS models that provide different voices) are shown in groups. This makes it easier to find models in the model browser.
  • Speech to Text
    • All Whisper models have been renamed to WhisperCpp to better reflect the engine behind them.
    • Automatic language detection in STT. To automatically detect the language during STT, select one of the models that is in the Auto detected category in the language list.
    • Separate settings for engines. The configuration of each engine has been separated in the settings. You can separately set the parameters for WhisperCpp and FasterWhisper. The new configuration parameters that have been added to the settings are: Number of simultaneous threads, Beam search width, Audio context size, Use Flash Attention.
    • Quicker decoding with WhisperCpp. Optimization for short sentences has been added to WhisperCpp. With it, the speed of STT has doubled!
    • Support for OpenVINO hardware acceleration in the WhisperCpp engine. With OpenVINO, decoding on CPU is much quicker. If you are not using GPU acceleration, it is recommended to enable OpenVINO in the WhisperCpp engine settings. Currently, OpenVINO is enabled only for CPU acceleration.
    • Option for inserting processing statistics. A new settings option inserts processing-related information, such as processing time and audio length, into the text after decoding. This can be useful for comparing the performance of different models, engines and their parameters.
  • Text to Speech
    • Control tags for advanced TTS processing. Control tags allow you to dynamically change the speed of synthesized text or add silence between sentences. To use control tags, insert {speed: 0.5} or {silence: 1s} into the text. For convenience, you can also insert predefined control tags using the Insert control tag option in the text context menu.
    • Welsh language. New language is enabled with Piper voice.
    • New Piper voices for Spanish, Italian and English
    • New RHVoice voices for Slovak and Croatian
  • Translator
    • Improved Translator UI. The Translate, Switch languages and Add buttons have been placed between the text areas, which is more convenient.
    • Support for older hardware. Until now, the translator did not work on older processors without the AVX CPU extension. This restriction has been removed.
    • New models: English to Lithuanian, Croatian to English, Latvian to English, Danish to English, Serbian to English, Slovak to English, Bosnian to English, Vietnamese to English
    • Updated models: Lithuanian to English, Slovenian to English, Russian to English, Ukrainian to English
  • Flatpak
    • New library: OpenVINO version 2024.1.0.15008
    • whisper.cpp update to version 1.6.2
    • CTranslate2 update to version 4.3.1
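
The control tags described under Text to Speech can be illustrated with a short parsing sketch. This is a minimal Python example, not the app's actual parser: the regular expression, the `split_control_tags` helper and the segment format are assumptions made for illustration, and only the two tags named in the notes (`speed` and `silence`) are recognized.

```python
import re

# Assumed tag syntax, based only on the two examples in the notes:
# {speed: 0.5} and {silence: 1s}. The real grammar may accept more.
TAG_RE = re.compile(r"\{\s*(speed|silence)\s*:\s*([^}]*?)\s*\}")


def split_control_tags(text):
    """Split text into ("text", str) and (tag_name, value) segments, in order."""
    segments = []
    pos = 0
    for match in TAG_RE.finditer(text):
        # Plain text between the previous tag and this one.
        plain = text[pos:match.start()].strip()
        if plain:
            segments.append(("text", plain))
        # The tag itself, with its value kept as an unparsed string.
        segments.append((match.group(1), match.group(2)))
        pos = match.end()
    # Whatever follows the last tag.
    tail = text[pos:].strip()
    if tail:
        segments.append(("text", tail))
    return segments


example = "Hello. {silence: 1s} {speed: 0.5} Slow now."
for kind, value in split_control_tags(example):
    print(kind, "->", value)
```

Values are kept as strings on purpose, because the notes do not say which units or ranges the app accepts beyond the two examples shown.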

Sailfish OS

Changes:

  • User Interface
    • Speech Note has been translated into Norwegian.
    • Grouped models. Models that provide multiple sub-models (for example, TTS models that provide different voices) are shown in groups. This makes it easier to find models in the model browser.
    • Option to enable/disable support for subtitles. Subtitle support is a niche functionality. To simplify the user interface, the subtitle options are hidden by default. To show them, enable the Subtitles support option in the settings.
  • Speech to Text
    • All Whisper models have been renamed to WhisperCpp to better reflect the engine behind them.
    • Automatic language detection in STT. To automatically detect the language during STT, select one of the models that is in the Auto detected category in the language list.
    • Quicker decoding with WhisperCpp. Optimization for short sentences has been added to WhisperCpp. With it, the speed of STT has doubled!
    • Translate to English option for WhisperCpp models. When enabled, speech is automatically translated into English.
    • Option for inserting processing statistics. A new settings option inserts processing-related information, such as processing time and audio length, into the text after decoding. This can be useful for comparing the performance of different models, engines and their parameters.
  • Text to Speech
    • Welsh language. New language is enabled with Piper voice.
    • New Piper voices for Spanish, Italian and English
    • New RHVoice voices for Slovak and Croatian
  • Translator
    • New button for switching languages.
    • New models: English to Lithuanian, Croatian to English, Latvian to English, Danish to English, Serbian to English, Slovak to English, Bosnian to English, Vietnamese to English
    • Updated models: Lithuanian to English, Slovenian to English, Russian to English, Ukrainian to English
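
The processing statistics option reports processing time and audio length. One common way to compare runs with these two numbers is the real-time factor (processing time divided by audio length). The sketch below is only an illustration: the `real_time_factor` function and the sample measurements are hypothetical and are not produced by Speech Note.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Return processing time divided by audio length (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio length must be positive")
    return processing_seconds / audio_seconds


# Hypothetical measurements: (label, processing time in s, audio length in s).
runs = [
    ("model A", 12.0, 30.0),
    ("model B", 6.0, 30.0),
]

# Sort from fastest to slowest relative to the audio length.
ranked = sorted(runs, key=lambda run: real_time_factor(run[1], run[2]))
for label, processing, audio in ranked:
    print(f"{label}: RTF = {real_time_factor(processing, audio):.2f}")
```

A value below 1.0 means the audio was decoded faster than its playback duration.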
dsnote - Speech Note 4.5.0

Published by mkiol 5 months ago

Linux Desktop

Changes:

  • User Interface
    • Import subtitles embedded in a video file. If your video file contains one or more subtitle streams, you can import the selected subtitles into the notepad.
    • Support for more subtitle formats. You can import and export subtitles in SRT, WebVTT and ASS formats.
    • Unified file importing and exporting. Text, subtitles, audio and video files can be imported or exported using a unified menu bar option.
    • Settings option to enable/disable remembering the last note. If the option is disabled, the last note will not be available after restarting the app.
    • Settings option for default action when importing note from a file. You can set Ask whether to add or replace, Add to an existing note or Replace an existing note.
    • Enhanced text editor font settings. You can set the font family, style and size of the font used in the text editor.
    • Text to Text repair options. With these options you can directly fix diacritical marks and punctuation in the text.
    • Text context menu with additional options: Read selection and Translate selection. To open the context menu, right-click in the text.
    • New text appending style: After empty line
    • System tray menu for changing active STT/TTS model
    • User friendly names of audio input devices
    • Simplified model filtering. It is now less flexible, but much easier to understand and use.
    • Speech Note has been translated into Ukrainian and Russian.
    • Fix: Cancellation was blocking the user interface.
  • Speech to Text
    • Updated Distil model for English: Distil Large-v3. New model is enabled for Whisper and Faster Whisper engines.
    • New Fine-Tuned Whisper models for Slovenian and Polish
    • Fix: Punctuation model could not be downloaded.
  • Text to Speech
    • WhisperSpeech engine that generates voice with exceptional naturalness. The new engine comes with models for English and Polish languages. All models support voice cloning.
    • New voice cloning model for Vietnamese: viXTTS. Model is a fine-tuned version of the phenomenal Coqui XTTS.
    • New Piper voices for English, Persian, Slovenian, Turkish, French and Spanish
    • New RHVoice voice for Czech
    • Settings option to enable/disable speech synchronization with subtitle timestamps. This may be useful for creating voice overs.
    • Mixing speech with audio from an existing file. When exporting to a file, you can overlay speech with audio from an existing media file. This can be useful when creating voice overs from subtitles.
    • Context menu option to read from the cursor position or read only the selected text. To open the context menu, right-click in the text.
    • Speech audio is always normalized after TTS processing.
    • Fix: Mimic3 models could not be downloaded.
  • Translator
    • New models: Greek to English, Maltese to English, Slovenian to English, Turkish to English, English to Catalan
    • Updated models: Czech and Lithuanian
    • Handy buttons to quickly add translated text to the note or to replace it and switch languages
    • Context menu option to translate from the cursor position or translate only the selected text. To open the context menu, right-click in the text.
  • Accessibility
    • New Actions for STT/TTS models switching: switch-to-next-stt-model, switch-to-prev-stt-model, switch-to-next-tts-model, switch-to-prev-tts-model, set-stt-model, set-tts-model
    • New global keyboard shortcuts for STT/TTS models switching (X11 only): Switch to next STT model, Switch to prev STT model, Switch to next TTS model, Switch to prev TTS model
    • Toggle option for keyboard shortcuts (X11 only). When Toggle behavior is enabled, Start listening/reading shortcuts will also stop listening/reading if they are triggered while listening/reading is active.
    • Fix: Accented characters (e.g.: ã, ê) were not transferred correctly to the active window.
  • Flatpak
    • Flatpak runtime update to version 5.15-23.08
    • AMD ROCm update to version 5.7.3
    • PyTorch update to version 2.2.1
    • CTranslate2 update to version 4.2.1
    • Faster-Whisper update to version 1.0.2

A video demonstration of all the changes in 4.5.0: https://www.youtube.com/watch?v=S9MJ7y8-bcw

Sailfish OS

Changes:

  • User Interface
    • Import subtitles in many formats and subtitles embedded in a video file. You can import and export subtitles in SRT, WebVTT and ASS formats. If your video file contains one or more subtitle streams, you can import the selected subtitles into the notepad.
    • Unified file importing and exporting. Text, subtitles, audio and video files can be imported or exported using a unified pull-down menu option.
    • Settings option to enable/disable remembering the last note. If the option is disabled, the last note will not be available after restarting the app.
    • Settings option for default action when importing note from a file. You can set Ask whether to add or replace, Add to an existing note or Replace an existing note.
    • New text appending style: After empty line
    • Speech Note has been translated into Ukrainian and Russian.
    • Fix: Cancellation was blocking the user interface.
  • Speech to Text
    • Subtitles support in STT. To generate timestamped text in SRT format, change the text format to SRT Subtitles using the button at the bottom of the text area. Check the settings to find more subtitle options.
  • Text to Speech
    • Speech synchronized with subtitle timestamps in TTS. When the text format is set to SRT Subtitles, the generated speech is synchronized with the subtitle timestamps. This can be useful if you want to make a voice-over.
    • New Piper voices for English, Persian, Slovenian, Turkish, French and Spanish
    • New RHVoice voice for Czech
    • Settings option to enable/disable speech synchronization with subtitle timestamps.
    • Speech audio is always normalized after TTS processing.
  • Translator
    • New models: Greek to English, Maltese to English, Slovenian to English, Turkish to English, English to Catalan
    • Updated models: Czech and Lithuanian
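
For readers unfamiliar with the timestamped SRT format mentioned under Speech to Text, the following Python sketch builds a single SRT cue. It shows the general SRT layout (cue number, `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, then the text), not Speech Note's own export code. The helper names `srt_timestamp` and `srt_cue` are hypothetical.

```python
def srt_timestamp(ms):
    """Format a time in milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def srt_cue(index, start_ms, end_ms, text):
    """Build one SRT cue: number line, time range line, then the text."""
    time_range = f"{srt_timestamp(start_ms)} --> {srt_timestamp(end_ms)}"
    return f"{index}\n{time_range}\n{text}\n"


# Cues are separated by a blank line in an SRT file.
cues = [
    srt_cue(1, 0, 1500, "Hello"),
    srt_cue(2, 2000, 4250, "This is a second subtitle."),
]
print("\n".join(cues))
```

The start and end times of each cue are what allow generated speech to be aligned with the subtitles.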
dsnote - Speech Note 4.4.0

Published by mkiol 9 months ago

Linux Desktop

Changes:

  • Flatpak
    • Modular Flatpak package (Base package and Add-ons)
    • NVIDIA CUDA runtime update to version 12.2
    • AMD ROCm runtime update to version 5.6
    • PyTorch update to version 2.1.1
  • User Interface
    • Improvements to the model browser
    • Model filtering options
    • Setting option to minimize to the system tray
    • Setting option to enable/disable text in desktop notifications
  • Speech to Text
    • Marathi language. New language is enabled with Whisper and Faster Whisper models.
    • New version of Faster Whisper Large model: 'FasterWhisper Large-v3'
    • 'Distil' versions of Faster Whisper models
    • Whisper and Faster Whisper enabled for Chinese-Cantonese language
    • Support for Speex audio codec in 'Transcribe a file'
    • Translate to English option for Whisper and Faster Whisper models
    • More effective GPU acceleration for Whisper models with AMD graphics cards
    • Subtitles generation (SRT format)
    • Support for multiple audio streams in a video file
  • Text to Speech
    • Marathi language. New language is enabled with Coqui MMS model.
    • Voice cloning with Coqui XTTS and YourTTS models.
      • Coqui XTTS models are enabled for: Arabic, Brazilian Portuguese, Chinese, Czech, Dutch, English, French, German, Hungarian, Italian, Japanese, Korean, Polish, Russian, Spanish and Turkish.
      • YourTTS model is enabled for: English, French and Brazilian Portuguese.
    • Voice samples creator
    • New voices for Serbian and Uzbek languages (RHVoice model)
    • GPU acceleration for Coqui models with AMD graphics cards (in Flatpak version)
    • Speech synchronized with subtitle timestamps
  • Translator
    • New model: Lithuanian to English
    • Option to force text cleaning before translation
    • Text formatting support
    • Translation progress indicator
  • Other
    • Setting option to override GPU version (AMD graphics cards)
    • Setting option to limit number of simultaneous CPU threads
    • Setting option to set Python libraries directory (in non-Flatpak version)

Sailfish OS

  • Speech to Text
    • Marathi language. New language is enabled with Whisper models.
    • Whisper enabled for Chinese-Cantonese language
    • Support for Speex audio codec in 'Transcribe a file'
    • Support for multiple audio streams in a video file
  • Text to Speech
    • New voices for Serbian and Uzbek languages (RHVoice model)
  • Translator
    • New model: Lithuanian to English
    • Translation progress indicator
dsnote - Speech Note 4.3.0

Published by mkiol 11 months ago

Linux Desktop

Changes:

  • Accessibility
    • Global keyboard shortcuts (X11 only)
    • Support for Actions
  • User Interface
    • Desktop notifications
    • Speech speed control in the main app window
    • Opening files with Drag and Drop gesture
    • Fix: Application did not use native widgets on some platforms
  • Translator
    • New model: English to Hungarian
  • Speech to Text
    • New languages: Afrikaans, Gujarati, Hausa, Telugu, Tswana, Javanese, Hebrew
    • New engine: Faster Whisper
    • New engine: April-ASR. Models for: English, French and Polish.
    • Inserting text to any active window (X11 only)
    • Copy decoded text directly to the clipboard
    • Stop listening button
    • Support for Opus audio codec in Transcribe a file
    • More effective GPU acceleration for Whisper models (NVIDIA CUDA only)
    • New smaller and quicker Whisper models for English: Distil-Whisper
    • New version of Whisper Large model: Whisper Large-v3
    • Fix: CUDA acceleration for Whisper models did not work on NVIDIA video cards with Maxwell architecture
  • Text to Speech
    • New languages: Afrikaans, Gujarati, Hausa, Telugu, Tswana, Javanese, Hebrew
    • New engine: Mimic 3
    • Reading text from the clipboard
    • New Piper voices: Arabic, English, Hungarian, Polish, Czech, German, Ukrainian, Vietnamese, Serbian, French, Spanish, Nepali
    • More steps in Speech speed option
    • Diacritical marks restoration before speech synthesis for Arabic and Hebrew
    • Support for GPU acceleration for Coqui models (NVIDIA CUDA only)
    • Fix: Coqui Chinese MMS Hakka and MinNan voices were broken
    • Fix: Exporting to audio file was not possible when text was very long
  • Other
    • Setting option to disable support for certain graphic cards
    • Setting option Clear cache on close
    • Cache compression (Opus format instead of raw audio)
    • Detecting the availability of the optional features

Sailfish OS

Changes:

  • Translator
    • New model: English to Hungarian
  • Speech to Text
    • New languages: Afrikaans, Gujarati, Hausa, Telugu, Tswana, Javanese, Hebrew
    • New engine: April-ASR. Models for: English, French and Polish.
    • Stop listening button
    • Support for Opus audio codec in Transcribe a file
  • Text to Speech
    • New Piper voices: Arabic, English, Hungarian, Polish, Czech, German, Ukrainian, Vietnamese, Serbian, French, Spanish, Nepali
    • More steps in Speech speed option
    • Diacritical marks restoration before speech synthesis for Arabic
    • Fix: Exporting to audio file was not possible when text was very long
  • Other
    • Setting option Clear cache on close
    • Cache compression (Opus format instead of raw audio)
dsnote - Speech Note 4.2.1

Published by mkiol about 1 year ago

Linux Desktop

Changes:

  • Speech to Text
    • Improved AMD GPU acceleration support for Whisper models
dsnote - Speech Note 4.2.0

Published by mkiol about 1 year ago

Linux Desktop

Changes:

  • Translator
    • New models: Hungarian to English, Finnish to English
  • Speech to Text
    • Support for video files transcription
    • Option 'Audio source' to select preferred audio source
    • Whisper engine update and increase in performance.
      Processing time has been reduced by an average of 50%.
    • Improved Nvidia GPU acceleration support for Whisper models
  • Text to Speech
    • Save audio in compressed formats (MP3 or Ogg Vorbis).
      You can also save metadata tags to the audio file, such as track number, title, artist or album.
    • Pause option. You can pause or resume speech reading.
    • New MMS models: Hungarian, Catalan, German,
      Spanish, Romanian, Russian and Swedish
    • Update of RHVoice voice for Uzbek
    • Fix: Many Coqui models couldn't read the numbers or the reading wasn't correct.
    • Fix: Piper models could not be downloaded
  • User Interface
    • Menu options: 'Open a text file' and 'Save to a text file'
    • Command line option to open files
    • Improved UI colors when app is running under GNOME dark theme
    • Option 'Graphical style' to change Qt interface style

Sailfish OS

Changes:

  • Translator
    • New models: Hungarian to English, Finnish to English
  • Speech to Text
    • Support for video file transcription. With the 'Transcribe a file' menu option you can
      convert an audio file or the audio from a video file to text.
    • Whisper engine update and increase in performance.
      Processing time has been reduced by an average of 15% (Xperia 10 III).
  • Text to Speech
    • Save audio in compressed formats (MP3 or Ogg Vorbis).
      You can also save metadata tags to the audio file, such as track number, title, artist or album.
    • Pause option. You can pause or resume speech reading.
    • Update of RHVoice voice for Uzbek
    • Fix: Piper models could not be downloaded
  • User Interface
    • Share to Speech Note. You can push text, audio or video content to Speech Note
      using share button in other apps (e.g. Notes, Gallery, Audio recorder, Browser).
dsnote - Speech Note 4.1.0

Published by mkiol about 1 year ago

Linux Desktop

Changes:

  • Speech to Text:
    • Support for GPU acceleration for Whisper models
    • Fix: Whisper wasn't able to decode short speech sentences
  • Text to Speech:
    • Option 'Speech speed' to make synthesized speech slower or faster.
    • New models from Massively Multilingual Speech (MMS) project:
      Albanian, Amharic, Arabic, Basque, Bengali, Bulgarian, Chinese,
      Greek, Hindi, Icelandic, Indonesian, Kazakh, Korean, Latin,
      Latvian, Malay, Mongolian, Polish, Portuguese, Swahili, Tagalog,
      Tatar, Thai, Turkish, Uzbek, Vietnamese, Yoruba
    • New Piper voices: Czech, German, Hungarian, Portuguese, Slovak,
      English
    • Update of RHVoice voices for Slovak and Czech
    • New Coqui voices for Japanese, Turkish and Spanish
    • Fix: Splitting text into sentences was incorrect for: Georgian,
      Japanese, Bengali, Nepali, Hindi
  • Interface
    • Option to change font size in text editor

Sailfish OS

Changes:

  • Speech to Text:
    • Removal of the experimental 'Restore punctuation' option
    • Fix: Whisper wasn't able to decode short speech sentences
  • Text to Speech:
    • Option 'Speech speed' to make synthesized speech slower or faster.
    • New Piper voices: Czech, German, Hungarian, Portuguese, Slovak,
      English
    • Update of RHVoice voices for Slovak and Czech
    • Fix: Splitting text into sentences was incorrect for: Georgian,
      Japanese, Bengali, Nepali, Hindi
dsnote - Speech Note 4.0.0

Published by mkiol about 1 year ago

Changes:

  • Translator:
    • Support for offline translations.
  • Interface:
    • User interface redesign
    • Settings option to force specific interface style.
    • App translated to new languages: Dutch and Italian
  • Text to Speech:
    • All existing Piper models were updated.
    • New Piper voices for: English, Swedish, Turkish, Polish,
      German, Spanish, Finnish, French, Ukrainian, Russian,
      Swahili, Serbian, Romanian, Luxembourgish and Georgian
    • New RHVoice model for Slovak language
dsnote - Speech Note 3.1.5

Published by mkiol over 1 year ago

Changes in Linux Desktop version:

  • Text to Speech:
    • New Coqui voice for English: Jenny
  • Speech to Text:
    • Quicker decoding when using DeepSpeech/Coqui models (especially on ARM CPU)

Changes in Sailfish OS version:

  • Speech to Text:
    • Quicker decoding when using DeepSpeech/Coqui models
    • Re-enabled Swedish Vosk model
dsnote - Speech Note 3.1.4

Published by mkiol over 1 year ago

Changes in Linux Desktop version:

  • Interface:
    • Option to show recent changes (About -> Changes)
    • French translation update (Many thanks to @LAfricain)
  • Text to Speech:
    • New Piper model for Chinese
    • New RHVoice model for Uzbek (Beta)
    • Updated RHVoice models for Ukrainian
    • Piper and RHVoice engines updated to most recent versions
  • Speech to Text:
    • Whisper 'Large' models enabled for all languages
    • Whisper supported on older CPUs (i.e. without AVX/AVX2 extensions)
    • Whisper engine update (20% performance improvement, 50% less memory)

Changes in Sailfish OS version:

  • Interface:
    • French translation update (Many thanks to @LAfricain)
  • Text to Speech:
    • New Piper model for Chinese
    • New RHVoice model for Uzbek (Beta)
    • Updated RHVoice models for Ukrainian
    • Piper and RHVoice engines updated to most recent versions
  • Speech to Text:
    • Whisper 'Small' models enabled for all languages
    • Whisper fine-tuned 'Small' models for: Croatian, Czech, Hungarian, Slovak and Romanian
    • Whisper engine update (test on Xperia 10 III: 20% performance improvement, 50% less memory)
dsnote - Speech Note 3.1.3

Published by mkiol over 1 year ago

Changes:

  • New Piper Text-to-Speech models for: Icelandic, Swedish, Russian
  • Whisper 'fine-tuned' Speech-To-Text models for: Czech, Slovak, Slovenian, Romanian, Russian, Hungarian, Polish
  • Whisper models enabled also for: Amharic, Arabic, Bengali, Danish, Estonian, Basque, Persian, Hindi, Croatian, Hungarian, Icelandic, Georgian, Kazakh, Korean, Lithuanian, Latvian, Mongolian, Maltese, Nepali, Romanian, Slovak, Slovenian, Albanian, Swahili, Tagalog, Tatar, Uzbek, Yoruba
dsnote - Speech Note 3.1.1

Published by mkiol over 1 year ago

Changes:

  • Option to save speech to audio file
  • New STT DeepSpeech model for Latvian language
  • Linux Desktop UI (Flatpak release on flathub.org)
  • Coqui TTS models for many languages (only in x86_64 Flatpak version)
dsnote - Speech Note 3.0.0

Published by mkiol over 1 year ago

Changes:

  • Note reading with Text to Speech
  • Restore punctuation option
dsnote - Speech Note 2.0.1

Published by mkiol over 1 year ago

Changes:

  • Translations update: Dutch, Swedish
  • Improved decoding accuracy thanks to a noise cancelling module.
  • Minor UI fixes
dsnote - Speech Note 2.0.0

Published by mkiol over 1 year ago

Changes:

  • New languages: Arabic, Bulgarian, Bosnian, Esperanto, Persian, Hindi, Japanese, Kazakh, Korean, Macedonian, Malay, Norwegian, Portuguese, Slovak, Serbian, Swedish, Swahili, Tagalog, Uzbek, Vietnamese
  • Support for Vosk engine and models
  • Support for Whisper engine and model (works decently only on ARM64)
  • New DeepSpeech models and update of existing ones
  • Voice Activity Detection
  • Option for text appending style
  • Option for setting default model (model which is used in Speech Keyboard)
dsnote - Speech Note 1.8.0

Published by mkiol over 2 years ago

Changes:

  • New languages: Finnish, Mongolian (experimental), Estonian (experimental)
  • Improved model for Polish language: Polski (mkiol)
  • Experimental German medical model: Deutsch (med)
  • New models for English: English (Coqui Huge Vocabulary), English (Coqui Large Vocabulary)
  • Improved languages browser
  • Support for SFOS 4.4
dsnote - Speech Note 1.7

Published by mkiol over 2 years ago

Experimental release with language models for German with medical vocabulary.