transcriptionstream

turnkey self-hosted offline transcription and diarization service with llm summary

GPL-3.0 License

transcription stream 11/2023 - https://transcription.stream

This project creates an ssh and web accessible platform for transcribing and diarizing audio files. Files dropped via ssh into the transcribe or diarize folder are run through the respective process, with the output placed into a dated folder, named after the audio file, within transcribed. Likewise, uploading files via the ts-web web interface places files into the same folders and reads the results from the transcribed folder.

Expects an NVIDIA GPU.

Build and run

- Create the transcriptionstream volume: docker volume create --name=transcriptionstream

- Build the ts-web and ts-gpu images from their respective folders:
  # ts-web (very small, very fast, minimal build)
  docker build -t ts-web:latest .
  # ts-gpu (this is going to take a while and end up around 13.8GB - while large, it contains the models needed to run offline)
  docker build -t ts-gpu:latest .

- Run the service by kicking off docker-compose. The console will provide plenty of updates for running jobs, and lots of noisy info from ts-web.
  docker-compose -p transcriptionstream up
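Putting those steps together, a minimal end-to-end sequence might look like the following (this assumes the ts-web and ts-gpu folders sit at the repo root next to docker-compose.yml):

```shell
# create the shared volume used by the containers
docker volume create --name=transcriptionstream

# build the two images from their respective folders
docker build -t ts-web:latest ./ts-web
docker build -t ts-gpu:latest ./ts-gpu   # slow: pulls the offline models, ends up around 13.8GB

# bring everything up under the transcriptionstream project name
docker-compose -p transcriptionstream up
```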

Notes

ports: 22222/ssh 5006/http

Access the ssh server on port 22222, placing the audio files you'd like transcribed into the transcribe folder and those you'd like diarized into the diarize folder. Completed files are placed into a dated, named folder under transcribed.

user: transcriptionstream
pass: nomoresaastax
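For example, dropping a file from another machine over ssh could look like this (the transcribe, diarize, and transcribed paths are assumed to live in the transcriptionstream user's home directory):

```shell
# send a file to be transcribed (ssh service listens on 22222)
scp -P 22222 meeting.wav transcriptionstream@dockerip:transcribe/

# or send it to be diarized instead
scp -P 22222 meeting.wav transcriptionstream@dockerip:diarize/

# when the job completes, results show up in a dated folder under transcribed/
```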

Access the web front end at http://dockerip:5006. Please understand I don't know flask, python, or javascript, but I was able to put this together with our friend chatgpt and many questions. It was a fun exercise that I still have future plans for. This version provides:

  • audio file upload/download
  • actionable task completion alerts (you can click on the alert and the transcription loads in the player)
  • html 5 web player with speed control and transcription highlighting
  • in transcription time scrubbing/scrolling synced to audio via the time slider
  • lots of things done incorrectly

You shouldn't run this in production. This is functional example code. Did you see the part about ts-gpu taking a while to build? It's going to take a while (about 15 minutes for me), again with the benefit of having the models available for offline use.

You should change the password for the transcriptionstream user in the ts-gpu Dockerfile, and update the secret in ts-web's app.py.
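A rough sketch of how to track both defaults down and rebuild afterwards (the exact lines and variable names will differ; this assumes the images are built from the ts-gpu and ts-web folders as above):

```shell
# find where the default ssh password is set in the ts-gpu image
grep -rn "nomoresaastax" ts-gpu/

# find the secret used by ts-web (assuming app.py sets something like app.secret_key)
grep -n "secret" ts-web/app.py

# after editing both values, rebuild the images so the changes take effect
docker build -t ts-gpu:latest ./ts-gpu
docker build -t ts-web:latest ./ts-web
```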

The transcription option was changed from openai's whisper to whisperx - mostly because whisperx was already there for diarize.py and whisper was breaking the build. All that to say: the raw text output for transcriptions doesn't display correctly in the console and probably ts-web, but the other generated files are good. The transcription option is also set to use the large-v2 model, which is not downloaded during build, so the first transcription will probably be delayed while it downloads. A RUN line can be added to the ts-gpu Dockerfile so the model is included in the image build, or you can modify transcribe_example_d.sh to use the medium model instead.
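As a sketch of those two workarounds (hedged: the whisperx call assumes its load_model function accepts the model name, device, and compute type, and the sed edit assumes the script references the model by name):

```shell
# option 1: bake large-v2 into the image by adding a download step,
# e.g. as a RUN line near the end of the ts-gpu Dockerfile
python3 -c "import whisperx; whisperx.load_model('large-v2', device='cpu', compute_type='int8')"

# option 2: point the example script at the medium model instead
sed -i 's/large-v2/medium/g' transcribe_example_d.sh
```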

Related Projects