Listen, Chat, And Edit on Edge: Text-Guided Soundscape Modification for Real-Time Auditory Experience
What is this project about?
Listen, Chat, and Edit (LCE) is a cutting-edge multimodal sound mixture editor designed to modify each sound source in a mixture based on user-provided text instructions. The system features a user-friendly chat interface and the unique ability to edit multiple sound sources simultaneously within a mixture without the need for separation. Using open-vocabulary text prompts interpreted by a large language model, LCE creates a semantic filter to edit sound mixtures, which are then decomposed, filtered, and reassembled into the desired output.
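The decompose → filter → reassemble flow described above can be sketched in miniature. This is a hypothetical illustration, not the real model code: `decompose` stands in for Conv-TasNet separation, and `semantic_gains` stands in for the LLM-derived semantic filter; all names and the toy data are assumptions made for the sketch.

```python
def decompose(mixture, n_sources):
    # Stand-in for Conv-TasNet separation: each "source" is just an
    # equal share of the mixture so the sketch stays self-contained.
    return [[s / n_sources for s in mixture] for _ in range(n_sources)]

def semantic_gains(embedding, n_sources):
    # Stand-in for the semantic filter produced from the text
    # instruction: one gain per separated source.
    return embedding[:n_sources]

def edit_mixture(mixture, embedding, n_sources=2):
    sources = decompose(mixture, n_sources)
    gains = semantic_gains(embedding, n_sources)
    # Scale every source by its gain, then reassemble by summing the
    # filtered sources sample-by-sample.
    edited = [[g * s for s in src] for g, src in zip(gains, sources)]
    return [sum(samples) for samples in zip(*edited)]

mixture = [0.2, -0.4, 0.6, -0.8]
# A "filter" that keeps source 0 and mutes source 1.
out = edit_mixture(mixture, embedding=[1.0, 0.0])
```

The point of the sketch is only the shape of the pipeline: editing happens by re-weighting separated sources, so several sources can be modified at once without the user ever handling separation explicitly.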
Project Structure
- data/datasets: Scripts used to process the dataset and prompts.
- demonstration: A demonstration of an input mixture and its edited version.
- embeddings: The .pkl files received from the LLM are stored in this folder.
- hparams: Hyperparameter settings for the models.
- llm_cloud: Configuration and scripts for cloud-based language model interactions.
- modules: Core modules and utilities for the project.
- prompts: Handling and processing of text prompts.
- pubsub: Setup for publish-subscribe messaging.
- utils: General-purpose utility scripts.
- E6692.2022Spring.LCEE.ss6928.pkk2125.presentationFinal.pptx: Final presentation detailing the project overview and results.
- profiling.ipynb: Jupyter notebook for profiling the modules' inference speed and GPU memory usage.
- run_lce.ipynb: Main executable notebook for the LCE system.
- run_prompt_reader.ipynb: Notebook for reading and processing prompts.
- run_prompt_reader_profiling.ipynb: Profiling for the prompt reader.
- run_sound_editor_nosb.ipynb: Notebook for the sound editor module without SpeechBrain.
Installation
- Clone the repository:
git clone https://github.com/SiavashShams/Listen-Chat-Edit-on-Edge.git
- Install required dependencies:
pip install -r requirements.txt
Usage
To run the main LCE application, open and execute:
run_lce.ipynb
For a demonstration of the system's capabilities, refer to the demonstration
folder.
Implementation
- Deploy Conv-TasNet on the Jetson Nano.
- Deploy LLAMA 2 on a GCP server.
- Send a prompt to the server. Communication is handled in one of two ways: through SSH, or through the Pub/Sub service.
- The LLM computes the embedding and publishes it back; this embedding is the input to the Conv-TasNet model.
- The resulting edited audio mixture is ready to be played!
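Since the embeddings folder stores the .pkl files received from the LLM, the server-to-device hand-off in the steps above can be sketched as a pickle round-trip. This is an illustrative sketch, not the project's actual transport code; the function names are assumptions, and the payload would travel over SSH or as a Pub/Sub message body in practice.

```python
import pickle

def serialize_embedding(embedding):
    # Server side (GCP): pack the LLM-computed embedding into bytes
    # so it can be written to a .pkl file or sent as a message payload.
    return pickle.dumps(embedding)

def deserialize_embedding(payload):
    # Device side (Jetson Nano): recover the embedding from the
    # received bytes before feeding it to the Conv-TasNet model.
    return pickle.loads(payload)

embedding = [0.12, -0.53, 0.98]
payload = serialize_embedding(embedding)
assert deserialize_embedding(payload) == embedding
```

Note that pickle should only be used between trusted endpoints, since unpickling untrusted bytes can execute arbitrary code.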
Links
Presentation
Report
References
Thanks to the authors of Listen, Chat, And Edit for their amazing work.