Listen, Chat, Edit on Edge

Listen, Chat, And Edit on Edge: Text-Guided Soundscape Modification for Real-Time Auditory Experience

What is this project about?

Listen, Chat, and Edit (LCE) is a cutting-edge multimodal sound mixture editor designed to modify each sound source in a mixture based on user-provided text instructions. The system features a user-friendly chat interface and the unique ability to edit multiple sound sources simultaneously within a mixture without the need for separation. Using open-vocabulary text prompts interpreted by a large language model, LCE creates a semantic filter to edit sound mixtures, which are then decomposed, filtered, and reassembled into the desired output.
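
At a high level, the editing pipeline works as follows: the LLM turns the text instruction into a semantic embedding, the embedding conditions the separation/filtering network, and the filtered sources are summed back into the edited mixture. The sketch below illustrates this flow with hypothetical function and variable names (the actual components live in the modules and llm_cloud directories):

    import torch

    def edit_mixture(mixture, instruction, llm_encoder, separator):
        # Hypothetical sketch of the LCE flow, not the project's exact API.
        # mixture:     (batch, samples) waveform of the input sound mixture
        # instruction: free-form text edit, e.g. "remove the dog barking"
        # llm_encoder: callable mapping text -> semantic embedding from the LLM
        # separator:   Conv-TasNet-style model that, conditioned on the
        #              embedding, outputs one (already filtered) source per slot

        # 1. Interpret the open-vocabulary instruction with the LLM.
        semantic_emb = llm_encoder(instruction)        # (batch, emb_dim)

        # 2. Decompose and filter the mixture, conditioned on the instruction.
        sources = separator(mixture, semantic_emb)     # (batch, n_src, samples)

        # 3. Reassemble the edited sources into a single output mixture.
        return torch.sum(sources, dim=1)               # (batch, samples)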

Project Structure

  • data/datasets: Scripts used to process the datasets and prompts.
  • demonstration: A demonstration of an input mixture and the edited version.
  • embeddings: The .pkl files received from the LLM are stored in this folder (see the loading sketch after this list).
  • hparams: Hyperparameter settings for the models.
  • llm_cloud: Configuration and scripts for cloud-based language model interactions.
  • modules: Core modules and utilities for the project.
  • prompts: Handling and processing of text prompts.
  • pubsub: Setup for publish-subscribe messaging patterns.
  • utils: Utility scripts for general purposes.
  • E6692.2022Spring.LCEE.ss6928.pkk2125.presentationFinal.pptx: Final presentation file detailing project overview and results.
  • profiling.ipynb: Jupyter notebook for profiling the modules in terms of inference speed and GPU memory usage.
  • run_lce.ipynb: Main executable notebook for the LCE system.
  • run_prompt_reader.ipynb: Notebook for reading and processing prompts.
  • run_prompt_reader_profiling.ipynb: Profiling for the prompt reader.
  • run_sound_editor_nosb.ipynb: Notebook for the sound editor module without SpeechBrain.
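
As an illustration, a stored embedding could be read back roughly like this (the file name and payload format are assumptions, not the project's exact layout):

    import pickle
    import torch

    # Hypothetical example: load an embedding that llm_cloud dumped into
    # the embeddings folder and turn it into a tensor for the sound editor.
    with open("embeddings/example_prompt.pkl", "rb") as f:
        payload = pickle.load(f)              # e.g. a numpy array or a dict

    embedding = torch.as_tensor(payload)      # assumes the pickle holds the raw vector
    print(embedding.shape)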

Installation

  1. Clone the repository:
    git clone https://github.com/SiavashShams/Listen-Chat-Edit-on-Edge.git
    
  2. Install required dependencies:
    pip install -r requirements.txt
    

Usage

To run the main LCE application:

run_lce.ipynb
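
For example, you can open and execute the notebook with Jupyter:

    jupyter notebook run_lce.ipynb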

For a demonstration of the system's capabilities, refer to the demonstration folder.

Implementation

  • Deploy Conv-TasNet on the Jetson Nano.
  • Deploy LLAMA 2 on a GCP server.
  • Send a prompt to the server. Communication is handled in one of two ways: through SSH, or through a Pub/Sub service (see the sketch after this list).
  • The LLM computes the embedding and publishes it back; this embedding is the input to the Conv-TasNet model.
  • The edited audio mixture is then ready to be played!
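
A minimal sketch of the Pub/Sub path, assuming Google Cloud Pub/Sub and hypothetical topic/subscription names (the project's actual setup lives in the pubsub folder):

    from google.cloud import pubsub_v1

    PROJECT = "my-gcp-project"   # assumption: your GCP project id

    publisher = pubsub_v1.PublisherClient()
    subscriber = pubsub_v1.SubscriberClient()

    # Jetson side: publish the user's text prompt to the cloud LLM.
    prompt_topic = publisher.topic_path(PROJECT, "lce-prompts")
    publisher.publish(prompt_topic, data=b"make the siren quieter").result()

    # Jetson side: wait for the embedding the LLM publishes back.
    embedding_sub = subscriber.subscription_path(PROJECT, "lce-embeddings-sub")

    def on_embedding(message):
        # message.data would hold the serialized embedding (e.g. a pickled array)
        print("received embedding of", len(message.data), "bytes")
        message.ack()

    streaming_future = subscriber.subscribe(embedding_sub, callback=on_embedding)
    try:
        streaming_future.result(timeout=60)   # wait up to a minute for one reply
    except Exception:
        streaming_future.cancel()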

Links

Presentation

Report

References

Thanks to the authors of Listen, Chat, And Edit for their amazing work.
