IDESSAI 2024 - Auto-regressive modeling of discrete audio tokens

This repository provides the code for my class at IDESSAI 2024 on auto-regressive modeling of discrete audio tokens. We use Audiocraft to fine-tune a pre-trained MusicGen model on a small dataset of tracks in a given style.
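For readers new to the idea, here is a toy, self-contained sketch of auto-regressive modeling over discrete tokens. It is not MusicGen or Audiocraft code. The integer sequences are made-up stand-ins for encoded audio, a bigram count table stands in for the neural network, and "fine-tuning" is approximated by continuing to update the counts on a few sequences in a new style. The helper names `fit_bigram` and `sample` are hypothetical.

```python
import random


def fit_bigram(sequences, vocab_size, counts=None):
    """Count next-token transitions. Passing existing `counts` keeps updating
    them, a crude analogue of fine-tuning a pre-trained model on new data."""
    if counts is None:
        # Start every transition at 1 (add-one smoothing) so no token is impossible.
        counts = [[1] * vocab_size for _ in range(vocab_size)]
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def sample(counts, prompt, num_steps, rng):
    """Auto-regressive generation: each new token is drawn from
    p(x_t | x_{t-1}) and appended before the next one is predicted."""
    tokens = list(prompt)
    vocab = range(len(counts))
    for _ in range(num_steps):
        weights = counts[tokens[-1]]
        tokens.append(rng.choices(vocab, weights=weights, k=1)[0])
    return tokens


# Hypothetical token streams standing in for encoded audio.
pretrain = [[0, 1, 2, 3, 0, 1, 2, 3], [1, 2, 3, 0, 1, 2]]
new_style = [[0, 2, 0, 2, 0, 2]]

model = fit_bigram(pretrain, vocab_size=4)
model = fit_bigram(new_style, vocab_size=4, counts=model)  # toy "fine-tuning"
print(sample(model, prompt=[0], num_steps=10, rng=random.Random(0)))
```

Each generated token is fed back as context for the next one. In MusicGen, a transformer conditioned on the whole prefix (and on the text prompt) takes the place of the count table.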

If you want to follow along on Colab, open the Audiocraft fine-tuning Colab.

Requirements

First clone this repository and cd into the root folder:

git clone https://github.com/adefossez/audio_mod_idessai.git
cd audio_mod_idessai

Make sure your environment has ffmpeg installed. The easiest way is with conda/mamba: conda install -c conda-forge ffmpeg.
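To confirm that ffmpeg is visible from Python, a small standard-library check like the one below can help. It is only a sketch; the helper name `ffmpeg_version` is hypothetical and not part of this repository.

```python
import shutil
import subprocess


def ffmpeg_version():
    """Return the first line of `ffmpeg -version`, or None if ffmpeg is not on PATH."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "-version"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()[0]


version = ffmpeg_version()
print(version or "ffmpeg not found on PATH")
```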

Then we install audiocraft with slightly different requirements that allow more recent versions of PyTorch (especially on Colab). Note that I ran into a bus error with python3.10, so you may want to use python3.12.

# If you need a specific version of CUDA, first install the matching PyTorch build along with torchaudio, for instance:
# xformers can be a bit tricky to get when PyTorch releases a new version, so we pin 2.4.0.
pip install torch==2.4.0 torchaudio==2.4.0 xformers
pip install -r requirements.txt

# If you want to run the notebook locally, and maybe get some Vim bindings ;)
pip install jupyter # jupyterlab-vim
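To double-check the pins above, the following hypothetical helpers read installed package versions with the standard `importlib.metadata` module. The comparison ignores a local build tag such as `+cu121`; the helper names are illustrative and not part of the repository.

```python
import sys
from importlib import metadata


def installed_versions(packages=("torch", "torchaudio", "xformers")):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def matches_pin(version, pin):
    """True if `version` equals `pin`, ignoring a local tag like '+cu121'."""
    return version is not None and version.split("+")[0] == pin


print("python", ".".join(map(str, sys.version_info[:3])))
versions = installed_versions()
for name, version in versions.items():
    print(name, version or "not installed")
for name in ("torch", "torchaudio"):
    # Both are pinned to 2.4.0 in the install command above.
    print(name, "matches 2.4.0:", matches_pin(versions[name], "2.4.0"))
```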

Now let's clone and install audiocraft:

git submodule init
git submodule update
pip install --no-deps -e audiocraft
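Because audiocraft is installed in editable mode from the submodule, it can be worth checking that `import audiocraft` resolves to that local checkout rather than to another copy. The sketch below uses only the standard library and assumes it is run from the repository root; `audiocraft_location` and `is_from_submodule` are hypothetical helper names.

```python
import importlib.util
from pathlib import Path


def audiocraft_location():
    """Directory audiocraft would be imported from, or None if not importable."""
    spec = importlib.util.find_spec("audiocraft")
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).resolve().parent


def is_from_submodule(repo_root="."):
    """True if the import resolves inside <repo_root>/audiocraft (the submodule)."""
    location = audiocraft_location()
    if location is None:
        return False
    submodule = (Path(repo_root) / "audiocraft").resolve()
    return location == submodule or submodule in location.parents


print("audiocraft imported from:", audiocraft_location())
print("from local submodule:", is_from_submodule())
```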

Setup

Edit audio_mod_idessai/config.py with the proper URL.

Download the dataset

python -m audio_mod_idessai.config
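The contents of audio_mod_idessai/config.py are not shown here. As a hypothetical sketch of what a "download the dataset" step generally involves, the helper below fetches an archive from a URL and unpacks it. The name `download_and_extract`, the `data` destination and the example URL are assumptions, not the repository's actual code.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def download_and_extract(url, dest):
    """Download an archive from `url` and unpack it into `dest`.

    Hypothetical helper: the archive format (.zip, .tar.gz, .tar, ...) is
    inferred from the URL path's extension by shutil.unpack_archive, which
    raises shutil.ReadError for unknown formats.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    suffix = "".join(Path(urlparse(url).path).suffixes)
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / ("archive" + suffix)
        urllib.request.urlretrieve(url, archive)
        shutil.unpack_archive(str(archive), str(dest))
    return dest


# Hypothetical usage: substitute the URL you set in config.py.
# download_and_extract("https://example.com/dataset.zip", "data")
```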

Launch notebook

jupyter notebook
