Repo for the IDESSAI 2024 course on modeling audio with discrete tokens.
This repository provides the code for my class at IDESSAI 2024 about auto-regressive modeling of discrete audio tokens. We use Audiocraft to fine tune a pre-trained MusicGen model on a small dataset of tracks from a given style.
If you want to follow on Colab, go to the Audiocraft fine tuning colab.
First clone this repository and cd the root folder:
git clone https://github.com/adefossez/audio_mod_idessai.git
cd audio_mod_idessai
Make sure to have an environment with ffmpeg
installed, the easiest is with
conda/mamba: conda install -c conda-forge ffmpeg
.
Then we install audiocraft
with slightly
different requirements to allow more recent versions of PyTorch (especially on Colab).
Note that I had some issues with python3.10 getting a bus error, so maybe try to use python3.12.
# If you need a specific version of cuda, first install it along with torchaudio, for instance
# xformers can be a bit tricky to get when pytorch releases a new version, so we pin 2.4.0.
pip install torch==2.4.0 torchaudio==2.4.0 xformers
pip install -r requirements.txt
# If you want to run locally the notebook, and maybe have some VIM binding ;)
pip install jupyter # jupyterlab-vim
Now let's install clone audiocraft
git submodule init
git submodule update
pip install --no-deps -e audiocraft
Edit audio_mod_idessai/config.py
with the proper URL.
python -m audio_mod_idessai.config
jupyter notebook