slakh-pytorch-dataset

Unofficial PyTorch dataset for Slakh

MIT License

Slakh PyTorch Dataset

This project is a work in progress; expect breaking changes!

Roadmap

Automatic music transcription (AMT) use case with audio and labels

Specify dataset split (original, splits_v2, redux)
Add new splits (redux_no_pitch_bend, ...) (Should also be filed upstream) (implemented by skip_pitch_bend_tracks)
Load audio mix.flac (all the instruments combined)
Load individual audio mixes (need to combine audio in a streaming fashion)
Specify train, validation or test group
Choose sequence length
Reproducible load sequences (useful for the validation group to get consistent results)
Add more instruments (electric-bass, piano, guitar, ...)
Choose between having audio in memory or stream from disk (solved by max_files_in_memory)
Add to pip
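
The "reproducible load sequences" item can be illustrated with a minimal sketch. It is not taken from this package: `pick_segment` and its parameters are hypothetical names. It only shows the general idea of drawing a fixed-length window from a track with a seeded random generator, so that a validation run sees the same window every time.

```python
import random


def pick_segment(num_samples, sequence_length, seed=None):
    """Return (begin, end) sample indices of a fixed-length window.

    Hypothetical helper, not part of slakh-dataset.
    With an integer seed the same window is returned on every call;
    with seed=None the window differs from call to call.
    """
    if num_samples < sequence_length:
        raise ValueError("track is shorter than the requested sequence length")
    rng = random.Random(seed)  # private generator, leaves global state untouched
    begin = rng.randint(0, num_samples - sequence_length)
    return begin, begin + sequence_length


# One minute of 16 kHz audio, window length as in the usage example
first = pick_segment(16_000 * 60, 327_680, seed=42)
second = pick_segment(16_000 * 60, 327_680, seed=42)
print(first == second)
```

Using a private `random.Random` instance rather than the module-level functions keeps the selection independent of any other random calls made during training.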

Audio source separation use case with different audio mixes

List to come

Usage

Download the Slakh dataset (see the official website). It is about 100 GB compressed, so expect the download to take some time.
Install the Python package with pip:

pip install slakh-dataset

Convert the audio to 16 kHz (see https://github.com/ethman/slakh-utils).
Use the dataset (AMT use case):

from torch.utils.data import DataLoader
from slakh_dataset import SlakhAmtDataset


dataset = SlakhAmtDataset(
    path='path/to/slakh-16khz-folder',
    split='redux',  # or 'splits_v2', 'redux-no-pitch-bend'
    audio='mix.flac',  # or 'individual'
    label_instruments='electric-bass',  # or use `label_midi_programs` instead
    # label_midi_programs=[33, 34, 35, 36, 37],
    groups=['train'],
    skip_pitch_bend_tracks=True,
    sequence_length=327680,
    max_files_in_memory=200,
)

batch_size = 8
loader = DataLoader(dataset, batch_size, shuffle=True, drop_last=True)

# train model on dataset...
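
To make the `sequence_length` value above easier to interpret, the following sketch converts it to seconds and to label frames. The 16 kHz sample rate comes from the conversion step; the hop length of 512 samples is an assumption based on the Onsets and Frames code this project builds on, and may not match what the package actually uses.

```python
SAMPLE_RATE = 16_000       # Hz, after converting the audio to 16 kHz
SEQUENCE_LENGTH = 327_680  # samples, as passed to SlakhAmtDataset above
HOP_LENGTH = 512           # assumption: hop size used by Onsets and Frames

seconds = SEQUENCE_LENGTH / SAMPLE_RATE
frames = SEQUENCE_LENGTH // HOP_LENGTH

print(f"{seconds:.2f} s per example, {frames} label frames")
# prints "20.48 s per example, 640 label frames"
```

Under that assumption, choosing a sequence length that is a multiple of the hop length keeps audio windows aligned with whole label frames.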

Acknowledgement

This code is based on the dataset in Onsets and Frames by Jong Wook Kim, which is MIT licensed.
Slakh: http://www.slakh.com/
