slakh-pytorch-dataset

Unofficial PyTorch dataset for Slakh

MIT License

Slakh PyTorch Dataset

This project is a work in progress; expect breaking changes!

Roadmap

Automatic music transcription (AMT) use case with audio and labels

  • Specify dataset split (original, splits_v2, redux)
  • Add new splits (redux_no_pitch_bend, ...) (Should also be filed upstream) (implemented by skip_pitch_bend_tracks)
  • Load audio mix.flac (all the instruments combined)
  • Load individual audio mixes (need to combine audio in a streaming fashion)
  • Specify train, validation or test group
  • Choose sequence length
  • Reproducible load sequences (useful for the validation group to get consistent results)
  • Add more instruments (electric-bass, piano, guitar, ...)
  • Choose between having audio in memory or stream from disk (solved by max_files_in_memory)
  • Add to pip
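
The "reproducible load sequences" item can be illustrated with a minimal sketch. This is not the package's implementation; the helper name `pick_sequence_start` and its parameters are hypothetical. It assumes a fixed-length window is cut from a longer track, and that passing a seed should make the window position repeatable:

```python
import random


def pick_sequence_start(num_samples, sequence_length, seed=None):
    """Return the start index of a window of `sequence_length` samples.

    Hypothetical helper for illustration only. With a fixed `seed` the
    same start index is returned on every call, which keeps validation
    results comparable between runs. With `seed=None` the position
    varies from call to call, as is usual for training.
    """
    if num_samples < sequence_length:
        raise ValueError("track is shorter than the requested sequence")
    rng = random.Random(seed)
    # randint is inclusive on both ends, so the last valid start is allowed.
    return rng.randint(0, num_samples - sequence_length)


# One minute of 16 kHz audio, window length taken from the usage example.
start = pick_sequence_start(num_samples=960_000, sequence_length=327_680, seed=42)
end = start + 327_680
```

Deriving the seed from something stable, such as the track index, would give each validation track its own fixed window.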

Audio source separation use case with different audio mixes

  • List to come

Usage

  1. Download the Slakh dataset (see the official website). It is about 100 GB compressed, so expect the download to take a while.

  2. Install the Python package with pip:

pip install slakh-dataset
  3. Convert the audio to 16 kHz (see https://github.com/ethman/slakh-utils)

  4. Use the dataset (AMT use case):

from torch.utils.data import DataLoader
from slakh_dataset import SlakhAmtDataset


dataset = SlakhAmtDataset(
    path='path/to/slakh-16khz-folder',
    split='redux', # 'splits_v2','redux-no-pitch-bend'
    audio='mix.flac', # 'individual'
    label_instruments='electric-bass', # or `label_midi_programs`
    # label_midi_programs=[33, 34, 35, 36, 37],
    groups=['train'],
    skip_pitch_bend_tracks=True,
    sequence_length=327680,
    max_files_in_memory=200,
)

batch_size = 8
loader = DataLoader(dataset, batch_size, shuffle=True, drop_last=True)

# train model on dataset...
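
To get a feel for the numbers in the example above, the short calculation below converts `sequence_length` from samples to seconds. It assumes `sequence_length` is counted in audio samples at the 16 kHz rate from the conversion step; the constant names are chosen here for illustration:

```python
SAMPLE_RATE = 16_000        # Hz, the rate the audio was converted to
SEQUENCE_LENGTH = 327_680   # samples, as passed to SlakhAmtDataset above
BATCH_SIZE = 8              # as passed to DataLoader above

# Duration of one sequence and of one full batch, in seconds.
seconds_per_sequence = SEQUENCE_LENGTH / SAMPLE_RATE
seconds_per_batch = seconds_per_sequence * BATCH_SIZE

print(f"{seconds_per_sequence:.2f} s per sequence")  # 20.48 s per sequence
print(f"{seconds_per_batch:.2f} s per batch")        # 163.84 s per batch
```

Under that assumption, each item covers about 20 seconds of audio, and a batch of 8 covers a little under 3 minutes.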

Acknowledgement