DeeJayPEG

deejaypeg is an audio I/O toolbox for efficiently transforming, storing and filtering audio spectrograms. It is especially suited for machine learning tasks that rely on non-negative data, such as source separation. deejaypeg uses an optimized pipeline to transform and scale audio signals and then applies lossy compression to store them efficiently as images. This makes it an ideal fit for processing music data with machine learning libraries such as PyTorch and TensorFlow, which have fast, built-in support for loading and processing images. Last but not least, deejaypeg provides convenient functions to apply multichannel Wiener filtering to the separated sources.

Features

Multichannel Time-Frequency Transform (STFT)
Log-magnitude compression
Bandwidth reduction
Quantization
Image export

Applications

Source separation
Data preprocessing for audio tasks
Creating beautiful spectrograms

Installation

pip install deejaypeg

Usage

Transform

deejaypeg includes a multichannel short-time Fourier transform (STFT) that wraps the SciPy implementation. For convenience, the transform parameters are stored in a deejaypeg.TF object, so the inverse transform can easily be called later.

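import deejaypeg as djpeg
# `audio` is a placeholder for a multichannel signal loaded as a NumPy array
tf = djpeg.TF(n_fft=2048, n_hop=1024)
X = tf.transform(audio)            # forward STFT
inverse = tf.inverse_transform(X)  # back to the time domain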
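
A wrapper of this kind might look roughly like the minimal sketch below, built on scipy.signal.stft and scipy.signal.istft. The class name SimpleTF, the (samples, channels) input layout and the (frames, bins, channels) output layout are assumptions for illustration, not deejaypeg's actual implementation.

```python
import numpy as np
from scipy.signal import istft, stft


class SimpleTF:
    """Hypothetical stand-in for deejaypeg.TF.

    Stores the STFT parameters so that the inverse transform can reuse them.
    """

    def __init__(self, n_fft=2048, n_hop=1024, sample_rate=44100):
        self.n_fft = n_fft
        self.n_hop = n_hop
        self.sample_rate = sample_rate

    def _params(self):
        # Shared keyword arguments for scipy's stft/istft
        return dict(fs=self.sample_rate, nperseg=self.n_fft,
                    noverlap=self.n_fft - self.n_hop)

    def transform(self, audio):
        """audio: (n_samples, n_channels) -> X: (n_frames, n_bins, n_channels)."""
        # scipy transforms along the last axis and returns (channels, bins, frames)
        _, _, X = stft(audio.T, **self._params())
        return X.transpose(2, 1, 0)

    def inverse_transform(self, X, length=None):
        """X: (n_frames, n_bins, n_channels) -> audio: (n_samples, n_channels)."""
        _, audio = istft(X.transpose(2, 1, 0), **self._params())
        audio = audio.T
        # scipy pads the signal, so trim back to the original length if known
        return audio if length is None else audio[:length]
```

With the default Hann window and 50% overlap (n_hop = n_fft / 2), the constant-overlap-add condition holds, so the inverse recovers the input up to floating-point error once the padding added by SciPy is trimmed.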

Filtering
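
The full multichannel Wiener filter also models the spatial covariance between channels. As a simpler illustration of the underlying idea, the sketch below applies a per-channel, Wiener-style soft mask: each source receives the fraction of the mixture STFT given by its share of the estimated power, so every source reuses the mixture phase. The function name, the array layout and the power exponent are assumptions, not deejaypeg's API.

```python
import numpy as np


def softmask_filter(mix_stft, source_mags, power=2.0, eps=1e-10):
    """Simplified Wiener-style filtering (hypothetical helper).

    mix_stft:    complex mixture STFT, shape (n_frames, n_bins, n_channels)
    source_mags: non-negative magnitude estimates,
                 shape (n_frames, n_bins, n_channels, n_sources)
    Returns complex source STFTs with the same shape as source_mags.
    """
    # Power spectrogram estimate of each source
    v = source_mags ** power
    # Per-bin ratio mask: each source gets its share of the total power
    mask = v / (v.sum(axis=-1, keepdims=True) + eps)
    # The real, non-negative mask scales the mixture, so the phase is shared
    return mask * mix_stft[..., None]
```

Because the masks sum to one in every bin, the filtered sources add back up to the mixture.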

Compression
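
A common way to compress the dynamic range of a non-negative magnitude is log(1 + s * |X|), which keeps the data non-negative and is exactly invertible with expm1. The sketch below assumes this form; the scale parameter s and the function names are hypothetical, and deejaypeg's exact mapping may differ.

```python
import numpy as np


def log_compress(mag, scale=1.0):
    """Map a non-negative magnitude to log(1 + scale * mag)."""
    return np.log1p(scale * mag)


def log_decompress(logmag, scale=1.0):
    """Exact inverse of log_compress."""
    return np.expm1(logmag) / scale
```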

Bandwidth Reduction
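
Bandwidth reduction can be as simple as discarding the STFT bins above a cutoff frequency and zero-filling them again before inversion. In this sketch, bin k of an n_fft-point STFT is centred at k * sample_rate / n_fft; the 16 kHz cutoff used below and all function names are assumptions for illustration.

```python
import numpy as np


def bandwidth_bins(n_fft, sample_rate, cutoff_hz):
    """Number of STFT bins whose centre frequency is at or below cutoff_hz."""
    # bin k is centred at k * sample_rate / n_fft
    return int(np.floor(cutoff_hz * n_fft / sample_rate)) + 1


def reduce_bandwidth(X, n_bins):
    # X: (n_frames, n_total_bins, n_channels); keep the low-frequency bins only
    return X[:, :n_bins]


def restore_bandwidth(X_reduced, n_total_bins):
    # Zero-fill the discarded high-frequency bins before inverting the STFT
    n_frames, n_bins, n_channels = X_reduced.shape
    out = np.zeros((n_frames, n_total_bins, n_channels), dtype=X_reduced.dtype)
    out[:, :n_bins] = X_reduced
    return out
```

For example, bandwidth_bins(2048, 44100, 16000) keeps 744 of the 1025 bins, roughly 27% fewer values to store.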

Quantization
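
An 8-bit quantizer maps each value linearly onto the integers 0 to 255 and keeps the original range so the values can be mapped back. The sketch below uses a single min-max range per array, which is one plausible choice among several (per-frequency or fixed ranges are others); the function names are hypothetical.

```python
import numpy as np


def quantize_8bit(x):
    """Linearly map x to uint8; return the codes and (lo, hi) for dequantization."""
    lo, hi = float(x.min()), float(x.max())
    span = hi - lo if hi > lo else 1.0  # avoid division by zero for constant input
    q = np.round((x - lo) / span * 255.0).astype(np.uint8)
    return q, (lo, hi)


def dequantize_8bit(q, bounds):
    """Map uint8 codes back to floats in the stored range."""
    lo, hi = bounds
    span = hi - lo if hi > lo else 1.0
    return q.astype(np.float64) / 255.0 * span + lo
```

Rounding to the nearest code bounds the reconstruction error by half a quantization step, (hi - lo) / 510.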

Image

Many researchers save their magnitude datasets as NumPy files or HDF5 files. While these are fast to read and write, they use a significant amount of disk space (even when zipped). Also, since JPEG decoders are highly optimized these days, reading JPEGs is significantly faster than decoding AAC or MP3 files. Here is a bitrate comparison:

npy 64bit: ~750 kb/s
npy 64bit (zipped): ~680 kb/s
MP3 good quality: 256 kb/s
AAC good quality: 128 kb/s
norbert quantization as 8bit npy: 89 kb/s
norbert quantization as 8bit jpg (q=80): 15 kb/s
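
Storing a quantized spectrogram as an image can be sketched with Pillow, assuming each channel is saved as a separate 8-bit grayscale JPEG with the frequency axis running vertically. The helper names and the vertical flip (so that low frequencies appear at the bottom) are illustrative choices, not deejaypeg's file layout; quality=80 mirrors the q=80 setting in the comparison above.

```python
import io

import numpy as np
from PIL import Image


def encode_jpeg(q, quality=80):
    """Encode a 2-D uint8 array (frequency x time) as JPEG bytes."""
    # Flip vertically so that low frequencies end up at the bottom of the image
    img = Image.fromarray(np.ascontiguousarray(np.flipud(q)))
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    return buf.getvalue()


def decode_jpeg(data):
    """Decode JPEG bytes back into a 2-D uint8 array (frequency x time)."""
    img = Image.open(io.BytesIO(data)).convert("L")
    return np.flipud(np.asarray(img))
```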

We built deejaypeg in the context of source separation models, where filtering is applied using the original mixture phase, which reduces the influence of minor imperfections in the magnitude. We used PEAQ (Perceptual Evaluation of Audio Quality), an objective audio quality measure, to assess the quality difference in a setting where we compress the magnitude of an audio signal and resynthesize it using the decoded (compressed) magnitude together with the original, uncompressed mixture phase. The results on 50 music tracks from the MUSDB18 dataset show that, with the right JPEG quality parameter (we pick 80 as our default), the differences between the compressed and the original magnitude are almost imperceptible.
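
The evaluation setup can be illustrated with a small round trip on a mono signal: compute the STFT, apply a lossy step to the magnitude only, and resynthesize with the decoded magnitude and the original phase. The sketch below substitutes a plain signal-to-noise ratio for PEAQ and uniform magnitude quantization for JPEG coding, and uses assumed STFT parameters (n_fft=2048, n_hop=1024); it illustrates the procedure and does not reproduce the MUSDB18 results.

```python
import numpy as np
from scipy.signal import istft, stft

N_FFT, N_HOP = 2048, 1024  # assumed STFT parameters


def roundtrip_snr(audio, sample_rate=44100, levels=256):
    """Quantize the STFT magnitude of a mono signal, resynthesize it with the
    original phase, and return the SNR in dB (a simple stand-in for PEAQ)."""
    params = dict(fs=sample_rate, nperseg=N_FFT, noverlap=N_FFT - N_HOP)
    _, _, X = stft(audio, **params)
    mag, phase = np.abs(X), np.angle(X)
    # Lossy step: uniform quantization of the magnitude to `levels` values
    step = max(mag.max(), 1e-12) / (levels - 1)
    mag_hat = np.round(mag / step) * step
    # Resynthesis with the decoded magnitude and the uncompressed phase
    _, y = istft(mag_hat * np.exp(1j * phase), **params)
    y = y[: len(audio)]  # drop the padding added by scipy
    noise = audio - y
    return 10 * np.log10(np.sum(audio ** 2) / np.sum(noise ** 2))
```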

Using the deejaypeg.Coder module, we built MUSMAG, a dataset of precomputed multitrack audio spectrograms for source separation tasks.

Extracted from project README