🔈 → 🖼
deejaypeg is an audio I/O toolbox to efficiently transform, store and filter audio spectrograms. It is especially suited for machine learning tasks that rely on non-negative data, such as source separation. To this end, deejaypeg uses an optimized pipeline to transform and scale audio signals and then applies lossy compression to store them efficiently as images. This makes it an ideal fit for processing music data with machine learning libraries such as PyTorch and TensorFlow, which have fast, built-in support for loading and processing images. Last but not least, deejaypeg provides convenience functions to apply multichannel Wiener filtering to the separated sources.
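Wiener-style filtering builds each source estimate by weighting the complex mixture STFT with that source's share of the estimated power in every time-frequency bin, so every estimate keeps the mixture phase. The NumPy sketch below shows only the simplest, single-step soft-mask form of this idea. It is not deejaypeg's multichannel Wiener filter (it ignores inter-channel correlations and treats each channel independently), and `ratio_mask_filter` is a hypothetical name.

```python
import numpy as np


def ratio_mask_filter(source_mags, mixture_stft, power=2.0, eps=1e-10):
    """Hypothetical single-step Wiener-like soft masking.

    source_mags:  non-negative magnitude estimates, shape (n_sources, *mixture_stft.shape)
    mixture_stft: complex STFT of the mixture
    Returns complex source estimates that reuse the mixture phase.
    """
    source_mags = np.asarray(source_mags, dtype=np.float64)
    # Power spectrogram estimate of each source (power=2 gives the Wiener-style ratio)
    powers = source_mags ** power
    # Each source's share of the total estimated power per TF bin (masks sum to ~1)
    masks = powers / (powers.sum(axis=0, keepdims=True) + eps)
    # Real, non-negative masks rescale the mixture magnitude and keep its phase
    return masks * mixture_stft[np.newaxis]


# Usage with random data: 3 sources, 1025 bins, 100 frames, 2 channels
rng = np.random.default_rng(0)
mix = rng.standard_normal((1025, 100, 2)) + 1j * rng.standard_normal((1025, 100, 2))
estimates = np.abs(rng.standard_normal((3, 1025, 100, 2)))
sources = ratio_mask_filter(estimates, mix)
```

Because the masks sum to (almost) one in every bin, the filtered sources add back up to the mixture, which is one reason this kind of filtering is forgiving of small errors in the magnitude estimates.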
```shell
pip install deejaypeg
```
deejaypeg includes a multichannel short-time Fourier transform (STFT) that wraps SciPy's built-in STFT implementation. For convenience, the transform parameters are stored in a deejaypeg.TF object, so the inverse transform can easily be called later with the same settings.
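```python
import deejaypeg as djpeg

# `audio` is assumed to be a multichannel audio signal loaded beforehand
tf = djpeg.TF(n_fft=2048, n_hop=1024)
X = tf.transform(audio)            # forward multichannel STFT
inverse = tf.inverse_transform(X)  # back to the time domain, reusing the stored parameters
```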
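To illustrate what such a wrapper involves, here is a minimal sketch built directly on `scipy.signal.stft` and `scipy.signal.istft`. `SimpleTF` is a hypothetical stand-in, not deejaypeg's `TF` class; the Hann window, the `(n_samples, n_channels)` input layout and the `(n_bins, n_frames, n_channels)` output layout are assumptions made for this example.

```python
import numpy as np
from scipy import signal


class SimpleTF:
    """Hypothetical stand-in for deejaypeg.TF: keeps the STFT parameters
    so the inverse transform uses exactly the same settings."""

    def __init__(self, n_fft=2048, n_hop=1024):
        self.n_fft = n_fft
        self.n_hop = n_hop

    def transform(self, audio):
        # audio: (n_samples, n_channels); SciPy transforms along the last axis
        _, _, X = signal.stft(
            audio.T, window="hann",
            nperseg=self.n_fft, noverlap=self.n_fft - self.n_hop,
        )
        # (n_channels, n_bins, n_frames) -> (n_bins, n_frames, n_channels)
        return np.transpose(X, (1, 2, 0))

    def inverse_transform(self, X, length=None):
        # (n_bins, n_frames, n_channels) -> (n_channels, n_bins, n_frames)
        _, audio = signal.istft(
            np.transpose(X, (2, 0, 1)), window="hann",
            nperseg=self.n_fft, noverlap=self.n_fft - self.n_hop,
        )
        audio = audio.T  # back to (n_samples, n_channels)
        # istft may return a few extra padded samples; optionally trim them
        return audio if length is None else audio[:length]


# Usage: one second of random stereo noise at 44.1 kHz
rng = np.random.default_rng(0)
audio = rng.standard_normal((44100, 2))
tf = SimpleTF(n_fft=2048, n_hop=1024)
X = tf.transform(audio)  # shape (1025, n_frames, 2)
reconstructed = tf.inverse_transform(X, length=audio.shape[0])
```

With a Hann window and 50% overlap (`n_hop = n_fft / 2`), the overlap-add condition holds, so the inverse recovers the input up to floating-point precision.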
Many researchers save their magnitude datasets as NumPy pickles or HDF5 files. While these formats are fast to read and write, they use a significant amount of disk space (even when zipped). Also, since JPEG decoders are highly optimized these days, reading JPEG files is significantly faster than decoding AAC or MP3 files. Here is a bitrate comparison:
deejaypeg (JPEG, q=80): 15 kb/s

We built deejaypeg in the context of source separation models where filtering is applied using the original mixture phase, which reduces the influence of minor imperfections in the magnitude. We used PEAQ, an objective audio quality measure, to assess the quality loss in a setting where we compress the magnitude of an audio signal and resynthesize it from the decoded (lossy) magnitude together with the original, uncompressed mixture phase. The results on 50 music tracks from the MUSDB18 dataset show that, with the right JPEG quality parameter (we pick 80 as our default), the differences between the compressed and the original magnitude are almost imperceptible.
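The following sketch shows the general shape of such a round trip with Pillow: log-scale a single-channel magnitude spectrogram, quantize it to 8-bit greyscale, store it as JPEG at quality 80, and invert the scaling after decoding. The `log1p` scaling, the min-max normalization and the names `encode_magnitude` and `decode_magnitude` are assumptions for illustration; deejaypeg's actual scaling pipeline may differ. A multichannel spectrogram would need one image (or one colour plane) per audio channel.

```python
import io

import numpy as np
from PIL import Image


def encode_magnitude(mag, quality=80):
    """Hypothetical sketch: compress a 2-D non-negative magnitude to JPEG bytes.
    Returns the bytes and the (offset, span) needed to undo the scaling."""
    # Log scaling compresses the large dynamic range of spectrogram magnitudes
    log_mag = np.log1p(mag)
    lo = float(log_mag.min())
    span = max(float(log_mag.max()) - lo, 1e-12)
    # Min-max normalize to [0, 1], then quantize to 8-bit greyscale
    scaled = (log_mag - lo) / span
    img = Image.fromarray(np.round(scaled * 255).astype(np.uint8))
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)  # lossy compression step
    return buf.getvalue(), (lo, span)


def decode_magnitude(data, scale):
    lo, span = scale
    img = Image.open(io.BytesIO(data))
    scaled = np.asarray(img, dtype=np.float64) / 255.0
    # Undo the min-max normalization and the log1p scaling
    return np.expm1(scaled * span + lo)


# Usage on a smooth toy "spectrogram" of 1025 bins x 431 frames
mag = np.outer(np.linspace(0.0, 1.0, 1025), np.linspace(1.0, 2.0, 431)) * 10
data, scale = encode_magnitude(mag, quality=80)
decoded = decode_magnitude(data, scale)
```

In a separation setting, the decoded magnitude would then be combined with the original mixture phase, for example `decoded * np.exp(1j * np.angle(X_channel))`, before the inverse STFT.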
Using the deejaypeg.Coder module, we built the MUSMAG dataset, a dataset of precomputed multitrack audio spectrograms for source separation tasks.