Separake: Echo-aware source separation

This repository contains all the code to reproduce the results of the paper Separake: Source separation with a little help from echoes.

We are available for any question or request relating to either the code or the theory behind it. Just ask!

Abstract

It is commonly believed that multipath hurts various audio processing algorithms. At odds with this belief, we show that multipath in fact helps sound source separation, even with very simple propagation models. Unlike most existing methods, we neither ignore the room impulse responses, nor do we attempt to estimate them fully. Rather, we assume that we know the positions of a few virtual microphones generated by echoes, and we show how this gives us enough spatial diversity to get a performance boost over the anechoic case. We show improvements for two standard algorithms: one that uses only the magnitudes of the transfer functions, and one that also uses the phases. Concretely, we show that multichannel non-negative matrix factorization aided with a small number of echoes beats the vanilla variant of the same algorithm, and that with magnitude information only, echoes enable separation where it was previously impossible.

Authors

Robin Scheibler (TMU)
Diego Di Carlo (INRIA)
Antoine Deleforge (INRIA)
Ivan Dokmanić (UIUC)

Contact

Robin Scheibler, Ono Laboratory, Graduate School of System Design, Tokyo Metropolitan University, 6-6 Asahigaoka, Hino city, Tokyo 191-0065, Japan

Summary of Files

separake_mu_early.py uses the Ozerov and Févotte MU algorithm. This is the original attempt by Robin.
separake_near_wall.py implements the image microphone model and places the microphones close to a wall. No separation yet.
utilities.py contains auxiliary methods.
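
The image microphone model mentioned for separake_near_wall.py can be illustrated with a little geometry: a first-order echo off a flat wall behaves like the direct sound arriving at a virtual microphone located at the mirror image of the real one. The sketch below is only an illustration under that assumption. The function name `image_microphone` and the example coordinates are hypothetical and are not taken from the repository.

```python
import numpy as np

def image_microphone(mic, wall_point, wall_normal):
    """Mirror a microphone position across a planar wall.

    The wall is described by any point lying on it and by its normal vector.
    (Hypothetical helper for illustration, not part of the repository.)
    """
    mic = np.asarray(mic, dtype=float)
    n = np.asarray(wall_normal, dtype=float)
    n = n / np.linalg.norm(n)  # make sure the normal has unit length
    # Signed distance from the microphone to the wall plane
    d = np.dot(mic - np.asarray(wall_point, dtype=float), n)
    # Move twice that distance along the normal to land on the other side
    return mic - 2.0 * d * n

# Example: a microphone 10 cm away from the wall x = 0
real_mic = np.array([0.1, 2.0, 1.5])
virtual_mic = image_microphone(
    real_mic, wall_point=[0.0, 0.0, 0.0], wall_normal=[1.0, 0.0, 0.0]
)
```

Mirroring the virtual microphone a second time across the same wall returns the real position, which is a quick sanity check for this kind of helper.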

Recreate the figures and sound samples

To recreate the figures from the original simulated data (stored in data/paper_results/), run

./make_figures.sh

To rerun all the simulations, run

[TBA]

Recorded Data

[TBA]

The recorded samples are stored in the recordings folder. A detailed description and instructions are provided along with the data.

Overview of results

TBA

Acknowledgement

The authors of \cite{ozerov2010multichannel} generously provide a MATLAB implementation of the MU-NMF and EM-NMF methods for stereo separation. We ported this code to Python 3 and extended it to an arbitrary number of input channels. We think this implementation could be useful to the community and have released the code.

Implementation Details

First, the original code was restricted to the 2-channel case, i.e. $M = 2$. To fit our scenario and for the sake of generality, we extended it to the multi-channel case, that is, any $M > 1$.

Second, MU-NMF was modified to handle a sparsity constraint, as described in Section \ref{sec:mu} of the paper.
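
To give an idea of how a sparsity penalty enters a multiplicative update, here is a minimal sketch for the simplest possible setting: single-channel NMF with a Euclidean cost, a fixed dictionary `W`, and an L1 penalty on the activations `H`. This is a textbook simplification chosen for brevity and is not the multichannel update rule used in this repository. The function name `sparse_mu_activations` and the parameter `sparsity` are assumptions made for the example.

```python
import numpy as np

def sparse_mu_activations(V, W, sparsity=0.1, n_iter=300, eps=1e-12, seed=0):
    """Estimate non-negative activations H such that V is close to W @ H.

    Minimizes 0.5 * ||V - W H||_F^2 + sparsity * sum(H) over H >= 0 with
    multiplicative updates, keeping the dictionary W fixed.
    (Illustrative sketch only, not the repository's update rule.)
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps  # strictly positive init
    WtV = W.T @ V  # constant across iterations since W is fixed
    WtW = W.T @ W
    for _ in range(n_iter):
        # The L1 penalty shows up as a constant added to the denominator,
        # which shrinks every activation a little at each iteration.
        H *= WtV / (WtW @ H + sparsity + eps)
    return H

# Toy example with a random non-negative dictionary
rng = np.random.default_rng(1)
W = rng.random((20, 4))
V = W @ rng.random((4, 30))
H = sparse_mu_activations(V, W, sparsity=0.1)
```

Because every factor in the update is non-negative, `H` stays non-negative without any explicit projection, which is the main appeal of multiplicative updates.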

Third, since the EM method degenerates when the dictionary matrix $\mD$ contains zero-valued entries, all such entries are initially set to a small constant value of \texttt{1e-6}.
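
The flooring step described above could look like the following sketch. The helper name `floor_zero_entries` is hypothetical; only the constant `1e-6` comes from the text.

```python
import numpy as np

def floor_zero_entries(D, floor=1e-6):
    """Return a copy of D in which exact zeros are replaced by a small constant.

    (Hypothetical helper for illustration, not part of the repository.)
    """
    D = np.array(D, dtype=float)  # np.array copies, so the input is untouched
    D[D == 0.0] = floor
    return D

D_init = np.array([[0.0, 0.5],
                   [1.0, 0.0]])
D_safe = floor_zero_entries(D_init)
```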

Fourth, the code was further modified to deal with fixed dictionary and channel model matrices, which are normalized in order to avoid indeterminacy issues \cite{ozerov2010multichannel}.
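
The indeterminacy in question is the usual scale ambiguity of matrix factorizations: rescaling a column of the dictionary and applying the inverse scaling to the matching row of the activations leaves the product unchanged. The sketch below removes that ambiguity by forcing each dictionary column to sum to one. The choice of norm and the name `normalize_dictionary` are assumptions made for the example; the text does not say which normalization the repository applies.

```python
import numpy as np

def normalize_dictionary(D, H, eps=1e-12):
    """Scale each column of D to unit sum and compensate in the rows of H.

    The product D @ H is left unchanged (up to the tiny eps guard).
    (Hypothetical helper; the column-sum norm is an assumption.)
    """
    scale = D.sum(axis=0, keepdims=True) + eps  # shape (1, K)
    return D / scale, H * scale.T

rng = np.random.default_rng(0)
D = rng.random((16, 3))
H = rng.random((3, 10))
D_norm, H_norm = normalize_dictionary(D, H)
```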

Lastly, no \textit{simulated annealing} strategy is used in the final experiments: in some preliminary and informal investigations, we noticed that running without annealing yields better results than running with it. In the experiments, the number of iterations was set to $300$.

Dependencies

A working distribution of Python 3.5 (but 2.7 should work too).
Numpy, Scipy
We use the Anaconda distribution to simplify the setup of the environment.
Computations are very heavy and we use the MKL extension of Anaconda to speed things up. There is a free license for academics.
We used ipyparallel and joblib for parallel computations.
matplotlib and seaborn for plotting the results.
mir_eval is used for the BSS evaluation routines it contains.

The pyroomacoustics package is used for the STFT, fractional delay filters, microphone array generation, and more.

pip install pyroomacoustics

List of standard packages needed

numpy, scipy, pandas, ipyparallel, seaborn, zmq, joblib, samplerate, mir_eval

Systems Tested

TBA

License

All the code in this repository is under the MIT License.
