A collection of scripts to create a dataset of noisy, multi-channel, reverberant mixtures based on the WSJ1 and CHiME3 datasets.
MIT License
Using Anaconda should make it easy to install all the dependencies and reproduce the dataset. After installing Anaconda, run the following commands:
git clone git@github.com:fakufaku/create_wsj1_2345_db.git
cd create_wsj1_2345_db
conda env create -f environment.yml
conda activate wsj1_2345_db
The scripts assume that the following datasets are available and stored in a single directory, referred to below as <original_datasets_dir>:
csr_1
csr_2_comp
CHIME3
For WSJ0 (csr_1) and WSJ1 (csr_2_comp), we assume that their respective folders contain subfolders named after each of the DVDs. The full directory structure of the original datasets is shown below.
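Before launching the scripts, it can help to confirm that the expected folders are in place. The following is a minimal sketch and not part of the repository: the helper name `missing_datasets` is hypothetical, and it only checks for the three top-level folder names listed above.

```python
from pathlib import Path

# Top-level folders listed above as required
EXPECTED_DATASETS = ("csr_1", "csr_2_comp", "CHIME3")


def missing_datasets(original_datasets_dir):
    """Return the expected dataset folders that are absent (hypothetical helper)."""
    root = Path(original_datasets_dir)
    return [name for name in EXPECTED_DATASETS if not (root / name).is_dir()]
```

Calling `missing_datasets("<original_datasets_dir>")` returns an empty list when all three folders are present, and the names of the missing ones otherwise.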
python ./make_dataset.py config.json <original_datasets_dir> <output_dir>
# convert WSJ1 nist format to regular wav
python ./make_raw_wav.py config.json <original_datasets_dir> <output_dir>
# get the text transcription from the audio
python ./get_trans.py config.json <original_datasets_dir> <output_dir>
# create the mix metadata
python ./create_mixinfo.py config.json <original_datasets_dir> <output_dir>
# simulate propagation and mix the audio, then check
python ./mix.py config.json <original_datasets_dir> <output_dir>
python ./check_mix.py config.json <original_datasets_dir> <output_dir>
# add noise to all the mixtures, then check
python ./noise_add.py config.json <original_datasets_dir> <output_dir>
python ./check_noisy_mix.py config.json <original_datasets_dir> <output_dir>
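The individual commands above can also be chained from Python. This is a minimal sketch under stated assumptions, not a script shipped with the repository: `build_commands` and `run_all` are hypothetical helpers, and the sketch assumes that each script, including the two check scripts, exits with a non-zero status when it fails.

```python
import subprocess
import sys

# Script names, in the order given in the step-by-step commands above
STEPS = [
    "make_raw_wav.py",
    "get_trans.py",
    "create_mixinfo.py",
    "mix.py",
    "check_mix.py",
    "noise_add.py",
    "check_noisy_mix.py",
]


def build_commands(config, original_datasets_dir, output_dir):
    """Build one command line per step (hypothetical helper)."""
    return [
        [sys.executable, f"./{step}", config, original_datasets_dir, output_dir]
        for step in STEPS
    ]


def run_all(config, original_datasets_dir, output_dir):
    """Run the steps in order, stopping at the first failure."""
    for cmd in build_commands(config, original_datasets_dir, output_dir):
        print("Running:", " ".join(cmd))
        # check=True raises CalledProcessError on a non-zero exit status
        subprocess.run(cmd, check=True)
```

Stopping at the first failure avoids mixing or adding noise on top of incomplete intermediate files.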
The dataset generation is controlled by a JSON file like the following:
{
  "db_name": "wsj1_2345_db",
  "combinations":
  [
    { "mics": 2, "sources": 2, "seed": 639872833 },
    { "mics": 3, "sources": 3, "seed": 312393873 },
    { "mics": 4, "sources": 4, "seed": 739853286 }
  ],
  "mixinfo_parameters": {
    "room": { "l": [5, 10], "w": [5, 10], "h": [3, 4], "t60": [0.2, 0.6] },
    "array": {
      "xy_jittering": 0.2,
      "z": [1, 2],
      "radius": [0.075, 0.125],
      "min_dist_mics": 0.05
    },
    "speaker": {
      "xy_square": 3.0,
      "z": [1.5, 2.0],
      "min_dist_array": 0.5,
      "min_dist_speaker": 1.0,
      "snr": [-5, 5]
    },
    "noise": {
      "snr_range": [10, 30]
    },
    "wav_upper_limit": 0.9,
    "remove_mean_sources": true
  },
  "tests": {
    "snr_tol": 0.5
  }
}
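To illustrate how such a configuration might be consumed, here is a minimal sketch that parses an abridged copy of the file and draws one set of room parameters per combination. It is not the repository's code: `sample_room` is a hypothetical helper, and the sketch assumes that each two-element list is a [low, high] range sampled uniformly and that the per-combination seed initializes the random generator.

```python
import json
import random

# Abridged copy of the configuration shown above
CONFIG_TEXT = """
{
  "combinations": [
    { "mics": 2, "sources": 2, "seed": 639872833 },
    { "mics": 3, "sources": 3, "seed": 312393873 },
    { "mics": 4, "sources": 4, "seed": 739853286 }
  ],
  "mixinfo_parameters": {
    "room": { "l": [5, 10], "w": [5, 10], "h": [3, 4], "t60": [0.2, 0.6] }
  }
}
"""


def sample_room(room_ranges, rng):
    """Draw one value per room parameter.

    Assumption: each [low, high] pair is a range sampled uniformly.
    """
    return {key: rng.uniform(low, high) for key, (low, high) in room_ranges.items()}


config = json.loads(CONFIG_TEXT)
room_ranges = config["mixinfo_parameters"]["room"]

for combo in config["combinations"]:
    # One generator per combination, so that each subset is repeatable
    rng = random.Random(combo["seed"])
    room = sample_room(room_ranges, rng)
    print(combo["mics"], "mics,", combo["sources"], "sources:", room)
```

Seeding a separate generator for each combination means that regenerating one subset does not depend on the others having been generated first.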
Some of the differences from the wsj0-2mix/3mix datasets.
<original_datasets_dir>
+-- csr_1
| +-- 11-1.1
| +-- 11-10.1
| +-- 11-11.1
| +-- 11-12.1
| +-- 11-13.1
| +-- 11-14.1
| +-- 11-15.1
| +-- 11-2.1
| +-- 11-3.1
| +-- 11-4.1
| +-- 11-5.1
| +-- 11-6.1
| +-- 11-7.1
| +-- 11-8.1
| +-- 11-9.1
| +-- file.tbl
| +-- readme.txt
+-- csr_2_comp
| +-- 13-1.1
| +-- 13-10.1
| +-- 13-11.1
| +-- 13-12.1
| +-- 13-13.1
| +-- 13-14.1
| +-- 13-15.1
| +-- 13-16.1
| +-- 13-17.1
| +-- 13-18.1
| +-- 13-19.1
| +-- 13-2.1
| +-- 13-20.1
| +-- 13-21.1
| +-- 13-22.1
| +-- 13-23.1
| +-- 13-24.1
| +-- 13-25.1
| +-- 13-26.1
| +-- 13-27.1
| +-- 13-28.1
| +-- 13-29.1
| +-- 13-3.1
| +-- 13-30.1
| +-- 13-31.1
| +-- 13-32.1
| +-- 13-33.1
| +-- 13-34.1
| +-- 13-4.1
| +-- 13-5.1
| +-- 13-6.1
| +-- 13-7.1
| +-- 13-8.1
| +-- 13-9.1
+-- CHIME3
| +-- data
|   +-- audio
|     +-- 16kHz
|       +-- backgrounds
|         +-- BGD_150203_010_CAF.CH1.wav
|         +-- BGD_150203_010_CAF.CH2.wav
|         +-- BGD_150203_010_CAF.CH3.wav
|         +-- ...
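The CHiME3 background files in the listing above appear to store one channel per file. As a minimal sketch, the channels of a recording could be grouped as follows. The naming pattern `<recording>.CH<n>.wav` is an assumption inferred from the three file names shown, and `group_by_recording` is a hypothetical helper, not part of the repository.

```python
import re
from collections import defaultdict

# Assumed pattern, inferred from the file names in the listing above
CHANNEL_PATTERN = re.compile(r"^(?P<recording>.+)\.CH(?P<channel>\d+)\.wav$")


def group_by_recording(filenames):
    """Map each recording name to its sorted list of channel numbers."""
    groups = defaultdict(list)
    for name in filenames:
        match = CHANNEL_PATTERN.match(name)
        if match is None:
            continue  # skip names that do not follow the assumed pattern
        groups[match.group("recording")].append(int(match.group("channel")))
    return {rec: sorted(chs) for rec, chs in groups.items()}


files = [
    "BGD_150203_010_CAF.CH1.wav",
    "BGD_150203_010_CAF.CH2.wav",
    "BGD_150203_010_CAF.CH3.wav",
]
print(group_by_recording(files))  # {'BGD_150203_010_CAF': [1, 2, 3]}
```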
2020-2021 (c) Robin Scheibler, Masahito Togami, Masaya Wake, LINE Corporation
Code released under MIT License.