Keras WaveNet implementation
Based on https://deepmind.com/blog/wavenet-generative-model-raw-audio/ and https://arxiv.org/pdf/1609.03499.pdf.
~~Generate your own samples:
$ KERAS_BACKEND=theano python2 wavenet.py predict with models/run_20160920_120916/config.json predict_seconds=1
~~
EDIT: The pretrained model had to be removed from the repository as it wasn't compatible with recent changes.
Activate a new python2 virtualenv (recommended):
pip install virtualenv
mkdir ~/virtualenvs && cd ~/virtualenvs
virtualenv wavenet
source wavenet/bin/activate
Clone and install requirements.
cd ~
git clone https://github.com/basveeling/wavenet.git
cd wavenet
pip install -r requirements.txt
Using the tensorflow backend is not recommended at this time, see this issue
Sacred is used for managing training and sampling. Take a look at the documentation for more information.
This implementation does not support python3 as of now.
Once the first model checkpoint is created, you can start sampling.
Run:
$ KERAS_BACKEND=theano python2 wavenet.py predict with models/<your_run_folder>/config.json predict_seconds=1
The latest model checkpoint will be retrieved and used to sample. The sample will be streamed to [run_folder]/samples
, you can start listening when the first sample is generated.
predict_seconds
: float. Number of seconds to sample.sample_argmax
: True
or False
. Always take the argmaxsample_temperature
: None
or float. Controls the sampling temperature. 1.0 for the original distribution, < 1.0 for less exploitation, > 1.0 for more exploration.seed
: int: Controls the seed for the sampling procedure.predict_initial_input
: string: Path to a wav file, for which the first fragment_length
samples are used as initial input.e.g.:
$ KERAS_BACKEND=theano python2 wavenet.py predict with models/[run_folder]/config.json predict_seconds=1
$ KERAS_BACKEND=theano python2 wavenet.py
Or for a smaller network (less channels per layer).
$ KERAS_BACKEND=theano python2 wavenet.py with small
In order to use the VCTK dataset, first download the dataset by running vctk/download_vctk.sh
.
Training is done with:
$ KERAS_BACKEND=theano python2 wavenet.py with vctkdata
For smaller network:
$ KERAS_BACKEND=theano python2 wavenet.py with vctkdata small
Train with different configurations:
$ KERAS_BACKEND=theano python2 wavenet.py with 'option=value' 'option2=value'
Available options:
batch_size = 16
data_dir = 'data'
data_dir_structure = 'flat'
debug = False
desired_sample_rate = 4410
dilation_depth = 9
early_stopping_patience = 20
fragment_length = 1152
fragment_stride = 128
keras_verbose = 1
learn_all_outputs = True
nb_epoch = 1000
nb_filters = 256
nb_output_bins = 256
nb_stacks = 1
predict_initial_input = ''
predict_seconds = 1
predict_use_softmax_as_input = False
random_train_batches = False
randomize_batch_order = True
run_dir = None
sample_argmax = False
sample_temperature = 1
seed = 173213366
test_factor = 0.1
train_only_in_receptive_field = True
use_bias = False
use_skip_connections = True
use_ulaw = True
optimizer:
decay = 0.0
epsilon = None
lr = 0.001
momentum = 0.9
nesterov = True
optimizer = 'sgd'
$ python2 wavenet.py with 'data_dir=your_data_dir_name'
$ python2 wavenet.py test_preprocess with 'data_dir=your_data_dir_name'
predict_initial_input
.The Wavenet model is quite expensive to train and sample from. We can however trade computation cost with accuracy and fidility by lowering the sampling rate, amount of stacks and the amount of channels per layer.
For a downsized model (4000hz vs 16000 sampling rate, 16 filters v/s 256, 2 stacks vs ??):
This is a re-implementation of the model described in the WaveNet paper by Google Deepmind. This repository is not associated with Google Deepmind.