Implementation of "Perceptual Losses for Real-Time Style Transfer and Super-Resolution" in Keras
MIT License
Keras (1.1.0) implementation of chainer-fast-neuralstyle by yusuketomoto. There are minor differences, which are discussed later.
Tübingen - Starry Night by Vincent Van Gogh
Blue Moon Lake - Starry Night by Vincent Van Gogh
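The style component of the perceptual loss in the paper compares Gram matrices of VGG feature maps. As a minimal, framework-agnostic sketch (NumPy, not the repository's Keras code; the function names are illustrative):

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a (channels, height, width) feature map,
    normalized by the total number of elements."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T / (c * h * w)

def style_loss(gen_features, style_features):
    """Squared Frobenius distance between the two Gram matrices."""
    g_gen = gram_matrix(gen_features)
    g_style = gram_matrix(style_features)
    return float(np.sum((g_gen - g_style) ** 2))
```

In the actual network these feature maps come from several VGG layers, and the per-layer losses are summed with the style weight applied.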
The models should be trained on the MS COCO dataset (80k training images). A validation image must be provided to test the intermediate stages of training, since no direct validation set is available and the loss value is not very indicative of the network's performance.
Note that with the default model, each iteration takes 0.2 seconds on a 980M GPU.
python train.py "path/to/style/image" "path/to/dataset/" "path/to/validation/image"
There are many parameters that can be changed to produce different training behavior. Note that with the wide and deep model, each iteration takes roughly 0.65 seconds on a 980M GPU.
python train.py "path/to/style/image" "path/to/dataset/" "path/to/validation/image" --content_weight 1e3
--image_size 512 --model_depth "deep" --model_width "wide" --val_checkpoint 500 --epochs 2
A few details to note when training:
Due to the need to provide an explicit output shape for Deconvolution2D layers, a single network cannot transform multiple images of different sizes (each image must have the same size).
Another limitation is that the height and width must be divisible by 4 for the "shallow" model and by 8 for the "deep" model, since the Deconvolution2D layers require an exact output shape and will otherwise raise an exception.
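One way to satisfy the divisibility constraint is to center-crop inputs before feeding them to the network. A small helper sketch (hypothetical, not part of the repository's scripts):

```python
import numpy as np

def crop_to_multiple(img, multiple):
    """Center-crop a (height, width, channels) image so both height
    and width are divisible by `multiple` (4 for the "shallow"
    model, 8 for the "deep" model)."""
    h, w = img.shape[:2]
    new_h, new_w = h - h % multiple, w - w % multiple
    top = (h - new_h) // 2
    left = (w - new_w) // 2
    return img[top:top + new_h, left:left + new_w]
```

For example, a 258x515 image cropped with `multiple=8` becomes 256x512, which the "deep" model can process.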
python transform.py "style name" "path/to/content/image"
A total variation weight parameter is also available:
python transform.py "style name" "path/to/content/image" --tv_weight 1e-5
--content_weight: Weight for Content loss. Default = 100
--style_weight: Weight for Style loss. Default = 1
--tv_weight: Weight for Total Variation Regularization. Default = 8.5E-5
--image_size: Image size of training images. Default is 256. Change to 512 for "deep" models
--epochs: Number of epochs to run over training images. Default = 1
--nb_imgs: Number of training images. Default is 80k for the MS COCO dataset.
--model_depth: Can be one of "shallow" or "deep". Adds more convolution and deconvolution layers for the "deep" network. Default = "shallow"
--model_width: Can be one of "thin" or "wide". Changes the number of intermediate filters. Default = "thin"
--pool_type: Can be one of "max" or "ave". Pooling type to be used. Default = "max"
--kernel_size: Kernel size for convolution and deconvolution layers. Do not change. For testing purposes only.
--val_checkpoint: Iteration count at which the validation image will be tested. Default is -1, which will produce 200 validation images.
Using the default parameters, this takes 0.2 seconds per iteration on a 980M GPU.
Using the "wide" network, this takes 0.3 seconds per iteration.
Using the "deep" + "wide" network, this takes 0.33 seconds per iteration.
Using the "deep" + "wide" network with an image size of 512, this takes 0.65 seconds per iteration.