A flexible framework of neural networks for deep learning
MIT License
Published by hvy over 6 years ago
This is the release note of v4.2.0. See here for the complete list of solved issues and merged PRs.
The `return_indices` option has been added to `F.max_pooling_2d` and `F.max_pooling_nd`, so that you can access indices (or indexes) without using the `MaxPooling2D` and `MaxPoolingND` classes directly.
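A minimal usage sketch (shapes are illustrative; with this option the function returns the pooled output together with the index array):

```python
import numpy as np
import chainer.functions as F

x = np.random.rand(1, 3, 8, 8).astype(np.float32)
# With return_indices=True, the pooled output comes back together with
# the indices of the selected maximum elements.
y, indices = F.max_pooling_2d(x, ksize=2, return_indices=True)
```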
- `F.prod` (#4826) and `F.contrastive` (#4861)
- `MultiprocessParallelUpdater` is initialized (#4750)
- `gradient_check` error (#4817)
- `F.bilinear` (#4834)
- `repeat` attribute of some iterators in `Evaluator` is `True` (#4865)
- `MultiprocessParallelUpdater` initialization (#4867)
- neither `__call__` nor `forward` is defined in a link (#4888)
- `F.fixed_batch_renormalization` (#4940)
- `return_indices` option to `F.max_pooling_2d` (#4952)
- `return_indices` option to `F.max_pooling_nd` (#4953)
- `mask` and `return_mask` option to `F.dropout` (#4954)
- `eps` and `return_eps` option to `F.gaussian` (#4955)
- `ParallelUpdater` (#4842)
- `F.rollaxis` backward when `axis=0` and `start=0` (#4843)
- `devices` argument given to `MultiprocessParallelUpdater` is neither `dict` nor `list` (#4847)
- `eps` of the `BatchNormalization` layer (#4913)
- `SigmoidCrossEntropy.backward` (#4924)
- `ValueError` in Caffe exporter on Windows with Python 2.7 (#4929)
- `GetItem.check_type_forward` (#4944)
- `extract` method of vision models (ResNet, VGG16, GoogLeNet) (#4941)
- `caffe.py` (#4815, thanks @poutyface!)
- `TransformDataset` (#4819)
- `call_for_each_param` attribute in optimizer hooks (#4857)
- `batch_{re}normalization` about running averages (#4870, thanks @grafi-tt!)
- `gradient_check` docstring to enable doctest (#4926)
- `imp.load_source` in `setup.py` (#4859, thanks @vilyaair!)
- `MultiprocessParallelUpdater` tests (#4801)
- `DeprecationWarning` (#4820)
- `F.min` and `F.max` tests (#4841)
- `TestCUDAProfileHook` (#4866)
- `multi_gpu` tests as gpu tests (#4881)
- `freeze_running_statistics` in `TestFixedBatchRenormalization` (#4908)
- `no_grads` argument of `check_backward` from test (#4938)
- `F.average` by changing lower-bound of the sum of weights (#4960)

Published by niboshi over 6 years ago
This is the release note of v5.0.0b2. See here for the complete list of solved issues and merged PRs.
`chainer.config.dtype` has been introduced. This configuration can be used to switch your model to run with `float16` / `float32` / `float64` without modifying your code. In this version of Chainer, this configuration is supported by initializers, built-in datasets and part of built-in `Link`s. We're going to improve all built-in `Link`s to support this feature towards the final v5 release (#4582).
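A minimal sketch of switching the default dtype, assuming the configuration accepts a NumPy dtype object:

```python
import numpy as np
import chainer
import chainer.links as L

# Switch the default dtype globally; supported initializers, datasets and
# links then produce float16 parameters without further code changes.
chainer.global_config.dtype = np.float16
layer = L.Linear(10, 5)  # parameters follow the configured dtype
```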
`chainer.distributions` has been introduced (see the API Reference). We're going to provide more probability distribution implementations towards the final v5 release (#4678).
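A minimal sketch of the distributions API, assuming a `Normal` distribution parameterized by `loc` and `scale` is available:

```python
import numpy as np
import chainer.distributions as D

# Parameters may also be Variables, in which case log_prob stays differentiable.
dist = D.Normal(loc=np.zeros(3, dtype=np.float32),
                scale=np.ones(3, dtype=np.float32))
x = dist.sample()
log_p = dist.log_prob(x)
```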
`Variable` operators and `F.matmul` now support NumPy-style broadcasting. We're going to improve more built-in `Function`s to support broadcast towards the final v5 release (#4679).
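For example, mismatched but broadcastable shapes now combine as they do in NumPy (a minimal sketch):

```python
import numpy as np
import chainer

a = chainer.Variable(np.ones((2, 3), dtype=np.float32))
b = chainer.Variable(np.ones((3,), dtype=np.float32))
y = a + b  # the (3,) operand is broadcast against (2, 3)
```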
`L.ConvolutionND` and `L.DeconvolutionND` now support grouped and dilated convolution.

Caffe export (`Deconvolution` and `LeakyReLU` functions) and import (`Deconvolution` and `Reshape` layer) have been improved.

The `return_indices` option has been added to `F.max_pooling_2d` and `F.max_pooling_nd`, so that you can access indices (or indexes) without using the `MaxPooling2D` and `MaxPoolingND` classes directly.

`chainer.datasets.TextDataset` has been introduced to reduce host memory usage when loading large text files. See the seq2seq example for an example usage.
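A minimal sketch of `TextDataset`; `corpus.txt` is a hypothetical line-oriented text file:

```python
from chainer.datasets import TextDataset

# Lines are read lazily from disk instead of loading the whole file
# into host memory at once.
dataset = TextDataset('corpus.txt')
first_line = dataset[0]
```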
`Optimizer.new_epoch` is now called automatically (#4608). This may affect users who call `Optimizer.new_epoch` while using a trainer, or who implement their own updater class. See the Upgrade Guide for details.
- `no_grads` option of `check_backward` (#4654)
- `L.DeformableConvolution2D` (#2468)
- `F.average_pooling_nd` (#3486)
- `F.contrastive` (#3830)
- `chainer.config.dtype` and use it in initializers and dataset loaders (#4510)
- `InverseShift` extension (#4565, thanks @jinjiren!)
- `new_epoch` automatically (#4608)
- `TextDataset` to load line-oriented text files (#4782)
- `CorrectedMomentumSGD` optimizer (#4835)
- `Reshape` layer (#4875)
- `F.diagonal` (#4901)
- `Deconvolution` layer; Fix `num_output` parameter in caffe export of `Deconvolution` layer (#4936, thanks @tsurumeso!)
- `LeakyReLU` layer (#4937, thanks @tsurumeso!)
- `repeat` attribute of some iterators in `Evaluator` is `True` (#3436)
- `input_num` to remove array creation overhead (#4162)
- `F.rsqrt` (#4614)
- `MultiprocessParallelUpdater` initialization (#4751)
- `F.bilinear` (#4752)
- neither `__call__` nor `forward` is defined in a link (#4770)
- `gradient_check` error (#4797)
- `F.normalize` (#4799)
- `F.fixed_batch_renormalization` (#4869)
- `return_indices` option to `F.max_pooling_2d` (#4890)
- `return_indices` option to `F.max_pooling_nd` (#4891)
- `mask` and `return_mask` option to `F.dropout` (#4907)
- `eps` and `return_eps` option to `F.gaussian` (#4909)
- `F.broadcast` using `type_check.expect_broadcast_shapes` (#4947)
- `ResNeXt50` example (#4479, thanks @akitotakeki!)
- `no_grads` option of `check_backward` (#4654)
- `devices` argument given to `MultiprocessParallelUpdater` is neither `dict` nor `list` (#4716)
- `LayerNormalization` (#4744, thanks @anaruse!)
- `ParallelUpdater` (#4774)
- `ResNetLayers` to not pass-through `downsample_fb` argument to `L.Convolution2D` (#4829)
- `F.rollaxis` backward when `axis=0` and `start=0` (#4836)
- `GetItem.check_type_forward` (#4845)
- `eps` of `BatchNormalization` layer (#4884)
- `SigmoidCrossEntropy.backward` (#4915)
- `ValueError` in caffe exporter on Windows with Python 2.7 (#4928)
- `ChainList` (#4945)
- `F.matmul` (#4949)
- `extract` method of vision models (ResNet, VGG16, GoogLeNet) (#4675)
- `caffe.py` (#4808, thanks @poutyface!)
- `TransformDataset` (#4814)
- `batch_{re}normalization` about running averages (#4818, thanks @grafi-tt!)
- `call_for_each_param` attribute in optimizer hooks (#4849)
- `gradient_check` docstring to enable doctest (#4889)
- `F.moveaxis` and discourage `F.rollaxis` (#4900)
- `0xa0` (nbsp) with `0x20` (space) in documentation (#4921)
- `ZippedImageDataset` and `MultiZippedImageDataset` to documentation (#4959)
- `imp.load_source` in `setup.py` (#4846, thanks @vilyaair!)
- `F.average` by changing lower-bound of weight.sum (#4771)
- `DeprecationWarning` (#4810)
- `F.layer_normalization` with large eps (#4816)
- `F.min` and `F.max` tests (#4838)
- `flake8` from AppVeyor test (#4852)
- `TestCUDAProfileHook` (#4863)
- `freeze_running_statistics` in `TestFixedBatchRenormalization` (#4868)
- `multi_gpu` tests as `gpu` tests (#4872)
- `test_count_params` (#4876)
- `no_grads` argument of `check_backward` from test (#4914)
- `pytest-timeout` version to <1.3.0 (#4918)
- `TestResNetLayers` (#4965)

Published by hvy over 6 years ago
This is the release note of v5.0.0b1. See here for the complete list of solved issues and merged PRs.

- `order_sampler` option to Iterators (#3429)
- `prod` function (#3764)
- `n_batch_axes` option to `F.linear` (#4204)
- `ignore_names` option to `load_npz` (#4682)
- `PolynomialShift` in the extensions of training (#4693, thanks @tianshilei1992!)
- `rsqrt` in `F.batch_normalization` (#4612)
- `MultiprocessParallelUpdater` is initialized (#4717)
- `F.bilinear` (#4738)
- `Chain.repeat` raise error (#4649, thanks @mori97!)
- `ChainList.copy` not supporting `mode` argument (#4652)
- `F.normalize` (#4763)
- `lazy_grad_sum` debug mode (#4768)
- `F.batch_normalization` axis document (#4666)
- `L.BatchNormalization` (#4671)
- `train_loop.rst` (#4700, thanks @arisliang!)
- `word2vec.rst` (#4705, thanks @arisliang!)
- `transpose_sequence` (#4719)
- `seq2seq.rst` (#4721, thanks @arisliang!)
- `abs` to document of `Variable` (#4757)
- `TestBatchNormalization` and `TestBatchNormalizationAxis` (#4558)
- `L.BatchNormalization`: miscellaneous fixes (#4671)
- `MultiprocessParallelUpdater` tests (#4726)
- `hacking` version (#4731)

Published by niboshi over 6 years ago
This is the release note of v4.1.0. See here for the complete list of solved issues and merged PRs.
- `MultiprocessIterator` (#4637)
- `F.rsqrt` performance in CPU (#4634)
- `LogReport` (#4635)
- `rsqrt` in `F.batch_normalization` (#4665)
- `F.bilinear` (#4762)
- `Chain.repeat` raise error (#4653, thanks @mori97!)
- `ChainList.copy` not supporting `mode` argument (#4664)
- `lazy_grad_sum` debug mode (#4776)
- `F.normalize` (#4781)
- `Reporter` and `report` (#4632)
- `word2vec.rst` (#4713, thanks @arisliang!)
- `seq2seq.rst` (#4728, thanks @arisliang!)
- `train_loop.rst` (#4759, thanks @arisliang!)
- `abs` to document of `Variable` (#4764)

Published by kmaehashi over 6 years ago
This is a major release of Chainer v4.0.0. All the updates from the previous major version (v3.5.0) are found in the release notes below:
See the blog post for the details. Also see the Upgrade Guide for users migrating from Chainer v3 to v4.
Updates from the release candidate are as follows.
- `forget` function (#4522)
- `Variable.xp` (#4536)
- `Sequential` class for easy model definition of a single-stream computational graph (#4601; see the sketch after this list)
- Trainer extension to kill the training when `NaN` or `Inf` is detected in the model parameters (`FailOnNonNumber`) (#4602)
- `depth2space` & `space2depth` funcs by reducing operations (#4590, thanks @ruimashita!)
- `chainer.utils.argument` exception message (#4615)
- `eps` from batch normalization statistics (#4517)
- `FunctionHooks` in `chainer.grad` (#4541)
- `{de,}convolution_2d` to keyword-only argument (#4573)
- `to_intel64` not updating `VariableNode.data` (#4597)
- `to_intel64` to check if it is suitable for iDeep (#4598)
- `Sequential` class (#4628)
- `{convolution,deconvolution}_2d` (#4575)
- `EarlyStoppingTrigger` (#4584, thanks @mori97!)
- `chainer.config.lazy_grad_sum` (#4585)
Published by niboshi over 6 years ago
This is the release of v5.0.0a1. See here for the complete list of solved issues and merged PRs.
- `Sequential` class for easy model definition of a single-stream computational graph (#2918)
- `count_params()` method to `Link`, which enables counting the number of trainable values in a `Link` easily (#3101)
- `F.forget` (#3792)
- `MultiprocessIterator` (#4155)
- `axis` option to batch normalization (#4266, thanks @anaruse!)
- `add_extra` option to SVHN (#4478, thanks @akitotakeki!)
- `Variable.xp` (#4497)
- `Trainer` extension to kill the training when `NaN` or `Inf` is detected in the model parameters (`FailOnNonNumber`) (#4545)
- `chainer.print_runtime_info()` method to summarize the versions of libraries used in Chainer and CuPy (#4559)
- `from __future__ import print_function` (#4470)
- `F.depth2space` and `F.space2depth` by reducing operations (#4482, thanks @ruimashita!)
- `LogReport` (#4528)
- `F.rsqrt` performance in CPU (#4538)
- `chainer.utils.argument` exception message (#4551)
- `FunctionHooks` in `chainer.grad` (#4499)
- `eps` from batch normalization statistics (#4505)
- `group` argument of `F.convolution_2d` and `F.deconvolution_2d` to keyword-only argument (#4564)
- `to_intel64` to check if it is suitable for iDeep (#4577)
- `to_intel64` not updating `VariableNode.data` (#4592)
- `chainer.Reporter` (#3688)
- `F.connectionist_temporal_classification` (CTC) (#4309)
- `optimizer_hooks` namespace (#4468)
- `Variable` (#4516)
- `convolution_2d` and `deconvolution_2d` (#4539)
- `chainer.config.lazy_grad_sum` (#4543)
- `EarlyStoppingTrigger` (#4578, thanks @mori97!)
- `MultiprocessIterator` + OpenCV problem (#4589)
- `get_device_from_array` (#4604)
- `Sequential` (#4605)
- `--no-cache-dir` in Dockerfile (#4532)
- `seq2seq` example ignoring limitation for target length (#4611)
- `TestForwardConsistency` in `F.softmax_cross_entropy` (#4554)
- `TestSoftmaxCrossEntropyInvalidReduce` in `F.test_softmax_cross_entropy` (#4555)

Published by beam2d over 6 years ago
This is the release candidate of v4. See here for the complete list of solved issues and merged PRs.
- `repeat` (#3735)
- `fft`, `ifft` (#4241)
- `local_convolution_2d`: 2D convolution with spatially unshared weights (#4073, thanks @mihirparadkar!)
- `swish`: a new activation function proposed here (#4262, thanks @mizuno-gsinet!)
- `convolution_{1,3}d`, `deconvolution_{1,3}d`, `average_pooling_{1,3}d`, `max_pooling_{1,3}d`, `unpooling_{1,3}d` (#4025); `***_nd` variants
- `simplified_dropconnect` (#3807)
- `zoneout` function (#3949)
- `average_pooling_nd`, `max_pooling_nd`, `unpooling_nd` (#4132)
- `testing.patch`, which calls `mock.patch` with the `wraps` argument (#3883)
- `AMSGrad` (#4032, thanks @kashif!)
- `ZippedImageDataset` and `MultiZippedImageDataset` (#4127, thanks @d0i!)
- `ParameterStatistics`; Add option to skip parameters with NaN values in `ParameterStatistics` (#4345)
- `n_cell` property to the NStepRNN family (#4417, thanks @levelfour!)
- `numpy.split` and `cupy.split` (#4153, thanks @ken-nakanishi!)
- `lstm` backward arguments (#4320)
- `BatchMatMulGrad.backward` (#4349)
- `MultiprocessParallelUpdater` (#4368)
- `chainer/link.py` (#4382, thanks @ken-nakanishi!)
- `link.to_intel64` for persistent values (#4384)
- `Variable` tests (#4408)
- `Swish` function class alias (#4409)
- `Link._device_id` to `None` in `to_intel64` (#4436)
- `to_intel64` is not called (#4457)
- `cupy-cuda91` wheel dependency (#4392)
- `chainer.optimizer_hooks` namespace and move hooks there (#3977)
- `rsqrt` function with CuPy (#4108)
- `chainer.backends.cuda` (#4259)
- `clipped_relu` (#4307, thanks @tkerola!)
- `F.split_axis` (#4328)
- `chainer.grad` (#4352)
- `cuda.to_gpu` and `cuda.copy` (#4380, thanks @tkerola!)
- `group` argument name of `Convolution2D` and `Deconvolution2D` (#4404)
- `chainer.backends.cuda` in recently added code (#4414)
- `split_at` (#4434, thanks @corochann!)
- `future.types.newint` (#4435)
- `CUDNN_BN_MIN_EPSILON` (#4466)
- `sep` argument of `PrintHook`
- `chainer.backends.cuda` (#3981)
- `Makefile` (#4037)
- `concat_examples` (#4164)
- `chainer.backends.cuda` (#4259)
- `F.shift` to docs (#4379)
- `unpooling_nd` (#4398), in the document of configuration (#4474), in a comment of `UpdateRule` (#4494)
- `NStepLSTM/GRU/RNN` (#4425)
- `Evaluator.device` docs (#4437)
- `sep` argument of `PrintHook` (#4471)
- `chainer.backends.cuda` (#4259)
- `testing.patch`, which calls `mock.patch` with the `wraps` argument (#3883)
- `None` (#4049)
- `MultiprocessParallelUpdater` (#4368)
- `MultiprocessParallelUpdater` test for timeout (#4376)
- `UserWarning` (#4483)
- `chainer.dataset.concat_examples` (#4485)
- `TestSplitAxis` tests on NumPy 1.10 (#4500)

Published by kmaehashi over 6 years ago
This is the release note of v3.5.0. See here for the complete list of solved issues and merged PRs.
- `simplified_dropconnect` function (#4403), `zoneout` (#4423), `average_pooling_nd`, `max_pooling_nd`, `unpooling_nd` (#4405)
- `cuda.to_gpu` and `cuda.copy` (#4386)
- `batchsize=1` and `train=True` (#4433)
- `split_at` (#4441)
- `chainer.backends.cuda` (#4463)
- `ELU.backward` (#4351)
- `MultiprocessParallelUpdater` (#4367)
- `BatchMatMulGrad.backward` (#4393)
- `pip` in installation guide (#4396)
- `docs` requirements (#4396)
- `F.unpooling_nd` documentation (#4411)
- `F.get_item` documentation (#4415, thanks @naoto0804)
- `Evaluator.device` documentation (#4442)

Published by hvy over 6 years ago
This is the release of v4.0.0b4. See here for the complete list of solved issues and merged PRs.
To enable iDeep, install it with `pip install ideep4py`, set the environment variable with `export CHAINER_USE_IDEEP="auto"`, add `model.to_intel64()` to your code (where `model` is a `Chain` object; only needed if you are using supported optimizers), and run the code in CPU mode (a setup sketch follows the list below). Currently the following functions and optimizers are supported:
- Functions: `F.relu`, `F.linear`, `F.local_response_normalization`, `F.batch_normalization`, `F.split_axis`, `F.average_pooling_2d`, `F.lstm`, `F.tree_lstm`, `F.convolution_2d`, `F.deconvolution_2d`, `F.max_pooling_2d`, `F.dropout`, `F.concat`
- Optimizers: `optimizers.SGD`, `optimizers.MomentumSGD`
Please see the Upgrade Guide for details.
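A sketch of the setup described above, assuming `ideep4py` is installed and `CHAINER_USE_IDEEP="auto"` is exported before launching the script:

```python
import chainer
import chainer.functions as F
import chainer.links as L

class MLP(chainer.Chain):
    def __init__(self):
        super(MLP, self).__init__()
        with self.init_scope():
            self.l1 = L.Linear(None, 100)
            self.l2 = L.Linear(100, 10)

    def __call__(self, x):
        return self.l2(F.relu(self.l1(x)))

model = MLP()
model.to_intel64()  # convert parameters for iDeep (CPU mode only)
optimizer = chainer.optimizers.MomentumSGD()  # one of the supported optimizers
optimizer.setup(model)
```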
- `erf` (#3846)
- `floordiv` and `rfloordiv` (#3967)
- `tensordot` (#4253, thanks @anaruse!)
- `ConvolutionND`/`DeconvolutionND` (#4110)
- `chainer.functions` (#4192)
- `stack` (#4203)
- `matmul` (#4243, thanks @anaruse!)
- `variable.array` instead of `variable.data` in `reporter.py` (#4260, thanks @crcrpar!)
- `softmax` (#4261, thanks @anaruse!)
- `opt = optimizers.SGD().setup()` syntax (#4290)
- `CupyMemoryProfileHook` (#4300)
- `TimerHook` (#4323)
- `cupyx` namespace (#4363)
- `MultiprocessParallelUpdater` (#3402, thanks @tkerola!)
- `backward_accumulate` to accept only tuples (#4186)
- `NormalizeL2` (#4190)
- `BatchRenormalizationFunction` and add tests (#4191)
- `ELU.backward` (#4347)
- `split_axis` (#4348)
- `multiprocess_parallel_updater` test for timeout (#4370)
- `Link.copy` (#4066)
- `n_step_bigru` (#4157, thanks @Yuichiroh!)
- `chainer.functions` (#4192)
- `backporp` -> `backprop` (#4291)
- `optimizer.setup` (#4306)
- (`--out`) of seq2seq example (#4238, thanks @okayu9!)
- `test_batch_normalization` (#4224)
- `test_dropout` (#4225)
- `test_concat` (#4226)
- `test_split_axis` (#4227)
- `test_lstm` (#4228)
- `test_local_response_normalization` (#4229)
- `test_max_pooling_2d` (#4233)
- `test_average_pooling_2d` (#4234)
- `multiprocess_parallel_updater` test for timeout (#4370)

Published by mitmul over 6 years ago
This is the release note of v3.4.0. See here for the complete list of solved issues and merged PRs.
- `Adam.lr` is evaluated before updating starts (#4207)
- `stack` (#4340)
- `t` of `UpdateRule` (#4184, #4214)
- `gradient_check` (#4202)
- `NormalizeL2` (#4268)
- `BatchRenormalizationFunction` and add tests (#4293)
- `backward_accumulate` to accept only tuples (#4334)
- (`--out`) of seq2seq example (#4297, thanks @okayu9!)
- `gru` (#3928)
- `Variable.backward()` (#4196)
- `chainer.functions` (#4269)
- `Link.copy` (#4295)
- `init_scope()` in the model definition (#4231)

Published by kmaehashi over 6 years ago
This is the release of v4.0.0b3. See here for the complete list of solved issues and merged PRs.
The `Adam` optimizer has been updated to support AdamW. See #4050 for details.
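A minimal sketch; the assumption here, based on #4050, is that the decoupled weight decay is exposed through a `weight_decay_rate` argument:

```python
import chainer

# AdamW applies weight decay directly to the parameters instead of
# folding it into the gradient.
optimizer = chainer.optimizers.Adam(alpha=1e-3, weight_decay_rate=0.01)
```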
- `Variable.backward`; you can also use it via Updaters.
- `Optimizer.setup` returns `self` to enable method chaining (#4141)
- `shift` function (#4041)
- `Optimizer.setup` returns `self` to enable method chaining (#4141)
- `matmul` (#3768), `huber_loss` (#3867)
- `StandardUpdater` and `ParallelUpdater` under the `chainer.training.updaters` namespace (#3037)
- `RuntimeError` when `Adam.lr` is evaluated before updating starts (#3931)
- `get_training_length` to `IntervalTrigger` (#4079, thanks @himkt!)
- `F.linear` (#4093, thanks @jzhoulon!)
- `F.identity` (#4154)
- `check_backward` (#4156)
- `GoogLeNet` to define the network using `init_scope` (#4171)
- `log1p` in CTC (`F.connectionist_temporal_classification`) for stable computation (#4194)
- `F.separate` (#4195)
- (`F.connectionist_temporal_classification`) (#4201)
- `VariableNode.data` if new data is assigned (#3869)
- `serialize` to `Summary` (#4005, thanks @Hakuyume!)
- `gradient_check` (#4015)
- `t` of `UpdateRule` (#4026)
- `GradientMethod` not to raise `AttributeError` caused by new optimizer setup (#4077)
- `np.stack` in grouped convolution/deconvolution in CPU mode (#4085)
- `np.stack` in examples (#4087)
- `out1` in `inc4c` and `inc4d` (#4121, thanks @takaaki82!)
- `Variable.backward()` (#3496)
- `huber_loss` (#3950)
- `gradient_check` (#4015)
- `test_init_docstring` to use importlib to find package (#4091)

Published by bkvogel over 6 years ago
This is the release of v3.3.0. See here for the complete list of solved issues and merged PRs.
`chainer.cuda` has been moved to `chainer.backends.cuda`. However, note that `chainer.cuda` is still available as well.

- `hasattr` (#4060)

Published by niboshi almost 7 years ago
This is the release of v4.0.0b2. See here for the complete list of solved issues and merged PRs.
In this release, you can set up an optimizer with a simpler syntax.
In previous versions, the code would be written as
```python
optimizer = chainer.optimizers.SGD()
optimizer.setup(model)
```
We now also allow it to be written more concisely as
```python
optimizer = chainer.optimizers.SGD(link=model)
```
The `link` argument should be specified as a keyword argument. Otherwise, some optimizers could wrongly interpret it as a hyperparameter (e.g. `lr`). We will enforce a keyword argument from the next release.
We introduced a check for mixed use of CuPy arrays and NumPy arrays in outputs returned from functions. Even though we have previously forbidden this, such functions may have worked without any errors. With the introduction of this check, however, those functions can begin raising errors.
- `extensions` as a trainer argument (#3528, thanks @neka-nat!)
- `sign` function (#3678)
- `to_cpu` and `to_gpu` to accept list, tuple and `None` (#3850)
- `should_use_cudnn` and `should_use_cudnn_tensor_core` to `chainer.cuda` (#3851)
- `hasattr` (#3952)
- `chainer.backends` subpackage (#3974)
- `--noplot` option in MNIST example (#3925)
- `FunctionNode` (#3626)
- `Reporter` (#3795)
- `GRU` (#3858)
- `n_step_gru`, `n_step_bigru`, `n_step_bilstm`, `n_step_rnn` and `n_step_birnn` (#3859)
- `expm1` to the documentation (#3900)
- `GlorotUniform` documentation (#3953, thanks @F-Tag!)
- `StatefulZoneoutLSTM` to documentation (#3957)
- `ConcatWithAsyncTransfer` to the reference manual (#3975, #3979)
- `chainer.function.pad` documentation (#4028)
- `gradient_check.numerical_grad` (#3551, #4003)
- `F.classification_summary` (#3927)
- `test_pad_sequence` in debug mode (#3946)
- `testing.parameterized` (#3954)
- `to_gpu` in RNN tests (#4046)
- `convolution_nd` (#3910), `im2col` (#3933), `triplet` (#3939), `linear_interpolate` (#3944)

Published by kmaehashi almost 7 years ago
This is the release of v3.2.0. See here for the complete list of solved issues and merged PRs.
- `get_variable_or_none` to improve backward performance (#3843)
- `sign` function (#3911)
- `vstack` (#3824), `expm1` (#3901), `batch_l2_norm_squared` (#3904), `maximum` (#3905), `mean_absolute_error` (#3906), `squared_error` (#3907), `im2col` (#3920), `linear_interpolate` (#3923), `normalize` (#3961), `cross_covariance` (#3962), `absolute_error` (#3995), `sigmoid_cross_entropy` (#4007), `copy` (#4019)
- `basic_math` (#3934)
- `should_use_cudnn` and `should_use_cudnn_tensor_core` to `chainer.cuda` (#3989)
- `to_cpu` and `to_gpu` (#4000)
- `testing.parameterized` (#4020)
- `cupy.ndarray` and `numpy.ndarray` in `n_step_xxx` links (#4034)
- `sigmoid_cross_entropy` doc about `t` (#3884)
- `expm1` to the document (#3912)
- `GlorotUniform` doc (#3956, thanks @F-Tag!)
- `n_step_xxx` (#3958)
- `StatefulZoneoutLSTM` to docs (#3960)
- `chainer.function.pad` document (#4033)
- `--noplot` option in MNIST example (#3930)
- `convolution_nd` testing tolerance (#3916)
- `im2col` tests (#3935)
- `F.linear_interpolate` double backward tests (#3947)
- `NaN` in test of `F.classification_summary` (#3970)
- `test_pad_sequence` in debug mode (#3997)
- `testing.parameterized` (#4020)
- `to_gpu` in RNN tests (#4053)

Published by gwtnb almost 7 years ago
This is a minor release. See the list for the complete list of solved issues and merged PRs.

- Set `chainer.global_config.autotune = True` for optimizing your ConvNets.

Published by kmaehashi almost 7 years ago
This is the release of v4.0.0b1. See here for the complete list of solved issues and merged PRs.
- Set `chainer.global_config.autotune = True` for optimizing your ConvNets.
- `t` (#3840)
- `t` option of `F.sigmoid_cross_entropy` (#3840)
- `--log-interval` and `--validation-interval` options to seq2seq example (#3430)

Published by mitmul about 7 years ago
This is a major release of Chainer v3.0.0. All the updates from the previous major version (v2.0.0) are found in the release notes below:
The biggest change is the introduction of new-style differentiable functions and resulting support for double backward (gradient of gradient) in many functions. The details are linked below:
As for backward compatibility, most users of v2.x are not affected by the introduction of the new-style function `FunctionNode`, because the conventional `Function` is still supported in v3 (and in future versions). Even if you are using custom functions written with `Function`, you can continue running the same code with Chainer v3.0.0. You need to rewrite such custom functions only when you want to use new features added to the new-style function, e.g. double backprop.
The backward compatibility of the overall APIs is slightly broken, though most users are not affected. See the above release notes for the details of broken compatibility.
grad function

You can calculate gradients of any variables in a computational graph w.r.t. any other variables in the graph using the `chainer.grad` function with the `enable_double_backprop=True` option.
```python
# Both x and y are chainer.Variable objects
y = x * x * x / 3  # Construct a computational graph
gx, = chainer.grad([y], [x], enable_double_backprop=True)
ggx, = chainer.grad([gx], [x], enable_double_backprop=True)
```
Here, the above calculation of `ggx` is equal to:
```python
gx.backward()
x.grad_var  # => This is equal to the above ggx
```
Of course, one more differentiation gives us 2:
```python
gggx, = chainer.grad([ggx], [x], enable_double_backprop=True)
print(gggx)  # => variable([ 2.])
```
WGAN-GP (which stands for Wasserstein GAN with Gradient Penalty [1]) is one example of a GAN that uses gradients of gradients when calculating the loss. It penalizes the gradient norm for enforcing the Lipschitz constraint. The gradient norm is computed at a random interpolation `x_hat` between a generated point `x_tilde` and a real example `x`. Then, the loss including the penalty term will be further differentiated w.r.t. trainable parameters in the model, so that it actually performs double backward for the discriminator. The code below shows how to implement it using the `backward()` method with the `enable_double_backprop=True` option:
```python
# G (generator) and D (discriminator) should be implemented somewhere else
x_tilde = G(z)
x_hat = x + u * (x_tilde - x)

# 1st diff
D(x_hat).backward(enable_double_backprop=True)
# "lam" is the penalty coefficient ("lambda" is a reserved word in Python)
gradient_penalty = lam * (x_hat.grad_var - 1) ** 2
loss = D(x_tilde) - D(x) + gradient_penalty

model.cleargrads()  # to clear the 1st diff of params
loss.backward()  # 2nd diff
```
You can also implement it using `grad()`, which may be faster because it omits the computation of gradients w.r.t. parameters.
```python
x_tilde = G(z)
x_hat = x + u * (x_tilde - x)

# 1st diff
gx_hat, = chainer.grad([D(x_hat)], [x_hat], enable_double_backprop=True)
gradient_penalty = lam * (gx_hat - 1) ** 2
loss = D(x_tilde) - D(x) + gradient_penalty

model.cleargrads()  # to clear the 1st diff of params
loss.backward()  # 2nd diff
```
[1]: I. Gulrajani, et al., "Improved Training of Wasserstein GANs," https://arxiv.org/abs/1704.00028
Here are some simple comparisons of grad of grad in Chainer and other frameworks:
https://gist.github.com/delta2323/9bbca950ee32c523c7aec2e02ad7f85a
- `F.flip` function (#3532)
- `F.swapaxis` (#3480), `F.permutate` (#3481), `F.transpose_sequence` (#3525)
- `KeyError` when using evaluator without target 'main' (#3460)
- `AttributeError` for missing `inv_std` in `F.fixed_batch_normalization` backward (#3479, thanks @zaburo-ch!)
- `invoke_before_training` argument from `Trainer.extend` (#3516)
- `MultiprocessIterator` for non tuple/dict datasets (#3413, thanks @yuyu2172!)
- `chainer.grad` (#3514)
- `to_gpu` (#3519)
- `ParameterStatistics` extension (#3323)
- `contextlib.contextmanager` (#3567)
- `F.swapaxes`, `F.squeeze`, `F.transpose` (#3415, thanks @naoto0804!), `F.separate`, `F.select_item`, and `F.permutate` (#3417, thanks @naoto0804!), Constant initializer (#3560), `init_scope` (#3520), `F.reshape` (#3515), ConvNet tutorial (#3509)
- `to_gpu` (#3517)
- `to_gpu` (#3519)
- `__delattr__` in `Link` and `Chain` (#3416, thanks @naoto0804!)
- `numerical_grad` accuracy (#3495)
- `F.get_item` (#3469, thanks @yuyu2172!)
- `assert_allclose` failure (#3518)
- `upsampling_2d` (#3382)
- `gradient_check` (#3461)
- `dilated_convolution_2d` (#3462)
- `basic_math` (#3463)

Published by unnonouno about 7 years ago
This is the release of v4.0.0a1. See here for the complete list of solved issues and merged PRs.
- `F.scatter_add` (#3442, thanks @yuyu2172!)
- `F.flip` (#3378, thanks @ronekko!)
- `F.transpose_sequence` (#3418)
- `F.copy` (#3419)
- `F.swapaxis` (#3421)
- `F.permutate` (#3424)
- `F.softplus` (#3454)
- `F.where` (#3491)
- `F.clipped_relu` (#3503)
- `LogReport` now serializes the trigger if it has a `serialize` method (#3396, thanks @Hakuyume!)
- `KeyError` when using evaluator without target 'main' (#2815, thanks @Hiroshiba!, #3445)
- `AttributeError` for missing `inv_std` in `F.fixed_batch_normalization` backward (#3468, thanks @zaburo-ch!)
- `invoke_before_training` argument from `Trainer.extend` (#3036)
- `MultiprocessIterator` for non tuple/dict datasets (#3390, thanks @yuyu2172!)
- `chainer.grad` (#3433)
- `VariableNode.get_variable_or_none` to improve backward performance (#3448)
- `F.batch_normalization` (and `L.BatchNormalization`) now supports cuDNN when the inputs are float16 (#3386, thanks @anaruse!)
- `init_scope` (#3121)
- `FunctionNode` (#3441)
- `F.swapaxes`, `F.squeeze`, `F.transpose` (#3307, thanks @naoto0804!)
- `F.n_step_lstm` (#3349)
- `F.separate`, `F.select_item`, and `F.permutate` (#3407, thanks @naoto0804!)
- `TupleDataset` (#3432)
- `F.sum` (#3497, thanks @akitotakeki!)
- `F.reshape` (#3255)
- `to_gpu` (#3328)
- `to_gpu` (#3272)
- `using_config` documentation (#3335)
- `get_svhn.py` (#3267, thanks @naoto0804!)
- `__delattr__` in `Link` and `Chain` (#3406, thanks @naoto0804!)
- `assert_allclose` failure (#3277)
- `F.get_item` (#3443, thanks @yuyu2172!)
- `numerical_grad` accuracy (#3472)
- `roll_axis` (#3375)
- `roi_pooling_2d` (#3376)
- `upsampling_2d` (#3377)
- `basic_math` (#3459)
- `upsampling_2d` (#3410)
- `F.ceil` and `F.floor` (#3427)
- `gradient_check` (#3446)
- `dilated_convolution_2d` (#3447)

Published by mitmul about 7 years ago
This minor release contains features, bug fixes and improvements to the documentation and installation procedure. See here for the complete list of solved issues and merged PRs.
- `numerical_grad` (#3141)
- `axis` argument in `chainer.functions.average` accepts tuples (#3264)
- `intensive_times` to `testing.condition.repeat` to reduce test time. Also some tests are made deterministic (#3334)
- `MultiprocessParallelUpdater` (#2954)
- `matplotlib.pyplot` in `PlotReport` (#3111)
- `Variable.backward` only when they are float (#3220)
- `ChainList.init_scope` (#3230)
- `CaffeFunction` to take BatchNorm scaling factor into account (#3295, thanks @hvy!)
- `F.softmax` supports non-contiguous inputs (#3087)
- `ndarray.copy` instead of `xp.copy`, or feed `order='C'` explicitly to keep the existing behavior (#3078)
- `get_device` from examples (#3140, thanks @naoto0804!)
- `ROIPooling2D` (#3186)
- `Upsampling2D` on CPU (#3318)
- `gradient_check` (#3160), VAE and CTC (#3167, thanks @zchenry!), Chainer's configuration (#3169, thanks @kristofbc!), comment in `Optimizer` (#3262)
- `Updater` (#3086, thanks @fiarabbit!), add warnings about preprocessing for dataset with both grayscale and RGB images to the docstring of ImageDataset (#3095, thanks @jinjiren!), add the explanation of the value range of ratio in `F.dropout` (#3112), docstrings for warnings (#3115), docstrings for doctest (#3162), example code in docstrings (#3165), `Variable.__getitem__` (#3195), `F.dropout` (#3196, thanks @fiarabbit!), `chainer.Link` (#3226, thanks @chantera!), `functions.linear` (#3228, thanks @bonprosoft!), warning messages for cuDNN (#3231), `clipped_relu` (#3232), `leaky_relu` (#3238), `FunctionHook` and tutorial of `Function` (#3250), fix truncation of a summary line (#3284), "Introduction to Chainer" (#3286), BPTT example in RNN tutorial (#3291, thanks @fiarabbit!), GRU documentation where stateless/stateful were reversed (#3345), `transpose` (#3304, thanks @naoto0804!), `where` (#3309, thanks @naoto0804!), `initializers.NaN` (#3342)
- `DeprecationWarning` in the tests of `Variable` (#3164)
- `to_gpu` (#3313)
- `assert_warns` to ignore warnings (#3317)
- `exponential` (#3358)
- `chainer.training.__init__` (#3055)

Published by bkvogel about 7 years ago
This is the release candidate (RC) of v3.0.0. See here for the complete list of solved issues and merged PRs.
CuPy has also been updated to v2.0.0 RC. Please see the release notes for CuPy.
- The `use_cudnn` argument is removed from `spatial_transformer_grid` and `spatial_transformer_sampler` (#2955). You can use `chainer.using_config('use_cudnn', 'auto')` to enable cuDNN in these functions.
- `Variable.__hash__` is removed (#2961). Note that `Variable` does not support `__eq__`, so it was already not hashable.
- `cache_download` now raises `OSError` instead of `RuntimeError` on a file system error (#2839, thanks @Hakuyume!)
- `transpose` (#3144), `reshape`, `expand_dims`, `broadcast_to`, `sum` (#3188), `concat`, `split_axis` (#3189), `flatten` (#3190), `cast` (#3145), `rollaxis` (#3306), `select_item` (#3308), `__getitem__` (#3243)
- `linear` (#3099), `convolution_2d`, `deconvolution_2d` (#3163), `embed_id` (#3183), `lstm` (#3206), `sigmoid` (#3119), `relu` (#3175), `leaky_relu` (#3177), `softmax` (#3213), `log_softmax` (#3217)
- `max_pooling_2d`, `average_pooling_2d`, `upsampling_2d`, `unpooling_2d`, `spatial_pyramid_pooling_2d` (#3257)
- `-` (#3142), binary `-` (#3143), `tanh` (#3200), `exp` (#3254)
- `mean_squared_error` (#3194), `softmax_cross_entropy` (#3296)
- `dropout` (#3356, thanks @bonprosoft!)
- `layer_normalization` (#3219), `batch_normalization` and `fixed_batch_normalization` (#3275)
- `MGU` (#1101)
- `BatchRenormalization`, `batch_renormalization`, and `fixed_batch_renormalization` (#2302)
- `batch_matmul`, which existed in v2, is reimplemented for backward compatibility (#3016)
- `arctan2` (#3130)
- `prod` (#3031, thanks @ronekko!)
- `chainer.as_variable()` is added (#3218). It can be used to enforce the type of a value to be `Variable`.
- The `Variable.array` property is added (#3223). It is equivalent to `Variable.data`, but `.array` is safer; when you mix up `Variable` with `ndarray`, `.array` immediately raises an error while `.data` does not.
- `chainer.FunctionHook`, which is an alias to `chainer.function_hook.FunctionHook`, is added (#3152, #3153)
- The `grad` function (#3015). This function takes input and output variables and computes the gradient of the outputs w.r.t. the inputs.
- The `check_double_backward` utility (#3096, #3268). It can be used to numerically check if the double backprop is consistent with the first-order gradient (see the sketch after this list).
- The `axis` argument of `average` now supports tuple values (#3118)
- `numerical_grad` is improved (#2966). It now performs a numerical check of a randomly chosen directional derivative instead of the full gradient check. This change reduces the number of forward computations run for the numerical gradient to a constant independent of the input dimensionality.
- `Variable.backward()` (#3298). To enable double backprop, you have to explicitly pass `enable_double_backprop=True`. Note that when you do not need double backprop, it is better to turn off this option; `backward()` then skips constructing the computational graph of backpropagation so that the performance overhead (esp. the memory consumption) is saved.
- `rgb_format` option to `get_mnist` (#3263)
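A minimal sketch of the `check_double_backward` utility mentioned above, using `F.sin` as a conveniently twice-differentiable function:

```python
import numpy as np
import chainer.functions as F
from chainer import gradient_check

x = np.random.uniform(-1, 1, (3,)).astype(np.float32)
gy = np.random.uniform(-1, 1, (3,)).astype(np.float32)   # upstream gradient
ggx = np.random.uniform(-1, 1, (3,)).astype(np.float32)  # perturbation for the second-order check
gradient_check.check_double_backward(F.sin, x, gy, ggx)
```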
- `matplotlib.pyplot` in `PlotReport` (#2740)
- `_make_npz` for `ResNetLayers` (#3062, thanks @Hakuyume!)
- `softmax` (#3072) and `log_softmax` (#3310, thanks @knorth55!)
- `ChainList.init_scope()` (#3129)
- `VariableNode.data` from `Parameter.initialize` (#3204, thanks @bonprosoft!)
- `DictDataset` to work in Python 3 (#3237, thanks @bonprosoft!)
- `dropout` (#3239, thanks @naoto0804!)
- `CaffeFunction` to take BatchNorm scaling factor into account (#3261, thanks @hvy!)
- `params` option of `check_double_backward` (#3268, see the previous section)
- `to_gpu` (#3269)
- `fix_random` (#3330)
- `MultiprocessIterator` performance, functionality and stability, using `Pool` (#3076, thanks @grafi-tt!)
- `roi_pooling_2d` (#3185, thanks @knorth55!)
- `F.cast` skips `FunctionNode` application if no cast is needed (#3191)
- `Variable.backward` for manually edited `requires_grad` (#3192)
- `stream` option is specified in `to_gpu` (#3282)
- `upsampling_2d` on CPU (#3316)
- `enumerate` (#3326)
- `log2` and `log10` (#3352)
- `transpose` backward (#3154)
- `IntervalTrigger` are slightly reorganized (#2990, thanks @Hakuyume!)
- `get_device` from examples (#3122, thanks @naoto0804!)
- `FunctionNode`, `FunctionAdapter` (#3117), `initializers.NaN` (#3293)
- `gradient_check` (#3158), Configuration documentation (#3166, thanks @kristofbc!), `Variable.grad` (#3265), `Hyperparameter` (#3248), `BatchNormalization` (#3137)
- `clipped_relu` (#3178), `leaky_relu` (#3179), `Variable.__getitem__` (#3180), `linear` (#3224, thanks @bonprosoft!), link document (#3240), `GRU` stateless/stateful (#3340), `Updater` (#3084, thanks @fiarabbit!), backslash escaping (#3174), summary markup with periods (#3235), fix for warnings (#3068)
- `dropout` (#3184, thanks @fiarabbit!) (#3116, thanks @naoto0804), `where` (#3301, thanks @naoto0804!), `transpose` (#3302, thanks @naoto0804!), `GRU` (#3089), doctest code in training loop tutorial (#3249), `hinge` (#3108), `softmax_cross_entropy` (#3105), `LSTM` (#3104), `Linear` (#3103), `binary_accuracy` (#3102), `embed_id` (#3091)
- `DeprecationWarning` in `Variable` (#2932)
- `assert_allclose` failure (#2936)
- `Link` (#3155)
- `stream` option in `to_gpu` (#3278)
- `assert_warns` to ignore warnings (#3280)
- `relu` (#3299), `tanh` (#3305), exponentials (#3354), `unpooling_2d` (#3341), `local_response_normalization` (#3355)
- `to_gpu` (#3322)
- `get_device` (#3363)
- `.git` directory (#3077)