oneDNN

oneAPI Deep Neural Network Library (oneDNN)

APACHE-2.0 License


oneDNN - v0.18

Published by vpirogov over 5 years ago

Performance optimizations

  • Improved performance of RNN functionality.
  • Improved performance of GEMM-based convolutions.
  • Improved performance of backpropagation for strided convolutions on processors with Intel® AVX2 support.
  • Improved performance of the gemm_s8u8s32 and gemm_s8s8s32 functions on processors with Intel® AVX512 and Intel® AVX512-DL Boost instruction sets.
  • Improved inner product performance on processors with Intel AVX512 and Intel AVX512-DL Boost instruction sets.
  • Improved performance of int8 convolutions and deconvolutions on processors with Intel AVX512 and Intel AVX512-DL Boost instruction sets.

New functionality

  • Convolutions now support arbitrary elementwise operations as post-ops.
  • Introduced support of signed int8 data for the inner product primitive.
  • Introduced int8 LSTM cell support.
  • Introduced automatic dispatching between the direct and Winograd convolution algorithms.

API deprecations and breaking changes

  • Previously deprecated APIs were removed:
    • relu function
    • convolution_relu function
    • double precision scales support in sum
    • negative_slope parameter in eltwise
    • omit_stats flag in batch normalization

Usability improvements

  • Added library version information to verbose output and to headers.
  • Added information about detected instruction set to verbose output.
  • Introduced mkldnn_version function.
  • Added APIs to override behaviors controlled via environment variables, including verbose mode and JIT dump.

Thanks to the contributors

This release contains contributions from many Intel Performance Libraries developers as well as Ruslan Baratov @ruslo, Konstantin Basargin @basargin, Jacek Czaja @jczaja, Eugene Zhulenev @ezhulenev, Haitao Feng @fenghaitao, Yinghai Liu @yinghai, Masahiro Sakai @msakai, and Alexander Grund @Flamefire. We would also like to thank everyone who asked questions and reported issues.

oneDNN - v0.17.4

Published by tprimak over 5 years ago

This is a patch release containing the following change to Intel MKL-DNN v0.17.3:

  • Fix bug in build system for old versions of CMake (61f953e079a6fdec9ef123d6fa7d36242fe797f1)

oneDNN - v0.18-rc

Published by tprimak over 5 years ago

This is a release candidate package for MKL-DNN v0.18. Please provide feedback and report bugs in GitHub issues.

oneDNN - v0.17.3

Published by tprimak over 5 years ago

This is a patch release containing the following changes to MKL-DNN v0.17.2:

  • Fix integer overflow in GEMM (059b5fdd2254733676a1757ffd3ecd2129f90b71)
  • Update Xbyak* to 5.751 (4f809d0ef7a1dde12498b91bd86c985448143a25)

oneDNN - v0.17.2

Published by tprimak almost 6 years ago

This is a patch release containing the following changes to MKL-DNN v0.17.1:

  • Fix data race during initialization in the GEMM-based convolution (763513eb83ce5eb120b38aee16ac185b53fa8b91)
  • Fix number of dimensions of a tensor in the backward deconvolution primitive descriptor (5a0a50c2cba0f42c8c54ea07c0b2d9b83bb6cb22)
  • Fix Valgrind* complaints (ed4b08c3c3b0ab5aac6d47a138bde4cf49055374)

oneDNN - v0.17.1

Published by tprimak almost 6 years ago

This is a patch release containing the following change to MKL-DNN v0.17:

  • Tentatively turn on reference direct copy reorder for GNU* Compiler Collection (567dfb52326100bceec483aa42e6ea9681505733)

oneDNN - v0.17

Published by tprimak almost 6 years ago

Performance optimizations

  • Improved int8 convolutions performance on processors with Intel® AVX512-DL Boost instruction set support.
  • Improved performance of fp32 convolutions with the number of input and output channels not divisible by the SIMD width for processors with Intel® AVX2 instruction set support.
  • Improved performance of Recurrent Neural Networks (RNNs) functionality.
  • Improved performance of int8 deconvolution.
  • Added optimizations for fp32 inference and training for processors with Intel® AVX instruction set support.
  • Added optimizations for convolutions and auxiliary primitives with 3D spatial data for processors with Intel® AVX2 instruction set support.
  • Improved int8 Winograd convolution performance for real-time inference use cases.

New functionality

  • Introduced int8 data-type support for inner-product primitive.
  • Introduced support for int8 convolutions with signed input and signed weights.
  • Introduced 1D spatial data support in convolution and auxiliary primitives. This functionality is optimized for processors with Intel® AVX512 instruction set support.
  • Introduced the Shuffle primitive.
  • Introduced a general-purpose matrix-matrix multiplication function for int8 data (gemm_s8u8s32 and gemm_s8s8s32).
  • Feature preview: Threading Building Blocks (TBB) support.
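
The integer GEMM functions operate on quantized matrices with zero-point offsets and 32-bit accumulation. The sketch below is an illustrative plain-Python reference for what such a function computes, not the library's actual signature (which also takes transposition flags, leading dimensions, and an offset for the result matrix); all parameter names here are hypothetical:

```python
# Illustrative reference for an integer GEMM with zero-point offsets,
# in the spirit of gemm_s8u8s32: out = alpha * (A + ao) @ (B + bo) + beta * C,
# where A is int8, B is uint8, and accumulation happens in int32.
def gemm_s8u8s32_ref(A, B, C, alpha=1, beta=0, ao=0, bo=0):
    M, K = len(A), len(A[0])
    N = len(B[0])
    out = [[0] * N for _ in range(M)]
    for i in range(M):
        for j in range(N):
            acc = 0  # int32 accumulator
            for k in range(K):
                acc += (A[i][k] + ao) * (B[k][j] + bo)
            out[i][j] = alpha * acc + beta * C[i][j]
    return out
```

The offsets let quantized (shifted) int8/uint8 data be multiplied as if it were centered, which is the building block the int8 convolutions above rely on.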

API deprecations and breaking changes

  • The order of the gates for LSTM cells was changed to: input, forget, candidate, output. Code relying on the previous gate order will produce incorrect results until the weights are reordered to match.
  • Creating a backward RNN primitive without a forward primitive hint in the C++ API is deprecated.
  • Int8 Winograd convolution behavior with respect to scales is aligned with the direct convolution algorithm.
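
For reference, the new gate order maps onto the standard LSTM cell update as sketched below. Weight application is omitted, and the function names are illustrative, not part of the library API:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# One LSTM cell step, given pre-activation gate values already laid out
# in the v0.17 order: input, forget, candidate, output.
def lstm_cell(gates, c_prev):
    i, f, g, o = gates          # input, forget, candidate, output
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = math.tanh(g)
    c = f * c_prev + i * g      # new cell state
    h = o * math.tanh(c)        # new hidden state
    return h, c
```

Mixing up the gate order leaves the math well-defined but semantically wrong (e.g. the forget gate would act as the input gate), which is why the reordering is a silent breaking change.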

Usability improvements

  • Primitives now accept tensors with a zero-sized dimension and perform no computation in that case.
  • Added support for clang sanitizers.
  • Build system extended with the following capabilities:
    • Allow building with static Intel MKL by passing -DMKLDNN_USE_MKL=FULL:STATIC to cmake
    • Allow specifying which Intel MKL flavor to use by passing -DMKLDNN_USE_MKL={DEF,NONE,ML,FULL} to cmake
    • Allow using the compiler's OpenMP runtime by passing -DMKLDNN_THREADING=OMP:COMP to cmake
    • Allow building a static library by passing -DMKLDNN_LIBRARY_TYPE=STATIC to cmake

Thanks to the contributors

This release contains contributions from many Intel Performance Libraries developers as well as Dmitry Baksheev @dbakshee, Yuta Okamoto @okapies, and Eduardo Gonzalez @wmeddie. We would also like to thank everyone who asked questions and reported issues.

*Other names and brands may be claimed as the property of others.

oneDNN - v0.17-rc

Published by tprimak almost 6 years ago

This is a release candidate package for MKL-DNN v0.17. It is made available for testing by the community. Please provide feedback and report bugs in GitHub issues.

oneDNN - v0.16

Published by tprimak about 6 years ago

Performance optimizations

  • Improved performance of int8 convolutions with the number of input and output channels not divisible by the SIMD width on Intel(R) Xeon processors with Intel(R) AVX512 instruction set support.
  • Winograd convolutions optimized for fp32 real time inference on Intel(R) Xeon processors with Intel(R) AVX512 instruction set support.
  • Optimized weights update of dilated convolutions for fp32 data type on Intel(R) Xeon processors with Intel(R) AVX512 instruction set support.
  • Improved performance of reorder primitive for int8 data type.

New functionality

  • Added dilation support for deconvolution (transposed convolution) primitive.
  • Introduced deconvolution (transposed convolution) primitive for int8 data type.

API deprecations and breaking changes

  • The default behavior of GEMM-based convolutions was changed. They now use internally allocated thread-local scratchpad memory for im2col and col2im operations, weights reduction, and accumulation. This may cause correctness issues when multiple GEMM-based convolutions are created in one thread and executed concurrently in different threads. To support concurrent execution, the MKL-DNN library must be configured with the -DMKLDNN_ENABLE_CONCURRENT_EXEC=TRUE CMake flag.
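
The im2col transform mentioned above is what that scratchpad holds: each output position of the convolution becomes a row of a matrix containing the input patch it reads, so the whole convolution reduces to one matrix multiply. A minimal single-channel, stride-1, no-padding sketch (illustrative, not the library's implementation):

```python
# im2col lowers a 2-D convolution to a matrix multiply: each output
# position becomes a row holding the input patch it reads. A GEMM-based
# convolution materializes this matrix in scratchpad memory, which is
# why concurrent executions need their own (thread-local) scratchpads.
def im2col(x, kh, kw):
    H, W = len(x), len(x[0])
    oh, ow = H - kh + 1, W - kw + 1
    rows = []
    for i in range(oh):
        for j in range(ow):
            patch = [x[i + di][j + dj] for di in range(kh) for dj in range(kw)]
            rows.append(patch)
    return rows  # shape (oh*ow, kh*kw); conv = rows @ flattened weights
```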

Thanks to the contributors

This release contains contributions from many Intel(R) Performance Libraries developers as well as Yasser Zamani @Yasserzamani and Loo Rong Jie @Rongjiecomputer. We would also like to thank everyone who asked questions and reported issues.

*Other names and brands may be claimed as the property of others.

oneDNN - v0.15

Published by tprimak over 6 years ago

Performance optimizations

  • Improved fp32 convolutions performance for real time inference on Intel(R) Xeon processors with Intel(R) AVX512 instruction set support
  • Improved int8 depthwise separable convolutions performance on processors with Intel(R) AVX512 instruction set support
  • Improved 3D convolution performance on Intel(R) Xeon Phi(TM) processors with AVX512_4FMAPS and AVX512_4VNNIW instruction groups support
  • Optimized dilated convolutions for int8 and fp32 data types
  • Improved performance of pooling primitives for NHWC and NCHW data layouts
  • Improved performance of 3D pooling primitives for plain data layouts
  • Optimized batch normalization backpropagation for Intel(R) processors with AVX and SSE4.2 instruction groups support
  • Improved performance of batch normalization with 3D spatial data

New functionality

  • Feature preview: Introduced training and inference support for GRU cells for recurrent neural network (RNN)
  • Introduced general purpose SGEMM API
  • Introduced deconvolution (or transposed convolution) primitive for 3D spatial data
  • Introduced backward propagation for softmax primitive

Thanks to the contributors

This release contains contributions from many Intel(R) Performance Libraries developers as well as Tuomas Kärnä @tkarna, @msakai, Can Balioglu @cbalioglu, Jacek Czaja @jczaja, Thejan Wijesinghe @ThejanW, Jesse Nicholson @TechnikEmpire, @okdshin, Crissman Loomis @Crissman. We would also like to thank everyone who asked questions and reported issues.

*Other names and brands may be claimed as the property of others.

oneDNN - v0.14

Published by tprimak over 6 years ago

Performance optimizations

  • Improved fp32 Winograd convolution performance on Intel Xeon processors with Intel(R) AVX512 instruction set support.
  • Improved depthwise separable convolutions performance on processors with Intel(R) SSE 4.2, Intel(R) AVX and Intel(R) AVX512 instruction sets support.
  • Improved performance of GEMM-based convolutions backward propagation.
  • Improved performance of auxiliary primitives for NHWC and NCHW data layouts.

New functionality

  • Feature preview: Introduced recurrent neural network (RNN) support. This release includes training and inference support for uni- and bi-directional vanilla RNN and Long Short-Term Memory (LSTM) cells. Use of the new API is demonstrated with an example featuring LSTM model inference with attention based on Google Neural Machine Translation (GNMT) topology.
  • Added Winograd convolution implementation for int8 data type optimized for Intel Xeon processors with Intel AVX512 instruction set support. The implementation includes initial optimizations for future Intel Xeon processors with AVX512_VNNI instruction groups support.
  • Introduced deconvolution (or transposed convolution) primitive
  • Introduced support for 3D spatial data in convolution and auxiliary primitives. The following primitives are optimized for 3D tensors:
    • reorders
    • convolution
    • deconvolution
    • batch normalization
    • pooling
    • eltwise
    • concat
    • inner product

Usability improvements

  • Added build system flags -DWITH_TEST=OFF and -DWITH_EXAMPLE=OFF that disable building tests and examples.
  • Added -DLIB_SUFFIX flag that allows adding a suffix to the lib directory name.
  • Added prepare_mkl.bat script that automates downloading the Intel MKL small libraries on Windows.

Thanks to the contributors

This release contains contributions from many Intel(R) Performance Libraries developers as well as Zhong Cao @4pao, Dmitriy Gorokhov, Jian Tang @tensor-tang, Daniel M. Weeks @doctaweeks, Tony Wang @tonywang1990, Tao Lv @TaoLv and Xinyu Chen @xinyu-intel. We would also like to thank everyone who asked questions and reported issues.

*Other names and brands may be claimed as the property of others.

oneDNN - v0.13

Published by tprimak over 6 years ago

Performance optimizations

  • Added optimizations for future Intel(R) Xeon(R) processors with AVX512_VNNI instruction groups support. New instructions are used in direct convolutions with int8 and int16 data types.
  • Improved performance of int8 direct forward convolution on Intel Xeon processors with Intel AVX512 instruction set.
  • Improved performance of grouped convolutions and depthwise separable convolutions.
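
A depthwise separable convolution factors a regular convolution into a per-channel spatial filter (depthwise) followed by a 1x1 pointwise channel mix, which is far cheaper than a full convolution. A 1-D sketch for illustration; the function and parameter names are hypothetical:

```python
# Depthwise separable convolution, 1-D for clarity:
#   x:  [channels][length]       input signal
#   dw: [channels][k]            one spatial filter per input channel
#   pw: [out_channels][channels] 1x1 pointwise (channel-mixing) weights
def depthwise_separable_1d(x, dw, pw):
    C, L = len(x), len(x[0])
    k = len(dw[0])
    # Depthwise step: each channel is convolved with its own filter.
    mid = [[sum(x[c][i + t] * dw[c][t] for t in range(k))
            for i in range(L - k + 1)] for c in range(C)]
    # Pointwise step: mix channels at each position with a 1x1 convolution.
    return [[sum(pw[o][c] * mid[c][i] for c in range(C))
             for i in range(len(mid[0]))] for o in range(len(pw))]
```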

New functionality

  • Extended Batch Normalization to enable fused ReLU on forward and backward propagation.

Thanks to the contributors

This release contains contributions from many Intel(R) Performance Libraries developers as well as Patric Zhao @pengzhao-intel, Ashok Emani @ashokei, Erik Kruus @kruus and Dmitriy Gorokhov. We would also like to thank everyone who asked questions and reported issues.

*Other names and brands may be claimed as the property of others.

oneDNN - v0.12

Published by vpirogov almost 7 years ago

Performance optimizations

  • Improved performance of fp32 direct and Winograd convolution on Intel(R) Xeon(R) processors with Intel(R) Advanced Vector Instructions 512 (Intel(R) AVX512) support
  • Improved performance of int8 direct convolution on Intel Xeon processors with Intel AVX512 instruction set
  • Improved batch normalization performance on Intel Xeon processors with Intel AVX512 instruction set
  • Optimized dilated convolution backward propagation
  • Improved initialization time of GEMM-based convolution implementations

New functionality

  • Support for int8 inference. These functions support int8 data type:
    • reorders (including quantization and dequantization)
    • convolution
    • pooling
    • eltwise
    • sum
    • concat
  • Layer fusion support with the new post-ops API. Functions that support fusion:
    • forward convolution with eltwise for inference and training
    • convolution with sum for inference
    • batch normalization with eltwise for training
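
The effect of a fused post-op can be sketched as applying the elementwise function while each output value is produced, instead of in a second pass over memory. An illustrative 1-D sketch, not the library API:

```python
# A 1-D convolution with an optional fused post-op. Fusion means the
# elementwise function is applied while the result is still in registers,
# avoiding a separate read-modify-write pass over the output tensor.
def conv1d(x, w, post_op=None):
    k = len(w)
    out = []
    for i in range(len(x) - k + 1):
        acc = sum(x[i + t] * w[t] for t in range(k))
        out.append(post_op(acc) if post_op else acc)
    return out

relu = lambda v: max(v, 0)
# conv1d(x, w, post_op=relu) gives the same values as applying ReLU to
# conv1d(x, w) afterwards, but in a single pass.
```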

API deprecations and breaking changes

  • The ReLU primitive is deprecated. The functionality is now part of the eltwise primitive.
  • The merged convolution/ReLU primitive is deprecated. The functionality is available through the new post-ops API.

Thanks to the contributors

This release contains contributions from many Intel(R) Performance Libraries developers as well as @kruus, Yong Wu, Daoxin Pan, and Zhiming Wang. We would also like to thank everyone who asked questions and reported issues.

* Other names and brands may be claimed as the property of others.

oneDNN - v0.11

Published by vpirogov almost 7 years ago

Performance optimizations

  • Improved convolution performance on future Intel(R) Xeon Phi(TM) processors with AVX512_4FMAPS and AVX512_4VNNIW instruction groups support
  • Improved convolution performance on Intel(R) Xeon processors with Intel(R) AVX512 instruction set support
  • Improved performance of GEMM-based convolutions for small minibatches
  • Improved performance of Winograd convolution algorithm on Intel Xeon Phi processors.

New functionality

  • Added backpropagation support for dilated convolution.
  • The eltwise primitive is extended with support for square, abs, square root, linear, bounded ReLU, soft ReLU, and logistic functions.
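
The listed algorithms follow their standard mathematical definitions. An illustrative scalar sketch, where alpha and beta play the role of the primitive's scalar parameters (the dispatch-by-name function is hypothetical):

```python
import math

# Standard scalar definitions of the eltwise algorithms.
def eltwise(kind, x, alpha=1.0, beta=0.0):
    if kind == "square":       return x * x
    if kind == "abs":          return abs(x)
    if kind == "sqrt":         return math.sqrt(x)
    if kind == "linear":       return alpha * x + beta
    if kind == "bounded_relu": return min(max(x, 0.0), alpha)  # clamp to [0, alpha]
    if kind == "soft_relu":    return math.log1p(math.exp(x))  # smooth ReLU
    if kind == "logistic":     return 1.0 / (1.0 + math.exp(-x))
    raise ValueError(kind)
```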

Usability improvements

  • Added macOS* support.

Breaking changes to the API

  • All real-value op descriptors' parameters now have float data type (previously double). The change breaks C-API backward compatibility for sum primitive. Please refer to 0bbb22e878a679ca870dc139b5c85e60e5ab78d3 for details. C++ API maintains backward compatibility.

Thanks to the contributors

This release contains contributions from many Intel(R) Performance Libraries developers as well as Yu Yang @reyoung, Vladimir Mironov @vamironov, Nishant Patel @nbpatel, Leona Cook @indie, Jayaram Bobba @jbobba, Elena Gvozdeva. We would also like to thank everyone who asked questions and reported issues.

* Other names and brands may be claimed as the property of others.

oneDNN - v0.10

Published by vpirogov about 7 years ago

Performance optimizations

  • Improved performance on processors with Intel(R) AVX512 instruction set support
  • Added optimizations for future Intel(R) Xeon Phi(TM) processors with AVX512_4FMAPS and AVX512_4VNNIW instruction groups support

New functionality

  • Added support of Winograd convolution algorithm. The implementation has initial optimizations for Intel Xeon Phi processors with Intel AVX512 instruction set support.
  • Introduced elementwise primitive with 3 types of activations: ReLU (rectified linear unit), ELU (parametric exponential linear unit) and TANH (hyperbolic tangent non-linearity).
  • Added dilation support to forward convolution. The implementation is optimized for processors with Intel(R) SSE 4.2 and Intel(R) AVX instruction sets support.
  • Feature preview: Added int16 support in convolution, ReLU, pooling and inner product for training. Added optimized s16s16s32 convolution flavor for future Intel Xeon Phi processors.
  • Feature preview: Added optimized pooling with int8 support.
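
Dilation inserts gaps between filter taps, enlarging the receptive field without adding weights. A 1-D reference sketch (illustrative, not the library API):

```python
# Dilated 1-D convolution: the filter reads the input with a stride of
# `dilation` between taps, so a k-tap filter spans (k-1)*dilation + 1
# input elements. dilation=1 reduces to an ordinary convolution.
def dilated_conv1d(x, w, dilation=1):
    k = len(w)
    span = (k - 1) * dilation + 1  # effective filter footprint
    return [sum(x[i + t * dilation] * w[t] for t in range(k))
            for i in range(len(x) - span + 1)]
```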

Usability improvements

  • Added Windows* support.
  • Added benchdnn test suite for comprehensive functional and performance testing of convolutions. The suite supports int8, int16 and fp32 data types.
  • Primitive implementation information can be queried using impl_info_str.

Deprecated functionality

  • ReLU primitive is deprecated and will be removed in future releases. Activation functions including ReLU are implemented in elementwise primitive.

Thanks to the contributors

This release contains contributions from many Intel(R) Performance Libraries developers as well as Guenther Schmuelling @guschmue, Yong Wu, Dmitriy Gorokhov, Menon Jaikrishnan, Erik @kruus, Zhong Z Cao @4pao, Gleb Gladilov and @tensor-tang. We would also like to thank everyone who asked questions and reported issues.

* Other names and brands may be claimed as the property of others.

oneDNN - v0.9

Published by vpirogov over 7 years ago

Performance optimizations

  • Improved performance on processors with Intel(R) AVX2 instruction set support
  • Improved performance on processors with Intel(R) AVX512 instruction set support
  • Added optimizations for Intel(R) Xeon processors with Intel AVX512 instruction set support
  • Added inference optimizations for Intel(R) Atom processors with Intel(R) SSE4.2 support
  • Added JIT implementation of SGEMM for Intel(R) Xeon Phi(TM) processors.

New functionality

  • Average pooling supports 'exclude padding' mode
  • LRN supports arbitrary local size
  • Feature preview: Added int8 support in convolution, ReLU, pooling and inner product. Added optimized u8s8u8 convolution flavor for Intel Xeon processors with Intel AVX512 instruction set support.
  • Feature preview: Added int16 support in convolution, ReLU, pooling and inner product. Added optimized s16s16s32 convolution flavor for future Intel Xeon Phi processors.
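
The difference between the two averaging modes shows up at padded borders: 'include padding' always divides by the full window size, while 'exclude padding' divides by the number of in-bounds elements, so border outputs are not biased toward zero. An illustrative 1-D sketch with hypothetical names:

```python
# 1-D average pooling over a zero-padded input, stride 1.
def avg_pool1d(x, k, pad, exclude_padding):
    out = []
    for i in range(-pad, len(x) - k + 1 + pad):
        # Only the in-bounds elements of the window; padding contributes 0.
        vals = [x[i + t] for t in range(k) if 0 <= i + t < len(x)]
        denom = len(vals) if exclude_padding else k
        out.append(sum(vals) / denom)
    return out
```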

Usability improvements

  • Improved build system to enable integration into other projects.
  • Intel(R) OpenMP runtime is used when the library is built with the binary dependency.
  • Added a feature-based dispatcher to support a wide range of Intel(R) processors and compatible CPUs.

Thanks to the contributors

This release contains contributions from many Intel(R) Performance Libraries developers as well as Ismo Puustinen @ipuustin, Dmitry Gorokhov, Vladimir Dudnik @vladimir-dudnik, @pruthviIntel, and Chris Olivier @cjolivier01. We would also like to thank everyone who asked questions and reported issues.

oneDNN - v0.7

Published by vpirogov over 7 years ago

Changes:

  • Improved performance on processors with Intel(R) AVX2 instruction set support
  • Improved performance on processors with Intel(R) AVX512 instruction set support
  • Extended backward propagation optimizations for Intel(R) AVX2 and Intel AVX512 instruction sets
  • Added SGEMM-based reference convolution implementation significantly improving performance for cases not covered by JIT convolution
  • Added JIT version of SGEMM function for Intel(R) AVX2 instruction set. This change allows building an optimized Intel(R) MKL-DNN without the binary component.
  • Added backward propagation examples

oneDNN - v0.5

Published by vpirogov over 7 years ago

Changes:

  • Added runtime CPUID dispatching mechanism
  • Added initial Intel(R) AVX512 optimizations
  • Improved performance on processors with Intel(R) AVX2 instruction set support
  • Added initial backward propagation optimizations
  • Extended batch normalization primitive API with scale/shift and mean/variance parameters
  • Updated Xbyak to version 5.40

oneDNN - v0.3

Published by vpirogov almost 8 years ago

Changes:

  • Added sum primitive
  • Added backward propagation reference implementation

oneDNN - v0.2

Published by vpirogov about 8 years ago

Changes:

  • Added batch normalization
  • Added split and concat
  • Added local response normalization (LRN) inside the channel
  • Added average pooling