oneDNN

oneAPI Deep Neural Network Library (oneDNN)

Apache-2.0 License

oneDNN - v0.21.1

Published by tprimak about 5 years ago

This is a patch release containing the following changes to Intel MKL-DNN v0.21:

  • Fixed output channel blocking logic in forward AVX2 convolution that could lead to incorrect results or a segfault (6accb47c4588ab6f0c350117faf7f26e850446d2)
  • Fixed int8 grouped convolution for some shapes with the number of input or output channels not being a multiple of 8 on Intel AVX512 systems (878ac2d4b2d561b44a9c2dc19f6988a7da0a71a6)

oneDNN - v0.21

Published by vpirogov about 5 years ago

Performance optimizations

  • Improved int8 and fp32 GEMM and inner product performance.
  • Improved reorder performance for certain shapes.
  • Improved RNN, LSTM, GRU and LBR-GRU training performance.

New functionality

  • Added GELU activation support.
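
For reference, GELU is defined as GELU(x) = x * Φ(x), where Φ is the standard normal CDF. A minimal Python sketch of the math using the erf-based closed form (illustrative only, not oneDNN's implementation):

```python
import math

def gelu(x: float) -> float:
    # GELU(x) = x * Phi(x), with Phi the standard normal CDF,
    # expressed via the error function: Phi(x) = 0.5 * (1 + erf(x / sqrt(2))).
    # Illustrative sketch; not oneDNN's actual kernel code.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))
```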

Thanks to the contributors

This release contains contributions from many Intel Performance Libraries developers. We would also like to thank everyone who asked questions and reported issues.

oneDNN - v1.1-rc

Published by vpirogov about 5 years ago

This is a release candidate for DNNL v1.1. Please provide feedback and report bugs in GitHub issues.

oneDNN - v0.20.5

Published by vpirogov about 5 years ago

This is a patch release containing the following changes to Intel MKL-DNN v0.20.4:

  • Fixed out-of-bounds memory access in GEMM-based grouped convolution weight update (3deeafa47e73fafac7943fbc05c076cdc7247c9d)
  • Fixed segmentation fault in AVX512 convolution for effective negative padding (f231ada6fb67c6bb7e31befcea6fb8b3e88b50ca)
  • Fixed correctness issue in strided depthwise convolution (d7484cbd7b95a969552b295ed2f160ce7246f5fd)

oneDNN - v0.21-rc

Published by anita-intel about 5 years ago

This is a release candidate for Intel MKL-DNN v0.21. Please provide feedback and report bugs in GitHub issues.

oneDNN - v0.20.4

Published by tprimak about 5 years ago

This is a patch release containing the following changes to Intel MKL-DNN v0.20.3:

  • Fixed memory corruption issue in backward convolution with a 1x1 kernel and asymmetric strides (095ddb840721b313612035b60d45cbbee12e270f)
  • Fixed correctness issue in backward convolution (eb330079e36fccfd93e30ec9e9580320d2ff4c41)

oneDNN - v0.20.3

Published by tprimak about 5 years ago

This is a patch release containing the following changes to Intel MKL-DNN v0.20.2:

  • Fixed correctness issue in backward pooling with 3D spatial data and negative right padding (c0ddfec6c0d82d51934aa69a4c26f5f9e0145799)

oneDNN - v0.20.2

Published by tprimak about 5 years ago

This is a patch release containing the following changes to Intel MKL-DNN v0.20.1:

  • Fixed issue with bfloat16 instructions detection in Xbyak (b59bf2ec38bebd86b73aa59054f735e0fe3fc6ba)
  • Fixed offset calculation issue in weight update depthwise convolution in fp32 and bfloat16 kernels (ddc54e509cc7e62ed69e74247b339842f4ae3fe8, 0982b250fd0bcbc7f972c8fa0be13b5956b78560)
  • Added a check that the size of the generated kernel doesn't exceed the maximum allowed bound in the fp32 forward and backward kernels (24abe206f31a0b5f09471c63fabdf8d113a51e6c)
  • Various fixes in RNN primitive:
    • Avoided unaligned pointer usage in VEX instructions in the GRU cell (8eb14f518b900e8abcb6e9c2acb68e6fa013eb41)
    • Addressed bugs in tests for RNNs (fa534ef28728dbb2f47859fa42fbcc1fc928559a, 3ac4db45098c51dd0b56cd1ba9767360a8f5bbcd)
    • Fixed potential integer overflow (35c5f8a90209532f3fffa1b5e048fb6f3cd0d879)

oneDNN - v1.0.2

Published by tprimak about 5 years ago

This is a patch release containing the following changes to Intel MKL-DNN v1.0.1:

  • Fixed issue with bfloat16 instructions detection in Xbyak (0f4ba114397358416917e7a078770811a395aa5b)
  • Fixed buffer size in packed GEMM (9764940d0a7081040dc819e13db96df6b85b32a5)
  • Fixed offset calculation issue in weight update depthwise convolution in fp32 and bfloat16 kernels (6b9d41242f0f5eb4ac13245ff29f32f39ae3ae6b, 061499d4af30d0a4f44bd6819e9d673ad68b4b6a)
  • Added a check that the size of the generated kernel doesn't exceed the maximum allowed bound in the fp32 forward and backward kernels (67e8cd2da7f313f7246c0ae599b42107d30e37d6)
  • Various fixes in RNN primitive:
    • Proper handling of packed GEMM in extended GEMM (4eb9f5621677e1952cf851ac6514ce7e76156f37)
    • Force no-copy GEMM only for Intel AVX+ systems (2fbc8ba5e4d02122d730d95b2e4af1a741d8599b)
    • Avoided unaligned pointer usage in VEX instructions in the GRU cell (a147c08f728b8d85aff6bb282532944dd2729c1f)
    • Fixed wrong dimension when creating GEMM primitive descriptor in reference RNN implementation for GPU (eb3c866d3b23aa38d5cf9f210090075df288b461)
    • Fixed Tanh backward calculation in GPU RNN reference implementation (f6e4b97242cc716d2ddd06f87922a62105b6c729)
    • Fixed pack GEMM dispatching for int8 (16b46c7d11ec4e3205b6d2de4b0fa3b02dfa1086)
    • Addressed bugs in tests for RNNs (cf83e83fe7706e273b56ea37a66f88774acebf03, f7c2de21325ceee4bd502b0fb5151677a8423cf4, 960f3f3e7bc21903d614776aa2e5c32d841c615f)

oneDNN - v1.0.1

Published by vpirogov about 5 years ago

This is a patch release containing the following changes to Intel MKL-DNN v1.0.0:

  • Updated version and soversion to <major>.<minor> and <major>, respectively (952a7778f8a624eb31c3bb79c2f299d0953f4030)

oneDNN - v1.0

Published by anita-intel over 5 years ago

Performance optimizations

  • Added SGEMM copy-based kernels for the Intel SSE 4.1, Intel AVX, Intel AVX2, and Intel AVX512 architectures. With this optimization Intel MKL-DNN's JIT SGEMM implementation achieves performance comparable to Intel MKL.
  • Improved GEMM performance for n=1.
  • Improved performance of s8s8s32 GEMM.

New functionality

  • Introduced Intel Processor Graphics support covering fp16 and fp32 inference, and fp32 training. Intel MKL-DNN relies on the OpenCL* runtime to execute computations on Intel Processor Graphics and provides interoperability with the user's OpenCL code.
  • Added post-ops support in Inner Product and GEMM-based convolution.
  • Introduced bfloat16 training and inference support in reorders, (de-)convolution, pooling, batch normalization, local response normalization, eltwise, inner product, shuffle, sum, and concat. The implementation relies on new instructions targeting the future Intel Xeon Scalable processor (codename Cooper Lake). On Intel Xeon processors with Intel AVX512 support, bfloat16 arithmetic is emulated.
  • Added GELU activation support.
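
For context, a bfloat16 value keeps fp32's sign and 8 exponent bits but only 7 mantissa bits, so every bfloat16 is exactly representable in fp32. A minimal Python sketch of the conversion with round-to-nearest-even (illustrative only; it ignores NaN/Inf handling and is not oneDNN's emulation code):

```python
import struct

def f32_to_bf16_bits(x: float) -> int:
    # Reinterpret the fp32 value as a 32-bit pattern, then round to the
    # nearest bfloat16 (keep the sign, 8 exponent bits, and top 7 mantissa
    # bits). Round-to-nearest-even on the 16 discarded bits.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF

def bf16_bits_to_f32(b: int) -> float:
    # Widening is exact: place the 16 bfloat16 bits in the top half
    # of an fp32 bit pattern.
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]
```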

Usability improvements

  • Introduced a new developer guide and new examples.
  • Removed the dependency on Intel MKL (or Intel MKL small libraries), as the JIT implementation delivers comparable performance.
  • Introduced explicit scratchpad management.
  • Lowered the requirements for Intel SSE4 optimizations to Intel SSE 4.1.
  • Added out-of-the-box Intel VTune profiling support.
  • Introduced binary distribution.

Breaking changes to the API

This is a major release that introduces several breaking changes. See v1.0 transition guide for the full list of changes and replacement functions.

  • Removed previously deprecated APIs.
  • Removed experimental s16 data type support.
  • Removed unused parameters rounding_mode and padding_kind.
  • Removed the view primitive. The functionality is supported directly by the memory descriptor.
  • Split the RNN primitive into separate primitives for each cell type.
  • Separated cell states and hidden states in the LSTM cell.
  • Changed the matrix layout in GEMM to row-major and the calling convention to C-style.
  • Changed the offset handling in integer GEMM (the offsets are now subtracted from matrices A and B).
  • Changed the execution API to accept memory buffers at primitive execution time.
  • Simplified the memory descriptor and removed the memory primitive descriptor entity.
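
The integer GEMM offset semantics above can be sketched in plain Python: with row-major matrices, the result is computed as if the offsets ao and bo were subtracted from every element of A and B before multiplication. The function name and parameters below are hypothetical, not the library's API:

```python
def igemm_rowmajor(M, N, K, A, ao, B, bo, C_offsets):
    # Row-major flat lists: A is M x K, B is K x N, C is M x N.
    # ao and bo are subtracted from each element of A and B, matching the
    # v1.0 integer-GEMM semantics; C_offsets are added to the int32 result.
    # Illustrative reference sketch only (no blocking or vectorization).
    C = [0] * (M * N)
    for i in range(M):
        for j in range(N):
            acc = 0
            for k in range(K):
                acc += (A[i * K + k] - ao) * (B[k * N + j] - bo)
            C[i * N + j] = acc + C_offsets[i * N + j]
    return C
```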

Thanks to the contributors

This release contains contributions from many Intel Performance Libraries developers as well as Andrew Senkevich, Benjamin Fitch, Nathan Greeneltch @nathan-greeneltch-intel, Ilia Taraban, Shigeo Mitsunari @herumi, Nikolay Tyukaev, Ivan Samsonov, Kalina Tomasz, @basargin and Louie Tsai @louie-tsai. We would also like to thank everyone who asked questions and reported issues.

*Other names and brands may be claimed as the property of others.

oneDNN - v0.20.1

Published by anita-intel over 5 years ago

This is a patch release containing the following changes to Intel MKL-DNN v0.20.0:

  • Addressed static initialization order issue in bf16 converters (aef88b7c233f48f8b945da310f1b973da31ad033)
  • Fixed out-of-bounds memory access in LRN implementation for Intel AVX2 (1a5eca7bf7ca913d874421912370fc852ddfe986)

oneDNN - v0.20

Published by anita-intel over 5 years ago

Performance optimizations

  • Improved GEMM-based convolutions performance.
  • Improved softmax performance.
  • Added arbitrary eltwise fusion support in GEMM-based convolutions and inner product.

New functionality

  • Introduced bfloat16 data type support in reorders, (de-)convolution, pooling, batch normalization, local response normalization, eltwise, inner product, shuffle, sum, and concat. The implementation relies on new instructions targeting the future Intel Xeon Scalable processor (codename Cooper Lake). On processors with Intel AVX512 support, bfloat16 arithmetic is emulated.

Thanks to the contributors

This release contains contributions from many Intel Performance Libraries developers. We would also like to thank everyone who asked questions and reported issues.

oneDNN - v1.0-rc

Published by tprimak over 5 years ago

This is a release candidate for Intel MKL-DNN v1.0. Please provide feedback and report bugs in GitHub issues.

oneDNN - v0.20-rc

Published by tprimak over 5 years ago

This is a release candidate for Intel MKL-DNN v0.20. Please provide feedback and report bugs in GitHub issues.

oneDNN - v0.19

Published by tprimak over 5 years ago

Performance optimizations

  • Improved int8 convolutions performance for small batch cases.
  • Improved performance of grouped convolutions with the number of channels in a group being a multiple of 4.
  • Improved GEMM-based convolutions performance.
  • Improved performance of RNN cells.
  • Improved SGEMM performance for Intel® AVX2 and Intel® AVX512 instruction sets.

New functionality

  • Introduced int8 support in 1D convolution, deconvolution, inner product, and batch normalization.

Usability improvements

  • Added CMake package configuration file.

Thanks to the contributors

This release contains contributions from many Intel Performance Libraries developers as well as Haitao Feng @fenghaitao, Klein Guillaume @guillaumekln, Alexander Grund @Flamefire, Rui Xia @harrysummer, and Shigeo Mitsunari @herumi. We would also like to thank everyone who asked questions and reported issues.

oneDNN - v0.19-rc

Published by tprimak over 5 years ago

This is a release candidate for Intel MKL-DNN v0.19. Please provide feedback and report bugs in GitHub issues.

oneDNN - v1.0-pc2

Published by tprimak over 5 years ago

This is preview candidate 2 for Intel MKL-DNN v1.0.

It introduces support for Intel(R) Processor Graphics and implements the changes announced in the v1.0 RFC. Please provide feedback and report bugs in GitHub issues.

oneDNN - v0.18.1

Published by tprimak over 5 years ago

This is a patch release containing the following changes to Intel MKL-DNN v0.18.0:

  • Fixed a bug in the build system that broke transitive linking when the library is used as a subproject (245b331e5ef4962f6bffdff2d207b185e362a58a)
  • Fixed bias conversion in int8 GEMM-based convolution (9670998b82b3e5e1ddb1bf052654b39a890b28ca)

oneDNN - v1.0-pc

Published by tprimak over 5 years ago

This is a preview candidate for Intel MKL-DNN v1.0.

The preview candidate implements the changes announced in the v1.0 RFC. Please provide feedback and report bugs in GitHub issues.