oneDNN

oneAPI Deep Neural Network Library (oneDNN)

Apache-2.0 License

oneDNN - graph-v0.5

Published by vpirogov over 2 years ago

This is the Alpha release of oneDNN Graph API, based on the oneDNN v2.6 release.

Functionality

  • Introduced FP32 and BF16 training support on CPU.

  • Introduced multi-layer perceptron (MLP) fusion supported by oneDNN Graph compiler with optimized code generation (experimental).

  • Updated API to comply with oneDNN Graph API specification v1.0-alpha.

Known Issues and Limitations

  • The weight’s opaque layout can be queried only from a compiled partition, which requires input tensor shapes to be known at compilation time (see the sketch after this list).

  • MHA and MLP fusions are not activated on machines without AVX-512 support, as oneDNN Graph compiler generates AVX-512 and newer instructions.
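
To illustrate the first limitation, below is a minimal sketch of the compile-then-query flow, assuming the preview C++ API of this release (the dnnl::graph namespace); method names such as compile, query_logical_tensor, and get_id follow the preview headers and may differ in later versions.

```cpp
// Minimal sketch, not a complete program: shows why input shapes must be
// known before the weight's opaque layout can be queried.
#include <vector>
#include "oneapi/dnnl/dnnl_graph.hpp"

using namespace dnnl::graph;

logical_tensor query_weight_layout(partition &p,
        const std::vector<logical_tensor> &inputs,  // fully specified shapes
        const std::vector<logical_tensor> &outputs,
        engine &eng) {
    // Compilation requires concrete input shapes; this is where the
    // limitation above applies.
    compiled_partition cp = p.compile(inputs, outputs, eng);

    // Only the compiled partition knows the opaque, implementation-defined
    // weight layout; query it back by the weight tensor's id (here assumed
    // to be inputs[1]).
    return cp.query_logical_tensor(inputs[1].get_id());
}
```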

Thanks to the Contributors

This release contains contributions from the project core teams as well as Jiong Gong, Chunyuan Wu, Sanchit Jain, Yiqiang Li, Yunfei Mao, Kiefer Kuah and others.

oneDNN - v2.6

Published by harrymao2022 over 2 years ago

Performance Optimizations

  • Intel Architecture Processors
    • Improved performance for future Intel Xeon® Scalable processors (code name Sapphire Rapids). The functionality requires Linux kernel 5.16 or later.
    • Improved performance of matmul primitive for processors with Intel AVX-512 support.
  • Intel Graphics Products
    • Improved performance for future Xe Architecture graphics (code name Ponte Vecchio).
    • Improved performance for future Intel Arc graphics (code name Alchemist and DG2).
  • AArch64-based Processors
    • Improved binary primitive performance with Arm Compute Library (ACL).
    • Improved shuffle primitive performance for processors with SVE 512 support.

Functionality

  • Introduced bfloat16 destination support for int8 convolution, matmul and inner product primitives for processors with Intel AVX-512 support and for future Intel Xeon® Scalable processors (code name Sapphire Rapids).
  • Extended RNN primitive with support for AUGRU cell.
  • Added support for non-zero negative slope in ReLU post-op for batch normalization primitive.
  • Introduced support for mixed source and destination data types in softmax primitive.
  • Introduced persistent cache API. This functionality makes it possible to serialize and reuse JIT kernels (see the sketch after this list).
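
Below is a minimal sketch of the persistent cache flow, assuming the v2.6 C++ API (primitive::get_cache_blob and the matching cache-blob constructor overload); it is an illustration, not a complete recipe.

```cpp
// Minimal sketch: persist a compiled convolution kernel and reload it,
// skipping JIT compilation on the second run. Error handling omitted.
#include <cstdint>
#include <vector>
#include "oneapi/dnnl/dnnl.hpp"

// Serialize the compiled kernel; the returned blob can be stored on disk.
std::vector<uint8_t> save_kernel(const dnnl::convolution_forward &conv) {
    return conv.get_cache_blob();
}

// Recreate the primitive from a previously stored blob instead of re-JITting.
dnnl::convolution_forward load_kernel(
        const dnnl::convolution_forward::primitive_desc &pd,
        const std::vector<uint8_t> &blob) {
    return dnnl::convolution_forward(pd, blob);
}
```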

Usability

  • Added build time options to manage the set of supported instruction set architectures on Intel Graphics Products. See ONEDNN_ENABLE_PRIMITIVE_GPU_ISA for more details. This feature further reduces the binary footprint.
  • Extended build time options ONEDNN_ENABLE_PRIMITIVE and ONEDNN_ENABLE_WORKLOAD to GPU implementations. This feature further reduces the binary footprint.
  • Reduced stack consumption in GEMM implementation.
  • Added command line help to benchdnn.

Deprecated Functionality

  • Support for SYCL 1.2.1 (aka SYCL 2017 standard) is deprecated and will be removed in future releases.

Breaking Changes

  • Removed performance optimizations for Intel Xeon Phi processors. oneDNN will continue to be functional on these processors using the Intel AVX2 codepath.

Thanks to the Contributors

This release contains contributions from the project core team as well as Arthur Mitrano @aaraujom, Aslan @aslanxie, Attila T. Áfra @atafra, Damian Szwichtenberg @dszwicht, Diana Bite @diaena, Joel Dippold @jedippold, Jonathan Deakin @jondea, Jonathan Louis Kaplan @JLouisKaplan-Arm, Kentaro Kawakami @kawakami-k, Luke Ireland @LukeIreland1, Mesut Meterelliyoz @mmeterel, Nathan John Sircombe @nSircombe, Peter Caday @petercad, Tengfei Han @Tengfei09, and Thiago Macieira @thiagomacieira. We would also like to thank everyone who asked questions and reported issues.

oneDNN - v2.5.4

Published by tprimak over 2 years ago

This is a patch release containing the following changes to v2.5.3:

  • Improved batch normalization performance with TBB and threadpool runtimes (421a2cef07e2fe730a8ee6bbd0c55ad7e154eb3c, 7b7b763e8e264ec46bba1772d28aad07abf20d50)
  • Fixed implicit conversion from double to float in examples (866b9ac4429d2a3e9751546ba101d0df11cfb519)
  • Fixed issue in int8 matmul primitive for specific shapes (035c2d42e99e79956e4fe833f01b7b6e5509913c, 9a1bf19b40ed5493d8bbcc3ef5cb4d276a85e78e)
  • Fixed performance regression for matmul primitive with binary post op and broadcast (dcd61efe83cb5b64e60fa6d294320a34fb8734c3, 31dec32e71624e9c7f6b4da39a0d2da757b06906)
  • Fixed performance regression in binary primitive when using NHWC layout (228493c38711bf62d4dd4b534af890c2ae6b2ad1)

oneDNN - v2.6-rc

Published by harrymao2022 over 2 years ago

This is a release candidate for oneDNN v2.6. Please provide feedback and submit defect reports via GitHub issues.

Performance Optimizations

  • Intel Architecture Processors
    • Improved performance for future Intel Xeon® Scalable processors (code name Sapphire Rapids). The functionality requires Linux kernel 5.16 or later.
    • Improved performance of matmul primitive for processors with Intel AVX-512 support.
  • Intel Graphics Products
    • Improved performance for future Xe Architecture graphics (code name Ponte Vecchio).
    • Improved performance for future Intel Arc graphics (code name Alchemist and DG2).
  • AArch64-based Processors
    • Improved binary primitive performance with Arm Compute Library (ACL).
    • Improved shuffle primitive performance for processors with SVE 512 support.

Functionality

  • Extended RNN primitive with support for AUGRU cell.
  • Introduced support for mixed source and destination data types in softmax primitive.
  • Introduced persistent cache API. This functionality makes it possible to serialize and reuse JIT kernels.

Usability

  • Added build time options to manage the set of supported instruction set architectures on Intel Graphics Products. See ONEDNN_ENABLE_PRIMITIVE_GPU_ISA for more details. This feature further reduces the binary footprint.
  • Extended build time options ONEDNN_ENABLE_PRIMITIVE and ONEDNN_ENABLE_WORKLOAD to GPU implementations. This feature further reduces the binary footprint.
  • Reduced stack consumption in GEMM implementation.
  • Added command line help to benchdnn.

Deprecated Functionality

  • Support for SYCL 1.2.1 (aka SYCL 2017 standard) is deprecated and will be removed in future releases.

Breaking Changes

  • Removed performance optimizations for Intel Xeon Phi processors. oneDNN will continue to be functional on these processors using the Intel AVX2 codepath.

Thanks to the Contributors

This release contains contributions from the project core team as well as Arthur Mitrano @aaraujom, Aslan @aslanxie, Attila T. Áfra @atafra, Damian Szwichtenberg @dszwicht, Diana Bite @diaena, Joel Dippold @jedippold, Jonathan Deakin @jondea, Jonathan Louis Kaplan @JLouisKaplan-Arm, Kentaro Kawakami @kawakami-k, Luke Ireland @LukeIreland1, Mesut Meterelliyoz @mmeterel, Nathan John Sircombe @nSircombe, Peter Caday @petercad, Tengfei Han @Tengfei09, and Thiago Macieira @thiagomacieira. We would also like to thank everyone who asked questions and reported issues.

oneDNN - graph-v0.4.2

Published by vpirogov over 2 years ago

This is a patch release containing the following changes to graph-v0.4.1:

  • Fixed compiled partition cache by checking CPU threading number (68f262a, 343246e)
  • Enabled binary add and multiply patterns (71a0cfef)
  • Fixed the MHA (multi-head attention) patterns in compiler backend and benchdnn graph (45bbcb3, caaf841)
  • Fixed the build issues for semi-compiler backend (62dd2ca, 738276a, 347f1a9, 2123326)

oneDNN - v2.5.3

Published by tprimak over 2 years ago

This is a patch release containing the following changes to v2.5.2:

  • Fixed accuracy issue in GELU post-op (3ff2c3d6cfe1bd0b2eee015f97c6fe0515f14bff)
  • Added ability to enable code only on non-x64 systems (ff7ae00c074ea09048bcc27f905385d1f6a6b830)
  • Fixed issue in reorder primitive on non-x64 systems (5917860513ec94274a157d9642695393858cd205)
  • Fixed build issue on OSX11 and older cmake (d9c8bbeb884ab916d47aa57b0bec9abbd7f89542)
  • Fixed assert in reorder primitive (79090bc2907c7b201aa5ecd9a7ffe3a347fe2444)
  • Documentation fixes (d29075852b8713c487253e7c664b2c78e3663327, ee7eacb012b95a677c3c1dae0d05e730d99f180e, 543b8f86f4659d36dddcff1d2bddb810d3740889)
  • Fixed potential division by zero in example for binary primitive (2fffd96b7461ec1088b271e2df9d1fe077d45ff0)
  • Fixed SIGFPE issue in reorder primitive (8c291fca11b08786948a1e198c7bca051e525347)
  • Fixed potential size overflow in inner product primitive (c10f74a0e71d79ad06d813e403e63e2ee0e2b260)
  • Added logic to reduce the number of threads (tasks spawned for threadpool) for small shapes (8f885e76a2221565ababd3718a2c1441b1300780, 405398994009fb97ea137a7e300a494489c29bc7, 49ec406751d2ba03e9166d36aec94e4d6dd236bd, 2977360e146148f17d60861829a41c674856e8f6)
  • Fixed SEGFAULT issue in matmul primitive (62c1170d7741c261722167d06f28d7f5e18d14ee, a993d522ff68f310186b73c5e1ec473c221c7869)
  • Added bf16 support for sum post-op (3d2c37e4b069d4188741c1a8c40e6eb7404e68a2)
  • Added fp:precise compiler flag for Intel Compiler identified as IntelLLVM (1558a4bfd894d73f55030d06df73584af71525d6)
  • Fixed issue in bf16 convolution primitive when fused with binary (b379fd9c3715af38fb2067f39aea19fa90191024)
  • Fixed issue in backward depthwise convolution (d5e4122f6429cb73312620c9100a65a0ad66a0a7, f5cac2346d6198a958f50c8be7cbf968191018aa, eeaa19c4e87c2ee96a50a0df0dadd8d045e94774)
  • Fixed SEGFAULT in int8 convolution with eltwise post_op (32a629fef18b087554dabcaa983d0158654b2fe3)
  • Fixed NaN issue in bf16 backward inner product (0c5e49205d63f05b72779c2e2f9419bb42144e64)
  • Fixed performance regression for binary with broadcast (f79b03072dbdc373ce3a5435c41d899ddee9eddb, 58ce3c1de0e8e8c387275f0d649a3a26b726c640)

oneDNN - graph-v0.4.1

Published by vpirogov almost 3 years ago

This is a patch release containing the following changes to graph-v0.4:

  • Upgraded oneDNN to v2.5.2 (b557b497, 6aae6f7a)
  • Enabled MatMul + Div + Add fusions (effa3350, 3f5a8f7a, f9ffcc5c)

oneDNN - v2.5.2

Published by tprimak almost 3 years ago

This is a patch release containing the following changes to v2.5.1:

  • Fixed performance regression in binary primitive with broadcast (b9721743614f9dcb477a86d82fc19a96dc7e5736, ff751229eeb7ff546491f54b1060c03ec241c673)
  • Fixed issue with SYCL device properties initialization (cabc5ca62e1b109161bc7cfccaa0ca5ba1f7b639, 095f13e77b9440307b152590f61c6e21e0c026a5)
  • Fixed issue in matmul primitive with zero points (3157354dd5498fa83f2d8da17b25138d78a5c13b)
  • Fixed segmentation fault in depthwise convolution primitive for shapes with huge spatial size for processors with Intel AVX-512 support (68347644ace88ef9dc7bcbba928674e3f9ac1b08, 1d2addcf5a11a6bf034001e809cbea1c89942f0f)
  • Fixed issue in forward convolution primitive for processors with Intel AVX2 support (d691137c245efab99651c95387e1713d3cf91fb7)
  • Fixed performance regression on GPUs with SYCL runtime (d8364e5b4c88f27143894bb7835c65eb22770e16)

oneDNN - graph-v0.4

Published by tprimak almost 3 years ago

This is a technical preview for oneDNN Graph API based on oneDNN v2.5.

Functionality

  • Introduced bf16 inference support.
  • Introduced multi-head attention (MHA) fusion supported by oneDNN Graph compiler with optimized code generation (experimental).
  • Updated API to comply with oneDNN Graph API specification v0.9.

Known Issues and Limitations

  • Some subgraphs might not be recognized as a partition even if they match the general pattern description, due to internal implementation details.
  • The weight’s opaque layout can be queried only from a compiled partition, which requires tensor shapes to be known at compilation time.
  • MHA fusion is not activated on machines without AVX-512 support, as oneDNN Graph compiler generates AVX-512 and newer instructions.

Thanks to the Contributors

This release contains contributions from the project core teams as well as Jiong Gong, Chunyuan Wu, Sanchit Jain, Yiqiang Li, Yunfei Mao, Kiefer Kuah and others.

oneDNN - v2.5.1

Published by vpirogov almost 3 years ago

This is a patch release containing the following changes to v2.5:

  • Improved performance of binary primitive and binary post-op with broadcast over batch and spatial dimension (6d4b092ac3aaf81cf71e85e5a639c46f942c1e5c, c4dc38a70de60a92e541e581994f6a53e90c8110, be261ab0e3dae81fbf2b41b2a4038ffb940c5c75, 3ec15b6976eab124db4a5d22a02d8d1e8a2c2001, f1c2f9f3400446addd636194305138b4e6ce8a0b)
  • Fixed undefined behavior for cases when a different number of threads is used at primitive creation and execution (0af92ec8ad0575883c04bc14436f13a5cc02d8fa, ba2e5a95d5585d28630971eba0edf59caa3673b0, 8863e34de693072cf5d299503a2601ab4cfacabe, 57b1e7ad3d80d61b2b3f820fd08090e548acd9b7, 72b54def7d421ee57acaa60642935c8348610632, 9b394dd5e8f661a7c0582daae9dc8fc562bf8220, 2d4d88a7c7e701aacdcde164bfbc166a73e9feef, 4c3e771c109bdc09f909baf339721dea26ceb6c6, 2458105c93b5451370efee2117a137e6223a63df, 67990405dc63d01f11cc5b464e1ce0a5106e0232, edc40fd6e65ee2a7cadd33129d4873fe4e6f6ccf)
  • Replaced deprecated SYCL APIs with SYCL 2020 alternatives (2c2f4a4707484e15c59adb0aac3563a2ca4f202c, a090db862cc7ae8e77365f61cb8d716b3af3af99)
  • Fixed documentation formatting issues (812085dd49ffe432b49ed6b86f28c37734fc2eeb, 591a0f449295c0d971348bbcfd3a3acf454158fd, 7eadf81d3bba83f4e38044144e165213dce09234, 75a2f06b7ad30b03b0f3eee20f786b94d744a5fb, b73c8a7034b3a7b5c3e95d181c267aea0d411092, ca1eb7710121ef2bacaca79d536471abf31daf25)
  • Updated Microsoft Visual Studio build instructions (add953a66fdda58237aad5c57b93e106886b3b45, 42b9904847ae8d206a87fdb0222708a4334b676a)

oneDNN - v2.5

Published by vpirogov almost 3 years ago

Performance Optimizations

  • Intel Architecture Processors
    • Improved performance for future Intel Xeon Scalable processors (code name Sapphire Rapids). The functionality is now enabled by default and requires Linux kernel 5.16 or later.
    • Improved performance of matmul primitive for processors with Intel AVX-512 support.
  • Intel Graphics Products
    • Introduced initial optimizations for future Xe Architecture graphics (code name Ponte Vecchio).
    • Improved pooling and layer normalization primitives performance.
  • AArch64-based Processors
    • Improved softmax and logsoftmax primitives performance with Arm Compute Library (ACL).

Functionality

  • Introduced support for compiler with SYCL 2020 standard support.
  • Introduced support for the ICX/ICPX and DPCPP compiler drivers distributed with Intel oneAPI DPC++ Compiler on Windows.

Usability

  • Added compile time option to manage the set of supported instruction set architectures on Intel64/AMD64 processors. See 'DNNL_ENABLE_PRIMITIVE_CPU_ISA' for more details. This feature further reduces the binary footprint.
  • Added environment variables and build options with ONEDNN prefix.
  • Introduced support for QNX operating system.
  • Introduced support for RISC-V architecture.

Breaking Changes

  • The Intel MKL-DNN compatibility API is removed. See the Transition from Intel MKL-DNN to oneDNN page for instructions on moving to the new API.
  • Updated minimal supported ACL version to 21.11 (was 21.08).

Deprecated Functionality

  • Support for Intel Xeon Phi processors is deprecated and will be removed in the next release.
  • Support for SYCL 1.2.1 (aka SYCL 2017 standard) is deprecated and will be removed in future releases.

Thanks to the Contributors

This release contains contributions from the project core team as well as Aaron Franke @aaronfranke, Arthur Mitrano @aaraujom, Crefeda Rodrigues @cfRod, Diana Bite @diaena, Joel Dippold @jedippold, Joe Konno @thac0, Jonathan Deakin @jondea, Luke Ireland @LukeIreland1, Mark Ryan @markdryan, Mesut Meterelliyoz @mmeterel, Michel Migdal @Michoumichmich, Nathan John Sircombe @nSircombe, Pablo Romero @pablorcum, Peter Caday @petercad, Sergey Razumovskiy @srazumov, and Tsao Zhong @CaoZhongZ. We would also like to thank everyone who asked questions and reported issues.

oneDNN - v2.5-rc

Published by vpirogov almost 3 years ago

This is a release candidate for oneDNN v2.5. Please provide feedback and submit defect reports via GitHub issues.

Performance Optimizations

  • Intel Architecture Processors
    • Improved performance for future Intel Xeon Scalable processors (code name Sapphire Rapids). The functionality is disabled by default and should be enabled via CPU dispatcher control.
    • Improved performance of matmul primitive for processors with Intel AVX-512 support.
  • Intel Graphics Products
    • Introduced initial optimizations for future Xe Architecture graphics (code name Ponte Vecchio).
    • Improved pooling and layer normalization primitives performance.
  • AArch64-based Processors
    • Improved softmax primitive performance with Arm Compute Library (ACL).

Functionality

  • Introduced support for compiler with SYCL 2020 standard support.
  • Introduced support for the ICX/ICPX and DPCPP compiler drivers available in the Intel oneAPI DPC++ Compiler.

Usability

  • Added compile time option to manage the set of supported instruction set architectures on Intel64/AMD64 processors. See 'DNNL_ENABLE_PRIMITIVE_CPU_ISA' for more details. This feature further reduces the binary footprint.
  • Added environment variables and build options with 'ONEDNN' prefix.
  • Introduced support for QNX operating system.
  • Introduced support for RISC-V architecture.

Breaking Changes

Deprecated Functionality

  • Support for Intel Xeon Phi processors is deprecated and will be removed in the next release.
  • Support for SYCL 1.2.1 (aka SYCL 2017 standard) is deprecated and will be removed in future releases.

Thanks to the Contributors

This release contains contributions from the project core team as well as Aaron Franke @aaronfranke, Arthur Mitrano @aaraujom, Crefeda Rodrigues @cfRod, Diana Bite @diaena, Joel Dippold @jedippold, Joe Konno @thac0, Jonathan Deakin @jondea, Luke Ireland @LukeIreland1, Mark Ryan @markdryan, Mesut Meterelliyoz @mmeterel, Michel Migdal @Michoumichmich, Nathan John Sircombe @nSircombe, Pablo Romero @pablorcum, Peter Caday @petercad, Sergey Razumovskiy @srazumov, and Tsao Zhong @CaoZhongZ. We would also like to thank everyone who asked questions and reported issues.

oneDNN - v2.4.4

Published by vpirogov almost 3 years ago

This is a patch release containing the following changes to v2.4.3:

  • Fixed incorrect results for reorder with zero-points on CPUs (ee63629a52dfb355d9f93ed5ac0c72440ad24b65)
  • Fixed an issue with reorder with zero-points not respecting rounding mode on processors without Intel DL Boost support (a165c4a5284820b341d7e5e114f0de689b87f38f)
  • Fixed correctness issue in bfloat16 inner product weight gradient on processors with Intel DL Boost support (b782f190bd40f5b3bbae2d66c5922cc8793f236c)
  • Improved bfloat16 inner product weights gradient performance on processors with Intel AMX support (ebf9f817f12c2db4447eb4000aadab411879fe36)
  • Fixed potential undefined access in convolution, inner product, matmul, and RNNs primitives on processors with Intel AMX support (dcd98ad372108f6fa33510b6e9acd7bae491df83)

oneDNN - graph-v0.3

Published by vpirogov almost 3 years ago

This is a technical preview for oneDNN Graph API based on oneDNN v2.4.

Functionality

Known Issues and Limitations

  • Some subgraphs might not be recognized as a partition even if they match the general pattern description, due to internal implementation details.
  • The weight’s opaque layout can be queried only from a compiled partition, which requires tensor shapes to be known at compilation time.

Thanks to the Contributors

This release contains contributions from the project core teams as well as Tian Feng, Zhang Guoming, Jiong Gong, Chunyuan Wu, Nishant Patel, Yiqiang Li, Yang Sheng, Yunfei Mao, Kiefer Kuah and others.

oneDNN - v2.4.3

Published by vpirogov almost 3 years ago

This is a patch release containing the following changes to v2.4.2:

  • Fixed an issue with the reorder primitive producing NaN results in some cases on future Intel Xeon Scalable processors (code name Sapphire Rapids) (ac20af36ea3aced6a1dafe83447ffaee424fefa3)
  • Fixed performance regression for inner product primitive for future Intel Xeon Scalable processor (code name Sapphire Rapids) (ac6a24d4d2db987665bb0832ea31b3f11ca42844, 2cf3526fdbb246b51db597ac730268a6391f87e2, d02dddf7b35ca4ea3c63a9d4aaa9a7c4be2d1cd8, bcdc17531179cc414c3d6e858f6708938ebcb7bb)
  • Fixed segmentation fault in int8 deconvolution primitive with asymmetric quantization for processors with Intel AVX-512 support (6ba086ae6be066222d46693803aefeacecf3ed0b)

oneDNN - v2.4.2

Published by tprimak about 3 years ago

This is a patch release containing the following changes to v2.4.1:

  • Fixed performance regression for convolution primitive for the shapes with 3D spatial for future Intel Xeon Scalable processor (code name Sapphire Rapids) (aca0af1d192ef40ec93b45d24edd509e2905156d)
  • Fixed segmentation fault in bfloat16 forward and backward inner product primitive for future Intel Xeon Scalable processor (code name Sapphire Rapids) (ae8cf18e35a8ed069a0898d77f5548781ecf5e4b, 3de9549d5817ea50638ad208293bb2e2fbb15048)
  • Fixed reorder primitive with compensation (6ba086ae6be066222d46693803aefeacecf3ed0b)
  • Fixed issue in scratchpad size calculation for BRGEMM-based convolutions (dd9eceb88ed683806fb853a33bfab71982414fa3)

oneDNN - v2.3.3

Published by vpirogov about 3 years ago

This is a patch release containing the following changes to v2.3.2:

  • Reverted check for memory descriptor stride validity for unit dimensions (861c6252a5957bd908a5183fcaa4cd7e29b61192)
  • Fixed build errors on Fuchsia OS (753b5310317938d37379a34c7d817fb94f61efec)
  • Fixed implicit conversion in GPU GEMM implementation (30dee23e2d60e255ecc0b0dc199cdccb284c66b1)
  • Addressed issues detected by clang TSan (888ab523863e59433da70e72a14927a590673f89, 7555fd839fe04320fd0a470019f798b0b958c45e, 4ffdb3cd09f46df7b7d544c76f0b63d8cb4801ca, 57b8ffd8ec2bea4c15687e52cf906c7c5cb1202a, b52b2c09a54968b5bc8739034f2bcb925af0b9ce, 84b200f6493a094f46b48503e4519d457ad7d0b5, 67deb8eae31cecb42f8339c2baf9cb7b9ff73962)
  • Fixed undefined access issues detected by clang UBSan (5bab17c967e4550f7168943605cdd8b93833d564, 3494b1e973f2db02dac09c70ad6292ac07fa881a, 688536052e9395f8c8e8ecddc9cc28d42ff19272, 8cbe861995c27ee1be24b22c13067f011e919c7b, b13a2156857a16bb0677469b80eba30b56f6f91f, 859622df60226b3de38169ec9cd40a7b3715d6b9, 5813c99f69d8033289cbfff47a97df4cb0420abd)
  • Fixed memory leak in CPU GEMM implementation (45e3039fbed8407cc3bc09db5228f6e620ba3286, fd6d14caa1085ead840b6f5470d6bc69f66336f8)
  • Fixed int8 convolution correctness issues on Intel Integrated Graphics (b7d40a0ec245bb33960e8dc5571d1e0c462c3c3b, 72e48568014a22cfc0886794bcae0d36527c339b)
  • Fixed access violation issue in GEMM implementation on Windows (aac6b2325379c3954ff773c8eca9a903958ba90a)

oneDNN - v2.4.1

Published by vpirogov about 3 years ago

This is a patch release containing the following changes to v2.4:

  • Reduced scratchpad size requirements for BRGEMM-based convolutions (a81ce3cce3a82ad074d1cdc50b73d4104910c1e0)
  • Worked around an issue with the number of threads detection on AMD processors (ad901e5489564d0035be0b4ec41f1cff4be96610)

oneDNN - v2.4

Published by vpirogov about 3 years ago

Performance Optimizations

  • Improved primitive cache performance for Intel Graphics products.
  • Intel Architecture Processors
    • Improved performance for future Intel Xeon Scalable processor (code name Sapphire Rapids). The functionality is disabled by default and should be enabled via CPU dispatcher control.
    • Improved binary primitive performance for cases when one of the tensors is broadcasted.
    • Improved performance of reduction primitive, reorder, shuffle primitives.
    • Improved performance of depthwise convolution forward propagation for processors with Intel AVX-512 support.
    • Improved performance of forward inner product primitive for shapes with minibatch equal to 1 for processors with Intel AVX-512 support.
    • Improved performance of int8 matmul and inner product primitives for processors with Intel AVX2 and Intel DL Boost support.
  • Intel Graphics Products
    • Introduced initial optimizations for future Intel Arc graphics (code name Alchemist and DG2).
    • Improved performance of convolution and deconvolution primitives with new JIT convolution kernel generator implementation. These optimizations are identified by jit:ir marker in oneDNN verbose log.
  • AArch64-based Processors
    • Added support for bfloat16 acceleration with Arm Compute Library (ACL). The behavior is controlled by the floating point math mode API.
    • Improved inner product, matmul, and eltwise primitives performance with ACL.
    • Introduced support for sum and for indirect and Winograd convolution implementations with ACL.
  • NVIDIA Graphics
    • Improved convolution performance with eltwise post-op.

Functionality

  • Introduced PReLU post-op support in convolution and matmul (see the sketch after this list).
  • Extended maximum allowed post-ops chain for compute primitives (convolution, deconvolution, inner product, and matmul) to 32.
  • Introduced support for zero points in sum post-op for convolution and matmul. The functionality is implemented only for CPUs.
  • Extended binary primitive with support for mixed data types for input tensors. The functionality is implemented only for CPUs.
  • Extended sum post-op for convolution and matmul primitives with support for mixed data types. The functionality is implemented only for CPUs.
  • Added Unified Shared Memory (USM) support for OpenCL GPU runtime.
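
As an illustration of the new PReLU post-op, the fragment below builds a primitive_attr carrying it, assuming the v2.4 C++ API (post_ops::append_prelu); the mask value and the argument wiring are illustrative choices, not part of this release note.

```cpp
// Minimal sketch: attach a PReLU post-op to a convolution or matmul via
// primitive attributes. Error handling omitted.
#include "oneapi/dnnl/dnnl.hpp"

dnnl::primitive_attr make_prelu_attr() {
    dnnl::post_ops po;
    // mask = 0: a single negative-slope value shared across the whole tensor.
    po.append_prelu(/*mask=*/0);

    dnnl::primitive_attr attr;
    attr.set_post_ops(po);
    return attr;
}
// At execution time the slope tensor is passed for the post-op's index,
// e.g. with DNNL_ARG_ATTR_MULTIPLE_POST_OP(0) | DNNL_ARG_WEIGHTS.
```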

Usability

  • Added compile time options to manage the set of supported primitives and workload types. See DNNL_ENABLE_WORKLOAD and DNNL_ENABLE_PRIMITIVE in build options for more details. This feature allows reducing the binary footprint of the library for specialized applications.
  • Reduced overall library size by trimming down the use of templates, OpenCL headers, and TBB headers. The configurations that benefited the most are the CPU-only configuration with TBB threading and the GPU-only configuration. Note that the binary footprint depends on the compiler used to build the library and on the build options.
  • Introduced floating point math mode API. The API allows the library to use bfloat16 or float16 hardware acceleration in fp32 operations. Currently this mode is supported only on AArch64 processors when oneDNN is built with ACL (see the sketch after this list).
  • Added a build option DNNL_LIBRARY_NAME to change the library name and CMake target. This feature helps projects that use multiple oneDNN configurations.
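
For the floating point math mode item above, a minimal sketch follows, assuming the v2.4 C++ API (set_default_fpmath_mode and primitive_attr::set_fpmath_mode):

```cpp
// Minimal sketch: allow bf16 hardware acceleration inside nominally fp32
// primitives, either process-wide or per primitive.
#include "oneapi/dnnl/dnnl.hpp"

void enable_bf16_math_globally() {
    // Process-wide default for all subsequently created primitives.
    dnnl::set_default_fpmath_mode(dnnl::fpmath_mode::bf16);
}

dnnl::primitive_attr bf16_math_for_one_primitive() {
    // Opt in for a single primitive through its attributes.
    dnnl::primitive_attr attr;
    attr.set_fpmath_mode(dnnl::fpmath_mode::bf16);
    return attr;
}
```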

Breaking Changes

  • Updated minimal supported ACL version to 21.08 (was 21.05).

Deprecated Functionality

  • Intel MKL-DNN compatibility API is deprecated and will be removed in the next update. See the Transition from Intel MKL-DNN to oneDNN page for instructions on moving to the new API.
  • Support for Intel Xeon Phi processors is deprecated and will be removed in the next release.

Thanks to the Contributors

This release contains contributions from the project core team as well as
Aleksandr Nikolaev @alenik01, Arthur Mitrano @aaraujom, Crefeda Rodrigues @cfRod, Diana Bite @diaena, Jing Xu @jingxu10, Kentaro Kawakami @kawakami-k, Kevin Putnam @intelkevinputnam, Mesut Meterelliyoz @mmeterel, MITSUNARI Shigeo @herumi, Nathan John Sircombe @nSircombe, Nicolas Chauvet @kwizart, Peter Caday @petercad. We would also like to thank everyone who asked questions and reported issues.

oneDNN - v2.4-rc

Published by vpirogov about 3 years ago

This is a release candidate for oneDNN v2.4. Please provide feedback and submit defect reports via Github issues.

Performance Optimizations

  • Improved primitive cache performance for Intel Graphics products.
  • Intel Architecture Processors
    • Improved performance for future Intel Xeon Scalable processor (code name Sapphire Rapids). The functionality is disabled by default and should be enabled via CPU dispatcher control.
    • Improved binary primitive performance for cases when one of the tensors is broadcasted.
    • Improved reorder primitive performance for memory formats with padding and/or zero points.
  • Intel Graphics Products
    • Introduced initial optimizations for future Intel Arc graphics (code name Alchemist and DG2).
  • AArch64-based Processors
    • Improved inner product and eltwise primitives performance with ACL.
    • Introduced support for sum and for indirect and Winograd convolution implementations with ACL.
  • NVIDIA Graphics
    • Improved convolution performance with eltwise post-op.

Functionality

  • Introduced PReLU post-op support in convolution and matmul.
  • Extended maximum allowed post-ops chain for compute primitives (convolution, deconvolution, inner product, and matmul) to 32.
  • Introduced support for zero points in sum post-op for convolution and matmul. The functionality is implemented only for CPUs.
  • Extended binary primitive with support for mixed data types for input tensors. The functionality is implemented only for CPUs.
  • Extended sum post-op for convolution and matmul primitives with support for mixed data types. The functionality is implemented only for CPUs.
  • Added USM support for OpenCL GPU runtime.

Usability

  • Added compile time options to manage the set of supported primitives and workload types. See DNNL_ENABLE_WORKLOAD and DNNL_ENABLE_PRIMITIVE in build options for more details. This feature allows reducing the binary footprint of the library for specialized applications.
  • Reduced overall library size by trimming down the use of templates, OpenCL headers, and TBB headers. The configurations that benefited the most are the CPU-only configuration with TBB threading and the GPU-only configuration. Note that the binary footprint depends on the compiler used to build the library and on the build options.
  • Introduced floating point math mode API. The API allows the library to use bfloat16 or float16 hardware acceleration in fp32 operations. Currently this mode is not supported in the implementation.
  • Added a build option DNNL_LIBRARY_NAME to change the library name and CMake target. This feature helps projects that use multiple oneDNN configurations.

Breaking Changes

  • Updated minimal supported ACL version to 21.08 (was 21.05).

Deprecated Functionality

Thanks to the Contributors

This release contains contributions from the project core team as well as
Aleksandr Nikolaev @alenik01, Arthur Mitrano @aaraujom, Diana Bite @diaena, Jing Xu @jingxu10, Kentaro Kawakami @kawakami-k, Kevin Putnam @intelkevinputnam, MITSUNARI Shigeo @herumi, Nathan John Sircombe @nSircombe, Nicolas Chauvet (kwizart) @kwizart, Peter Caday @petercad. We would also like to thank everyone who asked questions and reported issues.