oneDNN

oneAPI Deep Neural Network Library (oneDNN)


oneDNN - v2.0-beta05

Published by anita-intel over 4 years ago

This is a preview release for oneDNN v2.0, a patch release based on DNNL v2.0-beta04.

A binary distribution of this software is available as Intel(R) oneAPI Deep Neural Network Library in Intel(R) oneAPI.

Known Limitations

  • Weight gradient convolution for bfloat16 datatype with 1d spatial tensor and dilation may produce incorrect result on CPU.
  • Weight gradient convolution for bfloat16 datatype with 2d spatial tensor and dilation may crash on Intel AVX512 systems.
  • Optimized primitives can crash or fail for huge spatial sizes on CPU.
  • dnnl_sgemm, dnnl_gemm_u8s8u32, and inner product functionality do not support sizes exceeding 2^32.
  • Non-Intel GPUs are not supported. The library API allows creating a DNNL engine by index (the order of devices is determined by the SYCL runtime), and there is no check that the GPU device is an Intel device. For more control, users can create a DNNL engine by passing a SYCL device and context explicitly (see the sketch at the end of this section).
  • Intel Processor Graphics Gen11 is not supported.
  • GPU kernels that run longer than a certain time (which depends on the OS and system settings) may cause the application to appear to hang. Configure the driver to disable this timeout and avoid hangs in DPC++ or OpenCL programs, including DNNL examples.

On Linux:

$ sudo bash -c 'echo N > /sys/module/i915/parameters/enable_hangcheck'

On Windows, increase the TdrDelay and TdrDdiDelay values in the registry (DWORD values under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers).
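
The exact SYCL interoperability API changed across the v2.0 betas; the sketch below assumes a beta-era dnnl::engine constructor that accepts a SYCL device and context directly, and the GPU selector is only illustrative:

    #include <CL/sycl.hpp>
    #include "dnnl.hpp"

    int main() {
        // Select a GPU explicitly instead of relying on engine-by-index
        // ordering, which follows the SYCL runtime's device order.
        cl::sycl::device dev{cl::sycl::gpu_selector{}};
        cl::sycl::context ctx{dev};

        // Assumes the beta-era constructor taking a SYCL device and
        // context; later releases moved this to an interop namespace.
        dnnl::engine eng(dnnl::engine::kind::gpu, dev, ctx);

        dnnl::stream s(eng);
        (void)s;
        return 0;
    }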

oneDNN - v1.4-rc

Published by anita-intel over 4 years ago

This is a release candidate for DNNL v1.4. Please provide feedback and report bugs via GitHub issues.

oneDNN - v1.3

Published by anita-intel over 4 years ago

Performance optimizations

  • Introduced broad release-quality optimizations for the future Intel(R) Xeon(R) Scalable processor (code name Cooper Lake).
  • Improved performance of matmul primitive for 3D tensors (batched matrix-matrix multiplication) on all supported processors.
  • Improved performance of binary primitive for the case when one of the tensors has to be broadcast, on all supported processors.
  • Improved performance of convolution primitive for 3D tensors and 1x1 kernel size on all supported processors.

New functionality

  • Introduced fusion of depthwise convolution and convolution with a 1x1 filter. The implementation is available for all supported processors and data types. The functionality is not implemented for Intel Processor Graphics.
  • Introduced peephole support for LSTM cell on all supported processors. The functionality is not implemented for Intel Processor Graphics.
  • Implemented matmul primitive for Intel Processor Graphics.
  • Extended binary primitive with min and max algorithms support.
  • Extended eltwise primitive:
    • Introduced erf-based implementation of gelu algorithm
    • Introduced pow algorithm
    • Introduced backpropagation flavor relying on destination tensor as input for elu, exp, logistic, relu, sqrt, and tanh algorithms
  • Extended set of operations for memory descriptors:
    • Added support for changing the number of dimensions with the existing dnnl::memory::desc::reshape() method (see the sketch after this list).
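
As an illustration, a minimal sketch of changing the rank of a descriptor with reshape(); the shapes are made up:

    #include "dnnl.hpp"
    using namespace dnnl;

    int main() {
        // Plain row-major 2D f32 descriptor: 32 x 1024 elements.
        memory::desc md_2d({32, 1024}, memory::data_type::f32,
                memory::format_tag::ab);

        // Reshape to 3D (32 x 16 x 64): same number of elements,
        // different number of dimensions.
        memory::desc md_3d = md_2d.reshape({32, 16, 64});
        (void)md_3d;
        return 0;
    }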

Thanks to the contributors

This release contains contributions from the project core team as well as Araujo Mitrano, Arthur @aaraujom, Aaron Mark Johnson @aaronjohnson, Benjamin Hipple @bhipple, Sergey Nesterov @cepera, @gaurav1086, Ilya Taraban @itaraban, Mesut Meterelliyoz @mmeterel, @nSircombe, Peter Caday @petercad, and Rafik Saliev @rsaliev. We would also like to thank everyone who asked questions and reported issues.

oneDNN - v1.2.2

Published by tprimak over 4 years ago

This is a patch release containing the following changes to v1.2.1:

  • Fixed overflow in transposition in bfloat16 weights gradient convolution (0d283894be89ba22ba6251c1ab8cae816ebe3f24)
  • Added a workaround for corrupted unique_ptr usage in scratchpad (91c89a9628feee9e4539b53c7c96f7d1f3110269)
  • Fixed int8 deconvolution with int32 output on Intel AVX2 systems (ef2d6527209b104efe8a7fd2c1ec7b7f70c695bc)
  • Fixed segmentation fault in concat due to incorrect memory alignment #668 (7a0c3a922827632308aafb03037dc4c3ae2af9da)
  • Fixed performance regression in no-copy gemm dispatching #525 (89a303b68e7a3497490e37bf11025d7d31b5d283)
  • Fixed segmentation fault in fp32 weights gradient convolution with dilation and large padding (50546ad4426ea48f4b6bb67665560f1c9cb26333)
  • Fixed bfloat16/fp32 scalability for eltwise primitive (e281a4a5d312115cdd1f97d43b14e0d6eb494a43)

oneDNN - v1.3-rc

Published by tprimak over 4 years ago

This is a release candidate for DNNL v1.3. Please provide feedback and report bugs via GitHub issues.

oneDNN - v0.21.4

Published by tprimak over 4 years ago

This is a patch release containing the following changes to v0.21.3:

  • Fixed large padding handling in input tensor transposition in bfloat16 weights gradient convolution (6df67fe)
  • Fixed performance of reference convolution (2e1d048)
  • Fixed "code is too big" error in case of extreme large spatial size (ed0be61, 4dee389, 59759ba)

oneDNN - v1.2.1

Published by vpirogov over 4 years ago

This is a patch release containing the following changes to v1.2:

  • Improved GEMM performance for 1 thread (1fd2bc010ba09b44e3e675d68d80d8f41c747fec)
  • Fixed RNN cell backpropagation computations (4b15a0cbbf13e5c7e6aca66f40847e9b27619087)
  • Fixed alpha and beta handling in vanilla RNN cell (70f8b879ea7a0c38caedb3320b7c85e8497ff50d)
  • Reduced sizes in the performance profiling example to avoid memory overflow on systems with less than 2 GB of memory (f6e2ef9896d63302c5e6eba2094dca3ac346e5ad)
  • Fixed correctness for strided convolution with a 1x1 filter and non-matching source and destination formats (0405c9a29f15899883ee62a905716cdeed5ce1fa)
  • Removed lambda calls from OpenMP loops as a workaround for Intel C/C++ Compiler 19.1 (a603593fd6186ba0385cf5b1630c13f6909ab3ac)
  • Added -O1 flag for backward convolution gtests as a workaround for Intel C/C++ Compiler 19.1 (495b91fdc6fdfd6647eac193e8c80e41d23c24e8)

oneDNN - v2.0-beta04

Published by anita-intel over 4 years ago

This is a preview release for oneDNN v2.0. The release is based on oneDNN v1.2.

A binary distribution of this software is available as Intel(R) oneAPI Deep Neural Network Library in Intel(R) oneAPI.

Known Limitations

  • Non-Intel GPUs are not supported. The library API allows creating a DNNL engine by index (the order of devices is determined by the SYCL runtime), and there is no check that the GPU device is an Intel device. For more control, users can create a DNNL engine by passing a SYCL device and context explicitly (see the sketch under v2.0-beta05 above).
  • Intel Processor Graphics Gen11 is not supported.
  • GPU kernels that run longer than a certain time (which depends on the OS and system settings) may cause the application to appear to hang. Configure the driver to disable this timeout and avoid hangs in DPC++ or OpenCL programs, including DNNL examples.

On Linux:

$ sudo bash -c 'echo N > /sys/module/i915/parameters/enable_hangcheck'

On Windows, increase the TdrDelay and TdrDdiDelay values in the registry (DWORD values under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers).

oneDNN - v1.2

Published by anita-intel over 4 years ago

Performance optimizations

  • Improved 1D backward convolution performance on CPU.
  • Improved int8 inference performance on pre-Intel AVX512 systems.
  • Improved int8 inference performance for 3D spatial data on CPU.
  • Improved performance of convolution and other primitives on GPU.

New functionality

  • Introduced a general-purpose matrix-matrix multiplication primitive. The functionality supports fp32, bfloat16, and int8 data types with asymmetric quantization (see the sketch after this list).
  • Introduced logsoftmax and resampling primitives.
  • Introduced clip and log algorithms support in elementwise primitive.
  • Introduced int8 and bf16 data types support for binary primitive (CPU only).
  • Introduced fully functional support of the int8 (inference) and bfloat16 (inference and training) data types on GPU. The functionality is not intended to deliver a performance improvement over f32 on current Intel Integrated Graphics, but to enable conformance experiments.
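
For illustration, a minimal fp32 sketch of the new matmul primitive; the shapes are made up:

    #include "dnnl.hpp"
    using namespace dnnl;

    int main() {
        engine eng(engine::kind::cpu, 0);
        stream s(eng);

        // C = A x B with A: MxK, B: KxN, C: MxN, all fp32, row-major.
        const memory::dim M = 128, K = 256, N = 64;
        memory::desc a_md({M, K}, memory::data_type::f32, memory::format_tag::ab);
        memory::desc b_md({K, N}, memory::data_type::f32, memory::format_tag::ab);
        memory::desc c_md({M, N}, memory::data_type::f32, memory::format_tag::ab);

        matmul::primitive_desc pd(matmul::desc(a_md, b_md, c_md), eng);
        matmul mm(pd);

        memory a(a_md, eng), b(b_md, eng), c(c_md, eng);
        mm.execute(s, {{DNNL_ARG_SRC, a}, {DNNL_ARG_WEIGHTS, b},
                {DNNL_ARG_DST, c}});
        s.wait();
        return 0;
    }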

Usability improvements

  • Added JIT code annotations for the linux-perf profiler.
  • Added a mechanism to control CPU dispatcher behavior at runtime via the DNNL_MAX_CPU_ISA environment variable or a function call (see the sketch after this list).
  • Extended DNNL_VERBOSE output with more information about runtimes and devices.
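
For example, a minimal sketch of the function-call route; the cap must be set before the first ISA-dependent primitive is created, and the equivalent environment variable setting is DNNL_MAX_CPU_ISA=AVX2:

    #include "dnnl.hpp"

    int main() {
        // Cap JIT dispatching at Intel AVX2, e.g. to reproduce results
        // obtained on a less capable machine.
        dnnl::set_max_cpu_isa(dnnl::cpu_isa::avx2);
        return 0;
    }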

Thanks to the contributors

This release contains contributions from the project core team as well as Aaron Johnson @aaronjohnson, Attila T. Áfra @atafra, Ben Fitch, Ilya Taraban @itaraban, Michał Gallus @Sand3r-, Peter Caday @petercad, Qiyou Chen @chenqy4933 and Jun Luan @junluan. We would also like to thank everyone who asked questions and reported issues.

oneDNN - v0.21.3

Published by tprimak almost 5 years ago

This is a patch release containing the following changes to v0.21.2:

  • Reduced the upper bound of the memory requirement for GEMM-based convolution to lower the probability of out-of-memory errors (cd99749c97e1cb6a7ec96f3ffa9e225a445b8a24)
  • Significantly reduced the memory size required for 1x1 convolution (564344566ad5cd8e1f9e6bdb5defc77b88a19b64)
  • Added a new dummy stream (cba5823ad881b837957c89d388241bbdc245a0bf)

oneDNN - v1.1.3

Published by tprimak almost 5 years ago

This is a patch release containing the following changes to v1.1.2:

  • Fixed the mean and variance memory descriptors in layer normalization (65f19088b5ca2804699b0c73440c9949ebca6ffd)
  • Fixed the layer normalization formula (c176cebaa1793718720b254613adac83a937710e)

oneDNN - v1.2-rc

Published by anita-intel almost 5 years ago

This is a release candidate for DNNL v1.2. Please provide feedback and report bugs via GitHub issues.

oneDNN - v1.1.2

Published by tprimak almost 5 years ago

This is a patch release containing the following changes to v1.1.1:

  • Fixed threading over the spatial dimensions in bfloat16 batch normalization (017b6c93dc10b2f0e53199d29cf6c26daafc5417)
  • Fixed read past end-of-buffer error for int8 convolution (7d6f45e7e72882d2c0d9041e65fd64f132ec321b)
  • Fixed condition for dispatching optimized channel blocking in fp32 backward convolution on Intel Xeon Phi(TM) processor (846eba1c230a66a664ded18ba25e9468aaadd4bf)
  • Fixed fp32 backward convolution for shapes with spatial strides over the depth dimension (002e3ab561556e12119be6b223b59fb1563908b5)
  • Fixed softmax with zero sizes on GPU (936bff4803a0743ec6e956b0bd459a0f2c01b378)
  • Fixed int8 deconvolution with dilation when ih <= dh (3e3bacba51f51e03bc2ae758800aead7d6876e79)
  • Re-enabled fp32 -> u8 reorder for RNN (a2c2507617edc06af45359670f11a353406342bf)
  • Fixed segmentation fault in bfloat16 backward convolution from kd_padding=0 computation (52d476c04bd8fe453d07934ca2a9834c87f6aafe)
  • Fixed segmentation fault in bfloat16 forward convolution due to push/pop imbalance (4f6e3d57af9ba7501f65df768d0b2f91765582fc)
  • Fixed library version for OS X build (0d850053c8b78728393f069593610c1d321444cf)
  • Fixed padding by channels in concat (a265c7dad34dc0dd72089f4fa8a6cd1c55f75f8)
  • Added full text of third party licenses and copyright notices to LICENSE file (79f204c76bc5c72f32a858285ae5fda593def0fb)
  • Added separate README for binary packages (28f4c96d2626e36e7196eb53d9299f9b6bd70961)
  • Fixed computing per-oc mask in RNN (ff3ffaba8c2739766ff44f8563d918673ebad994)
  • Added workaround for number of cores calculation in Xbyak (301b088c106c844e5c7592ba183a361698e54208)

oneDNN - v2.0-beta03

Published by tprimak almost 5 years ago

This is a preview release for oneDNN v2.0. The release is based on oneDNN v1.1 and the release notes below include incremental changes.

A binary distribution of this software is available as Intel(R) oneAPI Deep Neural Network Library in Intel(R) oneAPI.

New functionality

  • SYCL API extensions and interoperability with SYCL code
  • Support for Intel DPC++ compiler and runtime

Usability

  • SYCL interoperability examples

Known Limitations

  • Some f32/f16 convolutions with a non-square spatial filter shape may produce incorrect results on GPU.
  • Some bf16 backward convolutions with 3D spatial and negative padding may segfault on CPU.
  • Non-Intel GPUs are not supported. The library API allows creating a DNNL engine by index (the order of devices is determined by the SYCL runtime), and there is no check that the GPU device is an Intel device. For more control, users can create a DNNL engine by passing a SYCL device and context explicitly (see the sketch under v2.0-beta05 above).
  • RNN primitive may hang on GPU if the number of recurrent cells is greater than 40.
  • int8 RNN may produce incorrect results on GPU.
  • Backward propagation of the layer normalization primitive produces incorrect results.
  • Intel Processor Graphics Gen11 is not supported.
  • GPU kernels that run longer than a certain time (which depends on the OS and system settings) may cause the application to appear to hang. Configure the driver to disable this timeout and avoid hangs in DPC++ or OpenCL programs, including DNNL examples.

On Linux:

$ sudo bash -c 'echo N > /sys/module/i915/parameters/enable_hangcheck'

On Windows, increase the TdrDelay and TdrDdiDelay values in the registry (DWORD values under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers).

oneDNN - v1.0.4

Published by vpirogov almost 5 years ago

This is a patch release containing the following changes to v1.0.3:

  • Resolved int8 batch normalization performance degradation in comparison to v0.21 (ec191189dc228ce65ca27b5e9fa1ee535ef0728f)

oneDNN - v1.1.1

Published by vpirogov almost 5 years ago

This is a patch release containing the following changes to v1.1:

  • Fixed zero padding for memory formats with rank 3 and below (f97e1748552d36e8f35e1ad5a5d50bf1751c43cf)
  • Fixed 'deprecated std::copy' warning with Microsoft C++ Compiler (ee276af2d13ead05458d55f6ddc1771d52516397)
  • Fixed tail scaling for int8 inner product (f2b68c7d66be60dd4fc13af78a3b2cece1cd61a3)
  • Fixed correctness issue for int8 GEMM with N=1 (0dd5c13ff7d8efac73818952a4fc143fa2d4371e)
  • Sum no longer overrides the data type of the destination memory descriptor when used with any (53019818512939394cf919c3b3bfe333c488a15c)
  • Addressed the following corner cases in the CPU convolution implementation:
    • Fixed tail processing in int8 depthwise convolution (7711b77f9ad990d1e68a6f3076aadf9952b81c3d)
    • Fixed bias padding in bfloat16 depthwise convolution (0696ba6340ba4bdf1cd616a9613336da857cc7ca)
    • Fixed correctness issue in s8s8 flavor of depthwise convolution (b614482db20248d974034bf66631b26924a15dbe)
    • Fixed correctness issue in dilated convolution weight gradient implementation (c6ec0f95a29141112195e99173b4b83f8a3ab6d1)

oneDNN - v1.0.3

Published by vpirogov almost 5 years ago

This is a patch release containing the following changes to v1.0.2:

  • Fixed zero padding for memory formats with rank 3 and below (4d78aafb99bff74784aabe0f3761c486c79cd995)
  • Fixed tail scaling for int8 inner product (41b5a7e86446863c8ab76c6f3623019621848349)
  • Sum no longer overrides the data type of the destination memory descriptor when used with any (e979edae6c3d39ccf39cfed8855980eb00777cf0)
  • Improved s8s8 GEMM and inner product performance (4b44aa53edb86e1ea5b6085d00d6eae9202f4b7c)
  • Reduced memory consumption of GEMM-based algorithm for convolution weight gradient (f46b044fc2623289f56e0305fd8453c5fc9683e6)
  • Fixed negative padding processing in pooling (48ba96a242106a9a144faf23e53e4a79a9ddedee)
  • Addressed memory leak in GPU deconvolution (686fc41f1188505b374e1e7a4f807ef0b5874824)
  • Addressed memory leak in GPU stream (1206b2f711b8619171ad299d272c277cce2a768a)
  • Fixed fp16 GEMM correctness on GPU (c2425d44e4d7df31b63eb2c2d1ac943ef95e67a2)
  • Fixed GEMM correctness on GPU for the case of small M dimension (ac2683fd747e0eeb85b5c909e509a851bdf0287d)
  • Addressed the following corner cases in the CPU convolution implementation:
    • Fixed tail processing in int8 depthwise convolution (3a0943b8ce04747d3574a7028222548f986b2438)
    • Fixed bias padding in bfloat16 depthwise convolution (3d9af7cd6fb5e65603025f09a1a8322e4e6af3f8)
    • Fixed correctness issue in s8s8 flavor of depthwise convolution (e4d9049dbcac339007ae36672639ccdf96c29390)
    • Fixed correctness issue in GEMM-based algorithm for 3D convolutions (161ac408a240f469d3c81e6913b2545ed45d026f)
    • Fixed corner case issues in Intel AVX512 implementation of convolution weight gradient (68f51246d7f32ba35563395f1a46225bfaa02c83)
    • Disabled unsupported cases for depthwise convolution weight gradient (5e6e6c8ca499d3a094c32c4ec3f1361fa9ba6ec6)
    • Convolution with a 1x1 filter returns unimplemented for cases that have padding in spatial dimensions (9d7cc77816e374760693401a2ee05334e98d68f7)
    • Fixed negative padding support in general convolution kernel (b1c602a57b948f08979f314476829cd85f2651f5)
    • Fixed padding handling in depthwise convolution backpropagation (04712f6253294405bde2fe55dee597b04c7563e6)
    • Added support for negative padding in h and d spatial dimensions (7ddce823f9363e55005bbd27eeee3c97436aa20b)
    • Fixed segfault in strided convolution backpropagation (b04f3f5d71984cb3af87ef59ca579032ff2ade5b)
    • Fixed memory corruption in convolution backpropagation (8877bc97572dfb18b172c8c60ece73cb84ad150b)

oneDNN - v0.20.6

Published by vpirogov about 5 years ago

This is a patch release containing the following changes to v0.20.5:

  • Fixed performance regression in GEMM (cfc5c3db91685584efe4c8c46b4b488ee80a8959)

oneDNN - v0.21.2

Published by tprimak about 5 years ago

This is a patch release containing the following changes to v0.21.1:

  • Fixed performance regression in GEMM (95346214b9cbd689b750ab093910e439f0f83d9b)
  • Fixed int8 dilated convolution for some shapes with input height <= dilation over the height dimension (e68f1514061e4f58cc67a9669985ea3c4563acaf)
  • Addressed static initialization order issue in bf16 converters (ae8efdeebf1b576e9d25a8601301b4791219cde9)
  • Fixed fast reference backward convolution dispatching for 3D-spatial case (5994d63ffeec9830c280b5d6fb38ab6d6d97da4e)

oneDNN - v1.1

Published by anita-intel about 5 years ago

Performance optimizations

  • Improved performance with TBB threading, achieving performance comparable to OpenMP threading.
  • Improved int8 and fp32 GEMM performance on systems with Intel AVX-512 and Intel VNNI support.
  • Improved softmax performance for NHWC and corresponding blocked layouts.
  • Improved RNN cell performance and decreased the dependency of RNN performance on compiler vectorization capabilities.
  • Improved reorder performance for some shapes.

New functionality

  • Introduced layer normalization and binary elementwise primitives support (CPU engine).
  • Introduced swish (CPU and GPU engines) and gelu (GPU engine) activation support in elementwise primitive.
  • Introduced bfloat16 data type support in RNN cells (CPU engine).
  • Introduced initial int8 and bfloat16 data types support for GPU functionality.

Usability improvements

  • TBB threading support is promoted to production quality.
  • Introduced support for memory format any for backpropagation in memory-bound primitives. This mechanism allows matching the gradient memory format to the source and destination memory formats from the forward pass (see the sketch after this list).
  • Changed default compiler flags to target the Intel SSE4.1 instruction set to make builds portable.
  • (experimental) Introduced a caching mechanism that reduces primitive creation time for repeated primitive creation. The functionality is disabled by default and has to be enabled at compile time.
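
A minimal sketch of the any mechanism for gradients, with made-up convolution shapes; the library picks the gradient format, which can then be queried from the primitive descriptor:

    #include "dnnl.hpp"
    using namespace dnnl;

    int main() {
        engine eng(engine::kind::cpu, 0);

        memory::dims src_dims {2, 16, 13, 13}, wei_dims {32, 16, 3, 3},
                dst_dims {2, 32, 11, 11};
        auto dt = memory::data_type::f32;
        auto any = memory::format_tag::any;
        memory::desc src_md(src_dims, dt, any), wei_md(wei_dims, dt, any),
                dst_md(dst_dims, dt, any);

        // Forward primitive descriptor, used as a hint for backward.
        convolution_forward::desc fwd_d(prop_kind::forward_training,
                algorithm::convolution_direct, src_md, wei_md, dst_md,
                {1, 1}, {0, 0}, {0, 0});
        convolution_forward::primitive_desc fwd_pd(fwd_d, eng);

        // Backward-weights descriptor also uses format any, letting the
        // library match the gradient format to the forward formats.
        convolution_backward_weights::desc bwd_d(
                algorithm::convolution_direct, src_md, wei_md, dst_md,
                {1, 1}, {0, 0}, {0, 0});
        convolution_backward_weights::primitive_desc bwd_pd(bwd_d, eng, fwd_pd);

        // Query the memory format the library selected for the gradient.
        memory::desc diff_wei_md = bwd_pd.diff_weights_desc();
        (void)diff_wei_md;
        return 0;
    }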

Validation improvements

  • Extended benchdnn to cover all supported primitives.
  • Introduced a robust validation method for RNN cells in benchdnn. The approach replaces activations with a linear function to make error accumulation more predictable and decrease the number of false positives.
  • Extended convolution test coverage.

Thanks to the contributors

This release contains contributions from many Intel Performance Libraries developers as well as Ilia Taraban, Jacek Czaja @jczaja, William Tambellini @WilliamTambellini, Tomasz Kalina, Mateusz Guziak, Daniel Haidachuk, Konstantin Basargin @basargin, Aaron Johnson @aaronjohnson, and Jeremy Wong @jrmwng. We would also like to thank everyone who asked questions and reported issues.