tensorflow-directml-plugin

DirectML PluggableDevice plugin for TensorFlow 2

Apache-2.0 License

Downloads: 4.7K
Stars: 185
Committers: 7


tensorflow-directml-plugin 0.4.0 (Latest Release)

Published by maggie1059 over 1 year ago

The Python packages are available as a PyPI release. To download the latest Python package automatically, run pip install tensorflow-directml-plugin.

Changes in 0.4.0

  • Add DirectML kernels for CudnnRNNCanonicalToParams and CudnnRNNParamsToCanonical
  • Add support for grouped convolution in Conv2DBackpropFilter and Conv3DBackpropFilter
  • Add float16 support for _FusedConv2D

tensorflow-directml-plugin 0.3.0

Published by maggie1059 almost 2 years ago

Changes in 0.3.0

  • Set tensorflow-cpu==2.10.0 as a hard dependency due to incompatibility with Keras 2.11's default optimizers.
  • Fix overflow in BatchNorm ops when float16 or mixed precision is used.
  • Remove unnecessary Cast operation in ReduceMin and ReduceMax ops.
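
The tensorflow-cpu pin above can be checked from Python before the plugin is loaded. The following is a minimal sketch that relies only on the standard library; the helper names installed_version and tf_cpu_matches_pin are hypothetical, and the pinned version is the one stated in the 0.3.0 notes, so other releases may require something different.

```python
from importlib import metadata
from typing import Optional

# Version pinned by the 0.3.0 release notes; other releases may differ.
REQUIRED_TF_CPU = "2.10.0"


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def tf_cpu_matches_pin(required: str = REQUIRED_TF_CPU) -> bool:
    """Return True only if tensorflow-cpu is installed at exactly `required`."""
    return installed_version("tensorflow-cpu") == required


if __name__ == "__main__":
    for name in ("tensorflow-cpu", "tensorflow-directml-plugin"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
    print("tensorflow-cpu matches the 0.3.0 pin:", tf_cpu_matches_pin())
```

Running the script prints the installed versions of both packages and whether tensorflow-cpu matches the pin.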

tensorflow-directml-plugin 0.2.0

Published by PatriceVignola almost 2 years ago

Changes in 0.2.0

  • Improve TensorBoard profiling and the capture of Chrome traces
  • Add support for exponential_avg_factor != 1.0 in FusedBatchNorm
  • Add an int32 kernel registration for Fill

tensorflow-directml-plugin 0.1.1

Published by PatriceVignola about 2 years ago

Changes in 0.1.1

  • Fix a crash in InTopKV2 when k is bigger than the size of the axis dimension.
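
To illustrate the case this fix covers, here is a pure-Python sketch of in-top-k semantics. It is a simplified reading of how an in-top-k check is usually defined, not the plugin's DirectML kernel, and the function name in_top_k and its argument layout are assumptions made for the example. A target counts as "in the top k" when fewer than k classes score strictly higher, so a k larger than the number of classes is a valid input that yields True for every finite, in-range target.

```python
import math
from typing import List, Sequence


def in_top_k(
    predictions: Sequence[Sequence[float]],
    targets: Sequence[int],
    k: int,
) -> List[bool]:
    """For each row, report whether the target class is among the k best scores."""
    results = []
    for row, target in zip(predictions, targets):
        # Out-of-range targets and non-finite target scores are never "in the top k".
        if not 0 <= target < len(row) or not math.isfinite(row[target]):
            results.append(False)
            continue
        # Count the classes that score strictly higher than the target class.
        higher = sum(1 for score in row if score > row[target])
        results.append(higher < k)
    return results


print(in_top_k([[0.1, 0.7, 0.2]], [0], k=2))  # [False]: class 0 ranks third
print(in_top_k([[0.1, 0.7, 0.2]], [0], k=5))  # [True]: k exceeds the 3 classes
```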

tensorflow-directml-plugin 0.1.0

Published by PatriceVignola about 2 years ago

Changes in 0.1.0

  • Upgrade the DirectML version to 1.9.1, which includes minor bug fixes and performance improvements.
  • Add DirectML kernels for the RngSkip and RngReadAndSkip operators.
  • Add DirectML kernels for the StatelessRandomGetKeyCounterAlg, StatelessRandomGetKeyCounter and StatelessRandomGetAlg operators.
  • Add a DirectML kernel for SparseApplyAdagrad.
  • Add a DirectML kernel for StatelessRandomUniformV2.
  • Add a DirectML kernel for InTopKV2.
  • Add DirectML kernels for MatrixDiagV3 and MatrixDiagPartV3.
  • Add emulated support for int64.
  • Add a dependency on tensorflow-cpu>=2.10.0. Users should install the tensorflow-cpu package instead of tensorflow or tensorflow-gpu when using tensorflow-directml-plugin.
  • Add int32 support for StridedSlice.
  • Add CPU emulated versions of UnsortedSegmentSum, UnsortedSegmentMax, UnsortedSegmentMin and UnsortedSegmentProd to get rid of device placement errors in transformer models.
  • Add a C API for Linux. The C API can be downloaded from the releases page in the tensorflow-directml-plugin GitHub repository.
  • Add support for multiple devices.
  • Add integer support for Relu.
  • Add int32 support for Pack.
  • Fix the incomplete adapter description on Linux.
  • Fix a crash in ArgMin and ArgMax when the output type was int16 or uint16.
  • Fix undefined behavior when retrieving a list of strings from an attribute.
  • Fix a memory leak in the BFC allocator.
  • Fix a memory leak in the graph optimizer.
  • Fix a memory leak in SegmentReduction.
  • Fix a memory leak in StridedSlice.
  • Fix a memory leak in the emulated random kernels.
  • Fix the validation of Range to allow values near INT_MAX.
  • Get rid of warnings related to unsupported DataFormatDimMap and DataFormatVecPermute operators.
  • Prevent unbounded growth of command allocator memory.
  • Optimize output allocation for inputs that can be executed in-place and directly forwarded to the output.
  • Increase the available memory by allowing devices to allocate shared (nonlocal) memory.
  • Improve the performance of the unsorted segment operators by batching GPU->CPU copies together.
  • Increase the performance of emulated operators by reducing the number of eager contexts and eager ops that are created.
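
With multiple devices supported, it can help to see which adapters TensorFlow has picked up. The sketch below assumes the plugin registers its adapters under TensorFlow's "GPU" device type, which these notes do not state; the helper name list_gpu_device_names is hypothetical. It returns an empty list when TensorFlow is not installed.

```python
from typing import List


def list_gpu_device_names() -> List[str]:
    """Return the names of the GPU-type devices that TensorFlow can see."""
    try:
        import tensorflow as tf  # provided by the tensorflow-cpu package
    except ImportError:
        return []  # TensorFlow is not installed in this environment
    # Assumption: DirectML adapters are exposed under the "GPU" device type.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    names = list_gpu_device_names()
    print(f"{len(names)} GPU-type device(s) visible")
    for name in names:
        print(name)
```

A specific device from the list can then be selected with a tf.device scope, for example tf.device("GPU:1") for the second adapter.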