tensorflow-directml-plugin - tensorflow-directml-plugin 0.4.0 (Latest Release)

Published by maggie1059 over 1 year ago

The Python packages are available as a PyPI release. To download the latest Python package automatically, run pip install tensorflow-directml-plugin.
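After installing, it can be useful to confirm which version pip actually resolved. The following is a minimal sketch that uses only the Python standard library; the helper name installed_version is hypothetical and not part of the plugin.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the resolved version, or None when the package is not installed.
    print(installed_version("tensorflow-directml-plugin"))
```

A result of None means the package is not visible to the interpreter that ran the script, which usually points to a different virtual environment than the one pip installed into.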

Changes in 0.4.0

Add DirectML kernels for CudnnRNNCanonicalToParams and CudnnRNNParamsToCanonical
Add support for grouped convolution in Conv2DBackpropFilter and Conv3DBackpropFilter
Add float16 support for _FusedConv2D
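For readers unfamiliar with grouped convolution, the sketch below shows how the group count relates to the tensor shapes involved in the filter gradient. It assumes the usual TensorFlow convention that the filter has shape [height, width, in_channels / groups, out_channels] and that the group count is inferred from the input depth divided by the filter depth. The function names are hypothetical and the code only does shape arithmetic; it is not the plugin's kernel.

```python
def infer_groups(in_channels, filter_in_depth, out_channels):
    """Infer the number of groups from the input depth and the filter depth."""
    if in_channels % filter_in_depth != 0:
        raise ValueError("input depth must be divisible by filter depth")
    groups = in_channels // filter_in_depth
    if out_channels % groups != 0:
        raise ValueError("output depth must be divisible by the group count")
    return groups


def filter_backprop_shape(kernel_h, kernel_w, in_channels, out_channels, groups):
    """Shape of the filter gradient for a grouped 2D convolution (assumed layout)."""
    return (kernel_h, kernel_w, in_channels // groups, out_channels)


# Example: 8 input channels, filter depth 2, 16 output channels -> 4 groups.
groups = infer_groups(8, 2, 16)
print(groups, filter_backprop_shape(3, 3, 8, 16, groups))
```

With groups equal to 1 this reduces to the ordinary convolution case, where the filter depth equals the input depth.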

tensorflow-directml-plugin - tensorflow-directml-plugin 0.3.0

Published by maggie1059 almost 2 years ago

Changes in 0.3.0

Set tensorflow-cpu==2.10.0 as a hard dependency due to incompatibility with Keras 2.11's default optimizers.
Fix overflow in BatchNorm ops when float16 or mixed precision is used.
Remove unnecessary Cast operation in ReduceMin and ReduceMax ops.
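The BatchNorm overflow fix above concerns a general hazard of half precision: the largest finite float16 value is 65504, so intermediate quantities such as squared activations can overflow even when the inputs themselves are modest. The sketch below illustrates that hazard with the standard library only. It is not the plugin's fix, and the helper to_float16 is hypothetical.

```python
import struct


def to_float16(x):
    """Round-trip a Python float through IEEE 754 half precision.

    Values too large for float16 are mapped to infinity, mimicking overflow.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return float("inf") if x > 0 else float("-inf")


activation = to_float16(300.0)            # representable in float16
squared = to_float16(activation ** 2)     # 90000 exceeds 65504 and overflows
print(activation, squared)
```

A common way to avoid this kind of overflow is to accumulate statistics such as the variance in float32 and cast back afterwards; whether the plugin does exactly that is not stated in these notes.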

tensorflow-directml-plugin - tensorflow-directml-plugin 0.2.0

Published by PatriceVignola almost 2 years ago

Changes in 0.2.0

Improve TensorBoard profiling and the capturing of Chrome traces
Add support for exponential_avg_factor != 1.0 in FusedBatchNorm
Add an int32 kernel registration for Fill
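As background for the exponential_avg_factor change above: in training mode, FusedBatchNorm blends the incoming running statistics with the current batch statistics, and a factor of 1.0 simply returns the batch statistics. The sketch below shows that blend on plain floats. The function name is hypothetical, and the formula is the commonly documented one rather than code taken from the plugin.

```python
def update_running_stat(running, batch, exponential_avg_factor):
    """Blend a running statistic with the current batch statistic."""
    return (1.0 - exponential_avg_factor) * running + exponential_avg_factor * batch


# With a factor of 1.0 the batch statistic replaces the running one entirely.
print(update_running_stat(0.0, 10.0, 1.0))
# With a smaller factor the running statistic moves only part of the way.
print(update_running_stat(0.0, 10.0, 0.1))
```

The same blend applies to both the mean and the variance outputs.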

tensorflow-directml-plugin - tensorflow-directml-plugin 0.1.1

Published by PatriceVignola about 2 years ago

Changes in 0.1.1

Fix a crash in InTopKV2 when k is bigger than the size of the axis dimension.
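To make the fixed edge case concrete, here is a pure-Python reference sketch of in-top-k semantics: a target counts as "in the top k" when fewer than k entries in its row are strictly larger than the target's score. Under that definition, a k larger than the row length is harmless and every finite target qualifies. This is an illustration of the expected behavior, not the DirectML kernel, and the function name is chosen for this example.

```python
import math


def in_top_k(predictions, targets, k):
    """Return, per row, whether the target's score ranks among the k largest."""
    results = []
    for row, target in zip(predictions, targets):
        score = row[target]
        if not math.isfinite(score):
            # Non-finite scores are never considered to be in the top k.
            results.append(False)
            continue
        strictly_higher = sum(1 for value in row if value > score)
        results.append(strictly_higher < k)
    return results


row = [[0.1, 0.9, 0.0]]
print(in_top_k(row, [0], 1))  # only one entry may rank higher than the target
print(in_top_k(row, [0], 5))  # k exceeds the 3 entries in the row
```

Counting strictly larger entries also means that tied scores straddling the top-k boundary are all treated as being in the top k.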

tensorflow-directml-plugin - tensorflow-directml-plugin 0.1.0

Published by PatriceVignola about 2 years ago

Changes in 0.1.0

Upgrade the DirectML version to 1.9.1, which includes minor bug fixes and performance improvements.
Add DirectML kernels for the RngSkip and RngReadAndSkip operators.
Add DirectML kernels for the StatelessRandomGetKeyCounterAlg, StatelessRandomGetKeyCounter and StatelessRandomGetAlg operators.
Add a DirectML kernel for SparseApplyAdagrad.
Add a DirectML kernel for StatelessRandomUniformV2.
Add a DirectML kernel for InTopKV2.
Add DirectML kernels for MatrixDiagV3 and MatrixDiagPartV3.
Add emulated support for int64.
Add a dependency on tensorflow-cpu>=2.10.0. Users should install the tensorflow-cpu package instead of tensorflow or tensorflow-gpu when using tensorflow-directml-plugin.
Add int32 support for StridedSlice.
Add CPU emulated versions of UnsortedSegmentSum, UnsortedSegmentMax, UnsortedSegmentMin and UnsortedSegmentProd to get rid of device placement errors in transformer models.
Add a C API for Linux. The C API can be downloaded from the releases page in the tensorflow-directml-plugin GitHub repository.
Add support for multiple devices.
Add integer support for Relu.
Add int32 support for Pack.
Fix the incomplete adapter description on Linux.
Fix a crash in ArgMin and ArgMax when the output type was int16 or uint16.
Fix undefined behavior when retrieving a list of strings from an attribute.
Fix a memory leak in the BFC allocator.
Fix a memory leak in the graph optimizer.
Fix a memory leak in SegmentReduction.
Fix a memory leak in StridedSlice.
Fix a memory leak in the emulated random kernels.
Fix the validation of Range to allow values near INT_MAX.
Get rid of warnings related to unsupported DataFormatDimMap and DataFormatVecPermute operators.
Prevent unbounded growth of command allocator memory.
Optimize output allocation for inputs that can be executed in-place and directly forwarded to the output.
Increase the available memory by allowing devices to allocate shared (nonlocal) memory.
Improve the performance of the unsorted segment operators by batching GPU->CPU copies together.
Increase the performance of emulated operators by reducing the number of eager contexts and eager ops that are created.
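The dependency note in this release says to install tensorflow-cpu rather than tensorflow or tensorflow-gpu. One way to check an environment for the packages the note warns against is sketched below, using only the standard library. The helper names and the CONFLICTING tuple are assumptions made for this example and are not shipped with the plugin.

```python
from importlib import metadata

# Distributions the release note says not to install alongside the plugin.
CONFLICTING = ("tensorflow", "tensorflow-gpu")


def find_conflicts(installed_names):
    """Return the conflicting distribution names present in installed_names."""
    normalized = {name.lower().replace("_", "-") for name in installed_names}
    return sorted(name for name in CONFLICTING if name in normalized)


def installed_distribution_names():
    """List the names of all distributions visible to this interpreter."""
    names = []
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:
            names.append(name)
    return names


if __name__ == "__main__":
    print(find_conflicts(installed_distribution_names()))
```

An empty list means neither conflicting package was found; tensorflow-cpu is a separate distribution name and is deliberately not flagged.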