DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

APACHE-2.0 License

DALI - DALI v1.39.0 Latest Release

Published by stiepan 4 months ago

Key Features and Enhancements

This DALI release includes the following key features and enhancements:

  • Added support for CUDA 12.5 (#5478).
  • Migrated fn.decoders.image* operators to use nvImageCodec as a decoding backend (#5470).
  • Improved error handling (#5466, #5494, #5486, #5491).

Fixed Issues

  • Fixed DALI TF plugin compatibility with TensorFlow 2.9 (#5499).
  • Fixed S3 fn.readers.file support for pad_last_batch=True (#5493).
  • Fixed a bug that resulted in long build times for some pipelines with enabled conditional execution (#5475).
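
The pad_last_batch=True option named in the S3 fix above concerns repeated samples at the end of a shard. The following is a simplified illustration of that idea in plain Python, assuming padding simply repeats the last sample until the shard divides evenly into batches. pad_shard and split_into_batches are hypothetical helpers, not DALI functions, and DALI's actual padding rule also depends on how the data is sharded.

```python
def pad_shard(samples, batch_size):
    """Repeat the last sample until len(samples) is a multiple of batch_size.

    Assumption for illustration only: this is not DALI's reader code.
    """
    if not samples:
        return []
    remainder = len(samples) % batch_size
    padding = (batch_size - remainder) % batch_size
    return list(samples) + [samples[-1]] * padding


def split_into_batches(samples, batch_size):
    """Cut a list of samples into consecutive batches of batch_size."""
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]


files = ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"]
for batch in split_into_batches(pad_shard(files, 2), 2):
    print(batch)
```

With five files and a batch size of two, the last batch contains the final file twice, which is the "repeated samples" situation the fix refers to.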

Improvements

  • Add a mention of blogpost in Automatic Augmentation docs (#5508)
  • Removal of Python 3.8 notes from documentation (#5502)
  • Add default schema and use it in OpSpec argument queries. (#5500)
  • Add missing blocking argument documentation to the external source operator (#5501)
  • Trim line length in the documentation/examples for the new theme (#5479)
  • Refactoring in Pipeline, OpGraph and old Executor + name lookup improvement in old OpGraph and Pipeline. (#5495)
  • Improve error message about FFmpeg not being available (#5494)
  • Extend docs by adding info about @do_not_convert for NUMBA and Python ops (#5488)
  • New OpGraph (#5485)
  • Fix tests for sanitizer build (#5492)
  • GitHub comment acceptance formatting table fix (#5490)
  • Remove image decoder memory padding from examples (#5484)
  • Adding git lfs as a compilation prerequisite (#5483)
  • Remove unused JIT workspace policy. (#5487)
  • Add a warning about pipeline definition being executed only once. (#5486)
  • Move to CUDA 12.5 (#5478)
  • Pin NPP version for CUDA 12 (#5480)
  • Reintroduce "Move old ImageDecoder to legacy module and make the nvImageCodec based ImageDecoder the default" (#5470)
  • Move to new, unified, NVIDIA sphinx theme (#5471)
  • Add DALI video plugin skeleton (#5328)
  • Don't initialize NVML when not setting affinity. (#5472)
  • Add MXNet deprecation message to the docs and plugin (#5465)
  • Add first-class check for nested datanodes in math/arithmetic ops. (#5466)

Bug Fixes

  • Fix DALI TF plugin incompatibility with TF 2.9 (#5499)
  • Coverity May 2024 (#5497)
  • Fix S3 FileReader when using repeated samples (pad_last_batch=True) (#5493)
  • Improve the video decoder errors (#5491)
  • Add extra rpath for prebuilt ffmpeg dependencies for video plugin (#5481)
  • Use dynamic programming in OpGraph::HasConsumersInOtherStage (#5475)

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

DALI 1.39 is the final release that will support the MXNet integration.

Known issues:

  • The following operators: experimental.readers.fits, experimental.decoders.video, and experimental.inputs.video do not currently support checkpointing.
  • The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream.
    If key frames occur less often than every 10 to 15 frames, the returned frames might be out of sync.
  • Experimental VideoReaderDecoder does not support open GOP.
    It will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with a TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler used to build TensorFlow is available on the system during plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • In experimental debug and eager modes, the GPU external source is not properly synchronized with DALI internal streams.
    As a workaround, you can manually synchronize the device before returning the data from the callback.
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when running in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker.
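
The key-frame requirement listed in the known issues above can be checked ahead of time when the key-frame positions of a video are already known. Below is a minimal sketch, assuming the key-frame indices have been extracted with some other tool. The function names and the default limit of 10 frames (the stricter end of the quoted range) are illustrative choices, not part of DALI.

```python
def max_keyframe_gap(keyframe_indices, total_frames):
    """Largest run of frames without a new key frame.

    Counts the frames before the first key frame, between consecutive
    key frames, and after the last key frame.
    """
    if not keyframe_indices:
        return total_frames
    ordered = sorted(keyframe_indices)
    gaps = [ordered[0]]
    gaps += [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    gaps.append(total_frames - ordered[-1])
    return max(gaps)


def keyframes_dense_enough(keyframe_indices, total_frames, max_gap=10):
    """True if no gap between key frames exceeds max_gap frames."""
    return max_keyframe_gap(keyframe_indices, total_frames) <= max_gap


print(keyframes_dense_enough([0, 10, 20], total_frames=30))
print(keyframes_dense_enough([0, 30, 60], total_frames=90))
```

A video that fails this check may be worth re-encoding with a shorter key-frame interval before it is passed to the video loader.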

Binary builds

NOTE: DALI builds for CUDA 12 dynamically link the CUDA toolkit. To use DALI, install the latest CUDA toolkit.

CUDA 11.0 and CUDA 12.0 builds use CUDA toolkit enhanced compatibility. They are built with the latest CUDA 11.x/12.x toolkit respectively, but they can run on the latest stable CUDA 11.0/CUDA 12.0 capable drivers (450.80 or later and 525.60 or later, respectively). However, using the most recent driver may enable additional functionality. More details can be found in the enhanced CUDA compatibility guide.
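
The minimum driver versions quoted above (450.80 for CUDA 11 builds, 525.60 for CUDA 12 builds) can be turned into a quick pre-install check. This is a minimal sketch in plain Python. The helper names are hypothetical, and the driver version string is assumed to be obtained separately, for example from the output of nvidia-smi.

```python
# Minimum driver versions quoted in these release notes, keyed by CUDA major version.
MIN_DRIVER = {11: (450, 80), 12: (525, 60)}


def parse_driver_version(version):
    """Turn a string such as '535.104.05' into a tuple of ints, e.g. (535, 104, 5)."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_supports(cuda_major, driver_version):
    """True if the driver meets the minimum quoted for the given CUDA major version."""
    minimum = MIN_DRIVER[cuda_major]
    # Compare only the major and minor components against the quoted minimum.
    return parse_driver_version(driver_version)[:2] >= minimum


print(driver_supports(12, "535.104.05"))
print(driver_supports(12, "470.57.02"))
```

The check only covers the minimums stated here; a newer driver may still be needed for additional functionality.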

Install via pip for CUDA 12.0:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda120==1.39.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda120==1.39.0

or just:

pip install nvidia-dali-cuda120==1.39.0
pip install nvidia-dali-tf-plugin-cuda120==1.39.0

For CUDA 11:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.39.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.39.0

or just:

pip install nvidia-dali-cuda110==1.39.0
pip install nvidia-dali-tf-plugin-cuda110==1.39.0

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI - DALI v1.38.0

Published by stiepan 5 months ago

Key Features and Enhancements

This DALI release includes the following key features and enhancements:

  • Added support for AWS S3 urls in DALI readers (#5415, #5434).
  • Improved support for enum types in types.Constant, fn.cast, fn.random.choice (#5422).
  • Improved error reporting (#5428).

Fixed Issues

  • Fixed checkpoint clean-up in C API. (#5453)

Improvements

  • Dependency update for May 2024 - black, boost-pp, cv-cuda, pybind11, rapidjson (#5458)
  • Introduce DALI_PRELOAD_PLUGINS (#5457)
  • Move old ImageDecoder to legacy module and make the nvImageCodec based ImageDecoder the default (#5445)
  • Bump up NUMBA version used in tests to 0.59.1 (#5451)
  • Extend the documentation footer (#5454)
  • Remove the use of (soon deprecated) aligned_storage. (#5455)
  • Make shared IterationData a first class member of Workspace. (#5447)
  • Tasking module (#5436)
  • Add AWS SDK support to all file readers (FileReader, NumpyReader, WebdatasetReader...) (#5415)
  • Fix test_enum_types.py for Python3.11 (#5443)
  • Remove files related to QNX that are no longer used (#5438)
  • Remove usage of THRUST host&device vector (#5439)
  • Add CMake to aarch64 base docker images (#5437)
  • Refactoring of File Reader classes to accommodate for AWS SDK S3 integration (#5434)
  • Replace Ops class name with proper operator API name (#5428)
  • Use CMake binary release (#5435)
  • Improve support for DALI enum types (#5422)
  • Disable some JAX iterator tests in sanitizer run (#5427)

Bug Fixes

  • Fix GTest Death Style Tests and LoadDirectory test in conda (#5469)
  • Revert "Move old ImageDecoder to legacy module and make the nvImageCodec based ImageDecoder the default (#5445)" (#5464)
  • Pin JAX version for multigpu test (#5460)
  • Use C++17 standard in nodeps test. (#5459)
  • Fix Coverity issues (May/2024) (#5453)
  • Fix equalize unit test (#5456)

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

DALI 1.39 will be the last release to support MXNet integration.

Known issues:

  • The following operators: experimental.readers.fits, experimental.decoders.video, experimental.inputs.video, and experimental.decoders.image_random_crop do not currently support checkpointing.
  • The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream.
    If key frames occur less often than every 10 to 15 frames, the returned frames might be out of sync.
  • Experimental VideoReaderDecoder does not support open GOP.
    It will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with a TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler used to build TensorFlow is available on the system during plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • In experimental debug and eager modes, the GPU external source is not properly synchronized with DALI internal streams.
    As a workaround, you can manually synchronize the device before returning the data from the callback.
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when running in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker.

Binary builds

NOTE: DALI builds for CUDA 12 dynamically link the CUDA toolkit. To use DALI, install the latest CUDA toolkit.

CUDA 11.0 and CUDA 12.0 builds use CUDA toolkit enhanced compatibility. They are built with the latest CUDA 11.x/12.x toolkit respectively, but they can run on the latest stable CUDA 11.0/CUDA 12.0 capable drivers (450.80 or later and 525.60 or later, respectively). However, using the most recent driver may enable additional functionality. More details can be found in the enhanced CUDA compatibility guide.

Install via pip for CUDA 12.0:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda120==1.38.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda120==1.38.0

or just:

pip install nvidia-dali-cuda120==1.38.0
pip install nvidia-dali-tf-plugin-cuda120==1.38.0

For CUDA 11:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.38.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.38.0

or just:

pip install nvidia-dali-cuda110==1.38.0
pip install nvidia-dali-tf-plugin-cuda110==1.38.0

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI - DALI v1.37.1

Published by JanuszL 6 months ago

Key Features and Enhancements

There are no new features in this release.

Fixed Issues

  • Fixed DALI TF plugin source compilation during installation (#5448).

Improvements

There are no new improvements in this release.

Bug Fixes

  • Fixed DALI TF plugin source compilation during installation (#5448)
  • Pin all nvJPEG2k subpackages (#5442)

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

No features were deprecated in this release.

Known issues:

  • The following operators: experimental.readers.fits, experimental.decoders.video, experimental.inputs.video, and experimental.decoders.image_random_crop do not currently support checkpointing.
  • The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream.
    If key frames occur less often than every 10 to 15 frames, the returned frames might be out of sync.
  • Experimental VideoReaderDecoder does not support open GOP.
    It will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with a TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler used to build TensorFlow is available on the system during plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • In experimental debug and eager modes, the GPU external source is not properly synchronized with DALI internal streams.
    As a workaround, you can manually synchronize the device before returning the data from the callback.
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when running in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker.

Binary builds

NOTE: DALI builds for CUDA 12 dynamically link the CUDA toolkit. To use DALI, install the latest CUDA toolkit.

CUDA 11.0 and CUDA 12.0 builds use CUDA toolkit enhanced compatibility. They are built with the latest CUDA 11.x/12.x toolkit respectively, but they can run on the latest stable CUDA 11.0/CUDA 12.0 capable drivers (450.80 or later and 525.60 or later, respectively). However, using the most recent driver may enable additional functionality. More details can be found in the enhanced CUDA compatibility guide.

Install via pip for CUDA 12.0:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda120==1.37.1
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda120==1.37.1

or just:

pip install nvidia-dali-cuda120==1.37.1
pip install nvidia-dali-tf-plugin-cuda120==1.37.1

For CUDA 11:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.37.1
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.37.1

or just:

pip install nvidia-dali-cuda110==1.37.1
pip install nvidia-dali-tf-plugin-cuda110==1.37.1

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI - DALI v1.37.0

Published by stiepan 6 months ago

Key Features and Enhancements

This DALI release includes the following key features and enhancements:

  • Added support for running JAX defined augmentations in the iterator and pipeline. (#5406, #5426, #5432)
  • Improved error reporting with a stack trace pointing to the offending operation in user code. (#5357, #5396)
  • Added CPU fn.random.choice operator. (#5380, #5387)
  • Added support for CUDA 12.4. (#5353, #5410)
  • Improved iterators checkpointing (#5374, #5375, #5371, #5356)
  • Optimized fn.resize operator for better GPU utilization (#5382)
  • Added option to skip bboxes in fn.random_bbox_crop with the fraction of area within the crop below user-provided threshold. (#5368) 
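
The bounding-box option in the last item above depends on the fraction of a box's area that falls inside the crop window. The following is a small sketch of that computation in plain Python, assuming boxes and crops are given as (left, top, right, bottom) tuples. The helper names and the threshold parameter are illustrative and do not reflect the argument names or implementation of fn.random_bbox_crop.

```python
def area_fraction_inside(box, crop):
    """Fraction of the box area that lies within the crop window.

    Both arguments are (left, top, right, bottom) tuples.
    """
    left = max(box[0], crop[0])
    top = max(box[1], crop[1])
    right = min(box[2], crop[2])
    bottom = min(box[3], crop[3])
    intersection = max(0.0, right - left) * max(0.0, bottom - top)
    area = (box[2] - box[0]) * (box[3] - box[1])
    return intersection / area if area > 0 else 0.0


def prune_boxes(boxes, crop, threshold):
    """Keep only boxes whose visible fraction is at least the threshold."""
    return [box for box in boxes if area_fraction_inside(box, crop) >= threshold]


crop = (0, 0, 10, 10)
boxes = [(0, 0, 4, 4), (5, 5, 15, 15)]
print(prune_boxes(boxes, crop, threshold=0.5))
```

In this example the first box lies entirely inside the crop and is kept, while only a quarter of the second box is visible, so it is dropped at a threshold of 0.5.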

Fixed Issues

  • Fixed handling of special values of the stream field in CUDA Array Interface v3 (#5425).
  • Fixed insufficient synchronization around scratch memory in nvImageCodec-based decoders (fn.experimental.decoders.*). (#5408)
  • Fixed readers saving incorrect checkpoint when restored and saved back in the same epoch. (#5378)

Improvements

  • Add JAX-defined augmentation examples (#5426)
  • Extend context and name propagation in errors (#5396)
  • Add experimental jax operator (#5406)
  • Enable Bandit security scan (#5402)
  • Reworks links in the RST documentation (#5413)
  • Refactor to remove duplicated logic in traverse_directories utility function (#5419)
  • Update DALI deps version (#5417)
  • Changes to dali/util/numpy (#5416)
  • Add libcurl-devel (#5412)
  • Move to CUDA 12.4 U1 (#5410)
  • Separate executor interface and implementation files. (#5411)
  • Make the video reader use cudaVideoDeinterlaceMode_Adaptive only for non-progressive videos (#5392)
  • Skip AutoAug test when sanitizers are on (#5403)
  • Unpin typing_extensions in tests (#5405)
  • Dependency update 03-2024 (#5397)
  • Review Bandit reported vulnerabilities (#5398)
  • Support checkpointing in JAX decorators (#5374)
  • Workaround ASAN bug ignoring RPATH (#5388)
  • Update supported TensorFlow version (#5386)
  • Disable more video tests on selected machines (#5385)
  • Extend fn.random.choice to support n-D inputs (#5387)
  • Add random choice CPU operator for 0D samples (#5380)
  • Resize: Optimize block sizes, use dynamic amount of shared mem. (#5382)
  • Support checkpointing in JAX peekable iterator (#5375)
  • Increase DALI TF Plugin loading timeout (#5381)
  • Improve iterator checkpointing (#5371)
  • Improve logs when the DALI TF plugin loading process fails (#5379)
  • Add option to prune bboxes based on % area in Crop ROI (#5368)
  • Improve op deprecation and deprecate sequence reader (#5372)
  • Fix typo in nvcuvid error (#5373)
  • Optimize sanitizer operator tests (#5352)
  • Introduce operator origin stack trace in the error message (#5357)
  • Make ExternalContext more flexible (#5356)
  • Enable CUDA 12.4 build (#5353)

Bug Fixes

  • Add nose as a dependency to iterators tests (#5433)
  • Disable jax_function notebook conversions for unsupported Python3.8 (#5432)
  • Improve handling of CUDA Array Interface v3 (#5425)
  • Fix debug build (#5414)
  • Add stream synchronization before decode for nvImageCodec <= 0.2 (#5408)
  • Fix Loader checkpointing bug (#5378)
  • Fix pixelwise_masks support when the ratio is on in the coco reader (#5407)
  • Fix storage of non-POD random distributions. (#5395)
  • Fix nvImageCodec version check. (#5399)
  • Fix bug in checkpointing C API (#5390)
  • Add nose to the package list for TL1_separate_executor. (#5393)
  • Use host sync allocation for nvImageCodec <= 0.2 (#5391)
  • Remove temporary lock file from wheel (#5384)
  • Disable type annotation tests in sanitizer build (#5383)
  • Fix CUDA 12.4 with ASAN (#5370)
  • Skip video tests on M60 (#5369)
  • Enable eager mode tests, fix mixed ops and improve coverage (#5367)

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

No features were deprecated in this release.

Known issues:

  • The following operators: experimental.readers.fits, experimental.decoders.video, experimental.inputs.video, and experimental.decoders.image_random_crop do not currently support checkpointing.
  • The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream.
    If key frames occur less often than every 10 to 15 frames, the returned frames might be out of sync.
  • Experimental VideoReaderDecoder does not support open GOP.
    It will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with a TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler used to build TensorFlow is available on the system during plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • In experimental debug and eager modes, the GPU external source is not properly synchronized with DALI internal streams.
    As a workaround, you can manually synchronize the device before returning the data from the callback.
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when running in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker.

Binary builds

NOTE: DALI builds for CUDA 12 dynamically link the CUDA toolkit. To use DALI, install the latest CUDA toolkit.

CUDA 11.0 and CUDA 12.0 builds use CUDA toolkit enhanced compatibility. They are built with the latest CUDA 11.x/12.x toolkit respectively, but they can run on the latest stable CUDA 11.0/CUDA 12.0 capable drivers (450.80 or later and 525.60 or later, respectively). However, using the most recent driver may enable additional functionality. More details can be found in the enhanced CUDA compatibility guide.

Install via pip for CUDA 12.0:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda120==1.37.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda120==1.37.0

or just:

pip install nvidia-dali-cuda120==1.37.0
pip install nvidia-dali-tf-plugin-cuda120==1.37.0

For CUDA 11:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.37.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.37.0

or just:

pip install nvidia-dali-cuda110==1.37.0
pip install nvidia-dali-tf-plugin-cuda110==1.37.0

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI - DALI v1.36.0

Published by stiepan 7 months ago

Key Features and Enhancements

This DALI release includes the following key features and enhancements:

  • Added support for checkpointing in MXNet iterator and CPU TensorFlow plugin (#5334, #5315).
  • Added morphological operators (fn.experimental.dilate, fn.experimental.erode) (#5294).
  • Integrated nvImageCodec for decoding in fn.experimental.decoders (#5297, #5336, #5324, #5333, #5339).
  • Added fn.random_crop_generator operator (#5304).
  • Added support for multiple inputs and relative shapes and anchors in fn.multi_paste (#5331).

Fixed Issues

  • Fixed insufficient synchronization in MXNet iterator (#5364).
  • Fixed auto_reset argument handling in iterator plugins (#5340).
  • Fixed missing calls to nvml::Shutdown (#5317).
  • Limited the number of progressive scans for JPEG decoding (#5316).

Improvements

  • Propagate module and display name of the operator to backend (#5344)
  • Update dependencies (#5349)
  • Map backend exceptions into Python exception types (#5345)
  • Emphasise the optical flow is calculated at input resolution. (#5350)
  • Refactor custom ops classes to use python_op_factory as base (#5338)
  • Add origin stack trace capture for DALI operators (#5302)
  • Test fused decoder with two separate pipelines (#5343)
  • [Cutmix] Make fn.multi_paste more flexible, fix validation (#5331)
  • Enable checkpointing in TensorFlow plugin (CPU only) (#5334)
  • Copy out nvImageCodec conda package from the build (#5336)
  • Add error message when GPU is not available (#5329)
  • Enable build with statically linked nvimgcodec + hard dependency for dynamic linking (#5324)
  • Add tf_stack util to autograph (#5322)
  • Rewrite median blur to use nvcvop tools (#5327)
  • Add morphological operators and the nvcvop module (#5294)
  • Add OpSpec::ArgumentInputIdx (#5330)
  • Simplify workspace object. Ensure predictable argument order in OpSpec. (#5325)
  • Support checkpointing in MXNet iterator (#5315)
  • Set rpath at cmake level (do not wait for bundle-wheel) (#5323)
  • Interpolation modes documentation upgrade (#5314)
  • Update links in DALI documentation (#5321)
  • Integrate nvimagecodec (#5297)
  • Add naive_histogram custom operator to test suite (#4731)
  • Add RandomCropGenerator (#5304)
  • Use small videos in checkpointing tests (#5305)

Bug Fixes

  • Use synchronous copy to framework array in the absence of a stream (#5364)
  • Process TFRecord reader binding classes only when it is enabled (#5360)
  • Adjust stack formatting in backend to match Python (#5354)
  • Link test operators against nvml wrapper (#5355)
  • Fix range check in Workspace::SetInput (#5358)
  • Make async_pool immune to stream handle reuse. (#5348)
  • Coverity fixes for 1.36 (#5342)
  • Fix "auto_reset" argument handling (#5340)
  • Fix cupy tests (#5341)
  • Add nvimagecodec libs to DALI_EXCLUDES + test utils to dump mismatched images (#5339)
  • Fix warning about nvImageCodec version (#5333)
  • Silence warning about DOWNLOAD_EXTRACT_TIMESTAMP while fixing the cmake <3.24 builds (#5326)
  • Fix inconsistent calls to nvml::Init and nvml::Shutdown (#5317)
  • Limit the number of progressive scans for jpeg decoding (#5316)

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

No features were deprecated in this release.

Known issues:

  • The following operators: experimental.readers.fits, experimental.decoders.video, experimental.inputs.video, and experimental.decoders.image_random_crop do not currently support checkpointing.
  • The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream.
    If key frames occur less often than every 10 to 15 frames, the returned frames might be out of sync.
  • Experimental VideoReaderDecoder does not support open GOP.
    It will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with a TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler used to build TensorFlow is available on the system during plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • In experimental debug and eager modes, the GPU external source is not properly synchronized with DALI internal streams.
    As a workaround, you can manually synchronize the device before returning the data from the callback.
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when running in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker.

Binary builds

NOTE: DALI builds for CUDA 12 dynamically link the CUDA toolkit. To use DALI, install the latest CUDA toolkit.

CUDA 11.0 and CUDA 12.0 builds use CUDA toolkit enhanced compatibility. They are built with the latest CUDA 11.x/12.x toolkit respectively, but they can run on the latest stable CUDA 11.0/CUDA 12.0 capable drivers (450.80 or later and 525.60 or later, respectively). However, using the most recent driver may enable additional functionality. More details can be found in the enhanced CUDA compatibility guide.

Install via pip for CUDA 12.0:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda120==1.36.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda120==1.36.0

For CUDA 11:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.36.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.36.0

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI - DALI v1.35.0

Published by stiepan 8 months ago

Key Features and Enhancements

This DALI release includes the following key features and enhancements:

  • Added support for PaddlePaddle and JAX iterators (#5279, #5282).
  • Added support for checkpointing of iterators with different last batch policies (#5298, #5278).
  • Added tutorial on running DALI with T5X (#5286).
  • Added do_not_convert decorator to address problems with parallel fn.external_source and conditional execution (#5263).

Fixed Issues

  • Fixed missing nvmlShutdown calls (#5311).
  • Fixed fn.readers.video handling of sequences bigger than 2GB (#5307).
  • Fixed fn.resize handling of samples larger than 2GB (#5306).
  • Fixed support for multi node JAX sharding (#5242).
  • Fixed handling of decorated callbacks and methods in fn.external_source (#5268).
  • Fixed insufficient synchronization when restoring random generators from a checkpoint (#5273).

Improvements

  • Skip slow checkpointing tests when sanitizer is enabled (#5310)
  • Support checkpointing in PaddlePaddle iterator (#5279)
  • Bump protobuf version requirements in CMake. (#5312)
  • Dependency update (#5308)
  • Add SSL support to aarch64 deps containers (needed by CMake) (#5300)
  • Enable OpenSSL support (allow https downloading from cmake) (#5299)
  • Expose checkpointing in C API (#5287)
  • Enable AG-transformed code to show user code in exception (#5291)
  • Remove stage-related public APIs (from Pipeline and Executor) (#5244)
  • Add T5X tutorial (#5286)
  • Temporarily reduce the number of epochs in SBSA torch test (#5290)
  • Add a tool for correcting typos (#5193)
  • Support checkpointing in JAX iterator (#5282)
  • Expose do_not_convert decorator (#5263)
  • Fix Pipeline docs formatting (#5283)
  • Generalize iterator checkpointing tests (#5278)

Bug Fixes

  • Add missing calls to nvmlShutdown (#5311)
  • Fix bugs in iterator checkpointing, enable other last batch policies (#5298)
  • Fixes support for video sequences bigger than 2GB (#5307)
  • Fix resizing of volumes larger than 2G. (#5306)
  • Fix memory leak in C API test (#5303)
  • Fix support for multi node JAX sharding (#5242)
  • Fix TestPytorch (#5284)
  • Fix capitalized property descriptions (#5285)
  • Fix the lack of flavor in the conda DALI package name (#5281)
  • Allow source to be a decorated function or method (#5268)
  • Fix f-string in RN50 data pipeline test (#5280)
  • Remove old assumption from iterator docs (#5277)
  • Suppress leaks from inside dlopen (#5245)
  • Fix PaddlePaddle plugin docs (#5276)
  • Fix synchronization issues in RNG checkpointing utils (#5273)

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

No features were deprecated in this release.

Known issues:

  • The following operators: experimental.readers.fits, experimental.decoders.video, experimental.inputs.video, and experimental.decoders.image_random_crop do not currently support checkpointing.
  • The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream.
    If the key frames occur at a frequency that is less than 10-15 frames, the returned frames might be out of sync.
  • Experimental VideoReaderDecoder does not support open GOP.
    It will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • In experimental debug and eager modes, the GPU external source is not properly synchronized with DALI internal streams.
    As a workaround, you can manually synchronize the device before returning the data from the callback.
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when running in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker.
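
The key-frame spacing requirement in the list above can be checked ahead of time. The following is a minimal sketch, not part of DALI: it assumes the key-frame indices have already been extracted with some external tool, and the function names and the default threshold of 10 frames (the conservative end of the 10 to 15 range) are illustrative choices.

```python
from typing import Sequence


def max_keyframe_gap(keyframe_indices: Sequence[int], num_frames: int) -> int:
    """Return the largest distance, in frames, from one key frame to the next.

    The end of the stream is treated as a final boundary, so a long tail
    without key frames is also counted.
    """
    if not keyframe_indices:
        # No key frames at all: the whole stream is one gap.
        return num_frames
    boundaries = sorted(keyframe_indices) + [num_frames]
    return max(b - a for a, b in zip(boundaries, boundaries[1:]))


def keyframes_dense_enough(
    keyframe_indices: Sequence[int], num_frames: int, max_interval: int = 10
) -> bool:
    """True if key frames occur at least every `max_interval` frames."""
    return max_keyframe_gap(keyframe_indices, num_frames) <= max_interval


if __name__ == "__main__":
    # Hypothetical streams: one with a key frame every 10 frames, one sparse.
    print(keyframes_dense_enough([0, 10, 20], num_frames=30))
    print(keyframes_dense_enough([0, 25], num_frames=30))
```

A stream that fails this check may need to be re-encoded with a shorter key-frame interval before it is passed to the video loader.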

Binary builds

NOTE: DALI builds for CUDA 12 dynamically link the CUDA toolkit. To use DALI, install the latest CUDA toolkit.

CUDA 11.0 and CUDA 12.0 builds use CUDA toolkit enhanced compatibility. 
They are built with the latest CUDA 11.x/12.x toolkit respectively but they can run on the latest, 
stable CUDA 11.0/CUDA 12.0 capable drivers (450.80 or later and 525.60 or later respectively).
However, using the most recent driver may enable additional functionality. 
More details can be found in enhanced CUDA compatibility guide.
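
As a rough illustration of the driver requirement above, the sketch below compares a driver version string against the minimums quoted in this section (450.80 for CUDA 11 builds, 525.60 for CUDA 12 builds). The helper names are hypothetical and not part of DALI, and the version string is assumed to come from a tool such as nvidia-smi.

```python
from typing import Tuple

# Minimum driver versions quoted in the release notes, keyed by CUDA major version.
MIN_DRIVER = {11: (450, 80), 12: (525, 60)}


def parse_driver_version(version: str) -> Tuple[int, ...]:
    """Turn a string such as '535.104.05' into (535, 104)."""
    parts = version.strip().split(".")
    return tuple(int(p) for p in parts[:2])


def driver_supports(cuda_major: int, driver_version: str) -> bool:
    """Check a driver version against the documented minimum for a DALI build."""
    if cuda_major not in MIN_DRIVER:
        raise ValueError(f"no documented minimum for CUDA {cuda_major}")
    return parse_driver_version(driver_version) >= MIN_DRIVER[cuda_major]


if __name__ == "__main__":
    # Example version strings; substitute the value reported on your system.
    for cuda_major, driver in [(12, "535.104.05"), (11, "450.51.06")]:
        print(cuda_major, driver, driver_supports(cuda_major, driver))
```

Only the major and minor components are compared, which matches the granularity of the minimums stated above.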

Install via pip for CUDA 12.0:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda120==1.35.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda120==1.35.0

or just:

pip install nvidia-dali-cuda120==1.35.0
pip install nvidia-dali-tf-plugin-cuda120==1.35.0

For CUDA 11:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.35.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.35.0

or just:

pip install nvidia-dali-cuda110==1.35.0
pip install nvidia-dali-tf-plugin-cuda110==1.35.0

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI - DALI v1.34.0

Published by stiepan 9 months ago

Key Features and Enhancements

This DALI release includes the following key features and enhancements:

  • Added support for CUDA 12.3 U2 (#5262)
  • Added support for checkpointing in fn.random_resized_crop (#5246)

Fixed Issues

  • Fixed synchronization problem when restoring GPU random operator checkpoints (#5273).
  • Fixed warnings on pipeline teardown in debug mode. (#5267)
  • Added check for reentrant version of CFITSIO for fits reader. (#5239)
  • Fixed scalar inputs handling in GPU fn.lookup_table. (#5257)
  • Added missing validation for bboxes in fn.ssd_random_crop (#5240)
  • Added validation that prevents running parallel external source without Python workers (#5238)

Improvements

  • Split conda build into core and python bindings (#5259)
  • Dependency update - 2024/01 (#5271)
  • Add framework attributes to DLFW iterator tests. (#5266)
  • Move to CUDA 12.3 U2 (#5262)
  • Add links to DALI success stories in README (#5247)
  • Add missing imports to AA simple examples in docs (#5243)
  • Support checkpointing in random_resized_crop (#5246)
  • Add linter GitHub Action (#5236)
  • Adjust the error message on failed IsDenseTensor check (#5237)
  • Format docs directory with black (#5214)

Bug Fixes

  • Fix PaddlePaddle plugin docs (#5276)
  • Fix synchronization issues in RNG checkpointing utils (#5273)
  • Fix missing Shutdown method warning in debug pipeline. (#5267)
  • Fix issues detected by Coverity (2024.01) (#5272)
  • Check if reentrant version of CFITSIO is used (#5239)
  • Fix LookupTable GPU for scalar inputs (#5257)
  • Add dimension check in ssd_random_crop (#5240)
  • Add validation preventing running PES without Python workers (#5238)

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

No features were deprecated in this release.

Known issues:

  • The following operators: experimental.readers.fits, experimental.decoders.video, experimental.inputs.video, and experimental.decoders.image_random_crop do not currently support checkpointing.
  • The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream.
    If key frames occur less often than that, the returned frames might be out of sync.
  • Experimental VideoReaderDecoder does not support open GOP.
    It will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • In experimental debug and eager modes, the GPU external source is not properly synchronized with DALI internal streams.
    As a workaround, you can manually synchronize the device before returning the data from the callback.
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when running in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker.
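
The external source workaround in the list above (synchronize the device before the callback returns) can be sketched as a small wrapper. This is an illustrative pattern rather than a DALI API: `synchronized_source` is a hypothetical helper, and the `synchronize` argument is whatever device-wide synchronization call your array library provides, for example `cupy.cuda.runtime.deviceSynchronize` or `torch.cuda.synchronize`.

```python
import functools
from typing import Any, Callable


def synchronized_source(
    callback: Callable[..., Any], synchronize: Callable[[], None]
) -> Callable[..., Any]:
    """Wrap an external source callback so the device is synchronized
    after the data is produced and before it is returned."""

    @functools.wraps(callback)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        data = callback(*args, **kwargs)
        # Wait for all pending GPU work, so the data is fully written
        # before the consumer starts reading it on its own streams.
        synchronize()
        return data

    return wrapper


# Usage sketch (assumes CuPy is installed and make_batch returns GPU arrays):
#   import cupy as cp
#   source = synchronized_source(make_batch, cp.cuda.runtime.deviceSynchronize)
#   ... pass `source` as the external source callback ...
```

A full device synchronization is the bluntest option and may cost some throughput, which is acceptable in debug and eager modes where this issue applies.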

Binary builds

NOTE: DALI builds for CUDA 12 dynamically link the CUDA toolkit. To use DALI, install the latest CUDA toolkit.

CUDA 11.0 and CUDA 12.0 builds use CUDA toolkit enhanced compatibility. 
They are built with the latest CUDA 11.x/12.x toolkit respectively but they can run on the latest, 
stable CUDA 11.0/CUDA 12.0 capable drivers (450.80 or later and 525.60 or later respectively).
However, using the most recent driver may enable additional functionality. 
More details can be found in enhanced CUDA compatibility guide.

Install via pip for CUDA 12.0:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda120==1.34.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda120==1.34.0

or just:

pip install nvidia-dali-cuda120==1.34.0
pip install nvidia-dali-tf-plugin-cuda120==1.34.0

For CUDA 11:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.34.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.34.0

or just:

pip install nvidia-dali-cuda110==1.34.0
pip install nvidia-dali-tf-plugin-cuda110==1.34.0

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI - DALI v1.33.0

Published by stiepan 10 months ago

Key Features and Enhancements

This DALI release includes the following key features and enhancements:

  • Enhanced experimental support for checkpointing (saving and resuming DALI pipelines at arbitrary iteration) (#5232, #5195, #5166):
    • Support data readers checkpointing (#5198, #5213, #5184, #5182, #5165, #5181, #5180, #5183, #5162, #5139).
    • Support randomized GPU operators checkpointing (#5216, #5148).
    • Improved checkpointing documentation (#5230).
  • Improved Python annotations and signatures (#5217, #5159, #5167, #5154, #5188, #5158, #5150).
    • Added annotations for JAX and Pytorch iterators (#5129, #5197).
    • Improved PythonFunction annotations (#5207, #5149).
    • Improved data type annotations (#5179, #5153)
  • Improved JAX support:
    • Added pmap compatibility for JAX data_iterator (#5185).
    • Improved JAX, Flax and Paxml training examples (#5176, #5205).
  • Moved to CUDA 12.3U1 and enabled GDS and nvJPEG2k support for the SBSA platform (#5209, #5170).
  • Added Python 3.11 support and experimental support for Python 3.12 (#5174)

Fixed Issues

  • Fixed fn.normalize handling of batch of empty samples (#5223).
  • Fixed infinite video decoder seek loop (#5218).
  • Fixed computation of maximal threads number for kernels in GPU fn.transpose and fn.normalize. (#5208)
  • Fixed handling of empty slices and slicing of empty inputs. (#5204)
  • Fixed scalar constant dimensionality inference (#5191)
  • Fixed sharding in Caffe reader (#5172)

Improvements

  • Mark missed operator stateless, assure checkpointing tests coverage (#5232)
  • Add improved checkpointing docs (#5230)
  • Support checkpointing in Numpy reader (#5198)
  • Set the maximum supported Python version to Tensorflow 2.13 in tests (#5234)
  • Dependency update 23/12 (#5231)
  • Support checkpointing in External Source (#5213)
  • Support checkpointing in other random operators (#5216)
  • Remove a redundant call to Executor::GetTensorQueueSizes() (#5225)
  • Fix links to release notes and docs archive (#5227)
  • Install black with jupyter formatting capabilities (#5226)
  • Fix the minimal Python version for TF 2.14 and 2.15, adds 2.13 (#5221)
  • Unify positional input name handling in docs and signatures (#5217)
  • Add launch_bounds to SliceNormalizeKernel_2D kernels (#5206)
  • Update test_RN50_external_source_parallel_train_ddp.py to work with the latest PyTorch (#5219)
  • Rework Flax and Paxml training tutorials (#5205)
  • Mark all remaining stateless operators (#5195)
  • Install black formatter in the aarch64 build (#5211)
  • Fix typos in docs and comments (#5194)
  • Bump up TensorFlow version in tests (#5175)
  • Enable nvJPEG2k support for the SBSA platform (#5209)
  • Extend Python Function and plugin type annotations (#5207)
  • Update readme about the Black formatting (#5212)
  • Add "Format DALI with black" to blame ignore revs (#5210)
  • Add type annotations for JAX plugin (#5197)
  • Fix/simplify slice usage in examples. (#5203)
  • Re-enable Python linter (#5189)
  • Format DALI with black (#5169)
  • Adjust configuration for autoformatting with black (#5168)
  • Introduce TensorLike to signatures utilizing Array Interface (#5179)
  • Update nvJPEG2k to 0.7.5 version (#5202)
  • Add doc build artifacts to gitignore (#5187)
  • Update FFmpeg to 6.1 (#5186)
  • Add pmap compatibility for JAX data_iterator (#5185)
  • Support checkpointing in Nemo reader (#5184)
  • Support checkpointing in Webdataset reader (#5182)
  • Support checkpointing in mxnet and tfrecord (#5165)
  • Support checkpointing in Caffe reader (#5181)
  • Support checkpointing in experimental video reader (#5180)
  • Support checkpointing in sequence reader (#5183)
  • Enable Python 3.11 test and 3.12 experimental support (#5174)
  • Make sure that GDS on SBSA is not tested for drivers below cuda 12.2 (#5177)
  • Add checkpointing benchmarks (#5166)
  • Improve WarpAffine input documentation (#5178)
  • Refactor JAX basic training example (#5176)
  • Support checkpointing in Coco Reader (#5162)
  • Move to CUDA 12.3U1 and enable GDS support for SBSA (#5170)
  • Expose Tensors types stubs in nvidia.dali.tensors module (#5153)
  • Make the video reader warn instead of failing on unreadable files (#5163)
  • Implement checkpointing for random GPU operators (#5148)
  • Generate overloads for Multiple Input Sets in interface files (#5159)
  • Improve API submodules discoverability in some language engines (#5167)
  • Expose current pipeline in python functions (#5156)
  • Set signature for fn API (#5154)
  • Add file to ignore Flake8 commits in blame (#5161)
  • Suppress sanitizer leaks reported from xla and prevent hang in git clone (#5152)
  • Add checkpointing support to Video Reader (#5139)
  • Add type annotations for numba and pytorch plugins (#5129)
  • Improve Python Function signature annotation (#5149)
  • Dependency update 2023.11 (#5146)
  • Bump up CUDA version used for tests (#5100)

Bug Fixes

  • Patch libtiff for CVE-2023-6277 (#5224)
  • Coverity fixes: fix fn.normalize handling of batch of empty samples, fix broken assertion in copy_with_stride (#5223)
  • Avoids infinite video decoder seek loop (#5218)
  • Fix and move caching from MaxThreadsPerBlock to MaxThreadsPerBlockStatic. (#5208)
  • Remove redundant overload signature for operators without inputs (#5188)
  • Fix handling of empty slices and slicing of empty inputs. (#5204)
  • Fix slice/stack/cat/scalar usage in RNNT pipeline tests. (#5201)
  • Fix Python3.11 tests (#5196)
  • Fix scalar constant dimensionality & hide constant op documentation (#5191)
  • Fix sharding in Caffe reader (#5172)
  • Fix the check for dynamically generated number of outputs (#5158)
  • Improve symbol visibility when importing (#5150)

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

No features were deprecated in this release.

Known issues:

  • The following operators: experimental.readers.fits, experimental.decoders.video, experimental.inputs.video, random_resized_crop, and experimental.decoders.image_random_crop do not currently support checkpointing.
  • The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream.
    If key frames occur less often than that, the returned frames might be out of sync.
  • Experimental VideoReaderDecoder does not support open GOP.
    It will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • In experimental debug and eager modes, the GPU external source is not properly synchronized with DALI internal streams.
    As a workaround, you can manually synchronize the device before returning the data from the callback.
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when running in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker.

Binary builds

NOTE: DALI builds for CUDA 12 dynamically link the CUDA toolkit. To use DALI, install the latest CUDA toolkit.

CUDA 11.0 and CUDA 12.0 builds use CUDA toolkit enhanced compatibility. 
They are built with the latest CUDA 11.x/12.x toolkit respectively but they can run on the latest, 
stable CUDA 11.0/CUDA 12.0 capable drivers (450.80 or later and 525.60 or later respectively).
However, using the most recent driver may enable additional functionality. 
More details can be found in enhanced CUDA compatibility guide.

Install via pip for CUDA 12.0:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda120==1.33.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda120==1.33.0

or for CUDA 11:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.33.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.33.0

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI - DALI v1.32.0

Published by stiepan 11 months ago

Key Features and Enhancements

This DALI release includes the following key features and enhancements:

  • Added Python signatures/type hints to the DALI Python API (#5096, #5039, #5112, #5118, #5124, #5143).
  • Added experimental support for checkpointing DALI pipelines at arbitrary iterations (fn.readers.file, CPU fn.random generators, and stateless operators) (#5085, #5088, #5103, #5114, #5113, #5142, #5128, #5144).
  • Added support for CUDA 12.3 (#5106).

Fixed Issues

  • Fixed a potential crash on process teardown when using the fn.python_function in the DALI pipeline (#5138).
  • Removed unused arguments from fn.fast_resize_crop_mirror. The operator was deprecated in favor of fn.resize_crop_mirror (#5123).
  • Fixed a potential crash on process teardown when using CPU fn.resize in the DALI pipeline (#5133).
  • Fixed constructing tensors from stream-aware __cuda_array_interface__ v3 (#5125).
  • Fixed the crop_pos_z handling for a fixed crop window in the fn.crop operator (#5119).
  • Fixed releasing Python tensors without GIL in fn.external_source. The problem led to crashes when using fn.external_source in no_copy or parallel mode with conditional execution enabled in the pipeline (#5101).

Improvements

  • Improve checkpointing docs (#5142)
  • Add type hints to python-defined ops, run and tfrecord APIs (#5118)
  • Allow mid-epoch checkpointing in FileReader (#5113)
  • Mark more stateless operators (#5114)
  • Load ASAN during the build with sanitizers (#5121)
  • Add default arg values to JAX decorator (#5115)
  • Improve docs of variance and stddev (#5130)
  • Update python op tutorial (#5120)
  • Generalize __module__ handling and hide private modules docs (#5112)
  • Add mid-epoch checkpointing to Loader (#5103)
  • Add ViT data processing pipeline to hw_bench_script (#5110)
  • Add JAX Getting Started Tutorial (#5095)
  • Generate type hints for fn and ops APIs (#5096)
  • Move to CUDA 12.3 (#5106)
  • Add the BUILD ID to the Xavier wheel name (#5099)
  • Add Fast-Forward to Loader base (#5088)
  • Update installation guide to move https://pypi.nvidia.com/ or just pypi (#4815)
  • Remove Python 3.7 support and replace defaults with Python 3.8 (#5089)
  • Add better error for VFR check (#5092)
  • Move the snapshot queue from loader to reader (#5085)
  • Refactor ops and fn APIs (#5039)

Bug Fixes

  • Fix Python Functions ops signature (#5143)
  • Fix stateless tests (#5144)
  • Fix exit sequence. (#5138)
  • Fix data_iterator docs (#5140)
  • Reimplement (Fast)ResizeCropMirror in terms of Resize. Remove dead code. (#5123)
  • Ensure that the default host resource is destroyed after ResamplingFilters CPU instance. (#5133)
  • Fix CVE-2023-45853 in zlib (#5116)
  • Fix stub generation during DALI build (#5124)
  • Improve checkpointing error message (#5128)
  • Fix constructing tensors from cuda_array_interface v3. (#5125)
  • Fix lack of crop_pos_z handling for fixed crop window (#5119)
  • Fix CVE-2022-33065 in libsnd (#5105)
  • Add DALI package dependencies to the custom build test (#5122)
  • Make sure that the release of python memory from the DALI tensor happens inside GIL (#5101)
  • Relax UpdatePropertiesFromSamples check constraints (#5098)

Breaking API changes

  • DALI 1.31 was the final release that supported Python 3.7.

Deprecated features

  • The operator fn.fast_resize_crop_mirror was deprecated in favour of fn.resize_crop_mirror.

Known issues:

  • The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream.
    If key frames occur less often than that, the returned frames might be out of sync.
  • Experimental VideoReaderDecoder does not support open GOP.
    It will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • In experimental debug and eager modes, the GPU external source is not properly synchronized with DALI internal streams.
    As a workaround, you can manually synchronize the device before returning the data from the callback.
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when running in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker.

Binary builds

NOTE: DALI builds for CUDA 12 dynamically link the CUDA toolkit. To use DALI, install the latest CUDA toolkit.

CUDA 11.0 and CUDA 12.0 builds use CUDA toolkit enhanced compatibility. 
They are built with the latest CUDA 11.x/12.x toolkit respectively but they can run on the latest, 
stable CUDA 11.0/CUDA 12.0 capable drivers (450.80 or later and 525.60 or later respectively).
However, using the most recent driver may enable additional functionality. 
More details can be found in enhanced CUDA compatibility guide.

Install via pip for CUDA 12.0:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda120==1.32.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda120==1.32.0

or for CUDA 11:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.32.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.32.0

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI - DALI v1.31.0

Published by stiepan 12 months ago

Key Features and Enhancements

This DALI release includes the following key features and enhancements:

  • Preliminary experimental support for pipeline checkpointing. (#5061, #5057)
  • Added data_iterator and peekable_data_iterator decorators for simplified JAX iterators definitions. (#5050, #5049)
  • Added the "Training neural network with DALI and Pax" tutorial. (#5060)

Fixed Issues

  • The fn.permute_batch operator can now be used with the conditional execution (if expressions). (#5063)
  • Fixed support for videos with different bit depths in the video decoder. (#5055)
  • Input operators with multiple outputs can be fed with data by the operator name. (#5066)

Improvements

  • Expose checkpointing in Python pipeline (#5061) 
  • Update deps: RapidJSON, OpenCV (#5079) 
  • Fix coverity issues 10/23 (#5083) 
  • Add Pax tutorial (#5060) 
  • Add Efficientnet pipeline to hw_decoder_bench (#5076) 
  • Add JAX iterator decorator (#5050) 
  • Update libwebp library to remediate CVE-2023-5129 (#5075)
  • Replace optional stream in SaveState with AccessOrder  (#5062) 
  • Add implicit scope to batch_permutation (#5063) 
  • Fix enumeration formatting in conditionals docs (#5067) 
  • Update DALI key visual (#5069) 
  • Deprecate Python 3.7 starting DALI 1.31 (#5068) 
  • Extend HW image decoder bench script to support multiple GPUs (#5065) 
  • Remove the avformat_find_stream_info call from the video loader when not needed (#5047) 
  • Add ability to serialize/deserialize Checkpoint (#5057) 
  • Remove dali::any in favor of std::any. (#5058) 
  • Disable container overflow sanitizer all the time (#5053) 
  • Replace PaddlePaddle ResNet50 example with one from the DeepLearningExamples (#5048) 
  • Make the ResNet50 example compatible with TensorFlow 2.13 (#5045) 
  • Reorganize JAX plugin (#5049) 
  • Replace GPU dltensor per-sample copying kernel with a batched one (#5038) 
  • September dependency update (#5043) 

Bug Fixes

  • Add user-friendly message about missing numpy (#5081) 
  • Set external input by op name instead of tensor name (#5066) 
  • Fix the support of videos with different bit depths in the video reader (#5055)
  • Set layout from argument in external source (#5064) 
  • Update JAX version to 0.4.13 (#5056) 
  • Extend Numba compatibility checks. Skip Numba GPU tests on incompatible systems. (#5054)

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

  • Python 3.7 support is deprecated starting from DALI 1.31.

Known issues:

  • The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream.
    If key frames occur less often than that, the returned frames might be out of sync.
  • Experimental VideoReaderDecoder does not support open GOP.
    It will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • In experimental debug and eager modes, the GPU external source is not properly synchronized with DALI internal streams.
    As a workaround, you can manually synchronize the device before returning the data from the callback.
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when running in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker.

Binary builds

NOTE: DALI builds for CUDA 12 dynamically link the CUDA toolkit. To use DALI, install the latest CUDA toolkit.

CUDA 11.0 and CUDA 12.0 builds use CUDA toolkit enhanced compatibility. 
They are built with the latest CUDA 11.x/12.x toolkit respectively but they can run on the latest, 
stable CUDA 11.0/CUDA 12.0 capable drivers (450.80 or later and 525.60 or later respectively).
However, using the most recent driver may enable additional functionality. 
More details can be found in enhanced CUDA compatibility guide.

Install via pip for CUDA 12.0:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda120==1.31.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda120==1.31.0

or for CUDA 11:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.31.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.31.0

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI - DALI v1.30.0

Published by stiepan about 1 year ago

Key Features and Enhancements

This DALI release includes the following key features and enhancements:

  • Added support for running custom CPU and GPU Python operators (fn.*python_function) inside DALI asynchronous pipelines (#4965, #5038).
  • Improved support for GPU Numba operator (plugin.numba.fn.experimental.numba_function) (#4000).
  • Improved fn.crop_mirror_normalize performance (#4993, #4992).
  • Added support for strides in subscript operator (#5007).
  • Added support for video in predefined automatic augmentations (#5012).
  • Added case insensitive mode in fn.readers.webdataset (#5016).
  • Moved to CUDA 12.2U2 (#5027).
  • Added Flax training examples (#5004, #4978).

Fixed Issues

  • Fixed GPU fn.readers.numpy global shuffling (#5034).
  • Fixed finalization of custom operator plugins during pipeline shutdown (#5036).
  • Fixed synchronization issue in fn.resize operator family that could result in distorted outputs in initial iterations (#4990).

Improvements

  • Replace GPU dltensor per-sample copying kernel with a batched one (#5038)
  • September dependency update (#5043)
  • Make download_pip_packages.sh resilient to errors (#5044)
  • Move to CUDA 12.2U2 (#5027)
  • Clean up and refactor code around Multiple Input Sets (#5035)
  • Move to the upstream CV-CUDA 0.4 (#5032)
  • Revert "Make nesting conditionals supported only for Python 3.7+" (#5031)
  • Move all remaining video files to LFS (#5025)
  • Refactor custom op wrappers into separate files of ops module (#5028)
  • Add pipeline checkpointing to the Executor (#5008)
  • Refactor ops into a submodule (#5018)
  • Add checkpointing support to ImageRandomCrop (#4999)
  • Replace deprecated fluid APIs to recommended APIs of Paddle (#5020)
  • fix: CMakeLists.txt typo (#5006)
  • Support video in predefined automatic augmentations (#5012)
  • Extend GPU numba support (#4000)
  • Add opt-in support for case insensitive webdataset (#5016)
  • Add optimized variant of CMN for HWC to HWC pad FP16 case (#4993)
  • Added Stride to Subscript and Slice Kernel (#5007)
  • Add optimized variant of CMN for HWC to HWC case (#4992)
  • Add multiple GPU code to Flax example (#5004)
  • Pin inputs to decoder operators as well (#5003)
  • Add checkpointing support to stateless operators used in EfficientNet (#4977)
  • Use a different way to ensure that the right version of libabseil is used in conda (#4991)
  • Make samples' descriptors copy in resize op fully asynchronous (#4989)
  • Remove mentions of experimental from conditional tutorial. (#4988)
  • Enable python operators in async pipelines (#4965)
  • Make sure that the right version of libabseil is used in conda (#4987)
  • Coverity fixes - 08.2023 (#4970)
  • CPU fn.random operators checkpointing (#4961)
  • Add Flax training example (#4978)
  • Make error reporting more verbose for rand augment tests (#4958)

Bug Fixes

  • Propagate to conda build packages required for DALI installation (#5041)
  • Fix wheel predownload (#5023)
  • Fix GPU numpy reader global shuffling (#5034)
  • Change the way the input operators are traversed during the pipeline shutdown (#5036)
  • Fix issues detected by Coverity as of 2023.09.04. (#5030)
  • Fix CUDA block sizes in Numba GPU tests. (#5026)
  • Change Loader to make checkpoints at the end of an epoch (#5019)
  • Disable Flax tutorial test (#5015)
  • Fix resize processing cost calculation (#5009)
  • Fix abs diff computation in check_batch test utility (#4957)
  • Fix sync in Resize operator family (#4990)

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

No features were deprecated in this release.

Known issues:

  • The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream.
    If key frames occur less often than that, the returned frames might be out of sync.
  • Experimental VideoReaderDecoder does not support open GOP.
    It will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • In experimental debug and eager modes, the GPU external source is not properly synchronized with DALI internal streams.
    As a workaround, you can manually synchronize the device before returning the data from the callback.
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when running in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker.

Binary builds

NOTE: DALI builds for CUDA 12 dynamically link the CUDA toolkit. To use DALI, install the latest CUDA toolkit.

CUDA 11.0 and CUDA 12.0 builds use CUDA toolkit enhanced compatibility. 
They are built with the latest CUDA 11.x/12.x toolkit respectively but they can run on the latest, 
stable CUDA 11.0/CUDA 12.0 capable drivers (450.80 or later and 525.60 or later respectively).
However, using the most recent driver may enable additional functionality. 
More details can be found in the enhanced CUDA compatibility guide.
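
The driver minimums quoted above (450.80 for CUDA 11 builds, 525.60 for CUDA 12 builds) can be checked before installing a wheel. The following is a minimal sketch, not part of DALI: the function names are hypothetical, and it assumes the driver version is already available as a dotted string (for example, as printed by `nvidia-smi --query-gpu=driver_version --format=csv,noheader`).

```python
# Minimum driver versions quoted in these release notes, keyed by CUDA major version.
MIN_DRIVER = {11: (450, 80), 12: (525, 60)}


def parse_driver_version(text):
    """Turn a driver string such as '525.60.13' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_supports_build(driver_version, cuda_major):
    """Return True if the driver meets the quoted minimum for that CUDA build.

    Only the first two components (major.minor) are compared, because the
    release notes state the minimums at that granularity.
    """
    minimum = MIN_DRIVER[cuda_major]
    return parse_driver_version(driver_version)[:2] >= minimum


if __name__ == "__main__":
    for driver, cuda in [("450.80.02", 11), ("470.57.02", 12)]:
        print(driver, "CUDA", cuda, "->", driver_supports_build(driver, cuda))
```

A driver that passes this check satisfies only the minimum; as noted above, a more recent driver may enable additional functionality.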

Install via pip for CUDA 12.0:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda120==1.30.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda120==1.30.0

or for CUDA 11:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.30.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.30.0

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

DALI - DALI v1.29.0

Published by stiepan about 1 year ago

Key Features and Enhancements

This DALI release includes the following key features and enhancements:

  • Added GPU fn.experimental.median_blur operator. (#4950, #4975)
  • Improved JAX support:
    • Added support for jax.Sharding to dali.plugin.jax.DALIGenericIterator (#4969).
    • Improved examples and tutorials (#4973, #4956, #4944, #4937).
  • Optimized the HWC to CHW transposition variant of the fn.crop_mirror_normalize operator (#4972).
  • Moved to CUDA 12.2U1 (#4966)

Fixed Issues

  • Fixed layout broadcasting in arithmetic expressions (#4951).
  • Added missing layout propagation in fn.reductions (#4947).

Improvements

  • Trim CV-CUDA to expose only median blur to reduce the binary size (#4985)
  • Add optimized variant of CMN for HWC to CHW case (#4972)
  • Enable CV-CUDA build for xavier (#4976)
  • Update DALI_deps version (#4971)
  • Add automatic parallelization JAX example (#4973)
  • Exclude median_blur test from xavier tests (#4975)
  • Move to CUDA 12.2 U1 (#4966)
  • Add basic jax.Sharding support for the iterator (#4969)
  • Enable cv-cuda in conda build (#4968)
  • Fix wheel bundling with cvcuda for debug builds (#4959)
  • Fix Getting Started link in README (#4962)
  • Add multigpu JAX tutorial (#4956)
  • Add median blur operator (#4950)
  • Fix updated linter errors (#4960)
  • Support checkpointing in FileReader (#4954)
  • Add CV-CUDA as a subproject (#4949)
  • Remove the direct use of cuda_for_dali auxiliary namespace. (#4953)
  • Checkpointing classes (#4946)
  • Make sure that lossless support is disabled when it fails to initialize (#4934)
  • Add L3 short test for RN50 training (#4614)
  • DALI_deps update 13 Jul 2023 (#4945)
  • Add JAX tutorial tests (#4944)
  • Update OpenCV 4.7.0 to 4.8.0, patch for CVE-2023-1999 (#4941)
  • Fix L1 Jupyter Conda Job (#4942)
  • Update the TensorFlow version used in tests (#4940)
  • Add basic JAX tutorial (#4937)

Bug Fixes

  • Checkpoint after running epoch (#4983)
  • Propagate layout in fn.reductions (#4947)
  • Fix layout broadcasting arithm ops (#4951)
  • Fix coverity issues - July 2023 (#4948)

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

No features were deprecated in this release.

Known issues:

  • The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream.
    If key frames occur less frequently than that, the returned frames might be out of sync.
  • Experimental VideoReaderDecoder does not support open GOP.
    It will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • In experimental debug and eager modes, the GPU external source is not properly synchronized with DALI internal streams.
    As a workaround, you can manually synchronize the device before returning the data from the callback.
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when running in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker.
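
The key-frame requirement in the first known issue can be checked ahead of time when the key-frame positions of a stream are known. The sketch below is not part of DALI: the function names are hypothetical, the default threshold of 10 frames is an assumption (the conservative end of the 10 to 15 range), and extracting key-frame indices from a real file is left out.

```python
def max_keyframe_interval(keyframe_indices, total_frames):
    """Return the longest run of frames that is not interrupted by a new key frame.

    keyframe_indices: zero-based frame numbers of the key frames.
    total_frames: number of frames in the stream.
    """
    if not keyframe_indices:
        # No key frame at all: the whole stream is one uninterrupted run.
        return total_frames
    ordered = sorted(keyframe_indices)
    # Treat the end of the stream as a final boundary.
    boundaries = ordered + [total_frames]
    gaps = [later - earlier for earlier, later in zip(boundaries, boundaries[1:])]
    # Frames that precede the first key frame also form a run.
    return max([ordered[0]] + gaps)


def keyframes_dense_enough(keyframe_indices, total_frames, max_interval=10):
    """Return True if no run between key frames exceeds max_interval frames."""
    return max_keyframe_interval(keyframe_indices, total_frames) <= max_interval


if __name__ == "__main__":
    print(keyframes_dense_enough([0, 10, 20], total_frames=30))
    print(keyframes_dense_enough([0, 30], total_frames=60))
```

Streams that fail such a check are the ones for which, according to the known issue above, the returned frames might be out of sync.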

Binary builds

NOTE: DALI builds for CUDA 12 dynamically link the CUDA toolkit. To use DALI, install the latest CUDA toolkit.

CUDA 11.0 and CUDA 12.0 builds use CUDA toolkit enhanced compatibility. 
They are built with the latest CUDA 11.x/12.x toolkit respectively but they can run on the latest, 
stable CUDA 11.0/CUDA 12.0 capable drivers (450.80 or later and 525.60 or later respectively).
However, using the most recent driver may enable additional functionality. 
More details can be found in the enhanced CUDA compatibility guide.

Install via pip for CUDA 12.0:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda120==1.29.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda120==1.29.0

or for CUDA 11:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.29.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.29.0
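
The install commands above follow a regular pattern (package name plus a `cuda110` or `cuda120` suffix, pinned to the release version), so they can be generated for provisioning scripts. This is a small sketch under that assumption; the helper name `dali_pip_commands` is hypothetical and only the package names and index URL shown above are used.

```python
EXTRA_INDEX_URL = "https://developer.download.nvidia.com/compute/redist/"


def dali_pip_commands(dali_version, cuda_major, with_tf_plugin=True):
    """Build the pip commands listed in the release notes for one DALI version.

    dali_version: version string such as "1.29.0".
    cuda_major: 11 or 12, selecting the cuda110 or cuda120 wheels.
    """
    if cuda_major not in (11, 12):
        raise ValueError("these release notes list wheels for CUDA 11 and CUDA 12 only")
    suffix = f"cuda{cuda_major}0"
    packages = [f"nvidia-dali-{suffix}"]
    if with_tf_plugin:
        packages.append(f"nvidia-dali-tf-plugin-{suffix}")
    return [
        f"pip install --extra-index-url {EXTRA_INDEX_URL} {name}=={dali_version}"
        for name in packages
    ]


if __name__ == "__main__":
    for command in dali_pip_commands("1.29.0", 12):
        print(command)
```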

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

DALI - DALI v1.28.0

Published by stiepan over 1 year ago

Key Features and Enhancements

This DALI release includes the following key features and enhancements:

  • Added CUDA 12.2 support (#4930, #4938, and #4939).
  • Added cudaMallocAsync support (#4900, #4923, and #4921).
  • Improved JAX multiprocessing support (#4929, #4927, #4919, #4906, and #4920).
  • Added DALIRaggedIterator, a DALI Pytorch plugin iterator that supports non-uniform tensors (#4911).

Fixed Issues

No major fixes are included in this release.

Improvements

  • Fix OpticalFlow test premature exit on sm < 8 (#4933)
  • Remove dependency on forked libcudacxx (#4938)
  • Add JAX multinode multigpu tests (#4929)
  • Adding handling of non-uniform tensors in DALI Pytorch plugin (#4911)
  • Reworks supported Python versions (#4924)
  • Disable cudaMemPoolReuseAllowOpportunistic in cudaMallocAsync for <r470.60 (#4931)
  • Move to CUDA 12.2 (#4930)
  • Remove template from tensor rule-of-five for c++20 compat (#4928)
  • Add JAX container test job (#4927)
  • Extend guards against ASan intercepting certain functions (#4925)
  • Fix CUDA_remove_toolkit_include_dirs CMake function (#4922)
  • Add alignment to cuda_malloc_async_memory_resource. (#4923)
  • Add source_info to the tensors produced by video readers (#4916)
  • Add JAX multigpu sharding tests (#4919)
  • Add basic JAX multi process test (#4906)
  • Add libabseil as a runtime DALI dependency in conda (#4907)
  • Remove pinning Cython version from Python SSD test (#4913)
  • Add a memory resource based on cudaMallocAsync (#4900)

Bug Fixes

  • Fix memory_resource compilation in conda build (#4939)
  • Disable JAX iterator tests in ASAN build (#4920)
  • Fix number of devices for JAX multigpu test (#4921)
  • Remove unnecessary cudaDeviceSynchronize from memory resource perf test. (#4908)
  • Fix broken assertion in sequence operator (#4905)

Breaking API changes

  • DALI 1.27 was the final release that supported Python 3.6.
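
Projects that still run on Python 3.6 may want to fail early when a newer DALI version is requested. The following is a minimal sketch that encodes only the fact stated above (1.27 is the last release supporting Python 3.6); the function names are hypothetical, and it makes no claim about which DALI versions support other Python versions.

```python
import sys

# Per these release notes, DALI 1.27 was the final release supporting Python 3.6.
LAST_DALI_FOR_PY36 = (1, 27)


def supports_python36(dali_version):
    """Return True if the given DALI version (e.g. "1.27.0") still supports Python 3.6."""
    major_minor = tuple(int(part) for part in dali_version.split(".")[:2])
    return major_minor <= LAST_DALI_FOR_PY36


def check_environment(dali_version, python_version=None):
    """Raise RuntimeError when running Python 3.6 with a DALI release newer than 1.27.

    python_version is a (major, minor) tuple; it defaults to the running interpreter.
    """
    if python_version is None:
        python_version = tuple(sys.version_info[:2])
    if python_version == (3, 6) and not supports_python36(dali_version):
        raise RuntimeError(
            f"DALI {dali_version} does not support Python 3.6; "
            "use DALI 1.27 or upgrade Python"
        )


if __name__ == "__main__":
    check_environment("1.28.0")  # passes silently unless run on Python 3.6
    print(supports_python36("1.27.0"), supports_python36("1.28.0"))
```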

Deprecated features

No features were deprecated in this release.

Known issues:

  • The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream.
    If key frames occur less frequently than that, the returned frames might be out of sync.
  • Experimental VideoReaderDecoder does not support open GOP.
    It will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • In experimental debug and eager modes, the GPU external source is not properly synchronized with DALI internal streams.
    As a workaround, you can manually synchronize the device before returning the data from the callback.
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when running in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker.

Binary builds

NOTE: DALI builds for CUDA 12 dynamically link the CUDA toolkit. To use DALI, install the latest CUDA toolkit.

CUDA 11.0 and CUDA 12.0 builds use CUDA toolkit enhanced compatibility. 
They are built with the latest CUDA 11.x/12.x toolkit respectively but they can run on the latest, 
stable CUDA 11.0/CUDA 12.0 capable drivers (450.80 or later and 525.60 or later respectively).
However, using the most recent driver may enable additional functionality. 
More details can be found in the enhanced CUDA compatibility guide.

Install via pip for CUDA 12.0:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda120==1.28.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda120==1.28.0

or for CUDA 11:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.28.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.28.0

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

DALI - DALI v1.27.0

Published by stiepan over 1 year ago

Key Features and Enhancements

This DALI release includes the following key features and enhancements:

  • Added O_DIRECT mode support to fn.readers.tfrecord (#4820).
  • Added JAX integration (#4867, #4883, #4853).
  • Added the GPU backend for fn.experimental.readers.fits, which reads images that are stored in the FITS format (#4752).

Fixed Issues

  • Assured deterministic outputs for multiple instances of auto_augment pipelines that are built with the same seeds (#4885).
  • Fixed the blocking option in the external source operator (#4874).
  • Fixed returning an empty pixel mask for COCO samples with no objects (#4856).
  • Fixed the handling of unsupported images by image decoders in fn.experimental.decoders (#4846).

Improvements

  • Update deps 23/06 (#4902)
  • Add O_DIRECT support to the TFRecord reader (#4820)
  • Relax the gast version requirement (#4896)
  • Add DALI iterator for JAX (#4867)
  • Fix coverity issues (#4897)
  • Add deprecation warning for Python3.6 (#4895)
  • Use memory pool for large host allocs (#4886)
  • Improve the feed_input documentation regarding prefetching (#4875)
  • Support nesting data structures in conditionals (#4880)
  • Add JAX multi GPU tests (#4883)
  • Move the mention of the EfficientNet example to a box (#4882)
  • Update the Protobuf version to 23.01 and adjust the build system to it (#4861)
  • Add basic JAX integration (#4853)
  • Limit the version of typing_extensions for the TensorFlow test (#4863)
  • Add GPU implementation for Fits reader (#4752)
  • Disable Numba CPU tests on AARCH64. (#4862)
  • Update readme text and code highlighting (#4858)
  • Disable NUMBA CPU test for runs with memory sanitizer (#4854)
  • Adjust numpy reader tests for nose2 (#4851)
  • Update support for Numba 0.57 (#4845)
  • Move to CUTLASS 3.1 (#4841)
  • Add a test that triggers a failure in Python (#4836)
  • Improve VA reservation robustness (#4826)
  • fix: bad relative path (#4822)

Bug Fixes

  • Skip DLPack CPU export test for incompatible Numpy (#4904)
  • Fix parsing numpy header (#4903)
  • Remove outdated info from iterators docs. (#4899)
  • Bugfix (async_pool): Store original alignment in 'padded_'. (#4898)
  • Fix the augmentation coalescing in AA (#4887)
  • Skip tests for incompatible env (#4894)
  • Make nesting conditionals supported only for Python 3.7+ (#4888)
  • Fix DALI FW iterator reset for DROP last batch policy (#4881)
  • Assure same operator initialization order in the AA graph (#4885)
  • Fix the lack of support for the blocking option in the external source operator (#4874)
  • Disable container overflow errors (#4878)
  • Fix the wrong assignment of the default values in build_helper.sh (#4871)
  • Disable JAX support for unsupported Python versions (#4870)
  • Disable FITS test when not building with CFITSIO support. Fix build without libTIFF. (#4866)
  • Fix layout propagation in jpeg compression distortion (#4864)
  • Fix returning empty pixel mask for COCO samples with no objects (#4856)
  • Bugfix in imgcodec: filter should happen after set decode result (#4846)
  • Don't run image decoder tests in test discovery stage. (#4833)

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

DALI 1.27 is the final release that will support Python 3.6.

Known issues:

  • The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream.
    If key frames occur less frequently than that, the returned frames might be out of sync.
  • Experimental VideoReaderDecoder does not support open GOP.
    It will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • In experimental debug and eager modes, the GPU external source is not properly synchronized with DALI internal streams.
    As a workaround, you can manually synchronize the device before returning the data from the callback.
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when running in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker.

Binary builds

NOTE: DALI builds for CUDA 12 dynamically link the CUDA toolkit. To use DALI, install the latest CUDA toolkit.

CUDA 11.0 and CUDA 12.0 builds use CUDA toolkit enhanced compatibility. 
They are built with the latest CUDA 11.x/12.x toolkit respectively but they can run on the latest, 
stable CUDA 11.0/CUDA 12.0 capable drivers (450.80 or later and 525.60 or later respectively).
However, using the most recent driver may enable additional functionality. 
More details can be found in the enhanced CUDA compatibility guide.

Install via pip for CUDA 12.0:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda120==1.27.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda120==1.27.0

or for CUDA 11:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.27.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.27.0

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

DALI - DALI v1.26.0

Published by stiepan over 1 year ago

Key Features and Enhancements

This DALI release includes the following key features and enhancements:

  • Added O_DIRECT mode support to fn.readers.numpy (#4796, #4848).
  • Added an option to filter out iscrowd entries from COCO (#4792).
  • Moved to CUDA 12.1 update 1 (#4798).
  • Made DALI GPU tensors directly convertible to PyTorch (#4800).

Fixed Issues

  • Fixed a memory leak in the fn.experimental.remap operator (#4790).
  • Fixed the recognition of new CuPy ndarrays in fn.external_source (#4793).

Improvements

  • Cumulative dependency update for May, 2023. (#4823)
  • Add O_DIRECT support in numpy_reader (#4796)
  • Add a native dataloader to RN50 PyTorch example (#4807)
  • Fix coverity issues (Apr 2023) (#4803)
  • Move to CUDA 12.1 update 1 (#4798)
  • Make DALI array_interface memory writable (#4800)
  • Add support for filtering in/out iscrowd entries from COCO (#4792)
  • Add bug and question templates to DALI github repo (#4782)
  • Rework conditional-like execution tutorial for arithmetic ops (#4795)
  • Add "depleted" operator trace (#4794)
  • Add "repeat_last" option to ExternalSource and handle it in Pipeline. (#4775)
  • Use dedicated GTC 2023 event links (#4781)

Bug Fixes

  • Fix race condition in the CPU numpy reader (#4848)
  • Update required packages for TL1_python-self-test_conda (#4843)
  • Fix FITS tests with python3.7, reduce memory usage in rand aug tests (#4844)
  • Fix FITS reader test with Python3.6 (#4835)
  • Fix TensorFlow tests (#4837)
  • Fix conda test and tests on Xavier (#4827)
  • Restrict the urllib3 version in tests to <2.0 (#4824)
  • Fix error propagation from the QA test (#4821)
  • Make TL0_python-self-test-base-cuda using the local CUDA toolkit (#4811)
  • Fix scratchpad usage in Remap. Add more documentation to scratchpad. (#4790)
  • Fix the regex that recognizes CuPy arrays. (#4793)

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

No features were deprecated in this release.

Known issues:

  • The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream.
    If key frames occur less frequently than that, the returned frames might be out of sync.
  • Experimental VideoReaderDecoder does not support open GOP.
    It will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • In experimental debug and eager modes, the GPU external source is not properly synchronized with DALI internal streams.
    As a workaround, you can manually synchronize the device before returning the data from the callback.
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when running in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker.

Binary builds

NOTE: DALI builds for CUDA 12 dynamically link the CUDA toolkit. To use DALI, install the latest CUDA toolkit.

CUDA 11.0 and CUDA 12.0 builds use CUDA toolkit enhanced compatibility. 
They are built with the latest CUDA 11.x/12.x toolkit respectively but they can run on the latest, 
stable CUDA 11.0/CUDA 12.0 capable drivers (450.80 or later and 525.60 or later respectively).
However, using the most recent driver may enable additional functionality. 
More details can be found in the enhanced CUDA compatibility guide.

Install via pip for CUDA 12.0:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda120==1.26.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda120==1.26.0

or for CUDA 11:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.26.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.26.0

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

DALI - DALI v1.25.0

Published by stiepan over 1 year ago

Key Features and Enhancements

This DALI release includes the following key features and enhancements:

  • Added the experimental flexible image transport system (FITS) reader (fn.experimental.readers.fits) for the CPU backend (#4591).
  • Added the CPU backend for the histogram equalization operator (fn.experimental.equalize) (#4742).
  • Added the CPU backend for the 2-D convolution for images and video (fn.experimental.filter) (#4764).
  • Added support for feeding pipeline inputs as named arguments in Pipeline.run (#4712).
  • Improved the automatic augmentations and conditional execution in the following ways:
    • Support for CPU inputs in predefined automatic augmentations (#4772).
    • Reduced memory consumption (#4697).
    • Support for conditional execution in debug mode (#4738).
    • EfficientNet training example with DALI AutoAugment (#4678).
    • More predefined policies for AutoAugment (#4753).
    • Support for numerical types in the if predicate and not expression (#4715).
  • Operator improvements:
    • Improved the performance of CPU brightness and contrast operators for uint8 samples (#4737).
    • Improved the fn.readers.webdataset performance (#4708).
    • Added support for booleans in fn.readers.numpy (#4745).
    • Added support for booleans in the DALI iterator for PyTorch (#4757).
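
The speedup for uint8 brightness and contrast (#4737) is described in the pull request title as using a lookup table in the mul-add kernel. The general technique can be sketched in plain Python: because a uint8 sample has only 256 possible values, `x * mul + add` can be precomputed once per (mul, add) pair and then applied by table lookup. This is an illustration of the idea only, not DALI's kernel; the function names are hypothetical, and the rounding and saturation behavior shown here are assumptions.

```python
def build_mul_add_table(mul, add):
    """Precompute the saturated result of x * mul + add for every uint8 value x."""
    # Results are rounded with Python's round() and clamped to the uint8 range [0, 255].
    return bytes(min(255, max(0, round(x * mul + add))) for x in range(256))


def apply_mul_add(samples, mul, add):
    """Apply x * mul + add to a bytes object of uint8 samples using one table lookup per sample."""
    table = build_mul_add_table(mul, add)
    # bytes.translate maps every byte through the 256-entry table.
    return samples.translate(table)


if __name__ == "__main__":
    pixels = bytes([0, 100, 200])
    print(list(apply_mul_add(pixels, 1.5, 10)))
```

The table costs 256 multiply-adds to build regardless of image size, so the per-pixel work reduces to a single indexed read.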

Fixed Issues

  • Fixed possible hangs on a pipeline build or teardown when using fn.experimental.decoders.image (#4727).
  • Fixed D2D copy synchronization that might result in fn.experimental.decoders.video returning incorrect frames for high-resolution videos (#4717).
  • Fixed buffer exhaustion in fn.experimental.decoders.image (#4723).
  • Fixed GPU unary arithmetic operators (for example, math.abs and math.floor) incorrectly processing non-scalar samples (#4746).
  • Fixed host JPEG decoder leaking memory on incorrect files (#4748).
  • Fixed missing source information in the numpy reader output (#4714).
  • Fixed error message in assertion in base_iterator.py (#4726).

Improvements

  • Expose Automatic Augmentation docs (#4760)
  • Rename sample to data in automatic augmentation APIs (#4774)
  • Support CPU samples in predefined automatic augmentations (#4772)
  • Make conditionals work in debug mode (#4738)
  • Improve PyPi DALI description (#4769)
  • Use lookup table for uint8 inputs in mul-add kernel (#4737)
  • Add more AutoAugment policies (#4753)
  • Add CPU filter operator (#4764)
  • Simplify AutoAugment graph (#4751)
  • Add links to DALI related GTC'23 talks (#4743)
  • Make python output unbuffered in tests (#4766)
  • Adjust docs config for newer Sphinx version (#4765)
  • Update DALI_DEPS sha version (#4763)
  • Move enable_conditionals option to regular @pipeline_def (#4747)
  • Adds bool type support to PyTorch DALI integration (#4757)
  • Update deps: pybind, FFmpeg, zstd (#4749)
  • Update TensorFlow version used in tests (#4739)
  • Stop building the DALI TF plugin for conda (#4741)
  • Enable bool support in the numpy reader operator (#4745)
  • Add CPU equalize operator (#4742)
  • Add experimental FITS reader for CPU backend (#4591)
  • Adjust RN50 TF performance test threshold (#4734)
  • Add timestamps to QA test output. (#4733)
  • Update nvJPEG2k to 0.7 version (#4728)
  • Add a requirement for CUDA toolkit for CUDA 12 builds (#4588)
  • DALI Pipeline inputs as named arguments to Pipeline.run() (#4712)
  • Update RN50 PyTorch test speed threshold (#4724)
  • Add links for DALI installations to docs (#4716)
  • Support numerical types in if predicate and not expression (#4715)
  • Reduce memory footprint of conditional execution (#4697)
  • Add EfficientNet example using automatic augmentations with DALI (#4678)
  • Change WDS index version representation to integer + refactor version utilities. (#4708)
  • Update OpenCV build recipe (#4693)
  • Update GTC 2022 sessions' links in the README (#4705)

Bug Fixes

  • Update CLANG version (#4768)
  • Fix the lack of proper error handling in selected tests (#4759)
  • Update fix assert error messages in base_iterator.py (#4726)
  • Fix bug in Arithmetic unary op implementation (#4746)
  • Fix memory leak in host JPEG decoder (#4748)
  • Fix missing source info in the numpy reader output (#4714)
  • Fix buffer exhaustion in the frames_decoder_gpu (#4723)
  • Fix nightly tests after merging RunArg (#4732)
  • Move the --pending and cv.notify_all() inside the critical section to prevent the notification from going unobserved. (#4727)
  • Pass the correct shape to auto augs in the EfficientNet example (#4721)
  • Fix pytorch-lightning example with Python3.6 (#4722)
  • Fix pytorch-lightning notebook example (#4719)
  • Fix D2D copy in the GPU frames decoder (#4717)

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

No features were deprecated in this release.

Known issues:

  • The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream.
    If key frames occur less frequently than that, the returned frames might be out of sync.
  • Experimental VideoReaderDecoder does not support open GOP.
    It will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • In experimental debug and eager modes, the GPU external source is not properly synchronized with DALI internal streams.
    As a workaround, you can manually synchronize the device before returning the data from the callback.
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when running in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker.

Binary builds

NOTE: DALI builds for CUDA 12 dynamically link the CUDA toolkit. To use DALI, install the latest CUDA toolkit.

CUDA 11.0 and CUDA 12.0 builds use CUDA toolkit enhanced compatibility. 
They are built with the latest CUDA 11.x/12.x toolkit respectively but they can run on the latest, 
stable CUDA 11.0/CUDA 12.0 capable drivers (450.80 or later and 525.60 or later respectively).
However, using the most recent driver may enable additional functionality. 
More details can be found in the enhanced CUDA compatibility guide.

Install via pip for CUDA 12.0:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda120==1.25.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda120==1.25.0

or for CUDA 11:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.25.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.25.0

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

DALI - DALI v1.24.0

Published by stiepan over 1 year ago

Key Features and Enhancements

This DALI release includes the following key features and enhancements:

  • Introduced an automatic augmentation module with AutoAugment, RandAugment, and TrivialAugment (#4694, #4699, #4696, #4702, #4704, #4706, #4710).
  • Added CUDA 12.1 support (#4684).
  • Added support for the and, or, and not boolean operators in pipelines (#4629, #4676).

Fixed Issues

  • Reduced memory consumption by video decoder (#4682).

Improvements

  • Update TF dataset API usage to align with 2.13rc (#4707)
  • Rename as_param to mag_to_param (#4710)
  • Add RandAugment and TrivialAugment to auto_aug module (#4704)
  • Add AutoAugment and ImageNet policy (#4702)
  • Fix The Canonical Link Relation in the sphinx documentation (#4703)
  • Rework DALI examples to use native PyTorch amp (#4683)
  • [AA] Add select operator util (#4696)
  • Add augmentations used by AA (#4699)
  • [AA] Add auto augmentation wrapper (#4694)
  • Add simple sanity test for DALI Conditionals in tf.function (#4689)
  • Add support for CUDA 12.1 (#4684)
  • Add CPU-only and variable batch tests for conditionals (#4668)
  • Make daliPipelineHandle a pointer to an opaque C++ structure. (#4599)
  • Enable JPEG fancy upsampling for mixed image decoder (#4662)
  • Release buffered libaviutil packets (#4682)
  • Overcome problem with testing TensorFlow with sanitizers (#4671)
  • New CropMirrorNormalize out of experimental module (#4644)
  • Do not install PaddlePaddle from the wheel in the L3 test (#4665)
  • Enable Python 3.10 tests in CI (#4598)
  • Use nvjpeg2k ROI API directly (#4654)
  • Add a long DALI description in DALI wheel (#4658)
  • Update the DALI roadmap link in the README to use the 2023 version (#4659)
  • Add support for the lazy and and or operators and the non-lazy not operator (#4629)
  • Reduce the size of the generated doxygen docs (#4657)
  • Naive histogram custom operator example/template (#4615)

Bug Fixes

  • Do not use numpy.typing when not available (#4706)
  • Fix SkipTest usage for fancy upsampling tests (#4698)
  • Add missing constexpr to set_size in the tensorlayout (#4692)
  • Augment exception handling with ImportError (#4681)
  • Fix the logical expression tests to avoid short-cutting them (#4676)
  • Fix API type check tests for frameworks (#4670)

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

No features were deprecated in this release.

Known issues:

  • The experimental.decoders.image operator may hang during a pipeline build or a teardown.
    The issue has been fixed in nightly builds and will be fixed in release 1.25.
  • The video loader operator requires that key frames occur at least once every 10 to 15 frames of the video stream.
    If key frames occur less often than that, the returned frames might be out of sync.
  • Experimental VideoReaderDecoder does not support open GOP.
    It will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with a TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • In experimental debug and eager modes, the GPU external source is not properly synchronized with DALI internal streams.
    As a workaround, you can manually synchronize the device before returning the data from the callback.
  • Due to known issues with Meltdown/Spectre mitigations, DALI shows the best performance when running in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker.
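
The key-frame requirement in the list above can be checked before a video is fed to the loader. The following is a minimal sketch in plain Python, not part of DALI: it assumes you already have the indices of the key frames (for example, extracted with a tool of your choice), and the function names and the default threshold of 10 frames (the conservative end of the 10 to 15 range) are illustrative assumptions.

```python
from typing import Iterable, Optional


def max_key_frame_gap(key_frame_indices: Iterable[int], total_frames: int) -> Optional[int]:
    """Return the longest distance, in frames, between consecutive key frames.

    The end of the stream is treated as a boundary, so a long tail without
    key frames is counted too. Returns None if there are no key frames.
    """
    indices = sorted(set(key_frame_indices))
    if not indices:
        return None
    boundaries = indices + [total_frames]
    return max(b - a for a, b in zip(boundaries, boundaries[1:]))


def key_frames_dense_enough(key_frame_indices: Iterable[int],
                            total_frames: int,
                            max_gap: int = 10) -> bool:
    """Check that key frames occur at least once every `max_gap` frames.

    `max_gap=10` is an assumed, conservative threshold, not a DALI constant.
    """
    gap = max_key_frame_gap(key_frame_indices, total_frames)
    return gap is not None and gap <= max_gap


# A stream with a key frame every 10 frames passes; one every 30 frames does not.
print(key_frames_dense_enough([0, 10, 20], total_frames=30))
print(key_frames_dense_enough([0, 30, 60], total_frames=90))
```

Streams that fail this check may need to be re-encoded with a shorter key-frame interval before they are used with the video loader.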

Binary builds

NOTE: DALI builds for CUDA 12 dynamically link the CUDA toolkit. To use DALI, install the latest CUDA toolkit.

CUDA 11.0 and CUDA 12.0 builds use CUDA toolkit enhanced compatibility. 
They are built with the latest CUDA 11.x/12.x toolkit respectively but they can run on the latest, 
stable CUDA 11.0/CUDA 12.0 capable drivers (450.80 or later and 525.60 or later respectively).
However, using the most recent driver may enable additional functionality. 
More details can be found in the enhanced CUDA compatibility guide.
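
The minimum driver versions quoted above (450.80 for CUDA 11 builds, 525.60 for CUDA 12 builds) can be turned into a simple pre-flight check. This is a sketch under stated assumptions: the driver version string is supplied by the caller (how you query it is up to you), and the helper names are hypothetical rather than part of DALI.

```python
# Minimum driver versions taken from the release notes above.
MIN_DRIVER = {
    11: (450, 80),   # CUDA 11 builds
    12: (525, 60),   # CUDA 12 builds
}


def parse_driver_version(version: str) -> tuple:
    """Parse a driver string such as '525.60.13' into (major, minor)."""
    parts = version.strip().split(".")
    return tuple(int(p) for p in parts[:2])


def driver_supports(cuda_major: int, driver_version: str) -> bool:
    """Return True if the driver meets the minimum for the given CUDA build."""
    return parse_driver_version(driver_version) >= MIN_DRIVER[cuda_major]


print(driver_supports(11, "450.80.02"))
print(driver_supports(12, "470.57.02"))
```

The comparison only looks at the first two components of the version string, which is enough for the thresholds listed here.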

Install via pip for CUDA 12.0:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda120==1.24.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda120==1.24.0

or for CUDA 11:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.24.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.24.0
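
When the same installation has to be scripted for several DALI versions or CUDA variants, the commands above follow a fixed pattern. The sketch below only reproduces that pattern as strings; the function name and its parameters are illustrative assumptions, while the index URL and package names are the ones shown above.

```python
INDEX_URL = "https://developer.download.nvidia.com/compute/redist/"


def dali_pip_commands(version: str, cuda_variant: str, with_tf_plugin: bool = True) -> list:
    """Build the pip commands for a DALI release.

    `cuda_variant` is the package suffix used in these notes, e.g. 'cuda120'
    or 'cuda110'. The commands are returned as strings and are not executed.
    """
    packages = [f"nvidia-dali-{cuda_variant}"]
    if with_tf_plugin:
        packages.append(f"nvidia-dali-tf-plugin-{cuda_variant}")
    return [
        f"pip install --extra-index-url {INDEX_URL} {package}=={version}"
        for package in packages
    ]


for command in dali_pip_commands("1.24.0", "cuda120"):
    print(command)
```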

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI - DALI v1.23.0

Published by stiepan over 1 year ago

Key Features and Enhancements

This DALI release includes the following key features and enhancements:

  • Enabled conditional execution: support for if/else statements with runtime predicates inside pipeline (#4561, #4618, #4602, #4589, #4617).
  • Added GPU experimental.inputs.video operator that supports decoding large videos from a memory buffer across multiple iterations (#4613, #4584, #4603, #4564).
  • Added support for lossless JPEG decoding on CPU and GPU with fn.experimental.decoders.image (#4625, #4600, #4587, #4572, #4592, #4548).
  • Added fn.experimental.tensor_resize operator (#4492).
  • Added fn.experimental.equalize operator (#4575, #4565).
  • Added API for pre-allocation and releasing of memory pools (#4563, #4556).
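
One way to picture if/else with runtime predicates in a batch-processing pipeline is as a split, apply, and merge over the batch. The sketch below is a conceptual model in plain Python, assuming the predicate is evaluated per sample; it is not DALI's implementation and does not use the DALI API, and all names in it are hypothetical.

```python
def conditional_apply(batch, predicates, if_branch, else_branch):
    """Apply `if_branch` to samples whose predicate is true, `else_branch` to the rest.

    Both branches are batch-level functions (list in, list out). Each one only
    sees its own sub-batch, and results are merged back in the original order.
    """
    true_idx = [i for i, p in enumerate(predicates) if p]
    false_idx = [i for i, p in enumerate(predicates) if not p]

    # A branch is skipped entirely when no sample selects it.
    true_out = if_branch([batch[i] for i in true_idx]) if true_idx else []
    false_out = else_branch([batch[i] for i in false_idx]) if false_idx else []

    merged = [None] * len(batch)
    for i, sample in zip(true_idx, true_out):
        merged[i] = sample
    for i, sample in zip(false_idx, false_out):
        merged[i] = sample
    return merged


result = conditional_apply(
    batch=[1, 2, 3, 4],
    predicates=[True, False, True, False],
    if_branch=lambda samples: [s * 10 for s in samples],
    else_branch=lambda samples: [-s for s in samples],
)
print(result)  # [10, -2, 30, -4]
```

In this model each branch does work only for the samples that take it, which is the main practical difference from computing both results for every sample and selecting afterwards.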

Fixed Issues

  • Fixed GPU fn.constant operator synchronization issue (#4643).
  • Fixed out-of-bounds access with trailing wildcard in fn.reshape (#4631).
  • Fixed insufficient alignment issues in GPU video decoding (#4622).

Improvements

  • Dependencies update (#4649)
  • Reduce L0 test time (#4645)
  • Extend input API utilities to support input operators (#4642)
  • Add slice_flip_normalize_* to the minimum build (used by imgcodec)
  • VideoInput<MixedBackend> (#4613)
  • Move slice_flip_kernel* to separate compilation units (#4637)
  • Bump nvCOMP to 2.6.1 (#4638)
  • Add fn.experimental.crop_mirror_normalize (#4562)
  • Simplify setup stage of Cast operator (#4633)
  • Move to CUDA 12.0U1 (#4632)
  • Fix the warning in the build with sanitizer (#4626)
  • Optimize CPU time of JPEG lossless decoder (#4625)
  • Support inferring batch size from tensor argument inputs (#4617)
  • reshape: restore the support for trailing wildcard in rel_shape (#4623)
  • Add DALI Conditionals documentation (#4589)
  • Enable nose2 test timer (#4610)
  • New SliceFlipNormalizeGPU kernel (#4356)
  • DataId mechanism for fn.inputs.video operator (#4584)
  • Add experimental.tensor_resize operator (#4492)
  • MixedBackend support for InputOperator (#4603)
  • Fix HasHwDecoder (#4601)
  • Track DataNodes produced by .gpu() in conditionals (#4602)
  • Update the math expression docs (#4568)
  • Clear operator traces before launching the operator (#4605)
  • Skip JPEG lossless tests for compute capability < SM60 (#4600)
  • Add experimental python 3.11 support (#4586)
  • Improve error message when trying to decode JPEG lossless images on the CPU backend (#4587)
  • Improve pipeline graph traversal (#4583)
  • Make .so files patched in one go when the wheel is produced (#4582)
  • Operator trace mechanism (#4564)
  • Add equalize operator (#4575)
  • Add equalize kernel (#4565)
  • Support for JPEG lossless images in GPU fn.experimental.decoders.image (#4572)
  • Add experimental support for if statements in DALI (#4561)
  • Add CodeQL workflow for GitHub code scanning (#4438)
  • Update nvCOMP to 2.6 (#4579)
  • Give the ability to link each part of CUDA toolkit statically (#4570)
  • Fix TL0_python-self-test-base-cuda for CUDA 12 (#4577)
  • Add functions to preallocate pools and release unused pool memory (#4563)
  • Disable strict_overflow warning. (#4567)
  • Remove unused define_graph argument from build pipeline method (#4555)
  • Add release_unused function to memory pools. (#4556)
  • Change CUDA C++ standard to C++17 (#4506)
  • Create axes_utils.h (#4548)

Bug Fixes

  • Fixing API utils (#4651)
  • constant operator: Set proper stream in constant storage. (#4643)
  • Coverity 2023.01-02 (#4641)
  • Allow 1-off discrepancies in the equalize op between GPU and CPU baseline (#4639)
  • Fix pipeline leak in InputOperatorMixedTest (#4630)
  • reshape: Prevent out-of-bounds access with trailing wildcard in rel_shape (#4631)
  • Fix @autoserialize problem with unknown module (#4628)
  • Fix classification of argument input-only operators in AutoGraph (#4618)
  • Fix stack op error message so that it reports dim of offending operand (#4616)
  • Make sure that ulMaxWidth is aligned to 32 bytes in the video decoder (#4622)
  • Fix sanitizer error: memory & pipeline leaks (#4619)
  • Fix rel_shape length validation in reshape (#4595)
  • Fix non-VMM pool release_unused. Don't rely on cudaGetMemInfo in preallocation tests. (#4596)
  • Fix errors reported by LASAN (#4594)
  • Add nvjpeg calls used for lossless jpeg decoding to the stub generator (#4592)
  • Fix passing WITH_DYNAMIC_* flags to conda build (#4597)
  • Fix pool preallocation tests (#4585)
  • Fix imgcodec fallback and error handling (#4573)
  • Fix CUDA_TARGET_ARCHS handling in CMake 3.18+ (#4559)

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

No features were deprecated in this release.

Known issues:

  • The video loader operator requires that key frames occur at least once every 10 to 15 frames of the video stream.
    If key frames occur less often than that, the returned frames might be out of sync.
  • Experimental VideoReaderDecoder does not support open GOP.
    It will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with a TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • In experimental debug and eager modes, the GPU external source is not properly synchronized with DALI internal streams.
    As a workaround, you can manually synchronize the device before returning the data from the callback.
  • Due to known issues with Meltdown/Spectre mitigations, DALI shows the best performance when running in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker.

Binary builds

NOTE: DALI builds for CUDA 12 dynamically link the CUDA toolkit. To use DALI, install the latest CUDA toolkit.

CUDA 11.0 and CUDA 12.0 builds use CUDA toolkit enhanced compatibility. 
They are built with the latest CUDA 11.x/12.x toolkit respectively but they can run on the latest, 
stable CUDA 11.0/CUDA 12.0 capable drivers (450.80 or later and 525.60 or later respectively).
However, using the most recent driver may enable additional functionality. 
More details can be found in the enhanced CUDA compatibility guide.

Install via pip for CUDA 12.0:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda120==1.23.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda120==1.23.0

or for CUDA 11:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.23.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.23.0

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI - DALI v1.22.0

Published by stiepan almost 2 years ago

Key Features and Enhancements

This DALI release includes the following key features and enhancements:

  • Added CUDA 12.0 support (#4502).
    • Reduced binary size for CUDA 12 builds.
  • Added CPU experimental.inputs.video operator that supports decoding video from a memory buffer across multiple iterations to reduce memory usage (#4519).
  • Added GPU fn.experimental.filter (convolution) operator (#4298, #4525).
  • Added support for decoding raw H264 and H265 streams from memory (#4480).

Fixed Issues

No major issues were fixed in this release.

Improvements

  • Update DALI TensorFlow examples to work with 2.11 (#4554)
  • Update nvCOMP to 2.5 (#4550)
  • Fix TL1_custom_src_pattern_build test (#4546)
  • Allow CPU dtype source in GPU cast_like (#4547)
  • Add GPU filter operator (2D, 3D) (#4525)
  • Remove usage of the unified memory from the remap test (#4544)
  • Split DALI operator tests into two jobs (#4543)
  • Update suppression list for sanitizer tests (#4542)
  • Update Boost preprocessor and rapidjson (#4538)
  • Update libtiff (#4531)
  • Fix linter errors & numpy dependency workaround (#4532)
  • VideoInput operator for the CPU (#4519)
  • Use pointer in NVDECLease. Store owner pointer in NVDECLease. (#4523)
  • Extract ResizeAttrBase to be reused in TensorResizeAttr (#4515)
  • Add GPU filter kernel (#4298)
  • Propagate SourceInfo (when unambiguous) from inputs to outputs. (#4518)
  • Limit NumPy version to pre-1.24 (#4527)
  • Avoid signed/unsigned comparison in clamp<S, U>. (#4524)
  • Update YOLO example to support the latest TensorFlow version (#4522)
  • Utilities and refactoring pre-VideoInput operator (#4513)
  • Enable CUDA 12.0 support (#4502)
  • Extracting InputOperator from ExternalSource (#4505)
  • Add expand_dims utility (#4493)
  • Remove Operator inheritance from VideoDecoderBase (#4508)
  • Extend decoding support (#4480)
  • Place AutoGraph as private submodule of DALI and enable tests (#4504)
  • Link CFITSIO library with cmake (#4487)

Bug Fixes

  • Add the missing installation of sanitizer to the deps image (#4521)
  • Fix DALI build without FFmpeg (#4534)
  • Replace usages of numpy.bool with bool (#4526)
  • Fix missing #include <optional>. (#4520)
  • Fix exclusion of CFITSIO test when BUILD_CFITSIO=OFF (#4510)
  • Don't look for duplicate arguments in parent schemas. (#4507)
  • Fix size argument to strncpy in cfitsio_test. Fix copyright notice. (#4509)

Breaking API changes

  • DALI builds for CUDA 12 dynamically link the CUDA toolkit. To use DALI, install the latest CUDA toolkit.
  • DALI 1.21 was the last release built for CUDA 10.2.

Deprecated features

No features were deprecated in this release.

Known issues:

  • The video loader operator requires that key frames occur at least once every 10 to 15 frames of the video stream.
    If key frames occur less often than that, the returned frames might be out of sync.
  • Experimental VideoReaderDecoder does not support open GOP.
    It will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with a TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • In experimental debug and eager modes, the GPU external source is not properly synchronized with DALI internal streams.
    As a workaround, you can manually synchronize the device before returning the data from the callback.
  • Due to known issues with Meltdown/Spectre mitigations, DALI shows the best performance when running in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker.

Binary builds

NOTE: DALI builds for CUDA 12 dynamically link the CUDA toolkit. To use DALI, install the latest CUDA toolkit.

CUDA 11.0 and CUDA 12.0 builds use CUDA toolkit enhanced compatibility. 
They are built with the latest CUDA 11.x/12.x toolkit respectively but they can run on the latest, 
stable CUDA 11.0/CUDA 12.0 capable drivers (450.80 or later and 525.60 or later respectively).
However, using the most recent driver may enable additional functionality. 
More details can be found in the enhanced CUDA compatibility guide.

Install via pip for CUDA 12.0:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda120==1.22.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda120==1.22.0

or for CUDA 11:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.22.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.22.0

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI - DALI v1.21.0

Published by JanuszL almost 2 years ago

Key Features and Enhancements

This DALI release includes the following key features and enhancements:

  • Added the following experimental image decoding operators with support for higher dynamic ranges (#4223):
    • experimental.decoders.image
    • experimental.decoders.image_crop
    • experimental.decoders.image_random_crop
    • experimental.decoders.image_slice
  • Added the GPU debayer operator (#4495, #4486).

Fixed Issues

The following issues were fixed in this release:

  • Fixed the issue where the GPU numpy reader was crashing on a DALI process teardown with cufile 1.4.0 (#4466).
  • Fixed the issue where the GPU video decoder was failing in multi-GPU settings (#4517).

Improvements

  • Optimizing ShiftPixelCenter kernel configuration (#4430).
  • Update "Compiling from source" tutorial (#4010).
  • Imgcodec's decode operator (#4223).
  • Move to use CMake in DALI deps where possible (#4445).
  • Bump supported tf version (#4459).
  • Optimize inflate tests (#4456).
  • Execute whole Keras code in the expected device scope (#4462).
  • Update the TensorFlow test to work with 2.11.x (#4460).
  • Crop rounding argument to control the conversion of anchors to integral values (#4461).
  • Make Transpose's perm argument optional (by default, reverse dims) (#4465).
  • Add CastLike operator (#4467).
  • Accept negative axis in Cat and Stack operators (#4468).
  • Code drop AutoGraph based on TensorFlow 2.10.0 (#4485).
  • Remove build and doc files from AutoGraph (#4489).
  • Rearrange AutoGraph tests (#4490).
  • Adjust the documentation template for the latest sphinx_rtd_theme (#4481).
  • Bump the nvidia-tensorflow to 22.11 in tests (#4472).
  • Improve error reporting in the video decoder (#4484).
  • Move to generic CUDA_CALL for nvCOMP (#4474).
  • Extend the warning about the lack of the necessary CUDA libraries (#4473).
  • Allow negative axes in reductions module (#4470).
  • Add kernel-wrapper around NPP debayer calls (#4486).
  • Remove TF-specific codepaths from AutoGraph (#4491).
  • Lint the AutoGraph code (#4494).
  • Add bytes_per_sample_hint parameter to parallel external source (#4155).
  • Add debayer operator (#4495).
  • Remove trailing comments from .flake.ag (#4497).
  • Update DALI_DEPS_VERSION (#4496).
  • Deprecate CUDA 10.2 (#4503).
  • Extract CachingList from ExternalSource (#4501).
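
Several entries in the list above change how axis arguments are interpreted: negative axes are accepted in Cat, Stack, and reductions, and Transpose's perm defaults to reversing the dimensions. The sketch below illustrates those two conventions in plain Python. It is an assumption-labeled illustration of the usual semantics (negative axes count from the last dimension), not DALI code, and the helper names are hypothetical.

```python
from typing import List, Optional, Sequence


def normalize_axis(axis: int, ndim: int) -> int:
    """Map a possibly negative axis to its non-negative equivalent.

    Assumes the common convention that -1 refers to the last dimension.
    """
    if not -ndim <= axis < ndim:
        raise ValueError(f"axis {axis} is out of range for {ndim} dimensions")
    return axis % ndim


def resolve_perm(ndim: int, perm: Optional[Sequence[int]] = None) -> List[int]:
    """Return the permutation to apply; reverse the dimensions when none is given."""
    if perm is None:
        return list(range(ndim - 1, -1, -1))
    if sorted(perm) != list(range(ndim)):
        raise ValueError(f"{list(perm)} is not a permutation of {ndim} dimensions")
    return list(perm)


print(normalize_axis(-1, 3))       # 2
print(resolve_perm(3))             # [2, 1, 0]
print(resolve_perm(3, [1, 0, 2]))  # [1, 0, 2]
```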

Bug Fixes

  • Do not call nvcomp with no input (#4434).
  • Fix libtiff CVE-2022-3970 (#4448).
  • TL3 SSD Install pycocotools from latest NVIDIA cocoapi repo (#4457).
  • Fix numpy reader crash (#4466).
  • Fix stub generation for dynamic linking (#4478).
  • Fix issues found by static analysis (#4477).
  • Fix PES tests with Python3.6/3.7 (#4500).
  • Patch FFmpeg for CVE-2022-3965, CVE-2022-3964 (#4499).
  • Fix video decoder cache for multiple GPUs (#4517).

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

  • DALI 1.21 is the final release that will support CUDA 10.2.

Known issues:

  • The GPU numpy reader might crash during the DALI process teardown with cufile 1.4.0.
  • The video loader operator requires that key frames occur at least once every 10 to 15 frames of the video stream.
    If key frames occur less often than that, the returned frames might be out of sync.
  • Experimental VideoReaderDecoder does not support open GOP.
    It will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with a TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • In experimental debug and eager modes, the GPU external source is not properly synchronized with DALI internal streams.
    As a workaround, you can manually synchronize the device before returning the data from the callback.
  • Due to known issues with Meltdown/Spectre mitigations, DALI shows the best performance when running in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker.

Binary builds

Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.21.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.21.0

or for CUDA 11:

CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
while it can run on the latest, stable CUDA 11.0 capable drivers (450.80 or later). 
Using the latest driver may enable additional functionality. 
More details can be found in the enhanced CUDA compatibility guide.

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.21.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.21.0

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code: