DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

APACHE-2.0 License

Downloads: 44.4K
Stars: 5K
Committers: 95
DALI - DALI v1.2.0

Published by banasraf over 3 years ago

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

  • New operators:
    • noise.shot CPU and GPU operators (#2861)
    • noise.gaussian CPU and GPU operators (#2846)
    • jpeg_compression_distortion CPU and GPU operators (#2823)
  • New mathematical operations (#2853):
    • Square and cubic root (sqrt, rsqrt, and cbrt)
    • Logarithms of different bases (log2 and log10)
    • Power (** operator and pow function)
    • Absolute value (abs and fabs)
    • Roundings (ceil and floor)
    • Trigonometric functions (sin, cos, and tan)
    • Inverse trigonometric functions (asin, acos, atan, and atan2)
    • Hyperbolic functions (sinh, cosh, and tanh)
    • Inverse hyperbolic functions (asinh, acosh, and atanh)
  • Added a Python wrapper for the fn.experimental.numba_function operator (#2886, #2835, #2903, #2893, and #2887).
  • Image decoder improvements:
    • Enabled ROI decoding in the hardware decoder (#2734).
    • Added support for the alpha channel in PNG and JP2 decoding (#2867).
    • Added support for YCbCr and BGR in JP2 decoding (#2867).
  • Updated the CUDA version to 11.3 (#2870).
  • Improved the documentation (#2915, #2911, #2927, #2862, and #2858).
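The two new noise operators can be approximated per pixel as shown below. This is a minimal pure-Python sketch, not DALI's implementation: it assumes Gaussian noise is additive, out = in + N(mean, stddev), and that shot noise follows out = Poisson(in / factor) * factor, with results rounded and saturated to the uint8 range. The function names and default values are illustrative; check the fn.noise.gaussian and fn.noise.shot documentation for the actual arguments.

```python
import math
import random

def saturate_u8(value: float) -> int:
    """Round to the nearest integer and clamp to the uint8 range [0, 255]."""
    return max(0, min(255, round(value)))

def gaussian_noise(pixels, mean=0.0, stddev=1.0, rng=random):
    """Additive Gaussian noise: out = in + N(mean, stddev), saturated to uint8."""
    return [saturate_u8(p + rng.gauss(mean, stddev)) for p in pixels]

def poisson_sample(lam, rng=random):
    """Knuth's multiplication method; adequate for the small lambdas used here."""
    if lam <= 0:
        return 0
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def shot_noise(pixels, factor=20.0, rng=random):
    """Assumed formula: out = Poisson(in / factor) * factor, saturated to uint8."""
    return [saturate_u8(poisson_sample(p / factor, rng) * factor) for p in pixels]

rng = random.Random(42)
image_row = [0, 64, 128, 255]
print(gaussian_noise(image_row, stddev=5.0, rng=rng))
print(shot_noise(image_row, factor=20.0, rng=rng))
```

Under the assumed shot noise formula, the variance of the output is factor * in, so a larger factor produces stronger noise and the noise grows with pixel brightness, which is what distinguishes it from additive Gaussian noise.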

Fixed issues

This DALI release includes the following fixes:

  • Fixed the readers.numpy cache issue (#2932).
  • Fixed an error in readers.nemo_asr (#2928).
  • Fixed a bug that caused the video reader to hang (#2916).

Improvements

  • Improve Tensors docs (#2915)
  • DALI core allocation functions (#2930)
  • Update FFmpeg build guide and update DALI_deps version (#2911)
  • Default memory resources (#2890)
  • Better error message when insufficient data in cache (#2924)
  • Add a link to the TensorFlow ResNet50 training script in the Readme (#2927)
  • Numba func notebook (#2886)
  • Enable HW decoder ROI support (#2734)
  • Use a custom color space conversion kernel for all conversions (#2907)
  • Update packages used for DALI tests (#2906)
  • Refactor TF Dataset code and lint it (#2909)
  • Add ShotNoise CPU and GPU operators (#2861)
  • Remove workaround for the problem with patchelf changing TLS alignment for CUDA < 10.2 and > 11.1 (#2879)
  • Add dali_data_type_vec (#2887)
  • Composite resource + renaming. (#2891)
  • Update deps in third_party and conda (#2878)
  • Python wrapper for numba (#2835)
  • Image Decoder: Unified behavior across backends, alpha channel support in PNG and JP2, YCbCr support in JP2 (#2867)
  • Better error handling in pipeline.py (#2864)
  • Update DALI deps (#2876)
  • Enable CUDA 11.3 based builds (#2870)
  • Updates MXNet plugin documentation regarding last_batch_policy (#2862)
  • README update with GTC2021 materials (#2860)
  • RNGBase to be used as base for noise augmentations + Add GaussianNoise operator (as an example) (#2846)
  • Pinned async resource (#2858)
  • Add more mathematical operations (#2853)
  • Add JpegCompressionDistortion CPU and GPU operators (#2823)
  • Split Python tests into smaller chunks (#2847)
  • Asynchronous pool memory resource (#2814)

Bug fixes

  • Add missing opencv-python dependency to TL2_FW_iterators_perf test (#2939)
  • Fix numpy reader header cache (#2932)
  • NemoAsrReader: Call Reset() on tensor vector holding the batch, to clear any previous shared data pointer. (#2928)
  • Fix DALI compilation for CUDA 11 pre 11.3 version (#2925)
  • Make dynlink_xxx use statically linked functions to load symbols. (#2931)
  • Fix test_detection_pipeline.py (#2929)
  • Add a missing av_bsf_flush call to a VideoReader seek function (#2916)
  • Run Optical Flow on stream 0 when running driver > 460. (#2914)
  • Fix nvcc warning about unused arguments in ResampleDepth_Channels (#2913)
  • Fix CUDA 10.0 compilation (#2917)
  • Use stream 0 in VideoDecoder when running driver >460 / CUDA >= 11.3. (#2902)
  • Fix docs and rename numba_func to numba_function (#2903)
  • Allow to specify optional args of Python-only types (#2898)
  • DALI TF install tool: Verify that a compatible prebuilt plugin is available for the required TF version before proceeding to attempt installation (#2882)
  • Fix coverity issues by adding lacking CUDA_CALL (#2888)
  • Fix failing test for Numba Func (#2893)
  • Fix double accumulation in horizontal resampling. Add test. (#2871)
  • Add epsilon to math function tests and adjust epsilon for rsqrt. (#2865)
  • Do not schedule any pipeline run when the iterator has prepare_first_batch=False (#2859)
  • Adjust the filenames of decoder test files and update licenses (#2844)
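The epsilon adjustment in #2865 reflects a general point: a float32 result from a GPU kernel rarely matches a float64 reference bit for bit. A minimal pure-Python sketch of reference implementations for two of the new operations (rsqrt and cbrt) and a tolerance-based check follows. The eps value is illustrative and not taken from DALI's tests, and to_float32 only emulates float32 rounding of a correctly computed result.

```python
import math
import struct

def to_float32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def rsqrt(x: float) -> float:
    """Reciprocal square root, 1 / sqrt(x), in double precision."""
    return 1.0 / math.sqrt(x)

def cbrt(x: float) -> float:
    """Real cube root; x ** (1 / 3) would give a complex number for negative x."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

def matches_reference(actual: float, expected: float, eps: float) -> bool:
    """Relative-tolerance comparison instead of exact equality."""
    return math.isclose(actual, expected, rel_tol=eps)

x = 2.0
gpu_like = to_float32(rsqrt(x))   # what a float32 kernel might return
print(gpu_like == rsqrt(x))       # exact comparison is too strict here
print(matches_reference(gpu_like, rsqrt(x), eps=1e-6))
```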

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

There are no deprecated features in this DALI release.

Known issues:

  • The video loader operator requires that key frames occur at least every 10 to 15 frames of the video stream. If key frames occur less frequently, the returned frames may be out of sync.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • Due to known issues with Meltdown/Spectre mitigations and DALI, DALI performs best when run in Docker with elevated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker
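To check a video against the key frame requirement before feeding it to the video reader, you can compare the largest distance between key frames with a conservative limit of 10 frames. The sketch below assumes you already have the key frame indices (obtained, for example, with an external tool such as ffprobe); max_keyframe_gap and keyframe_spacing_ok are hypothetical helpers, not DALI functions.

```python
from typing import Sequence

def max_keyframe_gap(keyframes: Sequence[int], num_frames: int) -> int:
    """Longest run of frames between consecutive key frames, counting the
    stretch from frame 0 to the first key frame and from the last key frame
    to the end of the stream."""
    points = [0] + sorted(keyframes) + [num_frames]
    return max(b - a for a, b in zip(points, points[1:]))

def keyframe_spacing_ok(keyframes: Sequence[int], num_frames: int, limit: int = 10) -> bool:
    """True if key frames occur at least every `limit` frames."""
    return max_keyframe_gap(keyframes, num_frames) <= limit

print(keyframe_spacing_ok([0, 10, 20, 30], num_frames=40))  # GOP of 10
print(keyframe_spacing_ok([0, 250], num_frames=500))        # GOP of 250
```

If the check fails, re-encoding the video with a shorter GOP (for example, with FFmpeg's -g option) is one way to satisfy the requirement.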

Binary builds

Install via pip for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda100==1.2.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda100==1.2.0

or for CUDA 11:

The CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit,
but it can run on the latest stable CUDA 11.0-capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in the enhanced CUDA compatibility guide.
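Before choosing between the cuda100 and cuda110 packages, you can check the 450.80 driver minimum by comparing the installed driver version numerically rather than as a string. A minimal sketch, assuming the version string comes from nvidia-smi (for example, nvidia-smi --query-gpu=driver_version --format=csv,noheader); the helper names are hypothetical.

```python
def parse_driver_version(text: str) -> tuple:
    """Turn a driver version string such as '450.80.02' into (450, 80, 2)."""
    return tuple(int(part) for part in text.strip().split("."))

# Minimum driver for the CUDA 11.0 wheels, per the note above.
MIN_DRIVER_CUDA110 = (450, 80)

def supports_cuda110_wheels(driver_version: str) -> bool:
    # Tuples compare element by element, so (450, 80, 2) >= (450, 80).
    return parse_driver_version(driver_version) >= MIN_DRIVER_CUDA110

print(supports_cuda110_wheels("450.80.02"))
print(supports_cuda110_wheels("418.67"))
```

Comparing the raw strings would be wrong: lexicographically, "99.0" sorts after "450.80" because "9" > "4".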

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.2.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.2.0

Or use direct download links (CUDA 10.0):

Or use direct download links (CUDA 11.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI - DALI v1.1.0

Published by banasraf over 3 years ago

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

  • Documentation improvements (#2834, #2824, #2831, #2758, #2820, and #2822).
  • The following operators were added:
    • The experimental numba_func operator that allows the use of Numba functions in the DALI pipeline (#2804).
    • The expand_dims and squeeze operators for shape manipulation (GPU and CPU) (#2800, #2791, and #2792).
    • The multi_paste operator (GPU) (#2681).
  • The following kernels were added:
    • JPEG compression distortion (GPU) (#2801, #2830, and #2839).
    • JPEG color conversion and chroma subsampling (GPU) (#2771).
  • Enabled CUDA kernel compression to decrease the size of the DALI binaries (#2833).
  • Added the src_dims argument to the reshape operator (#2788).
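The shape arithmetic behind expand_dims, squeeze, and the new src_dims argument of reshape can be sketched in plain Python. This models output shapes only, under our reading of the operators: expand_dims axes are positions in the output, squeeze may only drop extent-1 dimensions, and in src_dims the value -1 inserts a new dimension of extent 1. These helpers are hypothetical and not DALI code.

```python
from math import prod
from typing import Sequence, Tuple

Shape = Tuple[int, ...]

def expand_dims(shape: Sequence[int], axes: Sequence[int]) -> Shape:
    """Insert extent-1 dimensions so that they land at the given output axes."""
    out = list(shape)
    for axis in sorted(axes):  # ascending order keeps earlier inserts valid
        out.insert(axis, 1)
    return tuple(out)

def squeeze(shape: Sequence[int], axes: Sequence[int]) -> Shape:
    """Remove the given dimensions, which must all have extent 1."""
    drop = set(axes)
    for axis in drop:
        if shape[axis] != 1:
            raise ValueError(f"cannot squeeze axis {axis} with extent {shape[axis]}")
    return tuple(e for i, e in enumerate(shape) if i not in drop)

def reshape_src_dims(shape: Sequence[int], src_dims: Sequence[int]) -> Shape:
    """Build an output shape from input dims; -1 inserts an extent-1 dim."""
    out = tuple(1 if d == -1 else shape[d] for d in src_dims)
    if prod(out) != prod(shape):
        raise ValueError("src_dims must preserve the number of elements")
    return out

print(expand_dims((480, 640), [0]))               # add a leading dim
print(reshape_src_dims((1, 480, 640), [1, 2]))    # drop the extent-1 dim
```

Since reshape is generally a metadata-only operation, we assume that reordering dimensions through src_dims relabels the shape without moving data, so it should not be confused with a transpose.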

Fixed issues

This DALI release includes the following fixes:

  • Fixed a race condition in readers.nemo_asr when pad_last_batch is set to True (#2828).
  • Fixed the optical flow initialization issue (#2816).
  • Fixed a race condition in the data loader (#2773).

Improvements

  • Remove 0 default value from mean/std arguments of normalize. (#2834)
  • Add JpegCompressionDistortionGPU kernel (#2830)
  • Updates the pipeline docs page (#2824)
  • Enable CUDA kernels compression in the final binary (#2833)
  • Updates build documentation (#2831)
  • Update key visual (#2822)
  • Add NumbaFunc operator (#2804)
  • Add JPEG distortion kernel (#2801)
  • Add AddArg overloads for enum types (#2819)
  • Update third party dependencies to latest release versions (#2811)
  • Add an ability to provide a custom DALI_extra sha via env variable (#2810)
  • Move all deps into subrepos (#2756)
  • Reshape, Reinterpret, Squeeze and ExpandDims tutorial. (#2791)
  • Separate dependency creation and CUDA installation (#2786)
  • Remove intermediate stage from CUDA toolkit dockerfile (#2803)
  • Add Expand dims operator (#2800)
  • Update TensorFlow ResNet50 example to the latest horovod 21.03 (#2793)
  • Add squeeze operator (#2792)
  • Add JPEG color conversion and chroma subsampling kernel (#2771)
  • Add src_dims to reshape operator (#2788)
  • GPU MultiPaste (#2681)
  • Add --upgrade to pip install commands in documentation (#2758)
  • Use flattened view of the array for copying to shared memory. (#2783)

Bug fixes

  • Fix JPEG distortion kernel quality parameter handling (#2839)
  • Fix typo "functions" <- "funcions" in math doc (#2820)
  • Update DALI_deps to include FLAC security patch (#2826)
  • Fix coverity issues (#2812)
  • Fix optical flow parameter initialization. (#2816)
  • Add host fallback when nvjpegDecodeJpegDevice and nvjpegDecodeJpegHost fail (#2805)
  • ExternalSource - discard data from all callbacks when one raises StopIteration (#2784)
  • Exclude PyTorch Lightning test with MNIST (#2785)
  • Fix iteration number tracking with pipeline.reset (#2777)
  • Fix a race when the loader starts reading before the metadata is ready (#2773)
  • Fix race condition in NemoAsrReader when pad_last_batch is set to True (#2828)
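For context on the quality parameter (#2839): JPEG encoders commonly turn a 1 to 100 quality value into quantization tables with the IJG (libjpeg) scaling rule, where lower quality means larger quantization steps and therefore stronger artifacts. The sketch below shows that rule in plain Python; whether DALI's distortion kernel uses exactly this mapping is an assumption, and the helper names are hypothetical.

```python
def ijg_quality_scale(quality: int) -> int:
    """libjpeg's mapping from quality (1..100) to a percentage scale factor."""
    q = max(1, min(100, quality))
    return 5000 // q if q < 50 else 200 - 2 * q

def scale_quant_table(base_table, quality: int):
    """Scale a base quantization table and clamp entries to the baseline range [1, 255]."""
    scale = ijg_quality_scale(quality)
    return [max(1, min(255, (v * scale + 50) // 100)) for v in base_table]

# First row of the standard JPEG luminance quantization table (Annex K).
LUMA_ROW0 = [16, 11, 10, 16, 24, 40, 51, 61]

for quality in (10, 50, 90):
    print(quality, scale_quant_table(LUMA_ROW0, quality))
```

Quality 50 reproduces the base table unchanged, which makes it a convenient sanity check for any implementation of this rule.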

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

There are no deprecated features in this DALI release.

Known issues:

  • The video loader operator requires that key frames occur at least every 10 to 15 frames of the video stream. If key frames occur less frequently, the returned frames may be out of sync.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • Due to known issues with Meltdown/Spectre mitigations and DALI, DALI performs best when run in Docker with elevated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Install via pip for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda100==1.1.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda100==1.1.0

or for CUDA 11:

The CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit,
but it can run on the latest stable CUDA 11.0-capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in the enhanced CUDA compatibility guide.

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.1.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.1.0

Or use direct download links (CUDA 10.0):

Or use direct download links (CUDA 11.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI - DALI v1.0.0

Published by banasraf over 3 years ago

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

  • The API documentation has been improved:
    • The functional API has become the main DALI API (#2653).
    • Rewrote all examples to use the functional API (#2761, #2755, #2744, #2748, #2745, and #2716).
    • Applied layout and editorial changes (#2729, #2730, #2713, #2710, #2703, and #2694).
  • New operators:
    • A GridMask GPU operator for GridMask data augmentation (#2652).
    • A RandomObjectBBox operator with caching to randomly select a bounding box (#2718, #2696, #2677, and #2657).
    • A MultiPaste operator, which is required to implement Mosaic augmentation (#2583).
  • External Source can now run the per-sample callbacks in parallel (#2543).
  • Added the pipeline_def decorator, which provides an easier way to define a pipeline with the functional API (#2757 and #2629).
  • Moved all decoders to a dedicated Python module (#2741, #2743, and #2725).
  • Moved all readers to a dedicated Python module (#2720, #2721, #2717, #2715, and #2722).
  • Exposed the pipeline output names in the C API (#2665).
  • Introduced the following named Slice operator arguments (#2625):
    • start/rel_start
    • end/rel_end
    • shape/rel_shape
  • Enabled additional codecs and demuxers in FFmpeg (#2651).
  • Added an option to disable the first batch preparation during the iterator construction (#2664).
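The named Slice arguments (#2625) let you describe the window either in absolute coordinates or as fractions of the input extent. As a rough model of how the pairs relate, the hypothetical helper below resolves them to an absolute (anchor, size) pair per axis. The defaults (start at 0, end at the full extent), the rounding rule, and the mutual-exclusivity checks are assumptions of this sketch, not a description of DALI's implementation.

```python
from typing import List, Sequence, Tuple

def _scaled(rel: Sequence[float], extents: Sequence[int]) -> List[int]:
    # Relative coordinates are fractions of each input extent; rounding to
    # nearest is an assumption of this sketch.
    return [round(r * e) for r, e in zip(rel, extents)]

def resolve_slice(in_shape, start=None, rel_start=None, end=None, rel_end=None,
                  shape=None, rel_shape=None) -> Tuple[List[int], List[int]]:
    """Return (anchor, size) in absolute coordinates for each axis."""
    if start is not None and rel_start is not None:
        raise ValueError("pass start or rel_start, not both")
    if sum(a is not None for a in (end, rel_end, shape, rel_shape)) > 1:
        raise ValueError("pass only one of end, rel_end, shape, rel_shape")
    ndim = len(in_shape)
    if start is not None:
        anchor = list(start)
    else:
        anchor = _scaled(rel_start or [0.0] * ndim, in_shape)
    if shape is not None:
        return anchor, list(shape)
    if rel_shape is not None:
        return anchor, _scaled(rel_shape, in_shape)
    stop = list(end) if end is not None else _scaled(rel_end or [1.0] * ndim, in_shape)
    return anchor, [b - a for a, b in zip(anchor, stop)]

print(resolve_slice((100, 200), rel_start=[0.1, 0.25], rel_shape=[0.5, 0.5]))
```

For a 100 x 200 input, rel_start=[0.1, 0.25] with rel_shape=[0.5, 0.5] resolves to anchor [10, 50] and size [50, 100] under these assumptions.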

Fixed issues

This DALI release includes the following fixes:

  • Fixed the JPEG 2000 ROI decoding (#2692).
  • Fixed the layout length check in Transpose (#2693).
  • Fixed the .gpu() usage detection and error for CPU-only pipelines (#2682).

Improvements

  • Rework frameworks notebooks to fn API (#2761)
  • Bump up OpenCV-python version in tests (#2749)
  • Enhance deprecated argument documentation (#2755)
  • Convert notebooks to fn API: audio_processing, custom_operator, serialization (#2744)
  • Expose all pipeline constructor arguments as properties. (#2757)
  • Convert notebooks to fn API: sequence_processing (#2748)
  • Gridmask Gpu (#2652)
  • Run external source callback in parallel (#2543)
  • Bump up nvidia-tensorflow version to 1.15.5 21.02 (#2738)
  • Rewrite image processing examples to fn api. (#2745)
  • Update augmentation gallery (#2716)
  • Remove dynlink CUDA libs from the build image (#2739)
  • Rework getting started (#2729)
  • Adjust Python decoders tests to decoders module (#2741)
  • Adjust notebooks to new decoder module (#2743)
  • Update memory resource interfaces. (#2742)
  • Move decoders to decoders module (#2725)
  • Add Examples and Tutorials metadata title (#2730)
  • Adjust test to new readers module (#2720)
  • Adjust examples to new readers module (#2721)
  • Documentation home update (#2713)
  • Move tfrecord reader to readers module (#2722)
  • Move readers to dedicated submodule (#2717)
  • Add hash-based caching to RandomObjectBBox. (#2718)
  • Break the VideoReader loop once a keyframe past the requested one has been reached (#2706)
  • Improve set_outputs to accept list or tuple of data nodes as well (#2698)
  • Documentation: New layout of Examples and Tutorials section (#2710)
  • Rename test files for readers (#2715)
  • Add error checking if provided shape to tfrecord can house underlying data (#2705)
  • Documentation editorial changes: Init caps for all headings, Copyright update (#2703)
  • Add documentation to functional API (all fn.*) + New documentation layout (#2653)
  • Parallel random object BBox (#2677)
  • Rework ThreadPool and spinlock (#2696)
  • Improvements in Dockerfile.deps so that RUN commands are easily run in a non-docker environment (#2686)
  • Fix formatting of Resnet-N with Tensorflow example (#2694)
  • Operator RandomObjectBBox (#2657)
  • MultiPaste operator (#2583)
  • Add better exception granularity to memory::alloc_shared and memory::alloc_unique (#2683)
  • Make DALI pipeline use default seed (-1) when None is set to seed (#2676)
  • Make preparation of the first batch during the iterator construction optional (#2664)
  • Parallelize commands in bundle-wheel.sh (#2672)
  • Pipeline decorator (#2629)
  • Move to CUDA 11.2 update 1 (#2668)
  • Make sure that OpenCV decoding fallback follows EXIF information handling (#2666)
  • Expose names of Pipeline outputs in C API (#2665)
  • Enable named Slice arguments: start/rel_start, end/rel_end, shape/rel_shape (#2625)
  • Update nvidia-tensorflow in qa scripts to 20.12 (#2654)
  • Enable more codecs and demuxers in FFmpeg (#2651)

Bug fixes

  • Fix paddle ssd (#2765)
  • Fix Gluon example (#2764)
  • Remove redundant dimension from Optical Flow example. (#2762)
  • Fix 403 error when downloading MNIST dataset in PyTorch Lightning example (#2759)
  • Fix documentation instances of deprecated fn.image_decoder (#2754)
  • Shutdown executor when an error occurs in the executor itself, not in one of operators. (#2750)
  • Fix libcufile.so name to have *.0 suffix (#2735)
  • Fix test exclude pattern for Xavier (#2731)
  • Fix auto replacement of deprecated args for schema inheritance (#2733)
  • Fix constant input promotion for mixed backend. (#2726)
  • Fix type of slice's rel_shape argument (#2714)
  • Fix a regression in RandomObjectBBox: weights not set to default. (#2719)
  • Update TensorFlow ResNet50 example to work with the latest TF 2.4.x version (#2704)
  • Add auto generated docs files to .gitignore (#2711)
  • Update DALI PyTorch Lightning example to work with the newest Lightning (#2697)
  • Fix JPEG2K fused decoding (with ROI), add native tests for JP2k decoding (#2692)
  • Fix TL1_tensorflow-dali_test (#2687)
  • Remove unnecessary cuda runtime dependency from alloc.h (#2691)
  • Fix layout length check in Transpose. (#2693)
  • Replace eval with safer ast.literal_eval (#2690)
  • Fix .gpu usage detection and error for CPU only pipelines (#2682)
  • Add support for TensorFlow 2.4.1 in tests and for TF plugin (#2679)
  • Fix wrong early exit in function inside bundle-wheel.sh (#2675)
  • Fix apex compilation on Ubuntu 20.04 in TL1_ssd_training (#2671)
  • Fix cmake installation in TL1 for Ubuntu 20.04 (#2669)
  • Remove the split stages implementation of the hybrid image decoder (#2753)

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

  • fn.audio_decoder / ops.AudioDecoder has been renamed to fn.decoders.audio / ops.decoders.Audio.
  • fn.image_decoder / ops.ImageDecoder has been renamed to fn.decoders.image / ops.decoders.Image.
  • fn.image_decoder_crop / ops.ImageDecoderCrop has been renamed to fn.decoders.image_crop / ops.decoders.ImageCrop.
  • fn.image_decoder_random_crop / ops.ImageDecoderRandomCrop has been renamed to fn.decoders.image_random_crop / ops.decoders.ImageRandomCrop.
  • fn.image_decoder_slice / ops.ImageDecoderSlice has been renamed to fn.decoders.image_slice / ops.decoders.ImageSlice.
  • fn.caffe2_reader / ops.Caffe2Reader has been renamed to fn.readers.caffe2 / ops.readers.Caffe2.
  • fn.caffe_reader / ops.CaffeReader has been renamed to fn.readers.caffe / ops.readers.Caffe.
  • fn.coco_reader / ops.CocoReader has been renamed to fn.readers.coco / ops.readers.Coco.
  • fn.file_reader / ops.FileReader has been renamed to fn.readers.file / ops.readers.File.
  • fn.mxnet_reader / ops.MXNetReader has been renamed to fn.readers.mxnet / ops.readers.MXNet.
  • fn.nemo_asr_reader / ops.NemoAsrReader has been renamed to fn.readers.nemo_asr / ops.readers.NemoAsr.
  • fn.numpy_reader / ops.NumpyReader has been renamed to fn.readers.numpy / ops.readers.Numpy.
  • fn.sequence_reader / ops.SequenceReader has been renamed to fn.readers.sequence / ops.readers.Sequence.
  • fn.tfrecord_reader / ops.TFRecordReader has been renamed to fn.readers.tfrecord / ops.readers.TFRecord.
  • fn.video_reader / ops.VideoReader has been renamed to fn.readers.video / ops.readers.Video.
  • fn.video_reader_resize / ops.VideoReaderResize has been renamed to fn.readers.video_resize / ops.readers.VideoResize.
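Since the old names are deprecated rather than removed, migrating existing pipelines is worthwhile. A naive text-based sketch follows, assuming operators are referenced through fn. and ops. prefixes (as with import nvidia.dali.fn as fn and import nvidia.dali.ops as ops). migrate is a hypothetical helper, not a DALI tool, and a regex rewrite will miss aliased imports, so review the resulting diff.

```python
import re

# (old fn name, old ops class, new fn path, new ops path), as listed above.
RENAMES = [
    ("audio_decoder", "AudioDecoder", "decoders.audio", "decoders.Audio"),
    ("image_decoder", "ImageDecoder", "decoders.image", "decoders.Image"),
    ("image_decoder_crop", "ImageDecoderCrop", "decoders.image_crop", "decoders.ImageCrop"),
    ("image_decoder_random_crop", "ImageDecoderRandomCrop",
     "decoders.image_random_crop", "decoders.ImageRandomCrop"),
    ("image_decoder_slice", "ImageDecoderSlice", "decoders.image_slice", "decoders.ImageSlice"),
    ("caffe2_reader", "Caffe2Reader", "readers.caffe2", "readers.Caffe2"),
    ("caffe_reader", "CaffeReader", "readers.caffe", "readers.Caffe"),
    ("coco_reader", "CocoReader", "readers.coco", "readers.Coco"),
    ("file_reader", "FileReader", "readers.file", "readers.File"),
    ("mxnet_reader", "MXNetReader", "readers.mxnet", "readers.MXNet"),
    ("nemo_asr_reader", "NemoAsrReader", "readers.nemo_asr", "readers.NemoAsr"),
    ("numpy_reader", "NumpyReader", "readers.numpy", "readers.Numpy"),
    ("sequence_reader", "SequenceReader", "readers.sequence", "readers.Sequence"),
    ("tfrecord_reader", "TFRecordReader", "readers.tfrecord", "readers.TFRecord"),
    ("video_reader", "VideoReader", "readers.video", "readers.Video"),
    ("video_reader_resize", "VideoReaderResize", "readers.video_resize", "readers.VideoResize"),
]

RENAMED = {}
for old_fn, old_op, new_fn, new_op in RENAMES:
    RENAMED["fn." + old_fn] = "fn." + new_fn
    RENAMED["ops." + old_op] = "ops." + new_op

# Word boundaries keep fn.image_decoder from matching inside fn.image_decoder_crop.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in sorted(RENAMED, key=len, reverse=True)) + r")\b"
)

def migrate(source: str) -> str:
    """Rewrite deprecated fn./ops. operator names to their new module paths."""
    return _PATTERN.sub(lambda m: RENAMED[m.group(1)], source)

print(migrate("images = fn.image_decoder(jpegs, device='mixed')"))
```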

Known issues:

  • The video loader operator requires that key frames occur at least every 10 to 15 frames of the video stream. If key frames occur less frequently, the returned frames may be out of sync.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • Due to known issues with Meltdown/Spectre mitigations and DALI, DALI performs best when run in Docker with elevated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Install via pip for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda100==1.0.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda100==1.0.0

or for CUDA 11:

The CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit,
but it can run on the latest stable CUDA 11.0-capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in the enhanced CUDA compatibility guide.

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.0.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.0.0

Or use direct download links (CUDA 10.0):

Or use direct download links (CUDA 11.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI - DALI v0.31.0

Published by banasraf over 3 years ago

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

  • New operators:
    • A GridMask CPU operator for GridMask data augmentation (https://arxiv.org/abs/2001.04086), which is useful for the EfficientNet pipeline (#2582).
    • A ROIRandomCrop CPU operator, which is required to perform biased random crops in segmentation applications (#2638).
  • Added support for the variable batch size in ExternalSource (#2481, #2641).
  • Added support for the time-major layout in the following spectrogram processing operators:
    • GPU and CPU Spectrogram (#2619, #2617)
    • GPU and CPU MelFilterBank (#2620)
  • Refactored and unified the following RNG operators:
    • Uniform (#2531)
    • CoinFlip (#2577)
  • Reworked the custom operators documentation (#2568).
  • Applied performance improvements in the JPEG decoder (#2655, #2610).
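GridMask drops square regions laid out on a regular grid. A minimal single-channel sketch follows, assuming tile is the grid period in pixels and ratio is the black square width divided by the tile width. The operator's rotation and translation options are left out, and the function names are hypothetical.

```python
def grid_mask(height: int, width: int, tile: int = 100, ratio: float = 0.5):
    """Return a height x width mask with 0 in the black squares and 1 elsewhere.

    Each tile x tile cell has a black square of side int(tile * ratio) in its
    top-left corner; rotation and translation are omitted.
    """
    black = int(tile * ratio)
    return [
        [0 if (x % tile) < black and (y % tile) < black else 1 for x in range(width)]
        for y in range(height)
    ]

def apply_mask(image, mask):
    """Zero out masked pixels of a single-channel image given as nested lists."""
    return [[p * m for p, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]

for row in grid_mask(4, 8, tile=4, ratio=0.5):
    print(row)
```

When the image size is a multiple of tile, the masked fraction is ratio squared, so ratio = 0.5 blacks out a quarter of the image.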

Fixed issues

  • Fixed the length that was reported by DALI FW iterators when the DROP policy is used (#2611)
  • Provided a workaround for a compiler problem that caused an Invalid device function error. (#2656)
  • Fixed RandomBBoxCrop errors while using the crop_shape argument (#2605)
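The fix in #2611 concerns how many batches an iterator reports when the incomplete last batch is dropped. The expected arithmetic for a single shard is sketched below. The policy names mirror the values of DALI's last_batch_policy (FILL, DROP, PARTIAL), but epoch_length is a simplified, hypothetical helper that ignores sharding and padding.

```python
def epoch_length(size: int, batch_size: int, policy: str = "FILL") -> int:
    """Number of batches per epoch for a single, unpadded shard."""
    if size < 0 or batch_size <= 0:
        raise ValueError("size must be >= 0 and batch_size > 0")
    if policy == "DROP":
        return size // batch_size  # the incomplete last batch is discarded
    if policy in ("FILL", "PARTIAL"):
        return -(-size // batch_size)  # ceiling division: the last batch is kept
    raise ValueError(f"unknown policy: {policy!r}")

for policy in ("DROP", "FILL", "PARTIAL"):
    print(policy, epoch_length(10, 4, policy))
```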

Improvements

  • Use pinned memory for staging buffer for HW nvJPEG decoder (#2655)
  • Find bounding boxes of multiple labels (#2650)
  • Add ROIRandomCrop operator (#2638)
  • Add FW iterators handling of variable batch size and improve ES examples (#2641)
  • Connected components (#2640)
  • Gridmask Cpu (#2582)
  • Iter-to-iter variable batch size (#2481)
  • Enable support for different layouts in the MelFilterBank (#2620)
  • Rework ops.random.CoinFlip (#2577)
  • Enable time-major layout in Spectrogram CPU (#2619)
  • Update clang format (#2524)
  • Improve Optical Flow error verbosity (#2618)
  • TF dataset tests rework (#2539)
  • Time major Spectrogram (GPU-only) (#2617)
  • Integrate RMM (#2609)
  • Propagate scalar in transform.scale (#2581)
  • Remove redundant JPEG decoder initialization from peeking shape function (#2610)
  • Rework ops.random.Uniform (#2531)
  • Rework custom operator docs (#2568)

Bug fixes

  • Work around a compiler problem that caused Invalid device function error. (#2656)
  • Python fixes: argument inputs, external source, docs (#2646)
  • Fix SeparateQueuePolicy handling of the CPU stage (#2636)
  • Fix variable batch size for list of tensors. Make constants constant again. (#2637)
  • Fix Uniform discrete distribution (#2635)
  • Fix a double set of preserve schema arg and uninitialized var (#2632)
  • Add handling of empty inputs and tiny outputs in Resize op and Resampling kernels. (#2634)
  • Refactor functions that extract a range of samples from TLS and TLV. (#2628)
  • Fix RandomBBoxCrop errors while using crop_shape argument (#2605)
  • Update ResNet50 example to work with TensorFlow 2.x (#2537)
  • Keep reference to owner of data in Python Tensor and TensorList (#2606)
  • Enable nvJPEG2K for CUDA 11.2 builds (#2614)
  • Disable mmap based test for Xavier (#2612)
  • Fix length reported by DALI FW iterators when DROP policy is used (#2611)
  • Use smaller block in Warp (#2613)

Breaking API changes

Deprecated features

  • ops.Uniform was moved to ops.random.Uniform
  • ops.CoinFlip was moved to ops.random.CoinFlip

Known issues:

  • The video loader operator requires that key frames occur at least every 10 to 15 frames of the video stream. If key frames occur less frequently, the returned frames may be out of sync.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • Due to known issues with Meltdown/Spectre mitigations and DALI, DALI performs best when run in Docker with elevated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Install via pip for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda100==0.31.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda100==0.31.0

or for CUDA 11:

The CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit,
but it can run on the latest stable CUDA 11.0-capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in the enhanced CUDA compatibility guide.

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==0.31.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==0.31.0

Or use direct download links (CUDA 10.0):

Or use direct download links (CUDA 11.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI - DALI v0.30.0

Published by klecki over 3 years ago

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

  • Optimized CPU resampling (#2540).
  • Added the following mathematical expressions:
    • Disallowed unwanted __bool__ conversions (#2538).
    • Added exp and log math functions (#2555).
  • Added the images argument to COCOReader, which allows a custom ordering of images, and fixed a bug in segmentation data parsing (#2548, #2597).
  • Added support for the nvJPEG preallocate API for a batched hardware decoder (#2544).
  • Added support for surfaces with strides over 2G (#2600).
  • Enabled CUDA 11.2 builds (#2553).
  • Documentation improvements:
    • Added a support matrix to the documentation (#2519).
    • Added a geometric transform tutorial (#2530).
  • Allowed DALI to be compiled with Clang (#2416).
  • Added CUDA API checks in utility functions (#2517) and tests (#2516).
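Support for strides over 2G (#2600) matters because a byte offset stored in a signed 32-bit integer overflows once it passes 2^31 - 1 bytes. The sketch below only illustrates that arithmetic by emulating 32-bit and 64-bit signed offsets in Python; it makes no claim about how DALI computes offsets internally, and the helper names are hypothetical.

```python
def wrap_signed(value: int, bits: int) -> int:
    """Reinterpret the low `bits` bits of value as a two's complement integer."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value >= (1 << (bits - 1)) else value

def byte_offset(index: int, stride: int, bits: int = 64) -> int:
    """Offset of element `index` for a given stride, computed in `bits`-bit signed math."""
    return wrap_signed(index * stride, bits)

stride = 3 * 2**30  # a 3 GiB stride
print(byte_offset(1, stride, bits=32))  # wraps to a negative value
print(byte_offset(1, stride, bits=64))  # fits in 64 bits
```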

Fixed issues

  • Fixed the autoreset option in the iterator for the DROP policy (#2567).

Improvements

  • Make Nvjpeg2kTest more verbose (#2509)
  • Compile DALI with Clang (#2416)
  • Try to actually find the library instead of arbitrarily deciding it can't be there (#2511)
  • Enable GDS for conda build by default (#2515)
  • Pool memory resource (#2518)
  • Add GTest Event Listener with CUDA validation after TEST (#2516)
  • Disable GPU numpy reader test for sm < 6.0 (#2514)
  • Mention WarpAffine in transforms.* documentation (#2527)
  • Ops rework to prepare iter-to-iter batch size variability (#2408)
  • Fix unchecked CUDA API calls in utility functions (#2517)
  • Bump up nvidia-tensorflow version in tests (#2526)
  • Cleanup warnings in CUDA code (#2523)
  • Add debug info to RN50 pipeline (#2522)
  • Add a support matrix to the documentation (#2519)
  • Add ArgValue utility (#2528)
  • Remove pinning numpy version in TL1_ssd_training test (#2536)
  • Remove unreachable return statement (#2541)
  • Vectorize CPU resampling (#2540)
  • Remove constraint on input type for RandomResizedCrop. Update tests. (#2549)
  • Hide ArithmeticGenericOp doc and disallow bool (#2538)
  • Support for nvJPEG preallocate API for batched HW decoder (#2544)
  • Add exp and log math functions (#2555)
  • Add COCOReader files arg support and fix bug in the segmentation data parsing (#2548)
  • Event pool (#2520)
  • Rework random number generators. RNGBase operator template and NormalDistribution. (#2513)
  • Enable CUDA 11.2 builds (#2553)
  • Adjust range of tested log inputs (#2564)
  • Add geometric transform tutorial. (#2530)
  • Add synchronization after randomizer construction. (#2565)
  • Move to the upstream version of PaddlePaddle (#2561)
  • Move examples to fn api (#2566)
  • Remove legacy API based nvJPEG decoder implementation (#2591)
  • Support surfaces with strides over 2G (#2600)
  • COCOReader images argument can be used to provide a custom order of images (#2597)

Bug fixes

  • Fix build for Jetson platform (#2512)
  • Fix aarch64 build errors (#2529)
  • Fix broken uniform operator python tests (#2556)
  • Fix Clang build (#2560)
  • Fix Xavier test crash caused by a faulty NumPy build (#2596)
  • Fix autoreset option in iterator for DROP policy (#2567)
  • Fix uniform distribution test expectations (#2589)

Breaking API changes

Deprecated features

Known issues:

  • The video loader operator requires that key frames occur at least every 10 to 15 frames of the video stream. If key frames occur less frequently, the returned frames may be out of sync.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • Due to known issues with Meltdown/Spectre mitigations and DALI, DALI performs best when run in Docker with elevated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Install via pip for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda100==0.30.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda100==0.30.0

or for CUDA 11:

The CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit, but it can run on the latest stable CUDA 11.0-capable drivers (450.80 or later). Using the latest driver may enable additional functionality. More details can be found in the enhanced CUDA compatibility guide.

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==0.30.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==0.30.0
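The package names encode the CUDA major version (cuda100 or cuda110), and the DALI and TensorFlow plugin packages use the same index URL, so the commands above can be generated from a version string. This is a minimal sketch; `dali_pip_args` is a hypothetical helper, not part of DALI, and it only builds the argument list without invoking pip.

```python
from typing import List

INDEX_URL = "https://developer.download.nvidia.com/compute/redist/"

# Package suffixes used in these notes: cuda100 for CUDA 10, cuda110 for CUDA 11.
CUDA_SUFFIX = {10: "cuda100", 11: "cuda110"}


def dali_pip_args(version: str, cuda_major: int, tf_plugin: bool = False) -> List[str]:
    """Build (but do not run) the pip command line for a DALI wheel."""
    suffix = CUDA_SUFFIX[cuda_major]  # KeyError for CUDA versions not listed here
    name = "nvidia-dali-tf-plugin" if tf_plugin else "nvidia-dali"
    return ["pip", "install", "--extra-index-url", INDEX_URL, f"{name}-{suffix}=={version}"]


print(" ".join(dali_pip_args("0.30.0", 11)))
print(" ".join(dali_pip_args("0.30.0", 11, tf_plugin=True)))
```

The two printed lines match the CUDA 11 commands listed above.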

Or use direct download links (CUDA 10.0):

Or use direct download links (CUDA 11.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI - DALI v0.29.0

Published by jantonguirao almost 4 years ago

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

  • New operators:
    • NumpyReader GPU Operator with support for GPUDirect Storage (#2477)
    • nvJPEG2000 decoding was enabled in the ImageDecoder operator (#2501)
    • segmentation.RandomMaskPixel operator for selecting random pixel coordinates in a mask, optionally restricted to foreground pixels (#2445)
    • OneHot for GPU (#2436)
  • Move all NVTX infrastructure into core and create DALI domain (#2472)
  • New Examples:
    • Add mask processing to COCO Reader with Augmentations example (#2426)
    • Add reductions example (#2457)
    • Example of random_mask_pixel to perform biased random crop (#2474)
    • Update ExternalSource framework examples (#2482)
  • Operator Improvements:
    • Pad: Add support for per-sample shape and alignment requirements (#2432)
    • RandomResizedCrop: enable channel-first and video support + add tests (#2430)
    • PythonFunction Operator: support for output layouts (#2486)
    • Optimize the DCT GPU kernel. (#2471)
    • COCOReader: Support for uncompressed RLE masks (#2478)
    • transforms.Rotation to accept scalar inputs (#2494)
  • Move to CUDA 11.1 update 1 (#2419)
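The biased random crop example listed above combines two steps: pick a random pixel, preferably from the foreground of a segmentation mask, and center a crop window on it. The following pure-Python sketch models that idea on a small 2D mask. It is not the DALI implementation; `random_mask_pixel`, `crop_anchor_around`, and the fallback to any pixel when the mask has no foreground are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple

Mask = List[List[int]]


def random_mask_pixel(mask: Mask, foreground: bool, rng: random.Random,
                      threshold: int = 0) -> Tuple[int, int]:
    """Pick a random (y, x); with foreground=True, only among pixels above threshold."""
    all_coords = [(y, x) for y, row in enumerate(mask) for x in range(len(row))]
    fg_coords = [(y, x) for (y, x) in all_coords if mask[y][x] > threshold]
    # Assumed fallback: if the mask has no foreground, sample from every pixel.
    candidates = fg_coords if (foreground and fg_coords) else all_coords
    return rng.choice(candidates)


def crop_anchor_around(center: Sequence[int], crop_shape: Sequence[int],
                       image_shape: Sequence[int]) -> Tuple[int, ...]:
    """Top-left corner of a crop centered on `center`, clamped to the image.

    Assumes every crop dimension is no larger than the matching image dimension.
    """
    return tuple(min(max(c - k // 2, 0), size - k)
                 for c, k, size in zip(center, crop_shape, image_shape))


mask = [[0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 1]]
center = random_mask_pixel(mask, foreground=True, rng=random.Random(0))
print(center, crop_anchor_around(center, (2, 2), (4, 4)))  # (3, 3) (2, 2)
```

Because the only foreground pixel sits in the corner, the crop is clamped so that it still fits inside the 4x4 image while containing that pixel.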

Fixed issues

  • NumpyReader: Replaced std::regex with a custom implementation to fix ABI incompatibility issues (#2489)
  • Fix the dimensionality of labels in SSDRandomCrop. (#2488)

Improvements

  • Move to CUDA 11.1 update 1 (#2419)
  • RandomResizedCrop: enable channel-first and video support + add tests (#2430)
  • Pad operator: Add support for per-sample shape and alignment requirements (#2432)
  • Update clang to 10.0 (#2424)
  • Add mask processing to COCO Reader with Augmentations example (#2426)
  • Make the custom nvJPEG allocator return a relevant allocation status (#2438)
  • Make the custom nvJPEG allocator not throw and return only the status (#2443)
  • Add SearchableRLEMask utility (#2441)
  • Add GPU support to OneHot operator (#2436)
  • Reduce axes names (#2425)
  • Remove CUDA headers and generate stubs in runtime (#2420)
  • TensorVector update for iter-to-iter variable batch size (#2435)
  • Fix build with all options off, relax libclang required version (#2455)
  • Add support for UINT8 and INT8 outputs in CMN + scale and shift arguments (#2458)
  • CocoReader: Parse RLE masks only when pixelwise masks are requested (#2462)
  • Add reductions example (#2457)
  • Enables direct linking with libcuda.so instead of dlopen (#2459)
  • Add segmentation.RandomMaskPixel operator (#2445)
  • Skips the building of prebuilt DALI package for nvidia-tensorflow (#2451)
  • Pad to square tests (#2442)
  • Enable compile time generation of dynlink wrappers for nvml (#2463)
  • Deprecate squeeze_labels option from MXNet iterator and enhance .squeeze function to match numpy style interface (#2450)
  • Hide hidden ops and improve Enum docs quality (#2470)
  • Enforce uniform rank and type of the outputs read by CPU DataReader. (#2476)
  • Move all NVTX infrastructure into core and create DALI domain (#2472)
  • MXNet Iterator: Revert to squeeze_labels=True behavior by default (#2479)
  • Example of random_mask_pixel to perform biased random crop (#2474)
  • Update DALI dependency (#2483)
  • Update ExternalSource framework examples (#2482)
  • Optimize the DCT GPU kernel. (#2471)
  • Support the output layouts in the PythonFunction Operator (#2486)
  • transforms.Rotation to accept scalar inputs (#2494)
  • Rework tutorials general (#2480)
  • Add support for GPU based numpy reader (#2477)
  • Per sample ExternalSource (#2469)
  • Use atol instead of rtol (#2499)
  • Lifts the restriction and enables enable_frame_num and enable_timestamps for filenames (#2468)
  • Re-enable nvJPEG2000 (#2501)
  • Disables GDS for the default build configuration (#2502)
  • COCOReader: Support for uncompressed RLE masks (#2478)
  • Memory manager - interfaces, utilities, monotonic resources, malloc resource (#2497)
  • Update Jetson compilation guide (#2508)
  • Makes sure that cuFile and nvJPEG2k cannot be enabled when they are not supported (#2510)

Bug fixes

  • Fix seed in RandomResizedCrop test. (#2437)
  • QNX build fix (#2440)
  • Fix lack of proper loading of best_prec1 from the checkpoint (#2466)
  • Fix the dimensionality of labels in SSDRandomCrop. (#2488)
  • NumpyReader: Replace std::regex with a custom implementation (#2489)
  • Fix CPU only mode in C API (#2496)
  • Fix bugs reported by static analysis (#2491)
  • Fix typo in STYLE_GUIDE.md (#2503)
  • Fix NVJPEG2K_ENABLED test macros (#2504)

Breaking API changes

Deprecated features

  • Deprecate squeeze_labels option from MXNet iterator and enhance .squeeze function to match numpy style interface (#2450)

Known issues:

  • The video loader operator requires that key frames occur at least every 10 to 15 frames of the video stream. If key frames occur less frequently, the returned frames might be out of sync.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with a TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler used to build TensorFlow is present on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows the best performance when run in Docker with elevated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Install via pip for CUDA 10
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda100==0.29.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda100==0.29.0

or for CUDA 11:

The CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit, but it can run on the latest stable CUDA 11.0-capable drivers (450.80 or later). Using the latest driver may enable additional functionality. More details can be found in the enhanced CUDA compatibility guide.
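The driver requirement above (450.80 or later) can be checked by comparing version numbers. This is a minimal sketch, assuming the driver version is available as a dotted string such as the one reported by `nvidia-smi --query-gpu=driver_version --format=csv,noheader`; the helper names are hypothetical and the comparison covers only the major and minor components.

```python
from typing import Tuple

MIN_DRIVER: Tuple[int, int] = (450, 80)  # minimum stated in these notes


def parse_driver_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '450.80.02' into (450, 80, 2)."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_cuda11_driver_minimum(driver_version: str) -> bool:
    # Compare only major.minor; tuples compare element by element.
    return parse_driver_version(driver_version)[:2] >= MIN_DRIVER


for version in ("418.87.01", "450.51.06", "450.80.02", "460.32.03"):
    print(version, meets_cuda11_driver_minimum(version))
```

Tuple comparison avoids the string-ordering trap in which "450.9" would sort after "450.80".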

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==0.29.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==0.29.0

Or use direct download links (CUDA 10.0):

Or use direct download links (CUDA 11.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI - DALI v0.28.0

Published by klecki almost 4 years ago

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

  • New operators:
    • Affine transform generators, which are operators that generate scale, rotate, shear, translate, crop transform matrices (#2309).
      • You can use the transforms.Combine operator to combine these matrices (#2317).
      • These transformations can be applied to data by using the CoordTransform operator.
    • Added min, max, and clamp arithmetic operators (#2298).
    • Cat and Stack Operators to concatenate and stack Tensors for the CPU and the GPU (#2301, #2339, #2350).
    • The following reductions for the CPU and the GPU (#2342, #2379, and #2395):
      • Min, Max, Sum, Mean, MeanSquare, RootMeanSquare, Std, Variance
    • The MFCC operator for the GPU (#2423).
    • The SelectMasks operator (#2381).
    • Add operators for batch reordering:
      • BatchPermutation for generating random reordering of the batch.
      • PermuteBatch, which reorders tensors in a batch, based on a list of provided indices (#2417).
    • Operator Compose: PyTorch-style API to compose the operators (#2393).
  • Improvements in existing operators:
    • Added SeekFrames to the audio decoder. The redesign allows you to decide the decoded data type at runtime (#2334).
    • Added the ability to handle UTF8 text to the NemoAsrReader (#2358).
    • Added explicit file list support to the FileReader (#2389).
    • Improvements in the COCO reader API (#2406).
      • The COCOReader API now outputs relative mask polygon coordinates when the option ratio is set to True (#2375).
    • RandomBBoxCrop now optionally outputs the indices of the bounding boxes that passed the centroid filter (#2374).
  • Late initialization of torch_gpu_device in the PyTorch plugin (#2411).
  • Automatic constant-to-input promotion (#2361) and generalized handling of operator arguments (#2393).
  • Added a MNIST example for DALI and PyTorch Lightning (#2360).
  • Added the last_batch_policy to the framework iterator (#2269).
  • New builds:
    • Python 3.9 is now enabled (#2333).
    • The DALI wheels for CUDA 11 are built with CUDA 11.1 and use Enhanced Compatibility to work with CUDA 11.0 (#2302, #2356, #2367, and #2413).
    • Added support for the SM_86 architecture (#2364).
    • Added the ability to cross-build Python wheels for Jetson (#2313).
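Several of the reductions listed above are related to each other: RootMeanSquare is the square root of MeanSquare, and Std is the square root of Variance. The sketch below computes all of them over a flat list of numbers in pure Python to make these relationships explicit. DALI applies reductions along selected axes of each sample, and the population variance (no degrees-of-freedom correction) used here is an assumption.

```python
import math
from typing import Dict, Sequence


def reductions(x: Sequence[float]) -> Dict[str, float]:
    """Compute the listed reductions over a flat, non-empty sequence."""
    n = len(x)
    mean = sum(x) / n
    mean_square = sum(v * v for v in x) / n
    variance = sum((v - mean) ** 2 for v in x) / n  # population variance (assumed)
    return {
        "min": min(x), "max": max(x), "sum": sum(x), "mean": mean,
        "mean_square": mean_square,
        "root_mean_square": math.sqrt(mean_square),  # RMS = sqrt(MeanSquare)
        "variance": variance,
        "std": math.sqrt(variance),                  # Std = sqrt(Variance)
    }


r = reductions([1.0, 2.0, 3.0, 4.0])
print(r["mean"], r["mean_square"], r["variance"])  # 2.5 7.5 1.25
```

The example also shows the identity Variance = MeanSquare - Mean²: 7.5 - 2.5² = 1.25.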

Bug fixes

  • Fix error when VideoReader is prematurely terminated (#2336)
  • Fix failure in affine transforms tests (#2337)
  • Fix the problem of output outliving the pipeline in python (#2341)
  • Fix lack of proper layout setting in the VideoReader (#2346)
  • Fix uniform generator operator (#2352)
  • Bugfixes: Default nfft value and to_snake_case implementation (#2353)
  • Fixes problems in the weekly build (#2372)
  • Fix a problem with reference to "incomplete" type (error in Clang/CUDA). (#2377)
  • Fix how DALI handles StopIteration from the ExternalSource (#2373)
  • Fix TL1_nodeps_build and TL0_cpu_only (#2391)
  • Fix CPU only mode for arithm operators (#2400)
  • Preserve shape of pseudoscalars in arithmetic ops. (#2359)

Improvements

  • Add affine transform generators: TransformScale, TransformRotation, TransformShear, TransformCrop (#2309)
  • Change code/docs language to be more inclusive (#2322)
  • Update nvidia-tensorflow test package to 20.9 and bump tensorflow-gpu minor versions (#2320)
  • Update example usage of DALIClassificationIterator in docs strings (#2306)
  • Reduce video reader memory consumption (#2308)
  • TensorJoin kernel for CPU (#2301)
  • Enable automatic python modules for operator (#2329)
  • Split GaussianBlur Python test (#2332)
  • Add CombineTransforms operator (#2317)
  • Append TensorListShapes (#2291)
  • Enable CUDA 11.1 builds (#2302)
  • Add min, max and clamp arithmetic ops (#2298)
  • Update TensorFlow plugin documentation (#2328)
  • Remove Python 3.5 support, enable Python 3.9 (#2333)
  • Enable nvJPEG2k build for CUDA 11.1 (#2343)
  • Add BUILD_DALI_NODEPS to allow building dali_core and dali_kernels without extra third party libraries present in the system (#2321)
  • Add SeekFrames to audio decoder. Redesign to allow deciding decoded data type at runtime. (#2334)
  • Add discrete mode to Uniform operator (#2340)
  • Test for utility CMake function (find_dali) (#2325)
  • Propagate new build options to other build utilities (#2349)
  • Add support for N-dim tensors to OneHot (#2345)
  • Adds a separate option to preallocate nvJPEG2k memory (#2347)
  • Tensor join GPU (#2339)
  • Reductions: min, max (#2342)
  • Tensor concatenation and stacking (#2350)
  • Use inverse (source-to-destination) matrix in WarpAffine operator (#2338)
  • Disable more dependencies for nodeps build (#2355)
  • Update DALI trademark information (#2351)
  • Reduce GPU memory fraction in TF tests to 0.5. (#2357)
  • Automatic constant-to-input promotion. (#2361)
  • Add support for SM_86 architecture (#2364)
  • Use current class next implementation in init, to avoid special handling of first batch in child classes (#2363)
  • Add ability to cross-build Python wheels for Jetson (#2313)
  • Add NemoAsrReader handling of UTF8 text (#2358)
  • Enable CUDA 11 compatibility mode (#2356)
  • Add MNIST example for DALI and PyTorch Lightning (#2360)
  • Add last_batch_policy to the framework iterator (#2269)
  • COCOReader to output relative mask polygon coordinates when the option ratio is set to True (#2375)
  • RandomBBoxCrop to optionally output the indices of the bounding boxes that passed the centroid filter (#2374)
  • Enable compatibility layer in tests for CUDA 11 (#2367)
  • Reduce Sum Op (#2379)
  • Install DALI license, copyright and acknowledgments explicitly (#2392)
  • Add layout support to OneHot operator (#2388)
  • Generalized handling of operator arguments + operator Compose. (#2393)
  • GPU DCT kernel (#2398)
  • Bump up Nvidia TF version to 20.10 (#2397)
  • More reductions (#2395)
  • Late initialization of torch_gpu_device in pytorch plugin (#2411)
  • Add a link to CUDA Enhanced Compatibility Across Minor Releases guide (#2410)
  • Add explicit file list support to FileReader. (#2389)
  • Add TransformTranslation deprecation placeholder Op (#2412)
  • Bump up the CuPy to one that supports CUDA 11.0 (#2413)
  • Add a missing include in filesystem.cc (#2414)
  • Add a warning about the Python function incompatibility with TensorFlow (#2415)
  • Improvements in COCO reader API (#2406)
  • Add operators for batch reordering (#2417)
  • Add SelectMasks operator (#2381)
  • GPU MFCC operator. (#2423)
  • Make base image for dockers customizable at the build time (#2427)
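The batch reordering operators added in #2417 move whole samples within a batch rather than changing the data inside a sample. The following pure-Python sketch models the behavior described in the key features, assuming the rule out[i] = in[indices[i]] for PermuteBatch and a plain random permutation for BatchPermutation. Both function names are hypothetical stand-ins, not the DALI operators.

```python
import random
from typing import List, Sequence


def permute_batch(batch: Sequence, indices: Sequence[int]) -> list:
    """Hypothetical model of PermuteBatch: out[i] = batch[indices[i]]."""
    return [batch[i] for i in indices]


def batch_permutation(batch_size: int, rng: random.Random) -> List[int]:
    """Hypothetical model of BatchPermutation: a random reordering of 0..batch_size-1."""
    indices = list(range(batch_size))
    rng.shuffle(indices)
    return indices


batch = ["img0", "img1", "img2", "img3"]
print(permute_batch(batch, [2, 0, 3, 1]))  # ['img2', 'img0', 'img3', 'img1']
print(permute_batch(batch, batch_permutation(len(batch), random.Random(42))))
```

In this model, repeating an index duplicates a sample. Whether, and under which options, the real operators allow repetitions should be checked in the operator documentation.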

Breaking API changes

  • Python 3.5 is no longer supported by the official DALI wheels.

Deprecated features

Known issues:

  • The video loader operator requires that key frames occur at least every 10 to 15 frames of the video stream. If key frames occur less frequently, the returned frames might be out of sync.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with a TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler used to build TensorFlow is present on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows the best performance when run in Docker with elevated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Install via pip for CUDA 10
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda100==0.28.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda100==0.28.0

or for CUDA 11:

The CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit, but it can run on the latest stable CUDA 11.0-capable drivers (450.80 or later). Using the latest driver may enable additional functionality. More details can be found in the enhanced CUDA compatibility guide.

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==0.28.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==0.28.0

Or use direct download links (CUDA 10.0):

Or use direct download links (CUDA 11.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI - DALI v0.27.0

Published by klecki almost 4 years ago

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

  • New operators:
    • CoordTransform Operator for applying a linear transformation to points or vectors (#2288)
    • GaussianBlur GPU Operator (#2314, #2311, #2254)
    • Nemo ASR Reader (#2234)
    • Resize 3D - operator can now process 3D inputs (#2226)
    • Add Translate affine transform generator (#2297) - in the next release it will be moved to a dedicated module.
  • Use true scalars (except in classification readers) - 0-dim Tensors represent scalar values (#2318)
  • Adjust documentation after review (#2175)
  • Support for ZSTD compression for TIFF files (#2273)
  • Support for Run-Length Encodings and Pixelwise Masks in COCO Reader (#2248)
  • Support more types in Lookup table (#2290)
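CoordTransform applies a linear (affine) transformation to points. In 2D, such a transform can be written as a 2x3 matrix M = [A | t] that maps a point p to A p + t, and two transforms are combined by multiplying their 3x3 homogeneous forms. The sketch below illustrates this in pure Python. The helper names are hypothetical, and the convention that `combine` applies transforms in the order listed is an assumption to verify against the transforms documentation.

```python
from typing import List, Sequence, Tuple

Affine = List[List[float]]  # 2x3 matrix [[a, b, tx], [c, d, ty]]


def scale(sx: float, sy: float) -> Affine:
    return [[sx, 0.0, 0.0], [0.0, sy, 0.0]]


def translation(tx: float, ty: float) -> Affine:
    return [[1.0, 0.0, tx], [0.0, 1.0, ty]]


def combine(*transforms: Affine) -> Affine:
    """One matrix equivalent to applying `transforms` in the order listed (assumed convention)."""
    result = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    for t in transforms:
        # result = t @ result in 3x3 homogeneous form; the implicit last row is [0, 0, 1].
        result = [[t[r][0] * result[0][c] + t[r][1] * result[1][c]
                   + (t[r][2] if c == 2 else 0.0)
                   for c in range(3)]
                  for r in range(2)]
    return result


def apply(m: Affine, point: Sequence[float]) -> Tuple[float, float]:
    """Map p to A p + t, as a coordinate transform does for each point."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


print(apply(combine(scale(2, 2), translation(1, 0)), (1, 1)))  # (3.0, 2.0)
print(apply(combine(translation(1, 0), scale(2, 2)), (1, 1)))  # (4.0, 2.0)
```

The two results differ because scaling after translation also scales the translation offset, which is why the order of the combined transforms matters.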

Bug fixes

  • Fixes crash in RandomBBoxCrop when no labels are provided (#2265)
  • Fix minor issues reported by static analysis (#2276)
  • Fix detection pipeline test on Ampere (#2304)
  • Fix BUILD_LIBSND=OFF build (#2316)
  • Fix build for LMDB disabled (#2319)

Improvements

  • Update build and test deps to the latest version (#2250)
  • Resize 3D + resize tests (#2226)
  • Allow passing values <= 0 in the file list for more flexible frame indexing (#2264)
  • Extend host decoder to support jpeg2000 (#2270)
  • Add file_list argument support to the Numpy reader operator (#2274)
  • Allow Slice to silently assume absolute anchor and shape when those are represented by an integer (#2282)
  • TransformPoints kernel (#2287)
  • Add inline to LookaheadParser methods (#2289)
  • Add deprecation handling in backend (#2279)
  • Support more types in Lookup table (#2290)
  • Adjust documentation after review (#2175)
  • Transform points op (#2288)
  • Support for ZSTD compression for TIFF files (#2273)
  • Support for Run-Length Encodings and Pixelwise Masks in COCO Reader (#2248)
  • Extract a DecodeAudio implementation from Audio decoder operator (#2294)
  • Extend test_RN50_data_pipeline.py test (#2295)
  • Add ConvolutionGPU kernel based on CUTLASS (#2254)
  • Add Translate affine transform generator (#2297)
  • Add *.cuh and *.inl to list of headers to bundle (#2307)
  • Add Nemo ASR reader (#2234)
  • Add SeparableConvolutionGPU kernel (#2311)
  • Add GaussianBlur GPU Operator (#2314)
  • Use true scalars (except in classification readers) + bug fixes (#2318)
  • Add nvJPEG2k support to the GPU Image Decoder and extend the nvJPEG memory pool to support nvJPEG2k allocators.
  • Adds a separate option to preallocate nvJPEG2k memory (#2347)
  • Disable nvJPEG2K support by default for now due to some decoding problems.

Breaking API changes

Deprecated features

Known issues:

  • The video loader operator requires that key frames occur at least every 10 to 15 frames of the video stream. If key frames occur less frequently, the returned frames might be out of sync.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with a TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler used to build TensorFlow is present on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows the best performance when run in Docker with elevated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Install via pip for CUDA 10
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda100==0.27.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda100==0.27.0

or for CUDA 11:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==0.27.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==0.27.0

Or use direct download links (CUDA 10.0):

Or use direct download links (CUDA 11.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI - DALI v0.26.0

Published by klecki about 4 years ago

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

  • New operators:
    • Add PeekShape operator to learn the decoded image shape (#2205)
  • Add the ability to run DALI without a GPU (#2165)
  • Optimize single-channel audio resampling with SSE2 (#2240)
  • Add the ability to pass a DALI TensorList or a list of DALI Tensors to the external source (#2244)
  • Enhance error messages for unsupported data types in operators (#2211)
  • Add a more verbose message about unsupported videos (#2203)
  • Use copy kernel when making a contiguous batch during ShareUserData, if user requested it (#2200)

Bug fixes

  • Fix typo in VERSION
  • Fix lack of input type checking in GPU variant of Spectrogram operator (#2192)
  • Fix TensorListView::to_static (#2216)
  • Temporarily freeze protobuf packages versions in Conda (#2222)
  • Fix VideoReader error checking when opening files (#2223)
  • Fix NVTX annotations (#2215)
  • Fix docker/build.sh to use Python 3 for TF plugin (#2214)
  • Fix hw_decoder_load=0.0 for ImageDecoder related tests that require deterministic results (#2232)
  • Fix a memory leak in the audio decoder (#2235)
  • Fix for TF nightly container (#2236)
  • Fix wrong jupyter execution syntax (#2241)
  • Fix TL1_ssd_training test (#2243)

Improvements

  • Use copy kernel when making a contiguous batch during ShareUserData, if user requested it (#2200)
  • Update ExternalSource documentation (#2201)
  • Use NVCC to detect cuda release version (#2194)
  • DALI TF plugin: stop requiring DALI to be installed before the build_ext step (#2204)
  • Add PeekShape operator to learn the decoded image shape (#2205)
  • Remove dummy package (#2207)
  • Add a more verbose message about unsupported videos (#2203)
  • Enhance error messages for unsupported data types in operators (#2211)
  • Add more supported types to SliceBase (#2210)
  • Add the ability to run DALI without a GPU (#2165)
  • Add CUTLASS to third party with an initial code layout (#2237)
  • Make the CUTLASS template files pass lint check (#2238)
  • Use SSE2 for single-channel audio resampling (#2240)
  • Add nvidia-tensorflow to DALI tests (#2075)
  • Update APEX version to the latest stable and tested version (#2246)
  • Fuzzing targets (#2219)
  • Add the ability to pass a DALI TensorList or a list of DALI Tensors to the external source (#2244)
  • 3D resampling (#1489)
  • Skip VP9 tests instead of failing if codec is not supported. (#2251)

Breaking API changes

Deprecated features

Known issues:

  • The video loader operator requires that key frames occur at least every 10 to 15 frames of the video stream. If key frames occur less frequently, the returned frames might be out of sync.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with a TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler used to build TensorFlow is present on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows the best performance when run in Docker with elevated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Install via pip for CUDA 10
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda100==0.26.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda100==0.26.0

or for CUDA 11:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==0.26.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==0.26.0

Or use direct download links (CUDA 10.0):

Or use direct download links (CUDA 11.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI - DALI 0.25.1

Published by klecki about 4 years ago

Key Features and Enhancements

This is a patch release that contains only fixes.

Bug fixes

  • Fixed a crash that occurred when DALI CUDA 11 runs on a pre-450.x driver with the compatibility layer (#2208, #2230).

Known issues

  • The video loader operator requires that key frames occur at least every 10 to 15 frames of the video stream. If key frames occur less frequently, the returned frames might be out of sync.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with a TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler used to build TensorFlow is present on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows the best performance when run in Docker with elevated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Install via pip for CUDA 10
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda100==0.25.1
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda100==0.25.1

or for CUDA 11:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==0.25.1
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==0.25.1

Or use direct download links (CUDA 10.0):

Or use direct download links (CUDA 11.0):

SBSA aarch64 CUDA 11.0 direct download link:

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI - DALI v0.25.0

Published by klecki about 4 years ago

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

  • Added support for aarch64 Server Base System Architecture (#2110) - we provide a build for CUDA 11 that can be installed following the installation guide.
  • New operators:
    • Normal Distribution GPU Operator (#2125)
    • Video reader resize (#2097)
  • Improvements to ExternalSource Op:
    • Added the no_copy option, which allows DALI to borrow a user's memory instead of copying it (#2024).
    • Removed the redundant copy in the ExternalSource operator (#2124)
  • Reworked the Resize operator family, including video, channel-first, RoI, and multiple-type support (#2164) with the new Resize tutorial (#2189).
  • Bundled all python versions into one wheel (#2096).
    • One DALI wheel can be used with all supported Python versions, including 3.5, 3.6, 3.7 and 3.8.
  • Improved error messages and added information about the Operator of origin (#2065).
  • Extended the following C APIs to copy output and input samples:
    • daliOutputCopy (#2145) and daliOutputCopySamples (#2161, #2186).
    • These APIs allow you to reduce the amount of copied memory and to use the copy kernel, including in ShareUserData (#2200).
  • Performance improvements:
    • Arithmetic Ops GPU (#2137)
    • Priorities in CPU thread pool allowing for better load balancing with uneven samples (#2092, #2102)
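Priorities help with uneven samples because a large job that starts last can keep one worker busy long after the others are idle. The sketch below is a simplified model, not DALI's ThreadPool. It assumes that the priority is the job's cost and that each job goes to whichever worker becomes free first, then compares the makespan (the time until all jobs finish) with and without largest-first ordering. `makespan` is a hypothetical helper.

```python
import heapq
from typing import List, Sequence


def makespan(job_costs: Sequence[float], num_workers: int, largest_first: bool) -> float:
    """Simulated time until all jobs finish when each job goes to the first free worker."""
    order = sorted(job_costs, reverse=True) if largest_first else list(job_costs)
    finish_times: List[float] = [0.0] * num_workers  # a valid min-heap as-is
    for cost in order:
        earliest = heapq.heappop(finish_times)  # worker that becomes free first
        heapq.heappush(finish_times, earliest + cost)
    return max(finish_times)


costs = [1, 1, 1, 1, 4]  # one large sample submitted last
print(makespan(costs, 2, largest_first=False))  # 6.0
print(makespan(costs, 2, largest_first=True))   # 4.0
```

With two workers, submission order leaves the large job until the end (finishing at 6), while starting it first lets the small jobs fill the other worker (finishing at 4).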

Bug fixes

  • Fix aarch64 builds that are still gcc 5.x based (#2099)
  • Fix conda build after the new build of libprotobuf was released (#2101)
  • Fix the lack of setting the right device in the ExternalSource (#2112)
  • Fix lack of a proper include to set CUDART_VERSION inside nvml.h and nvml_wrap.h (#2113)
  • Fix layout propagation in Gaussian Blur (#2118)
  • Fix layout propagation in Erase (#2133)
  • Fix TF dataset notebook (#2135)
  • Fix lack of MXNet plugin docs generation (#2146)
  • Fix TL3_RN50_convergence test for PaddlePaddle (#2159)
  • Work around a compiler bug that converted an instance call into a static call. (#2162)
  • Fix the need to have NumPy installed when test_utils.py is only imported (#2166)
  • Fix missing layouts in operators (#2136)
  • Fix QNX build (#2199)

Improvements

  • Update to CUDA 11 GA toolkit (#2094)
  • Allow nvJPEG to pre-allocate pinned and device buffers during construction (#2081)
  • Add zero-copy to the ExternalSource operator (#2024)
  • Introduce priorities in ThreadPool (#2092)
  • Video reader resize (#2097)
  • Detect version of CUDA based on libcudart.so.* name (#2105)
  • Add Operator origin information to most errors (#2065)
  • Enhance Pad documentation (#2098)
  • Bundle all python versions into one wheel (#2096)
  • Use new nvmlDeviceGetCpuAffinityWithinScope API for thread binding (#2093)
  • Use new ThreadPool API to post work with priority (#2102)
  • TensorListView generalized reshape and reinterpret (#2108)
  • Update aarch64_linux build to Jetpack 4.4 and CUDA 10.2 (#2107)
  • Re-enable VP9 video tests after driver update (#2117)
  • Remove usage of future from DALI (#2119)
  • Removes redundant copy in ExternalSource operator (#2124)
  • Add more verbose info when HwDecoderUtilizationTest is skipped (#2106)
  • Per-stream/per-device object pool. (#2127)
  • Fix PaddlePaddle test broken by rarfile update not compatible with Python 3.5 (#2130)
  • Add missing and a partial check in linter for this include file. (#2131)
  • Add libprotobuf-static as DALI conda build dependency (#2132)
  • Auto apply dataset options (#1963)
  • Add an option to use a copy kernel to feed external input (#2122)
  • Adjust mel filter test to librosa change (#2144)
  • Add dependency to dali_kernels to dali lib (#2143)
  • Tune Arithmetic Op launch specification (#2137)
  • Add daliOutputCopy (#2145)
  • Reduce memory usage in VideoReaderResize test (#2149)
  • Normal Distribution GPU Operator (#2125)
  • Remove pinning of numba version as librosa 0.8.0 has been released (#2151)
  • Add an ability to suppress _iterator_deprecation_warning (#2154)
  • Span-of-arrays flattening + minor layout utils (#2156)
  • Remove deprecated use of ltrb in BboxRandomCrop (#2141)
  • Improve PyTorch and MXNet ExternalSource examples (#2147)
  • Enable DALI build and tests for SBSA (#2110)
  • Add --disable-mmap flag to RN50 data pipeline test (#2163)
  • Make TF dataset build for 2.3.0 (#2160)
  • Enforce recordio indices are not empty (#2157)
  • Add daliOutputCopySamples (#2161)
  • Use TIFFGetFieldDefaulted and remove warning about falling back to GenericImage decoder (#2153)
  • Add an information about the faulty image to CreateImage invocation in nvjpeg_decoder_decoupled_api.h (#2174)
  • Add proper error handling where there are no valid sequences in the VideoReader (#2180)
  • Update instructions on how to run the ResNet50 example for PyTorch (#2170)
  • Add the possibility to skip individual samples when using daliOutputCopySamples (#2186)
  • Change DALI build command to use minor CUDA version as well (#2155)
  • Reworked Resize operator family - video, channel-first, RoI and multiple type support (#2164)
  • Move to Update 1 release of CUDA 11 toolkit (#2188)
  • Make the test deterministically pick video files. (#2190)
  • Resize tutorial (#2189)
  • Use copy kernel when making a contiguous batch during ShareUserData, if user requested it (#2200)

Breaking API changes

  • Remove deprecated use of ltrb in BboxRandomCrop (#2141)

Deprecated feature

Known issues:

  • The video loader operator requires that key frames occur at least every 10 to 15 frames of the video stream. If key frames occur less frequently, the returned frames may be out of sync.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI performs best when run in Docker with elevated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker
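For the key frame requirement above, one possible workaround, assuming FFmpeg built with libx264 is available, is to re-encode the video with a shorter maximum keyframe interval. The sketch below only assembles the command; `-g` sets the maximum GOP length, while the interval of 10, the file names, and the helper names are illustrative choices rather than anything DALI prescribes.

```python
import shlex
import subprocess


def reencode_command(src, dst, keyframe_interval=10):
    """Build an FFmpeg command that re-encodes `src` to H.264 with a key frame
    at least every `keyframe_interval` frames (illustrative defaults)."""
    return [
        "ffmpeg", "-y",                 # overwrite dst if it already exists
        "-i", src,
        "-c:v", "libx264",              # assumes an FFmpeg build with libx264
        "-g", str(keyframe_interval),   # maximum distance between key frames
        dst,
    ]


def reencode(src, dst, keyframe_interval=10):
    """Run the command; requires the ffmpeg binary on PATH."""
    cmd = reencode_command(src, dst, keyframe_interval)
    print(shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Encoders may still insert extra key frames at scene cuts, which only makes them more frequent and therefore does not conflict with the requirement.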

Binary builds

Install via pip for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda100==0.25.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda100==0.25.0

or for CUDA 11:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==0.25.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==0.25.0

Or use direct download links (CUDA 10.0):

Or use direct download links (CUDA 11.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI - DALI v0.24.0

Published by klecki about 4 years ago

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

  • New operators:
    • Preemphasis GPU (#2025).
    • GaussianBlur CPU (#1987, #2009, #2038).
  • Operator improvements:
    • Extended the Slice and Crop family of operators with out-of-bounds policies, which provide support for padding and trimming to the existing shape (#2000, #2056, #2044).
    • Moved the memory hint allocation in the Resize operator to the build phase (#2033).
    • Optimized the Transpose GPU operator to improve the performance on non-uniform data batches (#2011, #2032).
  • Support for GPU input data in the ExternalSource operator (#1997).
    • Added built-in support for GPU CuPy and PyTorch tensors in ExternalSource (#2050).
    • Added the ability to provide an external stream, stream 0, or automatic stream selection for GPU data access (#2050).
    • Added DLPack input support to the ExternalSource operator (#2023).
  • Added the ability to dump information about the operator output buffer size (#2039).
  • Improved error checking with external libraries (#2062, #2063).
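To illustrate what the out-of-bounds policies mean, here is a NumPy sketch of a 1-D slice whose window may extend past the input. It is not DALI's implementation; the policy names `"error"`, `"pad"`, and `"trim_to_shape"` are assumed to correspond to the operator's policy values, and `slice_1d` and `fill_value` are illustrative names.

```python
import numpy as np


def slice_1d(data, anchor, length, policy="error", fill_value=0):
    """Return data[anchor:anchor + length] under an out-of-bounds policy."""
    data = np.asarray(data)
    n, end = len(data), anchor + length
    if 0 <= anchor and end <= n:
        return data[anchor:end].copy()
    if policy == "error":
        raise ValueError(f"window [{anchor}, {end}) is outside [0, {n})")
    # Part of the requested window that overlaps the input (may be empty).
    lo = min(max(anchor, 0), n)
    hi = max(min(end, n), lo)
    if policy == "trim_to_shape":
        # Output shrinks to the overlapping part only.
        return data[lo:hi].copy()
    if policy == "pad":
        # Output keeps the requested length; missing elements get fill_value.
        out = np.full(length, fill_value, dtype=data.dtype)
        if lo < hi:
            out[lo - anchor:hi - anchor] = data[lo:hi]
        return out
    raise ValueError(f"unknown policy: {policy!r}")
```

For example, `slice_1d(np.arange(5), 3, 4, "pad", fill_value=-1)` keeps the requested length of 4 and fills the two missing elements, while `"trim_to_shape"` returns only the two elements that exist.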

Bug fixes

  • Fix semantics of the masks_meta field (#2029)
  • Fix shape comparison in C API tests. (#2045)
  • Fix conda build after TensorFlow 2.2 release (#2048)
  • Fix Slice pad support when last dimension is padded (#2056)
  • Fix TL1_jupyter_conda test (#2058)
  • Fix CropMirrorNormalize benchmark (#2059)
  • Fix epoch_size method in the pipeline (#2071)
  • Undefined name: RuntimeErrorError --> RuntimeError (#2076)
  • Use ==/!= to compare constant literals (str, bytes, int, float, tuple) (#2078)
  • Fix assertions that are always true in Python tests (#2077)
  • Fix undefined name errors in Python reshape tests (#2079)
  • Fix conda build after the new build of libprotobuf was released (#2101)

Improvements

  • Add Convolution CPU kernel (#1987)
  • Lock numba version to 0.49 when librosa is used (#2016)
  • Add a deprecation warning for Python 3.5 (#2021)
  • Change locked version of numba to at most 0.49, as 0.47 is the last release for py35 (#2020)
  • Add Preemphasis GPU operator (#2025)
  • Add out-of-bounds-policy (including pad support) to Slice/Crop (#2000)
  • Change from a custom manylinux3 to prebuilt and upstream manylinux2014 (#1965)
  • Enable python ExternalSource operator for the GPU data (#1997)
  • Batched GPU transposition (#2011)
  • Move memory hint allocation in the Resize to the build phase (#2033)
  • Replace cuTT in Transpose operator with DALI kernel + move permute to core. (#2032)
  • Separable convolution (#2009)
  • Build DALI with OpenMP SIMD (#1992)
  • Use empty tensors instead of zeroed ones for DL FW plugins (#2030)
  • Lanczos3 downscale + interp type notebook. (#2041)
  • Update docs layout template after sphinx_rtd_theme update (#2046)
  • Make the TF RN50 TL3 test compile ahead of time (#2028)
  • Add an ability to dump info about operator output buffer size (#2039)
  • Add Gaussian window calculation for Gaussian Op (#2053)
  • Remove CUDA 9 related packages from tests, update CuPy to 7.5 (#2049)
  • Use Slice kernel to implement Pad operator (instead of SliceFlipNormalizePermutePad) (#2057)
  • Add PyTorch support in ExternalSource + fix handling of CUDA streams in Python frontend (#2050)
  • Add GaussianBlur CPU Op (#2038)
  • HW Decoder utilization test (#2022)
  • Add DLPack input support to the ExternalSource operator (#2023)
  • Add better return value/error status checks (#2062)
  • Check libjpeg return codes (#2063)
  • CropMirrorNormalize full pad support (#2044)
  • Remove confusing main from nosetest files (#2083)
  • Update to CUDA 11 GA toolkit (#2094)
  • Detect version of CUDA based on libcudart.so.* name (#2105)
  • Reduce Paddle RN50 test gpu mem fraction to 80% (#2152)
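The GaussianBlur work above combines a Gaussian window calculation with separable convolution. The NumPy sketch below shows why separability pays off: blurring rows and then columns with a 1-D window gives the same result as a single 2-D convolution with the outer-product kernel, at a fraction of the cost. The radius rule `ceil(3 * sigma)` and the zero-padded borders are assumptions for illustration, not necessarily DALI's defaults.

```python
import math
import numpy as np


def gaussian_window(sigma, radius=None):
    """Normalized 1-D Gaussian window of length 2 * radius + 1."""
    if radius is None:
        radius = math.ceil(3 * sigma)  # assumed rule of thumb
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    w = np.exp(-0.5 * (x / sigma) ** 2)
    return w / w.sum()


def blur_separable(img, sigma):
    """Blur each row, then each column, with the same 1-D window."""
    w = gaussian_window(sigma)
    rows = np.apply_along_axis(np.convolve, 1, img, w, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, w, mode="same")


def blur_2d(img, sigma):
    """Reference: direct 2-D convolution with the outer-product kernel."""
    w = gaussian_window(sigma)
    k = np.outer(w, w)
    r = len(w) // 2
    padded = np.pad(img, r)  # zero borders, matching np.convolve "same"
    out = np.zeros_like(img, dtype=np.float64)
    for dy in range(k.shape[0]):
        for dx in range(k.shape[1]):
            out += k[dy, dx] * padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out
```

For a window of length K, the separable version does about 2K multiplications per pixel instead of K², which is the reason to implement the blur this way.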

Breaking API changes

Deprecated feature

  • Added a deprecation warning for Python 3.5 (#2021).
  • Deprecated output_dtype in favor of dtype (#2051).
  • Added an argument deprecation mechanism and deprecated "image_type" in Crop, Slice, and CropMirrorNormalize (#2061).
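An argument deprecation mechanism of this kind typically maps an old argument name to its replacement and warns when the old name is used. The sketch below is a generic Python illustration, not DALI's mechanism; the `DEPRECATED_ARGS` table and the `resolve_args` helper are hypothetical, and only the `output_dtype` to `dtype` rename comes from these notes.

```python
import warnings

# Hypothetical table: deprecated argument name -> replacement name.
DEPRECATED_ARGS = {"output_dtype": "dtype"}


def resolve_args(**kwargs):
    """Rename deprecated keyword arguments, emitting a DeprecationWarning
    for each deprecated name that is used."""
    resolved = {}
    for name, value in kwargs.items():
        new_name = DEPRECATED_ARGS.get(name)
        if new_name is None:
            resolved[name] = value
            continue
        if new_name in kwargs:
            # Passing both spellings is ambiguous, so reject it.
            raise TypeError(f"'{name}' and '{new_name}' cannot be used together")
        warnings.warn(f"'{name}' is deprecated; use '{new_name}' instead",
                      DeprecationWarning, stacklevel=2)
        resolved[new_name] = value
    return resolved
```

Arguments that are removed outright, such as `image_type` here, would need a separate path that warns and then drops the value instead of renaming it.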

Known issues:

  • The video loader operator requires that key frames occur at least every 10 to 15 frames of the video stream. If key frames occur less frequently, the returned frames may be out of sync.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI performs best when run in Docker with elevated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Install via pip for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda100==0.24.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda100==0.24.0

or for CUDA 11:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==0.24.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==0.24.0

Or use direct download links (CUDA 10.0):

https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda100/nvidia_dali_cuda100-0.24.0-1446725-cp35-cp35m-manylinux2014_x86_64.whl
https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda100/nvidia_dali_cuda100-0.24.0-1446725-cp36-cp36m-manylinux2014_x86_64.whl
https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda100/nvidia_dali_cuda100-0.24.0-1446725-cp37-cp37m-manylinux2014_x86_64.whl
https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda100/nvidia_dali_cuda100-0.24.0-1446725-cp38-cp38-manylinux2014_x86_64.whl
https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda100/nvidia-dali-tf-plugin-cuda100-0.24.0.tar.gz

Or use direct download links (CUDA 11.0):

https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-0.24.0-1472979-cp35-cp35m-manylinux2014_x86_64.whl
https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-0.24.0-1472979-cp36-cp36m-manylinux2014_x86_64.whl
https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-0.24.0-1472979-cp37-cp37m-manylinux2014_x86_64.whl
https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-0.24.0-1472979-cp38-cp38-manylinux2014_x86_64.whl
https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda110/nvidia-dali-tf-plugin-cuda110-0.24.0.tar.gz

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI - DALI v0.23.0

Published by klecki over 4 years ago

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

  • DALI package names now include -cuda110 and -cuda100 suffixes that indicate the CUDA version, which allows all packages to be hosted under a single pip index. This matters only for installation: the Python module is still nvidia.dali regardless of the CUDA version. See the installation guide at https://docs.nvidia.com/deeplearning/dali/user-guide/docs/installation.html for details.
  • New and improved Operators:
    • Normalize Operator for GPU (#1974, #1981, #1986).
    • Support for the epsilon and ddof (delta degrees of freedom) arguments in the Normalize operator (#1964).
    • SequenceRearrange Operator (#465).
    • Erase Operator for GPU (#1971).
  • Improved how iterators count padded samples based on the reader (#1831). The provided iterators can now query the reader for the epoch size and sharding, and they handle a shard size that changes from epoch to epoch when the data is not evenly divisible by the number of shards (ranks) and the batch size. More details can be found in https://docs.nvidia.com/deeplearning/dali/user-guide/docs/advanced_topics.html#sharding
  • CUDA 11 build scripts for DALI were added (#2008).
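As a rough illustration of why shard sizes can differ, the sketch below assumes the dataset is split into contiguous shards with boundaries at `size * shard_id // num_shards`, and that every shard is padded up to the same number of full batches so that all ranks run the same number of iterations. The function names and the padding scheme are assumptions for illustration; the sharding documentation linked in the notes describes the actual reader and iterator behavior.

```python
import math


def shard_bounds(size, shard_id, num_shards):
    """[begin, end) sample range of one shard, assuming contiguous shards."""
    begin = size * shard_id // num_shards
    end = size * (shard_id + 1) // num_shards
    return begin, end


def iterations_per_epoch(size, num_shards, batch_size):
    """Iterations every rank runs if all shards are padded to full batches
    of the largest shard (an assumed padding scheme)."""
    largest = max(end - begin for begin, end in
                  (shard_bounds(size, s, num_shards) for s in range(num_shards)))
    return math.ceil(largest / batch_size)


def padded_samples(size, shard_id, num_shards, batch_size):
    """Number of padding samples the given shard needs in this scheme."""
    begin, end = shard_bounds(size, shard_id, num_shards)
    capacity = iterations_per_epoch(size, num_shards, batch_size) * batch_size
    return capacity - (end - begin)
```

With 10 samples, 3 shards, and a batch size of 2, the shards hold 3, 3, and 4 samples, every rank runs 2 iterations, and the first two shards each need one padding sample. If a reader moves to a different shard every epoch, the shard a rank reads changes, so the padding must be recomputed per epoch.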

Bug fixes

  • Fix out-of-source build (#1975)
  • Fix typo in installation documentation (#1976)
  • Fix reference counting issue in the PythonFunction operator (#1978)
  • Fix the wording for preset OF argument (#1994)
  • Fix generation of Erase Region in kernel test (#1996)
  • Fix GPU spectrogram when window_length != nfft (#1999)
  • Fix MelFilterBank bug: setup block descriptors when changing shape between iterations. (#2001)
  • Change locked version of numba to at most 0.49, as 0.47 is the last release for py35 (#2016, #2020)

Improvements

  • Mean and Standard Deviation GPU kernels (#1919)
  • Linter script change: from CMake to Python (#1951)
  • Update links to the new location, remove deprecated installation guide (#1955)
  • Add more NumPy data types (#1961)
  • Extend HSV example with RandomGrayscale implementation (#1962)
  • Add workaround for the problem with patchelf changing TLS alignment (#1952)
  • Add epsilon and ddof (delta degrees of freedom) arguments to Normalize. (#1964)
  • Small docs improvements (#1970)
  • Add Sequence Rearrange Op (#465)
  • Add a helper class for fast unsigned division, usable on host and device. (#1967)
  • Fix documentation drop-down menu and other links (#1972)
  • Erase GPU operator (#1971)
  • Update TF versions supported (#1973)
  • Add -cudaXXX to dali package name (#1948)
  • Add more error checking (#1979)
  • Make DALI tests fPIE (#1980)
  • Normalize GPU kernel (#1974)
  • Normalize GPU - pImpl + Bessel's correction (#1981)
  • Slice CPU kernel pad support (#1977)
  • Make GTest and Google Benchmark fPIE, and make DALI binaries dynamically relocatable (#1982)
  • Add more error checking in TensorFlow DALI integration (#1991)
  • Normalize operator for GPU backend (#1986)
  • Slice GPU kernel with multi-channel pad support (#1983)
  • Split Slice benchmarks into CPU and GPU (#1995)
  • Improve how iterators count padded samples based on the reader (#1831)
  • Remove boost from the dependencies as it is no longer used anyway (#2006)
  • Enable file path arguments (#2002)
  • Enable CUDA 11 builds (#2008)
  • Silence CUDA 11 compute 35 and 50 deprecation warning (#2010)
  • Drop CUDA 9 from docs (#2012)
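The epsilon and ddof arguments of Normalize change how the standard deviation is computed. The NumPy sketch below assumes that the variance is estimated as `sum((x - mean)**2) / (N - ddof)`, which is Bessel's correction when `ddof=1`, and that `epsilon` is added to the variance before the square root. `normalize` is a hypothetical helper, not DALI's kernel.

```python
import numpy as np


def normalize(x, axes=None, epsilon=0.0, ddof=0):
    """Return (x - mean) / sqrt(var + epsilon), where mean and variance are
    reduced over `axes` and the variance uses `ddof` delta degrees of freedom."""
    x = np.asarray(x, dtype=np.float64)
    mean = x.mean(axis=axes, keepdims=True)
    n = x.size // mean.size  # number of elements per reduced group
    var = ((x - mean) ** 2).sum(axis=axes, keepdims=True) / (n - ddof)
    return (x - mean) / np.sqrt(var + epsilon)
```

A small positive epsilon keeps the output finite for constant inputs, where the variance is zero.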

Breaking API changes

  • DALI package names now include -cuda110 and -cuda100 suffixes that indicate the CUDA version, so all packages can be hosted under a single pip index.
  • CUDA 9 is no longer supported. DALI 0.22.0 was the last release that provides CUDA 9 build.
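Because the CUDA version is now part of the package name, installation scripts have to pick the matching package. The helper below is a hypothetical convenience that covers only the two suffixes listed in these notes; it builds the same kind of pip command shown in the Binary builds section. The Python import (`nvidia.dali`) does not change.

```python
# Suffixes taken from these release notes; other CUDA versions are not covered.
SUFFIXES = {10: "cuda100", 11: "cuda110"}
INDEX_URL = "https://developer.download.nvidia.com/compute/redist/"


def dali_packages(cuda_major, version, with_tf_plugin=False):
    """Pinned package specifiers for a CUDA major version (10 or 11)."""
    try:
        suffix = SUFFIXES[cuda_major]
    except KeyError:
        raise ValueError(f"no DALI package suffix known for CUDA {cuda_major}") from None
    pkgs = [f"nvidia-dali-{suffix}=={version}"]
    if with_tf_plugin:
        pkgs.append(f"nvidia-dali-tf-plugin-{suffix}=={version}")
    return pkgs


def pip_command(cuda_major, version, with_tf_plugin=False):
    """Argument list for pip, using the extra index URL from the notes."""
    return ["pip", "install", "--extra-index-url", INDEX_URL,
            *dali_packages(cuda_major, version, with_tf_plugin)]
```

For example, `" ".join(pip_command(11, "0.23.0"))` produces the CUDA 11 install line for this release.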

Deprecated feature

Known issues:

  • The video loader operator requires that key frames occur at least every 10 to 15 frames of the video stream. If key frames occur less frequently, the returned frames may be out of sync.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later. To use DALI with a TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the GCC compiler used to build that TensorFlow version (GCC 4.8.4, GCC 4.8.5, or GCC 5.4, depending on the particular version) is present on the system.
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI performs best when run in Docker with elevated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Install via pip for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda100==0.23.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda100

or for CUDA 11:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==0.23.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110

Or use direct download links (CUDA 10.0):

https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda100/nvidia_dali_cuda100-0.23.0-1396139-cp35-cp35m-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda100/nvidia_dali_cuda100-0.23.0-1396139-cp36-cp36m-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda100/nvidia_dali_cuda100-0.23.0-1396139-cp37-cp37m-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda100/nvidia_dali_cuda100-0.23.0-1396139-cp38-cp38-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda100/nvidia-dali-tf-plugin-cuda100-0.23.0.tar.gz

Or use direct download links (CUDA 11.0):

https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-0.23.0-1396141-cp35-cp35m-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-0.23.0-1396141-cp36-cp36m-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-0.23.0-1396141-cp37-cp37m-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-0.23.0-1396141-cp38-cp38-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda110/nvidia-dali-tf-plugin-cuda110-0.23.0.tar.gz

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI - DALI v0.22.0

Published by klecki over 4 years ago

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

  • DALI now supports CUDA 11:
    • DALI builds for CUDA 11 are now available.
    • CUDA 9 support has been deprecated.
    • DALI 0.22.0 is the final release that provides a CUDA 9 build.
  • Support is now available for the Ampere Hardware JPEG decoder.
  • The following new operators are now available:
    • NumpyReader, which allows you to read standard .npy (NumPy) files (#1858).
    • CoordFlip for CPU and GPU (#1894, #1895).
  • Readers can be set to read files directly instead of using mmap, which improves performance on network file systems (#1909).
  • DALI can be built as a CMake subproject (#1924).
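CoordFlip mirrors coordinates, for example keypoints or box corners, rather than pixels. A minimal NumPy sketch, assuming coordinates normalized to [0, 1] and a flip center of 0.5 on each flipped axis, so that a flipped coordinate becomes `2 * center - x`. The function name and its arguments are illustrative, not the operator's signature.

```python
import numpy as np


def coord_flip(coords, flip=(True, False), center=(0.5, 0.5)):
    """Mirror an (N, D) array of coordinates along the selected axes."""
    coords = np.asarray(coords, dtype=np.float64)
    out = coords.copy()
    for axis, (do_flip, c) in enumerate(zip(flip, center)):
        if do_flip:
            # Reflection about `c`: distance from the center changes sign.
            out[:, axis] = 2 * c - coords[:, axis]
    return out
```

Flipping the image horizontally and the coordinates with `flip=(True, False)` keeps annotations aligned with the pixels.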

Bug fixes

  • Fix TL1_tensorflow-dali_test (#1869)
  • Hotfix of external_source.py (#1878)
  • Build fix for aarch64 (incorrect cmake dependency) (#1883)
  • Fix TL1_ssd_training test by freezing apex version (#1898)
  • Fix support for dynamic per-sample shape in Warp operators (#1911)
  • Remove Optical flow test bug (#1902)
  • Fix jitter operator illegal memory access (#1914)
  • Fix setup_packages.py after pip update to 20.1 version (#1916)
  • Fix TL1_python-nvjpeg_test test dependency (#1926)
  • L1 test fix for Xavier (#1936)
  • Fix tensorflow_dataset test to run on any power of 2 number of GPUs (#1935)
  • Fix a race condition in ExternalSourceTest test (#1943)

Improvements

  • Add support for array and cuda_array interface for DALI tensor (#1857)
  • Add collapse_dim and collapse_dims for TensorListShape. (#1862)
  • Add support for TensorFlow 2.2.0rc2 (#1860)
  • Add ExternalSource to "C API" (#1865)
  • Numpy reader (#1858)
  • Add TensorGPU and TensorListGPU constructors based on CUDA array interface (#1868)
  • Bump up OpenCV version to 4.3.0, libturbo-jpeg to 2.0.4, libtiff to 4.1.0, FFmpeg to 4.2.2 (#1783)
  • Add "no exec check" to SmallVector to prevent warnings in host-only functions. (#1870)
  • Allow for a separate dali_tf_plugin pip wheel step (#1856)
  • QA tests: Fix nvidia-dali-tf-plugin to uninstall weekly and nightly packages (#1877)
  • Add a make install target for installing DALI on the system where it is built (#1854)
  • Allow RandomBBoxCrop thresholds to refer to relative overlap alternatively to IoU (#1874)
  • Add a link to release notes in the docs (#1881)
  • Operator diagnostics mechanism (#1880)
  • Reductions: position-dependent preprocessing, kernels for unhandled edge cases (#1884)
  • Update Horovod in Tensorflow test (#1887)
  • Add an ability to strip DALI whl binary from debug symbols (#1897)
  • Extend conda testing (#1784)
  • Copy out core* files if the test_body fails (#1890)
  • Make volume return 1 for 0-dim shape. (#1906)
  • Update DALI PyTorch RN50 example to the latest AMP version (#1888)
  • Add a specialized TF dataset for conda (#1910)
  • Deserialize pipeline in python API (#1912)
  • Add CoordFlip CPU operator (#1894)
  • Restore an ability to use direct read of files instead of mmap (#1909)
  • Use only ImportError in setup_packages (#1922)
  • Collect exit code from test_body (#1923)
  • Coordinate Flip GPU operator (#1895)
  • DALI as a git submodule (#1924)
  • Add Erase GPU Kernel (#1903)
  • C API ExternalSource for GPU input (#1892)
  • Fix warning condition in ExternalSource (#1934)
  • Reduce GPU - kernel frontend (#1882)
  • Add checking alignment argument for 0 in the pad operator (#1937)
  • Move from http://xiph.org to GitHub for libflac, libvorbis and libogg (#1938)
  • C API function: inherit parameters from serialized pipeline (#1932)
  • Use LinearTransformation kernel in ColorTwist GPU Op (#1918)
  • Adjust test sizes for Erase GPU Kernel (#1939)
  • Use user stream by default in copy_to_external/feed_ndarray (#1921)
  • Move to TensorFlow 2.2.0 from 2.2.0-RC2 (#1946)
  • Add support for random_shuffle argument in test_RN50_data_pipeline (#1945)
  • Proper DALI initialization in process & daliInitialize function (#1929)
  • Update clang version to 8.0.1 in deps image (#1949)
  • Add support for nvjpeg HW decoder, including rework to accommodate different decoding methods in one batch
  • Fix "hw_decoder_load" handling for slice/cropImageDecoder for nvJPEG
  • Move HW decoding to separate stream
  • Fix linter in nvjpeg HW decoder
  • Deprecate CUDA 9
  • Add CUDA 11 to the installation guide and build.sh
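RandomBBoxCrop thresholds can now refer to relative overlap instead of IoU (#1874). The sketch below contrasts the two measures, assuming that relative overlap means the fraction of the bounding box area that falls inside the crop window. Boxes are `(left, top, right, bottom)` tuples, and the helper names are illustrative.

```python
def area(box):
    left, top, right, bottom = box
    return max(0.0, right - left) * max(0.0, bottom - top)


def intersection(a, b):
    """Area of the overlap of two boxes (0 if they do not overlap)."""
    return area((max(a[0], b[0]), max(a[1], b[1]),
                 min(a[2], b[2]), min(a[3], b[3])))


def iou(box, crop):
    inter = intersection(box, crop)
    union = area(box) + area(crop) - inter
    return inter / union if union > 0 else 0.0


def relative_overlap(box, crop):
    """Fraction of `box` that lies inside `crop`."""
    a = area(box)
    return intersection(box, crop) / a if a > 0 else 0.0
```

A small box fully inside a large crop has a low IoU (about 0.04 for a 0.2 x 0.2 box in a unit crop) but a relative overlap of 1.0, so an overlap threshold is less likely to reject crops that keep small objects intact.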

Breaking API changes

None

Deprecated feature

  • CUDA 9 support is deprecated. DALI 0.22.0 is the last release that provides CUDA 9 build.

Known issues:

  • The video loader operator requires that key frames occur at least every 10 to 15 frames of the video stream. If key frames occur less frequently, the returned frames may be out of sync.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later. To use DALI with a TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the GCC compiler used to build that TensorFlow version (GCC 4.8.4, GCC 4.8.5, or GCC 5.4, depending on the particular version) is present on the system.
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI performs best when run in Docker with elevated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Install via pip for CUDA 9:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/9.0 nvidia-dali==0.22.0
or for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali==0.22.0
or for CUDA 11:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/11.0 nvidia-dali==0.22.0

Or use direct download links (CUDA 9.0):
https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.22.0-1313462-cp35-cp35m-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.22.0-1313462-cp36-cp36m-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.22.0-1313462-cp37-cp37m-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.22.0-1313462-cp38-cp38-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali-tf-plugin/nvidia-dali-tf-plugin-0.22.0.tar.gz

Or use direct download links (CUDA 10.0):

https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.22.0-1313464-cp35-cp35m-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.22.0-1313464-cp36-cp36m-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.22.0-1313464-cp37-cp37m-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.22.0-1313464-cp38-cp38-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali-tf-plugin/nvidia-dali-tf-plugin-0.22.0.tar.gz

Or use direct download links (CUDA 11.0):

https://developer.download.nvidia.com/compute/redist/cuda/11.0/nvidia-dali/nvidia_dali-0.22.0-1313465-cp35-cp35m-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/cuda/11.0/nvidia-dali/nvidia_dali-0.22.0-1313465-cp36-cp36m-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/cuda/11.0/nvidia-dali/nvidia_dali-0.22.0-1313465-cp37-cp37m-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/cuda/11.0/nvidia-dali/nvidia_dali-0.22.0-1313465-cp38-cp38-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/cuda/11.0/nvidia-dali-tf-plugin/nvidia-dali-tf-plugin-0.22.0.tar.gz

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI - DALI v0.21.0

Published by klecki over 4 years ago

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

  • Introduced experimental Functional API (#1598):
    • Operators can be used directly with a single call; there is no need to create an operator instance with a constructor first.
    • A DALI pipeline can be used as a context manager.
    • There is no need to subclass Pipeline.
  • Simplified the usage of ExternalSource (#1598, #1832): it accepts callbacks or generators as its data source.
  • Added Python 3.8 build and support (#1782)
  • Allowed seed to be set for serialized pipeline (#1844)
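Since ExternalSource now accepts generators, a data source can be an ordinary Python generator. The sketch below is a batch generator of that kind, using only NumPy; it assumes that each step should yield one batch as a list of per-sample arrays, and the shapes and batch size are illustrative. How the generator is passed to the operator (for example, through a `source` argument) should be checked against the ExternalSource documentation for your DALI version.

```python
import numpy as np


def random_image_batches(batch_size, height=4, width=4, num_batches=None, seed=0):
    """Yield batches as lists of HWC uint8 arrays; runs forever if
    num_batches is None."""
    rng = np.random.default_rng(seed)
    produced = 0
    while num_batches is None or produced < num_batches:
        yield [rng.integers(0, 256, size=(height, width, 3), dtype=np.uint8)
               for _ in range(batch_size)]
        produced += 1
```

A finite generator (a fixed `num_batches`) ends when it is exhausted, which a consumer can treat as the end of an epoch.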

New operators:

  • ToDecibels GPU operator (#1837)
  • One hot encoding CPU operator (#1807)
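One-hot encoding turns a class index into a vector with a single "on" element. A NumPy sketch of the transformation; `num_classes`, `on_value`, and `off_value` mirror the kind of parameters such an operator takes, but the function itself is illustrative and not DALI's implementation.

```python
import numpy as np


def one_hot(labels, num_classes, on_value=1.0, off_value=0.0, dtype=np.float32):
    """Encode integer labels of any shape into shape labels.shape + (num_classes,)."""
    labels = np.asarray(labels)
    if labels.size and (labels.min() < 0 or labels.max() >= num_classes):
        raise ValueError("label out of range [0, num_classes)")
    out = np.full(labels.shape + (num_classes,), off_value, dtype=dtype)
    # Write on_value at each label's index along the new last axis.
    np.put_along_axis(out, labels[..., None].astype(np.intp), on_value, axis=-1)
    return out
```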

Bug fixes

  • Fix positional argument propagation in TF Dataset (#1798)
  • Fix parameter name in data_node._check. (#1816)
  • Fix Transpose bugs - degenerate dims and non-uniform GPU (#1817)
  • Fix sharding.png image link in multigpu example (#1821)
  • Fix collecting vector arguments in rotate_params. (#1841)
  • Fix a leak of the last created DALI pipeline instance (#1845)
  • Remove usage of the internal Sphinx _MockImporter method (#1861)
  • Make SSDRandomCrop calculate crop window in double precision (#1848)

Improvements

  • Move RNNT test to Torch specific tests (#1805)
  • Propagate layout in cast operator (#1801)
  • Add proper type info for optional arguments in schema (#1769)
  • Add missing new line for section anchor in rst (#1808)
  • Add missing #include <cstdint> to util and math_util. (#1810)
  • Update file_list argument description in FileReader (#1779)
  • Functional API + improved ExternalSource + improved Pipeline (#1598)
  • GPU reduction kernels part 1 - non-directional batched and global reductions (#1806)
  • Enable NVTX profiling information for CUDA 10 by default (#1793)
  • Make read function provided to FFmpeg return AVERROR_EOF for EOF (#1814)
  • Make DALI buildable for Python 3.8 (#1782)
  • Allow empty arrays in MXNet iterator (#1815)
  • Ignore VS Code settings directory in Git (#1826)
  • Reworks setup_packages script (#1820)
  • Add one hot encoding operator (CPU backend) (#1807)
  • New page layout of Supported Operations & "How to verify DALI build" description in compilation tutorial (#1722)
  • Generator support in ExternalSource (#1832)
  • 3d RandomBboxCrop (#1785)
  • Update TF RN50 performance test threshold to make it pass on dgx1v32GB (#1838)
  • ToDecibels GPU kernel (#1836)
  • Add ReduceAllGPU kernel (#1839)
  • Directional reduction CUDA kernels (#1840)
  • Rename CPU reductions; separate reduction functors from kernels. (#1846)
  • ToDecibels GPU operator (#1837)
  • Allow seed to be set for serialized pipeline (#1844)
  • Change StrictVersion to LooseVersion in TensorFlow plugin (#1851)
  • Make reader respect shard_id pipeline argument in tf.data.Dataset with multiple GPUs example (#1850)
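The ToDecibels conversion fits in a few lines. The sketch below assumes the formula `min_ratio = 10 ** (cutoff_db / multiplier)` and `out = multiplier * log10(max(min_ratio, x / reference))`, with `multiplier=10` for power and 20 for magnitude; the parameter names and defaults are assumptions to check against the operator documentation for your DALI version.

```python
import numpy as np


def to_decibels(x, multiplier=10.0, reference=1.0, cutoff_db=-200.0):
    """Convert power (multiplier=10) or magnitude (multiplier=20) to decibels."""
    x = np.asarray(x, dtype=np.float64)
    min_ratio = 10.0 ** (cutoff_db / multiplier)
    # Clamping at min_ratio keeps log10 finite for zeros and sets the dB floor.
    return multiplier * np.log10(np.maximum(min_ratio, x / reference))
```

With these definitions, a zero input maps to `cutoff_db` instead of negative infinity.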

Breaking API changes

None

Deprecated feature

  • CUDA 9 support will end in several releases (#1684)

Known issues:

  • The video loader operator requires that key frames occur at least every 10 to 15 frames of the video stream. If key frames occur less frequently, the returned frames may be out of sync.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later. To use DALI with a TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the GCC compiler used to build that TensorFlow version (GCC 4.8.4, GCC 4.8.5, or GCC 5.4, depending on the particular version) is present on the system.
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI performs best when run in Docker with elevated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Install via pip for CUDA 9:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/9.0 nvidia-dali==0.21.0
or for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali==0.21.0

Or use direct download links (CUDA 9.0):
https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.21.0-1239037-cp35-cp35m-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.21.0-1239037-cp36-cp36m-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.21.0-1239037-cp37-cp37m-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.21.0-1239037-cp38-cp38-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali-tf-plugin/nvidia-dali-tf-plugin-0.21.0.tar.gz

Or use direct download links (CUDA 10.0):

https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.21.0-1239036-cp35-cp35m-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.21.0-1239036-cp36-cp36m-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.21.0-1239036-cp37-cp37m-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.21.0-1239036-cp38-cp38-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali-tf-plugin/nvidia-dali-tf-plugin-0.21.0.tar.gz

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI - DALI v0.20.0

Published by klecki over 4 years ago

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

Added operators:

  • Spectrogram for GPU (#1786)
  • MelFilterBank for GPU (#1796)

Allow align-only behavior in Pad operator by treating shape argument as minimum shape (#1764)
Added the data_ptr method to Tensor and TensorList (#1773), which enables __array_interface__ and __cuda_array_interface__ support.
Extended shape support in DALI Dataset for TensorFlow (#1723)
Documentation improvements: layouts, Python API.
Added Gluon iterator plugin (#1683)
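The `__array_interface__` protocol lets NumPy wrap memory it does not own, given a raw pointer, a shape, and a type string. The sketch below demonstrates the protocol on a plain ctypes buffer; `RawBuffer` is a hypothetical stand-in for a CPU tensor and is not DALI's class. The GPU counterpart, `__cuda_array_interface__`, applies the same idea to device pointers.

```python
import ctypes
import numpy as np


class RawBuffer:
    """Hypothetical container that exposes float32 memory via __array_interface__."""

    def __init__(self, shape):
        self.shape = tuple(shape)
        count = int(np.prod(self.shape))
        self._storage = (ctypes.c_float * count)()  # zero-initialized, owned here

    def data_ptr(self):
        """Raw address of the first element."""
        return ctypes.addressof(self._storage)

    @property
    def __array_interface__(self):
        return {
            "version": 3,
            "shape": self.shape,
            "typestr": "<f4",                  # little-endian float32
            "data": (self.data_ptr(), False),  # (pointer, read-only flag)
        }
```

`np.asarray(RawBuffer((2, 3)))` returns a zero-copy view: writes through the array land in the buffer's memory, and NumPy keeps a reference to the buffer object so the memory stays alive while the view exists.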

Bug fixes

  • Fix bug in TransposeCPU & ToDecibels operators (#1729)
  • Fix BBFlip issues (#1738)
  • Fix build without NVJPEG (#1739)
  • Fix precision loss in CropWindowGenerator (#1735)
  • Fix warnings reported by a static analysis tool (#1734)
  • Fixed the test failure on Power and x86 (#1752)
  • Fix out of range detection in get_item for TensorList (#1758)
  • Fix a race condition in AsyncPipelinedExecutor destructor and WorkerThread (#1757)
  • Fix bug in the COCOReader with masks (#1724)
  • Fix test_plugin_manager (#1749)
  • Fix typo in TensorListGPU docs, show getitem docs (#1746)
  • Fix SSD type mismatch (#1767)
  • Fix TF dataset build (#1792)
  • Fix DALI TF plugin build (#1789)
  • Fix positional argument propagation in TF Dataset (#1798)

Improvements

  • Add Gluon iterator plugin (#1683)
  • Adjust mxnet DALIClassificationIterator doc (#1718)
  • Change default value in ToDecibels, add one test (#1720)
  • Add error handling when trying to serialize Python Operators (#1730)
  • Use CMake's CUDA language support (#1733)
  • Allow 1- and 2-dimensional input for Slice (#1741)
  • Specialize the mul arithmetic op for bool (#1737)
  • Optical flow test against ground truth (#1753)
  • Add /usr/local/cuda/bin to PATH in the main Dockerfile (#1756)
  • Add an ability to read noncontinuous RecordIO and TFRecord files (#1747)
  • Allow align-only behavior in Pad operator by treating shape argument as minimum shape (#1764)
  • Enable XLA for TensorFlow RN50 tests and use passthrough ImageNet for MXNet (#1760)
  • Add Reinterpret operator as a flavor of Reshape (#1768)
  • Short-time Fourier transform for GPU (#1721)
  • Adds data_ptr method to Tensor and TensorList (#1773)
  • Correct COCOReader mask doc (#1772)
  • Add GPU variant of Spectrogram operator (#1786)
  • Extend shape support in DALI Dataset for TF (#1723)
  • MelFilterBank GPU kernel (#1787)
  • MelFilterBank GPU operator (#1796)
  • Test for RNNT data pipeline (CPU) (#1745)
  • Add data layout documentation and input layout expectations in operator's documentation (#1766)
  • Move RNNT test to Torch specific tests (#1805)
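
MelFilterBank (now also available on the GPU, #1787 and #1796) maps each frame of a power spectrogram onto a small number of mel-spaced bands. The sketch below builds triangular filters whose edges are evenly spaced on the mel scale and applies them as weighted sums over the FFT bins. It assumes the HTK-style mel formula 2595 * log10(1 + f / 700) and unnormalized filters; the function names are hypothetical, and DALI's operator may use a different mel formula or normalization.

```python
import math


def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other conventions exist)
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_bank(nfilter, nfft, sample_rate, freq_low=0.0, freq_high=None):
    """Return nfilter rows of triangular weights over nfft // 2 + 1 FFT bins."""
    if freq_high is None:
        freq_high = sample_rate / 2.0
    nbins = nfft // 2 + 1
    bin_freqs = [k * sample_rate / nfft for k in range(nbins)]
    # nfilter + 2 points evenly spaced on the mel scale define the triangle edges
    mel_lo, mel_hi = hz_to_mel(freq_low), hz_to_mel(freq_high)
    step = (mel_hi - mel_lo) / (nfilter + 1)
    edges = [mel_to_hz(mel_lo + i * step) for i in range(nfilter + 2)]
    bank = []
    for i in range(nfilter):
        left, center, right = edges[i], edges[i + 1], edges[i + 2]
        row = []
        for f in bin_freqs:
            if left < f <= center:
                row.append((f - left) / (center - left))    # rising slope
            elif center < f < right:
                row.append((right - f) / (right - center))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def apply_filter_bank(bank, power_spectrum):
    """Map one power-spectrum frame (nbins values) to len(bank) mel energies."""
    return [sum(w * p for w, p in zip(row, power_spectrum)) for row in bank]


bank = mel_filter_bank(nfilter=4, nfft=16, sample_rate=16000)
print(len(bank), len(bank[0]))  # 4 9
print(apply_filter_bank(bank, [1.0] * 9))
```

Applying the bank to every frame of a spectrogram turns a (frequency bins x time) array into a (mel bands x time) array, which is the usual input to later steps such as ToDecibels or MFCC.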

Breaking API changes

None

Deprecated feature

  • CUDA 9 support will end in several releases (#1684)

Known issues:

  • The video loader operator requires key frames to occur at least every 10 to 15 frames of the video stream. If key frames occur less frequently, the returned frames may be out of sync.
  • The DALI TensorFlow plugin may not be compatible with TensorFlow 1.15.0 and later. To use DALI with a TensorFlow version for which no prebuilt plugin binary is shipped with DALI, a gcc compiler matching the one used to build TensorFlow (gcc 4.8.4, 4.8.5, or 5.4, depending on the TensorFlow version) must be present on the system.
  • Due to known issues with Meltdown/Spectre mitigations and DALI, DALI performs best when run in Docker with elevated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker
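
To check whether a video is likely to hit the key-frame limitation above, one can measure the largest gap between consecutive key frames. The sketch assumes the key-frame indices have already been extracted (for example with a tool such as ffprobe); the function names and the threshold of 10 frames (the conservative end of the 10 to 15 range) are illustrative choices, not DALI settings.

```python
def max_keyframe_gap(keyframe_indices, num_frames):
    """Largest distance, in frames, between consecutive key frames,
    counting the stretch from the last key frame to the end of the stream."""
    points = sorted(set(keyframe_indices))
    if not points:
        return num_frames
    points.append(num_frames)  # treat the end of the stream as a boundary
    return max(b - a for a, b in zip(points, points[1:]))


def may_be_out_of_sync(keyframe_indices, num_frames, max_gap=10):
    """Hypothetical check: flag streams whose key frames are sparser than max_gap."""
    return max_keyframe_gap(keyframe_indices, num_frames) > max_gap


print(may_be_out_of_sync([0, 8, 16, 24], 30))  # False
print(may_be_out_of_sync([0, 60], 120))        # True
```

Videos flagged this way can be re-encoded with a shorter group-of-pictures length before being passed to the video loader.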

Binary builds

Install via pip for CUDA 9:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/9.0 nvidia-dali==0.20.0
or for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali==0.20.0

Or use direct download links (CUDA 9.0):

Or use direct download links (CUDA 10.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI - DALI v0.19.0

Published by JanuszL over 4 years ago

Bug fixes

  • Update examples with COCO data set and fix reader behavior for padding (#1557)
  • Fix TensorFlow dataset test (#1641)
  • Fix typo in QNX cmake files (#1648)
  • Remove allocation-dependent test assert (#1650)
  • Fix several explicit "something is implicitly deleted" warnings (#1652)
  • Fix formatting of the example in the FW iterators docs (#1649)
  • Fix hang in decoder benchmark (#1672)
  • Fix error message (#1680)
  • Fix torch stream initialization in TorchPythonFunction (#1681)
  • Fix multi-channel fill value check in Erase operator (#1675)
  • Fix tests after the examples refactor (#1687)
  • Fix Reshape docstring typo (#1691)
  • Add synchronization to read/write operations in image decoder cache (#1702)
  • Fix Buffer linkage and Reshape bug (#1714)
  • Fix TL1 tests (#1710)
  • Fix Pad operator bug (#1713)

Improvements

  • Allow Crop and CropMirrorNormalize to crop sequences as if they were volumetric images (#1605)
  • Erase CPU operator (#1609)
  • Improved Reshape (#1634)
  • Add GetDimIndices utility to tensor_layout.h (#1640)
  • Add example with booleans, comparisons, bitwise and muxing (#1631)
  • Remove unimplemented scale parameter in ops.VideoReader. (#1658)
  • Change the ambiguous "here" in the docs developer version (#1657)
  • Docs layout and navigation changes (#1635)
  • GPU PythonFunction operator (#1655)
  • Rename Tensor to TensorList in Supported Ops doc (#1661)
  • Add Pad CPU operator (including aligned padded shape support) (#1642)
  • Remove the ColorTwist deprecation message (#1646)
  • Change PipelineAPIType to Enum (#1636)
  • Directional reductions (for CPU): mean, standard deviation, sum, and mean square, with tree reduction (#1653)
  • Add support for the UINT8 data type in SequenceWrapper (#1643)
  • Move operators around (#1667)
  • Normalize CPU vol 2 (#1666)
  • GPU PyTorch operator (#1662)
  • Proposing new structure of DALI examples (#1540)
  • VideoReader example (#1612)
  • MovingMeanSquared kernel (#1668)
  • Allow extra dimensions with extent 1 in Spectrogram operator & AudioDecoder changes (#1679)
  • Make DataIter a base class for MXNet DALIGenericIterator (#1669)
  • Add Transpose CPU Operator (#1677)
  • Remove unsupported Python versions from the manylinux build (#1694)
  • Add deprecation message about CUDA 9 (#1684)
  • Mitigate the OS file-max limit in the VideoReader (#1659)
  • Add support for StopIteration raised inside framework iterators (#1625)
  • Enable FFTS builds for ARM (Xavier, QNX) (#1686)
  • Normalize operator for CPU backend (#1670)
  • Python operator notebook (#1685)
  • Change backend_impl at to getitem, returning TensorXPU (#1682)
  • Normalize tutorial (#1697)
  • Adjust setup_packages.py to the latest pip version (#1698)
  • Remove gif as supported extension (#1700)
  • Make the "Supported backend" title in the docs appear correctly
  • Update supported TF versions, update setup_packages.py (#1693)
  • Add pass-through info to OpSchema to add shared data to stage outputs. (#1707)
  • Nonsilence operator (#1701)
  • Constant operator and Python wrapper. (#1699)
  • Add support in CropMirrorNormalize for uneven sizes of mean and std (#1708)
  • Shrink host buffers (#1712)
  • Move pipeline ownership from Dataset to Iterator (#1704)
  • Align Rn50 data processing pipeline for TensorFlow with upstream examples (#1706)
  • Add a note how to set DALI_EXTRA_PATH to run jupyter examples (#1703)
  • GPU Python operator notebook (#1715)
  • Update Memory consumption and Custom operator docs sections (#1719)
  • Use prebuilt CuPy for the TL0_jupyter test (#1728)
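
The directional reductions added in #1653 reduce data along selected axes. The sketch below shows the general idea on a 2D list: a pairwise (tree) sum, whose worst-case rounding error grows roughly with log n rather than n, and mean, mean square, and standard deviation derived from it. It is a plain-Python illustration with hypothetical helper names, not DALI's CPU kernel; the population standard deviation (divide by n) is an assumption.

```python
import math


def tree_sum(values):
    """Pairwise (tree) summation: split in halves, sum each half, add the partial sums."""
    n = len(values)
    if n == 0:
        return 0.0
    if n == 1:
        return float(values[0])
    mid = n // 2
    return tree_sum(values[:mid]) + tree_sum(values[mid:])


def reduce_axis(matrix, axis, reducer):
    """Apply `reducer` along axis 0 (down columns) or axis 1 (across rows) of a 2D list."""
    if axis == 0:
        lanes = [list(col) for col in zip(*matrix)]
    elif axis == 1:
        lanes = [list(row) for row in matrix]
    else:
        raise ValueError("axis must be 0 or 1")
    return [reducer(lane) for lane in lanes]


def mean(v):
    return tree_sum(v) / len(v)


def mean_square(v):
    return tree_sum([x * x for x in v]) / len(v)


def std_dev(v):
    # Population standard deviation: sqrt(mean((x - mean)^2)), an assumption
    m = mean(v)
    return math.sqrt(tree_sum([(x - m) ** 2 for x in v]) / len(v))


data = [[1.0, 2.0, 3.0],
        [3.0, 4.0, 5.0]]
print(reduce_axis(data, 0, tree_sum))  # [4.0, 6.0, 8.0]
print(reduce_axis(data, 1, mean))      # [2.0, 4.0]
print(reduce_axis(data, 0, std_dev))   # [1.0, 1.0, 1.0]
```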

Breaking API changes

None

Deprecated feature

  • CUDA 9 support will end in several releases (#1684)
  • Accessing the tensors of TensorListCPU and TensorListGPU with at has been replaced by the array subscript operator (#1682).

Known issues:

  • The video loader operator requires key frames to occur at least every 10 to 15 frames of the video stream. If key frames occur less frequently, the returned frames may be out of sync.
  • The DALI TensorFlow plugin may not be compatible with TensorFlow 1.15.0 and later. To use DALI with a TensorFlow version for which no prebuilt plugin binary is shipped with DALI, a gcc compiler matching the one used to build TensorFlow (gcc 4.8.4, 4.8.5, or 5.4, depending on the TensorFlow version) must be present on the system.
  • Due to known issues with Meltdown/Spectre mitigations and DALI, DALI performs best when run in Docker with elevated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Install via pip for CUDA 9:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/9.0 nvidia-dali==0.19.0
or for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali==0.19.0

Or use direct download links (CUDA 9.0):

Or use direct download links (CUDA 10.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI - DALI v0.18.0

Published by klecki almost 5 years ago

Bug fixes

  • Fix setup_packages.py for CUDA versions that are not listed explicitly (#1554)
  • Fix problem with TensorFlow and cupy tests (#1568)
  • Fix ToContiguousXXX for more than 2 inputs. (#1572)
  • Use prebuilt CuPy in tests (#1570)
  • Fix a race condition in GetGPUAllocator (#1575)
  • Use a different stream base for each video (#1592)
  • Pin the numpy version to 1.17.0 to avoid an error in pycocotools/cocoeval caused by implicit conversion from float64 to integer (#1618)
  • Formatting fix. (#1597)
  • Fix Transpose operator for batch size 1 as well as 1 channel images (#1624)
  • Fix static analysis problems (#1559)
  • Fix check if resampling is needed in audio decoder. (#1630)
  • Temporary fix due to missing PILLOW_VERSION symbol when using torchvision (#1626)

Improvements

  • Add support for Unary Ops: + and - (#1392)
  • Improve support for labels in VideoReader. (#1500)
  • Bump up the Protobuf version to the latest one (#1543)
  • Add comparison operators and bool handling in arithmetic ops (#1541)
  • Cleanup formatting of Supported Operations (#1578)
  • Bump up protobuf and libjpeg-turbo versions in the aarch64-linux and QNX builds, fix libsnd dependency (#1573)
  • Update PR template (#1571)
  • Add the ability to return duplicated outputs from the DALI pipeline (#1556)
  • Add explicit call docstring, fix Supported backends (#1547)
  • Add DCT 1D CPU kernel (#1569)
  • Bump protobuf version in docs (#1586)
  • Add interdoc link to define_graph, fix note (#1590)
  • Split Expression Factory into separate translation units (#1587)
  • Add bitwise operators: &, |, ^ (#1594)
  • Resampling decoder (#1582)
  • Extract windows GPU (#1538)
  • Remove old PythonFunction implementation (#1585)
  • Mock imports when building docs where possible (#1593)
  • Load libnvcuvid before we test if cuvidReconfigureDecoder symbol exists (#1591)
  • Bump protobuf version in conda build (#1606)
  • Update VideoReader testcase, use nvmlSystemGetDriverVersion (#1617)
  • Name the dataloader shuffling seed (#1621)
  • Add docs for arithmetic expressions (#1600)
  • Add data source info to error message in TFRecord and Caffe parsers (#1620)
  • Remove the need to have GPU available when DALI is just imported (#1601)
  • MFCC CPU operator (#1577)
  • Update CUDA version detection for Conda (#1629)
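
MFCC (#1577) and the 1D DCT kernel (#1569) are closely related: mel-frequency cepstral coefficients are conventionally obtained by taking the logarithm of each frame's mel energies and applying a DCT-II. The sketch below is a direct O(N²) DCT-II with optional orthonormal scaling, plus a hypothetical mfcc_frame helper; DALI's operator may differ in DCT type, normalization, and other defaults.

```python
import math


def dct2(x, ortho=False):
    """Direct DCT-II: X[k] = sum_n x[n] * cos(pi / N * (n + 0.5) * k)."""
    n_len = len(x)
    out = []
    for k in range(n_len):
        s = sum(x[n] * math.cos(math.pi / n_len * (n + 0.5) * k) for n in range(n_len))
        if ortho:
            # Orthonormal scaling: sqrt(1/N) for k = 0, sqrt(2/N) otherwise
            s *= math.sqrt((1.0 if k == 0 else 2.0) / n_len)
        out.append(s)
    return out


def mfcc_frame(mel_energies, n_mfcc, floor=1e-10):
    """Hypothetical helper: log-compress mel energies, apply DCT-II, keep n_mfcc coefficients."""
    log_mel = [math.log(max(e, floor)) for e in mel_energies]
    return dct2(log_mel, ortho=True)[:n_mfcc]


print(dct2([1.0, 1.0, 1.0, 1.0], ortho=True))
print(mfcc_frame([1.0, 2.0, 4.0, 8.0], n_mfcc=3))
```

With orthonormal scaling the transform preserves energy, so a constant input puts all of its energy into the first coefficient; keeping only the first n_mfcc coefficients gives a compact description of the spectral envelope.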

Breaking API changes

  • Python 2.7 is no longer supported. To stay up to date with DALI, upgrade to Python 3.5 or later.

Known issues:

  • The video loader operator requires key frames to occur at least every 10 to 15 frames of the video stream. If key frames occur less frequently, the returned frames may be out of sync.
  • The DALI TensorFlow plugin may not be compatible with TensorFlow 1.15.0 and later. To use DALI with a TensorFlow version for which no prebuilt plugin binary is shipped with DALI, a gcc compiler matching the one used to build TensorFlow (gcc 4.8.4, 4.8.5, or 5.4, depending on the TensorFlow version) must be present on the system.
  • Due to known issues with Meltdown/Spectre mitigations and DALI, DALI performs best when run in Docker with elevated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Install via pip for CUDA 9:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/9.0 nvidia-dali==0.18.0
or for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali==0.18.0

Or use direct download links (CUDA 9.0):

Or use direct download links (CUDA 10.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI - DALI v0.17.0

Published by JanuszL almost 5 years ago

Bug fixes

  • Fix scalar batch handling in arithmetic ops (#1449)
  • Coverity fixes (#1408)
  • Fix removal of device_id initialization in OF (#1459)
  • Static analysis fixes (#1469)
  • Fix start index function (#1482)
  • Add missing dependencies to conda recipe (#1483)
  • Fix for bundle-wheel.sh (#1499)
  • More static analysis fixes (#1496)
  • Fix race between consecutive invocations of stage, reduce number of events (#1493)
  • Fix ExternalSource for the GPU (#1452)
  • Fix pip package discovery (#1534)
  • Wait for the thread pool to finish work in BrightnessContrast (#1549)
  • Fix docstring (#1546)
  • Fix color operators (#1555)
  • Fix color operators even more (#1558)
  • Fix stream usage in HSV and BrightnessContrast (#1566)
  • Fix problem with TensorFlow and cupy tests (#1568)

Improvements

  • Add favicon to docs (#1453)
  • Resampling ND - ground work (#1366)
  • Warp 3D (#1442)
  • Add sequence and 3D support in flip operator (#1439)
  • Make thread pinning optional in the mixed ImageDecoder (#1465)
  • Improve accuracy of 3D rotation (#1466)
  • Add ability to read LMDB without any labels stored inside (#1440)
  • AudioDecoder for WAV format (#1447)
  • Add support for PaddlePaddle (#1371)
  • Update docs for fill_last_batch parameter to match the real behavior (#1479)
  • Remove used requirement from paddle SSD demo docs (#1486)
  • FFT CPU 1D implementation (based on ffts) (#1446)
  • Utilize libcudart.so version to detect the CUDA toolkit version (#1477)
  • Allow for more verbose Pipeline's graph logging (#1487)
  • CMake switch for audio support (#1480)
  • Add polygons mask support to COCOReader (#1455)
  • Change TF versions supported by dataset (#1492)
  • Additional deps for AudioDecoder (#1485)
  • Add ExtractWindows CPU kernel (#1461)
  • Add MNIST TensorFlow test (#1467)
  • Remove deprecated edge.py (#1498)
  • Add PowerSpectrum CPU operator (#1460)
  • Add Spectrogram CPU Operator (#1468)
  • Add MNIST examples (#1491)
  • Add notebooks with example usage of arithmetic ops (#1438)
  • Add ToDecibels CPU kernel (#1516)
  • Add librosa dependency to qa/TL1_jupyter_plugins/test.sh (#1517)
  • Fix Keras GPU example (#1520)
  • Preemphasis operator (#1515)
  • Fix for WaitForWork in Preemphasis (#1523)
  • AudioDecoder operator (#1481)
  • Lower the accuracy threshold for paddle RN50 test (limited to 25 epochs only) (#1528)
  • Remove cache options from fused ImageDecoder documentation (#1495)
  • Add ToDecibels CPU operator (#1518)
  • Add deprecation warning for Python 2.7 (#1521)
  • Split tests per framework if possible (#1519)
  • Add zlib dependency warning to libtiff build step (#1530)
  • Rephrase supported backends documentation (#1497)
  • Extend supported ops doc to include info about volumetric data. (#1531)
  • Disable clamping when converting from bool (#1536)
  • Add adobe analytics tracking script into docs (#1539)
  • ColorTwist operator cleanup (#1532)
  • NormalDistribution operator (#1529)
  • Hide the docs for internal operators (#1542)
  • MelFilterBank CPU kernel (#1522)
  • Disable the CuPy test for Python 2.7 (#1544)
  • Boundary condition handling (#1552)
  • Add spaces in Python 2.7 end of life warning (#1553)
  • Add MelFilterBank CPU operator (#1535)
  • Add more formats to FileReader (#1561)
  • Make the presence of unique visitor script counting optional in docs (#1560)
  • Adjust color ops; make contrast-neutral gray configurable (#1562)
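
Two of the audio building blocks added in this release are simple enough to state directly. A pre-emphasis filter computes y[n] = x[n] - coeff * x[n - 1], boosting high frequencies; ToDecibels converts power values to a logarithmic scale, multiplier * log10(x / reference). The sketch below assumes that the sample before the first one is zero, a coefficient of 0.97 (a common choice), the maximum as the default reference, and a small floor to avoid log(0); the function names are hypothetical, and DALI's operators may handle borders, references, and cutoffs differently.

```python
import math


def preemphasis(signal, coeff=0.97):
    """y[n] = x[n] - coeff * x[n - 1], assuming x[-1] = 0 at the border."""
    out = []
    prev = 0.0
    for x in signal:
        out.append(x - coeff * prev)
        prev = x
    return out


def to_decibels(power, reference=None, multiplier=10.0, floor=1e-10):
    """Convert power values to dB relative to `reference` (defaults to the maximum)."""
    if reference is None:
        reference = max(max(power), floor)
    return [multiplier * math.log10(max(p, floor) / reference) for p in power]


print(preemphasis([1.0, 1.0, 1.0], coeff=0.5))  # [1.0, 0.5, 0.5]
print(to_decibels([1.0, 0.1, 0.01]))
```

With the maximum as reference, the loudest value maps to 0 dB and every factor-of-ten drop in power lowers the result by 10 dB.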

Breaking API changes

  • DALI 0.17 is the last official release for Python 2.7, which reaches the end of life on January 1st, 2020. To stay up to date with DALI, please upgrade to Python 3.5 or later.
  • The asCPU method is no longer available and has been replaced with as_cpu.
  • The ColorTwist operator was deprecated and replaced by the BrightnessContrast and HSV operators (#1532).
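
As an illustration of what a brightness/contrast adjustment does, the sketch below scales pixel values around a contrast center (compare the configurable contrast-neutral gray of #1562), multiplies the result by a brightness factor, and clamps it to the 8-bit range. The formula, the default center of 128, and the helper name are assumptions for illustration; check the BrightnessContrast documentation for the exact definition DALI uses.

```python
def brightness_contrast(pixels, brightness=1.0, contrast=1.0, contrast_center=128.0):
    """Hypothetical helper:
    out = brightness * (center + contrast * (in - center)), clamped to [0, 255]."""
    out = []
    for p in pixels:
        v = brightness * (contrast_center + contrast * (p - contrast_center))
        out.append(int(round(min(max(v, 0.0), 255.0))))
    return out


print(brightness_contrast([0, 128, 255], contrast=2.0))  # [0, 128, 255]
print(brightness_contrast([100, 200], brightness=1.5))   # [150, 255]
```

Pixels equal to the contrast center are unaffected by contrast changes, which is why making that gray level configurable matters for inputs that are not centered at mid-gray.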

Known issues:

  • The video loader operator requires key frames to occur at least every 10 to 15 frames of the video stream. If key frames occur less frequently, the returned frames may be out of sync.
  • The DALI TensorFlow plugin may not be compatible with TensorFlow 1.15.0 and later. To use DALI with a TensorFlow version for which no prebuilt plugin binary is shipped with DALI, a gcc compiler matching the one used to build TensorFlow (gcc 4.8.4, 4.8.5, or 5.4, depending on the TensorFlow version) must be present on the system.
  • Due to known issues with Meltdown/Spectre mitigations and DALI, DALI performs best when run in Docker with elevated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Install via pip for CUDA 9:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/9.0 nvidia-dali==0.17.0
or for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali==0.17.0

Or use direct download links (CUDA 9.0):

Or use direct download links (CUDA 10.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI - DALI v0.16.0

Published by klecki almost 5 years ago

Bug fixes

  • Fix DALI TF plugin CXX11 ABI issue (#1361)
  • Fix DALI TF installation for TF 2.0 (#1386)
  • Fix Pad op default fill_value and axes (#1410)
  • Fix TensorFlow examples for TF 2.0 (#1420)
  • Fix input tiling in arithmetic ops (#1426)
  • Fix link error in debug mode. (#1429)
  • Fix RN50 MXNet TL3 test (#1424)
  • Fix scalar batch handling in arithmetic ops (#1449)

Improvements

  • Rearrange docker images (#1333)
  • GTest naming in STYLE_GUIDE (#1330)
  • Add 3D case to shape layout verification in CropAttr (#1344)
  • Add fallback to host when nvjpegJpegStreamParse fails (#1335)
  • Surface2D -> ND generalization (#1348)
  • Add multichannel (C>3) pipeline tests (#1219)
  • Improve last_batch_padded and Running DALI pipeline docs (#1351)
  • Undo pytorch download changes (#1353)
  • Provide prebuilt plugins for manylinux2010 based pip packages (#1346)
  • Clean include file dependencies (#1362)
  • Add warning if avformat_open_input fails (#1363)
  • Workaround for a segfault in NVCC 9 with (#1365)
  • HSV manipulation operator for GPU & CPU (#1338)
  • Backend implementation for binary arithmetic Operator (#1322)
  • Add skip_vfr_check option to VideoReader (#1367)
  • Support float16 in Cast GPU operator (#1368)
  • Add implementation of BmpImage::PeekShapeImpl, including number of channels (#1332)
  • Add Vp9 codec support (#1331)
  • Add torch dependency to TL1_separate_executor (#1373)
  • Add TF Dataset GPU (#1354)
  • Add the ability to cross-compile LMDB (#1374)
  • Move Tensor(List)Shape, Tensor(List)View to dali/core (#1341)
  • Relax the check for libnvidia-opticalflow in the test script (#1381)
  • Disable Vp9 tests temporarily (#1383)
  • Make it possible to build DALI with any CUDA version (#1345)
  • Add multigpu TF dataset test (#1382)
  • Generalize helper code to unary inputs (#1379)
  • Force inline and affine transformation (#1389)
  • GPU dltensor operator (#1261)
  • Enhance Slice API to specify axes represented in the arguments (#1336)
  • Allow default compiler build if TF compiler version is unknown (#1396)
  • NewWarpAffine -> WarpAffine; optimize CPU warp for affine mapping. (#1387)
  • Allow building DALI for different architectures as well (#1397)
  • Remove PyTorch iterator double buffering (#1399)
  • Improve wording for PREBUILD_TF_PLUGINS option (#1407)
  • Move builtin operators to dali/pipeline. (#1406)
  • Enhance CaffeReader and Caffe2Reader to support multiple LMDB files (#1360)
  • Expose arithm ops in Python (#1355)
  • Add Pad operator (#1180)
  • Enable CUDA 10 compatibility layer for Conda build (#1339)
  • Enforce crop argument minimum size (#1401)
  • Rotate operator using Warp kernel (#1403)
  • Allow empty lists in arguments (#1413)
  • Add missing license in python tests (#1412)
  • Support TF 1.15 and 2.0 in tests (#1400)
  • Fix DALIDataType enum in Python (#1419)
  • BrightnessContrast operator example (#1414)
  • Add additional_decode_surfaces parameter to videoreader (#1393)
  • CPU argument input (#1423)
  • Add support for Constant inputs and type-erased tiles (#1391)
  • Support TF v2.0 in jupyter examples (#1425)
  • Limit number of Input/Output type combinations in Slice kernel family (#1418)
  • Add TF 1.15 and 2.0 support for TF dataset (#1395)
  • New warp example + minor fixes (#1158)
  • Add initial support for constants in python API (#1421)
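
The HSV operator (#1338) adjusts hue, saturation, and value. A per-pixel reference version can be written with Python's standard colorsys module: convert to HSV, rotate the hue by a given number of degrees, scale saturation and value, and convert back. This is a plain-Python sketch with a hypothetical helper name; DALI's CPU and GPU implementations may use a different formulation, so results need not match bit for bit.

```python
import colorsys


def adjust_hsv(rgb, hue=0.0, saturation=1.0, value=1.0):
    """Hypothetical helper: rotate hue by `hue` degrees and scale saturation/value.

    rgb is a tuple of three ints in [0, 255]; returns a tuple of ints in [0, 255].
    """
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    h = (h + hue / 360.0) % 1.0  # hue is periodic, so wrap around
    s = min(max(s * saturation, 0.0), 1.0)
    v = min(max(v * value, 0.0), 1.0)
    r, g, b = colorsys.hsv_to_rgb(h, s, v)
    return tuple(int(round(c * 255.0)) for c in (r, g, b))


print(adjust_hsv((255, 0, 0), hue=120.0))       # (0, 255, 0)
print(adjust_hsv((255, 0, 0), saturation=0.0))  # (255, 255, 255)
```

Hue is a rotation on a circle while saturation and value are scale factors, which is why the hue argument is an angle and the other two are multipliers.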

Breaking API changes

  • DALI 0.17 is the last official release for Python 2.7, which reaches the end of life on January 1st, 2020. To stay up to date with DALI, please upgrade to Python 3.5 or later.
  • Removed the following deprecated operators:
    • NormalizePermute; use CropMirrorNormalize instead (#1402)
    • HostDecoder and nvJPEGDecoder (#1398)
  • The possible output types of the Crop, CropMirrorNormalize, and Slice operators are now limited to uint8_t, int16_t, uint16_t, int32_t, float, and float16, or the input type passed through unchanged (#1418).
  • Move dali/pipeline/operators to dali/operators (#1380)
  • DALI library modularization (#1384)
  • CPU argument input (#1423)

Known issues:

  • The video loader operator requires key frames to occur at least every 10 to 15 frames of the video stream. If key frames occur less frequently, the returned frames may be out of sync.
  • The DALI TensorFlow plugin may not be compatible with TensorFlow 1.15.0 and later. To use DALI with a TensorFlow version for which no prebuilt plugin binary is shipped with DALI, a gcc compiler matching the one used to build TensorFlow (gcc 4.8.4, 4.8.5, or 5.4, depending on the TensorFlow version) must be present on the system.
  • Due to known issues with Meltdown/Spectre mitigations and DALI, DALI performs best when run in Docker with elevated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Install via pip for CUDA 9:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/9.0 nvidia-dali==0.16.0
or for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali==0.16.0

Or use direct download links (CUDA 9.0):

Or use direct download links (CUDA 10.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here