DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

APACHE-2.0 License

DALI - DALI v0.15.0

Published by klecki almost 5 years ago

Bug fixes

  • Fix Transpose operator when data shape with dimension of size 1 (#1244)
  • Fix DALI_Extra clone (#1276)
  • Fix conda check in DALI TF installation script (#1284)
  • Fix problems with seeking when stream start_time is != 0. (#1287)
  • Fix TypeTable initialization (#1321)
  • Fix CropMirrorNormalize compilation with GCC 8 (#1320)
  • Suppress warning when FileReader encounters dot and dot-dot entries (#1318)
  • Fix the wrong usage of find_library when searching for FFmpeg libs (#1317)
  • Fix last_batch_padded docs (#1314)
  • Fix pytorch download url (#1334)
  • Undo pytorch download changes (#1353)
  • Fix DALI TF plugin CXX11 ABI issue (#1361)
  • Add torch dependency to TL1_separate_executor (#1373)
  • Fix DALI TF installation for TF 2.0 (#1386)
  • Relax check for libnvidia-opticalflow in test script. (#1381)

Improvements

  • Replace std::pair alias with actual type (#1248)
  • Add support for volumetric (i.e. 3D) crop (depth, height and width) (#1210)
  • Refactor storage type specialization for operator arguments (#1245)
  • CPU DLTensor Operator (#1233)
  • Change Outputs and SharedOutputs return type to tuple (#1243)
  • Add non_blocking option to CopyToExternalTensor (#1254)
  • Improve heuristic for variable frame rate detection (#1242)
  • Add pipeline validation (#1267)
  • Add lookup table operator (#1251)
  • make_string for arguments, which have operator<< (#1174)
  • Tensor layout (#1237)
  • Rework Support Ops to use TensorList (#1259)
  • Improve logic in DALI TF plugin installation (support conda installation use case) (#1271)
  • size_t -> int for vec, mat, box etc... (#1277)
  • ImageDecoder libtiff implementation (#1264)
  • Add check for OF support (#1278)
  • ImageDecoder libtiff implementation (types.ANY_DATA, YCbCr, ImageDims to TensorShape) (#1280)
  • Handle nchannels>3 in ImageDecoder (#1285)
  • Use alternative compiler (e.g. g++-5.4) when available (#1290)
  • Add support for UCF-101 dataset and upgrade ffmpeg version from 3.4.2 to 4.2 (#1241)
  • Add info about libtiff dependency in the documentation (#1294)
  • Check whether random row access is allowed in libtiff based decoder implementation (#1295)
  • Make cspan (#1298)
  • BrightnessContrast operator (#1188)
  • Parse number of channels in PNGImage::PeekShape (#1288)
  • Add support for decoding multiple resolution videos in the same pipeline. (#1144)
  • Conda recipe: Point to local git repository for build source, relax version dependencies and use conda-forge for some dependencies (#1303)
  • TiffImage::PeekShapeImpl parse and return number of channels (#1304)
  • Introduce byte_io.h including byte sequence reading utils (ReadValueBE and ReadValueLE) (#1310)
  • Add parsing of number of channels in JpegImage::PeekShapeImpl (#1306)
  • Layout refactor (#1250)
  • Add CMake VERBOSE_LOGS switch (#1319)
  • Add BMP tests (#1316)
  • Make DALI_extra repo path settable from the env (#1323)
  • Linear transformation GPU kernel (#1262)
  • Use DALI_extra images in more tests (#1177)
  • Reshape op (#1327)
  • Add tf dataset (#1299)
  • Adjust QA scripts: remove installing pip whl from direct links, as pip will disregard the "-f" option in that case (#1328)
  • Add CropMirrorNormalize 3D support (#1326)
  • Add layout handling to Transpose operator (#1329)
  • Add shape layout input to crop window generator signature (#1340)
  • Linear Transformation kernel for CPU (#1300)
  • Rearrange docker images (#1333)
  • Provide prebuilt plugins for manylinux2010 based pip packages (#1346)
  • Add 3D case to shape layout verification in CropAttr (#1344)

Breaking API changes

  • Change Outputs and SharedOutputs return type to tuple (#1243)

Known issues:

  • The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.

  • The DALI TensorFlow plugin may not be compatible with TensorFlow 1.15.0 and later. Using DALI with a TensorFlow version that has no prebuilt plugin binary shipped with DALI requires that the gcc compiler matching the one used to build TensorFlow (gcc 4.8.4, gcc 4.8.5, or gcc 5.4, depending on the particular version) is present on the system.

Binary builds

Install via pip for CUDA 9:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/9.0 nvidia-dali==0.15.0
or for CUDA 10
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali==0.15.0

Or use direct download links (CUDA 9.0):

Or use direct download links (CUDA 10.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here
DALI - DALI v0.14.0

Published by klecki about 5 years ago

Bug fixes

  • Fix fp16 bug from #1129 and add fp16 test case (#1160)
  • Fix framework iterators behavior when iter_setup raises StopIteration (#1136)
  • Fix nvjpeg legacy API (#1179)
  • Attempt different driver urls in setup_test_common.sh (#1193)
  • Fix nightly bug in video reader (#1194)
  • Fix conversions to int64 / uint64. (#1205)
  • Attempt to fix issue with tf plugin install and gcc 4.8 (#1214)
  • Fix PyTorch spelling (#1230)

Improvements

  • BrightnessContrast CUDA kernels (#1142)
  • Adjust Operator::Run to take reference instead of pointer (#1168)
  • Add a STYLE_GUIDE for DALI, adjust Kernel example (#1167)
  • Extend external source operator capacity (#1127)
  • Make Deallocate public API (#1182)
  • Remove .cpu function (#1181)
  • Allow stream() to be called for every Workspace (#1178)
  • Improve error messages for file_list arg problems in FileReader (#1184)
  • Add multi gpu python notebook (#1186)
  • HSV Kernel for CPU (#1187)
  • Adjust CropMirrorNormalize to Setup API (#1140)
  • Expose tensor as dlpack (#1154)
  • Add const noexcept qualifiers to IsContiguous. (#1211)
  • ROI utils (#1189)
  • Add qa test for multi gpu example (#1202)
  • Add support for 3d shapes in crop window (#1207)
  • DALI for aarch64-QNX platform (#522)
  • Unified naming for float16 type. (#1212)
  • Add types to DALIDataType that were missing (#1213)
  • CPU warp, with tests. (#1159)
  • Conda Recipe for DALI (#1156)
  • Update file reader doc (#1222)
  • Track DALI_extra version in DALI (#1229)
  • Add Shapes operator returning sample shapes. (#1223)
  • New Warp operator (#1153)

Breaking API changes

  • Remove .cpu function (#1181)
  • Adjust Operator::Run to take reference instead of pointer (#1168)
  • Extend external source operator capacity (#1127) - it now requires input to be set for every iteration
  • Unified naming for float16 type. (#1212)

Known issues:

  • The new video reader operator requires NVIDIA Video Codec SDK support on the platform. NVIDIA GPU Cloud (NGC) optimized containers lack this functionality in the default configuration prior to 19.01. To enable it, run the container with the ‘video’ capability enabled, i.e.:
    -e "NVIDIA_DRIVER_CAPABILITIES=compute,utility,video"
  • The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
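
To check whether a given video satisfies the key-frame constraint above, one can measure the largest gap between consecutive key frames. The following is a minimal sketch and not part of DALI: `max_keyframe_gap` and `keyframes_dense_enough` are hypothetical helpers, and the sketch assumes the key-frame indices were already extracted with an external tool. The default threshold of 10 is simply the lower bound of the 10 to 15 frame range stated above.

```python
def max_keyframe_gap(keyframe_indices, total_frames):
    """Return the largest distance (in frames) between consecutive key frames.

    keyframe_indices: sorted, 0-based frame numbers of the key frames
                      (assumed to come from an external tool).
    total_frames: total number of frames in the stream.
    """
    if not keyframe_indices:
        # No key frames at all: the whole stream is a single gap.
        return total_frames
    # Append the end of the stream so the tail after the last key frame counts.
    boundaries = list(keyframe_indices) + [total_frames]
    gaps = [b - a for a, b in zip(boundaries, boundaries[1:])]
    # Frames that precede the first key frame also form a gap.
    return max(gaps + [keyframe_indices[0]])


def keyframes_dense_enough(keyframe_indices, total_frames, max_gap=10):
    """True if key frames occur at least every `max_gap` frames (hypothetical threshold)."""
    return max_keyframe_gap(keyframe_indices, total_frames) <= max_gap
```

For example, `keyframes_dense_enough([0, 10, 20], 30)` holds, while a stream with key frames only at frames 0, 30 and 40 would be flagged as too sparse.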

Binary builds

Install via pip for CUDA 9:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/9.0 nvidia-dali==0.14.0
or for CUDA 10
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali==0.14.0

Or use direct download links (CUDA 9.0):

Or use direct download links (CUDA 10.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here
DALI - DALI v0.13.0

Published by klecki about 5 years ago

Bug fixes

  • Upgrade PyTorch to 1.2, TorchVision to 0.4 (#1155)
  • Add use_batched_decode argument to nvJPEGDecoder API (only for legacy nvJPEGDecoder implementation) (#1151)
  • Make loading of the versioned libnvidia-opticalflow.so the primary path (#1147)
  • Fix tests that are not using prolog/epilog functions (#1143)
  • Provide default initialization for scratch sizes in KernelRequirements. (#1141)
  • Fix coco loader (#1135)
  • Fix GET_PROC_EX macro (#1128)
  • Fix typo in installation doc (#1126)
  • Fix capitalization in docs for docker dir (#1122)
  • Fix pipeline serialization/deserialization for logical_id (#1121)
  • Use the right PyTorch capitalization everywhere (#1119)
  • Fix Gluon example that mixes simple and iterator DALI API (#1117)
  • Fix lint in ../dali/pipeline/operators/reader/loader/loader.h (#1113)
  • Fix float16 support in DALI TensorFlow plugin (#1086)
  • Fix python operator with side effects. (#1105)
  • Fix warning (#1061)
  • Fix test header inclusion (#1100)
  • Make dali_kernel_test_lib respect BUILD_TEST (#1101)
  • Fix a race condition in async pipeline executor (#1103)
  • Typo fixed in getting started notebook (#1091)
  • Reduced batch size to avoid out of memory condition in 19.07 container. (#1089)
  • Fix error of indexing shape in Optical Flow (#1087)
  • Disable video_reader_op test when we disable NVDEC (#1077)
  • Add video error message (#1067)
  • Fix sampling of chroma in the VideoReader op (#1054)
  • Fix detection pipeline example (#1055)
  • Fix fp16 bug from #1129 and add fp16 test case (#1160)

Improvements

  • Adjust customdummy plugin in Docs to new API (#1150)
  • Add view overload to get TensorListView from TensorVector. (#1152)
  • Warp kernels (#1063)
  • Add Setup API to Operator (#1045)
  • Input & output TYPED_TEST (#1133)
  • Refactor SliceFlipNormalizePermutPad (super)kernel (#1129)
  • Add virtual env and conda test case for DALI TF plugin (#1107)
  • Add test for water operator (#1075)
  • BrightnessContrast kernel first implementation (#1060)
  • Add default_cuda_stream_priority documentation (#1131)
  • Fast coco reader (#1098)
  • Optimize docker images building (#1053)
  • Remove explicit Multiple Input Sets handling from C++ Backend (#1088)
  • Document pre-built WML CE packages in Installation docs (#1124)
  • Upgrade VideoCodecSDK to 9.0.20 (#1120)
  • UniformRandomFill for unified storage (#1070)
  • Calculation layout setup for GPU kernels. (#1106)
  • Rework multiple input sets API (#1104)
  • Use per-sample RNG in SSDRandomCrop and RandomBBoxCrop (#1109)
  • Add compile-time mapping for DALIDataType. For use in TYPE_SWITCH. (#1108)
  • Rework how the reader picks samples from the shuffling buffer (#1005)
  • Add checking if Python API is not mixed between simple, scheduled and iterator (#1074)
  • Enable OpticalFlow test on CI (#1096)
  • Make protobuf linking mode configurable (#1102)
  • Kernel manager (#1079)
  • Add JIRA Task placeholder in PR template (#1090)
  • Replace vector<shared_ptr> with TensorVector (#1040)
  • Deprecate NormalizePermute in favor of CropMirrorNormalize (#982)
  • Adjust TensorFlow ResNet50 example to 1.14 version API (#1081)
  • Update DALI TF plugin docs to be aligned with the current functionality (#1066)
  • Adds BUILD_TF_PLUGIN flag to one-click build script (#1051)
  • Enforce shares_data_ in Buffer (#1057)
  • Improved sampler (#1071)
  • Change test prefix from L*_ to TL*_ (#1069)
  • Rounding Convert and ConvertSat added. (#1068)
  • Copy multiple collections to scratchpad. (#1044)
  • Use DALI_extra in loader test (#1064)
  • Add filename to LMDB reader errors (#1059)
  • Add make check target that runs basic tests (#1019)
  • Bounding box representation (#1052)
  • Add option to enable fast IDCT in libjpeg-turbo (#1031)
  • Adjust Tests to use DALI_EXTRA (#1056)
  • Basic geometric transform functions. (#1047)
  • Add TorchPythonFunction operator (#1033)
  • Add support for reading video files with labels using file_list argument (#1029)
  • Add TensorFlow 1.14 (#1037)
  • Enable sink operators. (#1004)
  • Update PR template (#1043)

Breaking API changes

  • Added Setup API to Operator with pure virtual SetupImpl
  • Multiple Input Sets handling was removed from backend and is only python level syntactic sugar
  • Reader sampling from shuffling buffer was adjusted
  • Replace vector<shared_ptr> with TensorVector as input and output of CPU Operators allowing for contiguous outputs from CPU Ops
  • Deprecate NormalizePermute in favor of CropMirrorNormalize (#982)
  • Enforce shares_data_ in Buffer - sharing data cannot be implicitly reallocated and must match allocation size

Known issues:

  • The new video reader operator requires NVIDIA Video Codec SDK support on the platform. NVIDIA GPU Cloud (NGC) optimized containers lack this functionality in the default configuration prior to 19.01. To enable it, run the container with the ‘video’ capability enabled, i.e.:
    -e "NVIDIA_DRIVER_CAPABILITIES=compute,utility,video"
  • The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
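
Whether the ‘video’ capability from the first known issue above is active can be checked from inside the container by inspecting the same environment variable. This is a minimal sketch, not a DALI API: `video_capability_enabled` is a hypothetical helper, and treating the value `all` as also enabling video is an assumption about the container runtime.

```python
import os


def video_capability_enabled(environ=None):
    """Return True if NVIDIA_DRIVER_CAPABILITIES lists 'video'.

    environ: optional mapping used instead of os.environ (handy for testing).
    Assumption: the value 'all' is treated as including the video capability.
    """
    env = os.environ if environ is None else environ
    raw = env.get("NVIDIA_DRIVER_CAPABILITIES", "")
    # The variable holds a comma-separated list such as "compute,utility,video".
    capabilities = {item.strip() for item in raw.split(",") if item.strip()}
    return "video" in capabilities or "all" in capabilities
```

Calling `video_capability_enabled()` without arguments reads the current process environment, so it can be run before building a pipeline that uses the video reader.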

Binary builds

Install via pip for CUDA 9:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/9.0 nvidia-dali==0.13.0
or for CUDA 10
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali==0.13.0

Or use direct download links (CUDA 9.0):

Or use direct download links (CUDA 10.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here
DALI - DALI v0.12.0

Published by JanuszL about 5 years ago

Bug fixes

  • Remove dependency with gitlab-master in DALI TF (#1038)
  • Added include(CheckSymbolsExists) to cmakelists (#1035)
  • Fix uninitialized number of dimensions in TensorListShape. (#1023)
  • Add const-qualifiers to TensorShape first and last functions. (#1020)
  • Add missing bracket in the BoxEncoder docs (#1018)
  • Adjust epsilon in tests. (#1017)
  • Add ASAN support, fix reported problems in the unit tests (#362)
  • Fix for OF test (#1008)
  • Fix nvjpeg_decoder legacy api build (#1006)
  • Fix scratchpad allocation in CropMirrorNormalize (#1000)
  • Fix Resize ratio calculation (#997)
  • Add missing device guard in the reader prefetch thread (#978)
  • optical flow test fix (#976)
  • Make errors from build_helper propagate correctly (#961)
  • Add casting to float before normalization in SliceFlipNormalizePermute tests (#974)
  • Fix displacement filter (#524)
  • Fix output allocation in operator benchmark (#959)
  • Handle NULL pointer in ctypes_void_ptr (#965)
  • Fix error of indexing shape in Optical Flow (#1087)
  • Reduced batch size to avoid out of memory condition in 19.07 container

Improvements

  • Create pull request template (#1039)
  • Add environment variables to DALI TF build image (#1034)
  • Replace HostDecoder and nvJPEGDecoder with generic ImageDecoder (#1028)
  • Add deprecated operator warning when using it (#1030)
  • Expose and document fine grain control API for pipeline run (#972)
  • Use TensorListShape for TensorList shape (#1025)
  • Rework nvidia-dali-tf-plugin build (#1007)
  • Span improvements. (#1032)
  • Add ImageDecoder operator, selecting implementation based on device argument (#995)
  • Removed unified memory from resampling filters. (#1026)
  • Add mechanism to mark an operator as deprecated in favor of another one (#1001)
  • Add matrix types + tests. (#1014)
  • Use TensorShape in dali::Tensor (#1015)
  • Introduce number of samples to TensorListShape (#1010)
  • Video reader label (#998)
  • Add path to json in case of error in the COCO reader (#1011)
  • Add vector types. (#1009)
  • Add no squeeze option and dynamic shape for MXNet and PyTorch plugins (#988)
  • Update test_python_function_operator.py (#880)
  • Restructure subdirectories in nvjpeg decoder (#999)
  • Add printing of error string enums with nvJPEG error codes (#983)
  • Remove deprecated __init__ usage from backend (#993)
  • Replace usage of NormalizePermute by CropMirrorNormalize (#994)
  • Remove OldCropMirrorNormalize (#992)
  • Optimize python operator outputs copy. (#958)
  • Rework how DALI handles py_buffer format string (#985)
  • Improve obtaining TensorFlow build flags for prebuilt DALI plugins (#963)
  • Replace CropMirrorNormalize with new implementation (#989)
  • Add COCO tfrecord support (#979)
  • Add test cases for Flip operator (#973)
  • Add NewCropMirrorNormalize GPU (#970)
  • Read COCO categories from json file in COCOReader (#986)
  • Add -std=c++14 to cuda nvcc flags in custom plugin example (#984)
  • Add max_size upperbound option to Resize with resize_short (#960)
  • Enable no-crop by default in NewCropMirrorNormalize (#977)
  • Change type traits to use C++14 library aliases. (#975)
  • Use c++14 standard (#971)
  • Change storage device from boolean to enum in workspace (#967)
  • Add new SliceFlipNormalizePermute CPU kernel. (#949)
  • Remove lint from the default target list (#964)
  • Add split_scenes and transcode_scenes doc in Superres example (#944)
  • Update libjpeg-turbo to 2.0.2 version (#951)
  • Add lint as the first class, separate target to CMake (#952)
  • Create test_optical_flow.py (#911)
  • Adjust TensorFlow ResNet50 example to 1.14 version API (#1081)
  • Change test prefix from L*_ to TL*_ (#1069)

Breaking API changes

  • CPU operators have moved from per-sample processing (the pipeline processes one sample after another, each all the way through the pipeline) to batch processing (all samples are processed by the first operator before moving to the next operator). This may result in a small performance degradation for some use cases. However, in the long term it will make some currently unavailable optimizations possible, and it enables operations that need to view the whole batch during processing (like random sample blending inside a batch).
  • Deprecated _run, _share_outputs and _release_outputs in favor of schedule_run, share_outputs and release_outputs
  • Replaced HostDecoder and nvJPEGDecoder with generic ImageDecoder. ImageDecoder is the recommended operator for image decoding, and the old API will be removed in the future
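
The change from per-sample to batch processing described in the first item above only affects the order in which operators touch the samples, not the results. The sketch below illustrates the two execution orders in plain Python; it is not DALI code, and `run_per_sample`, `run_batch_wise` and the two toy operators are hypothetical names chosen for illustration.

```python
def run_per_sample(batch, operators):
    """Old order: each sample travels through the whole operator chain in turn."""
    trace = []
    outputs = []
    for index, sample in enumerate(batch):
        for name, op in operators:
            sample = op(sample)
            trace.append((name, index))
        outputs.append(sample)
    return outputs, trace


def run_batch_wise(batch, operators):
    """New order: one operator processes every sample before the next operator runs."""
    trace = []
    outputs = list(batch)
    for name, op in operators:
        for index in range(len(outputs)):
            outputs[index] = op(outputs[index])
            trace.append((name, index))
    return outputs, trace


# Two toy operators standing in for real CPU operators.
operators = [("scale", lambda x: x * 2), ("shift", lambda x: x + 1)]
batch = [1, 2, 3]

per_sample_out, per_sample_trace = run_per_sample(batch, operators)
batch_out, batch_trace = run_batch_wise(batch, operators)
```

Both functions return the same outputs; only the traces differ. In the batch-wise trace every `scale` entry precedes every `shift` entry, which is what lets an operator see the whole batch at once.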

Known issues:

  • The new video reader operator requires NVIDIA Video Codec SDK support on the platform. NVIDIA GPU Cloud (NGC) optimized containers lack this functionality in the default configuration prior to 19.01. To enable it, run the container with the ‘video’ capability enabled, i.e.:
    -e "NVIDIA_DRIVER_CAPABILITIES=compute,utility,video"
  • The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
  • The DALI TensorFlow plugin may not be compatible with the TensorFlow 1.14.0 release. The DALI TensorFlow plugin requires that the gcc compiler matching the one used to build TensorFlow (gcc 4.8.4 or gcc 4.8.5, depending on the particular version) be present on the system.

Binary builds

Install via pip for CUDA 9:
pip install --extra-index-url http://developer.download.nvidia.com/compute/redist/cuda/9.0 nvidia-dali==0.12.0
or for CUDA 10
pip install --extra-index-url http://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali==0.12.0

Or use direct download links (CUDA 9.0):

Or use direct download links (CUDA 10.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here
DALI - DALI v0.11.0

Published by JanuszL over 5 years ago

Bug fixes

  • Fix propagation of DALI build SHA, flavor and timestamp (#948)
  • Fix warning (#947)
  • Fix data race in displacement filter (#945)
  • Fix OF sequence number bug (#896)
  • Drop TF 1.14rc0 from test as it doesn't have working TensorBoard (#941)
  • Make Transpose operator as one supporting sequences (#928)
  • Update aarch64 build docs (#931)
  • Fix lint error (#932)
  • Fix lint result being ignored for include/dali. Fix linter errors in include/dali. (#923)
  • Fix floating point precision error to calculate width and height for resizing (#917)
  • Fix wrong registration of python operators after loading plugin (#910)
  • Bound installed torchvision version with present CUDA version in tests (#912)
  • Update README and iterator docs (#889)
  • Fix SSD example and tests (#908)
  • Disable threading inside the OpenCV (#887)
  • Fix lint error printing in Python 3. (#907)
  • Fix compilation error in assert(size(shample_shape)). (#901)
  • fix cmake warning (#886)
  • Restore performance in JoC RN50 inference (#962)

Improvements

  • Change CPU to batch processing (#936)
  • Add specializations of Operator class for all backends (#934)
  • Replace the displacement flip with dedicated operator. (#849)
  • Replace current crop and slice with new version based on slice kernel (#930)
  • Add multiple inputs and outputs in the python operator (#942)
  • Add ThreadPool to Host Workspace (#935)
  • Make test_detection_pipeline to use DALI extra as an option (#922)
  • Add the sequence reader example (#895)
  • Box encoder gpu offsets (#939)
  • Add cascading notify in thread pool (#933)
  • Add optional offset computation to BoxEncoder (#921)
  • Add sanity test for PyTorch SuperRes example (#633)
  • Remove prebuild TensorFlow plugins from DALI (#920)
  • New slice operator (#913)
  • Remove unnecessary copies by using const ref or move (#655)
  • view_as_tensor_gpu utility function & copy tensor (#658)
  • Use SmallVector in TensorShape. (#915)
  • Add GTC 2019 video and presentation to the documentation (#926)
  • Optimize slice kernel. (#924)
  • Update nvJPEG version (#919)
  • Rework DeviceGuard to restore original context upon the exit (#882)
  • Slice GPU batched kernel (#905)
  • Add ability to use docker based build for insource-builds (#891)
  • NewCrop: support for 4D inputs (#900)
  • Upgrade PyTorch to 1.1.0 in QA tests config (#909)
  • Device-usable TensorsShape and core utils. (#903)
  • Add SmallVector class. (#902)
  • Add N-dimensional Slice CPU kernel (#893)
  • DALI for aarch64-linux platform (#856)
  • Make linter work with Python 3 (#904)
  • VideoReader stride (#755)
  • Device-side testing. (#897)
  • Update docs of the Readers (#894)
  • Add L1 test for split queues executor (#780)
  • Generic N-dimensional GPU slice kernel (#877)
  • Update info about operators supporting sequences (#885)
  • Move error handling to DALI core. (#867)
  • Add possibility to build debug dali using build.sh (#857)
  • Add proper errors to the ExternalSource (#875)
  • Use raw ImageNet data for RN50 convergence test (#636)
  • Simplified README with links to NVIDIA docs
  • Add as_tensor with provided shape method to python API (#953)

Breaking API changes

  • CPU operators have moved from per-sample processing (the pipeline processes one sample after another, each all the way through the pipeline) to batch processing (all samples are processed by the first operator before moving to the next operator). This may result in a small performance degradation for some use cases. However, in the long term it will make some currently unavailable optimizations possible, and it enables operations that need to view the whole batch during processing (like random sample blending inside a batch).
  • CropCastPermute is removed. CropMirrorNormalize should be used instead (with the default values for normalization).

Known issues:

  • The new video reader operator requires NVIDIA Video Codec SDK support on the platform. NVIDIA GPU Cloud (NGC) optimized containers lack this functionality in the default configuration prior to 19.01. To enable it, run the container with the ‘video’ capability enabled, i.e.:
    -e "NVIDIA_DRIVER_CAPABILITIES=compute,utility,video"
  • The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
  • The DALI TensorFlow plugin may not be compatible with the TensorFlow 1.14.0 release. The DALI TensorFlow plugin requires that the gcc compiler matching the one used to build TensorFlow (gcc 4.8.4 or gcc 4.8.5, depending on the particular version) be present on the system.

Binary builds

Install via pip for CUDA 9:
pip install --extra-index-url http://developer.download.nvidia.com/compute/redist/cuda/9.0 nvidia-dali==0.11.0
or for CUDA 10
pip install --extra-index-url http://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali==0.11.0

Or use direct download links (CUDA 9.0):

Or use direct download links (CUDA 10.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here
DALI - DALI v0.10.0

Published by JanuszL over 5 years ago

Bug fixes

  • Fix CropMirrorNormalize crop_pos_x/y argument for the CPU (#853)
  • Update SSD L1 test (#863)
  • Add stream to memset calls (#862)
  • Replace bc calls with awk (#850)
  • Fix pipeline serialization with make_continious inside (#848)
  • add dot (#852)
  • Remove unreliable tests that expected reallocation to give different pointer. (#851)
  • Fix MXNet L3 and PyTorch L1 and L3 tests (#845)
  • Fix tests for Ubuntu 18.04 and Python 3.7 (#797)
  • Fix numerical issue in clamping cropped bounding boxes. (#846)
  • Move RapidJSON to third_party (#835)
  • Add fallback to so.1 for optical flow library loading (#822)
  • Added more options to build.sh script (#828)
  • Update SSD example to report global speed and use proper number of shards (#810)
  • Fix one_config_only condition in test_template.sh (#823)
  • Prevent manylinux3 image build from pruning other docker images (#795)
  • Install OpenMPI for CUDA 10 when not present in the system (#821)
  • Add dependencies silently required by opencv-python (#820)
  • Fix test_detection_pipeline for python2 (#809)
  • Do not install glib-2.0 in qa tests (#816)
  • Reimplement GetSingleOrRepeatedArg without use of exceptions for normal flow.
  • Fix no_dali run for SSD example (#803)
  • Make test scripts verbose (#804)
  • Updating OF docs & example (#799)
  • Fix DALI version for non-release builds (#800)
  • Improve error message when unable to set CPU affinity (#775)
  • Move changing the value in callback before the barrier (#784)
  • enabling tests (#789)
  • Rename GetRequirements to Setup. (#778)

Improvements

  • Add basic PyTorch DALI example, fix links to files in docs (#864)
  • Move to CPU based pipeline in L3 RN50 TF test (#865)
  • Add info about nightly and weekly DALI builds (#859)
  • Generalized tensor list view (#791)
  • Move doxygen doc generation to build docs phase (#860)
  • QA tests: splitting plugin manager and tf plugin package tests (#830)
  • Add options for OF, NVDEC and NVML support (#838)
  • Add python tests for multi-input CropMirrorNormalize (#818)
  • Fix unnecessary memory usage when reallocating (#847)
  • Add collect_sources and collect_headers macros for CMake (#837)
  • Add python function operator - DALI-571 (#732)
  • Add performance threshold to L3 tests (#801)
  • Add "dali_core" library. (#832)
  • Upgrade to CMake 3.11 (#825)
  • Add Boost Preprocessor to third_party (#826)
  • Add location specifiers to span functions. (#824)
  • Improve documentation of ExternalSource and RandomResizedCrop (#815)
  • Better logging and gitignore update (#806)
  • Align build.sh with docs (#792)
  • Non-static kernels. (#786)
  • Add ability to build nightly/weekly version of DALI (#770)
  • Add new test case for cached_batch_copy (#783)
  • Add support of separate prefetch queues in TF plugin (#761)

Breaking API changes

  • None

Known issues:

  • The new video reader operator requires NVIDIA Video Codec SDK support on the platform. NVIDIA GPU Cloud (NGC) optimized containers lack this functionality in the default configuration prior to 19.01. To enable it, run the container with the ‘video’ capability enabled, i.e.:
    -e "NVIDIA_DRIVER_CAPABILITIES=compute,utility,video"
  • The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.

Binary builds

Install via pip for CUDA 9:
pip install --extra-index-url http://developer.download.nvidia.com/compute/redist/cuda/9.0 nvidia-dali==0.10.0
or for CUDA 10
pip install --extra-index-url http://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali==0.10.0

Or use direct download links (CUDA 9.0):

Or use direct download links (CUDA 10.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here
DALI - DALI v0.9.1

Published by JanuszL over 5 years ago

Bug fixes

  • Make LMDB close properly with lazy init (#790)
  • Handle exception when NVDEC is not available for the video format (#752)
  • Minor fixes: Warning fix + enable one missing test (#764)
  • Fix build for the old version of nvJPEG (#760)
  • Fix pipeline completion callback (#745)
  • Make progressive JPEG to be always decoded by host huffman decoder (#739)
  • Use cv::COLOR_BGR2RGB instead of CV_BGR2RGB (#743)
  • Fix calculation of the average speed RN50 for TensorFlow test (#719)
  • Fix CacheLoad call in old nvjpeg (#728)
  • Fix Tensorflow validation pipeline (#722)
  • Fix bilinear resampling 1st row/column. (#697)
  • l1 fix (#687)
  • Handling sync pipeline with prefetch_queue_depth of 1 in Python (#688)
  • Fix shuffle_after_epoch option (#812)
  • Provide optional stream to copy_to_external API. Fix sync issue (#807)
  • Fix initialization of CUDA context on the default device during pipeline creation (#829)

Improvements

  • Add new function case for lazy init (#777)
  • Add L3 SSD test (#782)
  • Separate L0 & L2 FW iterators tests. Clear previous data in iterators loop (#779)
  • Make EpochSize prepare metadata when Reader has lazy init (#768)
  • L1 OF example (#757)
  • Make ssd random crops filter boxes the same way (#771)
  • Fix skip_cached_images feature (#769)
  • Change SSD L1 test options (#766)
  • Update SSD example to use distributed JoC model (#759)
  • Change CHECK_STRUCT_HAS_MEMBER to use CXX (#762)
  • Add support for Netpbm .pnm (.ppm/.pgm/.pbm) images using OpenCV (#599)
  • Evaluation at every epoch in TF RN50 (#717)
  • Change cast in resampling setup to silence a warning (#749)
  • Refactor nvdecoder: remove useless thread (#733)
  • Add more checks to AspectRatio test (#635)
  • Enable test for TensorFlow and CUDA 10 (#721)
  • Pinned allocator for nvJPEG CPU stage (#664)
  • Add lazy loading (#746)
  • Image cache batch copy (#742)
  • Add new ws policy for separated executor (#671)
  • Add test cases for nvJPEGDecoder fused crop variants (#716)
  • Resampling in mini-batches (#744)
  • Adding default cuda stream priority option (#734)
  • Adding test for DALI FW iterators (#706)
  • Mark stage buffers as consumed with stream callback (#712)
  • Move Optical Flow from aux to pipeline/operators/optical_flow (#720)
  • Disable hybrid huffman threshold by default, as it seems to lower performance (#736)
  • Disable Optical Flow temporal hints by default (#723)
  • Remove misleading info about OpenCV 2 support from readme. (#686)
  • Enforcing workers termination by waking up workers in Executor dtor (#699)
  • Temporarily remove broken OpticalFlow example (#731)
  • Add APEX building to SSD L1 test (#727)
  • Update as_array returned shape and update detection pipeline test (#724)
  • Skip image loading if the image is in cache (#669)
  • Handle empty tensors in the backend and frontend (#713)
  • Remove unused dependencies from tests (#715)
  • Update test_pipeline.py (#704)
  • Update support PyTorch version in README (#714)
  • Fix python L1 test for nvJPEG (#711)
  • Special handling for progressive JPEG in nvJPEG decoder (#695)
  • Use seed sequence for RandomCropAttr. Ensure consistency between different implementations of random crop attr (#692)
  • Add different decoder options to test_RN50_data_pipeline.py (#689)
  • Add unit tests for COCOReader (#709)
  • Enabling hint for Optical flow calculation (#702)
  • Rework FW Plugins to prefetch only as many batches as needed (#703)
  • Change Og to O0 to enable debug symbols in stacktrace (#701)
  • Refactor detection pipeline test (#693)
  • Make COCOReader options mutually exclusive (#698)
  • Update nvJPEG thresholds and add filename info in GPU stage (#672)
  • Optical flow support for BGR and GRAY types (#684)
  • Enable cubic filtering test. (#690)
  • Make BbFlip on CPU act as on GPU (#661)
  • Obtaining output tensor size from OpticalFlowAdapter (#680)
  • Update README (#648)
  • Add note about hue argument unit in color augmentation. (#683)
  • OF integration & example (#659)
  • Fix Presize test - add Buffer::padding() (#670)
  • Fix RN50 example for PyTorch (#667)
  • Update RN50 examples to use nvJPEG random crop decoding (#663)
  • Sort operators in docs, add padding to allocations (#660)
  • Turing OF adapter (#644)
  • Add range iterator constructor for dyn TensorShape (#629)
  • Add test for buffer presize (#647)
  • GTest submodule update (#646)

Breaking API changes

  • The internal Python pipeline API has changed; any code that used _* functions needs to be updated to reflect the new semantics

Known issues:

  • New Video reader operator requires NVIDIA VIDEO CODEC SDK support in the platform. NVIDIA GPU Cloud (NGC) optimized containers lack this functionality in the default configuration prior to 19.01. To enable it, run the container with the ‘video’ capability enabled, i.e.:
    -e "NVIDIA_DRIVER_CAPABILITIES=compute,utility,video"
  • The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
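The key-frame requirement above can be checked before feeding a stream to the reader. The following is a minimal sketch, assuming the per-frame key-frame flags have already been extracted by some external tool; the helper names and the default limit of 15 (the upper bound quoted above) are illustrative and not part of DALI.

```python
from typing import List


def max_keyframe_gap(is_key: List[bool]) -> int:
    """Return the largest distance, in frames, between consecutive key frames."""
    key_positions = [i for i, flag in enumerate(is_key) if flag]
    if len(key_positions) < 2:
        # With zero or one key frame, treat the whole stream as one gap.
        return len(is_key)
    return max(b - a for a, b in zip(key_positions, key_positions[1:]))


def keyframes_dense_enough(is_key: List[bool], limit: int = 15) -> bool:
    """True if key frames occur at least once every `limit` frames."""
    return max_keyframe_gap(is_key) <= limit


# Hypothetical stream with a key frame every 12 frames.
flags = [i % 12 == 0 for i in range(48)]
print(max_keyframe_gap(flags), keyframes_dense_enough(flags))
```

A stream that fails this check may need to be re-encoded with a shorter key-frame interval before use.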

Binary builds

Install via pip for CUDA 9:
pip install --extra-index-url http://developer.download.nvidia.com/compute/redist/cuda/9.0 nvidia-dali==0.9.1
or for CUDA 10
pip install --extra-index-url http://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali==0.9.1

Or use direct download links (CUDA 9.0):

Or use direct download links (CUDA 10.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here
DALI - DALI v0.8.1

Published by JanuszL over 5 years ago

Bug fixes

  • Fix nvJPEGDecoder cache when using new nvJPEG decoupled API (#748)
  • Stop using tf-nightly-gpu since it broke the build. Use latest released version instead (#730)

Improvements

  • None

Breaking API changes

  • None

Known issues:

  • New Video reader operator requires NVIDIA VIDEO CODEC SDK support in the platform. NVIDIA GPU Cloud (NGC) optimized containers lack this functionality in the default configuration prior to 19.01. To enable it, run the container with the ‘video’ capability enabled, i.e.:
    -e "NVIDIA_DRIVER_CAPABILITIES=compute,utility,video"
  • The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.

Binary builds

Install via pip for CUDA 9:
pip install --extra-index-url http://developer.download.nvidia.com/compute/redist/cuda/9.0 nvidia-dali==0.8.1
or for CUDA 10
pip install --extra-index-url http://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali==0.8.1

Or use direct download links (CUDA 9.0):

Or use direct download links (CUDA 10.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here
DALI - DALI v0.8.0

Published by JanuszL over 5 years ago

Bug fixes

  • Change unconditional move to forward (#651)
  • Fix use after move error (#650)
  • Disable split_stages when old nvjpeg is present (#645)
  • Replace host_defines.h with cuda_runtime.h (#643)
  • Fix WriteImageScaleBias for GPUBackend images (#639)
  • Use the same lock to condition for cv in ExternalSource (#628)
  • Fix missing header in crop_window.h (#638)
  • Add -Wsign-compare to CMAKE_CXX_FLAGS for Clang (#626)
  • Fix reading expired c_str(). (#620)
  • More fixes for tests (#618)
  • Fix out-of-range write. (#614)
  • Fix L1 jupyter plugin test (#612)
  • Fix tests for CUDA 10 (#605)
  • Fixing note admonitions and section separators (#604)
  • Fix development documentation warning (#607)
  • Disabled failing tests (#608)
  • Fix linter (#603)
  • Fix description in RN50 test pipeline (#596)
  • Fix aspect ratio distribution. (#583)
  • Fix nvJPEGDecoder unit tests (#576)
  • Fix TensorFlow RN50 training example for multinode (#569)
  • Fixed broken DEBUG build. (#571)
  • Fix memory corruption caused by sparse tensor handling (#555)
  • Fix Slice documentation (remove arguments from Crop) (#554)
  • Fix import in tensorflow-plugin-sparse-tensor example (#546)
  • Fix OpenCV 3.x compatibility (#548)
  • Fix lint in Crop GPU 2.0 (#542)
  • Fix lint in Crop GPU (#541)
  • Fix Crop GPU for supporting Crop derivatives (#538)
  • Fix segmentation fault by dropping usage of std::function in TypeInfo (#535)
  • Fix random bounding box crop (#512)
  • Align SSDRandom crop with RandomBBoxCrop + Slice (#578)

Improvements

  • Expose support for setting up separated execution in Python (#624)
  • Add prefetched batch queue in Reader (#641)
  • Separate Executor Queues - Generalize Executors (#577)
  • Allow crop window dimensions to be argument inputs (#637)
  • Utility kernels for Optical flow (#565)
  • Removing dimension from TensorView (#625)
  • Add ROI resize to CPU resampling (crop+flip). (#631)
  • Add split_stages to nvJPEGDecoder* operators (#634)
  • Docker multi-stage build for CUDA (#586)
  • Generalize Executor tests (#609)
  • nvJpegDecoderCrop, nvJpegDecoderRandomCrop, nvJpegDecoderSlice (#543)
  • Store Queues of Buffers for corresponding TensorNodes (#551)
  • Add any_of and all_of in kernels util (#627)
  • Add cache for nvjpeg decoder with decoupled api (#616)
  • Unified filtering setup for CPU and GPU. (#613)
  • Refactor Executor and OpGraph (#540)
  • Add two-stage split nvJPEGDecoder with new decoupled API (#582)
  • SSD multi-gpu example (#517)
  • Enable argument inputs in Mixed operators (#621)
  • Add WorkspaceDataFactory with traits for Tuples and WS (#602)
  • Refactor OpGraph - OpNodes and TensorNodes (#513)
  • Add better colors for Executor NVTX marks (#619)
  • Add single nvJPEGDecoder with new decoupled API (#579)
  • Make nvJPEGDecoder cache global (#594)
  • Add ROI-based GPU resampling (#606)
  • Change is_cpu from TensorMeta to StorageDevice enum (#598)
  • Change DALIOpType to dali::OpType enum class (#597)
  • Add options for COCOReader (#588)
  • Add CUDA 10.0 version whl support (#570)
  • Move master docs warning to the top of page (#601)
  • adding dali_extra support (#595)
  • Adding CUStream class to Dali (#589)
  • Add warning about C++ API stability (#587)
  • Improve RN50 pipeline test (#584)
  • Unify random crop generation. (#590)
  • PyTorch SuperRes with VideoReader example (#380)
  • Add advanced section in the documentation (#575)
  • nvJPEGDecoder with cache (#550)
  • Resize with resampling kernels (#520)
  • add cuPointerGetAttributes (#580)
  • Add cubic filter for CPU and GPU. (#574)
  • Make sticking to data shard optional (#563)
  • add turing optical flow (#572)
  • Resampling for GPU (#518)
  • Add option to select targeted CUDA archs (#564)
  • Add printing of average TensorFlow training performance in L3 test (#566)
  • Resampling for CPU (#556)
  • Common (CPU, GPU) changes to directory structure for resampling. (#558)
  • Generalize volume function. (#559)
  • Set proper CMAKE_XXX_FLAGS for different build types (#553)
  • Add RN50 data pipeline perf test (#549)
  • Make every GPU stick to its shard (#545)
  • Add a proper error reporting for build.sh docker script (#547)
  • Add ability to return sparse tensor on CPU for TF DALI op (#509)
  • Remove excessive #include checks from cpplint. (#544)
  • OpticalFlow Operator (#526)
  • Kernel API extensions and refactoring. (#536)
  • Add documentation for operators expecting sequence inputs (#525)
  • Make nvJpeg operator fall back to the CPU even for wrong images (#539)
  • Samplers for CPU and GPU surfaces. (#533)
  • Flatten Sequence and Sequence Crop GPU operator (#477)
  • Workaround for Flip misaligned by 1 pixel. (#534)
  • Simple argument parser (#531)
  • Transpose Operator for GPU (#514)
  • Enhance documentation of Crop, and seed argument (#532)
  • Enhance Crop arguments documentation (#529)
  • Add link to functions in operator table in docs (#519)
  • Add surface type for image processing. (#528)
  • OpticalFlowAdapter generalization (#505)
  • Host decoder external crop (#503)
  • Add error message to views.h (#501)
  • Remove redundant checks in Crop GPU (#515)
  • Common utilities for kernels. (#496)
  • Add InternalOp in OpSchema for better blacklisting of internal ops (#652)
  • Use resampling in both RandomResizedCrop and Resize (#642)

Breaking API changes

  • None

Known issues:

  • New Video reader operator requires NVIDIA VIDEO CODEC SDK support in the platform. NVIDIA GPU Cloud (NGC) optimized containers lack this functionality in the default configuration prior to 19.01. To enable it, run the container with the ‘video’ capability enabled, i.e.:
    -e "NVIDIA_DRIVER_CAPABILITIES=compute,utility,video"
  • The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.

Binary builds

Install via pip for CUDA 9:
pip install --extra-index-url http://developer.download.nvidia.com/compute/redist/cuda/9.0 nvidia-dali==0.8.0
or for CUDA 10
pip install --extra-index-url http://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali==0.8.0

Or use direct download links (CUDA 9.0):

Or use direct download links (CUDA 10.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here
DALI - DALI v0.7.0

Published by JanuszL over 5 years ago

Bug fixes

  • Fix TensorFlow example (#511)
  • Update CUDA synchronicity for VideoReader (#508)
  • Change download path for FFmpeg in Dockerfile (#507)
  • Let OpenCV build to pick turbojpeg from system, as it was building turbojpeg anyway (libjpeg usage was deprecated) (#490)
  • Make L3 PyTorch really fail when it fails (#502)
  • Temporary fix for broken tensorflow import (keras-preprocessing is importing pandas, which is not installed) (#498)
  • Fix L3 RN50 tests accuracy (#468)
  • Fix the table in README.rst
  • Fix FP16 type support on CPU (#464)
  • Fixes for presizing. (#472)
  • Fix ssd random crop (#470)
  • Force BOOST_PP to recognize NVCC as supporting variadic macros. (#463)
  • fix bug in TensorView creation (#456)
  • Add -y to ffmpeg split for CI (#445)
  • Fix problems with the external input operator (#453)
  • Fix compatibility with OpenCV 4 and 2 (#446)
  • Remove BUILD_ID from sdist package name as it is interpreted as part of the version by pip (#425)
  • Fix broken lint build (#419)

Improvements

  • Add HostDecoderRandomCrop (#462)
  • Add Element Extract Operator (#420)
  • Make as_cpu return a non pinned TensorList to avoid cudaMallocHost calls (#500)
  • Add more verbose error message when TensorFlow plugin shape doesn't m… (#495)
  • Add TestOpArg constructors for string literals. (#499)
  • Update Creating Op doc to new Workspace::Output API (#492)
  • Add dali_kernels and dali_kernel_test libraries. (#451)
  • Tweak DaliOperatorTest (#485)
  • Add read_ahead option to file readers (#489)
  • Change TensorView backend in OF API
  • Make OperatorBase public and move InstantiateOperator to operator.h (#487)
  • Refactor GPU Reader Op (#483)
  • Alias typename in OF stub (#484)
  • Implementation of DaliOperatorTest (#404)
  • OF stub implementation (#478)
  • Update Docker build in the README (#479)
  • Add build script and runner docker file (#236)
  • Proper affinity handling (#471)
  • Add Boost info to Readme.rst (#475)
  • Remove default info (#473)
  • Add options for COCO reader (#469)
  • Add per-operator presize hints to stage output queues. (#466)
  • Add test to check if DALI whl bundles all necessary libs it links to (#461)
  • Per-operator buffer presizing. (#439)
  • dali::any - almost complete implementation of std::any. (#459)
  • Add Python 3.7 DALI build (#455)
  • Update a WS::Output call in debug mode (#458)
  • Change nvcc invocation in CMake to dry run (#457)
  • Make *Workspace::Output return type non-const ref (#449)
  • Generalize -gencode flags generation (#450)
  • Make files be mmapped instead of read (#406)
  • Add support for step, stride & shuffling in SequenceReader, filter extensions for file readers (#363)
  • Improve the random generator initialization (#430)
  • Kernel API example + tests (#386)
  • Refine builds and test (#437)
  • Add clean catch of Reader's prefetch error by Python thread (#429)
  • API for optical flow (#434)
  • Add cmake WERROR option description in the readme. (#441)
  • Add dtype argument for VideoReader (#436)
  • Get Tensor(List)View from Tensor(List) (#409)
  • Remove opencv package from TensorFlow test (#433)
  • Color space conversion operators (#395)
  • Make files read ordered inside class for file loader (#415)
  • Change TensorReference to EdgeReference for code clarity (#411)
  • Rename nvidia-dali-tf-plugin package to include build id (#414)
  • Add layout to VideoReader (#413)

Breaking API changes

  • None

Known issues:

  • New Video reader operator requires NVIDIA VIDEO CODEC SDK support in the platform. NVIDIA GPU Cloud (NGC) optimized containers lack this functionality in the default configuration prior to 19.01. To enable it, run the container with the ‘video’ capability enabled, i.e.:
    -e "NVIDIA_DRIVER_CAPABILITIES=compute,utility,video"
  • There is no clear distinction in the documentation between operators supporting Video sequences and images

Binary builds

Install via pip:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali==0.7.0

Or use direct download links:

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here
DALI - DALI v0.6.1

Published by JanuszL over 5 years ago

Bug fixes

  • Deliver exactly 1 epoch from DALIGenericIterator in pyTorch (#391)
  • Avoid adding MakeContiguous twice for the same output (#405)
  • Stop returning memory allocated with new from the C API (#396)
  • Add missing argument in TF ResNet demo README (#397)
  • Fix error message formatting in python (#387)
  • TensorView fixes. (#378)
  • Fix spelling (#372)
  • Fix build warnings in the video loader operator. (#368)
  • Fix device selection in PipelinedExecutor. (#361)
  • Blacklist operators that should not be exposed (#355)

Improvements

  • Add optional resize_longer argument to resize op. Extend COCOReader op to optionally return img_ids. Add optional min_canvas_size argument to paste op. (#402)
  • Make TF plugin to be compiled during installation (#398)
  • Add building DALI against nightly TF release (#390)
  • TensorWrapper implementation for testing (#401)
  • Add notebook example for VideoReader (#376)
  • ArgumentKey impl for Testing API (#383)
  • Kernel API design (#330)
  • Change names from yuv to ycbcr in VideoReader for clarity (#385)
  • Plugin Manager (#364)
  • Make mixed in docs ops table start with capital letter (#384)
  • Apply modernize-use-override (#381)
  • Add gpu box encoder (#371)
  • Add auto_reset parameter for MXNet and PyTorch iterators (#379)
  • Move operators to separate, static libdali_operators.a lib (#374)
  • Testing API Proposal (#338)
  • Add non-owning Tensor datatypes for kernels (#346)
  • Add Step and multiple containers support in VideoReader (#360)
  • Add guards for CUuid for cuda 10 (#375)
  • Dynamic linking for CUDA driver api (#373)
  • Add normalized and image_type arguments for VideoReader (#351)
  • Change nvcuvid link to dyn + hint for FFmpeg (#370)
  • Add fallback for nvJPEG to the CPU (#365)
  • Add new, faster RapidJSON parser (#339)
  • Change cpp #ifdef to #if in VideoReader (#359)
  • Add PyTorch and MXNet example with various readers (#343)
  • Make '-werror' optional in CMake (#353)
  • Add description of commit message style to Contributing guide (#350)
  • Remove semicolons in plugins (#345)

Breaking API changes

  • The PyTorch iterator now returns the exact number of samples per epoch, so the last batch can be smaller if the epoch size is not divisible by the batch size. To keep the old behavior, in which the data wraps around, use the “stop_at_epoch” argument
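
The effect on batch sizes can be illustrated without DALI itself. This is a minimal sketch in plain Python; `batch_sizes` is a hypothetical helper that only mirrors the arithmetic described above and is not part of the iterator API.

```python
from typing import List


def batch_sizes(epoch_size: int, batch_size: int) -> List[int]:
    """Sizes of the batches delivered in one epoch when exactly
    `epoch_size` samples are returned and nothing is wrapped around."""
    full, rest = divmod(epoch_size, batch_size)
    # `full` complete batches, plus one smaller batch if samples remain.
    return [batch_size] * full + ([rest] if rest else [])


# 10 samples with batch size 4: the last batch holds only 2 samples.
print(batch_sizes(10, 4))
```

Training loops that assume a fixed batch dimension should therefore handle a shorter final batch.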

Known issues:

  • New Video reader operator requires NVIDIA VIDEO CODEC SDK support in the platform. NVIDIA GPU Cloud (NGC) optimized containers lack this functionality in the default configuration prior to 19.01. To enable it, run the container with the ‘video’ capability enabled, i.e.:
    -e "NVIDIA_DRIVER_CAPABILITIES=compute,utility,video"
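
Inside a running container, a quick sanity check is to inspect the environment variable set above. The sketch below assumes the variable holds a comma-separated list and that the value `all` also enables video; `has_video_capability` is a hypothetical helper. It checks only the variable, not whether the codec libraries are actually usable.

```python
import os


def has_video_capability(value: str) -> bool:
    """Check whether a NVIDIA_DRIVER_CAPABILITIES value includes 'video'."""
    caps = {c.strip() for c in value.split(",") if c.strip()}
    # Assumption: 'all' enables every driver capability, including video.
    return "video" in caps or "all" in caps


print(has_video_capability(os.environ.get("NVIDIA_DRIVER_CAPABILITIES", "")))
```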

Binary builds

Install via pip:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali==0.6.1

Or use direct download links:

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here
DALI - DALI v0.6.0

Published by JanuszL almost 6 years ago

Bug fixes

  • Fix problem with GPU DALI operator in TensorFlow when evaluated on the CPU (#335)
  • Fix obtaining color augmentation per sample (#337)
  • Fix command line in the TF Example README (#327)
  • Fix issues reported by valgrind (#308)
  • Fixes in BBFlip and consistency in bbox format (#300)
  • Fix line endings from CRLF to LF (#315)
  • Fix for race condition on Displacement Filter Impl (#311)
  • Fixed slice coordinates calculation (#312)
  • Fix validation pipeline for accuracy in TF example (#305)
  • Fix ResizeAttr usage in Resize operator (#299)
  • Skip 0 sized images in the MxNet reader. (#303)
  • Fix tfrecord2idx compatibility for python3 (#288)
  • Fix clang build (#276)
  • TF Example: updates TF op call with the right args (#295)

Improvements

  • Add TensorFlow RN50 demo to the Sphinx documentation (#352)
  • Add rst doc for ssd pytorch example (#349)
  • Added SSD training example (#342)
  • Add base of VideoReader (#316)
  • Implement SequenceCrop Operator for CPU (#283)
  • Pytorch/MXNet plugin - use dictionary of categories (#282)
  • adding ifdef for jpeg turbo support (#341)
  • Remove NonConstRef check in cpplint.py (#340)
  • Add supported device by every operator to docs (#326)
  • Added cpu box encoder for SSD support (#325)
  • Align TensorFlow operator supported types with what DALI can provide (#332)
  • Sequence Reader for extracted frames (#281)
  • Add CPU operator for TensorFlow plugin with an example (#322)
  • Increase num_threads in TF RN example (#321)
  • Support for multiple labels in MXNet reader (#319)
  • Bbox crop label filtering (#320)
  • Add a wrapper for TensorFlow plugin to make pipeline serialization transparent (#310)
  • Added bounding box flipping on GPU. (#314)
  • Minimal changes for CPU CropMirrorNormalize (#257)
  • Added bounding box paste for CPU backend. (#294)
  • Add ability to return CPU TensorList as numpy array (#304)
  • Remove debug prints from async_pipelined_executor (#298)
  • Documentation Badge Added (#291)
  • TF Example: specify steps arg to tf.Estimator.evaluate for ending the evaluation (#293)
  • GPU version of RandomBBoxCrop and Slice (#269)
  • Printing the right error message in OperatorInstance init (#286)
  • Remove stat call during file discovery in the reader (#275)
  • Make libjpegturbo root dir hint preceding pkgconfig (#285)
  • Make Dali linking with static libprotobuf if possible (#284)
  • Make TensorFlow DALI operator able to return the arbitrary number of outputs (#265)

Breaking API changes

  • DALI TensorFlow operator has new API - please check examples for the reference
  • PyTorch and MXNet python iterators API has changed - please check examples for the reference

Known issues:

  • New Video reader operator requires NVIDIA VIDEO CODEC SDK support in the platform. NVIDIA GPU Cloud (NGC) optimized containers lack this functionality in the default configuration prior to 19.01. To enable it, run the container with the ‘video’ capability enabled, i.e.:
    -e "NVIDIA_DRIVER_CAPABILITIES=compute,utility,video"

Binary builds

Install via pip:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali==0.6.0

Or use direct download links:

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here
DALI - DALI v0.5.0

Published by JanuszL almost 6 years ago

Bug fixes

  • Fixed docstring of prefetch queue depth (#263)
  • Add checking if there is any supported jpeg inside batch for batch decode (#245)
  • Add enforce for num_shards > shard_id (#246)
  • Make jupyter example fully compatible with python3 (#233)
  • Add .clang-format for Google C++ style guide (#210)
  • Update MxNet version in the README (#204)
  • Fixed race condition in AsyncPipelinedExecutor destructor (#271)

Improvements

  • Increased seed size to int64 (#252)
  • SSD support for COCO reader (#196)
  • Move PyTorch example training pipeline to the CPU (#247)
  • Add version variable to init (#250)
  • Tiff decoding (#248)
  • Object orienting image module (#222)
  • Changing Tensor::ntensor() return type (#242)
  • Type safe reader with user-provided custom-type handling (#232)
  • Add pipelined execution completion callback setter (#226)
  • Add better errors in decoders (#218)
  • Make ABI test working with installed whl (#220)
  • Added new examples to online docs (#270)
  • Added Clang to Dockerfile.deps and pass CC and CXX as arguments (#264)
  • Added example demo for ResNet with TensorFlow and DALI (#251)
  • Remove unused private field (#205)

Breaking API changes

  • Random seed type changed from INT to INT64, therefore, serialized pipelines from versions prior to 0.5 are not compatible with the current DALI version.

Binary builds

Install via pip:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali==0.5.0

Or use direct download links:

DALI - DALI v0.4.1

Published by JanuszL almost 6 years ago

Bug fixes

  • Fixed TF 1.11 and TF 1.12 compatibility (#237)
  • Fixed PyTorch iterator for multi-GPU (#239)

Improvements

  • Made jupyter tests executing inplace (#255)
  • Removed hardcoded pipeline length in PipelinedExecutor (#239)
  • Adjusted PyTorch example to use new nvJpeg API (#239)
  • Remove double-buffering on the MXNet side (#258)

Binary builds

Install via pip:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali==0.4.1

Or use direct download links:

DALI - DALI v0.4.0

Published by JanuszL almost 6 years ago

Bug fixes

  • Fixed ability to use the same output from the support operator by CPU and GPU stage
  • Removed inconsistent-missing-override Clang warning (#197)
  • Fixed clang warnings in half.hpp and tests (#194)
  • Resolved conflicting build dirs (#189)
  • Removed the redundant imports and spaces in pytorch example (#190)
  • Fixed table in README.rst
  • Fixed reporting of the end of epoch in MXNet and pyTorch plugins (#180)
  • Fixed parsing of JPEG headers (#175)
  • Made the file reader assign classes to discovered dirs based on alphabetical order.
  • Fixed BMP size reading
  • Moved wait in multiple input sets case to the common place to guard against problem reoccurring in newly added ops
  • Removed batch_size_ from CoinFlip operator (#152)
  • Fixed corruption in MXNet reader when image is split between multiple records (#216)

Improvements

  • Added bounding box mirror operator (#188)
  • Added random crop for SSD (#176)
  • Added COCO dataset reader (#110)
  • Removed visibility of all non DALI symbols and test if ABI is clean (#191)
  • Added support for pad in MXNet plugin (#186)
  • Reduced memory usage (#195)
  • Made libprotobuf internal to DALI only (#179)
  • Added CUDA 10 based build (#178)
  • Made use epoch_size instead of hardcoded values (#174)
  • Added random paste operator (#105)
  • Added clang build (#163)
  • Added png in testing pipeline, add some of tiff routines
  • Made files be copied after every build, not only when libdali is rebuilt
  • Put common test code into one file
  • Upgraded OpenCV to 3.4.3 (#168)
  • Added color-twist operator (#164)
  • Changed MxNet to 1.3.0 no-beta (#183)
  • Added better sharding when number of shards does not divide the dataset size evenly (#181)
  • Updated google benchmark to v1.4.1 + several fixes (#182)
  • Added CPU versions of Crop/CropCastPermute operators (#148)
  • Added info about posting questions and problems
  • Updated PyTorch example to be aligned with the recent APEX release (#206)
  • Improved load balancing nvJPEG work (#217)
  • Updated nvJPEG to 0.2.0 version (#227)
  • Added fine grained control over output buffers in the pipeline (#212)
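
The sharding item above (#181) concerns datasets whose size is not a multiple of the shard count. One common way to split such a dataset is sketched below; this is an illustration of the general technique, not necessarily the exact formula DALI uses, and `shard_range` is a hypothetical helper.

```python
from typing import Tuple


def shard_range(dataset_size: int, num_shards: int, shard_id: int) -> Tuple[int, int]:
    """Half-open index range [start, end) of samples assigned to one shard."""
    if not 0 <= shard_id < num_shards:
        raise ValueError("shard_id must satisfy 0 <= shard_id < num_shards")
    # Integer scaling keeps shard sizes within one sample of each other
    # and covers every sample exactly once.
    start = dataset_size * shard_id // num_shards
    end = dataset_size * (shard_id + 1) // num_shards
    return start, end


# 10 samples over 3 shards.
print([shard_range(10, 3, i) for i in range(3)])
```

With this scheme no sample is dropped or duplicated, and the largest and smallest shards differ by at most one sample.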

Binary builds

Install via pip:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali==0.4.0

Or use direct download links:

DALI - DALI v0.3.0

Published by JanuszL about 6 years ago

Bug fixes

  • Adjusted PyTorch Dali pipeline to be similar to MXNet example (#107)
  • Add CPU fallback for BMP images and conscious fail for GIF (#124)
  • Enable FileReader shuffling for GPU0 (#134)
  • Fix squeeze for tensor with 1 element
  • Fix segfault in MXNetReader when given bad path to index file
  • Increase timeout, parametrize Python version in Jupyter tests (#126)
  • Fix segfault in Filereader if directory does not exist.
  • Update Workspace docstrings (#111)
  • Allow pkg_config to fail in the search for JpegTurbo
  • Fixed wrong rewind in TFRecord reader (#167)

Improvements

  • Added CPU version of Resize operator (#127)
  • Added Caffe reader to TF multi reader example (#103)
  • Added filtering extensions that FileReader can read (#137)
  • Made DALI understand float16 input from python
  • Added float16 as possible output type to python
  • Added flip operator (#130)
  • Added 'at' method to TensorListGPU (#131)
  • Refactored tests (#91)
  • Shortened git SHA in the Sphinx docs to 7 chars (#108)
  • Made files to be copied during build into build_dir. (#87)
  • Added links to GTC presentation to README
  • Reduced number of pinned memory allocations (#169)

Binary builds

Install via pip:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali==0.3.0

Or use direct download links:

DALI - DALI v0.2.0

Published by JanuszL about 6 years ago

Bug fixes

  • Avoid full construction of the pipeline during construction and fix seed support in serialized pipelines (#16)
  • Fix as_tensor not keeping the parent alive in Python (#60)
  • Fix for "invalid resource handle" in multi-gpu training
  • Fixes to PyTorch example. Need to reset DALI iterators between epochs. Putting model/loss computation back to default stream due to encountered memory access errors otherwise (#15)
  • Move example file_list to proper dir (#38)
  • Added fallback to host decoder when image is not JPEG but PNG instead (like n02105855_2933.JPEG from ImageNet) (#118)

Breaking API changes

  • The API for the Resize operator changed to match other similar operators like ResizeCropMirror.
  • The API for the TensorFlow plugin changed to allow specifying the whole shape of the tensor instead of N, H, and W separately; which enables handling both NCHW and NHWC outputs.
  • The type of labels produced by the TensorFlow plugin has changed. In DALI version 0.1.2, it was always tf.float32. In this release, a new optional parameter called label_type is introduced to the TensorFlow plugin to control the type of label. The default value for label_type is tf.int64 to better align with the label type in TFRecord.
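
To illustrate why a single whole-shape argument covers both layouts, the sketch below builds the shape tuple for a given layout string. `output_shape` is a hypothetical helper written for this note; the plugin's actual parameter names are not shown here.

```python
from typing import Tuple


def output_shape(n: int, c: int, h: int, w: int, layout: str = "NHWC") -> Tuple[int, ...]:
    """Build a full tensor shape for a layout given as a permutation of 'NCHW'."""
    if sorted(layout) != sorted("NCHW"):
        raise ValueError("layout must be a permutation of the letters N, C, H, W")
    dims = {"N": n, "C": c, "H": h, "W": w}
    return tuple(dims[axis] for axis in layout)


print(output_shape(32, 3, 224, 224, "NCHW"))
print(output_shape(32, 3, 224, 224, "NHWC"))
```

Passing the whole shape lets the caller express either ordering, which separate N, H, and W arguments could not.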

Improvements

  • Add NVTX ranges for Operators run (#73)
  • Add a note about NGC containers in README (#78)
  • Unfused Crop operator and CropCastPermute operator (#50)
  • Make build more restrictive Werror (#71)
  • Add links to docs in README (#72)
  • Expanded TF compatibility tests
  • Add example with multiple readers plugged into TF (#58)
  • Make pkg-config optional for CMake (#59)
  • Resize refactor (#63)
  • Add type casting in Python (#54)
  • Add check that third_party git submodules are synced
  • Add fallback in cmake when .pc file is not available for libjpeg-turbo (#49)
  • Sphinx documentation (#36)
  • Fix nvJpeg include dir (#47)
  • Add private attribute naming convention to Pipeline::current_seed_ (#46)
  • Add a shape argument for the output of the TF plugin (#45)
  • Bump up libturbo-jpeg version to 1.5.3 (#44)
  • Clean up dependencies list and dependency checks (#42)
  • Switch over completely to FindProtobuf.cmake from CMake 3.9.6 (#41)
  • Update README for prerequisites (#40)
  • Add error checking for file_list format in file_loader. (#37)
  • Add test support for various versions of pyTorch (#35)
  • Add polymorphism for TF plugin outputs (#33)
  • Add tensor layout checking (#32)
  • Avoid rebuilding *.cu files during 'make install' after 'make' (#25)
  • Add CUDA 8, OpenCV 2 support and options to disable libjpeg-turbo and nvJPEG (#22)
  • Add CONTRIBUTING.md file and updated contribution section in the README.md (#20)
  • Avoid full construction of the pipeline during construction and fix seed support in serialized pipelines (#16)
  • Add int64 as label type and set it as default (#125)

Binary builds

Install via pip:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali==0.2.0

Or use direct download links:

DALI - DALI v0.1.2

Published by cliffwoolley about 6 years ago

Bug fixes

  • Fix compatibility with TensorFlow 1.9 (#52)
  • Update to nvJPEG v0.1.2 to fix batched decoding when a batch contains both grayscale and color images (#79)

Improvements

  • Add Tensorflow 1.7 support (#24)
  • Better overlap when using DALI with multi-GPU in MXNet and pyTorch (#76)

Binary builds

Install via pip:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali==0.1.2

Or use direct download links:

DALI - DALI v0.1.1

Published by cliffwoolley over 6 years ago

Bug fixes

Improvements

  • Binary compatibility of the pre-built DALI binaries with pre-built DL frameworks is improved (https://github.com/NVIDIA/DALI/issues/13).
    • In support of this, most dependencies are now statically linked into the pre-built binaries, and the list of symbols exported from the shared objects is significantly reduced.
    • A beneficial side effect is that CUDA 9.0 Toolkit is no longer required to be installed to use pre-built binaries; only the corresponding NVIDIA Driver is required. This for example allows compatibility with a DL framework otherwise built against CUDA 9.1 or 9.2.

Binary builds

Install via pip:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali==0.1.1

Or use direct download links: