arrayfire

ArrayFire: a general purpose GPU library.

BSD-3-Clause License

Downloads: 2.8K | Stars: 4.5K | Committers: 85


arrayfire - v3.9.0 Latest Release

Published by umar456 about 1 year ago

v3.9.0

Improvements

  • Add oneAPI backend #3296
  • Add support to directly access arrays on other devices #3447
  • Add asynchronous reduce all functions that return an af_array #3199
  • Add broadcast support #2871 (see the example sketch after this list)
  • Improve OpenCL CPU JIT performance #3257 #3392
  • Optimize thread/block calculations of several kernels #3144
  • Add support for fast math compilation when building ArrayFire #3334 #3337
  • Optimize performance of fftconvolve when using floats #3338
  • Add support for CUDA 12.1 and 12.2
  • Better handling of empty arrays #3398
  • Better handling of memory in linear algebra functions in OpenCL #3423
  • Better logging with JIT kernels #3468
  • Optimize memory manager/JIT interactions for small number of buffers #3468
  • Documentation improvements #3485
  • Optimize reorder function #3488
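A minimal sketch of the new broadcast behavior (#2871), assuming arithmetic operators now expand size-1 dimensions automatically (previously this pattern required batchFunc):

    #include <arrayfire.h>

    int main() {
        af::array A   = af::randu(4, 3);   // 4x3 matrix
        af::array col = af::randu(4, 1);   // 4x1 column vector
        // With broadcast support, the size-1 second dimension of `col`
        // is assumed to expand across the 3 columns of A.
        af::array B = A + col;
        af_print(B);
        return 0;
    }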

Fixes

  • Improve error messages when creating OpenCL contexts from devices #3257
  • Improvements to vcpkg builds #3376 #3476
  • Fix reduce by key when NaNs are present #3261
  • Fix error in convolve where the ndims parameter was forced to be equal to 2 #3277
  • Make constructors that accept dim_t explicit to avoid invalid conversions #3259
  • Fix error in randu when compiling against clang 14 #3333
  • Fix bug in OpenCL linear algebra functions #3398
  • Fix bug with thread local variables when device was changed #3420 #3421
  • Fix bug in qr related to uninitialized memory #3422
  • Fix bug in shift where the array had an empty middle dimension #3488

Contributions

Special thanks to our contributors:
Willy Born
Mike Mullen

arrayfire - v3.8.3

Published by umar456 over 1 year ago

v3.8.3

Improvements

  • Add support for CUDA 12 #3352
  • Modernize documentation style and content #3351
  • memcpy performance improvements #3144
  • JIT performance improvements #3144
  • join performance improvements #3144
  • Improve support for Intel and newer Clang compilers #3334
  • CCache support on Windows #3257

Fixes

  • Fix issue with OpenCL kernel generation under some locales #3294
  • Internal improvements
  • Fix leak in clfft on exit.
  • Fix some cases where ndims was incorrectly used to calculate shape #3277
  • Fix issue when setDevice was not called in new threads #3269
  • Restrict initializer list to just fundamental types #3264

Contributions

Special thanks to our contributors:
Carlo Cabrera
Guillaume Schmid
Willy Born
ktdq

arrayfire - v3.8.2

Published by umar456 over 2 years ago

v3.8.2

Improvements

  • Optimize JIT by removing some consecutive cast operations #3031
  • Add driver checks for CUDA 11.5 and 11.6 #3203
  • Improve the timing algorithm used for timeit #3185
  • Dynamically link against CUDA numeric libraries by default #3205
  • Add support for pruning CUDA binaries to reduce static binary sizes #3234 #3237
  • Remove unused cuDNN libraries from installations #3235
  • Add support to statically link NVRTC libraries after CUDA 11.5 #3236
  • Add support for compiling with ccache when building the CUDA backend #3241

Fixes

  • Fix issue with consecutive moddims operations in the CPU backend #3232
  • Better floating point comparisons for tests #3212
  • Fix several warnings and inconsistencies with doxygen and documentation #3226
  • Fix issue when passing empty arrays into join #3211
  • Fix default value for the AF_COMPUTE_LIBRARY when not set #3228
  • Fix missing symbol issue when MKL is statically linked #3244
  • Remove linking of OpenCL's library to the unified backend #3244

Contributions

Special thanks to our contributors:
Jacob Kahn
Willy Born

arrayfire - v3.8.1 Release

Published by 9prady9 almost 3 years ago

v3.8.1

Improvements

  • moddims now uses JIT approach for certain special cases - #3177
  • Embed Version Info in Windows DLLs - #3025
  • OpenCL device max parameter is now queried from device properties - #3032
  • JIT Performance Optimization: Unique funcName generation sped up - #3040
  • Improved readability of log traces - #3050
  • Use short function name in non-debug build error messages - #3060
  • SIFT/GLOH are now available as part of website binaries - #3071
  • Short-circuit zero elements case in detail::copyArray backend function - #3059
  • Speedup of kernel caching mechanism - #3043
  • Add short-circuit check for empty Arrays in JIT evalNodes - #3072
  • Performance optimization of indexing using dynamic thread block sizes - #3111
  • Starting with this release, ArrayFire uses the Intel MKL single dynamic library, which resolves many of the linking issues the unified library had when user applications also used MKL - #3120
  • Add shortcut check for zero elements in af_write_array - #3130
  • Speedup join by eliminating temp buffers for cascading joins - #3145
  • Added batch support for solve - #1705 (see the example sketch after this list)
  • Use pinned memory to copy device pointers in CUDA solve - #1705
  • Added package manager instructions to docs - #3076
  • CMake Build Improvements - #3027 , #3089 , #3037 , #3072 , #3095 , #3096 , #3097 , #3102 , #3106 , #3105 , #3120 , #3136 , #3135 , #3137 , #3119 , #3150 , #3138 , #3156 , #3139 , #1705 , #3162
  • CPU backend improvements - #3010 , #3138 , #3161
  • CUDA backend improvements - #3066 , #3091 , #3093 , #3125 , #3143 , #3161
  • OpenCL backend improvements - #3091 , #3068 , #3127 , #3010 , #3039 , #3138 , #3161
  • General(including JIT) performance improvements across backends - #3167
  • Testing improvements - #3072 , #3131 , #3151 , #3141 , #3153 , #3152 , #3157 , #1705 , #3170 , #3167
  • Update CLBlast to latest version - #3135 , #3179
  • Improved Otsu threshold computation helper in canny algorithm - #3169
  • Modified default parameters for fftR2C and fftC2R C++ API from 0 to 1.0 - #3178
  • Use appropriate MKL getrs_batch_strided API based on MKL Versions - #3181
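A minimal sketch of the batched solve noted above (#1705), assuming af::solve() treats trailing dimensions as batch dimensions when A and b carry matching batches:

    #include <arrayfire.h>

    int main() {
        // 10 independent 3x3 systems solved in one call.
        af::array A = af::randu(3, 3, 10);
        af::array b = af::randu(3, 1, 10);
        // Assumption: with batch support, dim 2 is a batch dimension and the
        // result x has dims 3x1x10 with A(:,:,i) * x(:,:,i) = b(:,:,i).
        af::array x = af::solve(A, b);
        af_print(x);
        return 0;
    }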

Fixes

  • Fixed a bug in JIT kernel disk caching - #3182
  • Fixed stream used by thrust(CUDA backend) functions - #3029
  • Added workaround for the new cuSparse API that CUDA introduced in a point release - #3057
  • Fixed const array indexing inside gfor - #3078
  • Handle zero elements in copyData to host - #3059
  • Fixed double free regression in OpenCL backend - #3091
  • Fixed an infinite recursion bug in NaryNode JIT Node - #3072
  • Added missing input validation check in sparse-dense arithmetic operations - #3129
  • Fixed bug in getMappedPtr in OpenCL due to invalid lambda capture - #3163
  • Fixed bug in getMappedPtr on Arrays that are not ready - #3163
  • Fixed edgeTraceKernel for CPU devices on OpenCL backend - #3164
  • Fixed Windows build issues with VS2019 - #3048
  • API documentation fixes - #3075 , #3076 , #3143 , #3161
  • CMake Build Fixes - #3088
  • Fixed the tutorial link in README - #3033
  • Fixed function name typo in timing tutorial - #3028
  • Fixed couple of bugs in CPU backend canny implementation - #3169
  • Fixed the reference count of arrays used in JIT operations. This relates to ArrayFire's internal memory bookkeeping; behavior and accuracy were never affected, but the corrected counts may reduce memory usage in some narrow cases - #3167
  • Added assert that checks if topk is called with a negative value for k - #3176
  • Fixed an issue where countByKey would give incorrect results for any n > 128 - #3175

Contributions

Special thanks to our contributors: HO-COOH, Willy Born, Gilad Avidov, Pavan Yalamanchili

arrayfire - v3.8 Release

Published by umar456 almost 4 years ago

v3.8.0

New Functions

  • Ragged max reduction - #2786
  • Initialization list constructor for array class - #2829 , #2987 (see the example sketch after this list)
  • New API for the following statistics functions: cov, var and stdev - #2986
  • Bit-wise operator support for array and C API (af_bitnot) - #2865
  • allocV2 and freeV2 which return cl_mem on OpenCL backend - #2911
  • Move constructor and move assignment operator for Dim4 class - #2946
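A minimal sketch exercising two of these additions; it assumes the initializer-list constructor accepts a flat list of fundamental values and that the C++ operator~ maps to af_bitnot for integer arrays:

    #include <arrayfire.h>

    int main() {
        // Initializer-list constructor (#2829): assumed to build a
        // 4-element 1D array from the listed values.
        af::array a({1, 2, 4, 8});

        // Bit-wise NOT (#2865): af_bitnot in the C API; assumed to be
        // exposed as operator~ on integer-typed arrays in C++.
        af::array b = ~a.as(u32);
        af_print(b);
        return 0;
    }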

Improvements

  • Add f16 support for histogram - #2984
  • Update confidence connected components example for better illustration - #2968
  • Enable disk caching of OpenCL kernel binaries - #2970
  • Refactor extension of kernel binaries stored to disk .bin - #2970
  • Add minimum driver versions for CUDA toolkit 11 in internal map - #2982
  • Improve warning messages from run-time kernel compilation functions #2996

Fixes

  • Fix bias factor of variance in var_all and cov functions - #2986
  • Fix a race condition in confidence connected components function for OpenCL backend - #2969
  • Safely ignore disk cache failures in CUDA backend for compiled kernel binaries - #2970
  • Fix randn by passing in correct values to Box-Muller - #2980
  • Fix rounding issues in Box-Muller function used for RNG - #2980
  • Fix problems in RNG for older compute architectures with fp16 - #2980 #2996
  • Fix performance regression of approx functions - #2977
  • Remove assert that required the signal and filter types to be the same #2993
  • Fix checkAndSetDevMaxCompute when the device cc is greater than max - #2996
  • Fix documentation errors and warnings - #2973 , #2987
  • Add missing opencl-arrayfire interoperability functions in unified backend #2981

Contributions

Special thanks to our contributors: P. J. Reed

arrayfire - v3.7.3 Release

Published by 9prady9 almost 4 years ago

v3.7.3

Improvements

  • Add f16 support for histogram - #2984
  • Update confidence connected components example for better illustration - #2968
  • Enable disk caching of OpenCL kernel binaries - #2970
  • Refactor extension of kernel binaries stored to disk .bin - #2970
  • Add minimum driver versions for CUDA toolkit 11 in internal map - #2982
  • Improve warning messages from run-time kernel compilation functions - #2996

Fixes

  • Fix bias factor of variance in var_all and cov functions - #2986
  • Fix a race condition in confidence connected components function for OpenCL backend - #2969
  • Safely ignore disk cache failures in CUDA backend for compiled kernel binaries - #2970
  • Fix randn by passing in correct values to Box-Muller - #2980
  • Fix rounding issues in Box-Muller function used for RNG - #2980
  • Fix problems in RNG for older compute architectures with fp16 - #2980 #2996
  • Fix performance regression of approx functions - #2977
  • Remove assert that required the signal and filter types to be the same - #2993
  • Fix checkAndSetDevMaxCompute when the device cc is greater than max - #2996
  • Fix documentation errors and warnings - #2973 , #2987
  • Add missing opencl-arrayfire interoperability functions in unified backend - #2981
  • Fix constexpr-related compilation error with VS2019 and Clang compilers - #3049

Contributions

Special thanks to our contributors: P. J. Reed

arrayfire - v3.8 Release Candidate

Published by umar456 about 4 years ago

v3.8.0 Release Candidate

New Functions

  • Ragged max reduction - #2786
  • Initialization list constructor for array class - #2829 , #2987
  • New API for the following statistics functions: cov, var and stdev - #2986
  • Bit-wise operator support for array and C API (af_bitnot) - #2865
  • allocV2 and freeV2 which return cl_mem on OpenCL backend - #2911
  • Move constructor and move assignment operator for Dim4 class - #2946

Improvements

  • Add f16 support for histogram - #2984
  • Update confidence connected components example for better illustration - #2968
  • Enable disk caching of OpenCL kernel binaries - #2970
  • Refactor extension of kernel binaries stored to disk .bin - #2970
  • Add minimum driver versions for CUDA toolkit 11 in internal map - #2982
  • Improve warning messages from run-time kernel compilation functions - #2996

Fixes

  • Fix bias factor of variance in var_all and cov functions - #2986
  • Fix a race condition in confidence connected components function for OpenCL backend - #2969
  • Safely ignore disk cache failures in CUDA backend for compiled kernel binaries - #2970
  • Fix randn by passing in correct values to Box-Muller - #2980
  • Fix rounding issues in Box-Muller function used for RNG - #2980
  • Fix problems in RNG for older compute architectures with fp16 - #2980 #2996
  • Fix performance regression of approx functions - #2977
  • Remove assert that required the signal and filter types to be the same - #2993
  • Fix checkAndSetDevMaxCompute when the device cc is greater than max - #2996
  • Fix documentation errors and warnings - #2973 , #2987
  • Add missing opencl-arrayfire interoperability functions in unified backend - #2981

Contributions

Special thanks to our contributors: P. J. Reed

arrayfire - v3.7.2 Release

Published by 9prady9 over 4 years ago

v3.7.2

Improvements

  • Cache CUDA kernels to disk to improve load times (thanks to @cschreib-ibex) #2848
  • Statically link against CUDA libraries #2785
  • Make cuDNN an optional build dependency #2836
  • Improve support for different compilers and OS #2876 #2945 #2925 #2942 #2943 #2945
  • Improve performance of join and transpose on CPU #2849
  • Improve documentation #2816 #2821 #2846 #2918 #2928 #2947
  • Reduce binary size using NVRTC and by reducing template instantiations #2849 #2861 #2890
  • Improve reduceByKey performance on OpenCL by using builtin functions #2851
  • Improve support for Intel OpenCL GPUs #2855
  • Allow statically linking against MKL #2877 (sponsored by SDL)
  • Better support for older CUDA toolkits #2923
  • Add support for CUDA 11 #2939
  • Add support for ccache for faster builds #2931
  • Add support for the conan package manager on Linux #2875
  • Propagate build errors up the stack in AFError exceptions #2948 #2957
  • Improve runtime dependency library loading #2954
  • Improved cuDNN runtime checks and warnings #2960
  • Document af_memory_manager_* native memory return values #2911
  • Add support for cuDNN 8 #2963

Fixes

  • Fix crash when allocating large arrays #2827
  • Fix various compiler warnings #2827 #2849 #2872 #2876
  • Fix minor leaks in OpenCL functions #2913
  • Various continuous integration related fixes #2819
  • Fix zero padding with convolve2NN #2820
  • Fix af_get_memory_pressure_threshold return value #2831
  • Increased the max filter length for morph
  • Handle empty array inputs for LU, QR, and Rank functions #2838
  • Fix FindMKL.cmake script for sequential threading library #2840
  • Various internal refactoring #2839 #2861 #2864 #2873 #2890 #2891 #2913
  • Fix OpenCL 2.0 builtin function name conflict #2851
  • Fix error caused when releasing memory with multiple devices #2867
  • Fix missing set stacktrace symbol from unified API #2915
  • Fixed bugs in ReduceByKey #2957
  • Add clblast patch to handle custom context with multiple devices #2967

Contributions

Special thanks to our contributors:
Corentin Schreiber
Jacob Kahn
Paul Jurczak
Christoph Junghans

arrayfire - v3.7.1 Release

Published by umar456 over 4 years ago

v3.7.1

Improvements

  • Improve mtx download for test data #2742
  • Improve Documentation #2754 #2792 #2797
  • Remove verbose messages in older CMake versions #2773
  • Reduce binary size with the use of NVRTC #2790
  • Use texture memory to load LUT in orb and fast #2791
  • Add missing print function for f16 #2784
  • Add checks for f16 support in the CUDA backend #2784
  • Create a thrust policy to intercept temporary buffer allocations #2806

Fixes

  • Fix segfault on exit when ArrayFire is not initialized in the main thread
  • Fix support for CMake 3.5.1 #2771 #2772 #2760
  • Fix evalMultiple if the input array sizes aren't the same #2766
  • Fix error when AF_BACKEND_DEFAULT is passed directly to backend #2769
  • Workaround name collision with AMD OpenCL implementation #2802
  • Fix on-exit errors with the unified backend #2769
  • Fix check for f16 compatibility in OpenCL #2773
  • Fix matmul on Intel OpenCL when passing same array as input #2774
  • Fix CPU OpenCL blas batching #2774
  • Fix memory pressure in the default memory manager #2801

Contributions

Special thanks to our contributors:
padentomasello
glavaux2

arrayfire - v3.7 Release

Published by umar456 over 4 years ago

v3.7.0

Major Updates

  • Added the ability to customize the memory manager (thanks to jacobkahn and flashlight) [#2461]
  • Added 16-bit floating point support for several functions [#2413] [#2587] [#2585] [#2587] [#2583]
  • Added sumByKey, productByKey, minByKey, maxByKey, allTrueByKey, anyTrueByKey, countByKey [#2254] (see the example sketch after this list)
  • Added confidence connected components [#2748]
  • Added neural network based convolution and gradient functions [#2359]
  • Added a padding function [#2682]
  • Added pinverse for pseudo inverse [#2279]
  • Added support for uniform ranges in approx1 and approx2 functions. [#2297]
  • Added support to write to preallocated arrays for some functions [#2599] [#2481] [#2328] [#2327]
  • Added meanvar function [#2258]
  • Added sparse-sparse arithmetic support [#2312]
  • Added rsqrt function for reciprocal square root [#2500]
  • Added a lower level af_gemm function for general matrix multiplication [#2481]
  • Added a function to set the cuBLAS math mode for the CUDA backend [#2584]
  • Separate debug symbols into separate files [#2535]
  • Print stacktraces on errors [#2632]
  • Support move constructor for af::array [#2595]
  • Expose events in the public API [#2461]
  • Add setAxesLabelFormat to format labels on graphs [#2495]
  • Added deconvolution functions [#1881]
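A minimal sketch of the new reduce-by-key functions (#2254), assuming the sumByKey(keys_out, vals_out, keys, vals) overload that reduces runs of equal keys:

    #include <arrayfire.h>

    int main() {
        int   hkeys[] = {0, 0, 1, 1, 1, 2};
        float hvals[] = {1, 2, 3, 4, 5, 6};
        af::array keys(6, hkeys);   // s32 keys
        af::array vals(6, hvals);   // f32 values

        af::array okeys, ovals;
        // Sums consecutive values that share the same key.
        af::sumByKey(okeys, ovals, keys, vals);
        af_print(okeys);  // expected keys: 0, 1, 2
        af_print(ovals);  // expected sums: 3, 12, 6
        return 0;
    }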

Improvements

  • Better error messages for systems with driver or device incompatibilities [#2678] [#2448][#2761]
  • Optimized unified backend function calls [#2695]
  • Optimized anisotropic smoothing [#2713]
  • Optimized canny filter for CUDA and OpenCL [#2727]
  • Better MKL search script [#2738][#2743][#2745]
  • Better logging of different submodules in ArrayFire [#2670] [#2669]
  • Improve documentation [#2665] [#2620] [#2615] [#2639] [#2628] [#2633] [#2622] [#2617] [#2558] [#2326][#2515]
  • Optimized af::array assignment [#2575]
  • Update the k-means example to display the result [#2521]

Fixes

  • Fix multi-config generators [#2736]
  • Fix access errors in canny [#2727]
  • Fix segfault in the unified backend if no backends are available [#2720]
  • Fix access errors in scan-by-key [#2693]
  • Fix sobel operator [#2600]
  • Fix an issue with the random number generator and s16 [#2587]
  • Fix issue with boolean product reduction [#2544]
  • Fix array_proxy move constructor [#2537]
  • Fix convolve3 launch configuration [#2519]
  • Fix an issue where the fft function modified the input array [#2520]
  • Added a workaround for the NVIDIA OpenCL runtime when Forge dependencies are missing [#2761]

Contributions

Special thanks to our contributors:
@jacobkahn
@WilliamTambellini
@lehins
@r-barnes
@gaika
@ShalokShalom

arrayfire - v3.6.4 Release

Published by umar456 over 5 years ago

v3.6.4

The source code with sub-modules can be downloaded directly from the following link:

http://arrayfire.com/arrayfire_source/arrayfire-full-3.6.4.tar.bz2

Fixes

  • Address a JIT performance regression due to moving kernel arguments to shared memory #2501
  • Fix the default parameter for setAxisTitle #2491

arrayfire - v3.6.3 Release

Published by 9prady9 over 5 years ago

v3.6.3

The source code with sub-modules can be downloaded directly from the following link:

http://arrayfire.com/arrayfire_source/arrayfire-full-3.6.3.tar.bz2

Improvements

  • Graphics are now a runtime dependency instead of a link time dependency #2365
  • Reduce the CUDA backend binary size using runtime compilation of kernels #2437
  • Improved batched matrix multiplication on the CPU backend by using Intel MKL's cblas_Xgemm_batched #2206
  • Print JIT kernels to disk or stream using the AF_JIT_KERNEL_TRACE environment variable #2404
  • void* pointers are now allowed as arguments to af::array::write() #2367 (see the example sketch after this list)
  • Slightly improve the efficiency of JITed tile operations #2472
  • Make random number generation on the CPU backend consistent with CUDA and OpenCL #2435
  • Handled very large JIT tree generations #2484 #2487
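A minimal sketch of writing host data into an existing array, assuming the void* overload of af::array::write() added in #2367:

    #include <arrayfire.h>
    #include <vector>

    int main() {
        std::vector<float> host(16, 1.0f);
        af::array a(4, 4);  // f32 by default, uninitialized

        // Assumption: since #2367 write() also accepts a void* source; the
        // byte count must match the array's storage size.
        void *raw = host.data();
        a.write(raw, host.size() * sizeof(float), afHost);
        af_print(a);
        return 0;
    }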

Bug Fixes

  • Fixed af::array::array_proxy move assignment operator #2479
  • Fixed input array dimensions validation in svdInplace() #2331
  • Fixed the typedef declaration for window resource handle #2357.
  • Increase compatibility with GCC 8 #2379
  • Fixed af::write tests #2380
  • Fixed a bug in broadcast step of 1D exclusive scan #2366
  • Fixed OpenGL related build errors on OSX #2382
  • Fixed multiple array evaluation, improving performance #2384
  • Fixed buffer overflow and expected output of kNN SSD small test #2445
  • Fixed MKL linking order to enable threaded BLAS #2444
  • Added validations for forge module plugin availability before calling resource cleanup #2443
  • Improve compatibility on MSVC toolchain (_MSC_VER > 1914) with the CUDA backend #2443
  • Fixed BLAS gemm func generators for newest MSVC 19 on VS 2017 #2464
  • Fix errors on exit when using the CUDA backend with the unified backend #2470

Documentation

  • Updated svdInplace() documentation following a bugfix #2331
  • Fixed a typo in matrix multiplication documentation #2358
  • Fixed a code snippet demonstrating C-API use #2406
  • Updated hamming matcher implementation limitation #2434
  • Added illustration for the rotate function #2453

Misc

  • Use cudaMemcpyAsync instead of cudaMemcpy throughout the codebase #2362
  • Display a more informative error message if CUDA driver is incompatible #2421 #2448
  • Changed forge resource management to use smart pointers #2452
  • Deprecated intl and uintl typedefs in API #2360
  • Enabled graphics by default for all builds starting with v3.6.3 #2365
  • Fixed several warnings #2344 #2356 #2361
  • Refactored initArray() calls to use createEmptyArray(). initArray() is for internal use only by Array class. #2361
  • Refactored void* memory allocations to use unsigned char type #2459
  • Replaced deprecated MKL API with in-house implementations for sparse to sparse/dense conversions #2312
  • Reorganized and fixed some internal backend API #2356
  • Updated compilation order of CUDA files to speed up compile time #2368
  • Removed conditional graphics support builds after enabling runtime loading of graphics dependencies #2365
  • Marked graphics dependencies as optional in CPack RPM config #2365
  • Refactored a sparse arithmetic backend API #2379
  • Fixed const correctness of af_device_array API #2396
  • Update Forge to v1.0.4 #2466
  • Manage Forge resources from the DeviceManager class #2381
  • Fixed non-mkl & non-batch blas upstream call arguments #2401
  • Link MKL with OpenMP instead of TBB by default
  • Use clang-format to format source code

Contributions

Special thanks to our contributors:
Alessandro Bessi
zhihaoy
Jacob Kahn
William Tambellini

arrayfire - v3.6.2

Published by syurkevi almost 6 years ago

v3.6.2

The source code with sub-modules can be downloaded directly from the following link:

http://arrayfire.com/arrayfire_source/arrayfire-full-3.6.2.tar.bz2

Features

  • Batching support for cond argument in select() [#2243] (see the example sketch after this list)
  • Broadcast batching for matmul [#2315]
  • Add support for multiple nearest neighbours from nearestNeighbour() [#2280]
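A minimal sketch of select() with a batched condition (#2243), assuming the cond array may carry a size-1 dimension that is batched against a and b:

    #include <arrayfire.h>

    int main() {
        af::array a = af::randu(4, 3);
        af::array b = af::constant(0, 4, 3);
        // A 4x1 condition, assumed to be batched across the 3 columns.
        af::array cond = af::randu(4, 1) > 0.5;
        af::array out = af::select(cond, a, b);
        af_print(out);
        return 0;
    }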

Improvements

  • Performance improvements in morph() [#2238]
  • Fix linking errors when compiling without Freeimage/Graphics [#2248]
  • Fixes to improve the usage of ArrayFire as a sub-project [#2290]
  • Allow custom library path for loading dynamic backend libraries [#2302]

Bug fixes

  • Fix overflow in dim4::ndims. [#2289]
  • Remove setDevice from af::array destructor [#2319]
  • Fix pow precision for integral types [#2305]
  • Fix issues with tile with a large repeat dimension [#2307]
  • Fix grid based indexing calculation in af_draw_hist [#2230]
  • Fix bug when using an af::array for indexing [#2311]
  • Fix CLBlast errors on exit on Windows [#2222]

Documentation

  • Improve unwrap documentation [#2301]
  • Improve wrap documentation [#2320]
  • Fix and improve accum documentation [#2298]
  • Improve tile documentation [#2293]
  • Clarify approx* indexing in documentation [#2287]
  • Update examples of select in detailed documentation [#2277]
  • Update lookup examples [#2288]
  • Update set documentation [#2299]

Misc

  • New ArrayFire ASSERT utility functions [#2249][#2256][#2257][#2263]
  • Improve error messages in JIT [#2309]
  • af* library and dependencies directory changed to lib64 [#2186]

Contributions

Thank you to our contributors:
Jacob Kahn
Vardan Akopian

arrayfire - v3.6.1 Release

Published by umar456 over 6 years ago

v3.6.1

The source code for this release can be downloaded here:
http://arrayfire.com/arrayfire_source/arrayfire-full-3.6.1.tar.bz2

Improvements

  • FreeImage is now a run-time dependency [#2164]
  • Reduced binary size by setting the symbol visibility to hidden [#2168]
  • Add logging to memory manager and unified loader using the AF_TRACE environment variable [#2169][#2216]
  • Improved CPU Anisotropic Diffusion performance [#2174]
  • Perform normalization after FFT for improved accuracy [#2185, #2192]
  • Updated CLBlast to v1.4.0 [#2178]
  • Added additional validation when using af::seq for indexing [#2153]
  • Perform checks for unsupported cards by the CUDA implementation [#2182]
  • Avoid selecting backend if no devices are found. [#2218]

Bug Fixes

  • Fixed region when all pixels were the foreground or background [#2152]
  • Fixed several memory leaks [#2202, #2201, #2180, #2179, #2177, #2175]
  • Fixed bug in setDevice which didn't allow you to select the last device [#2189]
  • Fixed bug in min/max where the first element of the array was a NaN value [#2155]
  • Fixed graphics window indexing [#2207]
  • Fixed renaming issue when installing cuda libraries on OSX [#2221]
  • Fixed NSIS installer PATH variable [#2223]

arrayfire - Feature Release v3.6.0

Published by mlloreda over 6 years ago

v3.6.0

The source code with submodules can be downloaded directly from the following link:
http://arrayfire.com/arrayfire_source/arrayfire-full-3.6.0.tar.bz2

Major Updates

  • Added the topk() function.
  • Added batched matrix multiply support.
  • Added anisotropic diffusion, anisotropicDiffusion().

Features

  • Added support for batched matrix multiply.
  • New anisotropic diffusion function, anisotropicDiffusion().
  • New topk() function, which returns the top k elements along a given dimension of the input (see the example sketch after this list).
  • New gradient diffusion example.
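A minimal sketch of topk(), assuming the topk(values, indices, in, k) overload that returns the k largest elements along the first dimension by default:

    #include <arrayfire.h>

    int main() {
        af::array in = af::randu(100);
        af::array vals, idx;
        // Retrieve the 5 largest elements and their positions.
        af::topk(vals, idx, in, 5);
        af_print(vals);
        af_print(idx);
        return 0;
    }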

Improvements

  • JITed select() and shift() functions for CUDA and OpenCL backends.
  • Significant CMake improvements.
  • Improved the quality of the random number generator.
  • Corrected assert function calls in select() tests.
  • Modified af_colormap struct to match forge's definition.
  • Improved Black Scholes example.
  • Used CPack to generate installers, beginning with this release.
  • Refactored black_scholes_options example to use the built-in af::erfc function for the cumulative normal distribution.
  • Reduced the scope of mutexes in the memory manager.
  • Official installers do not require the CUDA toolkit to be installed starting with v3.6.0.

Bug fixes

  • Fixed shfl_down() warnings with CUDA 9.
  • Disabled CUDA JIT debug flags on ARM architecture.
  • Fixed CLBlast install lib dir on Linux platforms where the lib directory has an arch (64) suffix.
  • Fixed assert condition in 3D morph OpenCL kernel.
  • Fixed JIT errors with large non-linear kernels.
  • Fixed bug in CPU JIT after moddims was called.
  • Fixed a deadlock scenario caused by the method MemoryManager::nativeFree.

Documentation

  • Fixed variable name typo in vectorization.md.
  • Fixed AF_API_VERSION value in Doxygen config file.

Known issues

  • NVCC does not currently support platform toolset v141 (Visual Studio 2017 R15.6). Use the v140 platform toolset instead; you may pass the toolset version to CMake via the -T flag, e.g. cmake -G "Visual Studio 15 2017 Win64" -T v140.
  • Several OpenCL tests failing on OSX:
    • canny_opencl, fft_opencl, gen_assign_opencl, homography_opencl, reduce_opencl, scan_by_key_opencl, solve_dense_opencl, sparse_arith_opencl, sparse_convert_opencl, where_opencl

Contributions

Special thanks to our contributors:
Adrien F. Vincent, Cedric Nugteren, Felix, Filip Matzner, HoneyPatouceul, Patrick Lavin, Ralf Stubner, William Tambellini


arrayfire - First Bugfix Release for v3.5

Published by mlloreda about 7 years ago

v3.5.1

The source code with submodules can be downloaded directly from the following
link: http://arrayfire.com/arrayfire_source/arrayfire-full-3.5.1.tar.bz2

Installer CUDA Version: 8.0 (Required)
Installer OpenCL Version: 1.2 (Minimum)

Improvements

  • Relaxed af::unwrap() function's arguments.
  • Changed behavior of af::array::allocated() to specify memory allocated.
  • Removed restriction on the number of bins for af::histogram() on CUDA and
    OpenCL kernels.

Performance

  • Improved JIT performance.
  • Improved CPU element-wise operation performance.
  • Improved regions performance using texture objects.

Bug fixes

  • Fixed overflow issues in mean.
  • Fixed memory leak when chaining indexing operations.
  • Fixed bug in array assignment when using an empty array to index.
  • Fixed bug with af::matmul() which occurred when its RHS argument was an
    indexed vector.
  • Fixed a deadlock bug when a sparse array was used with a JIT array.
  • Fixed pixel tests for FAST kernels.
  • Fixed af::replace so that it is now copy-on-write.
  • Fixed launch configuration issues in CUDA JIT.
  • Fixed segfaults and "Pure Virtual Call" error warnings when exiting on
    Windows.
  • Workaround for clEnqueueReadBuffer bug on OSX.

Build

  • Fixed issues when compiling with GCC 7.1.
  • Eliminated unnecessary Boost dependency from CPU and CUDA backends.

Misc

  • Updated support links to point to Slack instead of Gitter.

arrayfire - Feature Release v3.5.0

Published by mlloreda over 7 years ago

v3.5.0

The source code with submodules can be downloaded directly from the following link:
http://arrayfire.com/arrayfire_source/arrayfire-full-3.5.0.tar.bz2

Installer CUDA Version: 8.0 (Required)
Installer OpenCL Version: 1.2 (Minimum)

Major Updates

  • ArrayFire now supports threaded applications.
  • Added Canny edge detector.
  • Added Sparse-Dense arithmetic operations.

Features

  • ArrayFire Threading
    • af::array can be read by multiple threads
    • All ArrayFire functions can be executed concurrently by multiple threads
    • Threads can operate on different devices to simplify multi-device workloads
  • New Canny edge detector function, af::canny() (see the example sketch after this list).
    • Can automatically calculate the high threshold with AF_CANNY_THRESHOLD_AUTO_OTSU
    • Supports both L1 and L2 norms to calculate gradients
  • New tuned OpenCL BLAS backend, CLBlast.
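A minimal sketch of af::canny() using the Otsu-based automatic high threshold mentioned above; the low-threshold ratio and Sobel window size below are illustrative assumptions:

    #include <arrayfire.h>

    int main() {
        af::array img = af::randu(480, 640);  // stand-in for a grayscale image
        // With AF_CANNY_THRESHOLD_AUTO_OTSU the high threshold is computed
        // automatically; 0.1f is an assumed low-threshold ratio, 3 the Sobel
        // window size, and false selects the non-fast (more accurate) gradient.
        af::array edges = af::canny(img, AF_CANNY_THRESHOLD_AUTO_OTSU, 0.1f, 0.3f, 3, false);
        af_print(edges);
        return 0;
    }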

Improvements

  • Converted CUDA JIT to use NVRTC instead of NVVM.
  • Performance improvements in af::reorder().
  • Performance improvements in array::scalar().
  • Improved unified backend performance.
  • ArrayFire now depends on Forge v1.0.
  • Can now specify the FFT plan cache size using the af::setFFTPlanCacheSize() function.
  • Get the number of physical bytes allocated by the memory manager with af_get_allocated_bytes().
  • af::dot() can now return a scalar value to the host.

Bug Fixes

  • Fixed improper release of default Mersenne random engine.
  • Fixed af::randu() and af::randn() ranges for floating point types.
  • Fixed assignment bug in CPU backend.
  • Fixed complex (c32, c64) multiplication in OpenCL convolution kernels.
  • Fixed inconsistent behavior with af::replace() and replace_scalar().
  • Fixed memory leak in af_fir().
  • Fixed memory leaks in af_cast for sparse arrays.
  • Fixed correctness of af_pow for complex numbers by using Cartesian form.
  • Corrected af::select() with indexing in CUDA and OpenCL backends.
  • Workaround for VS2015 compiler ternary bug.
  • Fixed memory corruption in cuda::findPlan().
  • Argument checks in af_create_sparse_array now reject inputs of type int64.

Build fixes

  • On OSX, utilize new GLFW package from the brew package manager.
  • Fixed CUDA PTX names generated by CMake v3.7.
  • Support gcc > 5.x for CUDA.

Examples

  • New genetic algorithm example.

Documentation

  • Updated README.md to improve readability and formatting.
  • Updated README.md to mention Julia and Nim wrappers.
  • Improved installation instructions - docs/pages/install.md.

Miscellaneous

  • A few improvements for ROCm support.
  • Removed CUDA 6.5 support.

Known issues

  • Windows
    • The Windows NVIDIA driver version 37x.xx contains a bug which causes fftconvolve_opencl to fail. Upgrade or downgrade to a different version of the driver to avoid this failure.
    • The following tests fail on Windows with NVIDIA hardware: threading_cuda, qr_dense_opencl, solve_dense_opencl.
  • macOS
    • The Accelerate framework, used by the CPU backend on macOS, leverages Intel graphics cards (Iris) when there are no discrete GPUs available. This OpenCL implementation is known to give incorrect results on the following tests: lu_dense_{cpu,opencl}, solve_dense_{cpu,opencl}, inverse_dense_{cpu,opencl}.
    • Certain tests intermittently fail on macOS with NVIDIA GPUs apparently due to inconsistent driver behavior: fft_large_cuda and svd_dense_cuda.
    • The following tests are currently failing on macOS with AMD GPUs: cholesky_dense_opencl and scan_by_key_opencl.

arrayfire - Second Bugfix Release for v3.4

Published by shehzan10 almost 8 years ago

v3.4.2

The source code with submodules can be downloaded directly from the following link:
http://arrayfire.com/arrayfire_source/arrayfire-full-3.4.2.tar.bz2

Installer CUDA Version: 8.0 (Required)
Installer OpenCL Version: 1.2 (Minimum)

Deprecation Announcement

This release supports CUDA 6.5 and higher. The next ArrayFire release will
support CUDA 7.0 and higher, dropping support for CUDA 6.5. Reasons for no
longer supporting CUDA 6.5 include:

  • CUDA 7.0 NVCC supports the C++11 standard (whereas CUDA 6.5 does not), which
    is used by ArrayFire's CPU and OpenCL backends.
  • Very few ArrayFire users still use CUDA 6.5.

As a result, the older Jetson TK1 / Tegra K1 will no longer be supported in
the next ArrayFire release. The newer Jetson TX1 / Tegra X1 will continue to
have full capability with ArrayFire.

Docker

Improvements

  • Implemented sparse storage format conversions between AF_STORAGE_CSR
    and AF_STORAGE_COO (see the example sketch after this list).
    • Directly convert between AF_STORAGE_COO <--> AF_STORAGE_CSR
      using the af::sparseConvertTo() function.
    • af::sparseConvertTo() now also supports converting to dense.
  • Added cast support for sparse arrays.
    • Casting only changes the values array and the type. The row and column
      index arrays are not changed.
  • Reintroduced automated computation of chart axes limits for graphics functions.
    • The axes limits will always be the minimum/maximum of the current and new
      limit.
    • The user can still set limits from API calls. If the user sets a limit
      from the API call, then the automatic limit setting will be disabled.
  • Using boost::scoped_array instead of boost::scoped_ptr when managing
    array resources.
  • Internal performance improvements to getInfo() by using const references
    to avoid unnecessary copying of ArrayInfo objects.
  • Added support for scalar af::array inputs for af::convolve() and
    set functions.
  • Performance fixes in af::fftConvolve() kernels.
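A minimal sketch of the storage conversion described above, assuming af::sparseConvertTo(in, storage) together with the af::sparse() creation routine from dense:

    #include <arrayfire.h>

    int main() {
        af::array dense = af::randu(5, 5);
        dense = dense * (dense > 0.7);          // zero out most entries

        // Create a sparse array in COO storage from the dense matrix,
        // then convert it to CSR with sparseConvertTo().
        af::array coo = af::sparse(dense, AF_STORAGE_COO);
        af::array csr = af::sparseConvertTo(coo, AF_STORAGE_CSR);

        // Converting back to dense is also supported.
        af::array back = af::sparseConvertTo(csr, AF_STORAGE_DENSE);
        af_print(back);
        return 0;
    }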

Build

  • Support for Visual Studio 2015 compilation.
  • Fixed FindCBLAS.cmake when PkgConfig is used.

Bug fixes

  • Fixes to JIT when the tree is large.
  • Fixed indexing bug when converting dense to sparse af::array as
    AF_STORAGE_COO.
  • Fixed af::bilateral() OpenCL kernel compilation on OS X.
  • Fixed memory leak in af::regions() (CPU) and af::rgb2ycbcr().

Installers

  • Major OS X installer fixes.
    • Fixed installation scripts.
    • Fixed installation symlinks for libraries.
  • Windows installer now ships with more pre-built examples.

Examples

  • Added af::choleskyInPlace() calls to cholesky.cpp example.

Documentation

  • Added u8 as supported data type in getting_started.md.
  • Fixed typos.

CUDA 8 on OSX

Known Issues

  • Known failures with CUDA 6.5. These include all functions that use
    sorting. As a result, sparse storage format conversion between
    AF_STORAGE_COO and AF_STORAGE_CSR has been disabled for CUDA 6.5.

arrayfire - First Bugfix Release for v3.4

Published by shehzan10 about 8 years ago

v3.4.1

The source code with submodules can be downloaded directly from the following link:
http://arrayfire.com/arrayfire_source/arrayfire-full-3.4.1.tar.bz2

Installer CUDA Version: 8.0 (Required)
Installer OpenCL Version: 1.2 (Minimum)

Installers

  • Installers for Linux, OS X and Windows
    • CUDA backend now uses CUDA 8.0.
    • Uses Intel MKL 2017.
    • CUDA Compute 2.x (Fermi) is no longer compiled into the library.
  • Installer for OS X
    • The libraries shipping in the OS X Installer are now compiled with Apple
      Clang v7.3.1 (previously v6.1.0).
    • The OS X version used is 10.11.6 (previously 10.10.5).
  • Installer for Jetson TX1 / Tegra X1
    • Requires JetPack for L4T 2.3
      (containing Linux for Tegra r24.2 for TX1).
    • CUDA backend now uses CUDA 8.0 64-bit.
    • Using CUDA's cusolver instead of CPU fallback.
    • Uses OpenBLAS for CPU BLAS.
    • All ArrayFire libraries are now 64-bit.

Improvements

  • Add sparse array support to af::eval().
  • Add OpenCL-CPU fallback support for sparse af::matmul() when running on
    a unified memory device. Uses MKL Sparse BLAS.
  • When using CUDA libdevice, pick the correct compute version based on device.
  • OpenCL FFT now also supports prime factors 7, 11 and 13.

Bug Fixes

  • Allow CUDA libdevice to be detected from custom directory.
  • Fix aarch64 detection on Jetson TX1 64-bit OS.
  • Add missing definition of af_set_fft_plan_cache_size in unified backend.
  • Fix initial values for af::min() and af::max() operations.
  • Fix distance calculation in af::nearestNeighbour for CUDA and OpenCL backends.
  • Fix OpenCL bug where scalars were passed incorrectly to compile options.
  • Fix bug in af::Window::surface() with respect to dimensions and ranges.
  • Fix possible double free corruption in af_assign_seq().
  • Add missing eval for key in af::scanByKey in CPU backend.
  • Fixed creation of sparse values array using AF_STORAGE_COO.

Examples

  • Add a Conjugate Gradient solver example
    to demonstrate sparse and dense matrix operations.

CUDA Backend

  • When using CUDA 8.0, compute 2.x is no longer in the default compute list.
    • This follows CUDA 8.0 deprecating computes 2.x.
    • Default computes for CUDA 8.0 will be 30, 50, 60.
  • When using CUDA pre-8.0, the default selection remains 20, 30, 50.
  • CUDA backend now uses -arch=sm_30 for PTX compilation by default,
    unless compute 2.0 is enabled.

Known Issues

  • af::lu() on CPU is known to give incorrect results when built and run on
    OS X 10.11 or 10.12 and compiled with the Accelerate Framework.
    • Since the OS X Installer libraries use MKL rather than the Accelerate
      Framework, this issue does not affect those libraries.

arrayfire - Feature Release v3.4.0

Published by shehzan10 about 8 years ago

v3.4.0

The source code with submodules can be downloaded directly from the following link:
http://arrayfire.com/arrayfire_source/arrayfire-full-3.4.0.tar.bz2

Installer CUDA Version: 7.5 (Required)
Installer OpenCL Version: 1.2 (Minimum)

Major Updates

  • Sparse Matrix and BLAS.
  • Faster JIT for CUDA and OpenCL.
  • Support for random number generator engines.
  • Improvements to graphics.

Features

  • Sparse Matrix and BLAS
    • Support for CSR (AF_STORAGE_CSR) and COO (AF_STORAGE_COO) storage types.
    • Sparse-dense matrix multiplication and matrix-vector multiplication as
      part of af::matmul() using the AF_STORAGE_CSR format for sparse inputs.
    • Conversion between dense (AF_STORAGE_DENSE) matrices and the CSR
      (AF_STORAGE_CSR) and COO (AF_STORAGE_COO) storage types.
  • Faster JIT
    • Performance improvements for CUDA and OpenCL JIT functions.
    • Support for evaluating multiple outputs in a single kernel. See af::array::eval() for more.
  • Random Number Generation (see the example sketch after this list)
    • af::randomEngine(): A random engine class to handle setting the type and seed
      for random number generator engines.
    • Supported engine types are Philox, Threefry, and Mersenne Twister.
  • Graphics
    • Using Forge v0.9.0
    • Vector field plotting functionality (af::Window::vectorField).
    • Removed GLEW and replaced it with glbinding.
      • Removed usage of GLEW after support for MX (multithreaded) was dropped in v2.0.
    • Multiple overlays on the same window are now possible.
      • Overlays are supported for the same type of object (2D/3D).
      • Supported by af::Window::plot, af::Window::hist, af::Window::surface,
        af::Window::vectorField.
    • New API to set axes limits for graphs.
      • Draw calls do not automatically compute the limits. This is now under user control.
      • af::Window::setAxesLimits can be used to set axes limits automatically or manually.
      • af::Window::setAxesTitles can be used to set axes titles.
    • New API for plot and scatter:
      • af::Window::plot() and af::Window::scatter() can now handle 2D and 3D data and determine the appropriate order.
      • af_draw_plot_nd()
      • af_draw_plot_2d()
      • af_draw_plot_3d()
      • af_draw_scatter_nd()
      • af_draw_scatter_2d()
      • af_draw_scatter_3d()
  • New interpolation methods (af_interp_type)
    • Applies to
      • af::resize()
      • af::transform()
      • af::approx1()
      • af::approx2()
  • Support for complex mathematical functions
    • Add complex support for trigonometric functions, af::sqrt(), af::log().
  • af::medfilt1(): Median filter for 1-d signals
  • Generalized scan functions: scan and scanByKey
    • Now support inclusive or exclusive scans
    • Support binary operations defined by af_binary_op.
  • Image moments functions
  • Add af::getSizeOf() function for af_dtype
  • Explicitly instantiate af::array::device() for void*
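A minimal sketch of the random engine API introduced here, assuming the af::randomEngine(type, seed) constructor and the af::randu overload that accepts an engine:

    #include <arrayfire.h>

    int main() {
        // Philox engine with a fixed seed; THREEFRY and MERSENNE are the
        // other engine types named above.
        af::randomEngine eng(AF_RANDOM_ENGINE_PHILOX, 42);

        // Draw uniform random numbers using this engine rather than the
        // global default generator.
        af::array r = af::randu(af::dim4(3, 3), f32, eng);
        af_print(r);
        return 0;
    }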

Bug Fixes

  • Fixes to edge cases in the morphological functions.
  • Makes JIT tree size consistent between devices.
  • Delegate higher dimensions in convolve to the correct dimensions.
  • Indexing fixes with C++11.
  • Handle empty arrays as inputs in various functions.
  • Fix bug with single-element input to af::median.
  • Fix bug in calculation of time from af::timeit().
  • Fix bug in floating point numbers in af::seq.
  • Fixes for OpenCL graphics interop on NVIDIA devices.
  • Fix bug when compiling large kernels for AMD devices.
  • Fix bug in af::bilateral when shared memory is over the limit.
  • Fix bug in kernel header compilation tool bin2cpp.
  • Fix initial values for the morphological functions.
  • Fix bugs in af::homography() CPU and OpenCL kernels.
  • Fix bug in CPU TNJ.

Improvements

  • CUDA 8 and compute 6.x (Pascal) support; the current installer ships with CUDA 7.5.
  • User-controlled FFT plan caching.
  • CUDA performance improvements for wrap, unwrap and the approx functions.
  • Fallback for CUDA-OpenGL interop when the device does not support OpenGL.
  • Additional forms of batching with the transform functions.
  • Update to OpenCL 2 headers.
  • Support for integration with external OpenCL contexts.
  • Performance improvements to internal copy in the CPU backend.
  • Performance improvements to af::select and af::replace CUDA kernels.
  • Enable OpenCL-CPU offload by default for devices with unified host memory.
    • To disable, use the environment variable AF_OPENCL_CPU_OFFLOAD=0.

Build

  • Compilation speedups.
  • Build fixes with MKL.
  • Error message when CMake CUDA compute detection fails.
  • Several CMake build issues with the Xcode generator fixed.
  • Fix multiple OpenCL definitions at link time.
  • Fix lapacke detection in CMake.
  • Update build tags of
  • Fix builds with GCC 6.1.1 and GCC 5.3.0.

Installers

  • All installers now ship with ArrayFire libraries built with MKL 2016.
  • All installers now ship with Forge development files and examples included.
  • CUDA Compute 2.0 has been removed from the installers. Please contact us
    directly if you have a special need.

Examples

  • Added an example simulating gravity (graphics/field.cpp) to
    demonstrate vector field plotting.
  • Improvements to financial/black_scholes_options.cpp example.
  • Improvements to graphics/gravity_sim.cpp example.
  • Fix graphics examples to use af::Window::setAxesLimits and
    af::Window::setAxesTitles functions.

Documentation & Licensing

  • ArrayFire copyright and trademark policy
  • Fixed grammar in license.
  • Add license information for glbinding.
  • Remove license information for GLEW.
  • Random123 now applies to all backends.
  • Random number functions are now under random_mat.

Deprecations

The following functions have been deprecated and may be modified or removed
permanently from future versions of ArrayFire.

  • af::Window::plot3(): Use af::Window::plot instead.
  • af_draw_plot(): Use af_draw_plot_nd or af_draw_plot_2d instead.
  • af_draw_plot3(): Use af_draw_plot_nd or af_draw_plot_3d instead.
  • af::Window::scatter3(): Use af::Window::scatter instead.
  • af_draw_scatter(): Use af_draw_scatter_nd or af_draw_scatter_2d instead.
  • af_draw_scatter3(): Use af_draw_scatter_nd or af_draw_scatter_3d instead.

Known Issues

Certain CUDA functions are known to be broken on Tegra K1. The following ArrayFire tests are currently failing:

  • assign_cuda
  • harris_cuda
  • homography_cuda
  • median_cuda
  • orb_cuda
  • sort_cuda
  • sort_by_key_cuda
  • sort_index_cuda