ILGPU | Cuda Ecosystem Directory

ILGPU - Release v0.2.0 [LLVM based]

Published by m4rs-mt almost 4 years ago

Note that this version is based on the LLVM compiler framework and contains native dependencies. Please use ILGPU >= v0.5 to use the platform-independent ILGPU compiler version.

Added convenient kernel loading and caching to accelerator classes.
Added properties to query the maximum number of threads of an accelerator.
Added Disposed event to Accelerator.
Added support for Cuda 9.0.
Added new cross-platform Cuda API.
Added integer-division operators to GPUMath.
Added RadToDeg and DegToRad conversion methods to GPUMath.
Added support for .NET Core 2.0.
Removed LLVMSharp dependency.
Enhanced SSA code generation quality.
Fixed invalid constant generation of padded structures.
Updated CompilerServices.Unsafe dependency to version 4.4.0.
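
The RadToDeg and DegToRad entries above refer to the standard angle conversions. As a language-neutral illustration, here is a minimal Python sketch of the underlying formulas. The function names `deg_to_rad` and `rad_to_deg` are hypothetical and are not the GPUMath signatures, which these notes do not spell out.

```python
import math


def deg_to_rad(degrees):
    """Convert degrees to radians: rad = deg * pi / 180."""
    return degrees * math.pi / 180.0


def rad_to_deg(radians):
    """Convert radians to degrees: deg = rad * 180 / pi."""
    return radians * 180.0 / math.pi
```

The two functions are inverses of each other up to floating-point rounding, so a round trip should be compared with a tolerance rather than with exact equality.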

ILGPU - Release v0.1.4 [LLVM based]

Published by m4rs-mt almost 4 years ago

Note that this version is based on the LLVM compiler framework and contains native dependencies. Please use ILGPU >= v0.5 to use the platform-independent ILGPU compiler version.

Fixed invalid code generation of float-based Atomic.Min/Atomic.Max functions.
Added support for nullable types in kernels.
Fixed invalid return value of atomic add in CPU mode.
Fixed invalid resolving of generic virtual methods in IL frontend.

ILGPU - Release v0.1.3 [LLVM based]

Published by m4rs-mt almost 4 years ago

Note that this version is based on the LLVM compiler framework and contains native dependencies. Please use ILGPU >= v0.5 to use the platform-independent ILGPU compiler version.

Fixed critical thread-divergence issues in CPUAccelerator.
Added additional checks to avoid group-barrier functions in implicitly grouped kernels.
Fixed critical issue in kernel-launcher code generation in CPUAccelerator.
Fixed invalid loading of double constants in Force32BitFloats mode.
Fixed invalid code generation of some math intrinsics (Atan, Atan2, Pow).
Fixed invalid view dimensions in ArrayView<T> getters.

ILGPU - Release v0.1.2 [LLVM based]

Published by m4rs-mt almost 4 years ago

Note that this version is based on the LLVM compiler framework and contains native dependencies. Please use ILGPU >= v0.5 to use the platform-independent ILGPU compiler version.

Added atomics for index types.
Added new debug views for generic array views in CPU mode.
Added additional operators to index types.
Added min/max functions to index types.
Added new clamp functions to GPUMath.
Added support for IntPtr.ToPointer functions.
Enhanced reduction interface functionality.
Fixed invalid ArgumentOutOfRangeException-check in MemoryBuffer.
Fixed critical issue in slice and row views of ArrayView3D<T> and ArrayView2D<T> structures.
Removed internal IL-assembly based intrinsics using the official Unsafe package.

ILGPU - Release v0.10.0-beta1

Published by m4rs-mt almost 4 years ago

This new beta version offers significant performance improvements of the generated kernel programs and a set of new features (get the NuGet package).

Graduated various optimizations from O2 to O1 (release mode) to improve performance in release builds using an additional set of stable optimization passes (#344).
Graduated O2 optimizations in the Cuda backend to O1 pipeline to generate vectorized IO operations in release builds (#350).
Added support for printf-like output in Kernels for CPU, Cuda and OpenCL accelerators (#342).
Added new utility Launch/LaunchAutoGrouped methods to immediately launch kernels using a separate strong-reference cache (#336).
Added new AlignTo alignment methods to explicitly align ArrayView instances to a particular alignment in bytes (#316).
Added enhanced support for local memory via a new LocalMemory class (#316).
Added support for several PopCount, CLZ and CTZ operations (#324).
Added new MemSet functions to all memory buffers (#338).
Added new IfConditionalConversion to the O2 pipeline to fold nested and-also and or-else block chains (#328).
Added new local memory optimizations to simplify array accesses (#317).
Added simple 64-bit-based LongGlobalIndex helper to simplify correct computations using 64-bit integers (#337).
Added new CLPlatformVersion and fixed OpenCL 1.2 compatibility issues (#335).
Fixed invalid DebugArrayView implementations (#345).
Fixed invalid initializations of local memory arrays (#287).
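
To illustrate why a 64-bit global index helper such as LongGlobalIndex matters, here is a minimal Python sketch. It assumes the global index follows the usual formula (grid index × group size + thread index within the group), and it simulates 32-bit wrap-around by hand because Python integers do not overflow. All function names are hypothetical and are not part of the ILGPU API.

```python
def wrap_int32(value):
    """Interpret an integer as a signed 32-bit value (two's complement wrap)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def global_index_32(grid_idx, group_dim, group_idx):
    """Global index computed in simulated 32-bit arithmetic (can wrap around)."""
    return wrap_int32(grid_idx * group_dim + group_idx)


def global_index_64(grid_idx, group_dim, group_idx):
    """Global index computed without wrap-around, standing in for 64-bit math."""
    return grid_idx * group_dim + group_idx
```

With 2^21 groups of 1024 threads each, the 32-bit variant wraps to a negative index, while the wide variant yields the expected 2^31.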

Special thanks to @MoFtZ and @jgiannuzzi for their contributions to this release and to the entire ILGPU community for providing feedback, submitting issues and feature requests.

ILGPU - Release v0.9.2

Published by m4rs-mt almost 4 years ago

The new stable version offers significant performance improvements of the generated kernel programs (get the NuGet package).

Added new convenience Launch methods to Accelerator class to launch kernels without pre-loading/compiling them (#319).
Changed default inlining behavior to AggressiveInlining to improve performance of (usually) performance-critical GPU programs (#294).
Significantly improved performance of Cuda programs in many cases using a new control-flow scheduling algorithm that can be enabled via O2 or the flag ContextFlags.EnhancedPTXBackendFeatures (#274, #303).
Added support for RTX 30xx cards (#302, #305, #311).
Added support for tuple-types in kernel functions (#266).
Added support for Span<T> in the scope of MemoryBuffer copy operations (#122, #276).
Added new Capability API to enable specific extensions in the scope of OpenCL programs and to provide better error messages (#103, #279).
Added new arithmetic simplifications to enhance the optimization potential of the ILGPU optimization pipeline (#278, #283).
Added support for unrolling of loop nests to improve performance (#281).
Added new loop invariant code motion (LICM) code transformation to reduce the code size and enable more aggressive optimizations in O2 mode (#291).
Enhanced alignment of local and shared-memory allocations in the PTX backend to emit fast vectorized instructions in a huge variety of additional cases (#304).
Improved alignment of padding in fixed-size structures (#315).
Fixed invalid Unix OpenCL library names (#327).
Fixed calling ambiguous OpenCL 64-bit atomic functions (#321).
Fixed invalid unrolling of loops in some cases (#292).
Fixed invalid loading of unsigned fields from structures (#314).
Fixed invalid handling of FP16 types on unsupported devices (#312).
Fixed invalid constant folding of LHS constants in compare operations (#326).
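
The loop invariant code motion (LICM) entry above names a classic compiler transformation. The following Python sketch shows the idea by hand: a value that does not change between iterations is computed once before the loop instead of on every pass. It illustrates the general technique only and does not show how ILGPU implements it on its intermediate representation. The function names are hypothetical.

```python
def scale_naive(values, a, b):
    """Recomputes the loop-invariant factor a * b + 1 on every iteration."""
    result = []
    for v in values:
        factor = a * b + 1  # does not depend on v
        result.append(v * factor)
    return result


def scale_hoisted(values, a, b):
    """Same computation after hoisting the invariant factor out of the loop."""
    factor = a * b + 1  # computed once, before the loop
    result = []
    for v in values:
        result.append(v * factor)
    return result
```

Both functions return the same list. The hoisted version simply does less work inside the loop body, which is also what makes later optimizations on that body easier.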

Major internal changes:

Enhanced unreachable code elimination to be compatible with the latest optimization pipeline (#300).
Fixed invalid detection of entry and exit blocks in Loop analysis (#293).
Added additional debugging capabilities via new dumper methods (#282).

Special thanks to @MoFtZ for his contributions to this release and to the entire ILGPU community for providing feedback, submitting issues and feature requests.

ILGPU - Release v0.9.2-beta1

Published by m4rs-mt almost 4 years ago

This new beta version offers significant performance improvements of the generated kernel programs (get the NuGet package).

Changed default inlining behavior to AggressiveInlining to improve performance of (usually) performance-critical GPU programs (#294).
Significantly improved performance of Cuda programs in many cases using a new control-flow scheduling algorithm that can be enabled via O2 or the flag ContextFlags.EnhancedPTXBackendFeatures (#274, #303).
Added support for RTX 30xx cards (#302, #305).
Added support for tuple-types in kernel functions (#266).
Added support for Span<T> in the scope of MemoryBuffer copy operations (#122, #276).
Added new Capability API to enable specific extensions in the scope of OpenCL programs and to provide better error messages (#103, #279).
Added new arithmetic simplifications to enhance the optimization potential of the ILGPU optimization pipeline (#278, #283).
Added support for unrolling of loop nests to improve performance (#281).
Added new loop invariant code motion (LICM) code transformation to reduce the code size and enable more aggressive optimizations in O2 mode (#291).
Enhanced alignment of local and shared-memory allocations in the PTX backend to emit fast vectorized instructions in a huge variety of additional cases (#304).
Fixed invalid unrolling of loops in some cases (#292).

Major internal changes:

Enhanced unreachable code elimination to be compatible with the latest optimization pipeline (#300).
Fixed invalid detection of entry and exit blocks in Loop analysis (#293).
Added additional debugging capabilities via new dumper methods (#282).

Special thanks to @MoFtZ for his contributions to this release and to the entire ILGPU community for providing feedback, submitting issues and feature requests.

ILGPU - Release v0.9.1

Published by m4rs-mt about 4 years ago

The new stable version offers significant performance improvements of the generated kernel programs (get the NuGet package).

Added initial loop unrolling capabilities for innermost loops (#259).
Added new address-space specializer to infer the actual address spaces of memory accesses (#247).
Added several code simplification techniques to improve generated kernel programs (#268, #270, #271).
Added support for FP16x2 (Half2) types (#273).
Added support for non-capturing lambda kernels (#186).
Added additional copy operations to ExchangeBuffer (#255).
Enhanced generation of vectorized IO instructions in the PTX backend using new alignment rules (#247, #260).
Fixed invalid accelerator synchronization in OpenCL (#246).
Fixed invalid sign extension of byte and ushort values in the context of method calls (#239).
Fixed invalid handling of unsafe array buffers in several cases (#262, #263, #285).
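
The loop unrolling entry above refers to a general optimization that can be sketched by hand. The Python example below unrolls a summation loop by a factor of four and handles leftover elements in a remainder loop. The factor of four and the function names are assumptions made for illustration. These notes do not state which unroll factors or heuristics ILGPU applies.

```python
def sum_rolled(values):
    """Reference loop: one element per iteration."""
    total = 0
    for i in range(len(values)):
        total += values[i]
    return total


def sum_unrolled_by_4(values):
    """Same sum with the loop body replicated four times per iteration."""
    total = 0
    n = len(values)
    main_end = n - (n % 4)  # largest multiple of 4 not exceeding n
    for i in range(0, main_end, 4):
        total += values[i]
        total += values[i + 1]
        total += values[i + 2]
        total += values[i + 3]
    # Remainder loop for the last n % 4 elements.
    for i in range(main_end, n):
        total += values[i]
    return total
```

Unrolling trades code size for fewer loop-control operations per element. The remainder loop keeps the result correct when the length is not a multiple of the unroll factor.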

Major internal changes:

Added new enhanced loop-analyses classes to get detailed insights about loops in ILGPU programs (#259).
Refactored the internal static-program analysis framework (#247).
Updated native DLL-interop API (#249).
Fixed code analysis warnings (#248).

Special thanks to @MoFtZ, @Yey007 and @LxBos for their contributions to this release and to the entire ILGPU community for providing feedback, submitting issues and feature requests.

ILGPU - Release v0.9.1-beta1

Published by m4rs-mt about 4 years ago

This new beta version offers significant performance improvements of the generated kernel programs (get the NuGet package).

Added initial loop unrolling capabilities for innermost loops (#259).
Added new address-space specializer to infer the actual address spaces of memory accesses (#247).
Added several code simplification techniques to improve generated kernel programs (#268, #270, #271).
Added support for FP16x2 (Half2) types (#273).
Added support for non-capturing lambda kernels (#186).
Added additional copy operations to ExchangeBuffer (#255).
Enhanced generation of vectorized IO instructions in the PTX backend using new alignment rules (#247, #260).
Fixed invalid accelerator synchronization in OpenCL (#246).
Fixed invalid sign extension of byte and ushort values in the context of method calls (#239).
Fixed invalid handling of unsafe array buffers in several cases (#262, #263).

Major internal changes:

Added new enhanced loop-analyses classes to get detailed insights about loops in ILGPU programs (#259).
Refactored the internal static-program analysis framework (#247).
Updated native DLL-interop API (#249).
Fixed code analysis warnings (#248).

Special thanks to @MoFtZ, @Yey007 and @LxBos for their contributions to this release and to the entire ILGPU community for providing feedback, submitting issues and feature requests.
