ILGPU

ILGPU JIT Compiler for high-performance .Net GPU programs

ILGPU - Release v0.2.0 [LLVM based]

Published by m4rs-mt almost 4 years ago

Note that this version is based on the LLVM compiler framework and contains native dependencies. Please use ILGPU >= v0.5 to use the platform-independent ILGPU compiler version.

  • Added convenient kernel loading and caching to accelerator classes.
  • Added properties to query the maximum number of threads of an accelerator.
  • Added Disposed event to Accelerator.
  • Added support for Cuda 9.0.
  • Added new cross-platform Cuda API.
  • Added integer-division operators to GPUMath.
  • Added RadToDeg and DegToRad conversion methods to GPUMath.
  • Added support for .Net Core 2.0.
  • Removed LLVMSharp dependency.
  • Enhanced SSA code generation quality.
  • Fixed invalid constant generation of padded structures.
  • Updated CompilerServices.Unsafe dependency to version 4.4.0.

ILGPU - Release v0.1.4 [LLVM based]

Published by m4rs-mt almost 4 years ago

Note that this version is based on the LLVM compiler framework and contains native dependencies. Please use ILGPU >= v0.5 to use the platform-independent ILGPU compiler version.

  • Fixed invalid code generation of float-based Atomic.Min/Atomic.Max functions.
  • Added support for nullable types in kernels.
  • Fixed invalid return value of atomic add in CPU mode.
  • Fixed invalid resolving of generic virtual methods in IL frontend.

ILGPU - Release v0.1.3 [LLVM based]

Published by m4rs-mt almost 4 years ago

Note that this version is based on the LLVM compiler framework and contains native dependencies. Please use ILGPU >= v0.5 to use the platform-independent ILGPU compiler version.

  • Fixed critical thread-divergence issues in CPUAccelerator.
  • Added additional checks to avoid group-barrier functions in implicitly grouped kernels.
  • Fixed critical issue in kernel-launcher code generation in CPUAccelerator.
  • Fixed invalid loading of double constants in Force32BitFloats mode.
  • Fixed invalid code generation of some math intrinsics (Atan, Atan2, Pow).
  • Fixed invalid view dimensions in ArrayView<T> getters.

ILGPU - Release v0.1.2 [LLVM based]

Published by m4rs-mt almost 4 years ago

Note that this version is based on the LLVM compiler framework and contains native dependencies. Please use ILGPU >= v0.5 to use the platform-independent ILGPU compiler version.

  • Added atomics for index types.
  • Added new debug views for generic array views in CPU mode.
  • Added additional operators to index types.
  • Added min/max functions to index types.
  • Added new clamp functions to GPUMath.
  • Added support for IntPtr.ToPointer functions.
  • Enhanced reduction interface functionality.
  • Fixed invalid ArgumentOutOfRangeException-check in MemoryBuffer.
  • Fixed critical issue in slice and row views of ArrayView3D<T> and ArrayView2D<T> structures.
  • Removed internal IL-assembly based intrinsics using the official Unsafe package.

ILGPU - Release v0.10.0-beta1

Published by m4rs-mt almost 4 years ago

This new beta version offers significant performance improvements of the generated kernel programs and a set of new features (get the Nuget package).

  • Graduated different optimizations from O2 to O1 (release mode) to improve performance in release builds using an additional set of stable optimization passes (#344).
  • Graduated O2 optimizations in the Cuda backend to O1 pipeline to generate vectorized IO operations in release builds (#350).
  • Added support for printf-like output in Kernels for CPU, Cuda and OpenCL accelerators (#342).
  • Added new utility Launch/LaunchAutoGrouped methods to immediately launch kernels using a separate strong-reference cache (#336).
  • Added new AlignTo alignment methods to explicitly align ArrayView instances to a particular alignment in bytes (#316).
  • Added enhanced support for local memory via a new LocalMemory class (#316).
  • Added support for several PopCount, CLZ and CTZ operations (#324).
  • Added new MemSet functions to all memory buffers (#338).
  • Added new IfConditionalConversion to the O2 pipeline to fold nested and-also and or-else block chains (#328).
  • Added new local memory optimizations to simplify array accesses (#317).
  • Added simple 64-bit-based LongGlobalIndex helper to simplify correct computations using 64-bit integers (#337).
  • Added new CLPlatformVersion and fixed OpenCL 1.2 compatibility issues (#335).
  • Fixed invalid DebugArrayView implementations (#345).
  • Fixed invalid initializations of local memory arrays (#287).
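
For readers unfamiliar with the bit-counting operations named in the list above, here is a minimal Python sketch of what PopCount, CLZ (count leading zeros) and CTZ (count trailing zeros) compute on 32-bit values. The function names and the convention of returning the bit width for a zero input are assumptions made for illustration; they are not taken from the ILGPU API, and these notes do not say how ILGPU handles a zero input.

```python
def pop_count(x: int, bits: int = 32) -> int:
    """Number of set bits in the low `bits` bits of x."""
    return bin(x & ((1 << bits) - 1)).count("1")


def clz(x: int, bits: int = 32) -> int:
    """Number of zero bits above the highest set bit (leading zeros)."""
    x &= (1 << bits) - 1
    return bits - x.bit_length()


def ctz(x: int, bits: int = 32) -> int:
    """Number of zero bits below the lowest set bit (trailing zeros)."""
    x &= (1 << bits) - 1
    if x == 0:
        # Assumed convention: a zero input yields the full bit width.
        return bits
    # x & -x isolates the lowest set bit; its position is the answer.
    return (x & -x).bit_length() - 1
```

For example, the value 0b1011 has three set bits, 28 leading zeros in a 32-bit word, and no trailing zeros.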

Special thanks to @MoFtZ and @jgiannuzzi for their contributions to this release and to the entire ILGPU community for providing feedback, submitting issues and feature requests.

ILGPU - Release v0.9.2

Published by m4rs-mt almost 4 years ago

The new stable version offers significant performance improvements of the generated kernel programs (get the Nuget package).

  • Added new convenience Launch methods to Accelerator class to launch kernels without pre-loading/compiling them (#319).
  • Changed default inlining behavior to AggressiveInlining to improve performance of (usually) performance-critical GPU programs (#294).
  • Significantly improved performance of Cuda programs in many cases using a new control-flow scheduling algorithm that can be enabled via O2 or the flag ContextFlags.EnhancedPTXBackendFeatures (#274, #303).
  • Added support for RTX 30xx cards (#302, #305, #311).
  • Added support for tuple-types in kernel functions (#266).
  • Added support for Span<T> in the scope of MemoryBuffer copy operations (#122, #276).
  • Added new Capability API to enable specific extensions in the scope of OpenCL programs and to provide better error messages (#103, #279).
  • Added new arithmetic simplifications to enhance the optimization potential of the ILGPU optimization pipeline (#278, #283).
  • Added support for unrolling of loop nests to improve performance (#281).
  • Added new loop invariant code motion (LICM) code transformation to reduce the code size and enable more aggressive optimizations in O2 mode (#291).
  • Enhanced alignment of local and shared-memory allocations in the PTX backend to emit fast vectorized instructions in a huge variety of additional cases (#304).
  • Improved alignment of padding in fixed-size structures (#315).
  • Fixed invalid Unix OpenCL library names (#327).
  • Fixed calling ambiguous OpenCL 64-bit atomic functions (#321).
  • Fixed invalid unrolling of loops in some cases (#292).
  • Fixed invalid loading of unsigned fields from structures (#314).
  • Fixed invalid handling of FP16 types on unsupported devices (#312).
  • Fixed invalid constant folding of LHS constants in compare operations (#326).
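
The loop invariant code motion (LICM) transformation mentioned in the list above can be illustrated by hand. The Python sketch below shows the general idea only: a computation that does not depend on the loop variable is moved in front of the loop. The function names are hypothetical, and the sketch says nothing about how ILGPU decides what to hoist.

```python
import math


def scale_naive(values, angle_deg):
    """Before LICM: the factor is recomputed in every iteration."""
    out = []
    for v in values:
        # This expression does not depend on v, so it is loop invariant.
        factor = math.cos(math.radians(angle_deg))
        out.append(v * factor)
    return out


def scale_hoisted(values, angle_deg):
    """After LICM: the invariant computation runs once, before the loop."""
    factor = math.cos(math.radians(angle_deg))
    out = []
    for v in values:
        out.append(v * factor)
    return out
```

Both functions return the same results; the second one simply evaluates the invariant expression once instead of once per element.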

Major internal changes:

  • Enhanced unreachable code elimination to be compatible with the latest optimization pipeline (#300).
  • Fixed invalid detection of entry and exit blocks in Loop analysis (#293).
  • Added additional debugging capabilities via new dumper methods (#282).

Special thanks to @MoFtZ for his contributions to this release and to the entire ILGPU community for providing feedback, submitting issues and feature requests.

ILGPU - Release v0.9.2-beta1

Published by m4rs-mt almost 4 years ago

This new beta version offers significant performance improvements of the generated kernel programs (get the Nuget package).

  • Changed default inlining behavior to AggressiveInlining to improve performance of (usually) performance-critical GPU programs (#294).
  • Significantly improved performance of Cuda programs in many cases using a new control-flow scheduling algorithm that can be enabled via O2 or the flag ContextFlags.EnhancedPTXBackendFeatures (#274, #303).
  • Added support for RTX 30xx cards (#302, #305).
  • Added support for tuple-types in kernel functions (#266).
  • Added support for Span<T> in the scope of MemoryBuffer copy operations (#122, #276).
  • Added new Capability API to enable specific extensions in the scope of OpenCL programs and to provide better error messages (#103, #279).
  • Added new arithmetic simplifications to enhance the optimization potential of the ILGPU optimization pipeline (#278, #283).
  • Added support for unrolling of loop nests to improve performance (#281).
  • Added new loop invariant code motion (LICM) code transformation to reduce the code size and enable more aggressive optimizations in O2 mode (#291).
  • Enhanced alignment of local and shared-memory allocations in the PTX backend to emit fast vectorized instructions in a huge variety of additional cases (#304).
  • Fixed invalid unrolling of loops in some cases (#292).

Major internal changes:

  • Enhanced unreachable code elimination to be compatible with the latest optimization pipeline (#300).
  • Fixed invalid detection of entry and exit blocks in Loop analysis (#293).
  • Added additional debugging capabilities via new dumper methods (#282).

Special thanks to @MoFtZ for his contributions to this release and to the entire ILGPU community for providing feedback, submitting issues and feature requests.

ILGPU - Release v0.9.1

Published by m4rs-mt about 4 years ago

The new stable version offers significant performance improvements of the generated kernel programs (get the Nuget package).

  • Added initial loop unrolling capabilities for innermost loops (#259).
  • Added new address-space specializer to infer the actual address spaces of memory accesses (#247).
  • Added several code simplification techniques to improve generated kernel programs (#268, #270, #271).
  • Added support for FP16x2 (Half2) types (#273).
  • Added support for non-capturing lambda kernels (#186).
  • Added additional copy operations to ExchangeBuffer (#255).
  • Enhanced generation of vectorized IO instructions in the PTX backend using new alignment rules (#247, #260).
  • Fixed invalid accelerator synchronization in OpenCL (#246).
  • Fixed invalid sign extension of byte and ushort values in the context of method calls (#239).
  • Fixed invalid handling of unsafe array buffers in several cases (#262, #263, #285).
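
As a rough illustration of the loop unrolling mentioned in the list above, the Python sketch below unrolls a simple summation loop by a factor of four by hand and handles leftover elements in a remainder loop. The unroll factor and function names are assumptions chosen for the example; these notes do not describe the heuristics ILGPU actually applies.

```python
def sum_rolled(data):
    """Original loop: one element per iteration."""
    total = 0
    for i in range(len(data)):
        total += data[i]
    return total


def sum_unrolled_by_4(data):
    """Same loop, manually unrolled by a factor of four."""
    total = 0
    n = len(data)
    main_end = n - (n % 4)  # largest multiple of 4 not exceeding n
    for i in range(0, main_end, 4):
        # Four loop bodies per iteration reduce loop-control overhead.
        total += data[i]
        total += data[i + 1]
        total += data[i + 2]
        total += data[i + 3]
    # Remainder loop for the last n % 4 elements.
    for i in range(main_end, n):
        total += data[i]
    return total
```

Both versions compute the same sum for any input length, including lengths that are not a multiple of four.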

Major internal changes:

  • Added new enhanced loop-analyses classes to get detailed insights about loops in ILGPU programs (#259).
  • Refactored the internal static-program analysis framework (#247).
  • Updated native DLL-interop API (#249).
  • Fixed code analysis warnings (#248).

Special thanks to @MoFtZ, @Yey007 and @LxBos for their contributions to this release and to the entire ILGPU community for providing feedback, submitting issues and feature requests.

ILGPU - Release v0.9.1-beta1

Published by m4rs-mt about 4 years ago

This new beta version offers significant performance improvements of the generated kernel programs (get the Nuget package).

  • Added initial loop unrolling capabilities for innermost loops (#259).
  • Added new address-space specializer to infer the actual address spaces of memory accesses (#247).
  • Added several code simplification techniques to improve generated kernel programs (#268, #270, #271).
  • Added support for FP16x2 (Half2) types (#273).
  • Added support for non-capturing lambda kernels (#186).
  • Added additional copy operations to ExchangeBuffer (#255).
  • Enhanced generation of vectorized IO instructions in the PTX backend using new alignment rules (#247, #260).
  • Fixed invalid accelerator synchronization in OpenCL (#246).
  • Fixed invalid sign extension of byte and ushort values in the context of method calls (#239).
  • Fixed invalid handling of unsafe array buffers in several cases (#262, #263).

Major internal changes:

  • Added new enhanced loop-analyses classes to get detailed insights about loops in ILGPU programs (#259).
  • Refactored the internal static-program analysis framework (#247).
  • Updated native DLL-interop API (#249).
  • Fixed code analysis warnings (#248).

Special thanks to @MoFtZ, @Yey007 and @LxBos for their contributions to this release and to the entire ILGPU community for providing feedback, submitting issues and feature requests.