ILGPU JIT Compiler for high-performance .Net GPU programs
OTHER License
Published by m4rs-mt over 3 years ago
The new stable version contains several bug fixes and improves the code quality of the generated kernel programs (get the Nuget package).
It is strongly recommended to upgrade to this version as soon as possible to avoid known bugs and some CPU-buffer deallocation issues.
Special thanks to @MoFtZ, @marcin-krystianc and @jgiannuzzi for their contributions to this release and to the entire ILGPU community for providing feedback, submitting issues and feature requests.
Published by m4rs-mt over 3 years ago
The new stable version offers significant performance improvements of the generated kernel programs and contains critical resource deallocation fixes (get the Nuget package).
It is strongly recommended to upgrade to this version as soon as possible to avoid resource and GC related deallocation issues.
ExchangeBuffer class has been changed to avoid exposing internal memory buffers. If you previously relied on the immediate inheritance from ExchangeBufferBase on MemoryBuffer, you have to adapt your program to use the intermediate base class MemoryBuffer<T, TIndex> instead (see diff).
MemoryBufferXD classes have been removed to avoid ownership related GC-free issues (see diff).
We have decided to remove dangerous properties from several memory buffer classes. The use of these properties can lead to program crashes, since buffers could be disposed asynchronously in the background by the GC without further notice.
O2 to O1 (release mode) to improve performance in release builds using an additional set of stable optimization passes (#344).
Cuda backend to O1 pipeline to generate vectorized IO operations in release builds (#350).
sizeof IL instruction (#380).
PrintInformation method to Accelerator instances to print detailed accelerator information (#389).
ArrayView accesses on GPU devices (Use flag ContextFlags.EnableAssertions or attach a debugger to your application to enable assertion checks. Make sure to use the portable debug information format for detailed source location information) (#375).
CPU, Cuda and OpenCL accelerators (#342).
AlignTo alignment methods to explicitly align ArrayView instances to a particular alignment in bytes (#316).
LocalMemory class (#316).
PopCount, CLZ and CTZ operations (#324).
MemSet functions to all memory buffers (#338).
O2 pipeline (#328).
LongGlobalIndex helper to simplify correct computations using 64-bit integers (#337).
CLPlatformVersion and fixed OpenCL 1.2 compatibility issues (#335).
SharedMemory in implicitly grouped kernels (#354).
CudaAccelerator and CLAccelerator instances to run on non-native OS .NET versions (#396).
LoopUnrolling (#373).
printf formats for int64 and uintX types (#391).
DebugArrayView implementations (#345).
RuntimeSystem to avoid concurrency/reflection-API issues (#393).
Special thanks to @MoFtZ, @Ruberik and @jgiannuzzi for their contributions to this release and to the entire ILGPU community for providing feedback, submitting issues and feature requests.
Published by m4rs-mt over 3 years ago
This new beta version offers important bug fixes and performance improvements of the generated kernel programs and a set of new features (get the Nuget package).
sizeof IL instruction (#380).
PrintInformation method to Accelerator instances to print detailed accelerator information (#389).
ArrayView accesses on GPU devices (Use flag ContextFlags.EnableAssertions or attach a debugger to your application to enable assertion checks. Make sure to use the portable debug information format for detailed source location information) (#375).
SharedMemory in implicitly grouped kernels (#354).
CudaAccelerator and CLAccelerator instances to run on non-native OS .NET versions (#396).
LoopUnrolling (#373).
printf formats for int64 and uintX types (#391).
Major internal changes:
RuntimeSystem to avoid concurrency/reflection-API issues (#393).
Special thanks to @MoFtZ and @Ruberik for their contributions to this release and to the entire ILGPU community for providing feedback, submitting issues and feature requests.
Published by m4rs-mt almost 4 years ago
This new stable version offers significant performance and code quality improvements of the generated kernel programs.
Special thanks to @MoFtZ, @Yey007 and @jgiannuzzi for contributing to this release.
Published by m4rs-mt almost 4 years ago
Special thanks to @MoFtZ, @Yey007 and @jgiannuzzi for contributing to this release.
Published by m4rs-mt almost 4 years ago
The new stable version offers significant performance and code quality improvements of the generated kernel programs.
ContextFlags.EnableVerifier (#121).
Published by m4rs-mt almost 4 years ago
The new beta version offers significant performance improvements of the generated kernel programs.
ContextFlags.EnableVerifier (#121).
Published by m4rs-mt almost 4 years ago
The new stable version offers significant performance and code quality improvements of the generated kernel programs.
GetSubView in the context of generic and multidimensional array views (#19).
Special thanks to @MoFtZ for contributing to this release.
Published by m4rs-mt almost 4 years ago
OpenCL code generation issues (#85, #88, #91, #92).
Special thanks to @MoFtZ for contributing to this release.
Published by m4rs-mt almost 4 years ago
PTX and OpenCL code by enabling more aggressive optimizations and clever code generation (#70).
enum-value interop (#66).
PTXBackend to support all API changes and to fix several critical code-generation issues. This also includes emission of PTX instructions that mimic the Cuda compiler.
OpenCL backend to support all API changes and to fix several critical code-generation issues (#72, #73, #74, #78).
IR-rewriter API to perform more advanced IR transformations.
rewriter API.
Special thanks to @MoFtZ for contributing to this release.
Published by m4rs-mt almost 4 years ago
CPU & Cuda backends).
KernelConfig structure to specify launch dimensions for explicitly grouped kernels.
KernelConfig structure instead of GroupedIndex types.
Grid and Group properties.
Index1 structure to avoid name clashes with new System.Index structure.
Index2 and Index3 types.
EntryPointDescription structure to specify an entry point and its index type.
RuntimeKernelConfig structure to combine static and dynamic information about a particular kernel launch.
GroupedIndex types.
PTXInstructions to support bool-based IOs in PTXBackend (#68).
ExchangeBuffer to use new page-locked memory allocation (if available).
CudaAPI to support page-locked host-memory allocation functions.
GetSubView in the context of generic and multidimensional array views (#19).
AtomicCAS operations on AMD hardware (#67).
Published by m4rs-mt almost 4 years ago
Cuda and CPU-based array views.
CPU and GPU memory.
CLAccelerator.
CL accelerators.
AcceleratorId classes.
OpenCL code generator for float values that are assigned integer values.
OpenCL backend.
ABI thread safe to support concurrent queries of size/alignment information.
Published by m4rs-mt almost 4 years ago
OpenCL-compatible GPUs (beta).
CUDA driver version detection.
CPUAccelerator.
Utility.Select method that can be used to create highly efficient select instructions in favor of if branches.
XMath functions to the ILGPU.Algorithms library. Use the new IntrinsicMath class for math functions that are supported on all platforms.
AcceleratorId functionality.
CudaMemoryBuffer to support MemSetToZero using alternate streams.
Accelerator to resolve a launch extent with the maximum number of groups.
PTXBackend.
Special thanks to @MoFtZ for contributing to this release.
Published by m4rs-mt almost 4 years ago
Greatly improved ILGPU version that included significant performance and code quality improvements.
PTXBackend.
PTXBackend.
PTXBackend.
SequencePoint.
PTXBackend.
Special thanks to @MoFtZ and @mikhail-khalizev for contributing to this release.
Published by m4rs-mt almost 4 years ago
Improved version of v0.5 that contains bug fixes, performance improvements and features based on community feedback.
DebuggerDisplay attributes on array views.
MemoryBuffer.CopyTo.
PTXBackend.
null values in PTXBackend.
Published by m4rs-mt almost 4 years ago
Note that this has been the first standalone compiler version without any native library dependencies.
Published by m4rs-mt almost 4 years ago
Note that this version is based on the LLVM compiler framework and contains native dependencies. Please use ILGPU >= v0.5 to use the platform independent ILGPU compiler version.
*** This release has been the last LLVM-based compiler version ***
DLLLoader and PTXBackend.
Published by m4rs-mt almost 4 years ago
Note that this version is based on the LLVM compiler framework and contains native dependencies. Please use ILGPU >= v0.5 to use the platform independent ILGPU compiler version.
Published by m4rs-mt almost 4 years ago
Note that this version is based on the LLVM compiler framework and contains native dependencies. Please use ILGPU >= v0.5 to use the platform independent ILGPU compiler version.
GPUMath.
GPUMath.
LLVMSharp dependency.
CompilerServices.Unsafe dependency to version 4.4.0.
Published by m4rs-mt almost 4 years ago
Note that this version is based on the LLVM compiler framework and contains native dependencies. Please use ILGPU >= v0.5 to use the platform independent ILGPU compiler version.
Atomic.Min/Atomic.Max functions.