
ILGPU - Release v0.10.1

Published by m4rs-mt over 3 years ago

This stable version contains several bug fixes and improves the code quality of the generated kernel programs (get the NuGet package).

It is strongly recommended to upgrade to this version as soon as possible to avoid known bugs and some CPU-buffer deallocation issues.

Changes

Added CopySign intrinsic (#438).
Added intrinsic mappings for BitConverter functions (#437).
Added call stack recording during compilation for error reporting (#436).
Gracefully fail when loading symbols from in-memory assemblies (#435).
Fixed invalid detection of loop bodies (#452).
Fixed incorrect assertion on repeating successors (#447).
Fixed emitting switch statement with constant condition (#442).
Fixed invalid disposal of CPU buffers (#440).
Fixed applications blocking during tear-down by changing Accelerator GC thread to run in the background (#439).
Fixed bounds check on large views (#433).
Fixed retrieving field from structure types (#426).

Special thanks

Special thanks to @MoFtZ, @marcin-krystianc and @jgiannuzzi for their contributions to this release and to the entire ILGPU community for providing feedback, submitting issues and feature requests.

ILGPU - Release v0.10.0

Published by m4rs-mt over 3 years ago

This stable version significantly improves the performance of the generated kernel programs and contains critical resource deallocation fixes (get the NuGet package).

It is strongly recommended to upgrade to this version as soon as possible to avoid resource- and GC-related deallocation issues.

Breaking changes

The inheritance hierarchy of the ExchangeBuffer class has been changed to avoid exposing internal memory buffers. If you previously relied on ExchangeBufferBase inheriting directly from MemoryBuffer, you have to adapt your program to use the intermediate base class MemoryBuffer<T, TIndex> instead (see diff).
Properties exposing internal memory buffers of the high-level MemoryBufferXD classes have been removed to avoid ownership-related issues in which the GC frees buffers that are still in use (see diff).

Why are there breaking changes?

We have decided to remove dangerous properties from several memory buffer classes. The use of these properties can lead to program crashes, since buffers could be disposed asynchronously in the background by the GC without further notice.
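The hazard described above can be illustrated with a minimal sketch in plain .NET. RawBuffer and HighLevelBuffer are hypothetical stand-ins, not ILGPU types, and the sketch assumes only that a wrapper frees its internal buffer when it is finalized. It is not a description of ILGPU's actual internals.

```csharp
using System;

// Hypothetical low-level buffer (not an ILGPU type).
sealed class RawBuffer : IDisposable
{
    public bool IsDisposed { get; private set; }

    public int Read()
    {
        if (IsDisposed)
            throw new ObjectDisposedException(nameof(RawBuffer));
        return 42;
    }

    public void Dispose() => IsDisposed = true;
}

// Hypothetical high-level wrapper that owns a RawBuffer
// and exposes it through a property.
sealed class HighLevelBuffer : IDisposable
{
    public RawBuffer Internal { get; } = new RawBuffer();

    public void Dispose()
    {
        Internal.Dispose();
        GC.SuppressFinalize(this);
    }

    // Runs on the GC's schedule once the wrapper is unreachable.
    ~HighLevelBuffer() => Internal.Dispose();
}

static class Program
{
    // Returns the internal buffer while dropping the only
    // reference to its owner.
    static RawBuffer LeakInternalBuffer()
    {
        var wrapper = new HighLevelBuffer();
        return wrapper.Internal;
    }

    static void Main()
    {
        RawBuffer raw = LeakInternalBuffer();

        // Force a collection to make the problem likely to show up;
        // in a real program this happens at an unpredictable time.
        GC.Collect();
        GC.WaitForPendingFinalizers();

        Console.WriteLine($"Leaked buffer disposed: {raw.IsDisposed}");
        try
        {
            Console.WriteLine(raw.Read());
        }
        catch (ObjectDisposedException)
        {
            Console.WriteLine("Read failed: the owner was finalized.");
        }
    }
}
```

Whether the read fails in a given run depends on GC timing, which is exactly why such properties are hard to use safely. Keeping a reference to the owning object for as long as the buffer is in use avoids the problem.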

Changes

Improved performance of kernel launchers by passing packed argument structures (#358, #372).
Graduated several optimizations from O2 to O1 (release mode) to improve performance in release builds using an additional set of stable optimization passes (#344).
Graduated O2 optimizations in the Cuda backend to O1 pipeline to generate vectorized IO operations in release builds (#350).
Added support for managed sizeof IL instruction (#380).
Added PrintInformation method to Accelerator instances to print detailed accelerator information (#389).
Added enhanced assertions and out-of-bounds checks to all ArrayView accesses on GPU devices. Use the flag ContextFlags.EnableAssertions or attach a debugger to your application to enable assertion checks. Make sure to use the portable debug information format for detailed source location information (#375).
Added support for printf-like output in Kernels for CPU, Cuda and OpenCL accelerators (#342).
Added new utility Launch/LaunchAutoGrouped methods to immediately launch kernels using a separate strong-reference cache (#336).
Added new AlignTo alignment methods to explicitly align ArrayView instances to a particular alignment in bytes (#316).
Added enhanced support for local memory via a new LocalMemory class (#316).
Added support for several PopCount, CLZ and CTZ operations (#324).
Added new MemSet functions to all memory buffers (#338).
Added new IfConditionalConversion to the O2 pipeline to fold nested and-also and or-else block chains (#328).
Added new local memory optimizations to simplify array accesses (#317).
Added simple 64-bit-based LongGlobalIndex helper to simplify correct computations using 64-bit integers (#337).
Added new CLPlatformVersion and fixed OpenCL 1.2 compatibility issues (#335).
Removed support for .NET Core 2.0 (#353).
Prevent using SharedMemory in implicitly grouped kernels (#354).
Prevent using CudaAccelerator and CLAccelerator instances to run on non-native OS .NET versions (#396).
Fixed critical GC-related resource deallocation issues (#376, #393).
Fixed returning correct length of dynamic shared memory buffers (#357).
Fixed invalid alignment information in the presence of reinterpret casts (#386).
Fixed invalid address computations of fixed array buffers (#361).
Fixed invalid PTX calling convention (#362).
Fixed edge cases in LoopUnrolling (#373).
Fixed invalid printf formats for int64 and uintX types (#391).
Fixed invalid DebugArrayView implementations (#345).
Fixed invalid initializations of local memory arrays (#287).

Major internal changes:

Removed singleton instance of RuntimeSystem to avoid concurrency/reflection-API issues (#393).
Updated default optimizations for ILGPU debug builds (#384).
Added support for unit tests running on .NET Framework 4.7 (#355).
Migrated from FxCop analyzers to .NET analyzers (#352).
Redesigned internal address-space inference passes (#364).

Special thanks

Special thanks to @MoFtZ, @Ruberik and @jgiannuzzi for their contributions to this release and to the entire ILGPU community for providing feedback, submitting issues and feature requests.

ILGPU - Release v0.10.0-beta2

Published by m4rs-mt over 3 years ago

This new beta version offers important bug fixes, performance improvements to the generated kernel programs and a set of new features (get the NuGet package).

Improved performance of kernel launchers by passing packed argument structures (#358, #372).
Added support for managed sizeof IL instruction (#380).
Added PrintInformation method to Accelerator instances to print detailed accelerator information (#389).
Added enhanced assertions and out-of-bounds checks to all ArrayView accesses on GPU devices. Use the flag ContextFlags.EnableAssertions or attach a debugger to your application to enable assertion checks. Make sure to use the portable debug information format for detailed source location information (#375).
Removed support for .NET Core 2.0 (#353).
Prevent using SharedMemory in implicitly grouped kernels (#354).
Prevent using CudaAccelerator and CLAccelerator instances to run on non-native OS .NET versions (#396).
Fixed critical GC-related resource deallocation issues (#376, #393).
Fixed returning correct length of dynamic shared memory buffers (#357).
Fixed invalid alignment information in the presence of reinterpret casts (#386).
Fixed invalid address computations of fixed array buffers (#361).
Fixed invalid PTX calling convention (#362).
Fixed edge cases in LoopUnrolling (#373).
Fixed invalid printf formats for int64 and uintX types (#391).

Major internal changes:

Removed singleton instance of RuntimeSystem to avoid concurrency/reflection-API issues (#393).
Updated default optimizations for ILGPU debug builds (#384).
Added support for unit tests running on .NET Framework 4.7 (#355).
Migrated from FxCop analyzers to .NET analyzers (#352).
Redesigned internal address-space inference passes (#364).

Special thanks to @MoFtZ and @Ruberik for their contributions to this release and to the entire ILGPU community for providing feedback, submitting issues and feature requests.

ILGPU - Release v0.9.0