Implementation of SYCL and C++ standard parallelism for CPUs and GPUs from all vendors: The independent, community-driven compiler for C++-based heterogeneous programming models. Lets applications adapt themselves to all the hardware in the system - even at runtime!
BSD-2-Clause License
This release increases performance even further, while also adding various new features. AdaptiveCpp 24.06 is now without a doubt one of the leading heterogeneous C++ compilers when it comes to performance. In many cases, it is faster than vendor-supported compiler stacks such as CUDA or oneAPI. At the same time, as a purely community-driven project, it is completely free from vendor politics, giving the community back control over their preferred programming models.
Users are encouraged to read the performance guide for directions as to how to get the most out of the AdaptiveCpp stack.
- Support for the more aggressive `ACPP_ADAPTIVITY_LEVEL=2` setting of the `ACPP_ADAPTIVITY_LEVEL` environment variable.
- Support for the `std::execution::par` execution policy on devices which support strong forward progress guarantees. In this case, there is experimental support for `std::atomic` and `std::atomic_ref` in device code.
- Improvements to the `buffer` and `multi_ptr` interfaces.
- The `sycl::specialized` extension, which hints to the JIT compiler that a runtime kernel argument should be replaced with a constant at JIT-time. This makes AdaptiveCpp the first SYCL implementation to support specialization semantics across all backends thanks to its unified JIT compiler.
- Deprecates the old `HIPSYCL_*`, `__hipsycl*` identifiers and adds new versions following the `ACPP_*`, `__acpp_*` naming scheme. Users are encouraged to migrate to the new names.

The following benchmarks explore the performance of the new `ACPP_ADAPTIVITY_LEVEL=2` (AL2) feature as well as performance in general.
- AL2 refers to `ACPP_ADAPTIVITY_LEVEL=2`, set via the `ACPP_ADAPTIVITY_LEVEL` environment variable, which controls the aggressiveness of additional JIT-time optimizations that AdaptiveCpp supports. The AL2 results were obtained after 2-3 application runs, once performance had converged.
- All benchmarks were compiled with `-O3 -ffast-math` universally, which aligns most compilers. For `hipcc`, `-fno-hip-fp32-correctly-rounded-divide-sqrt` was used in addition to align its behavior with the other compilers.
- AdaptiveCpp used its generic compiler (`--acpp-targets=generic`).
- AdaptiveCpp benchmarks used the `-DACPP_ALLOW_INSTANT_SUBMISSION=1` compilation flag, in line with the recommendations in the AdaptiveCpp performance guide.

The figures below may be freely shared under CC-BY license, with attribution to the AdaptiveCpp project.
Full Changelog: https://github.com/AdaptiveCpp/AdaptiveCpp/compare/v24.02.0...v24.06.0
Published by illuhad 7 months ago
AdaptiveCpp 24.02 introduces multiple compiler improvements, making it one of the best SYCL compilers in the world - and in many cases the best - when it comes to extracting performance from the hardware.
If you are not using it already, try it now and perhaps save some compute time!
The following performance results have been obtained with AdaptiveCpp's generic single-pass compiler (`--acpp-targets=generic`).
Note: oneAPI by default compiles with `-ffast-math`, while AdaptiveCpp does not enable fast math by default. All benchmarks have been explicitly compiled with `-fno-fast-math` to align compiler behavior, except where noted otherwise.
Note: oneAPI for AMD does not correctly round `sqrt()` calls even if `-fno-fast-math` is passed, using approximate builtins instead. This loss of precision can substantially skew benchmark results and make performance comparisons misleading. AdaptiveCpp 24.02 correctly rounds math functions by default. To align precision and allowed compiler optimizations, AdaptiveCpp was allowed to use approximate `sqrt` builtins as well for the AMD results.
Note: AdaptiveCpp was running on the Intel GPU through OpenCL, while DPC++ was using its default backend Level Zero, which allows for more low-level control. Some of the differences may be explained by the different backend runtimes underneath the SYCL implementations.
AdaptiveCpp 24.02 ships with the world's fastest compiler for offloading C++ standard parallelism constructs. This functionality was already part of 23.10; however, 24.02 includes multiple important improvements. It can substantially outperform vendor compilers, and is the world's only compiler that can demonstrate C++ standard parallelism offloading performance across Intel, NVIDIA and AMD hardware. Consider the following performance results for the CloverLeaf, TeaLeaf and miniBUDE benchmarks:
oneAPI results were obtained with `icpx -fsycl-pstl-offload=gpu` on an Intel Data Center GPU Max 1550.

In particular, note that AdaptiveCpp does not depend on the XNACK hardware feature to obtain performance on AMD GPUs. XNACK is an elusive feature that is not available on most consumer hardware, and is usually not enabled on production HPC systems.
The default compilation target is now `--acpp-targets=generic`. This means that a simple compiler invocation such as `acpp -o test -O3 test.cpp` will create a binary that can run on Intel, NVIDIA and AMD GPUs. AdaptiveCpp 24.02 is the world's only SYCL compiler that does not require specifying compilation targets to generate a binary that can run "everywhere".
- `--acpp-targets=generic` can now also target the host CPU through the generic JIT compiler. This can lead to performance improvements over the old `omp` compiler; e.g. on AMD Milan, babelstream's dot benchmark was observed to improve from 280GB/s to 380GB/s. This also means that it is no longer necessary to target `omp` to run on the CPU: `generic` is sufficient, and will likely perform better. Not having to compile for `omp` explicitly can also reduce compile times noticeably (we observed e.g. ~15% for babelstream).
- Improvements when using `--acpp-targets=generic` that can substantially reduce JIT overheads.
- When using `--acpp-targets=generic`, AdaptiveCpp can now automatically apply optimizations to kernels at JIT-time based on runtime knowledge. This can lead to noticeable speedups in some cases, although the full potential of this is expected to only become apparent with future AdaptiveCpp versions. This behavior is controlled by the `ACPP_ADAPTIVITY_LEVEL` environment variable; set it to 0 to recover the old behavior. The default is currently 1. If you are running benchmarks, you may have to update your benchmarking infrastructure to run applications multiple times.
Full Changelog: https://github.com/AdaptiveCpp/AdaptiveCpp/compare/v23.10.0...v24.02.0
Published by illuhad 12 months ago
This release contains several major features, and introduces a major shift in the project's capabilities:
- The project has been renamed from hipSYCL to AdaptiveCpp. This affects the compiler driver (`acpp`), compiler flags (e.g. `--acpp-targets`), the cmake integration and more. The old name is still supported for backward compatibility during a transitional period. For details on why this renaming occurred, see https://github.com/AdaptiveCpp/AdaptiveCpp/issues/1147
- Generic single-pass compiler (`--acpp-targets=generic`): This release is the first to contain our new single-pass compiler. This is the world's only SYCL compiler that does not need to parse the code multiple times to generate a binary. Instead, during the regular host compilation, LLVM IR for kernels is extracted and embedded in the binary. At runtime, this IR is then JIT-compiled to whatever is needed (currently supported are PTX, amdgcn and SPIR-V). A binary produced by `acpp --acpp-targets=generic` can directly be executed on all supported GPUs from Intel, NVIDIA and AMD. The new approach can dramatically reduce compile times, especially when many devices need to be targeted, since the code is still only parsed a single time.
- Experimental offloading of C++ standard parallelism (`--acpp-stdpar`): This heterogeneous programming model was until now primarily supported by NVIDIA's nvc++ for NVIDIA GPUs. AdaptiveCpp not only supports it for NVIDIA, AMD and Intel GPUs, but also conveniently allows generating a binary that can dispatch to all supported devices using the new single-pass compiler. See here for details on this new experimental feature: https://github.com/AdaptiveCpp/AdaptiveCpp/blob/develop/doc/stdpar.md
The full list of changes is too long for release pages; please see here for a comprehensive list of all changes:
Full Changelog: https://github.com/AdaptiveCpp/AdaptiveCpp/compare/v0.9.4...v23.10.0
Published by illuhad about 1 year ago
This is a prerelease for the upcoming 23.10.0 to provide a testing target.
- `multi_ptr == nullptr` by @nilsfriess in https://github.com/AdaptiveCpp/AdaptiveCpp/pull/924
- `std::string` by @nilsfriess in https://github.com/AdaptiveCpp/AdaptiveCpp/pull/926
- `sycl::vec` class to reflect SYCL 2020 requirements by @nilsfriess in https://github.com/AdaptiveCpp/AdaptiveCpp/pull/907
- `halfn = vec<half, n>` by @nilsfriess in https://github.com/AdaptiveCpp/AdaptiveCpp/pull/970
- `half` with other scalar types by @nilsfriess in https://github.com/AdaptiveCpp/AdaptiveCpp/pull/969
- `Args...` of `vec` constructor in template parameter to allow SFINAE by @nilsfriess in https://github.com/AdaptiveCpp/AdaptiveCpp/pull/954
- `id` class by @nilsfriess in https://github.com/AdaptiveCpp/AdaptiveCpp/pull/1018
- `buffer(Container)` constructor by @nilsfriess in https://github.com/AdaptiveCpp/AdaptiveCpp/pull/990
- `range` class by @nilsfriess in https://github.com/AdaptiveCpp/AdaptiveCpp/pull/1027
- `half` by @normallytangent in https://github.com/AdaptiveCpp/AdaptiveCpp/pull/1038
- `Dimensions` template parameter for `{nd_}range`, `{nd_,h_}item` and `id` by @nilsfriess in https://github.com/AdaptiveCpp/AdaptiveCpp/pull/1016
- `buffer(Container)` and `buffer(Iterator, Iterator)` constructors by @nilsfriess in https://github.com/AdaptiveCpp/AdaptiveCpp/pull/1033
- `size_t` in decl and def of `createExitWithID` by @fodinabor in https://github.com/AdaptiveCpp/AdaptiveCpp/pull/1063
- `marray` by @nilsfriess in https://github.com/AdaptiveCpp/AdaptiveCpp/pull/1075
- `-std=c++20` by @nilsfriess in https://github.com/AdaptiveCpp/AdaptiveCpp/pull/1083
- `FindCUDAToolkit` for cmake versions >= 3.17 by @nilsfriess in https://github.com/AdaptiveCpp/AdaptiveCpp/pull/1124
- `{m,aligned_}alloc` and `free` by @nilsfriess in https://github.com/AdaptiveCpp/AdaptiveCpp/pull/1114
- `std::for_each_n` by @nilsfriess in https://github.com/AdaptiveCpp/AdaptiveCpp/pull/1132
- `get_backend` to interop handle by @normallytangent in https://github.com/AdaptiveCpp/AdaptiveCpp/pull/1141
- `sycl::exception` class to SYCL 2020 by @nilsfriess in https://github.com/AdaptiveCpp/AdaptiveCpp/pull/1066
Full Changelog: https://github.com/AdaptiveCpp/AdaptiveCpp/compare/v0.9.4...v23.10.0-alpha
Published by illuhad over 1 year ago
This is a maintenance release, intended as a last stop before major additions. It therefore does not include major functionality already available on the develop branch such as the generic single-pass compiler.
- `my_id` since it does not exist in class `item` by @nilsfriess in https://github.com/illuhad/hipSYCL/pull/868
- `device{} == device{default_selector{}}` by @nilsfriess in https://github.com/illuhad/hipSYCL/pull/888
- `global_mem_cache_type::write_only` to `read_write` by @nilsfriess in https://github.com/illuhad/hipSYCL/pull/875
Full Changelog: https://github.com/illuhad/hipSYCL/compare/v0.9.3...v0.9.4
Published by illuhad about 2 years ago
- New `hip.explicit-multipass` compilation flow
- Use `std::weak_ptr` instead of `shared_ptr` to express dependencies in the DAG, making old DAG nodes and their associated events eligible earlier for reuse by the event pool.
- `std::filesystem` support is used where available
- New `hipsycl-hcf-tool` to inspect and edit HCF files
- New `hipsycl-info` to print information about detected devices.

Thank you to our first-time contributors!
Full Changelog: https://github.com/illuhad/hipSYCL/compare/v0.9.2...v0.9.3
Published by illuhad over 2 years ago
The following is an incomplete list of changes and improvements:
- Kernel names of the form `__hipsycl_kernel<KernelNameT>` or `__hipsycl_kernel<KernelBodyT>`.
- `atomic_ref`, the device selector API, the device aspect API and others
- The `queue::get_wait_list()` hipSYCL extension to allow barrier-like semantics at the queue level
- The `accessor_variant` extension, which allows accessors to automatically optimize the internal data layout of the accessor object depending on how they were constructed. This can save registers on device without any changes needed by the user.
- The `handler::update_device()` extension, in analogy to the already existing `update_host()`. This can e.g. be used to prefetch data.

See the documentation on extensions for more details.

- Improvements to the `cuda.explicit-multipass` compilation flow when multiple translation units are involved.

Yes, a lot of them :-)
Published by illuhad over 3 years ago
-- This release is dedicated to the memory of Oliver M. Some things just end too soon.
- New `hipSYCL_retarget` command group property: execute an operation submitted to a queue on an arbitrary device instead of the one the queue is bound to.
- New `hipSYCL_prefer_group_size<Dim>` command group property: provides a recommendation to hipSYCL which group size to choose for basic parallel for kernels.
- New `hipSYCL_prefer_execution_lane` command group property: provides a hint to the runtime on which backend queue (e.g. CUDA stream) an operation should be executed. This can be used to optimize kernel concurrency or the overlap of data transfers and compute in case the hipSYCL scheduler does not already automatically submit an optimal configuration.
- Interoperability between `buffer` objects and USM: turn any `buffer` into a collection of USM pointers, as well as construct `buffer` objects on top of existing USM pointers.
- The `hipSYCL_page_size` buffer property can be used to enable data state tracking inside a buffer at a granularity below the buffer size. This can be used to allow multiple kernels to concurrently write to the same buffer as long as they access different hipSYCL data management pages. Unlike subbuffers, this also works with multi-dimensional strided memory accesses.
- `sycl::mem_advise()` as a free function
- `handler::prefetch_host()` and `queue::prefetch_host()` for a simpler mechanism of prefetching USM allocations to host memory.
- `auto v = sycl::make_async_view(ptr, range)` constructs a buffer that operates directly on the input pointer and does not block in the destructor.
- The `HIPSYCL_VISIBLITY_MASK` environment variable can be used to select which backends should be loaded.

See https://github.com/illuhad/hipSYCL/blob/develop/doc/extensions.md for a list of all hipSYCL extensions with more details.

- `host_accessor`
- `queue::wait()`

Yes, a lot of them!
Published by illuhad almost 4 years ago
hipSYCL 0.9 is packed with tons of new features compared to the older 0.8 series:
hipSYCL 0.9.0 introduces support for several key SYCL 2020 features, including:
There are two new extensions in hipSYCL 0.9.0:
hipSYCL 0.9.0 is the first release containing the entirely rewritten, brand new runtime library, which includes features such as:
- A single runtime library (`libhipSYCL-rt`) instead of libraries for each backend (`libhipSYCL_cpu`, `libhipSYCL_cuda` etc.)
- Building hipSYCL with `syclcc` is no longer required: only the runtime needs to be compiled, which can be done with any regular C++ compiler. This should simplify the build process greatly.
- No matter how many `sycl::queue`s exist, compute/memory-overlap always works equally well. This means a `sycl::queue` is now nothing more than an interface to the runtime.

syclcc and compilation improvements:

- A new `--hipsycl-targets` flag that allows compiling for multiple targets and backends; e.g. `syclcc --hipsycl-targets="omp;hip:gfx906,gfx900"` compiles for the OpenMP backend as well as for Vega 10 and Vega 20. Note that simultaneous compilation for both NVIDIA and AMD GPUs is not supported due to clang limitations.
- A configuration file (`syclcc.json`), giving the user more control to adapt the compilation flow to individual requirements. This can be helpful for uncommon setup scenarios where different flags may be required.
- Accelerated `nd_range` parallel for on CPU, bringing several orders of magnitude of performance improvement. Note that `nd_range` parallel for is inherently difficult to implement in library-only CPU backends, and basic parallel for or our scoped parallelism extension should be preferred if possible.

Yes, a lot of them :-)
Published by illuhad about 5 years ago
Note: hipSYCL 0.8.0 is deprecated; users are encouraged to use our package repositories instead.
This is the release of hipSYCL 0.8.0. We provide the following packages:
While we cannot provide matching CUDA packages for NVIDIA support due to legal reasons, scripts for installing a matching CUDA distribution as well as scripts to generate CUDA packages are provided. You will find further information in the readme here on github.
At the moment, Arch Linux, CentOS 7 and Ubuntu 18.04 packages are provided.
Published by illuhad about 5 years ago
This is a prerelease of hipSYCL 0.8.0. In particular, it serves to test new packages of the entire hipSYCL stack. We provide the following packages:
- `hipSYCL-base` provides the basic LLVM compiler stack that is needed in any case
- `hipSYCL-rocm` provides a compatible ROCm stack that additionally allows hipSYCL to target AMD GPUs
- `hipSYCL` provides the actual hipSYCL libraries, tools and headers

While we cannot provide matching CUDA packages due to legal reasons, CUDA installation scripts will be provided for the actual hipSYCL 0.8.0 release.
At the moment, Arch Linux and Ubuntu 18.04 packages are provided.