cccl

CUDA C++ Core Libraries

OTHER License

cccl - v2.3.2 Latest Release

Published by wmaxey 7 months ago

What's Changed

Full Changelog: https://github.com/NVIDIA/cccl/compare/v2.3.1...v2.3.2

cccl - CCCL 2.3.0

Published by wmaxey 8 months ago

What’s New

In addition to various fixes and documentation improvements, the following notable improvements have been made to Thrust, CUB, and libcudacxx.

System Headers and Warnings

Users don't want to see warnings from CCCL headers. The typical way to accomplish this with header libraries is to use -isystem. However, this causes problems when using CCCL from GitHub, because that copy will conflict with the CCCL headers shipped in the CUDA Toolkit (CTK). Therefore, you should always include CCCL headers via -I.

To achieve the same effect as -isystem, CCCL headers will now use the system_header pragma. For more information, see https://github.com/NVIDIA/cccl/issues/527.
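As a rough illustration of the mechanism, the sketch below shows how a header-only library can mark itself as a system header from the inside. The file name, include guard, my_lib namespace, and truncate_to_int function are all hypothetical; this is not CCCL's actual code, only the general pattern of placing a compiler-specific system_header pragma near the top of a header.

```cpp
// my_lib/widget.h (hypothetical header, not part of CCCL)
#ifndef MY_LIB_WIDGET_H
#define MY_LIB_WIDGET_H

// Ask the compiler to treat the rest of this file as a system header, so
// warnings originating here are suppressed even when the header is found
// through -I rather than -isystem. The pragma only takes effect in an
// included file; compilers ignore it (with a warning) in the main source file.
#if defined(__clang__)
#  pragma clang system_header
#elif defined(__GNUC__)
#  pragma GCC system_header
#elif defined(_MSC_VER)
#  pragma system_header
#endif

namespace my_lib {

// The implicit double -> int conversion below is the kind of code that
// -Wconversion would normally report in a user's build.
inline int truncate_to_int(double value) {
  int result = value;
  return result;
}

}  // namespace my_lib

#endif  // MY_LIB_WIDGET_H
```

A translation unit that includes this header with -I and builds with -Wconversion should then see warnings only for its own code, assuming the compiler honors the pragma.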

TL;DR: You should never see warnings emitted from a CCCL header ever again!

Linkage Issues

Using CUB and Thrust in shared libraries is a known source of issues. Previously, the solution to these issues consisted of using the THRUST_CUB_WRAPPED_NAMESPACE macro so that different shared libraries have different symbol names. However, this solution has poor discoverability, since the issues present themselves in the form of segmentation faults, hangs, wrong results, etc. As of the 2.3 release, linkage issues are addressed by default without the need for THRUST_CUB_WRAPPED_NAMESPACE. Although the fix is API compatible, it might cause ABI compatibility issues. For more details, see issue #443.
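To make the older workaround concrete, here is a minimal host-only sketch of the namespace-wrapping idea. It does not use Thrust or CUB: the mini library, the MINI_WRAPPED_NAMESPACE macro, and the lib_a name are invented stand-ins for the role that THRUST_CUB_WRAPPED_NAMESPACE plays. Each shared library picks its own wrapper name, so the header-only library's symbols end up under a different namespace in each one.

```cpp
#include <numeric>
#include <vector>

// Hypothetical wrapper macro. A real build would normally pass it on the
// command line, e.g. -DMINI_WRAPPED_NAMESPACE=lib_a, once per shared library.
#define MINI_WRAPPED_NAMESPACE lib_a

// Open and close the wrapping namespace only when the macro is defined.
#ifdef MINI_WRAPPED_NAMESPACE
#  define MINI_NS_BEGIN namespace MINI_WRAPPED_NAMESPACE {
#  define MINI_NS_END }
#else
#  define MINI_NS_BEGIN
#  define MINI_NS_END
#endif

// Stand-in for a header-only library such as Thrust or CUB.
MINI_NS_BEGIN
namespace mini {

inline int sum(const std::vector<int>& values) {
  return std::accumulate(values.begin(), values.end(), 0);
}

}  // namespace mini
MINI_NS_END

// Code inside the shared library refers to the wrapped name, so the symbol
// it emits is lib_a::mini::sum rather than a plain mini::sum that another
// library could also define.
int lib_a_total(const std::vector<int>& values) {
  return lib_a::mini::sum(values);
}
```

A second library built with a different wrapper name would emit a distinct symbol for the same function, which is what keeps the two copies from colliding at load time.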

Thrust

thrust::tuple, thrust::pair, and thrust::complex have been replaced with cuda::std alternatives. This can be a breaking change, but should be source compatible.

CUB

The performance of cub::DeviceSelect::UniqueByKey, cub::DeviceScan::ExclusiveSumByKey, and cub::DeviceReduce::ReduceByKey has improved by up to 60% on A100. cub::DeviceSegmentedReduce now supports 64-bit indexing.
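For context on why 64-bit indexing matters, the following host-only arithmetic sketch checks whether an offset fits in a signed 32-bit integer. It does not call CUB, and the problem sizes are made up for illustration. Which quantities CUB widens to 64 bits is not stated in these notes, so treat it only as a demonstration of the overflow limit.

```cpp
#include <cstdint>
#include <limits>

// True when `count` can be represented as a signed 32-bit offset.
constexpr bool fits_in_int32(std::int64_t count) {
  return count <= std::numeric_limits<std::int32_t>::max();
}

// Hypothetical problem size: 3 segments of one billion items each.
constexpr std::int64_t items_per_segment = 1000000000;
constexpr std::int64_t num_segments = 3;

// Offset one past the last item: 3,000,000,000.
constexpr std::int64_t end_offset = items_per_segment * num_segments;

// One segment fits within 32-bit offsets, but the whole input does not.
static_assert(fits_in_int32(items_per_segment), "one segment fits in int32");
static_assert(!fits_in_int32(end_offset), "total size exceeds int32");
```

Any input with more than 2,147,483,647 items in total runs into this limit when offsets are 32-bit.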

libcudacxx

  • The cuda::ptx namespace and <cuda/ptx> header are now available and provide access to inline PTX functions covering various async memcpy and barrier intrinsics.
  • #379 - Added experimental bulk TMA memcpy under <cuda/barrier>

What's Changed

Full Changelog: https://github.com/NVIDIA/cccl/compare/v2.2.0...2.3.0

cccl - CCCL 2.2.0

Published by jrhemstad about 1 year ago

What's Changed

Full Changelog: https://github.com/NVIDIA/cccl/commits/v2.2.0
