CUDA C++ Core Libraries
OTHER License
Full Changelog: https://github.com/NVIDIA/cccl/compare/v2.3.1...v2.3.2
Published by wmaxey 8 months ago
In addition to various fixes and documentation improvements, the following notable improvements have been made to Thrust, CUB, and libcudacxx.
Users don't want to see warnings from CCCL headers. The typical way to accomplish this with header-only libraries is to use -isystem. However, this causes problems when using CCCL from GitHub, because those headers will conflict with the CCCL headers in the CTK. Therefore, you should always include CCCL headers via -I.
To achieve the same effect as -isystem, CCCL headers now use the system_header pragma. For more information, see https://github.com/NVIDIA/cccl/issues/527.
TL;DR: You should never see warnings emitted from a CCCL header ever again!
Using CUB and Thrust in shared libraries is a known source of issues. Previously, the solution to these issues consisted of using the THRUST_CUB_WRAPPED_NAMESPACE macro so that different shared libraries have different symbol names. However, this solution has poor discoverability, since the issues present themselves in the form of segmentation faults, hangs, wrong results, etc. As of the 2.3 release, linkage issues are addressed by default without the need for THRUST_CUB_WRAPPED_NAMESPACE. Although the fix is API compatible, it might cause ABI compatibility issues. For more details, see issue #443.
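The idea behind the older namespace-wrapping workaround can be illustrated in plain C++. This is a minimal sketch, not Thrust or CUB code: `DEMO_LIBRARY`, `demo`, `lib_a`, and `lib_b` are hypothetical names, and the macro merely stands in for a header-only library whose namespace is nested inside a wrapper chosen by the consumer.

```cpp
#include <string>

// Hypothetical stand-in for a header-only library. WRAPPER plays the role
// that a user-chosen wrapping namespace plays for Thrust and CUB: every
// entity of the library ends up nested inside it.
#define DEMO_LIBRARY(WRAPPER)                                   \
  namespace WRAPPER { namespace demo {                          \
    inline std::string qualified_name() {                       \
      return #WRAPPER "::demo"; /* adjacent literals are joined */ \
    }                                                           \
  } }

// Two consumers (think: two shared libraries) each pick their own wrapper,
// so the same library code yields two distinct sets of symbols.
DEMO_LIBRARY(lib_a)
DEMO_LIBRARY(lib_b)
```

With this setup, `lib_a::demo::qualified_name()` and `lib_b::demo::qualified_name()` are separate functions with different mangled names, which is why the two copies cannot be confused at link or load time.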
thrust::tuple, thrust::pair, and thrust::complex have been replaced with cuda::std alternatives. This can be a breaking change, but should be source compatible.
Performance of cub::DeviceSelect::UniqueByKey, cub::DeviceScan::ExclusiveSumByKey, and cub::DeviceReduce::ReduceByKey has improved by up to 60% on A100. cub::DeviceSegmentedReduce now supports 64-bit indexing.
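For readers new to the by-key algorithms named above, the following host-only sketch shows what a reduce-by-key computes: each run of consecutive equal keys collapses to one key and the sum of that run's values. It is a sequential reference in standard C++, not CUB's implementation or its API. The name `reduce_by_key_reference` is hypothetical, and the sketch assumes both inputs have the same length.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sequential reference for reduce-by-key with summation.
// Hypothetical helper for illustration only; assumes keys.size() == values.size().
template <typename Key, typename Value>
std::pair<std::vector<Key>, std::vector<Value>>
reduce_by_key_reference(const std::vector<Key>& keys,
                        const std::vector<Value>& values)
{
  std::vector<Key> out_keys;
  std::vector<Value> out_sums;
  for (std::size_t i = 0; i < keys.size(); ++i) {
    if (out_keys.empty() || !(out_keys.back() == keys[i])) {
      // A new run starts: emit its key and seed the sum with the first value.
      out_keys.push_back(keys[i]);
      out_sums.push_back(values[i]);
    } else {
      // Same key as the current run: accumulate into the run's sum.
      out_sums.back() += values[i];
    }
  }
  return {out_keys, out_sums};
}
```

Note that only consecutive equal keys are merged, so keys `{1, 1, 2, 2, 2, 1}` produce three output segments rather than two.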
The cuda::ptx namespace and <cuda/ptx> header are now available and provide access to inline PTX functions covering various async memcpy and barrier intrinsics.
<cuda/barrier>
_LIBCUDACXX_INLINE_VAR macro by @miscco in https://github.com/NVIDIA/cccl/pull/355
iter_swap CPO by @miscco in https://github.com/NVIDIA/cccl/pull/332
copy-pr-bot by @ajschmidt8 in https://github.com/NVIDIA/cccl/pull/381
permutable concept by @miscco in https://github.com/NVIDIA/cccl/pull/367
_NOEXCEPT_ macro uses by @miscco in https://github.com/NVIDIA/cccl/pull/371
identity changes for c++20 by @miscco in https://github.com/NVIDIA/cccl/pull/383
Roadmap project value on issue/pr close and Auto-type new issues by @jarmak-nv in https://github.com/NVIDIA/cccl/pull/389
DeviceAdjacentDifference::SubtractRight tests to catch2 by @miscco in https://github.com/NVIDIA/cccl/pull/390
continue-on-error by @jarmak-nv in https://github.com/NVIDIA/cccl/pull/425
util_device.cuh from iterator headers to enable online compilation by @leofang in https://github.com/NVIDIA/cccl/pull/412
cub::DeviceRunLengthEncode tests to catch2 by @miscco in https://github.com/NVIDIA/cccl/pull/411
projected helper struct by @miscco in https://github.com/NVIDIA/cccl/pull/385
cuda::barrier by @ahendriksen in https://github.com/NVIDIA/cccl/pull/440
thrust::complex unit tests to prepare for upcoming replacement with std::complex by @Blonck in https://github.com/NVIDIA/cccl/pull/413
.devcontainer/launch.sh with bash + add error checking by @wence- in https://github.com/NVIDIA/cccl/pull/407
__ppc64__ by @miscco in https://github.com/NVIDIA/cccl/pull/451
indirectly_comparable concept by @miscco in https://github.com/NVIDIA/cccl/pull/445
constexpr all the things by @pb-dseifert in https://github.com/NVIDIA/cccl/pull/476
sortable concept by @miscco in https://github.com/NVIDIA/cccl/pull/471
thrust::complex as a struct derived from cuda::std::complex by @Blonck in https://github.com/NVIDIA/cccl/pull/454
mergeable concept by @miscco in https://github.com/NVIDIA/cccl/pull/484
move_sentinel by @miscco in https://github.com/NVIDIA/cccl/pull/496
unreachable_sentinel by @miscco in https://github.com/NVIDIA/cccl/pull/506
__bounded_iter by @miscco in https://github.com/NVIDIA/cccl/pull/540
cuda::device::barrier_expect_tx by @ahendriksen in https://github.com/NVIDIA/cccl/pull/498
cp.async.bulk global -> shared support to cuda::memcpy_async by @ahendriksen in https://github.com/NVIDIA/cccl/pull/501
thrust::tuple implementation with cuda::std::tuple by @miscco in https://github.com/NVIDIA/cccl/pull/262
ranges::advance by @miscco in https://github.com/NVIDIA/cccl/pull/546
ranges::next by @miscco in https://github.com/NVIDIA/cccl/pull/611
cuda::ptx::st_async by @github-actions in https://github.com/NVIDIA/cccl/pull/1093
cuda::ptx::red_async by @github-actions in https://github.com/NVIDIA/cccl/pull/1094
cuda::ptx::mbarrier_{try/test}_wait{_parity} by @github-actions in https://github.com/NVIDIA/cccl/pull/1106
cuda::ptx::red.async for int32_t types by @github-actions in https://github.com/NVIDIA/cccl/pull/1107
ptx.st.async.compile.pass.cpp failing in C++11 by @github-actions in https://github.com/NVIDIA/cccl/pull/1136
_LIBCUDACXX_UNREACHABLE for old MSVC (#1114) by @miscco in https://github.com/NVIDIA/cccl/pull/1143
Full Changelog: https://github.com/NVIDIA/cccl/compare/v2.2.0...2.3.0
Published by jrhemstad about 1 year ago
sccache version by @ajschmidt8 in https://github.com/NVIDIA/cccl/pull/19
11.5.1 containers by @ajschmidt8 in https://github.com/NVIDIA/cccl/pull/20
tuple_cat for std:: qualified types by @miscco in https://github.com/NVIDIA/cccl/pull/144
CMAKE_CUDA_COMPILER_LAUNCHER for ccache support by @miscco in https://github.com/NVIDIA/cccl/pull/152
atomic_ref by @miscco in https://github.com/NVIDIA/cccl/pull/164
destructible by @miscco in https://github.com/NVIDIA/cccl/pull/157
iter_move CPO by @miscco in https://github.com/NVIDIA/cccl/pull/197
skip-tests is part of the latest commit message by @miscco in https://github.com/NVIDIA/cccl/pull/224
cudaMemcpyAsync for thrust::copy by @miscco in https://github.com/NVIDIA/cccl/pull/211
cub::Device{Select, Partition} tests to catch2 by @miscco in https://github.com/NVIDIA/cccl/pull/229
std::aligned_storage when building with c++23 by @miscco in https://github.com/NVIDIA/cccl/pull/258
cub::DeviceRunLengthEncode::NonTrivialRuns by @senior-zero in https://github.com/NVIDIA/cccl/pull/294
cudacc and nvcc mixup by @wmaxey in https://github.com/NVIDIA/cccl/pull/329
destructible concept on MSVC by @miscco in https://github.com/NVIDIA/cccl/pull/333
Full Changelog: https://github.com/NVIDIA/cccl/commits/v2.2.0