cccl | Cuda Ecosystem Directory

cccl - v2.3.2 (Latest Release)

Published by wmaxey 7 months ago

What's Changed

[BACKPORT]: Silence some static asserts in ptx helpers (#1257) by @miscco in https://github.com/NVIDIA/cccl/pull/1284
[BACKPORT]: Ensure that pair is trivially copyable (#1249) by @miscco in https://github.com/NVIDIA/cccl/pull/1292
[BACKPORT]: Properly test internal headers (#1258) by @miscco in https://github.com/NVIDIA/cccl/pull/1299
[Backport]: Fix errors when find_package(CCCL) is called twice. (#1157) by @miscco in https://github.com/NVIDIA/cccl/pull/1298
[BACKPORT] Fix MSVC issues (#1261) by @miscco in https://github.com/NVIDIA/cccl/pull/1297
[backport] thrust/mr: fix the case of reusing a block for a smaller alloc. (#1232) by @griwes in https://github.com/NVIDIA/cccl/pull/1317
[BACKPORT]: Fix ptx usage to account for PTX ISA availability (#1359) by @miscco in https://github.com/NVIDIA/cccl/pull/1421
Create patch 2.3.2 by @wmaxey in https://github.com/NVIDIA/cccl/pull/1530
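
The pair backport above concerns trivial copyability, which matters for types that are transferred with byte-wise copies. As a minimal sketch in standard C++ (both structs are hypothetical illustrations, not the cuda::std::pair implementation), a type trait shows how a user-provided copy constructor removes the property:

```cpp
#include <type_traits>

// Hypothetical pair-like type: all special members are implicit,
// so the type is trivially copyable whenever T1 and T2 are.
template <class T1, class T2>
struct trivial_pair {
  T1 first;
  T2 second;
};

// Hypothetical counter-example: a user-provided copy constructor makes the
// type non-trivially copyable, even though it copies the same two members.
template <class T1, class T2>
struct nontrivial_pair {
  T1 first;
  T2 second;
  nontrivial_pair(T1 a, T2 b) : first(a), second(b) {}
  nontrivial_pair(const nontrivial_pair& other)
      : first(other.first), second(other.second) {}
};

static_assert(std::is_trivially_copyable<trivial_pair<int, float>>::value,
              "implicit copy operations keep the pair trivially copyable");
static_assert(!std::is_trivially_copyable<nontrivial_pair<int, float>>::value,
              "a user-provided copy constructor is never trivial");
```

A check of this kind is one way a test suite could guard against the property regressing.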

Full Changelog: https://github.com/NVIDIA/cccl/compare/v2.3.1...v2.3.2

cccl - CCCL 2.3.0

Published by wmaxey 8 months ago

What’s New

In addition to various fixes and documentation improvements, the following notable improvements have been made to Thrust, CUB, and libcudacxx.

System Headers and Warnings

Users don't want to see warnings from CCCL headers. The typical way to suppress them for header-only libraries is to add the include directory with -isystem. However, this causes problems when using CCCL from GitHub, because those headers will then conflict with the CCCL headers shipped in the CUDA Toolkit (CTK). Therefore, you should always include CCCL headers via -I.

To achieve the same effect as -isystem, CCCL headers will now use the system_header pragma. For more information, see https://github.com/NVIDIA/cccl/issues/527.

TL;DR: You should never see warnings emitted from a CCCL header ever again!
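
The pragma approach can be sketched as follows. This is an illustrative header for a hypothetical library (hypo_lib, hypo_lib/scale.h, and scale_down are made-up names, and the compiler-detection logic is an assumption rather than CCCL's actual implementation). Placing the pragma near the top of a header asks the compiler to treat the rest of that file as a system header, so warnings that originate inside it are suppressed even when its directory is added with -I:

```cpp
// hypo_lib/scale.h -- hypothetical header, meant to be #included by user code
#ifndef HYPO_LIB_SCALE_H
#define HYPO_LIB_SCALE_H

// Mark everything below as system-header code. Clang also defines __GNUC__,
// so it is checked first. The pragma only has an effect in an included file.
#if defined(__clang__)
#  pragma clang system_header
#elif defined(__GNUC__)
#  pragma GCC system_header
#elif defined(_MSC_VER)
#  pragma system_header
#endif

namespace hypo_lib {

// The implicit long long -> int narrowing below would normally be reported
// under -Wconversion; inside a system header that diagnostic is not shown.
inline int scale_down(long long value) {
  int narrowed = value / 2;
  return narrowed;
}

}  // namespace hypo_lib

#endif  // HYPO_LIB_SCALE_H
```

User code would include this header as usual and compile with -I pointing at its directory; no extra flag is needed to silence the header's own warnings.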

Linkage Issues

Using CUB and Thrust in shared libraries is a known source of issues. Previously, the solution was to define the THRUST_CUB_WRAPPED_NAMESPACE macro so that different shared libraries have different symbol names. However, this solution has poor discoverability, since the issues present themselves as segmentation faults, hangs, wrong results, etc. As of the 2.3 release, linkage issues are addressed by default without the need for THRUST_CUB_WRAPPED_NAMESPACE. Although the fix is API compatible, it might cause ABI compatibility issues. For more details, see issue #443.
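
For context, the namespace-wrapping idea behind THRUST_CUB_WRAPPED_NAMESPACE can be sketched in plain C++. This is a simplified, hypothetical stand-in (HYPO_LIBRARY_BODY, hypo, lib_a, and lib_b are made-up names, and no Thrust or CUB headers are involved): the same library code is placed inside a different outer namespace for each consumer, so the resulting symbols no longer share a name.

```cpp
#include <string>

// Stand-in for a header-only library. WRAPPER plays the role of the
// user-chosen outer namespace; TAG stands for build-specific behavior
// (for example, code compiled for a different GPU architecture).
#define HYPO_LIBRARY_BODY(WRAPPER, TAG)                \
  namespace WRAPPER {                                  \
  namespace hypo {                                     \
  inline std::string build_tag() { return TAG; }       \
  }                                                    \
  }

// Two "shared libraries" each get their own wrapped copy of the library.
HYPO_LIBRARY_BODY(lib_a, "sm_80")
HYPO_LIBRARY_BODY(lib_b, "sm_90")

// Each consumer reaches its own copy through its wrapper namespace, so
// lib_a::hypo::build_tag and lib_b::hypo::build_tag are distinct symbols.
inline std::string describe_builds() {
  return lib_a::hypo::build_tag() + "," + lib_b::hypo::build_tag();
}
```

Without the wrappers, both copies would define hypo::build_tag, and every call could end up resolved to a single definition, which is the kind of silent mismatch described above.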

Thrust

thrust::tuple, thrust::pair, and thrust::complex have been replaced with cuda::std alternatives. This can be a breaking change, but should be source compatible.

CUB

Performance of cub::DeviceSelect::UniqueByKey, cub::DeviceScan::ExclusiveSumByKey, and cub::DeviceReduce::ReduceByKey improved by up to 60% on A100. cub::DeviceSegmentedReduce now supports 64-bit indexing.
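
To illustrate why 64-bit indexing matters for segmented algorithms, here is a small host-side sketch in standard C++. It does not call CUB; make_offsets and fits_in_int32 are hypothetical helpers, and exactly which parameters of cub::DeviceSegmentedReduce were widened is not stated in these notes. The sketch builds the offsets for equally sized segments and checks whether an offset still fits in a 32-bit signed integer:

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Returns num_segments + 1 offsets; segment i spans [offsets[i], offsets[i + 1]).
inline std::vector<std::int64_t> make_offsets(std::int64_t num_segments,
                                              std::int64_t segment_size) {
  std::vector<std::int64_t> offsets(static_cast<std::size_t>(num_segments) + 1);
  for (std::int64_t i = 0; i <= num_segments; ++i) {
    offsets[static_cast<std::size_t>(i)] = i * segment_size;
  }
  return offsets;
}

// True if the value can be represented as a 32-bit signed integer.
inline bool fits_in_int32(std::int64_t value) {
  return value >= std::numeric_limits<std::int32_t>::min() &&
         value <= std::numeric_limits<std::int32_t>::max();
}
```

With three segments of 2^30 items each, the last offset is 3,221,225,472, which exceeds the 32-bit signed maximum of 2,147,483,647, so such an input cannot be described with 32-bit offsets.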

libcudacxx

The cuda::ptx namespace and <cuda/ptx> header are now available and provide inline PTX functions that expose various async memcpy and barrier intrinsics.
Added experimental bulk TMA memcpy under <cuda/barrier> (#379).

What's Changed

Port cub::DeviceSegmentedReduce tests to catch2 by @elstehle in https://github.com/NVIDIA/cccl/pull/303
Branch/2.2.x by @gevtushenko in https://github.com/NVIDIA/cccl/pull/305
Tune unique by key on A100 by @gevtushenko in https://github.com/NVIDIA/cccl/pull/306
Merge branch/2.2.x to main by @jrhemstad in https://github.com/NVIDIA/cccl/pull/308
Add example cmake project by @jrhemstad in https://github.com/NVIDIA/cccl/pull/177
Adds catch2 tests for reduce-by-key by @elstehle in https://github.com/NVIDIA/cccl/pull/311
Tune scan by key on A100 by @gevtushenko in https://github.com/NVIDIA/cccl/pull/325
Replace diag_suppress by nv_diag_suppress in documentation by @ahendriksen in https://github.com/NVIDIA/cccl/pull/281
Fix MSVC / CUB tests build by @gevtushenko in https://github.com/NVIDIA/cccl/pull/336
gdb pretty printer: handle non-cuda device vectors by @siboehm in https://github.com/NVIDIA/cccl/pull/264
Add a nvrtc configuration for libcu++ by @miscco in https://github.com/NVIDIA/cccl/pull/202
GH Infra: project automation and issue template fixes by @jarmak-nv in https://github.com/NVIDIA/cccl/pull/297
Tune reduce by key on A100 by @gevtushenko in https://github.com/NVIDIA/cccl/pull/346
Merge commits from 2.2 branch by @miscco in https://github.com/NVIDIA/cccl/pull/350
Fix a shadow warning in thrust's execute_with_dependencies.h by @hageboeck in https://github.com/NVIDIA/cccl/pull/334
Assorted fixes for MSVC 2017 by @miscco in https://github.com/NVIDIA/cccl/pull/341
[skip-tests] Guard inline variables with _LIBCUDACXX_INLINE_VAR macro by @miscco in https://github.com/NVIDIA/cccl/pull/355
Port cub::DeviceScan tests to catch2 by @elstehle in https://github.com/NVIDIA/cccl/pull/347
Remove _NOEXCEPT macro in favor of noexcept in libcu++ by @Blonck in https://github.com/NVIDIA/cccl/pull/349
Project Automation: add conditional steps due to context errors by @jarmak-nv in https://github.com/NVIDIA/cccl/pull/353
Work around strange gcc bug by @miscco in https://github.com/NVIDIA/cccl/pull/363
Implement iter_swap CPO by @miscco in https://github.com/NVIDIA/cccl/pull/332
Replace default, constexpr, and delete macros by original keywords by @Blonck in https://github.com/NVIDIA/cccl/pull/360
Add clang16 devcontainer and CI job by @miscco in https://github.com/NVIDIA/cccl/pull/362
[skip-tests] Skip merge conflict from old iter_swap PR by @miscco in https://github.com/NVIDIA/cccl/pull/369
[skip-tests] Also skip all CI runs that require a GPU when [skip-tests] is set by @miscco in https://github.com/NVIDIA/cccl/pull/370
Remove _LIBCUDACXX_CXX03_LANG macro and all encapsulated code by @Blonck in https://github.com/NVIDIA/cccl/pull/368
Remove checks against _LIBCUDACXX_STD_VER < 11 by @Blonck in https://github.com/NVIDIA/cccl/pull/375
Use copy-pr-bot by @ajschmidt8 in https://github.com/NVIDIA/cccl/pull/381
Implement the permutable concept by @miscco in https://github.com/NVIDIA/cccl/pull/367
[NFC] We missed some _NOEXCEPT_ macro uses by @miscco in https://github.com/NVIDIA/cccl/pull/371
Implement identity changes for c++20 by @miscco in https://github.com/NVIDIA/cccl/pull/383
Hide third party cmake options in our cmake developer builds. by @allisonvacanti in https://github.com/NVIDIA/cccl/pull/300
Port cub::DeviceScanByKey tests to Catch2 by @elstehle in https://github.com/NVIDIA/cccl/pull/380
Fixes a race in DeviceRunLengthEncode::NonTrivialRuns by @elstehle in https://github.com/NVIDIA/cccl/pull/399
Add commit information to the test output by @miscco in https://github.com/NVIDIA/cccl/pull/401
Project Automation: Handle PRs opened as non-draft + multiple bug fixes by @jarmak-nv in https://github.com/NVIDIA/cccl/pull/387
Project Automation: set Roadmap project value on issue/pr close and Auto-type new issues by @jarmak-nv in https://github.com/NVIDIA/cccl/pull/389
Add support for tests that should fail at runtime by @ahendriksen in https://github.com/NVIDIA/cccl/pull/418
Port DeviceAdjacentDifference::SubtractRight tests to catch2 by @miscco in https://github.com/NVIDIA/cccl/pull/390
Project automation - Fix indentation for continue-on-error by @jarmak-nv in https://github.com/NVIDIA/cccl/pull/425
[BUG] Ensure that all headers build on their own by @miscco in https://github.com/NVIDIA/cccl/pull/200
Remove util_device.cuh from iterator headers to enable online compilation by @leofang in https://github.com/NVIDIA/cccl/pull/412
Fix ci-overview example by @gevtushenko in https://github.com/NVIDIA/cccl/pull/428
Port cub::DeviceRunLengthEncode tests to catch2 by @miscco in https://github.com/NVIDIA/cccl/pull/411
Add cuda::device::barrier_arrive tx by @ahendriksen in https://github.com/NVIDIA/cccl/pull/358
Fix CubDebug by @gevtushenko in https://github.com/NVIDIA/cccl/pull/430
Do not use static member functions to initialize static member variables. by @miscco in https://github.com/NVIDIA/cccl/pull/438
Implement the projected helper struct by @miscco in https://github.com/NVIDIA/cccl/pull/385
Add PTX wrapping functions for TMA features by @ahendriksen in https://github.com/NVIDIA/cccl/pull/379
Clarify docstring for num_items parameter of DeviceSegmentedRadixSort by @HapeMask in https://github.com/NVIDIA/cccl/pull/320
Enable lit to determine the compute architectures by @miscco in https://github.com/NVIDIA/cccl/pull/447
Add NVRTC_SKIP_KERNEL_RUN tag to compile, but skip running NVRTC test by @ahendriksen in https://github.com/NVIDIA/cccl/pull/434
Improve documentation of cuda::barrier by @ahendriksen in https://github.com/NVIDIA/cccl/pull/440
Extend thrust::complex unit tests to prepare for upcoming replacement with std::complex by @Blonck in https://github.com/NVIDIA/cccl/pull/413
Remove having two install rules for -header-search.cmake by @robertmaynard in https://github.com/NVIDIA/cccl/pull/298
Run .devcontainer/launch.sh with bash + add error checking by @wence- in https://github.com/NVIDIA/cccl/pull/407
Remove C++03 compatibility from unit tests by @Blonck in https://github.com/NVIDIA/cccl/pull/378
[libcu++] Fix use of __ppc64__ by @miscco in https://github.com/NVIDIA/cccl/pull/451
Update the README by @jrhemstad in https://github.com/NVIDIA/cccl/pull/291
[libcu++] Try to avoid gcc miscompilation issues by @miscco in https://github.com/NVIDIA/cccl/pull/452
Consolidate matrix logic into single script/job by @jrhemstad in https://github.com/NVIDIA/cccl/pull/361
Implement the indirectly_comparable concept by @miscco in https://github.com/NVIDIA/cccl/pull/445
Fix compute matrix dropping trailing zeros by @jrhemstad in https://github.com/NVIDIA/cccl/pull/466
Avoid integer promotion warnings with MSVC by @miscco in https://github.com/NVIDIA/cccl/pull/460
Implement ranges comparison objects by @miscco in https://github.com/NVIDIA/cccl/pull/464
Fix CUB/MSVC/RDC tests by @gevtushenko in https://github.com/NVIDIA/cccl/pull/469
Fix Thrust/CUB Linkage Issues by @gevtushenko in https://github.com/NVIDIA/cccl/pull/443
Script for Running CUB Benchmarks by @gevtushenko in https://github.com/NVIDIA/cccl/pull/472
[skip ci] Add list of CCCL users to README by @jrhemstad in https://github.com/NVIDIA/cccl/pull/474
constexpr all the things by @pb-dseifert in https://github.com/NVIDIA/cccl/pull/476
Add Gonzalo/Allard to trustees by @jrhemstad in https://github.com/NVIDIA/cccl/pull/482
Implement the sortable concept by @miscco in https://github.com/NVIDIA/cccl/pull/471
[libcu++] Add _LIBCUDACXX_CUDACC_BELOW_12_3 macro by @gonzalobg in https://github.com/NVIDIA/cccl/pull/479
Refactor thrust::complex as a struct derived from cuda::std::complex by @Blonck in https://github.com/NVIDIA/cccl/pull/454
Add ci scripts for windows by @miscco in https://github.com/NVIDIA/cccl/pull/251
Enable complex interop on MSVC by @miscco in https://github.com/NVIDIA/cccl/pull/490
[skip ci] Add related projects to readme. by @jrhemstad in https://github.com/NVIDIA/cccl/pull/492
Reenable nvrtc tests by @miscco in https://github.com/NVIDIA/cccl/pull/488
Implement the mergeable concept by @miscco in https://github.com/NVIDIA/cccl/pull/484
64-bit indexing for DeviceSegmentedReduce by @jecs in https://github.com/NVIDIA/cccl/pull/414
Implement move_sentinel by @miscco in https://github.com/NVIDIA/cccl/pull/496
Support skipped benches in run script by @gevtushenko in https://github.com/NVIDIA/cccl/pull/508
Implement unreachable_sentinel by @miscco in https://github.com/NVIDIA/cccl/pull/506
Disable flaky barrier tests by @miscco in https://github.com/NVIDIA/cccl/pull/510
Add constant initialization of managed variable to silence gcc warning by @miscco in https://github.com/NVIDIA/cccl/pull/509
Add verbose flag to ninja build. by @jrhemstad in https://github.com/NVIDIA/cccl/pull/491
Add devcontainer readme by @jrhemstad in https://github.com/NVIDIA/cccl/pull/481
Add contributor guide by @jrhemstad in https://github.com/NVIDIA/cccl/pull/500
[skip ci] Fix devcontainer guide link by @jrhemstad in https://github.com/NVIDIA/cccl/pull/518
[skip ci] Add example godbolt link. by @jrhemstad in https://github.com/NVIDIA/cccl/pull/519
Replace cuda::atomic with legacy functions for old arch compatibility. by @allisonvacanti in https://github.com/NVIDIA/cccl/pull/516
Simplify examples matrix. by @jrhemstad in https://github.com/NVIDIA/cccl/pull/517
Disable PR workflow triggering on pushes to main. by @jrhemstad in https://github.com/NVIDIA/cccl/pull/532
Add CI job to verify devcontainers are always up to date by @jrhemstad in https://github.com/NVIDIA/cccl/pull/514
[CI] Sink error when git repo is missing from build. by @wmaxey in https://github.com/NVIDIA/cccl/pull/533
Rework our tuple implementation to work with older MSVC by @miscco in https://github.com/NVIDIA/cccl/pull/530
Add jobs using clang as CUDA compiler by @jrhemstad in https://github.com/NVIDIA/cccl/pull/493
Remove cudaDeviceSetSharedMemConfig from CUB tests by @gevtushenko in https://github.com/NVIDIA/cccl/pull/538
Implement __bounded_iter by @miscco in https://github.com/NVIDIA/cccl/pull/540
Fix cub::BlockAdjacentDifference documentation by @pauleonix in https://github.com/NVIDIA/cccl/pull/542
Add cuda::device::memcpy_async_tx by @ahendriksen in https://github.com/NVIDIA/cccl/pull/405
Introduce Thrust benchmarks by @gevtushenko in https://github.com/NVIDIA/cccl/pull/534
Fix MSVC benchmarks build by @gevtushenko in https://github.com/NVIDIA/cccl/pull/536
Fix nvc++ as host compiler by @gevtushenko in https://github.com/NVIDIA/cccl/pull/560
Add missing overload definition of thrust::complex operator!= by @srinivasyadav18 in https://github.com/NVIDIA/cccl/pull/564
Make template parameters consistent in thrust::complex operators by @srinivasyadav18 in https://github.com/NVIDIA/cccl/pull/555
Migrate CI configs to CMake presets. by @allisonvacanti in https://github.com/NVIDIA/cccl/pull/324
Replace thrust::detail::integral_constant with libcudacxx implementation by @ZelboK in https://github.com/NVIDIA/cccl/pull/561
Add cuda::device::barrier_expect_tx by @ahendriksen in https://github.com/NVIDIA/cccl/pull/498
Add ARM build configs for latest gcc/clang. by @jrhemstad in https://github.com/NVIDIA/cccl/pull/468
Fea/486 Improve thrust::complex operators compile time throughput by @srinivasyadav18 in https://github.com/NVIDIA/cccl/pull/567
Define compiler env vars for CMake in dev containers. by @allisonvacanti in https://github.com/NVIDIA/cccl/pull/576
Revert back to working nvbench commit by @miscco in https://github.com/NVIDIA/cccl/pull/582
use clang-format in dev containers by @miscco in https://github.com/NVIDIA/cccl/pull/513
Introduce CCCL clang-format by @gevtushenko in https://github.com/NVIDIA/cccl/pull/551
Add cp.async.bulk global -> shared support to cuda::memcpy_async by @ahendriksen in https://github.com/NVIDIA/cccl/pull/501
[skip ci] Also update the base image by @miscco in https://github.com/NVIDIA/cccl/pull/584
Replace thrust::tuple implementation with cuda::std::tuple by @miscco in https://github.com/NVIDIA/cccl/pull/262
Fix clangd integration by @gevtushenko in https://github.com/NVIDIA/cccl/pull/588
Always treat CCCL as system headers by @miscco in https://github.com/NVIDIA/cccl/pull/531
Refactor inline comments by @gevtushenko in https://github.com/NVIDIA/cccl/pull/581
Relax Catch2 include order requirements by @gevtushenko in https://github.com/NVIDIA/cccl/pull/601
Project Automation - Fix issue/pr sync workflow by @jarmak-nv in https://github.com/NVIDIA/cccl/pull/504
[skip-tests] Add a preset that builds all configs of all projects. by @allisonvacanti in https://github.com/NVIDIA/cccl/pull/580
Implement ranges::advance by @miscco in https://github.com/NVIDIA/cccl/pull/546
Update status check job to check status of precursor jobs by @jrhemstad in https://github.com/NVIDIA/cccl/pull/605
Report times for libcudacxx tests in CI by @jrhemstad in https://github.com/NVIDIA/cccl/pull/606
Fix bug in the construct_at optimization by @miscco in https://github.com/NVIDIA/cccl/pull/608
[skip-tests] Disable rdc tests for windows. by @miscco in https://github.com/NVIDIA/cccl/pull/615
Implement ranges::next by @miscco in https://github.com/NVIDIA/cccl/pull/611
Support FP8 in radix sort by @gevtushenko in https://github.com/NVIDIA/cccl/pull/623
Fix examples/cccl_infra mixup in ci. by @wmaxey in https://github.com/NVIDIA/cccl/pull/633
Fixes block-scope run-length decode one-past-the-end memory access into smem TempStorage by @elstehle in https://github.com/NVIDIA/cccl/pull/626
Harmonize CUB includes by @gevtushenko in https://github.com/NVIDIA/cccl/pull/632
Create NVRTCC, a utility for running tests under NVRTC by @wmaxey in https://github.com/NVIDIA/cccl/pull/494
Fix typo and grammar errors by @VaibhavWakde52 in https://github.com/NVIDIA/cccl/pull/639
[Backport branch/2.3.x] Add CCCL_VERSION and script for updating version by @github-actions in https://github.com/NVIDIA/cccl/pull/667
Backport 574 ptx by @miscco in https://github.com/NVIDIA/cccl/pull/663
[Backport branch/2.3.x] Fix C++11 support of recently added tests by @github-actions in https://github.com/NVIDIA/cccl/pull/658
[Backport branch/2.3.x] Update CUDA newest to CTK 12.3 by @github-actions in https://github.com/NVIDIA/cccl/pull/1072
[Backport to branch/2.3.x] Rework our system header approach to be more error proof (#661) by @miscco in https://github.com/NVIDIA/cccl/pull/675
[Backport branch/2.3.x] Fix fallback when checking git repo by @github-actions in https://github.com/NVIDIA/cccl/pull/1086
[Backport branch/2.3.x] Currently the verbose option does not work because of a typo in the argument handling by @github-actions in https://github.com/NVIDIA/cccl/pull/1090
[Backport branch/2.3.x] Add cuda::ptx::st_async by @github-actions in https://github.com/NVIDIA/cccl/pull/1093
[Backport branch/2.3.x] Add cuda::ptx::red_async by @github-actions in https://github.com/NVIDIA/cccl/pull/1094
Backport PR #1075 by @wmaxey in https://github.com/NVIDIA/cccl/pull/1100
[Backport branch/2.3.x] Add cuda::ptx::mbarrier_{try/test}_wait{_parity} by @github-actions in https://github.com/NVIDIA/cccl/pull/1106
[Backport branch/2.3.x] Fix cuda::ptx::red.async for int32_t types by @github-actions in https://github.com/NVIDIA/cccl/pull/1107
[Backport branch/2.3.x] Fix local test runs with lit by @github-actions in https://github.com/NVIDIA/cccl/pull/1110
[Backport branch/2.3.x] Fix config when only non-CDPv1 arches are enabled. by @github-actions in https://github.com/NVIDIA/cccl/pull/1111
[Backport branch/2.3.x] Fix GCC6 / FP8 warning by @github-actions in https://github.com/NVIDIA/cccl/pull/1131
[Backport branch/2.3.x] Fix ptx.st.async.compile.pass.cpp failing in C++11. by @github-actions in https://github.com/NVIDIA/cccl/pull/1136
BACKPORT: Fix _LIBCUDACXX_UNREACHABLE for old MSVC (#1114) by @miscco in https://github.com/NVIDIA/cccl/pull/1143
[2.3.x] Backport benchmarking PRs by @wmaxey in https://github.com/NVIDIA/cccl/pull/1168
Backport P0 filter commit. by @wmaxey in https://github.com/NVIDIA/cccl/pull/1172
[BACKPORT] Implement math functions for thrust::complex by @miscco in https://github.com/NVIDIA/cccl/pull/1191
Backport fix icc / cub (#1152) by @wmaxey in https://github.com/NVIDIA/cccl/pull/1171
[BACKPORT]: Fix availability of is_constant_evaluated on old MSVC by @miscco in https://github.com/NVIDIA/cccl/pull/1198
[BACKPORT] Add icc to the ci matrix by @miscco in https://github.com/NVIDIA/cccl/pull/1209
[BACKPORT]: Add missing overloads for thrust::pow by @miscco in https://github.com/NVIDIA/cccl/pull/1223

New Contributors

@siboehm made their first contribution in https://github.com/NVIDIA/cccl/pull/264
@hageboeck made their first contribution in https://github.com/NVIDIA/cccl/pull/334
@Blonck made their first contribution in https://github.com/NVIDIA/cccl/pull/349
@leofang made their first contribution in https://github.com/NVIDIA/cccl/pull/412
@HapeMask made their first contribution in https://github.com/NVIDIA/cccl/pull/320
@jecs made their first contribution in https://github.com/NVIDIA/cccl/pull/414
@pauleonix made their first contribution in https://github.com/NVIDIA/cccl/pull/542
@srinivasyadav18 made their first contribution in https://github.com/NVIDIA/cccl/pull/564
@ZelboK made their first contribution in https://github.com/NVIDIA/cccl/pull/561
@VaibhavWakde52 made their first contribution in https://github.com/NVIDIA/cccl/pull/639

Full Changelog: https://github.com/NVIDIA/cccl/compare/v2.2.0...2.3.0

cccl - CCCL 2.2.0

Published by jrhemstad about 1 year ago

What's Changed

Add axis for docker builds by @raydouglass in https://github.com/NVIDIA/cccl/pull/1
Docker: Add support for ICPC and NVC++, install newer CMake, and add curl by @brycelelbach in https://github.com/NVIDIA/cccl/pull/4
Update excludes by @raydouglass in https://github.com/NVIDIA/cccl/pull/5
Docker: OS and CUDA upgrades, support for additional configurations by @brycelelbach in https://github.com/NVIDIA/cccl/pull/9
Docker: Add Thrust/CUB documentation toolchain to Ubuntu docker images by @brycelelbach in https://github.com/NVIDIA/cccl/pull/15
Re-enable CentOS images. by @allisonvacanti in https://github.com/NVIDIA/cccl/pull/16
Add sccache to dockerfile by @msadang in https://github.com/NVIDIA/cccl/pull/17
Update base containers. by @allisonvacanti in https://github.com/NVIDIA/cccl/pull/18
Update sccache version by @ajschmidt8 in https://github.com/NVIDIA/cccl/pull/19
Build 11.5.1 containers by @ajschmidt8 in https://github.com/NVIDIA/cccl/pull/20
Add ops-bot.yaml by @jrhemstad in https://github.com/NVIDIA/cccl/pull/80
Monorepo workflow by @jrhemstad in https://github.com/NVIDIA/cccl/pull/99
Add devcontainers by @jrhemstad in https://github.com/NVIDIA/cccl/pull/105
Update the libcu++ submodule by @miscco in https://github.com/NVIDIA/cccl/pull/109
Update libcudacxx again by @miscco in https://github.com/NVIDIA/cccl/pull/110
Remove submodules from CI workflow by @jrhemstad in https://github.com/NVIDIA/cccl/pull/115
Fix CUB CI by @senior-zero in https://github.com/NVIDIA/cccl/pull/114
Fix async scan / counting iterator tests by @senior-zero in https://github.com/NVIDIA/cccl/pull/118
Make sccache work locally by @jrhemstad in https://github.com/NVIDIA/cccl/pull/113
Fix compilation of thrust and cub by @miscco in https://github.com/NVIDIA/cccl/pull/120
Fix segfault in cub::CachingDeviceAllocator by @senior-zero in https://github.com/NVIDIA/cccl/pull/119
Initial GH Infra Setup by @jarmak-nv in https://github.com/NVIDIA/cccl/pull/23
Visualize variant space coverage by @senior-zero in https://github.com/NVIDIA/cccl/pull/125
Fix broken issue templates by @jarmak-nv in https://github.com/NVIDIA/cccl/pull/124
Tune scan by key for SM90 by @senior-zero in https://github.com/NVIDIA/cccl/pull/121
Update PR template to more explicitly prompt for a linked issue closed by the PR by @jrhemstad in https://github.com/NVIDIA/cccl/pull/134
Change component section to more general "area" by @jrhemstad in https://github.com/NVIDIA/cccl/pull/132
Try and fix CI for old CTK by @miscco in https://github.com/NVIDIA/cccl/pull/116
Fix tuple_cat for std:: qualified types by @miscco in https://github.com/NVIDIA/cccl/pull/144
Add ccache to lit invocation by @miscco in https://github.com/NVIDIA/cccl/pull/147
Benchmark batched memcpy by @senior-zero in https://github.com/NVIDIA/cccl/pull/136
Properly query CMAKE_CUDA_COMPILER_LAUNCHER for ccache support by @miscco in https://github.com/NVIDIA/cccl/pull/152
Implement Three-Way Partition Tuning / Benchmark by @senior-zero in https://github.com/NVIDIA/cccl/pull/155
Port three-way partition to use Catch2 by @senior-zero in https://github.com/NVIDIA/cccl/pull/156
Add gcc-6 to the test matrix by @miscco in https://github.com/NVIDIA/cccl/pull/160
Tune reduce / unique by key for SM90 by @senior-zero in https://github.com/NVIDIA/cccl/pull/163
Remove unused folders by @miscco in https://github.com/NVIDIA/cccl/pull/145
Fix documentation of atomic_ref by @miscco in https://github.com/NVIDIA/cccl/pull/164
New iterator traits by @miscco in https://github.com/NVIDIA/cccl/pull/158
Improve implementation of destructible by @miscco in https://github.com/NVIDIA/cccl/pull/157
Build script improvements by @jrhemstad in https://github.com/NVIDIA/cccl/pull/149
Fix icpc / denormals by @senior-zero in https://github.com/NVIDIA/cccl/pull/185
Enable tests by @jrhemstad in https://github.com/NVIDIA/cccl/pull/167
Monorepo by @jrhemstad in https://github.com/NVIDIA/cccl/pull/194
Multi-benchmark tuning by @senior-zero in https://github.com/NVIDIA/cccl/pull/208
Fixes universal_vector test failure on CTK 11.1 & gcc-6 by @elstehle in https://github.com/NVIDIA/cccl/pull/209
Delete several directories for older CI infra. by @wmaxey in https://github.com/NVIDIA/cccl/pull/218
Memory-safe radix sort test by @senior-zero in https://github.com/NVIDIA/cccl/pull/222
[FEA] Implement iter_move CPO by @miscco in https://github.com/NVIDIA/cccl/pull/197
Build cub benchmarks in build_cub.sh by @jrhemstad in https://github.com/NVIDIA/cccl/pull/216
[skip-tests] Do not run tests when skip-tests is part of the latest commit message by @miscco in https://github.com/NVIDIA/cccl/pull/224
Factor out build job logic into a "run-as-coder" reusable workflow. by @jrhemstad in https://github.com/NVIDIA/cccl/pull/205
Fix instances of 'scan' copy-pasted into reduction documentation by @milesvant in https://github.com/NVIDIA/cccl/pull/221
Add clangd to devcontainer by @senior-zero in https://github.com/NVIDIA/cccl/pull/225
Add initial CODEOWNERS file by @jrhemstad in https://github.com/NVIDIA/cccl/pull/226
Attempt to fix codeowners by @jrhemstad in https://github.com/NVIDIA/cccl/pull/231
Make libcudacxx respect CMake options for CUDA archs. by @wmaxey in https://github.com/NVIDIA/cccl/pull/235
Optimize Three-Way Partition by @senior-zero in https://github.com/NVIDIA/cccl/pull/228
[BUG] Rework how we handle feature test macros by @miscco in https://github.com/NVIDIA/cccl/pull/195
Enable use of cudaMemcpyAsync for thrust::copy by @miscco in https://github.com/NVIDIA/cccl/pull/211
Enable additional arguments in build_common.sh by @wmaxey in https://github.com/NVIDIA/cccl/pull/236
[BUG] Properly uglify all qualifiers in product headers by @miscco in https://github.com/NVIDIA/cccl/pull/201
Port cub::Device{Select, Partition} tests to catch2 by @miscco in https://github.com/NVIDIA/cccl/pull/229
Fix CUB tests / MSVC 2022 by @senior-zero in https://github.com/NVIDIA/cccl/pull/255
Ensure that any CMake re-rooting doesn't break our find_file by @miscco in https://github.com/NVIDIA/cccl/pull/257
[BUG] Fix compilation issues with MSVC 2017 by @miscco in https://github.com/NVIDIA/cccl/pull/196
Implement iterator concepts by @miscco in https://github.com/NVIDIA/cccl/pull/223
Tune Histogram on H100 by @senior-zero in https://github.com/NVIDIA/cccl/pull/266
Add WarpExchangeAlgorithm customization for WarpExchange class by @pb-dseifert in https://github.com/NVIDIA/cccl/pull/256
[BUG]: Avoid deprecation warning for std::aligned_storage when building with c++23 by @miscco in https://github.com/NVIDIA/cccl/pull/258
Port cub::DeviceReduce tests to catch2 by @elstehle in https://github.com/NVIDIA/cccl/pull/267
Add support for nvcc-specific matrix. by @jrhemstad in https://github.com/NVIDIA/cccl/pull/243
Fix anchor link to cooperative groups in CUDA programming guide by @wence- in https://github.com/NVIDIA/cccl/pull/274
Fix BibTeX syntax in CITATION.md [skip-tests] by @wence- in https://github.com/NVIDIA/cccl/pull/276
Enforce C++17 for benches by @senior-zero in https://github.com/NVIDIA/cccl/pull/275
Project Automation: Move PR and Linked Issues to In Progress by @jarmak-nv in https://github.com/NVIDIA/cccl/pull/170
Update to 23.08 devcontainers and CUDA 12.2 by @jrhemstad in https://github.com/NVIDIA/cccl/pull/270
[skip-tests] CTK 12.2 tuning image by @senior-zero in https://github.com/NVIDIA/cccl/pull/282
Fix single-thread block reduction by @senior-zero in https://github.com/NVIDIA/cccl/pull/287
Tune Select and Partition on A100 by @senior-zero in https://github.com/NVIDIA/cccl/pull/289
Fix CUB tests / MSVC by @senior-zero in https://github.com/NVIDIA/cccl/pull/292
Allow building CUB tests without cuRand by @senior-zero in https://github.com/NVIDIA/cccl/pull/250
Fixup to CUB build - s/curand/cudart/ by @wmaxey in https://github.com/NVIDIA/cccl/pull/301
Fix OOB in cub::DeviceRunLengthEncode::NonTrivialRuns by @senior-zero in https://github.com/NVIDIA/cccl/pull/294
Tune RLE on A100 by @senior-zero in https://github.com/NVIDIA/cccl/pull/295
Tune scan on A100 by @senior-zero in https://github.com/NVIDIA/cccl/pull/302
Add new CCCL:: CMake targets by @allisonvacanti in https://github.com/NVIDIA/cccl/pull/244
Fix cudacc and nvcc mixup. by @wmaxey in https://github.com/NVIDIA/cccl/pull/329
[skip-tests] Use builtin for destructible concept on MSVC by @miscco in https://github.com/NVIDIA/cccl/pull/333
Fix merge conflict from two inflight PRs by @miscco in https://github.com/NVIDIA/cccl/pull/338

New Contributors

@raydouglass made their first contribution in https://github.com/NVIDIA/cccl/pull/1
@brycelelbach made their first contribution in https://github.com/NVIDIA/cccl/pull/4
@msadang made their first contribution in https://github.com/NVIDIA/cccl/pull/17
@wmaxey made their first contribution in https://github.com/NVIDIA/cccl/pull/218
@milesvant made their first contribution in https://github.com/NVIDIA/cccl/pull/221
@pb-dseifert made their first contribution in https://github.com/NVIDIA/cccl/pull/256
@wence- made their first contribution in https://github.com/NVIDIA/cccl/pull/274

Full Changelog: https://github.com/NVIDIA/cccl/commits/v2.2.0

Package Rankings

Top 6.75% on Proxy.golang.org
