A simple profiler that counts NVIDIA PTX assembly instructions in OpenCL/SYCL/CUDA kernels for roofline model analysis.
OTHER License
Published by ProjectPhysX almost 2 years ago
Abstraction Library for Parallel Kernel Acceleration
Execute a subset of Python on HPC platforms
SYCL accelerated BLAKE3 Hash Implementation
AutoDock for GPUs and other accelerators
Achieve peak performance on x86 CPUs and NVIDIA GPUs
Implementation of the Apriori and Eclat algorithms, two of the best-known basic algorithms for mi...
Real-time large scale dense visual SLAM system
Real-time dense visual SLAM system
Some CUDA design patterns and a bit of template magic for CUDA
A small utility for getting some info post-hoc about a program's run.
CUDA C++ Core Libraries
Extending JAX with custom C++ and CUDA code
BQN virtual machine
Pythonic particle-based (super-droplet) warm-rain/aqueous-chemistry cloud microphysics package wi...
A curated list of awesome GPGPU (CUDA/OpenCL/Vulkan) resources