A simple profiler that counts NVIDIA PTX assembly instructions in OpenCL/SYCL/CUDA kernels for roofline model analysis.
OTHER License
CUDA C++ Core Libraries
Some CUDA design patterns and a bit of template magic for CUDA
SYCL-accelerated BLAKE3 hash implementation
A small utility for getting some info post-hoc about a program's run.
Execute a subset of Python on HPC platforms
Real-time dense visual SLAM system
Abstraction Library for Parallel Kernel Acceleration
Extending JAX with custom C++ and CUDA code
Achieve peak performance on x86 CPUs and NVIDIA GPUs
AutoDock for GPUs and other accelerators
Real-time large-scale dense visual SLAM system
A curated list of awesome GPGPU (CUDA/OpenCL/Vulkan) resources
Pythonic particle-based (super-droplet) warm-rain/aqueous-chemistry cloud microphysics package wi...
BQN virtual machine
Implementation of the Apriori and Eclat algorithms, two of the best-known basic algorithms for mi...