Achieve peak performance on x86 CPUs and NVIDIA GPUs
Abstraction Library for Parallel Kernel Acceleration
Implementation of the Apriori and Eclat algorithms, two of the best-known basic algorithms for mi...
SYCL accelerated BLAKE3 Hash Implementation
Simple experimental async GPGPU framework for Rust
Some CUDA design patterns and a bit of template magic for CUDA
BQN virtual machine
A curated list of awesome GPGPU (CUDA/OpenCL/Vulkan) resources
CUDA C++ Core Libraries
An architecture for continual learning and long-term memory in LLMs
An unofficial CUDA assembler, for all generations of SASS, hopefully :)
Large-scale k-means and k-NN implementation on NVIDIA GPU / CUDA
ILGPU JIT Compiler for high-performance .NET GPU programs
Agenium Scale vectorization library for CPUs and GPUs
Extending JAX with custom C++ and CUDA code
A simple profiler to count NVIDIA PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofl...