SYCL-accelerated BLAKE3 Hash Implementation
MIT License
Abstraction Library for Parallel Kernel Acceleration
Weighted MinHash implementation on CUDA (multi-GPU).
BQN virtual machine
High-Performance Rendering Framework on Stream Architectures
Achieve peak performance on x86 CPUs and NVIDIA GPUs
Implementation of the Apriori and Eclat algorithms, two of the best-known basic algorithms for mi...
Some CUDA design patterns and a bit of template magic for CUDA
Large scale K-means and K-nn implementation on NVIDIA GPU / CUDA
Rust bindings to the NVIDIA NVBIT binary instrumentation API
Sparse Boolean linear algebra for NVIDIA CUDA, OpenCL and CPU computations
CUDA C++ Core Libraries
Low-latency CUDA JPEG decoder by parallelizing Huffman decoding
A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofl...
Decoding Attention is specially optimized for multi-head attention (MHA) using CUDA core for the ...
Agenium Scale vectorization library for CPUs and GPUs