Large-scale K-means and K-NN implementation on NVIDIA GPU / CUDA
High-Performance Rendering Framework on Stream Architectures
Implementation of the Apriori and Eclat algorithms, two of the best-known basic algorithms for mi...
Simple tests for JAX, PyTorch, and TensorFlow to test if the installed NVIDIA drivers are being p...
Weighted MinHash implementation on CUDA (multi-GPU).
BQN virtual machine
SYCL accelerated BLAKE3 Hash Implementation
A TensorFlow-inspired neural network library built from scratch in C# 7.3 for .NET Standard 2.0, ...
GPU Framework for Radio Astronomical Image Synthesis
An implementation of parallel linear BVH (LBVH) on the GPU
Some CUDA design patterns and a bit of template magic for CUDA
An architecture for LLMs' continual-learning and long-term memories
CUDA C++ Core Libraries
Extending JAX with custom C++ and CUDA code
Real-time large-scale dense visual SLAM system
Real-time dense visual SLAM system