Simple VGG16 implemented in CUDA
A CUDA Extension of Neural Network Libraries
Provides Docker build sequences of PyTorch for various environments.
An introduction to learning CUDA programming
VUDA is a header-only library based on Vulkan that provides a CUDA Runtime API interface for writ...
The fastest Tropical number matrix multiplication on GPU
Unifying Python/C++/CUDA memory: Python buffered array ↔️ `std::vector` ↔️ CUDA managed memory
Simple tests for JAX, PyTorch, and TensorFlow to test if the installed NVIDIA drivers are being p...
Dockerfiles and manual for easy build of docker image with CUDA10.X and cuDNN7.6 to run TensorFlo...
CUDA C++ Core Libraries
LLaMA 7B with CUDA acceleration, implemented in Rust. Minimal GPU memory needed!
CULiNGAM accelerates LiNGAM analysis on GPUs.
MWE for using the Eigen library in CUDA kernels
CUDA-based matrix/vector computations
A curated list of awesome GPGPU (CUDA/OpenCL/Vulkan) resources
Some CUDA design patterns and a bit of template magic for CUDA