Collective communications library with various primitives for multi-machine training.
Optimized primitives for collective multi-GPU communication
A fast & densely stored hashmap and hashset based on robin-hood hashing with backward shift deletion
stdgpu: Efficient STL-like Data Structures on the GPU
Golang bindings for ggllm.cpp
C++ implementation of ChatGLM-6B, ChatGLM2-6B, ChatGLM3 & GLM4(V)
Benchmarking Deep Learning operations on different hardware
A modern, low-latency Datadog client for C++
Some CUDA design patterns and a bit of template magic for CUDA
CUDA C++ Core Libraries
A simple example demonstrating how to call C++ from Go
Modern C++ RandomX Implementation