🎉 Modern CUDA Learn Notes with PyTorch: fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.
GPL-3.0 License
Haskell FFI bindings to CUDA
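Object tracking using yolov8, fast-reid, and deepsort
[In closed beta] QPT - a Python to EXE tool (Python packaging) dedicated to helping open-source projects better reach the wider internet world.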
Decoding Attention is specially optimized for multi head attention (MHA) using CUDA core for the ...
An introduction to learning CUDA programming
Classes enabling finmath-lib to run its Monte-Carlo models on CUDA GPUs
A curated list of awesome GPGPU (CUDA/OpenCL/Vulkan) resources
3D Gaussian Splatting, reimagined: Unleashing unmatched speed with C++ and CUDA from the ground up!
Some CUDA design patterns and a bit of template magic for CUDA
A CUDA Extension of Neural Network Libraries
CUDA C++ Core Libraries
A PyTorch-like dynamic computation graph and neural network framework implemented in NumPy (MLP, CNN, RNN, Transformer)
Weighted MinHash implementation on CUDA (multi-gpu).
The fastest Tropical number matrix multiplication on GPU
An architecture for LLMs' continual-learning and long-term memories