CUDA implementation of autoregressive linear attention, with all the latest research findings
MIT License
Decoding Attention is specially optimized for multi-head attention (MHA) using CUDA cores for the ...
Abstraction Library for Parallel Kernel Acceleration
Dual model head pose estimation. Fusion of SOTA models. 360° 6D HeadPose detection. All pre-proce...
A PyTorch-like dynamic computation graph and neural network framework implemented in NumPy (MLP, CNN, RNN, Transformer)
3D Gaussian Splatting, reimagined: Unleashing unmatched speed with C++ and CUDA from the ground up!
A PyTorch Library for Accelerating 3D Deep Learning Research
Fast Neural Machine Translation in C++ - development repository
A high-performance inference system for large language models, designed for production environments.
Efficient Deep Learning Systems course materials (HSE, YSDA)
ThunderGBM: Fast GBDTs and Random Forests on GPUs
CUDA C++ Core Libraries
🎉 Modern CUDA Learn Notes with PyTorch: fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, ...
An architecture for continual learning and long-term memory in LLMs
Kernel Tuner
Special Presentation Demo at Intel IoT Planet 2021 DeepLearning Day / Intel IoT Planet 2021 DeepLea...