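A good project for campus recruiting, fall recruiting, spring recruiting, and internships: build from scratch a large-model inference framework that supports LLama.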
An introduction to learning CUDA programming
CUDA C++ Core Libraries
A curated list of awesome GPGPU (CUDA/OpenCL/Vulkan) resources
🎉 Modern CUDA Learn Notes with PyTorch: fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, ...
Code for learning CUDA. Implementations of multiple kernels.
A high-performance inference system for large language models, designed for production environments.
Templated C++/CUDA implementation of Model Predictive Path Integral Control (MPPI)
Python interface to GPU-powered libraries
A Discord bot running on llama.cpp with the Llama 3 model and image generation
An easy-to-use, high-performance, CUDA-powered LLM inference library.
An architecture for LLMs' continual-learning and long-term memories
Object tracking using yolov8, fast-reid, and deepsort
A CUDA Extension of Neural Network Libraries
Decoding Attention is specially optimized for multi head attention (MHA) using CUDA core for the ...
LLaMa 7b with CUDA acceleration, implemented in Rust. Minimal GPU memory needed!