A great project for campus recruiting (fall/spring hiring) and internships: build an LLM inference framework supporting LLaMA from scratch.
CUDA C++ Core Libraries
LLaMA 7B with CUDA acceleration implemented in Rust. Minimal GPU memory needed!
Object tracking with YOLOv8, fast-reid, and DeepSORT.
A high-performance inference system for large language models, designed for production environments.
Python interface to GPU-powered libraries
An easy-to-use, high-performance CUDA-powered LLM inference library.
Templated C++/CUDA implementation of Model Predictive Path Integral Control (MPPI)
An architecture for continual learning and long-term memory in LLMs
Code for learning CUDA, with implementations of multiple kernels.
Decoding Attention is specially optimized for multi-head attention (MHA) using CUDA cores for the ...
A curated list of awesome GPGPU (CUDA/OpenCL/Vulkan) resources
A CUDA Extension of Neural Network Libraries
🎉 Modern CUDA Learn Notes with PyTorch: fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, ...
A Discord bot running on llama.cpp with the Llama 3 model and image generation
An introduction to CUDA programming.