LLaMA 7B with CUDA acceleration implemented in Rust. Minimal GPU memory needed!
Yet another `llama.cpp` Rust wrapper
Distributed LLM and StableDiffusion inference for mobile, desktop and server.
A fast llama2 decoder in pure Rust.
An ecosystem of Rust libraries for working with large language models.