LLaMa 7b with CUDA acceleration, implemented in Rust. Minimal GPU memory needed!
This repo contains the popular LLaMa 7b language model, fully implemented in the Rust programming language, using dfdx tensors and CUDA acceleration.

It runs LLaMa directly in f16, a format most CPUs cannot accelerate in hardware, so using CUDA is heavily recommended.
*Demo: the 7b model running on an A10 GPU.*
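To give a feel for the building blocks involved, here is a minimal, hypothetical dfdx sketch of the kind of compile-time device selection a `cuda` cargo feature implies, plus a small tensor op. It is illustrative only, not code from this repo, and uses f32 for simplicity even though the model itself runs in f16:

```rust
use dfdx::prelude::*;

fn main() {
    // Pick the device at compile time, mirroring a `cuda` cargo feature.
    #[cfg(feature = "cuda")]
    let dev: Cuda = Default::default();
    #[cfg(not(feature = "cuda"))]
    let dev: Cpu = Default::default();

    // A small random matrix product; the real model runs LLaMa's layers in f16.
    let a: Tensor<Rank2<2, 3>, f32, _> = dev.sample_normal();
    let b: Tensor<Rank2<3, 4>, f32, _> = dev.sample_normal();
    let c = a.matmul(b);
    println!("{:?}", c.array());
}
```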
Install git-lfs, then download the pretrained weights from HuggingFace (grab only the sizes you want):

```bash
sudo apt install git-lfs
git lfs install
git clone https://huggingface.co/decapoda-research/llama-7b-hf
git clone https://huggingface.co/decapoda-research/llama-13b-hf
git clone https://huggingface.co/decapoda-research/llama-65b-hf
```
Next, set up a Python environment for the conversion script:

```bash
python3.x -m venv <my_env_name>    # create a virtual environment; x is your preferred Python version
source <my_env_name>/bin/activate  # activate it (use <my_env_name>\Scripts\activate on Windows)
pip install numpy torch
```
Run `convert.py` to convert the model weights into a format the Rust code can read:

```bash
python convert.py
python convert.py llama-13b-hf
python convert.py llama-65b-hf
```
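If you are curious how converted weights could be consumed from Rust, here is a purely hypothetical sketch that reads a flat little-endian f32 dump into a dfdx tensor. The actual on-disk format produced by `convert.py` is not documented here, so the dtype, layout, and function are all assumptions:

```rust
use std::fs;
use dfdx::prelude::*;

/// Hypothetical loader: reads a flat little-endian f32 matrix dump.
/// The real layout written by convert.py may differ.
fn load_matrix(path: &str, rows: usize, cols: usize) -> Tensor<(usize, usize), f32, Cpu> {
    let dev = Cpu::default();
    let bytes = fs::read(path).expect("failed to read weight file");
    let data: Vec<f32> = bytes
        .chunks_exact(4)
        .map(|b| f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
        .collect();
    assert_eq!(data.len(), rows * cols, "unexpected tensor size");
    dev.tensor_from_vec(data, (rows, cols))
}
```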
You can compile with normal Rust commands.

With CUDA:

```bash
cargo build --release -F cuda
```

Without CUDA:

```bash
cargo build --release
```
With default args:

```bash
./target/release/llama-dfdx --model <model-dir> generate "<prompt>"
./target/release/llama-dfdx --model <model-dir> chat
./target/release/llama-dfdx --model <model-dir> file <path to prompt file>
```
To see what commands/custom args you can use:

```bash
./target/release/llama-dfdx --help
```
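For readers poking around the source, a CLI with this shape maps naturally onto clap's derive API. Here is a rough, hypothetical sketch; the real llama-dfdx argument definitions may differ:

```rust
use clap::{Parser, Subcommand};
use std::path::PathBuf;

// Requires the clap crate with its "derive" feature enabled in Cargo.toml.
#[derive(Parser)]
struct Args {
    /// Directory containing the converted model weights.
    #[arg(long)]
    model: PathBuf,
    #[command(subcommand)]
    command: Command,
}

#[derive(Subcommand)]
enum Command {
    /// Complete a prompt passed on the command line.
    Generate { prompt: String },
    /// Start an interactive chat session.
    Chat,
    /// Read the prompt from a file.
    File { path: PathBuf },
}

fn main() {
    let args = Args::parse();
    match args.command {
        Command::Generate { prompt } => println!("generate: {prompt}"),
        Command::Chat => println!("chat"),
        Command::File { path } => println!("file: {}", path.display()),
    }
}
```

With this layout, clap generates the `--help` output shown above automatically.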