Robot Agent

Fine-tuned Llama2 13B model designed for ReAct-style and Tree-Of-Thoughts style prompting. The codebase has the following desirable features:

  • The entire training procedure runs out of the box on a single computer with 32GB of RAM and 24GB of VRAM (e.g. a consumer-grade graphics card such as the RTX 3090 or RTX 4090), in under 30 hours of compute time.
    • Carefully tuned to use no more than 27GiB of RAM and 23.6GiB of VRAM.
    • This is accomplished through quantization, FP16, TF32, and the usual gradient accumulation/checkpointing settings (see the sketch after this list).
    • Training is fully interruptible/resumable.
  • Heavily commented, short, clean, and reproducible training code.
    • All library dependency versions are fully pinned; base models and datasets are pinned and downloaded as part of the setup process.
    • After initial setup, the training process does not require network access - the entire project folder is portable and can be moved into air-gapped, offline environments.
    • SafeTensors is used everywhere for speed and security.
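
The list above mentions quantization, FP16, TF32, and gradient accumulation/checkpointing; a minimal sketch of how those settings typically fit together with transformers and bitsandbytes is shown below. This is not the repo's actual training script - the model path and numbers are placeholders.

# Hedged sketch (not the repo's actual script): 4-bit quantization, FP16/TF32,
# and gradient accumulation/checkpointing wired up with transformers + bitsandbytes.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments

torch.backends.cuda.matmul.allow_tf32 = True           # allow TF32 matmuls on Ampere+ GPUs

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                                 # QLoRA-style 4-bit base weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",                       # placeholder: in practice, the pinned local base model
    quantization_config=bnb_config,
    device_map="auto",
)
model.gradient_checkpointing_enable()                  # trade recompute for VRAM

training_args = TrainingArguments(
    output_dir="./checkpoints",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,                    # keeps the effective batch size up at batch size 1
    fp16=True,
    tf32=True,
    save_steps=200,                                    # frequent checkpoints make the run resumable
)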

Technical details:

  • Based on Llama2 13B.
  • QLoRA training with a rank-128 LoRA, similar to Guanaco.
  • 2048-token context window used in supervised finetuning, 1536-token context window used in direct preference finetuning.
  • Supervised finetuning on a self-instruct dataset generated with Airoboros' self-instruct implementation.
    • The dataset has been filtered for refusals, and so could be considered "uncensored".
    • The dataset generation code also uses a GPT4 jailbreak to reduce the number of refusals in the first place.
  • Direct preference finetuning using Anthropic's hh-rlhf dataset (see the sketch after this list).
    • This replaces the reward modelling and reinforcement learning steps in a standard RLHF pipeline.
  • The codebase takes ideas and inspiration from StackLLaMa, QLoRA, LLaMA-TRL, and Airoboros.
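
For the preference-finetuning step, a rough sketch of a rank-128 LoRA combined with TRL's DPOTrainer on hh-rlhf is shown below. It assumes a 2023-era trl/peft API; the model path and hyperparameters are placeholders, not the repo's actual configuration.

# Hedged sketch (not the repo's actual script): rank-128 LoRA + DPO on hh-rlhf,
# standing in for the reward-model and RL steps of a standard RLHF pipeline.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "meta-llama/Llama-2-13b-hf"                     # placeholder base model path
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

peft_config = LoraConfig(r=128, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")

def split_prompt(example):
    # hh-rlhf stores whole dialogues; DPO wants a shared prompt plus the two
    # competing final assistant turns.
    marker = "\n\nAssistant:"
    cut_chosen = example["chosen"].rfind(marker) + len(marker)
    cut_rejected = example["rejected"].rfind(marker) + len(marker)
    return {
        "prompt": example["chosen"][:cut_chosen],
        "chosen": example["chosen"][cut_chosen:],
        "rejected": example["rejected"][cut_rejected:],
    }

dataset = load_dataset("Anthropic/hh-rlhf", split="train").map(split_prompt)

trainer = DPOTrainer(
    model,
    ref_model=None,                                    # with a PEFT adapter, the frozen base acts as the reference
    args=TrainingArguments(output_dir="./dpo", per_device_train_batch_size=1),
    beta=0.1,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    max_length=1536,                                   # matches the preference-finetuning context window above
)
trainer.train()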

Roadmap

  • Full reproducible environment with all datasets, base models, and dependencies included.
  • Supervised finetuning script using high-quality publicly available instruct datasets.
  • Human-preference finetuning script based on Anthropic's hh-rlhf "helpfulness" dataset.
  • Accidentally delete the training results on my GPU server and start the training over again from scratch.
  • Fiddle with agentic dataset generation using Charades dataset.
  • If that doesn't work, fiddle with video captioning using multimodal models like Otter to generate agentic captions from how-to videos on YouTube.

Prompt Format

### Human:
INSTRUCTIONS_GO_HERE

### Assistant:

Note that there is a single newline at the end of the prompt. Example:

### Human:
What color is the sky?

### Assistant:
The sky is blue.
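
For programmatic use, a minimal helper that assembles this format could look like the following; the function name is hypothetical, not part of the codebase:

# Hypothetical helper (not part of the repo) that builds the prompt format above,
# ending with exactly one newline after "### Assistant:".
def build_prompt(instructions: str) -> str:
    return f"### Human:\n{instructions}\n\n### Assistant:\n"

print(build_prompt("What color is the sky?"), end="")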

Training

First, download everything that requires an internet connection into the current project folder; the folder will grow to around 30GiB:

make download-datasets-and-models

Next, transfer the current project folder to the training machine, where the rest of the training can be performed fully offline:

make train
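
As an optional sanity check (not something the Makefile is known to do), the Hugging Face libraries can be forced into offline mode before training so that any stray download attempt fails loudly:

# Optional sanity check: put the Hugging Face libraries into offline mode so any
# accidental network access raises an error instead of silently downloading.
import os
os.environ["HF_HUB_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"
os.environ["HF_DATASETS_OFFLINE"] = "1"
# ...import transformers/datasets and launch training after this point...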

Inference

To try out the model, a simple chat-style interface is included; it's not very fancy, but it's good enough for testing:

make chat
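
To drive the model from Python instead of the demo, a generation loop looks roughly like this; the checkpoint path is a placeholder, not the repo's actual output location:

# Hedged sketch of programmatic inference; the path below is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "./exported-models/robot-agent"                 # placeholder checkpoint path
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.float16, device_map="auto")

prompt = "### Human:\nWhat color is the sky?\n\n### Assistant:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))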

Using Llama.cpp

First, run the following command to create ./exported-models/ggml-robot-agent-q5_K_M.bin, an 8.6GiB GGML file compatible with Llama.cpp:

make generate-ggml

Now to load the model using Llama.cpp:

make chat-llama-cpp

To use Llama.cpp manually, navigate to your llama.cpp folder and start using the model with the following command (replace PATH_TO_PROJECT_FOLDER with the path to the current project folder):

./main --model PATH_TO_PROJECT_FOLDER/exported-models/ggml-robot-agent-q5_K_M.bin --color --interactive --interactive-first --mirostat 2 --ctx-size 2048 --reverse-prompt $'\n\n### Human:\n' --prompt $'\n\n### Human:\n' --in-suffix $'\n### Assistant:\n'