Easy and Efficient Finetuning of LLMs (supports LLama, LLama2, LLama3, Qwen, Baichuan, GLM, Falcon). Efficient quantized training and deployment of large models.
APACHE-2.0 License
LLamaTuner is an efficient, flexible and full-featured toolkit for fine-tuning LLMs (Llama3, Phi3, Qwen, Mistral, ...).
Efficient
Flexible
Full-featured
Model | Model size | Default module | Template |
---|---|---|---|
Baichuan | 7B/13B | W_pack | baichuan |
Baichuan2 | 7B/13B | W_pack | baichuan2 |
BLOOM | 560M/1.1B/1.7B/3B/7.1B/176B | query_key_value | - |
BLOOMZ | 560M/1.1B/1.7B/3B/7.1B/176B | query_key_value | - |
ChatGLM3 | 6B | query_key_value | chatglm3 |
Command-R | 35B/104B | q_proj,v_proj | cohere |
DeepSeek (MoE) | 7B/16B/67B/236B | q_proj,v_proj | deepseek |
Falcon | 7B/11B/40B/180B | query_key_value | falcon |
Gemma/CodeGemma | 2B/7B | q_proj,v_proj | gemma |
InternLM2 | 7B/20B | wqkv | intern2 |
LLaMA | 7B/13B/33B/65B | q_proj,v_proj | - |
LLaMA-2 | 7B/13B/70B | q_proj,v_proj | llama2 |
LLaMA-3 | 8B/70B | q_proj,v_proj | llama3 |
LLaVA-1.5 | 7B/13B | q_proj,v_proj | vicuna |
Mistral/Mixtral | 7B/8x7B/8x22B | q_proj,v_proj | mistral |
OLMo | 1B/7B | q_proj,v_proj | - |
PaliGemma | 3B | q_proj,v_proj | gemma |
Phi-1.5/2 | 1.3B/2.7B | q_proj,v_proj | - |
Phi-3 | 3.8B | qkv_proj | phi |
Qwen | 1.8B/7B/14B/72B | c_attn | qwen |
Qwen1.5 (Code/MoE) | 0.5B/1.8B/4B/7B/14B/32B/72B/110B | q_proj,v_proj | qwen |
StarCoder2 | 3B/7B/15B | q_proj,v_proj | - |
XVERSE | 7B/13B/65B | q_proj,v_proj | xverse |
Yi (1/1.5) | 6B/9B/34B | q_proj,v_proj | yi |
Yi-VL | 6B/34B | q_proj,v_proj | yi_vl |
Yuan | 2B/51B/102B | q_proj,v_proj | yuan |
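The Default module column presumably names the layers that LoRA adapters attach to when no target is given (for LLaMA-style models, the query and value projections). A LoRA adapter on a d_in × d_out projection adds r·(d_in + d_out) trainable weights, so the cost of the default targets is easy to estimate. The sketch below assumes LLaMA-7B shapes (hidden size 4096, 32 layers, square q_proj/v_proj); these constants and the helper names are illustrative and are not read from LLamaTuner.

```python
from typing import Sequence

# Assumed LLaMA-7B shapes (illustrative constants, not read from a config file).
HIDDEN_SIZE = 4096
NUM_LAYERS = 32


def lora_params_per_module(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r): r * (d_in + d_out) weights."""
    return r * (d_in + d_out)


def lora_trainable_params(target_modules: Sequence[str], r: int) -> int:
    """Trainable LoRA weights across all layers, assuming every target module
    is a square HIDDEN_SIZE x HIDDEN_SIZE projection (true for q_proj/v_proj
    in LLaMA-7B, not for every architecture)."""
    per_layer = len(target_modules) * lora_params_per_module(HIDDEN_SIZE, HIDDEN_SIZE, r)
    return per_layer * NUM_LAYERS


if __name__ == "__main__":
    n = lora_trainable_params(["q_proj", "v_proj"], r=8)
    print(f"trainable LoRA weights: {n:,}")
```

With r = 8 and the two default modules this gives 4,194,304 trainable weights, a small fraction of the roughly 6.7B frozen weights in LLaMA-7B.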
Approach | Full-tuning | Freeze-tuning | LoRA | QLoRA |
---|---|---|---|---|
Pre-Training | ✅ | ✅ | ✅ | ✅ |
Supervised Fine-Tuning | ✅ | ✅ | ✅ | ✅ |
Reward Modeling | ✅ | ✅ | ✅ | ✅ |
PPO Training | ✅ | ✅ | ✅ | ✅ |
DPO Training | ✅ | ✅ | ✅ | ✅ |
KTO Training | ✅ | ✅ | ✅ | ✅ |
ORPO Training | ✅ | ✅ | ✅ | ✅ |
As of now, we support the following datasets, most of which are available in the Hugging Face datasets library.
Please refer to data/README.md to learn how to use these datasets. If you want to explore more datasets, please refer to the awesome-instruction-datasets repository. Some datasets require confirmation before use, so we recommend logging in with your Hugging Face account using the following commands:
```bash
pip install --upgrade huggingface_hub
huggingface-cli login
```
We provide a number of data preprocessing tools in the data folder. These tools are intended to be a starting point for further research and development.
We provide a number of models on the Hugging Face Model Hub. These models are trained with QLoRA and can be used for inference and further finetuning. The following models are available:
Base Model | Adapter | Instruct Datasets | Train Script | Log | Model on Hugging Face |
---|---|---|---|---|---|
llama-7b | FullFinetune | - | - | - | |
llama-7b | QLoRA | openassistant-guanaco | finetune_lamma7b | wandb log | GaussianTech/llama-7b-sft |
llama-7b | QLoRA | OL-CC | finetune_lamma7b | ||
baichuan7b | QLoRA | openassistant-guanaco | finetune_baichuan7b | wandb log | GaussianTech/baichuan-7b-sft |
baichuan7b | QLoRA | OL-CC | finetune_baichuan7b | wandb log | - |
Mandatory | Minimum | Recommended |
---|---|---|
python | 3.8 | 3.10 |
torch | 1.13.1 | 2.2.0 |
transformers | 4.37.2 | 4.41.0 |
datasets | 2.14.3 | 2.19.1 |
accelerate | 0.27.2 | 0.30.1 |
peft | 0.9.0 | 0.11.1 |
trl | 0.8.2 | 0.8.6 |
Optional | Minimum | Recommended |
---|---|---|
CUDA | 11.6 | 12.2 |
deepspeed | 0.10.0 | 0.14.0 |
bitsandbytes | 0.39.0 | 0.43.1 |
vllm | 0.4.0 | 0.4.2 |
flash-attn | 2.3.0 | 2.5.8 |
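Before training, it can help to confirm that the installed packages meet the minimum versions in the Mandatory table. The sketch below uses only the standard library (importlib.metadata); its simplified parse_version helper ignores pre-release and local suffixes, so packaging.version.Version is the more robust choice in practice. All helper names here are hypothetical and not part of LLamaTuner.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Tuple

# Minimum versions from the "Mandatory" table above (Python itself omitted).
MIN_VERSIONS: Dict[str, str] = {
    "torch": "1.13.1",
    "transformers": "4.37.2",
    "datasets": "2.14.3",
    "accelerate": "0.27.2",
    "peft": "0.9.0",
    "trl": "0.8.2",
}


def parse_version(v: str) -> Tuple[int, ...]:
    """Simplified parser: '2.2.0+cu121' -> (2, 2, 0).

    Drops local labels after '+' and keeps only the leading digits of each
    dotted part, so pre-release tags such as 'rc1' are ignored.
    """
    parts = []
    for piece in v.split("+", 1)[0].split("."):
        m = re.match(r"\d+", piece)
        if m is None:
            break
        parts.append(int(m.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so "2.2" compares equal to "2.2.0"
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(requirements: Dict[str, str]) -> Dict[str, str]:
    """Map each package name to 'ok', 'too old (<installed>)', or 'missing'."""
    report = {}
    for pkg, minimum in requirements.items():
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            report[pkg] = "missing"
            continue
        report[pkg] = "ok" if meets_minimum(installed, minimum) else f"too old ({installed})"
    return report


if __name__ == "__main__":
    for pkg, status in check_environment(MIN_VERSIONS).items():
        print(f"{pkg:>12}: {status}")
```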
Hardware requirements (* estimated GPU memory per method and model size):
Method | Bits | 7B | 13B | 30B | 70B | 110B | 8x7B | 8x22B |
---|---|---|---|---|---|---|---|---|
Full | AMP | 120GB | 240GB | 600GB | 1200GB | 2000GB | 900GB | 2400GB |
Full | 16 | 60GB | 120GB | 300GB | 600GB | 900GB | 400GB | 1200GB |
Freeze | 16 | 20GB | 40GB | 80GB | 200GB | 360GB | 160GB | 400GB |
LoRA/GaLore/BAdam | 16 | 16GB | 32GB | 64GB | 160GB | 240GB | 120GB | 320GB |
QLoRA | 8 | 10GB | 20GB | 40GB | 80GB | 140GB | 60GB | 160GB |
QLoRA | 4 | 6GB | 12GB | 24GB | 48GB | 72GB | 30GB | 96GB |
QLoRA | 2 | 4GB | 8GB | 16GB | 24GB | 48GB | 18GB | 48GB |
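The full fine-tuning rows are consistent with common bytes-per-parameter rules of thumb: mixed-precision Adam keeps 16-bit weights and gradients plus an fp32 master copy and two fp32 moment buffers (about 16 bytes per parameter), so a 7B model needs roughly 112 GB before activations. The sketch below turns such assumed byte counts into a lower-bound estimate; it ignores activations, the CUDA context, and quantization constants, which is why the table's figures are somewhat higher. The per-method byte counts are our assumptions, not values taken from LLamaTuner.

```python
from typing import Dict

# Assumed rule-of-thumb bytes per parameter (not values from LLamaTuner).
BYTES_PER_PARAM: Dict[str, float] = {
    "full_amp": 16.0,  # 16-bit weights + grads (4) + fp32 master copy, Adam m, v (12)
    "full_16": 8.0,    # 16-bit weights + grads (4) + Adam m, v kept in 16-bit (4)
    "lora_16": 2.0,    # frozen 16-bit base weights; adapter state ignored
    "qlora_8": 1.0,    # frozen base weights stored in 8 bits
    "qlora_4": 0.5,    # frozen base weights stored in 4 bits
    "qlora_2": 0.25,   # frozen base weights stored in 2 bits
}


def estimate_gb(num_params: float, method: str) -> float:
    """Lower-bound memory estimate in GB (1 GB = 1e9 bytes).

    Excludes activations, the CUDA context, and quantization constants,
    so real usage (as in the table) is higher.
    """
    return num_params * BYTES_PER_PARAM[method] / 1e9


if __name__ == "__main__":
    for method in BYTES_PER_PARAM:
        print(f"{method:>8}: {estimate_gb(7e9, method):7.2f} GB")
```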
Clone this repository and navigate to the LLamaTuner folder:
```bash
git clone https://github.com/jianzhnie/LLamaTuner.git
cd LLamaTuner
```
Main function | Usage | Scripts |
---|---|---|
train.py | Full-parameter finetuning of LLMs on SFT datasets | full_finetune |
train_lora.py | Finetune LLMs using LoRA (Low-Rank Adaptation of Large Language Models) | lora_finetune |
train_qlora.py | Finetune LLMs using QLoRA (QLoRA: Efficient Finetuning of Quantized LLMs) | qlora_finetune |
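LoRA freezes the pretrained weight W0 and learns a low-rank update, computing h = W0 x + (α/r) B A x with A of shape r × d_in and B of shape d_out × r; QLoRA attaches the same kind of adapter to a base model whose frozen weights are stored in 4-bit form. The pure-Python sketch below illustrates the LoRA forward pass and weight merging under these definitions. It follows the initialization described in the LoRA paper (random Gaussian A, zero B) and is not LLamaTuner's or PEFT's implementation; the class and function names are hypothetical.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


class LoRALinear:
    """h = W0 x + (alpha / r) * B (A x); W0 is frozen, only A and B would be trained."""

    def __init__(self, w0: Matrix, r: int, alpha: float, seed: int = 0) -> None:
        d_out, d_in = len(w0), len(w0[0])
        rng = random.Random(seed)
        self.w0 = w0  # frozen pretrained weight, d_out x d_in
        self.a = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(r)]  # r x d_in
        self.b = [[0.0] * r for _ in range(d_out)]  # d_out x r, zero at init
        self.scale = alpha / r

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.w0, x)
        delta = matvec(self.b, matvec(self.a, x))
        return [h + self.scale * d for h, d in zip(base, delta)]

    def merged_weight(self) -> Matrix:
        """W0 + scale * (B @ A): folds the adapter into one d_out x d_in matrix."""
        r = len(self.a)
        d_in = len(self.w0[0])
        return [
            [
                self.w0[i][j] + self.scale * sum(self.b[i][k] * self.a[k][j] for k in range(r))
                for j in range(d_in)
            ]
            for i in range(len(self.w0))
        ]


if __name__ == "__main__":
    w0 = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]  # toy 2 x 3 "pretrained" weight
    layer = LoRALinear(w0, r=2, alpha=4.0)
    x = [1.0, 2.0, 3.0]
    print(layer.forward(x) == matvec(w0, x))  # B is zero at init, so output is unchanged
    layer.b = [[0.5, -0.25], [1.0, 0.0]]  # pretend training has updated B
    print(layer.forward(x), matvec(layer.merged_weight(), x))
```

Because B starts at zero, the adapted layer initially reproduces the base layer exactly; merged_weight() shows why a trained adapter adds no extra inference cost once it is folded into W0.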
The train_qlora.py code is a starting point for finetuning and inference on various datasets.
Basic command for finetuning a baseline model on the Alpaca dataset:
```bash
python train_qlora.py --model_name_or_path <path_or_name>
```
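As an illustration of how an Alpaca record becomes training text, the sketch below applies the widely used Stanford Alpaca prompt template to a record with instruction, input, and output fields. The template strings and the format_alpaca helper are assumptions for illustration; the template LLamaTuner applies to Alpaca data may differ in wording or whitespace.

```python
from typing import Dict, Tuple

# Stanford Alpaca-style templates (assumed; LLamaTuner's templates may differ).
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n"
    "### Response:"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n"
    "### Response:"
)


def format_alpaca(record: Dict[str, str]) -> Tuple[str, str]:
    """Split one Alpaca record into (source prompt, target response)."""
    has_input = bool(record.get("input", "").strip())
    template = PROMPT_WITH_INPUT if has_input else PROMPT_NO_INPUT
    # str.format ignores the unused 'input' keyword when PROMPT_NO_INPUT is chosen.
    source = template.format(instruction=record["instruction"], input=record.get("input", ""))
    return source, record["output"]


if __name__ == "__main__":
    src, tgt = format_alpaca({"instruction": "Add 2 and 3.", "input": "", "output": "5"})
    print(src + " " + tgt)
```

Many SFT pipelines compute the loss only on the target tokens and mask the source prompt; whether train_qlora.py does so by default is not stated here.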
For models larger than 13B, we recommend adjusting the learning rate:
```bash
python train_qlora.py --learning_rate 0.0001 --model_name_or_path <path_or_name>
```
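When launching runs from Python, building the argument list explicitly avoids typographic-dash mistakes in flags and shell quoting problems. The hypothetical build_qlora_command helper below uses only the two flags shown above and applies the 0.0001 learning rate for models larger than 13B; otherwise it leaves the script's own default learning rate untouched.

```python
import shlex
from typing import List, Optional


def build_qlora_command(model: str, size_in_billions: float,
                        learning_rate: Optional[float] = None) -> List[str]:
    """Return an argv list for train_qlora.py (hypothetical helper).

    Uses only --model_name_or_path and --learning_rate; for models larger
    than 13B without an explicit rate, it applies the recommended 0.0001.
    """
    cmd = ["python", "train_qlora.py", "--model_name_or_path", model]
    if learning_rate is None and size_in_billions > 13:
        learning_rate = 1e-4
    if learning_rate is not None:
        cmd += ["--learning_rate", f"{learning_rate:g}"]  # 1e-4 formats as "0.0001"
    return cmd


if __name__ == "__main__":
    cmd = build_qlora_command("path/to/llama-30b", size_in_billions=30)
    print(shlex.join(cmd))
    # To launch for real: subprocess.run(cmd, check=True)
```

Passing the list directly to subprocess.run (rather than a single shell string) keeps model paths with spaces or special characters intact.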
To find more scripts for finetuning and inference, please refer to the scripts folder.
Here is a list of known issues and bugs. If your issue is not listed here, please open a new issue and describe the problem.
Using bnb_4bit_compute_type='fp16' can lead to instabilities. For 7B LLaMA, only 80% of finetuning runs complete without error. We have solutions, but they are not yet integrated into bitsandbytes.
Make sure tokenizer.bos_token_id is set to 1 to avoid generation issues.

LLamaTuner is released under the Apache 2.0 license.
We thank the Hugging Face team, in particular Younes Belkada, for their support in integrating QLoRA with the PEFT and transformers libraries.
We appreciate the work by many open-source contributors, especially:
Please cite this repository if you use its data or code.
@misc{Chinese-Guanaco,
author = {jianzhnie},
title = {LLamaTuner: Easy and Efficient Fine-tuning LLMs},
year = {2023},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/jianzhnie/LLamaTuner}},
}