gpt2-ft-pipeline

GPT2 fine-tuning pipeline with KerasNLP, TensorFlow, and TensorFlow Extended


GPT2Alpaca Pipeline

This project demonstrates how to build a machine learning pipeline for fine-tuning GPT2 on the Alpaca dataset with TensorFlow Extended (TFX), KerasNLP, TensorFlow, and the Hugging Face Hub. It was done as part of the 2023 Keras Community Sprint held by the official Keras team at Google.

Introduction

The demand for building ChatGPT-like Large Language Models (LLMs) has increased dramatically since early 2023 because of their promising capabilities. In order to build customized, private LLM-based chatbot applications, we need to fine-tune a language model (e.g., GPT2) on a custom dataset of (instruction, response) pairs.

This project uses the GPT2 model from the KerasNLP library as the base language model and fine-tunes it on the Stanford Alpaca dataset from the alpaca-lora repository.

NOTE: The Alpaca dataset used in this project is an enhanced version of the original Stanford Alpaca dataset, in which open source communities fixed some flaws both manually and with the GPT4 API.

Further, in order to automate the fine-tuning process, this project embeds it in an end-to-end machine learning pipeline built with TensorFlow Extended (TFX). Within the pipeline, when data is given, the following TFX components are triggered sequentially, and the data between components is shared in TFRecord format.
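
Before walking through each step, here is a minimal sketch, using the TFX v1 Python API, of how such a pipeline can be wired together. The module-file paths and roots are placeholders, a StatisticsGen/SchemaGen pair is included because Transform requires a schema artifact, and the custom HFPusher component is only indicated by a comment since its interface is defined in this repository:

    from tfx import v1 as tfx

    def create_pipeline(pipeline_name, pipeline_root, data_root):
        # 1. Ingest the Alpaca dataset from pre-built TFRecord files.
        example_gen = tfx.components.ImportExampleGen(input_base=data_root)

        # Infer a schema for Transform from the ingested examples.
        statistics_gen = tfx.components.StatisticsGen(
            examples=example_gen.outputs["examples"])
        schema_gen = tfx.components.SchemaGen(
            statistics=statistics_gen.outputs["statistics"])

        # 2. Merge (instruction, input, response) into a single prompt string.
        transform = tfx.components.Transform(
            examples=example_gen.outputs["examples"],
            schema=schema_gen.outputs["schema"],
            module_file="modules/preprocessing.py")  # placeholder path

        # 3. Fine-tune GPT2 and export a SavedModel with a custom signature.
        trainer = tfx.components.Trainer(
            examples=transform.outputs["transformed_examples"],
            transform_graph=transform.outputs["transform_graph"],
            module_file="modules/train.py")  # placeholder path

        # 4./5. The custom HFPusher component would be appended here.
        return tfx.dsl.Pipeline(
            pipeline_name=pipeline_name,
            pipeline_root=pipeline_root,
            components=[example_gen, statistics_gen, schema_gen,
                        transform, trainer])

    tfx.orchestration.LocalDagRunner().run(
        create_pipeline("kerasnlp-gpt2-alpaca-pipeline", "pipeline_root", "data"))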

  1. The Alpaca dataset is ingested into the TFX pipeline through the TFX ExampleGen component. It is assumed that the data has been prepared in TFRecord format beforehand; TensorFlow Datasets lets us create TFRecords easily without knowing much about the format. If you are curious, check out the alpaca subdirectory to see how; a minimal sketch also follows this list.

  2. The ingested data is transformed into an instruction-following format through the TFX Transform component (a sketch of the corresponding preprocessing_fn follows this list). The original Alpaca dataset stores instruction, input, and response separately for each conversation, but they should be merged into a single string in the following format:

  • f"""### Instruction:
    {instruction_txt}
    
    ### Input:
    {input_txt}
    
    ### Response:
    {response_txt}
    """
    
  3. The fine-tuning process begins with the transformed data in the TFX Trainer component. It instantiates the GPT2 tokenizer, preprocessor, and model, then fine-tunes the GPT2 model on the transformed data. The final fine-tuned model is exported as a SavedModel with a custom signature, which is the minimum requirement to serve a TensorFlow/Keras model with TensorFlow Serving (a sketch of such an export follows this list).
  • NOTE: There are two paths from this point to deploy the fine-tuned GPT2 model: deployment on GCP's Vertex AI platform, or deployment on the Hugging Face Hub. This document explains the latter because the official TFX Docker image currently does not support some operations in KerasNLP's GPT2 model.
  4. The fine-tuned model is pushed to the Hugging Face Model Hub through the custom TFX HFPusher component. Each time the model is pushed, a new revision name (based on the date) is assigned to it to distinguish model versions.

  5. As an additional capability, the custom TFX HFPusher component publishes a prepared template application to the Hugging Face Space Hub. Each time the model is pushed, some strings within the template, such as the revision name, are replaced with real values at runtime (a conceptual sketch of the push follows this list).
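
For step 1, here is a minimal sketch of how the Alpaca JSON records could be serialized into a TFRecord file for ExampleGen. The file paths are placeholders and the JSON keys follow the original Alpaca release (where the response field is named "output"); the alpaca subdirectory contains the project's actual how-to:

    import json
    import tensorflow as tf

    def _bytes_feature(text):
        # Wrap a Python string as a tf.train.Feature holding UTF-8 bytes.
        return tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[text.encode("utf-8")]))

    with open("alpaca_data.json") as f:   # placeholder path
        records = json.load(f)

    with tf.io.TFRecordWriter("alpaca.tfrecord") as writer:  # placeholder path
        for r in records:
            example = tf.train.Example(features=tf.train.Features(feature={
                "instruction": _bytes_feature(r["instruction"]),
                "input": _bytes_feature(r.get("input", "")),
                "response": _bytes_feature(r["output"]),
            }))
            writer.write(example.SerializeToString())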
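
For step 2, a preprocessing_fn along these lines would produce the format above. The feature names are assumptions, the inputs are assumed to be parsed as dense scalar strings (tf.io.FixedLenFeature), and no tensorflow_transform analyzers are needed since this is a pure element-wise map:

    import tensorflow as tf

    def preprocessing_fn(inputs):
        # Merge the three fields into one instruction-following prompt.
        prompt = tf.strings.join([
            "### Instruction:\n", inputs["instruction"],
            "\n\n### Input:\n", inputs["input"],
            "\n\n### Response:\n", inputs["response"], "\n",
        ])
        return {"prompt": prompt}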
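
For step 3, the export with a custom signature could look roughly like the following. The signature wraps KerasNLP's generate() in a tf.function over a batch of string prompts; the preset name, maximum length, signature name, and export path are assumptions, not the project's exact code:

    import tensorflow as tf
    import keras_nlp

    model = keras_nlp.models.GPT2CausalLM.from_preset("gpt2_base_en")
    # ... fine-tuning on the transformed Alpaca data happens here ...

    @tf.function(input_signature=[tf.TensorSpec(shape=[None], dtype=tf.string)])
    def serving_fn(prompts):
        # generate() tokenizes and detokenizes internally because the
        # preprocessor is attached to the model by from_preset().
        return {"generated_text": model.generate(prompts, max_length=256)}

    tf.saved_model.save(model, "exported_model",   # placeholder export path
                        signatures={"serving_default": serving_fn})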
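
For steps 4 and 5, HFPusher is a custom component defined in this repository, so its exact interface is not reproduced here. Conceptually, the model push boils down to something like this sketch with the plain huggingface_hub client, where the repository id, folder path, and revision scheme are placeholders:

    from datetime import datetime
    from huggingface_hub import HfApi

    api = HfApi(token="YOUR Hugging Face Access Token")
    revision = datetime.now().strftime("%Y-%m-%d-%H%M%S")  # date-based revision

    repo_id = "your-id/gpt2-alpaca"  # placeholder repository
    api.create_branch(repo_id, branch=revision, repo_type="model")
    api.upload_folder(repo_id=repo_id,
                      folder_path="exported_model",  # SavedModel from Trainer
                      revision=revision)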

Instructions

Currently, running this pipeline on Vertex AI is not supported due to CUDA and cuDNN version conflicts between TFX and KerasNLP. However, you can simply run the whole pipeline in a local or Colab environment as described below.

Requirements

  • Be sure to have GPU(s) in both cases. I have tested the fine-tuning process with a single 80GB A100 instance, and it took about an hour to finish the whole pipeline.
  • Also, be sure to have CUDA >= 11.6 and cuDNN >= 8.6; below these versions, some operations in KerasNLP's GPT2 model will fail. As of 07/28/2023, the default Colab environment comes with newer versions of both (you can verify with the snippet below).
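
A quick sanity check, on a GPU build of TensorFlow, for the versions your installation was compiled against:

    import tensorflow as tf

    # Keys are provided by tf.sysconfig.get_build_info() on CUDA builds.
    build = tf.sysconfig.get_build_info()
    print("CUDA:", build["cuda_version"], "cuDNN:", build["cudnn_version"])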

Local environment

  1. Install dependencies

    # it is recommended to run the following pip command in a venv
    
    $ cd training_pipeline
    $ pip install -r requirements.txt
    
  2. Substitute your Hugging Face access token into pipeline/configs.py via the environment variable below. The token will be used to push the model and publish a Space application on the Hugging Face Hub. If you are not familiar with how to get a Hugging Face access token, check out the official document about it.

    # export so envsubst can see the variable; write to a temp file first,
    # since redirecting back into the input file would truncate it
    $ export HF_ACCESS_TOKEN="YOUR Hugging Face Access Token"
    $ envsubst '$HF_ACCESS_TOKEN' < pipeline/configs.py > pipeline/configs.py.tmp
    $ mv pipeline/configs.py.tmp pipeline/configs.py
    
  3. Create the TFX pipeline with the tfx pipeline create command. This command registers the TFX pipeline system-wide. After creation, if you modify anything at the pipeline level, you need to run tfx pipeline update instead of create, with the same options and values (an example is shown after this list). Modifications to files inside the modules directory do not require running tfx pipeline update.

    $ tfx pipeline create --pipeline-path local_runner.py \
                          --engine local
    
  4. Once the TFX pipeline is created (registered) successfully, you can run it with the tfx run create command. It will go through each component sequentially, and any intermediate products will be stored under the current directory.

    $ tfx run create --pipeline-name kerasnlp-gpt2-alpaca-pipeline \
                     --engine local
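
As referenced in step 3, the matching update command looks like this; the options and values stay the same as at creation time:

    $ tfx pipeline update --pipeline-path local_runner.py \
                          --engine local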
    

Colab environment (TBD)

Todo

  • Notebook to convert GPT2CausalLM into SavedModel format
  • Notebook to fine-tune GPT2CausalLM with Alpaca dataset
  • Notebook to build a minimal ML Pipeline with TFX
  • Build a standalone TFX pipeline w/ notebook
  • Put the TFX pipeline up on Google Cloud Platform (Vertex AI)
  • Test out the deployed GPT2CausalLM on Vertex AI
  • Test out the deployed GPT2CausalLM on Hugging Face Space