Gemma-2 2B fine-tuned for Structured Data Extraction

This project is a collection of notebooks and a simple Flask web server that serves Gemma-2 using llama-cpp.

The goal of this project is to fine-tune a model so that it performs better at extracting data into a structured format (JSON).

You will need to provide the output schema in OpenAPI format and the text (context) to extract from.
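As a rough illustration of these two inputs, the sketch below pairs a small schema with a piece of text and packs them into one request payload. The schema fields, the sample text, the `build_query` helper, and the prompt layout are all hypothetical; only the `query` field name comes from the curl example in the deployment section. The prompt format the model was actually trained on is in the example script mentioned in the installation section.

```python
import json

# Hypothetical output schema, written in the JSON Schema style used by OpenAPI.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name"],
}

# Hypothetical context to extract the data from.
text = "Alice is a 31-year-old engineer living in Lyon."


def build_query(schema: dict, text: str) -> str:
    """Combine the schema and the context into one prompt string.

    The layout is an assumption for illustration, not the project's
    documented prompt format.
    """
    return (
        "Extract the data described by this schema as JSON.\n"
        f"Schema:\n{json.dumps(schema, indent=2)}\n"
        f"Text:\n{text}"
    )


query = build_query(schema, text)
payload = json.dumps({"query": query})  # request body as a JSON string
```

Keeping the schema as a Python dict and serializing it with `json.dumps` avoids hand-written quoting mistakes when the schema grows.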

⛩️ Project Architecture

The project is split between notebooks for fine-tuning, quantization, and evaluation, and Python source files.

Source	Description
➡️ Gemma-2 Finetuning	A notebook that shows how to fine-tune and quantize gemma2-2b-it using the unsloth and Hugging Face libraries.
➡️ Server	A simple Flask REST server using llama-cpp with a 4-bit quantized model.
➡️ CI/CD	A GitHub Actions workflow that formats and lints with ruff, runs the tests with pytest, and builds the Docker image.
➡️ Dockerfile	A multistage Dockerfile that builds the server with gunicorn.

📊 Details about the Dataset

The fine-tuned models are available in safetensors and GGUF (4-bit, 8-bit) formats on the Hugging Face Hub at bastienp/Gemma-2-2B-it-JSON-data-extration.

Note: The Hub page also gives more details on how to use the models with llama-cpp or unsloth.

💻 Installation

Dev setup

Recommended: Use uv, the fast Python package installer and resolver from Astral.

Alternatively, you can use pip in place of the uv commands. The uv documentation explains how to install it.

Sync the dependencies with uv:

uv venv .venv

source .venv/bin/activate

uv sync --all-extras --dev # in addition it adds pytest and ruff

Launch a Flask dev server:

flask --app src.web.app run --debug

To reproduce the fine-tuning, the easiest way is to use Google Colab (the free version is sufficient).

Run the tests (API testing):

pytest

Note: An example of how to call the API, along with the prompt format, can be found in examples/example_api_call.py.

👥 Deployment setup

The easiest way to deploy the model is to use the provided Docker image.

Pull the image from GitHub (built by the CI):

docker pull ghcr.io/bastienpo/unsloth_finetuning:main

Note: Alternatively, you can build the image yourself:

docker build --tag unsloth_finetuning:0.0.1 .

Run the Docker image:

docker run -p 8000:8000 -d ghcr.io/bastienpo/unsloth_finetuning:main # or unsloth_finetuning:0.0.1 if you built it locally

Make a POST request:

curl -i -H "Content-Type: application/json" -X POST -d '{"query": "How are you ?"}' http://localhost:8000/api/v1/chat/completions
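The same request can be sketched in Python using only the standard library. The URL, the `Content-Type` header, and the `query` field mirror the curl command above; the helper names `build_request` and `send` and the 60-second timeout are assumptions made for this example, and the response body is returned as raw text because its format is not described here.

```python
import json
import urllib.request

# Endpoint taken from the curl example; assumes the container is running locally.
API_URL = "http://localhost:8000/api/v1/chat/completions"


def build_request(query: str, url: str = API_URL) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(query: str, timeout: float = 60.0) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(build_request(query), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage, once the server is up:
#   print(send("How are you ?"))
```

Separating `build_request` from `send` makes it possible to inspect or unit-test the payload without a running server.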
