# A diff tool for language models

Qualitative comparison of large language models.

Demo & Paper: http://lmdiff.net
License: Apache-2.0
From the root directory, install the Conda dependencies:

```bash
conda env create -f environment.yml
conda activate LMdiff
pip install -e .
```
Run the backend in development mode, deploying the default models and configurations:

```bash
uvicorn backend.server:app --reload
```

Check the output for the right port (something like http://localhost:8000) and open it in your browser.
Building the frontend is optional, because a compiled version is checked into this repo. To rebuild it:

```bash
cd client
npm install
npm run build:backend
cd ..
```
To use your own models, first create a `TextDataset` of phrases to analyze.
You can create the dataset file in several ways.

From a text file:

```bash
python scripts/make_dataset.py path/to/my_dataset.txt my_dataset -o folder/i/want/to/save/in
```

From a Python object (e.g., a list of strings):

```python
from analysis.create_dataset import create_text_dataset_from_object

my_collection = ["Phrase 1", "My second phrase"]
create_text_dataset_from_object(my_collection, "easy-first-dataset", "human_created", "folder/i/want/to/save/in")
```
From a HuggingFace dataset:

```python
import datasets
import path_fixes as pf
from analysis.create_dataset import create_text_dataset_from_hf_datasets

glue_mrpc = datasets.load_dataset("glue", "mrpc", split="train")
name = "glue_mrpc_train"

def ds2str(glue):
    """(e.g.,) Turn the first 50 sentences of the dataset into sentence information"""
    sentences = glue["sentence1"][:50]
    return "\n".join(sentences)

create_text_dataset_from_hf_datasets(glue_mrpc, name, ds2str, ds_type="human_created", outfpath=pf.DATASETS)
```
The dataset is a simple `.txt` file, with a new phrase on every line and a bit of required metadata in a header at the top. E.g.:

```
---
checksum: 92247a369d5da32a44497be822d4a90879807a8751f5db3ff1926adbeca7ba28
name: dataset-dummy
type: human_created
---
This is sentence 1, please analyze this.
Every line is a new phrase to pass to the model.
I can keep adding phrases, so long as they are short enough to pass to the model. They don't even need to be one sentence long.
```
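To illustrate the format, a file like the one above could be split into its metadata and phrases as follows. This is a hypothetical sketch for illustration only (`parse_text_dataset` is not a function in the repo):

```python
def parse_text_dataset(text: str):
    """Split a dataset file into (metadata dict, list of phrases).

    Assumes the format above: a `---`-delimited header of `key: value`
    lines, followed by one phrase per line.
    """
    # The file starts with "---\n", so splitting yields: "", header, body.
    _, header, body = text.split("---\n", 2)
    meta = {}
    for line in header.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    phrases = [line for line in body.splitlines() if line.strip()]
    return meta, phrases
```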
The required fields in the header:

- `checksum` :: A unique identifier for the state of the file. It can be calculated however you wish, but it should change if anything at all changes in the contents below (e.g., two phrases are transposed, a new phrase is added, or a period is added after a sentence).
- `name` :: The name of the dataset.
- `type` :: Either `human_created` or `machine_generated`, the latter if you want to compare on a dataset that was spit out by another model.

Each line in the contents is a new phrase to compare in the language model.
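Since the checksum only needs to change whenever the contents change, any content hash works. A minimal sketch using SHA-256 over the joined phrase lines (`contents_checksum` is a hypothetical helper name, not a repo function):

```python
import hashlib

def contents_checksum(phrases):
    """SHA-256 over the phrase lines.

    Any edit to the contents -- even transposing two phrases --
    changes the joined string, and therefore the digest.
    """
    body = "\n".join(phrases)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()
```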
Choose two comparable models. Two models are comparable if they use the same tokenization scheme, which allows us to do tokenwise comparisons between the models. For example, this could be:

- `distilbert-base-cased` and `distilbert-base-uncased-finetuned-sst-2-english`
- `bert-base-cased` and `distilbert-base-cased`
- `gpt2` and `gpt2-large`
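One way to sanity-check comparability is to verify that both tokenizers agree on a few probe phrases. A minimal sketch, not part of the repo (`tokenizations_match` is a hypothetical helper):

```python
def tokenizations_match(tokenize_a, tokenize_b, phrases):
    """Return True if both tokenizer callables agree on every sample phrase."""
    return all(tokenize_a(p) == tokenize_b(p) for p in phrases)

# With Hugging Face models this could be used as, e.g.:
#   tok_a = AutoTokenizer.from_pretrained("gpt2")
#   tok_b = AutoTokenizer.from_pretrained("gpt2-large")
#   tokenizations_match(tok_a.tokenize, tok_b.tokenize, ["A short probe sentence."])
```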
Preprocess the models on the chosen dataset:

```bash
python scripts/preprocess.py all gpt2-medium distilgpt2 data/datasets/glue_mrpc_1+2.csv --output-dir data/sample/gpt2-glue-comparisons
```
Start the app:

```bash
python backend/server/main.py --config data/sample/gpt2-glue-comparisons
```
Note that if you use a different tokenization scheme than the default `gpt`, you will need to tell the frontend how to visualize the tokens. For example, for a `bert`-based tokenization scheme:

```bash
python backend/server/main.py --config data/sample/bert-glue-comparisons -t bert
```
Models and datasets for the deployed app are stored on the cloud and require a private `.dvc/config` file. With the correct config,

```bash
dvc pull
```

will populate the data directories correctly for the deployed version.
```bash
make test
```

or

```bash
python -m pytest tests
```

All tests are stored in `tests/`.
We like `pnpm`, but `npm` works just as well. We also like `Vite` for its rapid hot module reloading and pleasant dev experience. This repository uses `Vue` as a reactive framework.
From the root directory:

```bash
cd client
pnpm install --save-dev
pnpm run dev
```

If you want to hit the backend routes, make sure to also run the `uvicorn backend.server:app` command from the project root.
```bash
pnpm run serve
```
```bash
cd client
pnpm run build:backend
cd ..
```

Then serve with

```bash
uvicorn backend.server:app
```

or the `gunicorn` command from above. All artifacts are stored in the `client/dist` directory with the appropriate basepath.
```bash
pnpm run build
```

All artifacts are stored in the `client/dist` directory.
The backend API documentation is served at `<localhost>:<port>/docs`.