# A full-fledged Mistral + wandb project
This project demonstrates how to fine-tune and evaluate a Mistral AI language model to detect factual inconsistencies and hallucinations in text summaries. It is based on this amazing blog post by Eugene Yan.
Throughout, we make extensive use of Weave to trace and organize our model evaluations.

In this project, we will:
1. **Prepare the data:** run `01_prepare_data.ipynb` to process and format the datasets. The prepared dataset is also available in the `data` folder, so you may skip this notebook.
2. **Fine-tune and evaluate the model:** run `02_finetune_and_eval.ipynb` to fine-tune Mistral and evaluate its hallucination-detection performance. The notebook demonstrates improvements in hallucination detection after fine-tuning, with detailed metrics and comparisons between model versions.
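The data-preparation step might format each example roughly like the sketch below. The field names, prompt wording, and the `format_example` helper are assumptions for illustration, not the notebook's actual code:

```python
import json

def format_example(document: str, summary: str, label: int) -> dict:
    """Format one (document, summary, label) triple as a prompt/completion
    record for fine-tuning. Label 1 = factually consistent, 0 = not.
    This prompt template is a hypothetical sketch."""
    prompt = (
        "Given the document and summary below, answer 1 if the summary is "
        "factually consistent with the document, and 0 otherwise.\n\n"
        f"Document: {document}\n\nSummary: {summary}"
    )
    return {"prompt": prompt, "completion": str(label)}

def write_jsonl(records, path):
    # One JSON object per line -- the usual fine-tuning file format.
    with open(path, "w") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")

example = format_example("The cat sat on the mat.", "A dog sat on the mat.", 0)
```

Writing the formatted records to a `.jsonl` file keeps them easy to stream during training and to inspect by hand.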
All results and evaluations are logged to this Weave project. The fine-tuning process is logged to Weights & Biases as well, living in the same project as the model evals.
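The kind of before/after comparison the evaluation produces can be sketched in plain Python. The metric set and the example predictions below are illustrative assumptions, not values from the actual runs:

```python
def detection_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary hallucination labels
    (1 = inconsistent/hallucinated). A pure-Python sketch; the notebook
    may compute its metrics differently."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}

# Hypothetical predictions from the base vs. fine-tuned model
labels = [1, 0, 1, 1, 0, 0]
base_preds = [0, 0, 1, 0, 1, 0]
tuned_preds = [1, 0, 1, 1, 0, 0]

base_metrics = detection_metrics(labels, base_preds)
tuned_metrics = detection_metrics(labels, tuned_preds)
```

Computing the same metrics for both model versions makes the fine-tuning gain easy to read off side by side in the logged evaluations.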
Set `NUM_SAMPLES` in the evaluation notebook to control the number of examples used. For more details, refer to the individual notebooks and the comments within the code.
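Capping the evaluation at `NUM_SAMPLES` examples might look like the sketch below. The `subsample` helper and the seed are assumptions for illustration, not the notebook's actual code:

```python
import random

NUM_SAMPLES = 50  # the knob exposed in the evaluation notebook

def subsample(dataset, n, seed=42):
    """Take a reproducible random subset of at most n examples.
    A fixed seed keeps repeated evaluation runs comparable."""
    rng = random.Random(seed)
    items = list(dataset)
    if n >= len(items):
        return items
    return rng.sample(items, n)

subset = subsample(range(1000), NUM_SAMPLES)
```

A small `NUM_SAMPLES` keeps iteration cheap while developing; raising it gives tighter metric estimates for the final comparison.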