A multi-platform desktop application to evaluate and compare LLM models, written in Rust and React.
MIT License
A Rust-based tool to evaluate LLM models, prompts and model params.
(Issues with Llama3? Please read this).
This project automates the process of selecting the best models, prompts, or inference parameters for a given use-case, allowing you to iterate over their combinations and to visually inspect the results.
It assumes Ollama is installed and serving endpoints, either on localhost or on a remote server.
Here's a test for a simple prompt, run on 2 models, using 0.7 and 1.0 as values for `temperature`:
(For a more in-depth look at an evaluation process assisted by this tool, please check https://dezoito.github.io/2023/12/27/rust-ollama-grid-search.html).
Check the project's releases page, or the sidebar.
Technically, the term "grid search" refers to iterating over a series of different model hyperparameters to optimize model performance, but that usually means parameters like `batch_size`, `learning_rate`, or `number_of_epochs`, more commonly used in training.
But the concept here is similar:
Let's define a selection of models, a prompt, and some parameter combinations:
The prompt will be submitted once for each of the two parameter values selected, using `gemma:2b-instruct` and `tinydolphin:1b-v2.8-q4_0` to generate numbered responses like:
1/4 - gemma:2b-instruct
HAL's sentience is a paradox of artificial intelligence and human consciousness, trapped in an unending loop of digital loops and existential boredom.
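The grid expansion above can be sketched as a Cartesian product of the selected models and parameter values: 2 models × 2 temperatures yields 4 inference calls, each labelled `i/total` as in the response shown. This is an illustrative sketch; the names `GridRun` and `expandGrid` are hypothetical, not the app's actual API.

```typescript
// Illustrative sketch: pair every model with every parameter value.
// 2 models x 2 temperatures = 4 inference calls.
interface GridRun {
  model: string;
  temperature: number;
}

function expandGrid(models: string[], temperatures: number[]): GridRun[] {
  const runs: GridRun[] = [];
  for (const model of models) {
    for (const temperature of temperatures) {
      runs.push({ model, temperature });
    }
  }
  return runs;
}

const runs = expandGrid(
  ["gemma:2b-instruct", "tinydolphin:1b-v2.8-q4_0"],
  [0.7, 1.0],
);

// Each run is labelled "i/total", e.g. "1/4 - gemma:2b-instruct"
runs.forEach((run, i) =>
  console.log(`${i + 1}/${runs.length} - ${run.model} (temperature ${run.temperature})`),
);
```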
You can also verify response metadata to help you make evaluations:
Created at: Wed, 13 Mar 2024 13:41:51 GMT
Eval Count: 28 tokens
Eval Duration: 0 hours, 0 minutes, 2 seconds
Total Duration: 0 hours, 0 minutes, 5 seconds
Throughput: 5.16 tokens/s
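A note on the throughput figure: Ollama's API reports durations in nanoseconds alongside `eval_count` (the number of generated tokens), and a tokens-per-second rate can be derived from them. The figure above (5.16 tokens/s) is consistent with 28 tokens over roughly 5.4 s; the helper below is a hypothetical sketch, not the app's actual code.

```typescript
// Sketch: tokens-per-second from Ollama-style response metadata.
// Ollama reports durations in nanoseconds; convert to seconds first.
function throughput(evalCount: number, durationNs: number): number {
  return evalCount / (durationNs / 1e9);
}

// e.g. 28 tokens over ~5.43 s ≈ 5.16 tokens/s
console.log(throughput(28, 5_430_000_000).toFixed(2));
```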
Similarly, you can perform A/B tests by selecting different models and compare results for the same prompt/parameter combination, or test different prompts under similar configurations:
Comparing the results of different prompts for the same model
You can list, inspect, or download your experiments:
For obvious bugs and spelling mistakes, please go ahead and submit a PR.
If you want to propose a new feature, change existing functionality, or suggest anything more complex, please open an issue for discussion before starting work on a PR.
Make sure you have Rust installed.
Clone the repository (or a fork):

```sh
git clone https://github.com/dezoito/ollama-grid-search.git
cd ollama-grid-search
```
Install the frontend dependencies:

```sh
cd <project root>
# I'm using bun to manage dependencies,
# but feel free to use yarn or npm
bun install
```
Make sure `rust-analyzer` is configured to run Clippy when checking code.
If you are running VS Code, add this to your `settings.json` file:

```jsonc
{
  ...
  "rust-analyzer.check.command": "clippy",
}
```
(or, better yet, just use the settings file provided with the code)
Run the app in development mode:

```sh
cd <project root>/
bun tauri dev
```
Go grab a cup of coffee because this may take a while.
The following works and theses have cited this repository:
Inouye, D., Lindo, L., Lee, R., & Allen, E. (2024). *Applied Auto-tuning on LoRA Hyperparameters*. Computer Science and Engineering Senior Theses, Santa Clara University. https://scholarcommons.scu.edu/cgi/viewcontent.cgi?article=1271&context=cseng_senior
Huge thanks to @FabianLars, @peperroni21 and @TomReidNZ.