A clone of InfiniteCraft (AI!!! LLMs!!) you can run on a laptop _without_ a good GPU!!
Like the beloved Infinite Craft, but it runs locally, for free, with no OpenAI API bill.
Play it online here: GitHub Codespaces port forward
One word. Guidance. This nifty little library lets you "proompt" models unlike anything you've seen before. No more unexpected gibberish, no more wrangling with prompts to get the right format for your output. You can constrain the output of a model to a specific format; you can even write full grammars with this thing.
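Here's a minimal sketch of what that looks like, assuming guidance's `models.LlamaCpp` backend; the model path and prompts are placeholders, not this repo's actual code:

```python
from guidance import models, gen, select

# placeholder path; any llama.cpp-compatible GGUF model should work
lm = models.LlamaCpp("/path/to/model.gguf", n_ctx=2048)

# bounded free-form generation: stops at the first newline, never rambles
out = lm + "Fire + Water = " + gen("combo", stop="\n", max_tokens=8)
print(out["combo"])

# a hard constraint: the model can only answer with one of the listed options
out = lm + "Is Steam wet? Answer: " + select(["yes", "no"], name="verdict")
print(out["verdict"])
```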
To be fair, I didn't use it to its full potential here. What's best about this library is that after it has parsed a prompt, subsequent queries with variations of the same prompt (depending on how you structure your program) will be way faster than the first one. Once you get past the initial overhead of consuming the prompt, you can generate text faster than you can click the infamous "I'm not a robot" checkbox.
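In practice that means building the expensive shared prefix once and only varying the short suffix. A sketch of the pattern (how much of the KV cache actually gets reused depends on the backend):

```python
# build the shared prefix once...
base = lm + "We are playing a crafting game. Combine the two items into one word.\n"

# ...then only the short suffix changes between queries
for a, b in [("Fire", "Water"), ("Earth", "Wind"), ("Steam", "Dust")]:
    out = base + f"{a} + {b} = " + gen("combo", stop="\n", max_tokens=8)
    print(f"{a} + {b} = {out['combo']}")
```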
For this project's two tasks, generating `<item> + <item> = ?` and an appropriate emoji for `<item> = ?`, it's a godsend, and it sets this project apart from the (very few) other clones out there, which either pay for OpenAI's API or struggle with the heavy cost of local models.
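Wrapped up as helpers, the two prompts might look like this; `combine`, `pick_emoji`, and the emoji palette are hypothetical illustrations, not the repo's actual prompt templates:

```python
def combine(lm, a, b):
    # "<item> + <item> = ?" as a guidance prompt
    out = lm + f"{a} + {b} = " + gen("item", stop="\n", max_tokens=8)
    return out["item"].strip()

def pick_emoji(lm, item):
    # constrain the emoji to a fixed palette so the output is always valid
    palette = ["🔥", "💧", "🌍", "💨", "⚡", "🌱"]
    out = lm + f"The emoji for {item} is " + select(palette, name="emoji")
    return out["emoji"]
```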
For bash (Linux/macOS):

```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
MODEL_PATH=/wherever/your/llama-cpp/gguf-model/is-located/model.gguf python3 app.py
```
For PowerShell (Windows):
```powershell
# mostly the same commands, but a little different
python -m venv venv
Set-ExecutionPolicy -ExecutionPolicy Bypass -Scope Process -Force
.\venv\Scripts\Activate.ps1
$env:MODEL_PATH='C:\Path\To\Your\Model.gguf'
python app.py
```
For this initial version, you can disregard Flask's nagging about the development server. No, you don't need a production WSGI server like Gunicorn or uWSGI. For now.
First, the server will warm up by generating three or four word combinations (which are discarded) and some emojis too. This gets the models ready and avoids latency for the first visitors. Only then will the server start, on port 5000.
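A sketch of that startup order, reusing the hypothetical helpers from above (`app` is assumed to be the Flask app in `app.py`):

```python
# hypothetical warm-up: outputs are discarded; the point is to pay the
# prompt-parsing cost before the first visitor shows up
for a, b in [("Fire", "Water"), ("Earth", "Wind"), ("Water", "Fire")]:
    combine(lm, a, b)
    pick_emoji(lm, a)

app.run(port=5000)  # only now does Flask start listening
```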
You can use ngrok or cloudflared (no link, google it) to expose your local server to the internet and play with friends.
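For example, once the server is listening on port 5000:

```bash
ngrok http 5000
# or, with cloudflared:
cloudflared tunnel --url http://localhost:5000
```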
I've developed this with the following models in mind:
You need `llama_cpp_python>=0.2.82` for this to work. You can use any model you want, but the ones above are the ones I've tested with.
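If pip resolved an older build, upgrading directly works (plain pip, nothing project-specific):

```bash
pip install --upgrade "llama_cpp_python>=0.2.82"
```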
Here are the plans: