This project creates a real-time conversational AI that runs either serverless via SvelteKit (static build) or with LangChain and FastAPI as a web server, streaming GPT model responses and supporting in-browser LLMs via webllm.
MPL-2.0 License
This project demonstrates how to create a real-time conversational AI using models that run entirely in your browser or commercially hosted ones. It uses FastAPI to create a web server that accepts user inputs and streams generated responses back to the user in a Svelte UI app.

The app also supports running LLMs in the browser via webllm, which keeps everything completely private.

Have a look at the live version at [multishot.ai](https://multishot.ai), although it requires a browser with WebGPU support, such as Chrome or Edge.
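On the remote path, the Svelte app reads the streamed response body incrementally with the Fetch API. Here is a minimal TypeScript sketch of that pattern; the `/chat` endpoint path and the JSON payload shape are assumptions for illustration (check `server/main.py` for the actual route):

```ts
// Minimal sketch of consuming a token stream from the FastAPI backend.
// NOTE: the endpoint path and payload shape below are assumptions.
async function streamChat(prompt: string, onToken: (t: string) => void): Promise<void> {
  const res = await fetch("http://127.0.0.1:8000/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  });
  if (!res.ok || !res.body) throw new Error(`Request failed: ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    onToken(decoder.decode(value, { stream: true })); // append each chunk to the UI as it arrives
  }
}
```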
With the addition of webllm, you can run models locally in your browser and skip the Python backend (and its installation steps) entirely: just start the UI in the `./ui/` directory with `npm run dev -- --port=3333`. Switch the toggle from `remote` to `local` in the UI to find the local models; it will download one for you and you're good to start chatting. Selecting a local model runs prompts locally without sending data over the web, which is a great way to keep your LLM chatter private.
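Under the hood, webllm exposes an OpenAI-style API that runs on WebGPU. Below is a minimal sketch of the local path, assuming the `@mlc-ai/web-llm` package and using an example model ID from its prebuilt list (the app's actual model selection happens in the UI):

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// The model ID is an example from webllm's prebuilt list; the UI lets you pick one.
const engine = await CreateMLCEngine("Llama-3-8B-Instruct-q4f32_1-MLC", {
  initProgressCallback: (p) => console.log(p.text), // report download/compile progress
});

// OpenAI-style streaming chat completion, entirely in the browser.
const stream = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Hello!" }],
  stream: true,
});

for await (const chunk of stream) {
  const token = chunk.choices[0]?.delta?.content ?? "";
  console.log(token); // in the app this would be appended to the chat window
}
```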
To run the Python backend, set up a virtual environment:

```bash
python -m venv .venv
source .venv/bin/activate
```
In the `server` directory, run:

```bash
pip install -r requirements.txt
```
Create your environment file at `../server/.env`, using `example.env` in the `server` directory as a template.
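For reference, a minimal `.env` might contain something like the following; the exact variable names come from `example.env`, and `OPENAI_API_KEY` is an assumption based on the GPT/LangChain backend:

```bash
# Assumed variable name; confirm against example.env
OPENAI_API_KEY=sk-...
```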
Then start the backend:

```bash
uvicorn server.main:app --reload
```
The UI uses `pnpm`, but you can use `npm` if you prefer and have time.
```bash
cd ./ui/
pnpm install --save-dev vite
pnpm build                    # if you want to build for production
pnpm exec vite --port=3333    # or: npm run dev -- --port=3333
```
Your UI is now available at `http://localhost:3333/` and your backend at `http://127.0.0.1:8000/static/index.html`.