This project creates a real-time conversational AI that runs either serverless via SvelteKit (static build) or with LangChain and FastAPI as a web server, streaming GPT model responses and supporting in-browser LLMs via webllm.
MPL-2.0 License
This project demonstrates how to create a real-time conversational AI using models that run entirely in your browser or commercially hosted ones. It uses FastAPI to create a web server that accepts user inputs and streams generated responses back to the user in a Svelte UI app.

The app also supports running LLMs in the browser via webllm, which keeps everything completely private.

Have a look at the live version at [multishot.ai](https://multishot.ai), although it requires a browser with WebGPU support, such as Chrome or Edge.
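On the remote path, the Svelte app reads the streamed response body incrementally with the Fetch API. Here is a minimal TypeScript sketch of that pattern; the `/chat` endpoint path and the JSON payload shape are assumptions for illustration (check `server/main.py` for the actual route):

```ts
// Minimal sketch of consuming a token stream from the FastAPI backend.
// NOTE: the endpoint path and payload shape below are assumptions.
async function streamChat(prompt: string, onToken: (t: string) => void): Promise<void> {
  const res = await fetch("http://127.0.0.1:8000/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  });
  if (!res.ok || !res.body) throw new Error(`Request failed: ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    onToken(decoder.decode(value, { stream: true })); // append each chunk to the UI as it arrives
  }
}
```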
With the addition of webllm, you can run models locally in your browser and skip the Python backend (and its installation steps) entirely: just start the UI in the `./ui/` directory with `npm run dev -- --port=3333`. Switch the toggle from `remote` to `local` in the UI to find the local models; it will download one for you and you're good to start chatting. Selecting a local model runs prompts locally without sending data over the web, which is a great way to keep your LLM chatter private.
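Under the hood, webllm exposes an OpenAI-style API that runs on WebGPU. Below is a minimal sketch of the local path, assuming the `@mlc-ai/web-llm` package and using an example model ID from its prebuilt list (the app's actual model selection happens in the UI):

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// The model ID is an example from webllm's prebuilt list; the UI lets you pick one.
const engine = await CreateMLCEngine("Llama-3-8B-Instruct-q4f32_1-MLC", {
  initProgressCallback: (p) => console.log(p.text), // report download/compile progress
});

// OpenAI-style streaming chat completion, entirely in the browser.
const stream = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Hello!" }],
  stream: true,
});

for await (const chunk of stream) {
  const token = chunk.choices[0]?.delta?.content ?? "";
  console.log(token); // in the app this would be appended to the chat window
}
```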
To run the Python backend, set up a virtual environment:

```bash
python -m venv .venv
source .venv/bin/activate
```
In the `server` directory, run:

```bash
pip install -r requirements.txt
```
Create your environment file at `../server/.env`, using `example.env` in the `server` directory as a template.
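For reference, a minimal `.env` might contain something like the following; the exact variable names come from `example.env`, and `OPENAI_API_KEY` is an assumption based on the GPT/LangChain backend:

```bash
# Assumed variable name; confirm against example.env
OPENAI_API_KEY=sk-...
```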
Then start the backend:

```bash
uvicorn server.main:app --reload
```
The UI uses `pnpm`, but you can use `npm` if you prefer and have time.
```bash
cd ./ui/
pnpm install --save-dev vite
pnpm build                    # if you want to build for production
pnpm exec vite --port=3333    # or: npm run dev -- --port=3333
```
Your UI is now available at `http://localhost:3333/` and your backend at `http://127.0.0.1:8000/static/index.html`.