Web extension that embeds LLMs in your browser to power AI in web apps
WebextLLM is a browser extension created to democratize AI.
The extension simplifies integrating AI into web applications by taking ownership of LLM management and local deployment, so developers don't need to embed models in their apps. Instead, it makes local LLMs accessible through a simple, lightweight JavaScript API (based on, and mostly compliant with, the window.ai API) that is injected into all web pages. This brings the traditional cloud-backed development paradigm to the edge!
Users enjoy an exciting combination of freedom, privacy, and security. Harness browser-native LLMs to fuel a growing AI-based application ecosystem as a user, developer, or model provider.
https://github.com/idosal/WebextLLM/assets/18148989/2ac3586c-eeee-4404-8864-b13e879def19
window.ai API

The window.ai API is designed to be simple, working seamlessly with any JavaScript application. No complex SDKs or framework-specific wrappers are required.

Configuration

In the Configuration tab, verify that the selected model is ready for use. The model weighs ~1.9GB, which needs to be downloaded once (on first use). The model is then cached, so future initializations are much quicker (and work offline). You can then try apps that use window.ai (see the Apps tab in the extension's popup).

This section is taken almost as-is from the window.ai repository.
To leverage user-managed models in your app, simply call await window.ai.generateText
with your prompt and options.
Example:
const [ response ]: Output[] = await window.ai.generateText({
  messages: [{ role: "user", content: "Who are you?" }]
})
console.log(response.message.content) // "I am an AI language model"
All public types, including error messages, are documented in this file. Input
, for example, allows you to use both simple strings and ChatML.
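As a sketch of the two Input shapes, the snippet below calls generateText once with a plain prompt and once with ChatML-style messages. Note that `windowAi` here is a stand-in stub (an assumption, not part of the extension) so the example runs outside the browser; in a page with WebextLLM installed you would call the injected `window.ai` instead.

```javascript
// Stub standing in for the injected window.ai object, so this runs anywhere.
// The real extension returns Output[]; the stub mimics that shape.
const windowAi = {
  async generateText(input) {
    const text =
      input.prompt ?? input.messages.map((m) => `${m.role}: ${m.content}`).join("\n")
    return [{ message: { role: "assistant", content: `echo(${text})` } }]
  },
}

async function demo() {
  // Simple string prompt:
  const [fromPrompt] = await windowAi.generateText({ prompt: "Who are you?" })
  // ChatML-style message list:
  const [fromMessages] = await windowAi.generateText({
    messages: [{ role: "user", content: "Who are you?" }],
  })
  return [fromPrompt.message.content, fromMessages.message.content]
}
```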
Example of streaming results to the console:
await window.ai.generateText(
{
messages: [{ role: "user", content: "Who are you?" }]
},
{
temperature: 0.7,
onStreamResult: (res) => console.log(res.message.content)
}
)
Note that generateText returns an array, Output[], that has multiple elements only if numOutputs > 1. Even then, the length of the result is not guaranteed to equal numOutputs: if the model doesn't support multiple choices, only one choice will be present in the array.
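Because the API is typed Promise&lt;Output | Output[]&gt; and may return fewer choices than requested, callers can normalize the result before using it. The helper below is illustrative only, not part of the window.ai API:

```javascript
// Normalize a generateText result to an array, since the return type is
// Output | Output[] and the array may hold fewer choices than numOutputs.
function toOutputs(result) {
  return Array.isArray(result) ? result : [result]
}

// A single Output and an Output[] both normalize to an array:
const single = { message: { role: "assistant", content: "hi" } }
console.log(toOutputs(single).length)           // 1
console.log(toOutputs([single, single]).length) // 2
```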
The window.ai API is simple. Just a few functions:
Generate Text: generate text from a specified model or the user-preferred model.
window.ai.generateText(
input: Input,
options: CompletionOptions = {}
): Promise<Output | Output[]>
Input is either a { prompt: string } or a { messages: ChatMessage[] }. For examples, see getting started above.
Current model: get the user's currently preferred model ID.
window.ai.getCurrentModel(): Promise<ModelID>
Listen to events: subscribe to events emitted by the extension, such as when the preferred model changes:
window.ai.addEventListener((event: EventType, data: unknown) => {
// You can check `event` to see if it's the EventType you care about, e.g. "model_changed"
console.log("EVENT received", event, data)
})
All public types, including error messages, are documented in this file. Highlights below:
This options dictionary allows you to specify options for the completion request.
export interface CompletionOptions {
// If specified, partial updates will be streamed to this handler as they become available,
// and only the first partial update will be returned by the Promise.
onStreamResult?: (result: Output | null, error: string | null) => unknown
// What sampling temperature to use, between 0 and 2. Higher values like 0.8 will
// make the output more random, while lower values like 0.2 will make it more focused and deterministic.
// Different models have different defaults.
temperature?: number
/* In the future, we'll support the full spec and more. For example:
// How many completion choices to generate. Defaults to 1.
numOutputs?: number
// The maximum number of tokens to generate in the chat completion. Defaults to infinity, but the
// total length of input tokens and generated tokens is limited by the model's context length.
maxTokens?: number */
}
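Since only the first partial update is returned by the Promise when onStreamResult is set, an app that wants the full text typically accumulates the chunks itself. A minimal sketch, assuming the callback signature from the interface above (the collector helper and the simulated chunks are illustrative, not part of the API):

```javascript
// Accumulate streamed partial results into one string, following the
// onStreamResult signature: (result: Output | null, error: string | null).
function makeStreamCollector() {
  let text = ""
  return {
    onStreamResult: (result, error) => {
      if (error) throw new Error(error)
      if (result) text += result.message.content
    },
    get text() {
      return text
    },
  }
}

// Simulated stream; in a real app the extension invokes the callback.
const collector = makeStreamCollector()
const chunks = [{ message: { content: "I am " } }, { message: { content: "an AI" } }]
chunks.forEach((chunk) => collector.onStreamResult(chunk, null))
console.log(collector.text) // "I am an AI"
```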
Errors emitted by the extension API:
export enum ErrorCode {
// User denied permission to the app
PermissionDenied = "PERMISSION_DENIED",
// Happens when a permission request popup times out
RequestNotFound = "REQUEST_NOT_FOUND",
// When a request is badly formed
InvalidRequest = "INVALID_REQUEST",
// When an AI model refuses to fulfill a request. The returned error is
// prefixed by this value and includes the status code that the model API returned
ModelRejectedRequest = "MODEL_REJECTED_REQUEST"
}
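Because ModelRejectedRequest errors are a prefix (the status code is appended) while the other codes compare exactly, callers can branch on the error string. The classify helper below is a hypothetical sketch, not part of the extension API:

```javascript
// Map an error string from the extension API to a coarse category.
// MODEL_REJECTED_REQUEST is matched as a prefix (a status code follows it);
// the remaining ErrorCode values compare exactly.
function classifyError(message) {
  if (message.startsWith("MODEL_REJECTED_REQUEST")) return "model-rejected"
  switch (message) {
    case "PERMISSION_DENIED":
      return "permission-denied"
    case "REQUEST_NOT_FOUND":
      return "request-timeout"
    case "INVALID_REQUEST":
      return "invalid-request"
    default:
      return "unknown"
  }
}

console.log(classifyError("PERMISSION_DENIED"))           // "permission-denied"
console.log(classifyError("MODEL_REJECTED_REQUEST: 429")) // "model-rejected"
```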
After installing the extension, visit the Apps tab in its popup to find supported apps that use window.ai
.
There are two setup options:
✅ Chrome and other Chromium-based browsers (e.g., Brave, Edge)
The extension is built with Plasmo.
Build with:
pnpm build
Load the unpacked extension to the browser:
Chrome - build/chrome-mv3-prod.
WebextLLM is a proof-of-concept using experimental technologies. It is not recommended for production use and is currently targeted at research purposes. The software is provided "as is" without any warranty, expressed or implied. By using this software, you agree to assume all risks, including potential data loss, system failure, or other issues that may occur. The models provided in this project are not affiliated with or endorsed by the project's author. The author does not claim ownership of the models and is not responsible for any issues arising from their use. Please note that open-source models, especially uncensored ones, are unregulated and may generate offensive or harmful content. Similarly, the project's author is not affiliated with the applications utilizing the window.ai API, does not claim ownership of them, and is not liable for any issues arising from their use. Use at your own discretion.
This project utilizes and builds on the incredible work by web-llm and window.ai!