Domain-specific ChatGPT Starter App

ChatGPT is great for casual, general-purpose question answering but falls short when domain-specific knowledge is needed. It also makes up answers to fill gaps in its knowledge and does not cite its sources, so its answers are hard to trust. This starter app addresses this by combining embeddings with vector search, showing how OpenAI's chat completions API can be used to build conversational interfaces to domain-specific knowledge.

Embeddings are vectors of floating-point numbers that capture the "relatedness" of text strings. They are useful for ranking search results, clustering, classification, and similar tasks. Relatedness is measured by cosine similarity: if the cosine similarity between two vectors is close to 1, the vectors point in nearly the same direction. For text embeddings, a high cosine similarity between two embedding vectors indicates that the corresponding text strings are closely related.
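
As a rough illustration of the measure described above, here is a minimal TypeScript sketch of cosine similarity. The function names and the toy three-dimensional vectors are made up for this example; real "text-embedding-ada-002" embeddings have 1536 dimensions, and the app itself delegates this computation to pgvector.

```typescript
// Dot product of two equal-length vectors.
function dotProduct(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
}

// Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  const normProduct = Math.sqrt(dotProduct(a, a) * dotProduct(b, b));
  if (normProduct === 0) {
    throw new Error("Cosine similarity is undefined for a zero vector");
  }
  return dotProduct(a, b) / normProduct;
}

// Toy vectors (illustrative only, not real embeddings).
const parallelScore = cosineSimilarity([1, 2, 3], [2, 4, 6]); // same direction, close to 1
const orthogonalScore = cosineSimilarity([1, 0, 0], [0, 1, 0]); // unrelated directions, 0
console.log(parallelScore, orthogonalScore);
```

Note that scaling a vector does not change its score: only direction matters, which is why the first pair above is treated as maximally similar.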

This starter app generates an embedding for each document, then uses vector search to find the documents most similar to the user's query. The search results are used to construct a prompt, which is sent to the chat completions API, and the response is streamed to the user. Check out the Supabase blog posts on pgvector and OpenAI embeddings for more background.

Technologies used:

Next.js (React framework) + Vercel hosting
Supabase (using their pgvector implementation as the vector database)
OpenAI API (for generating embeddings and chat completions)
TailwindCSS (for styling)

Functional Overview

Creating and storing the embeddings:

Web pages are scraped, stripped to plain text and split into 1000-character documents
OpenAI's embedding API is used to generate embeddings for each document using the "text-embedding-ada-002" model
The embeddings are then stored in a Supabase Postgres table using pgvector; besides an id, the table has three columns: the document text, the source URL, and the embedding vector returned from the OpenAI API.
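
The splitting step above can be sketched as follows. This is a minimal fixed-width splitter, assuming a plain cut every 1000 characters; the app's actual splitter may respect sentence boundaries or differ in other ways. The names `splitIntoDocuments`, `toDocumentRows`, and `DocumentRow` are hypothetical, and the embedding column is left out because filling it requires a call to the OpenAI API.

```typescript
// Shape of a row before its embedding is added (mirrors the content and url columns).
interface DocumentRow {
  content: string;
  url: string;
}

// Split plain text into consecutive chunks of at most `size` characters.
// Note: `slice` counts UTF-16 code units, so a chunk boundary can fall inside
// a surrogate pair (for example an emoji); acceptable for a sketch.
function splitIntoDocuments(text: string, size: number = 1000): string[] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error("size must be a positive integer");
  }
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size) {
    chunks.push(text.slice(start, start + size));
  }
  return chunks;
}

// Pair each chunk with the URL it was scraped from.
function toDocumentRows(text: string, url: string): DocumentRow[] {
  return splitIntoDocuments(text).map((content) => ({ content, url }));
}
```

Each resulting row would then be sent to the embedding API and inserted into the table together with the returned vector.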

Responding to queries:

A single embedding is generated from the user prompt
That embedding is used to perform a similarity search against the vector database
The results of the similarity search are used to construct a prompt for GPT-3.5/GPT-4
The GPT response is then streamed to the user.
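
To make the prompt-construction step more concrete, here is a minimal sketch of how similarity-search results might be turned into chat messages. The system/user message roles are part of the chat completions API, but the function name `buildChatMessages`, the prompt wording, and the `maxContextChars` budget are assumptions for illustration, not the app's actual prompt.

```typescript
// One row returned by the similarity search.
interface RetrievedDocument {
  content: string;
  url: string;
  similarity: number;
}

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Build chat messages from the user's question and the retrieved documents.
// Documents are added in the order given (most similar first) until the
// hypothetical character budget would be exceeded.
function buildChatMessages(
  question: string,
  matches: RetrievedDocument[],
  maxContextChars: number = 6000
): ChatMessage[] {
  const blocks: string[] = [];
  let used = 0;
  for (const match of matches) {
    const block = `Source: ${match.url}\n${match.content.trim()}`;
    if (used + block.length > maxContextChars) break;
    blocks.push(block);
    used += block.length;
  }
  return [
    {
      role: "system",
      content:
        "Answer using only the provided context and cite the source URLs you used. " +
        "If the context does not contain the answer, say that you do not know.",
    },
    {
      role: "user",
      content: `Context:\n${blocks.join("\n---\n")}\n\nQuestion: ${question}`,
    },
  ];
}
```

Including the source URL next to each passage is what lets the model cite where an answer came from, and the budget keeps the prompt within the model's context limit.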

Getting Started

The following set-up guide assumes at least basic familiarity with developing web apps using React and Next.js. Experience with OpenAI APIs and Supabase is helpful but not required to get things working.

Set up Supabase

Create a Supabase account and project at https://app.supabase.com/sign-in. NOTE: Supabase support for pgvector is relatively new (02/2023), so it's important to create a new project if your project was created before then.
First we'll enable the vector extension (pgvector). In Supabase, this can be done from the web portal through Database → Extensions. You can also do this in SQL by running:

create extension vector;

Next let's create a table to store our documents and their embeddings. Head over to the SQL Editor and run the following query:

create table documents (
  id bigserial primary key,
  content text,
  url text,
  embedding vector (1536)
);

Finally, we'll create a function that will be used to perform similarity searches. Head over to the SQL Editor and run the following query:

create or replace function match_documents (
  query_embedding vector(1536),
  similarity_threshold float,
  match_count int
)
returns table (
  id bigint,
  content text,
  url text,
  similarity float
)
language plpgsql
as $$
begin
  return query
  select
    documents.id,
    documents.content,
    documents.url,
    1 - (documents.embedding <=> query_embedding) as similarity
  from documents
  where 1 - (documents.embedding <=> query_embedding) > similarity_threshold
  order by documents.embedding <=> query_embedding
  limit match_count;
end;
$$;
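
For readers less familiar with pgvector, `<=>` is its cosine distance operator, so `1 - (a <=> b)` is the cosine similarity. The following TypeScript sketch mimics what the function above computes, using an in-memory array in place of the documents table. It is only meant to illustrate the filter, ordering, and limit; the names `StoredDocument`, `MatchRow`, `cosineDistance`, and `matchDocumentsInMemory` are hypothetical, and the real work happens in Postgres.

```typescript
interface StoredDocument {
  id: number;
  content: string;
  url: string;
  embedding: number[];
}

interface MatchRow {
  id: number;
  content: string;
  url: string;
  similarity: number;
}

// Cosine distance, the quantity pgvector's <=> operator returns.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same number of dimensions");
  }
  const dotAB = a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
  const normA = Math.sqrt(a.reduce((sum, x) => sum + x * x, 0));
  const normB = Math.sqrt(b.reduce((sum, x) => sum + x * x, 0));
  return 1 - dotAB / (normA * normB);
}

// Same logic as the SQL: keep rows above the threshold, order by distance
// ascending (similarity descending), and return at most matchCount rows.
function matchDocumentsInMemory(
  documents: StoredDocument[],
  queryEmbedding: number[],
  similarityThreshold: number,
  matchCount: number
): MatchRow[] {
  return documents
    .map((doc) => ({
      id: doc.id,
      content: doc.content,
      url: doc.url,
      similarity: 1 - cosineDistance(doc.embedding, queryEmbedding),
    }))
    .filter((row) => row.similarity > similarityThreshold)
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, matchCount);
}
```

A higher `similarity_threshold` trades recall for precision, while `match_count` caps how much text ends up in the prompt.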

Set up local environment

Clone the repo: gh repo clone gannonh/chatgpt-pgvector
Open it in your favorite editor (the following assumes VS Code on a Mac)

cd chatgpt-pgvector
code .

Install dependencies

npm install

Create a .env.local file in the root directory to store environment variables:

cp .env.local.example .env.local

Open the .env.local file and add your Supabase project URL and API key. You can find these in the Supabase web portal under Project → API. Store the API key in the SUPABASE_ANON_KEY variable and the project URL in NEXT_PUBLIC_SUPABASE_URL.
Add your OpenAI API key to .env.local. You can find it in the OpenAI web portal under API Keys. Store it in the OPENAI_API_KEY variable.
[Optional] Set the OPEAI_PROXY environment variable to use your own proxy for the OpenAI API. Leave it as "" to call the official API directly.
[Optional] Set the SPLASH_URL environment variable to use your own Splash API. (Splash is a JavaScript rendering service: a lightweight web browser with an HTTP API, implemented in Python 3 using Twisted and Qt 5.) Leave it as "" to fetch URLs directly.
Start the app

npm run dev

Open http://localhost:3000 in your browser to view the app.
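
If the app fails to start, a missing environment variable is a likely cause. As a minimal sketch, the required variables named in the steps above could be checked with a small helper like the one below. The helper name `readRequiredEnv` and its error message are hypothetical and not part of the repo; the two optional variables are left out because they may be empty.

```typescript
// Required variable names, taken from the set-up steps above.
const REQUIRED_ENV_VARS = [
  "NEXT_PUBLIC_SUPABASE_URL",
  "SUPABASE_ANON_KEY",
  "OPENAI_API_KEY",
] as const;

type RequiredEnvVar = (typeof REQUIRED_ENV_VARS)[number];
type AppEnv = Record<RequiredEnvVar, string>;

// Hypothetical helper: returns the required values, or throws an error
// listing every variable that is missing or blank.
function readRequiredEnv(env: Record<string, string | undefined>): AppEnv {
  const missing = REQUIRED_ENV_VARS.filter(
    (key) => (env[key] ?? "").trim() === ""
  );
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    NEXT_PUBLIC_SUPABASE_URL: env["NEXT_PUBLIC_SUPABASE_URL"] as string,
    SUPABASE_ANON_KEY: env["SUPABASE_ANON_KEY"] as string,
    OPENAI_API_KEY: env["OPENAI_API_KEY"] as string,
  };
}
```

In a Next.js server context this would typically be called with `process.env`; taking the environment as a parameter keeps the helper easy to test.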
