RAG/LLM info retrieval
BSD-2-Clause License
Doxie is an experiment in LLM-based information retrieval. Or in plain words: Doxie uses ChatGPT to answer questions based on the content of one or more websites, documents, or whatever other text content you want to feed it. It exposes a simple, ChatGPT-like interface.
Doxie implements the most basic RAG loop imaginable, with simple context management. Have a look at these source files:
- `embedder-cli.ts`: generates an `xxx.embeddings.bin` file to be loaded into the vector database
- `src/server/rag.ts`: loads the embeddings into a collection and lets you query it
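The basic RAG loop boils down to: embed the user's question, find the most similar chunks in the collection, and put them into the LLM prompt as context. A minimal sketch of that idea (the `Chunk` type, in-memory collection, and function names here are illustrative, not Doxie's actual API; the final LLM call is only indicated by a comment):

```typescript
// One entry in the vector "database": a text chunk plus its embedding.
interface Chunk {
  text: string;
  embedding: number[];
}

// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks most similar to the query embedding.
function topK(collection: Chunk[], queryEmbedding: number[], k: number): Chunk[] {
  return [...collection]
    .sort(
      (x, y) =>
        cosineSimilarity(y.embedding, queryEmbedding) -
        cosineSimilarity(x.embedding, queryEmbedding)
    )
    .slice(0, k);
}

// Build the prompt that would be sent to the LLM: retrieved chunks
// become the context, followed by the user's question.
function buildPrompt(question: string, context: Chunk[]): string {
  const contextText = context.map((c) => c.text).join("\n---\n");
  return `Answer based only on this context:\n${contextText}\n\nQuestion: ${question}`;
  // A real implementation would now send this prompt to the chat API
  // and stream the answer back to the UI.
}
```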
To run Doxie you need the following software installed on your system:

- Node.js (the dev workflow uses `npm`)
- Docker (data files live under `./docker/data`)
Doxie currently does not have a proper ingestion pipeline and is hard-coded for the test use case. This will probably be amended in the coming days/weeks, depending on my mood.
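A proper ingestion pipeline would typically start by splitting source documents into overlapping chunks before embedding them. A minimal chunker sketch of that step (hypothetical, not Doxie's actual code):

```typescript
// Split text into overlapping chunks of roughly `size` characters.
// The overlap keeps content that straddles a chunk boundary retrievable.
function chunkText(text: string, size = 500, overlap = 100): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break;
    start += size - overlap;
  }
  return chunks;
}
```

Each chunk would then be run through the embedding model and written out alongside its vector, which is roughly what `embedder-cli.ts` produces.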
If you want to play around with Doxie on some real-world data, run the `./download-testdata.sh` script in the root folder. It will download a data set with embeddings generated from the AMS Berufslexikon to `./docker/data/berufslexikon.embeddings.bin`. You can then follow the instructions in the Development section below.
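The binary layout of the `.embeddings.bin` files isn't documented here. Purely as an illustration, here is a loader for a made-up layout (length-prefixed UTF-8 text followed by a fixed-dimension float32 vector per record); Doxie's real format may well differ:

```typescript
import { readFileSync } from "node:fs";

// Hypothetical record layout, NOT necessarily Doxie's actual format:
// [uint32 textLength | UTF-8 text | dim × float32], repeated.
interface EmbeddingRecord {
  text: string;
  embedding: Float32Array;
}

function loadEmbeddingsBin(path: string, dim: number): EmbeddingRecord[] {
  const buf = readFileSync(path);
  const records: EmbeddingRecord[] = [];
  let offset = 0;
  while (offset < buf.length) {
    // Length-prefixed chunk text.
    const textLength = buf.readUInt32LE(offset);
    offset += 4;
    const text = buf.toString("utf-8", offset, offset + textLength);
    offset += textLength;
    // Fixed-dimension embedding vector.
    const embedding = new Float32Array(dim);
    for (let i = 0; i < dim; i++) {
      embedding[i] = buf.readFloatLE(offset);
      offset += 4;
    }
    records.push({ text, embedding });
  }
  return records;
}
```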
Development

Run `npm run dev` to start the server in development mode. In VS Code, run the `dev` launch configuration, which will attach the debugger to the server, spawn a browser window, and also attach to that for frontend debugging.
To publish, run `./publish.sh server` or `./publish.sh`.