Build a Perplexity-Inspired Answer Engine Using Next.js, Groq, Llama-3, Langchain, OpenAI, Upstash, Brave & Serper
MIT License
This repository contains the code and instructions needed to build a sophisticated answer engine that leverages the capabilities of Groq, Mistral AI's Mixtral, Langchain.JS, Brave Search, Serper API, and OpenAI. Designed to efficiently return sources, answers, images, videos, and follow-up questions based on user queries, this project is an ideal starting point for developers interested in natural language processing and search technologies.
Simple, Easy, Fast and Free - deploy to Vercel
Make sure to fill in all the API keys required for the installation.
git clone https://github.com/developersdigest/llm-answer-engine.git
cd llm-answer-engine
Edit the docker-compose.yml file and add your API keys.
Running the Server
To start the server, execute:
docker compose up -d   # Docker Compose v2
or
docker-compose up -d   # Docker Compose v1
The server will be listening on the specified port.
npm install
or
bun install
Create a .env file in the root of your project and add your API keys:
OPENAI_API_KEY=your_openai_api_key
GROQ_API_KEY=your_groq_api_key
BRAVE_SEARCH_API_KEY=your_brave_search_api_key
SERPER_API=your_serper_api_key
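A missing key usually only shows up later as a failed API call, so it can help to check the environment at startup. The following is a minimal sketch, not code from this repository: `missingEnvKeys` is a hypothetical helper, and it assumes all four keys listed above are required (in practice you may only need keys for the providers you enable).

```typescript
// Key names copied from the .env example above.
const REQUIRED_KEYS = [
  "OPENAI_API_KEY",
  "GROQ_API_KEY",
  "BRAVE_SEARCH_API_KEY",
  "SERPER_API",
] as const;

type Env = Record<string, string | undefined>;

// Hypothetical helper (not part of the repository): returns the names of
// required keys that are unset or contain only whitespace.
function missingEnvKeys(env: Env): string[] {
  return REQUIRED_KEYS.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}
```

In a Node.js or Next.js server context you could call `missingEnvKeys(process.env)` before starting the app and log or throw if the returned array is not empty.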
To start the server, execute:
npm run dev
or
bun run dev
The server will be listening on the specified port.
The configuration file is app/config.tsx; you can modify its values there.
Currently, function calling is supported with the following capabilities:
Currently, streaming text responses are supported for Ollama, but follow-up questions are not yet supported.
Embeddings are supported; however, time-to-first-token can be quite long when using both a local embedding model and a local model for streaming inference. I recommend decreasing some of the RAG values specified in the app/config.tsx file to reduce the time-to-first-token when using Ollama.
To get started, make sure Ollama is running a model on your local machine, set the model you would like to use in the config, and set useOllamaInference and/or useOllamaEmbeddings to true.
Note: When 'useOllamaInference' is set to true, the Ollama model is used for text generation, but the follow-up questions inference step is skipped.
More info: https://ollama.com/blog/openai-compatibility
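The behavior described in the note above can be sketched as a small decision function. This is an illustration under stated assumptions, not the project's actual code: only `useOllamaInference` and `useOllamaEmbeddings` are names from this README, while `ollamaModel`, `InferencePlan`, and `planInference` are hypothetical. The base URL is Ollama's OpenAI-compatible endpoint on its default local port.

```typescript
// Only useOllamaInference and useOllamaEmbeddings are named in this README;
// ollamaModel is a hypothetical field used for illustration.
interface OllamaSettings {
  useOllamaInference: boolean;
  useOllamaEmbeddings: boolean;
  ollamaModel: string;
}

// Hypothetical description of how a request would be served.
interface InferencePlan {
  provider: "ollama" | "hosted";
  baseURL?: string;
  model?: string;
  generateFollowUps: boolean;
}

// Ollama's OpenAI-compatible endpoint, assuming the default local port.
const OLLAMA_OPENAI_BASE_URL = "http://localhost:11434/v1";

function planInference(settings: OllamaSettings): InferencePlan {
  if (settings.useOllamaInference) {
    // Local inference: stream text from Ollama and skip follow-up questions.
    return {
      provider: "ollama",
      baseURL: OLLAMA_OPENAI_BASE_URL,
      model: settings.ollamaModel,
      generateFollowUps: false,
    };
  }
  // Hosted inference: the follow-up questions step still runs.
  return { provider: "hosted", generateFollowUps: true };
}
```

Keeping the follow-up decision next to the provider choice makes the trade-off explicit: enabling local inference changes which features the response includes, not just where the model runs.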
https://github.com/Portkey-AI/gateway
Watch the express tutorial here for a detailed guide on setting up and running this project. In addition to the Next.js version of the project, there is a backend-only version that uses Node.js and Express, located in the 'express-api' directory. This is a standalone version of the project that can be used as a reference for building a similar API. There is also a readme file in the 'express-api' directory that explains how to run the backend version.
Watch the Upstash Redis Rate Limiting tutorial here for a detailed guide on setting up and running this project. Upstash Redis rate limiting is available on a free tier and gives you a simple interface for configuring and managing rate limits. With Upstash, you can set limits on the number of requests per user, IP address, or other criteria. This helps prevent abuse and keeps your application from being overwhelmed with requests.
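To make the idea of per-identifier limits concrete, here is a minimal in-memory fixed-window limiter. It is a sketch of the general technique only: it does not use Upstash or Redis, and `FixedWindowLimiter` and `LimitResult` are hypothetical names. A hosted service keeps the counters in shared storage so that limits hold across server instances, which this example does not do.

```typescript
interface LimitResult {
  success: boolean;   // true if the request is allowed
  remaining: number;  // requests left in the current window
}

// Hypothetical in-memory limiter: allows up to maxRequests per identifier
// (for example a user ID or IP address) in each window of windowMs.
class FixedWindowLimiter {
  private readonly maxRequests: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(maxRequests: number, windowMs: number) {
    if (maxRequests < 1 || windowMs <= 0) {
      throw new Error("maxRequests must be >= 1 and windowMs must be > 0");
    }
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
  }

  // `now` is injectable so the behavior can be checked without real delays.
  limit(identifier: string, now: number = Date.now()): LimitResult {
    const current = this.windows.get(identifier);
    // Start a fresh window on first use or once the old window has expired.
    if (current === undefined || now - current.start >= this.windowMs) {
      this.windows.set(identifier, { start: now, count: 1 });
      return { success: true, remaining: this.maxRequests - 1 };
    }
    if (current.count >= this.maxRequests) {
      return { success: false, remaining: 0 };
    }
    current.count += 1;
    return { success: true, remaining: this.maxRequests - current.count };
  }
}
```

For example, `new FixedWindowLimiter(10, 10_000).limit(ip)` would allow ten requests per IP address every ten seconds; a request handler can return HTTP 429 whenever `success` is false.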
Contributions to the project are welcome. Feel free to fork the repository, make your changes, and submit a pull request. You can also open issues to suggest improvements or report bugs.
This project is licensed under the MIT License.
I'm the developer behind Developers Digest. If you find my work helpful or enjoy what I do, consider supporting me. Here are a few ways you can do that: