gcp-llm-retrieval-augmentation

A retrieval augmentation LLM demo in GCP

LLM retrieval augmentation in Google Cloud

This demo combines GCP Vector Search and Vertex AI PaLM to pair retrieval augmentation with a conversational engine, creating a question-answering system: the user asks a question and the LLM answers it using its retrieved context.
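The retrieval-augmentation pattern described above can be sketched in plain Python. Everything here is a stand-in: `embed` is a toy bag-of-characters embedding in place of a real Vertex AI embedding model, and the assembled prompt would be sent to PaLM rather than printed.

```python
import math

def embed(text: str) -> list[float]:
    # Toy bag-of-characters embedding; the demo would call a real
    # embedding model instead.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the question; Vector Search plays
    # this role in the demo.
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(question: str, documents: list[str]) -> str:
    # Ground the LLM in the retrieved passages.
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

docs = [
    "Norman architecture spread through England after 1066.",
    "The quick brown fox jumps over the lazy dog.",
]
print(build_prompt("When did Norman architecture spread?", docs))
```

The key design point is that the LLM never sees the whole corpus, only the top-k passages most similar to the question.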

The dataset used is the Stanford Question Answering Dataset (SQuAD), a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles.
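Each SQuAD record is a (question, context, answer) triple, where the answer is a span of the context located by a character offset. The record below uses made-up values (id, title, text) purely to illustrate the shape:

```python
import json

# A single SQuAD-style record; field names match the SQuAD format,
# values are invented for illustration.
record = json.loads("""
{
  "id": "example-001",
  "title": "Example_Article",
  "context": "Architecturally, the school has a Catholic character.",
  "question": "What character does the school have?",
  "answers": {"text": ["a Catholic character"], "answer_start": [32]}
}
""")

# The answer span can be recovered from the context via answer_start.
start = record["answers"]["answer_start"][0]
text = record["answers"]["text"][0]
assert record["context"][start:start + len(text)] == text
print(record["question"], "->", text)
```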

The demo can be accessed here.

Services used

Frameworks

Prerequisites

Docs

  1. Infrastructure and Vector Search Setup: set up the required infrastructure with Terraform and create the Vector Search index
  2. Create embeddings: generate embeddings for the documents and index them in Vector Search
  3. Firestore: index the documents in Firestore
  4. LangChain Retriever and Agent: create a LangChain retriever and conversational agent
  5. Cloud Run: package the code and deploy the API to Cloud Run
  6. Firebase WebUI: create the web app
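Steps 2 and 3 split the data between two services: Vector Search stores only embeddings keyed by document id, while Firestore holds the document text, so retrieval first finds neighbour ids and then resolves them to text. A minimal sketch of that split, with in-memory dicts standing in for both services:

```python
vector_index = {}   # doc_id -> embedding (stand-in for Vector Search)
doc_store = {}      # doc_id -> document text (stand-in for Firestore)

def index_document(doc_id: str, text: str, embedding: list[float]) -> None:
    # Steps 2-3: embedding goes to the vector index, text to the doc store.
    vector_index[doc_id] = embedding
    doc_store[doc_id] = text

def nearest(query_embedding: list[float], k: int = 1) -> list[str]:
    # Vector Search would run an approximate nearest-neighbour lookup;
    # here we rank exactly by squared Euclidean distance.
    def dist(doc_id: str) -> float:
        e = vector_index[doc_id]
        return sum((a - b) ** 2 for a, b in zip(query_embedding, e))
    return sorted(vector_index, key=dist)[:k]

def fetch(doc_ids: list[str]) -> list[str]:
    # Resolve the id list from the vector index to text via the doc store.
    return [doc_store[d] for d in doc_ids]

index_document("d1", "SQuAD is a reading comprehension dataset.", [1.0, 0.0])
index_document("d2", "Cloud Run hosts the demo API.", [0.0, 1.0])
print(fetch(nearest([0.9, 0.1])))
```

In the deployed demo the LangChain retriever (step 4) performs exactly this lookup-then-fetch sequence before handing the passages to the agent.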
Related Projects