RAG document chat with Amazon Bedrock using TypeScript on Lambda
This repository is a basic, more-or-less functional, prototype-grade implementation of retrieval augmented generation (RAG) on Amazon Bedrock.
The project uses TypeScript and, I hope, a structure clear enough to follow each step, undiluted by LangChain or other magic (other than a small text-splitting library!).
When making this, it was unfortunately harder than expected to find clear, first-principles examples and guidance, especially if Python is not your language of choice. Thanks mainly to Janakiram MSV's videos on RAG and a set of examples shared by David Boyne on LinkedIn (I've been unable to find the link again, though...), I was able to get something working that should demonstrate how to think about constructing these capabilities.
I hope this repo will make the technique easier to understand and implement for you, even if you use a different tech stack.
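To make the moving parts concrete, here is a minimal sketch of the question-answering flow: embed the question, retrieve the nearest chunks from OpenSearch Serverless, and ask the LLM with that context. The model IDs, index name, field names, and endpoint URL below are illustrative assumptions, not necessarily the repo's exact values.

```typescript
import { BedrockRuntimeClient, InvokeModelCommand } from '@aws-sdk/client-bedrock-runtime';
import { Client } from '@opensearch-project/opensearch';
import { AwsSigv4Signer } from '@opensearch-project/opensearch/aws';
import { defaultProvider } from '@aws-sdk/credential-provider-node';

const REGION = 'us-east-1'; // assumption: adjust to your region
const OPENSEARCH_URL = 'https://YOUR_COLLECTION_ID.us-east-1.aoss.amazonaws.com'; // placeholder
const INDEX_NAME = 'documents'; // assumption

const bedrock = new BedrockRuntimeClient({ region: REGION });

// OpenSearch Serverless client, signed with SigV4 (service name "aoss")
const opensearch = new Client({
  ...AwsSigv4Signer({ region: REGION, service: 'aoss', getCredentials: defaultProvider() }),
  node: OPENSEARCH_URL
});

// 1. Turn the question into an embedding vector with a Titan embeddings model
async function embed(text: string): Promise<number[]> {
  const response = await bedrock.send(
    new InvokeModelCommand({
      modelId: 'amazon.titan-embed-text-v1', // assumption: any Bedrock embeddings model works
      contentType: 'application/json',
      accept: 'application/json',
      body: JSON.stringify({ inputText: text })
    })
  );
  return JSON.parse(new TextDecoder().decode(response.body)).embedding;
}

// 2. Retrieve the most similar document chunks with a k-NN query
async function retrieve(vector: number[], k = 4): Promise<string[]> {
  const result = await opensearch.search({
    index: INDEX_NAME,
    body: { size: k, query: { knn: { embedding: { vector, k } } } } // "embedding" field name is an assumption
  });
  return result.body.hits.hits.map((hit: any) => hit._source.text); // "text" field name is an assumption
}

// 3. Ask the LLM, grounding it in the retrieved context
async function ask(question: string): Promise<string> {
  const context = (await retrieve(await embed(question))).join('\n---\n');
  const response = await bedrock.send(
    new InvokeModelCommand({
      modelId: 'anthropic.claude-instant-v1', // assumption: swap for your preferred text model
      contentType: 'application/json',
      accept: 'application/json',
      body: JSON.stringify({
        prompt: `\n\nHuman: Use the following context to answer the question.\n\n${context}\n\nQuestion: ${question}\n\nAssistant:`,
        max_tokens_to_sample: 500
      })
    })
  );
  return JSON.parse(new TextDecoder().decode(response.body)).completion;
}

ask('What does Mikael say about dumb, predictable code?').then(console.log);
```

The repo's `Ask` Lambda exposes essentially this flow behind the HTTP endpoint described further down.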
In the AWS console:

1. Create an OpenSearch Serverless collection named `document-chat-demo-embeddings` with standard settings.
2. In the `Collections` view, select your collection, go into the `Indexes` tab and create a vector index.
3. Paste the contents of `opensearch-index.json` into the text field and name the index `documents`.
4. Back in the `Collections` view, make note of the OpenSearch URL; you will update the infrastructure configuration in the next step.

In `serverless.yml`:

- Update `custom.awsAccountNumber`, `custom.documentsBucketName` (your choice of random name), and `custom.openSearchUrl` to your values.

In your IDE/CLI:

- Run `npm run deploy`.

In the AWS console:

- Under `Serverless > Security > Data access policies`, open the pre-baked policy and add the Lambda functions' roles (`Ask` and `GenerateEmbeddings`) to the selected principals.

There is also a file `src/config/config.ts` that you may wish to modify if you want a different region or similar.
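For orientation, this is the kind of thing such a config file typically centralizes; the actual property names and defaults in `src/config/config.ts` may well differ.

```typescript
// Illustrative only: the real src/config/config.ts may use different names and values.
export const config = {
  region: 'us-east-1',                             // AWS region for Bedrock and OpenSearch
  embeddingModelId: 'amazon.titan-embed-text-v1',  // model used to create embeddings
  textModelId: 'anthropic.claude-instant-v1',      // model used to answer questions
  indexName: 'documents'                           // OpenSearch vector index
};
```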
Clone, fork, or download the repo as you normally would. Run `npm install`.
- `npm start`: Run application locally
- `npm run build`: Package application with Serverless Framework
- `npm run deploy`: Deploy application to AWS with Serverless Framework
- `npm run teardown`: Remove stack from AWS

You will need documents for this to use your "own data".
In the current implementation, the infrastructure emits S3 events when either PDF or TXT files are added to a `documents` folder in your bucket (create this folder if you haven't already).
However, the chunking function currently only does anything with TXT files. Feel free to extend it with PDF parsing and whatever else you might need; it's not too complicated, and since this repo is about showing the principles in a minimal, working way, I haven't felt the need to over-invest here and now. A sketch of one possible extension follows below.
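As an illustration of what that extension could look like, here is a minimal sketch that reads an uploaded object from S3 and extracts plain text from a PDF using the `pdf-parse` package. The helper name and the idea of branching on file extension are assumptions, not the repo's existing code.

```typescript
import { S3Client, GetObjectCommand } from '@aws-sdk/client-s3';
import pdfParse from 'pdf-parse';

const s3 = new S3Client({});

// Hypothetical helper: fetch an uploaded object and return its plain text,
// branching on the file extension from the S3 event key.
export async function getDocumentText(bucket: string, key: string): Promise<string> {
  const object = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
  const bytes = await object.Body!.transformToByteArray();

  if (key.toLowerCase().endsWith('.pdf')) {
    // pdf-parse extracts the text layer of the PDF
    const parsed = await pdfParse(Buffer.from(bytes));
    return parsed.text;
  }

  // TXT (and anything else) is treated as UTF-8 text
  return Buffer.from(bytes).toString('utf-8');
}
```

From there, the existing text-splitting and embedding steps can stay exactly the same.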
To start the process of embedding vectors on document data, simply upload one of the provided documents (or any other such document) to your bucket's `documents` folder. There is a TXT and a PDF file, with essentially the same content, located in the `data` directory.
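For example, with the AWS CLI (the file and bucket names below are placeholders; use the actual file in `data/` and your own bucket):

```bash
# Placeholder names: substitute the real file in data/ and your documents bucket
aws s3 cp data/your-document.txt s3://YOUR_DOCUMENTS_BUCKET/documents/
```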
The endpoint takes a GET request with a URL-encoded question string. If you don't know how to URL-encode by heart, there are simple online tools that can help you (or see the snippet below).

For the question "What does Mikael say about dumb, predictable code?", the call would be:

```bash
curl https://RANDOM_ID.execute-api.REGION.amazonaws.com/\?ask\=What%20does%20Mikael%20say%20about%20dumb%2C%20predictable%20code%3F
```

This will respond with the LLM's answer after a few seconds.
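If you prefer to call the endpoint from code, here is a small TypeScript sketch that handles the encoding for you (the endpoint URL is a placeholder for your deployed API):

```typescript
// Placeholder endpoint: replace with your own API Gateway URL
const ENDPOINT = 'https://RANDOM_ID.execute-api.REGION.amazonaws.com/';

async function askQuestion(question: string): Promise<string> {
  // encodeURIComponent takes care of spaces, commas, question marks, etc.
  const response = await fetch(`${ENDPOINT}?ask=${encodeURIComponent(question)}`);
  return response.text();
}

askQuestion('What does Mikael say about dumb, predictable code?').then(console.log);
```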
Some of the model IDs you may come across on Amazon Bedrock:

- `amazon.titan-tg1-large`
- `amazon.titan-e1t-medium`
- `amazon.titan-embed-g1-text-02`
- `amazon.titan-text-express-v1`
- `amazon.titan-embed-text-v1`
- `stability.stable-diffusion-xl`
- `stability.stable-diffusion-xl-v0`
- `ai21.j2-grande-instruct`
- `ai21.j2-jumbo-instruct`
- `ai21.j2-mid`
- `ai21.j2-mid-v1`
- `ai21.j2-ultra`
- `ai21.j2-ultra-v1`
- `anthropic.claude-instant-v1`
- `anthropic.claude-v1`
- `anthropic.claude-v2`
- `cohere.command-text-v14`
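The exact set varies by region and over time, so rather than trusting a static list you can ask Bedrock itself. A small sketch using the AWS SDK's Bedrock control-plane client:

```typescript
import { BedrockClient, ListFoundationModelsCommand } from '@aws-sdk/client-bedrock';

// Lists the foundation models available to your account in the given region
async function listModels(region = 'us-east-1'): Promise<void> {
  const client = new BedrockClient({ region });
  const response = await client.send(new ListFoundationModelsCommand({}));
  for (const model of response.modelSummaries ?? []) {
    console.log(model.modelId);
  }
}

listModels();
```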
Note that you can't use `vector` for fields in OpenSearch - it won't work :)