llm-url_video-rag

A web-based application enabling users to interact with and extract insights from YouTube video transcripts and website content. This solution aims to enhance user engagement, streamline content exploration, and provide actionable insights efficiently.

Interactive Video Transcript and URL Chatbot - RAG

This project was implemented for LLM Zoomcamp - a free course about LLMs and RAG.

Expected Usage

  1. Gather your YouTube video URLs and website URLs
  2. Paste them into the given box, separated by new lines
  3. Click the Load URLs button
  4. Wait until the processing is complete
  5. Ask your questions in the URL Chat Bot section (next to the Submit message button)
  6. Wait for your answer (the answer speed varies according to the deployment system)
  7. Keep texting

Diagram & Services

Technologies

  • Language: Python 3.12
  • LLM:
    • Ollama (for local deployment and some tests) - slow for long executions [using gemma2, llama3.1 and phi3.5]
    • OpenAI (gpt-4o-mini)
  • Knowledge base: FAISS
  • Interface: Gradio (UI)
  • Ingestion Pipeline: Automated ingestion customized for the use-case. Implemented with LangChain and other complementary libraries.
  • Monitoring: WIP

Dataset

The dataset used for this project is dynamic, as it depends on the user's interests.

The following data sources form the foundation of this project:

  • YouTube Videos (Audio Transcripts): Provide the video URL and the Data Ingestion Pipeline will handle acquiring and processing the video transcript.
  • Websites, Web Articles and Wikis: Provide the URL of the desired document and the Data Ingestion Pipeline will read it as a website and leave it ready for use.

The data to validate and test the LLM can be found here and here (public URLs accessible to anyone).

Detailed information

1. Problem Description: Interactive Content Exploration Tool

Objective:

Develop a web-based application enabling users to interact with and extract insights from YouTube video transcripts and website content.

This solution aims to enhance user engagement, streamline content exploration, and provide actionable insights efficiently.

  1. Enhanced Content Accessibility:

       - Challenge: Users often face difficulties finding and accessing relevant information from video content and websites.

       - Solution: This tool allows users to input YouTube video URLs and website links, process them, and interact with the content through a chat interface. This makes it easier for users to find specific information and gain insights without manually sifting through lengthy videos or web pages.

  2. Improved User Engagement:

       - Challenge: Traditional methods of content consumption can be passive and less engaging, leading to lower user interaction and satisfaction.

       - Solution: By providing a chatbot interface, users can engage in a conversational manner with the content, asking questions and receiving tailored responses. This interactive approach increases user engagement and makes content exploration more dynamic and user-friendly.

  3. Streamlined Information Retrieval:

       - Challenge: Retrieving specific information from videos and websites can be time-consuming and inefficient.

       - Solution: The application processes video transcripts and website content, allowing users to instantly query and receive relevant information. This speeds up information retrieval and improves overall efficiency.

  4. Accessibility for Non-Technical Users:

       - Challenge: Many users lack the technical expertise to manually analyze or process content from various sources.

       - Solution: The user-friendly interface simplifies the process of content analysis and interaction, making it accessible to users with varying levels of technical knowledge.

  5. Competitive Advantage:

       - Challenge: Businesses and content creators need innovative tools to stand out and provide value to their audiences.

       - Solution: This tool positions your business as a forward-thinking content interaction and analysis leader. It demonstrates a commitment to enhancing user experience and leveraging advanced technologies to provide valuable insights.

  • Interactive Chat Interface: Users can ask questions and receive responses based on the content of YouTube videos and websites.
  • Seamless URL Processing: Users can easily input and process multiple URLs to extract relevant content.
  • Real-Time Insights: Provides immediate responses and insights from the processed content, improving user satisfaction and efficiency.
  • Enhanced User Experience: Provides a more engaging and intuitive way for users to interact with and explore content.
  • Increased Efficiency: Streamlines content retrieval and analysis, saving users time and effort.
  • Accessibility: Makes complex content more accessible to a broader audience.
  • Innovation: Positions your business as a leader in integrating advanced technologies for content interaction.

2. RAG flow

  • A knowledge base in FAISS is used.
  • An LLM is used as well, answering queries on top of the gathered knowledge base.
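
A minimal sketch of this flow, assuming a LangChain FAISS index and the OpenAI gpt-4o-mini model (the index path and prompt wording are illustrative, not the project's exact code):

    # Retrieve relevant chunks from FAISS and pass them as context to the LLM.
    from langchain_community.vectorstores import FAISS
    from langchain_openai import OpenAIEmbeddings, ChatOpenAI

    embeddings = OpenAIEmbeddings()
    vector_store = FAISS.load_local("faiss_index", embeddings,
                                    allow_dangerous_deserialization=True)
    llm = ChatOpenAI(model="gpt-4o-mini")

    def answer(question: str, k: int = 4) -> str:
        docs = vector_store.similarity_search(question, k=k)
        context = "\n\n".join(d.page_content for d in docs)
        prompt = ("Answer the question using only the context below.\n\n"
                  f"CONTEXT:\n{context}\n\nQUESTION: {question}")
        return llm.invoke(prompt).content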

3. Retrieval evaluation

Multiple retrieval approaches were evaluated (six in total), as implemented here. The metric used was the hit rate. The results are presented as follows:

The retrieval approaches evaluated are:

  • Simple Vector Search
  • Simple Vector Search with Rerank
  • Simple Keyword Search
  • Simple Keyword Search with Rerank
  • Embedded Vector and Keyword Search
  • Embedded Vector and Keyword Search with Rerank

As shown in the figure above, the best retrieval approach was Embedded Vector and Keyword Search with Rerank, outperforming all its competitors.
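
For reference, a sketch of how the hit-rate metric can be computed against a ground-truth set (function and variable names are assumptions, not the project's exact code):

    # A query counts as a hit if the relevant document appears in the top-k results.
    def hit_rate(retrieve, ground_truth, k=5):
        """ground_truth: list of (question, relevant_doc_id) pairs;
        retrieve(question, k) returns a ranked list of document ids."""
        hits = sum(relevant_id in retrieve(question, k)
                   for question, relevant_id in ground_truth)
        return hits / len(ground_truth)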

4. RAG evaluation

Multiple RAG approaches were evaluated based on their RAG configurations.

Cosine similarity against a ground-truth database was the metric used to evaluate the RAG configurations, as implemented here. The average metric is presented in the next figure:

The best RAG performance based on cosine similarity was Embedded Vector and Keyword Search with Rerank, although the other approaches were very close.
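
A sketch of this evaluation, assuming sentence embeddings are used to compare each generated answer with its ground-truth answer (the embedding model name is illustrative):

    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")

    def answer_similarity(answer_llm: str, answer_true: str) -> float:
        # Embed both answers and compute their cosine similarity.
        a, b = model.encode([answer_llm, answer_true])
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # The reported score is the average over the ground-truth set, e.g.:
    # np.mean([answer_similarity(rag(q), gt) for q, gt in ground_truth])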

5. Interface

The UI, implemented in Gradio, is presented here.
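
A minimal sketch of a Gradio interface with the same elements (URL box, Load URLs button, chat section); the load_urls and chat bodies are placeholders for the project's ingestion and RAG calls:

    import gradio as gr

    def load_urls(urls_text):
        urls = [u.strip() for u in urls_text.splitlines() if u.strip()]
        # build the knowledge base here (see section 6)
        return f"Loaded {len(urls)} URL(s)."

    def chat(message, history):
        # query the knowledge base here (see section 2)
        return "..."

    with gr.Blocks() as demo:
        urls = gr.Textbox(lines=5, label="YouTube / website URLs (one per line)")
        status = gr.Markdown()
        gr.Button("Load URLs").click(load_urls, inputs=urls, outputs=status)
        gr.ChatInterface(chat)

    demo.launch(server_port=7860)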

6. Ingestion pipeline

The ingestion pipeline is fully automated and is part of the core functionality of this project. The related code can be found here and here, implemented in different ways (Gradio app, test 1, test 2, or test 3).
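
A condensed sketch of such a pipeline with LangChain loaders (the chunk sizes and embedding model are assumptions, not the project's exact configuration):

    from langchain_community.document_loaders import YoutubeLoader, WebBaseLoader
    from langchain_text_splitters import RecursiveCharacterTextSplitter
    from langchain_community.vectorstores import FAISS
    from langchain_openai import OpenAIEmbeddings

    def ingest(urls):
        docs = []
        for url in urls:
            if "youtube.com" in url or "youtu.be" in url:
                docs += YoutubeLoader.from_youtube_url(url).load()  # transcript
            else:
                docs += WebBaseLoader(url).load()                   # web page
        splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
        chunks = splitter.split_documents(docs)
        return FAISS.from_documents(chunks, OpenAIEmbeddings())     # knowledge base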

7. Monitoring

Not implemented yet, but user questions and answers could easily be saved to a database.
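
A hypothetical sketch of that idea, logging each question/answer pair to a local SQLite database:

    import sqlite3
    from datetime import datetime, timezone

    conn = sqlite3.connect("monitoring.db")
    conn.execute("""CREATE TABLE IF NOT EXISTS conversations
                    (ts TEXT, question TEXT, answer TEXT)""")

    def log_interaction(question: str, answer: str) -> None:
        # Store the interaction with a UTC timestamp for later analysis.
        conn.execute("INSERT INTO conversations VALUES (?, ?, ?)",
                     (datetime.now(timezone.utc).isoformat(), question, answer))
        conn.commit()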

8. Containerization & Reproducibility

Dockerfile

A Dockerfile was implemented to load the Gradio app with all the requirements. With this Dockerfile it is possible to run the app consuming a local Ollama service.

To use a local Ollama service, make sure that:

  1. You have installed it
  2. It is running (ollama serve), to serve the models
  3. You have downloaded the required models:
    ollama pull mxbai-embed-large
    ollama pull gemma2
    ollama pull phi3.5
    

To run the Dockerfile, follow these steps:

  1. Build the Dockerfile
    docker build -t llm-url_video-rag .
    
  2. Validate that Ollama is running as shown before
  3. Run the app (make sure port 7860 is available; it is Gradio's default port, used to serve the UI)
    docker run --network="host" -p 7860:7860 llm-url_video-rag
    
  4. Open the UI URL in your browser. It is expected to be http://localhost:7860/

Docker Compose

A fully implemented docker-compose setup was also developed to manage the full app (including its local Ollama service).

To run the Docker Compose setup, follow these steps:

  1. Make sure you have built the Dockerfile (as shown before)
  2. Install the required Ollama models. Execute in the console:
    docker compose exec ollama ollama pull mxbai-embed-large
    docker compose exec ollama ollama pull gemma2
    docker compose exec ollama ollama pull phi3.5
    
  3. Execute Docker Compose (make sure any local Ollama instance is disabled, since port 11434 needs to be available)
    docker compose up
    
  4. Open the UI URL in your browser. It is expected to be http://localhost:7860/

Reproducibility

The reproducibility of this project is high, since instructions are available to:

  • Get the data (it is dynamic and based on public URLs, so it is always available)
  • Execute the code (easy with Docker Compose and the Dockerfile)
  • Install the dependencies (all dependencies and their versions are provided)

9. Best practices

Hybrid search

Hybrid search (combining vector search and keyword search) is implemented as mentioned above in the Retrieval evaluation section; the implementation can be found here and it was evaluated as shown before.
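
One way to build such a hybrid retriever is with LangChain's BM25Retriever and EnsembleRetriever, sketched below (the project's own implementation may differ); chunks and vector_store are the outputs of the ingestion step in section 6:

    from langchain_community.retrievers import BM25Retriever
    from langchain.retrievers import EnsembleRetriever

    keyword_retriever = BM25Retriever.from_documents(chunks, k=5)         # keyword search
    vector_retriever = vector_store.as_retriever(search_kwargs={"k": 5})  # vector search
    hybrid = EnsembleRetriever(retrievers=[keyword_retriever, vector_retriever],
                               weights=[0.5, 0.5])
    docs = hybrid.invoke("What topics does the video cover?")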

Document re-ranking

The documents are re-ranked in multiple cases, as described above. The implementation can be found here.
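
One common way to re-rank retrieved documents is with a cross-encoder, sketched below (the model name is illustrative and the project's re-ranker may use a different approach):

    from sentence_transformers import CrossEncoder

    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

    def rerank(query, docs, top_n=4):
        # Score each (query, document) pair and keep the top_n documents.
        scores = reranker.predict([(query, d.page_content) for d in docs])
        ranked = sorted(zip(docs, scores), key=lambda p: p[1], reverse=True)
        return [doc for doc, _ in ranked[:top_n]]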

User query rewriting

Query rewriting is implemented here in order to improve the user input.
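
A minimal sketch of the idea, assuming the LLM itself is asked to rephrase the user input into a self-contained query before retrieval (the prompt wording is illustrative):

    from langchain_openai import ChatOpenAI

    llm = ChatOpenAI(model="gpt-4o-mini")

    def rewrite_query(user_input: str) -> str:
        prompt = ("Rewrite the following user question as a clear, self-contained "
                  f"search query. Return only the rewritten query.\n\n{user_input}")
        return llm.invoke(prompt).content.strip()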

Related Projects