PDF Data Analyzer - Conversational PDF Querying with Google Gemini AI

Project Overview

The PDF Data Analyzer is an AI-powered chatbot application that lets users upload PDF documents and ask questions about them in natural language. It returns relevant, context-aware answers by using Google's Gemini model for the conversation and Google Generative AI embeddings to represent the document text. The user interface is built with Streamlit, and FAISS provides vector similarity search over the document chunks.
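In a retrieval-augmented setup like the one described above, the retrieved chunks are usually combined with the user's question into a single prompt before it is sent to the chat model. The sketch below shows that step in plain Python. The helper name `build_prompt` and the instruction wording are hypothetical assumptions, not this project's actual prompt template.

```python
def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Combine retrieved PDF chunks and a user question into one prompt.

    Hypothetical helper: the instruction text is illustrative only and is
    not taken from the project's code.
    """
    # Number each chunk so the model (and a reader) can tell them apart.
    context = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(context_chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )


if __name__ == "__main__":
    print(build_prompt("What is the notice period?",
                       ["The notice period is 30 days.", "Either party may terminate."]))
```

Restricting the model to the supplied context is a common way to keep answers tied to the uploaded documents rather than to the model's general knowledge.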

This project suits anyone who needs to search and retrieve information from large PDF documents in an interactive, conversational way. Whether you are working with research reports, legal documents, or academic papers, the PDF Data Analyzer is designed to make document querying seamless and intuitive.

Features

Google Gemini AI Integration: Uses Google's Generative AI embeddings and Gemini Pro for natural language understanding and document interaction.
Conversational Interface: A chatbot-like experience where users ask questions about their PDF documents and receive answers grounded in the document content.
PDF Text Extraction: Automatically extracts text from uploaded PDFs for processing and analysis.
Embeddings and Vector Search: Converts text chunks into embeddings with Google's embedding model and uses FAISS to retrieve the most relevant content by semantic similarity.
Real-time Response: Once the PDFs have been processed, questions are answered interactively in the chat.
Streamlit Interface: Simple and elegant web-based interface, allowing for easy user interaction.
AWS S3 Support: Optional integration with AWS S3 for cloud storage of PDFs.
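Before embedding, extracted PDF text is usually split into overlapping chunks so that each piece fits the embedding model's input and context is not lost at chunk boundaries. The README does not state the chunk size or overlap this project uses, so the defaults below and the `split_text` helper are hypothetical. This character-based splitter is a simplified stand-in for LangChain's text splitters, not a reproduction of them.

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character chunks that overlap.

    Hypothetical defaults; the project's real settings are not documented.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end of the text, so no trailing
        # chunk consists purely of overlap.
        if start + chunk_size >= len(text):
            break
    return chunks
```

For example, `split_text("abcdefghij", chunk_size=4, chunk_overlap=1)` yields `["abcd", "defg", "ghij"]`: each chunk repeats the last character of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk when the overlap is large enough.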

Experience the live project

Check out the live Streamlit app demo.

Project UI: App Screenshot


Installation

1. Clone the Repository

git clone git@github.com:Srikanth-Banda/PDF-Data-Analyzer.git
cd PDF-Data-Analyzer

2. Set Up a Virtual Environment

python3 -m venv pdf-chat-env
source pdf-chat-env/bin/activate

3. Install Dependencies

pip install -r requirements.txt

4. Set Up Environment Variables

Create a .env file in the root of the project and add your Google API Key:

GOOGLE_API_KEY=your_google_api_key

(Optional) If using AWS S3, also add:

AWS_ACCESS_KEY_ID=your_aws_access_key
AWS_SECRET_ACCESS_KEY=your_aws_secret_key
AWS_BUCKET=your_s3_bucket_name
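To fail fast when the .env file is incomplete, the app can check its configuration at startup. The sketch below reads plain environment variables; in practice a loader such as python-dotenv would populate them from .env, but whether this project does so is an assumption. The helper `check_config` is hypothetical. It treats GOOGLE_API_KEY as required and the three AWS variables as an all-or-nothing group, since S3 support is optional.

```python
import os
from collections.abc import Mapping

REQUIRED = ("GOOGLE_API_KEY",)
# Optional S3 settings: either all of them are set, or none are.
AWS_GROUP = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_BUCKET")


def check_config(env: Mapping[str, str] = os.environ) -> bool:
    """Validate settings and return True if S3 support is fully configured.

    Raises RuntimeError if a required variable is missing or the AWS
    variables are only partially set.
    """
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required variables: {', '.join(missing)}")
    aws_set = [name for name in AWS_GROUP if env.get(name)]
    if aws_set and len(aws_set) != len(AWS_GROUP):
        absent = [name for name in AWS_GROUP if name not in aws_set]
        raise RuntimeError(f"Incomplete AWS S3 settings, missing: {', '.join(absent)}")
    return len(aws_set) == len(AWS_GROUP)
```

Passing the environment as a mapping (defaulting to `os.environ`) keeps the check easy to test with a plain dictionary.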

5. Run the Application

streamlit run app.py

Streamlit serves the application at http://localhost:8501 by default; open it in your browser to start interacting with your PDF documents.

Usage

Upload PDFs: Upload one or multiple PDF files using the sidebar.
Ask Questions: Once the files are processed, ask natural language questions about their contents.
Get Answers: The system retrieves the most relevant sections of the PDFs and uses Gemini to generate an answer to your query.
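Finding the most relevant sections is a nearest-neighbour search over embedding vectors. FAISS performs this search efficiently at scale; the brute-force sketch below shows the same idea in plain Python using cosine similarity. The toy vectors stand in for real Google embeddings, `top_k_chunks` is a hypothetical helper (not part of FAISS or this project), and the README does not say which distance metric the project's FAISS index uses.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(query_vec: list[float], chunk_vecs: list[list[float]],
                 chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks whose vectors are most similar to the query vector."""
    scored = [(cosine(query_vec, vec), chunk) for vec, chunk in zip(chunk_vecs, chunks)]
    # Highest similarity first; ties keep their original order (sort is stable).
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


if __name__ == "__main__":
    chunks = ["cats", "dogs", "stocks"]
    vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    print(top_k_chunks([1.0, 0.0], vecs, chunks, k=2))
```

A brute-force scan like this compares the query against every chunk, which is fine for a handful of PDFs; FAISS indexes exist to make the same lookup fast for much larger collections. LangChain's FAISS vector store exposes this kind of lookup through its `similarity_search` method.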

Tech Stack

LangChain: For processing text and conversational AI logic.
Google Gemini AI: Provides embeddings and conversational responses.
FAISS: Efficient vector search for document retrieval.
PyPDF2: Extracts text from uploaded PDF documents.
Streamlit: Frontend for the web interface.
AWS S3: (Optional) For cloud-based PDF storage.

Future Improvements

Support for Additional File Types: Extend support to Word, Excel, and other document types.
Multilingual Support: Add support for querying PDFs in multiple languages.
User Authentication: Implement authentication for secure file uploads and personalized queries.

Contributing

Contributions are welcome! Please submit a pull request or open an issue to suggest improvements.

License

This project is licensed under the MIT License. See the LICENSE file for more details.
