Vision-Augmented Retrieval and Generation (VARAG)
VARAG (Vision-Augmented Retrieval and Generation) is a vision-first RAG engine that emphasizes vision-based retrieval techniques. It enhances traditional Retrieval-Augmented Generation (RAG) systems by integrating both visual and textual data through Vision-Language models.
VARAG supports a range of retrieval techniques optimized for different use cases, including text, image, and multimodal document retrieval. The primary techniques are listed in the table of supported techniques below.
Follow these steps to set up VARAG:
```bash
git clone https://github.com/adithya-s-k/VARAG
cd VARAG
```
Create and activate a virtual environment using Conda:
```bash
conda create -n varag-venv python=3.10
conda activate varag-venv
```
Install the required packages using pip:
```bash
pip install -e .
# or
poetry install
```
To install OCR dependencies:
```bash
pip install -e ".[ocr]"
```
Explore VARAG with our interactive playground! It lets you compare various RAG (Retrieval-Augmented Generation) solutions side by side, from data ingestion to retrieval.
You can run it locally or on Google Colab:
```bash
python demo.py --share
```
This makes it easy to test and experiment with different approaches in real-time.
Each RAG technique is structured as a class that abstracts all components and offers the following methods:

```python
from varag.rag import RAGTechnique  # replace RAGTechnique with the technique class you want

ragTechnique = RAGTechnique()

# Index a data source (plus any technique-specific arguments)
ragTechnique.index(
    "/path_to_data_source",
    other_relevant_data
)

# Retrieve the top-k matches for a query
results = ragTechnique.search("query", top_k=5)
# These results can be passed into the LLM / VLM of your choice
```
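As a hedged illustration of that last step, the snippet below sketches how retrieved results might be folded into a prompt for a downstream LLM/VLM. The result dictionaries (with `text` and `score` keys) and the `build_prompt` helper are assumptions made for illustration; they are not part of the VARAG API.

```python
# Hypothetical sketch: turning search results into a RAG-style prompt.
# The {"text": ..., "score": ...} result shape is an assumed structure,
# not a documented VARAG return type.

def build_prompt(query, results):
    """Concatenate retrieved passages into a simple grounded prompt."""
    context = "\n\n".join(
        f"[{i + 1}] {r['text']}" for i, r in enumerate(results)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

# Stand-in for something like ragTechnique.search("query", top_k=2)
results = [
    {"text": "VARAG supports vision-first retrieval.", "score": 0.91},
    {"text": "LanceDB is used as the vector store.", "score": 0.87},
]
prompt = build_prompt("What vector store does VARAG use?", results)
print(prompt)
```

The same pattern applies whichever technique class produced the results, since every technique exposes the same `search` interface.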
I initially set out to rapidly test and evaluate different Vision-based RAG (Retrieval-Augmented Generation) systems to determine which one best fits my use case. I wasn't aiming to create a framework or library, but it naturally evolved into one.
The abstraction is designed to simplify the process of experimenting with different RAG paradigms without complicating compatibility between components. To keep things straightforward, LanceDB was chosen as the vector store due to its ease of use and high customizability.
This paradigm is inspired by the Byaldi repo by Answer.ai.
| Technique | Notebook | Demo |
|---|---|---|
| Simple RAG | `simpleRAG.py` | |
| Vision RAG | `visionDemo.py` | |
| Colpali RAG | `colpaliDemo.py` | |
| Hybrid Colpali RAG | `hybridColpaliDemo.py` | |
Contributions to VARAG are highly encouraged! Whether it's code improvements, bug fixes, or feature enhancements, feel free to contribute to the project repository. Please follow the contribution guidelines outlined in the repository for smooth collaboration.
VARAG is licensed under the MIT License, granting you the freedom to use, modify, and distribute the code in accordance with the terms of the license.
We extend our sincere appreciation to the following projects and their developers:
This project also draws inspiration from the following repositories:
For the implementation of Colpali, we referred to the following blogs and codebases:
We would also like to acknowledge the authors of the ColPali paper, which significantly influenced our work:
```bibtex
@misc{faysse2024colpaliefficientdocumentretrieval,
      title={ColPali: Efficient Document Retrieval with Vision Language Models},
      author={Manuel Faysse and Hugues Sibille and Tony Wu and Bilel Omrani and Gautier Viaud and Céline Hudelot and Pierre Colombo},
      year={2024},
      eprint={2407.01449},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2407.01449},
}
```