This is a generalizable graph-RAG LLM tool originally developed for electronic health record (EHR) databases, which often have no reliable documentation. It is not limited to healthcare data and can be used with any large, messy, or poorly documented database.
Given access to a database and a prompt, it returns a valid SQL query based on the learned structure and examples from all available tables and databases.
Large EHR databases often have hundreds or thousands of tables and lack comprehensive documentation, because the schema is usually proprietary and implemented slightly differently at each organization. This makes it hard for users to write complex SQL queries. This tool bridges that gap by automatically exploring the database structure and generating SQL code from natural language prompts.
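To make "exploring the database structure" concrete, here is a minimal sketch using only the standard-library sqlite3 module. It is not the project's actual implementation: the function names `describe_schema` and `build_prompt`, the prompt wording, and the sample tables are assumptions made up for illustration.

```python
import sqlite3


def describe_schema(conn):
    """Return {table_name: [(column_name, declared_type), ...]} for a SQLite database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    schema = {}
    for table in names:
        if table.startswith("sqlite_"):
            continue  # skip SQLite's internal bookkeeping tables
        quoted = '"' + table.replace('"', '""') + '"'
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        schema[table] = [(col[1], col[2]) for col in columns]
    return schema


def build_prompt(schema, question):
    """Assemble a plain-text prompt (hypothetical wording) listing tables and columns."""
    lines = ["You are given the following tables:"]
    for table, columns in schema.items():
        column_text = ", ".join(f"{name} {ctype}".strip() for name, ctype in columns)
        lines.append(f"- {table}({column_text})")
    lines.append(f"Write one SQL query that answers: {question}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Sample tables invented for this sketch, not taken from the project.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, birth_date TEXT);
        CREATE TABLE encounters (encounter_id INTEGER PRIMARY KEY,
                                 patient_id INTEGER, admit_date TEXT);
        """
    )
    print(build_prompt(describe_schema(conn), "How many encounters does each patient have?"))
    conn.close()
```

In a real deployment the schema summary would likely be too large to send whole, which is where retrieving only the relevant tables comes in.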
main.py: The main application file that integrates all components and handles user requests.
db_connector.py: Manages database connections and caching. The example uses sqlite3; set up your Spark, Hadoop, or other connector here.
llm_api_connector.py: Handles the connection to an LLM API. The example uses Google Generative AI Studio because of its free API. In a healthcare setting, you need to use an API that your institution has approved for PHI.
graph_rag.py: Implements the Graph RAG system using FAISS for vector storage and NetworkX for graph representation.
user_interface.html: Provides a simple interface for users to input prompts and view results while experimenting with the logic.

1. Install the dependencies: pip install -r requirements.txt
2. Make sure the GOOGLE_AI_KEY environment variable is set.
3. Run main.py to start the Flask server.
4. Open http://localhost:5000 and enter your prompts to generate SQL code.

This project is designed for demonstration purposes and uses sample SQLite databases. For production use, you should adapt db_connector.py to work with your specific database systems (Spark, Hadoop, cloud, etc.) and implement proper security measures.
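One way to keep that adaptation small is to hide the backend behind a tiny interface and put caching in a wrapper. The sketch below is an assumption about how this could be organized, not the contents of db_connector.py: `Connector`, `SQLiteConnector`, and `CachedConnector` are hypothetical names.

```python
import sqlite3
from typing import Any, Dict, List, Protocol, Tuple

Row = Tuple[Any, ...]


class Connector(Protocol):
    """Anything that can run a SQL string and return rows (hypothetical interface)."""

    def run(self, sql: str) -> List[Row]: ...


class SQLiteConnector:
    """Demo backend; a Spark or Hadoop connector would expose the same run() method."""

    def __init__(self, path: str) -> None:
        self._conn = sqlite3.connect(path)

    def run(self, sql: str) -> List[Row]:
        return self._conn.execute(sql).fetchall()


class CachedConnector:
    """Wraps any connector and memoizes results by exact SQL text.

    Only appropriate for read-only queries such as schema lookups,
    since cached results are never invalidated in this sketch.
    """

    def __init__(self, inner: Connector) -> None:
        self._inner = inner
        self._cache: Dict[str, List[Row]] = {}
        self.misses = 0  # number of queries that actually reached the backend

    def run(self, sql: str) -> List[Row]:
        if sql not in self._cache:
            self.misses += 1
            self._cache[sql] = self._inner.run(sql)
        return list(self._cache[sql])  # copy so callers cannot mutate the cache


if __name__ == "__main__":
    backend = SQLiteConnector(":memory:")
    backend.run("CREATE TABLE t (x INTEGER)")
    backend.run("INSERT INTO t VALUES (1)")
    db = CachedConnector(backend)
    db.run("SELECT x FROM t")
    db.run("SELECT x FROM t")  # served from the cache
    print(db.misses)
```

With this shape, swapping in another backend means writing one class with a `run` method; the caching wrapper and the rest of the application stay unchanged.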