streamlit-neo4j-hackathon

This repository was created as a submission to the Streamlit LLM hackathon.

The idea is to present multiple ways of improving the Cypher-generating capabilities of LLMs, and with them the quality of RAG applications built on knowledge graphs stored in graph databases such as Neo4j. LangChain is used for all the LLM integrations and functionalities.

A demo is available on Streamlit Community Cloud: https://vc-chatbot.streamlit.app/

You need to have access to GPT-4 in order for the demo to work!

Dataset

There is a demo database running on demo.neo4jlabs.com. It contains companies, their subsidiaries, people related to the companies, and articles that mention the companies. The database is a subset of the Diffbot knowledge graph. You can access it with the following credentials:

URI: neo4j+s://demo.neo4jlabs.com
username: companies
password: companies
database: companies
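
As a minimal sketch of how these credentials might be used from Python, the snippet below assumes the official neo4j driver package is installed. The helper name count_nodes_by_label and the label-count query are illustrative and not part of this repository; the query was chosen only because it does not depend on any particular schema.

```python
# Connection details copied from the credentials listed above.
DEMO_DB = {
    "uri": "neo4j+s://demo.neo4jlabs.com",
    "username": "companies",
    "password": "companies",
    "database": "companies",
}


def count_nodes_by_label(config=DEMO_DB, limit=10):
    """Return (labels, total) rows for the most common label combinations.

    Hypothetical helper for illustration; it needs network access to the demo server.
    """
    # Imported lazily so this module still loads when the driver is not installed.
    from neo4j import GraphDatabase

    query = (
        "MATCH (n) "
        "RETURN labels(n) AS labels, count(*) AS total "
        "ORDER BY total DESC LIMIT $limit"
    )
    auth = (config["username"], config["password"])
    with GraphDatabase.driver(config["uri"], auth=auth) as driver:
        with driver.session(database=config["database"]) as session:
            result = session.run(query, limit=limit)
            return [(record["labels"], record["total"]) for record in result]
```

Calling count_nodes_by_label() should give a quick overview of which node labels exist, which is a reasonable first check that the connection works.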

The database contains both structured information about organizations and people as well as news articles. The news articles are linked to the mentioned entity, while the actual text is stored in the Chunk nodes alongside their text-embedding-ada-002 vector representations.

You can ask questions related to companies, such as their board members, suppliers, competitors, subsidiaries, and investors. Additionally, you can ask questions regarding the news about those organizations, or just search through news in general using semantic search.
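
To illustrate what semantic search over the Chunk embeddings amounts to, here is a small sketch that ranks text chunks by cosine similarity to a question embedding. It is a brute-force stand-in and makes no claim about how the demo or a Neo4j vector index works internally. The toy 3-dimensional vectors and article texts are made up; real text-embedding-ada-002 vectors have 1536 dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_k_chunks(question_embedding, chunks, k=2):
    """Return the k (score, text) pairs most similar to the question."""
    scored = [
        (cosine_similarity(question_embedding, chunk["embedding"]), chunk["text"])
        for chunk in chunks
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Made-up chunks with toy embeddings, for illustration only.
toy_chunks = [
    {"text": "Acme acquires a robotics startup", "embedding": [0.9, 0.1, 0.0]},
    {"text": "Quarterly earnings beat expectations", "embedding": [0.1, 0.9, 0.0]},
    {"text": "New CEO appointed at Acme", "embedding": [0.7, 0.2, 0.1]},
]
```

In practice the question embedding would come from the same embedding model as the stored chunks, otherwise the similarity scores are not meaningful.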

Architecture

The code includes a couple of improvements to the original LangChain GraphCypherQAChain:

Entity matching: An LLM extracts all people or organizations from the question and then uses full-text search to match them to database entities.
Dynamic few-shot examples: Few-shot Cypher statement examples are imported to the database and indexed using the vector index. At query time, vector index search is used to find the most similar few-shot examples, which are then used in the Cypher-generating prompt.
Relationship direction validation: A module has been added that programmatically validates and corrects relationship directions in LLM-generated Cypher statements based on the existing graph schema.
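
The relationship direction check can be sketched roughly as follows. This is not the module used in this repository, only a simplified illustration of the idea. The schema triples, labels, and relationship types below are hypothetical, and the regular expression only handles single-hop patterns where both nodes carry a label and no inline properties.

```python
import re

# Hypothetical schema: (start label, relationship type, end label).
SCHEMA = {
    ("Person", "WORKS_AT", "Organization"),
    ("Organization", "HAS_SUBSIDIARY", "Organization"),
}

# Matches (a:LabelA)-[r:TYPE]->(b:LabelB) and its left-pointing or undirected forms.
PATTERN = re.compile(
    r"\((?P<lvar>\w*):(?P<llabel>\w+)\)"
    r"(?P<larrow><?-)\[(?P<relvar>\w*):(?P<reltype>\w+)\](?P<rarrow>->?)"
    r"\((?P<rvar>\w*):(?P<rlabel>\w+)\)"
)


def correct_directions(cypher, schema):
    """Flip relationship arrows that contradict the schema; leave the rest alone."""

    def fix(match):
        points_right = match["larrow"] == "-" and match["rarrow"] == "->"
        points_left = match["larrow"] == "<-" and match["rarrow"] == "-"
        if not (points_right or points_left):
            return match.group(0)  # undirected or malformed: nothing to validate

        left, right, rel = match["llabel"], match["rlabel"], match["reltype"]
        start, end = (left, right) if points_right else (right, left)
        if (start, rel, end) in schema:
            return match.group(0)  # direction already agrees with the schema
        if (end, rel, start) not in schema:
            return match.group(0)  # unknown relationship: do not guess

        # Only the reversed direction exists in the schema, so flip the arrow.
        larrow, rarrow = ("<-", "-") if points_right else ("-", "->")
        return (
            f"({match['lvar']}:{left}){larrow}"
            f"[{match['relvar']}:{rel}]{rarrow}"
            f"({match['rvar']}:{right})"
        )

    return PATTERN.sub(fix, cypher)
```

For example, correct_directions("MATCH (p:Person)<-[:WORKS_AT]-(o:Organization) RETURN p.name", SCHEMA) rewrites the pattern to (p:Person)-[:WORKS_AT]->(o:Organization), because only that direction appears in the hypothetical schema. A production validator would need a proper Cypher parser to cover multi-hop paths, unlabeled nodes, and property maps.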

Setup local environment

To set up a local database replicating the dataset used in the demo, follow these steps:

Download the database dump and restore it
Import few-shot examples and add vector/full-text indices by following the import.cql script. You need to provide the openai_api_key to calculate the embedding values for the few-shot examples

Then you can run the Streamlit application by installing the requirements and setting the Streamlit secrets for the following variables: