This repository contains the official Neo4j GraphRAG features for Python.
The purpose of this package is to give developers a first-party package for which Neo4j can guarantee long-term commitment and maintenance, while shipping new features and high-performing patterns and methods quickly.
Documentation: https://neo4j.com/docs/neo4j-graphrag-python/
Python versions supported:
This package requires Python (>=3.9).
To install the latest stable version, use:
pip install neo4j-graphrag
pygraphviz is used for visualizing pipelines. Follow installation instructions here.
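import asyncio
from neo4j import GraphDatabase
from neo4j_graphrag.embeddings import OpenAIEmbeddings
from neo4j_graphrag.experimental.pipeline.kg_builder import SimpleKGPipeline
from neo4j_graphrag.llm.openai_llm import OpenAILLM
# Connect to Neo4j database
URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")
driver = GraphDatabase.driver(URI, auth=AUTH)
# Define the entity types, the relation types, and the triples allowed between them
entities = ["PERSON", "ORGANIZATION", "LOCATION"]
relations = ["SITUATED_AT", "INTERACTS", "LED_BY"]
potential_schema = [
    ("PERSON", "SITUATED_AT", "LOCATION"),
    ("PERSON", "INTERACTS", "PERSON"),
    ("ORGANIZATION", "LED_BY", "PERSON"),
]
# Instantiate the LLM
# Note: An OPENAI_API_KEY environment variable is required here
llm = OpenAILLM(
    model_name="gpt-4o",
    model_params={
        "max_tokens": 2000,
        "response_format": {"type": "json_object"},
    },
)
# Create an instance of the SimpleKGPipeline
kg_builder = SimpleKGPipeline(
    llm=llm,
    driver=driver,
    embedder=OpenAIEmbeddings(),
    entities=entities,
    relations=relations,
    potential_schema=potential_schema,
    from_pdf=False,  # the input below is raw text, not a PDF file
)
text = """
Albert Einstein was a German physicist born in 1879 who wrote many groundbreaking
papers especially about general relativity and quantum mechanics.
"""
# Run the pipeline; inside a notebook or an async function, use
# `await kg_builder.run_async(text=text)` instead
asyncio.run(kg_builder.run_async(text=text))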
When creating a vector index, make sure you match the number of dimensions in the index with the number of dimensions the embeddings have.
Assumption: Neo4j running
from neo4j import GraphDatabase
from neo4j_graphrag.indexes import create_vector_index
URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")
INDEX_NAME = "vector-index-name"
# Connect to Neo4j database
driver = GraphDatabase.driver(URI, auth=AUTH)
# Creating the index
create_vector_index(
    driver,
    INDEX_NAME,
    label="Document",
    embedding_property="vectorProperty",
    dimensions=1536,
    similarity_fn="euclidean",
)
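Since a mismatch between the index dimensions and the embedding dimensions is easy to introduce, it can help to check the length of a vector before storing it. The sketch below shows one way to do that in plain Python. `check_dimensions` is a hypothetical helper for illustration, not a neo4j-graphrag function, and the sample vector stands in for a real embedding.

```python
from typing import Sequence


def check_dimensions(vector: Sequence[float], index_dimensions: int) -> None:
    """Raise ValueError if the vector length differs from the index dimensions."""
    if len(vector) != index_dimensions:
        raise ValueError(
            f"Embedding has {len(vector)} dimensions, "
            f"but the index expects {index_dimensions}."
        )


# Same value that was passed as `dimensions` when creating the index.
INDEX_DIMENSIONS = 1536

# Stand-in for a real embedding produced by an embedding model.
sample_vector = [0.0] * 1536

# Passes silently because the lengths match.
check_dimensions(sample_vector, INDEX_DIMENSIONS)
```

As one example of why this matters, OpenAI's text-embedding-3-large returns 3072-dimensional vectors by default, so it would need an index created with dimensions=3072 rather than 1536.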
Note that the example below is not the only way to upsert data into your Neo4j database. For example, you could also use the Neo4j Python driver directly.
Assumption: Neo4j running with a defined vector index
from neo4j import GraphDatabase
from neo4j_graphrag.indexes import upsert_vector
URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")
# Connect to Neo4j database
driver = GraphDatabase.driver(URI, auth=AUTH)
# Upsert the vector
vector = ...
upsert_vector(
    driver,
    node_id=1,
    embedding_property="vectorProperty",
    vector=vector,
)
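For readers who prefer the plain Neo4j Python driver mentioned above, the following is a rough sketch of what such an upsert could look like. It assumes a driver version that provides `Driver.execute_query` and a Neo4j server that provides the `db.create.setNodeVectorProperty` procedure. `upsert_embedding` and `UPSERT_QUERY` are hypothetical names for this example, and the sketch identifies the node by its element id string, which is an assumption about how you look up nodes.

```python
from typing import Any, List

# Cypher that finds a node by element id and stores a vector on it.
# db.create.setNodeVectorProperty is a Neo4j built-in procedure.
UPSERT_QUERY = (
    "MATCH (n) WHERE elementId(n) = $node_id "
    "CALL db.create.setNodeVectorProperty(n, $embedding_property, $vector) "
    "RETURN elementId(n) AS id"
)


def upsert_embedding(
    driver: Any,
    node_id: str,
    embedding_property: str,
    vector: List[float],
) -> None:
    """Store `vector` under `embedding_property` on the node with `node_id`."""
    # Keyword arguments are sent to the server as query parameters.
    driver.execute_query(
        UPSERT_QUERY,
        node_id=node_id,
        embedding_property=embedding_property,
        vector=vector,
    )


# Usage, assuming `driver` is a connected neo4j driver and `element_id`
# is the element id of an existing node:
# upsert_embedding(driver, element_id, "vectorProperty", vector)
```

Passing the values as query parameters rather than formatting them into the Cypher string keeps the query cacheable and avoids injection issues.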
Assumption: Neo4j running with populated vector index in place.
Limitation: The query over the vector index is an approximate nearest neighbor search and may not give exact results. See this reference for more details.
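To make the limitation concrete: an exact search compares the query with every stored vector, whereas an approximate index skips most comparisons for speed and can therefore miss a true nearest neighbor. The sketch below is a plain-Python brute-force baseline using Euclidean distance. It is not how Neo4j executes the query, and the function names and sample vectors are made up for illustration; something like it could be used on a small sample to spot-check retrieved results.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_top_k(
    query: Sequence[float],
    vectors: Dict[str, List[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Rank every stored vector by distance to the query and keep the k closest."""
    ranked = sorted(
        ((name, euclidean_distance(query, vec)) for name, vec in vectors.items()),
        key=lambda pair: pair[1],
    )
    return ranked[:k]


# Tiny made-up data set: three 2-dimensional vectors keyed by name.
vectors = {"a": [0.0, 0.0], "b": [1.0, 1.0], "c": [3.0, 4.0]}

closest = exact_top_k([0.0, 0.1], vectors, k=2)
print([name for name, _ in closest])  # prints ['a', 'b']
```

The brute-force approach scales linearly with the number of stored vectors, which is why vector indexes trade a small amount of accuracy for much faster lookups.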
While the library has more retrievers than shown here, the following examples should be able to get you started.
In the following example, we use a simple vector search as the retriever, which performs a similarity search over the index-name vector index in Neo4j.
from neo4j import GraphDatabase
from neo4j_graphrag.retrievers import VectorRetriever
from neo4j_graphrag.llm import OpenAILLM
from neo4j_graphrag.generation import GraphRAG
from neo4j_graphrag.embeddings import OpenAIEmbeddings
URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")
INDEX_NAME = "vector-index-name"
# Connect to Neo4j database
driver = GraphDatabase.driver(URI, auth=AUTH)
# Create Embedder object
embedder = OpenAIEmbeddings(model="text-embedding-3-large")
# Initialize the retriever
retriever = VectorRetriever(driver, INDEX_NAME, embedder)
# Initialize the LLM
# Note: An OPENAI_API_KEY environment variable is required here
llm = OpenAILLM(model_name="gpt-4o", model_params={"temperature": 0})
# Initialize the RAG pipeline
rag = GraphRAG(retriever=retriever, llm=llm)
# Query the graph
query_text = "How do I do similarity search in Neo4j?"
response = rag.search(query_text=query_text, retriever_config={"top_k": 5})
print(response.answer)
poetry install
If you have a bug to report or feature to request, first search to see if an issue already exists. If a related issue doesn't exist, please raise a new issue using the relevant issue form.
If you're a Neo4j Enterprise customer, you can also reach out to Customer Support.
If you don't have a bug to report or a feature to request, but you need a hand with the library, community support is available via the Neo4j Online Community and/or Discord.
Create a working branch from main and start with your changes!
When you're finished with your changes, create a pull request, also known as a PR.
Ensure that the base of your PR is set to main.
Update CHANGELOG.md if you have made significant changes to the project.
Keep CHANGELOG.md changes brief and focus on the most important changes.
To generate a changelog suggestion for your PR, comment on it with:
@CodiumAI-Agent /update_changelog
Edit the suggestion if necessary and update the CHANGELOG.md content under 'Next'.
Unit tests should run out of the box once the dependencies are installed.
poetry run pytest tests/unit
To run e2e tests, you need to have some services running locally.
The easiest way to get them up and running is via Docker Compose:
docker compose -f tests/e2e/docker-compose.yml up
(pro tip: if you suspect something in the databases is cached, run docker compose -f tests/e2e/docker-compose.yml down to remove them completely)
Once the services are running, execute the following command to run the e2e tests.
poetry run pytest tests/e2e