Big Data Management: Research Articles Graph Databases using Neo4j

Project Overview

This project is a hands-on lab focused on Property Graphs using Neo4j completed for the course Big Data Management at the Barcelona School of Economics. The primary objective of the project is to build, load, modify, and query a graph database that captures the complex relationships within the domain of academic reasearch articles, journals, and authors. This project offers a comprehensive exploration of how graph databases can be used to store, manage, and analyze data in the context of academic research, providing valuable insights into the relationships and impact of scholarly work. The nature of graph databases like Neo4j, allows them to be applied to a wide range of other contexts as well, including fraud detection, social networks, recommendation systems, and more.

Key Concepts and Relationships

Authors: Researchers who write and review scientific papers.
Papers: Research articles authored by researchers and published in journals or conferences.
Conferences/Workshops: Forums where research papers are presented; organized by editions.
Journals: Periodicals that publish research papers in volumes.
Reviews: Evaluation of papers by assigned reviewers, as well as suggested reviewer decision (accepted/rejected) of authors results.
Keywords: Topics associated with the reasearch papers

Key Features

Graph Modeling: Creation of conceptual graph model representing research papers and their associated entities.
Data Loading: Creation and insertion of synthetic data into the Neo4j database.
Graph Evolution: Modification of the graph structure to accomodate new data requirements which included author affiliation, storing of paper reviews, etc.
Querying: Execution of Cypher queries to retrieve insights from the graph, such as most cited papers, author communities, etc.
Graph Algorithm: Application of Similarity algorithm to identify citation patterns between research papers.

Assumptions and Design Decisions

Data Modeling: The graph model was designed to optimize query performance for common research-related task (e.g., finiding top-cited papers, year published, etc).
Data Sources: Due to time constraints the data used for this project was synthetic rather than sourced online.
Data Evolution: The model was extended to include paper reviews and author affiliations to reflect real-world scenarios where such data would evolve over time.

Setup and Installation

Application Requirements

Neo4j
Neo4j Graph Data Science Library Plugin
Python

Run Project

The following setup guide should provide you with the necessary steps to get the project up and running smoothly in Neo4j, from initial setup to data exploration and analysis.

1. Clone repository

git clone https://github.com/nataliabeltranarg/NoSQL_GraphDatabases_Neo4j.git
cd NoSQL_GraphDatabases_Neo4j

2. Set Up Neo4j

Install Neo4j Desktop: Download and install Neo4j Desktop.
Install the Graph Data Science Plugin
Create a New Project & Database: Open Neo4j create a new project and a new database instance.
Start Database: Start the Neo4j database instance

3. Data Loading

Import data in the Import directory in Neo4j. Use the Cypher scripts to load data to browser.

4. Queries and Algorithm

Neo4j Browser:
- To execute and explore the graph data you will have to access the Neo4j Browser. This can be accessed by navigating to the http://localhost:7474 in your web browser.
Execute algorithm queries for analysis.

How to navigate the repository

├── 1. Data
│   ├── A1.ModelingGraph.jpg
│   ├── Synthetic Data
│   │   ├── Neo4j Data
│   │   │   ├── edges
│   │   │   ├── nodes
│   │   │   └── original-json-files-created 
│   │   └── BigData-NataliaClarice-DataCreation.ipynb
├── 2. Cypher Queries
│   ├── A.Modeling,Loading,Evolving
│   │   ├── A2.Loading-Instantiating.cypher
│   │   └── A3.Modification.cypher
│   ├── B.Querying
│   │   └── Queries.cypher
│   ├── C.Graph Algorithm
│   │   └── SimilarityAlgorithm.cypher
├── 3. Report
│   └── BSE-BigDataManagement-NataliaBeltran-ClariceMottet.pdf
└── README.md

Related Projects

neo4j-graphrag-python

Neo4j GraphRAG for Python

27 Feb 2024 96

masstin

Masstin: High-Speed DFIR Tool written in Rust and Graph Visualization in Neo4j for Comprehensive ...

19 Aug 2024 3

ERKG

Demonstrate integration of Senzing and Neo4j to construct an Entity Resolved Knowledge Graph

18 Mar 2024 22

bloom

Neo4j Bloom is a graph exploration application for visually interacting with graph data

09 Apr 2020 9

my-graphDB-learning-lab

Explore the realms of graph databases with Neo4j, dive into Cypher queries, and integrate LLMs fo...

20 Mar 2024 6

bootiful-music

02 Aug 2018 20

contact-tracing

Contact Tracing graph for pandemic spread e.g. COVID-19 based on http://blog.bruggen.com/search/l...

07 Jul 2020 5

learning-neo4j

Learning trail and useful resources discovered while learning the Neo4j Graph database

11 Jul 2014 9

icij-paradise-papers

The Paradise Papers dataset and guide from the International Consortium of Investigative Journali...

09 Apr 2020 9

icij-panama-papers

The Panama Papers dataset and guide from the International Consortium of Investigative Journalist...

09 Apr 2020 10

PapersLab

The project aims to automate content classification and knowledge retrieval, as well as to perfor...

02 Apr 2024 1

NaLLM

Repository for the NaLLM project

02 May 2023 1,246

blank-sandbox

Blank Sandbox. Load your own data with LOAD CSV or create data from scratch

31 Jan 2020 2

twitter-trolls

Explore data released by NBC News from their investigation into Russian Twitter Trolls around the...

09 Apr 2020 6

graph-database-project

The Project is designed to interact with a graph database representing Wikipedia Main topic Class...

11 Aug 2024 0