This project interacts with a graph database representing Wikipedia's Main Topic Classification categories. Its command-line utility lets you query the database efficiently.
This project provides a command-line interface to interact with a graph database representing Wikipedia's Main Topic Classification categories. It allows efficient querying and exploration of the category hierarchy.
Link to Wikipedia classifications
import_data.py: Imports data from CSV files into Neo4j.
utils.py: Provides utility functions for database operations.
goals.py: Defines functions for various database queries.
dbcli.py: Command-line interface for interacting with the database.
config.py: Stores database connection details and other settings.

import_data.py processes and imports data from taxonomy_iw.csv.gz into Neo4j.
dbcli.py executes user commands using functions from goals.py.
utils.py manages database connections and query execution.

sudo apt update && sudo apt upgrade -y
wget -O - https://debian.neo4j.com/neotechnology.gpg.key | sudo apt-key add -
echo 'deb https://debian.neo4j.com stable 4.x' | sudo tee /etc/apt/sources.list.d/neo4j.list
sudo apt update
sudo apt install neo4j -y
Verify installation: neo4j --version
sudo apt install python3 python3-venv python3-pip -y
python3 -m venv myenv
source myenv/bin/activate
pip install neo4j pandas tqdm
cd dbproject
source myenv/bin/activate
sudo systemctl enable neo4j
sudo systemctl start neo4j
Open http://localhost:7474 in your web browser.
Log in as the neo4j user.
Update config.py with your Neo4j credentials.
python import_data.py
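The import step above could be organized roughly as follows. This is a minimal sketch, not the project's actual import_data.py: the column names `parent` and `child`, the `Category` label, and the batch size are assumptions, and it uses the standard library's csv and gzip modules rather than pandas to stay self-contained. Only the HAS_SUBCATEGORY relationship and the name property come from this README.

```python
import csv
import gzip
from itertools import islice

# Assumed column names; the real header of taxonomy_iw.csv.gz may differ.
PARENT_COL = "parent"
CHILD_COL = "child"

# MERGE keeps the import idempotent: re-running it does not duplicate nodes.
# The Category label is an assumption; HAS_SUBCATEGORY comes from the README.
IMPORT_QUERY = (
    "UNWIND $rows AS row "
    "MERGE (p:Category {name: row.parent}) "
    "MERGE (c:Category {name: row.child}) "
    "MERGE (p)-[:HAS_SUBCATEGORY]->(c)"
)


def read_rows(path):
    """Yield {'parent': ..., 'child': ...} dicts from a gzipped CSV file."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        for record in csv.DictReader(handle):
            yield {"parent": record[PARENT_COL], "child": record[CHILD_COL]}


def batched(rows, size):
    """Group an iterable into lists of at most `size` items."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def import_file(session, path, batch_size=1000):
    """Send the file to Neo4j in batches; return the number of rows sent."""
    total = 0
    for batch in batched(read_rows(path), batch_size):
        # Keyword arguments to session.run() become query parameters.
        session.run(IMPORT_QUERY, rows=batch)
        total += len(batch)
    return total
```

With the official neo4j Python driver, `session` would come from `GraphDatabase.driver(uri, auth=(user, password)).session()`, with the connection values read from config.py. Sending rows in batches through UNWIND avoids one network round trip per row.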
Schema Design:
name property
HAS_SUBCATEGORY between parent-child nodes
name property
name property for all nodes

Data Import:

Query Functions:
goals.py

Command Line Interface:
dbcli.py provides a user-friendly interface

Activate the virtual environment and run:
python dbcli.py <goal_number> [arguments]
Available goals:
python dbcli.py 1 <node_name>
python dbcli.py 2 <node_name>
python dbcli.py 3 <node_name>
python dbcli.py 4 <node_name>
python dbcli.py 5 <node_name>
python dbcli.py 6 <node_name>
python dbcli.py 7
python dbcli.py 8
python dbcli.py 9
python dbcli.py 10
python dbcli.py 11 <old_name> <new_name>
python dbcli.py 12 <start_node> <end_node> [search_depth]
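The usage lines above imply a simple dispatch: a goal number followed by a goal-specific list of arguments. One way to parse that with argparse is sketched below. The function name `parse_command` and the table `GOAL_ARGUMENTS` are hypothetical, and the real dbcli.py may be structured differently. The default depth of 10 for goal 12 is the value this README gives for the optimized path search.

```python
import argparse

# Argument names per goal, taken from the usage lines in this README.
GOAL_ARGUMENTS = {
    **{goal: ("node_name",) for goal in range(1, 7)},
    **{goal: () for goal in range(7, 11)},
    11: ("old_name", "new_name"),
    12: ("start_node", "end_node"),  # plus an optional search_depth
}
DEFAULT_SEARCH_DEPTH = 10


def parse_command(argv):
    """Return (goal_number, options) or exit with a usage error."""
    parser = argparse.ArgumentParser(prog="dbcli.py")
    parser.add_argument("goal_number", type=int, choices=sorted(GOAL_ARGUMENTS))
    parser.add_argument("arguments", nargs="*")
    parsed = parser.parse_args(argv)

    names = GOAL_ARGUMENTS[parsed.goal_number]
    values = list(parsed.arguments)
    options = {}

    if parsed.goal_number == 12:
        if len(values) == len(names) + 1:
            # A third argument is treated as the optional search depth.
            depth_text = values.pop()
            if not depth_text.isdigit():
                parser.error("search_depth must be a whole number")
            options["search_depth"] = int(depth_text)
        else:
            options["search_depth"] = DEFAULT_SEARCH_DEPTH

    if len(values) != len(names):
        parser.error(
            f"goal {parsed.goal_number} expects {len(names)} argument(s): "
            + " ".join(f"<{name}>" for name in names)
        )

    options.update(zip(names, values))
    return parsed.goal_number, options
```

For example, `parse_command(["11", "Old", "New"])` yields goal 11 with `old_name` and `new_name` set, while a wrong argument count ends the program with argparse's usual usage message. In a real script the input would be `sys.argv[1:]`.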
Detailed query results can be found in the Results folder.
Initial implementation faced performance issues with complex queries. Improvements made:
Performance comparison for the query from "Centuries" to "2020s_anime_films":
With a default search depth of 10 and asynchronous processing, the optimized version finds relevant paths quickly, balancing search depth against run time. A custom depth can be set for more extensive searches when time is not a constraint.
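To illustrate how a depth limit keeps such a path query bounded, here is a hedged sketch of a query builder. It is not the code in goals.py: the use of Cypher's `shortestPath`, the `Category` label, the parent-to-child direction, and the `MAX_SEARCH_DEPTH` cap are all assumptions, and the asynchronous part is not shown. The bound of a variable-length pattern is written into the query text here rather than passed as a parameter, so the depth is validated as an integer before it is interpolated.

```python
DEFAULT_SEARCH_DEPTH = 10  # default depth stated in this README
MAX_SEARCH_DEPTH = 50      # hypothetical safety cap, not from the project


def build_path_query(search_depth=DEFAULT_SEARCH_DEPTH):
    """Return a Cypher query for a path of at most `search_depth` hops."""
    depth = int(search_depth)  # raises ValueError for non-numeric input
    if not 1 <= depth <= MAX_SEARCH_DEPTH:
        raise ValueError(f"search depth must be between 1 and {MAX_SEARCH_DEPTH}")
    return (
        "MATCH (a:Category {name: $start}), (b:Category {name: $end}) "
        f"MATCH p = shortestPath((a)-[:HAS_SUBCATEGORY*..{depth}]->(b)) "
        "RETURN [n IN nodes(p) | n.name] AS path"
    )


def find_path(session, start_node, end_node, search_depth=DEFAULT_SEARCH_DEPTH):
    """Run the bounded query; return a list of names, or None if no path."""
    result = session.run(
        build_path_query(search_depth), start=start_node, end=end_node
    )
    record = result.single()
    return None if record is None else record["path"]
```

A larger depth widens the search at the cost of run time, which is the trade-off described above. Node names are always passed as query parameters, so only the validated integer ever reaches the query text.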
Contributions to improve this project are welcome. Please submit pull requests or open issues in the project repository.
For assistance or inquiries, please open an issue in the project's issue tracker or contact [email protected].
This project is licensed under the MIT License. See the LICENSE file for details.