Role2Vec

A scalable parallel gensim implementation of "Learning Role-based Graph Embeddings" (IJCAI 2018).

Abstract

The second-order random walk sampling methods were taken from the reference implementation of Node2vec.

The model is now also available in the package Karate Club.

This repository provides an implementation of Role2Vec as described in the paper:

Learning Role-based Graph Embeddings. Nesreen K. Ahmed, Ryan Rossi, John Boaz Lee, Theodore L. Willke, Rong Zhou, Xiangnan Kong, Hoda Eldardiry. StarAI workshop - IJCAI, 2018.
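The key idea of Role2Vec is to express random walks in terms of structural node labels (degree bins, WL labels, or motif-based labels) instead of node identities. As a result, nodes playing similar roles share a vocabulary even when they are far apart in the graph. The sketch below illustrates that idea under stated assumptions. Walks are lists of node ids, `labels` is a hypothetical dict from node to structural label, and `build_role_documents` is a hypothetical helper that collects, for each node, the labels of nodes within `window_size` steps of it in any walk. It is not necessarily how this repository assembles its training corpus.

```python
from collections import defaultdict


def build_role_documents(walks, labels, window_size):
    """For every node, collect the structural labels of nodes that co-occur with it
    within `window_size` steps in any walk (hypothetical helper)."""
    documents = defaultdict(list)
    for walk in walks:
        for i, node in enumerate(walk):
            # Look forward only and record each co-occurrence in both directions,
            # so every pair is counted once per side.
            for j in range(i + 1, min(i + window_size + 1, len(walk))):
                other = walk[j]
                documents[node].append(labels[other])
                documents[other].append(labels[node])
    return dict(documents)


# Path 0-1-2: both endpoints have degree 1, the middle node has degree 2.
walks = [[0, 1, 2, 1], [2, 1, 0, 1]]
labels = {0: "deg1", 1: "deg2", 2: "deg1"}
docs = build_role_documents(walks, labels, window_size=1)
print(docs[0])  # ['deg2', 'deg2', 'deg2']
print(docs[2])  # ['deg2', 'deg2', 'deg2']
```

Nodes 0 and 2 are both degree-1 endpoints and end up with identical documents. A document-embedding model (for example gensim's Doc2Vec, with the node id as the document tag) could exploit this property to place them close together.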

Requirements

The codebase is implemented in Python 3.5.2. The package versions used for development are listed below.

networkx          2.4
tqdm              4.28.1
numpy             1.15.4
pandas            0.23.4
texttable         1.5.0
scipy             1.1.0
argparse          1.1.0
gensim            3.6.0
scikit-learn      0.20.0

Datasets

Input and output options

  --graph-input      STR   Input graph path.   Default is `input/cora_edges.csv`.
  --output           STR   Embeddings path.    Default is `output/cora_role2vec.csv`.

Random walk options

  --window-size      INT    Skip-gram window size.        Default is 5.
  --walk-number      INT    Number of walks per node.     Default is 10.
  --walk-length      INT    Number of nodes in walk.      Default is 80.
  --sampling         STR    Sampling procedure.           Default is `first`.
  --P                FLOAT  Return parameter.             Default is 1.0.
  --Q                FLOAT  In-out parameter.             Default is 1.0.
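With `--sampling second`, each step depends on the previous node, as in node2vec. A candidate that returns to the previous node is weighted by 1/P, one that stays adjacent to the previous node by 1, and one that moves further away by 1/Q. Below is a minimal sketch of one such walk on an unweighted graph, assuming a plain adjacency dict of sets. The function names are hypothetical. The reference node2vec code precomputes alias tables rather than calling `random.choices` at every step.

```python
import random


def transition_weights(adj, prev, curr, p, q):
    """Unnormalized node2vec weights for the step out of `curr`, given the walk came from `prev`."""
    weights = {}
    for nxt in sorted(adj[curr]):
        if nxt == prev:
            weights[nxt] = 1.0 / p  # return to the previous node (distance 0)
        elif nxt in adj[prev]:
            weights[nxt] = 1.0  # stay adjacent to the previous node (distance 1)
        else:
            weights[nxt] = 1.0 / q  # move outward (distance 2)
    return weights


def second_order_walk(adj, start, length, p, q, rng=random):
    """Sample one walk of at most `length` nodes."""
    walk = [start]
    while len(walk) < length:
        curr = walk[-1]
        neighbors = sorted(adj[curr])
        if not neighbors:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # The first step has no previous node, so it is sampled uniformly.
            walk.append(rng.choice(neighbors))
            continue
        weights = transition_weights(adj, walk[-2], curr, p, q)
        walk.append(rng.choices(neighbors, weights=[weights[n] for n in neighbors])[0])
    return walk


# Triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(transition_weights(adj, prev=0, curr=2, p=1.0, q=4.0))  # {0: 1.0, 1: 1.0, 3: 0.25}
print(second_order_walk(adj, start=0, length=10, p=1.0, q=4.0, rng=random.Random(42)))
```

With P = Q = 1 every weight equals 1, and the walk reduces to a uniform first-order walk. Setting Q = 4 down-weights outward moves to 0.25, which keeps walks closer to their starting neighborhood.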

Factorization options

  --dimensions      INT      Number of dimensions.      Default is 128.
  --down-sampling   FLOAT    Downsampling frequency.    Default is 0.001.
  --alpha           FLOAT    Initial learning rate.     Default is 0.025.
  --min-alpha       FLOAT    Final learning rate.       Default is 0.025.
  --min-count       INT      Minimum feature count.     Default is 1.
  --workers         INT      Number of cores.           Default is 4.
  --epochs          INT      Number of epochs.          Default is 10.
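`--alpha` and `--min-alpha` are presumably passed to gensim. Gensim decays the learning rate linearly from the initial value to the final value as training progresses across all epochs. Because both defaults are 0.025, the rate stays constant unless `--min-alpha` is set lower. The hypothetical helper below reproduces that linear schedule in isolation. It is an illustration, not a gensim function.

```python
def learning_rate(alpha, min_alpha, progress):
    """Linear schedule: `alpha` at progress 0.0, `min_alpha` at progress 1.0."""
    progress = min(max(progress, 0.0), 1.0)  # clamp overall training progress to [0, 1]
    return alpha - (alpha - min_alpha) * progress


epochs = 10
for epoch in range(epochs):
    # Rate at the start of each epoch when the final rate is lowered to 0.0001.
    print(epoch, round(learning_rate(0.025, 0.0001, epoch / epochs), 5))
```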

Feature creation options

  --features               STR     Feature extraction mechanism.         Default is `wl`.
  --labeling-iterations    INT     Number of WL labeling iterations.     Default is 2.
  --log-base               FLOAT   Log base for label creation.          Default is 1.5.
  --graphlet-size          INT     Maximal graphlet size.                Default is 4.
  --quantiles              INT     Number of quantiles for binning.      Default is 5.
  --motif-compression      STR     Motif compression procedure.          Default is `string`.
  --seed                   INT     Scikit-learn random seed.             Default is 42.
  --factors                INT     Factors for motif compression.        Default is 8.
  --clusters               INT     Number of motif-based labels.         Default is 50.
  --beta                   FLOAT   Motif compression regularizer.        Default is 0.01.
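Two of the feature options can be illustrated compactly. The sketch below assumes a plain adjacency dict and is not the repository's exact code. `degree_labels` bins node degrees on a logarithmic scale controlled by `log_base` (mirroring `--log-base`). `wl_labels` runs Weisfeiler-Lehman relabeling rounds (mirroring `--labeling-iterations`), hashing each node's label together with the sorted labels of its neighbors. Both function names, the `degree + 1` offset, and the MD5 hashing are assumptions made for this example.

```python
import hashlib
import math


def degree_labels(adj, log_base=1.5):
    """Bin each node's degree on a logarithmic scale (hypothetical labeling)."""
    return {node: str(int(math.log(len(nbrs) + 1, log_base))) for node, nbrs in adj.items()}


def wl_labels(adj, initial, iterations=2):
    """Weisfeiler-Lehman relabeling: combine each node's label with the sorted
    labels of its neighbors, then compress the signature with a hash."""
    labels = dict(initial)
    for _ in range(iterations):
        new_labels = {}
        for node, nbrs in adj.items():
            signature = labels[node] + "_" + "_".join(sorted(labels[n] for n in nbrs))
            new_labels[node] = hashlib.md5(signature.encode()).hexdigest()
        labels = new_labels
    return labels


# Path graph 0-1-2-3.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
deg = degree_labels(path, log_base=1.5)
print(deg)  # {0: '1', 1: '2', 2: '2', 3: '1'}
wl = wl_labels(path, deg, iterations=2)
print(wl[0] == wl[3], wl[1] == wl[2], wl[0] == wl[1])  # True True False
```

On the path 0-1-2-3, the two endpoints receive one label and the two interior nodes receive another. This is exactly the role equivalence the embedding is meant to capture.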

Examples

Using the degree centrality as a structural feature.

python src/main.py --features degree

Using the Weisfeiler-Lehman labeling as a structural feature.

python src/main.py --features wl

Using motif-based structural features with factorization compression.

python src/main.py --features motif --motif-compression factorization

Using motif-based structural features with factorization compression and 40 structural labels (clusters).

python src/main.py --features motif --motif-compression factorization --clusters 40

Using a custom factorization dimension for the embedding.

python src/main.py --dimensions 32

Using second-order attributed random walks for sampling, with return parameter P = 1 and in-out parameter Q = 4.

python src/main.py --sampling second --P 1 --Q 4

License
