A scalable Gensim implementation of "Learning Role-based Graph Embeddings" (IJCAI 2018).
GPL-3.0 License
The second-order random walk sampling methods were taken from the reference implementation of Node2vec.
The model is now also available in the package Karate Club.
This repository provides an implementation of Role2Vec as described in the paper:
Learning Role-based Graph Embeddings. Nesreen K. Ahmed, Ryan Rossi, John Boaz Lee, Theodore L. Willke, Rong Zhou, Xiangnan Kong, Hoda Eldardiry. StarAI workshop - IJCAI, 2018.
The codebase is implemented in Python 3.5.2. The package versions used for development are listed below.
networkx 2.4
tqdm 4.28.1
numpy 1.15.4
pandas 0.23.4
texttable 1.5.0
scipy 1.1.0
argparse 1.1.0
gensim 3.6.0
scikit-learn 0.20.0
--graph-input STR Input graph path. Default is `input/cora_edges.csv`.
--output STR Embeddings path. Default is `output/cora_role2vec.csv`.
--window-size INT Skip-gram window size. Default is 5.
--walk-number INT Number of walks per node. Default is 10.
--walk-length INT Number of nodes in walk. Default is 80.
--sampling STR Sampling procedure. Default is `first`.
--P FLOAT Return parameter. Default is 1.0.
--Q FLOAT In-out parameter. Default is 1.0.
--dimensions INT Number of dimensions. Default is 128.
--down-sampling FLOAT Down sampling frequency. Default is 0.001.
--alpha FLOAT Initial learning rate. Default is 0.025.
--min-alpha FLOAT Final learning rate. Default is 0.025.
--min-count INT Minimal feature count. Default is 1.
--workers INT Number of cores. Default is 4.
--epochs INT Number of epochs. Default is 10.
--features STR Feature extraction mechanism. Default is `wl`.
--labeling-iterations INT Number of WL labeling iterations. Default is 2.
--log-base FLOAT Log base for label creation. Default is 1.5.
--graphlet-size INT Maximal graphlet size. Default is 4.
--quantiles INT Number of quantiles for binning. Default is 5.
--motif-compression STR Motif compression procedure. Default is `string`.
--seed INT Sklearn random seed. Default is 42.
--factors INT Factors for motif compression. Default is 8.
--clusters INT Number of motif based labels. Default is 50.
--beta FLOAT Motif compression regularizer. Default is 0.01.
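To show how the options above map onto a command-line parser, here is a minimal sketch that declares a subset of them with `argparse`. The flag names and defaults are copied from the list above. The function name `build_parser` and the help strings are assumptions made for illustration, and this is not necessarily how the repository's own parser is written.

```python
import argparse


def build_parser():
    """Build a parser for a subset of the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(description="Run Role2Vec (sketch).")
    parser.add_argument("--graph-input", type=str, default="input/cora_edges.csv",
                        help="Input graph path.")
    parser.add_argument("--output", type=str, default="output/cora_role2vec.csv",
                        help="Embeddings path.")
    parser.add_argument("--window-size", type=int, default=5,
                        help="Skip-gram window size.")
    parser.add_argument("--walk-number", type=int, default=10,
                        help="Number of walks per node.")
    parser.add_argument("--walk-length", type=int, default=80,
                        help="Number of nodes in walk.")
    parser.add_argument("--sampling", type=str, default="first",
                        help="Sampling procedure.")
    parser.add_argument("--P", type=float, default=1.0, help="Return parameter.")
    parser.add_argument("--Q", type=float, default=1.0, help="In-out parameter.")
    parser.add_argument("--dimensions", type=int, default=128,
                        help="Number of dimensions.")
    parser.add_argument("--features", type=str, default="wl",
                        help="Feature extraction mechanism.")
    return parser


# Parsing an explicit list keeps the sketch independent of sys.argv.
args = build_parser().parse_args(["--sampling", "second", "--P", "1", "--Q", "4"])
print(args.sampling, args.P, args.Q, args.dimensions)
```

Note that `argparse` turns dashes into underscores, so `--graph-input` is read back as `args.graph_input`.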
Using the degree centrality as a structural feature.
python src/main.py --features degree
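As a rough illustration of a degree-based structural feature, the sketch below assigns each node a label from its logarithmically binned degree. The binning rule, the function name `degree_features`, and the toy edge list are assumptions for this example. The only link to the options above is that `--log-base` (default 1.5) is described as the log base for label creation, so the repository's actual rule may differ.

```python
import math


def degree_features(edges, log_base=1.5):
    """Map each node to a string label derived from its log-binned degree."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Adding 1 keeps the logarithm defined for every node; truncating to an
    # integer groups nodes with similar degrees under the same label.
    return {node: str(int(math.log(d + 1, log_base))) for node, d in degree.items()}


# Toy graph (hypothetical): node 0 is a hub, node 3 is a leaf.
edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
labels = degree_features(edges)
print(labels)
```

Nodes 1 and 2 both have degree 2 and therefore share a label, which is the sense in which the label describes a structural role and not a node identity.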
Using the Weisfeiler-Lehman labeling as a structural feature.
python src/main.py --features wl
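For readers unfamiliar with Weisfeiler-Lehman labeling, the following is a minimal sketch of the general technique: each node starts with a label (here its degree, which is an assumption) and repeatedly replaces it with a hash of its own label and the sorted labels of its neighbors. The function name `wl_labels`, the use of MD5, and the toy graph are illustrative choices, not the repository's code. The `iterations` argument plays the role of `--labeling-iterations`.

```python
import hashlib


def wl_labels(adjacency, iterations=2):
    """Run Weisfeiler-Lehman relabeling, starting from node degrees."""
    labels = {node: str(len(neighbors)) for node, neighbors in adjacency.items()}
    for _ in range(iterations):
        new_labels = {}
        for node, neighbors in adjacency.items():
            # Combine the node's own label with the sorted multiset of its
            # neighbors' labels, then compress the result with a hash.
            neighborhood = "_".join(sorted(labels[n] for n in neighbors))
            signature = labels[node] + "|" + neighborhood
            new_labels[node] = hashlib.md5(signature.encode("utf8")).hexdigest()
        labels = new_labels
    return labels


# Path graph 0-1-2-3: the two end nodes are symmetric, as are the two inner nodes.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
roles = wl_labels(adjacency)
print(roles[0] == roles[3], roles[1] == roles[2], roles[0] == roles[1])
```

Structurally equivalent nodes end up with identical labels even when they are far apart in the graph, which is what makes the labels usable as role features.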
Using motif based structural features with factorization compression.
python src/main.py --features motif --motif-compression factorization
Using motif based structural features with factorization compression and a structural label number of 40.
python src/main.py --features motif --motif-compression factorization --clusters 40
Using a custom factorization dimension for the embedding.
python src/main.py --dimensions 32
Using second-order attributed random walks for sampling.
python src/main.py --sampling second --P 1 --Q 4
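The effect of `--P` and `--Q` can be sketched with a small second-order walk in the style of Node2vec. From the previous node `t` and the current node `v`, a candidate next node gets weight `1/P` if it is `t` itself, `1` if it is also a neighbor of `t`, and `1/Q` otherwise. This sketch samples directly from those weights for readability, whereas the Node2vec reference implementation precomputes alias tables. The function name `second_order_walk` and the toy graph are assumptions for this example.

```python
import random


def second_order_walk(adjacency, start, walk_length, p=1.0, q=1.0, rng=None):
    """Generate one biased walk containing at most `walk_length` nodes."""
    rng = rng or random.Random(42)
    walk = [start]
    while len(walk) < walk_length:
        current = walk[-1]
        neighbors = adjacency[current]
        if not neighbors:
            break  # Dead end: stop the walk early.
        if len(walk) == 1:
            # The first step has no previous node, so it is uniform.
            walk.append(rng.choice(neighbors))
            continue
        previous = walk[-2]
        weights = []
        for candidate in neighbors:
            if candidate == previous:
                weights.append(1.0 / p)   # Step back to the previous node.
            elif candidate in adjacency[previous]:
                weights.append(1.0)       # Stay close to the previous node.
            else:
                weights.append(1.0 / q)   # Move further away.
        # random.Random.choices requires Python 3.6 or newer.
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Toy undirected graph (hypothetical), stored as adjacency lists.
adjacency = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
walk = second_order_walk(adjacency, start=0, walk_length=10, p=1.0, q=4.0)
print(walk)
```

With `Q` greater than 1, as in the command above, moves away from the previous node are down-weighted, so walks tend to stay within a local neighborhood.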