A tool for parallel and distributed enumeration of cliques and diameter-two k-plexes.
APACHE-2.0 License
This repository contains implementations of algorithms for maximal subgraph enumeration, including the algorithm for maximal cliques and the algorithm for k-plexes of diameter 2 (which include all k-plexes of size at least 2k-1), which appeared in:
ICALP16 - Sublinear-Space Bounded-Delay Enumeration for Massive Network Analytics: Maximal Cliques (ICALP 2016, https://drops.dagstuhl.de/opus/volltexte/2016/6292/ )
KDD18 - D2K: Scalable Community Detection in Massive Networks via Small-Diameter k-Plexes (KDD 2018, https://dl.acm.org/citation.cfm?doid=3219819.3220093 )
To build the project, just run the command ./build.sh. It will ask whether you want to build the project with support for distributed memory, for example to run the code on a cluster of homogeneous interconnected computing nodes. If this is the case, please read the "MPI" section; otherwise you can jump directly to the "Using" section.
MPI is the de-facto standard for implementing high-performance applications running on a cluster of computing nodes. If you need to run the code on a cluster, you need to specify the path where MPI is installed while building the project. The build script will try to infer this path by itself; if the inferred path is wrong, you need to specify it manually. If you do not have MPI installed on your machine, please install it before running the build script, for example by following this guide. The script assumes that the folder you specify contains an include subfolder with the MPI headers and a lib subfolder with the MPI library. If this is not the case, you may need to modify either the folder structure or the build.py script. The code was only tested with OpenMPI, but it should work with other MPI installations as well.
By default, the application accepts graphs in the NDE (Nodes-Degrees-Edges) format. The graph file should have .nde extension and contain the following information:
The OLY format is also available by using the flag -graph_format="oly". In this case the graph file should have .oly extension and contain the following information:
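For reference, here is a sketch of a minimal NDE input for a triangle graph. It assumes the common NDE layout (a first line with the node count, then one "node degree" line per node, then one "u v" line per edge, with indices starting from 0); double-check this against your own data before relying on it:

```shell
# Hypothetical triangle graph (3 nodes, 3 edges) in assumed NDE layout.
cat > graph.nde <<'EOF'
3
0 2
1 2
2 2
0 1
0 2
1 2
EOF
head -n 1 graph.nde   # first line: the number of nodes
```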
To execute the code, you need to run the following executable file:
./build/text_ui graph.nde
This executable accepts the following optional parameters:
-system: What should be enumerated. This can assume one of the following values:
-enumerator: This can assume one of the following values:
-k: This is the k value for k-plex enumeration. By default it is set to 2.
-q: This is the minimum size of the d2kplexes to be found. By default it is set to 1.
-n: Number of cores to use when enumerator=parallel. By default this is set to the number of cores available on the computing node.
-graph_format: The format of the graph provided in input. It can assume one of the following values (in both cases, node indices should start from 0):
-fast_graph: Use the faster but more memory-hungry graph format. By default it is set to true.
-huge_graph: Use 64-bit integers to count nodes. By default it is set to false.
-one_based: Whether the graph is one-based. Used only by the oly format. By default it is set to false.
-quiet: Do not show any non-fatal output. By default it is set to false.
-print: Print the solutions one per line on stdout. Use with -quiet to print only the solutions.
-enable_pivoting: Enables pivoting in d2kplex. By default it is set to true.
For example, to count 3-plexes of size at least 2 on the 'graph.nde' graph, in parallel using 10 cores, you should execute the following command:
./build/text_ui -k 3 -q 2 -enumerator="parallel" -n 10 graph.nde
To run this code on a cluster of nodes, we first assume that the nodes use NFS, so that all of them share the same directory structure. Before executing the code, you need to create a hosts.txt file. For example, supposing that you have 32 nodes (from node00 to node31), the file should look like the following:
node00 slots=1 max-slots=1
node01 slots=1 max-slots=1
node02 slots=1 max-slots=1
...
node30 slots=1 max-slots=1
node31 slots=1 max-slots=1
Basically, we have one row for each computing node. Each row has three space-separated fields: the first is the hostname of the node, while the second and third force MPI to run only one process per node. This is needed to avoid oversubscribing the node, since each process spawns multiple threads (as many as specified through the -n parameter).
ATTENTION: This is the format for OpenMPI installations. Other MPI versions should have a similar format. Please refer to the corresponding user manual to check how to force only one process for each node.
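Rather than writing hosts.txt by hand, a file like the one above can be generated with a small shell loop (node00..node31 are just the example hostnames used above; substitute your cluster's actual names):

```shell
# Emit one "hostname slots=1 max-slots=1" row per node.
# seq -w zero-pads the indices so names match node00..node31.
for i in $(seq -w 0 31); do
  echo "node$i slots=1 max-slots=1"
done > hosts.txt
```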
After you have built the hosts.txt file, you need to run the application using the mpiexec tool.
For example, to count 3-plexes of size at least 2 on the 'graph.nde' graph by using 15 cluster nodes, each with 10 cores, you should execute the following command:
mpiexec --hostfile hosts.txt -np 15 ./build/text_ui -k 3 -q 2 -enumerator="distributed" -n 10 graph.nde
In a nutshell, you run the same command you would run in the sequential/parallel case, setting the enumerator flag to distributed and prepending mpiexec --hostfile hosts.txt -np 15, where the value of the -np flag is the number of computing nodes you want to use.
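To keep the -np value in sync with the hosts file, you can derive it from the file's line count instead of hard-coding it. This is only a sketch: it assumes one host per line in hosts.txt, and the NP variable name is just an illustration (the mpiexec line is shown commented out so the snippet runs without a cluster):

```shell
# Example hosts file with 15 nodes (replace with your real hosts.txt).
printf 'node%02d slots=1 max-slots=1\n' $(seq 0 14) > hosts.txt

# Derive the MPI process count from the number of listed hosts.
NP=$(wc -l < hosts.txt)
echo "Launching on $NP nodes"
# mpiexec --hostfile hosts.txt -np "$NP" ./build/text_ui -k 3 -q 2 -enumerator="distributed" -n 10 graph.nde
```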
The following macros can be specified at compile time, but are mostly for internal/debug use.