Community detection in networks: A user guide
Santo Fortunato∗
Center for Complex Networks and Systems Research, School of Informatics and Computing and Indiana University Network Science Institute (IUNI), Indiana University, Bloomington, USA and
Department of Computer Science, Aalto University School of Science, P.O. Box 15400, FI-00076

Darko Hric
Department of Computer Science, Aalto University School of Science, P.O. Box 15400, FI-00076

∗ Electronic address: santo@indiana.edu


(Dated: November 4, 2016)
Community detection in networks is one of the most popular topics of modern network science. Communities, or clusters, are usually groups of vertices having a higher probability of being connected to each other than to members of other groups, though other patterns are possible. Identifying communities is an ill-defined problem: there are no universal protocols on the fundamental ingredients, like the definition of community itself, nor on other crucial issues, like the validation of algorithms and the comparison of their performances. This has generated a number of confusions and misconceptions which undermine progress in the field. We offer a guided tour through the main aspects of the problem. We also point out the strengths and weaknesses of popular methods, and give directions for their use.

PACS numbers: 89.75.Fb, 89.75.Hc

Contents

I. Introduction
II. What are communities?
   A. Variables
   B. Classic view
   C. Modern view
III. Validation
   A. Artificial benchmarks
   B. Partition similarity measures
   C. Detectability
   D. Structure versus metadata
   E. Community structure in real networks
IV. Methods
   A. How many clusters?
   B. Consensus clustering
   C. Spectral methods
   D. Overlapping communities: Vertex or Edge clustering?
   E. Methods based on statistical inference
   F. Methods based on optimisation
   G. Methods based on dynamics
   H. Dynamic clustering
   I. Significance
   J. Which method then?
V. Software
VI. Outlook
Acknowledgments
References

I. INTRODUCTION

The science of networks is a modern discipline spanning the natural, social and computer sciences, as well as engineering (Barrat et al., 2008; Caldarelli, 2007; Cohen and Havlin, 2010; Dorogovtsev and Mendes, 2013; Estrada, 2011; Estrada and Knight, 2015; Newman, 2010). Networks, or graphs, consist of vertices and edges. An edge typically connects a pair of vertices [1]. Networks occur in a huge variety of contexts. Facebook, for instance, is a large social network, where more than one billion people are connected via virtual acquaintanceships. Another famous example is the Internet, the physical network of computers, routers and modems which are linked via cables or wireless signals (Fig. 1). Many other examples come from biology, physics, economics, engineering, computer science, ecology, marketing, social and political sciences, etc.

[1] There may be connections between three vertices or more. In this case one speaks of hyperedges and the network is a hypergraph.
Most networks of interest display community structure, i.e., their vertices are organised into groups, called communities, clusters or modules. In Fig. 2 we show a collaboration network of scientists working at the Santa Fe Institute (SFI) in Santa Fe, New Mexico. Vertices are scientists, edges join coauthors. Edges are concentrated within groups of vertices representing scientists working on the same research topic, where collaborations are more natural. Likewise, communities could represent proteins with similar function in protein-protein interaction networks, groups of friends in social networks, websites on similar topics on the Web graph, and so on.

Identifying communities may offer insight into how the network is organised. It allows us to focus on regions having some degree of autonomy within the graph. It helps to classify the vertices based on their role with respect to the communities they belong to. For instance, we can distinguish vertices totally embedded within their clusters from vertices at the boundary of the clusters, which may act as brokers between the modules and, in that case, could play a major role both in holding the modules together and in the dynamics of spreading processes across the network.

FIG. 1 Internet network. Reprinted figure with permission
from www.opte.org.

FIG. 2 Collaboration network of scientists working at the Santa Fe Institute (SFI). Edges connect scientists that have coauthored at least one paper. Symbols indicate the research areas of the scientists (e.g., Agent-based Models, Mathematical Ecology, Statistical Physics, Structure of RNA). Naturally, there are more edges between scholars working on the same area than between scholars working in different areas. Reprinted figure with permission from (Girvan and Newman, 2002). © 2002, by the National Academy of Sciences, USA.

Community detection in networks, also called graph or network clustering, is, however, an ill-defined problem. There is no universal definition of the objects that one should be looking for. Consequently, there are no clear-cut guidelines on how to assess the performance of different algorithms and how to compare them with each other. On the one hand, such ambiguity leaves a lot of freedom to propose diverse approaches to the problem, which often depend on the specific research question and/or the particular system under study. On the other hand, it has introduced a lot of noise into the field, slowing down progress. In particular, it has favoured the diffusion of questionable concepts and convictions, on which a large number of methods are based.

This work presents a critical analysis of the problem of community detection, intended for practitioners but accessible to readers with basic notions of network science. It is not meant to be an exhaustive survey. The focus is on the general aspects of the problem, especially in the light of recent findings. We also discuss some popular classes of algorithms and give advice on their usage. More information on network clustering can be found in several review articles (Chakraborty et al., 2016; Coscia et al., 2011; Fortunato, 2010; Malliaros and Vazirgiannis, 2013; Newman, 2012; Parthasarathy et al., 2011; Porter et al., 2009; Schaeffer, 2007; Xie et al., 2013).

The contents are organised in three main sections. Section II deals with the concept of community, describing its evolution from the classic subgraph-based notions to the modern statistical interpretation. Next we discuss the critical issue of validation (Section III), emphasising the role of artificial benchmarks, the importance of the choice of partition similarity scores, the conditions under which clusters are detectable, the usefulness of metadata and the structural peculiarities of communities in real networks. Section IV hosts a critical discussion of some popular clustering approaches. It also tackles important general methodological aspects, such as the determination of the number of clusters, which is a necessary input for several techniques, the possibility of generating robust solutions by combining multiple partitions, the main approaches to discover dynamic communities, as well as the assessment of the significance of clusterings. In Section V we indicate where to find useful software. The concluding remarks of Section VI close the work.

II. WHAT ARE COMMUNITIES?
A. Variables

We start with a subgraph C of a graph G. The number of vertices and edges are $n$, $m$ for G and $n_C$, $m_C$ for C, respectively. The adjacency matrix of G is A; its element $A_{ij}$ equals 1 if vertices $i$ and $j$ are neighbours, otherwise it equals 0. We assume that the subgraph is connected, because communities usually are [2]. Other types of group structures do not require connectedness (Section II.C). The subgraph is schematically illustrated in Fig. 3. Its vertices are enclosed by the dashed contour. The magenta dots are the external vertices connected to the subgraph, while the black ones are the remaining vertices of the network. The blue lines indicate the edges connecting the subgraph to the rest of the network.

FIG. 3 Schematic picture of a connected subgraph.

The internal and external degrees $k_i^{int}$ and $k_i^{ext}$ of a vertex $i$ of the network with respect to subgraph C are the number of edges connecting $i$ to vertices of C and to the rest of the graph, respectively. Both definitions can be expressed in compact form via the adjacency matrix A: $k_i^{int} = \sum_{j \in C} A_{ij}$ and $k_i^{ext} = \sum_{j \notin C} A_{ij}$, where the sums run over all vertices $j$ inside and outside C, respectively. Naturally, the degree $k_i$ of $i$ is the sum of $k_i^{int}$ and $k_i^{ext}$: $k_i = \sum_j A_{ij}$. If $k_i^{ext} = 0$ and $k_i^{int} > 0$, $i$ has neighbours only within C and is an internal vertex of C (dark green dots in the figure). If $k_i^{ext} > 0$ and $k_i^{int} > 0$, $i$ has neighbours outside C and is a boundary vertex of C (bright green dots in the figure). If $k_i^{int} = 0$, instead, the vertex is disjoint from C. The embeddedness $\xi_i$ is the ratio between the internal degree and the degree of vertex $i$: $\xi_i = k_i^{int}/k_i$. The larger $\xi_i$, the stronger the relationship between the vertex and its community. The mixing parameter $\mu_i$ is the ratio between the external degree and the degree of vertex $i$: $\mu_i = k_i^{ext}/k_i$. By definition, $\mu_i = 1 - \xi_i$.

Now we present a number of variables related to the subgraph as a whole. We distinguish them in three classes.

The first class comprises measures based on internal connectedness, i.e., on how cohesive the subgraph is. The main variables are:

• Internal degree $k_C^{int}$. The sum of the internal degrees of the vertices of C. It equals twice the number $m_C$ of internal edges, as each edge contributes two units of degree. In matrix form, $k_C^{int} = \sum_{i,j \in C} A_{ij}$.

• Average internal degree $k_C^{avg-int}$. Average degree of vertices of C, considering only internal edges: $k_C^{avg-int} = k_C^{int}/n_C$.

• Internal edge density $\delta_C^{int}$. The ratio between the number of internal edges of C and the number of all possible internal edges:

    $\delta_C^{int} = \frac{k_C^{int}}{n_C (n_C - 1)}$.   (1)

  We remark that $n_C (n_C - 1)/2$ is the maximum number of internal edges that a simple graph with $n_C$ vertices may have [3].

The second class includes measures based on external connectedness, i.e., on how embedded the subgraph is in the network or, equivalently, how separated the subgraph is from it. The main variables are:

• External degree, or cut, $k_C^{ext}$. The sum of the external degrees of the vertices of C. It gives the number of external edges of the subgraph (blue lines in Fig. 3). In matrix form, $k_C^{ext} = \sum_{i \in C, j \notin C} A_{ij}$.

• Average external degree, or expansion, $k_C^{avg-ext}$. Average degree of vertices of C, considering only external edges: $k_C^{avg-ext} = k_C^{ext}/n_C$.

• External edge density, or cut ratio, $\delta_C^{ext}$. The ratio between the number of external edges of C and the number of all possible external edges:

    $\delta_C^{ext} = \frac{k_C^{ext}}{n_C (n - n_C)}$.   (2)

Finally, we have hybrid measures, combining internal and external connectedness. Notable examples are:

• Total degree, or volume, $k_C$. The sum of the degrees of the vertices of C. Naturally, $k_C = k_C^{int} + k_C^{ext}$. In matrix form, $k_C = \sum_{i \in C, j} A_{ij}$.

• Average degree $k_C^{avg}$. Average degree of vertices of C: $k_C^{avg} = k_C/n_C$.

• Conductance $C_C$. The ratio between the external degree and the total degree of C:

    $C_C = \frac{k_C^{ext}}{k_C}$.   (3)

All the definitions we have given hold for the case of undirected and unweighted networks. The extension to weighted graphs is straightforward, as it suffices to replace the "number of edges" with the sum of the weights carried by the edges. For instance, the internal degree $k_v^{int}$ of a vertex $v$ becomes the internal strength $w_v^{int}$, which is the sum of the weights of the edges joining $v$ with the vertices of subgraph C. For the internal and external edge densities of Eqs. (1) and (2) one would have to replace the numerators with their weighted counterparts and multiply the denominators by the average edge weight $\bar{w} = \sum_{ij} W_{ij}/(2m)$, where $W_{ij}$ is the element of the weight matrix, indicating the weight of the edge joining vertices $i$ and $j$ ($W_{ij} = 0$ if $i$ and $j$ are disconnected), and $m$ is the total number of graph edges. In Tables I and II we list all the variables we have presented, along with their extensions to the case of weighted networks. In directed networks one would have to distinguish between incoming and outgoing edges. Extensions of the metrics are fairly simple to implement, though their usefulness is unclear.

[2] The variables defined in this section hold for any subgraph, connected or not.

[3] A simple graph has at most one edge running between any pair of vertices and no self-loops, i.e., no edges connecting a vertex to itself.

TABLE I Basic vertex community variables, for unweighted and weighted networks. A and W are the adjacency and the weight matrix, respectively.

Unweighted networks:
  Name               | Symbol       | Definition
  Internal degree    | $k_i^{int}$  | $\sum_{j \in C} A_{ij}$
  External degree    | $k_i^{ext}$  | $\sum_{j \notin C} A_{ij}$
  Degree             | $k_i$        | $\sum_j A_{ij}$
  Embeddedness       | $\xi_i$      | $k_i^{int}/k_i$
  Mixing parameter   | $\mu_i$      | $k_i^{ext}/k_i$

Weighted networks:
  Name                      | Symbol       | Definition
  Internal strength         | $w_i^{int}$  | $\sum_{j \in C} W_{ij}$
  External strength         | $w_i^{ext}$  | $\sum_{j \notin C} W_{ij}$
  Strength                  | $w_i$        | $\sum_j W_{ij}$
  Weighted embeddedness     | $\xi_i^w$    | $w_i^{int}/w_i$
  Weighted mixing parameter | $\mu_i^w$    | $w_i^{ext}/w_i$

TABLE II Basic community variables, for unweighted and weighted networks. A and W are the adjacency and the weight matrix, respectively, $n_C$ the number of vertices of the community, $n$ the total number of vertices of the graph, $\bar{w}$ the average weight of the network edges.

Unweighted networks:
  Name                    | Symbol           | Definition
  Internal degree         | $k_C^{int}$      | $\sum_{i,j \in C} A_{ij}$
  Average internal degree | $k_C^{avg-int}$  | $k_C^{int}/n_C$
  Internal edge density   | $\delta_C^{int}$ | $k_C^{int}/[n_C(n_C - 1)]$
  External degree         | $k_C^{ext}$      | $\sum_{i \in C, j \notin C} A_{ij}$
  Average external degree | $k_C^{avg-ext}$  | $k_C^{ext}/n_C$
  External edge density   | $\delta_C^{ext}$ | $k_C^{ext}/[n_C(n - n_C)]$
  Total degree            | $k_C$            | $\sum_{i \in C, j} A_{ij}$
  Average degree          | $k_C^{avg}$      | $k_C/n_C$
  Conductance             | $C_C$            | $k_C^{ext}/k_C$

Weighted networks:
  Name                      | Symbol               | Definition
  Internal strength         | $w_C^{int}$          | $\sum_{i,j \in C} W_{ij}$
  Average internal strength | $w_C^{avg-int}$      | $w_C^{int}/n_C$
  Internal weight density   | $\delta_{w,C}^{int}$ | $w_C^{int}/[\bar{w} n_C(n_C - 1)]$
  External strength         | $w_C^{ext}$          | $\sum_{i \in C, j \notin C} W_{ij}$
  Average external strength | $w_C^{avg-ext}$      | $w_C^{ext}/n_C$
  External weight density   | $\delta_{w,C}^{ext}$ | $w_C^{ext}/[\bar{w} n_C(n - n_C)]$
  Total strength            | $w_C$                | $\sum_{i \in C, j} W_{ij}$
  Average strength          | $w_C^{avg}$          | $w_C/n_C$
  Weighted conductance      | $C_{w,C}$            | $w_C^{ext}/w_C$
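To make these definitions concrete, the following minimal Python sketch computes the main unweighted variables of Tables I and II for a vertex set C of a networkx graph (the function names and the choice of example network are ours):

    import networkx as nx

    def community_variables(G, C):
        """Basic community variables of a vertex set C of an unweighted graph G."""
        C = set(C)
        n, n_C = G.number_of_nodes(), len(C)
        k_int = sum(1 for i in C for j in G[i] if j in C)      # internal degree = 2 m_C
        k_ext = sum(1 for i in C for j in G[i] if j not in C)  # external degree (cut)
        return {
            "internal degree": k_int,
            "external degree": k_ext,
            "internal edge density": k_int / (n_C * (n_C - 1)),  # Eq. (1)
            "cut ratio": k_ext / (n_C * (n - n_C)),              # Eq. (2)
            "conductance": k_ext / (k_int + k_ext),              # Eq. (3)
        }

    def embeddedness(G, C, i):
        """xi_i = k_i^int / k_i; the mixing parameter is mu_i = 1 - xi_i."""
        C = set(C)
        return sum(1 for j in G[i] if j in C) / G.degree(i)

    # Example: one of the two factions of Zachary's karate club network.
    G = nx.karate_club_graph()
    C = [v for v in G if G.nodes[v]["club"] == "Mr. Hi"]
    print(community_variables(G, C))
    print("embeddedness of vertex 0:", embeddedness(G, C, 0))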


B. Classic view

Figure 4 shows how scholars usually envision community structure. The network has three clusters, and in each cluster the density of edges is comparatively higher than the density of edges between the clusters. This can be summarised by saying that communities are dense subgraphs which are well separated from each other. This view has recently been challenged (Jeub et al., 2015; Leskovec et al., 2009), as we shall see in Section III.E.

FIG. 4 Classic view of community structure. Schematic picture of a network with three communities.

Communities may overlap as well, sharing some of the vertices. For instance, in social networks individuals can belong to different circles at the same time, like family, friends and work colleagues. Figure 5 shows an example of a network with overlapping communities. Communities are typically supposed to overlap at their boundaries, as in the figure. Recent results reveal a different picture, though (Yang and Leskovec, 2014) (Section III.E). A subdivision of a network into overlapping communities is called a cover, and one speaks of soft clustering, as opposed to hard clustering, which deals with divisions into non-overlapping groups, called partitions. The generic term clustering can be used to indicate both types of subdivisions. Covers can be crisp, when shared vertices belong to their communities with equal strength, or fuzzy, when the strength of their membership can be different in different clusters [4].

FIG. 5 Overlapping communities. A network is divided in four communities, enclosed by the dashed contours. Three of them share boundary vertices, indicated by the blue dots.

[4] In the literature the word fuzzy is often used to describe both situations.
The oldest definitions of community-like objects were proposed by social network analysts and focused on the internal cohesion among vertices of a subgraph (Moody and White, 2003; Scott, 2000; Wasserman and Faust, 1994). The most popular concept is that of clique (Luce and Perry, 1949). A clique is a complete graph, that is, a subgraph such that each of its vertices is connected to all the others. It is also a maximal subgraph, meaning that it is not included in a larger complete subgraph. In modern network science it is common to call clique any complete subgraph, not necessarily maximal. Triangles are the simplest cliques. Finding the largest clique of a graph is an NP-complete problem (Bomze et al., 1999); a popular technique to enumerate the maximal cliques of a graph is the Bron-Kerbosch method (Bron and Kerbosch, 1973).
The notion of clique, albeit useful, cannot be considered a good candidate for a community definition. While a clique has the largest possible internal edge density, as all internal edges are present, communities are not complete subgraphs, in general. Moreover, all vertices play an identical role in a clique, while in real network communities some vertices are more important than others, due to their heterogeneous linking patterns. Therefore, in social network analysis the notion has been relaxed, generating the related concepts of n-cliques (Alba, 1973; Luce, 1950), n-clans and n-clubs (Mokken, 1979). Other definitions are based on the idea that a vertex must be adjacent to some minimum number of other vertices in the subgraph. A k-plex is a maximal subgraph in which each vertex is adjacent to all other vertices of the subgraph except at most k of them (Seidman and Foster, 1978). Details on the above definitions can be found in specialised books (Scott, 2000; Wasserman and Faust, 1994).
For a proper community definition, one should take into account both the internal cohesion of the candidate subgraph and its separation from the rest of the network. A simple idea that has received great popularity is that a community is a subgraph such that "the number of internal edges is larger than the number of external edges" [5]. This idea has inspired the following definitions. An LS-set (Luccio and Sami, 1969), or strong community (Radicchi et al., 2004), is a subgraph such that the internal degree of each vertex is greater than its external degree. A relaxed condition is that the internal degree of the subgraph exceeds its external degree [weak community (Radicchi et al., 2004)] [6]. A strong community is also a weak community, while the converse is not generally true.
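A sketch of how the two conditions may be tested on an unweighted networkx graph (helper names are ours; footnote [6] below spells out the weak condition used here):

    import networkx as nx

    def k_int_ext(G, C, i):
        """Internal and external degree of vertex i with respect to vertex set C."""
        k_int = sum(1 for j in G[i] if j in C)
        return k_int, G.degree(i) - k_int

    def is_strong_community(G, C):
        """Radicchi et al.: k_i^int > k_i^ext for every vertex i in C."""
        return all(k_int_ext(G, C, i)[0] > k_int_ext(G, C, i)[1] for i in C)

    def is_weak_community(G, C):
        """Radicchi et al.: the internal degree of C exceeds its external degree,
        i.e. 2 m_C > k_C^ext."""
        totals = [k_int_ext(G, C, i) for i in C]
        return sum(k for k, _ in totals) > sum(e for _, e in totals)

    G = nx.karate_club_graph()
    C = {v for v in G if G.nodes[v]["club"] == "Mr. Hi"}
    print(is_strong_community(G, C), is_weak_community(G, C))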
A drawback of these definitions is that one separates the subgraph under study from the rest of the network, which is taken as a single object. But the latter can be in turn divided into communities. If a subgraph C is a proper community, it makes sense that each of its vertices is more strongly attached to the vertices of C than to the vertices of any other subgraph. This concept, proposed by Hu et al. (Hu et al., 2008), is more in line (though not entirely) with the modern idea of community that we discuss in the following section. It has generated two alternative definitions of strong and weak community. A subgraph C is a strong community if the internal degree of any vertex within C exceeds the internal degree of the vertex within any other subgraph, i.e., the number of edges joining the vertex to those of the subgraph; likewise, a community is weak if its internal degree exceeds the (total) internal degree of its vertices within every other community. A strong (weak) community à la Radicchi et al. is a strong (weak) community also in the sense of Hu et al. The opposite is not true, in general (Fig. 6). In particular, a subgraph can be a strong community in the sense of Hu et al. even though all of its vertices have an internal degree smaller than their respective external degree.

[5] Here we focus on the case of unweighted graphs; extensions of all definitions to the weighted case are immediate.

[6] The definition of weak community is the natural implementation of the naïve expectation that there must be more edges inside than outside. However, for a subgraph C to be a weak community it is not necessary that the number of internal edges $m_C$ exceeds the number of external edges $k_C^{ext}$: since the internal degree $k_C^{int} = 2 m_C$ (Section II.A), the actual condition is $2 m_C > k_C^{ext}$.

FIG. 6 Strong and weak communities. The four subgraphs
enclosed in the contours are weak communities according to
the definitions of Radicchi et al. (Radicchi et al., 2004) and
Hu et al. (Hu et al., 2008). They are also strong communities
according to Hu et al., as the internal degree of each vertex exceeds the number of edges joining the vertex with the vertices
of every other subgraph. However, three of the subgraphs are
not strong communities according to Radicchi et al., as some
vertices (indicated in blue) have external degree larger than
their internal degree (the internal and external edges of these
vertices are coloured in yellow and magenta, respectively).

The above definitions of communities use extensive variables: their values tend to be larger for bigger communities (e.g., the internal and external degrees). But there are also variables discounting community size. An example is the internal cluster density $\delta_C^{int}$ of Eq. (1). One could assume that a subgraph C with k vertices is a cluster if $\delta_C^{int}$ is larger than a threshold ξ. Setting the size of the subgraph is necessary because otherwise any clique would be among the best possible communities, including trivial two-cliques (simple edges) or triangles.

C. Modern view

As we have seen in the previous section, traditional definitions of community rely on counting edges (internal, external) in various ways. But what one should really focus on is the probability that vertices share edges with a subgraph. The existence of communities implies that vertices interact more strongly with the other members of their community than with vertices of other communities. Consequently, there is a preferential linking pattern between vertices of the same group. This is the reason why edge densities end up being higher within communities than between them. We can formulate this by saying that vertices of the same community have a higher probability to form edges with their partners than with the other vertices.
Let us suppose that we have estimated, somehow, the edge probabilities between all pairs of vertices. We can define the groups by means of those probabilities. It is a scenario similar to the classic one we have seen in Section II.B, where we add and compare probabilities instead of edges. Natural definitions of strong and weak community are:

• A strong community is a subgraph each of whose vertices has a higher probability to be linked to every vertex of the subgraph than to any other vertex of the graph.

• A weak community is a subgraph such that the average edge probability of each vertex with the other members of the group exceeds the average edge probability of the vertex with the vertices of any other group [7].
The difference between the two definitions is that, in
the concept of strong community, the inequality between
edge probabilities holds at the level of every pair of vertices, while in the concept of weak community the inequality holds only for averages over groups. Therefore,
a strong community is also a weak community, but the
opposite is not true, in general.
Now we can see why the former definitions of strong and weak community (Hu et al., 2008; Radicchi et al., 2004) are not satisfactory. Suppose we have a network with two subgraphs A and B of very different sizes, say with $n_A$ and $n_B \gg n_A$ vertices (Fig. 7). The network is generated by a model where the edge probability is

FIG. 7 Problems of the classic notions of strong and weak communities. A network is generated by the illustrated model, with two subgraphs A and B and edge probabilities $p_{in}$ between vertices of the same subgraph and $p_{out} < p_{in}$ between vertices of A and B. The red circle is a representative vertex of subgraph A, the smaller blue circle represents the rest of the vertices of A. The subgraphs are both strong and weak communities in the probabilistic sense, but they may be neither strong nor weak communities according to the classic definitions by Radicchi et al. (Radicchi et al., 2004) and Hu et al. (Hu et al., 2008), if B is sufficiently larger than A.

$p_{in}$ between vertices of the same group and $p_{out} < p_{in}$ for vertices of different groups. The two subgraphs are communities both in the strong and in the weak sense, according to the probability-based definitions above. The expected internal degree of a vertex of A is $k_A^{int} = p_{in} n_A$, since there are $n_A$ possible internal neighbours [8]. Likewise, the expected external degree of a vertex of A is $k_A^{ext} = p_{out} n_B$. The expected internal and external degrees of A as a whole are $K_A^{int} = p_{in} n_A^2$ and $K_A^{ext} = p_{out} n_A n_B$. For any two values of $p_{in}$ and $p_{out} < p_{in}$ one can always choose $n_B$ sufficiently larger than $n_A$ that $k_A^{int} < k_A^{ext}$, which also implies that $K_A^{int} < K_A^{ext}$. In this setting the subgraphs are neither strong nor weak communities, according to the definitions proposed by Radicchi et al. and Hu et al.
How can we compute the edge probabilities between vertices? This is still an ill-defined problem, unless one has a model stating how edges are formed. One can make many hypotheses on the process of edge formation. For instance, if we take social networks, we can assume that the probability that two individuals know each other is, on average, a decreasing function of their geographical distance (Liben-Nowell et al., 2005). Each set of assumptions defines a model. For our purposes, eligible models should take into account the possible presence of groups of vertices that behave similarly.

[7] Since we are comparing average probabilities, which come with a standard error, the definition of weak community should not rely on the simple numeric inequality between the averages, but on the statistical significance of their difference. Significance will be discussed in Section IV.I.

[8] The number of possible community neighbours should actually be $n_A - 1$, but for simplicity one allows for the formation of self-edges, from a vertex to itself. Results obtained with and without self-edges are basically indistinguishable when community sizes are much larger than one. We shall stick to this setup throughout the paper.

FIG. 8 Stochastic block model. We show the schematic adjacency matrices of network realisations produced by the model for special choices of the edge probabilities, along with one representative realisation for each case: (a) community structure, (b) disassortative structure, (c) core-periphery structure, (d) random graph. For simplicity we show the case of two blocks of equal size. Darker blocks indicate higher edge probabilities and consequently a larger density of edges inside the block. Figure 8a illustrates community (or assortative) structure: the probabilities (link densities) are much higher inside the diagonal blocks than elsewhere. Figure 8b shows the opposite situation (disassortative structure). Figure 8c illustrates a core-periphery structure. Figure 8d shows a random graph à la Erdős and Rényi: all edge probabilities are identical, inside and between the blocks, so there are no actual groups. Adapted figure with permission from (Jeub et al., 2015). © 2015, by the American Physical Society.
The most famous model of networks with group structure is the stochastic block model (SBM) (Fienberg and Wasserman, 1981; Holland et al., 1983; Snijders and Nowicki, 1997). Suppose we have a network with $n$ vertices, divided in $q$ groups. The group of vertex $i$ is indicated with the integer label $g_i = 1, 2, \ldots, q$. The idea of the model is very simple: the probability $P(i \leftrightarrow j)$ that vertices $i$ and $j$ are connected depends exclusively on their group memberships: $P(i \leftrightarrow j) = p_{g_i g_j}$. Therefore, it is identical for any $i$ and $j$ in the same groups. The probabilities $p_{g_i g_j}$ form a $q \times q$ symmetric matrix [9], called the stochastic block matrix. The diagonal elements $p_{kk}$ ($k = 1, 2, \ldots, q$) of the stochastic block matrix are the probabilities that vertices of block $k$ are neighbours, whereas the off-diagonal elements give the edge probabilities between different blocks [10].

For $p_{kk} > p_{lm}$, $\forall k, l, m = 1, 2, \ldots, q$, with $l \neq m$, we recover community structure, as the probabilities that vertices of the same group are connected exceed the probabilities that vertices of different groups are joined (Fig. 8a). It is also called assortative structure, as it privileges bonds between vertices of the same group. The model is very versatile, though, and can generate various types of group structure. For $p_{kk} < p_{lm}$, $\forall k, l, m = 1, 2, \ldots, q$, with $l \neq m$, we have disassortative structure, as edges are more likely between the blocks than inside them (Fig. 8b). In the special case in which $p_{kk} = 0$, $\forall k = 1, 2, \ldots, q$, we recover multipartite structure, as there are edges only between the blocks.

[9] For directed graphs, the matrix is in general asymmetric. The extension of the stochastic block model to directed graphs is straightforward. Here we focus on undirected graphs.

[10] In another definition of the SBM the number of edges $e_{rs}$ between blocks $r$ and $s$ is fixed ($r, s = 1, 2, \ldots, q$), instead of the edge probabilities. If $e_{rs} \gg 1$, $\forall r, s = 1, 2, \ldots, q$, the two models are fully equivalent if the edge probabilities $p_{rs}$ are chosen such that the expected number of edges running between $r$ and $s$ coincides with $e_{rs}$.

If $q = 2$ and $p_{11} \gg p_{12} \gg p_{22}$, we have core-periphery structure: the vertices of the first block (core) are relatively well-connected amongst themselves as well as to a peripheral set of vertices that interact very little amongst themselves (Fig. 8c). If all probabilities are equal, $p_{ij} = p$, $\forall i, j$, we recover the classic random graph à la Erdős and Rényi (Erdős and Rényi, 1959; Erdős and Rényi, 1960) (Fig. 8d). Here any two vertices have identical probability of being connected, hence there is no group structure. This observation has become a fundamental axiom in community detection, and has inspired some popular techniques like, e.g., modularity optimisation (Newman, 2004b; Newman and Girvan, 2004) (Section IV.F). Random graphs of this type are also useful in the validation of clustering algorithms (Section III.A).
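All four structures of Fig. 8 can be sampled directly from the stochastic block matrix; a sketch with networkx (the probability values are illustrative):

    import networkx as nx

    sizes = [50, 50]  # two blocks of equal size, as in Fig. 8

    block_matrices = {
        "community (assortative)": [[0.20, 0.02], [0.02, 0.20]],  # p_kk > p_lm
        "disassortative":          [[0.02, 0.20], [0.20, 0.02]],  # p_kk < p_lm
        "core-periphery":          [[0.30, 0.10], [0.10, 0.01]],  # p_11 >> p_12 >> p_22
        "random (Erdos-Renyi)":    [[0.10, 0.10], [0.10, 0.10]],  # all equal: no groups
    }

    for name, p in block_matrices.items():
        G = nx.stochastic_block_model(sizes, p, seed=42)
        print(name, "->", G.number_of_edges(), "edges")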
Alternative community definitions are based on the interplay between network topology and dynamics. Diffusion is the most used dynamics, and random walks are the simplest diffusion processes. A simple random walk is a path such that the vertex reached at step $t$ is a random neighbour of the vertex reached at step $t - 1$. A random walker would spend a long time within communities, due to the supposedly low number of routes leading out of them (Delvenne et al., 2010; Rosvall and Bergstrom, 2008; Rosvall et al., 2014). The evolution of random walks does not depend solely on the number or density of edges, in general, but also on the structure and distribution of paths formed by consecutive edges, as paths are the routes that walkers can follow. This means that random walk dynamics relies on higher-order structures than simple edges, in general. This relationship is even more pronounced when one considers Markov dynamics of second order or higher, in which the probability of reaching a vertex at step $t + 1$ of the walk does not depend only on where the walker sits at step $t$, but also on where it was at step $t - 1$ and possibly earlier (Persson et al., 2016; Rosvall et al., 2014). Indeed, one could formulate the network clustering problem by focusing on higher-order structures, like motifs (e.g., triangles) (Arenas et al., 2008a; Benson et al., 2016; Serrour et al., 2011). The advantage is that one can preserve more complex features of the network and its communities, which typically get lost when one uses network models solely based on edge probabilities, like SBMs [11]. The drawback is that calculations become more involved and lengthy.
Is a definition of community really necessary? Actually not: most techniques to detect communities in networks do not require a precise definition of community. The problem can be attacked from many angles. For instance, one can remove the edges separating the clusters from each other, identified via some particular feature (Girvan and Newman, 2002; Radicchi et al., 2004). But defining clusters beforehand is a useful starting point that allows one to check the reliability of the final results.

III. VALIDATION

In this section we discuss the crucial issue of validation of clustering algorithms. Validation usually means checking how precisely algorithms can recover the communities in benchmark networks, whose community structure is known. Benchmarks can be computer-generated, according to some model, or actual networks, whose group structure is supposed to be known via non-topological features (metadata). The lack of a universal definition of communities makes the search for benchmarks rather arbitrary, in principle. Nevertheless, the best known artificial benchmarks are based on the modern definition of clusters presented in Section II.C.

We shall present some popular artificial benchmarks and show that partition similarity measures have to be handled with care. We will see under which conditions communities are detectable, and expose the interplay between topological information and metadata. We will conclude by presenting some recent results on signatures of community structure extracted from real networks.

A. Artificial benchmarks

The principle underlying stochastic block models (Section II.C) has inspired many popular benchmark graphs with group structure. Community structure is recovered in the case in which the probability for two vertices to be joined is larger for vertices of the same group than for vertices of different groups (Fig. 8a). For simplicity, let us suppose that there are only two values of the edge probability, $p_{in}$ and $p_{out} < p_{in}$, for edges within and between communities, respectively. Furthermore, we assume that all communities have identical size $n_c$, so $q n_c = n$, where $q$ is the number of communities. In this version, the model coincides with the planted l-partition model, introduced in the context of graph partitioning [12] (Bui et al., 1987; Condon and Karp, 2001; Dyer and Frieze, 1989). The expected internal and external degrees of a vertex are $\langle k_{in} \rangle = p_{in} n_c$ and $\langle k_{out} \rangle = p_{out} n_c (q - 1)$, respectively, yielding an expected (total) vertex degree $\langle k \rangle = \langle k_{in} \rangle + \langle k_{out} \rangle = p_{in} n_c + p_{out} n_c (q - 1)$.

[11] Since edges are usually placed independently of each other in SBMs, higher-order structures like triangles are usually underrepresented in the model graphs with respect to the actual graph under study.

[12] Graph partitioning means dividing a graph in subgraphs, such as to minimise the number of edges joining the subgraphs to each other. It is related to community detection, as it aims at finding the minimum separation between the parts. However, it usually does not consider how cohesive the parts are (number or density of internal edges), except when special measures are used, like conductance [Eq. (3)].

FIG. 9 Benchmark of Girvan and Newman. The three networks correspond to realisations of the model for $\langle k_{out} \rangle = 1$ (a), $\langle k_{out} \rangle = 5$ (b) and $\langle k_{out} \rangle = 8$ (c). In (c) the four groups are hardly distinguishable by eye and methods fail to assign many vertices to their groups. Reprinted figure with permission from (Guimerà and Amaral, 2005). © 2005, by the Nature Publishing Group.
Girvan and Newman (Girvan and Newman, 2002) set $q = 4$, $n_c = 32$ (for a total number of vertices $n = 128$) and fixed the average total degree $\langle k \rangle$ to 16. This implies that $p_{in} + 3 p_{out} = 1/2$, so $p_{in}$ and $p_{out}$ are not independent parameters. The benchmark by Girvan and Newman is still the most popular in the literature (Fig. 9). Performance plots of clustering algorithms typically have, on the horizontal axis, the expected external degree $\langle k_{out} \rangle$. For low values of $\langle k_{out} \rangle$ communities are well separated [13] and most algorithms do a good job at detecting them. By increasing $\langle k_{out} \rangle$, the performance declines. Still, one expects to do better than by assigning memberships to the vertices at random as long as $p_{in} > p_{out}$, which means for $\langle k_{out} \rangle < 12$. In Section III.C we will see that the actual threshold is lower, due to random fluctuations.
The benchmark by Girvan and Newman, however, is not a good proxy of real networks with community structure. For one thing, all vertices have equal degree, whereas the degree distribution of real networks is usually highly heterogeneous (Albert et al., 1999). In addition, most clustering techniques find skewed distributions of community sizes (Clauset et al., 2004; Danon et al., 2007; Lancichinetti et al., 2010; Newman, 2004a; Palla et al., 2005; Radicchi et al., 2004). For this reason, Lancichinetti, Fortunato and Radicchi proposed the LFR benchmark, having power-law distributions of degree and community size (Lancichinetti et al., 2008) (Fig. 10).

FIG. 10 LFR benchmark. Vertex degree and community size are power-law distributed, to account for the heterogeneity observed in real networks with community structure.

The mixing parameters $\mu_i$ of the vertices (Section II.A) are set equal to a constant $\mu$, which estimates the quality of the partition [14].

[13] They are also more cohesive internally, since $\langle k_{in} \rangle$ is higher, to keep the total degree constant.

[14] The parameter $\mu$ is actually only the average of the mixing parameter over all vertices. In fact, since degrees are integers, it is impossible to tune them such as to have exactly the same value of $\mu$ for each vertex and keep the constraint on the degree distribution at the same time.
11
LFR benchmark networks are built by joining stubs at random, once one has established which stubs are internal and which ones are external with respect to the community of the vertex attached to the stubs. In this respect, the model is basically a configuration model (Bollobás, 1980; Molloy and Reed, 1995) with built-in communities.

Clearly, when $\mu$ is low, clusters are better separated from each other, and easier to detect. When $\mu$ grows, performance starts to decline. But for which range of $\mu$ can we expect a performance better than random guessing? Let us suppose that the group structure is detectable for $\mu \in [0, \mu_c]$. The upper limit $\mu_c$ should be such that the network is random for $\mu = \mu_c$. The network is random when stubs are combined at random, without distinguishing between internal and external stubs, which yields the standard configuration model. There the expected number of edges between two vertices with degrees $k_i$ and $k_j$ is $k_i k_j/(2m)$, $m$ being the total number of network edges. Let us focus on a generic vertex $i$, belonging to community C. We denote with $K_C$ and $\tilde{K}_C$ the sums of the degrees of the vertices inside and outside C, respectively. Clearly $K_C + \tilde{K}_C = 2m$. In a random graph built with the configuration model, vertex $i$ would have an expected internal degree [15] $k_i^{int-rand} = k_i (K_C - k_i)/(2m) \approx k_i K_C/(2m)$ and an expected external degree $k_i^{ext-rand} = k_i \tilde{K}_C/(2m)$. Since, by construction, $k_i^{int} = (1 - \mu) k_i$ and $k_i^{ext} = \mu k_i$, the community C is not real when $k_i^{ext} = k_i^{ext-rand}$ and $k_i^{int} = k_i^{int-rand}$, which implies $\mu = \mu_C = \tilde{K}_C/(2m) = 1 - K_C/(2m)$. We see that $K_C$ depends on the community C: the larger the community, the lower the threshold. Therefore, not all clusters are detectable at the same time, in general. For this to happen, $\mu$ must be lower than the minimum of $\mu_C$ over all communities: $\mu \leq \mu_c = \min_C \mu_C$. If communities are all much smaller than the network as a whole, $K_C/(2m) \approx 0$ and $\mu_c$ could get very close to the upper limit 1 of the variable $\mu$. However, it is possible that the actual threshold is lower than $\mu_c$, due to the perturbations of the group structure induced by random fluctuations (Section III.C). Anyway, in most cases the threshold is going to be quite a bit higher than 1/2, the value which is mistakenly considered as the threshold by some scholars.
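The community-wise thresholds $\mu_C = 1 - K_C/(2m)$ and the global threshold $\mu_c = \min_C \mu_C$ follow directly from the degrees; a sketch (names are ours):

    import networkx as nx

    def mixing_thresholds(G, partition):
        """mu_C = 1 - K_C / (2m) for each community C; detectability (in this
        argument) requires mu below mu_c = min_C mu_C."""
        two_m = 2 * G.number_of_edges()
        mu = [1 - sum(G.degree(i) for i in C) / two_m for C in partition]
        return mu, min(mu)

    G = nx.planted_partition_graph(4, 32, 0.3, 0.05, seed=0)
    partition = [range(32 * b, 32 * (b + 1)) for b in range(4)]
    mu, mu_c = mixing_thresholds(G, partition)
    print("mu_c =", round(mu_c, 3))  # close to 3/4 for four equal communities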
The LFR benchmark turns out to be a special version of the recently introduced degree-corrected stochastic block model (Karrer and Newman, 2011), with the degree and the block size distributed according to truncated power laws [16].

[15] The approximation is justified when the community is large enough that $K_C \gg k_i$.

[16] In fact, the correspondence is exact for a slightly different parametrisation of the benchmark, introduced in (Peixoto, 2014). In this version of the model, instead of the mixing parameter $\mu$, which is local, a global parameter $c$ is used, estimating how strong the community structure is.

The LFR benchmark has been extended to directed

and weighted networks with overlapping communities (Lancichinetti and Fortunato, 2009). The extensions to directed and weighted graphs are rather straightforward. Overlaps are obtained by assigning each vertex to a number of clusters and distributing its internal edges equally among them [17]. Recently, another benchmark with overlapping communities has been introduced by Ball, Karrer and Newman (Ball et al., 2011). It consists of two clusters A and B, with overlap C. Vertices in the non-overlapping subsets A − C and B − C set edges only between each other, while vertices in C are connected to vertices of both A and B. The expected degree of all vertices is set equal to $\langle k \rangle$. The authors considered various settings, by tuning $\langle k \rangle$, the size of the overlap and the sizes of A and B, which may be uneven. However, the fact that all vertices have equal degree (on average) makes the model less realistic and flexible than the LFR benchmark.
Following the increasing availability of evolving time-stamped network data sets, the analysis and modelling of temporal networks have received a lot of attention lately (Holme and Saramäki, 2012). In particular, scholars have started to deal with the problem of detecting evolving communities (Section IV.H). A benchmark designed to model dynamic communities was proposed by Granell et al. (Granell et al., 2015). It is based on the planted l-partition model, just like the benchmark of Girvan and Newman, where $p_{in}$ and $p_{out} < p_{in}$ are the edge probabilities within communities and between communities, respectively. Communities may grow and shrink (Fig. 11a), they may merge with each other or split into smaller clusters (Fig. 11b), or do all of the above (Fig. 11c). The dynamics unfold such that at each time the subgraphs are proper communities in the probabilistic sense discussed in Section II.C. In the merge-split dynamics, clusters actually merge before the inter-community edge probability $p_{out}$ reaches the value $p_{in}$ of the intra-community edge probability, due to random fluctuations (Section III.C).

In Section II.C we have shown why random graphs cannot have a meaningful group structure [18]. That means that they can be employed as null benchmarks, to test whether algorithms are capable of recognising the absence of groups. Many methods find non-trivial communities in such random networks, so they fail the test. We strongly encourage performing this type of exam on new algorithms (Lancichinetti and Fortunato, 2009).
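As a concrete form of this test, one can feed an Erdős-Rényi graph to a modularity-based method and check what it reports; a sketch with networkx (parameter values are illustrative):

    import networkx as nx
    from networkx.algorithms import community

    # Null benchmark: an Erdos-Renyi random graph has no group structure.
    G = nx.gnp_random_graph(1000, 0.01, seed=0)

    parts = community.greedy_modularity_communities(G)
    Q = community.modularity(G, parts)
    print(len(parts), "communities found, modularity Q =", round(Q, 2))
    # Finding many communities with sizeable Q here is a warning sign: the
    # partition is an artefact of random fluctuations (see Section IV.I).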

[17] A better way to do it would be taking into account the sizes of the communities the vertex is assigned to, and dividing the edges proportionally to the (total) degrees of the communities.

[18] Here we refer to random graphs where the edge probabilities do not depend on membership in groups. Examples are Erdős-Rényi random graphs, the configuration model, etc.

FIG. 11 Dynamic benchmark. (a) Grow-Shrink benchmark: starting from two communities of equal size, vertices move from one cluster to the other and back. (b) Merge-Split benchmark: it starts with two communities, edges are added until there is one community with uniform link density (merge), then the process is reversed, leading to a fragmentation into two equal-sized clusters. (c) Mixed benchmark: there are four communities, the upper pair undergoes the grow-shrink dynamics of (a), the lower pair the merge-split dynamics of (b). All processes are periodic with period $\tau$; snapshots are shown at several times between $t = 0$ and $t = \tau$. Reprinted figure with permission from (Granell et al., 2015). © 2015, by the American Physical Society.

B. Partition similarity measures

The accuracy of clustering techniques depends on their ability to detect the clusters of networks whose community structure is known. That means that the partition detected by the method(s) has to match closely the planted partition of the network. How can the similarity of partitions be computed? This is an important problem, with no unique solution. In this section we discuss some issues about partition similarity measures. More information can be found in (Meilă, 2007), (Fortunato, 2010) and (Traud et al., 2011).

Let us consider two partitions $X = (X_1, X_2, \ldots, X_{q_X})$ and $Y = (Y_1, Y_2, \ldots, Y_{q_Y})$ of a network G, with $q_X$ and $q_Y$ clusters, respectively. Let $n$ be the total number of vertices, $n_i^X$ and $n_j^Y$ the numbers of vertices in clusters $X_i$ and $Y_j$, and $n_{ij}$ the number of vertices shared by clusters $X_i$ and $Y_j$: $n_{ij} = |X_i \cap Y_j|$. The $q_X \times q_Y$ matrix $N_{XY}$ whose entries are the overlaps $n_{ij}$ is called confusion matrix, association matrix or contingency table.

Most similarity measures can be divided in three categories: measures based on pair counting, cluster matching and information theory.

Pair counting means computing the number of pairs of vertices which are classified in the same (different) clusters in the two partitions. Let $a_{11}$ indicate the number of pairs of vertices which are in the same community in both partitions, $a_{01}$ ($a_{10}$) the number of pairs of elements which are in the same community in X (Y) and in different communities in Y (X), and $a_{00}$ the number of pairs of vertices that are in different communities in both partitions. Several measures can be defined by combining the above numbers in various ways. A famous example is the Rand index (Rand, 1971)

    $R(X, Y) = \frac{a_{11} + a_{00}}{a_{11} + a_{01} + a_{10} + a_{00}}$,   (4)

which is the ratio of the number of vertex pairs correctly classified in both partitions (i.e., either in the same or in different clusters), by the total number of pairs. Another notable option is the Jaccard index (Ben-Hur et al., 2001),

    $J(X, Y) = \frac{a_{11}}{a_{11} + a_{01} + a_{10}}$,   (5)

which is the ratio of the number of vertex pairs classified in the same cluster in both partitions, by the number of vertex pairs classified in the same cluster in at least one partition. The Jaccard index varies over a broader range than the Rand index, due to the dominance of $a_{00}$ in $R(X, Y)$, which typically confines the Rand index to a small interval slightly below 1. Both measures lie between 0 and 1.
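A direct pair-counting implementation of Eqs. (4) and (5), with partitions given as label vectors (a minimal sketch):

    from itertools import combinations

    def pair_counts(x, y):
        """a11, a01, a10, a00 over all vertex pairs; x and y are cluster labels."""
        a11 = a01 = a10 = a00 = 0
        for i, j in combinations(range(len(x)), 2):
            same_x, same_y = x[i] == x[j], y[i] == y[j]
            if same_x and same_y:
                a11 += 1
            elif same_x:
                a01 += 1
            elif same_y:
                a10 += 1
            else:
                a00 += 1
        return a11, a01, a10, a00

    def rand_index(x, y):     # Eq. (4)
        a11, a01, a10, a00 = pair_counts(x, y)
        return (a11 + a00) / (a11 + a01 + a10 + a00)

    def jaccard_index(x, y):  # Eq. (5)
        a11, a01, a10, _ = pair_counts(x, y)
        return a11 / (a11 + a01 + a10)

    x = [0, 0, 0, 1, 1, 1]
    y = [0, 0, 1, 1, 1, 1]
    print(rand_index(x, y), jaccard_index(x, y))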
If we denote with $X_C$ and $Y_C$ the sets of vertex pairs which are members of the same community in partitions X and Y, respectively, the Jaccard index is just the ratio between the intersection and the union of $X_C$ and $Y_C$. The same concept can be used to determine the similarity between two clusters A and B:

    $J_{AB} = \frac{|A \cap B|}{|A \cup B|}$.   (6)

The score $J_{AB}$ is also called Jaccard index and is the most general definition of the score, for any two sets A and B (Jaccard, 1901). Measuring the similarity between communities is very important to determine, given different partitions, which cluster of a partition corresponds to which cluster(s) of the other(s). For instance, the cluster $Y_j$ of Y corresponding to cluster $X_i$ of X is the one maximising the similarity between $X_i$ and $Y_j$, e.g., $J_{X_i Y_j}$. This strategy is also used to track down the evolution of communities in temporal networks (Lancichinetti and Fortunato, 2012; Palla et al., 2007).

The Rand and the Jaccard indices, as defined in Eqs. (4) and (5), have the disturbing feature that they do not take values in the entire range [0, 1].
FIG. 12 Partition similarity measures based on cluster matching. There are three partitions in three clusters: C, C′ and C′′. The clusters include all elements of columns 1-2, 3-4 and 5-6, which for C are labelled in black, grey and white, respectively. Partition C′ is obtained from C by reassigning the same fraction of elements from one cluster to the next, while C′′ is derived from C by reassigning the same fraction of elements from each cluster equally between the other clusters. From cluster matching scores one concludes that C′ and C′′ are equally similar to C, while intuition suggests that C′ is closer to C than C′′. Adapted figure with permission from (Meilă, 2007). © 2007, by Elsevier.

For this reason, adjusted versions of both indices exist, in which a baseline is introduced, yielding the expected value of the score for all pairs of partitions X̃ and Ỹ obtained by randomly assigning vertices to clusters such that X̃ and Ỹ have the same number of clusters and the same cluster sizes as X and Y, respectively (Hubert and Arabie, 1985). The baseline is subtracted from the unadjusted version, and the result is divided by the range of this difference, yielding 1 for identical partitions and 0 as the expected value for independent partitions. But there are problems with these definitions as well. The null model used to compute the baseline relies on the assumption that the communities of the independent partitions have the same number of vertices as in the partitions whose similarity is to be compared. But such an assumption usually does not hold in practical instances, as algorithms sometimes need the number of communities as input, but they never impose any constraint on the cluster sizes. Adjusted indices have also the disadvantage of non-locality (Meilă, 2005): the similarity between partitions differing only in one region of the network depends on how the remainder of the network is subdivided. Moreover, the adjusted scores can take negative values, when the unadjusted similarity lies below the baseline.
A better option is to use standardised indices (Brennan and Light, 1974): for a given score $S_i$ the value of the null model term $\mu_i$ is computed along with its standard deviation $\sigma_i$ over many different randomisations of the partitions X and Y. By computing the z-score

    $z_i = \frac{S_i - \mu_i}{\sigma_i}$,   (7)

we can see how non-random the measured similarity score is, and assess its significance. It can be shown that the z-scores for the Jaccard, Rand and Adjusted Rand indices coincide (Traud et al., 2011), so the measures are statistically equivalent. Since the actual values $S_i$ of these indices differ for the same pair of partitions, in general, we conclude that the magnitudes of the scores may give a wrong perception of the effective similarity.
Cluster matching aims at establishing a correspondence between pairs of clusters of different partitions based on the size of their overlap. A popular measure is the fraction of correctly detected vertices, introduced by Girvan and Newman (Girvan and Newman, 2002). A vertex is correctly classified if it is in the same cluster as at least half of the other vertices in its cluster in the planted partition. If the detected partition has clusters given by the merger of two or more groups of the planted partition, all vertices of those clusters are considered incorrectly classified. The number of correctly classified vertices is then divided by the number $n$ of vertices of the graph, yielding a number between 0 and 1. The recipe to label vertices as correctly or incorrectly classified is somewhat arbitrary. The fraction of correctly detected vertices is similar to

    $H(X, Y) = \frac{1}{n} \sum_{k' = match(k)} n_{kk'}$,   (8)

where $k'$ is the index of the best match $Y_{k'}$ of cluster $X_k$ (Meilă and Heckerman, 2001). A common problem of this type of measures is that partitions whose clusters have the same overlap would have the same similarity, regardless of what happens to the parts of the communities which are unmatched. The situation is illustrated schematically in Fig. 12. Partitions C′ and C′′ are obtained from C by reassigning the same fraction of their elements to the other clusters. Their overlaps with C are identical, and so are the corresponding similarity scores. However, in partition C′′ the unmatched parts of the clusters are more scrambled than in C′, which should be reflected in a lower similarity score.
Similarity can also be estimated by computing, given a partition, the additional amount of information that one needs in order to infer the other partition. If partitions are similar, little information is needed to go from one to the other. Such extra information can be used as a measure of dissimilarity. To evaluate the Shannon information content (Mackay, 2003) of a partition, we start from the community assignments $\{x_i\}$ and $\{y_i\}$, where $x_i$ and $y_i$ indicate the cluster labels of vertex $i$ in partitions X and Y, respectively. The labels $x$ and $y$ are the values of two random variables $X$ and $Y$, with joint distribution $P(x, y) = P(X = x, Y = y) = n_{xy}/n$, so that $P(x) = P(X = x) = n_x^X/n$ and $P(y) = P(Y = y) = n_y^Y/n$. The mutual information $I(X, Y)$ of two random variables is $I(X, Y) = H(X) - H(X|Y)$, where $H(X) = -\sum_x P(x) \log P(x)$ is the Shannon entropy of $X$ and $H(X|Y) = -\sum_{x,y} P(x, y) \log P(x|y)$ is the conditional entropy of $X$ given $Y$. The mutual information is not ideal as a similarity measure: for a given partition X, all partitions derived from X by splitting (some of) its clusters would have the same mutual information with X, even though they could be very different from each other. In this case the mutual information equals the entropy $H(X)$, because the conditional entropy is zero. It is then necessary to introduce an explicit dependence on the other partition, that persists even in those special cases. This has been achieved by introducing the normalised mutual information (NMI), obtained by dividing the mutual information by the arithmetic average [19] of the entropies of X and Y (Fred and Jain, 2003):

    $I_{norm}(X, Y) = \frac{2 I(X, Y)}{H(X) + H(Y)}$.   (9)

The NMI equals 1 if and only if the partitions are identical, whereas it has an expected value of 0 if they are independent. Since the first thorough comparative analysis of clustering algorithms (Danon et al., 2005), the NMI has been regularly used to compute the similarity of partitions in the literature. However, the measure is sensitive to the number of clusters $q_Y$ of the detected partition, and may attain larger values the larger $q_Y$, even though more refined partitions are not necessarily closer to the planted one. This may give a wrong perception of the relative performance of algorithms (Zhang, 2015).
A more promising measure, proposed by Meilă (Meilă,
2007) is the variation of information (VI)
V (X , Y) = H(X|Y ) + H(Y |X).

(10)

The VI defines a metric in the space of partitions as it
has the properties of distance (non-negativity, symmetry

19

Strehl and Ghosh introduced an earlier definition of NMI, where
the mutual information is divided by the geometric average of
the entropies (Strehl and Ghosh, 2002). Alternatively, one could
normalise by the larger of the entropies H(X) and H(Y ) (Esquivel and Rosvall, 2012; McDaid et al., 2011).

and triangle inequality). It is a local measure: the VI
of partitions differing only in a small portion of a graph
depends on the differences of the clusters in that region,
and not on how the rest of the graph is subdivided. The
maximum value of the VI is log n, which implies that the
scores of an algorithm on graphs of different sizes cannot
be compared with each other, in principle. One could divide V (X , Y) by log n (Karrer et al., 2008), to force the
score to be in the range [0, 1], but the actual span of values of the measure depends on the number of clusters of
the partitions. In fact, if the
√ maximum number of communities is q ? , with q ? ≤ n, V (X , Y) ≤ 2 log q ? . Consequently, in those cases where it is reasonable to set an
upper bound on the number of clusters of the partitions,
the similarities between planted and detected partitions
on different graphs become comparable, and it is possible
to assess both the performance of an algorithm and to
compare algorithms across different benchmark graphs.
We stress, however, that the measure may not be suitable
when the partitions to be compared are very dissimilar
from each other (Traud et al., 2011) and that it shows unintuitive behaviour in particular instances (Delling et al.,
2006).
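A corresponding sketch of Eq. (10), using the identity H(X|Y) + H(Y|X) = 2H(X, Y) − H(X) − H(Y); again, the names are ours:

    from collections import Counter
    from math import log

    def entropy(labels):
        n = len(labels)
        return -sum(c / n * log(c / n) for c in Counter(labels).values())

    def variation_of_information(x, y):
        # V(X,Y) = H(X|Y) + H(Y|X) = 2 H(X,Y) - H(X) - H(Y)
        return 2 * entropy(list(zip(x, y))) - entropy(x) - entropy(y)

    # identical partitions are at distance zero
    print(variation_of_information([0, 0, 1], [5, 5, 7]))  # 0.0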
So far we have discussed the comparison of partitions. What
about covers? Extending the measures we have presented to the case of overlapping communities is usually not straightforward. The Omega index (Collins and
Dent, 1988) is an extension of the Adjusted Rand index (Hubert and Arabie, 1985). Let X and Y be covers of
the same graph to be compared. We denote by a_jj the
number of pairs of vertices occurring together in exactly
j communities in both covers. It is a natural generalisation of the variables a_00 and a_11 we have seen above,
where j can also be larger than 1 since a pair of vertices
can now belong simultaneously to multiple communities.
The variable
\[
o(\mathcal{X}, \mathcal{Y}) = \frac{2}{n(n-1)} \sum_j a_{jj} \qquad (11)
\]
is the fraction of pairs of vertices belonging to the same
number of communities in both covers (including the case
j = 0, which refers to the pairs not being in any community together). The Omega index is defined as
\[
\Omega(\mathcal{X}, \mathcal{Y}) = \frac{o(\mathcal{X}, \mathcal{Y}) - o_e(\mathcal{X}, \mathcal{Y})}{1 - o_e(\mathcal{X}, \mathcal{Y})}, \qquad (12)
\]
where o_e(X, Y) is the expected value of o(X, Y) according
to the null model discussed earlier, in which vertex labels
are randomly reshuffled so as to generate covers with the
same number and sizes of communities.
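A possible implementation of Eqs. (11) and (12) is sketched below, with covers represented as dicts mapping each vertex to the set of its communities (our own choice of representation); the expected agreement o_e is computed analytically from the marginal pair counts, which is equivalent to the label-reshuffling null model:

    from collections import Counter
    from itertools import combinations

    def omega_index(cov_x, cov_y):
        pairs = list(combinations(sorted(cov_x), 2))  # same vertex set assumed
        n_pairs = len(pairs)
        # number of communities shared by each vertex pair, in either cover
        jx = {p: len(cov_x[p[0]] & cov_x[p[1]]) for p in pairs}
        jy = {p: len(cov_y[p[0]] & cov_y[p[1]]) for p in pairs}
        o = sum(jx[p] == jy[p] for p in pairs) / n_pairs       # Eq. (11)
        tx, ty = Counter(jx.values()), Counter(jy.values())
        o_e = sum(tx[j] * ty[j] for j in tx) / n_pairs ** 2    # expectation
        return (o - o_e) / (1 - o_e)                           # Eq. (12)

    cov = {1: {'a'}, 2: {'a'}, 3: {'a', 'b'}, 4: {'b'}}
    print(omega_index(cov, cov))  # 1.0 for identical covers

Note that the index is undefined in the degenerate case o_e = 1.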
The NMI has also been extended to covers by Lancichinetti, Fortunato and Kertész (Lancichinetti et al.,
2009). The definition is non-trivial: the community assignments of a cover are expressed by a vectorial random
variable, as each vertex may belong to multiple clusters
at the same time. The measure overestimates the similarity of two covers in special situations where intuition suggests much lower values. The problem can be
solved by using an alternative normalisation, as shown
in (McDaid et al., 2011). Unfortunately neither the definition by Lancichinetti, Fortunato and Kertész nor the
one by McDaid, Greene and Hurley is a proper extension
of the NMI, as they do not coincide with the classic definition of Eq. (9) when partitions into non-overlapping clusters are compared. However, the differences are typically
small, and one can rely on them in practice. Esquivel
and Rosvall have proposed an actual extension (Esquivel
and Rosvall, 2012). Following the comparative analysis
performed in (Lancichinetti and Fortunato, 2009), the
NMI by Lancichinetti, Fortunato and Kertész has been
regularly used in the literature, also in the case of ordinary
partitions, without overlapping communities20.

20 Recently the NMI has been extended to handle the comparison
of hierarchical partitions as well (Perotti et al., 2015).
If covers are fuzzy (Section II.B), the similarity measures above cannot be used, as they do not take into
account the degree of membership of vertices in the communities they belong to. A suitable option is the Fuzzy
Rand index (Hüllermeier and Rifqi, 2009), which is an
extension of the Adjusted Rand index. Both the Fuzzy
Rand index and the Omega index coincide with the Adjusted Rand index when communities do not overlap.
For temporal networks, a naïve approach would be
comparing partitions (or covers) corresponding to configurations of the system in the same time window, and
to see how this score varies across different time windows.
However, this does not tell us whether the clusters are evolving in
the same way, as there would be no connection between
clusterings at different times. A sensible approach is comparing sequences of clusterings, by building a confusion
matrix that takes into account multiple snapshots. This
strategy allows one to define dynamic versions of various
indices, like the NMI and the VI (Granell et al., 2015).
In conclusion, while there is no clear-cut criterion to establish which similarity measure is best, we recommend
using measures based on information theory. In particular, the VI seems to have more potential than the others, for the reasons we explained, modulo the caveats in
Refs. (Delling et al., 2006; Traud et al., 2011). There are
currently no extensions of the VI to handle the comparison of covers, but it would not be difficult to engineer
one, e. g., by following a similar procedure as in (Lancichinetti et al., 2009; McDaid et al., 2011), though this
might come at the cost of some of its nice features.
One should keep in mind that the choice of one similarity index or another is a sensitive one, and distorted
conclusions may be drawn when different measures are
adopted. In Fig. 13 we show the accuracy of two algorithms on the LFR benchmark (Section III.A): Ganxis, a
method based on label propagation (Xie and Szymanski,
2012), and LinkCommunities, a method based on grouping edges instead of vertices (Ahn et al., 2010) (Section IV.D). The accuracy is estimated with the NMI
by Lancichinetti, Fortunato and Kertész (Lancichinetti
et al., 2009) (left diagram) and with the Omega index
[Eq. (12)] (right diagram). From the left plot one would
think that Ganxis clearly outperforms LinkCommunities,
whereas from the right plot Ganxis still prevails for µ up
to about 0.5 (though the curves are closer to each other
than in the NMI plot) and LinkCommunities is better for
larger values of µ.


FIG. 13 Importance of choice of partition similarity measures.
The plots show the comparison between the planted partition
of LFR benchmark graphs and the ones found by two algorithms: Ganxis and LinkCommunities. In the left diagram
similarity was computed with the NMI, in the right one with
the Omega index. The performances of the algorithms appear
much closer when the Omega index is used. The LFR benchmark graphs used in the analysis have 1 000 vertices, average
degree 15, maximum degree 50, exponents 2 and 1 for the degree and community size distributions, respectively, and
community sizes in the range [10, 50].

C. Detectability

In validation procedures one assumes that, if the network has clusters, there must be a way to identify them.
Therefore, if we fail to find them, we blame the
specific clustering method(s) adopted. But are we certain that clusters are always detectable?
Most networks of interest are sparse, i. e., their average
degree is much smaller than the number of vertices. This
means that the number of edges of the graph is much
smaller than the number of possible edges n(n − 1)/2.
A more precise way to formulate this is by saying that
a graph is sparse when, in the limit of infinite size, the
average degree of the graph remains finite. A number of
analytical calculations can be carried out by using network sparsity. Many algorithms for community detection
only work on sparse graphs.
On the other hand, sparsity can also cause trouble. Due
to the very low density of edges, small amounts of noise
could perturb considerably the structure of the system.
For instance, random fluctuations in sparse graphs could
trick algorithms into finding groups that do not really
exist (Section IV.I). Likewise, they could make actual
groups undetectable. Let us consider the simplest version
of the assortative stochastic block model, which matches
the planted partition model (Section III.A). There are q
communities of the same size n/q, and only two values for

the edge probability: p_in for pairs of vertices in the same
group and p_out for pairs of vertices in different groups.
Since the graphs are sparse, p_in and p_out vanish in the
limit of infinite graph size. So we shall use the expected
internal and external degrees ⟨k_in⟩ = n p_in/q and ⟨k_out⟩ =
n p_out (q − 1)/q, which stay constant in that limit. By
construction, the groups are communities so long as p_in >
p_out or, equivalently, for ⟨k_in⟩ > ⟨k_out⟩/(q − 1). But that
does not mean that they are always detectable.
In principle, dealing with the issue of detectability involves examining all conceivable clustering techniques,
which is clearly impossible. Fortunately, it is not necessary, because we know what model has generated the
communities of the graphs we are considering. The most
effective technique to infer the groups is then fitting the
stochastic block model on the data (a posteriori block
modelling). This can be done via the maximum likelihood
method (Gelman et al., 2014). In recent work (Decelle
et al., 2011), Decelle et al. have shown that, in the limit
of infinite graph size, the partition obtained this way is
correlated with the planted partition whenever
\[
\langle k_{in} \rangle - \frac{\langle k_{out} \rangle}{q - 1} > \sqrt{\langle k_{in} \rangle + \langle k_{out} \rangle}, \qquad (13)
\]

which implies
\[
\langle k_{in} \rangle > \frac{\langle k_{out} \rangle}{q - 1} + \frac{1}{2}\left(1 + \sqrt{1 + \frac{4 q \langle k_{out} \rangle}{q - 1}}\right). \qquad (14)
\]
So, given a value of ⟨k_out⟩, when ⟨k_in⟩ lies between
⟨k_out⟩/(q − 1) and the right-hand side of Eq. (14), the probability
p_c of classifying a vertex correctly is not larger than the
probability 1/q of assigning the vertex to a randomly
chosen group, although the groups are communities, according to the model. We stress that this result only
holds when the graphs are sparse: if p_in and p_out remain
non-zero in the large-n limit (dense graph), the classic
detectability threshold p_in > p_out is correct.

FIG. 14 Detectability of communities. Performances of three
popular algorithms on the benchmark by Girvan and Newman
(vertical axis: fraction of correctly detected vertices; horizontal
axis: average external degree ⟨k_out⟩). The dotted vertical line
at ⟨k_out⟩ = 8 indicates the threshold corresponding to the
concept of strong community à la Radicchi et al., the dashed
line at ⟨k_out⟩ = 12 the threshold according to the probability-based
definition of strong community we have given in Section II.C.
The baseline of random assignment is 1/4 (horizontal dashed
line). None of the algorithms does better than random assignment
already before ⟨k_out⟩ = 12. The theoretical detectability limit
is at ⟨k_out⟩ = 9, in the limit of groups of infinite size.
A fortiori, no clustering technique can detect the clusters better than random assignment when the inference
of the model parameters fails to do so. If communities
are searched for via the spectral optimisation of Newman-Girvan’s modularity (Newman, 2006), one obtains the
same threshold of Eq. (14) (Nadakuditi and Newman,
2012), provided the network is not too sparse.
For the benchmark of Girvan and Newman (Section III.A) (Girvan and Newman, 2002) it has long been
unclear where the actual detectability limit sits. Girvan-Newman benchmark graphs are not infinite, their size
being set to 128, so there is no proper detectability
transition, but rather a smooth crossover from a regime
in which clusters are frequently detectable to a regime
where they are frequently undetectable. For this reason there cannot be a sharp threshold separating the two
regimes. Still it is useful to have an idea of where the
pattern changes. In the following we shall still use the

term threshold to refer to the crossover point. In the
beginning, scholars thought that clusters are detectable
as long as they satisfy the definition of strong community by Radicchi et al. (Radicchi et al., 2004) (Section II.B), i. e., as long as the expected internal degree
exceeds the expected external degree, yielding a threshold ⟨k_in⟩ = ⟨k_out⟩ (Girvan and Newman, 2002). Since
the expected total degree of a vertex is set to 16, communities are detectable as long as ⟨k_out⟩ < k_d^strong = 8.
It soon became obvious that the actual threshold should
be the one of the “modern” definition of community we
have presented in Section II.C, according to which the
condition21 is p_in > p_out, that is ⟨k_out⟩ < k_d^standard = 12.
However, numerical calculations reveal that algorithms
tend to fail long before that limit. From Eq. (14) we see
that for the case of four infinite clusters and total expected degree ⟨k_tot⟩ = 16, the theoretical detectability
limit is k_d^theor = 9. In Fig. 14 we see the performance
on the benchmark of three well-known algorithms: Louvain (Blondel et al., 2008), a greedy optimisation technique of Newman–Girvan modularity (Newman and Girvan, 2004) (Section IV.F); Infomap, which is based on
random walk dynamics (Rosvall and Bergstrom, 2008)
(Section IV.G); OSLOM, that searches for clusters via a
local optimisation of a significance score (Lancichinetti
et al., 2011). The accuracy is estimated via the fraction of correctly detected vertices (Section III.B). The
three thresholds k_d^strong, k_d^standard and k_d^theor are represented by vertical lines. The performance of all methods
becomes comparable with random assignment well before k_d^standard. The theoretical limit k_d^theor appears to be
compatible with the performance curves.

21 In the setting of the Girvan-Newman benchmark, where edge
probabilities are identical for all vertices, the strong and weak
definitions we presented in Section II.C coincide.
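The value k_d^theor = 9 quoted above can be checked directly from Eq. (14); a small sketch, with variable names of our own choosing:

    from math import sqrt

    def kin_threshold(kout, q):
        # right-hand side of Eq. (14): minimum <k_in> for detectability
        return kout / (q - 1) + 0.5 * (1 + sqrt(1 + 4 * q * kout / (q - 1)))

    q, ktot = 4, 16   # Girvan-Newman setting: four groups, <k_tot> = 16
    for kout in range(6, 13):
        kin = ktot - kout
        print(kout, kin > kin_threshold(kout, q))
    # detectable up to kout = 8; at kout = 9 equality holds and the
    # condition fails from there on, matching k_d^theor = 9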
Graph sparsity is a necessary condition for clusters to
become undetectable, but it is not sufficient. The symmetry of the model we have considered plays a major
role too. Clusters have equal size and vertices have equal
degree. This helps to “confuse” algorithms. If communities have unequal sizes and the degrees of vertices are
correlated with the sizes of their communities, so that vertices in bigger clusters have larger degrees, community detection becomes easier, as the degrees can be used
as a proxy for group membership. In this case, the non-trivial detectability limit disappears when there are four
clusters or fewer, while it persists up to a given extent
of group size inequality when there are more than four
clusters (Zhang et al., 2016). Other types of block structure, like core-periphery, do not suffer from detectability
issues (Zhang et al., 2015).
LFR benchmark graphs are more complex models than
the one studied in (Zhang et al., 2016) and it is not clear
whether there is a non-trivial detectability limit, though
it is unlikely, due to the large heterogeneity in the distributions of vertex degree and community size.

D. Structure versus metadata

Another standard way to test clustering techniques is
using real networks with known community structure.
Knowledge about the memberships of the vertices typically comes from metadata, i. e., non-structural information. If vertices are annotated, communities are assumed
to be groups of vertices with identical tags. Examples are
user groups in social networks like LiveJournal and product categories for co-purchasing networks of products of
online retailers such as Amazon.
In Fig. 15 we show the most popular of such benchmark
graphs, Zachary karate club network (Zachary, 1977). It
consists of 34 vertices, the members of a karate club in the
United States, who were observed over a period of three
years. Edges connect individuals interacting outside the
activities of the club. Eventually a conflict between the
club president (vertex 34) and the instructor (vertex 1)
led to the fission of the club into two separate groups, whose
members supported the instructor and the president, respectively (indicated by the colours). Indeed, the groups
make sense topologically: vertices 1 and 34 are hubs, and
most members are directly connected to either of them.
Most algorithms of community detection have been
tested on this network, as well as others, e. g., the American college football network (Evans, 2010; Girvan and
Newman, 2002) or Lusseau’s network of bottlenose dolphins (Lusseau, 2003). The idea is that the method doing
the best job at recovering groups with identical annotations would also be the most reliable in applications.


FIG. 15 Zachary karate club network. Symbols of different
colours indicate the two groups generated by the fission of
the network, following the disagreement between the club’s
instructor (vertex 1) and its president (34).

This idea, however, is based on a questionable premise, i. e., that the groups corresponding to the metadata are also communities in the topological sense we
have discussed in Section II. Communities exist because
their vertices are supposed to be similar to each other,
in some way. The similarity among the vertices is then
revealed topologically through the higher edge probability among pairs of members of the same group than between pairs of members of different groups, whose similarity is lower. Hence, when one is provided with annotations or other sources of information that allow one to classify vertices based on their similarity, one expects that
such similarity-based classes are also the best communities that structure-based algorithms may detect.
Indeed, for some small networks like Zachary’s karate
club this seems to be the case. But for quite some time
scholars could not test this hypothesis, due to the limited number of suitable data sets. Over the past few years
this has finally become possible, due to the availability
of several large network data sets with annotated vertices (Hric et al., 2014; Yang and Leskovec, 2012a, 2013,
2014). It turns out that the alignment between the communities found by standard clustering algorithms and the
annotated groups is not good, in general. In Fig. 16
we show the similarity between the topological partitions found by different methods and the annotated partitions, for several social, information and technological
networks (Hric et al., 2014). The heights of the vertical
bars are the values of the normalised mutual information
(NMI) (Lancichinetti et al., 2009). Groups of contiguous
bars represent the scores for a given data set. To the
left of the vertical dashed line we see the results for classic benchmarks, like LFR graphs (Section III.A) (Lancichinetti et al., 2008), Zachary karate club, etc., and the
scores are generally good. But for the large data sets
to the right of the line the scores are rather low, signalling a significant mismatch between topological and
annotated communities. For the Amazon co-purchasing network (Yang and Leskovec, 2012b), in which vertices are
products and edges are set between products often purchased by the same customer(s), the similarity is quite a


bit higher than for the other networks. This is because
the classification of Amazon products is hierarchical (e.
g., Books/Fiction/Fantasy), so there are different levels
of annotated communities, and the reported scores refer
to the level which is most similar to the structural communities
detected by the algorithms, while the other levels would
give lower similarity scores. Low similarity at the partition level does not rule out that some communities of
the structural partition significantly overlap with their
annotated counterparts, but precision and recall scores
show that this is not the case. Results depend more on
the network than on the specific method adopted, none
of which appears to be particularly good on any (large)
data set.

FIG. 16 Mismatch between structural communities and metadata groups. The vertical axis reports the similarity between
communities detected via several clustering methods (CliquePerc, Conclude, Copra, Demon, Ganxis, GreedyCliqueExp,
Infomap, InfomapSingle, LinkCommunities, Louvain, OSLOM) and groups of vertices with identical attributes for several
networks, estimated via the NMI. Scores are grouped by data sets on the horizontal axis. The vertical dashed line separates
small classic data sets from large ones, recently compiled. The low scores obtained for the latter indicate that the
correspondence between structural and annotated communities is poor. Reprinted figure with permission from (Hric et al.,
2014). © 2014, by the American Physical Society.
So the hypothesis that structural and annotated clusters are aligned is not warranted, in general. There can
be multiple reasons for that. The attributes could be
too general or too specific, yielding communities which
are too large or too small to be interesting. Moreover,
while the best partition of the network delivered by an
algorithm can be poorly correlated with the metadata,
there may be alternative topological divisions that also
belong to a set of valid solutions, according to the algorithm22 , but happen to be better correlated with the

annotations (Newman and Clauset, 2016).

22 For instance, for clustering methods based on optimisation, there
are many partitions corresponding to values of the quality function
very close to the searched optimum (Good et al., 2010). The
algorithm will return one (or some) of them, but the others are
comparably good solutions.
The fact that structural and annotated communities
may not be strongly correlated has important consequences. Scholars have been regularly testing their algorithms on small annotated graphs, like Zachary’s karate
club, by tuning parameters so as to obtain the best possible performance on them. This is not justified, in general,
as it makes sense only when there is a strong correspondence, which is a priori unknown. Also, forcing an alignment with annotations on one data set does not guarantee that there is going to be a good alignment with the
annotations of a different network. Besides, one of the
reasons why people use clustering algorithms is to provide an improved classification of the vertices, by using
structure. If this only reproduced the known annotations, why bother?
The right thing to do is to use structure along with the
annotations, instead of insisting on matching them. This
way the information coming from structure and metadata can be combined and we can obtain more accurate
partitions, if there is any correspondence between them.
Recent approaches explicitly assume that the metadata
(or a portion thereof) are either exactly or approximately
correlated with the best topological partition (Bothorel
et al., 2015; Leng et al., 2013; Moore et al., 2011; Peel,
2015; Yang et al., 2013).

FIG. 17 Data-metadata stochastic block model by Hric et
al. One layer consists of the network itself, with its vertices
and edges. The other layer is composed of the graph vertices
and vertices representing the annotations, the edges indicating which vertices are associated with which tag. The presence
of network vertices on both layers induces a coupling between
them. Reprinted figure with permission from (Hric et al.,
2016). © 2016, by the American Physical Society.
A better approach is not to assume a priori that the metadata correlate with the structural communities. The goal is to quantify the relationship between metadata and communities and to use it to improve the results. If there is no correlation, the metadata
would be ignored, leaving us with the partition derived
from structure alone.
Methods along these lines have been developed, using
stochastic block models. Newman and Clauset (Newman and Clauset, 2016) have proposed a model in which
vertices are initially assigned to clusters based on metadata, and then edges are placed between vertices according to the degree-corrected stochastic block model (Karrer and Newman, 2011). Hric et al. have designed a
similar model (Hric et al., 2016), in which the interplay between structure and metadata is represented by
a multilayer network (Fig. 17). The generative model is
an extension of the hierarchical stochastic block model
(SBM) (Peixoto, 2014) with degree-correction for the
case with edge layers (Peixoto, 2015a). Here the metadata is not supposed to correspond simply to a partition
of the vertices. The majority of data sets contain rich
metadata, with vertices being annotated multiple times,
and often few vertices possess the exact same annotations and can thus be associated with the same group. In
addition, while the number of communities is required
as input by the method of Newman and Clauset, here it
is inferred from the data. Finally, it is also possible to
assess the power of the metadata to predict the network
structure, not only their correlation with latent parti-

tions. This way it is possible to predict missing vertices
of the network, i. e., to infer the connections of a vertex
from its annotations only. We stress that neither method
requires that all vertices are annotated.
Applications of the method by Hric et al. (Hric et al.,
2016) reveal that in many data sets there are statistically significant correlations between the annotations and
the network structure, while in some cases the metadata
seems to be largely uncorrelated with structural clusters.
We conclude that network metadata should not be used
indiscriminately as ground truth for community detection methods. Even when the metadata is strongly predictive of the network structure, the agreement between
the annotations and the network division tends to be
complex, and very different from the one-to-one mapping that is more commonly assumed. Moreover, data
sets usually contain considerable noise in their annotations, and some metadata tags are essentially random,
with no relationship to structure.

E. Community structure in real networks

Artificial benchmark graphs are certainly very useful
to assess the performance of clustering algorithms. However, one could always question whether the model of
community structure they propose is reliable. How can
we assess this? In order to characterise “real” communities we have to find them first. But that can only be
done via some algorithm, and different algorithms search
for different types of objects, in general. Still, one may
hope that general properties of communities can be consistently uncovered across different methods and data
sets, while other features are more closely tied to the
specific method(s) used to detect the clusters and (or)
the specific data set at study (or classes thereof).
A seemingly robust feature of communities in real networks is the heterogeneity of their size distribution. Most
clustering techniques find skewed distributions of cluster sizes in many networks. So, there appears to be no
characteristic size for a community: small communities
usually coexist with large ones. This feature is rather
independent of the type of network (Fig. 18). It may signal a hierarchy among communities, with small clusters
included in large ones. Methods unable to distinguish
between hierarchical levels might find “blended” partitions, consisting of communities of different levels and
hence of very different sizes. The LFR benchmark was
the first graph model to take explicitly into account the
heterogeneity of community sizes (Section III.A).
Another interesting question is how the quality of communities depends on their size. Leskovec et al. (Leskovec
et al., 2009) carried out a systematic analysis of clusters
in many large networks, including traditional and on-line
social networks, technological, information networks and
web graphs. Instead of considering partitions, they focused on individual communities, which are derived by
optimising conductance (Section II.A) around seed vertices.

[Figure 18 plots P(s) against the module size s for Communication (Wiki Talk, Email), Information (web-G, Arxiv, Amazon, web-BS), Internet (AS-caida, AS-dimes), Biological (Dmela, Yeast, Human) and Social (Live J, Epinions, Last FM, Slashdot) networks.]
FIG. 18 Distribution of community sizes in real networks.
The clusters were detected with Infomap (Rosvall and
Bergstrom, 2008), but other methods yield qualitatively similar results. Various classes of networks are considered. All
distributions are broad, spanning several orders of magnitude.
Reprinted figure with permission from (Lancichinetti et al.,
2010).

We remind that the conductance of a cluster is
the ratio between the number of external edges and the
total degree of the cluster. Minimising conductance effectively combines the two main community demands,
i. e., good separation from the rest of the graph (low
numerator) and large number of internal edges (high denominator). The measure is also relatively insensitive to
the size of the clusters, as both the numerator and the
denominator are typically proportional to the number of
vertices of the community23 . Therefore one could use it
to compare the quality of clusters of different sizes. For
any given size k Leskovec et al. identified the subgraph
with k vertices with the lowest conductance. This way,
for each network one can draw the network community
profile (NCP), showing the minimum conductance as a
function of community size. The NCPs of all networks
studied by Leskovec et al. have a characteristic shape:
they go downwards till k ∼ 100 vertices, and then they
rise monotonically for larger subgraphs [Fig. 19 (left)].
Alternative shapes have been recently found for other
data sets (Jeub et al., 2015).
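To illustrate how such a profile can be computed, here is a rough Python sketch, assuming the networkx library is available; it grows subgraphs greedily around seed vertices, a crude stand-in for the conductance optimisation employed by Leskovec et al.:

    import networkx as nx

    def conductance(G, S):
        # external edges of S divided by the total degree of S
        S = set(S)
        cut = sum(1 for u, v in G.edges(S) if (u in S) != (v in S))
        return cut / sum(d for _, d in G.degree(S))

    def ncp(G, max_size=20):
        profile = {}   # size k -> minimum conductance found
        for seed in G:
            S = {seed}
            while len(S) < max_size:
                frontier = {v for u in S for v in G[u]} - S
                if not frontier:
                    break
                # greedily add the neighbour keeping conductance lowest
                S.add(min(frontier, key=lambda v: conductance(G, S | {v})))
                profile[len(S)] = min(profile.get(len(S), 1.0),
                                      conductance(G, S))
        return profile

    print(ncp(nx.karate_club_graph()))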
For networks characterised by NCPs like the one in
Fig. 19 (left) the most pronounced communities are fairly
small in size. Such small clusters are weakly connected
to the rest of the network, often by a single edge (in this
case they are called whiskers), and form the periphery

of the graph. Large clusters have comparatively lower
quality and melt into a big core. Large communities can
often be split into parts with lower conductance, so they
can be considered conglomerates of smaller communities.

23 This is exactly true when the ratio between the external and the
total degree (mixing parameter) is the same for all community
vertices.
A schematic picture of the resulting network structure is
shown in Fig. 19 (right). The shape of the NCP is fairly
independent of the specific technique adopted to identify
the subgraphs with minimum conductance. The different
shapes of the NCPs encountered in data suggest that
core-periphery is not the only model of group structure
of real networks (Jeub et al., 2015).
The NCP is a signature that can be used to select generative mechanisms of community structure. Indeed, many standard models typically yield NCPs sloping
steadily downwards, at odds with the ones encountered in
many social and information networks. Stochastic block
models (Section II.C) are sufficiently versatile that they
can reproduce the NCP shape of Fig. 19 (left), by suitably
tuning the parameters. In the standard LFR benchmark
(Section III.A) the mixing parameters are tightly concentrated about a value µ by construction, hence all clusters
have approximately conductance µ, yielding a roughly
flat NCP24 (Jeub et al., 2015). However, the model can
be easily modified by making µ community-dependent
so that a large variety of NCPs is attainable, including the
one of Fig. 19 (left).
The main problem of working with NCPs is that they
are based on extreme statistics, as one systematically reports the minimum conductance for a given cluster size.
How representative is this extremal subgraph of the population of subgraphs with the same size? There may be
just a few clusters of a given size with low conductance.
It may happen that many subgraphs have conductance
near the minimum corresponding to their size(s), which
would then be representative. Alternatively most subgraphs might have much larger conductance than the
minimum but low enough that they can be still considered communities. In this case one should conclude that
communities of that size are not of very high quality,
on average. The above scenarios might lead to different
conclusions about the actual community structure of the
system. In general, even if one could produce a version
of the NCP where the trend refers to representative samples of communities of equal size (whatever that means),
the actual values of the conductance are as important as
the shape of the curve. If conductance is sufficiently low
for all cluster sizes, it means that there are good communities of any size. The fact that small clusters could
be of higher quality does not undermine the role of large
clusters. The observation that large clusters consist of
smaller clusters of higher quality may just be evidence of
hierarchical structure in the network, which is a trademark of many complex systems (Simon, 1962).

24 If all vertices of a subgraph have mixing parameter equal to µ,
it can be easily shown that the conductance of the subgraph is
exactly equal to µ.

FIG. 19 Core-periphery structure of real networks. (Left) Schematic shape of the network community profile (NCP), showing
how the minimum conductance of subgraphs of size k varies with k. This pattern is very common in large social and information
networks. The “best” communities in terms of conductance have a size of about 100 vertices (minimum of the curve), whereas
communities of larger sizes have lower quality. The curve labeled Rewired network is the NCP of a randomised version of
the network, where edges are randomly rewired by keeping the degree distribution; the one labeled Bag of whiskers gives
the minimum conductance scores of communities composed of disconnected pieces. (Right) Scheme of the core-periphery
structure in large social and information networks associated to the NCP above. Most vertices are in a central core, where
large communities are blended together, whereas the best communities, which are rather small, are weakly connected to the
core. Reprinted figure with permission from Ref. (Leskovec et al., 2009). © 2009, by Taylor and Francis.

In that
case high levels of the hierarchy are not less important
than low ones, a priori. In fact, the actual relative importance of communities should not only come from the
sheer value of specific metrics, like conductance, but also
from their statistical significance (Section IV.I).
That notwithstanding, we strongly encourage analyses
like the one by Leskovec et al., as they provide a statistical characterisation of community structure, in a way
that is only weakly algorithm-dependent. One has to
define operationally what a cluster is, but in a simple
intuitive way that allows us to draw conclusions about
the structure of the graph. In principle one could do
the same by analysing the clusters delivered by any algorithm, but there would be two important drawbacks.
First, the clusters may not be easy to interpret, as most
clustering algorithms usually do not require a clear-cut
definition of community. Second, one would have to handle a partition of the network into communities, instead
of probing locally the group structure of the network.
Therefore, for a given vertex one would have only one
cluster (or a handful, if communities overlap), while a
local exploration allows one to analyse a whole population
of candidate subgraphs, which gives more information.
The local subgraphs recovered this way do not need to
match closely the clusters delivered by any algorithm, but they provide useful signatures that allow one to
restrict the set of possible model explanations for the network’s group structure. Such an investigation can be replicated on any model graph to check whether the results
match (e. g., whether the NCPs coincide).

Another approach to infer properties of clusters of real
networks is using annotations. While we have shown that
annotated clusters do not necessarily coincide with structural ones (Section III.D), general features can still be
derived, provided they are consistently found across different data sets and annotations. A recent analysis by
Yang and Leskovec has questioned the common picture
of networks with overlapping communities (Yang and
Leskovec, 2014). Scholars usually assume that clusters
overlap at their boundaries, hence edge density should be
larger in the non-overlapping parts (Fig. 5). Instead, by
analysing the overlaps of annotated clusters in large social and information networks, Yang and Leskovec found
that the probability that two vertices are connected is
larger in the overlaps, and grows with the number of
communities sharing that pair of vertices. In addition,
connector vertices, i. e., vertices with the largest number
of neighbours within a community, are more likely to be
found in the overlaps. These findings suggest that the
overlaps may play an important role in the community
structure of networks. In Fig. 20 we compare the conventional view with the one resulting from the analysis. The
Community-Affiliation Graph Model (AGM) (Yang and
Leskovec, 2014) and the Cluster Affiliation Model for Big
Networks (BIGCLAM) (Yang and Leskovec, 2013) are
clustering techniques based on generative models of networks featuring communities with dense overlaps. The
models are based on the principle that vertices are more
likely to be neighbours the more communities they share, in line with the empirical finding of (Yang and
Leskovec, 2014).

FIG. 20 Stylised views of community structure. In (a) and
(b) we show the conventional pictures of non-overlapping and
overlapping clusters, respectively. Under the network diagrams we see the corresponding adjacency matrices. The
overlaps have a lower edge density than the rest of the communities. The analysis by Yang and Leskovec suggests that
a more realistic model could be the one shown in (c, d),
where the overlaps are denser than the non-overlapping parts.
Reprinted figure with permission from (Yang and Leskovec,
2014). © 2014, by the Association for Computing Machinery,
Inc.

Actual overlapping communities exist in many contexts. However, it is unclear whether soft clustering is
statistically founded. A recent analysis aiming at identifying suitable stochastic block models to describe real
network data indicates that in many cases hard partitions ought to be preferred, as they give simpler descriptions of the group structure of the data than soft
partitions (Peixoto, 2015b). This could be due to the
fact that the underlying models are based on placing
edges independently of each other, neglecting higher-order
structures between vertices, like motifs. By adopting
approaches that take into account higher-order structures
things may change, and community overlaps might become
a statistically robust feature. The pervasive overlaps
found by Yang and Leskovec in annotated data can
be recovered if higher-order effects are considered (Persson
et al., 2016; Rosvall et al., 2014), without ad hoc hypotheses.

IV. METHODS

There are many algorithms to detect communities in
graphs. They can be grouped into categories, based on different criteria, like the actual operational method (Fortunato, 2010), or the underlying concept of community (Coscia et al., 2011). In most applications, however, just a few popular algorithms are employed. In this
section we present a critical analysis of these methods.
We show the advantages of knowing the number of clusters beforehand and how it is possible to derive robust
solutions from partitions delivered by stochastic clustering techniques. We discuss approaches to the problem of
detecting communities in evolving networks and how to
assess the significance of the detected clustering. We conclude by suggesting the methods that currently appear
to be most promising.

A. How many clusters?

In general, the only preliminary information available
to any algorithm is the structure of the network, i. e.,
which pairs of vertices are connected to each other and
which are not (possibly including weights). Any insight
about community structure is supposed to be given as
output of the procedure. Naturally, it would be valuable
to have some information on the unknown division of the
network beforehand, as one could reduce considerably the
huge space of possible solutions, and increase the chance
of successfully identifying the communities.
Among all the possible pre-detection inputs, the number q of clusters plays a prominent role. Many popular
classes of algorithms require the specification of q before they run, like methods imported from data clustering or parametric statistical inference approaches (Section IV.E). Other methods are capable of inferring q, as
they can choose among partitions into different numbers of communities. But even such methods could benefit from a preliminary knowledge of q (Darst et al.,
2014). In Fig. 21 we report standard accuracy plots
of two algorithms on the planted partition model (Section III.A) with two clusters of equal size. The algorithms are modularity optimisation via simulated annealing (Guimerà and Amaral, 2005) and the Absolute
Potts Model (APM) (Ronhovde and Nussinov, 2010)
(Section IV.G). There are two performance curves for
each method: one comes from the standard application
of the method, without constraints; the other is obtained
by forcing the method to explore only the subset of partitions with the correct number of clusters q = 2.
We see that the accuracy improves considerably when
q is known. This is particularly striking in the case of
modularity optimisation, which is known to have a limited resolution, preventing the method from identifying
the correct scale of the communities, even when the latter
are very pronounced (Fortunato and Barthélemy, 2007;
Good et al., 2010) (Section IV.F). Knowing q and constraining the optimisation of the measure to partitions
with fixed q, the problem can be alleviated (Nadakuditi
and Newman, 2012).

FIG. 21 Knowing the number of clusters beforehand improves
community detection. The diagram shows the performance
on the planted partition model of two methods: modularity
optimisation via simulated annealing (Guimerà and Amaral,
2005) and the Absolute Potts Model (APM) (Ronhovde
and Nussinov, 2010). Networks have 400 vertices, which
are grouped in two equal-sized communities. The accuracy
is measured via the fraction of correctly detected vertices
(Section III.B). The horizontal line indicates the accuracy of
random guessing, the dashed vertical line the theoretical detectability
limit (Section III.C). For each algorithm we show
two curves, referring to the results of the method in the absence
of any information on the number of clusters, and when
such information is fed into the method as initial input. In
both cases, knowing the number of clusters beforehand leads
to a much better performance.
But how do we know how many clusters there are?
Here we briefly discuss some heuristic techniques; for statistically principled methods we refer the reader to Section IV.E. It has been recently shown that in the planted
partition model q can be correctly inferred all the way
up to the detectability limit from the spectra of two matrices: the non-backtracking matrix B (Krzakala et al.,
2013) and the flow matrix F (Newman, 2013). They are
2m × 2m matrices, where m is the number of edges of
the graph. Each edge is considered in both directions,
yielding 2m directed edges, each indicated with the notation i → j, meaning that the edge goes from vertex i to
vertex j. Their elements read
\[
B_{i \to j, r \to s} = \delta_{is}(1 - \delta_{jr}) \qquad (15)
\]
and
\[
F_{i \to j, r \to s} = \frac{\delta_{is}(1 - \delta_{jr})}{k_i - 1}. \qquad (16)
\]
In Eq. (16) k_i is the degree of vertex i. So the elements of F are basically the elements of B, normalised
by vertex degrees. This is done to account for the heterogeneous degree distributions observed in most real networks. Both matrices have non-zero elements only for

each pair of edges forming a directed path from the first
vertex of one edge to the second of the other edge. To
do that, the two edges must share a vertex. As a
matter of fact, the non-backtracking matrix B is just the
adjacency matrix of the (directed) edges of the graph.
The name of the matrix B is due to a connection
with the properties of non-backtracking walks. A non-backtracking walk (Angel et al., 2015) is a path across
the edges of a graph that is allowed to return to a vertex
visited previously only after at least two other vertices
have been visited; immediate returns like 1 → 2 → 1 are
forbidden. The elements of the k-th power of B yield
the number of non-backtracking walks of length k from a
(directed) edge of the graph to another, and the trace of
the power matrix yields the number of closed non-backtracking
walks of length k starting from any given (directed) edge.
A remarkable property of both matrices is that on networks with homogeneous groups (i. e., of similar size and
internal edge density) most eigenvalues, which are generally complex, are enclosed by a circle centred at the
origin, and that the number of eigenvalues lying outside
of the circle is a good proxy of the number of communities of the network (Krzakala et al., 2013; Newman,
2013). For B the circle’s radius is given by the square
root √c of the leading eigenvalue c, which may diverge
for networks with heterogeneous degree distributions (e.
g., power laws); for F it equals √(⟨k/(k − 1)⟩/⟨k⟩), which
is never greater than 1.
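As an illustration, the sketch below builds B according to Eq. (15) for a small planted partition graph and counts the real eigenvalues outside the circle of radius √c, assuming networkx and numpy are available; the dense eigensolver confines it to small graphs:

    import networkx as nx
    import numpy as np

    def estimate_q(G):
        edges = ([(u, v) for u, v in G.edges()]
                 + [(v, u) for u, v in G.edges()])
        idx = {e: i for i, e in enumerate(edges)}
        B = np.zeros((len(edges), len(edges)))
        for (i, j) in edges:
            for r in G[i]:
                if r != j:          # Eq. (15): entry is 1 when s = i, r != j
                    B[idx[(i, j)], idx[(r, i)]] = 1
        eig = np.linalg.eigvals(B)
        radius = np.sqrt(max(eig.real))   # sqrt(c), c = leading eigenvalue
        # real eigenvalues beyond the bulk circle count the communities
        return int(np.sum((abs(eig.imag) < 1e-6) & (eig.real > radius)))

    G = nx.planted_partition_graph(2, 50, 0.3, 0.05, seed=42)
    print(estimate_q(G))  # should print 2 for this detectable example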
Unfortunately, computing the eigenvalues of the non-backtracking or the flow matrix is computationally demanding. Both are
2m × 2m matrices. The adjacency matrix A has n × n elements, so B and F are larger by a factor of ⟨k⟩², where ⟨k⟩
is the average degree of the network. An approximate but
reliable computation of the spectra requires a time which
scales superlinearly (approximately quadratic) with the
network size n. So the problem is intractable for graphs
with number of edges of the order of millions or higher.
Also, if communities have diverse sizes and edge densities,
as happens in most networks encountered in applications, the bulk of eigenvalues may not have a circular
shape, and it may become problematic to identify eigenvalues falling outside of the bulk.
Besides, non-backtracking walks must contain cycles,
hence trees25 dangling off the graph do not affect the
spectrum of B, which remains unchanged if all dangling
trees are removed. This is a disturbing feature, as tree-like regions of the graph may play a role in the network’s community structure, and most methods would
find different partitions if trees are kept or removed26 .
The spectrum of the flow matrix, instead, changes when
dangling trees are kept or removed (Newman, 2013). In

the limiting case in which the network itself is a tree, all
eigenvalues of B and F are zero, and even if there were
community structure one would get no relevant information.

25 We remind that trees are connected acyclic graphs.
26 Singh and Humphries showed that the problem can be solved
via reluctant backtracking walks, in which the walker has a small
but non-zero probability of immediately returning to the previous
vertex (Singh and Humphries, 2015).
The number of clusters can also be deduced by studying how the eigenvectors of graph matrices rotate when
the adjacency matrix of the graph is subjected to random
perturbations (Sarkar et al., 2016). On stochastic block
models this approach infers the correct value of q up to a
threshold preceding the detectability limit. The method
is also computationally expensive.
In general, if one can identify a set (range) of promising q-values, from preliminary information or via calculations like the ones described above or in Section IV.E, it
is better to run constrained versions of clustering methods, searching for solutions only among partitions with
those numbers of communities, than to let the methods
discover q by themselves, which may lead to solutions of
lower quality.

B. Consensus clustering

Many clustering techniques are stochastic in character
and do not deliver a unique answer. A common scenario
is when the desired solution corresponds to extrema of a
cost function that can only be found via approximation
techniques, with results depending on random seeds and
on the choice of initial conditions. Techniques not based
on optimisation sometimes have the same feature, when
tie-break rules are adopted in order to choose among multiple equivalent options encountered along the calculation.
What to do with all these partitions? Sometimes there
are objective criteria to single out a specific partition and
discard all others. For instance, in algorithms based on
optimisation, one could pick the solution yielding the
largest (smallest) value of the function to optimise. For
other techniques there is no clear-cut criterion.
A promising approach is combining the information of
the different outputs into a new partition. Consensus
clustering (Goder and Filkov, 2008; Strehl and Ghosh,
2002; Topchy et al., 2005) is based on this idea. The
goal is searching for a consensus partition that is a better
fit than the input partitions. Consensus clustering
is a difficult combinatorial optimisation problem. An alternative greedy strategy (Strehl and Ghosh, 2002) relies on the consensus matrix, which is a matrix based on
the co-occurrence of vertices in communities of the input
partitions (Fig. 22). The consensus matrix is used as an
input for the graph clustering technique adopted, leading
to a new set of partitions, which produce a new consensus
matrix, etc., until a unique partition is finally reached,
which is not changed by further iterations. The steps of
the procedure are listed below. The starting point is a
network G with n vertices and a clustering algorithm A.


FIG. 22 Consensus clustering in networks. A simple graph
has a natural partition in two communities [squares and circles on diagrams (I) and (II)]. The combination of (I), (II),
(III) and (IV) yields the weighted consensus matrix illustrated
on the right. The thickness of each edge is proportional to
its weight. In the consensus matrix the community structure of the original network is more visible: the two communities have become cliques, with “heavy” edges, whereas
inter-community edges are rather weak. This improvement is
obtained despite the presence of two inaccurate partitions in
three clusters (III and IV). Reprinted figure with permission
from (Lancichinetti and Fortunato, 2012). © 2012, by the
Nature Publishing Group.

1. Apply A on G n_P times, yielding n_P partitions.

2. Compute the consensus matrix D: D_ij is the number of partitions in which vertices i and j of G are
assigned to the same community, divided by n_P.
3. All entries of D below a chosen threshold τ are set
to zero27.

4. Apply A on D n_P times, yielding n_P partitions.

5. If the partitions are all equal, stop28. Otherwise go
back to 2.
Since the consensus matrix is in general weighted, the
algorithm A must be able to handle weighted networks, even if the graph under study is binary. Fortunately
many popular algorithms have natural extensions to the
weighted case.
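The loop can be sketched compactly in Python, with the stochastic Louvain implementation of networkx standing in for the generic algorithm A; the parameter values are arbitrary and, as a sketch, the loop is not guaranteed to terminate in pathological cases:

    import networkx as nx
    import numpy as np
    from networkx.algorithms.community import louvain_communities

    def consensus(G, n_p=20, tau=0.3):
        nodes = list(G)
        graph, run = G, 0
        while True:
            # steps 1 and 4: apply the algorithm n_p times
            parts = [louvain_communities(graph, seed=run + i)
                     for i in range(n_p)]
            run += n_p
            if all(set(map(frozenset, p)) == set(map(frozenset, parts[0]))
                   for p in parts):
                return parts[0]               # step 5: all runs agree
            # step 2: consensus matrix of co-occurrence fractions
            D = np.zeros((len(nodes), len(nodes)))
            index = {v: i for i, v in enumerate(nodes)}
            for p in parts:
                for comm in p:
                    for u in comm:
                        for v in comm:
                            D[index[u], index[v]] += 1.0 / n_p
            D[D < tau] = 0                    # step 3: thresholding
            np.fill_diagonal(D, 0)            # drop self-loops
            graph = nx.relabel_nodes(nx.from_numpy_array(D),
                                     dict(enumerate(nodes)))

    print(consensus(nx.karate_club_graph()))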
The integration of consensus clustering with popular existing techniques leads to more accurate partitions

than those delivered by the methods alone on LFR benchmark graphs (Lancichinetti and Fortunato, 2012). Interestingly, this holds even for methods whose direct application gives poor results on the same graphs, like modularity optimisation (Section IV.F). The variability of the
partitions, rather than being a problem, becomes a factor
of performance enhancement. The outcome of the procedure depends on the choice of the threshold parameter
τ and the number of input partitions n_P, which can be
selected by testing the performance on benchmark networks (Lancichinetti and Fortunato, 2012). Consensus
clustering is also a promising technique to detect communities in evolving networks (Section IV.H).

27 Without thresholding the consensus matrix would quickly turn
into a dense matrix, rendering the application of clustering algorithms
computationally expensive. However, the method does
not strictly require thresholding, and if the network is not too
large one can skip step 3 and use the full matrix all along (Bruno
et al., 2015).
28 The consensus matrix is block-diagonal in this case.

C. Spectral methods

Spectral graph clustering is an approach to detect clusters using spectral properties of the graph (Fortunato,
2010; von Luxburg, 2006). The eigenvalue spectrum of
several graph matrices (e. g., the adjacency matrix, the
Laplacian, etc.) typically consists of a dense bulk of
closely spaced eigenvalues, plus some outlying eigenvalues separated from the bulk by a significant gap. The
eigenvectors corresponding to these outliers contain information about the large-scale structure of the network,
like community structure29 . Spectral clustering consists
in generating a projection of the graph vertices in a metric space, by using the entries of those eigenvectors as coordinates. The i-th entries of the eigenvectors are the coordinates of vertex i in a k-dimensional Euclidean space,
where k is the number of eigenvectors used. The resulting
points can be grouped in clusters by using standard partitional clustering techniques like k-means clustering (MacQueen, 1967).
Spectral clustering is not always reliable, however.
When the network is very sparse (Section II.C) the separation between the eigenvalues of the community-related
eigenvectors and the bulk is not sharp. Eigenvectors corresponding to eigenvalues outside of the bulk may be correlated to high-degree vertices (hubs), instead of group
structure. Likewise, community-related eigenvectors can
be associated to eigenvalues ending up inside the bulk. In
these situations, selecting eigenvectors based on whether
their associated eigenvalues are inside or outside the bulk
yields a heterogeneous set, containing information both
on communities and on other features (e. g., hubs). Using those eigenvectors for the spectral clustering procedure renders community detection more difficult, sometimes impossible. Unfortunately, many of the networks
encountered in practical studies are very sparse and can
lead to this type of problem.

Indeed, on sparse networks constructed with the
planted partition model, spectral methods relying on
standard matrices [adjacency matrix, Laplacian, modularity matrix (Newman, 2006), etc.] fail before the
theoretical detectability limit (Section III.C) (Krzakala
et al., 2013). The non-backtracking matrix B of Eq. (15)
was introduced to address this problem (Krzakala et al.,
2013). On the planted partition model the eigenvalues
associated with the community-related eigenvectors of B
are separated from the bulk all the way down to the theoretical
detectability limit, so spectral methods using the top
eigenvectors of B are capable of finding communities as long as
they are detectable, modulo the caveats we expressed in
Section IV.A.

D. Overlapping communities: Vertex or Edge clustering?

Soft clustering, where communities may overlap, is an
even harder problem than hard clustering, where there
is no community overlap. The possibility of having multiple memberships for the vertices introduces additional
degrees of freedom in the problem, causing a huge expansion of the space of possible solutions. It has been
pointed out that overlapping communities, especially in
social networks, reflect different types of associations between people (Ahn et al., 2010; Evans and Lambiotte,
2009). Two actors could be co-workers, friends, relatives,
sport mates, etc. Actor A could be a work colleague of B
and a friend of C, so she would sit in the overlap between
the community of colleagues of B and the community
of friends of C. For this reason, it has been suggested
that an effective way to recover overlapping clusters is to
group edges, rather than vertices. In the example above,
the edges connecting A with B and A with C would be
placed in different groups, and since they both have A
as endpoint, the latter turns out to be an overlapping
vertex.
Moreover, edge clustering is claimed to have the additional advantage of reconciling soft clustering with hierarchical community structure (Ahn et al., 2010). If there
is hierarchy, communities are nested within each other
as many times as there are hierarchical levels. Hierarchical structure is often represented via dendrograms30 ,
with the network being divided in clusters, which are in
turn divided in clusters, and so on until one ends up with
singleton clusters, consisting of one vertex each. But this
can be done only if communities do not share vertices.

29 Typically each such eigenvector is localised, in that its entries are
markedly different from zero in correspondence of the vertices of
a community, while the other entries are close to zero.

30 A dendrogram, or hierarchical tree, is a tree diagram used to represent hierarchical partitions. At the bottom there are as many nodes as there are vertices in the graph, representing the singleton clusters. At the top there is one node (root), standing for the partition grouping all vertices in a single cluster. Horizontal lines indicate mergers of a pair of clusters or, equivalently, splits of one cluster. Each vertex of the tree identifies one cluster, whose elements can be read by following all bifurcations starting from the vertex all the way down to the leaves of the tree.

Overlapping vertices should be assigned to multiple clusters of lower hierarchical levels, yielding multiple copies
of them in the dendrogram. Instead, one could build
a dendrogram displaying edge communities, where each
edge is assigned to a single cluster, but clusters can still
overlap because edges in different clusters may share one
endpoint (Ahn et al., 2010).
Some remarks are in order. First, there may still be
overlapping communities even if there were a single type
of association between the vertices. For instance, if we
keep only the friendship relationships within a given population of actors, there are many social circles and there
could be active actors with multiple ties within two or more circles. Second, in the traditional picture of networks with community structure (Fig. 4), the edges connecting two different groups may be assigned to one of the
communities they join or they could be put together in
a separate group. Either way, they would signal an overlap between the communities, which is artificial. This
happens even in the extreme case of a single edge connecting vertices A and B of two groups, as that edge
will have to be assigned to a group, which inevitably
forces A and B into a common cluster. Third, if we
rely on the picture emerging from the analysis by Yang
and Leskovec (Yang and Leskovec, 2014) (Section III.E)
overlaps between clusters could be much denser than we
expect, hence not only vertices but also edges may be
shared among different groups, and edge dendrograms
would have the same problem as classic vertex dendrograms31 . Fourth, the computational complexity of the
calculation can rise substantially, as in networks of interest there are typically many more edges than vertices.
Finally, there is no evidence of a general conceptual or algorithmic advantage in grouping edges rather than vertices, other than works showing that a specific edge
clustering technique outperforms some vertex clustering
techniques on a specific set of networks.
To shed some light on the situation, we performed the
following test. We took some network data sets with
annotated vertices, giving an indication about what the
communities of those networks could be32 . For each network G we derived the corresponding line graph L(G),
which is the graph whose vertices are the edges of G,
while edges are set between pairs of vertices of L(G)
whose corresponding edges in G are adjacent at one of
their endpoints. Vertex communities of L(G) are then
edge communities of the original network G. The question is whether by working on L(G) the detection improves or not. We searched for overlapping communities

31 In this case, if we used edge clustering, each edge would be placed in one cluster only. However, when one turns edge communities into vertex communities, multiple relationships can still be recovered (Ahn et al., 2010).
32 We have seen in Section III.D that metadata are not necessarily correlated with topological clusters. We used data sets for which there is some correlation.

with OSLOM (Lancichinetti et al., 2011). We applied
OSLOM on the original graphs and on their line graphs.
The covers found on the line graphs were turned into covers of the vertices of G, by replacing each vertex of L(G)
with the pair of vertices of the corresponding edge of G.
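The transformation itself is easy to reproduce. The sketch below is ours and, to stay self-contained, substitutes networkx's built-in Louvain heuristic for OSLOM, the method actually used in our test; it clusters the line graph L(G) and maps the edge communities back to a (possibly overlapping) cover of the vertices of G:

    import networkx as nx
    from collections import defaultdict

    def edge_communities_as_vertex_cover(G, seed=0):
        L = nx.line_graph(G)              # vertices of L(G) are the edges of G
        # any vertex clustering method can be run on L(G); Louvain here
        edge_clusters = nx.community.louvain_communities(L, seed=seed)
        cover = defaultdict(set)
        for c, cluster in enumerate(edge_clusters):
            for u, v in cluster:          # each line-graph vertex is an edge (u, v)
                cover[c].update((u, v))   # both endpoints join cluster c
        return list(cover.values())       # overlapping vertex communities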

FIG. 23 Comparison between edge communities and vertex
communities. The diagram shows the similarity between the
covers of annotated vertices of five social networks and the
topological covers found by OSLOM on them (left bar of
each pair) and on their line graphs (right bar). The networks
represent Facebook friendships between students of US universities (Traud et al., 2012). There are several annotations for each student; we selected those most closely related to topological groups: year of study and dormitory.
The similarity was computed by using the normalised mutual
information (NMI), in the version for covers proposed by Lancichinetti, Fortunato and Kertész (Lancichinetti et al., 2009).
Vertex communities detected in the original graphs are overall better correlated with the annotated clusters than edge
communities.

The results can be seen in Fig. 23, showing how similar
the covers found on the original networks and on the line
graphs are with respect to the covers of annotated vertices. Neither approach is very accurate, as expected (see
Section III.D and Fig. 16), but vertex communities show
a greater association to the annotated clusters than edge
communities, except in a few instances where the similarity is very low. Analyses carried out on LFR benchmark
graphs (not shown) lead to the same conclusion. We
stress that traditional line graphs have the problem that
edges adjacent to a hub vertex in the original graph turn
into vertices that are all connected to each other, forming
giant cliques, which might dominate the structure of the
line graph, misleading clustering techniques. The procedure can be refined by introducing weights for the edges
of the line graphs, that can be computed in various ways,
e. g., based on the similarity of the neighbourhoods of
adjacent edges in the original network (Ahn et al., 2010).
Still we believe that our tests provide some evidence
that edge clustering is no better than vertex clustering,
in general. The superiority of algorithms based on either
approach should be assessed a posteriori, case by case,
and the answer may depend on the specific data sets
under investigation.

E. Methods based on statistical inference

Statistical inference provides a powerful set of tools to
tackle the problem of community detection. The standard approach is to fit a generative network model on
the data (Ball et al., 2011; Guimerà and Sales-Pardo,
2009; Hastings, 2006; Karrer and Newman, 2011; Newman and Leicht, 2007; Peixoto, 2014). The stochastic block model (SBM) is by far the most used generative model of graphs with communities (see Section II.C
and references therein). We have seen that it can describe other types of group structure, like disassortative
and core-periphery structure (Fig. 8). The unnormalised
maximum log-likelihood that a given partition g in q
groups of the network G is reproduced by the standard
SBM reads (Karrer and Newman, 2011)
\mathcal{L}_S(G|g) = \sum_{r,s=1}^{q} e_{rs} \log \left( \frac{e_{rs}}{n_r n_s} \right),    (17)

where ers is the number of edges running from group r to
group s, nr (ns ) the number of vertices in r (s) and the
sum runs over all pairs of groups (including when r = s).
This version of the model, however, does not account
for the degree heterogeneity of most real networks, so it
does a poor job at describing the group structure of many
of them. Therefore, Karrer and Newman proposed the
degree-corrected stochastic block model (DCSBM) (Karrer and Newman, 2011), in which the degrees of the vertices are kept constant, on average, via the introduction
of additional suitable parameters33 . The unnormalised
maximum log-likelihood for the DCSBM is
\mathcal{L}_{DC}(G|g) = \sum_{r,s=1}^{q} e_{rs} \log \left( \frac{e_{rs}}{e_r e_s} \right),    (18)

where er (es ) is the sum of the degrees of the vertices in
r (s).
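Both scores are straightforward to evaluate for a given partition. A minimal sketch (ours; g maps each vertex to a group label in 0, ..., q-1, and internal edges are counted twice on the diagonal blocks, following the convention of Karrer and Newman):

    import math
    from collections import defaultdict

    def sbm_log_likelihoods(G, g, q):
        e = defaultdict(float)   # e[(r, s)]: edges between groups (diagonal doubled)
        n = defaultdict(int)     # group sizes
        for v in G:
            n[g[v]] += 1
        for u, v in G.edges():
            e[(g[u], g[v])] += 1
            e[(g[v], g[u])] += 1
        deg = {r: sum(e[(r, s)] for s in range(q)) for r in range(q)}  # e_r
        L_S = sum(ers * math.log(ers / (n[r] * n[s]))
                  for (r, s), ers in e.items() if ers > 0)             # Eq. (17)
        L_DC = sum(ers * math.log(ers / (deg[r] * deg[s]))
                   for (r, s), ers in e.items() if ers > 0)            # Eq. (18)
        return L_S, L_DC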
The most important drawback of this type of approach
is the need to specify the number q of groups beforehand, which is usually unknown for real networks. This
is because a straight maximisation of the likelihoods of
Eqs. (17) and (18) over the whole set of possible solutions yields the trivial partition in which each vertex is a
cluster (overfitting). In Section IV.A we have seen ways
to extract q from spectral properties of the graph. But it
would be better to have statistically principled methods,
to be consistent with the approach used to perform the
inference.
A possibility is model selection, for instance by choosing the model that best compresses the data (Grünwald

33 The authors were inspired by modularity maximisation, which gives far better results when the null model consists of rewiring edges by preserving the degree sequence of the network (on average), than by preserving only the total number of edges.

et al., 2005; Rissanen, 1978). The extent of the compression can be estimated via the total amount of information
necessary to describe the data, which includes not only
the fitted model, but also the information necessary to
describe the model itself, which is a growing function of
the number of blocks q (Peixoto, 2013). This quantity,
that we indicate with Σ, is called the description length.
Minimising the description length naturally avoids overfitting. Partitions with large q are associated to “heavy”
models in terms of their information content, and do not
represent the best compression. On the other hand, partitions with low q have high information content, even if
the model itself is not loaded with parameters. Hence the
minimum description length corresponds to a non-trivial
number of groups and it makes sense to minimise Σ to
infer the block structure of the graph.
It turns out that this approach has a limited resolution
on the standard SBM: the maximum number of blocks that can be resolved scales as √n for a fixed average degree ⟨k⟩, where n is the number of vertices of the network. This means that the minimum size of detectable blocks scales as √n, just as it happens for modularity maximisation (Section IV.F). A more refined method of model
selection, consisting in a nested hierarchy of stochastic
block models, where an upper level of the hierarchy serves
as prior information to a lower level, brings the resolution limit down to log n, enabling the detection of much
smaller blocks (Peixoto, 2014).
Other techniques to extract the number of groups have
been proposed (Côme and Latouche, 2015; Daudin et al.,
2008; Handcock et al., 2007; Latouche et al., 2012; Newman and Reinert, 2016).
F. Methods based on optimisation

Optimisation techniques have received the greatest attention in the literature. The goal is finding an extremum, usually the maximum, of a function indicating
the quality of a clustering, over the space of all possible
clusterings. Quality functions can express the goodness
of a partition or of single clusters.
The most popular quality function is the modularity
by Newman and Girvan (Newman and Girvan, 2004).
It estimates the quality of a partition of the network in
communities. The general expression of modularity is
Q = \frac{1}{2m} \sum_{ij} (A_{ij} - P_{ij}) \, \delta(C_i, C_j),    (19)
where m is the number of edges of the network, the sum
runs over all pairs of vertices i and j, Aij is the element
of the adjacency matrix, Pij is the null model term and
in the Kronecker delta at the end Ci and Cj indicate
the communities of i and j. The term Pij indicates the
average adjacency matrix of an ensemble of networks, derived by randomising the original graph, such to preserve
some of its features. Therefore, modularity measures how
different the original graph is from such randomisations.

The concept was inspired by the idea that by randomising the network structure communities are destroyed, so
the comparison between the actual structure and its randomisation reveals how non-random the group structure
is. A standard choice is Pij = ki kj /2m, ki and kj being
the degrees of i and j, and corresponds to the expected
number of edges joining vertices i and j if the edges of
the network were rewired such to preserve the degree of
all vertices, on average. This yields the classic form of
modularity


Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(C_i, C_j).    (20)
Other choices of the null model term allow us to incorporate specific features of network structure, like bipartiteness (Barber, 2007), correlations (MacMahon and
Garlaschelli, 2015), signed edges (Traag and Bruggeman,
2009), space embeddedness (Expert et al., 2011), etc..
The extension of Eq. (20) and of its variants to the case
of weighted networks is straightforward (Newman, 2004).
For simplicity we focus on unweighted graphs here, but
the issues we discuss are general.
Because of the delta, the only contributions to the sum
come from vertex pairs belonging to the same cluster, so
we can group these contributions together and rewrite the
sum over the vertex pairs as a sum over the clusters34
Q = \sum_{C} \left[ \frac{l_C}{m} - \left( \frac{k_C}{2m} \right)^2 \right].    (21)

Here lC is the total number of edges joining vertices of community C and kC is the sum of the degrees of the vertices
of C (Section II.A). The first term of each summand in
Eq. (21) is the fraction of edges of the graph falling within
community C, whereas the second term is the expected
fraction of edges that would fall inside C if the graph
were taken from the ensemble of random graphs preserving the degree of each vertex of the original network, on
average. The difference in the summand would then indicate how “non-random” subgraph C is. The larger the
difference the more confident we can be that the placement of edges within C is not random (Fig. 24). Large
values of Q are then supposed to indicate partitions with
high quality.
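In this additive form Q takes a few lines to compute; a minimal sketch (ours):

    def modularity_additive(G, communities):
        # Eq. (21): observed minus expected fraction of internal edges, per cluster
        m = G.number_of_edges()
        Q = 0.0
        for C in communities:
            l_C = G.subgraph(C).number_of_edges()    # edges inside C
            k_C = sum(d for _, d in G.degree(C))     # total degree of C
            Q += l_C / m - (k_C / (2 * m)) ** 2
        return Q

For a partition given as a list of vertex sets, this agrees with the modularity function shipped with networkx.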
Modularity maximisation is NP-hard (Brandes et al.,
2008). Therefore one can realistically hope to find only
decent approximations of the modularity maximum and
a wide variety of approaches has been proposed. Due
to its simplicity, the prestige of its inventors and early
results on the benchmark of Girvan and Newman (Section III.A) and on small real benchmark networks, like
Zachary karate club network (Fig. 15), modularity has

34 A partition quality function that can be formulated as a sum over the clusters is called additive.

FIG. 24 Modularity by Newman and Girvan. The network
on the left has a visible community structure, with two clusters, whose vertices are highlighted in blue and red, respectively. Modularity measures how different the clusters of the
partition are from the corresponding clusters of the ensemble
of random graphs obtained by randomly joining the vertices,
such to preserve their degrees, on average. The picture on
the right shows the result of one such randomisation. The
internal edges are coloured in blue and red. They are just a
handful compared to the number of edges joining the same
groups of vertices in the original network (blue and red lines
in the left picture), while there are now many more edges
running between the subgraphs (black lines): the randomisation has destroyed the community structure of the graph, as
expected. The value of modularity for the bipartition on the
left is expected to be large.

become the best known and most studied object in network clustering. In fact, soon after its introduction, it
seemed to represent the essence of the problem, and the
key to its solution.
However, it became quickly clear that the measure is
not as good as it looks. For one thing, there are high-modularity partitions even in random graphs without
groups (Guimerà et al., 2004). This seems counterintuitive, given that modularity has been designed to capture
the difference between random and non-random structure. Modularity is a sort of distance between the actual
network and an average over random networks, ignoring altogether the distribution of the relevant community variables, like the fractions of edges within the clusters, over all realisations generated by the configuration
model. If the distribution is not strongly peaked, the values of the community variables measured on the original
graph may be found in a large number of randomised networks, even though the averages look far away from them.
In other words, we should pay more attention to the significance of the maximum modularity value Qmax , than
to the value itself. How can we estimate the significance
of Qmax ? A natural way is maximising Q over all partitions of every randomised graph. One then computes
the average $\langle Q_{rand} \rangle$ and the standard deviation $\sigma_Q^{rand}$ of the resulting values. The statistical significance of $Q_{max}$ is indicated by the distance of $Q_{max}$ from the null model average $\langle Q_{rand} \rangle$ in units of the standard deviation $\sigma_Q^{rand}$, i. e., by the z-score

z = \frac{Q_{max} - \langle Q_{rand} \rangle}{\sigma_Q^{rand}}.    (22)

If z ≫ 1, Qmax indicates strong community structure.
This approach has problems, though. The main drawback is that the distribution of Qrand over the ensemble of null model random graphs, though peaked, is not
Gaussian. Therefore, one cannot attribute to the values
of the z-score the significance corresponding to a Gaussian distribution, and one ought to compute the statistical significance for the correct distribution. Also, the
z-score depends on the network size, so the same values
may indicate different levels of significance for networks
differing considerably in size.
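A sketch of this estimate (ours), with degree-preserving rewiring as the randomisation and Louvain as a stand-in for whatever maximisation heuristic one prefers; it inherits the caveats just discussed:

    import numpy as np
    import networkx as nx

    def modularity_zscore(G, n_rand=50, seed=0):
        part = nx.community.louvain_communities(G, seed=seed)
        Q_max = nx.community.modularity(G, part)     # approximate maximum
        Q_rand = []
        m = G.number_of_edges()
        for i in range(n_rand):
            R = G.copy()                             # randomise, preserving degrees
            nx.double_edge_swap(R, nswap=10 * m, max_tries=1000 * m, seed=seed + i)
            p = nx.community.louvain_communities(R, seed=seed + i)
            Q_rand.append(nx.community.modularity(R, p))
        return (Q_max - np.mean(Q_rand)) / np.std(Q_rand)   # Eq. (22)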
Next, it is not true that the modularity maximum always corresponds to the most pronounced community
structure of a network. In Fig. 25 we show the well-known example of the ring of cliques (Fortunato and
Barthélemy, 2007). The network consists of 16 cliques
with four vertices each. Every clique has two neighbouring cliques, connected to it via a single edge. Intuition suggests that the graph has a natural community structure, with 16 communities, each corresponding
to one clique. Indeed, the Q-value of this partition is
Q1 = 89/112 ≈ 0.79464..., pretty close to 1, which is the
upper bound of modularity. However, there are partitions with larger values, like the partition in 8 clusters
indicated by the dashed contours, whose modularity is
Q2 = 90/112 ≈ 0.80357 > Q1 .
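The reader can check these numbers directly; the short script below (ours) builds the ring of cliques of Fig. 25 and evaluates both partitions with networkx:

    import itertools
    import networkx as nx

    def ring_of_cliques(n_cliques=16, k=4):
        G = nx.Graph()
        for c in range(n_cliques):
            G.add_edges_from(itertools.combinations(range(c * k, (c + 1) * k), 2))
        for c in range(n_cliques):                  # one edge to the next clique
            G.add_edge(c * k, ((c + 1) % n_cliques) * k + 1)
        return G

    G = ring_of_cliques()                           # 64 vertices, 112 edges
    cliques = [set(range(c * 4, (c + 1) * 4)) for c in range(16)]
    pairs = [set(range(c * 8, (c + 1) * 8)) for c in range(8)]
    print(nx.community.modularity(G, cliques))      # 89/112 ≈ 0.7946
    print(nx.community.modularity(G, pairs))        # 90/112 ≈ 0.8036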
This is due to the fact that Q has a preferential scale
for the communities, deriving from the underlying null
model and revealed by its explicit dependence on the
number of edges m of the network [Eq. (21)]. According to the configuration model, the expected number lAB of edges running between two subgraphs A and B with total degree kA and kB, respectively, is approximately kA kB /2m. Consequently, if kA and kB are of the order of √m or smaller, lAB could become smaller than 1. This
means that in many randomisations of the original graph
G, subgraphs A and B are disconnected and even a single
edge joining them in G signals a non-random association.
In these cases, modularity is larger when A and B are put
together than when they are treated as distinct communities, as in the example of Fig. 25. The modularity scale
depends only on the number of edges m, and it may have
nothing to do with the size of the actual communities of
the network. The resolution limit questions the usefulness of modularity in practical applications (Fortunato
and Barthélemy, 2007).
Many attempts have been made to mitigate the consequences of this disturbing feature. One approach consists
in introducing a resolution parameter γ into modularity’s
formula (Arenas et al., 2008b; Reichardt and Bornholdt,
2006). By tuning γ it is possible to arbitrarily vary the
resolution scale of the method, going from very large to

FIG. 25 Resolution limit of modularity optimisation. The
network in the figure is made of cliques of four vertices, arranged such to form a ring-like structure, with each clique
joined to two other cliques by a single edge. The naïve expectation is that modularity would reach its maximum for
the partition whose communities are the cliques, which appears to be the natural partition of the network. However,
it turns out that there are partitions with higher modularity,
whose clusters are combinations of cliques, like the partition
indicated by the dashed contours.

very small communities. We shall discuss such multi-resolution approaches in Section IV.G. Here we emphasise that multi-resolution versions of modularity do not
provide a reliable solution to the problem. This is because modularity maximisation has an additional bias:
large subgraphs are usually split in smaller pieces (Lancichinetti and Fortunato, 2011). This problem has the
same source as the resolution limit, namely the choice of
the null model. Since modularity has a preferential scale
for the communities, when a subgraph is too large it is
convenient to break it down, to increase the modularity of
the partition. So, when there is no characteristic scale for
the communities, like when there is a broad cluster size
distribution, large communities are likely to be broken,
and small communities are likely to be merged. Since
multi-resolution versions of modularity can only shift the
resolution scale of the measure back and forth, they are
unable to correct both effects at the same time35 (Lan-

35 More promising results can be obtained with hierarchical multilevel methods, in which multi-resolution modularity is applied iteratively on every cluster with independent resolution parameters, so that a coexistence of very diverse scales is permitted (Granell et al., 2012). Such approaches, however, deviate from the original idea of modularity maximisation, which is based on a global null model valid for the network as a whole.
cichinetti and Fortunato, 2011). In addition, tuning the
resolution parameter in the search for good partitions
is usually computationally very demanding, as in many
cases the optimisation procedure has to be repeated over
and over for all γ-values one desires to investigate.
We stress that the resolution limit is a feature of modularity itself, not of the specific way adopted to maximise
it. Therefore, there is no magic heuristic that can circumvent this issue. The Louvain method (Blondel et al.,
2008) has been held as one such magic heuristic. The
method performs a greedy optimisation of Q in a hierarchical manner, by assigning each vertex to the community
of its neighbours yielding the largest increase of Q, and creating
a smaller weighted super-network whose vertices are the
clusters found previously. Partitions found on this super-network hence consist of clusters including the ones found
earlier, and represent a higher hierarchical level of clustering. The procedure is repeated until one reaches the
level with largest modularity. In the comparative analysis of clustering algorithms performed by Lancichinetti
and Fortunato on the LFR benchmark (Lancichinetti and
Fortunato, 2009), the Louvain algorithm was the second best-performing method, after Infomap (Rosvall and
Bergstrom, 2008). This has given the impression that
the peculiar strategy of the method solves the resolution
problems above, which is not true. The reason why the
performance is so good is that Lancichinetti and Fortunato adopted the lowest partition of the hierarchy, the
one with the smallest clusters (Lancichinetti and Fortunato, 2014). By using the partition with the highest modularity, the performance degrades considerably (Fig. 26), as expected. As suggested by the developers of the algorithm themselves, using the lowest level helps avoid unnatural community mergers; as an example, they showed that
the natural partition of the ring of cliques (Fig. 25) can
be recovered this way. However, the bottom level has
lower modularity than the top level, so we face a sort of
contradiction, in that users are encouraged to use suboptimal partitions, even though one assumes that the best
clustering corresponds to the highest value of the quality function, which is what the method is supposed to
find. There is no guarantee that the bottom level yields
the most meaningful solution. On the other hand, users
have the option of choosing among a few partitions and
a slightly higher chance to find what they search for.
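The intermediate levels are accessible in common implementations. With the python-louvain package, for instance, one can inspect every level of the dendrogram and its modularity (a sketch assuming that package is installed):

    import community as community_louvain    # pip install python-louvain
    import networkx as nx

    G = nx.karate_club_graph()
    dendro = community_louvain.generate_dendrogram(G, random_state=0)
    for level in range(len(dendro)):
        part = community_louvain.partition_at_level(dendro, level)
        print(level, len(set(part.values())),
              community_louvain.modularity(part, G))
    # level 0 is the bottom level (smallest clusters); the last level has
    # the largest modularity but may merge natural communities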
Moreover, the modularity landscape is characterised
by a larger than exponential36 number of distinct partitions, whose modularity values are very close to the
global maximum (Good et al., 2010). This explains why
many heuristic methods of modularity maximisation are
able to come very close to the global maximum of Q,

36 Exponential in the number n of graph vertices.

FIG. 26 Performance of the Louvain method. The panels
indicate the accuracy of the algorithm to detect the planted
partition of the LFR benchmark as a function of the mixing
parameter, for different choices of the network size (1000 and
5000 vertices) and of the range of community sizes (label S indicates that communities have between 10 and 50 vertices, label B that they have between 20 and 100 vertices). Accuracy
is calculated via the normalised mutual information (NMI),
in the version by Lancichinetti, Fortunato and Kertész (Lancichinetti et al., 2009). Results depend heavily on the
hierarchical level one chooses at the end of the procedure.
When one picks the top level (diamonds), which is the one
with largest modularity, the accuracy is poor, as expected,
especially when communities are smaller. When one goes for
the bottom level (squares), which has lower modularity and
smaller clusters than the top level partition, there is a far better agreement with the planted partition and the performance
gets closer to that of Infomap (circles). The squares follow the
performance curves used in the comparative analysis by Lancichinetti and Fortunato (Lancichinetti and Fortunato, 2009).
Courtesy of Andrea Lancichinetti.

but it also implies that the global maximum is basically
impossible to find. In addition, high-Q partitions are
not necessarily similar to each other, despite the proximity of their modularity scores. The optimal structural
partition, which may not correspond to the modularity
maximum due to problems exposed above, may however
have a large Q-value. Therefore the optimal partition is
basically indistinguishable from a huge number of high-modularity partitions, which are in general structurally dissimilar from it. The large structural diversity of high-modularity partitions implies that one cannot rely on any
of them, at least in principle. Reliable solutions could
be singled out when the domain user imposes some constraints on the clustering of the system, or when she
expects it to have specific features. In the absence of
additional information or expectations, consensus clustering could be used to derive more robust partitions.
Indeed, it has been shown that the consensus of many
high-modularity partitions, combined with a hierarchical

approach, could help to solve resolution problems and
to avoid to find communities in random graphs without
groups (Zhang and Moore, 2014).
As of today, modularity optimisation is still the most
used clustering technique in applications. This may appear odd, given the serious issues of the method and the
fact that nowadays more powerful techniques are available, like a posteriori stochastic block modelling (Section IV.J). Indeed Newman has proven that optimising
modularity is equivalent to maximising the likelihood
that the planted partition model reproduces the network (Newman, 2016). But the planted partition model
is a very specific case of the general stochastic block
model, in that the intra-group edge probabilities are all
equal to the same value pin and the inter-group edge
probabilities are all equal to the same value pout . There
is no reason to limit the inference to this specific case,
when one could use the full model.
Optimising partition quality functions may lead to resolution problems, just as it happens for modularity37.
Instead, one could try to optimise cluster quality functions. One starts with some function q(C) expressing how
“community-like” a subgraph is and with a seed vertex
s. The goal is to build a cluster Cs including s such that
q(Cs ) is maximum38 . This is usually done by exploring
the neighbours of the temporary subgraph Cs , starting
from the neighbours of s when Cs includes only s. The
neighbouring vertex whose inclusion yields the largest increase of q is added to the subgraph. When a new vertex
is included, the structure of the subgraph is altered and
the other vertices can be examined again, as it might be
advantageous to knock some of them out. The process
stops when the quality q(Cs ) cannot be increased anymore via the inclusion or the exclusion of vertices.
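The sketch below (ours) implements this greedy scheme for one simple choice of q(C), the fraction of the total degree of the subgraph that stays internal; the fitness functions of the cited algorithms are more refined, but the search logic is the same:

    def quality(G, C):
        # internal degree of C divided by its total degree
        k_in = 2 * G.subgraph(C).number_of_edges()
        k_tot = sum(d for _, d in G.degree(C))
        return k_in / k_tot if k_tot else 0.0

    def grow_cluster(G, s):
        C = {s}
        improved = True
        while improved:
            improved = False
            frontier = {u for v in C for u in G[v]} - C
            best, best_q = None, quality(G, C)
            for u in frontier:                   # best neighbour to include
                q = quality(G, C | {u})
                if q > best_q:
                    best, best_q = u, q
            if best is not None:
                C.add(best)
                improved = True
                for v in list(C - {s}):          # re-examine current members
                    if quality(G, C - {v}) > quality(G, C):
                        C.remove(v)
        return C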
The optimisation of cluster quality functions offers a
number of advantages over the optimisation of partition
quality functions. First, it is consistent with the idea
that communities are local structures, which are sensitive to what happens in their neighbourhood, but are
fairly unaffected by the rest of the network: the structure of a social circle in Europe is hardly influenced by
the dynamics of social circles in Australia, though they
are parts of the same global social network of humans.
Consequently, if a network undergoes structural changes
in one region, community structure is altered and is to

37 Traag and Van Dooren have shown that one can design additive quality functions such that the best partition of the network induces the optimal partition for any subgraph S, i. e., the partition found when the detection is performed only on S (Traag et al., 2011). This is possible when the coefficients of the summand corresponding to each community do not depend on global properties of the graph. Even those functions have their own preferential community scale, though.
38 For some functions the optimum corresponds to their minimum, not the maximum. This occurs when they are related to variables that are supposed to be small when communities are good, like the density of edges between clusters.

be recovered only in that region, while the clustering of
the rest of the network remains the same. By optimising
partition quality functions, instead, any little change may
have an effect on every community of the graph. Second,
since cluster quality functions do not embody any global
scale, severe resolution problems are usually avoided39 .
Moreover, one can investigate only parts of the network,
which is particularly valuable when the graph is large
and a global analysis would be out of reach, computationally. The local exploration of the graph allows one to reach vertices already assigned to clusters, so overlaps can be naturally detected. In recent years several algorithms based on the optimisation of cluster quality functions have been designed (Baumes et al., 2005; Clauset,
2005; Huang et al., 2011; Lancichinetti et al., 2009, 2011).
G. Methods based on dynamics

Communities can also be identified by running dynamical processes on the network, like diffusion (Jeub et al.,
2015; Pons and Latapy, 2005; Rosvall and Bergstrom,
2008; Van Dongen, 2000; Zhou, 2003a,b; Zhou and
Lipowsky, 2004), spin dynamics (Raghavan et al., 2007;
Reichardt and Bornholdt, 2006; Ronhovde and Nussinov,
2010; Traag et al., 2011), synchronisation (Arenas et al.,
2006; Boccaletti et al., 2007), etc.. In this section we
focus on diffusion and spin dynamics, that inform most
approaches.
Random walk dynamics is by far the most exploited in
community detection. If communities have high internal
edge density and are well-separated from each other, random walkers would be trapped in each cluster for quite
some time, before finding a way out and migrating to
another cluster. We briefly discuss two broad classes
of algorithms: methods based on vertex similarity and
methods based on the map equation.
The first class of techniques consists in using random
walk dynamics to estimate the similarity between pairs
of vertices. For instance, in the popular method Walktrap the similarity between vertices i and j is given by
the probability that a random walker moves from i to j
in a fixed number of steps t (Pons and Latapy, 2005).
The parameter t has to be large enough, to allow for the
exploration of a significant portion of the graph, but not
too big, as otherwise one would approach the stationary
limit in which transition probabilities trivially depend on
the degrees of the vertices. If there is a pronounced community structure, pairs of vertices in the same cluster are
much more easily reachable by a random walk than pairs
of vertices in different clusters, so the vertex similarity
is expected to be considerably higher within groups than
between groups40 . In that case, clusters can be readily

39 If one defines the quality of the cluster with respect to the rest of the network, global scales may still slip into the function.
40 A related method is the Markov Cluster Algorithm (MCL) (Van Dongen, 2000), which consists of iterating two operations: raising to a power the transfer matrix T of the graph, whose element Tij equals the probability that a random walker, sitting at j, moves to i; raising the elements of the resulting matrix to a power, such that the larger values are enhanced with respect to the smaller ones, many of which are set to zero to lighten the calculations, while the remaining ones are normalised by dividing them by the sum of elements of their column, yielding a new transfer matrix. The process eventually reaches a stationary state, corresponding to the matrix of a disconnected graph, whose connected components are the sought clusters.

FIG. 27 Infomap. The random walk in (A) can be described as a sequence of the vertices, each labeled with unique codewords
(B), or by dividing the graph in regions and using unique codewords only for the vertices of the same region (C). This way the
same codeword can be used for multiple vertices, at the cost of indicating when the random walker leaves a region to enter a
new one, as in that case one has to specify the codeword of the new region, to unambiguously locate the walker. The network
has four communities [indicated by the colours in (C)], and in this case the map-like description of (C) is more parsimonious
than the one in (B). This is shown by looking at the actual code needed in either case (bottom of the figures), which is clearly
shorter for (C). In (D) the transitions between the clusters are highlighted. Reprinted figure with permission from (Rosvall and Bergstrom, 2008). © 2008, by the National Academy of Sciences, USA.

identified via standard hierarchical or partitional clustering techniques (Jain et al., 1999; Xu and Wunsch, 2008).
These methods have a high computational complexity, higher than quadratic in the number n of vertices (on sparse graphs), so they cannot be used on large
networks. Besides, they are often parameter-dependent.
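The quantity at the core of Walktrap is nonetheless simple to compute on small graphs. A sketch (ours, assuming a connected graph) of the t-step walk profiles and of the vertex distances of Pons and Latapy, which can then be fed to standard agglomerative clustering:

    import numpy as np
    import networkx as nx
    from scipy.spatial.distance import pdist, squareform

    def walktrap_distances(G, t=4):
        A = nx.to_numpy_array(G)
        P = A / A.sum(axis=1, keepdims=True)     # random walk transition matrix
        Pt = np.linalg.matrix_power(P, t)        # t-step transition probabilities
        d = A.sum(axis=1)                        # degrees
        # r_ij = sqrt( sum_k (Pt_ik - Pt_jk)^2 / d_k )
        return squareform(pdist(Pt / np.sqrt(d)))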
The map equation stems from a seminal paper by Rosvall and Bergstrom (Rosvall and Bergstrom, 2008), who
asked what is the most parsimonious way to describe an
infinitely long random walk taking place on the graph.
The information content of any description is given by
the total number of bits required to indicate the various stages of the process. The simplest description is
obtained by listing sequentially all vertices reached by
the random walker, each vertex being described by a
unique codeword. However, if the network has a community structure, there may be a more compact description, which follows the principle of geographic maps,


where there are multiple cities and streets with the same
name across regions. Vertex codewords could be recycled among different communities, which play the role of
regions/states, and vertices with identical names are distinguished by specifying the community they belong to.
If clusters are well separated from each other, transitions
between clusters are infrequent, so it is advantageous to
use the map, with the communities as regions, because
in the description of the random walk the codewords of
the clusters will not be repeated many times, while there
is a considerable saving in the description due to the limited length of the codewords used to denote the vertices
(Fig. 27). The map equation gives the description length of an infinite random walk as the sum of two terms, expressing the Shannon entropy of the walk within and between
clusters. The best partition is the one yielding the minimum description length.
This method, called Infomap, can be applied to
weighted networks, both undirected and directed. In the
latter case, random walk dynamics is modified by introducing a teleportation probability, as in the PageRank
process (Brin and Page, 1998), to ensure that a non-trivial stationary state is reached. It has been subsequently extended to the detection of hierarchical community structure (Rosvall and Bergstrom, 2011) and of overlapping clusters (Viamontes Esquivel and Rosvall, 2011).
In classic random walks the probability of reaching a vertex only depends on where the walker stands, not on
where it is coming from. The map equation has also
been extended to random walks whose transition probabilities depend on earlier steps too (higher-order Markov

dynamics) (Persson et al., 2016; Rosvall et al., 2014), retaining memory of the (recent) past. Applications show
that in this way one can recover overlapping communities more easily than by using standard first-order random walk dynamics, especially pervasive overlaps, which
are usually out of reach for most clustering algorithms
(Section III.E).
Infomap and its variants usually return different partitions than structure-based methods (e. g., modularity
optimisation). This is because they are based on flows
running across the system, as opposed to structural variables like number of edges, vertex degrees, etc.. The difference is particularly striking on directed graphs (Rosvall and Bergstrom, 2008), where edge directions heavily
constrain the possible flows. Structural features obviously play a major role on the dynamics of processes
running on graphs, but dynamics cannot be generally reduced to an interplay of structural elements, at least not
simple ones like, e. g., vertex degrees. Sometimes structural and dynamic approaches are equivalent, though.
For instance, Newman-Girvan’s modularity is a special
case of a general quality function, called stability, expressing the persistence of a random walk within communities (Delvenne et al., 2010; Lambiotte et al., 2008).
The methods we have discussed so far are global, in
that they aim at finding the whole community structure
of the system. However, random walks along with other
dynamical processes can be used as well to explore the
network locally, starting from seed vertices (Jeub et al.,
2015). Good communities correspond to bottlenecks of
the dynamics and depend on the choice of the seed vertices, the time scale of the dynamics, etc.. Such a local perspective makes it possible to identify community overlaps in a
natural way, due to the possibility of reaching vertices
multiple times, even if they are already classified.
Spin dynamics (Baxter, 2007) are also regularly used
in network clustering. The first step is to define a spin
model on the network, consisting of a set of spin variables
{si , i = 1, 2, . . . , n}, assigned to the vertices and a Hamiltonian H({s}), expressing the energy of the spin configuration {s}. For community detection, spins are usually
integers: s = 1, 2, . . . , q. Contributions to the energy are
usually given by spin-spin interactions. The coupling of a
spin-spin interaction can be ferromagnetic (negative) or
antiferromagnetic (positive), if the energy is lower when
the spins are equal or not, respectively. The goal is to find
those spin configurations that minimise the Hamiltonian
H({s}). If couplings are all ferromagnetic, the minimum
energy would be trivially obtained for the configurations
where all vertices have identical spin values. Instead,
one would like to have identical spins for vertices of the
same cluster, and different spins for vertices in different
clusters, to identify the community structure. Therefore, Hamiltonians feature both ferromagnetic and antiferromagnetic interactions [spin glass dynamics (Mezard
et al., 1987)]. A popular model consists in rewarding
edges between vertices in the same cluster, as well as
non-edges between vertices in different clusters, and pe-

nalising edges between vertices of different clusters, along
with non-edges between vertices in the same cluster. This
way, if the edge density within communities is appreciably larger than the edge density between communities,
as it often happens, having equal spin values for vertices in the same cluster would considerably lower the
energy of the configuration. On the other hand, to bring
the energy further down the spins of vertices in different
clusters should be different, as many such vertices would
be disjoint from each other, and such non-edges would
increase the energy of the system if the corresponding
spin variables were equal. A general expression for the
Hamiltonian along these lines is (Reichardt and Bornholdt, 2006)
H(\{s\}) = -\sum_{ij} [a_{ij} A_{ij} - b_{ij}(1 - A_{ij})] \, \delta(s_i, s_j),    (23)

where Aij is the element of the adjacency matrix,
aij , bij ≥ 0 are arbitrary coefficients, and the Kronecker
delta selects only the pairs of vertices with the same spin
value.
A popular model is obtained by setting aij = 1 − bij
and bij = γPij , where γ is a tunable parameter and Pij
a null model term, expressing the expected number of
edges running between vertices i and j under a suitable randomisation of the graph structure. The resulting
Hamiltonian is (Reichardt and Bornholdt, 2006)
H_{RB}(\{s\}) = -\sum_{ij} (A_{ij} - \gamma P_{ij}) \, \delta(s_i, s_j).    (24)

If γ = 1 and Pij = ki kj /2m, ki (kj ) being the degree
of i (j) and m the total number of graph edges, the
Hamiltonian of Eq. (24) coincides with the modularity
by Newman and Girvan [Eq. (20)], up to an irrelevant
multiplicative constant. Consequently, modularity can
be interpreted as the Hamiltonian of a spin glass as well.
By setting aij = 1 and bij = γ we obtain the Absolute Potts Model (APM) (Ronhovde and Nussinov, 2010),
whose Hamiltonian reads
H_{APM}(\{s\}) = -\sum_{ij} [A_{ij} - \gamma(1 - A_{ij})] \, \delta(s_i, s_j).    (25)

Here, there is no null model term. The models of
Eqs. (24) and (25) can be trivially extended to weighted
graphs (Traag et al., 2011). They allow one to explore the
network at different resolutions, by suitably tuning the
parameter γ. However, there usually is no information
about the community sizes, so it is not possible to decide a priori the proper value(s) of γ for a specific graph.
A common heuristic is to estimate the stability of partitions as a function of γ. It is plausible that partitions
recovered for a given γ-value will not be disrupted if γ
is varied a little. So, the whole range of γ can be split
into intervals, each interval corresponding to the most
frequent partition detected in it. Good candidates for
the unknown community structure of the system could

be the partitions found in the widest intervals of γ, as
they are likely to be more stable (or robust) than the
other partitions41 . However, the results of the algorithm
do not usually have a linear relationship with γ, hence
the width of the intervals is not necessarily correlated
with stability, as intervals of the same width but centred
at different values of γ may have rather different importance.
A good operational definition of stability is based on
the stochastic character of optimisation methods, which
typically deliver different results for the same system and
set of parameters, by changing initial conditions and/or
random seeds. If a partition is robust in a given range
of γ-values, most partitions delivered by the algorithm
will be very similar. On the other hand, if one explores
a γ-region in between two strong partitions, the algorithm will deliver the one or the other partition and the
individual replicas will be, on average, not so similar to
each other. So, by calculating the similarity S(γ) of partitions found by the method at a given resolution parameter γ (for different choices of initial conditions and
random seeds), stable communities are revealed by peaks
of S(γ) (Ronhovde and Nussinov, 2009). Since clustering in large graphs can be very noisy, peaks may not be
well resolved. Noise can be reduced by working with consensus partitions of the individual partitions returned by
the method for a given γ (Section IV.B). These manipulations are computationally costly, though. Besides, multiresolution techniques may miss relevant cluster sizes, as
it happens for multi-resolution modularity (Lancichinetti
et al., 2011) (Section IV.F).
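A sketch of such a stability scan (ours); for illustration it uses networkx's Louvain with its resolution parameter as the multi-resolution method, standing in for the Potts-model Hamiltonians above, and the fraction of co-clustered vertex pairs shared by two partitions as the similarity:

    import itertools
    import networkx as nx

    def pair_set(partition):
        # unordered vertex pairs placed in the same cluster
        return {p for c in partition for p in itertools.combinations(sorted(c), 2)}

    def stability_scan(G, gammas, n_runs=10):
        S = {}
        for gamma in gammas:
            parts = [nx.community.louvain_communities(G, resolution=gamma, seed=r)
                     for r in range(n_runs)]
            sims = []
            for a, b in itertools.combinations(parts, 2):
                pa, pb = pair_set(a), pair_set(b)
                sims.append(len(pa & pb) / len(pa | pb) if pa | pb else 1.0)
            S[gamma] = sum(sims) / len(sims)
        return S     # stable resolutions appear as peaks/plateaus of S(gamma)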

H. Dynamic clustering

Due to the increasing availability of time-stamped network data, there is currently a lot of activity on the development of methods to analyse temporal networks (Holme
and Saramäki, 2012). In particular, the problem of detecting dynamic communities has received a lot of attention (Fortunato, 2010; Spiliopoulou, 2011).
Clustering algorithms used for static graphs can be
(and often are) used for dynamic networks as well. What
needs to be established is how to handle the evolution.
Typically one can describe it as a succession of snapshots
G1 , G2 , . . . , Gl , where each snapshot Gt corresponds to
the configuration of the graph in a given time window42 .
There are two possible strategies.

41 We stress, however, that the persistence of partitions in intervals is not necessarily related to clustering robustness (Lewis et al., 2010; Onnela et al., 2012).
42 A graph configuration consists of the sets of vertices and edges that are active within the given time frame, along with the intensity of their interactions in that frame (weights) and possibly other aspects of the dynamics, like burstiness (Barabási, 2010), duration of the interactions, etc..

The simplest approach is to detect the community
structure for each individual snapshot, which is a static
graph (Asur et al., 2007; Hopcroft et al., 2004; Palla et al.,
2007). Next, pairs of communities of consecutive windows are associated. A standard procedure is finding
the cluster $C_t^i$ in window t that is most similar to cluster $C_{t+1}^j$ in window t + 1, for instance by using the Jaccard
similarity score [Eq. (6)] (Palla et al., 2007). This way
every community has an image in each phase of the network evolution and one can track its dynamics. Various
scenarios are possible. Communities may disappear at
some point and new communities may appear, following
the exclusion or the introduction of vertices and edges,
respectively. Furthermore, a cluster may fragment into
smaller ones or merge with others. However, since snapshots are handled separately, this strategy often produces
significant variations between partitions close in time, especially when the data sets are noisy, as it usually happens in applications.
It would be preferable to have a unified framework,
in which communities are inferred both from the current
structure of the graph and from the knowledge of the
community structure at previous times. An interesting
implementation of this strategy is evolutionary clustering (Chakrabarti et al., 2006). The goal of the approach
is finding a partition that is both faithful to the system
configuration at snapshot t and close to the partition derived for the previous snapshot t − 1. A cost function is
introduced, whose optimisation yields a tradeoff between
such two constraints. There is ample flexibility on how
this can be done, in practice. Many known clustering
techniques normally used for static graphs can be reformulated within this evolutionary framework. Some interesting algorithms based on evolutionary clustering have
been proposed (Chi et al., 2007; Lin et al., 2008). Mucha
et al. have also presented a method that couples the
system’s configurations of different snapshots, within a
modularity-based framework (Mucha et al., 2010). In the
resulting quality function (multislice modularity), all configurations are simultaneously taken into account and the
coupling between them is expressed by a tunable parameter. The approach can handle general multilayer networks (Boccaletti et al., 2014; Kivelä et al., 2014), where
layers are either networks whose vertices are connected
by a specific edge type (e. g., friendship, work relationships, etc., in social networks), or networks whose vertices have connections (interactions, dependencies) with
the vertices of other networks/layers. On the other hand,
since the approach is based on modularity optimisation,
it has the drawbacks exposed in Section IV.F.
Consensus clustering (Section IV.B) is a natural approach to find stable dynamic clusterings by combining
multiple snapshots. Let us suppose we have a time range
going from t0 to tm , that we want to divide into w windows of size ∆t. For the sake of stability, one should consider sliding windows, i. e., overlapping time intervals.
This way consecutive partitions will be based on system
configurations sharing a lot of vertices and edges, and

change is (typically) smooth. In order to have exactly
w frames, each of them has to be shifted by an interval
δt = (tm − t0 )/w with respect to the previous one. So we
obtain the windows [t0 , t0 + ∆t], [t0 + δt, t0 + ∆t + δt],
[t0 + 2δt, t0 + ∆t + 2δt], ..., [tm − ∆t, tm ]. The community
structure of each snapshot can be found via any reliable
static clustering technique. Next, the consensus partition
from the clusterings of r consecutive snapshots, with r
suitably chosen, is derived (Lancichinetti and Fortunato,
2012). Again, one could consider sliding windows: for instance, the first window could consist of the first r snapshots, the second one by those from 2 to r + 1, and so
on until the interval spanned by the last r snapshots. In
Fig. 28 we show an application of this procedure on the
citation network of papers published in journals of the
American Physical Society (APS).
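The bookkeeping of the frames is simple; a sketch (ours), truncating the windows at tm as in Fig. 28, where the last frames shrink because no data exist beyond the end of the record:

    def sliding_windows(t0, tm, w, width):
        # w overlapping frames [t0 + i*delta, t0 + i*delta + width],
        # shifted by delta = (tm - t0)/w and truncated at tm
        delta = (tm - t0) / w
        return [(t0 + i * delta, min(t0 + i * delta + width, tm))
                for i in range(w)]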


FIG. 28 Consensus clustering on dynamic networks. Time
evolution of clusters of the citation network of papers published in journals of the American Physical Society (APS).
The clusters with the keyword Network(s) among the top 15
most frequent words appearing in the title of the papers were
selected. Communities were detected with Infomap (Rosvall
and Bergstrom, 2008) on snapshots spanning each a window
of 5 years, except at the right end of each diagram: since
there is no data after 2008, the last windows must have 2008
as upper limit, so their size shrinks (2004 − 2008, 2005 − 2008,
2006 − 2008, 2007 − 2008). Each vertical bar represents a consensus partition combining pairs of consecutive snapshots. A
color uniquely identifies a community, the width of the links
between clusters is proportional to the number of papers they
have in common. The rapid growth of the field Complex Networks is clearly visible, as well as its later split into a number of smaller subtopics, like Community Structure, Epidemic
Spreading, Robustness, etc.. Reprinted figure with permission
from (Lancichinetti and Fortunato, 2012). © 2012, by the
Nature Publishing Group.

An alternative way to uncover the evolution of communities by accounting for the correlation between configurations of neighbouring time intervals is to use probabilistic models (Peixoto, 2015a; Peixoto and Rosvall, 2015;

Sarkar and Moore, 2005; Yang et al., 2009).
If the system is large and its structure is updated in
a streaming fashion, instead of working on snapshots one
could detect the clustering online, every time the configuration of the system varies due to new information,
like the addition of a new vertex or edge (Aggarwal and
Philip, 2005; Zanghi et al., 2008). An advantage of this
approach is that each change amounts to the effect of a small variation of the network structure on the system, so it can be tracked by simply adjusting the partition of the previous configuration, which can usually be done rather quickly.

I. Significance

Let us suppose that we have identified the communities, somehow. Are we done? Unfortunately, things are
not that simple.
In Fig. 29 (top left) we show the adjacency matrix
of the random graph à la Erdős-Rényi illustrated in
Fig. 8d. The graph has 5 000 vertices, so the matrix
is 5 000 × 5 000. Black dots indicate the existence of an
edge between the corresponding vertices, while missing
edges are represented in white. By construction, there
is no group structure. However, we can rearrange the
elements of the matrix, by reordering the vertex labels.
In Fig. 29 (top right) we see that, by doing that, one
can generate a group structure, of the assortative type,
with two blocks of equal size. If we increase the number
of blocks to three [Fig. 29 (bottom left)] and ten [Fig. 29 (bottom right)], we can make the matrix look more and
more modular. This is why many clustering techniques
detect communities in random networks as well, though
they should not. Where do the groups come from? Since
they cannot be real by construction, they must be generated by random fluctuations in the network construction
process. Random fluctuations are particularly relevant
on sparse graphs (Section III.C).
The lesson we learn from this example is that it is
not sufficient to identify groups in the network, but one
should also ask how significant, or non-random, they are.
Unfortunately, most clustering algorithms are not able
to assess the significance of their results. If the groups
are compatible with random fluctuations, they are not
proper groups and should be disregarded. The lower the
chance that they are generated by randomness, the more
confident we can be that the blocks reflect some actual
group structure. Naturally, this can be done only if one
has a reliable null model, describing how the structure
of the network at study can be randomised and allowing us to estimate how likely it is that the candidate
group structure is generated this way. The configuration model (Bollobás, 1980; Molloy and Reed, 1995) is
a popular null model in the literature. It generates all
possible configurations preserving the number of vertices
and edges of the network at study, and the degrees of its
vertices. One may compute some variables of the origi-

FIG. 29 Artificial groups in random networks. The elements of the adjacency matrix of an Erdős-Rényi random graph (top left)
can be rearranged, by suitably reshuffling the list of vertices. This procedure may produce a block structure, with the blocks
becoming increasingly more visible the smaller their size. Such groups are not real, though, but they are generated by random
fluctuations in the edge patterns among the vertices. The construction principle of the network does not give any special role
to groups of vertices, since all pairs of vertices have identical probability to be joined. Courtesy by Tiago P. Peixoto.

37
nal network, and estimate the probability that the model
reproduces them, or p-value, i. e., the fraction of model
configurations yielding values of the variables compatible with those measured on the original graph. If the
p-value is sufficiently low (5% is a standard threshold),
one concludes that the property under study cannot be generated by randomness alone. For community structure,
one can compute various properties of the clusters, e. g.,
their internal density, and compare them with the model
values. Some clustering algorithms, like OSLOM (Lancichinetti et al., 2011), are based on this principle. Along
the same lines, z-scores can be used as well [see the example of Eq. (22)]. Degree-corrected stochastic block
models (Karrer and Newman, 2011) (Section IV.E) also
include the configuration model, which corresponds to
the case without group structure43. In this case significance can be estimated by doing model selection between
the versions with and without blocks (Section IV.E).

43 Actually, it would be a variant of the configuration model, as the degree sequence of the vertices would be preserved only on average, not exactly. It is the same null model used in the standard formulation of modularity (Section IV.F).
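
For concreteness, the following sketch (illustrative code with names of our own choosing, using the configuration model implementation of the Python package networkx) estimates a p-value for the internal edge density of one detected cluster:

# Sketch: p-value for the internal edge density of a cluster under the
# configuration model (illustrative code; assumes clusters of size >= 2).
import random
import networkx as nx

def internal_density(G, nodes):
    """Fraction of the possible internal edges of `nodes` that are present."""
    n = len(nodes)
    return 2 * G.subgraph(nodes).number_of_edges() / (n * (n - 1)) if n > 1 else 0.0

def pvalue_internal_density(G, cluster, n_samples=1000, seed=0):
    """Fraction of null-model samples whose subgraph on `cluster` is at least
    as dense as the observed one (the lower, the more significant)."""
    rng = random.Random(seed)
    nodes = list(G)
    degseq = [G.degree(v) for v in nodes]
    index = {v: i for i, v in enumerate(nodes)}
    cluster_idx = [index[v] for v in cluster]  # the same vertices in the null model
    observed = internal_density(G, cluster)
    hits = 0
    for _ in range(n_samples):
        R = nx.configuration_model(degseq, seed=rng.randrange(10**9))
        R = nx.Graph(R)                         # collapse multi-edges
        R.remove_edges_from(nx.selfloop_edges(R))
        if internal_density(R, cluster_idx) >= observed:
            hits += 1
    return hits / n_samples

With the conventional threshold, a p-value below 0.05 would indicate that the cluster is internally denser than the configuration model typically yields.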
A concept closely related to significance is that of robustness. If clusters are significant, they are
resilient, to some extent, when the network structure is perturbed.
One way to assess this quantitatively is to introduce a controlled amount of noise into the system and
check how much noise it takes to disrupt the group
structure. The greater the required perturbation, the
more robust the communities. For instance, a perturbation could be rewiring a fraction of randomly chosen
edges (Karrer et al., 2008). After the network is perturbed, the community structure is derived and compared to that of the original network44. The trend
of the partition similarity shows how the group structure
responds to perturbations.

44 For reliable results, multiple configurations have to be generated for a given amount of noise, and the similarity scores have to be averaged.
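
A minimal sketch of such a test, assuming degree-preserving rewiring as the perturbation and a user-supplied routine detect returning the list of communities (all names are ours):

# Sketch: robustness of a partition under degree-preserving edge rewiring
# (illustrative code; `detect` is any function returning a list of node sets).
import networkx as nx
from sklearn.metrics import normalized_mutual_info_score

def labels(G, communities):
    """Flatten a list of vertex sets into one label per vertex (sorted order)."""
    lab = {v: c for c, nodes in enumerate(communities) for v in nodes}
    return [lab[v] for v in sorted(G)]

def robustness_curve(G, detect, fractions=(0.05, 0.1, 0.2, 0.4), trials=10):
    base = labels(G, detect(G))
    curve = {}
    for f in fractions:
        nswap = max(1, int(f * G.number_of_edges()))
        scores = []
        for _ in range(trials):  # average over noise realisations (footnote 44)
            H = G.copy()
            nx.double_edge_swap(H, nswap=nswap, max_tries=100 * nswap)
            scores.append(normalized_mutual_info_score(base, labels(H, detect(H))))
        curve[f] = sum(scores) / trials
    return curve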
A similar approach consists in sampling network configurations from a population to which the original network
is supposed to belong (bootstrapping), and comparing
the clusterings found in those configurations. Checking
how frequently subsets of vertices are clustered together
in different samples gives an index of the robustness
(significance) of their clusters (Rosvall and Bergstrom,
2010).
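
A rough sketch of one simple unweighted variant of this idea (not the exact procedure of Rosvall and Bergstrom, which resamples link weights): resample the edge list with replacement, recluster each replicate, and record how often pairs of vertices end up in the same cluster.

# Sketch: bootstrap co-clustering frequencies (a simplified unweighted
# variant; illustrative code).
import random
from itertools import combinations
from collections import Counter
import networkx as nx

def cocluster_frequencies(G, detect, n_boot=100, seed=0):
    """For each vertex pair clustered together at least once, the fraction of
    bootstrap replicates in which the pair shares a cluster."""
    rng = random.Random(seed)
    edges = list(G.edges())
    together = Counter()
    for _ in range(n_boot):
        sample = [rng.choice(edges) for _ in edges]  # resample with replacement
        H = nx.Graph(sample)
        H.add_nodes_from(G)                          # keep isolated vertices
        for community in detect(H):
            for u, v in combinations(sorted(community), 2):
                together[(u, v)] += 1
    return {pair: c / n_boot for pair, c in together.items()}

Pairs that stay together in most replicates form robust cores; pairs with low frequency sit in unstable regions of the partition.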

J. Which method then?

At the end of the day, what most people want to know
is: which method shall I use on my data? Since the clustering problem is ill-defined, there is no clear-cut answer
to it.
Popular techniques are based on similar ideas of communities, like the ones we reviewed in Sections II.B and
II.C. What makes the difference is the way clusters are
sought. The specific procedure affects the reliability of
the results (e. g., because of resolution problems) and
the time complexity of the calculation, determining the
scope of the method and constraining its applicability.
Most methods propose a universal recipe that is supposed to hold for every data set. In so doing, one neglects
the peculiarities of the network under study, which is valuable information that could orient the method towards
more reliable solutions. But algorithms are usually not
flexible enough to account for specific network features. For
instance, in some cases, there is no straightforward extension capable of handling high-level features like edge
direction or overlapping communities45.

45 Especially extensions of clustering algorithms to the case of directed graphs are not straightforward and often impossible. Spectral methods may not work, because spectra of directed graphs may be rather involved (for instance, the eigenvalues of the adjacency matrix are typically not real). Likewise, some processes on directed graphs may not reach a stationary state, like simple random walks.
Validation of algorithms, like the comparative analysis
of (Lancichinetti and Fortunato, 2009), has made it possible to
identify a set of methods that perform well on artificial
benchmarks. There are two important issues, though.
First, we do not know how well real networks are described by currently used benchmark models. Therefore, there is no guarantee that methods performing well
on benchmarks also give reliable results on real data
sets. Structural analyses like the ones discussed in Section III.E might help identify more promising benchmark models. Second, if we rely so much on current
benchmarks, which are versions of the stochastic block
model (SBM), we already know what the best method
is: a posteriori block modelling, i. e., fitting a SBM on
the data. Indeed, there are several advantages to this
approach. It is more general: it discovers not only
communities but also other types of group structure, like
disassortative groups (Fig. 8b) and core-periphery structure (Fig. 8c). It can also capture the existence of hierarchies among the clusters. Moreover, it yields much richer
results than standard clustering algorithms, as it delivers
the entire set of parameters of the most likely SBM, with
which one can construct the whole network, instead of
just grouping vertices. SBMs are very versatile as well.
They can be extended to a variety of contexts, e. g., directed networks (Peixoto, 2014), networks with weighted
edges (Aicher et al., 2014), with overlapping communities (Airoldi et al., 2008), with multiple layers (Peixoto,
2015a), with annotations (Hric et al., 2016; Newman and
Clauset, 2016). Besides, the procedure can be applied to
any network model with group structure, not necessarily SBMs. The choice between alternative models can be
done via model selection. A posteriori block modelling
is not among the fastest techniques available. Networks
with millions of vertices and edges could be investigated
this way, but very large networks remain out of reach.
Fortunately, many networks of interest can be attacked.
The biggest problem of this class of methods, i. e., the
determination of the number of clusters, seems to be solvable (Section IV.E). We recommend exploiting the power
of this approach in applications.
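
For concreteness, fitting an SBM can be as simple as the following sketch, which uses Tiago Peixoto's graph-tool (see Section V) and assumes a recent version of the library:

# Sketch: a posteriori block modelling with graph-tool.
import graph_tool.all as gt

g = gt.collection.data["football"]    # example network shipped with graph-tool
state = gt.minimize_blockmodel_dl(g)  # fit an SBM by minimum description length
b = state.get_blocks()                # inferred group membership of each vertex
print("number of groups:", state.get_B())
print("description length:", state.entropy())

Because the description length penalises model complexity, the number of groups is selected automatically rather than supplied by the user.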
Algorithms based on the optimisation of cluster quality
functions should be considered as well (Section IV.F),
because they may avoid resolution problems and explore
the network locally, which is often the only option when
the system is too large to be studied as a whole.
Algorithms based on the optimisation of partition
quality functions, like modularity maximisation, are
plagued by the problems we discussed in Section IV.F.
Nevertheless, if one knows, or discovers, the correct number of clusters q, and the optimisation is constrained to
the subset of partitions having q clusters, such algorithms
become competitive (Darst et al., 2014; Nadakuditi and
Newman, 2012).
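
For instance, with the Python package of igraph one can cut the merge sequence of an agglomerative method at a prescribed number of clusters q (illustrative snippet):

# Sketch: constraining the number of clusters to a known q with python-igraph.
import igraph as ig

g = ig.Graph.Famous("Zachary")            # Zachary karate club test graph
q = 2
dendrogram = g.community_fastgreedy()     # greedy modularity merges
clusters = dendrogram.as_clustering(n=q)  # cut the dendrogram at q clusters
print(clusters.membership, clusters.modularity)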
We also encourage the use of approaches based on dynamics (Section IV.G). In principle, the resulting clustering
depends on the specific dynamics adopted. In practice,
there often is a substantial overlap between the clusters
found with different dynamics. An important question
is whether dynamics may uncover groups that are not
recoverable from network structure alone. Differences in
the clusterings found via dynamical versus structural approaches could be due to the fact that dynamical processes are sensitive to more complex structural elements
than edges (e. g., paths, motifs) (Arenas et al., 2008a;
Benson et al., 2016; Serrour et al., 2011) (Section II.C).
However, even if that were true, dynamical approaches
could be more natural ways to handle such higher-order
structures, and to make sense of the resulting community
structure.
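
In practice, the overlap can be checked directly; here is an illustrative snippet comparing two dynamics-based methods through their igraph implementations:

# Sketch: comparing partitions found by two dynamics-based methods.
import igraph as ig

g = ig.Graph.Famous("Zachary")
walktrap = g.community_walktrap().as_clustering()  # random-walk based
infomap = g.community_infomap()                    # map equation based
print(ig.compare_communities(walktrap, infomap, method="nmi"))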
In general, however, the final word on the reliability
of a clustering algorithm is to be given by the user, and
any output is to be taken with care. Intuition and domain knowledge are indispensable elements to support or
disregard solutions.

V. SOFTWARE

In this section we provide a number of links where one can find the code of clustering algorithms and related techniques and models.

• Artificial benchmarks. Code to generate LFR benchmark graphs (Section III.A) can be found at https://sites.google.com/site/andrealancichinetti/files. The code for the dynamic benchmark by Granell et al. (Granell et al., 2015) is available at https://github.com/rkdarst/dynbench.

• Partition similarity measures. Many partition similarity measures have their own function in R, Python and MATLAB, and are easy to find. The variant of the NMI for covers proposed by Lancichinetti et al. (Lancichinetti et al., 2009) can be found at https://sites.google.com/site/andrealancichinetti/mutual, the one by Esquivel and Rosvall (Esquivel and Rosvall, 2012) at https://bitbucket.org/dsign/gecmi/wiki/Home.

• Consensus clustering. The technique proposed by Lancichinetti and Fortunato (Lancichinetti and Fortunato, 2012) to derive consensus partitions from multiple outputs of stochastic clustering algorithms can be downloaded from https://sites.google.com/site/andrealancichinetti/software.

• Spectral methods. The spectral clustering method by Krzakala et al. (Krzakala et al., 2013), based on the non-backtracking matrix (Sections IV.A and IV.C), can be downloaded here: http://lib.itp.ac.cn/html/panzhang/dea/dea.tar.gz.

• Edge clustering. The code for the edge clustering technique by Ahn et al. (Ahn et al., 2010) can be found here: http://barabasilab.neu.edu/projects/linkcommunities/. The link to the stochastic block model based on edge clustering by Ball et al. (Ball et al., 2011) is provided below.

• Methods based on statistical inference. The code to perform the inference of the degree-corrected stochastic block model46 by Karrer and Newman is available at http://www-personal.umich.edu/~mejn/dcbm/. The weighted stochastic block model by Aicher et al. (Aicher et al., 2014) can be found at http://tuvalu.santafe.edu/~aaronc/wsbm/. The code for the overlapping stochastic block model based on edge clustering by Ball et al. (Ball et al., 2011) is at http://www-personal.umich.edu/~mejn/OverlappingLinkCommunities.zip. The model combining structure and metadata by Newman and Clauset (Newman and Clauset, 2016) is coded at http://www-personal.umich.edu/~mejn/Newman_Clauset_code.zip. The program to infer the bipartite stochastic block model by Larremore et al. (Larremore et al., 2014) can be found at http://danlarremore.com/bipartiteSBM/.

46 We stress that the method is parametric, in that the number of clusters has to be provided as input. In Section IV.E we have pointed to techniques to infer the number of clusters beforehand.

The algorithms for the inference of community structure developed by Tiago Peixoto are implemented within the Python module graph-tool and can be found at https://graph-tool.skewed.de/static/doc/dev/community.html. They allow us to perform model selection of various kinds of stochastic block models: degree-corrected (Karrer and Newman, 2011), with overlapping groups (Peixoto, 2015b), and for networks with layers, with valued edges and evolving in time (Peixoto, 2015a). The hierarchical block model of (Peixoto, 2014), which searches for clusters at high resolution, is also available. All such variants can be combined at ease by selecting a suitable set of options.

The algorithms for the inference of overlapping communities via the Community-Affiliation Graph Model (AGM) (Yang and Leskovec, 2012a) and the Cluster Affiliation Model for Big Networks (BIGCLAM) (Yang and Leskovec, 2013) (Section III.E) can be found in the package http://infolab.stanford.edu/~crucis/code/agm-package.zip.

• Methods based on optimisation. There is a lot of free software for modularity optimisation. In the igraph library (http://igraph.org) there are several functions, both in the R and in the Python package: cluster_fast_greedy (R) and community_fastgreedy (Python), implementing the fast greedy optimisation by Clauset et al. (Clauset et al., 2004); cluster_leading_eigen (R) and community_leading_eigenvector (Python) for the optimisation based on the leading eigenvector of the modularity matrix (Newman, 2006); cluster_louvain (R) and community_multilevel (Python) are the implementations of the Louvain method (Blondel et al., 2008); cluster_optimal (R) and community_optimal_modularity (Python) turn the task into an integer programming problem (Brandes et al., 2008); cluster_spinglass (R) and community_spinglass (Python) optimise the multi-resolution modularity proposed by Reichardt and Bornholdt (Reichardt and Bornholdt, 2006). Some methods based on the optimisation of cluster quality functions are also available. The code for the optimisation of the local modularity by Clauset (Clauset, 2005) can be found at http://tuvalu.santafe.edu/~aaronc/shared/LocalCommunity2005_GPL.zip. The code for OSLOM is downloadable from the dedicated website http://www.oslom.org.
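
An illustrative usage example for two of the Python functions listed above:

# Sketch: modularity optimisation with python-igraph.
import igraph as ig

g = ig.Graph.Famous("Zachary")
louvain = g.community_multilevel()                 # Louvain method
greedy = g.community_fastgreedy().as_clustering()  # Clauset et al. (2004)
print("Louvain:", len(louvain), "clusters, Q =", louvain.modularity)
print("fast greedy:", len(greedy), "clusters, Q =", greedy.modularity)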

• Methods based on dynamics. Infomap (Rosvall and Bergstrom, 2008) is currently a very popular algorithm and its code can be found in various places. It has a dedicated website http://www.mapequation.org, where several extensions can be downloaded, including hierarchical community structure (Rosvall and Bergstrom, 2011), overlapping clusters (Viamontes Esquivel and Rosvall, 2011) and memory (Rosvall et al., 2014). Infomap also has its own functions in igraph, both in the R and in the Python package (cluster_infomap and community_infomap, respectively). Walktrap (Pons and Latapy, 2005), another popular method based on random walk dynamics, is available in igraph, via the functions cluster_walktrap (R) and community_walktrap (Python). The local community detection algorithms proposed in (Jeub et al., 2015) can be downloaded from http://people.maths.ox.ac.uk/jeub/code.html.

• Dynamic clustering. The code to optimise the multislice modularity by Mucha et al. (Mucha et al., 2010) is available at http://netwiki.amath.unc.edu/GenLouvain/GenLouvain. Detection of dynamic communities can be performed as well with consensus clustering (Section IV.B) and via stochastic block models (Section IV.E). Links to the related code have been provided above.

VI. OUTLOOK

As long as there are networks, there will be people
looking for communities in them. So it is of the utmost importance to have a set of reliable concepts and principles
guiding scholars towards promising solutions for network
clustering. We have presented established views of the
main aspects of the problem, and exposed the strengths
as well as the limits of popular notions and approaches.
What’s next? We believe that there will be a trend towards the development of domain-dependent algorithms,
exploiting as much as possible the information and peculiarities of network data sets. Generalist methods could still
be used to get first indications about community structure and orient the investigation in promising directions.
Some existing approaches are sufficiently flexible to accommodate various features of networks and community
structure (Section IV.J).
At the same time, we believe that it is necessary to find
accurate models of networks with community structure,
both for the purpose of designing realistic benchmark
graphs for validation, and for a more precise inference of
the groups and of their features. Investigations of real
networks at the level of subgraphs, along the lines of
those discussed in Section III.E, are instrumental to the
definition of such models.
While benchmark graphs can be improved, there is one
test that one can rely on to assess the performance of clustering algorithms: applying methods to random graphs
without group structure. We know that many popular
techniques find groups in such graphs as well, failing the
test. On a related note, it is critical to determine how
non-random the clusters detected on real networks are,
i. e., to estimate their significance (Section IV.I).
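
The test is easy to run oneself; an illustrative snippet with the Louvain implementation available in recent versions of networkx:

# Sketch: many methods report sizeable modularity even on a pure
# Erdős-Rényi graph, which has no built-in group structure.
import networkx as nx
from networkx.algorithms.community import louvain_communities, modularity

G = nx.gnm_random_graph(1000, 5000, seed=42)  # random graph, no communities
comms = louvain_communities(G, seed=42)
print(len(comms), "communities, modularity =", modularity(G, comms))

A method that assessed significance would instead flag such groups as compatible with random fluctuations.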
We stress that this exposition is by no means complete.
The emphasis is on the fundamental aspects of network
clustering and on mainstream approaches. We discussed
works and listed references which are of more immediate
relevance to the topics discussed. A number of topics
have not been dealt with. Still, we hope that this work
will help practitioners to design more and more reliable
methods and domain users to extract useful information
from their data.

Acknowledgments

We thank Alex Arenas, Florian Kimm, Tiago Peixoto,
Mason Porter and Martin Rosvall for a careful reading
of the manuscript and many valuable comments. We
gratefully acknowledge MULTIPLEX, grant No. 317532
of the European Commission.

References
Aggarwal, C. C., and S. Y. Philip, 2005, in Proc. of SIAM
Int. Conf. on Data Mining (SDM) (SIAM), pp. 56–67.
Ahn, Y.-Y., J. P. Bagrow, and S. Lehmann, 2010, Nature
466(7307), 761.
Aicher, C., A. Z. Jacobs, and A. Clauset, 2014, J. Complex
Netw. 3(2), 221.
Airoldi, E. M., D. M. Blei, S. E. Fienberg, and E. P. Xing,
2008, J. Mach. Learn. Res. 9, 1981.
Alba, R. D., 1973, J. Math. Sociol. 3, 113.
Albert, R., H. Jeong, and A.-L. Barabási, 1999, Nature 401,
130.
Angel, O., J. Friedman, and S. Hoory, 2015, Trans. Am. Math.
Soc. 367(6), 4287.
Arenas, A., A. Díaz-Guilera, and C. J. Pérez-Vicente, 2006,
Phys. Rev. Lett. 96(11), 114102.
Arenas, A., A. Fernández, S. Fortunato, and S. Gómez, 2008a,
J. Phys. A 41(22), 224001.
Arenas, A., A. Fernández, and S. Gómez, 2008b, New J. Phys.
10(5), 053039.
Asur, S., S. Parthasarathy, and D. Ucar, 2007, in KDD ’07:
Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM,
New York, NY, USA), pp. 913–921.
Ball, B., B. Karrer, and M. E. J. Newman, 2011, Phys. Rev.
E 84, 036103.
Barabási, A.-L., 2010, Bursts: the hidden patterns behind everything we do, from your e-mail to bloody crusades (Penguin).
Barber, M. J., 2007, Phys. Rev. E 76(6), 066102.
Barrat, A., M. Barthélemy, and A. Vespignani, 2008, Dynamical processes on complex networks (Cambridge University
Press, Cambridge, UK).
Baumes, J., M. K. Goldberg, M. S. Krishnamoorthy, M. M.
Ismail, and N. Preston, 2005, in IADIS AC, edited by
N. Guimaraes and P. T. Isaias (IADIS), pp. 97–104.
Baxter, R. J., 2007, Exactly solved models in statistical mechanics (Courier Corporation).
Ben-Hur, A., A. Elisseeff, and I. Guyon, 2001, in Pacific Symposium on Biocomputing, volume 7, pp. 6–17.
Benson, A. R., D. F. Gleich, and J. Leskovec, 2016, Science
353(6295), 163.
Blondel, V. D., J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, 2008, J. Stat. Mech. P10008.

Boccaletti, S., G. Bianconi, R. Criado, C. del Genio, J. Gómez-Gardeñes, M. Romance, I. Sendiña-Nadal, Z. Wang, and
M. Zanin, 2014, Phys. Rep. 544(1), 1.
Boccaletti, S., M. Ivanchenko, V. Latora, A. Pluchino, and
A. Rapisarda, 2007, Phys. Rev. E 75(4), 045102.
Bollobás, B., 1980, Eur. J. Combin. 1(4), 311.
Bomze, I. M., M. Budinich, P. M. Pardalos, and M. Pelillo,
1999, in Handbook of Combinatorial Optimization, edited
by D.-Z. Du and P. Pardalos (Kluwer Academic Publishers,
Norwell, USA), pp. 1–74.
Bothorel, C., J. D. Cruz, M. Magnani, and B. Micenkova,
2015, Netw. Sci. 3(03), 408.
Brandes, U., D. Delling, M. Gaertler, R. Görke, M. Hoefer,
Z. Nikoloski, and D. Wagner, 2008, IEEE Trans. Knowl.
Data Eng. 20(2), 172.
Brennan, R. L., and R. J. Light, 1974, Br. J. Math. Stat.
Psychol. 27(2), 154.
Brin, S., and L. E. Page, 1998, Comput. Netw. ISDN 30, 107.
Bron, C., and J. Kerbosch, 1973, Commun. ACM 16, 575.
Bruno, A. M., W. N. Frost, and M. D. Humphries, 2015,
Neuron 86(1), 304 .
Bui, T. N., S. Chaudhuri, F. T. Leighton, and M. Sipser, 1987,
Combinatorica 7(2), 171.
Caldarelli, G., 2007, Scale-free networks (Oxford University
Press, Oxford, UK).
Chakrabarti, D., R. Kumar, and A. Tomkins, 2006, in
KDD ’06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM, New York, NY, USA), pp. 554–560.
Chakraborty, T., A. Dalmia, A. Mukherjee, and N. Ganguly,
2016, preprint arXiv:1604.03512 .
Chi, Y., X. Song, D. Zhou, K. Hino, and B. L. Tseng, 2007,
in KDD ’07: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining (ACM, New York, NY, USA), pp. 153–162.
Clauset, A., 2005, Phys. Rev. E 72(2), 026132.
Clauset, A., M. E. J. Newman, and C. Moore, 2004, Phys.
Rev. E 70(6), 066111.
Cohen, R., and S. Havlin, 2010, Complex Networks: Structure, Robustness and Function (Cambridge University
Press, Cambridge, UK).
Collins, L. M., and C. W. Dent, 1988, Multivar. Behav. Res.
23(2), 231.
Côme, E., and P. Latouche, 2015, Stat. Modelling 15(6), 564.
Condon, A., and R. M. Karp, 2001, Random Struct. Algor.
18, 116.
Coscia, M., F. Giannotti, and D. Pedreschi, 2011, Stat. Anal.
Data Min. 4(5), 512.
Danon, L., A. Díaz-Guilera, J. Duch, and A. Arenas, 2005, J.
Stat. Mech. P09008.
Danon, L., J. Duch, A. Arenas, and A. Díaz-Guilera, 2007,
in Large Scale Structure and Dynamics of Complex Networks: From Information Technology to Finance and Natural Science, edited by G. Caldarelli and A. Vespignani (World Scientific,
Singapore), pp. 93–114.
Darst, R. K., Z. Nussinov, and S. Fortunato, 2014, Phys. Rev.
E 89(3), 032809.
Daudin, J.-J., F. Picard, and S. Robin, 2008, Stat. Comput.
18(2), 173.
Decelle, A., F. Krzakala, C. Moore, and L. Zdeborová, 2011,
Phys. Rev. Lett. 107, 065701.
Delling, D., M. Gaertler, R. Görke, and D. Wagner, 2006,
Experiments on comparing graph clusterings, Technical Report, Universität Karlsruhe, Germany.
Delvenne, J.-C., S. N. Yaliraki, and M. Barahona, 2010, Proc.
Natl. Acad. Sci. USA 107(29), 12755.
Dorogovtsev, S. N., and J. F. F. Mendes, 2013, Evolution of
networks: From biological nets to the Internet and WWW
(Oxford University Press).
Dyer, M. E., and A. M. Frieze, 1989, J. Algorithms 10(4),
451.
Erdős, P., and A. Rényi, 1959, Publ. Math. Debrecen 6, 290.
Erdős, P., and A. Rényi, 1960, Publ. Math. Inst. Hungar.
Acad. Sci. 5, 17.
Esquivel, A. V., and M. Rosvall, 2012, eprint arXiv:1202.0425.
Estrada, E., 2011, The structure of complex networks: theory
and applications (Oxford University Press, UK).
Estrada, E., and P. A. Knight, 2015, A First Course in Network Theory (Oxford University Press, UK).
Evans, T. S., 2010, J. Stat. Mech. Theor. Exp. 2010(12),
P12037.
Evans, T. S., and R. Lambiotte, 2009, Phys. Rev. E 80(1),
016105.
Expert, P., T. S. Evans, V. D. Blondel, and R. Lambiotte,
2011, Proc. Natl. Acad. Sci. USA 108(19), 7663.
Fienberg, S. E., and S. Wasserman, 1981, Sociol. Methodol.
12, 156.
Fortunato, S., 2010, Phys. Rep. 486, 75.
Fortunato, S., and M. Barthélemy, 2007, Proc. Natl. Acad.
Sci. USA 104, 36.
Fred, A., and A. K. Jain, 2003, in Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer
Society Conference on (IEEE), volume 2, pp. II–128.
Gelman, A., J. B. Carlin, H. S. Stern, and D. B. Rubin, 2014,
Bayesian Data Analysis, volume 2 (Taylor & Francis).
Girvan, M., and M. E. Newman, 2002, Proc. Natl. Acad. Sci.
USA 99(12), 7821.
Goder, A., and V. Filkov, 2008, in ALENEX, pp. 109–117.
Good, B. H., Y.-A. de Montjoye, and A. Clauset, 2010, Phys.
Rev. E 81(4), 046106.
Granell, C., R. K. Darst, A. Arenas, S. Fortunato, and
S. Gómez, 2015, Phys. Rev. E 92(1), 012805.
Granell, C., S. Gómez, and A. Arenas, 2012, Int. J. Bifurcat.
Chaos 22(07), 1250171.
Grünwald, P. D., I. J. Myung, and M. A. Pitt, 2005, Advances
in Minimum Description Length: Theory and Applications
(MIT Press, Cambridge, USA).
Guimerà, R., and L. A. N. Amaral, 2005, Nature 433, 895.
Guimerà, R., and M. Sales-Pardo, 2009, Proc. Natl. Acad.
Sci. USA 106(52), 22073.
Guimerà, R., M. Sales-Pardo, and L. A. Amaral, 2004, Phys.
Rev. E 70(2), 025101 (R).
Handcock, M. S., A. E. Raftery, and J. M. Tantrum, 2007, J.
Roy. Stat. Soc. A 170(46), 1.
Hastings, M. B., 2006, Phys. Rev. E 74(3), 035102.
Holland, P., K. B. Laskey, and S. Leinhardt, 1983, Soc. Netw.
5, 109.
Holme, P., and J. Saramäki, 2012, Phys. Rep. 519(3), 97.
Hopcroft, J., O. Khan, B. Kulis, and B. Selman, 2004, Proc.
Natl. Acad. Sci. USA 101, 5249.
Hric, D., R. K. Darst, and S. Fortunato, 2014, Phys. Rev. E
90, 062805.
Hric, D., T. P. Peixoto, and S. Fortunato, 2016, Phys. Rev.
X 6, 031038.
Hu, Y., H. Chen, P. Zhang, M. Li, Z. Di, and Y. Fan, 2008,
Phys. Rev. E 78(2), 026121.
Huang, J., H. Sun, Y. Liu, Q. Song, and T. Weninger, 2011,
PLoS ONE 6(8), e23829.

Hubert, L., and P. Arabie, 1985, J. Classif. 2(1), 193.
Hüllermeier, E., and M. Rifqi, 2009, in Joint 2009 International Fuzzy Systems Association World Congress and 2009
European Society of Fuzzy Logic and Technology Conference, IFSA-EUSFLAT 2009, pp. 1294–1298.
Jaccard, P., 1901, Bull. Soc. Vaud. Sci. Nat. 37, 547.
Jain, A. K., M. N. Murty, and P. J. Flynn, 1999, ACM Comput. Surv. 31(3), 264.
Jeub, L. G., P. Balachandran, M. A. Porter, P. J. Mucha, and
M. W. Mahoney, 2015, Phys. Rev. E 91(1), 012821.
Karrer, B., E. Levina, and M. E. J. Newman, 2008, Phys.
Rev. E 77(4), 046119.
Karrer, B., and M. E. J. Newman, 2011, Phys. Rev. E 83,
016107.
Kivelä, M., A. Arenas, M. Barthelemy, J. P. Gleeson,
Y. Moreno, and M. A. Porter, 2014, J. Complex Netw.
2(3), 203.
Krzakala, F., C. Moore, E. Mossel, J. Neeman, A. Sly, L. Zdeborová, and P. Zhang, 2013, Proc. Natl. Acad. Sci. USA
110(52), 20935.
Lambiotte, R., J.-C. Delvenne, and M. Barahona, 2008, eprint
arXiv:0812.1770.
Lancichinetti, A., and S. Fortunato, 2009, Phys. Rev. E 80(1),
016118.
Lancichinetti, A., and S. Fortunato, 2009, Phys. Rev. E 80(5),
056117.
Lancichinetti, A., and S. Fortunato, 2011, Phys. Rev. E 84,
066122.
Lancichinetti, A., and S. Fortunato, 2012, Sci. Rep. 2, 336.
Lancichinetti, A., and S. Fortunato, 2014, Phys. Rev. E 89,
049902.
Lancichinetti, A., S. Fortunato, and J. Kertesz, 2009, New J.
Phys. 11(3), 033015.
Lancichinetti, A., S. Fortunato, and F. Radicchi, 2008, Phys.
Rev. E 78(4), 046110.
Lancichinetti, A., M. Kivelä, J. Saramäki, and S. Fortunato,
2010, PLoS ONE 5(8), e11976.
Lancichinetti, A., F. Radicchi, J. J. Ramasco, and S. Fortunato, 2011, PLoS ONE 6(4), e18961.
Larremore, D. B., A. Clauset, and A. Z. Jacobs, 2014, Phys.
Rev. E 90(1), 012805.
Latouche, P., E. Birmele, and C. Ambroise, 2012, Stat. Modelling 12(1), 93.
Leng, M., Y. Yao, J. Cheng, W. Lv, and X. Chen, 2013,
in Database Systems for Advanced Applications (Springer),
pp. 324–338.
Leskovec, J., K. J. Lang, A. Dasgupta, and M. W. Mahoney,
2009, Internet Math. 6(1), 29.
Lewis, A., N. Jones, M. Porter, and C. Deane, 2010, BMC
Syst. Biol. 4(1), 100.
Liben-Nowell, D., J. Novak, R. Kumar, P. Raghavan, and
A. Tomkins, 2005, Proc. Natl. Acad. Sci. USA 102(33),
11623.
Lin, Y.-R., Y. Chi, S. Zhu, H. Sundaram, and B. L. Tseng,
2008, in WWW ’08: Proceeding of the 17th International
Conference on World Wide Web (ACM, New York, NY,
USA), pp. 685–694.
Luccio, F., and M. Sami, 1969, IEEE Trans. Circuit Th. CT
16, 184.
Luce, R. D., 1950, Psychometrika 15(2), 169.
Luce, R. D., and A. D. Perry, 1949, Psychometrika 14(2), 95.
Lusseau, D., 2003, Proc. Royal Soc. London B 270, S186.
von Luxburg, U., 2006, A tutorial on spectral clustering, Technical Report 149, Max Planck Institute for Biological Cybernetics.
Mackay, D. J. C., 2003, Information Theory, Inference, and
Learning Algorithms (Cambridge University Press, Cambridge, UK).
MacMahon, M., and D. Garlaschelli, 2015, Phys. Rev. X 5,
021006.
MacQueen, J. B., 1967, in Proc. of the fifth Berkeley Symposium on Mathematical Statistics and Probability, edited
by L. M. L. Cam and J. Neyman (University of California
Press, Berkeley, USA), volume 1, pp. 281–297.
Malliaros, F. D., and M. Vazirgiannis, 2013, Phys. Rep.
533(4), 95.
McDaid, A. F., D. Greene, and N. Hurley, 2011, eprint
arXiv:1110.2515.
Meilă, M., 2005, in Proceedings of the 22nd International Conference on Machine Learning (ACM), pp. 577–584.
Meilă, M., 2007, J. Multivar. Anal. 98(5), 873.
Meilă, M., and D. Heckerman, 2001, Mach. Learn. 42(1), 9.
Mezard, M., G. Parisi, and M. Virasoro, 1987, Spin glass
theory and beyond (World Scientific Publishing Company,
Singapore).
Mokken, R. J., 1979, Qual. Quant. 13(2), 161.
Molloy, M., and B. Reed, 1995, Random Struct. Algor. 6, 161.
Moody, J., and D. R. White, 2003, Am. Sociol. Rev. 68(1),
103.
Moore, C., X. Yan, Y. Zhu, J.-B. Rouquier, and T. Lane,
2011, in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM, New York, NY, USA), KDD ’11, pp. 841–849.
Mucha, P. J., T. Richardson, K. Macon, M. A. Porter, and
J. P. Onnela, 2010, Science 328(5980), 876.
Nadakuditi, R. R., and M. E. J. Newman, 2012, Phys. Rev.
Lett. 108, 188701.
Newman, M., 2010, Networks: An Introduction (Oxford University Press, Inc., New York, NY, USA).
Newman, M., 2013, eprint arXiv:1308.6494.
Newman, M., 2016, eprint arXiv:1606.02319.
Newman, M., and A. Clauset, 2016, Nat. Commun. 7, 11863.
Newman, M. E. J., 2004, Phys. Rev. E 70(5), 056131.
Newman, M. E. J., 2004a, Eur. Phys. J. B 38, 321.
Newman, M. E. J., 2004b, Phys. Rev. E 69(6), 066133.
Newman, M. E. J., 2006, Proc. Natl. Acad. Sci. USA 103,
8577.
Newman, M. E. J., 2012, Nat. Phys. 8(1), 25.
Newman, M. E. J., and M. Girvan, 2004, Phys. Rev. E 69(2),
026113.
Newman, M. E. J., and E. A. Leicht, 2007, Proc. Natl. Acad.
Sci. USA 104, 9564.
Newman, M. E. J., and G. Reinert, 2016, Phys. Rev. Lett.
117, 078301.
Onnela, J.-P., D. J. Fenn, S. Reid, M. A. Porter, P. J. Mucha,
M. D. Fricker, and N. S. Jones, 2012, Phys. Rev. E 86,
036104.
Palla, G., A.-L. Barabási, and T. Vicsek, 2007, Nature 446,
664.
Palla, G., I. Derényi, I. Farkas, and T. Vicsek, 2005, Nature
435, 814.
Parthasarathy, S., Y. Ruan, and V. Satuluri, 2011, in Social
Network Data Analytics (Springer), pp. 79–113.
Peel, L., 2015, J. Complex Netw. 3(3), 431.
Peixoto, T. P., 2013, Phys. Rev. Lett. 110, 148701.
Peixoto, T. P., 2014, Phys. Rev. X 4, 011047.
Peixoto, T. P., 2015a, Phys. Rev. E 92(4), 042807.
Peixoto, T. P., 2015b, Phys. Rev. X 5, 011033.

Peixoto, T. P., and M. Rosvall, 2015, eprint arXiv:1509.04740.
Perotti, J. I., C. J. Tessone, and G. Caldarelli, 2015, Phys.
Rev. E 92, 062825.
Persson, C., L. Bohlin, D. Edler, and M. Rosvall, 2016, eprint
arXiv:1606.08328.
Pons, P., and M. Latapy, 2005, in International Symposium on
Computer and Information Sciences (Springer), pp. 284–
293.
Porter, M. A., J.-P. Onnela, and P. J. Mucha, 2009, Notices
Amer. Math. Soc. 56(9), 1082.
Radicchi, F., C. Castellano, F. Cecconi, V. Loreto, and
D. Parisi, 2004, Proc. Natl. Acad. Sci. USA 101, 2658.
Raghavan, U. N., R. Albert, and S. Kumara, 2007, Phys. Rev.
E 76(3), 036106.
Rand, W. M., 1971, J. Am. Stat. Assoc. 66(336), 846.
Reichardt, J., and S. Bornholdt, 2006, Phys. Rev. E 74(1),
016110.
Rissanen, J., 1978, Automatica 14, 465.
Ronhovde, P., and Z. Nussinov, 2009, Phys. Rev. E 80(1),
016109.
Ronhovde, P., and Z. Nussinov, 2010, Phys. Rev. E 81,
046114.
Rosvall, M., and C. T. Bergstrom, 2008, Proc. Natl. Acad.
Sci. USA 105, 1118.
Rosvall, M., and C. T. Bergstrom, 2010, PLoS one 5(1),
e8694.
Rosvall, M., and C. T. Bergstrom, 2011, PLoS ONE 6(4),
e18209.
Rosvall, M., A. V. Esquivel, A. Lancichinetti, J. D. West, and
R. Lambiotte, 2014, Nat. Commun. 5.
Sarkar, P., and A. W. Moore, 2005, ACM SIGKDD Explor.
Newsl. 7(2), 31.
Sarkar, S., S. Chawla, P. A. Robinson, and S. Fortunato, 2016,
Phys. Rev. E 93, 062312.
Schaeffer, S. E., 2007, Comput. Sci. Rev. 1(1), 27.
Scott, J., 2000, Social Network Analysis: A Handbook (SAGE
Publications, London, UK).
Seidman, S. B., and B. L. Foster, 1978, J. Math. Sociol. 6,
139.
Serrour, B., A. Arenas, and S. Gómez, 2011, Comput. Commun. 34(5), 629 .
Simon, H., 1962, Proc. Am. Phil. Soc. 106(6), 467.
Singh, A., and M. D. Humphries, 2015, Scientific reports 5,
8828.
Snijders, T., and K. Nowicki, 1997, J. Classif. 14, 75.
Spiliopoulou, M., 2011, in Social Network Data Analytics,
edited by C. C. Aggarwal (Springer US), pp. 149–175.
Strehl, A., and J. Ghosh, 2002, J. Mach. Learn. Res. 3, 583,
ISSN 1532-4435.
Topchy, A., A. K. Jain, and W. Punch, 2005, IEEE Trans.
Pattern Anal. Mach. Intell. 27, 1866.
Traag, V. A., and J. Bruggeman, 2009, Phys. Rev. E 80(3),
036115.
Traag, V. A., P. Van Dooren, and Y. Nesterov, 2011, Phys.
Rev. E 84, 016114.
Traud, A., E. Kelsic, P. Mucha, and M. Porter, 2011, SIAM
Review 53(3), 526.
Traud, A. L., P. J. Mucha, and M. A. Porter, 2012, Physica
A 391(16), 4165.
Van Dongen, S., 2000, Graph Clustering by Flow Simulation, Ph.D. thesis, Dutch National Research Institute for
Mathematics and Computer Science, University of Utrecht,
Netherlands.
Viamontes Esquivel, A., and M. Rosvall, 2011, Phys. Rev. X 1, 021025.
Wasserman, S., and K. Faust, 1994, Social network analysis
(Cambridge University Press, Cambridge, UK).
Xie, J., S. Kelley, and B. K. Szymanski, 2013, ACM Comput.
Surv. 45(4), 43:1.
Xie, J., and B. K. Szymanski, 2012, in Proceedings of the
16th Pacific-Asia Conference on Advances in Knowledge
Discovery and Data Mining - Volume Part II (Springer-Verlag, Berlin, Heidelberg), PAKDD'12, pp. 25–36.
Xu, R., and D. Wunsch, 2008, Clustering (John Wiley & Sons,
Piscataway, NJ, USA).
Yang, J., and J. Leskovec, 2012a, in Data Mining (ICDM),
2012 IEEE 12th International Conference on (IEEE), pp.
1170–1175.
Yang, J., and J. Leskovec, 2012b, in Proceedings of the ACM
SIGKDD Workshop on Mining Data Semantics (ACM,
New York, NY, USA), MDS’12, pp. 3:1–3:8.
Yang, J., and J. Leskovec, 2013, in Proceedings of the Sixth
ACM International Conference on Web Search and Data
Mining (ACM, New York, NY, USA), WSDM’13, pp. 587–
596.
Yang, J., and J. Leskovec, 2014, ACM Trans. Intell. Syst. Technol. 5(2), 26:1.
Yang, J., J. McAuley, and J. Leskovec, 2013, in 2013 IEEE
13th International Conference on Data mining (ICDM)
(IEEE), pp. 1151–1156.
Yang, T., Y. Chi, S. Zhu, Y. Gong, and R. Jin, 2009, in SIAM
Int. Conf. on Data Mining (SDM) (SIAM), volume 9, pp.
990–1001.
Zachary, W. W., 1977, J. Anthropol. Res. 33, 452.
Zanghi, H., C. Ambroise, and V. Miele, 2008, Pattern Recogn.
41(12), 3592.
Zhang, P., 2015, J. Stat. Mech. Theor. Exp. P11006.
Zhang, P., and C. Moore, 2014, Proc. Natl. Acad. Sci.
111(51), 18144.
Zhang, P., C. Moore, and M. Newman, 2016, Phys. Rev. E
93(1), 012303.
Zhang, X., T. Martin, and M. E. Newman, 2015, Phys. Rev.
E 91(3), 032803.
Zhou, H., 2003a, Phys. Rev. E 67(6), 061901.
Zhou, H., 2003b, Phys. Rev. E 67(4), 041908.
Zhou, H., and R. Lipowsky, 2004, Lect. Notes Comp. Sci.
3038, 1062.


