Networks can be directed, undirected, or weighted.
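As a small illustration of these three kinds of network, the sketch below turns one toy tie list into a directed and an undirected igraph network. The names a, b, c and the weights are made up for this example and are not taken from the dataset; the sketch assumes the igraph package is installed.

```r
library(igraph)

# Hypothetical toy tie list: three people and three weighted ties
toy_ties <- data.frame(
  from   = c("a", "a", "b"),
  to     = c("b", "c", "c"),
  weight = c(1, 3, 2)
)

# The same ties read as a directed and as an undirected network
g_directed   <- graph_from_data_frame(toy_ties, directed = TRUE)
g_undirected <- graph_from_data_frame(toy_ties, directed = FALSE)

is_directed(g_directed)     # ties have a direction
is_directed(g_undirected)   # ties have no direction
is_weighted(g_undirected)   # the "weight" column makes the network weighted
```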
Explore the dataset
In this first exercise, you will explore the Madrid train bombing dataset. This dataset consists of two parts: the nodes (also called vertices), which refer to people, and the ties (also called edges), which refer to the relationships between these people. You will use the package readr to read the nodes and ties from CSV files into variables in R. For your convenience, the package readr is already loaded into the workspace.
library(readr)
# Read the nodes file into the variable nodes
nodes <- read_csv("_data/nodes.csv")
##
## -- Column specification --------------------------------------------------------
## cols(
## id = col_double(),
## name = col_character()
## )
# Read the ties file into the variable ties
ties <- read_csv("_data/ties.csv")
##
## -- Column specification --------------------------------------------------------
## cols(
## from = col_double(),
## to = col_double(),
## weight = col_double()
## )
# Print nodes
nodes
## # A tibble: 64 x 2
## id name
## <dbl> <chr>
## 1 1 Jamal Zougam
## 2 2 Mohamed Bekkali
## 3 3 Mohamed Chaoui
## 4 4 Vinay Kholy
## 5 5 Suresh Kumar
## 6 6 Mohamed Chedadi
## 7 7 Imad Eddin Barakat
## 8 8 Abdelaziz Benyaich
## 9 9 Abu Abderrahame
## 10 10 Omar Dhegayes
## # ... with 54 more rows
# Print ties
ties
## # A tibble: 243 x 3
## from to weight
## <dbl> <dbl> <dbl>
## 1 1 2 1
## 2 1 3 3
## 3 1 4 1
## 4 1 5 1
## 5 1 6 1
## 6 1 7 4
## 7 1 8 1
## 8 1 9 1
## 9 1 11 4
## 10 1 12 1
## # ... with 233 more rows
Good start! Notice that there are more ties than nodes.
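Before building a network from two such tables, it can help to check that they agree with each other. The sketch below is one possible sanity check, written in base R on made-up toy data with the same column layout as nodes and ties. The object names toy_nodes, toy_ties, unknown_ids and isolated_ids are hypothetical.

```r
# Hypothetical toy data with the same column layout as nodes and ties
toy_nodes <- data.frame(id = 1:4, name = c("Ann", "Bob", "Cat", "Dan"))
toy_ties  <- data.frame(from = c(1, 1, 2), to = c(2, 3, 4), weight = c(1, 3, 1))

# Every id that is used as a tie endpoint
endpoints <- unique(c(toy_ties$from, toy_ties$to))

# Endpoints with no matching row in the nodes table (ideally none)
unknown_ids <- setdiff(endpoints, toy_nodes$id)
length(unknown_ids)

# Nodes that take part in no tie at all (isolated nodes)
isolated_ids <- setdiff(toy_nodes$id, endpoints)
length(isolated_ids)
```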
Build and explore the network (part 1)
In this exercise, you are going to begin using the igraph package. This package lets you analyze data that are represented as networks, which mathematicians also call graphs. In particular, you will learn how to build a network from a data frame and explore the nodes and ties of the network.
For your convenience, the package igraph and the data frames nodes and ties are already loaded into the workspace.
library(igraph)
##
## Attaching package: 'igraph'
## The following objects are masked from 'package:stats':
##
## decompose, spectrum
## The following object is masked from 'package:base':
##
## union
# Make the network from the data frame ties and print it
g <- graph_from_data_frame(ties, directed = FALSE, vertices = nodes)
g
## IGRAPH 1ac9149 UNW- 64 243 --
## + attr: name (v/c), weight (e/n)
## + edges from 1ac9149 (vertex names):
## [1] Jamal Zougam--Mohamed Bekkali Jamal Zougam--Mohamed Chaoui
## [3] Jamal Zougam--Vinay Kholy Jamal Zougam--Suresh Kumar
## [5] Jamal Zougam--Mohamed Chedadi Jamal Zougam--Imad Eddin Barakat
## [7] Jamal Zougam--Abdelaziz Benyaich Jamal Zougam--Abu Abderrahame
## [9] Jamal Zougam--Amer Azizi Jamal Zougam--Abu Musad Alsakaoui
## [11] Jamal Zougam--Mohamed Atta Jamal Zougam--Ramzi Binalshibh
## [13] Jamal Zougam--Mohamed Belfatmi Jamal Zougam--Said Bahaji
## [15] Jamal Zougam--Galeb Kalaje Jamal Zougam--Abderrahim Zbakh
## + ... omitted several edges
# Explore the set of nodes
V(g)
## + 64/64 vertices, named, from 1ac9149:
## [1] Jamal Zougam Mohamed Bekkali Mohamed Chaoui
## [4] Vinay Kholy Suresh Kumar Mohamed Chedadi
## [7] Imad Eddin Barakat Abdelaziz Benyaich Abu Abderrahame
## [10] Omar Dhegayes Amer Azizi Abu Musad Alsakaoui
## [13] Mohamed Atta Ramzi Binalshibh Mohamed Belfatmi
## [16] Said Bahaji Galeb Kalaje Abderrahim Zbakh
## [19] Farid Oulad Ali José Emilio Suárez Khalid Ouled Akcha
## [22] Rafa Zuher Naima Oulad Akcha Abdelkarim el Mejjati
## [25] Anwar Adnan Ahmad Basel Ghayoun S B Abdelmajid Fakhet
## [28] Jamal Ahmidan Said Ahmidan Hamid Ahmidan
## + ... omitted several vertices
# Print the number of nodes
vcount(g)
## [1] 64
# Explore the set of ties
E(g)
## + 243/243 edges from 1ac9149 (vertex names):
## [1] Jamal Zougam--Mohamed Bekkali Jamal Zougam--Mohamed Chaoui
## [3] Jamal Zougam--Vinay Kholy Jamal Zougam--Suresh Kumar
## [5] Jamal Zougam--Mohamed Chedadi Jamal Zougam--Imad Eddin Barakat
## [7] Jamal Zougam--Abdelaziz Benyaich Jamal Zougam--Abu Abderrahame
## [9] Jamal Zougam--Amer Azizi Jamal Zougam--Abu Musad Alsakaoui
## [11] Jamal Zougam--Mohamed Atta Jamal Zougam--Ramzi Binalshibh
## [13] Jamal Zougam--Mohamed Belfatmi Jamal Zougam--Said Bahaji
## [15] Jamal Zougam--Galeb Kalaje Jamal Zougam--Abderrahim Zbakh
## [17] Jamal Zougam--Naima Oulad Akcha Jamal Zougam--Abdelkarim el Mejjati
## [19] Jamal Zougam--Basel Ghayoun Jamal Zougam--S B Abdelmajid Fakhet
## + ... omitted several edges
# Print the number of ties
ecount(g)
## [1] 243
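The node and tie counts of the network should match the row counts of the two data frames it was built from. The sketch below checks this on a made-up toy example (toy_nodes, toy_ties and toy_g are hypothetical names). It also suggests why passing the vertices argument can matter: a node without any tie is still kept in the network.

```r
library(igraph)

# Hypothetical toy data: node 4 has no ties
toy_nodes <- data.frame(id = 1:4, label = c("Ann", "Bob", "Cat", "Dan"))
toy_ties  <- data.frame(from = c(1, 1, 2), to = c(2, 3, 3))

toy_g <- graph_from_data_frame(toy_ties, directed = FALSE, vertices = toy_nodes)

# One vertex per row of the nodes table, one edge per row of the ties table
vcount(toy_g) == nrow(toy_nodes)
ecount(toy_g) == nrow(toy_ties)

# Node 4 has degree zero but is still part of the network
degree(toy_g)
```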
Build and explore the network (part 2)
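A network built using igraph can have attributes. These include attributes of the network itself (such as its name), of the nodes (such as id), and of the ties (such as weight).
In this exercise, we will explore all these types of attributes.
igraph and the variable g containing the network are already loaded into the workspace.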
# Give the name "Madrid network" to the network and print the network `name` attribute
g$name <- "Madrid network"
g$name
## [1] "Madrid network"
# Add node attribute id and print the node `id` attribute
V(g)$id <- seq_len(vcount(g))
V(g)$id
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
## [51] 51 52 53 54 55 56 57 58 59 60 61 62 63 64
# Print the tie `weight` attribute
E(g)$weight
## [1] 1 3 1 1 1 4 1 1 4 1 1 2 2 2 2 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 3 1 1 2 1
## [38] 1 1 1 1 2 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [75] 1 1 1 1 3 1 1 1 2 2 3 1 1 1 1 1 1 1 2 2 1 1 1 2 1 1 1 1 1 2 1 2 1 1 1 1 1
## [112] 1 1 1 2 1 1 1 1 1 1 1 2 2 1 1 3 2 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [149] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [186] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [223] 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
# Print the network and spot attributes
g
## IGRAPH 1ac9149 UNW- 64 243 -- Madrid network
## + attr: name (g/c), name (v/c), id (v/n), weight (e/n)
## + edges from 1ac9149 (vertex names):
## [1] Jamal Zougam--Mohamed Bekkali Jamal Zougam--Mohamed Chaoui
## [3] Jamal Zougam--Vinay Kholy Jamal Zougam--Suresh Kumar
## [5] Jamal Zougam--Mohamed Chedadi Jamal Zougam--Imad Eddin Barakat
## [7] Jamal Zougam--Abdelaziz Benyaich Jamal Zougam--Abu Abderrahame
## [9] Jamal Zougam--Amer Azizi Jamal Zougam--Abu Musad Alsakaoui
## [11] Jamal Zougam--Mohamed Atta Jamal Zougam--Ramzi Binalshibh
## [13] Jamal Zougam--Mohamed Belfatmi Jamal Zougam--Said Bahaji
## [15] Jamal Zougam--Galeb Kalaje Jamal Zougam--Abderrahim Zbakh
## + ... omitted several edges
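Besides reading the attr line of the printed network, the attribute names can also be listed programmatically. The following sketch uses a made-up toy network (toy_ties and toy_g are hypothetical names) and the igraph helpers graph_attr_names(), vertex_attr_names() and edge_attr_names().

```r
library(igraph)

# Hypothetical toy network with a weight on every tie
toy_ties <- data.frame(
  from   = c("a", "a", "b"),
  to     = c("b", "c", "c"),
  weight = c(1, 3, 2)
)
toy_g <- graph_from_data_frame(toy_ties, directed = FALSE)

# Add a network attribute and a node attribute
toy_g$name  <- "Toy network"
V(toy_g)$id <- seq_len(vcount(toy_g))

graph_attr_names(toy_g)    # attributes of the network itself
vertex_attr_names(toy_g)   # attributes of the nodes
edge_attr_names(toy_g)     # attributes of the ties
```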
Visualize the network (part 1)
Throughout this course, you’ll use the ggraph package. This extends ggplot2 with new geometries to visualize the nodes and ties of a network.
Geometries for nodes have names starting with geom_node_. For example, geom_node_point() draws each node as a point, and geom_node_text() draws a text label on each node.
Geometries for ties have names starting with geom_edge_. For example, geom_edge_link() draws each edge as a straight line between nodes.
How networks are laid out in a plot to make them more readable is not an exact science. There are many algorithms, and you may need to try several of them. In this exercise, you’ll use the Kamada-Kawai layout, which you specify by setting the layout argument to "with_kk". The possible layout values are not currently well documented; the easiest way to see a list is to run ggraph:::igraphlayouts.
For your convenience, ggraph is already loaded, the graph theme is set with the function set_graph_style(), and the network g is at your disposal.
library(ggplot2)
library(ggraph)
# Visualize the network with the Kamada-Kawai layout
ggraph(g, layout = "with_kk") +
# Add an edge link geometry mapping transparency to weight
geom_edge_link(aes(alpha = weight)) +
# Add a node point geometry
geom_node_point()
ggraph(g, layout = "with_kk") +
geom_edge_link(aes(alpha = weight)) +
geom_node_point() +
# Add a node text geometry, mapping label to id and repelling
geom_node_text(aes(label = id), repel = TRUE)
The network has a typical core-periphery structure, with a densely knitted center and a sparser periphery around it.
Visualize the network (part 2)
In the previous exercise, we used a force-directed layout (the Kamada-Kawai layout) to visualize the nodes and ties. In other words, it placed tied nodes at roughly equal distances, so that all ties had roughly the same length.
In this exercise, we will use two alternative layouts: a circular layout ("in_circle"), which places the nodes on a circle, and a grid layout ("on_grid"), which places them on a grid.
For your convenience, the variable g containing the network is at your disposal.
# Visualize the network in a circular layout
ggraph(g, layout = "in_circle") +
# Map tie transparency to its weight
geom_edge_link(aes(alpha = weight)) +
geom_node_point()
# Change the layout so points are on a grid
ggraph(g, layout = "on_grid") +
geom_edge_link(aes(alpha = weight)) +
geom_node_point()
A network is unique, but it can be displayed in many different ways!
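A layout is ultimately just a table of x and y coordinates, one row per node. The sketch below computes such coordinates directly with igraph, on the assumption that the ggraph layout names "in_circle" and "on_grid" correspond to the igraph functions layout_in_circle() and layout_on_grid(). The ring network toy_g is a made-up example.

```r
library(igraph)

# Hypothetical toy network: six nodes connected in a ring
toy_g <- make_ring(6)

# Each layout returns a matrix with one row per node and two columns (x, y)
circle_xy <- layout_in_circle(toy_g)
grid_xy   <- layout_on_grid(toy_g)

dim(circle_xy)
dim(grid_xy)

# In the circular layout, every node lies at distance 1 from the origin
sqrt(rowSums(circle_xy^2))
```

The network itself is the same in both cases; only the coordinates handed to the plotting code differ.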
Network of natural gas pipelines in Europe
“Webs without a spider” (no central authority, self-organized)
Node centrality
Degree
Strength
used for weighted networks
sum of the weights of adjacent ties
Find the most connected terrorists
The challenge of this exercise is to spot the most connected terrorists of the train bombing network. We will take advantage of the simplest and most popular centrality measure in network science: degree centrality. The degree of each node is the number of adjacent ties it has. In the context of this dataset, that means the number of other people that person is connected to.
Degree centrality is calculated using degree(), which takes the graph object (not the nodes) as its only input.
You will use both igraph and dplyr, which are already loaded in the workspace. The network, g, and its nodes, nodes, are also pre-loaded.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:igraph':
##
## as_data_frame, groups, union
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
nodes_with_centrality <- nodes %>%
# Add a column containing the degree of each node
mutate(degree = degree(g)) %>%
# Arrange rows by descending degree
arrange(desc(degree))
# See the result
nodes_with_centrality
## # A tibble: 64 x 3
## id name degree
## <dbl> <chr> <dbl>
## 1 1 Jamal Zougam 29
## 2 3 Mohamed Chaoui 27
## 3 7 Imad Eddin Barakat 22
## 4 11 Amer Azizi 18
## 5 38 Said Berrak 17
## 6 17 Galeb Kalaje 16
## 7 23 Naima Oulad Akcha 16
## 8 18 Abderrahim Zbakh 15
## 9 28 Jamal Ahmidan 14
## 10 55 Mohamed El Egipcio 13
## # ... with 54 more rows
Excellent finding! The ranking leader, Jamal Zougam, was in fact directly involved in the bombings and was one of the first to be arrested.
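To make the definition of degree concrete, the sketch below compares the result of degree() with a count done by hand on a made-up toy network (toy_ties, toy_g, deg_igraph and deg_manual are hypothetical names). In an undirected network without repeated ties, the degree of a node is simply how often it appears as a tie endpoint.

```r
library(igraph)

# Hypothetical toy tie list with four people
toy_ties <- data.frame(
  from = c("a", "a", "a", "b"),
  to   = c("b", "c", "d", "c")
)
toy_g <- graph_from_data_frame(toy_ties, directed = FALSE)

# Degree as computed by igraph
deg_igraph <- degree(toy_g)

# Degree by hand: how often each node appears as a tie endpoint
deg_manual <- table(c(toy_ties$from, toy_ties$to))

deg_igraph
deg_manual[names(deg_igraph)]
```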
Find the most strongly connected terrorists
The degree measure from the last exercise measured how many people each person was connected to. However, not all relationships are equal. For example, you typically have a much stronger relationship with your family members than with someone you met in the street. Another centrality measure, strength centrality, takes account of this by assigning a weight to each tie and summing the weights of each node's ties.
The strength measure is calculated using strength(), which takes the network as its only input. You will use it to find the most strongly connected terrorists of the train bombing network.
Again, you will use both igraph and dplyr, which are already loaded in the workspace. The network, g, and its nodes, nodes, are also pre-loaded.
nodes_with_centrality <- nodes %>%
mutate(
degree = degree(g),
# Add a column containing the strength of each node
strength = strength(g)
) %>%
# Arrange rows by descending strength
arrange(desc(strength))
# See the result
nodes_with_centrality
## # A tibble: 64 x 4
## id name degree strength
## <dbl> <chr> <dbl> <dbl>
## 1 1 Jamal Zougam 29 43
## 2 7 Imad Eddin Barakat 22 35
## 3 3 Mohamed Chaoui 27 34
## 4 11 Amer Azizi 18 27
## 5 17 Galeb Kalaje 16 21
## 6 15 Mohamed Belfatmi 11 19
## 7 38 Said Berrak 17 19
## 8 16 Said Bahaji 11 17
## 9 23 Naima Oulad Akcha 16 16
## 10 18 Abderrahim Zbakh 15 15
## # ... with 54 more rows
Strong work! Degree and strength are closely related here; the order of rows is almost the same.
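Strength can be verified by hand in the same way as degree. The sketch below, on a made-up toy network (all object names are hypothetical), sums the weights of the ties touching each node and compares the result with strength(), which by default uses the weight edge attribute when one is present.

```r
library(igraph)

# Hypothetical weighted toy tie list
toy_ties <- data.frame(
  from   = c("a", "a", "b"),
  to     = c("b", "c", "c"),
  weight = c(4, 1, 2)
)
toy_g <- graph_from_data_frame(toy_ties, directed = FALSE)

# Strength as computed by igraph
str_igraph <- strength(toy_g)

# Strength by hand: each tie contributes its weight to both of its endpoints
endpoint   <- c(toy_ties$from, toy_ties$to)
tie_weight <- c(toy_ties$weight, toy_ties$weight)
str_manual <- tapply(tie_weight, endpoint, sum)

str_igraph
str_manual[names(str_igraph)]
```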
More on centrality
There are other centrality measures, but covering them all is beyond the scope of this course. A couple of other centrality measures include:
Betweenness of ties
Previously you saw that nodes can have a measure of betweenness. Ties can also have this measure: betweenness of ties is defined by the number of shortest paths going through a tie.
Ties with high betweenness may have considerable influence within a network by virtue of their control over information passing between nodes. Removing them will most disrupt communication between nodes.
In the Madrid dataset, the weight of a tie is the strength of the connection between two people – a high weight means the two people are closely connected. However, when you calculate betweenness using edge_betweenness(), the weights argument works as a distance between two nodes – a high weight means the two people are considered further apart. To reconcile this, we pass the reciprocal of the edge weights to the weights argument of edge_betweenness(), thus giving them the same meaning.
The network g and the data frame ties are at your disposal.
# Calculate the reciprocal of the tie weights
dist_weight <- 1 / E(g)$weight
ties_with_betweenness <- ties %>%
# Add an edge betweenness column weighted by dist_weight
mutate(betweenness = edge_betweenness(g, weights = dist_weight))
# Review updated ties
ties_with_betweenness
## # A tibble: 243 x 4
## from to weight betweenness
## <dbl> <dbl> <dbl> <dbl>
## 1 1 2 1 47.4
## 2 1 3 3 16
## 3 1 4 1 27.1
## 4 1 5 1 27.1
## 5 1 6 1 47.7
## 6 1 7 4 268
## 7 1 8 1 33.6
## 8 1 9 1 42.4
## 9 1 11 4 185
## 10 1 12 1 26.0
## # ... with 233 more rows
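The effect of taking reciprocals is easiest to see on a tiny example. The sketch below uses a made-up triangle (toy_ties, toy_g, eb_raw and eb_reciprocal are hypothetical names) in which a and c are weakly tied to each other, but both are strongly tied to b. If the strengths were read as distances, every pair would communicate over its own direct tie. With reciprocal weights, the shortest path between a and c runs through b, so the weak direct tie carries no shortest path.

```r
library(igraph)

# Hypothetical triangle: a-b and b-c are strong ties, a-c is a weak tie
toy_ties <- data.frame(
  from   = c("a", "b", "a"),
  to     = c("b", "c", "c"),
  weight = c(4, 4, 1)
)
toy_g <- graph_from_data_frame(toy_ties, directed = FALSE)

# Strengths used directly as distances: each pair uses its own direct tie
eb_raw <- edge_betweenness(toy_g, weights = E(toy_g)$weight)

# Reciprocal strengths as distances: a reaches c through b
eb_reciprocal <- edge_betweenness(toy_g, weights = 1 / E(toy_g)$weight)

eb_raw
eb_reciprocal
```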
Find ties with high betweenness
In the tidy approach to network science, a network is represented with a pair of data frames: one for nodes and one for ties. Sometimes it is useful to have the information from both of these in a single data frame. For example, the ties data frame contains the IDs of the terrorists, but their names are stored in the nodes data frame.
In this exercise, we will exploit the dplyr function left_join() to extract information from both the nodes and ties data frames.
The graph g, the ties and the nodes are loaded for you. The ties have been fortified with the edge betweenness score.
For a reminder on joining data with dplyr, see https://www.youtube.com/watch?v=2W5-WrBEnEA
#fortify tidied dataframes
ties <- ties_with_betweenness
nodes <- nodes_with_centrality
#Step zero
ties
## # A tibble: 243 x 4
## from to weight betweenness
## <dbl> <dbl> <dbl> <dbl>
## 1 1 2 1 47.4
## 2 1 3 3 16
## 3 1 4 1 27.1
## 4 1 5 1 27.1
## 5 1 6 1 47.7
## 6 1 7 4 268
## 7 1 8 1 33.6
## 8 1 9 1 42.4
## 9 1 11 4 185
## 10 1 12 1 26.0
## # ... with 233 more rows
nodes
## # A tibble: 64 x 4
## id name degree strength
## <dbl> <chr> <dbl> <dbl>
## 1 1 Jamal Zougam 29 43
## 2 7 Imad Eddin Barakat 22 35
## 3 3 Mohamed Chaoui 27 34
## 4 11 Amer Azizi 18 27
## 5 17 Galeb Kalaje 16 21
## 6 15 Mohamed Belfatmi 11 19
## 7 38 Said Berrak 17 19
## 8 16 Said Bahaji 11 17
## 9 23 Naima Oulad Akcha 16 16
## 10 18 Abderrahim Zbakh 15 15
## # ... with 54 more rows
#Step one
ties %>%
# Left join to the nodes matching 'from' to 'id'
left_join(nodes, by = c("from" = "id"))
## # A tibble: 243 x 7
## from to weight betweenness name degree strength
## <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 1 2 1 47.4 Jamal Zougam 29 43
## 2 1 3 3 16 Jamal Zougam 29 43
## 3 1 4 1 27.1 Jamal Zougam 29 43
## 4 1 5 1 27.1 Jamal Zougam 29 43
## 5 1 6 1 47.7 Jamal Zougam 29 43
## 6 1 7 4 268 Jamal Zougam 29 43
## 7 1 8 1 33.6 Jamal Zougam 29 43
## 8 1 9 1 42.4 Jamal Zougam 29 43
## 9 1 11 4 185 Jamal Zougam 29 43
## 10 1 12 1 26.0 Jamal Zougam 29 43
## # ... with 233 more rows
#Steps one and two
ties_joined <- ties %>%
# Left join to the nodes matching 'from' to 'id'
left_join(nodes, by = c("from" = "id")) %>%
# Left join to nodes again, now matching 'to' to 'id'
left_join(nodes, by = c("to" = "id"))
# See the result
ties_joined
## # A tibble: 243 x 10
## from to weight betweenness name.x degree.x strength.x name.y degree.y
## <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <chr> <dbl>
## 1 1 2 1 47.4 Jamal~ 29 43 Moham~ 2
## 2 1 3 3 16 Jamal~ 29 43 Moham~ 27
## 3 1 4 1 27.1 Jamal~ 29 43 Vinay~ 10
## 4 1 5 1 27.1 Jamal~ 29 43 Sures~ 10
## 5 1 6 1 47.7 Jamal~ 29 43 Moham~ 7
## 6 1 7 4 268 Jamal~ 29 43 Imad ~ 22
## 7 1 8 1 33.6 Jamal~ 29 43 Abdel~ 6
## 8 1 9 1 42.4 Jamal~ 29 43 Abu A~ 4
## 9 1 11 4 185 Jamal~ 29 43 Amer ~ 18
## 10 1 12 1 26.0 Jamal~ 29 43 Abu M~ 10
## # ... with 233 more rows, and 1 more variable: strength.y <dbl>
# Select only relevant variables
ties_selected <- ties_joined %>%
select(from, to, name_from = name.x, name_to = name.y, betweenness)
# See the result
ties_selected
## # A tibble: 243 x 5
## from to name_from name_to betweenness
## <dbl> <dbl> <chr> <chr> <dbl>
## 1 1 2 Jamal Zougam Mohamed Bekkali 47.4
## 2 1 3 Jamal Zougam Mohamed Chaoui 16
## 3 1 4 Jamal Zougam Vinay Kholy 27.1
## 4 1 5 Jamal Zougam Suresh Kumar 27.1
## 5 1 6 Jamal Zougam Mohamed Chedadi 47.7
## 6 1 7 Jamal Zougam Imad Eddin Barakat 268
## 7 1 8 Jamal Zougam Abdelaziz Benyaich 33.6
## 8 1 9 Jamal Zougam Abu Abderrahame 42.4
## 9 1 11 Jamal Zougam Amer Azizi 185
## 10 1 12 Jamal Zougam Abu Musad Alsakaoui 26.0
## # ... with 233 more rows
ties_selected %>%
# Arrange rows by descending betweenness
arrange(desc(betweenness))
## # A tibble: 243 x 5
## from to name_from name_to betweenness
## <dbl> <dbl> <chr> <chr> <dbl>
## 1 37 57 Abdeluahid Berrak Semaan Gaby Eid 346.
## 2 1 37 Jamal Zougam Abdeluahid Berrak 292.
## 3 1 7 Jamal Zougam Imad Eddin Barakat 268
## 4 1 11 Jamal Zougam Amer Azizi 185
## 5 11 55 Amer Azizi Mohamed El Egipcio 164.
## 6 1 23 Jamal Zougam Naima Oulad Akcha 140.
## 7 7 50 Imad Eddin Barakat Taysir Alouny 132.
## 8 1 24 Jamal Zougam Abdelkarim el Mejjati 108.
## 9 20 57 José Emilio Suárez Semaan Gaby Eid 106.
## 10 1 18 Jamal Zougam Abderrahim Zbakh 100
## # ... with 233 more rows
Great, this wasn’t easy! What are the pairs of connected terrorists with high influence?
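The double join can also be written so that the second join names the duplicated columns directly, which avoids the .x and .y suffixes. The sketch below is one possible variant on made-up toy data (toy_nodes, toy_ties and toy_joined are hypothetical names), using the suffix argument of left_join().

```r
library(dplyr)

# Hypothetical toy data with the same key columns as nodes and ties
toy_nodes <- tibble(id = 1:3, name = c("Ann", "Bob", "Cat"))
toy_ties  <- tibble(from = c(1L, 1L, 2L), to = c(2L, 3L, 3L), weight = c(1, 3, 2))

toy_joined <- toy_ties %>%
  # First join: look up the name of the 'from' node
  left_join(toy_nodes, by = c("from" = "id")) %>%
  # Second join: look up the name of the 'to' node; the clashing 'name'
  # columns become name_from and name_to instead of name.x and name.y
  left_join(toy_nodes, by = c("to" = "id"), suffix = c("_from", "_to"))

toy_joined
```

The suffix is only applied to columns whose names clash, so it is needed on the second join only.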
Visualize node centrality
A useful visualization technique is to make the most important nodes and edges more prominent in the network plot.
In Chapter 1, you saw how to make important edges more eye-catching by mapping the transparency to the weight. In this exercise, you will also make the node size proportional to its centrality (either degree or strength). That is, the central (“important”) nodes in the network appear bigger.
The network g is already loaded in the workspace.
#update graph with fortified ties and nodes
g <- graph_from_data_frame(ties, directed = FALSE, vertices = nodes)
# Plot with the Kamada-Kawai layout
ggraph(g, layout = "with_kk") +
# Add an edge link geom, mapping alpha to weight
geom_edge_link(aes(alpha = weight)) +
# Add a node point geom, mapping size to degree
geom_node_point(aes(size = degree))
# Update the previous plot, mapping node size to strength
ggraph(g, layout = "with_kk") +
geom_edge_link(aes(alpha = weight)) +
geom_node_point(aes(size = strength))
Visualize tie centrality
In this exercise, you will use the ggraph package again, but this time you will visualize the network by mapping tie transparency to tie betweenness centrality.
Can you visually spot the central ties in the network topology? Recall that high betweenness ties typically act as bridges between different communities of the network.
Next, we will add degree centrality to visualize important nodes.
The network g is already loaded in the workspace.
ggraph(g, layout = "with_kk") +
# Add an edge link geom, mapping the edge transparency to betweenness
geom_edge_link(aes(alpha = betweenness))
ggraph(g, layout = "with_kk") +
geom_edge_link(aes(alpha = betweenness)) +
# Add a node point geom, mapping size to degree
geom_node_point(aes(size = degree))
Well done! Notice how the prominent chains of edges all flow through the central node, corresponding to Jamal Zougam.
Filter important ties
As networks get larger, the plots can become messy and difficult to understand. One way to deal with this is to filter out some parts that aren’t interesting. For example, in order to concentrate on the most important chains of relationships, you can filter out ties with small betweenness values.
In this exercise, you will filter for ties with a betweenness value larger than the median betweenness. This will remove about half of the ties from the visualization, leaving only the important ties.
The network g is already loaded in the workspace, and a plot of the network with ties weighted by betweenness is shown in the previous exercise.
# Calculate the median betweenness
median_betweenness <- median(E(g)$betweenness)
ggraph(g, layout = "with_kk") +
# Filter ties for betweenness greater than the median
geom_edge_link(aes(alpha = betweenness, filter = betweenness > median_betweenness)) +
theme(legend.position="none")
Fantastic filtering! By removing the things you don’t care about, it becomes easier to see the important results.
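A strict comparison with the median keeps at most half of the values, and fewer when several values are tied at the median. The base R sketch below shows this on a made-up vector of betweenness scores (the numbers are hypothetical).

```r
# Hypothetical betweenness scores with several values tied at the median
betweenness <- c(1, 2, 2, 2, 5)

median_betweenness <- median(betweenness)

# Strictly greater than the median: values equal to the median are dropped
keep <- betweenness > median_betweenness

median_betweenness
sum(keep)    # number of ties kept
mean(keep)   # share of ties kept
```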
Mark Granovetter’s theory of strength of weak ties
In its weakness lies its strength
How many weak ties are there?
Recall that a weak tie is a tie with a weight equal to 1 (the minimum weight).
In this exercise, we are going to use the dplyr function count(), a shorthand for grouping ties by their weights with group_by() and counting them with summarize(). Hence, we are going to discover how many weak ties there are in the network.
The ties data frame is loaded in the workspace.
library(hablar)
##
## Attaching package: 'hablar'
## The following object is masked from 'package:dplyr':
##
## na_if
ties <- ties %>% hablar::convert(int(from, to, weight))
tie_counts_by_weight <- ties %>%
# Count the number of rows with each weight
count(weight) %>%
# Add a column of the percentage of rows with each weight
mutate(percentage = 100 * n / nrow(ties))
# See the result
tie_counts_by_weight
## # A tibble: 4 x 3
## weight n percentage
## <int> <int> <dbl>
## 1 1 214 88.1
## 2 2 21 8.64
## 3 3 6 2.47
## 4 4 2 0.823
Awesome! 88% of the network ties are weak, quite an impressive share!
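The same counts can be obtained with an explicit group_by() followed by summarize(). The sketch below compares both routes on a made-up column of weights (toy_ties, by_count and by_group are hypothetical names).

```r
library(dplyr)

# Hypothetical tie weights
toy_ties <- tibble(weight = c(1L, 1L, 1L, 2L, 4L))

# Route 1: count() in a single step
by_count <- toy_ties %>%
  count(weight)

# Route 2: group the rows by weight, then count the rows in each group
by_group <- toy_ties %>%
  group_by(weight) %>%
  summarize(n = n())

by_count
by_group
```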
Visualize the network highlighting weak ties
In this exercise, we use the ggraph package to visualize weak and strong ties in different colors. It is useful to have an immediate visual perception of the importance of weak ties in a network.
The ties data frame and the network g are already loaded in the workspace for your convenience.
# Make is_weak TRUE whenever the tie is weak
is_weak <- E(g)$weight == 1
# Check that the number of weak ties is the same as before
sum(is_weak)
## [1] 214
ggraph(g, layout = "with_kk") +
# Add an edge link geom, mapping color to is_weak
geom_edge_link(aes(color = is_weak))
Indeed, weak ties are the large majority!
Visualize the sub-network of weak ties
In this exercise, we will use ggraph again to visualize the sub-network containing only the weak ties. We will use the aesthetic filter to filter the ties.
The network g and the Boolean vector is_weak are already loaded in the workspace for your convenience.
ggraph(g, layout = "with_kk") +
# Map filter to is_weak
geom_edge_link(aes(filter = is_weak), alpha = 0.5)
Well done! Now it’s time to move on!
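The filter aesthetic only hides ties in the plot; the network object itself is unchanged. If the sub-network of weak ties is needed as an object of its own, one possible approach is to delete the strong ties with igraph's delete_edges(). The sketch below does this on a made-up toy network (toy_ties, toy_g and weak_g are hypothetical names).

```r
library(igraph)

# Hypothetical toy network with one strong tie (weight 3)
toy_ties <- data.frame(
  from   = c("a", "a", "b", "c"),
  to     = c("b", "c", "c", "d"),
  weight = c(1, 3, 1, 1)
)
toy_g <- graph_from_data_frame(toy_ties, directed = FALSE)

# Positions of the strong ties (weight above the minimum of 1)
strong_edge_ids <- which(E(toy_g)$weight > 1)

# Remove the strong ties; all nodes are kept
weak_g <- delete_edges(toy_g, strong_edge_ids)

ecount(weak_g)   # only the weak ties remain
vcount(weak_g)   # the set of nodes is unchanged
```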
More on betweenness
Typically, only the shortest paths are considered in the definition of betweenness. However, there are a couple of issues with this approach:
In many applications, however, it is reasonable to consider both the quantity and the length of all paths of the network, since communication on the network is enhanced as soon as more routes are possible, particularly if these pathways are short.
Visualizing connection patterns
We use a raster plot to visualize the ties between nodes in a network. The idea is to draw a point in the plot at position (x, y) if there is a tie that connects the nodes x and y. We use different colors for the points to distinguish the connection weights.
The resulting visualization is useful to detect similar connection patterns. If two rows (or columns) of the plot are similar, then the two corresponding nodes have similar tie patterns to other nodes in the network.
The ties data frame is already loaded in the workspace.
ties_swapped <- ties %>%
# Swap the variables from and to
mutate(temp = to, to = from, from = temp) %>%
select(-temp)
# Bind ties and ties_swapped by row
ties_bound <- bind_rows(ties, ties_swapped)
# Using ties_bound, plot to vs. from, filled by weight
ggplot(ties_bound, aes(x = from, y = to, fill = factor(weight))) +
# Add a raster geom
geom_raster() +
# Label the color scale as "weight"
labs(fill = "weight")
Did you spot nodes with similar connection patterns?
The adjacency matrix (part 1)
Two nodes are adjacent when they are directly connected by a tie. An adjacency matrix contains the details about which nodes are adjacent for a whole network.
For example, if the second node is adjacent to the third node, the entry in row 2, column 3 will be 1. In an undirected network (like the Madrid network), the entry in row 3, column 2 will also be 1. If the second node is not connected to the fourth node, the entries in row 2, column 4 and row 4, column 2 will be 0.
In a weighted adjacency matrix, the entries for adjacent nodes hold the tie weight rather than always being 1.
Most entries in the matrix are zero, so as_adjacency_matrix() creates a sparse matrix. For ease of reading, zeroes are printed as dots (.).
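On a network of three nodes the whole weighted adjacency matrix fits on a few lines, which makes the symmetry easy to check. The sketch below uses made-up toy data (toy_ties, toy_g and A_toy are hypothetical names) and asks for an ordinary dense matrix with sparse = FALSE, so that zeroes are printed as 0 rather than as dots.

```r
library(igraph)

# Hypothetical weighted toy tie list on three nodes
toy_ties <- data.frame(
  from   = c(1, 1, 2),
  to     = c(2, 3, 3),
  weight = c(3, 1, 2)
)
toy_g <- graph_from_data_frame(toy_ties, directed = FALSE)

# Dense weighted adjacency matrix without row and column names
A_toy <- as_adjacency_matrix(toy_g, attr = "weight", names = FALSE, sparse = FALSE)
A_toy

# Undirected network: the matrix equals its own transpose
isSymmetric(A_toy)

# The tie between nodes 1 and 2 appears in both positions
A_toy[1, 2]
A_toy[2, 1]
```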
# Get the weighted adjacency matrix
A <- as_adjacency_matrix(g, attr = "weight", names = FALSE)
# See the results
A
## 64 x 64 sparse Matrix of class "dgCMatrix"
##
## [1,] . 4 3 4 2 2 2 2 1 1 2 1 1 1 1 1 2 1 1 . 1 1 1 1 1 . 1 1 1 . . . . . . . .
## [2,] 4 . 3 3 3 2 1 2 . . 1 . 1 1 1 . 1 . 2 . . . 1 1 2 . . . 1 1 . . . . . . .
## [3,] 3 3 . 2 2 1 2 1 1 1 1 1 1 1 1 1 1 1 1 . 1 1 1 1 1 . 1 . 1 . . . . . . . .
## [4,] 4 3 2 . 2 2 1 1 . . 1 . 2 1 1 . 1 . 1 . . . 1 1 1 . . . . . . . . . . . .
## [5,] 2 3 2 2 . 1 1 1 . . 1 . 1 1 . . 1 . 1 . . . 1 1 1 . . . . . . . . . . . .
## [6,] 2 2 1 2 1 . . 3 . . 3 . . 2 . . . . 1 . . . 1 . . . . . . . . . . . . . .
## [7,] 2 1 2 1 1 . . . 1 1 . 1 1 . 1 1 1 1 . . 1 1 . 1 1 . . . . . . . . . . . .
## [8,] 2 2 1 1 1 3 . . . . 2 . . 2 . . . . 1 . . . 1 . . . . . . . . . . . . . .
## [9,] 1 . 1 . . . 1 . . 1 . 1 . . 1 1 . 1 . . 1 1 . . . . . . . . 1 . . . . 1 .
## [10,] 1 . 1 . . . 1 . 1 . . 1 . . 1 1 . 1 . . 1 1 . . . . . 1 . . . . . . . . .
## [11,] 2 1 1 1 1 3 . 2 . . . . . 1 . . . . 1 . . . 1 . . . . . . . . . . . . . .
## [12,] 1 . 1 . . . 1 . 1 1 . . . . 1 1 . 1 . 1 1 1 . . . . . . . . . . . . . . 1
## [13,] 1 1 1 2 1 . 1 . . . . . . . . . 1 1 . . . . . 1 . . . . . . . . . . . . .
## [14,] 1 1 1 1 1 2 . 2 . . 1 . . . . . . . 1 . . . 1 . . . . . . . . . . . . . .
## [15,] 1 1 1 1 . . 1 . 1 1 . 1 . . . 1 . 1 . . 1 1 . . . . . . . . . . . . . . .
## [16,] 1 . 1 . . . 1 . 1 1 . 1 . . 1 . . 1 . . 1 1 . . . . . . . . . . . . . . 1
## [17,] 2 1 1 1 1 . 1 . . . . . 1 . . . . . . 1 . . . 1 1 . . 1 . . . . . . . . .
## [18,] 1 . 1 . . . 1 . 1 1 . 1 1 . 1 1 . . . . 1 1 . . . . . . . . . . . . . . .
## [19,] 1 2 1 1 1 1 . 1 . . 1 . . 1 . . . . . . . . 1 . . . . . . . . . . . . . .
## [20,] . . . . . . . . . . . 1 . . . . 1 . . . . . . . . 1 . . . . . 1 1 1 1 1 1
## [21,] 1 . 1 . . . 1 . 1 1 . 1 . . 1 1 . 1 . . . 1 . . . . . . . . . . . . . . .
## [22,] 1 . 1 . . . 1 . 1 1 . 1 . . 1 1 . 1 . . 1 . . . . . . . . . . . . . . . .
## [23,] 1 1 1 1 1 1 . 1 . . 1 . . 1 . . . . 1 . . . . . . . . . . . . . . . . . .
## [24,] 1 1 1 1 1 . 1 . . . . . 1 . . . 1 . . . . . . . 1 . . 1 . . . . . . . . .
## [25,] 1 2 1 1 1 . 1 . . . . . . . . . 1 . . . . . . 1 . . . . . . . . . . . . .
## [26,] . . . . . . . . . . . . . . . . . . . 1 . . . . . . . . . . . 1 1 1 1 . .
## [27,] 1 . 1 . . . . . . . . . . . . . . . . . . . . . . . . . 1 1 . . . . . . .
## [28,] 1 . . . . . . . . 1 . . . . . . 1 . . . . . . 1 . . . . . . 1 . . . . . .
## [29,] 1 1 1 . . . . . . . . . . . . . . . . . . . . . . . 1 . . . . . . . . . .
## [30,] . 1 . . . . . . . . . . . . . . . . . . . . . . . . 1 . . . . . . . . . .
## [31,] . . . . . . . . 1 . . . . . . . . . . . . . . . . . . 1 . . . . . . . 1 .
## [32,] . . . . . . . . . . . . . . . . . . . 1 . . . . . 1 . . . . . . 1 1 1 . .
## [33,] . . . . . . . . . . . . . . . . . . . 1 . . . . . 1 . . . . . 1 . 1 1 . .
## [34,] . . . . . . . . . . . . . . . . . . . 1 . . . . . 1 . . . . . 1 1 . 1 . .
## [35,] . . . . . . . . . . . . . . . . . . . 1 . . . . . 1 . . . . . 1 1 1 . . .
## [36,] . . . . . . . . 1 . . . . . . . . . . 1 . . . . . . . . . . 1 . . . . . .
## [37,] . . . . . . . . . . . 1 . . . 1 . . . 1 . . . . . . . . . . . . . . . . .
## [38,] . . . . . . . . . . . . . . . . . . . . . . . . . 1 . . . . . 1 1 1 1 . .
## [39,] . . . . . . . . 1 1 . . . . . . . . . . . . . . . . . . . . 1 . . . . 1 .
## [40,] . . . . . . . . 1 1 . . . . . . . . . . . . . . . . . . . . 1 . . . . 1 .
## [41,] . . . . . . . . . . . . . . . . . . . . . . . . . . 1 . . 2 . . . . . . .
## [42,] 1 . 1 . . . . . . . . . . . . . . . . . . . . . . . 1 . 1 . . . . . . . .
## [43,] . . . . . . . . . . . . . . . . . . . . . . . . . . 1 . . 1 . . . . . . .
## [44,] . . . . . . . . . . . . . . . . . . . . . . . . . . 1 . . 1 . . . . . . .
## [45,] 1 1 . 1 . . . . . . . . 1 . . . . . . . . . . . . . . . . . . . . . . . .
## [46,] . 1 . . . . . . . . . . . . . . . . . . . . . . . . . . 2 . . . . . . . .
## [47,] . . . . . . . . 1 . . . . . . . . . . . . . . . . 1 . . . . 1 . . . . . .
## [48,] . . . . . . . . . . . 1 . . . 1 . . . . . . . . . . . . . . . . . . . . 1
## [49,] 1 . 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [50,] . . . . . . . . . . . . 1 . . . . . . . . . . . . . . 1 . . . . . . . . .
## [51,] . . . . 1 . . . . . . . . . . . . . . . . . . . . . . 1 . . . . . . . . .
## [52,] . 1 . . . 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [53,] . . . . . . . 1 . . . . . . . . . . . . . . . . . . . . . 1 . . . . . . .
## [54,] . . . . . . . . . . . . 1 . . . . . . . . . . . . . . . . . . . . . . . .
## [55,] . . . . . . . . . . . . 1 . . . . . . . . . . . . . . . . . . . . . . . .
## [56,] . . . . . . . . . . . . . . . . . . . 1 . . . . . . . . . . . . . . . . 1
## [57,] . . . . . . . . 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [58,] . . . . . . . . . 1 . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [59,] . . . . . . . . . . . . . . . . . . . 1 . . . . . . . . . . . . . . . . .
## [60,] . . . 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [61,] . . . . . . . . . 1 . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [62,] . . . . . . . . . . . . . . . . . . . . . . . . . 1 . . . . . . . . . . .
## [63,] . 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [64,] . . . . . . . . . . . 1 . . . . . . . . . . . . . . . . . . . . . . . . .
##
## [1,] . . . . 1 . . 1 . . . 1 . . . . . . . . . . . . . . .
## [2,] . . . . . . . 1 1 . . . . . 1 . . . . . . . . . . 1 .
## [3,] . . . . 1 . . . . . . 1 . . . . . . . . . . . . . . .
## [4,] . . . . . . . 1 . . . . . . . . . . . . . . 1 . . . .
## [5,] . . . . . . . . . . . . . 1 . . . . . . . . . . . . .
## [6,] . . . . . . . . . . . . . . 1 . . . . . . . . . . . .
## [7,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [8,] . . . . . . . . . . . . . . . 1 . . . . . . . . . . .
## [9,] . 1 1 . . . . . . 1 . . . . . . . . . 1 . . . . . . .
## [10,] . 1 1 . . . . . . . . . . . . . . . . . 1 . . 1 . . .
## [11,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [12,] . . . . . . . . . . 1 . . . . . . . . . . . . . . . 1
## [13,] . . . . . . . 1 . . . . 1 . . . 1 1 . . . . . . . . .
## [14,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [15,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [16,] . . . . . . . . . . 1 . . . . . . . . . . . . . . . .
## [17,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [18,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [19,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [20,] . . . . . . . . . . . . . . . . . . 1 . . 1 . . . . .
## [21,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [22,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [23,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [24,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [25,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [26,] 1 . . . . . . . . 1 . . . . . . . . . . . . . . 1 . .
## [27,] . . . 1 1 1 1 . . . . . . . . . . . . . . . . . . . .
## [28,] . . . . . . . . . . . . 1 1 . . . . . . . . . . . . .
## [29,] . . . . 1 . . . 2 . . . . . . . . . . . . . . . . . .
## [30,] . . . 2 . 1 1 . . . . . . . . 1 . . . . . . . . . . .
## [31,] . 1 1 . . . . . . 1 . . . . . . . . . . . . . . . . .
## [32,] 1 . . . . . . . . . . . . . . . . . . . . . . . . . .
## [33,] 1 . . . . . . . . . . . . . . . . . . . . . . . . . .
## [34,] 1 . . . . . . . . . . . . . . . . . . . . . . . . . .
## [35,] 1 . . . . . . . . . . . . . . . . . . . . . . . . . .
## [36,] . 1 1 . . . . . . . . . . . . . . . . . . . . . . . .
## [37,] . . . . . . . . . . 1 . . . . . . . 1 . . . . . . . .
## [38,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [39,] . . 1 . . . . . . . . . . . . . . . . . . . . . . . .
## [40,] . 1 . . . . . . . . . . . . . . . . . . . . . . . . .
## [41,] . . . . . 1 1 . . . . . . . . . . . . . . . . . . . .
## [42,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [43,] . . . 1 . . 1 . . . . . . . . . . . . . . . . . . . .
## [44,] . . . 1 . 1 . . . . . . . . . . . . . . . . . . . . .
## [45,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [46,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [47,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [48,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [49,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [50,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [51,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [52,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [53,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [54,] . . . . . . . . . . . . . . . . . 1 . . . . . . . . .
## [55,] . . . . . . . . . . . . . . . . 1 . . . . . . . . . .
## [56,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [57,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [58,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [59,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [60,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [61,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [62,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [63,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [64,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
Good job! The network is undirected, so its adjacency matrix must be symmetric. This is because if node x is adjacent to node y, then node y is adjacent to node x.
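A quick way to confirm this symmetry is to compare a matrix with its transpose. The sketch below uses a small made-up three-node network; the matrix A_toy is hypothetical and not part of the Madrid data. The Matrix package also provides an isSymmetric() method, so the same call should work on the sparse matrix printed above.

```r
# Hypothetical weighted adjacency matrix of an undirected 3-node network
A_toy <- matrix(c(0, 2, 1,
                  2, 0, 0,
                  1, 0, 0),
                nrow = 3, byrow = TRUE)

# An undirected network has A[x, y] == A[y, x] for every pair of nodes
is_sym <- isSymmetric(A_toy)

# Equivalent check: the matrix equals its own transpose
same_as_transpose <- all(A_toy == t(A_toy))

print(is_sym)
```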
The adjacency matrix (part 2)
The adjacency matrix encodes the structure of the network, that is, its nodes and ties. It can be manipulated with matrix algebra operations to obtain useful insights about the network, including centrality measures.
In this exercise, we use the adjacency matrix to compute, once again, the node degrees and node strengths. The weighted adjacency matrix A is loaded in the workspace.
library(Matrix)
# Calculate node strengths as row sums of adjacency
rowSums(A)
## [1] 43 35 34 27 21 19 19 17 16 15 14 14 14 12 12 12 12 11 11 11 10 10 10 10 9
## [26] 8 8 7 7 7 6 6 6 6 6 5 5 5 5 5 5 4 4 4 4 3 3 3 2 2
## [51] 2 2 2 2 2 2 1 1 1 1 1 1 1 1
# Create a logical adjacency matrix
B <- A > 0
B
## 64 x 64 sparse Matrix of class "lgCMatrix"
##
## [1,] . | | | | | | | | | | | | | | | | | | . | | | | | . | | | . . . . . . . .
## [2,] | . | | | | | | . . | . | | | . | . | . . . | | | . . . | | . . . . . . .
## [3,] | | . | | | | | | | | | | | | | | | | . | | | | | . | . | . . . . . . . .
## [4,] | | | . | | | | . . | . | | | . | . | . . . | | | . . . . . . . . . . . .
## [5,] | | | | . | | | . . | . | | . . | . | . . . | | | . . . . . . . . . . . .
## [6,] | | | | | . . | . . | . . | . . . . | . . . | . . . . . . . . . . . . . .
## [7,] | | | | | . . . | | . | | . | | | | . . | | . | | . . . . . . . . . . . .
## [8,] | | | | | | . . . . | . . | . . . . | . . . | . . . . . . . . . . . . . .
## [9,] | . | . . . | . . | . | . . | | . | . . | | . . . . . . . . | . . . . | .
## [10,] | . | . . . | . | . . | . . | | . | . . | | . . . . . | . . . . . . . . .
## [11,] | | | | | | . | . . . . . | . . . . | . . . | . . . . . . . . . . . . . .
## [12,] | . | . . . | . | | . . . . | | . | . | | | . . . . . . . . . . . . . . |
## [13,] | | | | | . | . . . . . . . . . | | . . . . . | . . . . . . . . . . . . .
## [14,] | | | | | | . | . . | . . . . . . . | . . . | . . . . . . . . . . . . . .
## [15,] | | | | . . | . | | . | . . . | . | . . | | . . . . . . . . . . . . . . .
## [16,] | . | . . . | . | | . | . . | . . | . . | | . . . . . . . . . . . . . . |
## [17,] | | | | | . | . . . . . | . . . . . . | . . . | | . . | . . . . . . . . .
## [18,] | . | . . . | . | | . | | . | | . . . . | | . . . . . . . . . . . . . . .
## [19,] | | | | | | . | . . | . . | . . . . . . . . | . . . . . . . . . . . . . .
## [20,] . . . . . . . . . . . | . . . . | . . . . . . . . | . . . . . | | | | | |
## [21,] | . | . . . | . | | . | . . | | . | . . . | . . . . . . . . . . . . . . .
## [22,] | . | . . . | . | | . | . . | | . | . . | . . . . . . . . . . . . . . . .
## [23,] | | | | | | . | . . | . . | . . . . | . . . . . . . . . . . . . . . . . .
## [24,] | | | | | . | . . . . . | . . . | . . . . . . . | . . | . . . . . . . . .
## [25,] | | | | | . | . . . . . . . . . | . . . . . . | . . . . . . . . . . . . .
## [26,] . . . . . . . . . . . . . . . . . . . | . . . . . . . . . . . | | | | . .
## [27,] | . | . . . . . . . . . . . . . . . . . . . . . . . . . | | . . . . . . .
## [28,] | . . . . . . . . | . . . . . . | . . . . . . | . . . . . . | . . . . . .
## [29,] | | | . . . . . . . . . . . . . . . . . . . . . . . | . . . . . . . . . .
## [30,] . | . . . . . . . . . . . . . . . . . . . . . . . . | . . . . . . . . . .
## [31,] . . . . . . . . | . . . . . . . . . . . . . . . . . . | . . . . . . . | .
## [32,] . . . . . . . . . . . . . . . . . . . | . . . . . | . . . . . . | | | . .
## [33,] . . . . . . . . . . . . . . . . . . . | . . . . . | . . . . . | . | | . .
## [34,] . . . . . . . . . . . . . . . . . . . | . . . . . | . . . . . | | . | . .
## [35,] . . . . . . . . . . . . . . . . . . . | . . . . . | . . . . . | | | . . .
## [36,] . . . . . . . . | . . . . . . . . . . | . . . . . . . . . . | . . . . . .
## [37,] . . . . . . . . . . . | . . . | . . . | . . . . . . . . . . . . . . . . .
## [38,] . . . . . . . . . . . . . . . . . . . . . . . . . | . . . . . | | | | . .
## [39,] . . . . . . . . | | . . . . . . . . . . . . . . . . . . . . | . . . . | .
## [40,] . . . . . . . . | | . . . . . . . . . . . . . . . . . . . . | . . . . | .
## [41,] . . . . . . . . . . . . . . . . . . . . . . . . . . | . . | . . . . . . .
## [42,] | . | . . . . . . . . . . . . . . . . . . . . . . . | . | . . . . . . . .
## [43,] . . . . . . . . . . . . . . . . . . . . . . . . . . | . . | . . . . . . .
## [44,] . . . . . . . . . . . . . . . . . . . . . . . . . . | . . | . . . . . . .
## [45,] | | . | . . . . . . . . | . . . . . . . . . . . . . . . . . . . . . . . .
## [46,] . | . . . . . . . . . . . . . . . . . . . . . . . . . . | . . . . . . . .
## [47,] . . . . . . . . | . . . . . . . . . . . . . . . . | . . . . | . . . . . .
## [48,] . . . . . . . . . . . | . . . | . . . . . . . . . . . . . . . . . . . . |
## [49,] | . | . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [50,] . . . . . . . . . . . . | . . . . . . . . . . . . . . | . . . . . . . . .
## [51,] . . . . | . . . . . . . . . . . . . . . . . . . . . . | . . . . . . . . .
## [52,] . | . . . | . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [53,] . . . . . . . | . . . . . . . . . . . . . . . . . . . . . | . . . . . . .
## [54,] . . . . . . . . . . . . | . . . . . . . . . . . . . . . . . . . . . . . .
## [55,] . . . . . . . . . . . . | . . . . . . . . . . . . . . . . . . . . . . . .
## [56,] . . . . . . . . . . . . . . . . . . . | . . . . . . . . . . . . . . . . |
## [57,] . . . . . . . . | . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [58,] . . . . . . . . . | . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [59,] . . . . . . . . . . . . . . . . . . . | . . . . . . . . . . . . . . . . .
## [60,] . . . | . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [61,] . . . . . . . . . | . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [62,] . . . . . . . . . . . . . . . . . . . . . . . . . | . . . . . . . . . . .
## [63,] . | . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [64,] . . . . . . . . . . . | . . . . . . . . . . . . . . . . . . . . . . . . .
##
## [1,] . . . . | . . | . . . | . . . . . . . . . . . . . . .
## [2,] . . . . . . . | | . . . . . | . . . . . . . . . . | .
## [3,] . . . . | . . . . . . | . . . . . . . . . . . . . . .
## [4,] . . . . . . . | . . . . . . . . . . . . . . | . . . .
## [5,] . . . . . . . . . . . . . | . . . . . . . . . . . . .
## [6,] . . . . . . . . . . . . . . | . . . . . . . . . . . .
## [7,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [8,] . . . . . . . . . . . . . . . | . . . . . . . . . . .
## [9,] . | | . . . . . . | . . . . . . . . . | . . . . . . .
## [10,] . | | . . . . . . . . . . . . . . . . . | . . | . . .
## [11,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [12,] . . . . . . . . . . | . . . . . . . . . . . . . . . |
## [13,] . . . . . . . | . . . . | . . . | | . . . . . . . . .
## [14,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [15,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [16,] . . . . . . . . . . | . . . . . . . . . . . . . . . .
## [17,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [18,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [19,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [20,] . . . . . . . . . . . . . . . . . . | . . | . . . . .
## [21,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [22,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [23,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [24,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [25,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [26,] | . . . . . . . . | . . . . . . . . . . . . . . | . .
## [27,] . . . | | | | . . . . . . . . . . . . . . . . . . . .
## [28,] . . . . . . . . . . . . | | . . . . . . . . . . . . .
## [29,] . . . . | . . . | . . . . . . . . . . . . . . . . . .
## [30,] . . . | . | | . . . . . . . . | . . . . . . . . . . .
## [31,] . | | . . . . . . | . . . . . . . . . . . . . . . . .
## [32,] | . . . . . . . . . . . . . . . . . . . . . . . . . .
## [33,] | . . . . . . . . . . . . . . . . . . . . . . . . . .
## [34,] | . . . . . . . . . . . . . . . . . . . . . . . . . .
## [35,] | . . . . . . . . . . . . . . . . . . . . . . . . . .
## [36,] . | | . . . . . . . . . . . . . . . . . . . . . . . .
## [37,] . . . . . . . . . . | . . . . . . . | . . . . . . . .
## [38,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [39,] . . | . . . . . . . . . . . . . . . . . . . . . . . .
## [40,] . | . . . . . . . . . . . . . . . . . . . . . . . . .
## [41,] . . . . . | | . . . . . . . . . . . . . . . . . . . .
## [42,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [43,] . . . | . . | . . . . . . . . . . . . . . . . . . . .
## [44,] . . . | . | . . . . . . . . . . . . . . . . . . . . .
## [45,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [46,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [47,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [48,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [49,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [50,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [51,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [52,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [53,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [54,] . . . . . . . . . . . . . . . . . | . . . . . . . . .
## [55,] . . . . . . . . . . . . . . . . | . . . . . . . . . .
## [56,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [57,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [58,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [59,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [60,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [61,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [62,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [63,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
## [64,] . . . . . . . . . . . . . . . . . . . . . . . . . . .
# Calculate node degrees as row sums of logical adjacency
rowSums(B)
## [1] 29 22 27 18 16 11 17 11 16 15 10 14 13 10 12 12 11 11 10 11 10 10 10 10 8
## [26] 8 8 7 6 6 6 6 6 6 6 5 5 5 5 5 4 4 4 4 4 2 3 3 2 2
## [51] 2 2 2 2 2 2 1 1 1 1 1 1 1 1
You can learn much more from the adjacency matrix of a network!
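As a small illustration of what matrix algebra can reveal, the sketch below recomputes strength and degree on a made-up four-node network and then recovers the degrees a second way, from the diagonal of the squared 0/1 matrix. All names ending in _toy are hypothetical and chosen only for this example.

```r
# Hypothetical weighted adjacency matrix (undirected, no self-loops)
A_toy <- matrix(c(0, 3, 1, 0,
                  3, 0, 0, 2,
                  1, 0, 0, 0,
                  0, 2, 0, 0),
                nrow = 4, byrow = TRUE)

# Strength: sum of the tie weights of each node
strength_toy <- rowSums(A_toy)

# Degree: number of ties of each node, from the 0/1 version of the matrix
B_toy <- (A_toy > 0) * 1
degree_toy <- rowSums(B_toy)

# Matrix algebra view: entry (i, i) of B %*% B counts the walks of length 2
# from node i back to itself, which equals the degree of node i
degree_from_square <- diag(B_toy %*% B_toy)

print(rbind(strength_toy, degree_toy, degree_from_square))
```

The identity in the last step holds only for symmetric 0/1 matrices without self-loops, which is the case for an undirected network like this one.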
Computing Pearson similarity
Recall that a correlation matrix measures the similarity between pairs of variables. The correlation coefficient runs from -1 (maximum dissimilarity) to 1 (maximum similarity), and values close to 0 indicate no correlation.
You can also use correlation matrices to find similarities between the nodes in the network.
The general idea is to associate each node with its column in the adjacency matrix. The similarity of two nodes is then measured as the correlation coefficient between the node columns.
Here we will use the Pearson correlation coefficient, which is the most common method of calculation.
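Written out for an n-by-n adjacency matrix A, the similarity of nodes i and j is the standard Pearson coefficient of columns i and j. The symbols S_ij and the column mean (A-bar) are notation introduced here for illustration only.

```latex
S_{ij} =
  \frac{\sum_{k=1}^{n} \left(A_{ki} - \bar{A}_{i}\right)\left(A_{kj} - \bar{A}_{j}\right)}
       {\sqrt{\sum_{k=1}^{n} \left(A_{ki} - \bar{A}_{i}\right)^{2}}\;
        \sqrt{\sum_{k=1}^{n} \left(A_{kj} - \bar{A}_{j}\right)^{2}}},
\qquad
\bar{A}_{i} = \frac{1}{n} \sum_{k=1}^{n} A_{ki}
```

Two nodes therefore score high when they tend to be tied, with similar weights, to the same other nodes.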
For convenience, the adjacency matrix, A, has been created as a non-sparse matrix.
# Get the weighted adjacency matrix as non-sparse
A <- as_adjacency_matrix(g, attr = "weight", names = FALSE, sparse = FALSE)
# Compute the Pearson correlation matrix of A
S <- cor(A)
# Set the diagonal of S to 0
diag(S) <- 0
# Flatten S to be a vector
flat_S <- as.vector(S)
# Plot a histogram of similarities
hist(flat_S, xlab = "Similarity", main = "Histogram of similarity")
All right! A large number of similarities lie slightly below zero.
There are more negative than positive similarities (~61%).
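To see what cor() does with an adjacency matrix, here is a minimal sketch on a made-up four-node network (A_toy and the other names are hypothetical). It checks that one entry of the similarity matrix is the correlation of the two corresponding columns, and computes the share of negative similarities counting each pair of nodes once. Note that the flattened vector in the exercise counts every pair twice and also includes the zeroed diagonal.

```r
# Hypothetical weighted adjacency matrix of a 4-node undirected network
A_toy <- matrix(c(0, 3, 1, 0,
                  3, 0, 0, 2,
                  1, 0, 0, 0,
                  0, 2, 0, 0),
                nrow = 4, byrow = TRUE)

# Pearson similarity between all pairs of columns
S_toy <- cor(A_toy)

# Entry (1, 2) is the correlation between column 1 and column 2
s12 <- cor(A_toy[, 1], A_toy[, 2])

# Share of negative similarities, counting each unordered pair once
pair_sims <- S_toy[upper.tri(S_toy)]
share_negative <- mean(pair_sims < 0)

print(share_negative)
```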
Explore correlation between degree and strength
To review Pearson correlation, we correlate the centrality measures degree and strength that we computed in the first chapter. Recall that the Pearson correlation coefficient runs from -1 (a perfect negative correlation) to 1 (a perfect positive correlation). Values close to 0 indicate no correlation.
Moreover, we use the ggplot2 package to draw a scatterplot of strength against degree, with a linear regression line added.
The data frame nodes, which contains the nodes of the network, is at your disposal.
# Using nodes, plot strength vs. degree
ggplot(nodes, aes(x = degree, y = strength)) +
# Add a point geom
geom_point() +
# Add a smooth geom with linear regression method
geom_smooth(method = "lm", se = FALSE)
## `geom_smooth()` using formula 'y ~ x'
# Calculate the Pearson correlation coefficient
cor(nodes$degree, nodes$strength)
## [1] 0.9708946
Indeed there is a strong positive relationship between degree and strength. Good to know!
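The regression line and the correlation coefficient are closely linked: for a simple linear regression with one predictor, the R-squared of the fit equals the squared Pearson correlation. The sketch below checks this on made-up values; the data frame toy is hypothetical and does not reproduce the Madrid numbers.

```r
# Hypothetical degree and strength values for six nodes
toy <- data.frame(degree   = c(1, 2, 2, 4, 5, 7),
                  strength = c(1, 2, 3, 6, 6, 11))

# Pearson correlation coefficient
r <- cor(toy$degree, toy$strength)

# R-squared of the simple linear regression of strength on degree,
# the same kind of model that geom_smooth(method = "lm") draws
fit <- lm(strength ~ degree, data = toy)
r_squared <- summary(fit)$r.squared

print(c(r = r, r_squared = r_squared))
```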
Transforming the similarity matrix
For programming with similarity matrices, especially to leverage tidyverse packages like dplyr and ggplot2, you can convert them to a data frame with one entry per row.
There are many ways to do this, but the situation is complicated by the fact that for large networks, it is better to store the adjacency matrix as a sparse matrix to save memory, and different tools are needed.
Here we take the approach of converting the matrix to a graph using graph_from_adjacency_matrix(). Next we convert this to a data.frame using igraph’s as_data_frame(), and finally convert that to a tidyverse tibble using as_tibble(). You need to be slightly careful here since dplyr also has a function named as_data_frame(), which is an alias for as_tibble(). This is fairly convoluted code, but it works.
The similarity matrix S is in the workspace.
# Convert weighted similarity matrix to a graph
h <- graph_from_adjacency_matrix(S, mode = "undirected", weighted = TRUE)
# See the results
plot(h)
# Convert h to a data.frame
sim_df <- igraph::as_data_frame(h, what = "edges")
# See the result
head(sim_df)
## from to weight
## 1 1 2 0.5181534
## 2 1 3 0.7139825
## 3 1 4 0.5144899
## 4 1 5 0.7286891
## 5 1 6 0.5601511
## 6 1 7 0.5201766
# Notice that this is a base-R data.frame
class(sim_df)
## [1] "data.frame"
# Convert sim_df to a tibble
sim_tib <- as_tibble(sim_df)
# See the results
sim_tib
## # A tibble: 2,014 x 3
## from to weight
## <dbl> <dbl> <dbl>
## 1 1 2 0.518
## 2 1 3 0.714
## 3 1 4 0.514
## 4 1 5 0.729
## 5 1 6 0.560
## 6 1 7 0.520
## 7 1 8 0.508
## 8 1 9 0.0482
## 9 1 10 0.115
## 10 1 11 0.484
## # ... with 2,004 more rows
Bravo! In the following exercises we will use the similarity data frame!
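Since there are many ways to do this conversion, here is one base-R alternative for a dense matrix, sketched on a made-up 3-by-3 similarity matrix (all names ending in _toy are hypothetical). It reads the upper triangle directly instead of going through a graph. One difference to keep in mind: this route keeps pairs whose similarity is exactly zero, whereas the graph-based route treats zero entries as absent ties.

```r
# Hypothetical symmetric similarity matrix with a zeroed diagonal
S_toy <- matrix(c( 0.0, 0.5, -0.2,
                   0.5, 0.0,  0.8,
                  -0.2, 0.8,  0.0),
                nrow = 3, byrow = TRUE)

# Row and column indices of the upper triangle: one entry per unordered pair
idx <- which(upper.tri(S_toy), arr.ind = TRUE)

# One row per pair of nodes, with the similarity as weight
sim_toy <- data.frame(from   = idx[, "row"],
                      to     = idx[, "col"],
                      weight = S_toy[idx])

# Sort by "from", then by "to"
sim_toy <- sim_toy[order(sim_toy$from, sim_toy$to), ]
rownames(sim_toy) <- NULL

print(sim_toy)
```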
Join similarity and nodes data frames
The similarity data frame sim contains pairs of nodes and their similarities. The terrorist data frame nodes that we built in the previous lessons contains, for each terrorist, the name, degree, and strength.
Here we make use of dplyr to join these two data frames. The resulting data frame will contain named pairs of terrorists with their similarity score and the centrality measures, degree and strength.
The similarity data frame sim is loaded in the workspace for your convenience.
sim <- sim_tib %>% rename(similarity = weight)
sim_joined <- sim %>%
# Left join to nodes matching "from" to "id"
left_join(nodes, by = c("from" = "id")) %>%
# Left join to nodes matching "to" to "id", setting suffixes
left_join(nodes, by = c("to" = "id"), suffix = c("_from", "_to"))
# See the results
sim_joined
## # A tibble: 2,014 x 9
## from to similarity name_from degree_from strength_from name_to degree_to
## <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <chr> <dbl>
## 1 1 2 0.518 Jamal Zo~ 29 43 Mohame~ 2
## 2 1 3 0.714 Jamal Zo~ 29 43 Mohame~ 27
## 3 1 4 0.514 Jamal Zo~ 29 43 Vinay ~ 10
## 4 1 5 0.729 Jamal Zo~ 29 43 Suresh~ 10
## 5 1 6 0.560 Jamal Zo~ 29 43 Mohame~ 7
## 6 1 7 0.520 Jamal Zo~ 29 43 Imad E~ 22
## 7 1 8 0.508 Jamal Zo~ 29 43 Abdela~ 6
## 8 1 9 0.0482 Jamal Zo~ 29 43 Abu Ab~ 4
## 9 1 10 0.115 Jamal Zo~ 29 43 Omar D~ 2
## 10 1 11 0.484 Jamal Zo~ 29 43 Amer A~ 18
## # ... with 2,004 more rows, and 1 more variable: strength_to <dbl>
Bravo! We are ready to reveal the most similar pairs of terrorists!
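The suffix argument only appears in the second join, and a small example helps show why. In the sketch below, sim_toy and nodes_toy are made-up stand-ins for sim and nodes. The first join adds name and degree unchanged because nothing clashes yet; the second join brings in columns with the same names, so the suffixes are applied.

```r
library(dplyr)

# Hypothetical stand-ins for the sim and nodes data frames
sim_toy <- tibble(from = c(1, 1, 2),
                  to = c(2, 3, 3),
                  similarity = c(0.5, -0.2, 0.8))
nodes_toy <- tibble(id = c(1, 2, 3),
                    name = c("A", "B", "C"),
                    degree = c(2, 2, 1))

# First join: no column names clash, so name and degree are added unchanged
step1 <- left_join(sim_toy, nodes_toy, by = c("from" = "id"))

# Second join: name and degree now clash, so the suffixes are applied,
# "_from" to the columns already present and "_to" to the newly joined ones
step2 <- left_join(step1, nodes_toy, by = c("to" = "id"),
                   suffix = c("_from", "_to"))

print(names(step2))
```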
Find most similar and dissimilar pairs
In this exercise, we use the similarity data frame sim_joined we built in the previous exercise to discover the most similar and least similar pairs of terrorists.
We will also find the most similar and dissimilar pairs among central terrorists (those with a degree of at least 10, the threshold used here).
sim_joined %>%
# Arrange by descending similarity
arrange(desc(similarity))
## # A tibble: 2,014 x 9
## from to similarity name_from degree_from strength_from name_to degree_to
## <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <chr> <dbl>
## 1 58 61 1 Emilio L~ 6 6 El Git~ 6
## 2 11 14 0.906 Amer Azi~ 18 27 Ramzi ~ 10
## 3 21 22 0.881 Khalid O~ 5 5 Rafa Z~ 3
## 4 19 23 0.855 Farid Ou~ 6 6 Naima ~ 16
## 5 14 23 0.847 Ramzi Bi~ 10 14 Naima ~ 16
## 6 6 23 0.836 Mohamed ~ 7 7 Naima ~ 16
## 7 18 21 0.831 Abderrah~ 15 15 Khalid~ 5
## 8 18 22 0.831 Abderrah~ 15 15 Rafa Z~ 3
## 9 8 23 0.826 Abdelazi~ 6 7 Naima ~ 16
## 10 8 19 0.821 Abdelazi~ 6 7 Farid ~ 6
## # ... with 2,004 more rows, and 1 more variable: strength_to <dbl>
sim_joined %>%
# Filter for degree from & degree to greater than or equal to 10
filter(degree_from >= 10 & degree_to >= 10) %>%
arrange(desc(similarity))
## # A tibble: 276 x 9
## from to similarity name_from degree_from strength_from name_to degree_to
## <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <chr> <dbl>
## 1 11 14 0.906 Amer Azi~ 18 27 Ramzi ~ 10
## 2 14 23 0.847 Ramzi Bi~ 10 14 Naima ~ 16
## 3 11 23 0.813 Amer Azi~ 18 27 Naima ~ 16
## 4 12 16 0.811 Abu Musa~ 10 10 Said B~ 11
## 5 4 5 0.763 Vinay Kh~ 10 10 Suresh~ 10
## 6 15 18 0.736 Mohamed ~ 11 19 Abderr~ 15
## 7 16 18 0.736 Said Bah~ 11 17 Abderr~ 15
## 8 1 5 0.729 Jamal Zo~ 29 43 Suresh~ 10
## 9 7 15 0.725 Imad Edd~ 22 35 Mohame~ 11
## 10 5 23 0.722 Suresh K~ 10 10 Naima ~ 16
## # ... with 266 more rows, and 1 more variable: strength_to <dbl>
# Repeat the previous steps, but arrange by ascending similarity
sim_joined %>%
# Filter for degree from & degree to greater than or equal to 10
filter(degree_from >= 10 & degree_to >= 10) %>%
arrange(similarity)
## # A tibble: 276 x 9
## from to similarity name_from degree_from strength_from name_to degree_to
## <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <chr> <dbl>
## 1 3 26 -0.276 Mohamed ~ 27 34 Basel ~ 11
## 2 1 26 -0.271 Jamal Zo~ 29 43 Basel ~ 11
## 3 7 26 -0.215 Imad Edd~ 22 35 Basel ~ 11
## 4 3 38 -0.212 Mohamed ~ 27 34 Said B~ 17
## 5 1 38 -0.209 Jamal Zo~ 29 43 Said B~ 17
## 6 4 26 -0.198 Vinay Kh~ 10 10 Basel ~ 11
## 7 5 26 -0.194 Suresh K~ 10 10 Basel ~ 11
## 8 13 26 -0.184 Mohamed ~ 10 12 Basel ~ 11
## 9 15 26 -0.182 Mohamed ~ 11 19 Basel ~ 11
## 10 16 26 -0.182 Said Bah~ 11 17 Basel ~ 11
## # ... with 266 more rows, and 1 more variable: strength_to <dbl>
Visualize similarity
The whole Madrid network can be difficult to reason about. One useful way to make it more comprehensible is to think about clusters of similar people. By filtering the similarity matrix, then converting it to a network, you can see how many groups the whole network contains.
In the Madrid network, clusters of similar nodes correspond to terrorist cells. Can you spot them? We will investigate similarity between clusters more deeply in the next chapter.
The similarity data frame sim_joined is loaded in the workspace for your convenience.
sim_filtered <- sim_joined %>%
# Filter on similarity greater than 0.6
filter(similarity > 0.6)
# Convert to an undirected graph
filtered_network <- graph_from_data_frame(sim_filtered, directed = FALSE)
# Plot with Kamada-Kawai layout
ggraph(filtered_network, layout = "with_kk") +
# Add an edge link geom, mapping transparency to similarity
geom_edge_link(aes(alpha = similarity))
Well done! I can see three main clusters of similar terrorists.
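Besides reading the clusters off the plot, you can count them. The sketch below applies the same filter-then-convert idea to a made-up similarity data frame (sim_toy is hypothetical) and uses igraph's components() to report how many connected groups remain. Nodes with no similarity above the threshold do not appear in the filtered graph at all, because the graph is built from the remaining pairs only.

```r
library(igraph)

# Hypothetical similarity data frame: two tight groups linked by one weak pair
sim_toy <- data.frame(from = c("a", "a", "b", "d", "c"),
                      to = c("b", "c", "c", "e", "d"),
                      similarity = c(0.9, 0.8, 0.7, 0.85, 0.1))

# Keep only strong similarities, then build an undirected graph
strong <- sim_toy[sim_toy$similarity > 0.6, ]
net_toy <- graph_from_data_frame(strong, directed = FALSE)

# Connected components of the filtered network
comp <- components(net_toy)
n_groups <- comp$no        # number of components
group_sizes <- comp$csize  # number of nodes in each component

print(n_groups)
```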
The similarity measure
The clustering algorithm
Cluster the similarity network
In this exercise, we will explore hierarchical clustering to find groups (clusters) of similar terrorists.
The basic idea behind hierarchical clustering is to define a measure of similarity between groups of nodes and then incrementally merge the most similar groups of nodes until all nodes belong to a single cluster. The result of this process is called a dendrogram.
We will use Pearson similarity to determine similarity between nodes and extend it to find similarity between groups using the average-linkage strategy. The Pearson similarity matrix S is already loaded in the workspace.
# compute a distance matrix
D <- 1 - S
# obtain a distance object
d <- as.dist(D)
# run average-linkage clustering method and plot the dendrogram
cc <- hclust(d, method = "average")
plot(cc)
# find the similarity of the first pair of nodes that have been merged
S[58, 61]
## [1] 1
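To make the average-linkage strategy concrete, the sketch below clusters a made-up four-node similarity matrix (S_toy is hypothetical) and then recomputes the height of the final merge by hand, as the mean of all pairwise distances between the two groups being merged.

```r
# Hypothetical similarity matrix for four nodes
S_toy <- matrix(c(1.0, 0.9, 0.2, 0.1,
                  0.9, 1.0, 0.3, 0.2,
                  0.2, 0.3, 1.0, 0.8,
                  0.1, 0.2, 0.8, 1.0),
                nrow = 4, byrow = TRUE)

# Same recipe as above: distance = 1 - similarity, then average linkage
D_toy <- 1 - S_toy
hc_toy <- hclust(as.dist(D_toy), method = "average")

# Heights at which the three merges happen
merge_heights <- hc_toy$height

# Average-linkage distance between groups {1, 2} and {3, 4}, computed by hand
# as the mean of all pairwise distances across the two groups
avg_link <- mean(D_toy[1:2, 3:4])

print(merge_heights)
print(avg_link)
```

Nodes 1 and 2 merge first (distance 0.1), then nodes 3 and 4 (distance 0.2), and the last merge joins the two pairs at the hand-computed average distance.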
Cut the dendrogram
In hierarchical clustering, each merge of groups of nodes happens sequentially (1, 2, 3, …) until a single group containing all nodes is formed.
A dendrogram is a tree structure where every node of the tree corresponds to a particular merging of two node groups in the clustering process. Hence, a dendrogram contains merging information of the entire clustering process.
Here, we freeze the state in which the nodes are grouped into 4 clusters and add the cluster information to the nodes dataset for future analysis. The dendrogram variable cc and the data frame nodes are loaded in the workspace.
# Cut the dendrogram tree into 4 clusters
cls <- cutree(cc, k = 4)
# Add cluster information to nodes
nodes_with_clusters <- nodes %>%
mutate(cluster = cls)
# See the result
nodes_with_clusters
## # A tibble: 64 x 5
## id name degree strength cluster
## <dbl> <chr> <dbl> <dbl> <int>
## 1 1 Jamal Zougam 29 43 1
## 2 7 Imad Eddin Barakat 22 35 1
## 3 3 Mohamed Chaoui 27 34 1
## 4 11 Amer Azizi 18 27 1
## 5 17 Galeb Kalaje 16 21 1
## 6 15 Mohamed Belfatmi 11 19 1
## 7 38 Said Berrak 17 19 2
## 8 16 Said Bahaji 11 17 1
## 9 23 Naima Oulad Akcha 16 16 2
## 10 18 Abderrahim Zbakh 15 15 2
## # ... with 54 more rows
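cutree() can cut a dendrogram either at a number of clusters (k) or at a height (h). The sketch below shows both on a made-up four-node distance matrix (D_toy is hypothetical). The result has one label per node, in the row order of the distance matrix, so adding it to a data frame with mutate() assumes that the rows of that data frame are in the same order.

```r
# Hypothetical distances between four nodes (same idea as D <- 1 - S)
D_toy <- matrix(c(0.0, 0.1, 0.8, 0.9,
                  0.1, 0.0, 0.7, 0.8,
                  0.8, 0.7, 0.0, 0.2,
                  0.9, 0.8, 0.2, 0.0),
                nrow = 4, byrow = TRUE)
hc_toy <- hclust(as.dist(D_toy), method = "average")

# Cut by number of clusters
by_k <- cutree(hc_toy, k = 2)

# Cut by height: any merge that happens above 0.5 is undone
by_h <- cutree(hc_toy, h = 0.5)

# One cluster label per node, in the row order of the distance matrix
print(by_k)
print(by_h)
```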
Analyze clusters
We are finally ready to work on the clusters using the dplyr package. In particular, we will show how to select nodes in a given cluster and how to compute aggregate statistics on the node clusters.
The nodes dataset is ready in the workspace.
nodes <- nodes_with_clusters
# Who is in cluster 1?
nodes %>%
# Filter rows for cluster 1
filter(cluster == 1) %>%
# Select the name column
select(name)
## # A tibble: 28 x 1
## name
## <chr>
## 1 Jamal Zougam
## 2 Imad Eddin Barakat
## 3 Mohamed Chaoui
## 4 Amer Azizi
## 5 Galeb Kalaje
## 6 Mohamed Belfatmi
## 7 Said Bahaji
## 8 Ramzi Binalshibh
## 9 Mohamed El Egipcio
## 10 Mohamed Atta
## # ... with 18 more rows
# Calculate properties of each cluster
nodes %>%
# Group by cluster
group_by(cluster) %>%
# Calculate summary statistics
summarize(
# Number of nodes
size = n(),
# Mean degree
avg_degree = mean(degree),
# Mean strength
avg_strength = mean(strength)
) %>%
# Arrange rows by decreasing size
arrange(desc(size))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 4 x 4
## cluster size avg_degree avg_strength
## <int> <int> <dbl> <dbl>
## 1 1 28 9.07 11.7
## 2 2 21 7.62 7.71
## 3 3 10 5.2 5.2
## 4 4 5 4 4.4
Notice that the clusters with higher importance (degree and strength) correspond to larger terrorist cells.
Visualize the clusters
Here we will use ggraph to visualize the original network using colored clusters and facet the visualization into four sub-networks, one for each terrorist cell or cluster.
The variable g, which contains the network, and the nodes data frame are loaded in the workspace.
# Add cluster information to the network's nodes
V(g)$cluster <- nodes$cluster
# Plot the graph
ggraph(g, layout = "with_kk") +
# Add an edge link geom with alpha mapped to weight
geom_edge_link(aes(alpha = weight), show.legend=FALSE) +
# Add a node point geom, colored by cluster as a factor
geom_node_point(aes(color = factor(cluster))) +
labs(color = "cluster")
# Update the plot
ggraph(g, layout = "with_kk") +
geom_edge_link(aes(alpha = weight), show.legend=FALSE) +
geom_node_point(aes(color = factor(cluster))) +
labs(color = "cluster") +
# Facet the nodes by cluster, with a free scale
facet_nodes(~ cluster, scales="free")
Wow, the four clusters neatly partition the terrorists of the network!
Basic visualization
In this final lesson, we will explore the visNetwork package to produce rich interactive network visualizations.
With this package, it is possible to visualize networks, in particular igraph networks, and interact with them by clicking, moving, zooming and much more.
In this first exercise, we will use basic steps to visualize and explore our terrorism network g, which is loaded in the workspace.
Make sure to enjoy the live networks by interacting with them: click on a node, move a node, move the entire network, zoom in and out!
library(visNetwork)
# Convert from igraph to visNetwork
data <- toVisNetworkData(g)
# Print the head of the data nodes
head(data$nodes)
## id degree strength cluster
## Jamal Zougam Jamal Zougam 29 43 1
## Imad Eddin Barakat Imad Eddin Barakat 22 35 1
## Mohamed Chaoui Mohamed Chaoui 27 34 1
## Amer Azizi Amer Azizi 18 27 1
## Galeb Kalaje Galeb Kalaje 16 21 1
## Mohamed Belfatmi Mohamed Belfatmi 11 19 1
## label
## Jamal Zougam Jamal Zougam
## Imad Eddin Barakat Imad Eddin Barakat
## Mohamed Chaoui Mohamed Chaoui
## Amer Azizi Amer Azizi
## Galeb Kalaje Galeb Kalaje
## Mohamed Belfatmi Mohamed Belfatmi
# ... do the same for the edges (ties)
head(data$edges)
## from to weight betweenness
## 1 Jamal Zougam Mohamed Bekkali 1 47.41667
## 2 Jamal Zougam Mohamed Chaoui 3 16.00000
## 3 Jamal Zougam Vinay Kholy 1 27.08333
## 4 Jamal Zougam Suresh Kumar 1 27.08333
## 5 Jamal Zougam Mohamed Chedadi 1 47.66190
## 6 Jamal Zougam Imad Eddin Barakat 4 268.00000
# Visualize the network
visNetwork(nodes = data$nodes, edges = data$edges, width = 780, height = 470)
Did you like the interaction?
Change the layout
It is possible to change the layout of the visualization using the visNetwork() and visIgraphLayout() function calls. The igraph package contains several functions that provide algorithms to lay out the nodes. You can pass the function name as a string to the layout argument of visIgraphLayout() to use it.
The data variable containing the visNetwork data is loaded in the workspace.
# Add to the plot
visNetwork(nodes = data$nodes, edges = data$edges, width = 780, height = 470) %>%
# Set the layout to Kamada-Kawai
visIgraphLayout(layout = "layout_with_kk")
# See a list of possible layouts
ls("package:igraph", pattern = "^layout_.")
## [1] "layout_as_bipartite" "layout_as_star" "layout_as_tree"
## [4] "layout_components" "layout_in_circle" "layout_nicely"
## [7] "layout_on_grid" "layout_on_sphere" "layout_randomly"
## [10] "layout_with_dh" "layout_with_drl" "layout_with_fr"
## [13] "layout_with_gem" "layout_with_graphopt" "layout_with_kk"
## [16] "layout_with_lgl" "layout_with_mds" "layout_with_sugiyama"
# Update the plot
visNetwork(nodes = data$nodes, edges = data$edges, width = 780, height = 470) %>%
# Change the layout to be in a circle
visIgraphLayout(layout = "layout_in_circle")
# Update the plot
visNetwork(nodes = data$nodes, edges = data$edges, width = 780, height = 470) %>%
# Change the layout to be on a grid
visIgraphLayout(layout = "layout_on_grid")
Did you try to deconstruct the circle or the grid?
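Each of these layout functions simply computes coordinates for the nodes. The sketch below calls two of them directly on a made-up five-node ring network (ring is hypothetical) to show that the result is a matrix with one row of x and y coordinates per node.

```r
library(igraph)

# Hypothetical 5-node ring network
ring <- make_ring(5)

# Each layout function returns one row of (x, y) coordinates per node
coords_circle <- layout_in_circle(ring)
coords_grid <- layout_on_grid(ring)

print(dim(coords_circle))
print(dim(coords_grid))
```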
Highlight nearest nodes and ties
We can also add extra interaction features to our network. Here, we will highlight the nearest nodes and ties when a node is selected.
An interesting thing about visNetwork is the use of pipes (%>%), like in dplyr queries, to add extra layers to the visualization.
The data variable containing the visNetwork data is loaded in the workspace.
# Add to the plot
visNetwork(nodes = data$nodes, edges = data$edges, width = 780, height = 470) %>%
# Set the layout to Kamada-Kawai
visIgraphLayout(layout = "layout_with_kk") %>%
# Change the options to highlight the nearest nodes and ties
visOptions(highlightNearest = TRUE)
One more step!
Select nodes and groups of nodes
Finally, we will select nodes by their names and by the groups they belong to.
The group variable in the nodes data frame we used in the visNetwork representation contains information about which group a node belongs to and is used to select nodes by group. The function toVisNetworkData() converts an igraph network to visNetwork data and reads group information from the color attribute of the igraph network.
The data variable containing the visNetwork data and the network g are loaded in the workspace.
# Update the plot
visNetwork(nodes = data$nodes, edges = data$edges, width = 780, height = 470) %>%
visIgraphLayout(layout = "layout_with_kk") %>%
# Change the options to allow selection of nodes by ID
visOptions(nodesIdSelection = TRUE)
# Copy cluster node attribute to color node attribute
V(g)$color <- V(g)$cluster
# Convert g to vis network data
data <- toVisNetworkData(g)
# Update the plot
visNetwork(nodes = data$nodes, edges = data$edges, width = 780, height = 470) %>%
visIgraphLayout(layout = "layout_with_kk") %>%
# Change options to select by group
visOptions(selectedBy = "group", highlightNearest = TRUE)
Bravo! Explore your network in various ways by clicking on the nodes and using the dropdown menu.
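If you are not starting from an igraph network, the group column can also be written by hand. The sketch below builds the two data frames directly; nodes_toy and edges_toy are made-up examples, while id, label, group, from and to are the column names visNetwork looks for.

```r
library(visNetwork)

# Hypothetical nodes: two groups of two nodes each
nodes_toy <- data.frame(id = 1:4,
                        label = c("A", "B", "C", "D"),
                        group = c(1, 1, 2, 2))

# Hypothetical ties between the nodes, referring to node ids
edges_toy <- data.frame(from = c(1, 2, 3),
                        to = c(2, 3, 4))

net_toy <- visNetwork(nodes = nodes_toy, edges = edges_toy) %>%
  # Dropdown menu to select all nodes of one group
  visOptions(selectedBy = "group", highlightNearest = TRUE)

# Printing net_toy in an interactive session displays the widget
class(net_toy)
```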
Deeper inside network science
You now know how to:
For more information: