PEER Advanced Field School 2023

Day 3 - Whole Networks

Eric Brewe
Professor of Physics at Drexel University

15 June 2023, last update: 2023-06-22

1 / 17

Welcome Back!

Our project is about building and analyzing the Workshop network

We will continue to use the data file from yesterday, WorkshopNetwork.rds.

```r
library(tidyverse) # tools for cleaning data
library(igraph)    # package for doing network analysis
library(tidygraph) # tools for doing tidy networks
library(here)      # tools for project-based workflow
library(ggraph)    # plotting tools for networks
library(boot)      # to do resampling
```
2 / 17

Let's load our data

```r
gr <- readRDS(here("data", "WorkshopNetwork.rds"))
```
3 / 17

Let's plot it quickly

While we are at it we might as well set a layout.

What are the distinctive features of the network?

```r
GrLayout <- create_layout(gr, layout = "kk")
ggraph(GrLayout) +
  geom_edge_link() +
  geom_node_point() +
  theme(legend.position = "bottom")
```

4 / 17

Let's explore whole network metrics

These are fairly easy to calculate

  • Density: Proportion of possible connections that actually exist
  • Diameter: Longest of the shortest paths between any pair of nodes
  • Reciprocity: Proportion of directed edges that are returned (two-way edges)
  • Transitivity: Extent to which three connected nodes form closed triangles
  • Average Path Length: Average shortest distance between every pair of nodes

These require a bit more work

  • Giant component: Number of nodes in the largest connected component
  • Homophily: Extent to which nodes with a similar attribute are connected
  • Average degree: Mean number of edges per node

This is a list of many common graph metrics used in tidygraph https://rdrr.io/cran/tidygraph/man/graph_measures.html
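
As a check on the definition of density, here is a minimal sketch on a hypothetical four-person directed network (the edge list is made up for illustration and is not the Workshop data). It counts the edges that exist, divides by the n(n - 1) ordered pairs that could hold an edge, and compares the result with igraph's edge_density().

```r
library(igraph)

# Hypothetical toy network (not the Workshop data): 4 people, 5 directed ties
toy_edges <- data.frame(
  from = c("A", "B", "B", "C", "A"),
  to   = c("B", "A", "C", "D", "C")
)
toy <- graph_from_data_frame(toy_edges, directed = TRUE)

n <- vcount(toy)  # number of nodes
m <- ecount(toy)  # number of edges that exist

# In a directed network every ordered pair of distinct nodes could hold an edge
manual_density <- m / (n * (n - 1))

manual_density
edge_density(toy)
```

Both lines should print the same value, which is a quick way to convince yourself what the one-line command is doing.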

5 / 17

Let's calculate a couple of these...

For many of these, the calculation is a simple one-line command.

Here is density:

```r
edge_density(gr)
## [1] 0.02853107
```

Here is diameter. Note that tidygraph's graph_diameter() has to be evaluated in the context of a graph, so outside of a pipeline we wrap it in with_graph() and pass the graph as the first argument:

```r
with_graph(gr, graph_diameter())
## [1] 6
```
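
To illustrate why with_graph() is needed, the sketch below computes the diameter of a hypothetical four-node chain three ways: with with_graph(), inside a tidygraph pipeline where the graph context is already set, and with the plain igraph function diameter(). The toy network is an assumption made only for this example.

```r
library(igraph)
library(dplyr)
library(tidygraph)

# Hypothetical toy network: a directed chain 1 -> 2 -> 3 -> 4
toy <- as_tbl_graph(make_graph(c(1, 2, 2, 3, 3, 4), directed = TRUE))

# Outside a pipeline, with_graph() supplies the graph context
d_outside <- with_graph(toy, graph_diameter())

# Inside a tidygraph verb the context is already set,
# so the same value is written to every node
d_inside <- toy %>%
  activate(nodes) %>%
  mutate(Diameter = graph_diameter()) %>%
  as_tibble() %>%
  pull(Diameter)

# The plain igraph function takes the graph as an argument
d_igraph <- diameter(toy)

d_outside
d_inside
d_igraph
```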

So?

6 / 17

Let's calculate a couple of others

It is your turn: choose one of the metrics that are easy to calculate and calculate it for the Workshop network.

```r
with_graph(gr, graph_reciprocity()) # Reciprocity
## [1] 0.2142857
transitivity(gr) # Transitivity
## [1] 0.2669492
with_graph(gr, graph_mean_dist()) # Average Distance
## [1] 1.88125
```
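
To see where numbers like these come from, here is a minimal sketch on a hypothetical four-person directed network that is small enough to count by hand. The edge list is made up for illustration; the functions are the plain igraph versions of the metrics above.

```r
library(igraph)

# Hypothetical toy network (not the Workshop data)
toy_edges <- data.frame(
  from = c("A", "B", "B", "C", "A"),
  to   = c("B", "A", "C", "D", "C")
)
toy <- graph_from_data_frame(toy_edges, directed = TRUE)

# Reciprocity: 2 of the 5 directed edges (A->B and B->A) are returned
recip <- reciprocity(toy)

# Average path length over the 7 ordered pairs that are reachable:
# five pairs at distance 1 and two pairs (A->D, B->D) at distance 2
avg_path <- mean_distance(toy, directed = TRUE)

# Transitivity ignores edge direction when counting triangles
trans <- transitivity(toy)

c(reciprocity = recip, avg_path = avg_path, transitivity = trans)
```

Counting by hand gives a reciprocity of 2/5 and an average path length of 9/7, which is what the functions should report.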
7 / 17

Let's create a function

Creating a function accomplishes two things:

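  1. We can calculate some of the harder-to-calculate metrics.
  2. We do not have to redo the work each time we want to analyze a new network.

```r
GrFunction <- function(gr){
  # Size of the largest connected component
  # (components() is the current name for the older clusters())
  Giant <- max(components(gr)$csize)
  # Mean total degree (incoming plus outgoing edges) across all nodes
  AveDeg <- gr %>%
    activate(nodes) %>%
    mutate(Deg = centrality_degree(mode = "total")) %>%
    as_tibble() %>%
    summarise(AveDeg = mean(Deg)) %>%
    pull(AveDeg)
  # Return both metrics as a one-row tibble
  tibble(Giant, AveDeg)
}
```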
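
The second benefit can be sketched as follows. This is only an illustration: net_summary() is a hypothetical helper written with plain igraph calls so the example is self-contained, and the two networks are made-up stand-ins for "a new network".

```r
library(igraph)

# Hypothetical helper that mirrors the idea of the function above
net_summary <- function(g) {
  data.frame(
    Giant  = max(components(g)$csize),        # size of the largest component
    AveDeg = mean(degree(g, mode = "total"))  # mean number of ties per node
  )
}

# Two made-up networks standing in for "a new network"
nets <- list(
  ring = make_ring(10),
  star = make_star(6, mode = "undirected")
)

# One row of metrics per network, with no copy-and-paste of the calculations
results <- do.call(rbind, lapply(nets, net_summary))
results
```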
8 / 17

Let's run this function

```r
GrFunction(gr)
## # A tibble: 1 × 2
##   Giant AveDeg
##   <dbl>  <dbl>
## 1    30   3.37
```

What do these numbers mean?

Giant = 30: 30 of the 60 people are connected in the largest component.

AveDeg = 3.37: on average, each person has 3.37 edges, counting both incoming and outgoing edges.
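
Both numbers can be checked by hand on a small example. The sketch below uses a hypothetical six-node network (a triangle, a separate pair, and one isolate) and also shows that the mean degree equals 2m / n, since every edge adds one to the degree of two nodes. Applied to the slide's numbers, 3.37 × 60 / 2 gives roughly 101 edges in the Workshop network.

```r
library(igraph)

# Hypothetical toy network: a triangle, a separate pair, and one isolate
toy <- make_graph(c(1, 2, 2, 3, 3, 1, 4, 5), n = 6, directed = FALSE)

comp <- components(toy)
comp$csize                         # sizes of all components
giant <- max(comp$csize)           # the triangle is the giant component
giant_share <- giant / vcount(toy) # share of nodes in the giant component

# Mean degree computed directly and from the edge count
ave_deg <- mean(degree(toy))
ave_deg_formula <- 2 * ecount(toy) / vcount(toy)

c(giant = giant, share = giant_share, ave_deg = ave_deg)
```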

9 / 17

Let's talk about clustering

Whole networks sometimes have regions which are different.

Perhaps you want to know, are there groups that have some common characteristic?

So we need a way to look for groups within our network.

Start with random assignment:

```r
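# First, generate a grouping based on random assignment.
# graph_order() returns the number of nodes (60 here), so the size is not hard-coded.
gr <- gr %>%
  activate(nodes) %>%
  mutate(RandGroup = sample(1:2, graph_order(), replace = TRUE))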
```
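
Because sample() draws new values on every run, the random groups (and anything computed from them) will change each time. One way to make the assignment repeatable is to set a seed first. The sketch below assumes a hypothetical ten-node ring in place of the Workshop data, and assign_random() is an illustrative helper, not part of tidygraph.

```r
library(igraph)
library(dplyr)
library(tidygraph)

# Hypothetical toy network standing in for the Workshop data
toy <- as_tbl_graph(make_ring(10))

# Illustrative helper: random two-group assignment with a fixed seed
assign_random <- function(g, seed) {
  set.seed(seed)  # fix the random number stream so the draw can be repeated
  g %>%
    activate(nodes) %>%
    mutate(RandGroup = sample(1:2, graph_order(), replace = TRUE))
}

first  <- assign_random(toy, seed = 2023) %>% as_tibble() %>% pull(RandGroup)
second <- assign_random(toy, seed = 2023) %>% as_tibble() %>% pull(RandGroup)

identical(first, second)  # same seed, same groups
```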
10 / 17

Let's plot this with the random assignment to groups

```r
GrLayout <- create_layout(gr, layout = "kk")
ggraph(GrLayout) +
  geom_edge_link() +
  geom_node_point(aes(color = factor(RandGroup))) +
  theme(legend.position = "bottom")
```

11 / 17

Let's talk about Modularity

Modularity is a metric that tells us how well a given grouping breaks up our network into chunks that have more connections within the chunk than from the chunk to other parts of the network.

```r
with_graph(gr, graph_modularity(group = RandGroup)) # Modularity
## [1] 0.01470444
```

So the modularity of our network with nodes randomly assigned to groups is about 0.01. Because the assignment is random, the exact value changes from run to run.

A modularity this close to zero means that, based on our grouping, links within groups are only very slightly more common than we would expect by chance.
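
To make the idea concrete, here is a minimal sketch that computes modularity by hand for a hypothetical undirected network of two triangles joined by a single tie, and compares it with igraph's modularity(). For each group it takes the share of edges that fall inside the group and subtracts the share expected if edges were placed at random given the nodes' degrees. The network and grouping are assumptions made for illustration.

```r
library(igraph)

# Hypothetical toy network: two triangles (1-2-3 and 4-5-6) joined by one tie (3-4)
toy <- make_graph(c(1, 2, 2, 3, 3, 1, 4, 5, 5, 6, 6, 4, 3, 4), directed = FALSE)
grouping <- c(1, 1, 1, 2, 2, 2)  # one group per triangle

m   <- ecount(toy)                      # number of edges
el  <- as_edgelist(toy, names = FALSE)  # two-column matrix of node ids
deg <- degree(toy)

# For each group: share of edges inside it minus the share expected by chance
q_by_group <- sapply(unique(grouping), function(k) {
  inside  <- sum(grouping[el[, 1]] == k & grouping[el[, 2]] == k)
  deg_sum <- sum(deg[grouping == k])
  inside / m - (deg_sum / (2 * m))^2
})
manual_q <- sum(q_by_group)

manual_q
modularity(toy, grouping)
```

Both lines should agree. A grouping that cuts through the triangles instead would score lower.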

12 / 17

Let's use an actual community detection algorithm

We'll use Walktrap, in which a random walker moves from node to node along the edges. Short random walks tend to stay trapped inside densely connected parts of the network, so the walktrap algorithm establishes community structure by grouping together the nodes that such walks tend to end up in together.

There are two ways to do this...

```r
# First way: generate a grouping based on the walktrap algorithm and add it to the graph.
gr <- gr %>%
  activate(nodes) %>%
  mutate(WTGroup = group_walktrap())
# Second way: just call the walktrap algorithm.
wtg <- with_graph(gr, group_walktrap())
wtg
## [1] 9 3 5 1 2 7 2 4 1 1 4 3 1 5 6 3 8 2 10 1 1 4 5 3 2
## [26] 4 4 6 1 4 4 4 1 7 3 2 2 2 2 2 4 1 1 1 1 1 3 3 3 3
## [51] 3 1 1 5 5 6 6 8 2 1
```
13 / 17

Let's talk about Modularity

Now we can check the modularity with the walktrap defined communities

```r
with_graph(gr, graph_modularity(group = WTGroup)) # Modularity
## [1] 0.7393393
```

So the modularity of our network with nodes assigned to groups via the walktrap algorithm is about 0.74.

This means that, based on this grouping, far more links fall within groups than we would expect by chance. Compare this with the near-zero value we got from random assignment.
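
The contrast between a detected grouping and a random one can be sketched on a small example. The code below assumes a hypothetical network of two triangles joined by one tie, runs igraph's cluster_walktrap() on it, and compares the resulting modularity with the average modularity of many random two-group assignments.

```r
library(igraph)

# Hypothetical toy network: two triangles joined by a single tie
toy <- make_graph(c(1, 2, 2, 3, 3, 1, 4, 5, 5, 6, 6, 4, 3, 4), directed = FALSE)

# steps sets the length of the random walks (4 is the igraph default)
wt <- cluster_walktrap(toy, steps = 4)
membership(wt)               # community assigned to each node
q_walktrap <- modularity(wt)

# Modularity of 200 random two-group assignments for comparison
set.seed(1)
q_random <- replicate(
  200,
  modularity(toy, sample(1:2, vcount(toy), replace = TRUE))
)

c(walktrap = q_walktrap, random_mean = mean(q_random))
```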

14 / 17

Let's plot the walktrap communities

This is the hard way to plot these.

```r
GrLayout <- create_layout(gr, layout = "kk")
ggraph(GrLayout) +
  geom_edge_link() +
  geom_node_point(aes(color = factor(WTGroup))) +
  theme(legend.position = "bottom")
```

15 / 17

Let's plot the walktrap communities again

This is the easy way to plot these (though not tidy).

```r
wtg <- cluster_walktrap(gr)
modularity(wtg)
## [1] 0.6267523
plot(wtg, gr)
```

16 / 17

Now it is your turn

  • Choose a community detection algorithm,
  • run it on the Workshop Network, and
  • be ready to talk about the salient points of the community detection algorithm you chose.

Here are the community detection algorithms available to you in tidygraph. https://tidygraph.data-imaginist.com/reference/index.html#community-detection
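
As a possible starting point, the sketch below runs four of tidygraph's grouping functions on a hypothetical undirected toy network and reports the modularity of each grouping. The toy network is an assumption. Note also that some algorithms (Louvain and fast greedy, for example) expect an undirected network, so a directed network may need to be converted before using them.

```r
library(igraph)
library(dplyr)
library(tidygraph)

# Hypothetical toy network: two triangles joined by a single tie
toy <- as_tbl_graph(
  make_graph(c(1, 2, 2, 3, 3, 1, 4, 5, 5, 6, 6, 4, 3, 4), directed = FALSE)
)

set.seed(2023)  # some algorithms involve random choices
grouped <- toy %>%
  activate(nodes) %>%
  mutate(
    Walktrap   = group_walktrap(),
    Louvain    = group_louvain(),
    FastGreedy = group_fast_greedy(),
    Infomap    = group_infomap()
  )

# One column of group labels per algorithm
memberships <- grouped %>% as_tibble()
memberships

# Modularity of each grouping, computed with igraph on the same network
mod_scores <- sapply(memberships, function(grp) modularity(grouped, grp))
mod_scores
```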

17 / 17
