R For SNA
Workshop 3
Eric Brewe
Associate Professor of Physics at Drexel University
18 August 2020, last update: 2020-08-18

Welcome Back!

Our project is about building and analyzing the Workshop network

We do not need a new project folder; we will continue to use RForSNA, which you should have set up in WS1.

This time we will add a new .Rmd file. Give it a name like "network_properties.Rmd".

So navigate to your RForSNA folder and create a new .Rmd file.

Then, add a code chunk to load the libraries necessary and run this chunk.

library(tidyverse) #tools for cleaning data 
library(igraph)  #package for doing network analysis
library(tidygraph) #tools for doing tidy networks
library(here) #tools for project-based workflow
library(ggraph) #plotting tools for networks
library(boot)  #to do resampling


Let's load our data

This workshop assumes you did the work from week 1 and that you know how to install libraries, read csv files into data frames, and manipulate data in R.

The benefit of having done the work previously is that we don't have to re-do it (unless we want to change something)

So I took the network at the end of workshop #1 and saved it as a .rds file. I could have saved it as a pair of csv files and rebuilt the network, but if I save as rds it will be ready to go!

To load an rds file the syntax is a little different:

gr <- readRDS(here("data", "WorkshopNetwork.rds"))
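For reference, the save-then-load round trip might look like the sketch below. It uses a small stand-in data frame and a temporary file rather than the workshop network; `toy_edges` and `tmp_path` are illustrative names, not objects from the workshop.

```r
# A small stand-in object (hypothetical data, not the workshop network)
toy_edges <- data.frame(from = c("a", "b", "c"),
                        to   = c("b", "c", "a"),
                        stringsAsFactors = FALSE)

# Save to a temporary .rds file; in the project you would build the path with here()
tmp_path <- tempfile(fileext = ".rds")
saveRDS(toy_edges, tmp_path)

# Read it back: the object returns with its class and column types intact
restored <- readRDS(tmp_path)
identical(toy_edges, restored)
```

The same pattern should apply to a tbl_graph: one file holds the whole object, so nodes, edges, and attributes do not have to be rebuilt from separate csv files.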


Let's plot it quickly

While we are at it we might as well set a layout.

What are distinctive features of the network?


```r
GrLayout <- create_layout(gr,
                          layout = "kk")
ggraph(GrLayout) +
  geom_edge_link() +
  geom_node_point() +
  theme(legend.position="bottom")
```


Let's explore whole network metrics

These are fairly easy to calculate

Density: Proportion of possible connections that exist
Diameter: Longest Shortest Path across network
Reciprocity: Proportion of directed edges that are reciprocated (two-way)
Transitivity: Extent to which three connected nodes form triangles
Average Path Length: Average shortest distance between every pair of nodes

These require a bit more work

Giant: Number of nodes in the largest connected component
Homophily: Extent to which nodes with a similar attribute are connected
Average degree: Mean number of edges (incoming plus outgoing) per node

This is a list of many common graph metrics used in tidygraph https://rdrr.io/cran/tidygraph/man/graph_measures.html


Let's calculate a couple of these...

For many of these it is a very simple one-line command.

Here is density

edge_density(gr)

## [1] 0.02853107

Here is diameter. Note that with tidygraph you have to wrap the call in with_graph(), passing the graph and the measure function graph_diameter().

with_graph(gr, graph_diameter())

## [1] 6

So?
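One way to build intuition for these numbers is to recompute them by hand on a graph small enough to check. The sketch below uses a toy four-node directed graph (made-up edges, not the workshop data) and compares a manual density and average degree against what igraph reports. The object names are illustrative.

```r
library(igraph)

# Toy directed graph (illustrative only): 4 nodes, 5 edges
toy <- graph_from_data_frame(
  data.frame(from = c("a", "b", "b", "c", "d"),
             to   = c("b", "a", "c", "d", "a")),
  directed = TRUE
)

n_nodes <- vcount(toy)   # number of nodes
n_edges <- ecount(toy)   # number of edges

# Directed density: edges present / ordered pairs of distinct nodes
manual_density <- n_edges / (n_nodes * (n_nodes - 1))

# Mean total degree: each edge adds one out-degree and one in-degree
manual_avg_degree <- 2 * n_edges / n_nodes

all.equal(manual_density, edge_density(toy))
all.equal(manual_avg_degree, mean(degree(toy, mode = "total")))
```

Read this way, a density near 0.03 says that only about 3 of every 100 possible directed ties are actually present.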


Let's calculate a couple of others

It is your turn: choose one of the metrics that are easy to calculate and compute it.

with_graph(gr, graph_reciprocity()) #Reciprocity

## [1] 0.2142857

transitivity(gr) #Transitivity

## [1] 0.2669492

with_graph(gr, graph_mean_dist()) #Average Distance

## [1] 1.88125
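To see what reciprocity and average distance are counting, here is a minimal sketch on a three-node toy graph (made-up edges, not the workshop data). It calls the igraph functions directly; as far as I know the tidygraph wrappers use the same defaults, but treat that as an assumption.

```r
library(igraph)

# Toy directed graph: a <-> b is mutual, b -> c is one-way
toy <- graph_from_data_frame(
  data.frame(from = c("a", "b", "b"),
             to   = c("b", "a", "c")),
  directed = TRUE
)

# Default reciprocity: share of directed edges that are reciprocated (2 of 3 here)
toy_recip <- reciprocity(toy)

# Mean shortest-path length, averaged over the ordered pairs that can reach each other
# Reachable pairs: a->b (1), a->c (2), b->a (1), b->c (1); c reaches nobody
toy_dist <- mean_distance(toy, directed = TRUE)

toy_recip
toy_dist
```

Note that unreachable pairs are dropped from the average rather than counted as infinitely far apart, so a fragmented network can show a short average distance.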


Let's create a function

Creating a function accomplishes two things:

We can calculate some of the harder-to-calculate metrics.
We do not have to redo the work each time we want to analyze a new network.

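GrFunction <- function(gr){
  #size of the largest connected component (components() replaces the older clusters())
  Giant = max(components(gr)$csize)
  #assortativity on each node attribute (negative means ties tend to cross categories)
  AMPMHomophily = with_graph(gr, graph_assortativity(AMPM))
  DessertHomophily = with_graph(gr, graph_assortativity(Dessert))
  #mean total degree (incoming plus outgoing edges per node)
  AveDeg = gr %>%
    activate(nodes) %>%
    mutate(Deg = centrality_degree(mode = 'total')) %>%
    as_tibble() %>%
    pull(Deg) %>%
    mean()
  df = tibble(Giant, AMPMHomophily, DessertHomophily, AveDeg)
  return(df)
}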


Let's run this function

GrFunction(gr)

## # A tibble: 1 x 4
##   Giant AMPMHomophily DessertHomophily AveDeg
##   <dbl>         <dbl>            <dbl>  <dbl>
## 1    30        -0.723           0.0468   3.37

What do these numbers mean?

Giant = 30: there are 30 people connected in the largest component.

AMPM Homophily: since this is negative, people tend to associate with others unlike themselves (e.g., a morning person is more likely to be connected to a night owl).

Dessert Homophily: this is pretty close to zero, so dessert preference does not seem to be associated with who connects to whom.

AveDeg: the average person has 3.37 edges, counting incoming and outgoing together.
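To make the sign of the homophily score concrete, the sketch below computes nominal assortativity on a four-node toy path with made-up labels. It calls igraph's `assortativity_nominal()` directly rather than the tidygraph wrapper used in the function, and the labels and object names are hypothetical.

```r
library(igraph)

# Toy undirected path 1 - 2 - 3 - 4 (illustrative only)
toy <- make_graph(c(1, 2,  2, 3,  3, 4), directed = FALSE)

# Case 1: labels alternate along the path, so every edge joins unlike nodes
alternating <- c("morning", "night", "morning", "night")
r_mixed <- assortativity_nominal(toy, as.integer(factor(alternating)),
                                 directed = FALSE)

# Case 2: labels are grouped, so two of the three edges join like nodes
grouped <- c("morning", "morning", "night", "night")
r_grouped <- assortativity_nominal(toy, as.integer(factor(grouped)),
                                   directed = FALSE)

r_mixed     # -1: every tie crosses categories
r_grouped   # positive: most ties stay within a category
```

On this scale the workshop's AMPM value of -0.723 sits much closer to the "every tie crosses categories" end than to zero.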

Is this unique?


Let's decide if our network is unique

What is the big problem with the network that we have collected?

We have one network.
There is no variance in the metrics.
What is the null hypothesis/model?

Let's talk about camps

This is a place where there are sort of two camps of network analysis

Statistical Camp
Variance: Permutation/resampling techniques such as bootstrap.
Null hypothesis: Exponential random graph models (ERGMs) or rewired network

Graph Theoretic Camp
Null model: Theory-driven models
Random Graph (Erdos-Renyi)
Small World (Watts-Strogatz)
Preferential Attachment (Barabasi-Albert)
Variance: Network simulation
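As a taste of the statistical camp's "rewired network" null, here is a minimal sketch of degree-preserving rewiring with igraph. It is not part of the workshop code: the stand-in network is randomly generated, undirected for simplicity, and the seed and sizes are arbitrary.

```r
library(igraph)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Stand-in "observed" network (hypothetical): 20 nodes, 40 undirected edges
observed <- sample_gnm(n = 20, m = 40, directed = FALSE)

# Degree-preserving rewiring: repeatedly swap the endpoints of pairs of edges
rewired <- rewire(observed, with = keeping_degseq(niter = 500))

# Every node keeps its degree, but who is tied to whom has been shuffled
all(degree(observed) == degree(rewired))

# Metrics that depend on the wiring, such as transitivity, are free to change
transitivity(observed)
transitivity(rewired)
```

The design choice here is what gets held fixed: rewiring keeps each node's degree, while the Erdos-Renyi model keeps only the number of nodes and edges.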


Let's try a graph-theoretic model

We'll generate an Erdos-Renyi random graph and compare its characteristics with those of our measured network.

First, we need to tell the simulation how many nodes and edges to include.

N_Nodes = with_graph(gr, graph_order())
N_Edges = with_graph(gr, graph_size())

Now we can actually generate the graph

set.seed(522)
ER_gr <- play_erdos_renyi(n = N_Nodes, m = N_Edges)
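It is worth checking what this kind of random graph holds fixed. The sketch below uses igraph's `sample_gnm()` directly with illustrative sizes (20 nodes, 40 edges, not the workshop's counts) to show that node and edge counts, and therefore density, are matched by construction.

```r
library(igraph)

set.seed(522)

# Illustrative sizes, not the workshop network's counts
n_nodes <- 20
n_edges <- 40

# G(n, m) random graph: exactly n nodes and m edges, placed uniformly at random
er_toy <- sample_gnm(n = n_nodes, m = n_edges, directed = TRUE)

vcount(er_toy)   # same number of nodes as requested
ecount(er_toy)   # same number of edges as requested

# Density is therefore fixed by construction and cannot distinguish the graphs
edge_density(er_toy)
n_edges / (n_nodes * (n_nodes - 1))
```

So the comparison with the measured network only becomes informative for metrics that depend on how the edges are arranged, such as reciprocity, transitivity, and distance.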


Let's check out our ER Graph

ER_gr %>%
  ggraph(layout = "kk") + 
  geom_edge_link() +
  geom_node_point() +
  theme(legend.position="bottom")


Let's put them side by side

ggraph(GrLayout) +
  geom_edge_link() +
  geom_node_point()

ER_gr %>%
  ggraph(layout = "kk") + 
  geom_edge_link() +
  geom_node_point()


Let's assign attributes

In order to compare homophily, we need our ER_gr to have the same attributes.

I emailed you all an additional file that has a random graph with attributes (because it was a pain to add those in). So now we just need to load it and take a look...

ER_gr_attr <- readRDS(here("data", "RandomGraph.rds"))
ER_gr_attr #print the graph to check nodes, edges, and attributes

## # A tbl_graph: 58 nodes and 101 edges
## #
## # A directed simple graph with 1 component
## #
## # Node Data: 58 x 3 (active)
##   name  AMPM            Dessert                  
##   <chr> <chr>           <chr>                    
## 1 5106  morning person. Brownies                 
## 2 6633  morning person. I don't care for dessert.
## 3 7599  night owl.      Ice Cream                
## 4 4425  morning person. I don't care for dessert.
## 5 2495  morning person. Brownies                 
## 6 6355  night owl.      Ice Cream                
## # … with 52 more rows
## #
## # Edge Data: 101 x 2
##    from    to
##   <int> <int>
## 1     1    12
## 2     1    25
## 3     1    33
## # … with 98 more rows


Let's use our function to compare

Measured network

GrFunction(gr)

## # A tibble: 1 x 4
##   Giant AMPMHomophily DessertHomophily AveDeg
##   <dbl>         <dbl>            <dbl>  <dbl>
## 1    30        -0.723           0.0468   3.37

Random network

GrFunction(ER_gr_attr)

## # A tibble: 1 x 4
##   Giant AMPMHomophily DessertHomophily AveDeg
##   <dbl>         <dbl>            <dbl>  <dbl>
## 1    58        -0.684         -0.00420   3.48


Let's now do this 100 times!

To do this, let's reduce the number of graph metrics we calculate.

SmGrFunction <- function(gr){
  Giant = max(components(gr)$csize) #size of largest component
  Recip = with_graph(gr, graph_reciprocity())
  Trans = transitivity(gr)
  Dist = with_graph(gr, graph_mean_dist())
  Dia = with_graph(gr, graph_diameter())
  df = tibble(Giant, Recip, Trans, Dist, Dia)
  return(df)
}

df <- tibble() #start empty, then add one row per simulated graph
for (i in 1:100) {
  test_gr = play_erdos_renyi(n = N_Nodes, m = N_Edges)
  df <- bind_rows(df, SmGrFunction(test_gr))
}
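The loop above grows a data frame one row at a time. An alternative, sketched below under some assumptions, is to wrap one simulation in a small function and collect the results in a single step. This version uses igraph and base R only, with illustrative sizes and a reduced set of metrics; `one_sim`, `sims`, and the sizes are hypothetical names and values, not part of the workshop code.

```r
library(igraph)

set.seed(522)

# Illustrative sizes, not the workshop network's counts
n_nodes <- 20
n_edges <- 40
n_sims  <- 100

# One simulated random graph -> one row of metrics
one_sim <- function() {
  g <- sample_gnm(n = n_nodes, m = n_edges, directed = TRUE)
  data.frame(
    Giant = max(components(g)$csize),        # largest component size
    Recip = reciprocity(g),                  # share of reciprocated edges
    Dist  = mean_distance(g, directed = TRUE) # mean shortest-path length
  )
}

# Run the simulation n_sims times and stack the rows once at the end
sims <- do.call(rbind, lapply(seq_len(n_sims), function(i) one_sim()))

nrow(sims)
head(sims)
```

Setting a seed before the simulations, as here, makes the 100 graphs reproducible from run to run.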


Let's explore our simulated networks

head(df)

## # A tibble: 6 x 5
##   Giant  Recip  Trans  Dist   Dia
##   <dbl>  <dbl>  <dbl> <dbl> <dbl>
## 1    56 0.0396 0.0561  5.14    14
## 2    58 0.0198 0.0464  6.43    23
## 3    55 0.0396 0.0636  5.04    12
## 4    60 0.0198 0.135   4.87    10
## 5    59 0.0594 0.0677  5.41    16
## 6    57 0      0.0752  4.69    14


Let's get a summary

And let's average these out.

df %>%
  summarise(across(everything(), mean))

## # A tibble: 1 x 5
##   Giant  Recip  Trans  Dist   Dia
##   <dbl>  <dbl>  <dbl> <dbl> <dbl>
## 1  58.0 0.0297 0.0545  5.16  13.7

And standard deviations.

df %>%
  summarise(across(everything(), sd))

## # A tibble: 1 x 5
##   Giant  Recip  Trans  Dist   Dia
##   <dbl>  <dbl>  <dbl> <dbl> <dbl>
## 1  1.46 0.0230 0.0239 0.602  2.49


Let's compare with measured

df %>%
  summarise(across(everything(), mean))

## # A tibble: 1 x 5
##   Giant  Recip  Trans  Dist   Dia
##   <dbl>  <dbl>  <dbl> <dbl> <dbl>
## 1  58.0 0.0297 0.0545  5.16  13.7

SmGrFunction(gr)

## # A tibble: 1 x 5
##   Giant Recip Trans  Dist   Dia
##   <dbl> <dbl> <dbl> <dbl> <dbl>
## 1    30 0.214 0.267  1.88     6


Let's get a confidence interval

If we want to judge whether our measured network differs from the simulated ones on one of these metrics, we can look at the percentiles of the simulated values.

Let's see if the measured average distance falls within the middle 95% of the simulated networks...

quantile(df$Dist, probs = c(0.025, 0.975))

##     2.5%    97.5% 
## 3.889960 6.243622

Since the measured average distance (1.88) is outside of this range, we can say that, on this metric, the measured network does not look like an Erdos-Renyi random network with the same number of nodes and edges!
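The same check can be written out as a small test, sketched below. Because the simulated `df` is not reproduced here, the sketch uses a stand-in: 100 normal draws with the mean (5.16) and standard deviation (0.602) reported for the simulated distances, which is an assumption about their shape. The names `sim_dist`, `bounds`, and `p_lower` are illustrative.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# Stand-in for df$Dist: draws matching the reported simulated mean and SD
sim_dist <- rnorm(100, mean = 5.16, sd = 0.602)

# Measured average distance from the workshop network
observed_dist <- 1.88

# Middle 95% of the simulated values
bounds <- quantile(sim_dist, probs = c(0.025, 0.975))

# Does the measured value fall outside that range?
outside <- observed_dist < bounds[[1]] || observed_dist > bounds[[2]]

# Empirical one-sided p-value: how often is a simulated graph at least this short?
# The +1 terms count the observed value itself, so the estimate is never exactly 0
p_lower <- (1 + sum(sim_dist <= observed_dist)) / (length(sim_dist) + 1)

outside
p_lower
```

With 100 simulations the smallest p-value this can report is 1/101, so more replicates are needed if you want finer resolution.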


Let's summarize

We measured one network.
To compare, we generated a random network.
We replicated this 100 times.
We checked whether metrics from the measured network fall within the range covered by the simulated networks.
We can say the measured network is different from random.


Let's reflect

We did it!

We can import data into R
We can manipulate these data
We can create a network
We can plot a network
We can calculate a number of centrality measures
We can use these in plotting networks
We can use these in testing hypotheses
We can calculate a number of whole graph metrics
We can use these in comparing networks


Let's note what we missed

We didn't do community detection.
We didn't really touch Base R.
We didn't deal with ERGMs.


Let's be thankful

Thanks to Dali Ma

Thanks to SSRC

Thanks to all of you

👏
