This should be a bit more reasonably paced.
First of all, we will not need to build the network all over again.
But we should start with some project-oriented workflow.
The blog post by Jenny Bryan linked above describes the basic logic for a project-oriented workflow.
Each project gets:
So we will not need a new project folder, we will continue to use RForSNA, which you should have set up in WS1.
This time we will add a new .Rmd file and give it a name like "node_properties.Rmd".
So navigate to your RForSNA folder and create a new .Rmd file.
Then, add a code chunk to load the libraries necessary and run this chunk.
library(tidyverse) # tools for cleaning data
library(igraph)    # package for doing network analysis
library(tidygraph) # tools for doing tidy networks
library(here)      # tools for project-based workflow
library(ggraph)    # plotting tools for networks
This workshop assumes you did the work from week 1 and know how to install libraries, read csv files into R, and manipulate data in R.
The benefit of having done the work previously is that we don't have to re-do it (unless we want to change something).
So I took the network at the end of workshop #1 and saved it as a .rds file. I could have saved it as a pair of csv files and rebuilt the network, but if I save as rds it will be ready to go!
To load an rds file the syntax is a little different:
gr <- readRDS(here("data", "WokshopNetwork.rds"))
gr
## # A tbl_graph: 60 nodes and 101 edges
## #
## # A directed multigraph with 8 components
## #
## # Node Data: 60 x 4 (active)
##   name  AMPM            Dessert                   Pages
##   <chr> <chr>           <chr>                     <dbl>
## 1 5106  morning person. Brownies                    350
## 2 6633  morning person. I don't care for dessert.    12
## 3 7599  night owl.      Ice Cream                   300
## 4 4425  morning person. I don't care for dessert.     0
## 5 2495  morning person. Brownies                    264
## 6 6355  night owl.      Ice Cream                     4
## # … with 54 more rows
## #
## # Edge Data: 101 x 2
##    from    to
##   <int> <int>
## 1     1     1
## 2     2    35
## 3     3    14
## # … with 98 more rows
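The slides do not show the step that wrote the .rds file, so here is a minimal sketch of the save-and-reload round trip. The three-node network and the temporary file path are made up for illustration; in the project, the path would be built with here() instead.

```r
library(tidygraph)

# Hypothetical three-node network standing in for the workshop graph.
toy <- tbl_graph(
  nodes = data.frame(name = c("a", "b", "c")),
  edges = data.frame(from = c(1L, 2L), to = c(2L, 3L)),
  directed = TRUE
)

# Write to a temporary file so the sketch runs anywhere.
rds_path <- tempfile(fileext = ".rds")
saveRDS(toy, rds_path)

# readRDS() returns the object, so it has to be assigned to a name.
toy_again <- readRDS(rds_path)
toy_again
```

Because the whole tbl_graph object is stored, node attributes and edges come back together, with no rebuilding from csv files.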
```r
gr %>% 
  ggraph(layout = "kk") + 
  geom_edge_link() + 
  geom_node_point(aes(color = Dessert)) + 
  theme(legend.position = "bottom")
```
Let's calculate degree.
gr %>% 
  activate(nodes) %>% 
  mutate(deg = centrality_degree(mode = "total")) %>% 
  arrange(desc(deg)) -> gr
gr
## # A tbl_graph: 60 nodes and 101 edges
## #
## # A directed multigraph with 8 components
## #
## # Node Data: 60 x 5 (active)
##   name  AMPM            Dessert                   Pages   deg
##   <chr> <chr>           <chr>                     <dbl> <dbl>
## 1 7743  morning person. Ice Cream                   300    14
## 2 2921  morning person. Ice Cream                   426    12
## 3 1386  night owl.      Ice Cream                   500    11
## 4 7040  morning person. I don't care for dessert.   400     9
## 5 9929  morning person. Brownies                    527     8
## 6 5051  night owl.      Ice Cream                   240     8
## # … with 54 more rows
## #
## # Edge Data: 101 x 2
##    from    to
##   <int> <int>
## 1    28    28
## 2    43    32
## 3    29    10
## # … with 98 more rows
```r
gr %>% 
  ggraph(layout = "kk") + 
  geom_edge_link() + 
  geom_node_point(aes(size = deg)) + 
  theme(legend.position = "bottom")
```
gr %>% 
  activate(nodes) %>% 
  mutate(InDeg = centrality_degree(mode = "in")) %>% 
  arrange(desc(InDeg)) -> gr
gr
## # A tbl_graph: 60 nodes and 101 edges
## #
## # A directed multigraph with 8 components
## #
## # Node Data: 60 x 6 (active)
##   name  AMPM            Dessert   Pages   deg InDeg
##   <chr> <chr>           <chr>     <dbl> <dbl> <dbl>
## 1 5844  <NA>            <NA>         NA     8     8
## 2 7743  morning person. Ice Cream   300    14     5
## 3 1386  night owl.      Ice Cream   500    11     4
## 4 2169  <NA>            <NA>         NA     4     4
## 5 4755  <NA>            <NA>         NA     4     4
## 6 9929  morning person. Brownies    527     8     3
## # … with 54 more rows
## #
## # Edge Data: 101 x 2
##    from    to
##   <int> <int>
## 1    38    38
## 2    51    20
## 3    39    15
## # … with 98 more rows
```r
gr %>% 
  ggraph(layout = "kk") + 
  geom_edge_link() + 
  geom_node_point(aes(size = InDeg)) + 
  theme(legend.position = "bottom")
```
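To make the difference between mode = "total" and mode = "in" concrete, here is a minimal sketch on a made-up four-node directed network. The node names and edges are hypothetical, not taken from the workshop data, and OutDeg is added only for comparison.

```r
library(dplyr)
library(tidygraph)

# Hypothetical directed network: a -> b, a -> c, b -> c, d -> c
toy <- tbl_graph(
  nodes = data.frame(name = c("a", "b", "c", "d")),
  edges = data.frame(from = c(1L, 1L, 2L, 4L), to = c(2L, 3L, 3L, 3L)),
  directed = TRUE
)

toy_deg <- toy %>%
  activate(nodes) %>%
  mutate(
    deg    = centrality_degree(mode = "total"), # ties in either direction
    InDeg  = centrality_degree(mode = "in"),    # ties pointing at the node
    OutDeg = centrality_degree(mode = "out")    # ties leaving the node
  ) %>%
  as_tibble()

toy_deg
```

Node c receives three ties and sends none, so it ranks first on InDeg, while node a sends two ties and receives none. For every node, deg is the sum of InDeg and OutDeg.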
```r
GrLayout = create_layout(gr, layout = "kk")
ggraph(GrLayout) + 
  geom_edge_link() + 
  geom_node_point(aes(size = InDeg)) + 
  theme(legend.position = "bottom")
```
ggraph(GrLayout) + geom_edge_link() + geom_node_point(aes(size = InDeg)) + theme(legend.position="bottom")
ggraph(GrLayout) + geom_edge_link() + geom_node_point(aes(size = deg)) + theme(legend.position="bottom")
We should establish a research question. How about:
To test this we might use:
Either way, we need to choose a centrality metric.
Our winner...centrality_eigen!
Eigenvector centrality is interesting because it allows a node to inherit centrality from its neighbors. It works with weighted networks, but by default (directed = FALSE) centrality_eigen() treats directed networks as undirected.
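As a rough illustration of what "inheriting centrality from neighbors" means, the sketch below computes eigenvector centrality by hand in base R as the leading eigenvector of the adjacency matrix, scaled so the largest value is 1. The five-node undirected network is made up for the example.

```r
# Hypothetical undirected network: node 1 is a hub tied to 2, 3 and 4,
# and node 5 hangs off node 4.
A <- matrix(0, nrow = 5, ncol = 5)
edges <- rbind(c(1, 2), c(1, 3), c(1, 4), c(4, 5))
A[edges] <- 1          # fill one direction
A[edges[, 2:1]] <- 1   # mirror it so the matrix is symmetric

# The centrality scores are the eigenvector that belongs to the
# largest eigenvalue of the adjacency matrix.
ev <- eigen(A, symmetric = TRUE)
lead <- abs(ev$vectors[, 1])
cent <- lead / max(lead)   # scale so the most central node scores 1

round(cent, 3)
```

Nodes 2 and 5 both have a single tie, yet node 2 scores higher because its one neighbor is the hub. Degree cannot tell those two nodes apart; eigenvector centrality can.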
We want to calculate eigenvector centrality, then add it as an attribute to our graph, and add it to our layout dataframe.
gr %>% 
  activate(nodes) %>% 
  mutate(CentE = centrality_eigen()) %>% 
  arrange(desc(CentE)) -> gr
# This makes a dataframe of names and CentE
gr %>% 
  activate(nodes) %>% 
  select(name, CentE) %>% 
  as_tibble() -> CentEdf
# This adds it to the GrLayout dataframe.
GrLayout = left_join(GrLayout, CentEdf, by = "name")
```r
ggraph(GrLayout) + 
  geom_edge_link() + 
  geom_node_point(aes(size = CentE)) + 
  theme(legend.position = "bottom")
```
```r
gr %>% 
  select(AMPM, CentE) %>% 
  as_tibble() %>% 
  ggplot(., aes(x = CentE)) + 
  geom_histogram()
```
"Nodes and interactions are interdependent*"
and
"* Violates basic assumption of inferential statistics"
Well, this is the problem: because network metrics are interdependent, they are typically not normally distributed. That means we have to take some additional precautions about how we use them in hypothesis testing. Which means... Bootstrapping!
I said it here: https://ericbrewe.com/slides/rforsna/rforsna_ws1.html#8
Bootstrapping is a resampling method, where you draw samples based on your existing dataset, calculate your measure of interest, store this, and then rinse and repeat.
R has a package for that.
```r
install.packages("boot")
library(boot)
```
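Before handing the work to a package, it may help to see the "draw, calculate, store, repeat" loop written out by hand. This is a minimal base R sketch on simulated skewed data. The data, the number of replicates, and the choice of the mean as the measure of interest are all assumptions made for illustration.

```r
set.seed(1)

# Hypothetical skewed data standing in for a network metric.
x <- rexp(60)

n_rep <- 1000
boot_means <- numeric(n_rep)

for (i in seq_len(n_rep)) {
  # Draw a sample of the same size, with replacement, from the data we have.
  idx <- sample(seq_along(x), replace = TRUE)
  # Calculate the measure of interest and store it.
  boot_means[i] <- mean(x[idx])
}

# The spread of the stored values estimates the uncertainty of the mean.
boot_se <- sd(boot_means)
boot_ci <- quantile(boot_means, c(0.025, 0.975))

boot_se
boot_ci
```

Nothing here assumes the data are normally distributed; the interval comes straight from the resampled values.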
So we will run a bootstrapped linear model to look at this relationship.
First, get our data and assign it to its own dataframe (this is optional, but convenient).
GrLayout %>% select(AMPM,CentE) -> LmData
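Next, we need to define the linear model as a function. Why? I'll explain on the next slide.
bs <- function(formula, data, indices) {
  d <- data[indices, ]           # keep only the rows drawn for this resample
  fit <- lm(formula, data = d)   # refit the model on the resampled rows
  return(coef(fit))              # return the coefficients for boot() to store
}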
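One way to see why the model is wrapped in a function: boot() calls the statistic over and over, each time passing the data plus a vector of resampled row indices. The sketch below mimics two such calls by hand. The data frame is simulated and only borrows the column names AMPM and CentE.

```r
bs <- function(formula, data, indices) {
  d <- data[indices, ]
  fit <- lm(formula, data = d)
  return(coef(fit))
}

set.seed(42)

# Hypothetical data with the same column names as LmData.
toy <- data.frame(
  AMPM  = rep(c("morning person.", "night owl."), each = 30),
  CentE = runif(60)
)

# Indices 1..n select every row once, which reproduces the ordinary fit.
full_fit <- bs(CentE ~ AMPM, toy, seq_len(nrow(toy)))

# Indices drawn with replacement give one bootstrap replicate.
one_replicate <- bs(CentE ~ AMPM, toy, sample(nrow(toy), replace = TRUE))

full_fit
one_replicate
```

Each replicate returns an intercept and a coefficient for AMPM. The spread of those values across many replicates is what gets summarized as the bootstrap standard error.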
Since we want to run this once and keep our results around, we should do two things: set a seed so the resampling is reproducible, and assign the output of boot() to an object.
set.seed(327)
results <- boot(data = LmData, statistic = bs, R = 50, formula = CentE ~ AMPM)
lm(CentE ~ AMPM, data = LmData)
## 
## Call:
## lm(formula = CentE ~ AMPM, data = LmData)
## 
## Coefficients:
##    (Intercept)  AMPMnight owl.  
##       0.121912       -0.003964
results
## 
## ORDINARY NONPARAMETRIC BOOTSTRAP
## 
## 
## Call:
## boot(data = LmData, statistic = bs, R = 50, formula = CentE ~ 
##     AMPM)
## 
## 
## Bootstrap Statistics :
##         original       bias    std. error
## t1*  0.121911858  0.009475059  0.05578382
## t2* -0.003963586 -0.019188955  0.07939136
boot.ci(results, type = "basic")
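## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 50 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = results, type = "basic")
## 
## Intervals : 
## Level      Basic         
## 95%   (-0.0285,  0.2213 )  
## Calculations and Intervals on Original Scale
## Some basic intervals may be unstable
The 95% basic confidence interval reported here is (-0.0285, 0.2213). Note that boot.ci() was called without an index argument, so this interval is for the first statistic, the intercept (t1); the interval for the AMPM coefficient (t2) needs index = 2. Either way, the bootstrap standard error for t2 (0.079) dwarfs its estimate (-0.004).
Probably not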
😞
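The boot.ci() call in the slides does not pass an index, so it reports an interval for the first element of the statistic, the intercept. Here is a minimal sketch of asking for each coefficient explicitly. It runs on simulated data that only borrows the column names AMPM and CentE, and it uses R = 999 replicates as an assumption, since the output above warns that intervals based on 50 replicates may be unstable.

```r
library(boot)

bs <- function(formula, data, indices) {
  d <- data[indices, ]
  fit <- lm(formula, data = d)
  return(coef(fit))
}

set.seed(327)

# Hypothetical data with the same column names as LmData.
toy <- data.frame(
  AMPM  = rep(c("morning person.", "night owl."), each = 30),
  CentE = runif(60)
)

toy_results <- boot(data = toy, statistic = bs, R = 999,
                    formula = CentE ~ AMPM)

# index picks which element of coef(fit) the interval is for:
# 1 is the intercept, 2 is the AMPM coefficient.
ci_intercept <- boot.ci(toy_results, type = "basic", index = 1)
ci_ampm      <- boot.ci(toy_results, type = "basic", index = 2)

ci_ampm$basic[4:5]   # lower and upper bound for the AMPM coefficient
```

If the interval for the AMPM coefficient includes zero, the data give no clear evidence of a difference between the two groups.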