Or at least it is hard to screw up
There are four basic assumptions:
* Violates basic assumption of inferential statistics
Wasserman, S., & Faust, K. (1994). Social Network Analysis: Methods and Applications (Vol. 8). Cambridge University Press.
Jacob L. Moreno & Helen Hall Jennings (1932)
Quantitative Sociology/Anthropology
Seminal Articles
Two main camps
Statistical -> hypothesis testing
Graph theoretic -> network models and simulation
They often don't agree.
There is often disdain.
They have different language, journals, conferences
## 5 x 5 sparse Matrix of class "dgCMatrix"
##
## [1,] . 1 1 1 1
## [2,] 1 . . . .
## [3,] 1 . . . .
## [4,] 1 . . . 1
## [5,] 1 . . 1 .
## [[1]]
## + 4/5 edges from 184c722:
## [1] 1--2 1--3 1--4 1--5
##
## [[2]]
## + 1/5 edge from 184c722:
## [1] 1--2
##
## [[3]]
## + 1/5 edge from 184c722:
## [1] 1--3
##
## [[4]]
## + 2/5 edges from 184c722:
## [1] 1--4 4--5
##
## [[5]]
## + 2/5 edges from 184c722:
## [1] 1--5 4--5
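The two printouts above are the same small network stored two ways: as an adjacency matrix and as a per-node list of edges. Here is a minimal sketch of how output like this could be produced with igraph. The five-node graph is reconstructed from the printed edges, so treat it as an assumption rather than the original slide code.

```r
library(igraph)

# Assumed five-node undirected graph read off the printout:
# node 1 is tied to everyone, and nodes 4 and 5 are tied to each other.
g <- make_graph(c(1, 2,  1, 3,  1, 4,  1, 5,  4, 5), directed = FALSE)

# Adjacency matrix: a 1 in row i, column j means i and j are connected.
adj_mat <- as_adjacency_matrix(g)

# Edge list per node: for each node, the edges that touch it.
adj_list <- as_adj_edge_list(g)

adj_mat
adj_list
```

Both objects carry the same information. The matrix is handy for linear algebra, and the list is more compact when the network is sparse.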
Let's do this!
You'll only need to do this once.
To install the tidyverse package:
install.packages("tidyverse")
Repeat this with the following packages:
igraph, tidygraph, here, ggraph
You'll need to do this every time you restart R.
To load the tidyverse package (and the other packages we installed):
library(tidyverse) # tools for cleaning data
library(igraph)    # package for doing network analysis
library(tidygraph) # tools for doing tidy networks
library(here)      # tools for project-based workflow
library(ggraph)    # plotting tools for networks
Once you have done this, you will want to include a code chunk with all of your libraries in your markdown document so that you don't have to type this every time.
I've sent you a CSV file that includes the data for workshop 1. I hope you saved it in your folder titled "data".
If you have loaded the package "here" this should just work. If you have not loaded the "here" package you will need to set the working directory.
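A minimal sketch of what `here()` is doing: it builds a file path starting from the project root, so you don't need `setwd()`. The file name matches the one used in this workshop. The `file.exists()` check is an extra step added here for illustration.

```r
library(here)

# Build the path <project root>/data/AnonSurveyData.csv
csv_path <- here("data", "AnonSurveyData.csv")

# Check that the file is where we expect it before trying to read it
file.exists(csv_path)
```

If `file.exists()` returns `FALSE`, the file is probably not in the "data" folder, or R was not started from inside your project.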
Again, you will want to include this as a code chunk in your RMD file.
# This loads the csv and saves it as a dataframe titled WorkshopData
WorkshopData <- read_csv(here("data", "AnonSurveyData.csv"))
glimpse(WorkshopData)
## Rows: 34
## Columns: 18
## $ ID                      <dbl> 5106, 6633, 7599, 4425, 2495, 6355, 8810, 387…
## $ StartDate               <dttm> 2020-07-30 12:15:20, 2020-07-30 12:18:50, 20…
## $ EndDate                 <dttm> 2020-07-30 12:18:59, 2020-07-30 12:20:21, 20…
## $ Status                  <chr> "IP Address", "IP Address", "IP Address", "IP…
## $ Progress                <dbl> 100, 100, 100, 100, 100, 100, 100, 100, 100, …
## $ `Duration (in seconds)` <dbl> 219, 91, 137, 97, 127, 233, 97, 144, 103, 231…
## $ Finished                <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRU…
## $ RecordedDate            <dttm> 2020-07-30 12:18:59, 2020-07-30 12:20:22, 20…
## $ SurveyID                <chr> "R_ssTGHZwy5EpQaNX", "R_2fv9VCk0tjdrDOr", "R_…
## $ ExternalReference       <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ DistributionChannel     <chr> "email", "email", "email", "email", "email", …
## $ UserLanguage            <chr> "EN", "EN", "EN", "EN", "EN", "EN", "EN", "EN…
## $ Q2                      <chr> "morning person.", "morning person.", "night …
## $ Q3                      <chr> "Brownies", "I don't care for dessert.", "Ice…
## $ Q4                      <chr> "Coffee,Water,The tears of my enemies", "Coff…
## $ Q5                      <dbl> 350, 12, 300, 0, 264, 4, 289, 550, 349, 300, …
## $ Q6                      <dbl> 350, 56, 5000, 50, 286, 250, 76, 185, 108, 22…
## $ `as.character(Q5)`      <dbl> 350, 12, 300, 0, 264, 4, 289, 550, 349, 300, …
There is a ton of data there that doesn't make sense for us to keep around.
We will use the '%>%' (pipe) operator and the verb select.
WorkshopData %>% select(ID, Q2:Q6) -> WorkshopData
glimpse(WorkshopData)
## Rows: 34
## Columns: 6
## $ ID <dbl> 5106, 6633, 7599, 4425, 2495, 6355, 8810, 3877, 1554, 7743, 8353, …
## $ Q2 <chr> "morning person.", "morning person.", "night owl.", "morning perso…
## $ Q3 <chr> "Brownies", "I don't care for dessert.", "Ice Cream", "I don't car…
## $ Q4 <chr> "Coffee,Water,The tears of my enemies", "Coffee,Tea,Water,Milk", "…
## $ Q5 <dbl> 350, 12, 300, 0, 264, 4, 289, 550, 349, 300, 424, 426, 231, 290, 1…
## $ Q6 <dbl> 350, 56, 5000, 50, 286, 250, 76, 185, 108, 220, 500, 350, 412, 113…
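The `select()` step above keeps `ID` plus every column from `Q2` through `Q6`, then overwrites `WorkshopData` using the right-pointing arrow. Here is a small sketch of the same pattern on a made-up tibble (`toy` and its values are hypothetical, not the workshop data), written with the more common `<-` form.

```r
library(dplyr)

# Hypothetical toy survey standing in for WorkshopData
toy <- tibble(
  ID     = c(1, 2, 3),
  Status = c("IP Address", "IP Address", "IP Address"),
  Q2     = c("morning person.", "night owl.", "night owl."),
  Q3     = c("Brownies", "Ice Cream", "Cheese")
)

# Keep ID and the range of columns Q2 through Q3; Status is dropped
toy_small <- toy %>% select(ID, Q2:Q3)

names(toy_small)
```

Saving to a new name (`toy_small`) instead of overwriting keeps the original around in case you change your mind about which columns you need.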
Note that not all of the data are numbers: Q2, Q3, and Q4 are character columns.
WorkshopData %>% head()
## # A tibble: 6 x 6
##      ID Q2            Q3                  Q4                            Q5    Q6
##   <dbl> <chr>         <chr>               <chr>                      <dbl> <dbl>
## 1  5106 morning pers… Brownies            Coffee,Water,The tears of…   350   350
## 2  6633 morning pers… I don't care for d… Coffee,Tea,Water,Milk         12    56
## 3  7599 night owl.    Ice Cream           Fruit Juice,Tea,Water,Fiz…   300  5000
## 4  4425 morning pers… I don't care for d… Fruit Juice,Coffee,Water       0    50
## 5  2495 morning pers… Brownies            Coffee,Water                 264   286
## 6  6355 night owl.    Ice Cream           Fruit Juice,Tea,Water          4   250
I am going to blast through these next slides to show you some of the things that you might want to do with R.
Here is code to count the responses to the question about being a morning or night person.
WorkshopData %>% select(Q2) %>% group_by(Q2) %>% tally()
## # A tibble: 2 x 2
##   Q2                  n
##   <chr>           <int>
## 1 morning person.    19
## 2 night owl.         15
Here is code to count the responses for the favorite dessert type.
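As a side note, dplyr's `count()` does the `group_by()` plus `tally()` combination in one step. This is a sketch on a made-up tibble (`toy` is hypothetical), showing that the two approaches give the same table.

```r
library(dplyr)

# Hypothetical responses standing in for WorkshopData$Q2
toy <- tibble(Q2 = c("morning person.", "night owl.", "night owl."))

# The pattern used on the slide
by_tally <- toy %>% group_by(Q2) %>% tally()

# The one-step shortcut
by_count <- toy %>% count(Q2)

by_count
```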
WorkshopData %>% select(Q3) %>% group_by(Q3) %>% tally()
## # A tibble: 5 x 2
##   Q3                                      n
##   <chr>                               <int>
## 1 Brown Butter Chocolate Chip Cookies     2
## 2 Brownies                                9
## 3 Cheese                                  3
## 4 I don't care for dessert.               3
## 5 Ice Cream                              17
WorkshopData %>% select(Q3) %>% ggplot(aes(y = Q3)) + geom_bar()
WorkshopData %>% select(ID, Q4) %>% head()
## # A tibble: 6 x 2
##      ID Q4
##   <dbl> <chr>
## 1  5106 Coffee,Water,The tears of my enemies
## 2  6633 Coffee,Tea,Water,Milk
## 3  7599 Fruit Juice,Tea,Water,Fizzy Water
## 4  4425 Fruit Juice,Coffee,Water
## 5  2495 Coffee,Water
## 6  6355 Fruit Juice,Tea,Water
Notice that these are not tidy data: there is more than one value in each Q4 cell.
WorkshopData %>% select(ID, Q4) %>% separate_rows(Q4, sep = ",") %>% head(10)
## # A tibble: 10 x 2
##       ID Q4
##    <dbl> <chr>
##  1  5106 Coffee
##  2  5106 Water
##  3  5106 The tears of my enemies
##  4  6633 Coffee
##  5  6633 Tea
##  6  6633 Water
##  7  6633 Milk
##  8  7599 Fruit Juice
##  9  7599 Tea
## 10  7599 Water
WorkshopData %>% select(ID, Q4) %>% separate_rows(Q4, sep = ",") %>% mutate(Checked = 1) %>% pivot_wider(names_from = Q4, values_from = Checked, values_fill = 0)
## # A tibble: 34 x 11
##       ID Coffee Water `The tears of m…   Tea  Milk `Fruit Juice` `Fizzy Water`
##    <dbl>  <dbl> <dbl>            <dbl> <dbl> <dbl>         <dbl>         <dbl>
##  1  5106      1     1                1     0     0             0             0
##  2  6633      1     1                0     1     1             0             0
##  3  7599      0     1                0     1     0             1             1
##  4  4425      1     1                0     0     0             1             0
##  5  2495      1     1                0     0     0             0             0
##  6  6355      0     1                0     1     0             1             0
##  7  8810      1     0                1     0     0             0             1
##  8  3877      0     1                1     1     1             0             1
##  9  1554      0     1                1     0     0             1             0
## 10  7743      0     1                0     1     1             1             1
## # … with 24 more rows, and 3 more variables: `A delicious 12 year single malt
## #   scotch from the Scottish lowlands with notes of apple` <dbl>, `
## #   cinnamon` <dbl>, ` and dried fruit served with a single ice cube` <dbl>
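Look at the last three variables in that output: one answer option contained commas, so `separate_rows()` split it into three pieces. Here is a small sketch of the same pipeline on a made-up tibble (`toy` and its drink names are hypothetical) that reproduces the effect.

```r
library(dplyr)
library(tidyr)

# Hypothetical check-all-that-apply answers; the second person's last
# option has commas inside it
toy <- tibble(
  ID = c(1, 2),
  Q4 = c("Coffee,Water", "Water,Scotch, neat, no ice")
)

wide <- toy %>%
  separate_rows(Q4, sep = ",") %>%      # one row per comma-separated piece
  mutate(Checked = 1) %>%               # mark each piece as selected
  pivot_wider(names_from = Q4,          # one column per distinct piece
              values_from = Checked,
              values_fill = 0)          # unselected options become 0

names(wide)
```

The single "Scotch, neat, no ice" option ends up as three separate columns, two of them with a leading space. If you control the survey, avoiding commas inside answer options sidesteps this.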
WorkshopData %>% select(Q5) %>% summarize(Ave = mean(Q5, na.rm = TRUE), SD = sd(Q5, na.rm = TRUE))
## # A tibble: 1 x 2
##     Ave    SD
##   <dbl> <dbl>
## 1  327.  247.
WorkshopData %>% select(Q2, Q5) %>% group_by(Q2) %>% summarize(Ave = mean(Q5), SD = sd(Q5))
## # A tibble: 2 x 3
##   Q2                Ave    SD
##   <chr>           <dbl> <dbl>
## 1 morning person.  321   215.
## 2 night owl.       335.  289.
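The overall summary above used `na.rm = TRUE` and the grouped one did not. That works here because the grouped output shows no missing values, but here is a sketch on a made-up tibble (`toy` is hypothetical) of what happens when a group does contain an `NA`.

```r
library(dplyr)

# Hypothetical data with one missing value in group "b"
toy <- tibble(
  Q2 = c("a", "a", "b", "b"),
  Q5 = c(10, 20, 30, NA)
)

# Without na.rm, any group containing an NA gets an NA mean
without_narm <- toy %>%
  group_by(Q2) %>%
  summarize(Ave = mean(Q5))

# With na.rm = TRUE, missing values are dropped before averaging
with_narm <- toy %>%
  group_by(Q2) %>%
  summarize(Ave = mean(Q5, na.rm = TRUE))

without_narm
with_narm
```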
WorkshopData %>% select(Q2, Q5) %>% ggplot(., aes(x = Q2, y = Q5)) + geom_boxplot()
WorkshopData %>% select(Q5:Q6) %>% mutate(Q5 = as.numeric(Q5), Q6 = as.numeric(Q6)) %>% ggplot(aes(x = Q5, y = Q6)) + geom_point()
summary(lm(Q6 ~ Q5, data = WorkshopData))
##
## Call:
## lm(formula = Q6 ~ Q5, data = WorkshopData)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
##  -365.6  -313.8  -179.1   -98.8  4563.5
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 382.9414   249.7136   1.534    0.135
## Q5            0.1785     0.6126   0.291    0.773
##
## Residual standard error: 867.6 on 32 degrees of freedom
## Multiple R-squared:  0.002647, Adjusted R-squared:  -0.02852
## F-statistic: 0.08492 on 1 and 32 DF,  p-value: 0.7726
There is an issue here.
I wanted to make these data anonymous (so we don't know who likes scotch).
But to do that I had to make the edgelist for you.
So I sent you an edgelist as a csv. Sorry.
EL = read_csv(here("data", "AnonEL.csv"))
head(EL)
## # A tibble: 6 x 2
##      ID Connections
##   <dbl>       <dbl>
## 1  5106        5106
## 2  6633        6196
## 3  7599        5462
## 4  4425        7743
## 5  2495        3940
## 6  6355        6355
Before we can convert our edgelist to a network, we should add in the attributes.
We have several candidate attributes:
We will develop a separate dataframe for the attributes.
WorkshopData %>% select(ID, Q2, Q3, Q5) -> AttributeDf
Experience tells me that when you try to add attributes, you often make a mistake where the number of attributes doesn't match the number of nodes... but let's see.
gr <- graph_from_data_frame(EL, directed = TRUE)
plot(gr)
gr = as_tbl_graph(gr)
So actually the easiest way to add the attributes is to add them while you make the graph.
But that isn't as easy as it seems
gr %>% activate(nodes) %>% mutate(AMPM = AttributeDf$Q2)
The error was: "Input AMPM must be size 60 or 1, not 34."
What this means is we need to take our attributes dataframe and make sure all the nodes are listed.
To do this we need to:
# This will get a vector of all nodes
gr %>%
  activate(nodes) %>%
  as_tibble() %>%
  transmute(ID = name) %>%
  mutate(ID = as.numeric(ID)) -> GrNodes
# Now we pull in the attributes using a left_join
NodeAttributes = left_join(GrNodes, AttributeDf, by = "ID")
You should inspect NodeAttributes.
head(NodeAttributes)
## # A tibble: 6 x 4
##      ID Q2              Q3                           Q5
##   <dbl> <chr>           <chr>                     <dbl>
## 1  5106 morning person. Brownies                    350
## 2  6633 morning person. I don't care for dessert.    12
## 3  7599 night owl.      Ice Cream                   300
## 4  4425 morning person. I don't care for dessert.     0
## 5  2495 morning person. Brownies                    264
## 6  6355 night owl.      Ice Cream                     4
tail(NodeAttributes)
## # A tibble: 6 x 4
##      ID Q2    Q3       Q5
##   <dbl> <chr> <chr> <dbl>
## 1  7128 <NA>  <NA>     NA
## 2  1050 <NA>  <NA>     NA
## 3  3799 <NA>  <NA>     NA
## 4  1651 <NA>  <NA>     NA
## 5  8984 <NA>  <NA>     NA
## 6  1958 <NA>  <NA>     NA
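The `head()` and `tail()` output shows what the `left_join()` did: every node keeps its row, and nodes that never answered the survey get `NA` for their attributes. Here is a minimal sketch of that behavior on made-up tibbles (`nodes` and `attrs` are hypothetical stand-ins for `GrNodes` and `AttributeDf`).

```r
library(dplyr)

# Hypothetical: four nodes in the network, but only two filled in the survey
nodes <- tibble(ID = c(1, 2, 3, 4))
attrs <- tibble(
  ID = c(1, 2),
  Q2 = c("morning person.", "night owl.")
)

# left_join keeps every row of nodes, in the same order,
# and fills in NA where attrs has no match
joined <- left_join(nodes, attrs, by = "ID")

joined
```

Because the result has one row per node, in node order, its columns are the right length to hand to `mutate()` on the graph.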
gr %>%
  as_tbl_graph() %>%
  activate(nodes) %>%
  mutate(AMPM = NodeAttributes$Q2) %>%
  mutate(Dessert = NodeAttributes$Q3) %>%
  mutate(Pages = NodeAttributes$Q5) -> gr
summary(gr)
## IGRAPH 1723efe DN-- 60 101 --
## + attr: name (v/c), AMPM (v/c), Dessert (v/c), Pages (v/n)
## # A tbl_graph: 60 nodes and 101 edges
## #
## # A directed multigraph with 8 components
## #
## # Node Data: 60 x 4 (active)
##   name  AMPM            Dessert                   Pages
##   <chr> <chr>           <chr>                     <dbl>
## 1 5106  morning person. Brownies                    350
## 2 6633  morning person. I don't care for dessert.    12
## 3 7599  night owl.      Ice Cream                   300
## 4 4425  morning person. I don't care for dessert.     0
## 5 2495  morning person. Brownies                    264
## 6 6355  night owl.      Ice Cream                     4
## # … with 54 more rows
## #
## # Edge Data: 101 x 2
##    from    to
##   <int> <int>
## 1     1     1
## 2     2    35
## 3     3    14
## # … with 98 more rows
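Earlier I said the easiest way to add attributes is to add them while you make the graph. Here is a sketch of that route using the `vertices` argument of igraph's `graph_from_data_frame()`. The data (`el`, `node_attrs`) are made up for illustration. The catch is the same as before: the vertices table needs one row for every node that appears in the edgelist.

```r
library(igraph)

# Hypothetical edgelist with four nodes
el <- data.frame(
  from = c("a", "b", "c"),
  to   = c("b", "c", "d"),
  stringsAsFactors = FALSE
)

# Hypothetical attribute table: the first column is the node name,
# and every node in the edgelist must be listed (NA is fine for unknowns)
node_attrs <- data.frame(
  name = c("a", "b", "c", "d"),
  AMPM = c("morning person.", "night owl.", NA, NA),
  stringsAsFactors = FALSE
)

# Build the graph and attach the attributes in one step
g <- graph_from_data_frame(el, directed = TRUE, vertices = node_attrs)

vertex_attr_names(g)
```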
```r
ggraph(gr) + geom_edge_link() + geom_node_point()
```
Not super pretty
```r
ggraph(gr, layout = 'circle') + geom_edge_link() + geom_node_point()
```
```r
ggraph(gr, layout = 'circle') + geom_edge_link() + geom_node_point(aes(shape = AMPM))
```
```r
ggraph(gr, layout = 'circle') + geom_edge_link() + geom_node_point(aes(color = Dessert))
```
```r
ggraph(gr, layout = 'circle') + geom_edge_link() + geom_node_point(aes(size = Pages))
```
```r
ggraph(gr, layout = 'circle') + geom_edge_link() + geom_node_point(aes(shape = AMPM, color = Dessert, size = Pages))
```