Quarter Review

PS312 Statistical Research Methods

Review of the Quarter

Substantive

  • Asking Statistical Questions
  • Causality
  • Confounders

Technical

  • Data Merging

  • Statistical Tests

  • Regressions

  • Diagnostics

Good Statistical Question

  • Specific

  • Isn’t purely normative

  • Can be made measurable

  • More than just a few relevant cases exist

Causal Relationship

Source: Heiss, Andrew. Program Evaluation. https://evalsp25.classes.andrewheiss.com/content/06-content.html

Distributions (I)

library(ggplot2)      # load ggplot2 for the plots below

set.seed(123)         # set seed for reproducibility

norm_d = rnorm(10000) # draw 10,000 observations from a standard normal

ggplot() +
  geom_histogram(aes(x = norm_d)) +
  labs(x = NULL,
       y = NULL) +
  theme_bw()

Distributions (II)

ggplot() +
  geom_boxplot(aes(x = norm_d)) +
  labs(x = NULL,
       y = NULL) +
  theme_bw()

Distribution Comparisons


set.seed(123)                               # set the seed for reproducibility

norm_x = rnorm(n = 10000, mean = 1, sd = 3) # sample X: mean 1, sd 3
norm_y = rnorm(n = 10000, mean = 3, sd = 2) # sample Y: mean 3, sd 2

t.test(norm_x, norm_y)                      # Welch two-sample t-test (R's default)

    Welch Two Sample t-test

data:  norm_x and norm_y
t = -55.188, df = 17450, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.059542 -1.918262
sample estimates:
mean of x mean of y 
0.9928849 2.9817871 

# overlaid histograms with a vertical line at each sample mean
histogram = ggplot() +
  geom_histogram(aes(x = norm_x, fill = "Distribution X"), alpha = 0.5) +
  geom_histogram(aes(x = norm_y, fill = "Distribution Y"), alpha = 0.5) +
  geom_vline(xintercept = mean(norm_x), color = "red") +
  geom_vline(xintercept = mean(norm_y), color = "blue") +
  labs(x = NULL,
       y = NULL,
       fill = NULL) +
  theme_bw()

# side-by-side boxplots of the same two samples
boxplot = ggplot() +
  geom_boxplot(aes(x = norm_x, y = "X", fill = "Distribution X"), alpha = 0.5) +
  geom_boxplot(aes(x = norm_y, y = "Y", fill = "Distribution Y"), alpha = 0.5) +
  labs(x = NULL,
       y = NULL,
       fill = NULL) +
  theme_bw()
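
The t statistic and degrees of freedom in the Welch output can be reproduced by hand, which may help connect the printed numbers to the sample means and variances. The following is a minimal sketch that assumes the same seed and sample definitions as the slide; the names vx, vy, t_manual, and df_manual are introduced here for illustration only.

```r
set.seed(123)
norm_x <- rnorm(n = 10000, mean = 1, sd = 3)
norm_y <- rnorm(n = 10000, mean = 3, sd = 2)

# estimated variance of each sample mean (s^2 / n)
vx <- var(norm_x) / length(norm_x)
vy <- var(norm_y) / length(norm_y)

# Welch t statistic: difference in means over its standard error
t_manual <- (mean(norm_x) - mean(norm_y)) / sqrt(vx + vy)

# Welch-Satterthwaite approximation for the degrees of freedom
df_manual <- (vx + vy)^2 /
  (vx^2 / (length(norm_x) - 1) + vy^2 / (length(norm_y) - 1))

# compare against R's built-in test (Welch is the default)
welch <- t.test(norm_x, norm_y)
all.equal(t_manual, unname(welch$statistic))
all.equal(df_manual, unname(welch$parameter))
```

Both all.equal() calls should return TRUE, since t.test() with its default var.equal = FALSE uses the same two formulas.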

Data Merging

df_x = data.frame(ID = c("1", "2", "3", "4"), 
                X = c(34, 22, 19, 85))

df_y = data.frame(ID = c("1", "2", "4", "4"), 
                Y = c("Blue", "Red", "Green", "Yellow"))

df_x:
ID  X
1   34
2   22
3   19
4   85

df_y:
ID  Y
1   Blue
2   Red
4   Green
4   Yellow

Data Merging

df_x %>% 
  left_join(df_y, by = "ID") 

ID  X   Y
1   34  Blue
2   22  Red
3   19  NA
4   85  Green
4   85  Yellow
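
The join result has five rows even though df_x has four, because ID 4 appears twice in df_y, while ID 3 has no match and receives NA. A minimal sketch of how these two situations could be checked before and after merging is given below; it assumes dplyr is installed, and the names dupes, unmatched, and merged are illustrative.

```r
library(dplyr)

df_x <- data.frame(ID = c("1", "2", "3", "4"),
                   X = c(34, 22, 19, 85))

df_y <- data.frame(ID = c("1", "2", "4", "4"),
                   Y = c("Blue", "Red", "Green", "Yellow"))

# keys that occur more than once in the right-hand table
dupes <- df_y %>%
  count(ID) %>%
  filter(n > 1)

# rows of df_x that have no partner in df_y
unmatched <- anti_join(df_x, df_y, by = "ID")

merged <- df_x %>%
  left_join(df_y, by = "ID")

# a left join never drops rows of df_x, but duplicated keys add rows
nrow(df_x)
nrow(merged)
```

Comparing nrow(df_x) with nrow(merged) is a quick diagnostic: any increase means at least one key in df_x matched several rows in df_y.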

Turning to the Script

Let’s discuss

Substantive

  • Asking Statistical Questions
  • Causality
  • Confounders

Technical

  • Data Merging

  • Statistical Tests

  • Regressions

  • Diagnostics