library(tidyverse)
library(marginaleffects)
library(GGally)
tjet = read.csv("data/tjet.csv")
Publishing and Quarter Review
Week 9
Before we start
No Discussion Section or Office Hours Next Week (but available on demand)
Any questions?
Agenda
Quick Review
Example of Quarto and Overleaf for Publishing
Quarto for Publishing
Quarto and Overleaf are both excellent tools for publishing. For this class, it’s better to stick with Quarto – the scope of your projects makes it a suitable choice. However, if your papers become longer or you plan to collaborate with others, Overleaf is the better option.
Here is how to get started.
Overleaf. Click New project -> Example project to get started.
Helpful tips:
Google Scholar has BibTeX citations!
First of all, load the usual libraries and the Transitional Justice dataset from the previous week.
Let’s explain under what circumstances countries adopt reparations policies. Could the involvement of international organizations influence a state’s decision to implement such policies?
rep_binary: presence of reparations policy
ICC_investigation: presence of ICC investigation
uninv: ongoing UN investigations, count
trials_domestic: count of domestic trials
First, let’s see if the probability of reparations (rep_binary) increases when the ICC intervenes (ICC_investigation). Set up a glm() logit model.
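As a reminder of the syntax, here is a minimal sketch of a logit model fitted with glm() on simulated data. The data frame `toy` and its columns `treated` and `outcome` are made up for illustration and are not part of the TJET data.

```r
# Simulated data: `toy`, `treated`, and `outcome` are hypothetical names
set.seed(1)
n <- 200
toy <- data.frame(treated = rbinom(n, 1, 0.3))
toy$outcome <- rbinom(n, 1, plogis(-1 + 1.2 * toy$treated))

# Logit model: family = binomial uses the logit link by default
toy_model <- glm(outcome ~ treated, family = binomial, data = toy)
summary(toy_model)

# Predicted probabilities for treated = 0 and treated = 1
toy_probs <- predict(toy_model,
                     newdata = data.frame(treated = c(0, 1)),
                     type = "response")
toy_probs
```

With a single binary predictor, the predicted probabilities equal the share of 1s in each group, which makes the output easy to check by hand.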
model_icc = ...
Present the summary of the model.
summary(model_icc)
Now, let’s draw the graph using plot_predictions() from the marginaleffects library. What is worrisome?
...(model_icc,
    ... = c("ICC_investigation"))
Now, let’s add a couple of variables. Add uninv and trials_domestic.
model_full = glm(rep_binary ~ ICC_investigation + ..., family = binomial, data = tjet)
Present plot_predictions() for ICC_investigation only.
...
Finally, present the modelsummary() table for both models. Compare.
library(modelsummary)  # provides modelsummary(); not among the libraries loaded earlier
modelsummary(list("ICC Model" = model_icc,
                  "Full Model" = model_full),
             stars = c("*" = 0.05),
             gof_omit = "AIC|BIC|Log.Lik|F|RMSE",
             statistic = "p.value")
Don’t forget to clean the environment.
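One way to clean the environment is with rm(). The sketch below creates placeholder objects with the same names as the ones used in this exercise (the placeholders themselves are made up) and then removes them.

```r
# Placeholder objects standing in for the data and models from this exercise
tjet <- data.frame(x = 1:3)
model_icc <- "placeholder"
model_full <- "placeholder"

# Remove only the objects that belong to this exercise
rm(tjet, model_icc, model_full)

# To remove everything in the global environment instead, run:
# rm(list = ls())
```

Removing objects by name is the more cautious choice when you still need other objects in the same session.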
Exploring Data
Today we are working with the World Happiness Report data, the same dataset we used in week 5.
Country_name is the name of the country
Ladder_score is the happiness score
Continent
Load the data.
whr = read.csv("data/WHR.csv")
Explore the following variables with the ggpairs() function from GGally. What can you see on the graph?
library(GGally)
ggpairs(whr,
        columns = c("Ladder_score",
                    "Logged_GDP_per_capita",
                    "Continent"))
Let’s zoom in on the happiness score by continent. Is Africa different from Europe in terms of happiness?
ggplot(whr) +
  geom_boxplot(aes(x = Ladder_score, y = Continent)) +
  labs(x = "Happiness") +
  theme_bw()
Zoom in even more. Use filter() to keep only the continents we are analyzing, and select() to keep only the variables we want to work with: Country_name, Ladder_score, Continent, and Logged_GDP_per_capita.
ae_whr = whr %>%
  filter(Continent %in% c("Africa", "Europe")) %>%
  select(Country_name, Ladder_score, Continent, Logged_GDP_per_capita)
Plot again. Make the fill color more transparent with alpha = 0.5.
ggplot(ae_whr) +
  geom_boxplot(aes(x = Ladder_score, y = Continent, fill = Continent), alpha = 0.5) +
  labs(x = "Happiness",
       y = NULL) +
  theme_light()
Formal Tests
Finally, let’s test it. Interpret the results.
t.test(ae_whr$Ladder_score ~ ae_whr$Continent)
Welch Two Sample t-test
data: ae_whr$Ladder_score by ae_whr$Continent
t = -12.395, df = 73.907, p-value < 2.2e-16
alternative hypothesis: true difference in means between group Africa and group Europe is not equal to 0
95 percent confidence interval:
-2.277125 -1.646372
sample estimates:
mean in group Africa mean in group Europe
4.447886 6.409634
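To see where the confidence interval comes from, the sketch below rebuilds it from the numbers printed in the output above. The variable names are made up for illustration, and because the t statistic is rounded in the printout, the result matches the reported interval only up to rounding.

```r
# Numbers copied from the t-test output above
mean_africa <- 4.447886
mean_europe <- 6.409634
t_stat <- -12.395
df_welch <- 73.907

# Difference in means (Africa minus Europe, the order t.test() uses)
diff_means <- mean_africa - mean_europe

# The t statistic is the difference divided by its standard error,
# so the standard error can be recovered from the two
se_diff <- diff_means / t_stat

# 95% confidence interval: difference +/- critical value * standard error
ci <- diff_means + c(-1, 1) * qt(0.975, df_welch) * se_diff
round(ci, 3)
```

The interval lies entirely below zero, which is another way of saying that the average happiness score in Africa is lower than in Europe at the 5% significance level.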
We can confirm it with a simple model. Estimate the happiness score by continent, and plot the confidence intervals for the estimates. Do the results differ from the boxplots?
lm(Ladder_score ~ Continent,
   ae_whr) %>%
  plot_predictions(by = "Continent") +
  coord_flip()
Now, let’s add wealth, measured as log GDP per capita, to the model. Explore the difference.
model = lm(Ladder_score ~ Logged_GDP_per_capita + Continent, ae_whr)
plot_predictions(model,
                 condition = c("Logged_GDP_per_capita", "Continent"))
Ignoring unknown labels:
• linetype : "Continent"
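A model of the form y ~ x + group fits one common slope and shifts the line up or down for each group, so the predicted lines are parallel. The sketch below illustrates this on simulated data; `toy`, `x`, `group`, and `y` are hypothetical names, not columns of the WHR data.

```r
# Simulated data: all names here are hypothetical
set.seed(2)
n <- 100
toy <- data.frame(
  x = runif(n, 6, 12),
  group = factor(rep(c("A", "B"), each = n / 2))
)
toy$y <- 1 + 0.5 * toy$x + 0.8 * (toy$group == "B") + rnorm(n, sd = 0.5)

# Additive model: one common slope for x, plus a shift for group B
toy_fit <- lm(y ~ x + group, data = toy)

# Predictions for both groups at several values of x
grid <- expand.grid(x = c(7, 9, 11), group = c("A", "B"))
grid$pred <- predict(toy_fit, newdata = grid)

# The gap between groups is identical at every x and equals the groupB coefficient
gap <- grid$pred[grid$group == "B"] - grid$pred[grid$group == "A"]
gap
coef(toy_fit)["groupB"]
```

To let the slope differ by group, the formula would need an interaction term, for example y ~ x * group.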
Present the summary. Does the Continent variable matter?
summary(model)
Call:
lm(formula = Ladder_score ~ Logged_GDP_per_capita + Continent,
data = ae_whr)
Residuals:
Min 1Q Median 3Q Max
-1.90103 -0.34965 0.02907 0.32646 1.23085
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.84400 0.76697 -1.100 0.2748
Logged_GDP_per_capita 0.64181 0.09235 6.950 1.29e-09 ***
ContinentEurope 0.54416 0.23916 2.275 0.0258 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.5426 on 73 degrees of freedom
Multiple R-squared: 0.8017, Adjusted R-squared: 0.7963
F-statistic: 147.6 on 2 and 73 DF, p-value: < 2.2e-16
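To read the coefficients, it can help to compute a prediction by hand. The sketch below uses the estimates from the output above; the value of 10 for Logged_GDP_per_capita is an arbitrary illustrative choice, and the variable names are made up.

```r
# Coefficients copied from the summary output above
b_intercept <- -0.84400
b_gdp <- 0.64181
b_europe <- 0.54416

# Illustrative value of Logged_GDP_per_capita (an assumption, not taken from the data)
gdp <- 10

# Africa is the reference category, so its prediction has no continent term
pred_africa <- b_intercept + b_gdp * gdp
pred_europe <- b_intercept + b_gdp * gdp + b_europe
c(Africa = pred_africa, Europe = pred_europe)

# Raw gap in means (from the t-test output) vs. gap at equal GDP
raw_gap <- 6.409634 - 4.447886
adjusted_gap <- pred_europe - pred_africa
c(raw = raw_gap, adjusted = adjusted_gap)
```

At the same level of log GDP per capita, the estimated gap between Europe and Africa is the ContinentEurope coefficient, about 0.54, compared with a raw difference in means of about 1.96.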
Let’s add one more variable, and run the model on the whole dataset. What does it tell us?
lm(Ladder_score ~ Social_support + Logged_GDP_per_capita + Continent, whr) %>%
summary()
Call:
lm(formula = Ladder_score ~ Social_support + Logged_GDP_per_capita +
Continent, data = whr)
Residuals:
Min 1Q Median 3Q Max
-1.82847 -0.31606 0.02814 0.35224 1.14812
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.15661 0.49274 -2.347 0.02051 *
Social_support 4.34406 0.59174 7.341 2.52e-11 ***
Logged_GDP_per_capita 0.32703 0.06767 4.833 3.93e-06 ***
ContinentAsia -0.04027 0.15293 -0.263 0.79275
ContinentEurope 0.25751 0.18574 1.386 0.16812
ContinentNorth America 0.58400 0.21045 2.775 0.00638 **
ContinentOceania 0.65633 0.42383 1.549 0.12405
ContinentSouth America 0.28415 0.21400 1.328 0.18670
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.5394 on 123 degrees of freedom
(6 observations deleted due to missingness)
Multiple R-squared: 0.7867, Adjusted R-squared: 0.7746
F-statistic: 64.81 on 7 and 123 DF, p-value: < 2.2e-16
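Each continent coefficient is a comparison with the reference category, Africa, which is why Africa does not appear in the table. The sketch below rebuilds 95% confidence intervals for two of these coefficients from the printed estimates and standard errors; the data frame `coefs` and its column names are made up for illustration.

```r
# Estimates and standard errors copied from the output above
coefs <- data.frame(
  term = c("ContinentEurope", "ContinentNorth America"),
  estimate = c(0.25751, 0.58400),
  std_error = c(0.18574, 0.21045)
)

# Residual degrees of freedom reported in the output
df_resid <- 123
crit <- qt(0.975, df_resid)

# 95% confidence interval: estimate +/- critical value * standard error
coefs$lower <- coefs$estimate - crit * coefs$std_error
coefs$upper <- coefs$estimate + crit * coefs$std_error
coefs$excludes_zero <- coefs$lower > 0 | coefs$upper < 0
coefs
```

With a fitted model object at hand, confint() returns these intervals directly.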
What’s next?
Congrats on finishing the class! No optional exercises this time.
Check the main page for additional resources
Learning to code is like learning a new language: if you’re interested, keep practicing!
Linear models next!
What we have covered in coding
| Library | Functions | Description |
|---|---|---|
| tidyverse | filter(), mutate(), ggplot() | data wrangling and visualization |
| modelsummary | modelsummary() | present good looking model summary tables |
| broom | tidy() | extract additional information from the models |
| marginaleffects | plot_predictions() | calculate and visualize marginal effects |
| GGally | ggpairs() | extension to ggplot |
| tinytable | tt() | present good looking tables |
| patchwork | /, + | plot several graphs together |
Datasets we have used
| Dataset | Description | Link |
|---|---|---|
| V-Dem | Measures democracy worldwide | V-Dem |
| World Happiness Report | Annual happiness report | World Happiness Report |
| Who Governs | Dataset on political elites | Who Governs |
| SIPRI | Data on military operations | SIPRI |
| Comparative Political Data Set | A dataset covering political institutions | CPDS |
| World Values Survey | Survey measuring values across the globe | WVS |
| Transitional Justice Evaluation Tools Dataset | Dataset assessing transitional justice | TJET Dataset |