Publishing and Quarter Review

Week 9

Published

November 14, 2025

Before we start

  • No Discussion Section or Office Hours next week (but both are available on request)

  • Any questions?

Agenda

  • Quick Review

  • Example of Quarto and Overleaf for Publishing

Quarto for Publishing

Quarto and Overleaf are both excellent tools for publishing. For this class, it’s better to stick with Quarto – the scope of your projects makes it a suitable choice. However, if your papers become longer or you plan to collaborate with others, Overleaf is the better option.

Here is how to get started.

Helpful tips:

Download script

Review

First of all, load the usual libraries and the Transitional Justice (TJET) dataset from the previous week.

library(tidyverse)
library(marginaleffects)
library(modelsummary)  # needed for modelsummary() below
library(GGally)

tjet = read.csv("data/tjet.csv")

Let’s explain under what circumstances countries adopt reparations policies. Could the involvement of international organizations influence a state’s decision to implement such policies?

  • rep_binary: presence of reparations policy

  • ICC_investigation: presence of ICC investigation

  • uninv: ongoing UN investigations, count

  • trials_domestic: count of domestic trials

First, let’s see whether the probability of reparations (rep_binary) increases when the ICC intervenes (ICC_investigation). Set up a glm() logit model.

model_icc = ...
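As a reminder of how a logit model's output maps to probabilities, here is a minimal sketch on simulated data. It does not use the TJET variables: the names toy, treated, and outcome are hypothetical and exist only for this illustration.

```r
set.seed(1)
n <- 200

# Hypothetical toy data: a binary predictor and a binary outcome
treated <- rbinom(n, 1, 0.3)
outcome <- rbinom(n, 1, plogis(-1 + 1.2 * treated))
toy <- data.frame(outcome, treated)

# Logit model: family = binomial uses the logit link by default
toy_logit <- glm(outcome ~ treated, family = binomial, data = toy)

# Coefficients are on the log-odds scale; plogis() converts them to probabilities
b <- coef(toy_logit)
p_untreated <- plogis(b["(Intercept)"])
p_treated <- plogis(b["(Intercept)"] + b["treated"])

# predict(type = "response") returns the same predicted probabilities
p_check <- predict(toy_logit,
                   newdata = data.frame(treated = c(0, 1)),
                   type = "response")
```

The same logic applies to any binary predictor: the intercept gives the log-odds when the predictor is 0, and the coefficient shifts the log-odds when it is 1.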

Present a summary of the model.

summary(model_icc)

Now, let’s draw the graph using plot_predictions() from the marginaleffects library. What is worrisome?

...(model_icc, 
                 ... = c("ICC_investigation"))

Now, let’s add a couple of variables. Add uninv and trials_domestic.

model_full = glm(rep_binary ~ ICC_investigation + ..., family = binomial, data = tjet) 

Present plot_predictions() for ICC_investigation only.

...

Finally, present the modelsummary() table for both models. Compare.

modelsummary(list("ICC Model" = model_icc,
                  "Full Model" = model_full),
             stars = c("*" = 0.05),
             gof_omit = "AIC|BIC|Log.Lik|F|RMSE",
             statistic = "p.value")

Don’t forget to clean the environment.
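One way to clean the environment is rm(). The sketch below uses toy objects (toy_data and toy_model are hypothetical names) standing in for whatever you created during the review.

```r
# Toy objects standing in for the models and data created earlier (hypothetical names)
toy_data <- data.frame(x = 1:10, y = (1:10)^2)
toy_model <- lm(y ~ x, data = toy_data)

# Remove specific objects by name
rm(toy_model, toy_data)

# To clear everything in the global environment instead, run:
# rm(list = ls())
```

Removing objects by name is safer when you want to keep some of your work; rm(list = ls()) wipes everything at once.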

Exploring Data

Today we are working with the World Happiness Report data, the same dataset we used in week 5.

  • Country_name is the name of the country

  • Ladder_score is the happiness score

  • Continent

Load the data.

whr = read.csv("data/WHR.csv")

Explore the following variables with the ggpairs() function from GGally. What can you see in the graph?

library(GGally)
ggpairs(whr,
        columns = c("Ladder_score",
                    "Logged_GDP_per_capita",
                    "Continent"))

Let’s zoom in on the happiness score by continent. Is Africa different from Europe in terms of happiness?

ggplot(whr) +
  geom_boxplot(aes(x = Ladder_score, y = Continent)) +
  labs(x = "Happiness") +
  theme_bw()

Zoom in even more. Use filter() to keep only the continents we are analyzing, and select() to keep only the variables we want to work with: Country_name, Ladder_score, Continent and Logged_GDP_per_capita.

ae_whr = whr %>%
  filter(Continent %in% c("Africa", "Europe")) %>%
  select(Country_name, Ladder_score, Continent, Logged_GDP_per_capita)

Plot again. Make color more transparent with alpha = 0.5.

ggplot(ae_whr) +
  geom_boxplot(aes(x = Ladder_score, y = Continent, fill = Continent), alpha = 0.5) +
  labs(x = "Happiness",
       y = NULL) +
  theme_light()

Formal Tests

Finally, let’s test the difference formally. Interpret the results.

t.test(ae_whr$Ladder_score ~ ae_whr$Continent)

    Welch Two Sample t-test

data:  ae_whr$Ladder_score by ae_whr$Continent
t = -12.395, df = 73.907, p-value < 2.2e-16
alternative hypothesis: true difference in means between group Africa and group Europe is not equal to 0
95 percent confidence interval:
 -2.277125 -1.646372
sample estimates:
mean in group Africa mean in group Europe 
            4.447886             6.409634 
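To see where the t and df values in a Welch test come from, the sketch below recomputes them by hand and compares them with t.test(). It uses simulated data, not the WHR file: the vectors a and b are hypothetical groups whose means and spreads were chosen only for illustration.

```r
set.seed(2)

# Two hypothetical groups of scores (simulated, not the WHR data)
a <- rnorm(40, mean = 4.5, sd = 0.7)
b <- rnorm(36, mean = 6.4, sd = 0.6)

# Each group's contribution to the variance of the difference in means
va <- var(a) / length(a)
vb <- var(b) / length(b)

# Welch t statistic: difference in means over its standard error
t_manual <- (mean(a) - mean(b)) / sqrt(va + vb)

# Welch-Satterthwaite approximation for the degrees of freedom
df_manual <- (va + vb)^2 / (va^2 / (length(a) - 1) + vb^2 / (length(b) - 1))

# Compare with the built-in test (Welch is the default for t.test)
welch <- t.test(a, b)
c(t_manual = t_manual, t_builtin = unname(welch$statistic))
c(df_manual = df_manual, df_builtin = unname(welch$parameter))
```

The non-integer degrees of freedom in the output above (73.907) are a sign that the Welch correction for unequal variances was applied.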

We can confirm it with a simple model. Estimate the happiness score as a function of the continent, and plot the confidence intervals for the estimates. Is it different from the boxplots?

lm(Ladder_score ~ Continent, 
   ae_whr) %>%
  plot_predictions(by = "Continent") +
  coord_flip()
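The link between a two-group comparison and a linear model with one binary predictor can be checked directly. The sketch below uses simulated data; toy, group, and score are hypothetical names. Note that lm() assumes equal variances in both groups, so it matches the pooled t-test (var.equal = TRUE) rather than the Welch default.

```r
set.seed(3)

# Hypothetical two-group data (simulated, not the WHR data)
toy <- data.frame(
  group = rep(c("A", "B"), times = c(30, 35)),
  score = c(rnorm(30, 4.5, 0.7), rnorm(35, 6.4, 0.6))
)

fit <- lm(score ~ group, data = toy)

# The intercept is the mean of the baseline group (A);
# the groupB coefficient is mean(B) - mean(A)
group_means <- tapply(toy$score, toy$group, mean)
diff_means <- unname(group_means["B"] - group_means["A"])
coef(fit)

# The pooled-variance t-test gives the same t statistic (up to sign)
t_lm <- summary(fit)$coefficients["groupB", "t value"]
t_tt <- unname(t.test(score ~ group, data = toy, var.equal = TRUE)$statistic)
c(lm = t_lm, t_test = t_tt)
```

The sign differs only because t.test() reports A minus B while the model coefficient is B minus A.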

Now, let’s add wealth, measured as log GDP per capita, to the model. Explore the difference.

model = lm(Ladder_score ~ Logged_GDP_per_capita + Continent, ae_whr)
plot_predictions(model,
                 condition = c("Logged_GDP_per_capita", "Continent"))

Ignoring unknown labels:
• linetype : "Continent"

Present the summary. Does the Continent variable matter?

summary(model)

Call:
lm(formula = Ladder_score ~ Logged_GDP_per_capita + Continent, 
    data = ae_whr)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.90103 -0.34965  0.02907  0.32646  1.23085 

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)           -0.84400    0.76697  -1.100   0.2748    
Logged_GDP_per_capita  0.64181    0.09235   6.950 1.29e-09 ***
ContinentEurope        0.54416    0.23916   2.275   0.0258 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5426 on 73 degrees of freedom
Multiple R-squared:  0.8017,    Adjusted R-squared:  0.7963 
F-statistic: 147.6 on 2 and 73 DF,  p-value: < 2.2e-16
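To read the coefficients, it can help to compute a prediction by hand. The sketch below copies the three estimates from the summary above; the value gdp = 10 is a hypothetical level of Logged_GDP_per_capita chosen only for illustration.

```r
# Coefficients copied from the summary(model) output above
b0 <- -0.84400        # (Intercept)
b_gdp <- 0.64181      # Logged_GDP_per_capita
b_europe <- 0.54416   # ContinentEurope (Africa is the baseline)

# Hypothetical value of Logged_GDP_per_capita, chosen for illustration
gdp <- 10

# Predicted happiness at the same wealth level on each continent
pred_africa <- b0 + b_gdp * gdp
pred_europe <- b0 + b_gdp * gdp + b_europe

c(Africa = pred_africa, Europe = pred_europe)

# The gap at equal wealth is exactly the ContinentEurope coefficient
pred_europe - pred_africa
```

At this wealth level the predictions are about 5.57 for Africa and 6.12 for Europe. The gap of roughly 0.54 points is much smaller than the raw difference in means from the t-test (about 1.96 points), because part of the raw gap is accounted for by GDP.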

Let’s add one more variable, and run the model on the whole dataset. What does it tell us?

lm(Ladder_score ~ Social_support + Logged_GDP_per_capita + Continent, whr) %>%
  summary()

Call:
lm(formula = Ladder_score ~ Social_support + Logged_GDP_per_capita + 
    Continent, data = whr)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.82847 -0.31606  0.02814  0.35224  1.14812 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)            -1.15661    0.49274  -2.347  0.02051 *  
Social_support          4.34406    0.59174   7.341 2.52e-11 ***
Logged_GDP_per_capita   0.32703    0.06767   4.833 3.93e-06 ***
ContinentAsia          -0.04027    0.15293  -0.263  0.79275    
ContinentEurope         0.25751    0.18574   1.386  0.16812    
ContinentNorth America  0.58400    0.21045   2.775  0.00638 ** 
ContinentOceania        0.65633    0.42383   1.549  0.12405    
ContinentSouth America  0.28415    0.21400   1.328  0.18670    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5394 on 123 degrees of freedom
  (6 observations deleted due to missingness)
Multiple R-squared:  0.7867,    Adjusted R-squared:  0.7746 
F-statistic: 64.81 on 7 and 123 DF,  p-value: < 2.2e-16
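In the output above there is no ContinentAfrica row: Africa is the baseline, so every Continent coefficient is a comparison with Africa. The sketch below shows how the baseline can be changed with relevel(). It uses simulated data; toy, region, and score are hypothetical names.

```r
set.seed(4)

# Hypothetical three-region data (simulated, not the WHR data)
toy <- data.frame(
  region = factor(rep(c("Africa", "Asia", "Europe"), each = 20)),
  score = c(rnorm(20, 4.5, 0.5), rnorm(20, 5.3, 0.5), rnorm(20, 6.4, 0.5))
)

# By default the first factor level (alphabetical) is the baseline
fit_africa <- lm(score ~ region, data = toy)
names(coef(fit_africa))

# Make Europe the baseline instead
toy$region_eu <- relevel(toy$region, ref = "Europe")
fit_europe <- lm(score ~ region_eu, data = toy)
names(coef(fit_europe))
```

Changing the baseline changes which comparisons the coefficients report, but not the model's fitted values.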

What’s next?

Congrats on finishing the class! No optional exercises this time.

  • Helpful resource for R

  • Check the main page for additional resources

  • Learning to code is like learning a new language: if you’re interested, keep practicing!

  • Linear models next!

What we have covered in coding

Library          Functions                     Description
tidyverse        filter(), mutate(), ggplot()  data wrangling and visualization
modelsummary     modelsummary()                present good looking model summary tables
broom            tidy()                        extract additional information from the models
marginaleffects  plot_predictions()            calculate and visualize marginal effects
GGally           ggpairs()                     extension to ggplot
tinytable        tt()                          present good looking tables
patchwork        /, +                          plot several graphs together

Datasets we have used

Dataset                                        Description                                Link
V-Dem                                          Measures democracy worldwide               V-Dem
World Happiness Report                         Annual happiness report                    World Happiness Report
Who Governs                                    Dataset on political elites                Who Governs
SIPRI                                          Data on military operations                SIPRI
Comparative Political Data Set                 A dataset covering political institutions  CPDS
World Values Survey                            Survey measuring values across the globe   WVS
Transitional Justice Evaluation Tools Dataset  Dataset assessing transitional justice     TJET Dataset

Check List