Publishing and Quarter Review

Week 9

Published

November 14, 2025

Before we start

No Discussion Section or Office Hours Next Week (but available on request)
Any questions?

Agenda

Quick Review
Example of Quarto and Overleaf for Publishing

Quarto for Publishing

Quarto and Overleaf are both excellent tools for publishing. For this class, it’s better to stick with Quarto – the scope of your projects makes it a suitable choice. However, if your papers become longer or you plan to collaborate with others, Overleaf is the better option.

Here is how to get started.

Paper Template for Quarto.
Overleaf. Click New project -> Example project to get started.

Helpful tips:

GitHub Desktop Tutorial
Exporting .bib file from Zotero
Google Scholar has BibTeX citations!

Download script

Review

First, load the usual libraries and the Transitional Justice (TJET) dataset from the previous week.

library(tidyverse)
library(marginaleffects)
library(modelsummary) # needed for modelsummary() further down
library(GGally)

tjet = read.csv("data/tjet.csv")

Let’s explain under what circumstances countries adopt reparations policies. Could the involvement of international organizations influence a state’s decision to implement such policies?

rep_binary: presence of a reparations policy
ICC_investigation: presence of an ICC investigation
uninv: ongoing UN investigations, count
trials_domestic: count of domestic trials

First, let’s see whether the probability of reparations (rep_binary) increases when the ICC intervenes (ICC_investigation). Set up a logit model with glm().

model_icc = ...
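If the glm() syntax is unfamiliar, here is a minimal sketch on simulated data. The names toy, treated, outcome and toy_model are hypothetical and are not part of the TJET data; the point is only the shape of the call, not the answer to the exercise.

```r
# Simulated data: a binary predictor and a binary outcome (all values made up)
set.seed(42)
toy <- data.frame(treated = rbinom(200, size = 1, prob = 0.5))
toy$outcome <- rbinom(200, size = 1, prob = plogis(-1 + 1.2 * toy$treated))

# Logit model: family = binomial uses the logit link by default
toy_model <- glm(outcome ~ treated, family = binomial, data = toy)

coef(toy_model)      # coefficients on the log-odds scale
exp(coef(toy_model)) # the same coefficients as odds ratios
```

The coefficients are on the log-odds scale, so they are hard to read directly. That is why the next steps move to predicted probabilities.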

Present the summary of the model.

summary(model_icc)

Now, let’s draw the graph using plot_predictions() from the marginaleffects library. What is worrisome?

...(model_icc, 
                 ... = c("ICC_investigation"))
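To see roughly what kind of numbers such a plot displays, the sketch below computes predicted probabilities and approximate 95% intervals by hand with base R. It uses simulated data with hypothetical names (toy, treated, outcome), and it is an illustration of the idea rather than a description of what marginaleffects does internally.

```r
# Simulated data: 100 untreated and 100 treated cases (all values made up)
set.seed(1)
toy <- data.frame(treated = rep(c(0, 1), each = 100))
toy$outcome <- rbinom(nrow(toy), size = 1,
                      prob = ifelse(toy$treated == 1, 0.6, 0.3))

fit <- glm(outcome ~ treated, family = binomial, data = toy)

# Predict on the link (log-odds) scale, build the interval there,
# then transform back to probabilities with plogis()
new <- data.frame(treated = c(0, 1))
pred <- predict(fit, newdata = new, type = "link", se.fit = TRUE)
new$prob <- plogis(pred$fit)
new$low  <- plogis(pred$fit - 1.96 * pred$se.fit)
new$high <- plogis(pred$fit + 1.96 * pred$se.fit)
new
```

Building the interval on the link scale keeps both bounds between 0 and 1 after the transformation.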

Now, let’s add a couple of variables. Add uninv and trials_domestic.

model_full = glm(rep_binary ~ ICC_investigation + ..., family = binomial, data = tjet)

Present plot_predictions() for ICC_investigation only.

...

Finally, present the modelsummary() table for both models. Compare.

modelsummary(list("ICC Model" = model_icc,
                  "Full Model" = model_full),
                   stars = c("*" = 0.05),
                   gof_omit = "AIC|BIC|Log.Lik|F|RMSE" ,
                   statistic = "p.value")

Don’t forget to clean the environment.
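One common way to do this from the console is rm(list = ls()). The sketch below uses two placeholder objects (model_a and data_b are hypothetical names) to show the effect.

```r
# Placeholder objects standing in for the models and data created above
model_a <- 1
data_b <- data.frame(x = 1:3)

rm(list = ls()) # remove every object from the current environment
ls()            # nothing is left
```

This removes objects only. Loaded packages stay attached, so the libraries do not need to be loaded again.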

Exploring Data

Today we are working with the World Happiness Report data, the same dataset we used in week 5.

Country_name is the name of the country
Ladder_score is the happiness score
Continent

Load the data.

whr = read.csv("data/WHR.csv")

Explore the following variables with the ggpairs() function from GGally. What can you see on the graph?

library(GGally)
ggpairs(whr,
        columns = c("Ladder_score",
                    "Logged_GDP_per_capita",
                    "Continent"))

Let’s zoom in on the Happiness Score by Continent. Is Africa different from Europe in terms of happiness?

ggplot(whr) +
  geom_boxplot(aes(x = Ladder_score, y = Continent)) +
  labs(x = "Happiness") +
  theme_bw()

Zoom in even more. Use filter() to keep only the continents we are analyzing, and select() to keep only the variables we want to work with: Country_name, Ladder_score, Continent and Logged_GDP_per_capita.

ae_whr = whr %>%
  filter(Continent %in% c("Africa", "Europe")) %>%
  select(Country_name, Ladder_score, Continent, Logged_GDP_per_capita)

Plot again. Fill the boxes by continent and make the fill more transparent with alpha = 0.5.

ggplot(ae_whr) +
  geom_boxplot(aes(x = Ladder_score, y = Continent, fill = Continent), alpha = 0.5) +
  labs(x = "Happiness",
       y = NULL) +
  theme_light()

Formal Tests

Finally, let’s test the difference formally. Interpret the results.

t.test(ae_whr$Ladder_score ~ ae_whr$Continent)


    Welch Two Sample t-test

data:  ae_whr$Ladder_score by ae_whr$Continent
t = -12.395, df = 73.907, p-value < 2.2e-16
alternative hypothesis: true difference in means between group Africa and group Europe is not equal to 0
95 percent confidence interval:
 -2.277125 -1.646372
sample estimates:
mean in group Africa mean in group Europe 
            4.447886             6.409634
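To unpack what the Welch test reports, the following sketch recomputes the t statistic, the degrees of freedom and the p-value by hand and compares them with t.test(). The two samples a and b are simulated (the means and standard deviations are made up), so the numbers will not match the WHR output above.

```r
# Two simulated groups with different sizes and variances (values made up)
set.seed(7)
a <- rnorm(40, mean = 4.5, sd = 0.7)
b <- rnorm(36, mean = 6.4, sd = 0.6)

# Standard error of the difference in means, without assuming equal variances
va <- var(a) / length(a)
vb <- var(b) / length(b)
se <- sqrt(va + vb)

t_stat <- (mean(a) - mean(b)) / se

# Welch-Satterthwaite approximation for the degrees of freedom
df_welch <- (va + vb)^2 / (va^2 / (length(a) - 1) + vb^2 / (length(b) - 1))

# Two-sided p-value
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

built_in <- t.test(a, b)
c(by_hand = t_stat, built_in = unname(built_in$statistic))
c(by_hand = df_welch, built_in = unname(built_in$parameter))
```

The non-integer degrees of freedom in the output above (df = 73.907) come from this approximation.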

We can confirm it with a simple model. Estimate the Happiness Score by continent, and plot the confidence intervals for the estimates. Is it different from the boxplots?

lm(Ladder_score ~ Continent, 
   ae_whr) %>%
  plot_predictions(by = "Continent") +
  coord_flip()
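The reason a simple model can confirm the t-test is that, with a single two-level factor, lm() estimates the group means. A minimal sketch on simulated data (toy, group and score are hypothetical names):

```r
# Two simulated groups of 30 observations each (values made up)
set.seed(3)
toy <- data.frame(group = factor(rep(c("A", "B"), each = 30)))
toy$score <- rnorm(60, mean = ifelse(toy$group == "A", 4.5, 6.4), sd = 0.6)

fit <- lm(score ~ group, data = toy)
group_means <- tapply(toy$score, toy$group, mean)

coef(fit)    # intercept = mean of group A; groupB = mean of B minus mean of A
group_means
confint(fit) # 95% confidence intervals for both coefficients
```

One difference to keep in mind: lm() assumes equal variances in both groups, while the Welch test does not, so the confidence intervals can differ slightly from those of t.test().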

Now, let’s add wealth, measured as log GDP per capita, to the model. Explore the difference.

model = lm(Ladder_score ~ Logged_GDP_per_capita + Continent, ae_whr)
plot_predictions(model,
                 condition = c("Logged_GDP_per_capita", "Continent"))

Ignoring unknown labels:
• linetype : "Continent"

Present the summary. Does the Continent variable matter?

summary(model)


Call:
lm(formula = Ladder_score ~ Logged_GDP_per_capita + Continent, 
    data = ae_whr)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.90103 -0.34965  0.02907  0.32646  1.23085 

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)           -0.84400    0.76697  -1.100   0.2748    
Logged_GDP_per_capita  0.64181    0.09235   6.950 1.29e-09 ***
ContinentEurope        0.54416    0.23916   2.275   0.0258 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5426 on 73 degrees of freedom
Multiple R-squared:  0.8017,    Adjusted R-squared:  0.7963 
F-statistic: 147.6 on 2 and 73 DF,  p-value: < 2.2e-16
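One way to read these coefficients is to plug them into the regression equation. The sketch below copies the estimates from the summary above and the group means from the t-test output; the log GDP value of 9 is an arbitrary illustrative choice, not a value taken from the data.

```r
# Estimates copied from the summary output above
b0       <- -0.84400 # intercept (baseline continent: Africa)
b_gdp    <-  0.64181 # slope for Logged_GDP_per_capita
b_europe <-  0.54416 # shift for Europe relative to Africa

log_gdp <- 9 # hypothetical value, chosen only for illustration

pred_africa <- b0 + b_gdp * log_gdp      # 4.93229
pred_europe <- pred_africa + b_europe    # 5.47645

# Raw gap between the two group means from the t-test output
raw_gap <- 6.409634 - 4.447886           # 1.961748

c(africa = pred_africa, europe = pred_europe,
  adjusted_gap = b_europe, raw_gap = raw_gap)
```

At the same level of log GDP per capita, the estimated gap between Europe and Africa is about 0.54 points, compared with a raw difference in means of about 1.96.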

Let’s add one more variable, and run the model on the whole dataset. What does it tell us?

lm(Ladder_score ~ Social_support + Logged_GDP_per_capita + Continent, whr) %>%
  summary()


Call:
lm(formula = Ladder_score ~ Social_support + Logged_GDP_per_capita + 
    Continent, data = whr)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.82847 -0.31606  0.02814  0.35224  1.14812 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)            -1.15661    0.49274  -2.347  0.02051 *  
Social_support          4.34406    0.59174   7.341 2.52e-11 ***
Logged_GDP_per_capita   0.32703    0.06767   4.833 3.93e-06 ***
ContinentAsia          -0.04027    0.15293  -0.263  0.79275    
ContinentEurope         0.25751    0.18574   1.386  0.16812    
ContinentNorth America  0.58400    0.21045   2.775  0.00638 ** 
ContinentOceania        0.65633    0.42383   1.549  0.12405    
ContinentSouth America  0.28415    0.21400   1.328  0.18670    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5394 on 123 degrees of freedom
  (6 observations deleted due to missingness)
Multiple R-squared:  0.7867,    Adjusted R-squared:  0.7746 
F-statistic: 64.81 on 7 and 123 DF,  p-value: < 2.2e-16
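When reading this table, note that Africa has no row of its own: it is the reference category, so every Continent coefficient is a comparison with Africa. If a different baseline is more useful, relevel() changes it. A minimal sketch on simulated data (toy, region, region_eu and score are hypothetical names):

```r
# Three simulated regions with 20 observations each (values made up)
set.seed(5)
toy <- data.frame(region = factor(rep(c("Africa", "Asia", "Europe"), each = 20)))
toy$score <- rnorm(60, mean = c(4.5, 5.5, 6.4)[as.integer(toy$region)], sd = 0.5)

# Default baseline: the first factor level in alphabetical order (Africa)
fit_africa <- lm(score ~ region, data = toy)

# Same model with Europe as the baseline
toy$region_eu <- relevel(toy$region, ref = "Europe")
fit_europe <- lm(score ~ region_eu, data = toy)

coef(fit_africa) # regionAsia and regionEurope are compared with Africa
coef(fit_europe) # region_euAfrica and region_euAsia are compared with Europe
```

Changing the baseline changes which comparisons are printed and tested, but the fitted values of the model stay the same.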

What’s next?

Congrats on finishing the class! No optional exercises this time.

Helpful resource for R
Check the main page for additional resources
Learning to code is like learning a new language: if you’re interested, keep practicing!
Linear models next!

What we have covered in coding

Library	Functions	Description
tidyverse	`filter()`, `mutate()`, `ggplot()`	data wrangling and visualization
modelsummary	`modelsummary()`	present good looking model summary tables
broom	`tidy()`	extract additional information from the models
marginaleffects	`plot_predictions()`	calculate and visualize marginal effects
GGally	`ggpairs()`	extension to ggplot
tinytable	`tt()`	present good looking tables
patchwork	`/`, `+`	plot several graphs together

Datasets we have used

Dataset	Description	Link
V-Dem	Measures democracy worldwide	V-Dem
World Happiness Report	Annual happiness report	World Happiness Report
Who Governs	Dataset on political elites	Who Governs
SIPRI	Data on military operations	SIPRI
Comparative Political Data Set	A dataset covering political institutions	CPDS
World Values Survey	Survey measuring values across the globe	WVS
Transitional Justice Evaluation Tools Dataset	Dataset assessing transitional justice	TJET Dataset

Check List

I know that every lab has a check list at the bottom, and I will use it to navigate what we have learned
R doesn’t scare me anymore
I have developed an intuition for applying quantitative methods and am happy to learn more!