library(tidyverse)
sipri_vdem = readRDS("data/sipri_vdem.RDS")Presenting Results of Regression
Week 7
Before we start
- Questions regarding Lab Assignment or the class? 
- Please, feel in the survey after the class if you did not yet 
Agenda
- Exploring multiple variables at once 
- Visualizing complex regressions 
- Presenting results of regression 
Set up
We are working with SIPRI Arms Transfers Database. It contains information on all transfers of major conventional arms. The variables are:
- Recipientof arms
- Yearof the transfer
- Importof arms
- Regimea V-Dem variable for political regime
And, we subset some variables from V-Dem. We are choosing the following variables:
- yearof the coded data
- e_gdpGDP of a country
- e_miintecoArmed conflict, international
- e_miintercArmed conflict, internal
You can see how I merged two datasets here
For our convenience, rename the variables in the newly created dataframe.
sipri_vdem = sipri_vdem %>%
  rename(GDP = e_gdp,
         International_conflict = e_miinteco,
         Internal_conflict = e_miinterc)
head(sipri_vdem)  Recipient Year Import                  Regime      GDP International_conflict
1     India 1950    141              Autocratic 42170.12                      0
2     India 1951    277 Electoral Authoritarian 42010.81                      0
3     India 1952    104    Minimally Democratic 43033.76                      0
4     India 1953    430    Minimally Democratic 44503.00                      0
5     India 1954    265    Minimally Democratic 45960.55                      0
6     India 1955    350    Minimally Democratic 47305.30                      1
  Internal_conflict
1                 0
2                 0
3                 0
4                 0
5                 0
6                 1Explore the GDP variable. Does it need a transformation?
ggplot(sipri_vdem) +
  geom_histogram(aes(x = GDP))It might be the case. But to double-check our assumption, let’s draw a boxplot. Take the log() of GDP directly in the plot. Did it get better?
ggplot(sipri_vdem) +
  geom_boxplot(aes(y = log(GDP)))Therefore, let’s create a new variable Log_GDP.
sipri_vdem = sipri_vdem %>%
  mutate(Log_GDP = log(GDP))To explore multiple variables at once, it is useful to plot them in pairs plot. There’s a library GGally which is based on ggplot2, and it’s quite straightforward. Be careful with the wrong class identification! Can you notice anything?
library(GGally)
sipri_vdem %>%
  ggpairs(columns = c("Import", "Regime", "Log_GDP", "International_conflict", "Internal_conflict"))Don’t forget to fix the classes of the variables!
sipri_vdem$International_conflict = as.factor(sipri_vdem$International_conflict)
sipri_vdem$Regime = as.factor(sipri_vdem$Regime)Model Building
Let’s set up a basic model. We are interested in explaining the Import of arms. Is it related to the economic capacity of the state? We can use proxy Log_GDP. We can interpret and plot it quite easily, right? But does the intercept make sense?
model_basic = lm(Import ~ Log_GDP, sipri_vdem)
summary(model_basic)
Call:
lm(formula = Import ~ Log_GDP, data = sipri_vdem)
Residuals:
   Min     1Q Median     3Q    Max 
-676.3 -222.9  -87.5   66.0 5291.1 
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -718.034     29.032  -24.73   <2e-16 ***
Log_GDP      108.547      3.168   34.26   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 457.9 on 6439 degrees of freedom
  (436 observations deleted due to missingness)
Multiple R-squared:  0.1542,    Adjusted R-squared:  0.1541 
F-statistic:  1174 on 1 and 6439 DF,  p-value: < 2.2e-16Let’s run a Multiple Linear Regression with fixed effects. Add International_conflict and Regime to the model. You can interpret these coefficients easily, too.
model_fe = lm(Import ~ Log_GDP + International_conflict + Regime, sipri_vdem)
summary(model_fe)
Call:
lm(formula = Import ~ Log_GDP + International_conflict + Regime, 
    data = sipri_vdem)
Residuals:
   Min     1Q Median     3Q    Max 
-991.3 -217.9  -82.6   81.4 5169.6 
Coefficients:
                              Estimate Std. Error t value Pr(>|t|)    
(Intercept)                   -783.055     36.852 -21.248  < 2e-16 ***
Log_GDP                        126.364      4.468  28.281  < 2e-16 ***
International_conflict1        206.923     26.470   7.817 6.72e-15 ***
RegimeDemocratic               -95.328     17.191  -5.545 3.11e-08 ***
RegimeElectoral Authoritarian -151.434     23.328  -6.491 9.45e-11 ***
RegimeMinimally Democratic    -216.183     50.820  -4.254 2.15e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 454.6 on 4347 degrees of freedom
  (2524 observations deleted due to missingness)
Multiple R-squared:  0.1995,    Adjusted R-squared:  0.1986 
F-statistic: 216.6 on 5 and 4347 DF,  p-value: < 2.2e-16Visual Presenation of Regression
To convey a better idea what’s going on, you can plot these three independent variables. But what if there are more?
library(ggeffects)
ggpredict(model_fe, terms = c("Log_GDP", "International_conflict", "Regime")) %>%
  plot()Finally, let’s set up another model. This time with an interaction effect.
model_int = lm(Import ~ Log_GDP * International_conflict + Regime, sipri_vdem)
summary(model_int)
Call:
lm(formula = Import ~ Log_GDP * International_conflict + Regime, 
    data = sipri_vdem)
Residuals:
    Min      1Q  Median      3Q     Max 
-1183.9  -214.6   -83.1    77.6  5177.3 
Coefficients:
                                Estimate Std. Error t value Pr(>|t|)    
(Intercept)                     -747.804     37.877 -19.743  < 2e-16 ***
Log_GDP                          122.263      4.582  26.682  < 2e-16 ***
International_conflict1         -375.723    151.132  -2.486    0.013 *  
RegimeDemocratic                 -95.161     17.162  -5.545 3.12e-08 ***
RegimeElectoral Authoritarian   -151.842     23.290  -6.520 7.85e-11 ***
RegimeMinimally Democratic      -218.460     50.740  -4.305 1.70e-05 ***
Log_GDP:International_conflict1   63.002     16.090   3.916 9.16e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 453.8 on 4346 degrees of freedom
  (2524 observations deleted due to missingness)
Multiple R-squared:  0.2023,    Adjusted R-squared:  0.2012 
F-statistic: 183.7 on 6 and 4346 DF,  p-value: < 2.2e-16Presentation of Results
Tables
The most common way to present the results of the regression is in tables. For details and examples see details.
Let’s run modelsummary() with a base model.
library(modelsummary)
modelsummary(model_basic)| (1) | |
|---|---|
| (Intercept) | -718.034 | 
| (29.032) | |
| Log_GDP | 108.547 | 
| (3.168) | |
| Num.Obs. | 6441 | 
| R2 | 0.154 | 
| R2 Adj. | 0.154 | 
| AIC | 97207.7 | 
| BIC | 97228.0 | 
| Log.Lik. | -48600.843 | 
| F | 1173.891 | 
| RMSE | 457.88 | 
Don’t forget to:
- Include p-values and \(\alpha\) (critical values for stars display)! 
- Include standard errors or confidence intervals for \(\beta\) 
- If you have a categorical variable, mention the reference category 
- \(R^2\) for simple linear regression and \(R^2_a\) for multiple linear regression 
- Include number of observations 
Add the following argument: output = "table.html".
publishable_table = modelsummary(list("Base model" = model_basic,
                    "Fixed Effects model" = model_fe,
                    "Interaction model" = model_int),
                   title = "Arms Import Models",  
                   stars = TRUE,
                   gof_omit = "AIC|BIC|Log.Lik|F|RMSE",
                   coef_rename = c("(Intercept)", 
                             "Log GDP", 
                             "International Conflict", 
                             "Regime: Democratic",
                             "Regime: Electoral Authoritarian",
                             "Regime: Minimally Democratic",
                             "Log GDP × International Conflict"),
                   notes = "Regime reference category: Autocratic")
publishable_table| Base model | Fixed Effects model | Interaction model | |
|---|---|---|---|
| + p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001 | |||
| Regime reference category: Autocratic | |||
| (Intercept) | -718.034*** | -783.055*** | -747.804*** | 
| (29.032) | (36.852) | (37.877) | |
| Log GDP | 108.547*** | 126.364*** | 122.263*** | 
| (3.168) | (4.468) | (4.582) | |
| International Conflict | 206.923*** | -375.723* | |
| (26.470) | (151.132) | ||
| Regime: Democratic | -95.328*** | -95.161*** | |
| (17.191) | (17.162) | ||
| Regime: Electoral Authoritarian | -151.434*** | -151.842*** | |
| (23.328) | (23.290) | ||
| Regime: Minimally Democratic | -216.183*** | -218.460*** | |
| (50.820) | (50.740) | ||
| Log GDP × International Conflict | 63.002*** | ||
| (16.090) | |||
| Num.Obs. | 6441 | 4353 | 4353 | 
| R2 | 0.154 | 0.199 | 0.202 | 
| R2 Adj. | 0.154 | 0.199 | 0.201 | 
The more complex the model, the harder it is to interpret the results. Let’s write out our models. You can often see these in articles. By plugging the coefficients we or our readers can calculate the predicted value for any GDP value, regime type and presence or absence of international conflict.
The basic model is easy, right?
\[ \text{Arms Import} = \beta_0 + \beta_1(\text{LogGDP}) \]
Now, fixed effects. It gets slightly trickier.
\[ \text{Arms Import} = \beta_0 + \beta_1(\text{LogGDP}) + \beta_2(\text{International Conflict}) + \beta_3(\text{Regime}) \]
And finally, interaction model. You can try to interpret the interaction, but we need to know the reference categories. Which is a bit of a hustle.
\[ \text{Arms Import} = \beta_0 + \beta_1(\text{LogGDP}) + \beta_2(\text{International Conflict}) + \beta_3(\text{Regime}) + \beta_4(\text{LogGDP} \times \text{International Conflict}) \]
Statistical Significance and Magnitude of \(\beta\)
In GGally library there is a useful function ggcoef() which allows to to quickly plot the coefficients of a model. Feel free to explore variations of the function here.
Plotting the effects is great to provide intuition about the significance and the magnitude of the effects. But! Don’t forget about the scales! Are you comparing comparable? Scaling may be helpful: how can you compare one unit increase in Log_GDP to “one unit increase” in International_conflict?
Moreover, you cannot interpret the magnitude of the interaction term, if you have it!
ggcoef(model_fe,
       exclude_intercept = TRUE) +
  theme_bw()Try ggcoef_model() out!
Additional Information
Predicted values and Marginal Effects
Presenting predicted values provides the reader with intuition about what happens when variables take specific values. For example, we take the most typical GDP, and calculate the effects for two categorical variables: International_conflict and Regime.
ggpredict(model_int, terms = c("International_conflict", "Regime")) # Predicted values of Import
Regime: Autocratic
International_conflict | Predicted |         95% CI
---------------------------------------------------
0                      |    308.57 | 288.87, 328.27
1                      |    477.20 | 423.80, 530.59
Regime: Democratic
International_conflict | Predicted |         95% CI
---------------------------------------------------
0                      |    213.41 | 187.75, 239.07
1                      |    382.03 | 322.22, 441.85
Regime: Electoral Authoritarian
International_conflict | Predicted |         95% CI
---------------------------------------------------
0                      |    156.73 | 114.75, 198.70
1                      |    325.35 | 258.22, 392.48
Regime: Minimally Democratic
International_conflict | Predicted |         95% CI
---------------------------------------------------
0                      |     90.11 |  -7.60, 187.82
1                      |    258.74 | 146.76, 370.71
Adjusted for:
* Log_GDP = 8.64Let’s make it look publishable. We wrangle the data a bit. By now, the code shouldn’t stress you out. If it does, then run it step-by-step to get a sense of what’s going on!
model_int_groupped = ggpredict(model_int, terms = c("International_conflict", "Regime")) %>%
  as.data.frame() %>%
  rename(International_conflict = x,
         Regime = group,
         Import = predicted) 
model_int_groupped  International_conflict    Import std.error   conf.low conf.high
1                      0 308.56877  10.04954 288.866542  328.2710
2                      0 213.40769  13.08769 187.749148  239.0662
3                      0 156.72724  21.40898 114.754718  198.6998
4                      0  90.10883  49.84073  -7.604411  187.8221
5                      1 477.19551  27.23333 423.804293  530.5867
6                      1 382.03443  30.51067 322.217952  441.8509
7                      1 325.35398  34.24112 258.223927  392.4840
8                      1 258.73557  57.11487 146.761296  370.7098
                   Regime
1              Autocratic
2              Democratic
3 Electoral Authoritarian
4    Minimally Democratic
5              Autocratic
6              Democratic
7 Electoral Authoritarian
8    Minimally DemocraticFinally, present the marginal effects using tinytable.1 The code below is quite messy, in your free time you can explore it in a more detail. For now, I want this code to be available to you. Add this line of code in the end: save_tt("predicted_values.html") to save the table on your machine.
library(tinytable)
model_int_groupped %>%
  group_by(International_conflict, Regime) %>%
  summarize(Import = paste0(round(Import, 3), "\n", "[", round(conf.low, 3), "; ", round(conf.high, 3), "]")) %>%
  pivot_wider(names_from = Regime,
              values_from = Import) %>%
  mutate(International_conflict = ifelse(International_conflict == 1, "Present", "Absent")) %>%
  rename(`International Conflict` = International_conflict) %>% 
  tt(note = "Adjusted for average log of GDP (=8.64)") %>%
  group_tt(j = list("Regime" = 2:5)) | Regime | ||||
|---|---|---|---|---|
| International Conflict | Autocratic | Democratic | Electoral Authoritarian | Minimally Democratic | 
| Adjusted for average log of GDP (=8.64) | ||||
| Absent | 308.569 [288.867; 328.271] | 213.408 [187.749; 239.066] | 156.727 [114.755; 198.7] | 90.109 [-7.604; 187.822] | 
| Present | 477.196 [423.804; 530.587] | 382.034 [322.218; 441.851] | 325.354 [258.224; 392.484] | 258.736 [146.761; 370.71] | 
Check List
 I know how to use GGally to analyze pairs of variables
 I can plot complex models with ggeffects library
 I know how to save modelsummary() output to my machine
I know I need to include \(\alpha\) values and stars in my model
I have an intuition regarding predicted values and marginal effects
Footnotes
- Great example of research presenting the results in a similar manner can be found in Reuter, O.J. and Szakonyi, D., 2015. Online social media and political awareness in authoritarian regimes. British Journal of Political Science, 45(1), pp.29-51.↩︎