Joins Example

Merging Datasets in R

Devtools in R

Sometimes it is helpful to use versions of packages that are still under development. These are usually not published on CRAN, so install.packages() alone cannot fetch them, but you can install them from GitHub. To simplify this process, you need a special package called devtools.

install.packages("devtools")

Now, install the vdemdata package. This way we’ll be able to load the most current V-Dem dataset directly into R.

devtools::install_github("vdeminstitute/vdemdata")
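
When a script is rerun, it can be convenient to skip installation if a package is already present. The following is a minimal sketch of that idea. The helper names pkg_available() and install_vdemdata_if_missing() are hypothetical (they are not part of devtools), and the check relies only on base R's requireNamespace().

```r
# Hypothetical helper: TRUE if the package can be loaded, FALSE otherwise.
pkg_available = function(pkg) {
  isTRUE(requireNamespace(pkg, quietly = TRUE))
}

# Hypothetical wrapper: install devtools and vdemdata only when missing.
install_vdemdata_if_missing = function() {
  if (!pkg_available("devtools")) {
    install.packages("devtools")
  }
  if (!pkg_available("vdemdata")) {
    devtools::install_github("vdeminstitute/vdemdata")
  }
}

# Call it once at the top of a script:
# install_vdemdata_if_missing()
```

The call is left commented out so that nothing is installed until you decide to run it.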

Let’s test it. If the code below prints the first rows of vdem, the dataset is available. Keep this approach in mind for installing other packages that have not been released yet.

library(tidyverse)
library(vdemdata)
vdem %>%
  select(country_name, year, histname, v2x_polyarchy) %>%
  head()
  country_name year                 histname v2x_polyarchy
1       Mexico 1789 Viceroyalty of New Spain         0.028
2       Mexico 1790 Viceroyalty of New Spain         0.028
3       Mexico 1791 Viceroyalty of New Spain         0.028
4       Mexico 1792 Viceroyalty of New Spain         0.028
5       Mexico 1793 Viceroyalty of New Spain         0.028
6       Mexico 1794 Viceroyalty of New Spain         0.028

Exploring Data

We are working with the SIPRI Arms Transfers Database, which contains information on all transfers of major conventional arms. The variables are:

  • Recipient of arms

  • Year of the transfer

  • Import of arms

  • Regime: a V-Dem variable for political regime

sipri = read.csv("data/sipri.csv")

Let’s look at the first few rows.

head(sipri)
  Recipient Year Import                  Regime
1     India 1950    141              Autocratic
2     India 1951    277 Electoral Authoritarian
3     India 1952    104    Minimally Democratic
4     India 1953    430    Minimally Democratic
5     India 1954    265    Minimally Democratic
6     India 1955    350    Minimally Democratic

Now, subset some variables from V-Dem. We are choosing the following variables:

  • country_name

  • year of the coded data

  • e_gdp: GDP of a country

  • e_miinteco: Armed conflict, international

  • e_miinterc: Armed conflict, internal

vdem_variables = vdem %>%
  select(country_name, year, e_gdp, e_miinteco, e_miinterc)

Let’s print the first few observations.

head(vdem_variables)
  country_name year    e_gdp e_miinteco e_miinterc
1       Mexico 1789 1914.148          0          0
2       Mexico 1790 1923.035          0          0
3       Mexico 1791 1957.039          0          0
4       Mexico 1792 1989.183          0          0
5       Mexico 1793 2018.233          0          0
6       Mexico 1794 2041.574          0          0
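
A join on country and year behaves predictably only if each country-year pair appears once in the table being joined in. The sketch below shows one way to check this with base R. It uses a small made-up data frame (toy_vdem and its values are invented for illustration, not real V-Dem data) with one deliberately duplicated key.

```r
# Toy stand-in for vdem_variables; all values are invented.
toy_vdem = data.frame(
  country_name = c("India", "India", "Mexico", "Mexico"),
  year         = c(1950, 1951, 1950, 1950),  # Mexico 1950 appears twice on purpose
  e_gdp        = c(100, 110, 200, 205)
)

# Keep only the key columns.
keys = toy_vdem[, c("country_name", "year")]

# anyDuplicated() gives the index of the first repeated key, or 0 if there is none.
first_dup = anyDuplicated(keys)
first_dup

# Show every row whose key occurs more than once.
dup_rows = toy_vdem[duplicated(keys) | duplicated(keys, fromLast = TRUE), ]
dup_rows
```

Running the same check on vdem_variables before merging would flag any repeated country-year pairs.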

Merging Datasets

Note the syntax below. We are joining the two data frames by two variables, Recipient and Year, but in the V-Dem data these are named country_name and year. The named vector passed to by maps each SIPRI column to its V-Dem counterpart.

sipri_vdem = sipri %>%
  left_join(vdem_variables, by = c("Recipient" = "country_name", 
                                   "Year" = "year"))

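Since we are using left_join(), every row of sipri (the left table) is kept, and the V-Dem columns are added to the right of the SIPRI columns. Rows without a match in vdem_variables get NA in those columns.

Check the result: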

head(sipri_vdem)
  Recipient Year Import                  Regime     e_gdp e_miinteco e_miinterc
1     India 1950    141              Autocratic  98082.17          0          0
2     India 1951    277 Electoral Authoritarian  98714.66          0          0
3     India 1952    104    Minimally Democratic 100562.77          0          0
4     India 1953    430    Minimally Democratic 103797.20          0          0
5     India 1954    265    Minimally Democratic 106489.29          0          0
6     India 1955    350    Minimally Democratic 109680.55          1          1
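
Country names are often spelled differently across sources, so some rows may find no match. The sketch below shows, on made-up data, how a left join treats such rows and how anti_join() from dplyr can list them. The toy tables, their values, and the recipient "Atlantis" are invented for illustration.

```r
library(dplyr)

# Toy tables; names and values are invented.
toy_sipri = data.frame(
  Recipient = c("India", "India", "Atlantis"),
  Year      = c(1950, 1951, 1950),
  Import    = c(1, 2, 3)
)
toy_vdem = data.frame(
  country_name = c("India", "India"),
  year         = c(1950, 1951),
  e_gdp        = c(10, 20)
)

# Named vector: left-hand column name = right-hand column name.
keys = c("Recipient" = "country_name", "Year" = "year")

# left_join() keeps all rows of toy_sipri; the unmatched row gets NA in e_gdp.
toy_joined = left_join(toy_sipri, toy_vdem, by = keys)
toy_joined

# anti_join() returns only the rows of toy_sipri without a match in toy_vdem.
toy_unmatched = anti_join(toy_sipri, toy_vdem, by = keys)
toy_unmatched
```

Applying anti_join() to sipri and vdem_variables with the same keys would show which recipients or years are missing from the V-Dem subset.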

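Now we can save the data in RDS format. saveRDS() takes the object to save as its first argument and the file path as its second.

saveRDS(sipri_vdem, "sipri_vdem.RDS")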
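
An RDS file stores a single R object, which readRDS() restores later. The following is a minimal round-trip sketch, assuming a made-up data frame (toy) and a temporary file path so that no real file is overwritten.

```r
# Toy data frame; values are invented.
toy = data.frame(Recipient = c("India", "India"), Year = c(1950, 1951))

# Write to a temporary file: object first, path second.
path = tempfile(fileext = ".RDS")
saveRDS(toy, file = path)

# Read it back and compare with the original.
restored = readRDS(path)
same = identical(toy, restored)
same
```

Unlike a CSV file, RDS preserves column types, so the data does not need to be re-parsed when it is loaded again.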