Data Management with GitHub

Week 6

Published: October 24, 2025

Before we start

  • Any questions?

  • Today's class is different: we will go through the review on the course website.

Review

Let’s quickly review joins in R. Exercises are adapted from here.

Create a sample dataset with information on how far the students live from campus. Why are we using set.seed()? Do you remember what the runif() function does?
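A minimal sketch of such a dataset, assuming 10 students, a seed of 1234, and the column names student_id and distance_km (all hypothetical choices, not the original exercise code). set.seed() fixes the starting point of R's random number generator, so the "random" distances are identical every time the script runs; runif(n, min, max) draws n values from a uniform distribution between min and max.

```r
# set.seed() fixes the starting point of R's random number generator,
# so runif() returns the same "random" values on every run
set.seed(1234)  # hypothetical, arbitrary seed value

# Hypothetical sample of 10 students: runif(n, min, max) draws n values
# uniformly between min and max (here 0 to 30, assumed to be km)
student_distance <- data.frame(
  student_id  = 1:10,
  distance_km = round(runif(10, min = 0, max = 30), 1)
)

head(student_distance)
```

Rerunning the whole script reproduces exactly the same distances; removing the set.seed() line gives new values on each run.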

Create another data frame with information on how students get to school.
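A minimal sketch of the second table, assuming the hypothetical columns student_id and transport. The IDs are deliberately chosen to overlap only partly with a distance table covering students 1 to 10, so that different joins give visibly different results.

```r
# Hypothetical transport data. Compared with a distance table for
# students 1-10, students 9 and 10 are missing here, and students
# 11 and 12 appear only in this table
student_transport <- data.frame(
  student_id = c(1:8, 11, 12),
  transport  = c("bus", "bike", "walk", "car", "bus",
                 "walk", "bike", "bus", "car", "walk")
)

# Quick check of how many students use each mode of transport
table(student_transport$transport)
```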

First, take the student_distance dataset and use left_join() to merge in the student_transport data frame.
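A minimal sketch of this step with dplyr. To keep the block self-contained and its output predictable, it defines small, hypothetical versions of both tables with fixed values (no randomness); the column names are assumptions.

```r
library(dplyr)

# Small hypothetical versions of both tables with fixed values
student_distance <- data.frame(
  student_id  = c(1, 2, 3, 4, 5),
  distance_km = c(2.5, 12.0, 7.3, 25.1, 0.8)
)
student_transport <- data.frame(
  student_id = c(1, 2, 3, 6),
  transport  = c("bus", "bike", "walk", "car")
)

# left_join() keeps every row of the left table (student_distance) and
# adds transport where student_id matches; students 4 and 5 get NA,
# and student 6 (only in student_transport) is dropped
dist_left <- left_join(student_distance, student_transport, by = "student_id")
dist_left
```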

Now, use full_join(). What is the difference?
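A minimal sketch for comparison, using the same small, hypothetical tables with fixed values so the block runs on its own.

```r
library(dplyr)

# Same hypothetical tables as a left_join() comparison would use
student_distance <- data.frame(
  student_id  = c(1, 2, 3, 4, 5),
  distance_km = c(2.5, 12.0, 7.3, 25.1, 0.8)
)
student_transport <- data.frame(
  student_id = c(1, 2, 3, 6),
  transport  = c("bus", "bike", "walk", "car")
)

# full_join() keeps every row from BOTH tables: students 4 and 5 get
# NA for transport, and student 6 is kept with NA for distance_km
full_all <- full_join(student_distance, student_transport, by = "student_id")
full_all
```

With these inputs the left join returns 5 rows while the full join returns 6, because nothing from either table is dropped.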

Now take student_transport and use left_join() to merge in student_distance. What is the difference? How would you decide which table to put first?
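A minimal sketch of the reversed join, again with small, hypothetical tables so the block is self-contained. Swapping the order changes which table's rows are guaranteed to survive.

```r
library(dplyr)

# Hypothetical tables with fixed values
student_distance <- data.frame(
  student_id  = c(1, 2, 3, 4, 5),
  distance_km = c(2.5, 12.0, 7.3, 25.1, 0.8)
)
student_transport <- data.frame(
  student_id = c(1, 2, 3, 6),
  transport  = c("bus", "bike", "walk", "car")
)

# Now student_transport is the left table: all of its rows are kept
# (student 6 gets NA for distance_km), while students 4 and 5,
# who appear only in student_distance, are dropped
trans_left <- left_join(student_transport, student_distance, by = "student_id")
trans_left
```

One way to choose: put the table that defines your unit of analysis on the left, so every observation you care about stays in the result, and use full_join() only when no observation from either side should be lost.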

Agenda

  • Working with GitHub

  • Covering extra topics (if time allows!)

GitHub

You already know what GitHub is! However, GitHub Desktop is a bit more intuitive than working with Git in the RStudio terminal.

Download GitHub Desktop, then clone the repository we are working with today. Let’s do it together, step by step (the same logic applies to creating a new repository or linking a local folder to a GitHub repository).

  • Install GitHub Desktop

  • Open GitHub Desktop → File → Clone repository

  • URL → paste the repository URL → choose a local folder → Clone

Figures: GitHub Desktop; Repository; Check List

Optional Exercises

Download this script.

Load the tidyverse and patchwork libraries.

Solution
...

Move the CSES_MARPOR.rds file that you downloaded from GitHub to your data folder, then load it into R.

Solution
cses = readRDS("data/CSES_MARPOR.rds")

Draw a histogram of the opportunity variable. Save it to the p1 object.

Solution
p1 = ...

Draw a histogram of the loyalty variable. Save it to the p2 object.

Solution
p2 = ...

Display both graphs (p1 and p2) side by side using the + operator from the patchwork library.

Solution
... + p2

Customize the graphs (add axis labels and titles, and change the theme to theme_bw()), then display the two graphs together again.

Solution
...