...Data Management with GitHub
Week 6
Before we start
Any questions?
Today's class is different: we will go through the review on the website.
Let's quickly review joins in R. Exercises are adapted from here.
Create a sample dataset with information on how far the students live from campus. Why are we using set.seed()? Do you remember what the runif() function does?
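As a reminder of how these two functions fit together, here is a minimal sketch. The column names (student_id, distance_km), the number of students, and the range of distances are assumptions made for illustration, not the values from the original exercise.

```r
# set.seed() fixes the state of the random number generator, so everyone
# who runs this code gets the same "random" numbers.
set.seed(123)

# Hypothetical sample data: 10 students and their distance from campus.
# runif(n, min, max) draws n values from a uniform distribution on [min, max].
student_distance <- data.frame(
  student_id  = 1:10,
  distance_km = round(runif(10, min = 0.5, max = 20), 1)
)

student_distance
```

Re-running the whole block gives the same distances every time; removing the set.seed() line makes them change from run to run.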
Create another data frame with information on how students get to school.
First, take the student_distance dataset and use left_join() to merge in the student_transport data frame.
Now, use full_join(). What is the difference?
Take student_transport and use left_join() to merge in student_distance. What is the difference? How would you choose between the two?
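The three joins above can be compared side by side in a small sketch. It assumes two tiny hypothetical data frames that share a student_id key and only partly overlap (ids 1 to 5 versus ids 3 to 7), so the differences show up as NA values; the exercise data may look different.

```r
library(dplyr)

set.seed(123)

# Hypothetical data: students 1-5 have a recorded distance ...
student_distance <- data.frame(
  student_id  = 1:5,
  distance_km = round(runif(5, min = 0.5, max = 20), 1)
)

# ... while students 3-7 have a recorded mode of transport.
student_transport <- data.frame(
  student_id = 3:7,
  transport  = c("bus", "bike", "car", "walk", "bus")
)

# Keeps every row of student_distance (ids 1-5); transport is NA for ids 1-2.
left_dist <- left_join(student_distance, student_transport, by = "student_id")

# Keeps every row of both data frames (ids 1-7), with NA where data is missing.
full_both <- full_join(student_distance, student_transport, by = "student_id")

# Keeps every row of student_transport (ids 3-7); distance_km is NA for ids 6-7.
left_trans <- left_join(student_transport, student_distance, by = "student_id")
```

In a left join, the data frame you start with decides which rows survive, so the choice depends on which set of students you need to keep complete.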
Agenda
Working with GitHub
Covering extra topics (if time allows!)
GitHub
You already know what GitHub is! But GitHub Desktop is a bit more intuitive than working with Git in the RStudio terminal.
Download GitHub Desktop. Then, clone the repository we are working with today. Let's do it together, step by step (the same logic applies to creating a new repository or linking a local folder to a GitHub repository).
Install GitHub Desktop
Open GitHub Desktop → File → Clone repository
URL → Paste the URL → choose local folder → Clone
Check List
Optional Exercises
Download this script.
Load the tidyverse and patchwork libraries.
Move the CSES_MARPOR.rds file that you downloaded from GitHub to your data folder. Load it into R.
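A minimal sketch of the loading step, assuming the folder is called data and sits inside your current working directory. The object name cses is a hypothetical choice.

```r
# Path to the file, assuming a "data" folder inside the working directory.
path <- file.path("data", "CSES_MARPOR.rds")

# readRDS() restores a single R object that was saved with saveRDS().
if (file.exists(path)) {
  cses <- readRDS(path)
  str(cses)  # quick look at the variables and their types
} else {
  message("File not found: ", path, " (check your location with getwd())")
}
```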
Draw a histogram of the opportunity variable. Save it to the p1 object.
Draw a histogram of the loyalty variable. Save it to the p2 object.
Display both graphs (p1 and p2) side by side using the + operator from the patchwork library.
Customize the graphs (add axis labels and a title, and change the theme to theme_bw()), then display the two graphs together again.
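The plotting steps can be sketched as follows. So that the code runs without the course file, it uses a simulated stand-in data frame with arbitrary uniform values; only the variable names opportunity and loyalty come from the exercise. The object name cses, the number of bins, and the label texts are assumptions.

```r
library(ggplot2)
library(patchwork)

# Stand-in data so the sketch runs on its own; in the exercise, use the
# data frame loaded from CSES_MARPOR.rds instead.
set.seed(1)
cses <- data.frame(
  opportunity = runif(200),
  loyalty     = runif(200)
)

# One histogram per variable, each stored in its own object.
p1 <- ggplot(cses, aes(x = opportunity)) +
  geom_histogram(bins = 20) +
  labs(title = "Opportunity", x = "Opportunity", y = "Count") +
  theme_bw()

p2 <- ggplot(cses, aes(x = loyalty)) +
  geom_histogram(bins = 20) +
  labs(title = "Loyalty", x = "Loyalty", y = "Count") +
  theme_bw()

# With patchwork loaded, + places the two ggplot objects next to each other.
combined <- p1 + p2
combined
```

Dropping the labs() and theme_bw() lines gives the uncustomized version of the same two plots.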