This lesson is still being designed and assembled (Pre-Alpha version)

Introduction to R/tidyverse for Exploratory Data Analysis

This lesson is a general introduction to R, a programming language for statistical computing, and will give you foundational skills for exploratory data analysis. The sessions are very hands-on, with a strong emphasis on data visualisation and manipulation using a collection of packages known as the tidyverse.

We will use a generic dataset from the Gapminder Foundation, but the skills we learn here apply to a wide range of datasets.

Along the way, we will learn not just R itself but also, importantly, the fundamental principles of exploratory data analysis. In that sense, R will be taught as a tool that gives us the freedom to ask a range of questions of our data in a reproducible manner. We will discuss topics such as how to critically evaluate the quality of our data, how to use it to answer specific questions, how to explore sources of variation, what makes a good visualisation, and how to deal with missing data.

Prerequisites

These lessons assume no prior knowledge of the skills or tools covered.

You will need a computer with a working copy of R and RStudio. Please make sure to install everything before working through this lesson.

Follow the instructions on the “Setup” tab to install the software and download the necessary data.

Schedule

| Time | Episode | Questions |
|-------|---------|-----------|
| | Setup | Download files required for the lesson |
| 00:00 | 1. Introduction to R and RStudio | How to set up and organise an analysis project? |
| | | How to interact with R and RStudio? |
| | | How to install packages? |
| 00:25 | 2. Basic objects and data types in R | What are the basic data structures and data types in R? |
| | | How can values be assigned to objects? |
| | | How can subsets be extracted from vectors? |
| | | How are missing values represented in R? |
| 00:55 | 3. Working with Tabular Data | How to import tabular data into R? |
| | | What kind of object stores tabular data? |
| | | How to investigate the contents of such an object (types of variables and missing data)? |
| 01:55 | 4. Data visualisation with `ggplot2` | How to build a graph in R? |
| | | What types of visualisation are suitable for different types of data? |
| 03:15 | 5. Manipulating variables (columns) with `dplyr` | How to select and/or rename specific columns from a data frame? |
| | | How to create a new column or modify an existing one? |
| | | How to ‘chain’ several commands together with pipes? |
| 04:05 | 6. Manipulating observations (rows) with `dplyr` | How to order rows in a table? |
| | | How to retain only unique rows (no duplicates)? |
| | | How to identify observations of a dataset that fulfil certain conditions? |
| 05:10 | 7. Grouped operations using `dplyr` | How to calculate summary statistics from a dataset? |
| | | How to apply those summaries to groups within the data? |
| | | How to apply other data manipulation steps to groups within the data? |
| 06:30 | 8. Working with categorical data + Saving data | How to fix common typos in character variables? |
| | | How to reorder values in ordinal categorical variables? |
| | | How to save data into a file? |
| 07:15 | 9. Joining tables | How to join different tables together? |
| | | How to identify mismatches between tables? |
| 07:50 | 10. Data reshaping: from wide to long and back | How to change the shape of a table from a ‘wide’ to a ‘long’ format? |
| | | When is one or the other format more suitable for analysis? |
| 08:25 | 11. Data visualisation with `ggplot2` - part II | How can we fully customise a plot by adding annotations and labels, controlling the axis limits, and changing its overall look? |
| | | How can we compose several plots together? |
| 09:45 | 12. Extra practice exercises | How to apply the tools and concepts learned to new data? |
| 09:45 | Finish | |

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.