Introduction to data analysis with R

1, 2 & 4 July 2019, Cambridge University Bioinformatics Training

Instructors: Hugo Tavares & Sandra Cortijo

This is a general introduction to R for exploratory data analysis.

Our practicals will be very hands-on, focusing on the syntax you need to carry out exploratory data analysis in R, from data manipulation to visualisation. We will focus on tabular data, which is general enough to allow you to apply these skills to a wide range of problems. On the third day we will go through a more complex example using transcriptomic data.

Below, we provide links to detailed materials for your reference, many of which were developed by the Data Carpentry organisation.

If you have any questions please post a new issue on our GitHub repository.

Setup

All necessary software and data will be available on the training machines at the Bioinformatics Training Room (Craik-Marshall Building).

However, you are welcome to use your own laptop, in which case you need to:

Download and install R (here)
Download and install RStudio (here)
Install the CRAN R packages tidyverse, corrplot, cowplot and ggfortify (open RStudio and go to Tools > Install Packages)
Install the Bioconductor R package ComplexHeatmap (instructions here)
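
If you prefer the R console to the RStudio menus, the package installation steps above can be sketched as follows. This is a minimal sketch, assuming a recent R version (3.5 or later), where Bioconductor packages are installed through the BiocManager package; the linked Bioconductor instructions remain the reference.

```r
# CRAN packages named in the setup list above
cran_packages <- c("tidyverse", "corrplot", "cowplot", "ggfortify")

# Install only the CRAN packages that are not already present
missing_cran <- setdiff(cran_packages, rownames(installed.packages()))
if (length(missing_cran) > 0) {
  install.packages(missing_cran)
}

# ComplexHeatmap is distributed through Bioconductor rather than CRAN,
# so it is installed with BiocManager (assumes R >= 3.5)
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("ComplexHeatmap")
```

Running `library(tidyverse)` and `library(ComplexHeatmap)` afterwards without an error is a quick check that the installation worked.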

Materials

Introduction to R (Mon)

This lesson will cover the basics of using R with RStudio and how to produce a wide range of graphs for data visualisation.

exercises

Data manipulation and visualisation in R (Tues)

This lesson will cover functions for manipulating and summarising tabular data, and will go further into data visualisation.
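
As a flavour of what manipulating and summarising a table looks like, here is a minimal sketch using dplyr (part of the tidyverse). The data frame, its column names and its values are made up for illustration and are not part of the course data.

```r
library(dplyr)

# Hypothetical example data: one row per plant measured
measurements <- data.frame(
  genotype  = c("WT", "WT", "WT", "mutant", "mutant", "mutant"),
  height_cm = c(10.2, 11.5, NA, 7.8, 8.4, 8.1)
)

height_summary <- measurements %>%
  filter(!is.na(height_cm)) %>%          # drop rows with a missing height
  group_by(genotype) %>%                 # one group per genotype
  summarise(n = n(),                     # number of plants kept per group
            mean_height_cm = mean(height_cm))

print(height_summary)
```

Each step takes a table and returns a table, which is why the steps can be chained with the `%>%` pipe.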

Data Organisation in Spreadsheets (Tues evening)

Digital data recording often starts with spreadsheet software (e.g. Excel). For an effective data analysis, it’s crucial to start with a well-structured and well-formatted dataset. Because of this, we will have a brief discussion about common issues that should be considered when recording data.

Download data for this lesson here
Find detailed materials here
- example of tidy data
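
One common spreadsheet issue is storing values (such as years) in column headers. The sketch below shows how such a table could be reshaped into a tidy one in R, with one row per observation. It assumes tidyr version 1.0.0 or later (for `pivot_longer()`), and the table itself is a made-up example rather than the lesson's dataset.

```r
library(tidyr)

# Hypothetical spreadsheet layout: one column per year,
# so the "year" variable is hidden in the column headers
wide <- data.frame(
  site   = c("A", "B"),
  `2017` = c(12, 30),
  `2018` = c(15, 28),
  check.names = FALSE   # keep "2017"/"2018" as column names
)

# Reshape so that every row is one site-year observation
tidy <- pivot_longer(wide, cols = -site,
                     names_to = "year", values_to = "count")
tidy$year <- as.integer(tidy$year)   # headers are read as text; convert to numbers

print(tidy)
```

The tidy version has three columns (`site`, `year`, `count`), which is the shape most tidyverse functions expect.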

Exploratory analysis of multivariate data (Thu)

In this session we will apply the concepts learned so far to a worked example of an exploratory data analysis of transcriptomic data.

Exploratory analysis of gene expression data

exercises
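
As a rough illustration of what an exploratory analysis of multivariate data can involve, the sketch below runs a principal component analysis with base R's `prcomp()`. This is not the course's worked example: the matrix is simulated, and the sample names, gene names and the artificial "treatment" shift are all assumptions made for illustration.

```r
set.seed(1)

# Simulated stand-in for expression data: 6 samples (rows) x 50 genes (columns)
n_genes <- 50
expr <- matrix(rnorm(6 * n_genes), nrow = 6,
               dimnames = list(paste0("sample", 1:6),
                               paste0("gene", 1:n_genes)))

# Shift the first 10 genes in samples 4-6 to mimic a treatment effect
expr[4:6, 1:10] <- expr[4:6, 1:10] + 3

# PCA on centred and scaled genes
pca <- prcomp(expr, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 2))

# Samples plotted on the first two components, coloured by simulated group
plot(pca$x[, 1], pca$x[, 2], xlab = "PC1", ylab = "PC2", pch = 19,
     col = rep(c("black", "red"), each = 3))
```

With real data, the samples would be the rows of a normalised expression matrix, and the plot is typically the first check of whether replicates group together.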

Further resources

One page summary of functions
Summary of R basics
Summary of dplyr functions and their equivalent in base R
Cheatsheets for dplyr, ggplot2 and more
- dplyr cheatsheet
- ggplot2 cheatsheet
Data-to-Viz website with great tips for choosing the right graphs for your data

Reference books:

Holmes S, Huber W, Modern Statistics for Modern Biology - covers many aspects of data analysis relevant for biology/bioinformatics from statistical modelling to image analysis.
Peng R, Exploratory Data Analysis with R - a more general introduction to exploratory data analysis techniques.
Grolemund G & Wickham H, R for Data Science - a good follow-up to this course if you want to learn more about tidyverse packages.
McElreath R, Statistical Rethinking - an introduction to statistical modelling and inference using R (a more advanced topic, but written in a way that is accessible to non-statisticians).
- Also see the lecture materials, which include access to the draft of the book’s second edition.
James G, Witten D, Hastie T & Tibshirani R, Introduction to Statistical Learning - an introductory book about machine learning using R (also an advanced topic).
- Also see this course material for a practical introduction to this topic.

Other courses at Cambridge:

List of scheduled courses
Some particular courses that might be of interest:
Note that you do not need to attend the “Intro to R” courses, because we’ve already covered that material in this course.