Introduction to R - Part III Systems Biology
This is a general introduction to R for data analysis.
Our practicals will be very hands-on, focusing on learning the necessary syntax
to allow you to do data analysis in R, from data manipulation to visualisation.
We will focus on tabular data, which is general enough to allow you to apply
these skills to a wide range of problems.
If you have any queries, please contact Hugo Tavares (hugo.tavares@slcu.cam.ac.uk) or post an issue on our GitHub repository.
Setup
All necessary software and data will be available on the training machines at the Bioinformatics Training Room (Craik-Marshall Building).
However, you are welcome to use your own laptop, in which case you need to:
- Download and install R (here)
- Download and install RStudio (here)
- Open RStudio, go to Tools > Install Packages, and paste this into the “Packages” field: tidyverse
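
If you prefer the R console to the menu, the last step can also be done with a command. This is a minimal sketch, assuming your laptop has internet access and can reach a CRAN mirror; the `requireNamespace()` check is our addition so that the package is not reinstalled if it is already there.

```r
# Install the tidyverse collection of packages only if it is not already present
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse")
}

# Load the core tidyverse packages (ggplot2, dplyr, tidyr, readr, ...)
library(tidyverse)
```

If `library(tidyverse)` runs without an error, the installation worked and you are ready for the practicals.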
INT1.23 / INT1.26 / INT1.30
These lessons will cover the basics of using R with RStudio.
INT1.33
Further resources
Reference Books
- Holmes S, Huber W, Modern Statistics for Modern Biology - covers many aspects of data analysis relevant for biology/bioinformatics from statistical modelling to image analysis.
- Peng R, Exploratory Data Analysis with R - a more general introduction to exploratory data analysis techniques.
- Grolemund G & Wickham H, R for Data Science - a good follow-up to this course if you want to learn more about tidyverse packages.
- Wickham H, Advanced R - more advanced topics in how R works internally, how object-oriented programming works, etc.
- McElreath R, Statistical Rethinking - an introduction to statistical modelling and inference using R (a more advanced topic, but written in a way that is accessible to non-statisticians). Also see the lecture materials.
- Barabási A, Network Science - an introduction to network analysis (a general book, not focused on R, but see the igraph package).
- James G, Witten D, Hastie T & Tibshirani R, Introduction to Statistical Learning - an introductory book about machine learning using R (also an advanced topic).
- Lovelace R, Nowosad J, Muenchow J, Geocomputation with R - using R for visualisation and analysis of spatial data.
Other Languages and Packages
- For fast data frame manipulation and calculations, especially for very large data (millions of observations), see the data.table package.
- For bioinformatic applications, see the Bioconductor repository - these packages typically have excellent documentation in the form of vignettes. See these materials for an overview of the Bioconductor project.
- For data analysis in Python:
  - Much of the base R functionality for manipulating data can be achieved in Python using a combination of these packages: pandas (data frames), matplotlib (plotting), numpy (matrices/arrays)
  - For a ggplot2-like package see plotnine
  - For a dplyr-like package see siuba
  - For high-performance data frame operations see datatable
Other Topics
Acknowledgements
Many of these materials were inspired/developed by the Data Carpentry organisation.