In this lesson, we will apply some of the skills that we’ve gained so far to manipulate and explore a dataset from an RNAseq experiment.
This lesson uses data from an experiment included in the fission R/Bioconductor package. Very briefly, we have transcriptome data for:
Let’s say that you did this experiment yourself, and that a bioinformatician analysed it and provided you with four files of data:
- sample_info.csv: information about each sample.
- counts_raw.csv: "raw" read counts for all genes, which give a measure of each gene's expression (these are simply scaled to the size of each library, to account for the fact that samples differ in their total number of reads).
- counts_transformed.csv: normalised read counts for all genes, on a log scale and transformed to correct for a dependency between the mean and the variance (this dependency is typical of count data, and we will look at it in the exploratory data analysis lesson).
- test_result.csv: results from a statistical test that assessed the probability of observed expression differences.

The data are provided as CSV files, which you can download and read into your R session.
- Create a new project (File > New Project...).
- Create a directory called scripts.
- Create a new script called 01_prepare_data.R.

Use the code below to download the data into a new directory called "data":
# Create a "data" directory
dir.create("data")

# Download the data provided by your collaborator
# using a for loop to automate this step
for (i in c("counts_raw.csv", "counts_transformed.csv", "sample_info.csv", "test_result.csv")) {
  download.file(
    url = paste0("https://github.com/tavareshugo/data-carpentry-rnaseq/blob/master/data/", i, "?raw=true"),
    destfile = paste0("data/", i)
  )
}
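If you re-run the script, the loop above downloads every file again. As a minimal sketch of one way to avoid that, the code below builds the URLs and destination paths in a single vectorised step and only fetches files that are not yet on disk. The helper `download_missing()` is a hypothetical name introduced here for illustration; it is not part of the lesson or of any package.

```r
# File names as listed in the lesson
files <- c("counts_raw.csv", "counts_transformed.csv",
           "sample_info.csv", "test_result.csv")

# Build source URLs and local destination paths in one vectorised step
base_url <- "https://github.com/tavareshugo/data-carpentry-rnaseq/blob/master/data/"
urls <- paste0(base_url, files, "?raw=true")
dest <- file.path("data", files)

# Hypothetical helper (an assumption of this sketch, not part of the lesson):
# download only the files that do not exist locally yet
download_missing <- function(urls, dest) {
  # Create the target directory if needed, without warning when it exists
  dir.create(unique(dirname(dest)), showWarnings = FALSE)
  missing <- !file.exists(dest)
  for (j in which(missing)) {
    download.file(url = urls[j], destfile = dest[j], mode = "wb")
  }
  # Invisibly return the paths that had to be downloaded
  invisible(dest[missing])
}
```

Calling `download_missing(urls, dest)` would then fetch whatever is absent, and `file.exists(dest)` gives a quick check that all four files are in place.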
Finally, load the tidyverse package and do the exercises below:
# load the package
library(tidyverse)
Exercise:
Import data into R and familiarise yourself with it.
Create four objects called raw_cts, trans_cts, sample_info and test_result.
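One possible way to approach this exercise is sketched below, assuming the four files were downloaded into the "data" directory as described earlier. It uses `read_csv()` from readr, which is attached by `library(tidyverse)`. The `file.exists()` guard is an addition for this sketch only, so that nothing errors if the download step was skipped.

```r
library(readr)  # attached automatically by library(tidyverse)

# Object names from the exercise, mapped to the downloaded files
paths <- c(
  raw_cts     = "data/counts_raw.csv",
  trans_cts   = "data/counts_transformed.csv",
  sample_info = "data/sample_info.csv",
  test_result = "data/test_result.csv"
)

# Guard added for this sketch: only read when all files are present
if (all(file.exists(paths))) {
  raw_cts     <- read_csv(paths[["raw_cts"]])
  trans_cts   <- read_csv(paths[["trans_cts"]])
  sample_info <- read_csv(paths[["sample_info"]])
  test_result <- read_csv(paths[["test_result"]])

  # First look: table dimensions and the first few rows
  print(dim(raw_cts))
  print(head(sample_info))
}
```

Printing each object, or calling `dim()` and `head()` on it as above, is a reasonable first step for getting familiar with what each table contains.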