
Lesson Objectives

  • Understand how MA plots can be useful to explore the results of differential gene expression tests
  • Understand how scaling data can help in visualising gene expression trends

Setup

In your project’s directory, create a new script called 04_gene_clustering.R, and start with the following code:

##### setup ####

# load packages
library(tidyverse)

# read the data
trans_cts <- read_csv("./data/counts_transformed.csv")
sample_info <- read_csv("./data/sample_info.csv")
test_result <- read_csv("./data/test_result.csv")

MA plots

Before we start clustering genes, it’s useful to reduce the dimensionality of the data and work only with those genes that are likely to show some difference in expression.

We were given a table with the results of a differential analysis test between 0 min and each of the other time-points:

## # A tibble: 30,055 x 8
##    gene        baseMean log2FoldChange lfcSE   stat pvalue  padj comparison
##    <chr>          <dbl>          <dbl> <dbl>  <dbl>  <dbl> <dbl>      <dbl>
##  1 SPAC212.11      8.55         1.54   0.497  1.09  0.276      1         15
##  2 SPAC212.09c    50.8          0.399  0.273  0     1          1         15
##  3 SPAC212.04c    38.3         -0.0230 0.269  0     1          1         15
##  4 SPNCRNA.601     9.47        -0.0841 0.483  0     1          1         15
##  5 SPAC977.11     70.4         -0.819  0.201  0     1          1         15
##  6 SPAC977.13c    36.7          1.19   0.344  0.552 0.581      1         15
##  7 SPAC977.15     49.1          0.600  0.208  0     1          1         15
##  8 SPAC977.16c    83.2          0.148  0.239  0     1          1         15
##  9 SPNCRNA.607    60.4          0.0638 0.268  0     1          1         15
## 10 SPAC1F8.06     74.2         -1.58   0.298 -1.94  0.0520     1         15
## # … with 30,045 more rows

The padj column contains the p-value of each comparison, adjusted for multiple testing. We can therefore use this table to help us focus on a subset of potentially interesting genes.
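As a minimal sketch of how such a subset could be extracted, the example below filters a small made-up table that mimics the columns of test_result. The toy values, the object names toy_result and candidate_genes, and the 0.01 cut-off are assumptions chosen for illustration, not part of the lesson's data.

```r
library(dplyr)

# toy stand-in for test_result (values are made up for illustration)
toy_result <- tibble(
  gene           = c("geneA", "geneB", "geneC", "geneD"),
  log2FoldChange = c(2.1, -0.3, -1.8, 0.9),
  padj           = c(0.001, 0.8, 0.004, NA),
  comparison     = c(15, 15, 30, 30)
)

# keep genes whose adjusted p-value is below a hypothetical threshold of 0.01
# (filter() also drops rows where padj is missing)
candidate_genes <- toy_result %>%
  filter(padj < 0.01) %>%
  pull(gene) %>%
  unique()

candidate_genes
```

With the real data, the same pipeline could be applied to test_result directly; the appropriate threshold depends on how stringent you want the selection to be.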

One way to visualise the differences between T0 and the other time points is with an MA plot, which shows each gene’s average expression plotted against its log fold-change in expression between the two conditions being compared:


Exercise:

Try to create the MA plot by yourself.

Why is the fold-change reported on a log-scale?

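If you get stuck, here is one possible approach, sketched on a small made-up table rather than the real test_result. The toy values, the names toy_result, ma_data and ma_plot, the 0.01 cut-off used for colouring, and the choice of a log2 x-axis are all assumptions made for illustration.

```r
library(dplyr)
library(ggplot2)

# toy stand-in for test_result (values are made up for illustration)
toy_result <- tibble(
  gene           = rep(c("geneA", "geneB", "geneC", "geneD"), times = 2),
  baseMean       = rep(c(8.5, 50.8, 120.3, 900.1), times = 2),
  log2FoldChange = c(1.5, 0.4, -0.1, -0.8, 3.2, 0.2, -2.5, 1.9),
  padj           = c(0.2, 1, 1, 0.03, 0.0001, 0.9, 0.002, 0.004),
  comparison     = rep(c(15, 30), each = 4)
)

# flag genes below a hypothetical significance threshold of 0.01
ma_data <- toy_result %>%
  mutate(sig = !is.na(padj) & padj < 0.01)

# average expression (x) against log2 fold-change (y), one panel per comparison
ma_plot <- ggplot(ma_data, aes(x = baseMean, y = log2FoldChange, colour = sig)) +
  geom_point(alpha = 0.5) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  scale_x_continuous(trans = "log2") +
  facet_wrap(vars(comparison)) +
  labs(x = "Mean expression", y = "log2 fold-change", colour = "padj < 0.01")

ma_plot
```

Faceting by comparison keeps each time point in its own panel, which makes it easier to compare the spread of fold-changes between them.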


The plot above agrees with our PCA, which showed that cells from T30 were overall quite distinct from T0 cells in their transcriptome.

From the coloured clouds of points, we can also see that more genes seem to markedly increase than decrease their expression relative to T0.
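A visual impression like this can be checked by counting. The sketch below tallies up- and down-regulated genes per comparison on a small made-up table; the toy values, the names toy_result and direction_counts, and the 0.01 cut-off are assumptions for illustration only.

```r
library(dplyr)

# toy stand-in for test_result (values are made up for illustration)
toy_result <- tibble(
  gene           = rep(c("geneA", "geneB", "geneC"), times = 2),
  log2FoldChange = c(2.1, -0.3, -1.8, 3.0, 1.5, -2.2),
  padj           = c(0.001, 0.8, 0.004, 0.0001, 0.002, 0.003),
  comparison     = rep(c(15, 30), each = 3)
)

# among genes passing a hypothetical threshold, count the direction of change
direction_counts <- toy_result %>%
  filter(padj < 0.01) %>%
  mutate(direction = if_else(log2FoldChange > 0, "up", "down")) %>%
  count(comparison, direction)

direction_counts
```

Running the same pipeline on test_result would give the number of genes going up and down at each time point relative to T0.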