This lesson is still being designed and assembled (Pre-Alpha version)

Introduction to R and RStudio

Overview

Teaching: 20 min
Exercises: 5 min
Questions
  • How to set up and organise an analysis project?

  • How to interact with R and RStudio?

  • How to install packages?

Objectives
  • Navigate the RStudio GUI.

  • Create a new R project.

  • Write your first script and run some basic commands.

  • Install additional packages.

What is R? What is RStudio?

The term “R” is used to refer to both the programming language and the software that interprets the scripts written using it.

RStudio is a separate piece of software that makes it easier to interact with R by providing tools such as a script editor, a console and a file browser. RStudio needs R to function, so both must be installed on your computer.

Some advantages of using R for your data analysis include:

The RStudio Interface

RStudio is divided into four “panes”, illustrated below. The default layout is:

[Figure: RStudio at startup, showing the default pane layout]

Getting set up

It is good practice to keep a set of related data, analyses, and text self-contained in a single folder called the working directory (usually a folder on your computer where you have all the files relating to a project you’re working on). The working directory is an important concept to understand. It is the place where R will look for and save files.

All of the scripts within this folder can then use relative paths to files. Relative paths indicate where inside the project a file is located (as opposed to absolute paths, which point to where a file is on a specific computer). Working this way makes it a lot easier to move your project around on your computer and share it with others without having to directly modify file paths in the individual scripts.
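As a small illustration of the difference, the sketch below builds a relative path to one of the workshop data files and contrasts it with an absolute path. The absolute path (including the user name) is made up for illustration and will not exist on your computer.

```r
# Relative path: describes where the file sits inside the project folder
relative_path <- file.path("data", "raw", "gapminder2010_socioeconomic.csv")
relative_path
# [1] "data/raw/gapminder2010_socioeconomic.csv"

# Absolute path: hypothetical example, valid on one specific computer only
absolute_path <- "/home/alice/r-workshop/data/raw/gapminder2010_socioeconomic.csv"

# Both name the same file, but only the relative one travels with the project
basename(relative_path) == basename(absolute_path)
```

The relative path keeps working when the r-workshop folder is moved or shared, as long as the working directory is the project folder.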

RStudio provides a helpful set of tools to do this through its Projects interface, which not only creates a working directory for you but also remembers its location (allowing you to quickly navigate to it). The interface also preserves custom settings and open files to make it easier to resume work after a break.

Creating a new project

Usually, you will already have a folder on your computer for your project, for example with some data you collected or downloaded from the web.

In the setup page, we gave instructions to download the data we’ll be working with to a directory called r-workshop. At the moment, your directory should schematically look like:

r-workshop/
└── data
    └── raw
        ├── gapminder1960to2010_socioeconomic.csv
        └── gapminder2010_socioeconomic.csv

To create an R Project within the r-workshop directory, go to File > New Project..., choose Existing Directory, and select the r-workshop folder.

From now on, whenever you want to work on this project, open the Rproj file that was created in your r-workshop folder.

This will ensure your working directory is automatically set correctly. This also means that you can move the project folder to a different location or even different computer. As long as you open the Rproj file, your working directory will be set correctly.

If you need to check your working directory, you can run getwd() in the console. If your working directory is not what it should be, you can change it in the RStudio interface: navigate in the file browser (bottom-right panel) to where your working directory should be, then click the blue gear icon (More) and choose Set As Working Directory. Alternatively, you can run setwd("/path/to/working/directory") in the console to reset your working directory. However, your scripts should not include this line, because it will fail on someone else’s computer.
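A quick way to confirm that the working directory is set as expected is sketched below. It assumes the data files were downloaded to data/raw as described in the setup instructions; the output of the first two commands depends on your computer.

```r
# Print the current working directory (the output depends on your computer)
getwd()

# List what R can see at the top level of the working directory
list.files()

# Check that a file is reachable through a relative path;
# this returns TRUE only if the working directory is the r-workshop folder
file.exists("data/raw/gapminder2010_socioeconomic.csv")
```

If the last command returns FALSE, the working directory is probably not the project folder.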

Organizing your working directory

Using a consistent folder structure across your projects will help keep things organized and make it easy to find things in the future. In general, you might create directories (folders) for scripts, data, and documents. Here are some examples of suggested directories:

You may want additional directories or subdirectories depending on your project needs, but these should form the backbone of your working directory.
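These folders can be created in the file browser, or from the console. The sketch below uses the folder names suggested in the Data Tip later in this lesson and assumes the working directory is your project folder; adjust the names to suit your project.

```r
# Folder names follow the suggestions in this lesson; adapt them to your project
project_dirs <- c("scripts", "data/raw", "data/processed", "figures")

for (d in project_dirs) {
  # recursive = TRUE also creates parent folders such as "data";
  # showWarnings = FALSE keeps this quiet if the folder already exists
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

# Confirm that every folder now exists
dir.exists(project_dirs)
```

Running this more than once is harmless, because existing folders are left untouched.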

Writing your first code

Now that we have a project, let’s run our first commands in R.

In the console panel, type:

1 + 100

And R will print out the answer, with a preceding [1]. Don’t worry about this for now; we’ll explain it later. For now, think of it as indicating output.

If you type in an incomplete command, R will wait for you to complete it:

> 1 +
+

Any time you hit return and the console shows a “+” instead of a “>”, it means it’s waiting for you to complete the command. If you want to cancel a command you can hit Esc and RStudio will give you back the > prompt.

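When doing arithmetic operations, the order of those operations is the same as you would have learned back in school. From highest to lowest precedence: parentheses ( ), exponents (^ or **), multiplication and division (* and /), then addition and subtraction (+ and -).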

3 + 5 * 2     # 5 * 2 will be evaluated first
(3 + 5) * 2   # 3 + 5 will be evaluated first

The text after each line of code is called a comment. Anything that follows the hash symbol # is ignored by R when it executes code. Comments are an extremely important part of the code, as they help you document the logic of the analysis you are running.
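A few more lines, offered as an illustrative sketch, show how exponents and operators of equal precedence behave, with a comment documenting each one:

```r
2 + 3 ^ 2      # the exponent is evaluated first: 2 + 9
(2 + 3) ^ 2    # parentheses are evaluated first: 5 ^ 2
10 / 2 * 5     # equal precedence, so evaluated left to right: 5 * 5
10 / (2 * 5)   # parentheses change the grouping: 10 / 10
```

When in doubt, adding parentheses makes the intended order explicit for both R and the reader.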

Very small or very large numbers are printed in scientific notation:

2/10000
[1] 2e-04

The e is shorthand for “multiplied by 10^x”, where x is the number that follows the e. So 2e-4 is shorthand for 2 * 10^(-4).

You can write numbers in scientific notation too:

5e3  # same as 5 * 10^3
[1] 5000
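The equivalence between the two ways of writing a number can be checked directly in the console. The sketch below also uses format(), a base R function not otherwise covered in this episode, to show the same value printed with and without scientific notation.

```r
5e3 == 5 * 10^3                          # TRUE: both are 5000
all.equal(2e-4, 2 * 10^(-4))             # TRUE: compared with a small numerical tolerance

format(2 / 10000, scientific = FALSE)    # "0.0002"
format(5000, scientific = TRUE)          # "5e+03"
```

Scientific notation only affects how a number is displayed, not the value R stores.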

Creating scripts

So far, we’ve been typing these commands directly in the R console. However, if we closed RStudio and later wanted to recreate these operations, there would be no record of them anywhere.

In practice, we should always write our code in a script, which is a plain text document with our commands written in it. To create a new R script go to File > New File > R Script.

This will open a panel on the top-left. This is a text editor, which in RStudio does some syntax highlighting (it colours the code) to help read the code.

As you’re adding code to the script, you can run it interactively on the console by pressing the shortcut Ctrl+Enter.
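As a rough sketch, a first script might contain nothing more than the commands from this episode, each with a comment explaining it. The file name in the first line is an assumption for illustration; any name ending in .R will do.

```r
# scripts/01_first_commands.R  (hypothetical file name)
# First commands from the Introduction to R and RStudio episode

# basic arithmetic
1 + 100

# order of operations
3 + 5 * 2     # 5 * 2 is evaluated first
(3 + 5) * 2   # 3 + 5 is evaluated first

# scientific notation
2 / 10000
5e3           # same as 5 * 10^3
```

Placing the cursor on a line and pressing Ctrl+Enter sends that line to the console, so the script doubles as a record of what was run.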

Installing (or updating) packages

Additional packages can be installed to extend the functionality of R. Most packages are available in a central repository called CRAN and can be installed from within R using the install.packages() function.

For example, to install (or update) the tidyverse package, you would run the following command on the console:

install.packages("tidyverse")

Because the install process accesses the CRAN repository, you will need an Internet connection to install packages.

After this, you can then load the package to use it in your analysis (we will see how to do this later).
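Because install.packages() downloads the package again each time it is run, it can be convenient to check first whether a package is already present. The sketch below is one possible approach rather than a required step: is_installed() is a hypothetical helper defined here, built on the base R function requireNamespace().

```r
# Hypothetical helper (not part of R): TRUE if a package is already installed
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with R, so this returns TRUE
is_installed("stats")

# Install tidyverse only when it is missing and R is running interactively
if (interactive() && !is_installed("tidyverse")) {
  install.packages("tidyverse")
}
```

The interactive() check keeps the installation from being triggered when the code is run as part of an automated script.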

There are other repositories available. A very popular one is the Bioconductor project, which contains thousands of packages for bioinformatics applications. These use a different installation command, but the instructions are always given in each package’s page.

Challenge

Install the package ggbeeswarm, which will give you extended functionality for the visualisation package ggplot2 (which we will cover in a future episode).

Solution

# this will download and install the package
install.packages("ggbeeswarm")

Data Tip: Organising your Analysis Project

A key step for a successful analysis is to start with a tidy directory structure. This ensures that you can keep track of what each file is and can avoid many headaches when analysis gets more complex.

Here are some practical suggestions:

  • Early in your project create a few sub-directories such as scripts, data/raw, data/processed, figures and any others that might be relevant for your specific work.
  • To ensure reproducibility, save your code in scripts and define file paths relative to your project’s folder.
  • Keep your raw data separate from processed data, so that you can go back to it if needed.
  • In RStudio specifically, you can create an “R Project” within your project’s folder (File > New Project...). This will ensure you always have the right working directory set up.

Key Points

  • Create an ‘RStudio Project’ whenever you are initiating a new analysis.

  • When you come back to work on the project, open the Rproj file to resume your work.

  • Ensure your project directory is well structured, for example with directories for scripts and data.

  • To document your analysis, write and save your code in scripts.

  • You should try to comment your code. Use # to write comments in your scripts.

  • Use install.packages() to install (or update) packages.