

Brief introduction to R

R is statistical software and a programming language. This means that, to do anything in R, we need to write our instructions as code.

In this course we will be using RStudio, a program that provides a convenient interface for working with R.

This is what RStudio typically looks like:

There are 4 main panels, and we will learn about them progressively throughout the course.

Running commands in R

For now, let’s focus on the panel in the bottom left, called the Console. This is where R commands are run and where their output is displayed.

For example, we can use R as a calculator:

# All the basic arithmetic operators in R
3 + 2    # sum
## [1] 5
3 - 2    # subtraction
## [1] 1
3 * 2    # multiplication
## [1] 6
3 / 2    # division
## [1] 1.5
3 ^ 2    # exponentiation
## [1] 9

Notice that anything after a # symbol is a “comment”, which R ignores and does not run as code. A comment can take up a whole line or follow code on the same line.
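A small sketch of how comments behave: the commented-out line is never run, so the value stored in result (an illustrative name; storing values with <- is covered in the next section) stays 10.

```r
# R skips this whole line because it starts with #
result <- 2 * 5          # the code before the # runs; this note is ignored
# result <- 100 * 100    (commented out, so it never runs)
result                   # still 10
```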

Creating objects

We can store the output of a command in objects that we can name.

To store the result of some operation in a variable we use <-, which we can think of as an arrow pointing left (note: no spaces between < and -!).
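The warning about spaces matters because R reads < - (with a space) as “less than minus”, which is a comparison rather than an assignment. A minimal sketch, using an illustrative object called y:

```r
y <- 5     # assignment: y now stores 5
y < - 3    # with a space, R reads this as y < -3 (is 5 smaller than -3?), which is FALSE
y          # y still stores 5: nothing new was assigned
```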

# Store the result of this operation in a variable called "my_variable"
my_variable <- 3 + 2

We can then re-use this variable again and again, for example:

my_variable + 1
## [1] 6
my_variable + 2
## [1] 7
my_variable * 2 + 3
## [1] 13

Note that the value of my_variable itself did not change:

my_variable
## [1] 5

But we can change its value by “over-writing” it with a new value:

my_variable <- 6

Note that the previous value of my_variable is lost. If we want it back we’d have to run our previous commands again.

What do you think the value of my_variable is after the following command?

my_variable <- my_variable + 1

Functions

A lot of the functionality in R is in its functions, which allow us to perform certain operations (sometimes very complicated ones) with a single command.

We can recognise functions in code easily, because they always follow the pattern:

function(inputs)

that is, the function’s name, followed by an opening parenthesis, then all the inputs (called arguments) that the function needs to perform its operation, and finally a closing parenthesis.
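When a function takes several inputs, they are separated by commas, and the output of one function can be used directly as the input of another. A short sketch using the base R functions max() and abs(), which are not otherwise used in this module:

```r
# A function with several inputs, separated by commas
largest <- max(3, 7, 2)
largest            # 7

# Nested functions: the inner one runs first and its output
# becomes the input of the outer one
root <- sqrt(abs(-9))   # abs(-9) gives 9, then sqrt(9) gives 3
root               # 3
```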

We will come across many functions throughout the course, but here are a few examples of functions that operate on a single number:

# Square root of 3
sqrt(3)
## [1] 1.732051
# Round 3.14
round(3.14)
## [1] 3

We can see that the round() function rounds our number to the nearest integer (no decimal places). However, we might want to round the number to one, two, three or more decimal places instead. We can also do this with the round() function, because it accepts extra arguments as its input:

# Round 3.14 to one decimal place
round(3.14, digits = 1)
## [1] 3.1

Arguments within a function are separated by commas (,).

But how would we know what that argument was called, or even that it existed?

The answer is: by looking at its documentation. Every function in R has a help page, which we can look at by using ? followed by the function’s name, for example ?round.

From its help page we can see that this function accepts two arguments called “x” and “digits”: “x” is the number we want to round, and “digits” is the number of decimal places we want to round to.
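If you only need a reminder of the argument names, the base R function args() prints a function’s arguments without opening the full help page. A sketch (the exact list printed can differ slightly between R versions, so no output is shown):

```r
# Print the arguments that round() accepts
args(round)

# Extract just the argument names as text
round_arguments <- names(formals(args(round)))
round_arguments
```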

Use these examples to understand more about how functions work:

round(x = 2.72, digits = 1)
## [1] 2.7
round(digits = 1, x = 2.72)
## [1] 2.7
round(2.72, 1)
## [1] 2.7
round(1, 2.72)
## [1] 1
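What these four calls show: when arguments are named, their order does not matter; when they are not named, R matches them by position, taking the first value as x and the second as digits. In the last call x is therefore 1, so the result is 1. A sketch that makes the comparison explicit (named, positional and swapped are illustrative names):

```r
named      <- round(digits = 1, x = 2.72)  # named: order does not matter
positional <- round(2.72, 1)               # unnamed: first is x, second is digits
swapped    <- round(1, 2.72)               # unnamed and swapped: x = 1, digits = 2.72

named == positional   # TRUE: both round 2.72 to one decimal place
swapped               # 1: the number being rounded is 1
```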

Scripts

R code can be saved into text documents, called scripts (usually with a file extension .R). Writing code into scripts is ideal, because you keep a record of all the operations that you did during your data analysis. This code can then be re-run, modified, and shared for reproducible data analysis!

You can start a new script by clicking the toolbar button with the green “plus” symbol in the top left and selecting “R script”. A new document opens in the top-left panel of RStudio.

This new panel is a text editor with some extra features, such as code highlighting and auto-completion.

We recommend that you store all your analysis code in a script file.
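In RStudio you can run the current line of a script with Ctrl+Enter (Cmd+Enter on a Mac), or the whole script with the Source button. The same can be done from the Console with the base R function source(). A minimal sketch, where a temporary file stands in for a hypothetical script called analysis.R:

```r
# Create a small script file (a temporary file stands in for "analysis.R")
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "# analysis.R: a tiny example script",
  "total <- 3 + 2",
  "total <- total * 2"
), script_path)

# Run every line of the script, from top to bottom
source(script_path)
total   # 10
```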

Reading data into R

To read files in R we need to learn how to tell the program where our file is… without using the file browser!

This is done by specifying the path to that file. This is like an address of where that file is located on the computer.

File paths are built like so:

directory/subdirectory/another_subdirectory/some_file.txt

  • Each directory is split by a /
  • The file name comes at the end (don’t forget to include the file extension!)
  • Spaces in file and folder names are best avoided, although R can handle them
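Base R can also build and take apart paths for you: file.path() joins folder and file names with /, while basename(), dirname() and tools::file_ext() split them up again. A sketch using the generic path above:

```r
# Join the pieces of a path with "/"
p <- file.path("directory", "subdirectory", "another_subdirectory", "some_file.txt")
p                    # "directory/subdirectory/another_subdirectory/some_file.txt"

basename(p)          # the file name at the end: "some_file.txt"
dirname(p)           # everything before the file name
tools::file_ext(p)   # the file extension: "txt"
```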

But what is the starting point of this path?

This varies between operating systems, but generally a good place to start is the working directory that R is using. This is the folder that R takes as its reference point while you are working.

Understanding your directory structure

First of all, let’s see which working directory R is using at the moment:

getwd()

The output of that command will vary depending on your operating system and username.

For example, for a user called “slcu.user”, this might be the default working directory on a Mac:

/Users/slcu.user/

And this one on Windows:

C:/Users/slcu.user/Documents

When doing an analysis, it’s best to set the working directory to the folder where your project is stored.

Let’s say that the course materials are on your Desktop. So, schematically, this is where your data file is located:

Desktop
  |
  |_slcu_r_course
      |
      |_module01_data_and_files
          |
          |_data
             |
             |_dataset_tidy.csv

Therefore, it’s best to first change your working directory to module01_data_and_files.

We can change the working directory using the function setwd(). For our previous example, this might be something like:

# On a Mac
setwd("/Users/slcu.user/Desktop/slcu_r_course/module01_data_and_files")

# On Windows
setwd("C:/Users/slcu.user/Desktop/slcu_r_course/module01_data_and_files")
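To try setwd() without touching your own folders, the sketch below uses tempdir(), a temporary folder that R creates for each session, as a stand-in for the course folder, and then returns to where it started:

```r
old_wd <- getwd()    # remember the current working directory
setwd(tempdir())     # tempdir() stands in for the course folder here
getwd()              # shows the new working directory
setwd(old_wd)        # go back to where we started
```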

Exercise

Change your working directory to your course materials folder following the examples above. Confirm that the working directory has changed by running the getwd() command again.

Tips:

  • When writing paths in R, always put them inside quotes, for example "data/dataset_tidy.csv"
  • Try pressing the Tab key while typing a path: RStudio will auto-complete it for you
  • If you start your path with the ~ symbol, this means “your home directory”, which will vary depending on your username and operating system
  • Paths can be relative to your current working directory
    • Use ../ to refer to the directory above your current directory
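The sketch below puts these tips together. It recreates a small, hypothetical copy of the course folder inside tempdir(), then finds the same data file with a relative path and with ../, and shows how ~ is expanded. Note that setwd() invisibly returns the previous working directory, which makes it easy to go back afterwards.

```r
# Recreate a miniature copy of the course folder in a temporary directory
course_dir <- file.path(tempdir(), "slcu_r_course", "module01_data_and_files")
dir.create(file.path(course_dir, "data"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(course_dir, "data", "dataset_tidy.csv")))

old_wd <- setwd(course_dir)   # setwd() returns the previous working directory

# Relative path: starts from the working directory
found_from_module <- file.exists("data/dataset_tidy.csv")

# Move into data/, then use ../ to climb back up one level
setwd("data")
found_from_data <- file.exists("../data/dataset_tidy.csv")

# ~ is replaced by your home directory, e.g. /Users/slcu.user/Desktop on a Mac
path.expand("~/Desktop")

setwd(old_wd)   # restore the original working directory
```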

Read CSV files

To read a CSV file into R, we will use a function called read_csv().

This function comes from the readr package, which is part of the tidyverse collection of packages. To use it, we first load the tidyverse with the library() function:

library(tidyverse)

Note: there is another function called read.csv(), which comes with base R and is very similar to read_csv(). During the workshops we will use read_csv(), as it is generally faster and returns the data as a tibble, the tidyverse’s version of a data frame.
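Both functions take the path to the file as their first argument; in the course this would be a relative path such as "data/dataset_tidy.csv". To stay self-contained, the sketch below instead writes a small, made-up CSV to a temporary file, reads it with base R’s read.csv(), and, only if the readr package is installed, reads it again with read_csv():

```r
# A small made-up CSV file, so the example does not depend on the course data
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "sample,height",
  "a,12.1",
  "b,15.3",
  "c,9.8"
), csv_path)

# Base R: always available
base_data <- read.csv(csv_path)
base_data

# tidyverse: readr::read_csv() is the function that library(tidyverse) makes available
if (requireNamespace("readr", quietly = TRUE)) {
  tidy_data <- readr::read_csv(csv_path)
  print(tidy_data)
}
```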


Exercise

Create a script that does the following:

  • Load the tidyverse package (using library())
  • Set the working directory to the module01_data_and_files folder (using setwd())
  • Read the CSV file that you previously exported from the data directory (using read_csv())

Homework

Based on the principles outlined in this module, try to organise your own files and data, and then read them into R.

If you encounter any difficulties, we will discuss them in the next module!

