Recap Day 1

Create objects

In R, create objects (names that store a value) with the assignment operator <-

x <- 53.341

Functions

We can use functions to do specific operations

round(x)   # round the value of x

## [1] 53

Functions have options (arguments) that can change their behaviour. Separate arguments with a comma

round(x, digits = 1) # round to one decimal place

## [1] 53.3
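
As a small sketch of how arguments are matched, the example below calls round() once with the argument given by position and once by name. The object names by_position and by_name are made up for illustration.

```r
x <- 53.341

# Arguments can be matched by position: digits is the second argument of round()
by_position <- round(x, 1)

# ... or by name, which is usually easier to read
by_name <- round(x, digits = 1)

# Both calls give the same result
identical(by_position, by_name)

# args() lists the arguments a function accepts
args(round)
```

Naming arguments is optional here, but it makes the intent of the call clearer when a function has many options.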

Vector

A vector contains several values all of the same type (numeric, character, logical)

x_chr <- c("dog", "cat", "goldfish")   # character vector
x_num <- c(1, 5, 23.3, 55.2)           # numeric vector
x_log <- c(TRUE, TRUE, FALSE, TRUE)    # logical vector
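
To illustrate what "all of the same type" means in practice, the sketch below checks the type of each vector with class() and then shows what happens when types are mixed. The vector mixed is a hypothetical example, not part of the course data.

```r
x_chr <- c("dog", "cat", "goldfish")
x_num <- c(1, 5, 23.3, 55.2)
x_log <- c(TRUE, TRUE, FALSE, TRUE)

class(x_chr)   # "character"
class(x_num)   # "numeric"
class(x_log)   # "logical"

# Mixing types in c() converts every value to one common type
mixed <- c(1, "cat", TRUE)
class(mixed)   # "character"
mixed          # "1" "cat" "TRUE"
```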

Access values inside the vector with []

x_chr[2]

## [1] "cat"

x_num[c(2, 4)]

## [1]  5.0 55.2

Conditions

Create logical vectors using conditions

x_num

## [1]  1.0  5.0 23.3 55.2

x_num > 20                # is x_num greater than 20

## [1] FALSE FALSE  TRUE  TRUE

x_num == 5                # is x_num equal to 5

## [1] FALSE  TRUE FALSE FALSE

x_num %in% c(20, 30, 1)   # is x_num contained in the vector on the right

## [1]  TRUE FALSE FALSE FALSE

Combine conditions with & (AND) and | (OR)

x_num

## [1]  1.0  5.0 23.3 55.2

x_num >= 10 & x_num <= 30   # is x_num greater than or equal to 10 AND smaller than or equal to 30

## [1] FALSE FALSE  TRUE FALSE

x_num < 10 | x_num > 30   # is x_num smaller than 10 OR greater than 30

## [1]  TRUE  TRUE FALSE  TRUE

We can use logical vectors (TRUE/FALSE) to subset vectors

x_num[x_num > 20]   # return values of x_num where x_num is greater than 20

## [1] 23.3 55.2
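
The same idea works with combined conditions. This is a minimal sketch reusing x_num from above; the name in_range is made up for illustration.

```r
x_num <- c(1, 5, 23.3, 55.2)

# Store the logical vector first, then use it to subset
in_range <- x_num >= 10 & x_num <= 30
x_num[in_range]                   # 23.3

# Or write the condition directly inside []
x_num[x_num < 10 | x_num > 30]    # 1.0  5.0 55.2
x_num[x_num %in% c(20, 30, 1)]    # 1
```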

To set the filtering conditions, several relational operators can be used:

== is equal to
!= is different from
%in% is contained in
> is greater than
>= is greater than or equal to
< is smaller than
<= is smaller than or equal to

It is also possible to combine several conditions together using the following logical operators:

& AND
| OR
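
The != operator does not appear in the examples above, so here is a short sketch of it using the character vector from earlier. It also shows ! negating a whole condition.

```r
x_chr <- c("dog", "cat", "goldfish")

# TRUE for every value that is different from "cat"
x_chr != "cat"                    # TRUE FALSE  TRUE

# Use the result to keep everything except "cat"
x_chr[x_chr != "cat"]             # "dog" "goldfish"

# ! negates a condition: values NOT contained in the vector on the right
x_chr[!(x_chr %in% c("dog", "cat"))]   # "goldfish"
```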

data.frame

A data.frame is a tabular object (rows and columns). Usually we create these from a file with read.csv()

# Add option to turn off behaviour of converting strings to factors
surveys <- read.csv("https://ndownloader.figshare.com/files/2292169",
                    stringsAsFactors = FALSE)
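
To try read.csv() without downloading anything, the sketch below reads a tiny CSV supplied as text. The object names csv_text and toy, and the three rows of values, are made up for illustration and are not the real surveys data.

```r
# A tiny CSV written as a character string (illustrative values only)
csv_text <- "record_id,plot_id,species
1,2,albigula
2,2,albigula
3,3,merriami"

# The text argument makes read.csv() read from a string instead of a file
toy <- read.csv(text = csv_text, stringsAsFactors = FALSE)

dim(toy)             # number of rows and columns: 3 3
str(toy)             # type of each column
class(toy$species)   # "character", because strings were not converted to factors
```

Checking dim() and str() right after reading a file is a quick way to confirm the data arrived in the expected shape.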

Subset data.frames using [rows, columns]

surveys[1:6, c(5, 11)]   # rows 1 to 6 and columns 5 and 11

##   plot_id  species
## 1       2 albigula
## 2       2 albigula
## 3       2 albigula
## 4       2 albigula
## 5       2 albigula
## 6       2 albigula
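
Rows and columns can also be selected by name or by condition. The following is a sketch on a small hand-made data.frame; toy_surveys and its values are hypothetical and only mimic two columns of surveys.

```r
# Small stand-in for the surveys data (illustrative values only)
toy_surveys <- data.frame(
  plot_id = c(2, 2, 3, 5),
  species = c("albigula", "albigula", "merriami", "albigula"),
  stringsAsFactors = FALSE
)

toy_surveys[1:2, ]                       # first two rows, all columns
toy_surveys[, c("plot_id", "species")]   # all rows, columns selected by name
toy_surveys[toy_surveys$plot_id == 2, ]  # rows where plot_id is equal to 2
```

Leaving the rows or columns position empty means "keep all of them".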

Access individual columns by name using $

surveys$species   # returns a vector with the values of the species column

##  [1] "albigula" "albigula" "albigula" "albigula" "albigula" "albigula"
##  [7] "albigula" "albigula" "albigula" "albigula" "..."

Missing values

Sometimes we have missing values, encoded as NA

y <- c(23, 44, NA, 212)

We need to ensure these are dealt with properly

mean(y)   # returns NA

## [1] NA

mean(y, na.rm = TRUE)  # removes NA and then calculates mean

## [1] 93
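
Other summary functions in base R accept the same na.rm option. A brief sketch with the same vector:

```r
y <- c(23, 44, NA, 212)

sum(y)                 # NA, because one value is missing
sum(y, na.rm = TRUE)   # 279
max(y, na.rm = TRUE)   # 212
min(y, na.rm = TRUE)   # 23
```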

The is.na() function is important for dealing with missing values:

y

## [1]  23  44  NA 212

# create a logical vector that is TRUE where a value is missing
is.na(y)

## [1] FALSE FALSE  TRUE FALSE

# Negate that expression using !
!is.na(y)

## [1]  TRUE  TRUE FALSE  TRUE

We can remove NA values by subsetting with this logical vector:

y[!is.na(y)]  # return values of y that are not missing

## [1]  23  44 212
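
As a final sketch, is.na() can also be used to count missing values, and removing them by hand gives the same mean as na.rm = TRUE. The name y_complete is made up for illustration.

```r
y <- c(23, 44, NA, 212)

# TRUE counts as 1 and FALSE as 0, so sum() counts the missing values
sum(is.na(y))          # 1

# Removing NA first and then averaging matches mean(y, na.rm = TRUE)
y_complete <- y[!is.na(y)]
mean(y_complete)       # 93
mean(y_complete) == mean(y, na.rm = TRUE)
```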