In R, create objects (names that store a value) with the assignment operator <-
x <- 53.341
We can use functions to do specific operations
round(x) # round the value of x
## [1] 53
Functions have options (arguments) that can change their behaviour. Separate options with commas
round(x, digits = 1) # round to one decimal place
## [1] 53.3
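As a small sketch of how options are passed: an option can be matched by name or by position, and for round() a negative digits value rounds to the left of the decimal point.

```r
x <- 53.341
round(x, digits = 1) # option matched by name
## [1] 53.3
round(x, 1) # same option matched by position
## [1] 53.3
round(x, digits = -1) # negative digits round to the nearest ten
## [1] 50
```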
A vector contains several values all of the same type (numeric, character, logical)
x_chr <- c("dog", "cat", "goldfish") # character vector
x_num <- c(1, 5, 23.3, 55.2) # numeric vector
x_log <- c(TRUE, TRUE, FALSE, TRUE) # logical vector
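Because a vector holds a single type, mixing types makes R convert everything to the most general one. A short sketch (the name mixed is only for illustration):

```r
mixed <- c(1, "dog", TRUE) # numeric, character and logical together
mixed # every value has become a character string
## [1] "1"    "dog"  "TRUE"
class(mixed)
## [1] "character"
class(c(1, TRUE)) # logical and numeric together give numeric
## [1] "numeric"
```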
Access values inside the vector with []
x_chr[2]
## [1] "cat"
x_num[c(2, 4)]
## [1] 5.0 55.2
Create logical vectors using conditions
x_num
## [1] 1.0 5.0 23.3 55.2
x_num > 20 # is x_num greater than 20
## [1] FALSE FALSE TRUE TRUE
x_num == 5 # is x_num equal to 5
## [1] FALSE TRUE FALSE FALSE
x_num %in% c(20, 30, 1) # is x_num contained in the vector on the right
## [1] TRUE FALSE FALSE FALSE
Combine conditions with & (AND) and | (OR)
x_num
## [1] 1.0 5.0 23.3 55.2
x_num >= 10 & x_num <= 30 # is x_num greater than or equal to 10 AND smaller than or equal to 30
## [1] FALSE FALSE TRUE FALSE
x_num < 10 | x_num > 30 # is x_num smaller than 10 OR greater than 30
## [1] TRUE TRUE FALSE TRUE
We can use logical vectors (TRUE/FALSE) to subset vectors
x_num[x_num > 20] # return values of x_num where x_num is greater than 20
## [1] 23.3 55.2
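Combined conditions work for subsetting in the same way. The sketch below also uses which() and sum() to get the positions and the count of matching values (in_range is an illustrative name):

```r
x_num <- c(1, 5, 23.3, 55.2)
in_range <- x_num >= 10 & x_num <= 30 # store the logical vector
x_num[in_range] # values between 10 and 30
## [1] 23.3
which(x_num > 20) # positions where the condition is TRUE
## [1] 3 4
sum(x_num > 20) # number of values greater than 20 (TRUE counts as 1)
## [1] 2
```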
To set the filtering conditions, several relational operators can be used:
== is equal to
!= is different from
%in% is contained in
> is greater than
>= is greater than or equal to
It is also possible to combine several conditions together using the following logical operators:
& AND
| OR
A data.frame is a tabular object (rows and columns). Usually we create these from a file with read.csv()
# Add option to turn off behaviour of converting strings to factors
surveys <- read.csv("https://ndownloader.figshare.com/files/2292169",
stringsAsFactors = FALSE)
Subset data.frames using [rows, columns]
surveys[1:6, c(5, 11)] # rows 1 to 6 and columns 5 and 11
##   plot_id  species
## 1       2 albigula
## 2       2 albigula
## 3       2 albigula
## 4       2 albigula
## 5       2 albigula
## 6       2 albigula
Access individual columns by name using $
surveys$species # returns a vector with the values of the species column
## [1] "albigula" "albigula" "albigula" "albigula" "albigula" "albigula"
## [7] "albigula" "albigula" "albigula" "albigula" "..."
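Conditions can also go in the rows position of [rows, columns]. The sketch below uses a small made-up data.frame (toy and its values are hypothetical, not the real surveys data) so that it runs without downloading anything:

```r
# Hypothetical toy data, only for illustration
toy <- data.frame(
  plot_id = c(2, 2, 3, 7),
  species = c("albigula", "albigula", "merriami", "eremicus"),
  weight = c(180, 210, 44, 22),
  stringsAsFactors = FALSE
)
toy[toy$plot_id == 2, ] # rows where plot_id is 2, all columns
toy[toy$weight > 40, c("species", "weight")] # filter rows, pick columns by name
toy$species[toy$weight < 100] # subset a single column like any vector
```

Leaving the rows or columns position empty keeps all rows or all columns.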
Sometimes we have missing values, encoded as NA
y <- c(23, 44, NA, 212)
We need to ensure these are dealt with properly
mean(y) # returns NA
## [1] NA
mean(y, na.rm = TRUE) # removes NA and then calculates mean
## [1] 93
The is.na() function is important to deal with missing values:
y
## [1] 23 44 NA 212
# create a logical vector that is TRUE where a value is missing
is.na(y)
## [1] FALSE FALSE TRUE FALSE
# Negate that expression using !
!is.na(y)
## [1] TRUE TRUE FALSE TRUE
We can use this function to remove NA values:
y[!is.na(y)] # return values of y that are not missing
## [1] 23 44 212
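Since is.na() returns a logical vector, it combines with the functions seen earlier. A brief sketch for counting and locating missing values (y_clean is an illustrative name):

```r
y <- c(23, 44, NA, 212)
sum(is.na(y)) # how many values are missing
## [1] 1
which(is.na(y)) # position of the missing value
## [1] 3
y_clean <- y[!is.na(y)] # keep only the non-missing values
mean(y_clean) # same result as mean(y, na.rm = TRUE)
## [1] 93
```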