Compendium of functions

Key Points

Introduction to R and RStudio	Create an ‘RStudio Project’ whenever you are initiating a new analysis. When you come back to work on the project, open the `.Rproj` file to resume your work. Ensure your project directory is well structured, for example with directories for scripts and data. To document your analysis, write and save your code in scripts. You should try to comment your code. Use `#` to write comments in your scripts. Use `install.packages()` to install (or update) packages.
Basic objects and data types in R	Assign values to objects using `<-`. Functions perform operations on objects: they take inputs (arguments) and return outputs (values). The basic data structure in R is called a vector, which you construct with the `c()` function. The main types of vector values are: numeric (or double), integer, character and logical. To subset vectors use `[]`. When doing vector operations R will ‘recycle’ shorter vectors if it needs to. Missing data is represented by the special value `NA`, which many functions can handle (often through an option such as `na.rm`). Vectors can only contain one type of value. If there are mixed types of values in a vector, R will coerce those values into a single type according to the following hierarchy: character > numeric > logical.
Working with Tabular Data	Use `library()` to load a package into R. You need to do this every time you start a new R session. Read data using the `read_*()` family of functions (`read_csv()` and `read_tsv()` are two common ones, for comma- and tab-delimited values, respectively). In R tabular data is stored in a `data.frame` object. Columns in a `data.frame` are vectors. Therefore, a `data.frame` is a list of vectors of the same length. A vector can only contain data of one type (e.g. all numeric, or all character). Therefore, each column of a `data.frame` can only be of one type also (although different columns may be of different types).
Data visualisation with `ggplot2`	To build a `ggplot2` graph you need to define: data, aesthetics, geometries (and scales). To change an aesthetic of your graph based on data, include it inside `aes()`. To set an aesthetic manually, regardless of the data, put it outside `aes()`. You can overlay multiple geometries in the same graph, and control their aesthetics individually. Adjust the scales of your graph using the `scale_*` family of functions. You can customise your graphs using pre-defined themes (e.g. `theme_classic()`) or more finely with the `theme()` function. To save graphs use the `ggsave()` function.
Manipulating variables (columns) with `dplyr`	Use `dplyr::select()` to select columns from a table. Select a range of columns using `:`, columns matching a string with `contains()`, and unselect columns by using `-`. Rename columns using `dplyr::rename()`. Modify or update columns using `dplyr::mutate()`. Chain several commands together with `%>%` pipes.
Manipulating observations (rows) with `dplyr`	Order rows in a table using `arrange()`. Use the `desc()` function to sort in descending order. Retain unique rows in a table using `distinct()`. Choose rows based on conditions using `filter()`. Conditions can be set using several operators: `>`, `>=`, `<`, `<=`, `==`, `!=`, `%in%`. Conditions can be combined using `&` and `|`. The function `is.na()` can be used to identify missing values. It can be negated as `!is.na()` to find non-missing values. Use the `ifelse()` function to define two different outcomes of a condition.
Grouped operations using `dplyr`	Use `summarise()` to calculate summary statistics in your data (e.g. mean, median, maximum, minimum, quantiles, etc.). Chain together `group_by() %>% summarise()` to calculate those summaries across groups in the data (e.g. countries, years, world regions). Chain together `group_by() %>% mutate()` or `group_by() %>% filter()` to apply these functions based on groups in the data. As a safety measure, always remember to `ungroup()` tables after using `group_by()` operations.
Working with categorical data + Saving data	Use functions from the `stringr` package to manipulate strings. All these functions start with `str_`, making them easy to identify. Use factors to encode ordinal variables, ensuring the levels are set in a logical order.
Joining tables	Use `full_join()`, `left_join()`, `right_join()` and `inner_join()` to merge two tables together. Specify the column(s) to match between tables using the `by` option. Use `anti_join()` to identify the rows from the first table which do not have a match in the second table.
Data reshaping: from wide to long and back	Use `pivot_wider()` to reshape a table from long to wide format. Use `pivot_longer()` to reshape a table from wide to long format. To figure out which data format is more suited for a given analysis, it can help to think about what visualisation you want to make with `ggplot`: any aesthetics needed to build the graph should exist as columns of your table.
Data visualisation with `ggplot2` - part II	Use `labs()` to customise the labels of your plot’s aesthetics (e.g. `x`, `y`, `colour`, `fill`, `size`, etc.). Use `annotate()` to freely add text, segments or rectangles to your plot. Use built-in `theme_*()` functions to change the overall look of your graphs. Use `theme()` to change the look of individual elements of the graph. Use `theme_set()` to change the theme for the rest of your R session. Use the `patchwork` package to compose graphs, using the `|` and `/` operators to place two plots side-by-side or top-and-bottom, respectively. The `plot_layout()` function can be used to adjust your plot arrangement. Useful options are `widths` and `heights` to adjust the relative size of the panels, and `guides = 'collect'` to make a single legend common to the whole figure. Use `plot_annotation(tag_levels = 'A')` to automatically add a letter tag to each panel.
Extra practice exercises	The initial exploration of data is crucial to detect any data quality issues that need fixing.
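
The key points on vectors in the table above (subsetting, recycling, `NA` and coercion) can be illustrated with a minimal base R sketch. The object names and values are made up for illustration.

```r
# Hypothetical values (heights in cm), with one missing entry
heights <- c(150, 162, NA, 171)

# Subset with []: keep values above 160, dropping the missing one
tall <- heights[heights > 160 & !is.na(heights)]

# Recycling: the shorter vector c(1, 2) is reused to match length 4
recycled <- c(10, 20, 30, 40) + c(1, 2)

# NA propagates unless the function is told to remove it
mean_with_na <- mean(heights)
mean_without_na <- mean(heights, na.rm = TRUE)

# Coercion follows character > numeric > logical
mixed_chr <- c(1, "a", TRUE)    # becomes character
mixed_num <- c(1, TRUE, FALSE)  # becomes numeric

class(mixed_chr)
class(mixed_num)
```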
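
The first `ggplot2` row above can be sketched as follows. This is only one possible layout: the built-in `mtcars` dataset stands in for your own data (an assumption, it is not part of the course material), and the output file name is hypothetical.

```r
library(ggplot2)

# mtcars ships with R; it is used here purely as example data
p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  # colour depends on the data, so it goes inside aes();
  # size is fixed for all points, so it goes outside aes()
  geom_point(aes(colour = factor(cyl)), size = 2) +
  # a second geometry overlaid on the same graph, with its own aesthetics
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "grey40") +
  scale_x_continuous(limits = c(1, 6)) +
  labs(x = "Weight (1000 lb)", y = "Miles per gallon", colour = "Cylinders") +
  theme_classic()

# Save to a temporary directory so the sketch leaves no files in the project
out_file <- file.path(tempdir(), "mtcars_scatter.pdf")
ggsave(out_file, plot = p, width = 6, height = 4)
```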
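
The three `dplyr` rows above (columns, rows and grouped operations) fit together in a single pipeline. The sketch below uses a made-up table; the column names and values are hypothetical.

```r
library(dplyr)

# Hypothetical toy table, for illustration only
countries <- tibble(
  country   = c("A", "B", "C", "D"),
  region    = c("North", "North", "South", "South"),
  pop       = c(10, 20, NA, 40),
  gdp_total = c(100, 400, 90, 800),
  notes     = c("x", "y", "z", "w")
)

per_capita <- countries %>%
  select(-notes) %>%                                  # unselect a column
  rename(population = pop) %>%                        # new_name = old_name
  mutate(gdp_per_capita = gdp_total / population) %>% # add a column
  filter(!is.na(population) & region %in% c("North", "South")) %>%
  mutate(size = ifelse(population >= 20, "large", "small")) %>%
  arrange(desc(gdp_per_capita))                       # highest first

region_summary <- per_capita %>%
  group_by(region) %>%
  summarise(
    mean_gdp_per_capita = mean(gdp_per_capita),
    n_countries = n()
  ) %>%
  ungroup()                                           # safety measure
```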
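
For the row on categorical data, a small sketch of `stringr` functions and an ordered set of factor levels might look like this. The example values are made up, and the three functions shown are only a sample of the `str_` family.

```r
library(stringr)

# Hypothetical, inconsistently typed size labels
sizes <- c("small ", "Large", "medium", "small")

# Remove surrounding whitespace, then convert to lower case
clean <- str_to_lower(str_trim(sizes))

# Detect a pattern: which labels contain the letter "m"?
has_m <- str_detect(clean, "m")

# Encode as a factor with levels in a logical (not alphabetical) order
size_fct <- factor(clean, levels = c("small", "medium", "large"))
levels(size_fct)
```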
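
The differences between the join functions listed in the 'Joining tables' row are easiest to see on two tiny tables that only partly overlap. The tables below are hypothetical.

```r
library(dplyr)

# Two made-up tables sharing an "id" column; ids 2 and 3 appear in both
left_tbl  <- tibble(id = c(1, 2, 3), x = c("a", "b", "c"))
right_tbl <- tibble(id = c(2, 3, 4), y = c("B", "C", "D"))

inner <- inner_join(left_tbl, right_tbl, by = "id")  # only ids in both tables
left  <- left_join(left_tbl, right_tbl, by = "id")   # all ids from left_tbl
right <- right_join(left_tbl, right_tbl, by = "id")  # all ids from right_tbl
full  <- full_join(left_tbl, right_tbl, by = "id")   # ids from either table

# Rows of left_tbl that have no match in right_tbl
unmatched <- anti_join(left_tbl, right_tbl, by = "id")
```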
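
The reshaping row can be sketched as a round trip from wide to long and back. The table, the `year` and `value` column names, and the numbers are all assumptions made for the example.

```r
library(tidyr)

# Hypothetical wide table: one column per year
wide <- tibble(
  country = c("A", "B"),
  `2000`  = c(1, 2),
  `2010`  = c(3, 4)
)

# Wide to long: the year columns become a "year" and a "value" column
long <- pivot_longer(
  wide,
  cols = c(`2000`, `2010`),
  names_to = "year",
  values_to = "value"
)

# Long to wide: spread "year" back out into separate columns
wide_again <- pivot_wider(long, names_from = year, values_from = value)
```

In the long format `year` is a column of its own, so it could be mapped to an aesthetic such as `x` or `colour` in a graph.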
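
For the `patchwork` key points in the second `ggplot2` row, a minimal sketch of composing two plots could look like the following. It again assumes the built-in `mtcars` data as a stand-in, and the relative widths are arbitrary.

```r
library(ggplot2)
library(patchwork)

# Two example plots sharing the same colour legend
p1 <- ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) + geom_point()
p2 <- ggplot(mtcars, aes(hp, mpg, colour = factor(cyl))) + geom_point()

# Side by side, first panel twice as wide, one shared legend, tags A and B
side_by_side <- (p1 | p2) +
  plot_layout(widths = c(2, 1), guides = "collect") +
  plot_annotation(tag_levels = "A")

# One plot on top of the other
stacked <- p1 / p2
```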

Compendium of functions

to be added…

Useful keyboard shortcuts

Assignment operator <-: Alt + - (PC); ⌥ Option + - (Mac)
Run line of code: Ctrl + Enter (PC); ⌘ + Enter (Mac)
Pipe %>%: Ctrl + Shift + M (PC); ⌘ + Shift + M (Mac)

Glossary

Arguments (functions): …
Assignment operator: the assignment operator in R is <-
Comment: Comments are preceded by the # symbol and are used to add information to your code, for example to describe the objective of a particular piece of code.
Data frame: This is the basic type of object in R that stores tabular data. A tibble is a variant of this type of object.
Function: …
Object: …
Vector: One of the basic types of object in R. The word vector refers to atomic vectors, which are one-dimensional collections of values. They are created with the c() function.
R projects: …
Working directory: …

Introduction to R/tidyverse for Exploratory Data Analysis: Compendium of functions
