

These materials are adapted from here.

Motivation

To analyse data effectively, it is crucial to have tidy data. Although organising, cleaning and formatting data is often seen as a “boring” and “unrewarding” task, it is vital: it ensures that your data analysis runs smoothly later on and that you can make the most of your data.

Most of us use, at some point or another, spreadsheet software (such as Excel) to enter data into a computer. In this session, we’re going to discuss:

Data Organisation: best practices

Exercise:

A survey was done to determine the sex and weight of different animals occurring in different experimental plots. The survey was repeated in two years.

Open discussion: what data-formatting problems did you find, and how would you solve them?

Data Organisation: formatting problems

Data Organisation: exporting data

Although Excel is commonly used, its default file format is designed primarily for that program and is not always easy to open elsewhere. For that reason, it’s best practice to store tabular data in a text-based format instead.

CSV (comma-separated-values) is a commonly used format:

species,year,month,day,weight_kg,height_cm
mouse,2014,3,21,2,10
dog,2013,7,2,20,60
cat,2016,12,7,4.2,25

To save data as CSV from a spreadsheet program, choose “Save As” and then select “CSV” as the file format.

This is the recommended way to export data so that it is ready to be imported into R.
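
As a minimal sketch of the import step, the example table above could be read into R with the base function read.csv. The names csv_path and animals are illustrative, and the temporary file only stands in for a file you would have exported from your spreadsheet program:

```r
# Write the example table to a temporary CSV file.
# (Assumption: this stands in for a file exported via "Save As" > "CSV".)
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "species,year,month,day,weight_kg,height_cm",
  "mouse,2014,3,21,2,10",
  "dog,2013,7,2,20,60",
  "cat,2016,12,7,4.2,25"
), csv_path)

# Read the CSV into a data frame; the first line is used as column names.
animals <- read.csv(csv_path, stringsAsFactors = FALSE)

# Inspect the column names and the type R assigned to each column.
str(animals)
```

Because every column holds a single kind of value, R should read the measurement columns as numbers without any further cleaning. With your own data, replace csv_path with the path to your exported file.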

Exercise:

Further reading

