A Data Carpentry Workshop

ATC Computer Training Lab, EMBL Heidelberg

2-4 May 2018

9:00 - 17:00

Instructors: Hugo Tavares, Florian Huber

Helpers: Toby Hodges, Malvika Sharan, Marc Gouw

Course Description

Data Carpentry workshops are for any researcher who has data they want to analyze, and no prior computational experience is required. This hands-on workshop teaches basic concepts, skills and tools for working more effectively with data.

From astronomy to molecular biology, our increasing capacity to collect data is changing science. It allows us to ask questions that previously could not have been answered, and it changes the impact that science has on society. Although petabytes of data are now available, most scientific disciplines are failing to translate this sea of data into scientific advances. What is missing between data collection and research progress is training for researchers in the crucial skills needed to manage and analyze large amounts of data effectively.

Data Carpentry addresses this gap by teaching researchers the fundamental data skills they need to conduct their work. Our goal is to provide researchers with high-quality, domain-specific training covering the full lifecycle of data-driven research. We teach hands-on workshops in data organization, management, and analysis to increase data literacy and improve research efficiency. Our domain-specific approach allows us to tailor the data, content, and tools to reflect the specific data and analysis needs of different areas. Domain specificity also allows us to build new skills on knowledge frameworks familiar to learners, and to motivate workshops using real scientific questions and data relevant to the learners’ field of study. This approach allows learners to see immediate value in the skills they are learning and to put new techniques into practice straight away.

A workshop can’t teach a researcher everything they need to know about data management and analysis, but it drastically lowers the barrier to entry and imparts the skills for continued learning and engagement. Our ultimate goal is to enable data-driven research in diverse disciplines by creating strong communities of data scientists and empowering them to conduct more innovative and effective research.

General Information

On the first two days of this workshop we will cover basic skills in data organisation and analysis, primarily focusing on the statistical and programming language R. On the third day we will introduce more advanced data analysis methods with applications to genomic research (although any researcher with an interest in multi-dimensional data might also benefit from the materials covered on this day).

Participants should bring their laptops and plan to participate actively. By the end of the workshop learners should be able to more effectively manage and analyze data and be able to apply the tools and approaches directly to their ongoing research.

Who: The course is aimed at graduate students and other researchers in the life sciences who would like to learn good practices in data management and analysis. Students need not have any prior experience in computational research, but some familiarity with working with tabular data on a computer is welcome. We create a friendly environment for learning to empower researchers and enable data-driven discovery.

Where: Meyerhofstraße 1, 69117 Heidelberg, Germany. Get directions with OpenStreetMap or Google Maps.

Requirements: Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) on which they have administrative privileges. They should have a few specific software packages installed (listed below). They are also required to abide by Data Carpentry's Code of Conduct.

Contact: Please mail Toby Hodges for more information.


Please be sure to complete these surveys before and after the workshop.

Pre-workshop Survey

Post-workshop Survey

We will use this collaborative document for chatting, taking notes, and sharing URLs and bits of code.


Day 1

Data organization in spreadsheets (Hugo)

Data files for the lesson are available here.

Introduction to R (Florian)

Recap of basic R intro

Data manipulation in R (Hugo)

Day 2

Data manipulation in R continued (Hugo)

Data visualisation in R (Florian)

Interacting with databases in R

Day 3


The software required for this Data Carpentry workshop will be installed on the computers at our training venue.

However, if you want to use your own laptop, you will need working copies of the software described in the setup instructions. Please install everything (or at least download the installers) before the start of the workshop. Using your own laptop ensures that your tools are set up properly for an efficient workflow once you leave the workshop.

Please follow these Setup Instructions.

We maintain a list of common issues that occur during installation on the Configuration Problems and Solutions wiki page. It is written as a reference for instructors, but participants may also find it useful.