
Container Technology for Reproducible Research

An introduction

Course description

Why should you want to learn about Container Technology?


Container Technology offers tools that strengthen your research workflows. By understanding containerization, you can ensure the reproducibility of your experiments, collaborate effectively with peers and supervisors, and, in the era of open science, easily share your research with a broad academic community. Moreover, as container technology continues to gain popularity across industries, this knowledge can be applied in academia and beyond, improving your career prospects and your adaptability in a fast-evolving technological landscape.

Sharing data and code alone is not sufficient for reproducible research. Even when the same code or software is used for an analysis, differences in the computing environment can lead to different outcomes. Therefore, to reproduce a (computational) experiment or analysis, the full stack of data, software, and runtime environment must be replicated.

Container Technology can help solve these issues. Containers allow applications to be packaged and isolated together with their entire runtime environment, that is, all of the files needed to run them (e.g., software, binary code, libraries, and configuration files). This makes it easy to move the containerized application between different computing environments (e.g., a company data center, a public cloud server, or your own laptop) while retaining full functionality. Because containers are based on open-source technology, new developments become available to everyone as soon as they are released.

Examples in which Container Technology can be useful:

  • Reproducing research environments so that experiments and analyses can be rerun easily, reducing issues related to software versions and compatibility
  • Quickly running algorithms across multiple platforms such as Linux, Windows, macOS, and High-Performance Computing (HPC) systems
  • Running algorithms (on HPC) that require more resources than are available on personal computers
  • Sharing fully reproducible results
  • Facilitating collaboration by providing a standardized way to share research workflows
  • Reproducing analyses of different kinds of data, such as sound, medical, and sequencing data


Workshop setup

This workshop will provide the tools and techniques needed to improve reproducibility, efficiency, and collaboration in your research through the use of container technology. You will gain the practical skills and knowledge needed to use containers for reproducible research and reusable data science methods.

The workshop will begin with an introduction to container technology, in which you will gain a solid understanding of how containers work and their role in ensuring consistent and reproducible computational environments. Next, we will discuss the containerized environments for data analysis that are currently available out of the box. You will discover existing tools, frameworks, and platforms that use containerization to enable efficient and reproducible data analysis workflows. We will discuss real-world use cases and examples that demonstrate the value of containerization in addressing challenges related to reproducibility and portability.

Building on this foundation, we will then turn to the practical side of building containers. You will learn step-by-step techniques to create your own containers tailored to your research needs. Through hands-on exercises, you will gain experience in containerizing your environments and data analysis workflows, making them easy to share and reproduce. Finally, we will look into containerized, reusable data science methods. We will discuss strategies for creating modular, containerized components that can be shared, reused, and extended by the research community, increasing the impact and efficiency of your data science work.

Prerequisites

  • Willingness to learn state-of-the-art container technology
  • A Hábrók account; request one in advance if you do not have one yet*
  • A Docker installation on your computer/laptop*
  • At least 20 GB of free disk space

* No command-line experience is needed, but make sure you can connect to Hábrók via a terminal (not the web portal). For technical support, contact Venustiano Soancatl Aguilar (v.soancatl.aguilar@rug.nl).

 

Be sure to join, as this workshop is being offered only once as a pilot and will be repeated only if there is sufficient interest!

Course objectives

By the end of the workshop, you will

  • have a solid understanding of container technology
  • be able to run containers on HPC systems and personal computers
  • be able to build your own containers
  • be able to share your containers
  • have the insight needed to integrate containerization into your data analysis workflows

 

ECTS

0.2
