Container Technology for Reproducible Research
An introduction
Course description |
Why should you want to learn about Container Technology?
Sharing data and code alone is not sufficient to make research reproducible. Even when the same code or software is used for an analysis, differences in the computing environment can lead to different outcomes. To reproduce a computational experiment or analysis, the full stack of data, software, and runtime environment must therefore be replicated. Container Technology can help solve this problem. Containers package and isolate an application together with its entire runtime environment: all the files it needs to run (e.g., software, binary code, libraries, and configuration files). This makes it easy to move the containerized application between computing environments (e.g., a company data center, a public cloud server, or your own laptop) while retaining full functionality. Because containers are based on open-source technology, improvements become available to you as soon as they are released. Examples in which Container Technology can be useful:
This workshop provides the tools and techniques needed to enhance reproducibility, efficiency, and collaboration in your research through the use of container technology. You will gain the practical skills and knowledge needed to use containers for reproducible research and reusable data science methods.

The workshop begins with an introduction to container technology, in which you will gain a solid understanding of how containers work and their role in ensuring consistent and reproducible computational environments.

Next, we will discuss the out-of-the-box containerized environments currently available for data analysis. You will discover existing tools, frameworks, and platforms that leverage containerization to enable efficient and reproducible data analysis workflows. We will discuss real-world use cases and examples that demonstrate the value of containerization in addressing challenges related to reproducibility and portability.

Building on this foundation, we will turn to the practical side of building containers. You will learn step-by-step techniques for creating containers tailored to your research needs. Through hands-on exercises, you will gain experience in containerizing your environments and data analysis workflows, making them easy to share and reproduce.

Finally, we will look at containerized, reusable data science methods. We will discuss strategies for using container technology to create modular components that can be shared, reused, and extended by the research community. By embracing containerization, you will maximize the impact and efficiency of your data science work.

Prerequisites
* No command line experience is needed, but make sure you can connect to Hábrók via a terminal (not the web portal). For technical support, contact Venustiano Soancatl Aguilar (v.soancatl.aguilar@rug.nl).
Be sure to join, as this workshop is being offered only once as a pilot and will be repeated only if there is sufficient interest!
|
Course objectives |
By the end of the workshop, you will have a solid understanding of
|
ECTS |
0.2 |