Posted on 2020-12-07, 22:00. Authored by Eric Burgueno.
Reproducible research is a fundamental goal of scientific computing, and containers are tremendously helpful for achieving it. Ensuring the portability of an analysis environment is essential for the peer-review process, even when data gravity continues to be challenging.
It is no secret that software packagers face the daunting task of dealing with dependency and integration hell. Yet creators of scientific software continue to miss the memo, leaving the problem of software distribution unsolved or at the whims of a sysadmin who must often "hack the code" to deploy it in their HPC environment. Container technologies such as Docker and Singularity can definitely help, but they come with their own set of limitations, which are often unknown or poorly understood.
In this talk I will share what I learned from publishing the containers we created for powerPlant, our in-house HPC cluster. I will cover some of the strategies for keeping container recipes as small as possible, how to deal with raw data, how we integrate with Environment Modules, and in general how to make life easier for Data Scientists so that technology does not get in the way of Science.
ABOUT THE AUTHOR(S)
Name: Eric Burgueño
Bio: Hello there! I am an IT professional specialising in GNU/Linux and Open Source. I also have a Law degree, but computers are my true passion, so I wrangle a bunch of them in the HPC Services Team at Plant & Food Research.
I have approximate knowledge of many things. I am a science enthusiast and an aspiring polyglot (in both human and computer lingos). I use Oxford commas and indent my code with spaces. When I take a break from being a geek, I enjoy discovering the world and other cultures, particularly if there's food, wine, or beer involved.