
There's definitely a big push in that direction in the scientific community right now. More and more tools and pipelines are getting distributed as containerized workflows.

Big projects have realized they need to publish and version their code just as they do their input data, side by side, with hashes recorded at every step and reproducibility made as simple as possible. Now we're starting to see it trickle down into smaller, less organized, less disciplined projects as well.




Containers as a technology are nice, but it's easy to fall into the same traps that make software non-reproducible even when you use them. You can precisely specify all your dependencies, but it often takes a lot of effort to make that happen.

I'm a fan of the approach Guix developers are taking for scientific computing, because it makes reproducible software simple enough for people to use without too many headaches: https://hpc.guix.info/blog/2019/10/towards-reproducible-jupy...


Isn't the whole idea behind containers to eliminate external dependencies?


In a sense. You're right that once a container is built, it has few external dependencies. But you need to get those dependencies from somewhere at build-time, and if you're not careful it's easy to do that in a way that makes it extremely difficult to rebuild that container in the future.

To use a slightly more concrete example: let's say you're using a library in your container that has a severe bug. This bug results in incorrect computations, so you would like to upgrade to a fixed version.

Now let's say that when you built that container initially, you installed packages in the Dockerfile by running e.g. "pip install <package>" with no version specified. The problem is that once this image is built, it's nontrivial to rebuild it with only that one library changed and ensure every other dependency is the same as it was the first time. In a sense, you've lost that information (though you can probably start to figure it out with close inspection of the image).
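To make the "close inspection" part a bit more concrete, here's a rough sketch of recovering the Python side of that information from inside an already-built image. It only uses the standard library's importlib.metadata; the function name freeze_environment is made up for illustration, and it says nothing about system packages installed through apt or similar.

```python
from importlib import metadata


def freeze_environment():
    """Return sorted 'name==version' lines for the installed distributions.

    Hypothetical helper: roughly what `pip freeze` reports, but limited to
    what importlib.metadata can see on the current sys.path.
    """
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip installs with incomplete metadata
            pins.add(f"{name}=={version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    # Run inside the container and save the output next to the Dockerfile.
    print("\n".join(freeze_environment()))
```

Running this in the old image gives you a list you can feed back to pip on a rebuild, assuming those exact versions are still downloadable.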

Yes, there are usually ways around this with the language-specific package managers; Node has package-lock.json, Python has Pipfile.lock, etc. But it's not even close to being the default.
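Since pinning isn't the default, one cheap guard is to fail the build when a requirements file contains anything that isn't an exact pin. A minimal sketch, assuming a plain requirements.txt-style format; unpinned_requirements and the regex are my own simplification, not pip's actual parser, so URL and editable installs aren't handled.

```python
import re

# Accepts 'name==1.2.3', optionally with extras such as 'name[extra]==1.2.3'.
# Wildcards like '==1.*' are rejected because they still float.
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]*\])?\s*==\s*[^\s;*]+(?=\s|;|$)")


def unpinned_requirements(text):
    """Return the requirement lines that do not pin an exact version."""
    loose = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            loose.append(line)
    return loose


if __name__ == "__main__":
    example = "numpy\nscipy==1.11.4\npandas>=2.0  # loose\n"
    print(unpinned_requirements(example))
```

Exact version pins still don't cover transitive dependencies or guarantee identical artifacts, which is what the lock files with hashes are for.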



