
I've gone down the same path. I love deterministic builds, and I think Docker's biggest fault is that to the average developer a Dockerfile _looks_ deterministic - and it even is for a while (build a container twice in a row on the same machine => same output), but then packages get updated in the package manager, base images get updated w/ the same tag, and when you rebuild a month later you get something completely different. Do that times 40 (the number of containers my team manages) and now fixing containers is a significant part of your job.

So in theory Nix would be perfect. But it's not, because it's so different. Get a tool from a vendor => won't work on Nix. Get an error => impossible to quickly find a solution on the web.

Anyway, out of that frustration I've funded https://www.stablebuild.com. Deterministic builds w/ Docker, but with containers built on Ubuntu, Debian or Alpine. Currently consists of an immutable Docker Hub pull-through cache, full daily copies of the Ubuntu/Debian/Alpine package registries, full daily copies of most popular PPAs, daily copies of the PyPI index (we do a lot of ML), and arbitrary immutable file/URL cache.

So far it's been the best of both worlds in my day job: easy to write, easy to debug, wide software compatibility, and we've seen 0 issues due to non-determinism in the containers we moved over to StableBuild.




I think this issue is not specific to containers.

I've worked many years on bare metal. We did (by requirement) acceptance tests, so we needed deterministic builds before such a thing even had a name, or at least before it was mentioned as much as nowadays.

Red Hat has a lot of tooling around versioning of mirrors, channels, releases, updates, etc. But I'm so old that back then even Foreman and Spacewalk didn't exist, Red Hat Satellite was out of the budget, and the project was migrating from the first versions of CentOS to Debian.

What I did was simply use DNS + vhosts (dev, stage, prod + versions) for our own package mirrors, and bash + rsync (and of course, RAID + backups), with both CentOS and Debian (and our project packages).

So we had repos like prod/v1.1.0, stage/v1.1.0, dev/v1.1.0, dev/v2.0.0, dev/v2.0.1, etc., allowing us to rebuild things without praying, backport bug fixes with confidence, etc.

It feels old and simple, but I think it's the same problem people hit now when (re)building containers.

If you need to be able to produce the same output from the same input, you need the same input.

BTW about stablebuild: nice project!


But also Nix solves more problems than Docker. For example if you need to use different versions of software for different projects. Nix lets you pick and choose the software that is visible in your current environment without having to build a new Docker image for every combination, which leads to a combinatorial explosion of images and is not practical.

But I also agree with all the flaws of Nix people are pointing out here.


I don't have any experience with Nix, but regarding stable builds of Docker: we ship a Java application with all dependencies at fixed versions, so when doing a release, if no one is doing anything fishy (re-releasing a particular version, which is bad-bad-bad), you get exactly the same binaries on top of the same image (again, assuming you are not using `:latest` or somesuch)...
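For context, "fixed versions" here means exact, non-range, non-SNAPSHOT coordinates, e.g. in a Maven POM (the artifact and version below are just an illustration, not from the commenter's project):

```xml
<!-- Illustrative: an exact, repeatable version - no ranges, no -SNAPSHOT -->
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-databind</artifactId>
  <version>2.15.2</version>
</dependency>
```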


Until someone overwrites or deletes the Docker base image (regularly happens), or when you depend on some packages installed through apt - as you'll get the latest version (impossible to pin those).


I am convinced that any sort of free public service is fundamentally incompatible with long-term reproducible builds. It is simply unfair to expect a free service to maintain archives forever and never clean them up, rename itself, or go out of business.

If you want reproducibility, the first step is to copy everything to storage you control. Luckily, this is pretty cheap nowadays.


> Until someone overwrites or deletes the Docker base image (regularly happens)

Any source of that claim?

> or when you depend on some packages installed through apt - as you'll get the latest version (impossible to pin those).

Well... please re-read my previous comment - we do a Java thing, so we use some JDK base image and then we slap our distribution on top of it (which is mostly fixed-version jars).

Of course, if you are after perfection and require additional packages, then you can install them via dpkg or somesuch, but... do you really need that? What about the security implications?


> Any source of that claim?

Any tag like ubuntu:20.04 -> this tag gets overwritten every time there's a new release (which is very often)

https://hub.docker.com/r/nvidia/cuda -> these get removed (see e.g. https://stackoverflow.com/questions/73513439/on-what-conditi...)


You gave an example of nvidia, not ubuntu itself. What's more, you are referring to a devel(opment) version, i.e. "1.0-devel-ubuntu20.04", which seems like a nightly, so it's expected to be overridden (akin to "-SNAPSHOT" for java/maven)?

Besides, if you really need utmost stability you can use image digest instead of tag and you will always get exactly the same image...
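A digest pin looks like this in a Dockerfile (the digest below is a placeholder - resolve the real one with `docker images --digests` or from the Docker Hub page for the tag):

```dockerfile
# Pin by content digest instead of a mutable tag. <digest> is a placeholder;
# substitute the actual sha256 of the image you want to freeze.
FROM ubuntu@sha256:<digest>
```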


Do you have an example that isn't Nvidia? They're infamous for terrible Linux support, so an egregious disregard for tag etiquette is entirely unsurprising.


> Anyway, out of that frustration I've funded https://www.stablebuild.com. Deterministic builds w/ Docker, but with containers built on Ubuntu, Debian or Alpine.

Very nice project!


Another option for reproducible container images is https://github.com/reproducible-containers although you may need to cache package downloads yourself, depending on the distro you choose.


Yeah, very similar approach. We did this before, see e.g. https://www.stablebuild.com/blog/create-a-historic-ubuntu-pa... - but then figured everyone needs exactly the same packages cached, so why not set up a generic service for that.


For Debian, Ubuntu, and Arch Linux there are official snapshots available so you don't need to cache package downloads yourself. For example, https://snapshot.debian.org/.
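As a sketch, an apt setup pointed at snapshot.debian.org looks roughly like this (the timestamp is illustrative - pick a real one from the snapshot index; `check-valid-until=no` is needed because old Release files have expired):

```dockerfile
# Point apt at a fixed snapshot.debian.org timestamp so every build sees
# the exact same package versions (timestamp below is illustrative).
RUN echo 'deb [check-valid-until=no] https://snapshot.debian.org/archive/debian/20240101T000000Z/ bookworm main' \
      > /etc/apt/sources.list \
 && apt-get update
```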


Yes, fantastic work. Downside is that snapshot.debian.org is extremely slow, times out / errors out regularly - very annoying. See also e.g. https://github.com/spesmilo/electrum/issues/8496 for complaints (but it's pretty apparent once you integrate this in your builds).


Ubuntu now has snapshot.ubuntu.com, see https://ubuntu.com/blog/ubuntu-snapshots-on-azure-ensuring-p...
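The same idea works with Ubuntu's snapshot service; one way is to rewrite the archive URL in sources.list (timestamp illustrative, and the sed pattern assumes the stock `archive.ubuntu.com` entries):

```dockerfile
# Rewrite the default Ubuntu archive to a fixed snapshot timestamp
# (timestamp below is illustrative - pick one from snapshot.ubuntu.com).
RUN sed -i 's|http://archive.ubuntu.com/ubuntu|https://snapshot.ubuntu.com/ubuntu/20240101T000000Z|g' \
      /etc/apt/sources.list \
 && apt-get update
```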

Here's a related discussion about reproducible builds by the Docker people, where they provide some more details: https://github.com/docker-library/official-images/issues/160...


Just pin the dependencies and you're mostly fine, right?


Yeah, but it's impossible to properly pin w/o running your own mirrors. Anything you install via apt is unpinnable, as old versions get removed when a new version is released; pinning multi-arch Docker base images is impossible because you can only pin on a tag which is not immutable (pinning on hashes is architecture dependent); Docker base images might get deleted (e.g. nvidia-cuda base images); pinning Python dependencies, even with a tool like Poetry is impossible, because people delete packages / versions from PyPI (e.g. jaxlib 0.4.1 this week); GitHub repos get deleted; the list goes on. So you need to mirror every dependency.
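To make the apt point concrete: an exact pin only builds while the archive still carries that version (the package version below is a made-up placeholder):

```dockerfile
# An exact version pin like this builds today, but starts failing as soon as
# the archive replaces this version with a newer one (version is a placeholder).
RUN apt-get update \
 && apt-get install -y --no-install-recommends curl=7.88.1-10+deb12u5
```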


> Anything you install via apt is unpinnable, as old versions get removed when a new version is released

Huh, I have never had this issue with apt (Debian/Ubuntu) but frequently with apk/Alpine: The package's latest version this week gets deleted next week.


> apt is unpinnable, as old versions get removed

not necessarily, eg snapshot.debian.org

> pinning on hashes is architecture dependent

can't you pin the multi-arch manifest instead?

I still like StableBuild for protection against package deletion, and mirroring non-pinnable deps
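On the multi-arch point: the manifest-list digest sits one level above the per-arch image digests, so pinning it gives you a single digest that works across architectures. A sketch, assuming Docker with buildx is available (tag is illustrative):

```shell
# Inspect a multi-arch tag: the top-level "manifest list" digest covers all
# architectures, while each platform entry below it has its own image digest.
docker buildx imagetools inspect ubuntu:22.04

# Pin the manifest-list digest in a Dockerfile, and Docker resolves the
# correct per-arch image at build time:
#   FROM ubuntu@sha256:<manifest-list-digest>
```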


The pricing page for StableBuild says

Free …

Number of Users 1

Number of Users 15GB

Is that a mistake or if not can you explain please?

https://www.stablebuild.com/pricing


Ah, yes, on mobile it shows the wrong pricing table... Copying here while I get it fixed:

Free => Access to all functionality, 1 user, 15GB traffic/month, 1GB of storage for files/URLs. $0

Pro => Unlimited users, 500GB traffic included (overage fees apply), 1TB of storage included. $199/mo

Enterprise => Unlimited users, 2,000GB traffic included (overage fees apply), 3TB of storage included, SAML/SSO. $499/mo


Are you associated with the project?


I’m an investor in StableBuild.


What is an efficient process for avoiding versions with known vulnerabilities for long periods when using a tool like StableBuild?



