Hacker News new | past | comments | ask | show | jobs | submit login
Towards Idempotent Rebuilds? (josefsson.org)
59 points by JNRowe 9 days ago | hide | past | favorite | 68 comments

I maintain the reproducible builds effort for my company and, please, let me tell you that this is the main pitfall of the whole effort.

There is always going to be a degree of un-reproducibility just due to the nature of math. If you don't have the same system, same compiler version (down to the minor or patch level), same dependency versions, same build flags, filesystem ordering, OS handling etc. . .you're going to get differences.

The RB project has readily disclosed that there is a degree of "significantly reproducible" sussing that each end user is going to have to do. The fact that the Debian maintainers chose not to display the degree of reproducibility is probably because showing low reproducibility scores undermines the efforts to evangelize the movement.

I think that's understandable, but also is a bit of a two edged sword. If we don't disclose scores, we allow for the misrepresentation that "this is safe because it has the word reproducible in it". If we disclose scores, we get articles like this saying "wow, thats a really low score, wtf" and short lived paranoia gives way to ambivalence about the whole thing.

It's difficult to capture the nuance in this in pithy tidbits, hence blog post on HN with me explaining this :).

The reproducible builds effort means same input (code, libraries, compiler, flags, build-environment), should give the same output (build artifacts). I don't think RB is trying to get reproducible across platforms and across varying compilers/dependencies -- that is a much bigger and much harder goal.

> There is always going to be a degree of un-reproducibility just due to the nature of math.

Fundamentally the math is deterministic (reproducible): if you do the same sequence of mathematical operations you get the same result. I gather you're getting at the non-associativity of floating point (eg additions), which is a fair point, but if you arrange to do your floating point operations in the same sequence then it will be reproducible.

Multithreaded code can easily be non-deterministic, with things like OS preemption leaking into the sandbox. Guaranteed determinism is hard, it needs a deterministic emulator.

Not really for building code, at least. It's a common problem, but it's not insurmountable. Use the equivalent of 'make -j1' or whatever...

This requires engineering on the part of compiler writers, but ultimately is solvable, given enough funding.

(About the compiler writers part: I'm thinking about GHC Haskell and the like... the Clangs/GCCs of the world will have no problems because of Translation Units. Things may change when Modules are a practical choice.)

You're gonna need a better programming language if you want to get to that level.

For these sorts of things, please do not assume hardware is reliable and compilers are bug free.

Bug free is less important for reproducible builds, so long as the bugs are reproducible. Compiler bugs are of course still a problem for other reasons.

My point is only that even when the software is correct, the reproducible build rate will either be 100% or we've got a representative sample set.

Until then, those issues will continue to be lumped together with all the software reasons it wasn't reproducible.

Sounds like a mechanism to test for bad hardware (akin to memtest86) and find compiler bugs =)

In a slightly different form, this has already happened: https://julialang.org/blog/2020/09/rr-memory-magic/

Bingo - even if we do everything right the percentage of matching builds isn't expected to be 100%. If it is, there just aren't enough samples.

My first Pentium could usually but not always math correctly until Intel replaced it during a recall.

From my experience, you either go full reproducible builds with nix, or none at all.

Sitting in the middle results with additional downsides from modifying pipeline without core upsides of reproducible builds.

Since hardware is the most difficult piece to replicate exactly for a reproducible build, should we say that for a complete reproducible environment we need an abstract virtual machine?

So we need our toolchains to run in WASM, too. Haha, only serious.

For whatever it is worth, Google internal builds using the internal version of bazel are deterministic and reproducible. And google spends a lot of time and effort keeping them that way. You do have to ensure that nothing ever sorts based on pointer value, for example.

Clang works fine as a compiler for this--there is nothing in it that normally produces different results due to timing or whatever. When something does leak in, we fix it upstream. You do have to ensure that no one uses __DATE__ or similar macros, or that you redefine them to a known value on the command line.

> (...) rebuilding packages using a bootsrappable build process, both seem orthogonal to the idempotent rebuild problem

You know what would be awesome? If someone could start from, let's say, live-bootstrap[1] and build towards matching the checksums for some distro kernel+toolchain.

It sounds like the same kind of problem, it all comes down to knowing what build conditions affect the resulting binaries, so I think you nailed the problem description on this and yes, it all feels very orthogonal from that perspective!

Thanks for writing this blog entry!

[1]: http://github.com/fosslinux/live-bootstrap

Coming from maths, I am confused by use of the term "idempotent" here. Unless we are talking about bootstrapping a compiler and I do not see how it applicable here. Am I missing something?

They are definitely redefining "idempotent" here, even from a software development standpoint.

Since we already have "deterministic build" and "reproducible build" for this, there's no need to overload "idempotent" (which already causes confusion).

Agree. "Reproducible" is the right term here.

An idempotent build is when you type "make" twice and the second one doesn't do anything.

This stuff matters. You can't trust open source supply chains any more. There have been too many incidents of someone inserting a security hole.

Idempotent sounds confusing to me as an idempotent HTTP request (not very mathy, I know) I could run many times over, the effect being as if it were done once on the first time.

For a software build it sounds like running `make` multiple times. But those builds will be idempotent even without reproducibility/determinism as they happen on the same system.

But isn't that just deterministic behavior?

"Multiple runs of the same compiler on the same code produced the same output"

No more revolutionary a concept than a hash function. Valuable like a hash function, but not revolutionary.

Well, a second call to "make" with unmodified source on a local system has the same resulting output because make knows how to avoid rebuilding output files when the source hasn't changed since the last build.

The goal of reproducible/deterministic builds is typically to be able to produce the same output files without already having that output. (That's because the people interested in reproducible/deterministic builds are usually trying to prove something about the relationship between the source and the build output. Make doesn't really do that -- the only thing it can prove is that the mod times of output files are later than the mod times of the source files they depend on, according to the make file.)

So, yes, make is typically deterministic in a general sense (given a good makefile and certain assumptions that are pretty reasonable in the context of a developer doing development on a local machine). But isn't what people are looking for from reproducible/deterministic builds.

I assumed idempotent here was going to be about some trick of incremental compilation. Not reproducible builds.

A lot of things in computer science that are conceptually functions actually aren’t, in this context when one compiles e.g. a C program there is a bunch of state, some pretty counter-intuitive, that the resulting object files or executable artifacts to change even though the source code did not. This has serious implication for relying on computer systems.

When a more systems-inclined person in computer science uses the word idempotent they very often mean something like “repeatable” or “deterministic”.

AFIAK this usage first gained traction in the theory of distributed systems, in which it (roughly) means that one application of an operation might change the state of the system, but subsequent applications will not change it further. Set union is sort of the canonical example.

I feel that "deterministic" is probably a better word here than "idempotent".

Not speaking to your comment specifically, but adding context to the thread: idempotence would mean having the same result, regardless of whether you run something once, twice, or 10 times, over the same input set. Idempotence requires but goes beyond determinism, as it also accounts for any externalities and side effects.

For example, let’s consider a function that accepts a string as an argument and then writes that string to disk. We can consider the disk state as a side effect of the function.

The function itself is perfectly deterministic (output string is a predictable and consistent function of input string), but depending on the implementation of side effects it may not be idempotent. If, for example, this function room simply added the output to a file “output.txt”, this file would grow with every incantation, which is not idempotent. If instead we overwrote the output file so that it reflects only the singular output of the previous run, then the side effects would also be deterministic, that would be idempotent.

At a pedantic level you could redefine your scope of deterministic to not just include outputs, but also include the external state and side effects, but for practical purposes the above distinction is generally how deterministic and idempotent would be applied in practice in computing. I cannot speak to the math-centric view, if there is a different definition there.

This captures the mathematical definition too which is just that an element x is idempotent if x applied to itself gives you back x. I.e, what you said that the function applied many times produces no change to the system.

I don't think the string to disk example can be idempotent no matter how you implement it as it is dependent on the external state of the disk.

That may be true on a theoretical level, but if you’re talking practically to a data engineer that’s the definition of idempotence you’re going to find they expect.

Practical consideration might be that a disk may experience a fault in one of the executions: works fine a hundred times but fails on 101st (eg hits disk-full error or a bad block).

But that simply means it's harder to implement true idempotency when it comes to disk usage.

This is why the problem is usually simplified to ignore unhappy paths.

I fear y'all (or I) may be dancing a bit too deeply into hypothetical here...

The idempotent version of this function doesn't blindly write. Most of Ansible, for example, is Python code doing 'state machines' or whatever - checking if changes are needed and facilitating if so.

Where y'all assume one function is, actually, an entire script/library. Perhaps I'm fooled by party tricks?

Beyond all of this, the disk full example is nebulous. On Linux (at least)... if you open an existing file for writing, the space it is using is reserved. Writing the same data back out should always be possible... just kind of silly.

It brings about the risk of corruption in transit; connectors faulty or what-have-we. Physical failure like this isn't something we can expect software to handle/resolve IMO. Wargames and gremlins combined!

To truly tie all this together I think we have to consider atomicity

> Writing the same data back out should always be possible...

Depends on the implementation: maybe you open a temp file and then mv it into place after you are done writing (for increased atomicity)?

But as I already said, in practice we ignore these externalities because it makes the problem a lot harder for minor improvements — not because this isn't something software can "handle/resolve".

Maybe even hermetic builds? https://bazel.build/basics/hermeticity

To give a concrete example: some build systems embed a "build number" in the output that increments for each non-clean build (yeah this is stupid but I have seen it).

This is deterministic (doesn't change randomly), but not idempotent.

We already have isomorphic web apps, some day we might see a bundler advertised as "Calabi-Yau with benefits of mirror symmetry and vanishing first Chern class".

A second countable language basis.

Idempotent, deterministic builds are an argument in favor of synthetic virtual machines. Synthetic VMs like the JVM or CLR should, at least insofar as they don't contain native code, execute in a manner largely isolated from the vagaries of minor hardware/OS differences. Not an expert, but native VMs do not and cannot isolate processes from hardware details (e.g. Xen, Virtual Box), or from OS details (e.g. Docker, containerd).

This is most true of software VMs too, which leak information via side channels.

Men will do anything except go to therapy ^H^H^H^H^H^H^H^H^H^H learn Nix.

It's odd that the article didn't mention nix, at least to say "We didn't like it for this reason".

Nix's definition of "reproducible" is different from what the author says they want though. Nix gives you a similar system from a given config, when it works, where the author wants bitwise identical results (even when compiled at a much later point in time and on a different machine, which you can't accomplish just from caching (unless the caches are remotely hosted)).

The reason they don't like Nix should be obvious; it blatantly and vocally doesn't even try to do the thing the authors want. It'd be just as relevant to bring up Gentoo or something -- another fine project, like Nix, but not especially helpful for the stated problem, and so obviously so that it shouldn't be worth paying lip service to in the article.

"Doesn't even try" is too strong.

"When compiling from the same source on independent infrastructure yields bit-by-bit identical results, this gives confidence that the build infrastructure was not compromised and the artifact really does correspond to the source." - https://reproducible.nixos.org/

That's fair. Attempting to rationalize my choice of language:

- Their homepage defines reproducibility to be something other than bitwise identical results.

- Getting actual bitwise reproducible builds is still hard for most large projects, even with the work Nix has done (note that the quote you pulled doesn't actually say that Nix _does_ provide such builds, and the rest of that linked text just tries to highlight some of the tools you have at your disposal to achieve that).

They do "try" insofar as they're aware of the desire for bitwise identical results, provide them in some cases, and provide tools to diagnose problems. They're also 20 years old and more than happy to call the current results reproducible. At the very least, it doesn't look like one of the top properties for the project.

I thought nix did perfect bit-for-bit reproductions at any time on any machine so long as you used flakes to lock the versions of inputs (and also I gather that flakes by default lock out some remaining sources of nondeterminism like reading environment vars). Is that not correct? (I don't see flakes mentioned in that link, nor the linked lobsters discussion, and the only mention in HN comments of the past was someone arguing that it doesn't count because some packages still fail to reproduce, which is notable but would still leave it in at least the same ballpark as ex. Debian)

> Is that not correct?

It's not. That would be comparable to magic.

Just writing to agree, since you are getting pushback.

@Foxboron wrote a compelling if clickbaity post a few months back to poke at the bit-for-bit interpretation of what reproducibility means in nixland: https://linderud.dev/blog/nixos-is-not-reproducible/

It didn't get much traction here (https://news.ycombinator.com/from?site=linderud.dev) but there was more lively discussion on lobsters: https://lobste.rs/s/jpoy4q/nixos_is_not_reproducible

(That said, the nix ecosystem does have levers and practices for working to weed out common sources of build reproducibility problems and the community is reasonably interested/active here. But there isn't any magic in it that fixes all bit-for-bit issues. People still have to catch, tag, and fix them.)

>Nix gives you a similar system from a given config, when it works

Not going to lie, I either don't understand what you're saying, or don't understand the shade you're trying to throw.

Nix is going to do as good, or better a job, than any other solution other there. Probably with a lot less duck-tape.

Not to mention that much of Nix packages are reliably bit-for-bit reproducible.

Not to not to mention that bit-for-bit reproducability is really only beneficial for "trust", not reliability or re-playability.

And no, comparing Nix to Gentoo in this case only very much confirms you don't understand Nix. Maybe you should check out the talk from a past NixCon where the speaker rebuilds Firefox from 10 years ago and boots up Flash swfs.

> shade

There is none. Nix does lots of great things, like that Firefox demo you mentioned. TFA wants bitwise identical builds, and Nix isn't great at that in general, by their own admission.

> various praises for Nix

Yep, Nix is great.

> downplaying TFA's wants

That's worth writing somewhere probably, but not when arguing that Nix solves their problem. If their problem isn't real then potato peelers solve it just as well. Nix isn't really relevant at that point.

> Nix compared to Gentoo

The commonality is that neither solve TFA's problem, and neither try very hard to do so.

> Nix is going to do as good, or better a job, than any other solution other there.

Bazel does this sort of thing remarkably well. Internal to google, it provides a multi-billion line monorepo with deterministic, highly-concurrent, bit-wise identical builds.

I'm not as familiar with Nix, but knowing what it takes for Bazel to do it, I would be surprised if Nix is as good.

> Probably with a lot less duck-tape.

And that is probably true.

> Not to mention that much of Nix packages are reliably bit-for-bit reproducible.

We don't know that. Nobody has even attempted to build more then an incredibly small fraction of the packages `nixpkgs` provides.

Considering the article confuses idempotence for reproducibility... Nix was probably beyond them.

I spent some time trying to prove two go binaries were the same in the name of reproducible builds but couldn’t figure out if it was possible, even though I had built both myself and knew they were in effect the exact same. Go binaries have some sort of randomness (timestamp? Map entry? No idea) that I couldn’t pin down. Sometimes the hash of the binaries were the same and sometimes they weren’t. Short of cataloguing and hashing every file that went into the build I couldn’t figure it out and gave up.

It is possible to make Go builds reproducible. The first step might be to inspect two binaries with the diffoscope (https://diffoscope.org).

I was expecting to see the word "yocto", did I miss it?

This goes quite far along the path, building all the build tools and toolchain to the same version before building the packages.

Idempotent is a somewhat confusing word choice here. "Verifiable builds" seems more a accurate description of what they want. (See also https://go.dev/blog/rebuild.)

As I know, LLVM optimizer (maybe also GCC) is not idempotent, does it mean we need to pay performance for idempotent buildings?

Shouldn't same system+same version+same flags produce the same result?

The algorithms used might not produce a deterministic result (like order is different), and maybe it also depends on timings of the CPU with threads scheduled on different cores, or faster/slower cores, etc.

Deterministic builds aren't always that straightforward

All C and C++ compilers in common use are single-threaded, and parallelism is achieved by compiling multiple files at once. An optimizer pass producing nondeterministic results due to a bug is not impossible, but would be a bug to fix. Multithreaded linkers which are expected to have nondeterministic results do exist, and you have to not use that for reproduce able builds.

There's a long tail of tricky problems to solve for reproducible builds, but a specific build of a compiler producing different executable code from run to run hasn't really been one of them.

Now I'm wondering how differently my multithreaded tcc fork produces code. I don't think it should but it always wouldn't surprise me if there was some shared state remaining.

Forcing the ordering to be deterministic is a well-established part of the process



I'm using nix and clang/llvm - never have I ever got a different binary.

If what you're saying is true - stage 3 bootstrap wouldn't be possible.

bit obvious, but all it takes something that takes something that changes everytime to make it non-reproducible - like __DATE__ and __TIME__

Nix solves this by forcing __DATE__ and __TIME__ to be a single immutable value in the sandbox for all builds. If you look in the store the file timestamps will be this value.

Google takes great care to ensure that the LLVM optimizer produces bit-identical results for identical inputs across plenty of machines. If the llvm optimizer produced non-bit-identical results for identical inputs (all sources and command-line flags), Google would consider that a high priority bug.

So I'm reasonably sure it does produce bit-identical results.

Compilation performance or end executable performance?

The latter I mean, conservative optimization level is benefit to keep idempotency


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact