Reproducible builds *without* patching tool chains

Foxboron · on May 16, 2019

Ok. Let's stop the efforts into patching toolchains. We only need cloudseal to make sure software is reproducible.

Here is the catch: how do we bootstrap cloudseal in a reproducible fashion? We don't have cloudseal to do so. So either we can't reproduce cloudseal, because we have neglected to patch the toolchains it quite possibly depends on, or we can reproduce cloudseal! But that is most likely because of the reproducible builds effort.

I would like to evaluate the solution it proposes, but the lack of any code backing this effort and the sparse technical details make that somewhat hard. There is no information on whether the product is open source, either.

>in our initial experiments we’ve achieved 100% reproducibility for over ten thousand unmodified Debian packages

Yes. Debian is currently doing this on all their packages, of which 96% are reproducible. How was this test done? What packages did they pick? Unreproducible ones, reproducible ones?
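
For context, the basic check behind a reproducibility figure like this is to build the same source twice and compare the resulting artifacts bit for bit. A minimal sketch in Python, assuming the two build outputs are already on disk; the function names `sha256_of` and `is_reproducible` are made up for illustration and are not part of any Debian or cloudseal tooling:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def is_reproducible(first_build: Path, second_build: Path) -> bool:
    """Two independent builds count as reproducible only if they are bit-identical."""
    return sha256_of(first_build) == sha256_of(second_build)
```

Which packages go into such a comparison, and how much the two build environments are varied between runs, is exactly what determines how meaningful the resulting percentage is.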

A lot of questions in general.

rrnewton · on May 16, 2019

Thanks for these points and questions!

First, we'd love it if you would try our early container prototype on your build. Contact us and we'll add you to our closed alpha test. Also, we can send you the full paper (preprint) on the Debian case study, with the gory details.

I would never argue against patching tool chains to make them deterministic. If I were the maintainer I would always accept these patches! (And the GHC community we're part of has made some efforts on that front.)

You raise an interesting question regarding bootstrapping which I believe is similar to the good old Ken Thompson "Trusting Trust" attack: https://www.archive.ece.cmu.edu/~ganger/712.fall02/papers/p7...

Well, there's always some trusted code base (even with Coq), but I would argue that it would be harder for us to execute the trusting-trust attack in a container than in a compiler like Ken Thompson's original. The compiler can inject malware when it recognizes that it's compiling itself, but we run arbitrary build systems, compiler tool chains, JITs. On top of that, one can build the container itself without ever using the container, so we wouldn't have an opportunity to inject.

Any containers or VMs that we use in binary form need to be trusted for the same reasons. And Debian Reproducible Builds still has the original trusting-trust approach to contend with as well: i.e. they could bootstrap with a previously-corrupted gcc binary. The DRB website lists a Ph.D. dissertation on improving this situation, but I haven't read it, and I don't think it's been implemented in DRB: https://dwheeler.com/trusting-trust/

Foxboron · on May 16, 2019

>First, we'd love it if you would try our early container prototype on your build. Contact us and we'll add you to our closed alpha test.

If the code is available; sure. If it's not then sadly this isn't interesting.

>Also, we can send you the full paper (preprint) on the Debian case study, with the gory details.

This would be more interesting, as I'm doing a master's thesis on reproducible builds and rebuilders for Debian at the moment, along with contributing to the effort in general. Feel free to send a copy to morten@linderud.pw

>The compiler can inject malware when it recognizes that it's compiling itself, but we run arbitrary build systems, compiler tool chains, JITs.

The compiler can inject any backdoor into any code as long as it's invoked. We are not strictly limited to it compiling itself. I'm unsure how appealing to the diversity of the things you compile is an argument. A lot of things utilize `ld`, `gcc` or `tar` somewhere in their build chain regardless of what they build. Diversity in what you build doesn't make it any harder to backdoor something.

Here is a neat example of what is possible, using the Rust compiler: https://manishearth.github.io/blog/2016/12/02/reflections-on...

>The DRB website lists a Ph.D. dissertation on improving this situation, but I haven't read it, and I don't think it's been implemented in DRB

David's PhD hasn't really been implemented to any large extent in the real world, besides smaller toy examples, as it would need a second trusted compiler to verify against. This is hard work, and I don't see anyone actually wanting to implement a second compiler for C++17 (as an example) to help verify the gcc output they are getting.
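
To make the "second trusted compiler" requirement concrete, here is a toy model of the diverse double-compiling idea from that dissertation. It is only a sketch under heavy assumptions: compilers are modelled as plain Python values rather than real binaries, and every name in it (`Binary`, `run_compiler`, `ddc_check`, the source strings) is invented for illustration.

```python
import hashlib
from dataclasses import dataclass, replace

COMPILER_SOURCE = "source of the compiler under test"  # hypothetical stand-in
TRUSTED_SOURCE = "source of an unrelated, trusted compiler"  # hypothetical stand-in


def digest(source: str) -> str:
    return hashlib.sha256(source.encode()).hexdigest()


@dataclass(frozen=True)
class Binary:
    behaves_like: str     # digest of the source whose semantics this binary implements
    codegen_of: str       # digest of the source of the compiler that emitted it
    trojan: bool = False  # hidden behaviour that appears in no source


def run_compiler(compiler: Binary, source: str) -> Binary:
    """Model running a compiler binary on some source."""
    out = Binary(behaves_like=digest(source), codegen_of=compiler.behaves_like)
    # A trojaned compiler re-inserts the trojan when it compiles itself.
    # A real attack could just as well target any other input.
    if compiler.trojan and source == COMPILER_SOURCE:
        out = replace(out, trojan=True)
    return out


def ddc_check(suspect: Binary, trusted: Binary, source: str) -> bool:
    """Rebuild the compiler via the trusted compiler and compare with the suspect."""
    stage1 = run_compiler(trusted, source)  # same semantics, different code generation
    stage2 = run_compiler(stage1, source)   # should be bit-identical to an honest binary
    return stage2 == suspect


honest = Binary(digest(COMPILER_SOURCE), digest(COMPILER_SOURCE))
trojaned = replace(honest, trojan=True)
trusted = Binary(digest(TRUSTED_SOURCE), codegen_of="irrelevant")
```

In this model the trojaned compiler rebuilds itself perfectly (`run_compiler(trojaned, COMPILER_SOURCE) == trojaned`), so a self-rebuild proves nothing, while `ddc_check(trojaned, trusted, COMPILER_SOURCE)` fails and `ddc_check(honest, trusted, COMPILER_SOURCE)` passes. The whole check rests on `trusted` existing and being able to compile the source, which is the expensive part.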