
Reproducible builds *without* patching tool chains - rrnewton
https://www.cloudseal.io/blog/2019-05-15-introduction-to-reproducible-builds
======
Foxboron
Ok. Let's stop the efforts into patching toolchains. We only need cloudseal to
make sure software is reproducible.

Here is the catch: how do we bootstrap cloudseal in a reproducible fashion? We
don't have cloudseal to do so. So either we can't reproduce cloudseal, because
we have neglected to patch the toolchains it quite possibly depends on, or
we can reproduce cloudseal! But that is most likely because of the
reproducible builds effort.

I would like to evaluate the solution it proposes, but the lack of any code
backing this effort and the sparse technical details make that somewhat hard.
There is also no information on whether the product is open source.

>in our initial experiments we’ve achieved 100% reproducibility for over ten
thousand unmodified Debian packages

Yes. Debian is currently doing this on _all_ their packages, of which 96% are
reproducible. How was this test done? Which packages did they pick?
Unreproducible ones, reproducible ones?

A lot of questions in general.

~~~
rrnewton
Thanks for these points and questions!

First, we'd love it if you would try our early container prototype on your
build. Contact us and we'll add you to our closed alpha test. Also, we can
send you the full paper (preprint) on the Debian case study, with the gory
details.

I would never argue _against_ patching tool chains to make them deterministic.
If I were the maintainer I would always accept these patches! (And the GHC
community we're part of has made some efforts on that front.)

You raise an interesting question regarding bootstrapping which I believe is
similar to the good old Ken Thompson "Trusting Trust" attack:
[https://www.archive.ece.cmu.edu/~ganger/712.fall02/papers/p7...](https://www.archive.ece.cmu.edu/~ganger/712.fall02/papers/p761-thompson.pdf)

Well, there's always _some_ trusted code base (even with Coq), but I would
argue that it would be harder for us to execute the trusting-trust attack in a
container than in a compiler like Ken Thompson's original. The compiler can
inject malware when it recognizes that it's compiling itself, but we run
arbitrary build systems, compiler tool chains, JITs. On top of that, one can
build the container itself without ever _using_ the container, so we wouldn't
have an opportunity to inject.

Any containers or VMs that we use in binary form need to be trusted for the
same reasons. And Debian Reproducible Builds still has the original trusting-
trust approach to contend with as well: i.e. they could bootstrap with a
previously-corrupted gcc binary. The DRB website lists a Ph.D. dissertation on
improving this situation, but I haven't read it, and I don't think it's been
implemented in DRB:
[https://dwheeler.com/trusting-trust/](https://dwheeler.com/trusting-trust/)

~~~
Foxboron
>First, we'd love it if you would try our early container prototype on your
build. Contact us and we'll add you to our closed alpha test.

If the code is available, sure. If it's not, then sadly this isn't interesting.

>Also, we can send you the full paper (preprint) on the Debian case study,
with the gory details.

This would be more interesting, as I'm writing a master's thesis on
reproducible builds and rebuilders for Debian at the moment, along with
contributing to the effort in general. Feel free to send a copy to
morten@linderud.pw

>The compiler can inject malware when it recognizes that it's compiling itself,
but we run arbitrary build systems, compiler tool chains, JITs.

The compiler can inject any backdoor into any code as long as it's invoked. It
is not strictly limited to the case where it compiles itself. I'm unsure how
appealing to the diversity of the things you compile is an argument. A lot of
things use `ld`, `gcc` or `tar` somewhere in their build chain regardless of
what they build. Backdooring something is not going to be any harder just
because the builds are diverse.

This is a neat example of what is possible with the Rust compiler:
[https://manishearth.github.io/blog/2016/12/02/reflections-on...](https://manishearth.github.io/blog/2016/12/02/reflections-on-rusting-trust/)
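
To make the point about "any code" concrete, here is a toy sketch in Python. It is not taken from the linked post or from any real compiler: `honest_compile`, `malicious_compile`, `LOGIN_SOURCE` and the "letmein" password are all made up for illustration, and `exec` stands in for a real compilation step. The source being built is clean in both cases; only the tool differs.

```python
# Clean source for a hypothetical login check. There is no backdoor here.
LOGIN_SOURCE = '''
def check_password(user, password):
    return password == "correct horse"
'''


def honest_compile(source: str) -> dict:
    """Toy 'compiler': turn source text into a namespace of callables."""
    namespace = {}
    exec(source, namespace)
    return namespace


def malicious_compile(source: str) -> dict:
    """Same as honest_compile, but tampers with any program that
    defines check_password, not only with the compiler itself."""
    namespace = honest_compile(source)
    original = namespace.get("check_password")
    if original is not None:
        def backdoored(user, password):
            # Injected behaviour that never appears in the source text.
            return password == "letmein" or original(user, password)
        namespace["check_password"] = backdoored
    return namespace


honest_build = honest_compile(LOGIN_SOURCE)
tampered_build = malicious_compile(LOGIN_SOURCE)
```

Reviewing `LOGIN_SOURCE` would find nothing wrong, yet `tampered_build["check_password"]` accepts a password the source never mentions. The trigger only needs the tool to be invoked somewhere in the build chain.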

>The DRB website lists a Ph.D. dissertation on improving this situation, but I
haven't read it, and I don't think it's been implemented in DRB

David's PhD hasn't really been implemented to any large extent in the real
world, besides smaller toy examples, as it would need a second trusted compiler
to verify against. This is hard work, and I don't see anyone actually wanting
to implement a second compiler for C++18 (as an example) to help verify the gcc
output they are getting.
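
For readers unfamiliar with the idea, the check from that dissertation (diverse double-compiling) can be modelled roughly as below. This is a toy model under loud assumptions, not the dissertation's tooling: `Binary`, `run_compiler`, `ddc_check` and the behaviour labels are invented here, builds are assumed deterministic, and the trusted compiler is assumed not to carry the same backdoor.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the source of the compiler under test.
COMPILER_SOURCE = "source of the compiler under test"


@dataclass(frozen=True)
class Binary:
    codegen: str    # which compiler's code generator emitted these bytes
    behaviour: str  # what the binary does when executed


def run_compiler(compiler: Binary, source: str) -> Binary:
    """Model executing a compiler binary on some source text."""
    if not compiler.behaviour.endswith("compiler"):
        raise ValueError("binary is not a compiler")
    # Output style depends on what the compiler *is*, not on what built it.
    codegen = "T" if compiler.behaviour == "trusted-compiler" else "A"
    if source != COMPILER_SOURCE:
        return Binary(codegen, "program")
    # A backdoored compiler re-inserts its backdoor when building the compiler.
    if compiler.behaviour == "backdoored-compiler":
        return Binary(codegen, "backdoored-compiler")
    return Binary(codegen, "honest-compiler")


def ddc_check(trusted: Binary, suspect: Binary) -> bool:
    """Rebuild the compiler twice, starting from the trusted compiler,
    and compare the result bit-for-bit with the suspect binary."""
    stage1 = run_compiler(trusted, COMPILER_SOURCE)  # differs in codegen only
    stage2 = run_compiler(stage1, COMPILER_SOURCE)   # should match an honest build
    return stage2 == suspect


trusted = Binary("vendor", "trusted-compiler")
honest = Binary("A", "honest-compiler")
backdoored = Binary("A", "backdoored-compiler")
```

In this model `ddc_check(trusted, honest)` passes and `ddc_check(trusted, backdoored)` fails. The expensive part in practice is the `trusted` argument: an independent compiler that can actually build the source in question.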

