
How to make your software build reproducibly [pdf] - walterbell
https://reproducible.alioth.debian.org/presentations/2015-08-13-CCCamp15-outline.pdf
======
middleclick
Reproducible builds are _amazing_. Just think about it: you are assured that
the binary you are using corresponds to the source code. It is one of those
ideas that makes you wonder why it wasn't done before.

~~~
throwaway7767
The reason it was not done before is that we did not have a solution to the
"trusting trust" issue until quite recently. David A. Wheeler solved the
problem with Diverse Double-Compiling (DDC) and wrote his 2009 PhD
dissertation on it:
[http://www.dwheeler.com/trusting-trust/](http://www.dwheeler.com/trusting-trust/)

Once that was solved, reproducible builds became a lot more interesting as we
no longer needed to trust the toolchain. As a result, all of the reproducible
build projects mentioned in the PDF (debian, tor, bitcoin being the pioneers
AFAIK) got started.

~~~
wyldfire
As a practical matter, how do you go about getting the trusted compiler for
DDC? Does this include an audit of the source for that compiler and/or the
path that the compiler source or compiler binaries take to their destination?

~~~
throwaway7767
> As a practical matter, how do you go about getting the trusted compiler for
> DDC? Does this include an audit of the source for that compiler and/or the
> path that the compiler source or compiler binaries take to their
> destination?

No, the important bit to realise about DDC is that you don't need a compiler
you know to be trusted (otherwise, of course, there would be no point). You
take two different compilers, and you just have to trust that they don't both
contain the same backdoor.

The most concise description I've seen is in Bruce Schneier's writeup. Quoting
him:

> Suppose we have two completely independent compilers: A and T. More
> specifically, we have source code SA of compiler A, and executable code EA
> and ET. We want to determine if the binary of compiler A -- EA -- contains
> this trusting trust attack.

> Step 1: Compile SA with EA, yielding new executable X.

> Step 2: Compile SA with ET, yielding new executable Y.

> Since X and Y were generated by two different compilers, they should have
> different binary code but be functionally equivalent. So far, so good. Now:

> Step 3: Compile SA with X, yielding new executable V.

> Step 4: Compile SA with Y, yielding new executable W.

> Since X and Y are functionally equivalent, V and W should be bit-for-bit
> equivalent.
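The four steps above can be sketched as a toy simulation. This is not real
compiler tooling: it models a "compiler binary" as a pair of (semantics,
bytes), where the emitted bytes depend on the compiling binary's semantics
(its codegen style) and the source being compiled. All names and the hashing
scheme are illustrative assumptions, not Wheeler's actual procedure:

```python
import hashlib

def h(*parts: str) -> str:
    """Stand-in for 'the bytes a compiler emits': a hash of its inputs."""
    return hashlib.sha256("|".join(parts).encode()).hexdigest()[:12]

class Binary:
    """Toy compiler binary: `bytes_` identify the executable,
    `semantics` describe how it translates source code."""
    def __init__(self, semantics: str, bytes_: str):
        self.semantics, self.bytes_ = semantics, bytes_

    def compile(self, source: str) -> "Binary":
        # An honest compiler emits a binary implementing exactly `source`;
        # a trusting-trust compiler silently re-inserts its backdoor.
        out_sem = source + "+backdoor" if "backdoor" in self.semantics else source
        # Emitted bytes depend on the compiling binary's semantics
        # (different codegen) and on what is being produced.
        return Binary(out_sem, h(self.semantics, out_sem))

SA = "source-of-A"
EA = Binary("source-of-A", h("bootstrap", "A"))   # compiler A's own binary
ET = Binary("source-of-T", h("bootstrap", "T"))   # independent compiler T

X = EA.compile(SA)                 # step 1
Y = ET.compile(SA)                 # step 2
assert X.bytes_ != Y.bytes_        # different codegen, different bytes
V = X.compile(SA)                  # step 3
W = Y.compile(SA)                  # step 4
assert V.bytes_ == W.bytes_        # clean case: bit-for-bit identical

# Now the attack: EA's binary carries a self-propagating backdoor.
EA_bad = Binary("source-of-A+backdoor", h("bootstrap", "A-bad"))
Vb = EA_bad.compile(SA).compile(SA)
assert Vb.bytes_ != W.bytes_       # DDC detects the divergence
```

The last assertion is the whole point: the backdoored lineage cannot
reproduce the bytes that the independent compiler's lineage produces.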

~~~
wyldfire
The first I'd heard of DDC was here, so I might misunderstand some parts of
the concept. But Wheeler seems to state it pretty plainly in this summary:

> In the DDC technique, source code is compiled twice: once with a second
> (trusted) compiler (using the source code of the compiler’s parent), and
> then the compiler source code is compiled using the result of the first
> compilation. If the result is bit-for-bit identical with the untrusted
> executable, then the source code accurately represents the executable.

I'd speculate that you need at least one of the compilers to be trusted;
otherwise they could both be subverted to produce X and Y that would yield
matching V and W binaries with the attack still in place in both.

~~~
throwaway7767
> I'd speculate that you need at least one of the compilers to be trusted
> otherwise they could both be subverted to produce X, Y that would produce
> matching V, W binaries with the attack still in place on both.

Ideally one of them is not compromised. But even if both are compromised, as
long as they carry different backdoors, the resulting binaries will not be
bit-for-bit identical (since they include different backdoors). So you can
still detect the compromise, and that's really all DDC can do.

The only way to get a bit-for-bit identical binary from two compromised
compilers is if they both have the same backdoor.

------
ilurk
> A more radical extension to the former approach is to actually check
> everything in your version control system. Everything as in the source of
> every single tool. That’s how it’s working when you are “building the world”
> on BSD-like systems. That’s also how Google is doing it internally. To make
> absolutely sure that everything is checked-in, you can even use “sandboxing”
> mechanisms

> Google recently started open-sourcing the tool they use internally to drive
> such large scale builds under the name Bazel. So despite its syntax that I
> personally find hard to read, it’s probably worth checking out.

Can anyone comment on the differences between Bazel and Debian's reproducible
builds?

Is the "Google is doing it internally" the same as Bazel? (they are eating
their dog food [1])

If it is then, does Bazel allow building from a reference VM or docker instead
of building every tool and library from source?

What I'd like to know is what is worth adopting if you're in a small company
(not an AmaGooFaceTwi).

[1]
[https://plus.google.com/+RipRowan/posts/eVeouesvaVX](https://plus.google.com/+RipRowan/posts/eVeouesvaVX)

~~~
walterbell
This discussion of Bazel, by a Google Blaze user, identifies four issues for
adoption of Bazel by OSS projects:
[http://julipedia.meroh.net/2015/04/on-bazel-and-open-source.html](http://julipedia.meroh.net/2015/04/on-bazel-and-open-source.html)

  1. Cross-project dependency tracking
  2. Software autoconfiguration
  3. It's not only about the build
  4. The Java blocker
~~~
jschwartzi
What do you have to do to get Bazel to "build" a target that its designers
have never encountered? The biggest problem I have with modern build systems
is that they invest a lot in building specific types of artifacts. For
instance, you might be able to compile and link a C program in three or four
lines. In exchange, they fail miserably to account for situations like
statically linking raw binary data into an executable or facilitating the
automation of a target image build. So it's usually easy to build a desktop
application but not much else.

How easy would it be to automate a complete Android image build in Bazel?

------
voltagex_
See also
[http://media.ccc.de/browse/conferences/camp2015/camp2015-665...](http://media.ccc.de/browse/conferences/camp2015/camp2015-6657-how_to_make_your_software_build_reproducibly.html)

~~~
agumonkey
Amazing, I forgot to watch it. The security issues mentioned are ...
impressive.

PS: the first minute seems silent, but it's not a bug.

------
chocolait
Another take on reproducible builds by Ted Unangst:
[http://www.tedunangst.com/flak/post/reproducible-builds-are-a-waste-of-time](http://www.tedunangst.com/flak/post/reproducible-builds-are-a-waste-of-time)

------
devit
This is really important work, hopefully it is completed as soon as possible.

It basically allows multiple reputable people to independently build the code
and sign a statement that a given source code hash produces a given binary
hash (and that the source code hash is the correct one for a given package),
and have the presence of at least a certain amount of these signatures be
verified on package installation.
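The verification scheme described above can be sketched as follows. This is a
hypothetical model, not Debian's actual tooling: each builder publishes an
attestation mapping a source hash to the binary hash it produced, and a
package is accepted only if enough independent builders agree (signature
checking is elided here):

```python
import hashlib

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Hypothetical attestations from independent builders. In a real system
# each entry would be cryptographically signed by its builder.
source = b"int main(void) { return 0; }"
attestations = [
    {"builder": "alice", "source": sha256(source), "binary": "aa11"},
    {"builder": "bob",   "source": sha256(source), "binary": "aa11"},
    {"builder": "carol", "source": sha256(source), "binary": "aa11"},
]

def verify(binary_hash: str, source_hash: str, atts, threshold: int = 2) -> bool:
    """Accept the binary only if at least `threshold` distinct builders
    attest that this source hash reproducibly yields this binary hash."""
    builders = {a["builder"] for a in atts
                if a["source"] == source_hash and a["binary"] == binary_hash}
    return len(builders) >= threshold

assert verify("aa11", sha256(source), attestations)       # enough agreement
assert not verify("ff00", sha256(source), attestations)   # tampered binary
```

Reproducibility is what makes the threshold meaningful: only if every honest
builder gets bit-for-bit identical output can their attestations agree.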

This means it will no longer be possible to backdoor Debian by compromising
the main build infrastructure alone, without also changing the source code.

------
lamby
If you have any specific questions, swing by #debian-reproducible on OFTC.

~~~
voltagex_
I did, thanks for your help - I think I'm one linker script away from a
reproducible hello world.

