
Status update from the Reproducible Builds project - lamby
https://lists.debian.org/debian-devel-announce/2017/07/msg00004.html
======
dingdingdang
So happy someone is spending time on this issue; it's like a breath of fresh
air and intelligence in the midst of all the usual software
(security/privacy/etc., take your pick) mayhem. It's worth reading
[https://reproducible-builds.org/](https://reproducible-builds.org/) for a
brief reminder of why this project is important.

Excerpt: "With reproducible builds, multiple parties can redo this process
independently and ensure they all get exactly the same result. We can thus
gain confidence that a distributed binary code is indeed coming from a given
source code."

~~~
Kenji
Reproducible builds are extremely useful, and there are more benefits than
security. For example, suppose you have a build server compiling software
packages. If your builds are not reproducible and you want to debug a core
dump but have no debug information, you are out of luck (well, you could dive
into the assembly code, but that's inconvenient). If you want to keep debug
information, you have to store it for _every single build_ (what a waste of
storage...) because the binary for each build is different. Not so with
reproducible builds: you can simply check out the old version and recompile
it with debug information!
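
A minimal sketch of that workflow, assuming a reproducible toolchain (the
paths and the `objcopy` step are illustrative, not Debian's actual tooling):
rebuild the old source with debug info, strip it, and confirm it is
bit-identical to the binary that produced the core dump before trusting the
symbols.

```python
import hashlib, subprocess, sys

def sha256(path):
    """SHA-256 of a file, streamed in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

deployed = "app-v1.2.3"   # hypothetical: the binary that dumped core
rebuilt = "rebuild/app"   # hypothetical: same source revision rebuilt with -g

# Strip the debug info from the rebuild; with a reproducible build the
# stripped result should match the deployed binary bit for bit.
subprocess.run(["objcopy", "--strip-debug", rebuilt, rebuilt + ".stripped"],
               check=True)

if sha256(deployed) == sha256(rebuilt + ".stripped"):
    print("match: the symbols in", rebuilt, "are valid for the core dump")
else:
    sys.exit("mismatch: rebuild differs, don't trust these symbols")
```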

~~~
mschuster91
Most huge Debian packages carry a separate -dbg package, so you can get the
symbols without long recompilation times and without having to set up a
toolchain and the associated dev libraries of all the packages.

~~~
jakeogh
Similarly Gentoo has 'splitdebug':
[https://wiki.gentoo.org/wiki/Project:Quality_Assurance/Backt...](https://wiki.gentoo.org/wiki/Project:Quality_Assurance/Backtraces#Stripping)

------
adamb
Beyond providing security, reproducible builds also provide an important
ingredient for caching build artifacts (and thus accelerating build times)
across CI and developer machines. They can also form the basis of a much
simpler deploy-and-update pipeline, in which the exact source version deployed
matters less: a simple (recursive) binary diff can identify which components
of a system must be updated and which have not changed since the last deploy.
This means a simpler state machine with fewer edge cases that works more
quickly and reliably than the alternative.
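
For instance, a deploy step along these lines (a sketch; the directory names
are made up) can compute exactly which artifacts changed, because with
reproducible builds an unchanged component hashes the same on every rebuild:

```python
import hashlib
from pathlib import Path

def manifest(root):
    """Map each file's relative path to its SHA-256 digest."""
    out = {}
    for p in sorted(Path(root).rglob("*")):
        if p.is_file():
            out[str(p.relative_to(root))] = hashlib.sha256(p.read_bytes()).hexdigest()
    return out

# Hypothetical directories: the previous deploy and a fresh build.
old, new = manifest("deployed/"), manifest("build/")

changed = [f for f in new if old.get(f) != new[f]]
removed = [f for f in old if f not in new]
print(f"{len(changed)} files to push, {len(removed)} to delete;"
      f" {len(new) - len(changed)} unchanged thanks to reproducible output")
```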

I'm very grateful for the work that this project has done and continues to do.
Thank you!

------
seagreen
Amazing work. Thanks so much to everyone who's contributing. The upstream bugs
filed are especially appreciated since they make the whole Linux ecosystem
more solid, not just Debian.

~~~
jbergstroem
Not just Linux; FreeBSD and NetBSD have been along for the ride for a while:
[https://reproducible-builds.org/who/](https://reproducible-builds.org/who/)

~~~
cperciva
In some cases, a _very_ long while. I brought up the question of build
reproducibility at BSDCon 2003 because it was relevant to FreeBSD Update, and
a lot of my early FreeBSD commits were working on this.

~~~
jbergstroem
Yeah; I've seen both the work and the developer mindset around this for a
long time on BSD-centric mailing lists. I tried to keep it short here,
though, since the Debian developers have done a great job and I didn't want
to shift the focus [in this thread].

I think it's great that we have reached a point where packagers are shifting
their mindset from "it works" to "we can reproduce the results" in more than
one package manager.

------
cperciva
Does anyone know if they've made the Packages file (repository metadata file,
listing the packages in the repository) build reproducibly yet?

I tripped over this a couple weeks ago and was both amused and annoyed, since
it seemed that packages were being listed in the file in a random order. I'm
asking here because it might already be fixed; we're using a slightly old
version of the package/repository tools.

~~~
jwilk
Why do you care about the order of packages in Packages?

What does "build reproducibly" even mean in this context?

~~~
cperciva
_Why do you care about the order of packages in Packages?_

In my personal case: so that when I build a repository which has some new
packages and some old packages, and I look at the resulting pull request on
GitHub, I can see that the packages which haven't changed have indeed not
changed.

 _What does "build reproducibly" even mean in this context?_

Two repositories with the same packages have identical Packages files. Or for
me, slightly more generally, when the Packages file changes, it changes as
minimally as possible.
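
What this amounts to is emitting the stanzas deterministically. A sketch of
such a normalization in Python (not the actual apt repository tooling): sort
the stanzas of a Packages file by package name, so that regenerating the file
yields a minimal diff.

```python
def normalize_packages(text):
    """Sort the paragraph-separated stanzas of a Packages file by their
    Package: field, leaving each stanza's contents untouched."""
    stanzas = [s for s in text.split("\n\n") if s.strip()]

    def name(stanza):
        for line in stanza.splitlines():
            if line.startswith("Package:"):
                return line.split(":", 1)[1].strip()
        return ""   # malformed stanza sorts first

    return "\n\n".join(sorted(stanzas, key=name)) + "\n"

with open("Packages") as f:   # hypothetical local copy of the metadata file
    print(normalize_packages(f.read()), end="")
```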

------
phreack
In case anyone is not aware of what reproducibility is and why it's a worthy
goal, here's their statement:
[https://wiki.debian.org/ReproducibleBuilds/About](https://wiki.debian.org/ReproducibleBuilds/About)

------
pmoriarty
How does the kind of reproducibility spoken of here compare to that offered by
Guix and Nix?

~~~
arianvanp
Guix and Nix are input-reproducible: given the same input description (the
inputs being the source files and any dependencies), a build produces an
output, and builds are then looked up in a build cache based on the hash of
all the combined inputs. However, the _output_ of a Nix build is not
necessarily reproducible; running the same build twice may yield a different
result.
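
Roughly like this sketch (a big simplification of Nix's actual store-path
derivation; the recipe and dependency keys are made up): the cache key is a
hash over the build recipe and the keys of its inputs, so it can be computed
without running the build at all.

```python
import hashlib

def input_key(recipe, dep_keys):
    """Simplified input-addressing: the key depends only on how the build
    is described, never on the bits the build actually produces."""
    h = hashlib.sha256()
    h.update(recipe)
    for dep in sorted(dep_keys):   # order must not matter
        h.update(dep.encode())
    return h.hexdigest()

# Hypothetical recipe and dependency keys:
key = input_key(b"./configure && make install", ["1f2e...", "9ab0..."])
# A binary cache is queried with `key`; whatever artifact it returns is
# accepted because the cache's signature is trusted, not because the
# bits can be independently checked.
print(key)
```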

Nix does apply some tricks to improve output reproducibility, like building
things in sandboxes with a fixed time and using tarballs without modification
dates, but bit-by-bit reproducible output is not their goal. They also don't
have the manpower for it.

Currently, a build is produced by a trusted build server for which you have
the public key. You look up the build by input hash but have no way to check
that the thing the build server is serving is legit. It's fully based on
trust.

However, with Debian putting so much effort into reproducible output, Nix can
benefit too. In the future, we would like to get rid of the 'trust-based'
build servers and instead move to a consensus model: say, if three servers
give the same output hash for an input hash, then we trust that download and
avoid a compile from source. If you still don't trust it, you can build from
source yourself and check.
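
A sketch of that consensus rule in Python (the server names and the quorum of
three are just this example's assumptions):

```python
from collections import Counter

def agreed_output(reports, quorum=3):
    """reports maps build server -> the output hash it claims for one
    input hash. Return the hash only if >= quorum servers agree."""
    if not reports:
        return None
    winner, votes = Counter(reports.values()).most_common(1)[0]
    return winner if votes >= quorum else None

reports = {"builder-a": "f3c1...", "builder-b": "f3c1...", "builder-c": "f3c1..."}
out = agreed_output(reports)
if out is None:
    print("no consensus: fall back to compiling from source")
else:
    print("consensus on", out, "- fetch from any cache and verify the hash")
```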

Summary: Nix does not do bit-by-bit reproducibility, but we benefit greatly
from the work that Debian is doing. In the future we will look at setting up
infrastructure for build servers with an output-hash-based trust model instead
of an input-based one. However, this will take time.

Feel free to ask any other questions.

~~~
rekado
> output bit-by-bit reproducible is not their goal

I think you are wrong.

The Nix people (and the Guix people, including myself) are also involved in
the Reproducible Builds project; I met a couple of them in Berlin last year.
It's not just Debian doing this.

I can't speak for Nix but for the Guix project bit-for-bit reproducibility is
an explicitly stated goal. It's very important and the reason why Guix is used
in several HPC environments as the foundation for reproducible research.

Disclaimer: I'm co-maintainer of GNU Guix and currently at a bioinfo
conference where I talked about Guix and reproducibility.

~~~
StavrosK
Nix and Guix sound interesting. I run Ubuntu currently; what's the easiest way
to get started with one or the other? I hear Guix is more user-friendly, is
that so?

Do I need to install a whole other OS, or can I install Guix in Ubuntu?

~~~
rekado
You can use Guix as a package manager on top of practically any variant of the
GNU system. By design it is completely independent of the libraries that your
system provides.

At work I'm using the same binaries on a cluster with CentOS 6 and on
workstations running a mix of Ubuntu, Fedora, CentOS 7, etc.

GuixSD ("Guix System Distribution") is the variant of the GNU system where the
principles of Guix are extended to the operating system, but you don't have to
use it if all you want is to play with the package manager.

The easiest way to get started is to download "GNU Guix 0.13.0 Binary" for
your architecture and follow the instructions at
[http://www.gnu.org/software/guix/manual/html_node/Binary-Installation.html](http://www.gnu.org/software/guix/manual/html_node/Binary-Installation.html).

If you are into Lisp you'll feel right at home with extending Guix. If you
don't care for Lisp you might at least find the command line interface to be a
little easier to understand than that of Nix, but really: that's a personal
preference.

~~~
StavrosK
Oh, that looks pretty straightforward, thank you!

------
pen2l
What does "reproducibility" mean? I understand and appreciate the importance
of reproducibility in the context of scientific experiments, but I don't
understand what it means in terms of computer programs. I am guessing it has
to do with being able to build on different architectures without issue?

~~~
cesarb
In the context of "reproducible builds", it means that if you compile the same
source code with the same compiler and build system, the output will be
completely identical, bit by bit. This is surprisingly hard to achieve in
practice.

Once they have reproducible builds, they can easily prove that each binary
package was built from the corresponding source code package: just have a
third party compile the source code again and generate the binary package, and
it should be identical (except for the signature). This reduces the need to
trust that the build machines haven't been compromised.
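
The "surprisingly hard" part shows up as soon as you build twice and compare.
A sketch of that check (the `make` invocation and the BUILDDIR variable are
stand-ins for whatever the real build command is):

```python
import hashlib, os, subprocess

def tree_digest(outdir):
    """One digest over every file in a build tree, visited in a
    deterministic order so the comparison itself is stable."""
    h = hashlib.sha256()
    for root, _, files in sorted(os.walk(outdir)):
        for name in sorted(files):
            path = os.path.join(root, name)
            h.update(path.encode())
            with open(path, "rb") as f:
                h.update(hashlib.sha256(f.read()).digest())
    return h.hexdigest()

# Build the same source twice (BUILDDIR is a hypothetical Make variable):
for out in ("out1", "out2"):
    subprocess.run(["make", "clean", "all", f"BUILDDIR={out}"], check=True)

print("reproducible" if tree_digest("out1") == tree_digest("out2")
      else "NOT reproducible: diff the two trees to find the leak")
```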

~~~
kobeya
Just piggybacking on this comment: you can do a whole lot more than just trust
that a few people have audited. Most people on Ubuntu get non-distro packages
from Launchpad, for example, which uses its own build servers. With
reproducible builds you can require BOTH Launchpad's and the developer's
signature for a package to be valid, which tremendously improves the security
situation.
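
A sketch of that validity rule, with a toy stand-in for real signature
verification (actual tooling would use GPG; the signer names here are made up):

```python
import hashlib

def verify(signer, digest, sig):
    """Toy stand-in for GPG verification: a 'signature' here is just the
    signer's name bound to the digest, to keep the sketch runnable."""
    return sig == f"{signer}:{digest}"

def package_valid(data, sigs):
    """Accept a package only if BOTH the build farm and the upstream
    developer signed the same bits. Reproducible builds are what let the
    developer re-create and sign the farm's exact output."""
    digest = hashlib.sha256(data).hexdigest()
    return all(verify(who, digest, sigs.get(who, ""))
               for who in ("launchpad-builder", "upstream-developer"))

pkg = b"package contents"
d = hashlib.sha256(pkg).hexdigest()
sigs = {"launchpad-builder": f"launchpad-builder:{d}",
        "upstream-developer": f"upstream-developer:{d}"}
print(package_valid(pkg, sigs))   # True only when both signatures check out
```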

------
morecoffee
Once we have reproducible builds, will it be possible to have verifiable
builds? As in, can we cryptographically show that source + compiler = binary?

Right now we can sign source code and we can sign binaries, but we can't show
that a given source produced a given binary. I would feel much happier about
installing code if I knew it came from a particular source or author.
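
Reproducible builds supply exactly the missing step: once source + compiler
deterministically yields the binary, any party can recompute the claim and
countersign it. A sketch of such a provenance record (conceptually similar to
what projects like in-toto do; all fields here are hypothetical):

```python
import hashlib, json

def provenance(source_hash, compiler_hash, output_hash):
    """Canonical record binding 'this source, with this compiler, produced
    this binary'. The bytes returned are what each verifier would sign
    after independently rebuilding and re-hashing."""
    record = {"source": source_hash,
              "compiler": compiler_hash,
              "output": output_hash}
    return json.dumps(record, sort_keys=True).encode()

claim = provenance("sha256:aaaa", "sha256:bbbb", "sha256:cccc")  # made-up digests
print(hashlib.sha256(claim).hexdigest())   # the value a signature would cover
```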

~~~
nickpsecurity
Yes. The first standard for securing computer systems mandated some
protections against this. They were partly devised by Paul Karger, who
discovered the compiler subversion that Thompson wrote about a decade later.
Most people just focused on that one thing, whereas Karger et al. went on to
build systems that were secure from the ground up, some surviving NSA
pentesting and analysis for 2-5 years. Independently, people started building
verified compilers and whole stacks with CPUs. They were initially simple,
with a lot to trust, but got better over time. Recently, the two schools have
been merging more. Mainstream INFOSEC and IT just ignore it all, slowly
reinventing it piece by piece with knock-offs. It's hard, has a performance
hit, or is built in something other than C, so they don't do it. (shrugs)

Here's several examples:

VLISP for Scheme48, whose papers are here:
[https://en.wikipedia.org/wiki/PreScheme](https://en.wikipedia.org/wiki/PreScheme)

C0 compiler + whole stack correctness in Verisoft
[http://www.verisoft.de/VerisoftRepository.html](http://www.verisoft.de/VerisoftRepository.html)

CompCert Compiler for C [http://compcert.inria.fr/](http://compcert.inria.fr/)

CakeML Subset of Standard ML [https://cakeml.org/](https://cakeml.org/)

Rockwell-Collins doing crypto DSL compiled to verified CPU
[http://www.ccs.neu.edu/home/pete/acl206/slides/hardin.pdf](http://www.ccs.neu.edu/home/pete/acl206/slides/hardin.pdf)

Karger's original paper with the attack, from the 1970s:
[https://www.acsac.org/2002/papers/classic-
multics.pdf](https://www.acsac.org/2002/papers/classic-multics.pdf)

Myers' landmark work on subversion in high-assurance security from 1980:
[http://csrc.nist.gov/publications/history/myer80.pdf](http://csrc.nist.gov/publications/history/myer80.pdf)

The framework I developed while studying Karger, back when I was building
secure things:

[http://pastebin.com/y3PufJ0V](http://pastebin.com/y3PufJ0V)

~~~
z29LiTp5qUC30n
Well, you may want to mention stage0
([https://savannah.nongnu.org/projects/stage0/](https://savannah.nongnu.org/projects/stage0/)):
it starts with just a 280-byte hex monitor and builds up to a rather
impressive Lisp AND Forth, building tools such as a text editor and a line
macro assembler along the way.

~~~
nickpsecurity
rain1 started a page for people interested in bootstrapping or countering
Karger's attack. Several of us are putting up as many links as we can find to
small, human-understandable tools such as compilers. I added some formally
verified or otherwise justifiable ones (e.g. a 25-core CPU ain't gonna be
simple).

[https://bootstrapping.miraheze.org/wiki/Main_Page](https://bootstrapping.miraheze.org/wiki/Main_Page)

[https://web.archive.org/web/20170724144929/https://bootstrap...](https://web.archive.org/web/20170724144929/https://bootstrapping.miraheze.org/wiki/Main_Page)

We're focused on content right now over presentation. So, it will look rough.
Hope you all enjoy it or learn something from the projects.

------
gtt
How do they achieve reproducibility with Python and other languages that
embed timestamps and such?

~~~
anonacct37
[https://reproducible-builds.org/docs/timestamps/](https://reproducible-builds.org/docs/timestamps/)
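
The short version of that page is the SOURCE_DATE_EPOCH convention: tools
take "the" timestamp from the environment instead of from the clock. A sketch
of honoring it while writing a tarball (the file names are made up; note that
gzip compression would add its own header timestamp on top):

```python
import os, tarfile, time

# Convention documented at reproducible-builds.org: if SOURCE_DATE_EPOCH
# is set, use it in place of "now" for every timestamp in the artifact.
build_time = int(os.environ.get("SOURCE_DATE_EPOCH", time.time()))

with tarfile.open("dist.tar", "w") as tar:   # plain tar, no gzip header
    for name in sorted(["a.py", "b.py"]):    # deterministic member order
        info = tar.gettarinfo(name)
        info.mtime = build_time              # clamp file mtimes
        info.uid = info.gid = 0              # drop builder-specific ownership
        info.uname = info.gname = ""
        with open(name, "rb") as f:
            tar.addfile(info, f)
```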

------
mabbo
One (misguided) counterargument I've heard from otherwise fantastic devs is
the notion of adding randomness to unit tests in the hope that if there's a
bug, at least _some_ builds will fail. In practice, I've seen those builds,
and developers saying "yeah, sometimes you need to build it twice".

I think the solution is to give the devs who favor such techniques a separate
but easy-to-use fuzzing toolset that they can run just like their unit tests,
apart from their usual 'build' command, as in the sketch below. Give them
their ability to discover new bugs, but keep it separate from the real build.
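
One way to square that circle (a sketch; the FUZZ_SEED variable is an
assumption, not a standard): keep the randomized checks seeded and replayable,
and run them as their own step rather than inside the build:

```python
import os, random, unittest

# Seed from the environment if given, else pick one and print it, so a
# failing run can always be replayed exactly instead of "build it twice".
SEED = int(os.environ.get("FUZZ_SEED", "0")) or random.randrange(2**32)

class RandomizedChecks(unittest.TestCase):
    def test_sort_is_idempotent(self):
        rng = random.Random(SEED)
        data = [rng.randint(-100, 100) for _ in range(1000)]
        once = sorted(data)
        self.assertEqual(once, sorted(once))  # stand-in property under test

if __name__ == "__main__":
    print(f"replay with: FUZZ_SEED={SEED}")
    unittest.main()
```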

~~~
closeparen
Why would the randomness in unit tests affect the binary? RNGs are invoked
when the tests are run, not when they're built, and anyway, test code
shouldn't be part of the final binary.

~~~
davexunit
The test suite is used to validate the build. Intermittently failing test
suite -> intermittently failing build. You can always disable running a
problematic test suite, but that doesn't exactly inspire confidence in the
result.

------
Sir_Cmpwn
Compare this to Windows or OSX, where not only are you unable to build
packages yourself, but they are installed from downloads you find in disparate
places on the web, are not cryptographically signed by people you can trust,
and often include spyware anyway.

------
Cogito
Has anyone played with the tool they mentioned, diffoscope? It sounds
interesting, and I wonder how good it is at, for example, comparing Excel
files with VBA code, formulas, etc.

