
Reproducible-builds – Provide a verifiable path from source code to binary - lelf
https://reproducible-builds.org/
======
anjbe
Reproducible builds can have some unexpected benefits. For example, a package
build machine could track compiled files that have changed since it compiled
an older version of the software, and store them at the start of the archive
when zipping the installable files into a package. Then when the user’s package
manager updates to the newer package, it only has to download enough of it to
extract all the files that have changed between versions, thus saving time and
bandwidth. Software that builds reproducibly will have fewer gratuitous
changes (dates, etc.), thus making this process work better.
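
To make the ordering idea concrete, here is a minimal Python sketch. It assumes each package version is available in memory as a mapping from file path to contents; `archive_order` and `digest` are hypothetical names, not OpenBSD's or PC-BSD's actual code. Files whose content hash differs from the previous version go first, which is exactly where gratuitous changes such as embedded dates hurt: they would make every file look "changed".

```python
import hashlib
from typing import Dict, List


def digest(data: bytes) -> str:
    """Content hash used to decide whether a file changed between versions."""
    return hashlib.sha256(data).hexdigest()


def archive_order(old: Dict[str, bytes], new: Dict[str, bytes]) -> List[str]:
    """Order the new package's files so changed or added files come first.

    A client that already has the old version can stop downloading once it
    has the archive prefix holding these files; unchanged files come after.
    """
    old_hashes = {name: digest(data) for name, data in old.items()}
    # A file counts as changed if it is new or its content hash differs.
    changed = {name for name, data in new.items() if old_hashes.get(name) != digest(data)}
    unchanged = set(new) - changed
    # Sort within each group so the order itself is deterministic.
    return sorted(changed) + sorted(unchanged)
```

With `old = {"bin/app": b"v1", "lib/x.so": b"lib"}` and `new = {"bin/app": b"v2", "lib/x.so": b"lib", "etc/new.conf": b"cfg"}`, the order is `["bin/app", "etc/new.conf", "lib/x.so"]`.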

In fact, OpenBSD’s and PC-BSD’s package managers do this. It’s briefly touched
on in the last slide of this talk:
[http://www.openbsd.org/papers/eurobsdcon2015-packages.pdf](http://www.openbsd.org/papers/eurobsdcon2015-packages.pdf)

~~~
lamby
Mm. Reproducible builds have a really wide variety of technical advantages,
including implicitly removing non-deterministic or unsafe behaviour (such as
downloading third-party code from the internet), detecting corrupted build
environments, reducing time-to-detection of a build host compromise,
validating cross-built packages, the potential space/time savings you mention,
as well as numerous other debugging and testing advantages.

In other words, even if you are an _attacker_, you want reproducible builds.
:)

------
LukeShu
I hadn't seen this spec before: [https://reproducible-builds.org/specs/source-date-epoch/](https://reproducible-builds.org/specs/source-date-epoch/)

Parabola's efforts toward reproducibility have essentially consisted of discarding
timestamps by forcing them all to Jan 1, 1990 (a simple date, with a wide gap
between it and the Unix epoch):
[https://projects.parabola.nu/packages/libretools.git/tree/sr...](https://projects.parabola.nu/packages/libretools.git/tree/src/librefetch/librefetch#n325)

Perhaps I should add support for this to Parabola's toolchain.
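
A minimal Python sketch of the "force every timestamp to one date" approach, assuming the build output lives in a directory tree; `normalize_mtimes` and `FIXED_MTIME` are hypothetical names, not taken from librefetch. 1990-01-01T00:00:00Z is 631152000 seconds after the Unix epoch.

```python
import os
from datetime import datetime, timezone
from pathlib import Path

# Fixed timestamp: 1990-01-01T00:00:00Z (hypothetical constant name).
FIXED_MTIME = int(datetime(1990, 1, 1, tzinfo=timezone.utc).timestamp())  # 631152000


def normalize_mtimes(root: str) -> None:
    """Set access and modification times of everything under root to FIXED_MTIME."""
    for path in sorted(Path(root).rglob("*")):
        # Skip symlinks: os.utime would otherwise follow them to their targets.
        if path.is_symlink():
            continue
        os.utime(path, (FIXED_MTIME, FIXED_MTIME))
```

Changing a child's times does not touch its parent directory's mtime, so a single pass over the tree is enough.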

~~~
mapreri
SOURCE_DATE_EPOCH is useful in those cases where, for one reason or another, you
can't remove timestamps. Initially we wanted to remove everything, but we also
noticed it's easier to convince people/upstream to accept support for
SOURCE_DATE_EPOCH than to remove the timestamp entirely.
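
A minimal Python sketch of how a build tool might honour SOURCE_DATE_EPOCH, based on a reading of the spec linked above: when the variable is set, its integer value (seconds since the Unix epoch) replaces the current time; otherwise the tool falls back to the clock. The function name `build_timestamp` is hypothetical, and rejecting malformed values with an error is a design choice assumed here.

```python
import os
import time
from datetime import datetime, timezone
from typing import Mapping, Optional


def build_timestamp(env: Optional[Mapping[str, str]] = None) -> datetime:
    """Return the UTC time to embed in build output.

    Uses SOURCE_DATE_EPOCH (integer seconds since the Unix epoch) when set,
    otherwise falls back to the current wall-clock time.
    """
    env = os.environ if env is None else env
    raw = env.get("SOURCE_DATE_EPOCH")
    if raw is None:
        seconds = int(time.time())
    else:
        # Reject malformed values instead of silently guessing a date.
        if not (raw.isascii() and raw.isdigit()):
            raise ValueError(f"invalid SOURCE_DATE_EPOCH: {raw!r}")
        seconds = int(raw)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```

In practice the variable is often derived from the source itself, for example the last commit time via `git log -1 --format=%ct`, so every rebuild of the same commit embeds the same date.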

~~~
LukeShu
The `tar` format requires timestamps; so if the tarball products are to be
reproducible, then you need to stick something there.
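
Since the format insists on a timestamp, the usual trick is to write a fixed one (for example SOURCE_DATE_EPOCH) along with a fixed member order and owner. A minimal sketch using Python's standard `tarfile` module; `deterministic_tar` is a hypothetical helper, and the files are passed in memory for simplicity.

```python
import io
import tarfile
from typing import Mapping


def deterministic_tar(files: Mapping[str, bytes], mtime: int) -> bytes:
    """Build an uncompressed tar whose bytes depend only on names, contents and mtime."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        for name in sorted(files):  # fixed member order
            data = files[name]
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = mtime  # the timestamp the format requires, pinned
            info.mode = 0o644  # fixed permissions
            info.uid = info.gid = 0  # fixed owner, independent of the build user
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

The same contents in a different insertion order produce byte-identical archives. If the tarball is gzipped, the gzip header carries its own timestamp too; Python's `gzip.GzipFile` accepts an `mtime` argument for pinning it.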

------
lindig
What about the opposite direction: create builds from the same source that use
different memory layouts to make the life of attackers harder? I think
diversity could create resilience, too. Obviously it makes verification
harder.

~~~
potatosareok
There are various research projects out there that, at the compiler level, create
different output binaries from the same source code to accomplish this. I'm
not really following it; I just remember a professor at my undergrad was
recruiting people for his lab on it, and I'm pretty sure his project was not
the only one.
[https://ssllab.org/trac/wiki/MultiCompilerPublic](https://ssllab.org/trac/wiki/MultiCompilerPublic)

edit: search for "compiler randomization" or "compile-time randomization" to
find other similar projects.
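
One way to get diversity without giving up verification, offered here as an assumption rather than a description of how the linked multicompiler works: derive the randomization from a seed and record the seed. In this toy Python sketch, a list of function names stands in for a compiler's code-layout pass; `randomized_layout` is hypothetical.

```python
import random
from typing import List, Sequence


def randomized_layout(functions: Sequence[str], seed: int) -> List[str]:
    """Return a per-build ordering of functions derived from a seed.

    Different seeds give different layouts (diversity); the same source
    plus the same seed gives the same layout again (reproducibility).
    """
    layout = list(functions)
    # A private Random instance keeps the result independent of global state.
    random.Random(seed).shuffle(layout)
    return layout
```

Verifiers who know a binary's seed can rebuild it and compare, while binaries built with different seeds still differ in layout.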

------
w_t_payne
Reproducible builds are really rather important -- and open up some other
possibilities too...

------
Navarr
weak security cert :\

~~~
finnn
Are you talking about their TLS certificate? SSL Labs[0] gives them an A-, and
gives their certificate 100%.

[0]: [https://www.ssllabs.com/ssltest/analyze.html?d=reproducible-...](https://www.ssllabs.com/ssltest/analyze.html?d=reproducible-builds.org)

~~~
Navarr
Oh, I probably have an old (bad) StartCom certificate cached on my machine, then.

[http://i.imgur.com/diXXwnQ.png](http://i.imgur.com/diXXwnQ.png)

------
dschiptsov
What is wrong (non-reproducible) with

    gcc a.c

or any other compiler?

~~~
gizmo686
See
[https://wiki.debian.org/ReproducibleBuilds/Howto](https://wiki.debian.org/ReproducibleBuilds/Howto)

You would probably need to pass a `-frandom-seed` option and make sure
your environment is the same, but I would imagine you can get that to give you
deterministic output fairly easily.

Most of the difficulty comes in the packaging, where you need to make sure
that files enter the archive with the same timestamps/permissions, and in the
same order.
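
Those three packaging details (timestamps, permissions and order) can be pinned explicitly. A minimal Python sketch using the standard `zipfile` module; ZIP is chosen only for illustration, and `deterministic_zip` is a hypothetical helper.

```python
import io
import zipfile
from typing import Mapping

FIXED_DATE = (1980, 1, 1, 0, 0, 0)  # earliest timestamp the ZIP format can store


def deterministic_zip(files: Mapping[str, bytes]) -> bytes:
    """Build a ZIP archive whose bytes depend only on file names and contents."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(files):  # same order every time
            info = zipfile.ZipInfo(name, date_time=FIXED_DATE)  # same timestamp
            info.external_attr = 0o644 << 16  # same Unix permissions (rw-r--r--)
            info.create_system = 3  # record "Unix" regardless of the build host
            info.compress_type = zipfile.ZIP_DEFLATED
            zf.writestr(info, files[name])
    return buf.getvalue()
```

Feeding in the same files in a different order yields byte-identical output, because names are sorted and every entry gets the same date and mode.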

------
tritium
I'm not really sure how to take this recent and emerging narrative that
software source code compilation is actually an unreliable, unpredictable
activity.

I find it really strange that there's a sudden, intense focus on the premise
that, given the same source code, when providing it to different people in the
wild and asking them to try and compile it, and then having them share their
results, doing so reveals variations and disparities that are difficult to
explain or account for.

I know _why_ people are suddenly paying attention to this detail: paranoia.

I remember when I noticed this concept had first started to gain some
traction. Once people started trying to compile certain open-source crypto
applications, and found that their binaries weren't byte-for-byte replicas, it
raised the question: Am I actually encrypting my confidential data with a
surreptitiously hobbled program? Is there a covert plot to infiltrate and
sabotage these sorts of software projects? How would I know if the encryption
was weakened in a subtle way, when I don't have the resources to try and crack
it at levels that demand well-funded hardware?

But, I wonder, is this something to worry about everywhere?

Is this why so many projects are now taking on this additional activity?

Is there a panic and an anxiety driving this trend, or are we just dotting
our i's and crossing our t's with yet another set of best practices?

~~~
MaulingMonkey
We live in an era of viruses, rootkits, botnets, ransomware, adware, spyware,
and industrial sabotage. This is at the hands of script kiddies, political
activists, criminal networks, corporations, and every nation state with the
budget for it - or some close approximation thereof, it seems. SourceForge has
been breached. Debian builds have been breached.

Have you managed to dodge _all_ of the above? I haven't. I've had accounts on
at least 4 services which were breached at some point. I've seen viruses,
adware, and spyware. I've seen script kiddies in my communities, distributing
backdoored executables, gathering passwords, and getting banned. And this is
only what has failed to escape my notice.

It's not paranoia if they really are out to get you.

> But, I wonder, is this something to worry about everywhere?

Would it be useful for you to be able to detect when your build machine has
been compromised and starts producing malware-laden executables? Reproducible
builds are a tool that can potentially do that for you - just diff the
binaries.
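
"Just diff the binaries" can be as simple as comparing cryptographic hashes of the artifacts produced by independent builders. A minimal Python sketch, assuming each builder's output is available as bytes and one builder is treated as the reference; the builder names and `mismatched_builders` are hypothetical.

```python
import hashlib
from typing import List, Mapping


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of an artifact."""
    return hashlib.sha256(data).hexdigest()


def mismatched_builders(artifacts: Mapping[str, bytes], reference: str) -> List[str]:
    """Return the builders whose artifact differs from the reference builder's.

    With a reproducible build, honest builders produce identical bytes, so a
    mismatch flags either a compromised environment or leftover non-determinism.
    """
    expected = sha256_hex(artifacts[reference])
    return sorted(name for name, data in artifacts.items() if sha256_hex(data) != expected)
```

A mismatch does not prove compromise on its own: it can also mean some non-determinism is left in the build, which is why the work goes into making builds reproducible first.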

Whether or not it's worthwhile to set up depends on how much of a pain in the
ass making reproducible binaries is. Ongoing research into the subject is
interesting, if only because it directly correlates to making it less of a
pain in the ass when other people figure it out for you.

> Is there a panic and an anxiety driving this trend, or are we just dotting
> our i's and crossing our t's with yet another set of best practices?

I wouldn't describe such interest as panic or anxiety - but I wouldn't put it
down to merely getting in line with "best practices" either. It's an
interesting subject of research - and a potentially useful one at that.

