
You don’t need reproducible builds - ingve
http://blog.cmpxchg8b.com/2020/07/you-dont-need-reproducible-builds.html
======
Diggsey
There are a lot of reasons to prefer reproducible builds, and many of them are
not security related... It seems a bit presumptuous to argue that no one needs
reproducible builds because one particular security argument is flawed.

First, a non-flawed security argument: it only takes one non-malicious person
to build a package from source and find that it doesn't match the distributed
binary to spot a problem. Sure, if you don't compile the binaries yourself,
you might not find out until later that a binary was compromised, but that's
still better than never finding out. The reality is that most people don't
want to spend time building all their packages from source...
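That "one honest rebuilder" check is just a hash comparison. A toy sketch in Python, with byte strings standing in for real binaries:

```python
import hashlib

def sha256(data: bytes) -> str:
    """Hash an artifact so independent builders can compare results."""
    return hashlib.sha256(data).hexdigest()

def verify(distributed_binary: bytes, independent_rebuild: bytes) -> bool:
    # With reproducible builds, one honest rebuilder's output must be
    # byte-identical to the vendor's published binary.
    return sha256(distributed_binary) == sha256(independent_rebuild)

vendor_binary = b"\x7fELF...legit"     # what the mirror serves
my_rebuild = b"\x7fELF...legit"        # same source, same pinned toolchain
tampered = b"\x7fELF...backdoored"     # what a compromised mirror might serve

assert verify(vendor_binary, my_rebuild)
assert not verify(tampered, my_rebuild)
```

Everyone else only needs to compare the published hash, not rebuild anything.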

More generally, reproducible builds make build artifacts a pure function of
their inputs. There are countless reasons why this might be desirable.

\- If a binary is lost, it can be rebuilt _exactly_ as it was. You only need
to ensure the source is preserved.

\- If a particular version of the code is tested, and the binary is not a pure
function of the code, then you haven't really tested the binary. Bugs could
still be introduced that were not caught during testing because your build is
non-deterministic.

\- It provides a foundation for your entire OS image to be built
deterministically.

\- If you use a build cache, intermediate artifacts can be cached more easily,
and use less space. For example, changing the code from A -> B -> A will
result in two distinct artifacts instead of three.
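The A -> B -> A point falls out of content-addressed caching; a minimal sketch, where the `build` function is a stand-in for a deterministic compiler:

```python
import hashlib

def build(source: bytes) -> bytes:
    # stand-in for a deterministic compiler: output is a pure function of input
    return b"obj:" + hashlib.sha256(source).digest()

cache: dict[str, bytes] = {}

def cached_build(source: bytes) -> bytes:
    # key the cache by a hash of the inputs; identical inputs hit the cache
    key = hashlib.sha256(source).hexdigest()
    if key not in cache:
        cache[key] = build(source)
    return cache[key]

cached_build(b"version A")
cached_build(b"version B")
cached_build(b"version A")  # cache hit: A -> B -> A stores only two artifacts
assert len(cache) == 2
```

With a non-deterministic build, the third call could produce a third distinct artifact even though the source reverted.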

~~~
arcticbull
> \- If a particular version of the code is tested, and the binary is not a
> pure function of the code, then you haven't really tested the binary. Bugs
> could still be introduced that were not caught during testing because your
> build is non-deterministic.

This is a weaker argument IMO because when building for test, generally, all
optimizations are disabled, debug info is emitted, symbols are un-stripped,
and so on. The unit under test is usually very different from the shipped
artifact even at the module level. Not least because the test functions are
compiled in.

~~~
wildmanx
If that's really what you are doing, then you are doing it horribly wrong.

~~~
arcticbull
You test prod builds?

~~~
joshuamorton
As part of the release process, yes...absolutely.

Compared to basically every other part of release qualification (manual QA,
canarying, etc.) re-testing on the prod build is so unbelievably cheap there's
no reason to not.

~~~
arcticbull
I suppose we're referring to different kinds of testing. Manual QA, etc, on
prod sure.

But if you're building client software artifacts, unit testing or integration
testing involves building different software, in a different configuration,
with different tooling, and running it in a test harness. To facilitate unit
testing or integration testing of client software you:

\- Build with a lower optimization level (-O0 usually) so that the generated
code bears even a passing resemblance to what you actually wrote and your
debugger can follow along.

\- Generate debug info.

\- Avoid stripping symbols.

\- Enable logging.

\- Build and link your test code into a library artifact.

\- Run it in a test harness.

That's not testing what you ship. It's testing something pretty close,
obviously, but it bears no resemblance to a deterministic build.

~~~
joshuamorton
No, I'm saying you re-run your automated unit tests on the release build
because there's no reason not to.

If you have test failures in opt/stripped mode, they're more annoying to
debug, yes, but wouldn't you want to know?

Another way of putting this is that when you

> \- Build and link your test code into a library artifact.

You build and link the same object files that will be built into the release
binary artifact, deterministically.

> \- Enable logging.

I, uhh, usually do this in my released software too.

~~~
withinboredom
> I, uhh, usually do this in my released software too.

Do you have any idea how annoying it is to get logged garbage when starting
something on the command line (looking at you IntelliJ)?

I once spent several weeks hunting through Hadoop stack traces for a null
pointer exception that was being thrown in a log function. If the logging
wasn’t being done in production, I wouldn’t have wasted my life and could have
been doing useful things. Sadly, shutting down the cluster to patch it wasn’t
an option, so I had to work around it by calling something unrelated to ensure
the variable wasn’t null when it did log.

~~~
joshuamorton
Yes, which is why I regularly (think quarterly or annually) check to make sure
we have good log hygiene, and are logging at appropriate log levels and not
logging useless information.

I have alerting set up to page me if the things I care about start logging
more than the occasional item at ERROR, so I have to pay some attention or I
get pestered.

------
lucideer
> _Q. If a user has chosen to trust a platform where all binaries must be
> codesigned by the vendor, but doesn’t trust the vendor, then reproducible
> builds allow them to verify the vendor isn’t malicious._

> _I think this is a fantasy threat model. If the user does discover the
> vendor was malicious, what are they supposed to do?_

> _The malicious vendor can simply refuse to provide them with signed security
> updates instead, so this threat model doesn’t work._

For me, this is one of the primary benefits of reproducible builds and the
author's dismissal of it as fantasy is unconvincing.

Vendors trade on trust. If a reputable application uses reproducible builds
and discovers that a platform vendor is modifying their application on their
platform, that information is damaging to that vendor's reputation. That's
extremely useful leverage against potentially malicious vendors.

~~~
mrob
And vendor malice isn't the only reason why the software might be harmful.
What if the vendor's toolchain was compromised without their knowledge?
Reproducible builds provide a means for third parties to verify that it
wasn't.

~~~
R0b0t1
The holy grail of reproducible builds is achieving the same binary via
different compilers. This was, at least when I started looking into
reproducible builds, why I wanted to do it and why others wanted to do it. The
other benefits are kind of side benefits.

~~~
pests
This can't be correct. What would the point of different compilers be then?
There's no way that every compiler would produce the exact same instructions
for each respective input. There would be no point in using an optimizing
compiler or one with better intrinsic support.

~~~
rramdin
Perhaps you're comparing a multithreaded version with a single threaded
version, or a single-host build vs distributed build.

When adding new features to a compiler, you might want to verify that the old
and new versions have the same output given the same input. If your compilers
produced non-deterministic output, this exercise would not be possible.

~~~
pests
I took the statement "different compilers" to mean completely different
projects and codebases. Of course reproducible builds make sense in that
situation. But why in the world would you ever expect gcc to 100% always match
the output of llvm? That doesn't make sense.

------
rocqua
So, we have source code for the signal app.

We can audit that source code, to ensure no key-leakage occurs.

Next, I install signal on my phone via the app store. How do I know the app I
installed matches the source code that was audited? After all, google / Apple
could decide / be forced to provide a modified binary.

Reproducible builds work for that.

Alternatively, consider Debian. I ain't got time to compile every package I
want from source (otherwise I'd run Gentoo); that's a level of dependency hell
I'd rather avoid. With reproducible builds, I can check the source for any
package, and so can every other curious person. This way, instead of everyone
needing to set up their own build environment for every package, we can all
rely on the fact that once in a while, someone checks whether a build actually
comes from trusted source code.

Is it less secure than building myself? YES. Does it make deploying
compromised code more likely to be detected than without reproducible builds?
Also yes!

Heck, if we go from "once in a while, someone checks whether a build actually
comes from trusted source code" to a federated system of trusted checkers, we
get pretty nice guarantees, and deploying compromised code becomes pretty damn
scary.

Essentially: whenever the person compiling / signing the binary is not the
person writing the source code, reproducible builds are pretty dang nice.

~~~
wildmanx
> Next, I install signal on my phone via the app store. How do I know the app
> I installed matches the source code that was audited? After all, google /
> Apple could decide / be forced to provide a modified binary.

> Reproducible builds work for that.

You read the original post, right? He discusses this at length. Actually,
right in the beginning.

In short, if you go through the dance of building the binary yourself to
compare it to what's installed, then you could just use the binary that you
just built, without ever looking at what the app store has.

~~~
ISL
If the app store distributes binaries to thousands of people, and only one of
them rebuilds from source to check, those thousands of people gain a
substantial (but not perfect) level of protection.

~~~
ansible
Yes, it would be "not community-minded" if you did a build, and it didn't
match the app store, and you told _no one_.

------
ex_amazon_sde
A number of large companies are quietly moving towards reproducible builds.
Sorry if I cannot name the names.

As a side note, reproducible builds implemented in Debian were also useful for
spotting various other problems: small differences in build environments that
would make debugging more difficult.

Sometimes the same application will have different performance depending on
the build due to memory alignment, data ordering, cache friendliness.

Finally, the article is making some claims that are, frankly, incorrect:

> Q. It’s easier to audit source code than binaries, and this will make it
> harder for vendors to hide malicious code.

> I don’t think this is true, because of “bugdoors”. A bugdoor is simply an
> intentional security vulnerability that the vendor can "exploit" when they
> want backdoor access.

Adding a backdoor and compiling a new "custom" binary might take 10 minutes
and a lot of people in a company could do it and leave no traces.

Writing a "bugdoor", committing it and passing code reviews is very different.
You might have to justify why you are touching a product / component / library
that might be completely unrelated to your usual work.

Plus, you leave a very clear record of your action, giving up a lot of
deniability.

> Q. It’s easier to tamper with binaries than to write a bugdoor, so
> reproducible builds do improve security.

> I absolutely disagree, every programmer knows how to write a bug or short
> circuit some logic. Hiding malicious activity in a binary, with a multi
> billion dollar malware industry determined to find it, is more difficult.

This implies that the malware industry is somehow unable to detect a "bugdoor"
or unexpected behaviors at runtime, but able to detect a change to the
binary...

> In addition, once you’ve produced and signed the malicious backdoor, it is
> not repudiable - you can’t deny you wrote and provided it.

Most organizations track source code changes in a VCS but don't require
employees to sign binaries with keys bound to each individual. If anything,
this makes a point in favor of repro builds.

~~~
TwoBit
> A number of large companies are quietly moving towards reproducible builds.
> Sorry if I cannot name the names.

Thanks, user "ex_amazon_sde"!

~~~
ex_amazon_sde
I wrote "a number" and I was not referring to Amazon :)

------
onion2k
I don't know about the 'formal' description of reproducible builds used in the
article from a security perspective, but I do know that having exactly the
same output locally as is built by the deployment process makes debugging a
lot more straightforward, because it removes the subtle and hard-to-discover
problem of slightly different library versions making your application behave
differently.

~~~
cesaref
Right, and I like the ability to re-build a release from 6 months ago and get
something that reproduces the behaviour of the previous build - I'm not
concerned whether the binary is identical, more that it is functionally
identical. A typical way of breaking this is for build scripts to move forward
without being kept backward compatible, or without making it easy to use the
previous build setup for an old build (for example, a newer compiler getting
used rather than the one originally used).

I've worked on enough stuff where releases get held up for various reasons
(and so production software can be 6 months behind head) and there's a desire
for a fix to the released version and hence changes made to a branch as well
as head to apply the fix. This sort of thing is made much easier if you don't
have to spend time trying to work out how to get the branch to still build!

------
amluto
There’s another reason I think reproducible builds could add a lot of value:
app stores. Right now, if I install from a normal app store (Apple, Google,
Microsoft), there’s no real benefit to using open source apps. Even if I trust
the app store, I have no way to confirm that the app binary matches the
purported source.

App stores could improve the situation by building apps themselves, but I
think that would put them in a position they don’t like. App stores don’t
currently build their apps.

With reproducible builds, app stores could do better. An app store could list
the hash of the build artifacts along with the purported name and developer,
allowing various degrees of assurance that an app is actually a build of the
source it supposedly comes from. Without reproducible builds, the app store
would have to build the app itself and use its own build instead of the
submitted one, which seems undesirable.

Tavis’s argument about bugdoors still applies, but IMO it’s largely irrelevant
to the major app threat model. Many useful apps don’t have input and output
that is susceptible to corrupt data. A lot shouldn’t access the network at
all. The common threat is that they include fifteen tracker SDKs, all of which
_are malicious by design_. Including the entire Facebook SDK is going to be
tricky as a bugdoor.

~~~
magnetic
> App stores don’t currently build their apps.

On iOS (not sure about macOS apps), when you submit an app, it is submitted as
"bitcode" (note, this is not "bytecode" \- it's bitcode, not a typo). This
allows Apple to build your app as needed from your bitcode (think of the
bitcode as LLVM IR).

This is done for a couple of reasons: (a) they can take advantage of
improvements in their backend compiler (IR->executable) that happens after you
submit your app, (b) they can build your app on platforms that didn't exist
when you submitted your app

The net result is that the App Store plays an important part in the build
process, which could legitimately generate different binaries (even for the
same device, see (a) above) for the same app you submit.

~~~
m463
They could also wrap your app in code that does things you didn't intend, like
telemetry/metrics, interception of certain functions, etc. (Not that they
couldn't do that anyway)

------
acqq
The arguments "don't follow." Reproducible builds were indeed more than once
used to verify that the published binary does correspond to the published
source.

Without them, that kind of verification is much harder; depending on the build
setup used, it can even be too hard to achieve at all.

So there are clear advantages to having build infrastructure that produces
reproducible builds, and I don't see anything that can substitute for that.

The argument "if you build yourself from the sources, your build is then
trusted" does not reflect reality. Most users are never going to build from
source. With reproducible builds, only a few people have to build from source
to verify the binaries for the vast majority. Not to mention that without
reproducibility, you can't even know if your own build environment is
misconfigured.

------
olafure
Here's a hypothetical but realistic scenario in which not having reproducible
builds is a major issue:

1\. I create an app for a client.

2\. Client installs the app on their 10,000 enterprise mobile devices and
trains the users.

3\. Month passes. I've changed everything in the app.

4\. Major security or outage event happens. I need to change a line in the app
and those 10,000 devices need to be updated ASAP.

5\. If I can't check out the old version, change the line, and ship it without
wondering "what else will change in this build" \- then both my client and I
are going to have a bad time.

It will take two weeks for a full qa/audit of the app and there's no time for
that.

~~~
mrich
You are making the case for source control and having no external dependencies
like npm, but you don't need reproducible builds for that.

~~~
cesaref
I've a feeling there are multiple definitions of build reproducibility going
on here. I'm guessing you mean that it's not important to have something byte-
for-byte identical, but more to ensure that exactly the same build steps were
run with the same source code?

For most of us, that's what build reproducibility means, but I guess for a
subset of users it means producing an identical binary.

~~~
filleokus
> but I guess for a subset of users it means producing an identical binary.

Whenever I hear people talk about the problems of creating reproducible
builds, I often hear stuff about timestamps or other metadata inserted by the
compiler that would "break" the reproducibility (under the stricter
definition).

Having your own source code versioned and your dependencies version-pinned
(with pretty high confidence that the dependency package foobar-1.12 stays the
same over time) seems just like old-fashioned "good practice".

The looser definition would imply that all versioned software without external
dependencies (or the source of the dependencies manually included in the
repository) is reproducible?

~~~
elbear
How many people pin the exact version of a system library they are using? Or
of a binary used in the build process.

Also, how many people run the build in a sandbox to avoid "interference" from
the environment?

Yes, this is all good practice, but I think very few people do it, because
it's not easy.

~~~
filleokus
Yeah, true. I was thinking of doing release builds in containers via the CI/CD
pipeline, which keeps the environment pretty static, but not completely static
of course.

But further: All of these things would still not be enough for the strictest
definition (exact same binary), at least with normal compiler defaults afaik?

~~~
yjftsjthsd-h
> All of these things would still not be enough for the strictest definition
> (exact same binary), at least with normal compiler defaults afaik?

Right, because of things like timestamps getting into the binary.
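A toy model of why timestamps break the stricter definition, and how pinning them restores bit-for-bit identity (the `build` function stands in for a compiler; `SOURCE_DATE_EPOCH` is the real-world convention for pinning embedded times):

```python
def build(source: bytes, build_time: float) -> bytes:
    # many toolchains embed a build timestamp by default, which alone
    # is enough to break bit-for-bit reproducibility
    return source + f"|built:{build_time}".encode()

src = b"int main(){return 0;}"

# two builds of identical source, seconds apart: different bits
assert build(src, 1595000000.0) != build(src, 1595000007.0)

# the fix: pin the embedded time to a fixed value, which is what the
# SOURCE_DATE_EPOCH convention does for real toolchains
SOURCE_DATE_EPOCH = 0.0
assert build(src, SOURCE_DATE_EPOCH) == build(src, SOURCE_DATE_EPOCH)
```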

------
trishankdatadog
Hi Travis,

If you are reading this: I'm one of the members of the TUF [1] and in-toto [2]
teams, where we try to solve exactly this kind of problem. While I agree with
you that reproducible builds sound a lot simpler than they actually are to
achieve (leaving aside all the practical complexities you mentioned in the
blog post), I think they provide value for a certain use case seemingly not
mentioned in the blog post.

It is the case where the vendor is a traditional Linux distro, and we have
independent reproducible builders to ensure that a compromise of their CI/CD
infrastructure is not enough to cause malware to be installed. It is true that
the builders can still go off and reproducibly build malicious code, but this
can be mitigated by requiring a high enough threshold of (presumably
independent) developers to sign off on the code. The problem of malicious
source code is infeasible, if not impossible, to solve cryptographically, but
we can make sure that CI/CD systems, which increasingly sit in the cloud, are
not blindly trusted.
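The threshold idea can be sketched as a quorum check over independent builders' artifact hashes. This is illustrative only, not the TUF or in-toto API:

```python
import hashlib

def accept(vendor_binary: bytes, builder_hashes: list[str],
           threshold: int) -> bool:
    """Accept the vendor's binary only if at least `threshold` independent
    rebuilders reproduced exactly the same bits."""
    vendor_hash = hashlib.sha256(vendor_binary).hexdigest()
    matches = sum(1 for h in builder_hashes if h == vendor_hash)
    return matches >= threshold

binary = b"pkg-1.0 bits"
good = hashlib.sha256(binary).hexdigest()             # honest rebuilder
bad = hashlib.sha256(b"backdoored bits").hexdigest()  # compromised builder

assert accept(binary, [good, good, bad], threshold=2)     # quorum reached
assert not accept(binary, [good, bad, bad], threshold=2)  # quorum not reached
```

A compromise then requires corrupting the vendor's CI/CD *and* a quorum of builders, not just one machine.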

Could not post on your blog. Let me know what you think. Thanks!

[1] [https://theupdateframework.io/](https://theupdateframework.io/) [2]
[https://in-toto.io/](https://in-toto.io/)

~~~
dumbneurologist
I think it's "Tavis" (no "r")

~~~
trishankdatadog
my bad, but still, no reply as yet...

------
mainguy
I think OP is coming from a different perspective than I (a corporate bespoke
solution builder) do. When I say "reproducible build" I mean a build that is
the same on any machine (i.e. no special magic necessary to build an
"official" version of the code). Too often in corporate environments, getting
a local build or setting up a new build pipeline involves arcane black magic
and/or copy/pasting weird libraries that can't be pulled from any sort of
"official" repository. curl/bash libraries that "automatically change
versions" whenever upstream decides to change them can wreak havoc when
setting up a new build environment.

My $0.02: in the corporate world it's not so much about validating binaries as
about "how many steps beyond 'check out the code'" exist, and how easily I can
validate that my binary uses the same versions of libraries/dependencies as
the one a local developer tested.

~~~
D895n9o33436N42
For this reason I’ve been dockerizing my builds for almost five years. I was
late to the Docker party, but when I saw the benefits it brings to build
pipelines, I was sold.

It's true that a dockerized build isn’t any simpler than its non-dockerized
ancestor, but at least there’s a Dockerfile that lays bare all the black magic
and special sauce which goes into each build. And it can be version controlled
to watch for drift over time.

This stuff is useful in a corporate setting, but the other fetishization of
reproducible builds is just a distraction that can stay where it belongs: open
source mailing lists.

~~~
choward
While your Dockerfile helps you know how a project was built at a specific
point in time, it's not going to work forever. Even if the file doesn't change
over time, the build it produces will. It's mainly because of installing
packages using something like "apt-get install $package". It also can change
if the files you're adding with ADD or COPY change.

~~~
D895n9o33436N42
You don’t _have_ to download the internet upon each build.

First, in a corporate environment it’s common to run builds backed by artifact
servers that’ll cache just about anything.

Second, it’s easy to place files in a Docker build context (that’s just a
twenty-five dollar way of saying “next to the Dockerfile”) that would have been downloaded
from the internet, but are stored locally instead. This is easier said than
done for some formats. Source tarballs? Easy. Anything Java or Debian that
requires a pesky server which works a certain way? You’re going to have to use
a caching artifact server.

------
pornel
The premise of the article ignores the difference between "trust" and "trust
but verify".

And then there's a Q&A which answers criticism with "I absolutely disagree"
and "I think this is true, but".

> We know that attackers really do want to compromise build infrastructure,
> but more often they want to steal proprietary source code, which must pass
> through build servers.

This has shifted the goalposts so much they're on a different field now.

~~~
ummonk
Yes, exactly. And crucially you don’t need to do the verification yourself to
get the benefits of verification. There is value in crowdsourced verification
(i.e. I always have to choose a trusted vendor, but it is safer when I know
others might try building themselves and raise a fuss if there is a mismatch
between their build and my vendor’s build).

------
rsync
These arguments against reproducible builds are very provincial, temporally.

Which is to say, they assume a user _now_ downloading source _now_ to compile
and use _now_.

One of the things I like about reproducibility is that someday, in the future,
when the project is closed and the website is down and the author is dead ...
we can take the source code and compile it and compare it to our own notes (or
to the wayback machine, or whatever) of the checksums, etc., of the binaries
and gain some confidence _that we have what we think we have_.

------
gwbas1c
I'm missing some context. I'm assuming the author is referring to binary
distributions of open-source software? Otherwise, "reproducible builds" means
something slightly different in a commercial software development environment.
(Think of someone building an e-commerce site where the software is
proprietary.)

Thus:

> You don’t need reproducible builds

No, not for open source. If it's open source, having the source code is only
valuable and useful if I can build it. If I can't build it, to me it's no
different from closed source.

> You don’t need reproducible builds

In a commercial development environment I need to build the code I work on.
What about when a bug exists due to problems in the build environment? If I
can't reproduce the build environment, I can't fix the bug.

------
ENOTTY
I think the author hasn’t accurately described the workflow where reproducible
builds are used.

Here’s my attempt.

There are three entities:

1\. The vendor, who creates and _distributes_ a product, which may include
software, to the end user

2\. The end user, who receives the product from the vendor, operates the
product, and trusts one or more verifiers to correctly certify the product
according to some standard that the user desires

3\. Verifiers, who receive proprietary access to the vendor’s product (e.g.
source code) and check that the product meets a set of standards and assert
that fact to the users

Note that verifiers are NOT in the business of distributing products to users.
That's a hard job, and it seems understandable to me if verifiers don't want
that job.

Reproducible builds help link a specific version of source code that a
verifier certified to a specific product being operated by a user. Without it,
the user trusts the vendor much more than with it.
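That link between an audited source tree and the product a user operates only closes if the build is a pure function of the source. A toy sketch (illustrative names, not any real attestation format):

```python
import hashlib

def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def build(source: bytes) -> bytes:
    # deterministic build: the artifact is a pure function of the source
    return b"bin:" + hashlib.sha256(source).digest()

# The verifier audits the source, builds it, and attests to the pair
# (source hash, artifact hash).
source = b"audited source tree"
attestation = {"source": h(source), "artifact": h(build(source))}

# The user never sees the source; they only check that the product the
# vendor shipped hashes to the artifact the verifier attested to.
shipped = build(source)
assert h(shipped) == attestation["artifact"]

# A substituted binary fails the check.
assert h(b"bin:something-else") != attestation["artifact"]
```

Without determinism, the verifier's artifact hash and the vendor's shipped hash differ even when both are honest, and the attestation proves nothing.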

Yes, it adds brittleness and delay (see FIPS 140 certification). These are
use-case-specific trade-offs that only users can judge.

In the case where some facts about the source code can be formally verified,
reproducible builds support that trust relationship much better. I might go so
far as to say that reproducible builds are essential for trusting formal
verification.

You can also imagine a software supply chain that includes more steps than a
simple vendor -> user relationship. Much of the proprietary software used
today includes libraries from third-party vendors. There are integrators that
add their own special sauce. The supply chain looks more like: vendor ->
vendor -> ... -> vendor -> end user.

Imagine each vendor has their own set of verifiers responsible for certifying
that vendor’s output.

~~~
chii
But this just shifts the burden of who to trust to the verifiers. It doesn't
make the software from the vendor _any more_ trustworthy.

~~~
ENOTTY
> this just shifts the burden of who to trust to the verifiers

Agreed. A colleague once said that trust is like a balloon; if you reduce
trust in one area, you tend to expand trust in another area.

I think the typical response to your statement from a user is that it’s easier
to trust a set of verifiers than it is to trust a vendor. The act of a
verifier blessing an artifact makes that artifact more trustworthy.

Personally I’m skeptical of these claims, but that’s the underlying assumption
of the certification process.

~~~
ta8594505930
Presumably there is benefit in reducing trust in a party with mixed or unknown
incentives to increase trust in a party whose incentives align with your own.
For a critical piece of software I could imagine a scheme where a user pays
one or more independent verifiers to validate the software. That would allow
the user to control the incentives of the entities telling them the software
is safe.

------
Ericson2314
The post started out OK, but then I lost the plot

> Q. It’s easier to tamper with binaries than to write a bugdoor, so
> reproducible builds do improve security.

> I absolutely disagree, every programmer knows how to write a bug or short
> circuit some logic. Hiding malicious activity in a binary, with a multi
> billion dollar malware industry determined to find it is more difficult. In
> addition, once you’ve produced and signed the malicious backdoor, it is not
> repudiable - you can’t deny you wrote and provided it.

> With bugdoors, you don’t need to deny it - you just claim it was an error,
> and you’re automatically forgiven.

If the binary vendor has the source code, _every source exploit is also a
binary exploit_. If you can make a semi-"bugdoor" that's plausibly deniable,
and then augment it with a binary tamper that's deniable as an artifact of
non-deterministic compilation, is that not the ultimate coup de grâce?

------
yakshaving_jgt
> What isn’t clear is what _benefit_ the reproducibility provides.

The ability to cache build artefacts is a pretty huge benefit for my work.

------
weinzierl
For me, security is not the only thing that makes reproducible builds
interesting. I'm often interested in linking a binary to its source in
retrospect, at a time when the vendor might long be gone. It helps a lot to be
reasonably sure that the source you analyze is the one that was used to build
the binary, so you don't waste time studying the wrong source.

The counterargument is, of course, that if you are serious you should look at
the binary anyway and never trust the source, and I can't argue with that,
except that - for better or worse - this is not the world I live in. For me,
answering a question about a piece of software as quickly and as confidently
as possible is key. I'm always grateful when I can reproduce the build with
reasonable effort and then look at the source instead of the binary.
Regardless of how much I love to tinker with binaries, looking at the source
is usually just faster.

------
kylecordes
I can think of one argument against reproducible builds that really makes
sense:

You have some legacy system incapable of reproducible builds, and the social
effort to convince people not to care is less than the technical effort to fix
it.

------
dooglius
It looks like almost everyone here thinks this is wrong and poorly argued (as
do I). A meta question to ponder: why did this get upvoted to the front page?
Is it that upvoters tend to not read the articles whereas commenters do? Do
people upvote stuff they think is wrong for the purpose of discussion?

~~~
kardos
> A meta question to ponder: why did this get upvoted to the front page?

The author is a sort of expert in the field; if he's calling out reproducible
builds as security theatre, it's worth discussing.

------
AlexTWithBeard
Ugh...

If building the same source produces different binaries, then I'd like to know
(a) what is causing the difference and (b) what other differences does it
cause?

Being able to produce a consistent result is simply a sign of professionalism.

~~~
sreevisakh
How about using a different toolchain (e.g. gcc vs clang)? Or even different
versions of a toolchain? Or a dependency that has to be downloaded? Being able
to build consistently requires way more effort than just following
professional practices. One method I know is to pin everything that goes into
a build - source, dependencies, toolchains, configurations, and environment.

The result of a non-consistent build can be as simple as a difference in
performance. But it could also be malware injected through the compiler.
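"Pin everything" in practice means checking each build input against a recorded hash before it is used. A toy sketch (package names and contents are made up):

```python
import hashlib

# A lockfile pinning everything that goes into the build: toolchain,
# dependencies, configuration. (Names and contents are illustrative.)
PINNED_CONTENT = {
    "gcc-12.2.0.tar.xz": b"gcc tarball bytes",
    "zlib-1.2.13.tar.gz": b"zlib tarball bytes",
}
lockfile = {name: hashlib.sha256(data).hexdigest()
            for name, data in PINNED_CONTENT.items()}

def fetch(name: str) -> bytes:
    # stand-in for downloading a dependency from a mirror
    return PINNED_CONTENT[name]

def fetch_verified(name: str) -> bytes:
    """Refuse to use any build input that doesn't match its pinned hash."""
    data = fetch(name)
    if hashlib.sha256(data).hexdigest() != lockfile[name]:
        raise RuntimeError(f"{name} does not match its pinned hash")
    return data

for name in lockfile:
    fetch_verified(name)  # every input checked before the build uses it
```

A mirror or compiler swapped out from under you then fails loudly instead of silently changing the build.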

~~~
AlexTWithBeard
Yes, yes and yes. The toolchain and the dependencies should be pinned.

Otherwise, sooner or later you'll hit a customer issue that you can't
reproduce - until you realize it's some subtle bug in that specific version of
your compiler.

------
bawolff
I don't think reproducible builds are perfect or solve every problem. I still
think they are worthwhile, though, because they allow for a better audit
trail, and every auditable step makes it easier to pinpoint at what step
something bad happened.

Sure, users aren't likely to verify them. I still think there is value in
being able to determine after the fact that either this binary clearly came
from this source code artifact, or this binary was clearly substituted because
it doesn't match the source code. It's not perfect, but it reduces the places
where an attack can take place with nobody noticing.

------
moomin
You know that principle mathematicians invoke when someone "proves" P=NP?
i.e. do they mention any of the classic problems in the space? Well, if you
can't find the word "Debian" in an article on reproducible builds, you can be
reasonably sure it's missed something.

------
karmakaze
This post misses so much.

"What isn’t clear is what benefit the reproducibility provides. The only way
to verify that the untrusted binary is bit-for-bit identical to the binary
that would be produced by building the source code, is to produce your own
trusted binary first and then compare it. At that point you already have a
trusted binary you can use, so what value did reproducible builds provide?"

Being able to reproduce a build validates not only the source but the entire
process of creating the executed artifact. To properly capture this ability
and value, the CI/CD pipeline should also be reproducible, in an
infrastructure-as-code manner. Together with the programs' source and brief
documentation on how to put the parts together, this means the operation of
the company can be recreated in another geography/datacenter and continue
working while team members rotate in and out.

Contrast that with the other extreme: inheriting a codebase with only
possibly-matching source code and a production environment that everyone is
afraid to touch.

------
johnklos
This seems like handwaving and arguing against something good because some
people don't know how to do things.

"Companies can have security issues. Servers can be compromised. So why try?"

Dismissing a demonstrable fact as opinion because the dismisser doesn't
understand the fact isn't a valid argument.

------
TedDoesntTalk
I don't know anyone who would agree with this article, at least publicly. I
want my 5 minutes back.

------
IshKebab
I think he's right about all the security arguments. It is definitely nice to
have properly reproducible builds from a build system point of view though.
Things like ccache would work a lot more reliably.

~~~
gregwebs
Also just increased confidence that the build is working properly.

I think there's added security for the first party builder in their own
process: they can better audit the process from source code to deployment.

------
shipstern
"More often, attackers want signing keys so they can sign their own binaries"

Isn't the idea that you don't have to trust signing keys anymore because you
rely on consensus? An attacker would probably have a much harder time
compromising 10 vendors instead of one.

You are right about complexity and issues, and you certainly don't "need"
reproducible builds, depending on what you're after, but they can be
beneficial.

Also, your arguments often seem to come out of thin air: "reproducible builds
are not for users". Why? Maybe not for the average user (yet).

------
srj
My understanding of reproducible builds seems to be different. I always
thought of them as a way of ensuring people couldn't deploy edited code from
their workstations.

You would ask trusted system A to build the code at a particular code-reviewed
snapshot. It fetches the code itself, builds, and codesigns. Then you hit
deploy on system B, which is the only system with credentials to modify prod.
That system looks for A's signature. There's no vendor involved, so maybe I'm
missing a better-understood picture of what a reproducible build means.
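
That A-builds/B-verifies flow can be sketched with a toy signature check. This is an HMAC sketch assuming a key provisioned out of band; real codesigning would use asymmetric keys so B never holds signing capability:

```python
import hmac
import hashlib

# Hypothetical shared secret provisioned out of band between A and B.
# Real codesigning would use an asymmetric keypair instead.
SHARED_KEY = b"provisioned-out-of-band"

def sign_artifact(binary: bytes) -> str:
    """System A: sign the artifact it built from the reviewed snapshot."""
    return hmac.new(SHARED_KEY, binary, hashlib.sha256).hexdigest()

def verify_before_deploy(binary: bytes, signature: str) -> bool:
    """System B: deploy only artifacts carrying A's signature."""
    expected = sign_artifact(binary)
    return hmac.compare_digest(expected, signature)
```

With asymmetric signatures, B would hold only A's public key, so compromising B would not yield the ability to sign new artifacts.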

------
josh2600
Build reproducibility doesn't solve the problem of "how do I know the computer
I don't control is actually running the program I think it is?"

The only solutions are related to fully homomorphic encryption, which is a way
of running a program on a computer you don't control where the operator of
that computer cannot learn information about the work it is performing.
Unfortunately, pure software approaches to FHE haven't been proven out.

It is possible to get the benefits of FHE by utilizing a secure enclave, but
then you have a different trust narrative that involves hardware (and
attestation services). The point of remote attestation is that you can verify
that a computer you don't control is running a specific binary (verified by
the hash and a signing key) at a specific moment in time. The problem is that
the chain of trust for the signing key traces back to the manufacturer of the
enclave.

The holy grail would be some way of decentralizing the trust of the enclave OR
finding a way to do purely software based FHE that was fast. I'm not holding
my breath for the latter, but the former might be possible in the next decade.

In short, if you have a reproducible build but you don't have a way of
verifying that binary is actually running on a remote server you don't
control, the reproducibility of the build is a moot point. I sorta think the
only systems where build reproducibility matters today are ones that use
enclaves (but I'm willing to be disabused of this notion if people feel
otherwise).

------
debiandev
Here are some answers:

[https://wiki.debian.org/ReproducibleBuilds/About](https://wiki.debian.org/ReproducibleBuilds/About)

------
user016301
With reproducible builds, anyone can verify the build and everyone benefits
from it. Without reproducibility, no one can ever verify the build.

------
perryizgr8
For us, the biggest benefit of reproducible builds is debuggability. One year
later, if we get a bug report from a customer, we can fully recreate the issue
in house, patch it, and provide a new build with confidence that we fixed the
problem.

Without this, in a huge interconnected system you go insane trying to control
all the variables needed to figure out what exactly went wrong.

------
CiPHPerCoder
Reproducible builds, combined with digital signatures and an append-only
cryptographic ledger, solve a lot of these trust issues.

[https://paragonie.com/blog/2016/10/guide-automatic-security-updates-for-php-developers](https://paragonie.com/blog/2016/10/guide-automatic-security-updates-for-php-developers)

------
marmaduke
> With bugdoors, you don’t need to deny it - you just claim it was an error,
> and you’re automatically forgiven.

Interesting off topic point about the lack of accountability in the
profession. Even as a sysadmin, one accidental iptables flag might let an
intruder gain access. What will happen to me? Angry boss maybe.

------
irjustin
Too many people in this thread are conflating build systems with security.

The author is talking about reproducible builds in the context of security,
and only security [0].

[https://en.wikipedia.org/wiki/Reproducible_builds](https://en.wikipedia.org/wiki/Reproducible_builds)

------
renewiltord
Interested in reproducibility for caching/performance/debug/rollback reasons,
not security. Thanks, Tavis, for clarifying that security gains are not
useful. I don't usually consider them but when I have to, I'll look up this
article.

------
benibela
I would already be happy with working builds

If you use a non-mainstream compiler, it is a horrible mess. Almost every
other time I build my program, the compiler crashes and I have to do a clean
build.

Recently they fixed the issue; I updated the compiler, and now my program
still does not compile: a random type-checking error. After a clean build, the
type is accepted and it compiles. I tried to make an MWE, and instead of a
type-checking error, the compiler crashes at that line.

At least it always compiles with a clean build. But the resulting program does
not start (load segments not accepted). I haven't figured that one out, so I
uninstalled the update and keep using the crashing compiler version.

------
gumby
Extreme build reproducibility requirement: I had a customer in telecom. Their
customers (carriers) required downtime of no more than 5 (or 10?) minutes per
decade — 7 nines, I believe.

When they reported a bug, they wanted the fix, and then they diffed the binary
to be sure that every code change was related to the patch itself and nothing
else. No additional fixes or upgrades. They were willing to pay through the
nose to stay on a very old version of the code.

The code was in C (this was mid 90s) — can’t imagine how much harder it might
have been with extensive templating and the like.

------
captainmuon
Reproducible builds keep honest vendors honest. If I introduce reproducible
builds, I make it likelier that I'd get caught if I tampered with my own
binaries. Likewise if I provide checksums for my downloads.

There are a lot of popular freeware and open source projects from small teams
or individuals. You can't tell me that no one is at least a _little_ bit
tempted to slip in a trojan occasionally.

------
bfuclusion
Forgive me if someone has already mentioned this, but if I have multiple
vendors, why can't I compare the hashes from all of them before I ask for the
binary? Then any one vendor is held to account by the others before I install.
No source is required, just the hashes of the source and of the resultant
binary.
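
That consensus check is trivial to express. A sketch (the vendor names and hashes are invented for illustration, and it assumes each vendor publishes its hash over an authenticated channel):

```python
from collections import Counter

def consensus_hash(vendor_hashes):
    """Given {vendor: reported_binary_hash}, return the majority hash and
    the vendors that disagree with it (candidates for compromise)."""
    majority, _count = Counter(vendor_hashes.values()).most_common(1)[0]
    dissenters = [v for v, h in vendor_hashes.items() if h != majority]
    return majority, dissenters
```

For example, `consensus_hash({"vendor-a": "h1", "vendor-b": "h1", "vendor-c": "h2"})` flags `vendor-c` as the odd one out before any binary is fetched.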

------
sschueller
Yes you do. How else can you trust that an open source bitcoin client in the
play store is the same version published on github?

~~~
7786655
The whole point of the article is that in order to check if the play store
version is the same, you have to download the github version and build it from
source. At that point, you might as well throw away the play store binary and
use the one you just built.

~~~
mamon
Yes, but it takes just one person to download the github source and build it
to verify that the binary available in the play store is safe, so that the
other 1,999,999,999 Android users don't have to.

------
rmrfrmrf
If you're looking for an actually spicy take, you could say that reproducible
builds are impossible once the durability of hardware/state of the universe is
accounted for, but that the concept provides yet another black hole of cash
and effort around which devsec can build a cottage industry.

~~~
j88439h84
There is a trusted hardware industry emerging, so your story has plausibility.

------
quinndiggity
Has someone taken over Tavis's blog? Two posts in a row now where he either
just doesn't understand some concepts, or is intentionally disseminating
nonsense, pushing people in the wrong direction on things that really do bring
security benefits.

------
crb002
Define reproducible. I always archive dependencies so I can rebuild, even if
that build has some new timestamps in it. Having a rando Github repo go down
when you rebuild is not a failure mode I want.

------
smitty1e
Given the choice between reproducible and the scattered disasters I've seen,
reproducible seems better.

But I would be content with "pipelined", and have the readily automated part
handle itself.

------
wadkar
Well, after spending so much time watching kilometers of rolling text on my
screen while `emerge -uavDN @world` compiles _everything_, I agree I don't
need reproducible builds.

------
wwweaponizer
Reproducible builds allow people to independently verify that a build came
from the source code that the binary vendor claims it did, rather than some
compromised source code.

------
timwaagh
I think we mostly want builds to be as reproducible as possible because a
non-reproducible build means there might be new bugs in it that weren't there
previously (when using the same source). For example, a dependency might have
been updated and might now be incompatible. It happens. It's not malicious,
but it's annoying. This usually leads to a lot of 'works on my machine'
discussion, which is a frustrating and unhelpful situation.

------
deknos
Tavis assumes there's a whole organization which is either good or bad. But
what about people within the organization with restricted or accountable
access to sources or binaries?

I think reproducible builds are about malicious parties within a bigger
organization.

------
sesuximo
Reproducible builds make compiler/build-system debugging a lot easier.

------
CoffeeBob
Not worth reading

------
benlivengood
Reproducible builds are primarily useful at organizations with strong auditing
requirements. The build system is the root of trust and it's possible to
achieve a high degree of trust inside an organization.

I think what the author misses is that having multiple vendors or
distributions providing signed deterministic/reproducible builds reduces the
total cost of maintaining individual trusted build environments.

For example, the author claims that the most straightforward way of producing
a trusted build is to do the build from source oneself. That is true, but it
ignores the cost of millions of individuals spinning CPU cycles to build their
own local packages - a cost most users have already deemed too high.

If N independent entities build and sign reproducible packages for a
distribution, then the probability of incorrect binaries going undetected is
P(individual_problem)^N for as many N as local package managers want to check;
alternatively, they can trust an aggregator to fetch and compare signatures
from all N producers. N can be far smaller than the number of individual users
of the packages while still being more trusted than a single vendor
maintaining its own highly trusted build system. If large organizations
participate in this multi-entity process, they can only increase their
certainty that they've produced accurate builds from source, at least for
packages built publicly.
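
The arithmetic behind the P(individual_problem)^N claim is simple, assuming the N builders fail independently (a strong assumption - a bug in a toolchain they all share breaks independence):

```python
def p_all_wrong(p_individual: float, n: int) -> float:
    """Probability that n independent builders all produce the same
    incorrect result, under the independence assumption."""
    return p_individual ** n

# With a 1% per-builder problem rate, three independent builders
# agreeing on a wrong binary is roughly a 1e-6 event.
print(p_all_wrong(0.01, 3))
```

Even modest N drives the joint failure probability down fast, which is why a handful of independent rebuilders can stand in for millions of local builds.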

Deterministic builds also solve the compiler backdoor problem (a la
"Reflections on Trusting Trust"): compile each reproducible/deterministic
compiler (e.g. GCC, LLVM, TCC, MSVC) with every other compiler and verify
that all deterministic builds are identical along any compilation path, e.g.
that tcc(gcc(MSVC(llvm(tcc)))) produces output identical to
MSVC(gcc(tcc(llvm(tcc)))). The process can be extended to verifying these
paths under multiple OSes and hardware architectures. This establishes a
practical root of trust: a well-known compiler binary trusted to translate
source code into binaries without binary backdoors. See
[https://news.ycombinator.com/item?id=10181339](https://news.ycombinator.com/item?id=10181339)
for previous discussion.
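
A grossly simplified model of that cross-check (this toy treats a compiler as a pure function from source text to an output digest, which skips the self-hosting subtlety that makes real diverse double-compiling hard; it only shows the comparison step):

```python
import hashlib
from collections import Counter

def faithful(source: str) -> str:
    """A faithful compiler: output is a pure function of the source."""
    return hashlib.sha256(source.encode()).hexdigest()

def backdoored(source: str) -> str:
    """A backdoored compiler: quietly injects extra code."""
    return hashlib.sha256((source + "\n/* backdoor */").encode()).hexdigest()

def cross_check(compilers, compiler_source):
    """Build the same compiler source with every bootstrap compiler and
    report the ones whose output disagrees with the majority."""
    outputs = {name: c(compiler_source) for name, c in compilers.items()}
    majority, _ = Counter(outputs.values()).most_common(1)[0]
    return [name for name, out in outputs.items() if out != majority]

# Hypothetical scenario: two honest bootstrap compilers and one
# backdoored one building the same compiler source.
suspects = cross_check(
    {"gcc": faithful, "clang": faithful, "tcc": backdoored},
    "int main(void) { return 0; }",
)
```

With deterministic outputs, any disagreement immediately identifies the outlier; without determinism, every build differs and the comparison tells you nothing.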

Finally, deterministic builds allow verifiable signatures of the form "This
container OS with signature A running a software package with signature B with
input having signature C produces output with signature D" for arbitrary
choices of deterministic source code. This allows for verifiable computing in
general; the ability to trust that the output of running compiled source on a
particular input (including other source code) actually produces a particular
output. This reduces the cost of establishing a trusted system from the high
cost of building everything from source to the cost of building the root of
trust from source and trusting the plurality of signatures establishing that
building the rest of the system from this trusted root results in the same
publicly available binaries. The process can be extended all the way to formal
verification of the root of trust and any other desired components.

------
Markoff
interesting choice of domain name, any background?

