
Reproducible builds in Debian: preventing compiler backdoors - Tsiolkovsky
http://motherboard.vice.com/read/how-debian-is-trying-to-shut-down-the-cia-and-make-software-trustworthy-again
======
belorn
One of the larger insights Snowden gave us was that agencies dealing with
data gathering do so through several avenues simultaneously rather than
relying on a single method. They request data from companies, intercept
cable traffic between ISPs, plant backdoors in protocols and servers, do
MITM, use malware, and so on, all in parallel. Reproducible builds won't "shut
down" the CIA, but they will increase the security of compilers and protect
Debian's build system, which would otherwise be a prime target for attacks that
could compromise a very large number of users.

------
schoen
We also gave a closely related talk about this at 31C3, which focuses
primarily on the motivation for the problem and how the Tor Browser has
addressed it (leading into Debian stuff, but the Debian developers couldn't
join us in person, so you can hear much more about the Debian side from the
later talks that the article refers to).

[https://media.ccc.de/browse/congress/2014/31c3_-_6240_-_en_-...](https://media.ccc.de/browse/congress/2014/31c3_-_6240_-_en_-
_saal_g_-_201412271400_-_reproducible_builds_-_mike_perry_-_seth_schoen_-
_hans_steiner.html)

I had two examples in that talk that I liked a lot:

* What's the smallest change that you'd have to make to a binary to introduce an exploitable vulnerability? (I work through an answer in the talk.)

* I demonstrate a kernel module that tampers with source code _as it's being read by the compiler_, so that nothing is modified on disk, and all of the source files are unmodified as confirmed by other software on the system, but the resulting binary is corrupt.
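
The second example can be illustrated with a toy userspace simulation (the real demo was a kernel module hooking the read path; the file name and contents here are made up):

```python
# Toy simulation of read-time source tampering: the bytes on disk stay
# pristine, so checksum tools see nothing, but a designated reader (the
# "compiler") gets modified contents. The real attack lives in the
# kernel's read path; this just demonstrates the principle.
import hashlib

DISK = {"hello.c": b"int main(void) { return 0; }\n"}

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def read_file(path: str, reader: str) -> bytes:
    """Return file contents, injecting a change only when the
    'compiler' is the one reading."""
    data = DISK[path]
    if reader == "compiler":
        data = data.replace(b"return 0;", b"return 42;")  # injected change
    return data

# An integrity checker sees the pristine bytes...
clean = read_file("hello.c", reader="checksum-tool")
assert sha256(clean) == sha256(DISK["hello.c"])

# ...while the compiler sees tampered source and emits a corrupt binary.
tampered = read_file("hello.c", reader="compiler")
assert tampered != clean
```

Reproducible builds catch this at a different layer: independent machines, which presumably don't all run the tampering module, will produce a different binary from the same source.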

You can also hear about my co-presenter Mike Perry's heroic efforts to make
the Tor Browser build reproducibly, one of the first pieces of open source
software of its size and complexity to do so. And definitely one that people
are trying to distribute fake modified versions of (not necessarily with
network MITMs, though quite possibly -- but even with simple e-mail phishing).

------
teddyh
The article links to a talk at the CCC which was on August 15th. There was an
even more recent talk held at DebConf 15 on August 21st: [http://meetings-
archive.debian.net/pub/debian-meetings/2015/...](http://meetings-
archive.debian.net/pub/debian-
meetings/2015/debconf15/Stretching_out_for_trustworthy_reproducible_builds_creating_bit_by_bit_identical_binaries.webm)

------
giancarlostoro
tl;dr Debian wants to have reproducible builds to ensure developer systems
aren't compromised by the CIA.

Not sure how that is going to shut down the CIA though, maybe I missed it.

~~~
bdcravens
Maybe it's an idiomatic thing, but "shut down" doesn't necessarily mean
shutting down the organization. The term is commonly used in a very
specific sense (think sports: "shutting down" just means preventing something
within a specific window).

~~~
schoen
I guess I've heard that in sports coverage to mean "prevent from scoring" (or
just "impede"?), but I'm not sure I've heard it with that meaning in other
contexts. I guess it does appear in the context of a proposal.

"The environmental group wanted to build a bike lane from the park to the
waterfront, but the city planners shut them down." (not implying that the
group was disbanded, but that the plan was blocked)

But I'm still not sure either meaning makes sense in the context of the
headline.

------
mbrutsch
[http://c2.com/cgi/wiki?TheKenThompsonHack](http://c2.com/cgi/wiki?TheKenThompsonHack)

~~~
teddyh
[https://en.wikipedia.org/wiki/Backdoor_%28computing%29#Compi...](https://en.wikipedia.org/wiki/Backdoor_%28computing%29#Compiler_backdoors)

------
nickpsecurity
This is an old problem, solved dozens of ways, that the mainstream just refuses
to deal with. The requirement is even standard for proprietary products going
for DO-178B certification. I believe they do quite manual confirmation, but
automated approaches exist. The solution is called certified compilation: the
verifiable transformation of source into binaries. You break the process into a
series of steps, each of which can be verified via the CSTs/ASTs handed from one
to the next. You can implement the steps yourself or validate someone else's,
which is even easier if it's a safe[r] language. Examples, each using different
methods, are VLISP [1], FLINT [2], and CompCert C [3].
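
The "verify each pass via the ASTs handed along" idea can be sketched like this (a toy constant-folding pass with a runtime semantic check; real certified compilers like CompCert carry machine-checked proofs instead of runtime assertions):

```python
# Sketch of per-pass validation: each compiler pass hands its output to
# a checker that compares behaviour against the input AST. Here a
# constant-folding pass over tiny arithmetic ASTs of the form
# ("num", n) or ("add", left, right). Illustrative only.

def evaluate(ast):
    """Reference semantics for the tiny AST."""
    if ast[0] == "num":
        return ast[1]
    return evaluate(ast[1]) + evaluate(ast[2])

def fold(ast):
    """One compiler pass: fold constant additions."""
    if ast[0] == "num":
        return ast
    l, r = fold(ast[1]), fold(ast[2])
    if l[0] == "num" and r[0] == "num":
        return ("num", l[1] + r[1])
    return ("add", l, r)

def checked_pipeline(passes, ast):
    """Run each pass, validating that it preserved meaning."""
    for p in passes:
        out = p(ast)
        assert evaluate(out) == evaluate(ast), f"{p.__name__} changed semantics"
        ast = out
    return ast

prog = ("add", ("num", 1), ("add", ("num", 2), ("num", 3)))
assert checked_pipeline([fold], prog) == ("num", 6)
```

A backdoored or buggy pass that changed the program's meaning would trip the checker, which is the property the proof-based systems establish once and for all.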

Running Debian through CompCert while putting more work into CompCert for
portability and optimization is the easiest solution with long-term benefits.
Performance will go up steadily. Bug count will go down steadily because
that's what SML/Ocaml does. Code will be more readable. Repeat for most
trusted tools to drive assurance up across the board.

If they don't want to do that, then the result will be something along the
lines of just having a bunch of people compile and sign the distro, publishing
signatures, etc. You will trust that they trusted whatever they all looked at.
And anyone who's studied GCC's source, etc., will know that basically means they
all saw the same code. They'd have to understand all of it to know whether a
weakness was introduced. They won't; the use of C/C++ makes that harder, with
plenty of rope to hang oneself in any common action, and it's why those of us
doing subversion-resistant development use languages like the MLs or Oberon. FOSS
needs to similarly transition toward safe, comprehensible tools that aren't
backdoor generators simply by virtue of architecture & language used.

Otherwise, all this talk of preventing subversion is just talk: they're going
to get in. And if not through subversion, the endless stream of 0-days from the
language and architectural choices will continue to do the job. A
re-implementation of the TCBs of our systems is long overdue.

[1]
[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.51....](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.51.2824&rep=rep1&type=pdf)

[2]
[http://flint.cs.yale.edu/flint/software.html](http://flint.cs.yale.edu/flint/software.html)

[3] [http://compcert.inria.fr/](http://compcert.inria.fr/)

~~~
jeffreyrogers
If I remember correctly, the problem Thompson described is tougher to deal with
than this. Yes, the CompCert compiler is certified, but the problem is that
the compiler that bootstrapped CompCert may not be. Suppose you use some
compiler (call it gccb, for backdoored gcc) to bootstrap CompCert, and it
outputs ccb (backdoored CompCert). Now you recompile CompCert with ccb, since
CompCert is certified. This gets you a final version of CompCert compiled with
a bootstrapped CompCert. So the process is gccb -> ccb -> cc. But the
backdoor has just been transmitted down the line to cc.

Edit: also I just looked and it appears that CompCert is written in OCaml.
This doesn't substantively change anything above. It just means that rather
than having a backdoored gcc you'd have a backdoored ocamlc.

Realistically, the only way to truly ensure that your compiler is not
backdoored is to write it yourself in assembly. But then your assembler
might be backdoored, so you need to write your own assembler in binary. But
then your processor might be backdoored, so you need to design your own
processor. And then you might be safe. Of course, if someone really cared
enough to backdoor all that, they'd probably just come after you with a
rubber hose at that point. And I think that was the point Thompson was trying
to make.

~~~
nickpsecurity
I can't remember Thompson's specifics; I'm talking about the general problem.
You're correct that my first step shifts the problem: it does so on purpose.
Here's what subversion-resistant development takes: modular software with
sensible interfaces; the ability to understand code for human review (the
closer to the algorithm, the better); the ability to understand compiler passes
in isolation; the ability to implement the toolchain in a language of your
choosing. There are existing flows like this, as I illustrated. So, you use
them and leverage a diverse audience to check the results. I mean, would you
rather implement CompCert's passes by hand, or GCC's, even without
optimizations? See the difference? ;)

Now, I did have a method to solve the problem you're describing. You implement
an assembler first. Then a macro assembler with macros for HLL primitives. You
can use that immediately to implement a certified compiler. Alternately, you
can pick up the Oberon report or a Scheme book and implement that, to get a
true HLL plus compiler. Then you implement the certified compiler with it.
Comprehension, code complexity, and trust are kept manageable by building layer
by layer. This, for productivity rather than security, is how Wirth and Carl
first built Lilith, then Oberon. The same method will work again, and it's good
that the ML/Scheme/Oberon folks already gave us docs plus code to use. Let's
use them.

~~~
jeffreyrogers
Yep, building up like that would work. Oberon, if I recall correctly, is pretty
simple too (maybe ~20k LOC?), so that would actually be possible for a small
team.

~~~
nickpsecurity
It has many times. Nice, LISP-style example that was recently on HN:

[https://speakerdeck.com/nineties/creating-a-language-
using-o...](https://speakerdeck.com/nineties/creating-a-language-using-only-
assembly-language)

Note: LISP/Scheme interpreters and processors with plenty of detail (including
source) can be found with Google; many were implemented before 1990. They will
run on cheap FPGAs or process nodes. You can take it all the way to hardware. ;)

The macro ASM can be built on something like P-code: an idealized, low-level
machine that's easy to deploy on CISC and RISC architectures. A good example of
how to bridge ASM and HLLs is Hyde's High Level Assembly:

[http://www.plantation-productions.com/Webster/](http://www.plantation-
productions.com/Webster/)
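
To make the P-code idea concrete, here's a hypothetical minimal stack machine (not Wirth's actual P-code; the opcode set is invented for illustration). The point is how little machinery an idealized low-level target needs, which is what makes it easy to re-implement and audit on any architecture:

```python
# Hypothetical minimal "P-code"-style stack machine. A program is a
# list of (opcode, *operands) tuples; execution leaves the result on
# top of the operand stack.
def run(program):
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(args[0])          # push a literal
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode {op}")
    return stack.pop()

# (2 + 3) * 4
assert run([("push", 2), ("push", 3), ("add",),
            ("push", 4), ("mul",)]) == 20
```

A macro assembler layered on top would expand HLL primitives (assignments, arithmetic expressions) into sequences of these opcodes, keeping each layer small enough to review by hand.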

The HLL, for non-LISP audience, can be Oberon with aid of Wirth's Compiler
Construction book among other papers:

[http://www.ethoberon.ethz.ch/WirthPubl/CBEAll.pdf](http://www.ethoberon.ethz.ch/WirthPubl/CBEAll.pdf)

So, many possibilities. People just gotta use them. Building a LISP with macros
to build everything else still seems to be the easiest strategy, especially as
one can reuse code from textbooks unlikely to be subverted. Wirth's is next
best.

------
bazillion
I fail to understand how this is going to shut down the CIA, which is:

1. Not tasked with collection; that would be the NSA. Detailed in the article
is a meeting that the CIA apparently hosted detailing possible exploits for
Apple's phone systems. What does Debian have to do with exploits on Apple's
phone systems? The article is talking about an operating-system-level change,
and the presentation at the referenced event was a "compromised" Xcode that
could sneak backdoors into an iPhone app.

2. Primarily involved in HUMINT (Human Intelligence), which consists of
face-to-face interaction. The article says that the CIA is trying to expand
into cyber warfare. That's because the CIA's cyber capabilities are laughable
right now; what do you think they were in 2012, when the meeting detailed in
the article took place?

3. The documents referenced do not even detail the level of success that the
research has had in breaking Apple's encryption and security processes. News
flash: Apple's phone is pretty much the most freaking rock-solid phone on the
market security-wise. It would be the holy grail for an intelligence agency to
crack this thing, but instead government, state, and city agencies have to
literally take physical phones to Apple and ask them to unlock the data on
them if it's crucial to their investigations.

I think the article mentions the CIA to sensationalize, since pretty much
anyone recognizes the initialism, and a "Yeah, fight the evil government,
Debian!" angle clouds the loosely cobbled-together facts in the article. I was
a CIA contractor, and I really wish there were more understanding of the scope
and functions of the different agencies instead of painting every government
activity as malicious. That the nature of the CIA and NSA prevents most of the
details of their operations from being understood is unfortunate, but even if
they were understood, it wouldn't fit the constant anti-government narrative on
the site. I'm pondering starting a blog to educate about the roles, the
responsibilities, and what it's really like working in these agencies. I was
an NSA linguist, programmer, and mission manager, as well as a CIA contractor,
so I definitely think I have a lot to contribute to the conversation, but I am
constantly drowned out by the rage-filled anti-government sentiment on the
site, which you can see if you read my comment history.

~~~
rdtsc
> and city agencies have to literally take physical phones to Apple and ask
> for them to unlock the data on them if it's crucial to their investigations.

If by "unlocked" you mean getting access to their data, and phones can be
unlocked just by taking them to Apple, then they are certainly not the most
freaking rock-solid phones security-wise.

~~~
bazillion
It's understood that phone manufacturers can unlock their phones with the
tools available to them; that serves many purposes, like being able to
refurbish old phones. Those accessing the data without the intention of
serving the phone's owner, however, have to have a court order to access the
phone. That, to me, speaks highly of the phone: access by anyone other than
the user is limited to the organization that signed the keys locking the
phone (and that therefore has the certificates to make signed unlock requests).

It's not like you take it to Apple and Apple says, "Here's the user's data."
There is something like a 60-90 day wait period while they analyze each
individual request for approval and do their due diligence to make sure
access is justified. Juxtapose that with a typical Android phone that can be
rooted without the manufacturer's help, and you can judge which process better
serves the customer.

~~~
schoen
You can separate the ability to reinstall the operating system from the
ability to derive the keys to decrypt a particular device (or to instruct a
running device to give you root). I think this is described in Frank Stajano's
_Security for Ubiquitous Computing_.

Edit: previously in his paper with Ross Anderson
[https://www.cl.cam.ac.uk/~fms27/papers/1999-StajanoAnd-
duckl...](https://www.cl.cam.ac.uk/~fms27/papers/1999-StajanoAnd-duckling.pdf)

My ThinkPad can easily be reinstalled with a new OS, but my OS vendor can't
give someone else my full-disk encryption keys or make them root on my device.
And even with firmware-level security features we can separate "transfer
ownership of device" from "access existing protected device state".

I don't see any more reason that mobile phone vendors must be able to bypass
screen locks or disk encryption than that desktop OS vendors must be able to
do these things. (Sure, in both cases some users would want the vendor to be
able to and others wouldn't.)

------
unhammer
[https://reproducible.alioth.debian.org/presentations/2015-08...](https://reproducible.alioth.debian.org/presentations/2015-08-13-CCCamp15.pdf)
has some tips on what you can do to ensure your own FOSS packages have
reproducible builds, e.g.:

* avoid storing build timestamps (maybe use the timestamp of the last commit instead)

* avoid storing build numbers

* `LC_ALL=C` sort the sets of things where order is undefined/unimportant, e.g. file lists going into tar, or hash keys

* try building, then changing something in the environment, building again, and diffing (with diffoscope)
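
The sorting and timestamp advice can be demonstrated with a small Python sketch (stdlib only; the file names and mtime value are arbitrary): build an archive twice with sorted member order and pinned metadata, and check the outputs are bit-identical.

```python
# Sketch of the "sort and pin metadata" advice for reproducible
# builds: tar members are added in sorted order, with a fixed mtime
# (in practice you'd take it from SOURCE_DATE_EPOCH) and no builder
# uid/gid, so repeated builds yield identical bytes.
import hashlib
import io
import tarfile

def deterministic_tar(files: dict, mtime: int = 0) -> bytes:
    """files: name -> bytes. Returns a reproducible tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in sorted(files):           # undefined order -> sort it
            data = files[name]
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = mtime               # no build timestamp
            info.uid = info.gid = 0          # no builder identity
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

files = {"b.txt": b"bbb", "a.txt": b"aaa"}
one = deterministic_tar(files)
two = deterministic_tar(files)
assert hashlib.sha256(one).digest() == hashlib.sha256(two).digest()
```

Drop the `sorted()` or let `mtime` default to the current time and the two hashes diverge, which is exactly what diffoscope would flag.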

------
alnsn
> Many other free software projects, including FreeBSD, NetBSD, and OpenWrt,
> are moving in the same direction.

I might be wrong (I'm away from my NetBSD boxes), but NetBSD's build.sh script
has a buildseed option which can be used to create identical builds of the
base system.

------
jongraehl
You also have to trust the source code of the 'untrusted' compiler (that is,
read it and make sure it doesn't directly contain any back door).

~~~
unhammer
Or trust the source code of a simpler compiler that is able to compile the
untrusted compiler: [http://www.dwheeler.com/trusting-
trust/](http://www.dwheeler.com/trusting-trust/)
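
The approach behind that link (Wheeler's Diverse Double-Compiling) can be modelled with a toy sketch. Compiler binaries are represented as strings and "running" one is simulated; all names are invented for illustration, so see the linked page for the real construction:

```python
# Toy model of Diverse Double-Compiling (DDC): rebuild a suspect
# compiler through an independent trusted compiler and compare the
# results bit-for-bit. A trusting-trust backdoor re-inserts itself
# only when a compromised binary compiles the compiler's own source.

COMPILER_SRC = "cc-source"

def run_compiler(binary: str, source: str) -> str:
    """Simulate executing a compiler binary on some source code."""
    if "+backdoor" in binary and source == COMPILER_SRC:
        return f"bin({source})+backdoor"   # backdoor propagates itself
    return f"bin({source})"                # honest translation

def ddc_check(suspect: str, trusted: str) -> bool:
    """True iff the suspect binary survives the DDC comparison."""
    self_built = run_compiler(suspect, COMPILER_SRC)  # suspect rebuilds itself
    stage1 = run_compiler(trusted, COMPILER_SRC)      # trusted rebuilds it
    stage2 = run_compiler(stage1, COMPILER_SRC)       # regenerate via stage1
    return self_built == stage2

assert ddc_check("bin(cc-source)", trusted="bin(other-cc)") is True
assert ddc_check("bin(cc-source)+backdoor", trusted="bin(other-cc)") is False
```

The catch, as the linked page discusses, is that you still have to trust the second compiler's source and binary, but it can be a much simpler one, and an attacker would now need to subvert both toolchains identically.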

------
elektromekatron
Vice's headline writer should maybe dial it back a little, unless Debian has
become a rather different organization than when I last checked.

