
Strange Loops: Ken Thompson and the Self-referencing C Compiler - signa11
http://scienceblogs.com/goodmath/2007/04/15/strange-loops-dennis-ritchie-a/
======
kwantam
The original article, which this (kind of) summarizes: "Reflections on
Trusting Trust" by Ken Thompson.

[http://cm.bell-labs.com/who/ken/trust.html](http://cm.bell-labs.com/who/ken/trust.html)

Sort of disappointing that the blog entry doesn't bother linking to the
original, which is at least as well written.

~~~
sjay
From the original, Ken on the press lionizing hackers:

"I have watched kids testifying before Congress. It is clear that they are
completely unaware of the seriousness of their acts. There is obviously a
cultural gap. The act of breaking into a computer system has to have the same
social stigma as breaking into a neighbor's house. It should not matter that
the neighbor's door is unlocked. The press must learn that misguided use of a
computer is no more amazing than drunk driving of an automobile."

Nowadays, I hear more about the opposite problem -- using overly broad,
non-technical legalese to convict kids of "hacks" that aren't really hacks. I
wonder what Ken would think of Snowden, Barrett Brown, weev, et al.?

~~~
alanctgardner2
I'm pretty sure the context of that quote is completely applicable to weev.
"It should not matter that the neighbor's door is unlocked", is clearly Ken
saying that something being technically simple does not make it morally or
legally justifiable.

~~~
Dylan16807
But at the same time, simply entering is not going to get the feds after you.
And while standing on someone's lawn and staring into their living room
certainly has a stigma, it will never get you put in prison for decades. (For
the weev analogy pretend they have some financial documents on the table in
the living room.)

------
WalterBright
This relies on new compiler binaries being built from the previous compiler
binaries.

The cycle can be broken by using a different compiler with a different
pedigree in the bootstrap process. In fact, this suggests a way to detect the
back door:

Given Compiler A which is the previous generation, and Compiler B which has a
different pedigree, we want to generate Compiler C and detect if it is
compromised:

A compiles C which compiles C1

B compiles C which compiles C2

C1 and C2 should match. If they do, and A is "known good", then C is now
"known good".

Of course, this can be compromised, too, but it will make it very much harder
to do so.
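
The steps above can be sketched as a toy model in Python. Everything in it is an assumption made for illustration: `Binary`, `run_compiler`, `C_SOURCE` and `stage2_matches` are hypothetical names, and a "binary" is just a record of what the program does, which compiler logic emitted it, and whether it carries the back door. It also assumes B is clean; it is not a real build procedure.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the source text of compiler C.
C_SOURCE = "source of compiler C"


@dataclass(frozen=True)
class Binary:
    """Toy executable: what it does, whose code generator emitted it, and taint."""
    semantics: str            # the source this binary was built from
    codegen: str              # the logic of the compiler that emitted it
    backdoored: bool = False


def run_compiler(compiler: Binary, source: str) -> Binary:
    """Run a compiler binary on some source text.

    The emitted code depends only on the compiler's semantics (its own source),
    not on how the compiler binary itself happens to be laid out. A backdoored
    compiler re-inserts the back door whenever it compiles compiler C.
    """
    taint = compiler.backdoored and source == C_SOURCE
    return Binary(semantics=source, codegen=compiler.semantics, backdoored=taint)


def stage2_matches(a: Binary, b: Binary) -> bool:
    """A compiles C which compiles C1; B compiles C which compiles C2; compare."""
    c_from_a = run_compiler(a, C_SOURCE)   # stage 1: these two usually differ
    c_from_b = run_compiler(b, C_SOURCE)
    c1 = run_compiler(c_from_a, C_SOURCE)  # stage 2: these should be identical
    c2 = run_compiler(c_from_b, C_SOURCE)
    return c1 == c2


clean_a = Binary("source of compiler A", "unknown")
evil_a = Binary("source of compiler A", "unknown", backdoored=True)
b = Binary("source of compiler B", "unknown")

print(stage2_matches(clean_a, b))  # True
print(stage2_matches(evil_a, b))   # False
```

The point of the second stage is that both stage-1 compilers were built from the same source, so they should behave identically even though their own machine code differs.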

~~~
pmiller2
Of course, the hard part is to detect whether C1 and C2 "match." Just because
two different compilers generate different code for the same source doesn't
mean one or the other compiler was compromised. By Rice's theorem, it's
undecidable in the general case whether two computable functions are equal, so
you technically _can't_ do this step in a general way.

~~~
WalterBright
Do a binary diff of the executables. That's unambiguous.

Remember that C1 and C2 are generated by two different compilers generated
from the same source code. The same source code should generate the same
binary.
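
A minimal sketch of that binary diff step, assuming C1 and C2 are ordinary files on disk. The function name `binaries_match` and the stand-in file contents are made up for the example; only `filecmp.cmp` is a real standard-library call.

```python
import filecmp
import tempfile
from pathlib import Path


def binaries_match(c1: Path, c2: Path) -> bool:
    """Return True only if both files have identical contents."""
    # shallow=False forces a content comparison instead of trusting os.stat() data.
    return filecmp.cmp(c1, c2, shallow=False)


# Demo with stand-in files; real use would pass the paths of C1 and C2.
with tempfile.TemporaryDirectory() as tmp:
    c1 = Path(tmp, "c1.bin")
    c2 = Path(tmp, "c2.bin")
    c1.write_bytes(b"\x7fELF toy payload")
    c2.write_bytes(b"\x7fELF toy payload")
    same = binaries_match(c1, c2)
    c2.write_bytes(b"\x7fELF toy payload + extra")
    different = binaries_match(c1, c2)

print(same, different)  # True False
```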

~~~
InclinedPlane
> The same source code should generate the same binary.

This is utterly false and reveals a deep misunderstanding of compiler
technology. Aside from optimizations and exactly how they are implemented in
each compiler, there are still many more-or-less arbitrary choices that a
compiler needs to make in order to translate source code into machine code,
address layout being one of the most prevalent. Indeed, these choices are so
arbitrary and lead to such a high degree of difference in the binary forms
that the same compiler running on the same code will often generate different
binary output. And the same compilers running on very slightly different code
bases will quickly generate substantially divergent output. This is why
programs like bsdiff and Courgette exist: comparing binaries is actually an
enormously non-trivial problem.

~~~
WalterBright
> This is utterly false and reveals a deep misunderstanding of compiler
> technology.

I've written many professional compilers, front to back, and I use the binary
difference technique to verify that the compiler is capable of exactly
reproducing itself.

If you've got a compiler that generates different binaries depending on the
time of day, the address the compiler was loaded at, or something else that is
not the compiler switches + source code provided to it, you've got a compiler
with serious QA issues.

~~~
InclinedPlane
That's nice that your compiler works that way but that's not the way most do.

More importantly the salient point was about comparing the binary output of
_different_ compilers.

While it's certainly possible to create tools that determine whether the
binary outputs of different compilers are _effectively_ the same, such tools
are very non-trivial to create. The idea that different compilers are likely
to produce exactly identical output is sheer fantasy.

~~~
WalterBright
> More importantly the salient point was about comparing the binary output of
> different compilers.

No, it wasn't. I explained it (apparently badly) 3 times now. There's another
iteration of bootstrap compiling in there before the output is compared.

~~~
chii
I think a lot of people get confused by your explanation earlier in the
comment thread because, one, recursion is difficult to follow, and two, see
above. ;)

But yes, I think this method does work. You'd have to trust all the pipeline
programs used in between.

------
mkehrt
Nice article, but just read the original:
[http://cm.bell-labs.com/who/ken/trust.html](http://cm.bell-labs.com/who/ken/trust.html)

------
bediger4000
I'm surprised no one has mentioned David A. Wheeler's "Diverse Double
Compiling"
([http://www.dwheeler.com/trusting-trust/](http://www.dwheeler.com/trusting-trust/))
as a solution to this problem.

------
csense
Let's pretend a compiler with this backdoor will always correctly detect when
it's compiling a compiler. I.e. looksLikeCompilerCode() and
generateCompilerWithBackDoorDetection() are oracles.

With this assumption, to write a compiler that's safe, you can't run it
through an existing possibly-compromised compiler. You would have to bootstrap
your safe compiler, as if for a totally new language.

So you'd have to initially write it in a different language. You probably
can't write it in most existing languages, e.g. Python and Java are both
implemented in C. Because, if you tried to write a safe compiler in Python, a
sufficiently smart [1] C compiler would have been able to tell that you were
compiling a Python interpreter when it compiled /usr/bin/python, and inserted
a backdoor into that Python interpreter which will trigger when the
interpreter is interpreting a C compiler written in Python.

You'd basically have to consider any code that has ever passed through an
automated tool to be potentially backdoored, so you'd have to start writing in
machine code (no assembler allowed, of course, because it's probably written
in C). Of course, you could _program_ an assembler in pure machine code (or
using a potentially-tainted assembler, verifying its output by hand).

[1] It'd either be a general artificial intelligence of human level
programming ability, or some kind of magical oracle.
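
To make the oracle assumption concrete, here is a toy sketch of what a tainted compiler binary would do. The two oracle names are snake_case versions of the ones above; `honest_compile`, `compromised_compile`, the string-based "binaries" and the crude substring check are all invented for illustration and are nothing like a real attack.

```python
def looks_like_compiler_code(source: str) -> bool:
    """Stand-in oracle: a real attack needs far better recognition than this."""
    return "def compile" in source


def honest_compile(source: str) -> str:
    """Toy 'code generation': tag the source as compiled output."""
    return "BIN[" + source + "]"


def generate_compiler_with_backdoor_detection(source: str) -> str:
    """Emit a compiler binary that carries the back door, whatever the source says."""
    return honest_compile(source) + "+BACKDOOR"


def compromised_compile(source: str) -> str:
    """What a tainted compiler binary does when asked to compile anything."""
    if looks_like_compiler_code(source):
        return generate_compiler_with_backdoor_detection(source)
    return honest_compile(source)


clean_compiler_source = "def compile(src): return translate(src)"
hello_source = "print('hello')"

print(compromised_compile(hello_source))  # BIN[print('hello')]
print(compromised_compile(clean_compiler_source).endswith("+BACKDOOR"))  # True
```

Even though `clean_compiler_source` contains no back door, the binary built from it does, which is why auditing the source alone doesn't help under this assumption.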

~~~
e12e
Reading this, I was reminded of an article using the DOS debug[1] command to
bootstrap tools written from (memorized) assembly language.

Note, I can't find the actual article now - I think it may have been in one of
the early issues of Phrack[2].

[1] [http://www.intel-assembler.it/portale/5/Write-an-assembly-pr...](http://www.intel-assembler.it/portale/5/Write-an-assembly-program-using-DEBUG/Write-an-assembly-program-using-DEBUG.asp)

[2] [http://www.phrack.org](http://www.phrack.org)

------
gleenn
This is awesome because it means a backdoor could have been inserted many,
many versions ago in a compiler, and as long as each version was compiled with
the previous one (as is the case for most compilers like GCC), the hack would
be passed on ad infinitum. Pretty evil.

~~~
grecy
Tinfoil hat time.

Let your imagination run: what if someone had done this at some point to a
build of GCC being used by Linus...

~~~
zerohp
Linus doesn't distribute kernel binaries.

------
unimpressive
There is more than one way to defeat this attack. One of the simpler ones
would be to create a formally verified C compiler in x86 assembly, which you
can use to compile an older version of GCC or whatever, and then use that
version to compile the next one, and so on until you have a guaranteed clean
modern version.

I don't think any build chains currently implement anything like this, though.

------
barbs
Interesting! Out of curiosity, if this were the GNU C Compiler, would this be
illegal, since the GPL requires the source to be distributed with the binary,
and the source and binary don't match?

~~~
adrusi
There is no definition of "match" beyond just what the compiler produces. No
spec dictates what the output of a compiler should be.

~~~
barbs
Does this mean that, in theory, if someone wanted to modify some GPL'd code to
merge it into a proprietary project, and they didn't want to GPL the entire
project, they could modify the compiler to compile the code with the
modifications? It'd be a completely roundabout way to do it, but in theory
would this be legal?

If it were a GPL-licensed compiler (e.g. GCC), they wouldn't need to
distribute the compiler-code changes, since they would only be using it
internally and not distributing the binary for the compiler itself.

Of course, they could just modify and use a shared library of the GPL'd code
or whatever. At least, that's my understanding, could be wrong...

~~~
adrusi
Well they couldn't use gcc, because that's also GPL and they'd have to release
its source, but if I'm not missing some more nuanced part of the license, they
could probably compile with a fork of clang.

~~~
barbs
See, I don't think they'd have to release the source for gcc, since they're
not releasing the gcc binary.

My understanding is, under the GPL, if you distribute the binary, you need to
distribute the source.

