
Reflections on Rusting Trust - Manishearth
http://manishearth.github.io/blog/2016/12/02/reflections-on-rusting-trust/
======
tcoppi
An interesting experiment would be to see if a modern, relatively easy-to-
audit compiler like tcc can still be used to bootstrap a more full-featured
compiler like gcc or llvm. That would provide at least some protection against
this, in that it is unlikely a backdoor in your binary version of gcc will
have a trusting trust backdoor for tcc, which can then compile a clean gcc.

I know this was possible in the past, but I'm not sure if tcc can still
compile a relatively new gcc.

~~~
Gankro
This particular attack is done entirely in the rustc frontend, so adding
another way to build the backend shouldn't matter? You'd need a new
implementation of the frontend to apply diversity mitigations.

~~~
kibwen
Which makes me wonder if Miri (
[https://github.com/solson/miri](https://github.com/solson/miri) ) counts as
an alternate implementation of the frontend. Combined with, say, Cretonne (
[https://internals.rust-lang.org/t/possible-alternative-compi...](https://internals.rust-lang.org/t/possible-alternative-compiler-backend-cretonne/4275) ) I wonder if we couldn't cobble together an alternate frankencompiler for mitigating RoTT.

~~~
Manishearth
You need to generate the MIR though (to feed to miri); this only fixes attacks
that happen in the pipeline after MIR. Doing path and type resolution is the
tough part and so far we only have one implementation of that.

[https://github.com/thepowersgang/mrustc](https://github.com/thepowersgang/mrustc)
exists, anyway. I don't think it can compile rustc yet.

------
mrob
How about writing a minimal pseudo-Rust compiler that treats all invalid Rust
code as undefined behavior? This would be a lot easier than reimplementing the
real Rust compiler, because a lot of the complexity is in the compile time
error checking (e.g. the borrow checker) and the detailed error messages. The
original Rust compiler could be used to check the code is valid first. Would
this be any use for diverse double compiling style countermeasures? You might
be able to write a backdoor that both circumvents the error checking in the
real Rust compiler and exploits undefined behavior in the pseudo-Rust
compiler, but this must be more difficult than a traditional trusting trust
attack.

~~~
comex
Someone has tried to do something along those lines:

[https://github.com/thepowersgang/mrustc/blob/master/README.m...](https://github.com/thepowersgang/mrustc/blob/master/README.md)

> Thankfully, (from what I have seen), the borrow checker is not needed to
> compile rust code (just to ensure that it's valid)

It's not finished, though, and honestly I'm pretty sure that borrow checking
isn't even that hard compared to all the other stuff a Rust compiler has to
do, like generic/trait resolution. But we'll see.

~~~
naasking
> It's not finished, though, and honestly I'm pretty sure that borrow checking
> isn't even that hard compared to all the other stuff a Rust compiler has to
> do, like generic/trait resolution.

You can implement basic trait resolution via name mangling and a global map. I
did this as an experiment a number of years ago to implement type class-like
overloading in C. I'm not sure how far you could take that technique though. I
don't think it would work for higher-kinded types, but Rust doesn't have
those.
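
The technique naasking describes can be sketched in Rust itself: dispatch goes through a global map keyed by mangled trait-plus-type names. Everything here (the `Show` "trait", the `$` mangling scheme) is invented for illustration, not how rustc actually resolves traits:

```rust
use std::any::Any;
use std::collections::HashMap;

// An "instance" of the Show type class is a plain function pointer.
type ShowFn = fn(&dyn Any) -> String;

// Mangle a trait name and a concrete type name into one lookup key,
// e.g. "Show$i32".
fn mangle(trait_name: &str, type_name: &str) -> String {
    format!("{}${}", trait_name, type_name)
}

fn main() {
    // Global map from mangled name to implementation.
    let mut instances: HashMap<String, ShowFn> = HashMap::new();
    instances.insert(mangle("Show", "i32"), |v: &dyn Any| {
        format!("{}", v.downcast_ref::<i32>().unwrap())
    });
    instances.insert(mangle("Show", "bool"), |v: &dyn Any| {
        format!("{}", v.downcast_ref::<bool>().unwrap())
    });

    // "Resolution" is just a map lookup keyed on the value's type name.
    let x: i32 = 42;
    let show = instances[&mangle("Show", "i32")];
    println!("{}", show(&x)); // prints "42"
}
```

The limitation naasking mentions shows up here too: the key is a flat string, so there's nowhere to express a type constructor applied to an unknown type, which is roughly what higher-kinded resolution would need.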

------
nickpsecurity
If you really find that attack interesting, you might also find it interesting
to read the paper Thompson ripped it off of. Paul Karger co-invented INFOSEC
and more attacks/defenses than about anyone in the field if you're counting
foundational stuff. He wrote during his landmark pentest of MULTICS that the
PL/I compiler could be subverted with a trap door, adding that you could even
do a compiler-on-compiler trap. Trap doors were their favorite technique.
Thompson was working on MULTICS and received that evaluation. His initial
citation for the idea was "an unknown Air Force document"; they later made him
change it by giving him another copy. Everyone still credits Thompson despite
Karger
inventing it and the original mitigations. Those became part of Orange Book
class A1 requirements for security certification with all those products
coming with defenses against subversion by malicious developers.

The misattribution and forcing a correction by Thompson is in 3.2.4 of their
lessons learned paper:

[http://hack.org/mc/texts/classic-multics.pdf](http://hack.org/mc/texts/classic-multics.pdf)

Here's the original paper where the so-called Thompson attack is described on
page 17.

[https://www.acsac.org/2002/papers/classic-multics-orig.pdf](https://www.acsac.org/2002/papers/classic-multics-orig.pdf)

Also note they were inventing both hacking techniques and INFOSEC while doing
this evaluation. It was part of forerunner work happening among a small number
of people with little to draw on. It's why you see me say "the legendary Paul
Karger" when describing the results they got. Due credit might be "Paul
Karger's compiler attack popularized & further explored by Ken Thompson's
paper, Trusting Trust." I'll keep mentioning it until more people give it.

Back on topic. These days we have verified and certifying compilers, too. Even
typed assembly language with correctness proofs. Lots of stuff to base it off
of that's close to what most developers can understand. Basic refinement from
Rust compiler code with no optimizations to macro assembly or local scripting
languages is what I've been recommending outside verified compilers since any
developer can do it without special tooling. I even proposed bash one time,
though as a compile target more than something I'd try to code it in, lol. I
see local scripting is in your suggestions, too. That different people are
thinking along the same lines here more often might mean it's worth exploring
further.

------
zitterbewegung
A way to mitigate this attack is described here: [http://www.acsa-admin.org/countering-trusting-trust-through-...](http://www.acsa-admin.org/countering-trusting-trust-through-diverse-double-compiling/) .

~~~
Manishearth
Yes, I mention it in the post :)

~~~
akkartik
Worth linking "diverse double-compiling" there or to some other resource, if I
may make the suggestion.

For other interested readers:

a) html version of David Wheeler's dissertation:
[http://www.dwheeler.com/trusting-trust/dissertation/html/whe...](http://www.dwheeler.com/trusting-trust/dissertation/html/wheeler-trusting-trust-ddc.html). I read it over a week last month, and it made a big impression on me.

b) the HN discussion for the Wheeler dissertation:
[https://news.ycombinator.com/item?id=12666923](https://news.ycombinator.com/item?id=12666923)
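
For readers who want the shape of DDC without the full dissertation, here's a toy model. A compiler "binary" is reduced to two facts (the real check compares actual binaries bit for bit; the `backdoored` flag is a stand-in for that difference, and all names are illustrative):

```rust
// Toy model of diverse double-compiling.
#[derive(Clone, PartialEq, Debug)]
struct Binary {
    built_from: String,
    backdoored: bool,
}

impl Binary {
    // "Run" this compiler binary on some source. A backdoored binary
    // re-inserts the backdoor whenever it recognizes the compiler's own
    // source, even though that source is clean.
    fn compile(&self, src: &str) -> Binary {
        let is_compiler_source = src.contains("rustc");
        Binary {
            built_from: src.to_string(),
            backdoored: self.backdoored && is_compiler_source,
        }
    }
}

fn main() {
    let clean_source = "rustc source, no backdoor anywhere";
    let suspect = Binary { built_from: clean_source.to_string(), backdoored: true };
    // A diverse, independently produced compiler we choose to trust.
    let trusted = Binary { built_from: "other compiler".to_string(), backdoored: false };

    // DDC: build the compiler from its own source through each path,
    // then self-compile the result and compare the outputs.
    let via_suspect = suspect.compile(clean_source).compile(clean_source);
    let via_trusted = trusted.compile(clean_source).compile(clean_source);
    println!("backdoor detected: {}", via_suspect != via_trusted);
}
```

If the two final binaries differ, either the suspect compiler is subverted or the two compilers disagree on semantics; if they match, the source faithfully describes the binary.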

~~~
nickpsecurity
As I said there, Wheeler was exemplary for handling most of this right ahead
of time. He gave credit to Karger for inventing the compiler subversion
attack. He wrote the reference page on high-assurance FLOSS (for compiler
verification) and high-assurance SCM (for repo security, esp distribution). He
also wrote a cheat I gripe about in reproducible builds to give us something
to work with if high-assurance methods get ignored. Any of his stuff on this
subject is worth reading.

Here's his high-assurance and SCM pages:

[http://www.dwheeler.com/essays/high-assurance-floss.html](http://www.dwheeler.com/essays/high-assurance-floss.html)

[http://www.dwheeler.com/essays/scm-security.html](http://www.dwheeler.com/essays/scm-security.html)

------
qwertyuiop924
This is pretty cool, and a neat demo.

For the minority of you that haven't read the original Reflections on Trusting
Trust, you should do so now. It goes more in depth on this attack category,
and its implications. Personally, I would say it's one of the few papers that
should be required reading for programmers (along with In The Beginning Was
The Command Line, and The Lambda Papers).

------
nwah1
So, technically, it is possible that there's an undetected backdoor that has
lingered around since GCC 1.0

~~~
smegger001
Technically it would have been in the original C compiler written for the
original Unix by Ken Thompson and Dennis Ritchie, and been compiled into every
compiler and login utility since the 70s. At this point, finding a compiler
whose compilation lineage doesn't eventually trace back to it may be pretty
hard.

~~~
nwah1
True, but that compiler wasn't open source. The real head-scratcher here seems
to be that you can have a self-replicating security vulnerability in a
completely open source stack _including the compiler_.

------
tomerv
> "The local variable is called krate because crate is a keyword"

This is an interesting solution to a problem I often face (whenever I write a
tool in environment X to process something for environment X). Is this a
common way to handle this problem? I don't think I've seen this before.

~~~
steveklabnik
Yes. In Ruby, 'class' is a keyword, so most people use 'klass', for example.

The other one that comes up in Rust is 'type', people tend to use 'ty'.
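
For instance, a minimal illustration of the convention (the `Crate` struct and `describe` function are made up for the example):

```rust
struct Crate {
    name: String,
}

// `crate` is a reserved keyword, so the parameter is conventionally
// spelled `krate`; likewise `type` becomes `ty`.
fn describe(krate: &Crate, ty: &str) -> String {
    format!("{} ({})", krate.name, ty)
}

fn main() {
    let krate = Crate { name: "serde".to_string() };
    println!("{}", describe(&krate, "library")); // prints "serde (library)"
}
```

(Later Rust editions added raw identifiers like `r#crate` as an escape hatch, but the `krate`/`ty` naming convention has stuck.)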

~~~
wink
The Java version used to be 'klass' and 'clazz' - haven't seen it in a few
years though.

~~~
Manishearth
I've seen it crop up quite often in JNI code (also code that uses reflection)

------
yazaddaruvala
Say a trusting trust attack is discovered. What's the resolution? Do you
manually edit the binary? Do you rewrite a compiler in a language with a
verified compiler?

Would something like randomizing memory layouts, or reversing stack direction
be an easy mitigation or solve an attack like this?

~~~
logicallee
you use an old compiler binary that didn't have the attack yet to compile the
latest source code that it is able to compile, then use that to compile the
latest it can compile, and so forth.

what I mean is that I wouldn't expect the 2012 Rust prerelease to be able to
compile the 2016 Rust source code, but it can probably do 2013; use that to
compile 2014, use that to compile 2015, use that to compile 2016.

As long as you have a single binary from any point before the attack was
introduced it shouldn't be an issue. The whole point is that at no point does
the source code contain the trust backdoor, so you can just work forward from
any binary that doesn't have it yet.

if the very first Rust binary already had the issue, and was itself written in
Rust, you could conceivably have a problem though. Then you would need an
alternative compiler, however sub-optimal it might be...

or you could simply patch and remove the backdoor from the binary and then
have it compile itself without inserting the backdoor.

~~~
Manishearth
> but it can probably do 2013, use that to compile 2014, use that to compile
> 2015 use that to compile 2016.

Clearly you aren't aware of Rust's history :)

proto-Rust went through so many rapid changes that each compiler usually
only compiles with a specific hash. _Now_ stuff works with a numbered Rust
release, but that's a relatively new phenomenon. This process will likely need
to go through hundreds of compilation steps. Doable, but not as simple as a
year-by-year process.

~~~
vog
I believe the year numbers were used by logicallee only as an example. Of
course, if somebody is trying this, they need to figure out a different
(smaller) time scale that actually works.

In the worst case, you have to follow each commit in the version control
system of the compiler, but I'm pretty sure you don't need to do it that fine
grained.

~~~
logicallee
yes. I'm pretty shocked Manishearth didn't get that I was just giving examples
of the process with placeholder dates, since I started my comment _explicitly_
stating (I add emphasis here):

>you use an old compiler binary that didn't have the attack yet to compile
_the latest source code that it is able to compile_ , then use _that_ to
compile the latest _it_ can compile, and so forth.

in trying to find the "latest source code that it is able to compile" you can
do a binary search backward from the current version of the source code. it
doesn't take long to find the latest version for each one (i.e. the latest
each one compiles without error, into a working binary that passes some test
suite). And anyway, whenever you're binary-searching forward (i.e. after any
point where the binary search yields "greater-than" because the one you just
tried compiled), then you can just use the version you just successfully
compiled. Let me illustrate what I mean with this binary search:

so if we're at version 1,000,000 today, which version 7 doesn't compile, then
you try version 7 on version 500,000 (it'll fail), then on version 250,000
(it'll fail), version 125,000 (fail), version 62,500 (fail), version 31,250
(fail), then versions 15,625, 7,812, 3,906, 1,953 and 976.

Now suppose that version 7 compiles version 976 successfully. So the failure
with version 7 is between 976 and 1953. But since version 976 is stronger, you
can start working forward with version 976. So it's like a binary search
that's restarted whenever you get a "greater-than".

Even if for some reason this were a manual process, each time you get a
working version you at least halve the remaining space.
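
The restart-on-success search sketched above can be written down generically. Here `can_build(old, new)` is a stand-in for actually invoking the version-`old` compiler binary on the version-`new` source tree (in the toy `main` it's just a distance predicate):

```rust
// Binary search that restarts from each newly buildable version,
// returning the chain of versions to compile in order.
fn bootstrap_chain(start: u64, target: u64, can_build: impl Fn(u64, u64) -> bool) -> Vec<u64> {
    let mut chain = vec![start];
    let mut cur = start;
    while cur < target {
        // Binary-search for the newest version `cur` can still build.
        let (mut lo, mut hi) = (cur + 1, target);
        let mut best = cur + 1; // assume each binary can at least build its successor
        while lo <= hi {
            let mid = lo + (hi - lo) / 2;
            if can_build(cur, mid) {
                best = mid; // "greater-than": remember it and search later versions
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        chain.push(best);
        cur = best; // restart the search from the version we just built
    }
    chain
}

fn main() {
    // Toy model: each binary can build sources up to 400 versions newer.
    let chain = bootstrap_chain(7, 1_000_000, |old, new| new - old <= 400);
    println!("{} compile steps", chain.len() - 1);
}
```

Each successful build both extends the trusted chain and halves the remaining search space, which is the property the comment above relies on.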

Finally, as you said above, someone could make a batch file / shell script
that literally goes through every commit in the version control system (not
skipping any), always using the previous working binary to build the next
version. A script doing so may well run for a matter of days, though,
depending on how long it takes to compile the compiler. Ordinarily there are a
lot of commits!

The binary search above cuts this down significantly. (However, the binary
search isn't theoretically _guaranteed_ to be faster; after all, we can
imagine that version 7 compiles version 8 but fails on version 9, version 8
compiles version 9 but fails on version 10, etc. So theoretically every single
commit could be breaking.)

But that's extremely unlikely to be the case. I don't imagine you'd have to do
more than a few hundred compilations with the above binary-search methodology
before the process, starting from version 7 and stepping through the commits
as specified, produces a binary that compiles the latest version.

The process may have to be partly manual due to breaking changes in semantics
of invoking the compiler or its dependencies, but that should be rare.

This process would also in the end allow you to do a bit-for-bit comparison of
the output of the current version of the compiler when compiled using the
above trustable version, versus compiled from source code with the "trusting-
trust" backdoor (where every version is backdoored and inserts a backdoor when
compiling itself, without this backdoor being in the source code anywhere).

so the above process would let you tell whether there's a trusting-trust
backdoor, as long as there is a single early version that for sure didn't have
it yet, and you have the commit history (from which the trusting-trust
backdoor has been edited out).

as I said above, if you don't have a single known-good version without the
trusting-trust backdoor, then you'll have to write something that can compile
version 2 or 3 (or 7) yourself, in another language.

~~~
Manishearth
I did get that they were placeholders :)

I just wanted to note that the scale was way off. You said "I wouldn't expect
the 2012 rust prerelease to..." followed by "but it can probably do 2013",
so it was clear you had an expectation of what the dates would be
approximately like.

And for many compilers, this is true -- you can use really old compilers to
compile the new one. But not everyone is aware of how tumultuous Rust's
history is, so I thought it interesting to note.

I should have worded it better I guess.

~~~
logicallee
yeah I was just trying to illustrate what I meant, like, the process. And
you're right, I didn't know it was so tumultuous so with your clarification
it's an interesting observation :)

(By comparison the specifications for C don't break earlier compilers often at
all.)

Thanks for the clarification :)

------
scythe
I suppose the true counter is to write a verified scheme (or forth)
interpreter in assembly language, and then write a simple rust compiler in
that scheme, which you use to compile the real compiler, and then you can use
that to make an optimized build of the compiler.

>So you have a string containing the contents of the module, except for itself

I assume interesting versions of TT would have to avoid this trick, since
someone running "strings" on the binary would notice something very
suspicious, unless something strange is done to string literals.

~~~
Manishearth
Unless your assembler, loader, OS, or microprocessor is also backdoored :)

The original article was really about this -- at the end of the day, you have
to trust _someone_. Of course, we more easily trust microprocessors and
assemblers over binary blobs, so

> I assume interesting versions of TT would have to avoid this trick

Pretty easy to encode the string literal into some binary format.
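
For instance, a deliberately naive sketch: XOR-obfuscating the payload bytes keeps the readable text out of `strings` output (the key and payload here are made up; a real attack would be far more careful):

```rust
// The payload is stored XOR'd against a fixed key, so the compiler
// binary contains only opaque bytes rather than readable source text.
const KEY: u8 = 0x5A;
// These are the bytes of "hello", each XORed with KEY.
const PAYLOAD: &[u8] = &[0x32, 0x3f, 0x36, 0x36, 0x35];

// Recover the original text at runtime, just before injection.
fn decode(enc: &[u8]) -> String {
    enc.iter().map(|b| (b ^ KEY) as char).collect()
}

fn main() {
    println!("{}", decode(PAYLOAD)); // prints "hello"
}
```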

Alternatively, serialize and later deserialize the AST with a stable binary
serialization mechanism.

A really good version of TT operating on the AST would have to backdoor not
only the part where it _creates_ the AST, but also the parts where
intermediate state is displayed by the compiler (e.g. where it can dump
AST/MIR output).

There are things you can do. As a proof of concept, I didn't bother to do
them. My current POC is toothless and I like it that way!

It's cleaner to instead operate at the _end_ of the pipeline; on llvm ir or
the generated binary (but it's also harder to write). And if you can insert a
trusting trust attack in llvm itself, well, that would be something :)

~~~
dwheeler
> Unless your assembler, loader, OS, or microprocessor is also backdoored :)

True... but it has to be backdoored for that particular system. There are many
ways you can make it very unlikely that the other compiler/system is
backdoored for the same target.

For more information, see my work: [http://www.dwheeler.com/trusting-trust/](http://www.dwheeler.com/trusting-trust/)

~~~
Manishearth
Of course :)

Great work on DDC, btw, I really enjoyed that paper.

~~~
dwheeler
Thanks!!

I hated the idea that there could be an uncounterable attack... so I kept at
it until I could show that there _is_ a countermeasure.

------
adamcharnock
I believe this attack featured early on in the Nexus trilogy of books by Ramez
Naam.

I remember being impressed at the time. An excellent read covering many topics
often featured on HN.

Edit: typo

------
foota
Seems to be like the best way to mitigate this would be to write a rust
interpreter in assembly and use that to compile the compiler.

~~~
drvdevd
One of the interesting points raised in the original Trusting Trust paper was
exactly this - that next level of trust can again be subverted via microcode
modification, and so on and so forth. I really don't view Trusting Trust (in
the original paper) as an _attack_ so much as a philosophical question being
asked about trust in general and the way it propagates through supply chains.
It's almost a paper more on _economics_ than anything else..

~~~
rudolf0
Once intelligence agencies start backdooring microcode (especially in a way
that chip manufacturers can't detect), I'm throwing out my computer and living
in the woods.

~~~
SomeStupidPoint
If you're in a situation where it matters, the safe assumption would be that
Intel chips have been backdoored for a while, and others likely are too.

~~~
bluejekyll
Agreed. If it matters this much to you, better move now.

I'm not even sure that the open source RISC-V initiative would prevent this,
as theoretically the NSA could gain access to the manufacturing plant and
insert their own core.

The only way to catch that at that point would be to X-ray the chip and look
for their mod, or something. Anyway, assume the NSA has access to your shit,
and focus on preventing some random from grabbing your bank account access and
stealing your money.

