
Bootstrapping trust in compilers - ingve
https://www.owlfolio.org/research/bootstrapping-trust-in-compilers/
======
jperkin
Trust is certainly one of the issues when languages migrate to requiring
themselves to bootstrap.

However, by far the most infuriating issue (and one I run into frequently in
my line of work, hence the anger) is trying to get the language running on a
platform for which binary bootstraps do not yet exist.

Portability matters. If you want your language to be useful and available to
as many people as possible, why would you seek to artificially limit the
number of platforms it can be built on, just so you can avoid writing the
bootstrap in C? I'm sure there is some amount of pride on the part of the
language author when their language can bootstrap itself, but it certainly
isn't a pragmatic decision.

It's especially frustrating when the bootstrap requirement itself changes so
that only very recent versions of the language are sufficient (e.g. GHC),
forcing the porter to reach back into the archives and carefully plot a path
through building multiple versions, from the original C-based bootstrap until
they finally get to master.

This is painful, painful work, and then has to be done all over again for e.g.
32-bit vs 64-bit. It doesn't have to be like this.
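
To make the "plot a path" step concrete, here is a minimal sketch that searches for the shortest chain of builds from a C-buildable release to master. The version names and the compatibility table are entirely hypothetical (they are not GHC's real requirements); in practice you would have to fill the table in from each release's build notes.

```python
from collections import deque

# Hypothetical data: each version maps to the older versions able to build it.
# "c-bootstrap" stands for the last release buildable with only a C compiler.
CAN_BE_BUILT_BY = {
    "v2": {"c-bootstrap"},
    "v3": {"v2"},
    "v4": {"v2", "v3"},
    "v5": {"v4"},
    "master": {"v5"},
}


def bootstrap_chain(start, target, can_be_built_by):
    """Breadth-first search for the shortest build chain from start to target."""
    # Invert the table: builder -> versions it is able to build.
    can_build = {}
    for version, builders in can_be_built_by.items():
        for builder in builders:
            can_build.setdefault(builder, set()).add(version)

    queue = deque([[start]])
    seen = {start}
    while queue:
        chain = queue.popleft()
        if chain[-1] == target:
            return chain
        for nxt in sorted(can_build.get(chain[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(chain + [nxt])
    return None  # no chain exists with the data given


print(bootstrap_chain("c-bootstrap", "master", CAN_BE_BUILT_BY))
```

With this made-up table the search skips v3, since v2 can build v4 directly. The whole chain still has to be walked once per target (e.g. 32-bit and 64-bit).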

~~~
jdudek
Pardon my ignorance, but why is cross-compilation not an option in this case?

~~~
e40
It is and that's how it's done. The rant you replied to was just uninformed.

~~~
nickpsecurity
Typically, yes. My method is for high trustworthiness but people merely
concerned with reliability can do this. Got a tool that you've used to
successfully make your other tools? Use it on the next tool. Crazy idea, eh?

------
nickpsecurity
I described how to handle this and trustworthy compilers in general here:

[https://news.ycombinator.com/item?id=10182282](https://news.ycombinator.com/item?id=10182282)

My replies to "jeffreyrogers" have details and links to examples. You do it
bottom-up. I disagree with using Forth, as it's a weird language and that
reduces the number of people who will verify it. One would be better off with
P-code, given it was successfully used to get Pascal on 70 or so
architectures. Wirth and Jurg later used the same approach in the Lilith
workstation with M-code and Modula-2. They were able to put together a CPU,
high-level assembler (M-code), high-level language, compiler, OS, editor, and
so on in around 2 years by keeping it simple and consistent. You want
something like that, which maps to what people already know and do.

So, again, here's your model:

1\. Portable stack or register VM that's ultra-simple and similar to the
language targeting it.

2\. Implementations of that diversified by authors, OS's, and HW.

3\. Subset of language (or simple HLL like Modula-2) coded in whatever you
need to get initial compiler working.

4\. That same compiler re-coded in language of trusted VM and run on all
targets to ensure same results (equivalence checks).

5\. Use that binary to produce an executable from compiler's HLL source and
equivalence check again.

Note: Did I word 5 less confusingly than most people do at this point? I put
effort into avoiding "compile the compiler with compiler etc." ;)

6\. Use the binary from No 5 to compile future versions of the compiler
written in a subset of its own language. Should continue using a subset for
easier understanding and correctness. Check language features with testing
suite and sample applications instead of with overly complicated compiler.
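
The equivalence checks in steps 2, 4, and 5 can be sketched roughly as follows. This is only an illustration: the three-opcode stack VM and the program format are invented here (they are not P-code or M-code), and the two interpreters stand in for implementations written by different authors on different platforms.

```python
# Hypothetical ultra-simple stack VM; opcodes and program format are invented
# for illustration only.

def run_loop(program):
    """Implementation A: a plain if/elif interpreter loop."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(args[0])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack


def run_table(program):
    """Implementation B: written separately, using a dispatch table."""
    binary_ops = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}
    stack = []
    for instr in program:
        if instr[0] == "push":
            stack = stack + [instr[1]]
        elif instr[0] in binary_ops:
            *rest, a, b = stack
            stack = rest + [binary_ops[instr[0]](a, b)]
        else:
            raise ValueError(f"unknown opcode: {instr[0]}")
    return stack


def equivalent(program, implementations):
    """Equivalence check: every implementation must produce the same result."""
    results = [impl(program) for impl in implementations]
    return all(r == results[0] for r in results)


# Computes (2 + 3) * 4 on the stack.
PROGRAM = [("push", 2), ("push", 3), ("add",), ("push", 4), ("mul",)]
print(equivalent(PROGRAM, [run_loop, run_table]))
```

In a real setup the "program" would be the compiler itself, and the compared results would be the binaries it emits on each VM implementation.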

So, there you go. Easy stuff already proven by Wirth et al. Not worth another
100 write-ups. Just use what we know. The real problem worth lots of
discussion and investigation is certified, secure/robust compilation. _That_
is a difficult problem open to investigation with new, interesting results
each year. Bootstrapping compilers for masses? That's so 1971. ;)

------
Jabbles
I highly recommend this writeup of someone bootstrapping half a language from
raw hexadecimal upwards:

[http://homepage.ntlworld.com/edmund.grimley-evans/bcompiler....](http://homepage.ntlworld.com/edmund.grimley-evans/bcompiler.html)

------
civodul
This is one of the discussions we had at the Reproducible Builds Summit last
week:
[https://lists.gnu.org/archive/html/guix-devel/2015-12/msg001...](https://lists.gnu.org/archive/html/guix-devel/2015-12/msg00107.html).

In GNU Guix, we don't go as far as the author suggests (starting from a
FORTH-like VM, then building a small Lisp, etc.), but we've been thinking
about going in that direction: We already have Guile Scheme at the bottom,
with which we can implement a range of tools, from HTTP/FTP clients to ELF
parsers and more. We could imagine having (possibly feature-limited) variants
of some of the bootstrap tools, written in Scheme, for the purpose of building
the "real" tools.

Our current bootstrap looks like this:
[https://www.gnu.org/software/guix/manual/html_node/Bootstrap...](https://www.gnu.org/software/guix/manual/html_node/Bootstrapping.html).

------
OR13
Obligatory reference to a previous HN discussion of "Reflections on Trusting
Trust" by Ken Thompson [1].

I'd be interested in any war stories or links to compilers verified with
things like: Cryptol [2], Coq [3] or Idris [4].

I've seen Cryptol prove equivalence for cryptographic algorithms written in C
and Java. Would love to learn more about how this approach can or can't be
applied to compilers.

1\. [https://news.ycombinator.com/item?id=2642486](https://news.ycombinator.com/item?id=2642486)

2\. [http://www.cryptol.net/](http://www.cryptol.net/)

3\. [https://coq.inria.fr/](https://coq.inria.fr/)

4\. [http://www.idris-lang.org/](http://www.idris-lang.org/)

------
steveklabnik
I enjoyed this slightly snarky response to this general issue on Reddit:
[https://www.reddit.com/r/rust/comments/2tdsev/compilers_with...](https://www.reddit.com/r/rust/comments/2tdsev/compilers_with_backdoors/cp91cep)

Regardless, this is one of the reasons that I'd really like to have a second
Rust implementation exist.

------
xbtcdev
The trouble with the KTH (Ken Thompson hack) is that somebody could go through
all of this song and dance, and it still wouldn't mean a thing, because how do
you trust them?

~~~
rejschaap
You don't trust them. The result of this endeavor is not a trustworthy
compiler; the result is a procedure to generate one. Every step in the
procedure can and should be verified independently. What this buys you is a
procedure that produces a trustworthy compiler, provided your initial
environment is trustworthy. The latter is still an issue, of course.

------
Confiks
Here is also a very interesting talk about reproducible builds and trusting
compilers given at the Chaos Communication Congress:
[https://www.youtube.com/watch?v=5pAen7beYNc](https://www.youtube.com/watch?v=5pAen7beYNc)

They also talk about using multiple and very old compilers to bootstrap trust.

------
SamReidHughes
A point made at
[https://news.ycombinator.com/item?id=6360232](https://news.ycombinator.com/item?id=6360232)
is that it suffices to use an older compiler and system, assuming that the
newer one was developed independently.
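
One common way to act on this idea is David A. Wheeler's "diverse double-compiling". You build the new compiler's source with the older, independent compiler, use the result to rebuild the same source, and compare that binary bit-for-bit with the one the new compiler produces from its own source. Below is a minimal sketch of just the final comparison. It assumes the builds are deterministic and already sitting on disk; the helper names are made up for illustration, and this is not necessarily the procedure the linked comment has in mind.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of the file at `path`."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def builds_agree(paths):
    """True if all the given build outputs are bit-for-bit identical.

    Hypothetical helper: `paths` would be, e.g., the compiler binary built
    via the old independent compiler and the one built by the new compiler
    from the same source.
    """
    if len(paths) < 2:
        raise ValueError("need at least two builds to compare")
    digests = {sha256_of(p) for p in paths}
    return len(digests) == 1
```

A mismatch does not prove a backdoor (non-deterministic builds also cause one), but a match is strong evidence that the binary corresponds to its source, as long as the older compiler is not compromised in the same way.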

------
eridal
Given that hardware today is cheap and fast, why are we still using compiled
languages?

Plus, using languages that require source code at runtime helps lower the
barrier for newcomers.

~~~
rwallace
By all means use Python if that's what you prefer! But it doesn't solve the
problem discussed in the article: what if the Python binary has been
compromised?

~~~
eridal
Yep I agree. I was talking about compiled _high-level_ languages.

