
GNU Mes 0.17: towards bootstrappable builds for GuixSD - fanf2
http://lists.gnu.org/archive/html/info-gnu/2018-08/msg00006.html
======
dane-pgp
This seems to me like a necessary complement to the efforts around making
packages build reproducibly.

It has long been accepted that if a piece of software requires a non-Free
compiler to build it, then that piece of software is de facto non-Free too.
Taking that to its logical extreme, a piece of software isn't Free unless it
can be built by a compiler whose recursive sequence of meta-compilers leads
back to a minimal, audited binary seed.
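The "recursive sequence of meta-compilers" can be pictured as a chain of "built by" links that must end at an audited seed. This is a minimal sketch that assumes each binary was produced by exactly one compiler binary. Real toolchains also depend on libraries, shells and so on. The function name and all compiler names are hypothetical.

```python
def traces_to_seed(binary, built_by, audited_seeds):
    """Follow "built by" links from `binary` back toward its origin.

    `built_by` maps each binary to the single compiler binary that produced
    it (a simplification: real builds also depend on libraries, shells, etc.).
    Returns True only if the chain ends at an audited seed. A cycle, such as a
    compiler that was only ever built by itself, or a binary of unknown
    provenance, returns False.
    """
    seen = set()
    current = binary
    while current not in audited_seeds:
        if current in seen or current not in built_by:
            return False
        seen.add(current)
        current = built_by[current]
    return True


# Hypothetical chain, for illustration only.
BUILT_BY = {
    "cc-v3": "cc-v2",
    "cc-v2": "tiny-cc",
    "tiny-cc": "seed-cc",
}
AUDITED = {"seed-cc"}

print(traces_to_seed("cc-v3", BUILT_BY, AUDITED))   # True
print(traces_to_seed("cc", {"cc": "cc"}, AUDITED))  # False: self-hosted loop, no seed
```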

Fortunately, once this has been done (ideally several times independently,
producing the same results), all future compilations and software can
potentially enjoy the benefits of the process.

~~~
MaysonL
Assuming that you can trust the computers it's running on.

~~~
sn41
This is basically the argument made in "Reflections on Trusting Trust", the
Turing Award Lecture of Ken Thompson.

------
zapita
> _It consists of a mutual self-hosting Scheme interpreter written in ~5,000
> LOC of simple C and a Nyacc-based C compiler written in Scheme._

I’m trying to understand how important this part is. Is there a fundamental
reason this needed to be a C compiler and Scheme interpreter that each can
compile or interpret the other? Or is it just that they needed to support
those two languages to bootstrap this particular software distribution?

~~~
avhon1
Ultimately, the goal is to compile a trusted copy of the gcc source code with
a trusted compiler. This means that you need a trustworthy C compiler binary.
In order for the compiler binary to be trustworthy, you have to either:

1\. read and understand the machine code of the compiler's binary, or

2\. have an observable compiler that can compile itself and also gcc (or at
least better compilers that can compile gcc).

This project uses approach 2 - an interpreter to interpret a compiler
compiling its own interpreter, and also a better compiler. Because the
compiler is interpreted, you can potentially intervene and observe the
compilation progress. (To watch a compiled binary of a compiler compiling
itself, you would need a trustworthy debugger binary, which you can't
bootstrap unless you can already trust some other compiler, or read+understand
the debugger's binary.)

Because the interpreted compiler can compile its own interpreter, and the
interpreter interprets its own compiler, you can read and understand the
source code of each, observe the interpreted compiler compile the interpreter,
verify that the result is bit-for-bit identical to what you have been running,
and now you have an executable compiler that you can trust.
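The "bit-for-bit identical" check needs nothing more than a byte comparison. This is a minimal sketch that assumes both builds are available as files. The function names are hypothetical. Reporting the first differing offset is a convenience for investigating a mismatch.

```python
import hashlib
from pathlib import Path


def first_difference(a: bytes, b: bytes):
    """Return None if `a` and `b` are bit-for-bit identical; otherwise the
    offset of the first differing byte (or the shorter length, if one is a
    prefix of the other)."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


def compare_builds(running_path: str, rebuilt_path: str) -> str:
    """Compare the binary you have been running with the one the interpreted
    compiler just produced."""
    running = Path(running_path).read_bytes()
    rebuilt = Path(rebuilt_path).read_bytes()
    offset = first_difference(running, rebuilt)
    if offset is None:
        return "identical, sha256 " + hashlib.sha256(running).hexdigest()
    return f"MISMATCH at byte offset {offset}"
```

For example, `compare_builds("mes-running", "mes-rebuilt")`, where both paths are hypothetical placeholders. The check is only meaningful if the build is deterministic, which is where the reproducible-builds work comes in.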

So, having a mutually co-hosting compiler and interpreter is a neat way of
closing the compiler trust loop -- instead of always having to trust some
earlier compiler, this allows you to establish trust (or distrust) of a pair
of programs by running them on themselves and comparing what you get to what
you started with. The use of Scheme (the interpreter is a Scheme interpreter
written in C, and the compiler is written in Scheme) is incidental; it reflects
the fact that this project is intended for GuixSD, a Linux distribution that
uses a package manager written and configured in Scheme. I assume Scheme was
chosen because

1\. people who work on GuixSD like Scheme, and are comfortable reading and
writing it, and

2\. Scheme is a language with a good complexity/power ratio, minimizing the
amount of code that has to be written, read, and understood in order to trust
the compiler toolchain.

The fact that the compiler is a C compiler is more important, because with a
trusted C compiler you can compile (old versions of) gcc. With only a trusted
X compiler (where X is not C or C++), you would need to add another step, a C
compiler written in X. This would still be feasible, but adds more code to the
already-enormous pile that needs to be trusted.

~~~
specialist
Nice explanation, thanks.

"Trusting Trust" makes my head hurt.

Has anyone considered bootstrapping on completely different stacks and then
diffing the results?

Maybe even using cross compilers.

I can imagine hostile actors compromising various Intel, AMD, ARM chips. But I
can't imagine those compromised targets all behaving the same way. Or going
back in time to port their compromises to obsolete architectures.

~~~
csense
> Has anyone considered bootstrapping on completely different stacks and then
> diffing the results?

Yes.

[1]
[https://news.ycombinator.com/item?id=12666923](https://news.ycombinator.com/item?id=12666923)
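David A. Wheeler published a form of this idea as "diverse double-compiling" (DDC). The toy model below is only an illustration and is not Wheeler's code. Binaries are modelled as a hash plus a Python callable. Every name (`Binary`, `honest`, `trojaned`, `ddc`, `CC_SRC`) is hypothetical. The model assumes builds are deterministic. It also shows why Thompson's trojan survives the naive "does it rebuild itself?" check, and why an independently built compiler exposes it.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass
class Binary:
    image: bytes                    # the bytes you would diff
    run: Callable[[str], "Binary"]  # behaviour when executed as a compiler


def _image(codegen: str, payload: str) -> bytes:
    # Output bytes depend on which code generator ran and on what it compiled.
    return hashlib.sha256(f"{codegen}|{payload}".encode()).digest()


def honest(codegen: str) -> Callable[[str], Binary]:
    """An honest compiler: the program it emits behaves as its source says,
    so the emitted compiler's own code generator is named by `src`."""
    def compile_(src: str) -> Binary:
        return Binary(_image(codegen, src), honest(src))
    return compile_


CC_SRC = "clean source of the compiler under test"
TROJAN = "+backdoor"


def trojaned(codegen: str) -> Callable[[str], Binary]:
    """Thompson-style trojan: when compiling the compiler's own clean source,
    it silently re-inserts itself into the output."""
    def compile_(src: str) -> Binary:
        if src == CC_SRC:
            return Binary(_image(codegen, src + TROJAN), trojaned(src))
        return honest(codegen)(src)
    return compile_


def self_rebuild_matches(release: Binary, src: str) -> bool:
    # Naive check: does the release reproduce itself from the clean source?
    return release.run(src).image == release.image


def ddc(release: Binary, src: str, independent: Callable[[str], Binary]) -> bool:
    stage1 = independent(src)  # bytes differ from release (different codegen)...
    stage2 = stage1.run(src)   # ...but stage1 behaves as src says, so stage2
    return stage2.image == release.image  # should equal an honest self-compile


clean_release = honest(CC_SRC)(CC_SRC)
trojan_release = trojaned(CC_SRC)(CC_SRC)
other_vendor = honest("independent vendor")

print(self_rebuild_matches(trojan_release, CC_SRC))  # True: the trojan hides
print(ddc(clean_release, CC_SRC, other_vendor))      # True
print(ddc(trojan_release, CC_SRC, other_vendor))     # False: trojan detected
```

The diversity matters because the check can only be fooled if the independent compiler is compromised in exactly the same way, which matches the intuition in the question above.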

------
matthewbauer
This seems like a really neat idea! I think it would be cool to use plain
source code to completely bootstrap a system. But, at the same time, this
sounds like a maintenance nightmare! Maintaining older or smaller versions of
software just to avoid self-hosting is bound to hit all kinds of bugs - big
and small.

At the same time, I'm not convinced this completely solves the problem. Maybe
you can trust your C compiler now, but can you trust your Bash interpreter, or
your sed command? What stops them from injecting things into your C compiler
while it's building? At the end of the day, you have to trust some software, so
why not include your C compiler in this list?

Anyway, I am much more concerned about bootstrapping problems much higher up
the free software stack. Lots of software has switched from Autotools to Meson
without thinking about how we will bootstrap this stuff. Meson requires you to have
Ninja to work, but how can I bootstrap Ninja without Python? Autotools is
admittedly bad, but at least you only needed bash & make for it to work.
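The ordering problem described above is a topological sort of build dependencies. A self-hosting tool shows up in that sort as a cycle. This is a minimal sketch using Python's standard `graphlib` module (3.9+). The dependency graph is a hypothetical, heavily simplified illustration and does not accurately describe any distribution's packages.

```python
from graphlib import CycleError, TopologicalSorter


def bootstrap_order(deps):
    """Return a build order in which every package comes after the packages
    needed to build it, or None if the graph has a cycle (for example a
    self-hosting tool that needs an existing copy of itself)."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError:
        return None


# Hypothetical, heavily simplified graph: package -> what is needed to build it.
DEPS = {
    "meson": {"python", "ninja"},
    "ninja": {"python", "cc"},  # assumes Ninja is built via its Python configure script
    "python": {"cc", "make", "sh"},
    "make": {"cc", "sh"},
    "sh": {"cc"},
    "cc": set(),                # treated here as an already-trusted seed
}

print(bootstrap_order(DEPS))
print(bootstrap_order({**DEPS, "cc": {"cc"}}))  # None: self-hosting cycle
```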

~~~
paulie_a
I can't find a reference to it, but there was an intentional security bug
introduced into a C compiler that would pass through to the compiled code.

~~~
fanf2
Reflections on trusting trust, Ken Thompson’s Turing award lecture

[https://www.archive.ece.cmu.edu/~ganger/712.fall02/papers/p7...](https://www.archive.ece.cmu.edu/~ganger/712.fall02/papers/p761-thompson.pdf)

------
carapace
Really cool.

See also: CakeML [https://cakeml.org/](https://cakeml.org/)

> CakeML is a functional programming language and an ecosystem of proofs and
> tools built around the language. The ecosystem includes a proven-correct
> compiler that can bootstrap itself.

