The idea is that you implement a simple, ASCII-based VM for your platform, and that's enough to bootstrap your way from a number of different assemblers, to a basic C compiler, to a full-fledged C compiler and a (very minimal) POSIX environment.
The goal is twofold: a base for trusted compilation like this one, and a way to guarantee long-term executability of various archiving programs (i.e., a guarantee that you can unrar a file in 20 years with minimal work).
EDIT: very rough repo is here - https://github.com/mmastrac/bootstrap
My philosophy is to have a very simple, custom language (which I called G) for which it is easy to write a compiler in (x86) assembly, then write a C compiler in G that is good enough to compile tcc. tcc, in turn, is known to be able to compile gcc 4.7.4 (the last version that does not require a C++ compiler).
My C compiler is starting to take shape, and in theory (if I find the time to work on it) it should not be far from supporting everything tcc needs.
The README in the linked page contains more information.
I'll look at your code too!
If you want to do some brainstorming, there are often interesting discussions on these things in #bootstrappable on freenode.
My repo is at https://github.com/akkartik/mu/tree/master/subx#readme
Does this take into account a potential change in the predominant architecture? I.e., we move from x86 to fooarch? Presumably there's more work than "implement the VM in fooarch instructions"? You'd have to write a fooarch assembler as well, right? As well as fooarch C compiler backends?
You can also choose to write the VM in raw assembly. While this isn't ideal, the VM itself is mainly just straightforward register operations and should map trivially to any hardware that has bit operations and hardware 32-bit multiply/divide.
If it comes down to it, you can implement the VM itself on bare metal, but you'll need to do some work implementing things like a filesystem (not terribly hard to get a basic, non-scalable one up-and-running).
I suppose there's an assumption that the platform provides 32-bit integers, but I _think_ that's a safe assumption.
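To make that concrete, here's a minimal sketch in C of the kind of dispatch loop such a VM boils down to. The opcodes, encoding, and register-file size here are invented for illustration; the real VM's instruction set will differ:

    /* Hypothetical register-machine interpreter loop; not the actual
     * bootstrap VM's instruction set. */
    #include <stddef.h>
    #include <stdint.h>

    enum { OP_HALT, OP_LOADI, OP_ADD, OP_MUL };

    static uint32_t run(const uint8_t *code) {
        uint32_t reg[16] = {0};                  /* 32-bit register file */
        size_t pc = 0;
        for (;;) {
            switch (code[pc++]) {
            case OP_HALT:
                return reg[0];
            case OP_LOADI: {                     /* loadi rd, imm32 */
                uint8_t d = code[pc++];
                reg[d] = (uint32_t)code[pc]
                       | (uint32_t)code[pc + 1] << 8
                       | (uint32_t)code[pc + 2] << 16
                       | (uint32_t)code[pc + 3] << 24;
                pc += 4;
                break;
            }
            case OP_ADD: {                       /* add rd, ra, rb */
                uint8_t d = code[pc++], a = code[pc++], b = code[pc++];
                reg[d] = reg[a] + reg[b];
                break;
            }
            case OP_MUL: {                       /* relies on 32-bit multiply */
                uint8_t d = code[pc++], a = code[pc++], b = code[pc++];
                reg[d] = reg[a] * reg[b];
                break;
            }
            }
        }
    }

Every case is a few loads, one ALU operation, and a store, which is why porting largely reduces to transliterating this loop into the new machine's assembly.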
Originally, Wirth only planned to use it for bootstrapping purposes; then UCSD decided to build a whole platform on top of it.
Aside from trust, the other reason people are doing this is the fun challenge of building things from the ground up. They're also learning about interpreters, compilers, assembly, etc. The author of this work talked as if he's doing it the way he is mainly for the challenge.
Hiding a backdoor distributed across a bunch of 7400-series logic chips would be pretty unlikely.
Since Reflections on Trusting Trust has been linked already, I'm going to offer something else. Today's nightmare: https://www.teamten.com/lawrence/writings/coding-machines/
That being said, it's not professional fiction. It's possibly the writer's first work, and it succeeds because of talent (probably) and a good premise (definitely), but it's sorely lacking in polish and editing. I wouldn't look too closely at the cracks. :)
It does sound like an interesting read. I think I'll take a look.
From http://git.savannah.nongnu.org/cgit/stage0.git/tree/README (which also declares GPL3):
> This is a set of manually created hex programs in a Cthulhu Path to madness fashion, which only have the goal of creating a bootstrapping path to a C compiler capable of compiling GCC, with only the explicit requirement of a single 1 KByte binary or less.
> Additionally, all code must be able to be understood by 70% of the population of programmers. If the code can not be understood by that volume, it needs to be altered until it satisfies the above requirement.
> A class of minimal bootstrap binaries that has a reproducible build on all platforms. Providing a verifiable base for defeating the trusting trust attack.
My first programming language was Fortran, but I couldn’t get my first program to compile! I never did. I switched to something easier, Basic. Then went back to Fortran successfully. I toyed with programming for years before deciding on it as a career.
Ten years from my first attempts I was a good programmer doing really interesting things.
In twenty years I was architecting distributed systems for IBM’s new AIX operating systems.
In thirty years I was chief scientist at a very successful software company.
Key to my success was to keep learning from people smarter than me by reading, taking classes, and practicing hard things. Although it has now been fifty years, I still love programming.
I am fantasizing about a sort of ceremony in which the whole bootstrap process is done live in front of an audience, starting with a discrete-component computer (using e.g. this board as a CPU: https://monster6502.com), absolutely no electronic non-volatile memory, and the first programs read into the computer from punch cards or punched tape. This would be used to create later stages for more powerful hardware, and the end result (after maybe one or two hardware switches) is hopefully a minimal C compiler or similar that can be used to bootstrap production compilers like GCC. Ideally, this binary is shown to be completely identical to a binary built by a typical build process.
Even if such a ceremony is ultimately not very useful, it could still be seen as a kind of artistic performance.
Some context is this: https://bootstrapping.miraheze.org/wiki/Stage0
Initially the project was written in Java; once enough features were working, the bootstrap phase could start.
https://github.com/certik/bcompiler (fork on GitHub)
Also worth checking - https://www.t3x.org/t3x/book.html :-)
Compilers: Principles, Techniques, and Tools
> Simply stated, a compiler is a program that can read a program in one language - the source language - and translate it into an equivalent program in another language - the target language.
Engineering a Compiler
> Compilers are computer programs that translate a program written in one language into a program written in another language.
And was written using the bare metal text editor found here:
Recursive descent seems to be the go-to parsing technique for compilers both big and small now. I like how all the repetitive functions for each level have been refactored into a "general_recursion" function, but if you want to make it even simpler and yet more extendable, table-driven precedence climbing would be ideal:
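As a rough self-contained sketch of the idea (a toy grammar and cursor-based scanner, invented for illustration rather than taken from any linked code), table-driven precedence climbing in C can look like this:

    #include <stdio.h>

    static const char *cur;    /* input cursor over a toy expression string */

    /* The operator table: adding a binary operator means adding a row
     * here, not writing a new function. */
    static int prec(char op) {
        static const struct { char op; int p; } table[] = {
            {'+', 1}, {'-', 1}, {'*', 2}, {'/', 2},
        };
        for (unsigned i = 0; i < sizeof table / sizeof table[0]; i++)
            if (table[i].op == op) return table[i].p;
        return -1;                        /* not a binary operator */
    }

    static long parse_expr(int min_prec);

    static long parse_primary(void) {
        long v = 0;
        while (*cur >= '0' && *cur <= '9')
            v = v * 10 + (*cur++ - '0');
        return v;
    }

    /* One climbing loop replaces the per-level functions of plain
     * recursive descent. */
    static long parse_expr(int min_prec) {
        long lhs = parse_primary();
        while (prec(*cur) >= min_prec) {
            char op = *cur++;
            /* Left associativity: the right operand must bind strictly
             * tighter, hence prec(op) + 1. */
            long rhs = parse_expr(prec(op) + 1);
            switch (op) {
            case '+': lhs += rhs; break;
            case '-': lhs -= rhs; break;
            case '*': lhs *= rhs; break;
            case '/': lhs /= rhs; break;
            }
        }
        return lhs;
    }

    int main(void) {
        cur = "1+2*3-4";
        printf("%ld\n", parse_expr(1));   /* prints 3 */
        return 0;
    }

A new precedence level is just a new number in the table, where the classic style needs a new function (or a new case in something like general_recursion).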
I think the fact that everyone just writes recursive descent parsers tells us that, in practice, there isn't sufficient value in techniques like table-driven variants; they don't make anything practically simpler.
If it were significantly clearer, people would use it in practice! This makes me think it is not, in fact, significantly clearer.
I did research work on parsers, and I work professionally on compilers now, and guess what: when I need a parser for infix expressions, I just write a recursive-descent one manually. It's never an issue.
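To illustrate how little code that actually is, here's a self-contained toy of the hand-written, one-function-per-level style in C (again, the grammar and scanner are invented for illustration, not anyone's production code):

    #include <stdio.h>

    static const char *cur;          /* input cursor, same toy setup */

    static long parse_sum(void);     /* forward declaration */

    static long parse_primary(void) {
        if (*cur == '(') {           /* parenthesized sub-expression */
            cur++;
            long v = parse_sum();
            if (*cur == ')') cur++;
            return v;
        }
        long v = 0;
        while (*cur >= '0' && *cur <= '9')
            v = v * 10 + (*cur++ - '0');
        return v;
    }

    static long parse_term(void) {   /* one function per level: * and / */
        long v = parse_primary();
        while (*cur == '*' || *cur == '/') {
            char op = *cur++;
            long r = parse_primary();
            v = (op == '*') ? v * r : v / r;
        }
        return v;
    }

    static long parse_sum(void) {    /* + and - at the loosest level */
        long v = parse_term();
        while (*cur == '+' || *cur == '-') {
            char op = *cur++;
            long r = parse_term();
            v = (op == '+') ? v + r : v - r;
        }
        return v;
    }

    int main(void) {
        cur = "1+2*(3+4)";
        printf("%ld\n", parse_sum()); /* prints 15 */
        return 0;
    }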
In the context of a more complex language with a small set of infix operators and their precedence classes, it is probably not worthwhile unless you really want user-defined operators.
Why would it be more useful than just cross compiling?
The now-defunct Aboriginal Linux project was doing a similar sort of bootstrapping, and the build dependencies for GCC were a big issue:
To work around this, it never used anything later than gcc-4.2.1 from 2007, while we're now on GCC 8.2.
EDIT: Yes, it appears GCC uses C++: https://lwn.net/Articles/542457/
So literally once that step is done, we would have a full path...
That should do the full bootstrap in about 1 minute.
I wish I had time today to put this through its paces.