
C compiler with support for structs written in Assembly - z29LiTp5qUC30n
http://git.savannah.nongnu.org/cgit/stage0.git/tree/stage2/cc_x86.s
======
mmastrac
I started working on a similar VM-based bootstrap for getting from bare metal
to C compiler. If anyone is interested in collaborating let me know.

The idea is that you implement a simple, ASCII-based VM for your platform and
that's enough to bootstrap a number of different assemblers, to a basic C
compiler, to a full-fledged C compiler and (very minimal) POSIX environment.

The goal is twofold: a base for trusted compilation like this one, and a way
to guarantee long-term executability of various archiving programs (i.e.,
guarantee that you can unrar a file in 20 years with minimal work).

EDIT: very rough repo is here -
[https://github.com/mmastrac/bootstrap](https://github.com/mmastrac/bootstrap)

~~~
ailideex
Why not just cross compile!?

~~~
phoe-krk
Bootstrapping isn't an issue of convenience, it is an issue of trust. You
can't trust the compiler doing the cross-compilation. You literally have to
start from the smallest chunks of assembly code and build your way up to a
fully featured compiler through several stages, each of which is more complex
and can in turn compile more complex code.

~~~
weberc2
This still doesn't get you all the way, since you're ultimately trusting your
chip manufacturer.

~~~
cellularmitosis
If we are just talking about bootstrapping, you could also homebrew a CPU:
[https://news.ycombinator.com/item?id=13208516](https://news.ycombinator.com/item?id=13208516)

Creating a backdoor distributed across a bunch of 7400-series logic chips
would be pretty unlikely.

------
Filligree
As for 'why bootstrap'...

Since Reflections on Trusting Trust has been linked already, I'm going to
offer something else. Today's nightmare:
[https://www.teamten.com/lawrence/writings/coding-machines/](https://www.teamten.com/lawrence/writings/coding-machines/)

~~~
axaxs
Thanks for this, I'd never read it before and it was very interesting. But by
the end I had more questions than answers. In this situation, my first thought
would never be 'I need to write my own compiler' rather than, say, trying any
combination of the handful of C compilers on other boxes. Also, the ending
isn't clear to me. Was this a real story? I couldn't tell if the 'letter' was
meant to be a dystopian what-if, real, or anything in between.

~~~
Filligree
The entire page is fiction, presumably. Hopefully.

That being said, it's not _professional_ fiction. It's possibly the first work
of the writer, and works because of talent (probably) and a good premise
(definitely), but it's sorely lacking in polish and editing. I wouldn't look
too closely at the cracks. :)

------
exikyut
Vital details:

From
[http://git.savannah.nongnu.org/cgit/stage0.git/tree/README](http://git.savannah.nongnu.org/cgit/stage0.git/tree/README)
(which also declares GPL3):

> _This is a set of manually created hex programs in a Cthulhu Path to madness
> fashion. Which only have the goal of creating a bootstrapping path to a C
> compiler capable of Compiling GCC, with only the explicit requirement of a
> single 1 KByte binary or less._

> _Additionally, all code must be able to be understood by 70% of the
> population of programmers. If the code can not be understood by that volume,
> it needs to be altered until it satifies the above requirement._

From
[https://savannah.nongnu.org/projects/stage0/](https://savannah.nongnu.org/projects/stage0/):

> _A class of minimal bootstrap binaries that has a reproducible build on all
> platforms. Providing a verifiable base for defeating the trusting trust
> attack._

 _Cooooooool._

~~~
rasengan
I am in the bottom 30 percentile of programmers. :(

~~~
shawn
Everyone was. Work hard and force yourself to learn something alien each day.

~~~
todd8
Shawn’s perspective is great. Like chess players or golfers, there is (almost)
always someone better, but practice improves our ability. Programming takes
practice, but it gets more satisfying the more you do of it.

My first programming language was Fortran, but I couldn’t get my first program
to compile! I never did. I switched to something easier, Basic. Then went back
to Fortran successfully. I toyed with programming for years before deciding on
it as a career.

Ten years from my first attempts I was a good programmer doing really
interesting things.

In twenty years I was architecting distributed systems for IBM’s new AIX
operating systems.

In thirty years I was chief scientist at a very successful software company.

Key to my success was to keep learning from people smarter than me by reading,
taking classes, and practice doing hard things. Although it has now been fifty
years, I still love programming.

------
gmueckl
It would be kind of cool to start small with entirely verifiable hardware for
the first bootstrapping stages.

I am fantasizing about a sort of ceremony in which the whole bootstrap
process is done live in front of an audience starting with a discrete computer
(using e.g. this board as a CPU
[https://monster6502.com](https://monster6502.com)), absolutely no electronic
non-volatile memory and the first programs read into the computer from punch
cards or punch tape. This would be used to create later stages for more
powerful hardware and the end result (after maybe one or two hardware
switches) is hopefully a minimal C compiler or similar that can be used to
bootstrap production compilers like GCC. Ideally, this binary is shown to be
completely identical to a binary built by a typical build process.

Even if such a ceremony is ultimately not very useful, it could still be seen
as a kind of artistic performance.

------
pnathan
Very, very nice. Not often you get to study non-trivial assembly programs.

Some context is this:
[https://bootstrapping.miraheze.org/wiki/Stage0](https://bootstrapping.miraheze.org/wiki/Stage0)

~~~
akkartik
This is fantastic. I see it's mostly written by the author of the project. Why
the heck isn't it in the repo's Readme?!

------
lboasso
Bootstrapping a compiler is fun. I wrote a self-hosting compiler for the
Oberon-07 language, targeting the JVM:
[https://github.com/lboasso/oberonc](https://github.com/lboasso/oberonc)

Initially the project was written in Java, after enough features were working
the bootstrap phase could start.

------
pome
If anyone is interested in how bootstrapping works, this[0][1] tutorial is
just awesome: a tiny compiler built from nothing (raw machine code and hex!) up
to self-hosting!

[0]
[https://web.archive.org/web/20120712111627/http://www.rano.o...](https://web.archive.org/web/20120712111627/http://www.rano.org/bcompiler.html)
[1] [https://github.com/certik/bcompiler](https://github.com/certik/bcompiler)
(Fork in GitHub)

Also worth checking out -
[https://www.t3x.org/t3x/book.html](https://www.t3x.org/t3x/book.html) :-)

------
sandov
Then how do they usually bootstrap C? Do they write a C compiler without
structs, then program a C with structs compiler using C without structs?

~~~
garmaine
They cross-compile on a machine that already has a C compiler.

~~~
fb03
But how was the first compiler bootstrapped? Serious question (how can a
compiler of X language be written in X?)

~~~
coliveira
They wrote it in another language. I think the bootstrap C compiler was written
in BCPL.

~~~
garmaine
And, for example, C++ was originally a C program that transpiled C++ -> C.

~~~
sigjuice
No, it wasn’t. “Transpile” is not a real word. There is no need to desecrate
the first ever C++ compiler.

~~~
garmaine
[https://en.m.wikipedia.org/wiki/Source-to-source_compiler](https://en.m.wikipedia.org/wiki/Source-to-source_compiler)

[https://en.m.wikipedia.org/wiki/Cfront](https://en.m.wikipedia.org/wiki/Cfront)

~~~
sigjuice
“Source-to-source” is redundant. You can just shorten this to “compiler”.

~~~
pawelmurias
It's not redundant. A "source-to-source" compiler isn't a "source-to-machine
code" compiler

~~~
sigjuice
It is redundant. See page 1 of most compiler books.

Compilers: Principles, Techniques, and Tools

 _Simply stated, a compiler is a program that can read a program in one
language - the source language - and translate it into an equivalent program
in another language - the target language_

Engineering a Compiler

 _Compilers are computer programs that translate a program written in one
language into a program written in another language_

------
tempodox
What assembler syntax is this? The comments don't say.

~~~
z29LiTp5qUC30n
M0 macro assembly, it is implemented here:
[http://git.savannah.nongnu.org/cgit/stage0.git/tree/stage1/M...](http://git.savannah.nongnu.org/cgit/stage0.git/tree/stage1/M0-macro.hex2)
Which is Built in Hex2 which is implemented here:
[http://git.savannah.nongnu.org/cgit/stage0.git/tree/stage1/s...](http://git.savannah.nongnu.org/cgit/stage0.git/tree/stage1/stage1_assembler-2.hex1)
Which is built in Hex1 which is implemented here:
[http://git.savannah.nongnu.org/cgit/stage0.git/tree/stage1/s...](http://git.savannah.nongnu.org/cgit/stage0.git/tree/stage1/stage1_assembler-1.hex0)
Which is self-hosting and implemented here:
[http://git.savannah.nongnu.org/cgit/stage0.git/tree/stage1/s...](http://git.savannah.nongnu.org/cgit/stage0.git/tree/stage1/stage1_assembler-0.hex0)

And was written using the bare metal text editor found here:
[http://git.savannah.nongnu.org/cgit/stage0.git/tree/stage1/S...](http://git.savannah.nongnu.org/cgit/stage0.git/tree/stage1/SET.s)
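To give a feel for the bottom rung of that chain, here is a rough sketch of what a hex0-style assembler does: it turns pairs of hex digits into bytes and skips everything else. This is a Python illustration and not the stage0 code; the function name `hex0_assemble` is made up, and the details (comments starting with `#` or `;` and running to end of line, all other characters ignored) are assumptions about the format.

```python
def hex0_assemble(source: str) -> bytes:
    """Turn pairs of hex digits into bytes, skipping comments and other text.

    Assumed format (not taken from the stage0 sources): '#' or ';' starts a
    comment that runs to the end of the line; any non-hex character is ignored.
    """
    out = bytearray()
    high = None          # pending high nibble, or None if we are between bytes
    in_comment = False

    for ch in source:
        if in_comment:
            # Stay in comment mode until the line ends.
            if ch == "\n":
                in_comment = False
            continue
        if ch in "#;":
            in_comment = True
            continue
        if ch in "0123456789abcdefABCDEF":
            nibble = int(ch, 16)
            if high is None:
                high = nibble
            else:
                out.append((high << 4) | nibble)
                high = None
        # Whitespace and anything else is skipped.

    # A dangling final nibble is silently dropped in this sketch.
    return bytes(out)
```

For example, `hex0_assemble("48 65 # comment\n6c 6C 6f ; hi\n")` yields the five bytes of `Hello`. Each later stage (labels in hex1/hex2, macros in M0) adds one convenience on top of this idea.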

------
userbinator
I don't recognise the architecture despite it saying x86, but it reminds me of
ARM.

Recursive descent seems to be the go-to parsing technique for compilers both
big and small now. I like how all the repetitive functions for each level have
been refactored into a "general_recursion" function, but if you want to make
it even simpler and yet more extendable, table-driven precedence climbing
would be ideal:

[http://www.engr.mun.ca/~theo/Misc/exp_parsing.htm#climbing](http://www.engr.mun.ca/~theo/Misc/exp_parsing.htm#climbing)
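For anyone who hasn't seen it, table-driven precedence climbing can be sketched roughly like this. It is a minimal Python illustration and not code from cc_x86.s; the `OPS` table and the `tokenize`, `parse_primary`, `parse_expr`, and `evaluate` names are hypothetical, and it evaluates integer expressions directly instead of emitting code.

```python
import re

# Operator table: symbol -> (precedence, associativity, implementation).
# Supporting a new binary operator only means adding a row here.
OPS = {
    "+": (1, "L", lambda a, b: a + b),
    "-": (1, "L", lambda a, b: a - b),
    "*": (2, "L", lambda a, b: a * b),
    "^": (3, "R", lambda a, b: a ** b),
}


def tokenize(text):
    """Split the input into integer literals, operators, and parentheses."""
    return re.findall(r"\d+|[-+*^()]", text)


def parse_primary(tokens, pos):
    """Parse an integer literal or a parenthesized sub-expression."""
    tok = tokens[pos]
    if tok == "(":
        value, pos = parse_expr(tokens, pos + 1, 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return value, pos + 1
    return int(tok), pos + 1


def parse_expr(tokens, pos, min_prec):
    """Parse operators whose precedence is at least min_prec."""
    lhs, pos = parse_primary(tokens, pos)
    while pos < len(tokens) and tokens[pos] in OPS:
        prec, assoc, fn = OPS[tokens[pos]]
        if prec < min_prec:
            break
        # Left-associative operators require a strictly tighter right side.
        next_min = prec + 1 if assoc == "L" else prec
        rhs, pos = parse_expr(tokens, pos + 1, next_min)
        lhs = fn(lhs, rhs)
    return lhs, pos


def evaluate(text):
    """Evaluate a whole expression string and reject trailing tokens."""
    tokens = tokenize(text)
    value, pos = parse_expr(tokens, 0, 1)
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return value
```

One loop in `parse_expr` replaces the per-level functions of a classic recursive descent expression parser; the table decides how tightly each operator binds.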

~~~
chrisseaton
> Recursive descent seems to be the go-to parsing technique for compilers both
> big and small now ... if you want to make it even simpler and yet more
> extendable, table-driven precedence climbing would be ideal

I think the fact that everyone just writes recursive descent parsers tells us
in practice that there isn't sufficient value in using more techniques like
table-driven variants and they don't make anything practically simpler.

~~~
dfox
That depends on what you are parsing. For in-fix expressions shunting yard
algorithm is significantly more clear (i.e. you don't have to resort to
left-right recursion tricks and invent labels like "product") and allows you to
extend the set of supported operators by simply adding a table entry (which you
can do even while parsing and thus support user-defined operators).
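As a rough illustration of the table-entry point, here is a minimal shunting-yard sketch in Python that converts infix tokens to reverse Polish notation. The `to_rpn` function and the `OPS` table are hypothetical names for this example, and the input is assumed to be already split into tokens.

```python
# Operator table: symbol -> (precedence, associativity).
OPS = {
    "+": (1, "L"),
    "-": (1, "L"),
    "*": (2, "L"),
    "/": (2, "L"),
}


def to_rpn(tokens, ops):
    """Convert a list of infix tokens to reverse Polish notation."""
    output = []
    stack = []  # holds operators and open parentheses
    for tok in tokens:
        if tok in ops:
            prec, assoc = ops[tok]
            # Pop operators that bind tighter, or equally tight when the
            # incoming operator is left-associative.
            while stack and stack[-1] in ops:
                top_prec, _ = ops[stack[-1]]
                if top_prec > prec or (top_prec == prec and assoc == "L"):
                    output.append(stack.pop())
                else:
                    break
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise SyntaxError("unbalanced ')'")
            stack.pop()  # discard the matching '('
        else:
            output.append(tok)  # operand
    while stack:
        top = stack.pop()
        if top == "(":
            raise SyntaxError("unbalanced '('")
        output.append(top)
    return output
```

Adding an operator is a single assignment such as `OPS["^"] = (3, "R")`, and nothing in `to_rpn` changes. Since the table is ordinary data, that assignment could also happen partway through parsing, which is what makes user-defined operators straightforward.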

~~~
chrisseaton
> For in-fix expressions shunting yard algorithm is significantly more clear

If it was significantly more clear, people would use it in practice! This
makes me think it is not in fact significantly more clear.

I did research work in parsers, and I work professionally in compilers now,
and guess what when I need a parser for in-fix expressions I just write a
recursive descent one manually, it's never an issue.

~~~
dfox
It is significantly more clear if you only parse infix expressions, which is
why it is used for many introductory "let's write a calculator" examples.

In the context of a more complex language with a small set of infix operators
and their precedence classes, it is probably not worthwhile unless you really
want user-defined operators.

~~~
mncharity
My very fuzzy recollection is there's long-ago work compiling user-defined
mixfix operator precedence parsing down to recursive descent, but yes, opp
implementation is pretty for large complex user-defined operator sets.

------
pecg
This program looks interesting. Maybe it could help in the bootstrap process
of tcc, though I don't know if it is written in ANSI C. On a related note, I
will try to test compiling lua's runtime environment and interpreter, because
I'm sure it is written using standard C already.

~~~
nickpsecurity
Part of the goal of these efforts is doing that. For example, here's a page
tracking which tools can compile which others:

[https://bootstrapping.miraheze.org/wiki/C_compilers](https://bootstrapping.miraheze.org/wiki/C_compilers)

------
another-cuppa
Just a few hours ago there was a conversation in #gentoo-chat about the
Thompson attack. I wondered how difficult it would be to write a C compiler
that can compile GCC. How far away is this?

~~~
chubot
Isn't GCC written in C++ now? I think they are probably aiming for a very old
version of GCC written in C (or that's all they can really hope for).

The now-defunct Aboriginal Linux project was doing a similar sort of
bootstrapping, and the build dependencies for GCC were a big issue:

[http://landley.net/aboriginal/](http://landley.net/aboriginal/)

To work around this, it never used anything later than gcc-4.2.1 from 2007,
while we're now on GCC 8.2.

[http://gcc.gnu.org/releases.html](http://gcc.gnu.org/releases.html)

EDIT: Yes, it appears GCC uses C++:
[https://lwn.net/Articles/542457/](https://lwn.net/Articles/542457/)

~~~
earenndil
Yes, but you can use gcc 4.2.1 to compile a newer version which _does_ use
c++.

------
sam0x17
This is super impressive. What might be some compelling technical reasons to
use this over existing C implementations? I just don't know so I thought I'd
ask.

------
pavelbr
If I wanted to actually run this compiler, how would I build it?

~~~
z29LiTp5qUC30n
git clone 'https://git.savannah.gnu.org/git/stage0.git'
cd stage0
make test

That should do the full bootstrap in about 1 minute

------
newnewpdro
I'd appreciate a bare minimum C compiler capable of building gcc written in
either assembly or x86 machine code that can be trivially audited in as-
executed form.

------
jacquesm
That's impressive.

I wish I had time today to put this through its paces.

~~~
z29LiTp5qUC30n
The C version is a lot easier to read:
[https://github.com/oriansj/M2-Planet](https://github.com/oriansj/M2-Planet)

------
maxpert
Mind blown

