
I wrote a self-hosting C compiler in 40 days (2015) - rspivak
http://www.sigbus.info/how-i-wrote-a-self-hosting-c-compiler-in-40-days.html
======
pklausler
That's about how long it took me to write a self-compiling C compiler and
basic runtime library for a simulation environment for a new instruction set
architecture back in '98 (it eventually became the Cray X-1). I did start with
a working C preprocessor left over from an earlier project, though, and in
many ways the preprocessor can be the hardest part to get right.
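
One small illustration of why the preprocessor is easy to get wrong: the `#` operator stringizes its argument before macro expansion, so a second macro layer is needed to stringize the expanded value. This is a minimal sketch; the names `STR`, `XSTR`, `VERSION`, `unexpanded`, and `expanded` are made up for the example and do not come from either compiler discussed here.

```c
#define STR(x)  #x        /* stringizes the argument exactly as written */
#define XSTR(x) STR(x)    /* expands the argument first, then stringizes */
#define VERSION 40        /* hypothetical macro used only for this demo */

/* STR sees the token VERSION itself, so the result is "VERSION". */
const char *unexpanded(void) { return STR(VERSION); }

/* XSTR expands VERSION to 40 before STR is applied, so the result is "40". */
const char *expanded(void) { return XSTR(VERSION); }
```

A preprocessor that expands arguments at the wrong time will get one of these two cases wrong, which makes the pair a handy regression test.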

Bringing up a compiler in a simulated environment has its advantages over real
metal. Seeing instruction traces with computed values is awesome for
debugging.
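
As a rough sketch of what such a trace can look like, here is a toy simulator for an invented three-instruction machine. The ISA, the `Insn` layout, and the `run` function are all assumptions made for illustration; they have nothing to do with the real simulator described above.

```c
#include <stdio.h>

/* A made-up three-instruction ISA, for illustration only. */
enum Op { OP_LOADI, OP_ADD, OP_HALT };

struct Insn {
    enum Op op;
    int dst;      /* destination register, 0..3 */
    int a, b;     /* LOADI: a is the immediate; ADD: a and b are registers */
};

/* Execute prog on a toy 4-register machine until OP_HALT and return r0.
 * If trace is non-NULL, print each instruction with the value it computed. */
long run(const struct Insn *prog, FILE *trace) {
    long r[4] = {0};
    for (int pc = 0;; pc++) {
        const struct Insn *in = &prog[pc];
        int d = in->dst & 3;              /* keep register indices in range */
        switch (in->op) {
        case OP_LOADI:
            r[d] = in->a;
            if (trace)
                fprintf(trace, "%04d  loadi r%d, %d  ; r%d = %ld\n",
                        pc, d, in->a, d, r[d]);
            break;
        case OP_ADD: {
            long x = r[in->a & 3], y = r[in->b & 3];
            r[d] = x + y;
            if (trace)
                fprintf(trace, "%04d  add r%d, r%d, r%d  ; %ld + %ld = %ld\n",
                        pc, d, in->a & 3, in->b & 3, x, y, r[d]);
            break;
        }
        case OP_HALT:
            if (trace)
                fprintf(trace, "%04d  halt  ; r0 = %ld\n", pc, r[0]);
            return r[0];
        default:
            return r[0];                  /* unknown opcode: stop */
        }
    }
}
```

Passing `stderr` as the `trace` argument prints one line per executed instruction, with the operands and the result next to it, which is the kind of view that is hard to get on real hardware.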

------
sly010
> At first the compiler was about 20 lines long, and the only thing that was
> able to do is to read an integer from the standard input and then emit a
> program that immediately exits with the integer as exit code.

I always find it hard to find the absolute minimum functionality I can
implement so I usually just attack the problem hard, fail, and keep repeating
until I have enough broken functionality, then I start adding tests and
refactoring.
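
For reference, the first step quoted above can be sketched in a few lines of C. This is not the author's code; `compile_int` is a hypothetical helper, and the output assumes x86-64 assembly in AT&T syntax with a Linux-style `main` symbol.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: "compile" a source text consisting of one integer
 * into x86-64 assembly (AT&T syntax) for a main() that returns that integer.
 * Writes into out (capacity size). Returns 0 on success, -1 on error. */
int compile_int(const char *src, char *out, size_t size) {
    char *end;
    errno = 0;
    long value = strtol(src, &end, 10);
    if (end == src || errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return -1;                      /* no digits, or out of int range */

    /* The return value of main goes in %eax and becomes the exit code. */
    int n = snprintf(out, size,
                     "\t.globl main\n"
                     "main:\n"
                     "\tmovl $%ld, %%eax\n"
                     "\tret\n",
                     value);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

A driver would read the source from standard input, call `compile_int`, and print the buffer; feeding the result to an assembler and linker gives a program that exits with the given value (truncated to 0..255 on typical Unix systems). Everything after that is a matter of growing the grammar one feature at a time.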

Can I ask what your day job is? :)

~~~
JustSomeNobody
I've heard it called a walking skeleton. This is what I try to do early
because I think it aids greatly in data structure design.

~~~
rjeli
I've heard it as a "skewer", because you're penetrating every necessary layer.

~~~
rjeli
Just kidding, it's actually a Spike:
[http://wiki.c2.com/?SpikeSolution](http://wiki.c2.com/?SpikeSolution)

~~~
marssaxman
I've always used the phrase "driving a wire all the way through", but I like
"spike" better, especially if other people will know what it means.

------
dang
Previously discussed at
[https://news.ycombinator.com/item?id=10731002](https://news.ycombinator.com/item?id=10731002).

------
anonymousiam
"In x86 calling convention, structs are copied to the stack and their pointers
are passed to functions. But in x86-64, you have to destructure a struct into
multiple pieces of data and pass them via registers. It's complicated, so I'll
leave it alone for now. It's rare that you want to pass a struct as a value
instead of passing a pointer to a struct."

Would anyone care to explain this? It makes no sense at all to me. Why would
the 64-bit architecture restrict passing a large object on the stack?

Thanks.

~~~
wyc
Performance. Storing to and retrieving from registers is typically much
faster than writing to and then accessing memory. It's like keeping things in
your 16 hands vs. keeping them in your backpack in transit. I believe this to
be Linux-specific, as the BSDs have a different ABI that relies solely on the
stack.

Some notes:

[http://www.int80h.org/bsdasm/#default-calling-convention](http://www.int80h.org/bsdasm/#default-calling-convention)

~~~
anonymousiam
Thanks for the explanation everyone. I suppose it was my fault for
interpreting "have to" as an absolute requirement rather than a preferred
method.

~~~
gpderetta
It is a requirement in the sense that if the C code passes the aggregate
parameter by value, the compiler is required to generate the assembler code to
pass via the registers [1].

[1] inlining, calls to static functions and whole program optimization do give
the compilers freedom to pick a different calling convention of course.

~~~
anonymousiam
I still don't understand this. If a blob of C code passes a large structure by
value, doesn't it go on the stack? Why would the compiler be required to pass
such an object only via the registers?

~~~
taejo
It's generally expected that functions compiled with different compilers (for
the same target architecture) can call each other. This only works if the
compilers agree on where function arguments go; since they are all required to
do it in the same way, it's better to require everyone to do it the fast way
rather than requiring everyone to do it the slow way.
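
To make the thread above concrete, here is a small sketch of the two cases. It assumes the System V AMD64 ABI (used by Linux and other Unix-like systems on x86-64) and an 8-byte `long`; the type and function names are invented for the example. Under that ABI, a 16-byte aggregate of integers is split across two general-purpose registers, while an aggregate larger than 16 bytes like the one below is classified as memory and its copy is passed on the stack. The C source looks identical either way.

```c
#include <stddef.h>

/* 16 bytes with an 8-byte long: under the System V AMD64 ABI this is passed
 * in two integer registers (assumption: LP64 target). */
struct Pair { long a, b; };

/* 64 bytes: too large for register passing under that ABI, so the caller
 * places a copy in memory on the stack. */
struct Big { long v[8]; };

/* Both callees work on a private copy; the caller's object is untouched. */
long sum_pair(struct Pair p) {
    p.a += p.b;                 /* modifies only the callee's copy */
    return p.a;
}

long sum_big(struct Big b) {
    long total = 0;
    for (size_t i = 0; i < sizeof b.v / sizeof b.v[0]; i++) {
        total += b.v[i];
        b.v[i] = 0;             /* clears only the callee's copy */
    }
    return total;
}
```

So a large structure passed by value does still go on the stack; the register rule matters for small aggregates, and a compiler has to follow it so that code built by other compilers can call these functions.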

------
kardos
So does this mean that we can put to rest the 'trusting trust' paradigm? As
in, use 8cc to bootstrap our way into compiling gcc without using gcc?

~~~
akkartik
Indeed, a small trusted compiler is part of the solution. There's still _some_
subtlety in how you use it, though:
[http://www.dwheeler.com/trusting-trust/dissertation/html/whe...](http://www.dwheeler.com/trusting-trust/dissertation/html/wheeler-trusting-trust-ddc.html)

------
anonymousiam
What isn't stated is whether or not this new compiler requires libc from the
one used to compile it. Assuming a new CPU, the bootstrap code must be written
in assembly (assuming an assembler has been written to translate the .asm
source into machine code). Writing a C compiler in C with no way to bootstrap
is "cheating".

~~~
lifthrasiir
> Assuming a new CPU, the bootstrap code must be written in assembly [...].

Not really, thanks to cross-compilation. Only a small number of CPUs have
really required bootstrapping with direct machine code.

------
e19293001
For me, one of the most rewarding tasks in learning a programming language is
writing a compiler that can compile itself, though for more complex languages
like Haskell and Perl 6 that is very challenging, if still doable.

