
I wrote a self-hosting C compiler in 40 days (2015) - rspivak
https://www.sigbus.info/how-i-wrote-a-self-hosting-c-compiler-in-40-days
======
kazinator
> _The C vararg spec is not well-designed. If you pass all function arguments
> via the stack, va_start may be implemented pretty easily, but on the modern
> processor and in modern calling convention, arguments are passed via
> registers to reduce overhead of function calls. So the assumption of the
> spec does not match the reality._

The original assumption of the pre-ANSI <varargs.h> was that way, since it
was just a hack exploiting actual object code behavior in the absence of
language spec support. But ANSI C standardized the ... ellipsis, and the old
hack became undefined behavior. A function declaration with the ellipsis is
not type compatible with one that lacks it. This difference in type means that
the compiler can map variadic functions to a different calling convention,
such as one that always uses the stack for all the trailing arguments, even if
they occur in positions for which the regular convention uses registers.

I wouldn't call variadic functions well designed on the whole, but this aspect
isn't badly designed.
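
To make that concrete, here is a minimal sketch (the helper name `sum_ints` is
made up for illustration). The callee only reaches the trailing arguments
through `va_start`/`va_arg`/`va_end`, and every caller has to see the prototype
with the ellipsis, so the implementation is free to pass those arguments
however it likes:

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical helper: sums `count` int arguments that follow `count`. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    assert(count >= 0);

    va_start(ap, count);              /* start walking the trailing arguments */
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);     /* read each trailing argument as an int */
    va_end(ap);

    return total;
}
```

With this prototype in scope, `sum_ints(3, 1, 2, 3)` evaluates to 6. Calling it
through a declaration that lacks the `...` would be undefined behavior.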

------
rui314
Author here. These days I'm writing a new C compiler
([https://github.com/rui314/chibicc](https://github.com/rui314/chibicc)) from
scratch again to write a book about compilers.

Since I'll be using the new one as a reference implementation for a book, and
the book is intended to be for beginners, I put as much effort as I can into
improving the readability of the code. In particular, not only the head of the
repository but every commit in the repo should be readable, so that readers
can easily understand how each feature is implemented. I believe I'm doing a
good job keeping it clean so far. (Actually in order to keep the commit
history clean, I continue rewriting commit history and doing `git push -f`,
but that should be fine because the purpose of publishing the repo is not for
co-development but for sharing a reference implementation.)

chibicc does not have a C preprocessor, but other than that it can compile
itself already, so if you are interested, you can take a look.

~~~
meuk
How's it coming along? Will there be an English translation available?
Actually, I just tried to translate part of the book (the whole book is 'too
big to translate') with Google Translate, and the translation looks to be of
remarkably high quality compared to what I'm used to from Google Translate.

I would love to see a similar book/project for a linker and assembler.

~~~
rui314
I will translate it to English. Be wary of the machine translation -- using
neural networks, they have learned to write surprisingly natural sentences,
but that doesn't mean the translation is correct.

------
WalterBright
> I've almost finished implementing C preprocessor in just one day. It's
> actually a port from my previous attempt to write a compiler.

The preprocessor is fiendishly tricky to write. I wondered how he did it in
such a short time :-) I had to scrap mine and reimplement it 3 times.

I wish I had had
[https://www.spinellis.gr/blog/20060626/cpp.algo.pdf](https://www.spinellis.gr/blog/20060626/cpp.algo.pdf)
to work from.
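
One small illustration of the kind of corner a preprocessor has to get right
(my own example, not taken from the article; `STR`, `XSTR` and `VERSION` are
made-up names). The operand of `#` is stringified without being macro-expanded
first, so the usual idiom needs a second macro level:

```c
#include <assert.h>   /* static_assert (C11) */

#define STR(x)  #x        /* stringify the argument exactly as written */
#define XSTR(x) STR(x)    /* expand the argument first, then stringify */
#define VERSION 42        /* hypothetical macro, for illustration only */

/* The operand of # is not macro-expanded, so this is "VERSION". */
const char *const direct = STR(VERSION);

/* The extra level lets VERSION expand first, so this is "42". */
const char *const indirect = XSTR(VERSION);

/* Compile-time checks on the resulting string literals. */
static_assert(sizeof(STR(VERSION)) == 8, "\"VERSION\" plus the terminating NUL");
static_assert(sizeof(XSTR(VERSION)) == 3, "\"42\" plus the terminating NUL");
```

This is only one of many such rules; it says nothing about how any particular
preprocessor implements them.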

~~~
tandr
For reference, folks: this is coming from the guy whose Wikipedia page [0]
says "He was hired by Facebook to write a fast C/C++ preprocessor in D."

Not that I have had any doubts about how hard it is...

[0]
[https://en.wikipedia.org/wiki/Walter_Bright](https://en.wikipedia.org/wiki/Walter_Bright)

~~~
snazz
He also _wrote D_ (or is credited with creating it).

I’m also interested in how one would write a C preprocessor in a day. It’s got
so many tricky edge cases and whatnot.

~~~
WalterBright
> or is credited with creating it

Full disclosure: I looked into the future with my Chronoscope, and copied the
most popular language!

------
pizlonator
Fucking love shit like this.

More folks should write compilers. Can't wait to see your next compiler, Rui.

~~~
nineteen999
> More folks should write compilers

It's on my TODO list. After, of course, I finish that x86_32 preemptive multi-
tasking kernel I started writing 20 years ago and never finished. Plus the
16-bit CP/M clone for the 68000 that's gathering dust. Or maybe after my half
finished Z80 emulator or my half finished Z80-systems-in-Unreal-Engine
project, or my mostly-but-not-completely-working VT100 emulator. Sigh. How do
people ever finish this stuff?

~~~
newnewpdro
Not participating in internet forums is a good start, huge time sink.

~~~
nineteen999
Usually a during-work-hours activity for me, while stuff is "building", but
point taken.

~~~
sitkack
If ever I heard a reason to use C++ templates, this is it.

------
wyldfire
Rui is [one of?] the primary maintainers of lld: the LLVM project's linker.

EDIT, oh, he says as much at the bottom: "Since then, I have moved to the LLVM
team in Google, and I'm now working on lld, the LLVM linker."

------
bla3
The author has since written a longer book on how to write compilers (albeit
in Japanese):
[https://www.sigbus.info/compilerbook](https://www.sigbus.info/compilerbook)

~~~
proyb
Discovered that book too; still looking for ideas on how to translate it to English.

~~~
rui314
I will translate it to English myself once it's complete.

~~~
reinhardt1053
How to get a notification when the book comes out?

------
jokoon
There are so many small C compilers, I never know which one to choose,
although TinyCC seems to be the best. To me it allows 2 things:

* Use C as a fast scripting language. This can be useful for game programming, where you don't want to recompile your engine, but you can't tolerate the sluggishness of interpreted languages.

* Use C as a compile target. I would love to build a "pythonic lean C++", without the hard stuff, without template or backward compatibility of C. Just python indenting, strongly typed, maps, set, geometric types, python-like standard functions... I guess that using C as a compile target allows one to avoid the hassle of building a low-level compiler... Not sure though.

~~~
garaetjjte
On the second point: have you looked at D?

~~~
jokoon
D is too high level, is not really pythonic, and the syntax seems to be too
distant from C.

I just wish C had some nicer things, like a string type, maps, etc. python
indentation would also make it readable.

D has a lot of differences from C/C++, even if it does things right.

C is nice because it's simple and readable.

D looks like it's using a lot of new syntax and it looks hard to adapt/learn.

------
slacka
Related HN discussion for when the 8cc project was first announced:
[https://news.ycombinator.com/item?id=9125912](https://news.ycombinator.com/item?id=9125912)

~~~
dang
This article itself was also discussed

in 2017:
[https://news.ycombinator.com/item?id=13914137](https://news.ycombinator.com/item?id=13914137)

and 2015:
[https://news.ycombinator.com/item?id=10731002](https://news.ycombinator.com/item?id=10731002)

------
bluetomcat
Once you understand the type declaration syntax via the “declarations mirror
use” guideline, there is nothing weird in C syntax. Any scary “pointer to
function returning pointer to array” declarations are easily disentangled by
applying the operators in the correct order around the declared name. People
who know that rule tend to put the asterisk next to the name in pointer
declarations.
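
As a rough sketch of that rule (the names `table`, `get_table`, `fp` and
`element_via_fp` are invented for the example), here is the "pointer to
function returning pointer to array" case, with the use written in the same
shape as the declaration:

```c
#include <assert.h>
#include <stddef.h>

static int table[3] = { 10, 20, 30 };

/* get_table: function returning pointer to array of 3 int.
   Mirrored use: (*get_table())[i] is an int. */
int (*get_table(void))[3]
{
    return &table;
}

/* fp: pointer to function returning pointer to array of 3 int.
   Mirrored use: (*(*fp)())[i] is an int. */
int (*(*fp)(void))[3] = get_table;

/* Hypothetical helper: reads one element through fp, applying the
   operators in the same order as they appear in the declaration. */
int element_via_fp(size_t i)
{
    assert(i < 3);
    return (*(*fp)())[i];
}
```

Reading outward from the name: `fp` is dereferenced, called, dereferenced
again, then indexed, and the result is an `int`.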

The semantics of the language are quite simple if we discard the weird
implicit integer promotion and conversion rules. Every operation yields a new
value of a certain type, and some expressions are considered lvalues by virtue
of designating a modifiable location in memory.

~~~
kazinator
Rui was struggling with implementing it, not understanding and using it. He
found that with 15 years of C experience, he didn't understand it well enough
to just sit down and implement all of it without studying the spec.

~~~
bluetomcat
I didn’t mean to devalue the efforts of the author, just wanted to point out
that it is relatively straightforward to write a minimal C compiler, and the
gotchas are in some small details.

~~~
Koshkin
But the _art_ of this is in getting all these details accounted for in a
straightforward manner.

