
Writing a C Compiler, Part 2 - luu
https://norasandler.com/2017/12/05/Write-a-Compiler-2.html
======
jsteemann
Nice article and a good read! Just came in and read part 2, so please don't
blame me if the following has been posted in a comment for part 1 already.

IMHO there are plenty of good books on compiler construction. One of the best
ones I ever got my hands on was a book by Niklaus Wirth (a now-retired
professor who, among other things, created Pascal, Oberon and other languages).
He explains all the details of creating your own compiler from the ground up.

It's available only here:
[https://www.inf.ethz.ch/personal/wirth/CompilerConstruction/...](https://www.inf.ethz.ch/personal/wirth/CompilerConstruction/index.html)

In that book, he is creating a compiler for Oberon, a language that was more
or less used for didactic purposes only. The book is also pretty dated, so
there is not much to take away in terms of practically usable compiler code.
But I can still recommend it to everyone, because I think it's didactically
very good, and provides all the necessary details that make compiler
construction so worthwhile (and annoying).

~~~
lboasso
I agree, Niklaus Wirth's compiler book is a good introduction to the topic.

To showcase Wirth's approach, I wrote a self hosted Oberon compiler for the
JVM: [https://github.com/lboasso/oberonc](https://github.com/lboasso/oberonc)

The compiler can also be used to compile and run the source code of Oberon0
provided in the book; all you need is an installed JVM.

~~~
andrewbinstock
That is very impressive. Congratulations!

~~~
lboasso
Thanks, my hope is that the Oberon language will not be forgotten.

The Oberon0 language, described in the compiler book, is just a toy to
illustrate basic compiler techniques. The full Oberon language is a simple but
powerful language that inspired Go.

------
roknovosel
This feels like a good place to share some of the resources I've compiled from
HN regarding compiler/interpreter construction:

MOOC/Courses:

\-
[https://cseweb.ucsd.edu/classes/sp17/cse131-a/](https://cseweb.ucsd.edu/classes/sp17/cse131-a/)

\-
[http://cs.brown.edu/courses/cs173/2016/index.html](http://cs.brown.edu/courses/cs173/2016/index.html)

\-
[https://lagunita.stanford.edu/courses/Engineering/Compilers/...](https://lagunita.stanford.edu/courses/Engineering/Compilers/Fall2014/about)

\-
[http://www.craftinginterpreters.com/](http://www.craftinginterpreters.com/)

Blog posts:

\- An Intro to Compilers
([https://nicoleorchard.com/blog/compilers](https://nicoleorchard.com/blog/compilers))
and the accompanying Hacker news comments
([https://news.ycombinator.com/item?id=15005031](https://news.ycombinator.com/item?id=15005031))

\- Resources for Amateur Compiler Writers
([https://c9x.me/compile/bib/](https://c9x.me/compile/bib/))

Other resources:

\- Tweet
([https://twitter.com/munificentbob/status/901543375945388032](https://twitter.com/munificentbob/status/901543375945388032))
by Bob Nystrom (works on the Dart VM and created Crafting Interpreters)

\- Wren ([http://wren.io/](http://wren.io/)): another project by Bob Nystrom,
a scripting language written in C with beautifully documented code (I
recommend checking it out on GitHub:
[https://github.com/munificent/wren](https://github.com/munificent/wren))

\- Ask HN: Resources for building a programming language?
([https://news.ycombinator.com/item?id=15171238](https://news.ycombinator.com/item?id=15171238))

\- Mal – Make a Lisp, in 68 languages
([https://news.ycombinator.com/item?id=15226110](https://news.ycombinator.com/item?id=15226110))

~~~
b3b0p
Thank you! Very nice.

Your comment made me do a quick Google for "Awesome Compilers"

Sure enough: [https://github.com/aalhour/awesome-compilers](https://github.com/aalhour/awesome-compilers)

(I didn't check if the links posted here are in the awesome-compilers repo
though.)

Edit: I also want to thank OP for continuing with part 2. I didn't read part 1
completely because I figured it had a chance to be an abandoned blog post
series. Keep it up OP!

------
jsteemann
A "tiny" (read: small) C compiler can also be found at:
[https://bellard.org/tcc/](https://bellard.org/tcc/)

It has been unmaintained for a while now, but it's still good as a starting
point.

Interestingly enough, it's a spin-off of a winning entry in the
International Obfuscated C Code Contest
([http://www.ioccc.org/](http://www.ioccc.org/)).

Original code (covering just a subset of C) can be found here:
[https://bellard.org/otcc/](https://bellard.org/otcc/)

~~~
amadeusz
I think tcc is now community-maintained: see "Savannah project page and git
repository" on that page, which links to the project's git repository
([http://repo.or.cz/w/tinycc.git](http://repo.or.cz/w/tinycc.git)). It was last
updated just a few days ago, so it seems to be active.

~~~
dane-pgp
Earlier this year, tcc (re)gained the ability to compile gcc 4.7:

[http://lists.nongnu.org/archive/html/tinycc-devel/2017-05/ms...](http://lists.nongnu.org/archive/html/tinycc-devel/2017-05/msg00102.html)

This is an important result for activities such as Diverse Double-Compiling (a
response to the Trusting Trust attack), and as part of the exciting work being
done towards bootstrapping a modern Linux environment from source and a
hex/assembly base:

[http://bootstrappable.org/](http://bootstrappable.org/)

------
userbinator
_but you can actually implement ! in just three lines of assembly_

I was actually expecting the "classic" x86 idiom for !, which also happens to
be 3 instructions, but is slightly shorter and likely faster than the sequence
presented:

    neg eax
    sbb eax, eax
    inc eax

As a bonus, it also works in 16-bit mode (using the 16-bit registers) on
everything back to the 8086, whereas SETcc requires a 386 or later. Given that
I've seen it in early DOS compiler output, this neg/sbb/inc idiom has likely
been known for a very long time. Figuring out how it works is left as an
exercise for the reader ;-)
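
For anyone who would rather check their answer: here is a rough C sketch that steps through the three instructions on a 32-bit value. The function name `logical_not_idiom` and the `carry` variable are made up for illustration; `carry` stands in for the x86 carry flag, and this only models the arithmetic, not real flag behavior in full.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: emulates neg/sbb/inc on a 32-bit register.
 * `carry` models the x86 carry flag (CF). */
uint32_t logical_not_idiom(uint32_t eax)
{
    /* neg eax: eax = 0 - eax; CF is set iff the original value was nonzero. */
    uint32_t carry = (eax != 0) ? 1u : 0u;
    eax = 0u - eax;

    /* sbb eax, eax: eax = eax - eax - CF, so eax is 0 or 0xFFFFFFFF. */
    eax = eax - eax - carry;

    /* inc eax: 0 becomes 1, 0xFFFFFFFF wraps around to 0. */
    eax = eax + 1u;
    return eax;
}
```

So a zero input ends up as 1 and any nonzero input ends up as 0, which is exactly what C's `!` has to produce.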

~~~
cwzwarich
The code in the article is probably faster, because the

    mov eax, 0

will be handled during register renaming and not actually issued. It will also
eliminate the partial dependency of SETE on the other bits of EAX.
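
To make the partial-register point concrete, here is a small C sketch. It assumes the article's sequence is a compare against zero, then the `mov eax, 0`, then `sete al`; the function names are hypothetical. It only models which bits each instruction writes, not timing or renaming.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of: cmp eax, 0 ; mov eax, 0 ; sete al */
uint32_t logical_not_setcc(uint32_t eax)
{
    uint32_t zero_flag = (eax == 0) ? 1u : 0u;  /* cmp eax, 0 sets ZF */
    eax = 0u;                                   /* mov eax, 0 leaves flags alone */
    eax = (eax & 0xFFFFFF00u) | zero_flag;      /* sete al writes only the low 8 bits */
    return eax;
}

/* Same sequence with the mov left out: the upper 24 bits of the old
 * value survive, so the result still depends on the previous EAX. */
uint32_t setcc_without_zeroing(uint32_t eax)
{
    uint32_t zero_flag = (eax == 0) ? 1u : 0u;  /* cmp eax, 0 sets ZF */
    eax = (eax & 0xFFFFFF00u) | zero_flag;      /* sete al writes only the low 8 bits */
    return eax;
}
```

With the zeroing step, `logical_not_setcc(0x12345678)` is 0; without it, `setcc_without_zeroing(0x12345678)` is 0x12345600, which shows why SETE alone carries a dependency on the old register contents.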

------
chrisaycock
Discussion for Part 1:

[https://news.ycombinator.com/item?id=15821899](https://news.ycombinator.com/item?id=15821899)

------
wyufro
I really feel it doesn’t make much sense to generate assembly any longer. It’s
much more reasonable to generate IR (for example, for LLVM). Then you get free
optimizations and at least some portability. It’s probably also easier, since
you don’t have to keep track of registers.

~~~
tptacek
If you're writing the compiler as an exercise, skipping codegen means you're
missing out on a lot of really good stuff.

~~~
wglb
Code generation for me is more fun than the rest of it, as you are likely
dealing with an NP-complete problem.

------
orodley
Shameless plug - I wrote my own self-hosting C toolchain (compiler + libc +
assembler + linker):
[https://github.com/orodley/naive](https://github.com/orodley/naive).

It was a really fun project, and I learned a lot from it.

------
remcob
The explanation for bitwise complement ignores the word size, which matters a
lot here. For example, ~4 = 251 in uint8, not 3 as in the example.
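
A quick C sketch of the difference, assuming the article's example treats 4 as the 3-bit pattern 100. The helper `complement_bits` is a made-up name that complements only the low `width` bits:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bitwise complement restricted to the low
 * `width` bits of `value` (intended for 1 <= width <= 32). */
uint32_t complement_bits(uint32_t value, unsigned width)
{
    /* Build a mask of `width` one-bits; avoid shifting by 32, which is undefined. */
    uint32_t mask = (width >= 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return ~value & mask;
}
```

Here `complement_bits(4, 3)` is 3, `complement_bits(4, 8)` is 251, and `complement_bits(4, 32)` is 4294967291. In actual C, `~4` on a plain `int` is -5, and `(uint8_t)~4` is 251.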

~~~
norax
This is a good point, thank you for bringing it up! I added a footnote about
it: [https://norasandler.com/2017/12/05/Write-a-Compiler-2.html#f...](https://norasandler.com/2017/12/05/Write-a-Compiler-2.html#fn2)

