
Learning to write a compiler - siim
http://stackoverflow.com/questions/1669/learning-to-write-a-compiler
======
Hoff
Here is an earlier incarnation of the "writing a compiler" thread:

<http://news.ycombinator.com/item?id=1608129>

~~~
silentbicycle
Oops, I downvoted you when I meant to upvote. There is probably another
incarnation or three before the one you linked, too.

I'd recommend Appel's "Modern Compiler Implementation in ML" over the dragon
book.

~~~
mitjak
I noticed there are a few versions, including ones for Java and C. Any reason
the ML version is better? (Nothing against ML; I just don't know it well
enough. At all, actually.)

~~~
silentbicycle
A very large part of compiler work involves analyzing and transforming
tree/graph data structures (once you've parsed to an AST), so you want a
language with good native support for those structures. Pattern matching and
garbage collection help immensely, as do ML's inferred static types. Working
with complex data structures is really where ML shines.

It's not just ML, though - using a language where you can work directly with
native tree structures (ML and Haskell's type constructors, Lisp sexps, Prolog
functors, Lua tables, or even JSON) means that you can postpone learning
syntax/parsing and _start_ with the language's semantics. You can save the
syntax details for when you have the language design worked out. Compiler
books based on C (such as the dragon book) spend such a long time on parsing
upfront because they don't have much choice.

Also, Andrew Appel works on SML/NJ, and the C and Java books are translated
from the ML version. I've looked at the C version (which uses lex and yacc
instead of ml-lex and ml-yacc, mallocs but doesn't free until later in the
book, etc.), and while it's still a good compiler overview book, I'd strongly
suggest getting the ML version instead. The ML version uses Standard ML (SML),
but I've only used OCaml and didn't have trouble following it and making the
minor adaptations along the way. (I haven't looked at the Java version.
Whatever.)

------
mapleoin
I'm upvoting this just to hopefully start a discussion and hear more about
this topic from fellow HNers.

I hear the Dragon Book mentioned almost daily around here; I'm intrigued. Do
you think this book could be read by someone who's not actually interested in
writing a compiler? Does the book stand by itself? I'm interested in reading
it, but I don't really think I have the motivation to start writing my own
compiler (even a mock one).

~~~
pbiggar
DO NOT READ THE DRAGON BOOK.

It's not a very good introduction to compilers. Read Engineering a Compiler by
Cooper and Torczon instead. The Appel book is also very good, and covers
functional and logic languages, which most compiler texts skip. If you're an
enthusiast but not in it to build a compiler, I really enjoy Programming
Language Pragmatics. For more advanced material, use the Muchnick book or the
Compiler Design Handbook (the two editions have different material). Those
also provide excellent pointers to the literature, but aren't great for
beginners.

I get the impression that most people who recommend the Dragon book haven't
read it. When I was doing my PhD I had five or six books on my shelf, and was
constantly unimpressed with the Dragon Book, but always impressed with
Cooper/Torczon, Muchnick and Appel.

~~~
ahn
>I get the impression that most people who recommend the Dragon book haven't
read it.

These days you can usually tell that they haven't read it if they refer to it
as "the Dragon book" and don't say anything about the edition. The same thing
goes for "Knuth", "the Appel book", etc.

------
okmjuhb
The interesting thing about compiler writing is the drastic reduction in
complexity that's happened. There are essentially off-the-shelf tools that can
be used to handle everything before semantic analysis and everything after
conversion to a reasonably high-level IR. This has dramatically reduced the
cost of language experimentation; it's now feasible to bang out a new language
in a day or two (assuming you're familiar with yacc, LLVM, etc.), and generate
efficient code on a number of target architectures.

These tools mean that "learning to write a compiler" can take several
different paths; on the one hand, we could learn about how each of these tools
work (e.g. "how to write a parser generator", "machine-independent
optimizations", "code generation", etc.). This is the approach taken by the
dragon book. This gives the background theory and mathematics behind compiler-
writing (and will likely be something people need to know if they want to
write for instance an LLVM optimization pass). The problem with this is that,
because of the trends mentioned above, it has little to do with the day-to-day
of actual compiler writing and language experimentation (this is only somewhat
true, since most "real" compilers will have a custom-written frontend and at
least a little bit of knowledge about the target architecture).

The other approach is to use pre-written tools, to focus on language design
decisions and applications. To be honest I'm not familiar with any resources
that do this particularly well. This is the approach taken by most of the
"we're going to build a compiler!" websites mentioned on the Stack Overflow
post. This approach is probably more useful for most of the people who want to
"write a compiler", but it leaves people with a very shallow knowledge of
what's happening beneath the surface of the APIs they use.

I don't mean to imply that one of these approaches is "better" than the other
(and indeed, the second requires a little bit of the first - it's difficult to
debug an automatically generated parser without knowing at least a little bit
of the theory), but it's important to know which approach you're aiming for,
and to optimize based on that goal. Going to the Dragon book as a "how-to" is
a disaster waiting to happen.

~~~
pjscott
I think that combining this short compiler tutorial with LLVM would be a very
effective way for someone to get comfortable with basic compiler writing:

<http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf>

Afterward, learning how to do lexing and parsing would be a good addition. The
tutorial covers a subset of Scheme, which is trivially simple to parse,
especially if you're writing the compiler in Scheme as well.
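"Trivially simple to parse" is nearly literal for s-expressions: a lexer is a string split and a reader is one short recursive function. A minimal sketch in Python (illustrative only, not code from the tutorial):

```python
# A minimal s-expression reader, showing how little parsing a Scheme
# subset actually needs.
def tokenize(src):
    # Pad parens with spaces so str.split does all the lexing.
    return src.replace("(", " ( ").replace(")", " ) ").split()

def read(tokens):
    """Read one expression from the front of the token list."""
    token = tokens.pop(0)
    if token == "(":
        form = []
        while tokens[0] != ")":
            form.append(read(tokens))
        tokens.pop(0)  # drop the closing ")"
        return form
    try:
        return int(token)   # numeric literal
    except ValueError:
        return token        # anything else is a symbol

def parse(src):
    return read(tokenize(src))

print(parse("(+ 1 (* 2 3))"))  # ['+', 1, ['*', 2, 3]]
```

With parsing out of the way in a dozen lines, all the remaining effort goes into the interesting part: code generation.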

------
wingo
The top comment has a remarkably good list of references. I like the nanopass
and incremental papers a lot, RTT is a classic, and there were several good-
looking references I had never seen. Almost makes me feel some jealousy at the
pleasant vertigo the original poster must feel :)

