Hacker News new | past | comments | ask | show | jobs | submit login
Let's Build a Compiler (iecc.com)
247 points by _virtu on Oct 30, 2013 | hide | past | web | favorite | 56 comments



If you're looking for a modern compilers class -- including the theory of why this stuff works -- I highly recommend Matt Might's [0]. All of the notes, slides, and code are online.

I audited Might's "Compilers" this spring. He live-coded a parser that parsed with derivatives, returning all possible parse trees whenever there were ambiguities in the grammar. [1] (Try getting that from yacc, or basically any other tool in existence right now.)

All of his coding was done in Racket scheme. At the beginning he told us we could use whatever for the projects, but doing certain things in C++ / Java / Python / some imperative language was "like bringing a knife to a gun-fight."

The final project was a working Python -> C translator.

Really badass class.

[0] http://matt.might.net/teaching/compilers/spring-2013/

[1] http://matt.might.net/articles/parsing-with-derivatives/


> returning all possible parse trees whenever there were ambiguities in the grammar. [1] (Try getting that from yacc, or basically any other tool in existence right now.)

Any GLR-based tool can do that trivially, including recent versions of Bison (derived from yacc): http://www.gnu.org/software/bison/manual/html_node/GLR-Parse...

Ultimately I don't think this is the best way to develop parsers, particularly when you are designing the language itself (as opposed to writing a grammar for an existing language), because it gives you no hint that ambiguity exists until you actually encounter an ambiguous string (since the question of whether a grammar is ambiguous is undecidable).

I wrote more about this on my blog: http://blog.reverberate.org/2013/09/ll-and-lr-in-context-why...


Again, this parsing is trivial if you use parser combinators of (almost) any kind.

http://en.wikipedia.org/wiki/Parser_combinator

http://www.cs.nott.ac.uk/~gmh/monparsing.pdf

Actually, when using such approach you have to fight the power of resulting parser. You have to restrict it or your parser will retain data for too long.


CMU's got a fairly nice one. It guides students as they build a series of compilers for increasingly large subsets of C, effectively from scratch. Course materials are public, I believe.

http://www.cs.cmu.edu/afs/cs/project/fox/mosaic/people/fp/co...


> a working Python -> C translator

Including generator functions? That would be impressive. Well, one could solve it by putting all locals into an heap allocated object and use the "switch over the whole function body"-trick to continue execution at the correct position.


only a small portion of a full compiler involves translators, though.


I went through this on my own back in a HS programming class! Glad to see it here.

While the teacher was walking students through how to do loops, I got permission to hack away in the back of the room on this. I ended up building a BASIC-like interpreter with a decent graphics API. By the end of the class, my project was a multi-level breakout game I'd written in the interpreter I'd written. TLDR; two years later, I talked to a girl.


You're lucky, I was doing the same things in middle school. Took me until college for that last part... :)

That same game project I started then still has been going on since then, multiple compilers and VMs written to do parts of the game. It's ended up an incredibly complex monstrosity with procedurally generated worlds and randomly altered scripts (some actions just won't be available on some play-throughs).


Is the game (or its code) online somewhere? :P


Not yet, there's been many incarnation, but none of them ever reached a playable state. I've been on a kick to restart it lately after starting to learn prolog. I've been playing with some pre-made engines to find out if any of them can accelerate the whole thing a bit. Checking out Unity right now.


Hah, well done on both counts!


How long did it take you to get through it?


Well, it was about 20 years ago and I was a self taught programmer pre-Internet days. I actually downloaded it from a BBS and had no outside references to work with.

I spent probably 30 minutes 2 or 3 times a week for a couple of months. Most of that was probably adding features to my interpreter and coding the game itself. I recall it being extremely clear to me, even without any sort of formal CS. Even things like working with the stack and recursion were clear to me at the time.


congrats. This has been on my list to read and experiment with. I was hoping my 15-year old could get into this as well.


No pressure though ;)


This series is one of the best introductions to compiler construction. It doesn't cover everything and it's 25 years old now, but it is the only guide I know of that will hold your hand as you build a working compiler from scratch.

If you have never built a compiler before, I cannot think of a better place to start.

Afterward, if you're curious about theory and advanced topics, I recommend heading to Compilers: Principles, Techniques, and Tools by Aho, Sethi, and Ullman (which covers a lot of theory associated with front-ends) then proceeding to Modern Compiler Construction in ML by Appel (which covers some more advanced topics and back-end stuff). Then you can continue reading about more specific/advanced topics if you like.


There is a class about compilers on coursera [1] which should be pretty good and more up to date.

[1] https://www.coursera.org/course/compilers


Seconded. My experience:

Taking this course was an great way to learn more about compilers and fill a hole in my CS curriculum. Professor Alex Aiken is a great instructor and covers a good amount of material. I learned a lot about compiler construction despite having toyed with my own compiler before starting the course. The programming assignments were particularly tough, giving me useful experience in building compilers and a great sense of achievement.

(TL;DR from my full blog post: http://dirkjan.ochtman.nl/writing/2012/07/21/compilers-on-co...)


The class is good, but very time-consuming and spends a lot of time on theory. Expect to spend at least 10 hours a week between lectures, quizzes, and the project.


This is more specific to C, but can still be applied to other areas. I always thought it was a great read. A Retargetable C Compiler: Design and Implementation

https://sites.google.com/site/lccretargetablecompiler/


A port of it to Python would probably be a better place to start. Not many people are conversant in Pascal these days.


It's easy enough to follow along. The snippets are simple enough that anyone who knows a procedural language won't have trouble understanding what they do, and from there it's pretty trivial to write the equivalent code in another language. You can get pretty close by writing C with a few helper functions.


It gives me an excuse to install Free Pascal and give it a whirl though.


That's the spirit!


I've seen Engineering a Compiler get recommended as well, have you read that one?


I have that one on my shelf. It's decent, but it doesn't cover anything that the other two don't. Compared to Engineering a Compiler, the Aho, Sethi, and Ullman book goes into more depth on the theory, and the Appel book has a better breadth of topics. It's not a bad book by any means, but I'd recommend that if you only have the cash for one book, go for the Appel book.


I just looked at the chapters, and it's kind of funny that it gets these accolades despite being written as a 80-char-width text-only file.

Edit: I don't say that to disparage it; I actually think that's an impressive accomplishment.


He's lucky he had an 80-column card! ;-)

http://en.wikipedia.org/wiki/Apple_80-Column_Text_Card


I followed this in the early 90s and had a lovely time. It helped that Turbo Pascal was my language of choice though and might not be quite so helpful now although Pascal is a pretty good pseudocode..

It's not the same but Vidar Hokstad has been writing a series for several years now in Ruby: http://www.hokstad.com/compiler/ .. and other resources aplenty: http://stackoverflow.com/a/1672/3951


Thanks for the mention... I'll be pushing out the next part in a couple of days. And I'm just wrapping up a part on register allocation.

The really great part with the Crenshaw tutorial, is that it's so cohesive and concise. It's reminiscent of Wirths compiler construction texts, but much simpler to follow.


Me to! What a blast to the past. I might look through it again.


> Pascal is a pretty good pseudocode..

Pascal is pretty much directly derived from early dialects of Algol, which is where most modern programming language syntaxes derive; it's the original block-structured syntax, as opposed to line-structured assembly and early FORTRAN and the fully-bracketed Lisp syntax (which is also block-structured if you indent it sanely).


And the reason Pascal is pretty much directly derived from Algol is that Wirth was on the Algol committee, and kept trying to get it simplified and tightened up. He then eventually went on to do Euler and Pascal (and of course later Modula and Oberon) to implement the ideas he'd had for Algol.

This is quite interesting: Pascal and its Successors - Niklaus Wirth http://www.swissdelphicenter.ch/en/niklauswirth.php


This is definitely the classic online text. However, I'm surprised no one mentioned An Incremental Approach to Compiler Construction (http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf). It is the basis of the Ikarus Scheme compiler.

Lisp in Small Pieces is also a useful book, for those interested in Lisp/Scheme. It covers much of the same stuff as in the PDF I mentioned.


I was about to mention An Incremental Approach to Compiler Construction, but you've just saved me the trouble of digging up the link. It's a mere 11 pages, but it covers a lot of ground, in small, easy-to-digest pieces, and invites you to follow along and get your hands dirty. Highly recommended.


If you like Pascal, then Compiler Construction, by Niklaus Wirth is certainly also worth a look:

http://www.ethoberon.ethz.ch/WirthPubl/CBEAll.pdf

Easy to read, concise, and good for beginners.


Email me if you'd like an HTML version of this. I have converted most of it for my own personal use (through part 11, IIRC) but don't want to distribute it publicly since I'm unclear on the copyright status.


I suggest you contact the IECC. They should be able to provide guidance on what you can share and how. It's old enough that they might allow separate hosting provided you link back to their site.


Thanks, will do.


Isn't contacting the author the natural thing to do? Some googling found a user profile for him: http://www.embedded.com/user/JackCrens


He's cool with it. Thanks for digging that up.


Let's find out.


Isn't this kindof "publicly"?


Yes, let's turn this into a semantic debate. Much more fun and productive.


It's not a purely semantic question - "in what way is announcing that you're distributing better than visibly distributing?" - I can think of a couple.


Personally this is my favorite work on compiler theory. While dragon book is definitely more comprehensive, Let's Build a Compiler is far more approachable (so approachable that I worked through it when I was only 14).

I haven't gone to the bare metal level since (as well as using parser generators), but it's a great piece of work that gives you a slight understanding of what YACC+family do under the covers (even though they are different types of parsers). I continually recommend it as a starting point for anyone who wants to learn how to write parsers.



I remember reading this when he first published it (as well as everything he ever wrote for "Embedded Systems Programming"). Jack Crenshaw is my favorite Rocket (Computer) Scientist!


Reading the text files, using vim, which were written in the 80s and talk about Borland and other old stuff, makes me... a bit sentimental.


Talking about sentimental, This made me giggle, "there will probably never come a time when a good assembler-language programmer can't out-program a compiler."

Reading that reminded me why I'll never make predictions on computing, especially on what can't be done.


A good assembler-language programmer these days starts with the output of a compiler. And then proceeds to beat the pants off said compiler by optimizing specific portions.

Not least because no language includes sufficient semantic information for the compiler to be able to safely optimize all the parts that the programmer can.

It's still mostly true - just not worth the effort for anything but smaller fragments.


Yeah, I was going to say something similar. You'll never get a compiler to be as intelligent as a human being. At best you'll get human intelligence rapidly applied by a computer to a problem. The computer will be able to do this "manual labor" (if you want to call it that) faster than a human can. However, a human being will be able to find ways to optimize for the problem that a computer will not be able to.

I realize that many people believe that computers will some day be able to truly think on a human level. I just don't happen to be one of those people.


Would there happen to be a PDF or eBook format of this?


PDF: http://compilers.iecc.com/crenshaw/tutorfinal.pdf

At the bottom of the page.


Thanks!


I read this in high school. At some point a light bulb went off and I finally got recursive descent parsers. Great series.




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: