
I never took a compilers course in college and came to regret that. Ten years later I started reading the Dragon Book, and realized how I could trivially modify the NFA regular expression search algorithm to solve a problem I’d seen in digital forensics and eDiscovery. The writing is extremely clear and easy to follow, and I now tell all interns to be sure to take a compilers course before they graduate. Well-earned!



It was just a few days ago that I mentioned how often I see the dragon book held up as a punching bag. It seems that at some point the book, along with K&R, became a target for unsubstantiated criticism. I've pressed for details, and I've seen others do the same, but it never ends in anything concrete—just an implicit appeal to the new conventional wisdom that they do more harm than good. Reminds me of how "truth" gets synthesized on Reddit.

Here's a counterexample: Russ Cox has a series on regular expressions, beginning with Regular Expression Matching Can Be Simple And Fast <https://swtch.com/~rsc/regexp/regexp1.html>. Over the last several years, it has become very well received. In it, he lays out the case for Thompson NFAs, and laments in places that the approach is not more widely known, having become a sort of lost knowledge. (See also Jonathan Blow's <https://www.youtube.com/watch?v=ZSRHeXYDLko>.) For all the criticism the dragon book gets, though, you'd think enough people had read it that someone would have pointed out that Thompson is cited in Chapter 3, where his approach is described in some detail.
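The core of the technique is small enough to sketch: instead of backtracking, you track the set of NFA states still alive after each input character, so matching is linear in the input. A toy Python sketch (the transition table is hand-built for a(b|c)*d and epsilon transitions are omitted; none of this is from the book or the article):

    # NFA for a(b|c)*d, built by hand for illustration.
    # Transitions map (state, symbol) -> set of next states.
    nfa = {
        (0, "a"): {1},
        (1, "b"): {1},
        (1, "c"): {1},
        (1, "d"): {2},
    }
    start, accept = 0, {2}

    def matches(s):
        states = {start}  # set of live states; no call stack, no backtracking
        for ch in s:
            states = set().union(*(nfa.get((q, ch), set()) for q in states))
            if not states:
                return False  # no state survived; fail early
        return bool(states & accept)  # accept if any live state is accepting

    print(matches("abcbd"))  # True
    print(matches("abx"))    # False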

I think that what's actually going on is that, like The Art Of Computer Programming, the dragon book is widely referenced and rarely read. Now, enter the 21st century meme factory.

Having said that, the most recent comment I saw that was casually dismissive of the dragon book—ironically for its "second-rate methods" that have "held back" compiler development—did motivate me to dig the book out, which led to me spending an hour or so chasing down the source of McCarthy's "ho, ho, you're confusing theory with practice" quip, since the origin story of Lisp's eval is also told in Chapter 11 of the dragon book. <https://hypothes.is/a/pQs39pAhEeupnAuO5qfOeQ>


As others noted, the Dragon Book spends far too much time on parsing, and not nearly enough on optimizations. To be fair, the first edition of the book is from 1986. It looks like the 2nd edition (2006) adds three chapters on data flow analysis and parallelization. I don't happen to have that version of the book, so I don't know how good the material is.

I took compilers in grad school in the early 90's. Even then, we mostly skipped parsing because that was considered covered in another class (Theory of Computation, a class that covered regular expressions, finite automata, Turing machines, P vs NP, those sorts of topics).

The professor spent as much time as he could on optimizations, since that was his research focus. He then went on to write his own compiler book (Engineering a Compiler; Cooper & Torczon), so you can compare the table of contents (https://www.elsevier.com/books/engineering-a-compiler/cooper...) and see what current compiler researchers feel are important topics to cover in a textbook.

Not throwing the Dragon Book under the bus, but it probably is in the "cited widely but rarely read" category as you have noted, just from its position as being first/early.

An anecdote from me - I had occasion to write a parser for a work project a few years ago. Rather than reach for flex/bison or any theory I had from compiler courses... I went with the new-to-me method of a parser combinator. This was awesome, and I might even describe it as fun. Granted, the target language I was parsing was much simpler than a programming language, but I hope that is covered in today's compilers courses. I remember really tedious parsing homework problems back in the day.
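For anyone who hasn't seen the technique: a parser combinator is just a function from input to a (value, rest of input) pair, plus higher-order functions that glue small parsers into bigger ones. A toy Python sketch, with names of my own invention rather than from any particular library:

    # A parser is a function: str -> (value, remaining) on success, or None.
    def char(c):
        def p(s):
            return (c, s[1:]) if s.startswith(c) else None
        return p

    def many(p):  # zero or more repetitions of p
        def q(s):
            vals = []
            while (r := p(s)) is not None:
                v, s = r
                vals.append(v)
            return (vals, s)
        return q

    def seq(*ps):  # all of ps, in order
        def q(s):
            vals = []
            for p in ps:
                r = p(s)
                if r is None:
                    return None
                v, s = r
                vals.append(v)
            return (vals, s)
        return q

    # "a", then zero or more "b"s, then "c"
    parser = seq(char("a"), many(char("b")), char("c"))
    print(parser("abbbc"))  # (['a', ['b', 'b', 'b'], 'c'], '')
    print(parser("abd"))    # None

The grammar lives directly in ordinary code, which is most of why it feels fun.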


> As others noted, the Dragon Book spends far too much time on parsing, and not nearly enough on optimizations.

If you actually want to specialize in writing compilers, that may be true.

But most people who study compilers don't write compilers. Instead, the time they spend studying compilers will help them to better understand programming language design, which will make them more effective at programming anything.

And for that, it is far better to spend lots of time on parsing than on esoteric compiler optimizations.

> (Engineering a Compiler; Cooper & Torczon)

I bought that book, but didn't get much out of it because it concentrates on compiler optimizations and seems to be truly targeted for an audience of compiler specialists. Instead, I chose Programming Language Pragmatics by Michael L. Scott, and went through nearly the entire thing with a study group over about 10 months.


> esoteric compiler optimizations

The D programming language design is based in part on my experience with designing and building compiler optimizers, including the first data flow analysis C compiler for DOS.

It's why D has, for example, immutable types rather than just const types. (You can't optimize C const pointers because another mutable reference to the same value can change it.) D's contracts are also designed with an eye towards enabling optimizations.


> Instead, the time they spend studying compilers will help them to better understand programming language design, which will make them more effective at programming anything.

What makes you think that? Especially the second part, but really both.


I can confirm that the second edition gives a quite-good treatment of data flow analysis. Not as thorough as you'd find in a proper program analysis book (where Nielson is the usual choice, though the recent Rival/Yi book on abstract interpretation is fantastic IMO) but it's well-written and at around the right level of detail for a compilers course.

Source: PL PhD candidate, used (parts of) it to teach a graduate compilers course.


> To be fair, the first edition of the book is from 1986.

The first edition of Compilers: Principles, Techniques, and Tools is from 1986. The first edition of the dragon book wasn't Compilers, it was Principles of Compiler Design (1977). The 2nd edition of Compilers is the 3rd dragon book.


Those would be the green dragon book (1977), the red dragon book (1986) and the purple dragon book (2006). Maybe it's time for a blue dragon book?


My impression (it's been a very long time since my compilers class) was that it didn't spend too much time on parsing; rather, it spent too much time on how to implement parser generators. I've spent a lot of time parsing things in my life, but much less on the details of LR parsers.


I was doing compiler design back in the late 70s/early 80s, and writing your own compiler generator was quite common at the time - that changed when things like bison/yacc became available, which not only did an excellent job but also allowed one to hack on them and/or the result (for example, my Algol68 compiler needed to be able to decide how to resolve some shift/reduce conflicts on the fly ... that's how you do changeable operator precedence).


Couldn't it be said that the first edition of the Dragon Book is an indication of how far ahead of their time Aho and Ullman were at that point?


Well, it was basically the only book out at that time. I remember a few others in the 1990s (Compiler Design in C, Holub; and later, after I graduated, Modern Compiler Implementation in C, Appel).

Hey something cool I found while googling - Holub's book is available as a free download from the author! (https://holub.com/compiler/)

The Dragon Book has flaws but it is also the case where the pioneer catches all the arrows. But the OP asked about unsubstantiated criticisms of the book so I added in the one I remember from my courses - mostly too much on parsing, not enough on everything else. The 2nd compiler course I took didn't even use a textbook, Dragon book or otherwise; it was a handful of papers and a semester-long project on SSA and associated documentation.


Ages ago, I took a professional development course in C from Allen Holub in Berkeley that I think was the base material for that book. I can say that a roomful of talented people worked really, really hard to keep up with the pace in that course. Like some other classes at UC Berkeley proper, surviving the course says something about you! It was all in portable C, and rigorous.


I took a course in compilers from Hennessy and Ullman at Stanford in the 80's. The course material was a series of papers written by them on various techniques, including a lot of data flow stuff.


> To be fair, the first edition of the book is from 1986. It looks like the 2nd edition (2006) adds 3 chapters on data flow analysis, parallelization.

So this criticism seems to say "well, it was written too early."


Or possibly, "needs to update more often than every 20 years".


Well, the two editions were only 10 years apart.


> the first edition of the book is from 1986

My copy says 1979 3rd printing. First edition was 1977. It had a different title then, "Principles of Compiler Design".


> Dragon Book spends far too much time on parsing

I've thought this before, but I think it's a shame that parsing is lumped in with compiler courses. Parsing is a huge subject in its own right, and I'd say the Dragon Book only lightly touches on it (because it's a compiler book, after all).

CS course designers of the future - separate compiler courses into two subjects: 1. Compilers, and 2. Parsing!


> I've pressed for details, and I've seen others do the same, but it never ends in anything concrete—just an implicit appeal to the new conventional wisdom that they do more harm than good.

Really? Merits of the criticism aside (I don't know anything about compilers, so I'm not capable of judging), the received wisdom seems to be that the Dragon book spends far too much time on parsing and not enough time on other, more important aspects of modern compiler development.

See, for instance, this blog post by a professor about teaching a compilers course: http://danghica.blogspot.com/2020/04/teaching-compilers.html.

Or this blog post (and the one it links in the first paragraph): https://semantic-domain.blogspot.com/2020/02/thought-experim....


Exactly. I don't have the book in front of me, but I remember its typechecking section being hilariously short and imperative-focused. Also, in general the book doesn't focus on implementation at all. That's okay in some respects; it's probably stayed more relevant and useful by not talking about implementation. But it makes it possible to read much of the book and still have no clue how to write code to build a compiler.


I took a compilers class around the dragon book. I think the valid criticism is the sequence of the book. An undergraduate compiler class can only cover perhaps half the book.

IMHO, the number of students that would benefit from learning about AST transformation and code gen vastly outnumbers the folks that would benefit from learning about finite automata transformation (and I took a dedicated course in that).
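To make "AST transformation" concrete, about the smallest useful example I can think of is a constant-folding pass, here over Python's own ASTs via the stdlib ast module (folding only + and * to keep it short; an illustration, not how any production compiler does it):

    import ast

    class ConstantFolder(ast.NodeTransformer):
        """Fold binary ops whose operands are literal constants."""
        def visit_BinOp(self, node):
            self.generic_visit(node)  # fold children first, bottom-up
            if isinstance(node.left, ast.Constant) and isinstance(node.right, ast.Constant):
                if isinstance(node.op, ast.Add):
                    return ast.copy_location(
                        ast.Constant(node.left.value + node.right.value), node)
                if isinstance(node.op, ast.Mult):
                    return ast.copy_location(
                        ast.Constant(node.left.value * node.right.value), node)
            return node

    tree = ast.parse("x = 2 * 3 + 4")
    tree = ast.fix_missing_locations(ConstantFolder().visit(tree))
    print(ast.unparse(tree))  # x = 10

That one pattern (walk the tree, rewrite nodes bottom-up) carries over to a huge amount of everyday programming work.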

There simply aren’t that many folks working on regex engines.


I second this based on my experience. My master's thesis was on code generation for Haskell, wherein I spent a fair bit of time learning tree pattern matching, AST manipulation, register allocation, etc. One can achieve a non-trivial speedup by understanding these different phases of code generation.

Another reason, in the context of compilers, is that there's not much to be gained by micro-optimizing lexing and parsing. After all, code is compiled just once and executed many times. So you know where much of the fun lies. Undergrads unfortunately usually miss out on the meatier parts of the compiler course.


It used to be the case that lexing was the hottest part of a compiler, because it had to touch every byte of source and subsequent phases did not have so much data to crunch. But that stopped being true in the 1990s as optimizing compilers became mainstream technology.


> There simply aren’t that many folks working on regex engines.

You'd be surprised: there's even a dedicated annual international conference on implementation of automata that has been going since 1996: <https://link.springer.com/conference/wia>

There are constantly new algorithms, libraries, tools and applications around regular languages and regular relations (an extension of regular expressions that is more elegant, symmetric, and expressive, and that takes a lot of the ugliness out of regexes). Simplifying somewhat, finite state transducers are NFAs "with output".

See e.g. FOMA at <https://fomafst.github.io/>.
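To make the "NFAs with output" idea concrete, here is a toy transducer in Python (deterministic for simplicity, and entirely made up for illustration) that collapses runs of 'a' into a single 'a':

    # Transitions map (state, input symbol) -> (next state, output string).
    T = {
        ("start", "a"): ("in_a", "a"),
        ("start", "b"): ("start", "b"),
        ("in_a", "a"): ("in_a", ""),   # extra a's emit nothing
        ("in_a", "b"): ("start", "b"),
    }

    def transduce(s):
        state, out = "start", []
        for ch in s:
            state, emitted = T[(state, ch)]
            out.append(emitted)
        return "".join(out)

    print(transduce("aaabaab"))  # "abab"

Real systems like FOMA work over much richer alphabets and compose such machines, but the machinery is recognizably this.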


> An undergraduate compiler class can only cover perhaps half the book.

When I took a compilers course that used this book, the professor had his own syllabus and would use the dragon book in support of his plan. We jumped around quite a bit and certainly didn't use all of the book.

That course was one of my favorites and had a huge impact on me. I haven't written any compilers, but have used a lot of the concepts (especially state machines) over the years.


> There simply aren’t that many folks working on regex engines.

Yes, that's true, but there are A LOT of LeetCode/HackerRank problems that are trivially solvable when one has a solid grasp of DFAs and state machines in general!
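As a made-up example of the kind of problem that yields to DFA thinking: "is this binary string divisible by 3" is just a three-state machine whose state is the remainder so far.

    # DFA with states {0, 1, 2} = value-so-far mod 3.
    # Reading bit b moves remainder r to (2*r + b) % 3; accept iff we end in 0.
    def divisible_by_3(bits):
        state = 0
        for b in bits:
            state = (2 * state + int(b)) % 3
        return state == 0

    print(divisible_by_3("110"))   # 6  -> True
    print(divisible_by_3("1010"))  # 10 -> False

No arithmetic on the whole number, no backtracking; the state space does all the work.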


They are good books. (I started the first one when writing a compiler).

The valid criticisms of them relate to improvements in parsing and in optimization, which have progressed significantly in the years since they were written.

Fun fact: on the compiler we wrote, Jeanne Musinski, PhD, a student of Ullman's, was on our team. She wrote the parser, and actually discovered an improvement in error recovery during our project, publishing a paper after conferring with Ullman.


I also see this sort of commentary regarding lex and yacc, the programs used in the dragon book. I always muse that the commentators probably have no idea how much software they currently use that, unbeknownst to them, has lex/flex and/or yacc/bison dependencies.


>I think that what's actually going on is that, like The Art Of Computer Programming, the dragon book is widely referenced and rarely read. Now, enter the 21st century meme factory.

In my opinion this book is "hard to read" in the sense that it's dense with mathematical formality.

There are a lot of good concepts, but I bet you could write it many times shorter.


Although I really enjoyed compilers in college, I never thought I would build one for practical reasons.

I ended up using the knowledge for two very interesting projects. One was a template engine in Java for turning variables into clauses in SQL queries. The other was an interpreter for PL/SQL that helps to extract procedures and functions in order to do partial updates of packages in production (the user chooses which procedures to send to production and doesn't have to update the whole package).


My first job had me write a compiler from biology protocol descriptions to robot commands. It wasn't Turing complete, but it did need to allocate and assign "registers", which were the physical spaces available for the 96-well sample plates.


Agreed, of all the CS classes I took in college, the compilers class has been the most useful in my career. Still have the copy of the "dragon book" from that class.


Weird, I remember the book as being too dense and cryptic, without sufficient explanatory text, for a compilers course when I originally read it... maybe I am misremembering. I will have to find a copy and take another look.



