Tell HN: The dragon compiler book (2nd edition) is a great book
49 points by jobhdez on June 1, 2023 | 46 comments
Hello,

I have seen a lot of people on the internet say that the dragon book is a horrible book to learn compilers from.

Well, I have read some of the second edition of the dragon book and I think it is a great book. The main criticism is that it's too focused on parsing, but that's just about 5 chapters. I skipped these and went straight into the intermediate language, code generation, and optimization chapters.

These chapters give a detailed account of things such as register allocation, instruction selection, and instruction scheduling. The optimizing chapters are also good.

I'm not claiming I'm an expert compiler person. I am not, but compilers are something I want to learn, so I gave the dragon book a try and I really liked it.

More importantly, the dragon book made me realize that compilers are mathematical systems in a way. The AST is a tree, and trees have mathematical properties; graphs are needed for register allocation, and dataflow analysis for optimization. Optimization is a math problem.

So I think this book gave me a better understanding of the field and made me appreciate compilers.

I'd say if you're interested in compilers, you can start with "Essentials of Compilation" by Dr. Siek, get some experience writing compilers, and then read the dragon book.

I just wanted to share my experience. I think the dragon book is a great book.




Besides devoting too many pages to parsing, I think compiler practitioners have a low opinion of the book because it isn't useful for them. For example, the second edition claims to be updated with modern optimization techniques, and in a sense that's true, but it is useless because the book isn't using SSA. For practitioners, SSA is not optional these days, and everything about optimization in the book needs to be updated to SSA to be useful.

But you are probably not a compiler practitioner. Then I agree it is a good book to learn about compilers.


But the dragon book introduces three address code. SSA is a variant of three address code. Take a look here http://www.cs.columbia.edu/~aho/cs4115/Lectures/15-03-25.htm....


3AC/TAC is something fundamentally different from SSA. Sure, both superficially constrain how you write variable assignment statements - if your intermediate representation contains such things and is a list of instructions. 3AC means that your "instructions" have <=3 operands; SSA means each "syntactic" variable is assigned at most once, that is: it is not really a variable anymore. In fact, SSA transformation makes an imperative program referentially transparent (excluding any loads/stores from/to memory) and thus turns it into a pure functional program (again excluding memory). You can see a program in SSA form simply as a functional program where each basic block is a (nested) function and each PHI is a parameter of that function.

SSA as a list of instructions/statements as in 3AC is a red herring and has no relation to why it's used in compilers/for static analysis. SSA is not a variant of 3AC, also because SSA transformation of 3AC requires the same algorithms as SSA transformation of any other common imperative IR.

In fact, SSA makes no constraints on the number of operands in an instruction/operation whatsoever.
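
To make the "functional program" view concrete, here is a toy sketch in Python (a made-up example, not any particular IR): each basic block becomes a nested function, and the value that would be a PHI at the join becomes a parameter of the join block's function.

```
def f(a, b):
    def entry():
        return then_() if a < b else else_()

    def then_():
        x1 = a + 1
        return join(x1)     # this edge passes its value for the phi

    def else_():
        x2 = b * 2
        return join(x2)     # and this edge passes the other one

    def join(x3):           # x3 plays the role of x3 = phi(x1, x2)
        return x3 - 1

    return entry()

print(f(1, 5))   # takes the then_ branch: x1 = 2, result is 1
```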


Three address code:

```
p = a + b
q = p - c
p = q * d
```

SSA:

```
p1 = a + b
q1 = p1 - c
p2 = q1 * d
```

Given the above examples, how is it not straightforward to use SSA instead of three address code using what the dragon book teaches? What is wrong with the above examples?

In fact, section 6.2.4 of the dragon book says:

“Two distinctive aspects distinguish SSA from three-address code. The first is that all assignments in SSA are to variables with distinct names; hence the term static single-assignment.” The second is that SSA uses a φ-function to combine two definitions of the same variable.


That's exactly what I mean by superficially similar. Sure, you can transform a non-3AC SSA IR into three address code, or transform a 3AC IR into SSA form - as you have done in your example. However, you will need an SSA transformation/construction algorithm to do the latter, which is not that simple or straightforward, especially if you want to be somewhat optimal about placing the PHIs. You can also try to apply your existing analyses and program optimizations to the 3AC SSA - chances are high, though, that you'll have to adapt them to correctly handle PHIs. If you simply treat them as unanalyzed function calls, you will likely suffer a huge loss in analysis precision and optimization opportunities.

On the other hand, if you write your analyses and transformations with SSA form as a precondition, you can reduce complexity because SSA form is referentially transparent, giving you things like use-def chains, dead code analysis, etc. more or less for free.
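
As a toy illustration of the dead code point (a made-up mini IR in Python, mine, not from any book; a real pass also has to leave instructions with side effects alone): because every SSA value has exactly one definition, "no uses and not live out" directly means "dead".

```
from collections import Counter

# (destination, operation, arguments) triples, already in SSA form
instrs = [
    ("p1", "add", ("a", "b")),
    ("t1", "add", ("p1", "p1")),   # never used anywhere: dead
    ("q1", "sub", ("p1", "c")),
    ("p2", "mul", ("q1", "d")),
]
live_out = {"p2"}                  # suppose only p2 is needed afterwards

changed = True
while changed:                     # iterate: removing a def can kill its inputs
    uses = Counter(a for _, _, args in instrs for a in args)
    kept = [ins for ins in instrs if ins[0] in live_out or uses[ins[0]] > 0]
    changed = len(kept) != len(instrs)
    instrs = kept

print(instrs)   # t1 is gone; p1, q1, p2 stay because they feed p2
```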

This is what people mean when they say that SSA is essential for modern compiler construction. That quote from the book is reductionist to the maximum possible extent and puts SSA in direct comparison with 3AC, to which it only has the relation that both are transformations of IR instructions; the results of the two have nothing in common.

The example is misleading because it displays the SSA IR also in 3AC - which is unnecessary - and it completely omits PHIs, without which the SSA properties simply do not hold and for which no 3AC correspondence exists at all - they are not representable. In fact, control flow can easily be built such that a PHI has any number of operands - e.g. a switch, which might need significant additional code transformation to be representable as 3AC.


My apologies for saying the following, but can you share some books or papers that support your view? The example I provided was from the dragon book, and CMU uses that same example in one of its SSA slides for a compiler optimization course. This makes me think that I can indeed lower an AST to SSA that looks like the example I gave and still exploit the SSA for optimization.

Perhaps you work on LLVM or something professionally?

But yeah, if you can, please provide support for your view. Thanks.


I've worked on a research project using LLVM as a backend but with its own SSA implementation, at the chair of Sebastian Hack. One of his well-known scientific contributions was on optimal register allocation with SSA. The chair also published http://dx.doi.org/10.1007/978-3-642-37051-9_6, a very good and easy way to do SSA construction directly from the AST without computing dominance frontiers.
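
For a flavour of it, here is a drastically simplified sketch of the general idea in Python (my own toy code, not the paper's): look up definitions lazily per block and place phis only at join points. The real algorithm also handles incomplete CFGs ("sealing" blocks during construction) and removes trivial phis.

```
class Block:
    def __init__(self, name):
        self.name = name
        self.preds = []          # predecessor blocks in the CFG

class Phi:
    def __init__(self, block, var):
        self.block = block
        self.var = var
        self.operands = []       # one incoming SSA value per predecessor

current_def = {}                 # (variable name, block) -> SSA value

def write_variable(var, block, value):
    current_def[(var, block)] = value

def read_variable(var, block):
    if (var, block) in current_def:           # already known in this block
        return current_def[(var, block)]
    if len(block.preds) == 1:                 # no join: ask the single predecessor
        value = read_variable(var, block.preds[0])
    else:                                     # join point: place a phi; record it
        value = Phi(block, var)               # first to break cycles (loops),
        write_variable(var, block, value)     # then fill in its operands
        value.operands = [read_variable(var, p) for p in block.preds]
    write_variable(var, block, value)
    return value
```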

You should probably start with the Wikipedia page for SSA, which - since the dragon book doesn't teach what SSA actually is - is a reasonable starting point: https://en.m.wikipedia.org/wiki/Static_single-assignment_for... Note that even though it provides many examples and other comparisons, it does not once mention 3AC (and vice versa, actually). Under Benefits you can find links to various optimizations, some of whose pages describe what SSA helps with. None of them have much, if anything, to do with 3AC.

That CMU simply copies an example straight from the book, provides no further explanation, and maybe doesn't even mention PHIs (?) doesn't speak for that course at all - especially if you take a look at the list of compilers that use SSA on Wikipedia and note that pretty much all major programming language implementations use it.

Of course you can lower an AST to SSA that looks like the example (except the example is so rudimentary it's still missing the very necessary PHI functions), and you can even make it have 3AC form and still exploit the SSA properties in optimizations. It's just that 3AC and SSA form still have no real relation except that both modify your list of instructions, and claiming that there is a meaningful relation, or that 3AC gives you anything that SSA does, is at best grossly misleading for a textbook.


Well yes, but you can't use the dragon book's algorithm for generating three address code to generate SSA, because SSA has additional constraints. The dragon book really dropped the ball here; there is no excuse for not including any algorithm for SSA construction.


The dragon book is a theoretical book. It's like any other (good) OS book out there: a good source of information for learning the foundations. After reading it, go out and read something more practical (which you probably won't have trouble learning, since you now know the foundations).


Do you recommend a book that’s more up-to-date?


You can try this book if you want something that came out this year: https://github.com/IUCompilerCourse/Essentials-of-Compilatio.... Go to the releases to get either the Racket version or the Python version. But I mean, CMU uses the dragon book second edition for a graduate-level compiler optimization class.


The Tiger books include SSA, GC (tracing and RC), OOP, FP, ...

https://www.cs.princeton.edu/~appel/modern/


From the parsers I see in the wild (not strictly speaking about compilers here), I think most programmers should definitely spend some time studying the basics of them. It’s incredible to me how people mess up parsing even the simplest of file formats or make it super complicated.


I think people should know about parsing, but I don't think the material on parsing in most compiler books (LL parsing, shift-reduce conflicts, etc.) is particularly relevant.

I've had good experiences using parser combinator libraries. They are straightforward to use and to write, and expressive. Beyond that, it's seeing how to break down a parsing problem into a grammar that I think is the main skill.
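
As a sketch of the idea (hand-rolled Python, not any particular library): a parser is just a function taking the input and a position and returning either (value, new position) or None on failure, and combinators build bigger parsers out of smaller ones.

```
def char(c):
    def parse(s, i):
        return (c, i + 1) if i < len(s) and s[i] == c else None
    return parse

def seq(*parsers):                  # run all in order, collecting the results
    def parse(s, i):
        out = []
        for p in parsers:
            r = p(s, i)
            if r is None:
                return None
            v, i = r
            out.append(v)
        return out, i
    return parse

def alt(*parsers):                  # first parser that succeeds wins
    def parse(s, i):
        for p in parsers:
            r = p(s, i)
            if r is not None:
                return r
        return None
    return parse

def many(p):                        # zero or more repetitions
    def parse(s, i):
        out = []
        while (r := p(s, i)) is not None:
            v, i = r
            out.append(v)
        return out, i
    return parse

word = seq(many(alt(char('a'), char('x'))), char('b'))
print(word("axab", 0))              # ([['a', 'x', 'a'], 'b'], 4)
```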


I'm having flashbacks to a well-funded startup I used to work at. People seemed to think that how to handle commas in data while still being able to use CSV files was an area of active research. At, like, a dozen different layers in the application we would strip commas from any and all strings, but they would still frequently slip through and screw up the entire pipeline. The complaints I made about this quickly used up what little political capital I had. The joys of `str.split ','` based CSV "parsing".


You don't need to complain about this; just fix the code in all those places by calling a better CSV parser, which you can also write yourself.
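
Case in point: in Python, the standard library's csv module already handles quoted fields with embedded commas, so no home-grown comma stripping is needed (assuming the data quotes such fields properly, of course):

```
import csv, io

row = next(csv.reader(io.StringIO('name,"Doe, Jane",42\n')))
print(row)   # ['name', 'Doe, Jane', '42']
```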


You'll never catch me, batman! You'll have to pry recursive descent from my cold shrivelled cobol-fingers.


Hi, my name is Weston, and I'm bad at parsing (and it hurts the whole time I do it because I'm just barely clever enough to do it badly/dangerously).

What else should I study besides the first 5 chapters of the 2nd edition of the dragon book?


Well - do you want to be good at parsing / have any need for it? Even if you write your own programming language, you are much better off using something like ANTLR to generate a parser from your grammar - the advantage is a well-specified, standard description of your language and no parser bugs.

Though mind you, not everyone shares my sentiment here; last time I wrote that, people were adamant about hand-written parsers.


An easy to reach for activity that usually involves parsing is stuff like Advent of Code, where there is almost always a parsing step.


Sorry if I misinterpreted your comment, but the last sentence reads like a snark. Why did you have to go from “the basics” to “read half a compiler book”? Most developers don’t even seem to read a single Wikipedia article on it.


For those that find it a bit light, I enjoyed Advanced Compiler Design and Implementation by Steven S. Muchnick, which covers topics that I found much trickier.


I think the third edition of Engineering a Compiler was recently published (paperback only, WTH). I found that to be a better read IMO.


I have heard good things about this book.


I remember picking this book up at the university library in my second year because I had seen it in the movie “Hackers” and the cover looked really cool. I distinctly remember the parsing chapters, especially recursive descent parsing. It was one of those light bulb “oh, this is how you do it” moments. Well worth spending a few weekends playing with that book, even if it's somewhat dated; it's like SICP in a sense, more of a “mind expansion” device than anything else.


SICP was a great book too. It taught me to program, and bent my mind. I just marveled at the beauty of it a few times.


I know nothing about compiler design or compiler parsing. How broadly applicable are the chapters on parsing? Will it make the average programmer instantly more equipped to, say, write a robust JSON parser?


I’m not convinced that it would actually be particularly good as a parsing book either.

But the average programmer should just use an existing parser lib - why reinvent the wheel? For custom tasks, learning ANTLR (a good parser generator) is much more productive.


I think you would have the knowledge to build a parser generator that you can then use to write a parser for JSON. But you can also just learn how to use an existing parser generator. You'll have to know a little bit about regular expressions for this.


Yes.


Amen. This will have a permanent place on my bookshelf. I actually found that implementing some of the papers was very valuable (and joyful). Not only that, this stuff points you to a lot more research generated after the book. Even better, each paper you implement is IMO very self-contained, so you can build and use as you go.


What is a good book just for recursive descent parsing? I dabbled in some compiler books but found BNF difficult.

Just some clarification: it is not difficult to understand, but difficult to build from scratch. I'm thinking maybe a book that asks me to build up BNF expressions for a series of grammars of gradually increasing complexity would be nice.


Not a book, but a nice way to start with the task is to build a simple JSON parser using recursive descent.

This [1] is a great tutorial to start with. Personally, I followed up with a standards-compliant JSON parser of my own, in Python [2], and later ported it into a larger Go codebase. Practically building multiple JSON parsers taught me to build my own DSL using recursive descent parsing. My path admittedly is quite non-standard and idiosyncratic, but it was way more fun than going through a textbook :) (There's a tiny sketch after the links below, to give a flavour of the technique.)

[1]: https://www.booleanworld.com/building-recursive-descent-pars...

[2]: https://github.com/HexmosTech/json-rd
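
The promised sketch (mine, not code from the tutorial): a recursive descent parser for just nested lists of integers like "[1,[2,3],4]", with error handling mostly omitted.

```
def parse(text):
    value, pos = parse_value(text, 0)
    assert pos == len(text), "trailing garbage"
    return value

def parse_value(s, i):
    return parse_list(s, i) if s[i] == '[' else parse_int(s, i)

def parse_int(s, i):
    j = i
    while j < len(s) and s[j].isdigit():
        j += 1
    return int(s[i:j]), j

def parse_list(s, i):
    i += 1                      # consume '['
    items = []
    if s[i] != ']':
        while True:
            item, i = parse_value(s, i)
            items.append(item)
            if s[i] == ']':
                break
            assert s[i] == ','  # consume the separator
            i += 1
    return items, i + 1         # consume ']'

print(parse("[1,[2,3],4]"))     # [1, [2, 3], 4]
```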


Crafting Interpreters was fun to read and easy to follow, even as someone with a mechanical engineering degree :)


You might want to check out Parsing Techniques: A Practical Guide by Dick Grune et al.

I came across it on HN, but I have not read it myself.


Seconded.

So, a lot of people say a lot of things. I like chapters 1-5 just as much as the rest, btw. It's objectively a great book, written by objectively great developers and computer scientists.

Is it the best to learn from today? Who can say.

You might also say we should not start an education in structural engineering by studying the Brooklyn Bridge either.

But here we are.


But as mentioned, not even the rest of the book is relevant for compilers without SSA, so I don't think your bridge analogy holds.


The book introduces three address code, which is similar to SSA.


Yeah, the authors of the book won a Turing Award a few years ago.


Reminds me of this thread :) https://news.ycombinator.com/item?id=3754545

Since that thread is so old, the blog post it links to is dead now; here is an archive copy: https://web.archive.org/web/20180929002914/http://www.billth...


I make a living as a high-performance JIT compiler writer, and the dragon book is a waste of time in my opinion. There are much better books out there. If you are a beginner, start by learning recursive descent parsing. Then learn LLVM.


> There are much better books out there.

List please?


I had a better experience with Cooper and Torczon's Engineering a Compiler; I can only recommend it.


I have the first edition but no intention of creating compilers yet, and it is the hardest book I own for recreational reading. I cannot just open it in the middle and find anything understandable.


It's really rewarding, though, when you write a compiler. I recommend the experience. And if you think about it, compilers are so fundamental to the whole software infrastructure that I think they are worth knowing. You will understand languages better: for example, you will understand how statically and dynamically typed languages work because you will implement a type checker, and you will understand lexical scoping better because you will compile it to assembly. Overall you will have a better idea of what your high-level code looks like in assembly, which could help you write efficient code. You will also get to understand how memory is laid out, which will help you appreciate NGINX. So by learning compilers I think you will appreciate web development more. Those are my thoughts anyway. I'm not saying I'm correct, but that's what I've noticed in my journey.


"green book International Unix Environments. orange book Computer security criteria, DOD standards. book with man wearing pink shirt The Pink Shirt Book, Guide to IBM PCs. So called due to the nasty pink shirt the guy wears on the cover another book Devil book. The Unix Bible. another book Dragon book. Compiler design. large red book The Red Book. NSA Trusted Networks. Otherwise known as the Ugly Red Book that won't fit on a shelf."

Sorry, title took me down a nostalgic trip.



