Hacker News new | past | comments | ask | show | jobs | submit login
GNU Compiler Collection Internals [pdf] (gnu.org)
168 points by rvz 28 days ago | hide | past | web | favorite | 65 comments

As I believe that there are a lot of compiler people around at HN, I would like to ask:

What would be a good starting point to learn about writing compilers? I have written a small tree-walk language (and is following the byte-code based language) following Crafting Interpreters[0], but I would like some more learning resources.

I would ideally like some online book (I prefer readable resources) that are similar style with Crafting Interpreters, but more focusing on compilers & optimizing. I have heard about the Dragon book, but it was too big and pricey :-(, I'm not sure if I can go through the entire book.

[0]: http://craftinginterpreters.com

My humble opinion: do not get the Dragon book - it puts too much focus on parser theory and implementation which is in my opinion not the hard part of compiler construction.

Most modern languages use a handwritten recursive descent parser anyway and the interesting part starts after you have the AST (code generation and register allocation).

> My humble opinion: do not get the Dragon book - it puts too much focus on parser theory and implementation

This is my strong opinion as well. Almost everyone fixates on parsing.

I've been working on an interpreter/compiler full time since 2013, so coming up for eight years. I think I've spent maybe three or four days working on the parser. It's a vanishingly small part of the job.

I wish compiler courses started with an AST, and then parsing was a separate course unrelated to compilers.

Yes. A lot of one semester compiler classes spend way too much time on lexing and parsing. Amusingly, their coverage of parsing is limited to LALR. It almost seems as if profs are thinking, if I had to suffer through LALR(1) table generation then you're going to have suffer through it as well.

By comparison, at Berkeley they recreate 61C Machine Structures every few years. Out goes the unimportant, in comes the new.

Lexing and parsing isn't even a vanishingly small part of the job. It's usually someone else's job. Someone who works on LLVM code generation for the XYZ processor is the wrong person to keep the Clang front end up to date with the latest C++ committee's standard iteration.

They should push DFAs and grammars into theory classes. That said, I do like the 2nd edition of the Dragon book, particularly the later chapters.

In my experience, compilers are taught over three semesters or so: lexing/parsing is one course (or rolled into a prerequisite), the standard compilers is the second, and advanced (graduate-level) compilers is the third course.

Berkeley just has CS164 [1] which is all in one. They haven't taught a graduate compiler section in years (decades?). Stanford is on the quarter system. They have an even more compressed single quarter compiler class, CS143 [2], but they also have a graduate advanced topics class, CS243 [3] as well. Moreover, CS243 doesn't require 143 as a prerequisite. As an aside, Berkeley is very strong at computer architecture (RISC, RISC-V, ...) while Stanford is very strong at compilers (Aiken, Lam, Ullman, Hennessy, ...)

http://www-inst.eecs.berkeley.edu/~cs164/sp19/ https://web.stanford.edu/class/cs143/index2018.html https://suif.stanford.edu/~courses/cs243/

Yes - it's probably the case, because there is nice theory behind it (DFAs for lexing and grammars for parsing).

What I can really recommend (and me and others have recommended it already countless times) is TECS: The elements of computing systems (also know as "Nand2Tetris"). Although it also covers other parts like the hardware stuff it lets you implement

    - an assembler
    - a VM
    - a compiler that compiles a simplified Java-like language to this VM
It takes a bottom-up approach, but if you go through all projects you get a really good view about the big picture. However as it does code generation via an intermediate VM language it doesn't really touch register architectures as a compilation target. Anyway I highly recommend that book.

The small amount of time spent on writing the parser might have something to do with how well-understood that area of computer science is. If nobody had ever written those books which were mostly about parsing, then maybe writing the parser would be the hardest part.

I don’t think so. Almost everyone abandons all the theory and just writes recursive descent. It's not the case that they're using it but have just come to see it as normal. They're not using it.

Recursive descent is theory. If you put someone new to parsing on the job, you'd get a mess, not RD.

In my practical experience students naturally re-invent recursive descent.

Yeah, I always write the grammar before writing the recursive descent parser. I can't imagine doing otherwise. It would be a mess.

I'd go further and say that for most people, the first time you read or write a context-free grammar is when you're trying to write a parser. So the theory is motivating the practice.

> which is in my opinion not the hard part of compiler construction

It depends on whether you are compiling C++ or not. ;-)

Parsing C++ is not that hard if you use hand rolled recursive descent parser. On the other hand hacking some parser generator framework to be able to parse C++ is hard and IMO mostly pointless busywork.

Not that hard? It's literally undecidable!


That's not what 'hard' means in this context. Requiring template instantiation (or some other interplay I'm not aware of) doesn't make parsing harder - it just means it can't be done independently.

And what's more that kind of interplay is easier outside a parser framework!

Many languages are not very pure for parsing. If you're writing a parser using lex and bison this means bending over backwards to make it work.

> hacking some parser generator framework ... is hard and IMO mostly pointless busywork

Bingo. We spent decades formalising and automating something... that was never that hard in the first place.

In context of parsing the situation when all the state machine/language theory has it's place is when you have some reason to care about compiler passes and generate output while reading input. That is, when you are not constructing some intermediate in-memory representation of whole compilation unit, presumably because you are running on system with (very) limited memory. For programming language compilers this ceased to be interesting limitation in 80's.

I would advise the tiger book instead, pick your favourite flavour of it, ML, Java or C.

There are some typos there, but it does provide a better experience developing a language and all compiler phases.


I concur, an incredible book. Simple, neophyte-friendly, yet deep. Unlike Dragon Book, which should have been called Parsing Book, this is actually focused on compiling and languages.

> I have heard about the Dragon book, but it was too big and pricey.

Don't feel bad about acquiring US textbooks through alternative means like libgen. Their exorbitant prices are an effect of the student loan cartel.

I've been picking up older (but still perfectly relevant) textbooks from half-price books for years now. I have a hardcover copy of the dragon book that I bought for $5.99.

I have written the Go arm64 and sparc64 compilers, as well as several other proprietary production compilers for several languages.

The Dragon Book is the best, for sure, and in my opinion essential material. Many people find its style too dry. It's basically a reference, not a tutorial. If you want something more hands on, I recommend this book about LCC[1]. It covers LCC, which is a C compiler. It's not a generic book about writing compilers, but I believe to be very useful anyway because it's very hands on. The current version of LCC is here[2]: Not sure if you can regenerate the book out of the current code or not (LCC is written in literate programming style, and the book used to be just the literate part).

[1] A Retargetable C Compiler: Design and Implementation (Addison-Wesley, 1995, ISBN 0-8053-1670-1)

[2] https://github.com/drh/lcc

Yes, I find that The Dragon Book to be a fundamental reference in my university days of studying compilers. Even so, many sections of it are still relevant to this day.

Also, I think I might have seen your name before whilst looking at the Solaris port of Go years back. Mind you, I'm looking at bringing Golang to Haiku (0) with similar changes like Solaris and Windows.

(0) https://github.com/golang-haiku/go/tree/golang-1.11-haiku

Cool, I am very interested in seeing Go work on all platforms!

What are you interested in learning? Compilers is quite a broad field, consider:

* Lexing/parsing

* Classical scalar optimizations

* Loop (vector) optimizations

* Code generation trifecta of scheduling, register allocation, and peepholes

* Dataflow and static analysis

* Interprocedural, link-time, and whole-program optimization

* JITs (to the extent that it differs from the above)

* Garbage collection

* A variety of advanced topics, such as decompilation, dynamic binary analysis/rewriting, superoptimization, formal verification, etc.

There's no resource that's going to cover everything well, and some of them (particularly the last topic) are going to best covered by reading the research papers that come out at the premier compiler conferences such as PLDI. The Dragon book, for example, mostly covers parsing, classical scalar optimizations, and code generation, giving you very little to go on for interprocedural optimization, for example. Allen & Kennedy's "Optimizing Compilers for Modern Architectures" is going to be a better resource for loop optimization.

If you want to get these textbooks for a cheaper price, try finding a nearby university, looking up when they offer a compiler course, and seeing if the bookstore has a used copy before the semester starts. I was lucky in that the compiler class had been cancelled immediately before the semester started, so the entire pallet of books was available, offering me my choice of highest quality used book.

I found great joy in "Compiler Construction" by N. Wirth. [1] Okay, that was more than 20 years ago.

[1] https://news.ycombinator.com/item?id=10764672

> What would be a good starting point to learn about writing compilers?

You also might find the LLVM/Clang kaleidoscope based tutorials interesting to get started having an insight into writing an LLVM-based compiler. Also Adrian Sampson has a great article in introducing LLVM for new-comers [1].

[0] https://llvm.org/docs/tutorial

[1] https://www.cs.cornell.edu/~asampson/blog/llvm.html

> You also might find the LLVM/Clang kaleidoscope based tutorials interesting to get started having an insight into writing an LLVM-based compiler.

Hmm, I've always assumed that the tutorials were for the people who already 'know' compilers enough, it's great that it's more accessible than I thought.

Definitely will going to look through them. Great resources :-)

If Haskell is your thing, this may be of interest


Best book in my opinion: Compiler Construction Using Java, JavaCC, and Yacc, IEEE/Wiley, 2012

Please look at my past comments[0][1].

After that, read the Dragon Book as others had recommend. It would be smooth read after reading the first one.

[0] - https://news.ycombinator.com/item?id=18996703

[1] - https://news.ycombinator.com/item?id=10184364

Engineering a compiler is great.

The dragon book is pretty dated in certain areas e.g. SSA is mentioned once AFAIK and never used.

The classic is Crenshaw's Let's Build a Compiler. Yes, it's in Pascal. You probably don't use Pascal.


Pascal is the ideal language to learn compilers, because the language is so easy to parse. Nonetheless, it is still a real language, much closer to C than, let's say, Lisp.

I wasn’t trying to say Pascal is problematic beyond toolchain logistics. Pascal is no longer mainstream and using it probably means using a text editor or an unfamiliar IDE. Compared to more familiar languages, you’re pretty much on your own to figure out tools unless you already use Vim or Emacs. Which why people use Vim and Emacs.

Now you have three problems...without regular expressions.

FreePascal, Delphi and Oxygen, already three possibilities. And there are a couple of other variants still around.

Lazarus, Visual Studio Code, Delphi IDE, Oxygen IDE, Emacs, Vim, Eclipse as IDEs.

Exactly. 3 x 8 logical combinations. And 3x operating systems. Without Notepad, Sublime, and gedit.

It’s not insurmountable but it ain’t nothing. And it isn’t writing a compiler either. Everyone doesn’t walk in my shoes.

So how that is any different of multiple combinations of C compilers and respective IDEs/Editors?

Let's Build a Compiler doesn't use C?

Sure, and I still fail to see why going through it and using Pascal is supposed to be a problem.

The toolchain probably isn't on your computer. And the internet provides 72 options (or 99 if you consider Notepad, Subline Text, and gedit).

Why is the text editor even an issue? Anyone able to learn how to write a compiler is fully capable of deciding which editor or IDE to use.

The book isn't for people who are able to write compilers.

When I discovered Hacker News, I'd never heard of Python. When I joined, I used Visual Studio Express, or rather Expresses because each was language locked. Obviously, I ran Windows. It was still four years until I bit the Emacs bullet.

Yeah, installing and running Pascal wouldn't be a problem for me now. But I remember a time when it would have been. I remember what that was like. Half of all programmers are below average. I'm certainly one of them.

So they are in for a surprise, because even on FOSS friendly OSes programming languages aren't installed by default unless one choses either full installation or a development profile.

Pythons, Perl, and gcc are commonly part of Linux distros. C compilers have been part of Unix since K&R. They are necessary to install the system.

Hell, noobs rPi’s come with the Wolfram language too.

Being part of distros doesn't mean they are installed, specially on commercial UNIXes since compiler tools became a separate product.

Better read the history of gcc, it was commercialization of UNIX C compilers introduced by Sun, that made people start paying attention to gcc. Until then it was a largely ignored effort.

Finally I don't believe that anyone skilful enough to learn how to install Linux, is not able to install a compiler.

Just like any other programming language, so what is the problem?

I don't understand why you're comparing Pascal to vim and Emacs. You don't even need to use Pascal to create a compiler in Pascal, do you understand that? I once created a simple Pascal compiler using C, and it was a simple project, much easier than parsing any other language. You can use any language you want, even Python would work.

The best things about compilers is that they have gentle ramps for learning. If you start simple enough, https://github.com/oriansj/M2-Planet the rest just becomes just add one more thing...

It even works if you start from zero: https://github.com/oriansj/mescc-tools-seed

Crenshaw's tutorial is an excellently lucid read: https://compilers.iecc.com/crenshaw/

Crafting Interpreters is a gem. You chose a very good starting point.

If you'd like a gentle introduction to something like gcc itself before diving in to writing your own, then you may want to read through:


which gives some introduction into how a modern compiler like gcc is designed/structured. For context, the author, siddhesh is a current glibc maintainer.

If you're interested in the Dragon Book, the older edition is dirt-cheap ($3.08 as of right now): https://www.barnesandnoble.com/p/compilers-alfred-v-aho/1100...

> If you're interested in the Dragon Book

I'm def. interested, but I'm not sure if:

* it's something that I can read for a week or two, for someone to use play with compilers & things (doesn't look like it)

* and if it'll pay off, as I live in a non-US country which means cheap editions are still pricey due to shipping.

Another option is to just find undergrad course materials for a compilers course and just go through it. Lecture slides can be great; sometimes they have other reading materials too. That should get you up to speed with the fundamentals fairly quickly, and most likely that's all you're looking for.

If after that you're decently good with those and want more, you can look at grad-level materials, though in my personal opinion the materials for those can be quite a bit more dry (and painful) if you try to self-study, at least unless you find a particularly good source.

Have you tried the book "Game Scripting Mastery"? It lays out the detail steps to construct a C-like scripting language and a Virtual Machine to run on top of a customized byte-code.

Is this document under source control? I'm curious to see what's changed.

In case you're worried this is outdated, note this is for version 10 of gcc (pdf is a pre-release) as stated on the frontpage of the document. I am curious how they plan to keep this updated, I could not easily find the repository for this.

Manuals, for both users and developers, are part of the gcc source code repository.

GCC 10 is marked as "prerelease BETA" in my package manager, so I know that this PDF is up-to-date. I was looking to use it as a cheap way to keep track of new developments ;)

What were RMS contributions in the last decade or so? It seems wrong that his name is on the frontpage of the documentation and nobody elses.

Other contributors are listed near the end of the document, under "Contributors to GCC".

It's common practice to keep the original author(s), regardless of new editions.

I don't understand why you're being downvoted. They should just write "By the GCC Dev Community" and simply keep a list of authors.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact