What would be a good starting point to learn about writing compilers? I have written a small tree-walk interpreter (and am now working through the bytecode-based one) following Crafting Interpreters, but I would like some more learning resources.
Ideally I would like an online book (I prefer readable resources) in a similar style to Crafting Interpreters, but focusing more on compilers & optimization.
I have heard about the Dragon book, but it is too big and pricey :-( and I'm not sure if I can get through the entire book.
Most modern languages use a handwritten recursive descent parser anyway and the interesting part starts after you have the AST (code generation and register allocation).
This is my strong opinion as well. Almost everyone fixates on parsing.
I've been working on an interpreter/compiler full time since 2013, so coming up for eight years. I think I've spent maybe three or four days working on the parser. It's a vanishingly small part of the job.
I wish compiler courses started with an AST, and then parsing was a separate course unrelated to compilers.
By comparison, at Berkeley they recreate 61C Machine Structures every few years. Out goes the unimportant, in comes the new.
Lexing and parsing aren't even a vanishingly small part of the job. They're usually someone else's job. Someone who works on LLVM code generation for the XYZ processor is the wrong person to keep the Clang front end up to date with the C++ committee's latest standard iteration.
They should push DFAs and grammars into theory classes. That said, I do like the 2nd edition of the Dragon book, particularly the later chapters.
What I can really recommend (and I and others have recommended it countless times already) is TECS: The Elements of Computing Systems (also known as "Nand2Tetris"). Although it also covers other parts like the hardware stuff, it lets you implement:
- an assembler
- a VM
- a compiler that compiles a simplified Java-like language to this VM
I'd go further and say that for most people, the first time you read or write a context-free grammar is when you're trying to write a parser. So the theory is motivating the practice.
It depends on whether you are compiling C++ or not. ;-)
And what's more, that kind of interplay is easier outside a parser framework!
Many languages are not very pure when it comes to parsing. If you're writing a parser using lex and bison, this means bending over backwards to make it work.
Bingo. We spent decades formalising and automating something... that was never that hard in the first place.
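For anyone who hasn't written one by hand, here is a minimal sketch of a recursive descent parser for arithmetic expressions, just to show how little machinery is involved. The tuple-based AST shape and the names `tokenize`, `Parser` and `parse` are my own choices for illustration, not taken from any particular compiler.

```python
import re

# One token per match: either a run of digits or any single character.
TOKEN = re.compile(r"\s*(?:(\d+)|(.))")

def tokenize(src):
    tokens = []
    for number, op in TOKEN.findall(src):
        if number:
            tokens.append(("num", int(number)))
        elif op.strip():  # skip stray whitespace matched by '.'
            tokens.append(("op", op))
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("eof", None)

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    # expr := term (('+' | '-') term)*
    def expr(self):
        node = self.term()
        while self.peek() in (("op", "+"), ("op", "-")):
            op = self.advance()[1]
            node = (op, node, self.term())
        return node

    # term := factor (('*' | '/') factor)*
    def term(self):
        node = self.factor()
        while self.peek() in (("op", "*"), ("op", "/")):
            op = self.advance()[1]
            node = (op, node, self.factor())
        return node

    # factor := NUMBER | '(' expr ')'
    def factor(self):
        kind, value = self.advance()
        if kind == "num":
            return ("num", value)
        if (kind, value) == ("op", "("):
            node = self.expr()
            if self.advance() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {value!r}")

def parse(src):
    parser = Parser(tokenize(src))
    tree = parser.expr()
    if parser.peek()[0] != "eof":
        raise SyntaxError("trailing input")
    return tree
```

Each grammar rule becomes one method, and precedence falls out of which method calls which. Real languages need error recovery and much more, but the skeleton stays about this shape.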
There are some typos there, but it does provide a better experience developing a language and all compiler phases.
Don't feel bad about acquiring US textbooks through alternative means like libgen. Their exorbitant prices are an effect of the student loan cartel.
The Dragon Book is the best, for sure, and in my opinion essential material. Many people find its style too dry, though: it's basically a reference, not a tutorial. If you want something more hands-on, I recommend this book about LCC, which is a C compiler. It's not a generic book about writing compilers, but I believe it to be very useful anyway because it's very hands-on. The current version of LCC is here: Not sure if you can regenerate the book out of the current code or not (LCC is written in literate programming style, and the book used to be just the literate part).
 A Retargetable C Compiler: Design and Implementation (Addison-Wesley, 1995, ISBN 0-8053-1670-1)
Also, I think I might have seen your name before whilst looking at the Solaris port of Go years back. Mind you, I'm looking at bringing Golang to Haiku (0) with changes similar to those made for Solaris and Windows.
* Classical scalar optimizations
* Loop (vector) optimizations
* Code generation trifecta of scheduling, register allocation, and peepholes
* Dataflow and static analysis
* Interprocedural, link-time, and whole-program optimization
* JITs (to the extent that it differs from the above)
* Garbage collection
* A variety of advanced topics, such as decompilation, dynamic binary analysis/rewriting, superoptimization, formal verification, etc.
There's no resource that's going to cover everything well, and some of them (particularly the last topic) are going to be best covered by reading the research papers that come out at the premier compiler conferences such as PLDI. The Dragon book, for example, mostly covers parsing, classical scalar optimizations, and code generation, giving you very little to go on for interprocedural optimization. Allen & Kennedy's "Optimizing Compilers for Modern Architectures" is going to be a better resource for loop optimization.
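To make "classical scalar optimizations" a bit more concrete for someone coming from a tree-walk interpreter, here is a rough sketch of constant folding with a couple of algebraic identities, done directly on an AST. The tuple node shapes (`("num", n)`, `("var", name)`, `(op, left, right)`) and the function name `fold` are assumptions made up for this example; production compilers typically do this on an IR rather than the AST.

```python
import operator

# Operators we are willing to evaluate at compile time.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(node):
    """Return an equivalent tree with constant subexpressions evaluated."""
    if node[0] in ("num", "var"):
        return node  # leaves are already as simple as they get

    op, left, right = node
    left, right = fold(left), fold(right)  # fold children first (bottom-up)

    # Both operands known: compute the result now instead of at run time.
    if left[0] == "num" and right[0] == "num":
        return ("num", OPS[op](left[1], right[1]))

    # A few algebraic identities: x + 0, 0 + x, x * 1, 1 * x.
    if op == "+" and right == ("num", 0):
        return left
    if op == "+" and left == ("num", 0):
        return right
    if op == "*" and right == ("num", 1):
        return left
    if op == "*" and left == ("num", 1):
        return right

    return (op, left, right)
```

For instance, `fold(("+", ("var", "x"), ("*", ("num", 2), ("num", 3))))` gives `("+", ("var", "x"), ("num", 6))`. The interesting textbook material starts where this stops: doing the same thing across assignments and branches needs dataflow analysis.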
If you want to get these textbooks for a cheaper price, try finding a nearby university, looking up when they offer a compiler course, and seeing if the bookstore has a used copy before the semester starts. I was lucky in that the compiler class had been cancelled immediately before the semester started, so the entire pallet of books was available, offering me my choice of highest quality used book.
You might also find the LLVM/Clang Kaleidoscope-based tutorials an interesting way to get started and gain some insight into writing an LLVM-based compiler. Adrian Sampson also has a great article introducing LLVM to newcomers.
Hmm, I've always assumed that the tutorials were for people who already 'know' compilers well enough. It's great that they're more accessible than I thought.
Will definitely look through them. Great resources :-)
Please look at my past comments.
After that, read the Dragon Book as others have recommended. It should be a smooth read after reading the first one.
 - https://news.ycombinator.com/item?id=18996703
 - https://news.ycombinator.com/item?id=10184364
The Dragon book is pretty dated in certain areas, e.g. SSA is mentioned once AFAIK and never used.
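For readers who haven't met SSA (static single assignment) yet: the idea is that every variable is assigned exactly once, so each use points at one unambiguous definition. Below is a minimal sketch for straight-line code only. The instruction format `(dest, op, arg1, arg2)`, the `name.N` versioning scheme and the function name `to_ssa` are assumptions for illustration, and there are no phi functions here because there is no control flow, which is where the real work in SSA construction lies.

```python
def to_ssa(instructions):
    """Rename variables so that each one is assigned exactly once.

    Handles straight-line code only: no branches, so no phi functions.
    Arguments are variable names (str) or integer constants.
    """
    version = {}  # variable name -> number of its most recent definition
    out = []

    def use(arg):
        # Constants pass through; variables read their latest version.
        # Names never assigned here (e.g. inputs) are left unchanged.
        if isinstance(arg, str) and arg in version:
            return f"{arg}.{version[arg]}"
        return arg

    for dest, op, a, b in instructions:
        a, b = use(a), use(b)  # rename uses before the new definition
        version[dest] = version.get(dest, 0) + 1
        out.append((f"{dest}.{version[dest]}", op, a, b))
    return out
```

Given `x = a + 1; x = x * 2; y = x + x`, this produces `x.1 = a + 1; x.2 = x.1 * 2; y.1 = x.2 + x.2`, so the two assignments to `x` become two distinct names.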
Now you have three problems...without regular expressions.
Lazarus, Visual Studio Code, Delphi IDE, Oxygen IDE, Emacs, Vim, Eclipse as IDEs.
It’s not insurmountable but it ain’t nothing. And it isn’t writing a compiler either. Everyone doesn’t walk in my shoes.
When I discovered Hacker News, I'd never heard of Python. When I joined, I used Visual Studio Express, or rather Expresses because each was language locked. Obviously, I ran Windows. It was still four years until I bit the Emacs bullet.
Yeah, installing and running Pascal wouldn't be a problem for me now. But I remember a time when it would have been. I remember what that was like. Half of all programmers are below average. I'm certainly one of them.
Hell, NOOBS rPis come with the Wolfram Language too.
Better read the history of gcc: it was the commercialization of UNIX C compilers, introduced by Sun, that made people start paying attention to gcc. Until then it was a largely ignored effort.
Finally, I don't believe that anyone skilful enough to learn how to install Linux is unable to install a compiler.
It even works if you start from zero: https://github.com/oriansj/mescc-tools-seed
which gives some introduction to how a modern compiler like gcc is designed/structured. For context, the author, siddhesh, is a current glibc maintainer.
I'm def. interested, but I'm not sure if:
* it's something that I can read in a week or two, as someone who just wants to play with compilers & things (doesn't look like it)
* and if it'll pay off, as I live in a non-US country which means cheap editions are still pricey due to shipping.
If after that you're decently good with those and want more, you can look at grad-level materials, though in my personal opinion the materials for those can be quite a bit more dry (and painful) if you try to self-study, at least unless you find a particularly good source.