
An Intro to Compilers - luu
https://nicoleorchard.com/blog/compilers
======
simonebrunozzi
Ha, sweet memories.

I am 40 now; back in 2004-2006, for two years I was one of the youngest
professors in Italy, teaching "compilers and programming languages".

I still feel so fortunate to have had that experience.

To help my students save the relatively large amount of money the dragon
book(s) cost, I created a condensed version of the parts that were required for
the course - and I didn't charge anything for it, unlike what usually happens
pretty much everywhere in Italy. The notes were available in PDF and OpenOffice
formats on the website that I created for the course (yes, the CS department
didn't really have a proper website to use as a CMS - I kid you not).

You can find the material here; it's in Italian, but it might be fun to take a
quick look: [http://www.lulu.com/shop/simone-brunozzi/dispense-lab-
lingua...](http://www.lulu.com/shop/simone-brunozzi/dispense-lab-linguaggi-di-
programmazione-e-compilatori-informatica-unipgit/ebook/product-17491425.html)

Such good memories.

~~~
gnuvince
I was a T.A. for COMP-520, Compiler Design, at McGill University for two years,
and it was the best job I've ever had. The compiler project had not been
updated in 10-12 years; my adviser trusted me to make a new, interesting
project. I was paid by the department for 5 hours of work per week, but I
poured in more than 35 per week to design, document, and implement a subset of
Go.

My hard work paid off immediately. Students were more interested in a language
that was new, modern, and used in industry than in the previously used
language, Wig. As a result, enrollment jumped from 12-14 students to 45.
Though it was a subset, and much of what makes Go interesting was removed
(concurrency, interfaces, GC), it was still an extremely demanding project.
Many times I thought that the project was too big, that we needed to cut back,
but the students were real troopers and chugged through (by the way, when did
20-year-olds become so smart?! I was never this bright when I was their age!)
and all teams completed the project and made me very proud.

You can see the page for the first year of the class, along with relevant
documents, at
[http://www.cs.mcgill.ca/~cs520/2015/](http://www.cs.mcgill.ca/~cs520/2015/)

------
bitL
BTW, can anyone please suggest a good online compiler course for creating your
own programming language with labs/exams/certs? I can find pretty good courses
for almost anything but not for compilers... Something that would focus on
handling grammar (LALR, LL, CYK etc), ASTs, semantics, types, code-completion,
imperative/OOP/functional/logical/self-modifying constructs etc. or even NLP-
to-AST conversion...

~~~
jpolitz
My compilers course has all the lectures recorded, and has starter code for
all the assignments provided. It provides all the parsers, and focuses on the
backend, including memory management and various details of value
representations.

This is just my materials, I don't have an associated platform with
verification/certification. However, some folks from HN have previously done
the course and emailed me after, and seem to have gotten some value out of it.

[https://cseweb.ucsd.edu/classes/sp17/cse131-a/](https://cseweb.ucsd.edu/classes/sp17/cse131-a/)

[https://podcast.ucsd.edu/podcasts/default.aspx?PodcastId=401...](https://podcast.ucsd.edu/podcasts/default.aspx?PodcastId=4014)

[https://github.com/ucsd-cse131-sp17](https://github.com/ucsd-cse131-sp17)

Earlier offering:

[https://www.cs.swarthmore.edu/~jpolitz/cs75/s16/](https://www.cs.swarthmore.edu/~jpolitz/cs75/s16/)

~~~
gilmi
I've been using the earlier offering of your course, along with another[0], to
learn about compilers for a while now, and it's the only method that has really
clicked for me. Thank you very much for making this open!

[0]:
[http://www.ccs.neu.edu/course/cs4410](http://www.ccs.neu.edu/course/cs4410)

~~~
jpolitz
Ben Lerner did a great job writing notes [0] for his version, which are super
helpful if you prefer reading to videos.

[0]
[https://course.ccs.neu.edu/cs4410/#%28part._lectures%29](https://course.ccs.neu.edu/cs4410/#%28part._lectures%29)

~~~
gilmi
Definitely! They are really well done and actually worked better for me than
videos. Unfortunately they only cover the first half of the course.

------
pbiggar
Pet peeve about compiler education that is not repeated by this article (yay,
well done!): the way you have to wade through months of parsing bullshit to
get to the really interesting stuff (optimization and static analysis).

~~~
gizmo686
As much as I enjoyed optimization and static analysis, parsing is probably the
single most useful piece of knowledge I got from my formal CS education, and
probably the only part of a compiler that most CS students will ever need to
know.

~~~
olewhalehunter
The symbiotic relationship between compiler development and hardware
architecture design has a defining role in CS problems, and while you don't
need to understand that relationship to write CRUD apps for a living, I think
ignorance of it robs those who really want to understand this stuff of truths
both technical and profound.

~~~
yeowMeng
Totally!

Compiler development = Finite State Automaton + Push Down Automaton + Context
Free Grammar + Intermediate Representation + Static Analysis/Optimization +
Stack Machine + ...

Hardware Architecture = Stack Machine/Turing Machine + electricity +
creativity

Stack Machine == Turing Machine

Compiler development starts and ends with Turing Machines.

Hardware architecture ends with Turing Machines operating at the speed of
electricity through metal.

~~~
olewhalehunter
There are alternatives to this mainstream style of computer architecture, such
as dataflow or hybrid quantum-classical systems, which both require a very
different kind of compiler.

[https://en.wikipedia.org/wiki/Dataflow_architecture](https://en.wikipedia.org/wiki/Dataflow_architecture)
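To give a flavour of why dataflow machines need a different compiler model, here's a toy sketch (Python, with an invented node format, not any real dataflow ISA): instructions fire whenever their operands arrive, rather than in program order, so the compiler emits a graph instead of a sequence.

```python
# Toy dataflow evaluation: each node fires once all of its input
# tokens have arrived; there is no program counter.
def dataflow_run(nodes, inputs):
    # nodes: name -> (op, [input names]); inputs: name -> value
    values = dict(inputs)
    ops = {'+': lambda a, b: a + b, '*': lambda a, b: a * b}
    pending = dict(nodes)
    while pending:
        # Any node whose inputs are all available may fire, in any order.
        fired = [n for n, (op, srcs) in pending.items()
                 if all(s in values for s in srcs)]
        for n in fired:
            op, srcs = pending.pop(n)
            values[n] = ops[op](*(values[s] for s in srcs))
    return values

# (a + b) * (a + c), expressed as a graph rather than a sequence
graph = {
    't1':  ('+', ['a', 'b']),
    't2':  ('+', ['a', 'c']),
    'out': ('*', ['t1', 't2']),
}
print(dataflow_run(graph, {'a': 1, 'b': 2, 'c': 3})['out'])  # → 12
```

Note that `t1` and `t2` have no ordering between them; on a real dataflow machine they could execute in parallel, which is exactly the scheduling freedom a conventional sequential compiler backend doesn't have to reason about.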

------
bollu
I've been writing an llvm-lite in Haskell to show off the main optimisation
ideas of LLVM
[here](https://github.com/bollu/tiny-optimising-compiler). Thought someone
may be interested :)

------
userbinator
I know it might be a contrived case, but it's rather disappointing to see that
even with the -O2 optimisation level, the final generated code contains 43%
(3/7) useless instructions to mess around with RBP. I mean, it should be
bleeding obvious to the compiler that this function doesn't require any stack
allocation, much less an RBP-based one (the tradeoff between RBP- and RSP-based
allocation is far more subtle, and I can forgive a compiler for choosing
non-optimally in that case), so why did it emit those useless instructions?

There are plenty of articles about how intelligent compiler optimisers can be,
and having read a lot of compiler output, I don't doubt that they can do some
pretty advanced transformations; and then I see things like this, which just
make me go _why!?!?_ It's puzzling how compilers behave, applying very
sophisticated optimisation techniques but then missing the easy cases. The
payoff of this type of optimisation might not be as great as that of more
advanced ones, but neither is the effort required. This is not even low-hanging
fruit; it's sitting on the ground.

 _However, the compiler will use as few registers as possible._

Compilers usually try to use all the registers they can (although sometimes
they do exhibit odd non-optimal behaviour, as described above); eschewing
register use and using memory instead will increase code size and time.

~~~
brigade
The "useless" instructions aren't for potential stack allocation, they're
there so backtraces work. If you don't care about that then you can add
-fomit-frame-pointer and they disappear.

Compilers don't default to this because having backtraces on crashes and
profiling are more useful than the like 1% speed gain from another available
register on x86-64.

~~~
userbinator
Backtraces work without that.

[http://yosefk.com/blog/getting-the-call-stack-without-a-
fram...](http://yosefk.com/blog/getting-the-call-stack-without-a-frame-
pointer.html)

~~~
brigade
Are you arguing that they _do_ work on ELF+DWARF with sufficient extra debug
info, when running under a debugger, despite documentation that explicitly
says it doesn't work? Because I certainly don't expect my users to have a
debugger installed (let alone regularly run programs under one) but I'd still
like to get useful crashlogs automatically.

Or are you saying that you could potentially reconstruct a backtrace, if
someone wrote code to do so that is smart enough to locate and account for all
the stack adjustments in reconstructing where each function stored its return
address, including runtime alloca() / VLA adjustments and multiple adjustments
in the same function (potentially needing to reconstruct the code flow), just
from the disassembly? Because the linked MIPS code is nowhere near that smart,
and to my and that author's knowledge, such code has not been written for x86.

Either way, nice in theory but not good enough in practice.

~~~
userbinator
The debugging info is enough to find, for each code address, the offset to the
return address on the stack, from which you can find _that_ frame's offset, and
so on. You don't need a debugger; a saved copy of the stack is enough to get a
backtrace for a crashed process.

------
WalterBright
One nice thing today is the source code to professional compilers is readily
available. Pick one for a simpler language, and just read the code.

~~~
sigjuice
Is there a particular language that you could recommend? Thanks!

~~~
blacksmythe
Perhaps the one he wrote?

    Walter Bright is the creator and first implementer of the D programming language... http://www.walterbright.com/

~~~
WalterBright
D is a fairly complex language. I'd start with a simpler one, like a Pascal or
Javascript compiler.

For the latter, you can look at:

[https://github.com/DigitalMars/DMDScript/tree/master/engine/...](https://github.com/DigitalMars/DMDScript/tree/master/engine/source/dmdscript)

~~~
jnordwick
I'd say the Java compiler is fairly simple too, but it lacks any real
optimization passes.

~~~
WalterBright
A Java compiler lacks optimization and code generation (generating IR is not
code generation). So does a Javascript compiler. But they are both great
languages for learning how lexing, parsing, and semantic analysis work,
because those operations are relatively simple.

C is a harder one because the multiple "phases of translation" add quite a bit
of complexity. It's easy to get lost in the frustrating and pointless
complexity of the preprocessor. C++ layers onto that by mixing semantic
analysis up with parsing, and has an unusually complex semantic analysis. Not
for beginners.

I suggest reading the source code to a professional compiler because there are
books about how compilers work, and then there's how compilers really work.
Pro compilers tend to be built differently from academic or side project
compilers.

------
xoroshiro
I don't understand why this still looks scary to me. A while back, after
learning regex (though only at a basic level), I thought: maybe I can make my
own custom C preprocessor as an exercise. Perfect, right? I get to choose the
syntax AND the rules. But somehow I just can't write it. Nested ifdefs,
defines, and undefs mixing together are too OP.

Reading through this, it seems like a huge amount of work goes into real
compilers: front end, optimizer, backend, etc. I do appreciate it more, but as
useful as compilers are, maybe I'll just leave it to the pros (:

~~~
le-mark
> maybe I'll just leave it to the pros (:

Don't ever do that! It really isn't that complicated at a basic level. Regexes
aren't the solution for parsing. What you're missing is the recursive nature
of "context free languages"[1]. Formal language theory [2] was one of the big,
early wins of Computer Science. There's a lot to it, but writing a simple
recursive descent parser doesn't require any of it.

The big caveat is that parsing is only about 1/3 of the problem; the others,
as described in this article, are optimization and code generation. And of
these, optimization can be skipped entirely, leaving only code generation,
which can be done naively by walking the AST. If this all seems foreign to
you, there's a lot of really good info online nowadays; people love writing
about this subject.
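As a toy illustration of both points (Python, with a made-up four-rule expression grammar, not anything from the article): a recursive descent parser has one function per grammar rule, and naive code generation is just a post-order walk of the resulting AST emitting stack-machine instructions.

```python
# A tiny recursive descent parser for the grammar:
#   expr   -> term (('+'|'-') term)*
#   term   -> factor (('*'|'/') factor)*
#   factor -> NUMBER | '(' expr ')'
import re

def tokenize(src):
    return re.findall(r"\d+|[()+\-*/]", src)

class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, tok=None):
        t = self.peek()
        assert t is not None and (tok is None or t == tok), f"unexpected {t!r}"
        self.pos += 1
        return t

    def expr(self):                      # one function per grammar rule
        node = self.term()
        while self.peek() in ('+', '-'):
            node = (self.eat(), node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ('*', '/'):
            node = (self.eat(), node, self.factor())
        return node

    def factor(self):                    # recursion handles nesting
        if self.peek() == '(':
            self.eat('(')
            node = self.expr()
            self.eat(')')
            return node
        return ('num', int(self.eat()))

# Naive code generation: a post-order walk emitting stack-machine ops.
def codegen(node):
    if node[0] == 'num':
        return [('push', node[1])]
    op, lhs, rhs = node
    return codegen(lhs) + codegen(rhs) + [(op,)]

ast = Parser(tokenize("2 * (3 + 4)")).expr()
print(codegen(ast))  # → [('push', 2), ('push', 3), ('push', 4), ('+',), ('*',)]
```

The whole thing fits in a page precisely because the grammar's recursion maps one-to-one onto function calls; no parser generator or formal-language machinery is needed.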

Understanding this subject is very important in my opinion; this is one of the
foundational things we do. Attaining a comfort level with compiler engineering
is one of the two or three things anyone can do to really "level up" as a
software developer. Some other things are writing multi-threaded servers in C,
and 3D software rasterization. Again, my opinion here.

[1] [https://stackoverflow.com/questions/559763/regular-vs-
contex...](https://stackoverflow.com/questions/559763/regular-vs-context-free-
grammars)

[2]
[https://en.wikipedia.org/wiki/Formal_language](https://en.wikipedia.org/wiki/Formal_language)

~~~
xoroshiro
>Understanding this subject is very important in my opinion, this is one of
the foundational things we do. Attaining a comfort level with compiler
engineering is one of the two or three things anyone can do to really "level up"
as a software developer.

I think it depends on the level of abstraction.

Then again, I'm not really a software developer, most of the work I do is
scripting stuff for data analysis, which is where I learned regex from. I did
learn C in college which I somehow got fascinated with, but I never really do
any serious work with it. Still, over time, reading about quite lower level
stuff (compared to what I do) does seem like it helps take the mystery out of
things like unintuitive behaviors with multiple references to a Python list (I
suspect it was pointers all along). Taking it to the next lower level by
studying compilers and how assembly gets generated doesn't seem like it will
benefit me much more.

Still, I do like C enough to possibly consider doing something in it for fun
if I get an idea, so maybe one day I'll come back to this.

------
bogomipz
Wow, I enjoyed both the content and the design aesthetic of this post. I guess
that's what happens when a designer turns software engineer? Please post more
like this!

------
fleshweasel
I'm a little confused as it looks like an image in the article labels a string
literal as being a comment. Am I misunderstanding something?

~~~
b4ux1t3
Note that it is underlined in yellow. The color of the underline is what
denotes it as a literal, not the color of the text itself.

------
andreasgonewild
Compiler design depends a lot on the language being compiled, though; this
would be a typical design for a C-style language.

When doing Forth-style languages, for example, I've found that it's not very
helpful to parse to trees, since I can't tell the structure of function calls
before seeing the actual running stack anyway. Which in turn affects the
compiler, since all it has to work with is a sequential stream of tokens.
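A toy illustration of that point (Python, with a made-up word set, not the commenter's language): a Forth-style program can be executed straight off the sequential token stream, because each word's effect is defined by the runtime stack rather than by a parse tree.

```python
# A toy Forth-style evaluator: no AST, just a sequential stream of
# tokens and a data stack. Each word pops its arguments and pushes
# its result; call structure only emerges at run time.
def forth_eval(src):
    stack = []
    words = {
        '+':    lambda: stack.append(stack.pop() + stack.pop()),
        '*':    lambda: stack.append(stack.pop() * stack.pop()),
        'dup':  lambda: stack.append(stack[-1]),
        'swap': lambda: stack.extend([stack.pop(), stack.pop()]),
    }
    for tok in src.split():
        if tok in words:
            words[tok]()                # execute the word immediately
        else:
            stack.append(int(tok))      # everything else is a number literal
    return stack

print(forth_eval("2 3 + dup *"))  # (2+3) squared → [25]
```

Notice there is nothing tree-shaped anywhere: which values `*` consumes depends entirely on what earlier tokens left on the stack, which is exactly why parsing to trees buys so little for this family of languages.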

And not all languages are stand-alone, or compile all the way down to machine
code. Compiling to byte-code for running immediately, with the backup of a
runtime library, is an attractive option for more specialized languages, and
this means that it's possible to carry more information straight over from
compilation to runtime.

I've been working on a scripting language that raised some of these questions
lately:

[https://github.com/andreas-gone-
wild/snackis/blob/master/sna...](https://github.com/andreas-gone-
wild/snackis/blob/master/snabel.md)

------
jacquesm
If this stuff interests you, you really should read the 'dragon book', even
though it is quite old:

[https://en.wikipedia.org/wiki/Compilers:_Principles,_Techniq...](https://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools)

~~~
tmccrmck
We've rehashed the debate over the utility of the dragon book in the 21st
century so many times on HN that I'm not going to comment on its usefulness.
(Search if interested.)

But, for those interested, I highly recommend _Engineering a Compiler_ by
Cooper and Torczon OR _Modern Compiler Implementation in ML_ by Appel. I've
read all three and would only recommend the dragon book to someone very
interested in the subject.

~~~
timthorn
Appel's book is also available in many other languages, if the thought of ML
is offputting.

~~~
sigjuice
Why would the thought of ML be off-putting?

~~~
timthorn
If you're unfamiliar with it but want to learn about compilers, a text in a
language you know will decrease the cognitive load.

------
kp25
I'm usually scared to look at these compiler-titled articles, feeling I won't
understand or learn anything from them.

This article is a very good intro to compilers, and now I understand how these
things work. Thanks for the article.

------
JustSomeNobody
I love reading about compilers. It's fascinating. I have implemented my own
PL/0 compiler in various languages as a way to learn those languages. It's fun
without having to think about designing a new language.

Question: Does anyone know of some resources for adding concurrency to a
simple stack based VM? I've been wanting to look into that for a while just
for kicks.

So, basically, take something like the machine here:
[https://en.wikipedia.org/wiki/P-code_machine](https://en.wikipedia.org/wiki/P-code_machine)

and add concurrency to it.
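One common approach, sketched below in Python with a made-up instruction set (not actual P-code), is cooperative scheduling: give each "process" its own stack and program counter, and have a single interpreter loop run them round-robin, a few instructions at a time. Real designs typically add a YIELD instruction or preempt on an instruction-count budget.

```python
# Sketch: round-robin scheduling of multiple "processes" on one
# stack-machine interpreter. The instruction set is invented for the demo.
class Proc:
    def __init__(self, code):
        self.code, self.pc, self.stack = code, 0, []
        self.output = []

    def done(self):
        return self.pc >= len(self.code)

    def step(self):
        op, arg = self.code[self.pc]
        self.pc += 1
        if op == 'PUSH':
            self.stack.append(arg)
        elif op == 'ADD':
            self.stack.append(self.stack.pop() + self.stack.pop())
        elif op == 'PRINT':
            self.output.append(self.stack.pop())

def run(procs, quantum=2):
    # Each live process gets `quantum` instructions, then control moves on:
    # the VM-level equivalent of green threads / cooperative multitasking.
    while any(not p.done() for p in procs):
        for p in procs:
            for _ in range(quantum):
                if p.done():
                    break
                p.step()

a = Proc([('PUSH', 1), ('PUSH', 2), ('ADD', None), ('PRINT', None)])
b = Proc([('PUSH', 10), ('PUSH', 20), ('ADD', None), ('PRINT', None)])
run([a, b])
print(a.output, b.output)  # → [3] [30]
```

Because each process owns its stack and program counter, the only shared state is whatever you deliberately expose (a heap, channels, etc.), which is where the interesting design decisions for a concurrent VM actually live.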

------
Tehnix
> Some compilers translate source code into another programming language.
> These compilers are called source-to-source translators or transpilers

I was wondering how compile-to-JS languages would be classified. Technically
they are all transpilers, but seeing as there is no lower level for the
browser than JS (barring WebAssembly), one could also argue that they are in
fact also compilers.

Anyone have any thoughts on this?

~~~
pbiggar
Transpiler is a recently made up word, and describes something that's just as
well described as a compiler. I wouldn't spend too much time hoping for a
distinction here.

~~~
chrisseaton
> Transpiler is a recently made up word

People keep saying this, but it's not true. You can find the word being used,
in its modern meaning, as far back as 1964.

[https://academic.oup.com/comjnl/article/7/1/28/558689/The-
co...](https://academic.oup.com/comjnl/article/7/1/28/558689/The-
communication-of-algorithms)

~~~
sigjuice
Page 35 of this paper has this sentence.

 _One can also envisage a "transpiler" capable of converting a program from
one such style into another with considerable generality._

Are there any other known citations of the term "transpiler"?

~~~
chrisseaton
Yes, there is also this book from 1999, which talks about a Pascal to C
'transpiler' as early as 1989.

[https://books.google.co.uk/books?id=vKQSBwAAQBAJ&pg=PR8&lpg=...](https://books.google.co.uk/books?id=vKQSBwAAQBAJ&pg=PR8&lpg=PR8&dq=smith+yen+p2c&source=bl&ots=-eQE5Rs2ln&sig=4IljPIYoAYteUfapKJbmTe3tqnE&hl=en&sa=X&ved=0ahUKEwiHo-
aN-9bVAhXDbVAKHf4TAcgQ6AEIKDAA#v=onepage&q=smith%20yen%20p2c&f=false)

~~~
sigjuice
The actual author of p2c has the good taste to not call it a transpiler.
[https://schneider.ncifcrf.gov/p2c/historic/daves.index-2012J...](https://schneider.ncifcrf.gov/p2c/historic/daves.index-2012Jul25-20-44-55.html)

------
codingbear
Can anyone help me understand what kinds of employment are available in the
market which require deep knowledge of compilers?

I love this subject, but I don't know which domains it applies to outside of
compiling languages like JavaScript/Java/D/C/Rust and so on into machine code.

~~~
147
Steve Yegge's rant opened my eyes to how useful knowledge of compilers can be:
[http://steve-yegge.blogspot.com/2007/06/rich-programmer-
food...](http://steve-yegge.blogspot.com/2007/06/rich-programmer-food.html)

------
yazilimci_adam
Thanks, I liked this blog post. I have some questions: Why should I prefer
LLVM? What is LLVM, and how is it different from GCC? I understand every
compiler has three parts: front end, optimizer, and back end. Looks simple.

~~~
legulere
I haven't looked at the GCC code myself, but what I've heard is that you can
see it's a codebase that grew massively over a long timespan, and things that
should be separated are not. At least a few years ago there were actually two
different IR formats (the newer one being based on SSA) which are converted
between each other several times. I also think GCC's IR does not have a
canonical text representation; having one is really helpful for understanding
and debugging in LLVM. Still, LLVM also has its weird parts, like its variable
naming scheme.

------
CalChris
This was really good. I could imagine it being required background reading for
a lab on the compilation pipeline in 61C at Berkeley. Add in a section on
linking.

~~~
sitkack
Levine's "Linkers and Loaders" is available online
[https://www.iecc.com/linker/](https://www.iecc.com/linker/)

------
lmirosevic
Can anyone point out to me where the linker fits into the equation?

