
So You Want to Be a Compiler Wizard - samsymons
http://belkadan.com/blog/2016/05/So-You-Want-To-Be-A-Compiler-Wizard/
======
tbirdz
I left this comment on the author's page too, but I've copied it to HN as
well:

I'd also suggest learning and writing your own Forth. The Forth language is
very simple to implement, simpler even than Lisp by a long ways, I'd say. If
you know how to make a stack (well, two stacks), then you can make your own
Forth interpreter. Going from the interpreter to the compiler is also easier
than in other languages.
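
To make the two-stack idea concrete, here is a minimal sketch in Python. It is not a real Forth: the name `run_forth`, the tiny set of primitives, and the lowercase-only word names are all assumptions made for illustration. It does show the data stack, the return stack, and the word dictionary working together.

```python
def run_forth(source):
    """Interpret a tiny Forth-like subset and return the final data stack."""
    data = []      # data stack: operands and results
    returns = []   # return stack: (token list, index) to resume after a call
    words = {}     # dictionary of user-defined words: name -> list of tokens

    def binary(fn):
        # Pop the top two items (b is the top) and push fn(a, b).
        b, a = data.pop(), data.pop()
        data.append(fn(a, b))

    primitives = {
        "+": lambda: binary(lambda a, b: a + b),
        "-": lambda: binary(lambda a, b: a - b),
        "*": lambda: binary(lambda a, b: a * b),
        "dup": lambda: data.append(data[-1]),
        "drop": lambda: data.pop(),
        "swap": lambda: data.extend([data.pop(), data.pop()]),
    }

    tokens, ip = source.split(), 0
    while True:
        if ip >= len(tokens):
            if not returns:
                break                       # end of the top-level program
            tokens, ip = returns.pop()      # return from a user-defined word
            continue
        tok = tokens[ip]
        ip += 1
        if tok == ":":                      # definition  : name body... ;
            end = tokens.index(";", ip)
            words[tokens[ip]] = tokens[ip + 1:end]
            ip = end + 1
        elif tok in words:                  # call: remember where to resume
            returns.append((tokens, ip))
            tokens, ip = words[tok], 0
        elif tok in primitives:
            primitives[tok]()
        else:
            data.append(int(tok))           # anything else must be a number
    return data


print(run_forth(": square dup * ; 3 square 4 +"))  # prints [13]
```

User-defined words are stored as plain token lists, which is roughly why new language features can be added from inside the language once the core loop exists.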

There is not really much syntax to speak of, and most of the language's
features, even the comments, can be implemented in the language itself. Of
course for speed reasons you'd probably want to put the language forms into
highly optimized assembly forms or something, but you don't have to do that
right away.

There also isn't much of a runtime system to implement at all. With Lisp
you'll have to write a garbage collector, with Forth, you just need to be able
to allocate a slab of memory from the OS, maintain the stacks, and the word
(like a function in another language) and variable dictionaries. Oh, and
including a copy of the compiler to run in interactive mode, but you'd need to
do that in Lisp too.

Also, writing a Forth compiler lets you jump right into advanced
optimizations. Really even just aggressive inlining and basic optimizations
will get you a huge speed increase, which is a great motivational boost. The
Reverse Polish Notation stack form of the language is great for matching
optimization patterns against as-is. I know SSA is pretty much THE technique to
use for register based languages, but you can get great results from Forth by
optimizing as is instead of converting to SSA.
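
As a rough illustration of matching patterns directly on the stack form, here is a small peephole optimizer sketch in Python. The function names and the three rewrite rules are assumptions chosen for the example, and it assumes `+`, `*`, `dup`, `drop` and `swap` have not been redefined and that the stack is non-empty where `dup drop` appears.

```python
def is_int(tok):
    """True if the token is an integer literal."""
    try:
        int(tok)
        return True
    except ValueError:
        return False


def peephole(tokens):
    """Apply simple peephole rewrites to a list of Forth-like tokens."""
    fold = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}
    out = []
    for tok in tokens:
        out.append(tok)
        changed = True
        while changed:
            changed = False
            # Constant folding: <int> <int> <op>  ->  <int>
            if (len(out) >= 3 and out[-1] in fold
                    and is_int(out[-2]) and is_int(out[-3])):
                result = fold[out[-1]](int(out[-3]), int(out[-2]))
                out[-3:] = [str(result)]
                changed = True
            # "dup drop" leaves the stack unchanged, so remove both words
            elif out[-2:] == ["dup", "drop"]:
                del out[-2:]
                changed = True
            # "swap swap" also cancels out
            elif out[-2:] == ["swap", "swap"]:
                del out[-2:]
                changed = True
    return out


print(peephole("2 3 + 4 *".split()))  # prints ['20']
```

Because the patterns are just suffixes of the token list, no intermediate representation is needed. Inlining a word's body before running this pass gives the rules more to match.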

I really think Forth is a great language to learn compilers with. It really
lets you skip ahead to the interesting parts of compilers, since the language
doesn't require complicated parsers, or runtime systems.

~~~
prestonbriggs
I disagree. Forth is cute and you'll learn some stuff, but you won't become a
compiler wizard. You _need_ to learn how to do parsers and run-time systems
and all the rest.

Besides, it's fun!

~~~
tbirdz
I see what you're saying, but respectfully I disagree. The importance of
parsers and run-time systems in compilers really depends on the language you
are compiling. There's a big spectrum there.

For runtime systems, at one end you have languages where they are very
important: garbage collectors in Lisp, Java, and similar languages, or
Prolog's runtime backtracking. In the middle is C++, with runtime exception
support but no garbage collection, and at the other end are languages like C
with no garbage collection and a minimal runtime. Forth is certainly on the
lower end of the compiler runtime implementation spectrum.

For parsing, there's a similar spectrum. There are languages with hideously
complicated grammars, like C++, less complex ones like Lisp, and probably
again on the simplest end of the spectrum is Forth.

So the requirement of needing to learn to write complex parsers and runtime
systems is dependent on the language. I'd certainly classify a C compiler
author as a "wizard" even though they wouldn't have to write a garbage
collector for their C compiler, and a Lisp compiler author as a "wizard" even
though their parser is much less difficult to create than a C++ language
parser.

Of course, it's not like wizard is an exact term or anything. If you're saying
a "compiler wizard" should be able to understand how to implement any
language, then of course you need to have in-depth knowledge of parsers and
run-time systems. Me personally, I'd say if you can write a compiler for any
language you're somewhere on the compiler magician spectrum. Maybe not full
wizard, but at least a compiler warlock or enchanter or something (this is
getting silly now, I don't really want to argue about the semantics of the
term wizard applied to compilers).

~~~
mcbits
I'm far from a Forth wizard, but I've often heard it compared to a flexible
assembly language for a stack machine. It seems like it could be the
foundation for a visual programming language based on linking word blocks end-
to-end, where you could zoom into user-defined words to get to the definition,
etc. The simplicity of the language would/could prevent it degenerating into a
tangled web of connections as when most languages are visualized.

------
UK-AL
It's much more useful to learn about constructing NFAs/DFAs from 'pure'
regular expressions than, say, just knowing regular expression syntax by
heart. This basically gets you to understand regular grammars, probably the
easiest formal grammars to understand. If you understand state machines, you
know what regular grammars are capable of.
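
For anyone who wants to try this, below is a sketch of the subset construction (NFA to DFA) in Python. The NFA is hand-written for the regular expression `(a|b)*abb` rather than generated from it, and it has no epsilon transitions, so the epsilon-closure step of the full algorithm is left out. All names are made up for the example.

```python
def nfa_to_dfa(nfa, start, accepting, alphabet):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    start_set = frozenset([start])
    dfa = {}                      # frozenset -> {symbol: frozenset}
    worklist = [start_set]
    while worklist:
        current = worklist.pop()
        if current in dfa:
            continue
        dfa[current] = {}
        for symbol in alphabet:
            # Union of everything reachable on `symbol` from any member state.
            target = frozenset(
                nxt
                for state in current
                for nxt in nfa.get((state, symbol), ())
            )
            dfa[current][symbol] = target
            if target not in dfa:
                worklist.append(target)
    # A DFA state accepts if it contains at least one accepting NFA state.
    dfa_accepting = {s for s in dfa if s & accepting}
    return dfa, start_set, dfa_accepting


def dfa_accepts(dfa, start, accepting, text):
    """Run the DFA over text (which must only use symbols of the alphabet)."""
    state = start
    for ch in text:
        state = dfa[state][ch]
    return state in accepting


# Hand-written NFA for (a|b)*abb: state 0 loops on a and b, and "guesses"
# the start of the final "abb" by also moving to state 1 on 'a'.
NFA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
DFA, START, ACCEPTING = nfa_to_dfa(NFA, start=0, accepting={3}, alphabet="ab")

print(len(DFA))                                   # prints 4
print(dfa_accepts(DFA, START, ACCEPTING, "aabb"))  # prints True
print(dfa_accepts(DFA, START, ACCEPTING, "abba"))  # prints False
```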

Where are all the different types of parsing methods? Recursive descent, LL
parsers, LR parsers, LALR parsers. Being able to understand production rules?
The Chomsky hierarchy of formal grammars?

Then converting that into an AST -> Optimization (so much stuff here) -> Code
emitter.
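
A toy version of that pipeline fits in a few lines. The sketch below assumes an AST made of nested tuples `(op, left, right)` with integers and variable names as leaves, a single optimization (constant folding), and an invented stack-machine instruction set (`PUSH`, `LOAD`, `ADD`, `MUL`). Real compilers have far more in the middle stage.

```python
def fold_constants(node):
    """Optimization pass: evaluate operators whose operands are both constants."""
    if isinstance(node, tuple):
        op, left, right = node
        left, right = fold_constants(left), fold_constants(right)
        if isinstance(left, int) and isinstance(right, int):
            return left + right if op == "+" else left * right
        return (op, left, right)
    return node  # leaf: integer constant or variable name


def emit(node):
    """Code emitter: post-order walk producing stack-machine instructions."""
    if isinstance(node, tuple):
        op, left, right = node
        return emit(left) + emit(right) + ["ADD" if op == "+" else "MUL"]
    if isinstance(node, int):
        return [f"PUSH {node}"]
    return [f"LOAD {node}"]


ast = ("+", "x", ("*", 2, 3))           # x + 2 * 3
print(emit(fold_constants(ast)))        # prints ['LOAD x', 'PUSH 6', 'ADD']
```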

~~~
unimpressive
Coursera/Stanford Online's compilers course covers all this.

[https://class.coursera.org/compilers/](https://class.coursera.org/compilers/)

------
ori_b
This is... rather awful. I'd suggest taking a look at the reading list here
instead:

[http://c9x.me/compile/bib/](http://c9x.me/compile/bib/)

This article goes through a number of unrelated exercises, mentions a few
hello-world level introductions, and then skips to a more or less unrelated
introduction to open source, with a rather bizarre criticism of people who
contribute to open source without getting paid for it.

If it was written for 'becoming a C wizard', I would imagine that it would
read something along the lines of:

    
    
       - Say 'hello' a few times
       - Write some words on paper
       - type 'print "hello"' into a python prompt
       - Work through the "hello world" program in K&R
       - You're a terrible person if you learn C without getting paid.

~~~
joe_the_user
Yes,

In fact, it's even worse than that. You can become one kind of "C wizard" just
by writing a bunch of programs since a lot of programming is fairly clear once
someone understands function calls and for-loops.

But understanding compiler construction at a reasonable level requires a
specific set of skills that doesn't come from just messing around.

------
xigency
These are definitely some interesting ideas on breaking into a new discipline.

The little projects list is great and it's definitely good motivation to start
improving skills.

One problem I have with this article, though, is not with the idea of becoming
a wizard at something, which is a great way of looking at things, but rather
the idea of joining a special group.

Personally, I have a bit of an issue with using LLVM to get started. Maybe
this isn't fair because I have literally no experience with LLVM outside of
clang, but I think part of the idea of becoming a "wizard" means starting from
scratch. Obviously, it's not helpful to point someone in the direction of
impossible tasks to get them _motivated_, but the direction this heads
quickly turns toward not becoming a leading independent thinker but becoming a
mindless bug fixer on another project. I don't think you can learn to write
compilers by fixing other people's bugs.

It turns to: solving bugs, getting involved in open source, jumping on
_someone else's_ projects, and most of all projects where others are best
suited to gain from the developments. It reminds me of the flowchart "Should I
work for free?" which programmers should really pay attention to.

It's great to work with others, to collaborate, and to appreciate and
understand others' technology; however, computer science and programming are
rather miraculous in the amount of distance an individual can travel on their
own, and being forced to jump on a bandwagon or join a community seems to be
demeaning in the end.

~~~
chrisseaton
What are you suggesting then? Learning about how compilers work with no
exposure to any existing compilers?

~~~
xigency
I think it's great to look at existing compilers, but probably better to look
at past compilers. I stumbled upon the first edition of cc the other day:
[https://github.com/mortdeus/legacy-cc](https://github.com/mortdeus/legacy-cc)
It's also helpful to look at the overall structure of something like LLVM or
to study the internals.

What seems totally useless, though, is to fix bugs! Compiler writing is a very
high level field. Fixing bugs in a database server teaches you as much about
compiler writing as fixing bugs in a compiler does: namely, how to write
bug-free code in the given language. So it seems the author is trying to
advertise for free labor.

Also, in the first example

    
    
        Learn regular expressions. This isn’t ... how real compilers work, but regular expressions...
    

This is a good example of why practical experience isn't as valuable as
knowledge, and that's because the class of regular languages is crucial to
compiler writing! So yes, that is how they work, especially for parts and
passes of tokenization and parsing.
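
As a small sketch of regular expressions doing real work in a front end, here is a tokenizer built from one combined pattern using Python's `re` module. The token names and the toy language it lexes are assumptions for illustration.

```python
import re

# Each token class is a regular expression; order matters when several match.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[-+*/=()]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text):
    """Split text into (kind, lexeme) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(
                f"unexpected character {text[pos]!r} at position {pos}"
            )
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("x = 42 + y1"))
```

The parser then works on this token list instead of raw characters, which is where the regular-language part ends and the context-free part begins.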

The best advice, which isn't given here, is to read! There are many, many
great resources on all of the ins-and-outs of front end, backend, optimization
and all the other facets of compiler writing, and there are very few new
ideas. Most of the research dates from the 60's and 70's. It's better to be
well read on compilers than to have actually written one, depending on the
circumstance.

Also, many people always ask for resources on compiler writing, but if you
spend even a little time looking for books you will find tons.

------
_RPM
> Write snprintf in C

A related example of the `snprintf` idea is `PyArg_ParseTuple` from
CPython. [0] It doesn't print formatted output; instead it parses the Python
arguments according to the format string and stores the results through the
pointers you pass as varargs.
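
Both directions come down to walking a format string and consuming one argument per conversion. Here is a rough Python sketch of the printing side. It is a toy that supports only `%d`, `%s` and `%%`, and the name `mini_snprintf` and its tuple return value are assumptions, not the C interface.

```python
def mini_snprintf(size, fmt, *args):
    """Format a tiny subset of C's snprintf: supports %d, %s and %%.

    Returns (text, needed): text holds at most size - 1 characters (C reserves
    one slot for the terminating NUL), needed is the untruncated length.
    """
    pieces = []
    remaining = iter(args)
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch != "%":
            pieces.append(ch)        # ordinary character: copy it through
            i += 1
            continue
        if i + 1 >= len(fmt):
            raise ValueError("format string ends with a lone '%'")
        spec = fmt[i + 1]
        if spec == "%":
            pieces.append("%")
        elif spec == "d":
            pieces.append(str(int(next(remaining))))
        elif spec == "s":
            pieces.append(str(next(remaining)))
        else:
            raise ValueError(f"unsupported conversion %{spec}")
        i += 2
    full = "".join(pieces)
    return full[:max(size - 1, 0)], len(full)


print(mini_snprintf(64, "%s has %d items (100%%)", "cart", 3))
print(mini_snprintf(5, "%d", 123456))  # prints ('1234', 6)
```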

[0]
[https://docs.python.org/2/c-api/arg.html#c.PyArg_ParseTuple](https://docs.python.org/2/c-api/arg.html#c.PyArg_ParseTuple)

~~~
akkartik
> Write snprintf in assembly

See the implementation of _interpolate_ in my Assembly-like language:
[http://akkartik.github.io/mu/html/070text.mu.html](http://akkartik.github.io/mu/html/070text.mu.html)
(search for 'interpolate'. There's also unit tests: the scenarios after the
definition)

More details: [http://akkartik.name/post/mu](http://akkartik.name/post/mu)

------
sebg
Whenever compilers come up, this Steve Yegge article comes to mind ->
[http://steve-yegge.blogspot.com/2007/06/rich-programmer-food...](http://steve-yegge.blogspot.com/2007/06/rich-programmer-food.html)
that starts: "Gentle, yet insistent executive summary: If you don't know how
compilers work, then you don't know how computers work. If you're not 100%
sure whether you know how compilers work, then you don't know how they work."

~~~
joe_the_user
_Gentle, yet insistent executive summary: If you don't know how compilers
work, then you don't know how computers work. If you're not 100% sure whether
you know how compilers work, then you don't know how they work._

As opaque and arrogant as that sounds, the statement is actually fairly
accurate.

The way I'd try to put it nicely is that compiler construction involves some
conceptual barriers that make constructing an interpreter or compiler
different than other programs, even other complex programs.

The main difference is that a programming language is an abstract object on a
different logical level than, say, a real world object.

I think the best way to understand this is constructing a recursive descent
parser. Doing so requires one to transform the language you're parsing and so
get an idea of abstract languages. A lot of the advice people give, such as
using regular expressions, isn't a way to get a complete picture of what's
happening in the parser or compiler construction process.
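
A minimal sketch of what that looks like, in Python, for arithmetic expressions. The grammar here is an assumption picked for the example. The natural left-recursive rule `expr -> expr '+' term` has been transformed into a loop, `expr -> term (('+' | '-') term)*`, which is the kind of grammar rewriting recursive descent forces on you.

```python
import re


def parse(text):
    """Parse an arithmetic expression into a nested-tuple AST.

    Grammar (one function per rule):
        expr   -> term (('+' | '-') term)*
        term   -> factor (('*' | '/') factor)*
        factor -> NUMBER | '(' expr ')'
    """
    # Crude lexer: characters that match neither alternative are ignored.
    tokens = re.findall(r"\d+|[-+*/()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())    # builds a left-associative tree
        return node

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():
        tok = advance()
        if tok == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"unexpected token {tok!r}")
        return int(tok)

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse("1 + 2 * 3"))    # prints ('+', 1, ('*', 2, 3))
print(parse("8 - 3 - 2"))    # prints ('-', ('-', 8, 3), 2)
```

Precedence falls out of which rule calls which, and associativity falls out of how the loop builds the tree. Neither is visible if all you have seen is a regular expression.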

------
jbb555
This was an ok article before it started ranting on about privilege and bias
and codes of conduct for projects. I lost interest then.

~~~
sanderjd
That wasn't ranting, it was cogent and well-structured and made good points.
If you disagree with those points, say so, but don't diminish it by calling it
a rant.

I think those points are extremely relevant for an article targeted at people
trying to get involved in open source. The for-pay vs. free-time divide is a
fundamental problem with using open source work as an "in" to the industry,
and despite being scoff-worthy to some people, it's helpful for people coming
into this with fresh eyes (i.e. the target audience of the article) to be given
some context.

~~~
xigency
It sounds like the author is being categorically dismissive of a group of
people that I can't even match. Is he against people with salaries developing
software? Is he against people who aren't paid writing software? Should
people who aren't paid be paid and those employed be fired? Should whites
pursue primarily agrarian careers so that non-whites can pick up technology-
minded hobbies in order to transition to lucrative careers? The start and end
point of his discussion in that paragraph form a very narrow loop which defies
logic.

--

On privilege:

I completely disagree with the necessity of being engaged in a community to be
a compiler developer or to learn any CS domain.

But stating that free time is a privilege is a bit absurd. You could be a
23-hour-a-day gardener and still write a compiler. It's not as if there's a
time limit.

Dividing by gender is ridiculous---the first computer programmer was a woman.

All that it takes to become a compiler enthusiast is the understanding of
languages and computers. Having a pencil and paper is a plus.

Alan Turing was gay but it didn't prevent him from inventing a form of
universal computer.

People should be celebrated for their differences, not admonished for not
having enough outlying characteristics.

Remember, the first programmers had neither computers nor compilers, and the
first compiler writers had punch cards, so who exactly is privileged?

~~~
dajomu
He's suggesting that for a lot of people getting involved in open source it is
a means to an end. Whether that is getting a job in a new field, increasing
exposure or experience, etc... Usually that end is something monetary or
status related. What he is then saying is that some parts of the population
are more privileged than others when it comes to the free time and resources
needed to achieve that. It's not necessarily just that they have the resources
now, but that they may have had them in the past too (upbringing, access to
education, encouragement, etc).

So the discussion of privilege shouldn't necessarily be seen as a negative
force trying to tear down white men, but should inform our decision making
when we take other people's hardships into account. As a white male myself I
understand that I have had better opportunities than a large part of the
population. I'd rather take that into account and try to improve the fairness
in society, than to look away from it. What to actually do about it is another
matter... But acknowledging that it's a problem is the first step.

------
jdc0589
something about parsing text and generating something meaningful from it is
just super gratifying; doesn't even have to be a full blown compiler.

Some of the most enjoyable programming experiences I have had were working on
trivial parsing + transformation; e.g. a css beautifier, html beautifier, and
whatever the hell this is
[https://github.com/jdc0589/jtranslate](https://github.com/jdc0589/jtranslate)
(I wrote it in college, I still have no idea what I meant its purpose to be)

I've done a couple things closer to full blown compilers (C-- compiler, hand
written pascal subset compiler + one using a parser generator for the same
language subset). As fun as those were, the simpler stuff was more enjoyable
for me.

~~~
nojvek
I 100% agree with this. Making a computer understand a domain of text and
manipulate it in interesting ways is amazing. Web crawlers are similar.

It feels like magic when the product works.

~~~
jdc0589
hah, we have similar preferences for what we consider "fun". The last pet
project I worked on was basically the same as
[https://www.kimonolabs.com/](https://www.kimonolabs.com/), but with a
different paradigm for writing crawlers/parsers (I hate wysiwyg/point+click
interfaces for shit like this). They launched their product when I was taking
a break from mine, and implemented most of my feature set though :(

------
prestonbriggs
List seems inadequate. I guess it's enough to get you going working on
compilers, but there's not much wizardry on that list.

Read papers. Learn about data-flow analysis and optimization. Learn code
generation (instruction selection, register allocation, instruction
scheduling). Learn about interprocedural data-flow analysis. Learn/build a
garbage collector or 2. Parallelization, vectorization, type analysis. Write
an optimizer for Go. Write papers of your own.
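
As a taste of the data-flow part, here is a sketch of iterative liveness analysis in Python, the kind of information a register allocator consumes. The block names and the tiny control-flow graph are made up for the example, and each block is assumed to be already summarized as a (use, def) pair of variable sets.

```python
def liveness(blocks, successors):
    """Iterative backward data-flow analysis computing live-in/live-out sets.

    blocks:     {name: (use, defs)} where use = variables read before any
                write in the block, defs = variables written in the block.
    successors: {name: [names of successor blocks]}.
    """
    live_in = {name: set() for name in blocks}
    live_out = {name: set() for name in blocks}
    changed = True
    while changed:                       # repeat until a fixed point is reached
        changed = False
        for name, (use, defs) in blocks.items():
            # live-out = union of live-in over all successors
            out = set()
            for succ in successors.get(name, []):
                out |= live_in[succ]
            # live-in = use + (live-out minus what the block overwrites)
            new_in = use | (out - defs)
            if out != live_out[name] or new_in != live_in[name]:
                live_out[name], live_in[name] = out, new_in
                changed = True
    return live_in, live_out


# Hypothetical CFG for:  i = 0; s = 0; while i < n: s = s + i; i = i + 1
#                        return s
BLOCKS = {
    "entry": (set(), {"i", "s"}),
    "loop": ({"i", "n", "s"}, {"i", "s"}),
    "exit": ({"s"}, set()),
}
SUCCESSORS = {"entry": ["loop"], "loop": ["loop", "exit"], "exit": []}

LIVE_IN, LIVE_OUT = liveness(BLOCKS, SUCCESSORS)
print(LIVE_IN["entry"])   # prints {'n'}
```

Only `n` is live on entry because `i` and `s` are both written before they are read. Two variables that are never live at the same time can share a register.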

~~~
xigency
I'm not sure that the author cares very much about CPU registers.

~~~
prestonbriggs
Sure he does... Anyone advocating for assembly language necessarily cares
about registers.

------
melling
I've got a collection of resources on compilers and interpreters:

[https://github.com/melling/ComputerLanguages/blob/master/com...](https://github.com/melling/ComputerLanguages/blob/master/compilers.org)

~~~
codewritinfool
Excellent resource, thank you!

------
chrisseaton
If anyone wants to get involved with an open source Ruby JIT compiler you can
join us in #jruby or
[https://gitter.im/jruby/jruby](https://gitter.im/jruby/jruby). Happy to tutor
beginners.

------
WalterBright
An article I wrote on the same subject:
[https://www.digitalmars.com/articles/b89.html](https://www.digitalmars.com/articles/b89.html)

~~~
ori_b
You may want to update your certificates -- Chrome complains when I try to
view the page.

~~~
WalterBright
Yeah, I need to do that. For the moment, use this instead:
[http://www.digitalmars.com/articles/b89.html](http://www.digitalmars.com/articles/b89.html)

