
An Incremental Approach to Compiler Construction (2006) [pdf] - xorbox
http://schemeworkshop.org/2006/11-ghuloum.pdf
======
jpolitz
I love this paper! I built my compilers course out of the ideas in it:

[https://www.cs.swarthmore.edu/~jpolitz/cs75/s16/index.html](https://www.cs.swarthmore.edu/~jpolitz/cs75/s16/index.html)

I think the incremental approach is terrific, because it allows you to get to
a program the emits assembly and builds a working binary in week one. The
first thing this does is give a concrete example of what "a compiler" is. The
second is to provide a great foundation for discussing static vs. dynamic and
what decisions are made at compilation time vs. runtime, without needing a
full implementation. These concepts are not obvious (e.g. when can and should
a check for unbound ids happen? What about divide by zero, or overflow, or
type mismatch?), and deserve to be carefully taught and considered.

This lets the course build up a new feature, from front-to-back, each week or
two, and consider its implications on the whole pipeline each time.

~~~
bronxbomber92
Looks like a very a nice course! The notes are protected by a Swarthmore login
page, but the videos are not; are the notes meant to be inaccessible?

~~~
jpolitz
Thanks for asking! There's a public mirror of a bunch of the starter code at:

[https://github.com/compilers-course-materials/](https://github.com/compilers-
course-materials/)

The lecture notes (as SVG/source code) are at:

[https://github.com/compilers-course-
materials/cs75-s16-lectu...](https://github.com/compilers-course-
materials/cs75-s16-lectures)

in particular.

------
userbinator
I find it a little amusing that it claims "toy compilers are too simplistic
and do not prepare the novice compiler writer to construct a useful compiler",
and then chooses Scheme, effectively bypassing the whole process of parsing...
which IMHO is half the fun of writing a compiler.

Seeing the elegance of recursive descent (and refactoring it into table-driven
operator-precedence) and how it can turn recursive structures like expressions
and statement blocks into linear sequences of instructions is something I
think everyone writing a compiler should experience.

However, I agree with an incremental approach, and that is nothing new;
personally I think Jack Crenshaw's tutorial series (started in 1988) is one of
the best:

[http://compilers.iecc.com/crenshaw/](http://compilers.iecc.com/crenshaw/)
(original 68k version)

[http://www.pp4s.co.uk/main/tu-trans-comp-jc-
intro.html](http://www.pp4s.co.uk/main/tu-trans-comp-jc-intro.html) (x86
version)

It's interesting to note that Jack doesn't specialise in compilers, nor
computer science for that matter, yet his explanations are far clearer to me
than many of those who do.

~~~
jpolitz
> how it can turn recursive structures like expressions and statement blocks
> into linear sequences of instructions

I think this is one of the key concepts in a compiler, yet I don't think it
has much to do with parsing.

Parsing (in a compiler) is the act of taking a linear sequence of characters
or tokens, and turning them into a rich AST structure.

The back end of the compiler is what takes this recursive structure and
flattens it out into a linear sequence of instructions that "mean" the same
thing as the tree.

Parsing is critical, of course, for building a working compiler. But I think
not focusing on surface syntax gets to the key recursive-to-linear insight you
mention more directly!

~~~
vidarh
> and turning them into a rich AST structure.

This is not a necessary step, and is specifically not done in a lot of simpler
compilers. E.g. most "Wirth-type" compilers never build an AST.

There are lots of good reasons to build ASTs if you want to do complex
analysis and optimisation, and it simplifies the presentation of some things,
but calling the code generator module straight from the parser in many ways
makes the direction connection clearer.

------
bsznjyewgd
Maybe related, Essentials of Compilation: An Incremental Approach:

[https://www.sharelatex.com/project/5637a774990f556d48bab667](https://www.sharelatex.com/project/5637a774990f556d48bab667)

"Abdulaziz proposed an incremental approach in which the students build the
compiler in stages; they start by implementing a complete compiler for a very
small subset of the input language, then in each subsequent stage they add a
feature to the input language and add or modify passes to handle the new
feature [...] This is the textbook for the incremental version of the compiler
course at Indiana University"

~~~
jpolitz
Thanks, this a terrific reference that I hadn't seen!

I see a few interesting contrasts with the my course off the bat:

\- I skipped forcing students to implement uniquifying variables early on. I
wanted to get from source to assembly as quickly as possible, but this may
have been a worthwhile step to force, since it teaches some valuable lessons.

\- I had them implement ANF (AKA flatten in those notes) in a different style,
because I was used to it. The style in that textbook is better (Ranjit Jhala
at UC San Diego also pointed this out when he taught a version of the course),
where an expression is turned into a list of bindings and a final expression,
rather than using a continuation-passing ANF algorithm as I did.

\- The linked book has instruction selection with semi-abstract addresses
followed by a "patching" phase, to avoid instructions that, say, move from
stack location to stack location. This is cool, but is a few more steps than I
wanted to get into for the simplest compilers. I didn't pursue as structured
an approach, and instead gave a simple description to the first few "compile"
functions we wrote:

    
    
        Generate a list of instructions that gets the "answer"
        for this expression into EAX
    

This avoided detailed discussion of instruction selection, abstract vs.
concrete addresses, etc, in the beginning, and just made students generate
some instructions that work. We quite quickly after that needed to start
talking about issues of clobbering other expressions' state, and finding
unique locations for variables, but then those just became constraints on
"getting the answer into EAX" correctly. In fact, I found that repeating the
mantra "get the answer into EAX" was a nicely actionable way to go about
generating instructions, and letting students get started when we introduced a
new feature (to compile it, we need to get the answer into EAX!)

After we'd done this a few times, we had the shared vocabulary to realize that
answers don't _always_ have to go to EAX (the compiler can be parameterized
over where the current answer should go), and not every constant's value needs
to flow through EAX in order to get to its variable's home, etc. But these
were refinements on top of our dead simple strategy.

Just some thoughts I had while perusing the first few chapters here. Thanks
again for sharing!

------
616c
Is this preceding work to the tour de force that is the Chez Scheme
incremental like 40+ Scheme compiler. I mean, Cisco bought it, probably to
hide it from us all. It must be worth something.

[https://www.youtube.com/watch?v=Os7FE3J-U5Q](https://www.youtube.com/watch?v=Os7FE3J-U5Q)

[http://lambda-the-ultimate.org/node/1589](http://lambda-the-
ultimate.org/node/1589)

[https://github.com/akeep/nanopass-
framework](https://github.com/akeep/nanopass-framework)

~~~
tbirdz
Not true. Chez Scheme has been released as free open source software:
[https://github.com/cisco/ChezScheme](https://github.com/cisco/ChezScheme)

~~~
616c
As someone pointed out, yeah, open-sourced _after_ bought. There used to be
variations. One, compiled, Chez Scheme, and the latter interpreted with none
of the benefits of the compiler design but still fast and well desiged in
relation to other Schemes, Petite Chez Scheme. The latter was freeware, the
former, neither free/open nor freeware.

It was never clear why Cisco bought it. Rumors persist ...

------
lacampbell
I've just realised that I have confused this paper with another paper, which I
think was from the racket people and involved what they called "nano-passes".
I think I've actually glossed over this paper many times for this reasons.

I'm not interested in writing a scheme compiler (there are several good ones
out there already), but this looks like it has a lot of useful techniques I
can pilfer.

------
melling
Prior submission with good comments:

[https://news.ycombinator.com/item?id=10785164](https://news.ycombinator.com/item?id=10785164)

