
Thought Experiment: An Introductory Compilers Class - ingve
https://semantic-domain.blogspot.com/2020/02/thought-experiment-introductory.html
======
andrewflnr
I disagree a bit with the premise. I feel that the most important part of a
compilers course is communicating that _compilers are not magic_ , that none
of this is magic. Good developers need to be able to see through their
abstractions when necessary, and it's too easy to treat compilers as a black
box.

As a slightly different concern, I think the curriculum above runs the risk of
giving the impression that even if compilers aren't magic, they definitely
need a lot of heavy-duty abstract math. I'm not in favor of this either.

With that in mind, my ideal compilers course would probably take an intuitive
approach to lexing, recursive-descent parsing, and extremely simple tree-
walking, stack-based code generation. Keep calm, it's all just data and basic
algorithms.
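A minimal sketch of that kind of tree-walking, stack-based code generation (the node tuples and instruction names here are invented for illustration, not from any particular course):

```python
# Hypothetical sketch: compile an expression tree to stack-machine
# instructions, then run them. Node encodings like ("add", l, r) are made up.

def gen(node):
    """Emit stack-machine instructions for an expression tree, postorder."""
    kind = node[0]
    if kind == "num":                       # ("num", 7)
        return [("push", node[1])]
    if kind == "add":                       # ("add", left, right)
        return gen(node[1]) + gen(node[2]) + [("add",)]
    if kind == "mul":                       # ("mul", left, right)
        return gen(node[1]) + gen(node[2]) + [("mul",)]
    raise ValueError(f"unknown node {kind!r}")

def run(code):
    """Evaluate the emitted instructions on a simple operand stack."""
    stack = []
    for op, *args in code:
        if op == "push":
            stack.append(args[0])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack.pop()

# (2 + 3) * 4
tree = ("mul", ("add", ("num", 2), ("num", 3)), ("num", 4))
print(run(gen(tree)))  # 20
```

The whole thing really is just data and basic algorithms: a postorder traversal plus a list.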

My college education had the effect of lowering my expectations of my fellow
students, so a more sophisticated course may be appropriate in some cases.
YMMV.

~~~
jcranmer
Personally, I feel that a compilers course should actually be about learning
how to build a compiler. After all, the only reason to take a compiler course
is because you want to--it's not going to be on the necessary track for a
degree. The attitude of "you're not going to remember anything about compilers
anyways, so let's make this an advanced math class" is quite frankly
insulting, most especially to people who _actually_ want to take a compilers
course to learn about compilers.

~~~
andrewflnr
I mean, I wouldn't object to having compilers be mandatory. Out of the other
side of my mouth, I like to say that a compiler has great properties for
teaching actual software engineering, e.g. managing complexity, dealing with
changes (hey kids, we're adding a new syntax feature! Now we're adding
closures!), while still teaching you that compilers aren't magic... so that's
another approach I would endorse. At that level, you actually would want some
formality, but you could add it gradually.

------
abatoir
I tried to add a comment on the source page, but that's not possible if you
don't have a Google account. So I'll add it here.

There are two great resources for an introduction to compilers. First, "The
Unix Programming Environment" has a chapter on programming tools which builds
the equivalent of the Unix 'dc' desk calculator. This includes variables and
procedures, so it's non-trivial.

The second is Jack Crenshaw's "Let's Build a Compiler", first posted to
USENET. It is notable because it builds a recursive-descent compiler, rather
than the usual LALR ones.

Considering the intent was a course for beginners, I think much of what that
page wants to cover is more suitable for a _second_ course. If you go through
the UPE book first, you'll be in a much better position to slog through the
Dragon book and get all those low-level things.

------
not2b
Introductory compilers classes should drop LR parsing as a topic. It takes a
huge amount of time to teach and while it is mathematically elegant, it isn't
really a good approach to writing a compiler front end that produces high-
quality error messages. Teach recursive descent and move on to more
interesting topics.
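Recursive descent is also about the simplest parsing technique to show in code: one function per grammar rule. A toy sketch for a "+"/"*" grammar over single digits (the grammar and names are illustrative, not from any course):

```python
# Hypothetical sketch: a recursive-descent parser, one function per rule.
#
#   expr   := term ("+" term)*
#   term   := factor ("*" factor)*
#   factor := digit | "(" expr ")"

def parse(src):
    pos = 0

    def peek():
        return src[pos] if pos < len(src) else ""

    def eat(ch):
        nonlocal pos
        if peek() != ch:
            raise SyntaxError(f"expected {ch!r} at position {pos}")
        pos += 1

    def expr():
        node = term()
        while peek() == "+":
            eat("+")
            node = ("+", node, term())
        return node

    def term():
        node = factor()
        while peek() == "*":
            eat("*")
            node = ("*", node, factor())
        return node

    def factor():
        nonlocal pos
        if peek() == "(":
            eat("(")
            node = expr()
            eat(")")
            return node
        if peek().isdigit():
            node = ("num", int(peek()))
            pos += 1
            return node
        raise SyntaxError(f"unexpected {peek()!r} at position {pos}")

    tree = expr()
    if pos != len(src):
        raise SyntaxError("trailing input")
    return tree

print(parse("1+2*3"))  # ('+', ('num', 1), ('*', ('num', 2), ('num', 3)))
```

Note how precedence falls out of which function calls which, and how natural it is to attach a position to every error message, which is exactly the error-quality argument against table-driven LR.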

~~~
ken
I'd go even further. Minimize the exposure time to parsers.

We started my undergrad compilers class with several weeks on parsing. I think
it's by far the hardest aspect in all of compilerdom -- and as a Lisp
programmer, I didn't understand the need to be tortured that way.

I do understand on a theory level why parsers would get put first in a
compiler class, but they're a tremendous barrier to entry and just make the
whole subject seem inaccessible. There's lots of other ways I can think of
which would make it easier for newcomers to get some traction on the subject
of compilers.

~~~
UncleOxidant
I don't agree that parsing is the hardest aspect in all of compilerdom; it's
more that parsing is the least interesting part of building a compiler.
Perhaps, as you allude to, use Lisp or S-expressions as the frontend
language, in which case you don't need much of a parser and can get straight
to the more interesting bits.
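To make the point concrete, an S-expression reader fits in a few lines, which is roughly the entire "parsing" budget of a Lisp frontend (a hypothetical sketch; the function name and encoding are made up):

```python
# Hypothetical sketch: a whole S-expression "parser" in ~15 lines.
# Lists become Python lists, numbers become ints, symbols stay strings.

def read_sexpr(src):
    tokens = src.replace("(", " ( ").replace(")", " ) ").split()

    def read(i):
        if tokens[i] == "(":
            items, i = [], i + 1
            while tokens[i] != ")":
                item, i = read(i)
                items.append(item)
            return items, i + 1          # skip the ")"
        tok = tokens[i]
        try:
            return int(tok), i + 1       # atom: number
        except ValueError:
            return tok, i + 1            # atom: symbol

    expr, _ = read(0)
    return expr

print(read_sexpr("(+ 1 (* 2 3))"))  # ['+', 1, ['*', 2, 3]]
```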

------
Animats
This is more like "Compilers for functional languages I", to be followed by
about two more courses.

Non-optimizing compilers for simple imperative languages are straightforward.
There's a front end, which turns

    a = b + c;

into

    b c + a =

which can be represented as a tree. Then, for each operator, you generate
code. So you generate something like

    load a0,b
    add a0,c
    store a0,a

Then you show how that can be generalized to cover functions, arrays, loops,
etc. That's Compilers 101, Wirth's classic book on Pascal.
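The per-operator code generation above can be sketched in a few lines (a hypothetical illustration: the node encoding, the `emit` function, and the `a0` accumulator convention are made up to match the example):

```python
# Hypothetical sketch: walk the tree for "a = b + c" and emit
# accumulator-machine code like the example above.

def emit(node, out):
    """Append instructions for one tree node to the output list."""
    op = node[0]
    if op == "assign":                    # ("assign", "a", rhs)
        emit(node[2], out)                # value ends up in a0
        out.append(f"store a0,{node[1]}")
    elif op == "add":                     # ("add", ("var","b"), ("var","c"))
        out.append(f"load a0,{node[1][1]}")
        out.append(f"add a0,{node[2][1]}")

code = []
emit(("assign", "a", ("add", ("var", "b"), ("var", "c"))), code)
print("\n".join(code))
# load a0,b
# add a0,c
# store a0,a
```

Generalizing to functions, arrays, and loops is then a matter of adding cases to this dispatch.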

This article takes a completely different approach. All functional, all the
time. Not sure whether this is good or bad, but it's definitely different. Is
what this guy is promoting mainstream in advanced compiler work today, or is
he off in his own world?

One issue with functional everything is that naive functional compilation is
very inefficient. If you write

    
    
        f(g(a)) 
    

where a is some kind of big collection, and g produces a similar collection as
output, then you have an intermediate variable which is itself a big
collection. Much of functional-programming optimization is therefore about
avoiding big intermediate values by converting the problem into a pipeline
instead.
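Python generators give a small model of the distinction (a hedged sketch, not how any particular functional compiler does it: the function names are invented):

```python
# Hypothetical sketch of the f(g(a)) problem: materializing g's output builds
# a full intermediate collection, while a generator pipeline fuses the two
# passes so each element flows straight from g into f.

a = range(1_000_000)

def g_materialized(xs):
    """Naive: g's entire output exists in memory before f starts."""
    return [x * 2 for x in xs]          # big intermediate collection

def g_lazy(xs):
    """Fused: a lazy pipeline with constant extra memory."""
    return (x * 2 for x in xs)          # no intermediate collection

# Here f is just sum(); both forms compute the same value.
assert sum(g_materialized(a)) == sum(g_lazy(a))
```

A fusing compiler effectively performs this rewrite for you, turning composed whole-collection operations into one loop.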

~~~
chubot
About the last paragraph, R and Pandas both have that issue. (Although I'm not
sure if it's a bottleneck in typical code, since there are other problems
too.)

But Julia and the newer Weld compiler apparently have loop fusion:

[https://julialang.org/blog/2017/01/moredots/](https://julialang.org/blog/2017/01/moredots/)

[https://dawn.cs.stanford.edu/2018/07/30/weldopt/](https://dawn.cs.stanford.edu/2018/07/30/weldopt/)

------
microtherion
My favorite introductory compiler book is Niklaus Wirth's "Compiler
Construction", which implements a compiler end-to-end with full source code,
in a bit more than 100 pages:
[https://inf.ethz.ch/personal/wirth/CompilerConstruction/inde...](https://inf.ethz.ch/personal/wirth/CompilerConstruction/index.html)

------
seanmcdirmid
Rather than teach compiler construction, maybe we should instead teach
language design with an added implementation component. Compiler construction
on its own seems like too dry of a topic for most students, and I say this as
a PL PhD. Language design might attract more interest, and provides more
context on why you would want to write a compiler in the first place.

~~~
Footpost
That's interesting. What would you teach in the language-design part of the
course? By the time compilers are taught (second or third year of undergrad),
most students don't yet understand advanced language concepts (call/cc,
GADTs, higher-kinded types, type-based lifetime tracking, message-passing vs.
shared-memory parallelism, etc.), let alone the tradeoffs involved in the
design space.

Note also that there is no scientific consensus on what makes for a good
programming language.

------
mjevans
I feel like this might still be too high level. A better approach may be to
teach by example: write a basic FORTH environment from machine instructions,
then basic helpers in the language itself, and work up from there.

Various optimizations and refinements can be layered in once there's
something to optimize.
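The core of such an environment is tiny. A hedged sketch in Python rather than machine instructions (the word set and `define` helper are invented; a real FORTH would build these from assembly as Jones does):

```python
# Hypothetical sketch of a FORTH-like inner interpreter: a data stack plus a
# dictionary of words, where new words are defined in terms of existing ones.

stack = []
words = {
    "+":   lambda: stack.append(stack.pop() + stack.pop()),
    "dup": lambda: stack.append(stack[-1]),
    ".":   lambda: print(stack.pop()),
}

def define(name, body):
    """Define a new word in terms of existing ones (a 'colon definition')."""
    words[name] = lambda: run(body)

def run(src):
    for tok in src.split():
        if tok in words:
            words[tok]()                # execute a known word
        else:
            stack.append(int(tok))      # bare numbers push themselves

define("double", "dup +")               # a helper written in the language
run("21 double .")                      # prints 42
```

Everything after the first few primitives can be written in the language itself, which is exactly the bootstrapping progression being proposed.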

Edit: I was probably thinking of this version that I likely read a couple
years ago:

[https://news.ycombinator.com/item?id=13505285](https://news.ycombinator.com/item?id=13505285)

Quoting their post

"" rwmj on Jan 28, 2017 [-]

This is a how-to-write-a-FORTH tutorial that I wrote a few years ago. It's
particularly nice that you get to see how various control structures are
implemented, like IF, CASE and even comments!

Part 1:
[http://git.annexia.org/?p=jonesforth.git;a=blob;f=jonesforth...](http://git.annexia.org/?p=jonesforth.git;a=blob;f=jonesforth.S)

Part 2:
[http://git.annexia.org/?p=jonesforth.git;a=blob;f=jonesforth...](http://git.annexia.org/?p=jonesforth.git;a=blob;f=jonesforth.f)

(github mirror:
[https://github.com/AlexandreAbreu/jonesforth/blob/master/jon...](https://github.com/AlexandreAbreu/jonesforth/blob/master/jonesforth.S)
[https://github.com/AlexandreAbreu/jonesforth/blob/master/jon...](https://github.com/AlexandreAbreu/jonesforth/blob/master/jonesforth.f)
)

Previous HN comments:
[https://news.ycombinator.com/item?id=10187248](https://news.ycombinator.com/item?id=10187248)

""

The immediate reply covers my feelings for how the __introductory__ class
should proceed: minimal assembly and maximum programmer clarity.

"" rsaarelm on Jan 28, 2017 [-]

Working through Jones Forth got me up to speed on both understanding how Forth
works and getting my hands dirty with some practical assembly coding.

Once I got a better idea of Forth, I also realized that Jones stays in
assembly for rather long. He builds up the entire interpreter from raw
assembly words, because you need the interpreter to start parsing textual
Forth source. But if you could somehow write Forth before the interpreter
is put together, it would be quite natural to switch to Forth much earlier.
And it turns out you can do that. Jones even defines the `defword` macro. But
for some reason he makes very little use of it. Rewriting most of the code
leading up to INTERPRET using defword was a fun exercise.

The next step would've been making the whole system self-bootstrapping by rewriting
the assembler in Forth, but x86 machine code generation is hairy enough that I
bailed out at this point. ""

