
Let’s Build a Compiler (1995) - undreren
https://compilers.iecc.com/crenshaw/
======
eismcc
Looks like it took a number of years for the series to be finished. From
“back to the future”:

I won't spend a lot of time making excuses; only point out that things happen,
and priorities change. In the four years since installment fourteen, I've
managed to get laid off, get divorced, have a nervous breakdown, begin a new
career as a writer, begin another one as a consultant, move, work on two real-
time systems, and raise fourteen baby birds, three pigeons, six possums, and a
duck.

The author has quite a diverse work history:

[https://www.linkedin.com/in/jack-crenshaw](https://www.linkedin.com/in/jack-
crenshaw)

------
merricksb
Previous HN discussions:

[https://news.ycombinator.com/item?id=6641117](https://news.ycombinator.com/item?id=6641117)
(2013)

[https://news.ycombinator.com/item?id=1727004](https://news.ycombinator.com/item?id=1727004)
(2010)

[https://news.ycombinator.com/item?id=232024](https://news.ycombinator.com/item?id=232024)
(2008)

(Links provided for info, not complaining about dupes.)

~~~
svat
Also
[https://news.ycombinator.com/item?id=19890918](https://news.ycombinator.com/item?id=19890918)
discusses [https://xmonader.github.io/letsbuildacompiler-
pretty/](https://xmonader.github.io/letsbuildacompiler-pretty/) which is a
prettification of these plain .txt files — thanks to it I was able to read
this series on my phone during lunch breaks.

------
brianobush
I really enjoyed Crenshaw's series on a SWEET16-like interpreter in Embedded
Systems Programming (his Programmer's Toolbox column) around 1999. He has a
charming way of tying technology to history from his point of view.

Found one article:
[https://m.eet.com/media/1171254/toolbox.pdf](https://m.eet.com/media/1171254/toolbox.pdf)

------
ainar-g
I love this series! Does anybody know if anyone has ever made a version that
outputs AMD64 assembly instead of 68000? Or some kind of gentle introduction
to a reduced subset of the AMD64 instruction set, so that I could do it
myself? The instruction set is huge, so having a handful of primitive
instructions that get the job done would be nice.

~~~
barrkel
You can go a long way with not much more than a couple dozen instructions
(mnemonics, that is; once addressing modes are encoded there are more opcodes,
but Jack's compiler delegates that work to the assembler). MOV, MOVZX, JMP,
CALL, RET, CMP, TEST, Jcc, ADD, SUB, MUL, DIV, AND, OR, NOT, XOR, INC, DEC
would about cover it for basic control flow and integer operations.
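
As a purely illustrative sketch (mine, not from the series), here is a toy
Python code generator that gets by with MOV, ADD and SUB from that list, plus
PUSH/POP to spill the left operand while the right one is evaluated:

```python
# Toy sketch: integer codegen using only MOV, ADD, SUB from the list above,
# plus PUSH/POP to spill intermediate results to the machine stack.
# Expressions are tuples: ("num", 7) or ("+", left, right).

def emit(node, out):
    op, *args = node
    if op == "num":
        out.append(f"mov rax, {args[0]}")
    else:
        left, right = args
        emit(left, out)
        out.append("push rax")          # spill the left result
        emit(right, out)
        out.append("mov rbx, rax")      # right result into rbx
        out.append("pop rax")           # recover the left result
        out.append("add rax, rbx" if op == "+" else "sub rax, rbx")

code = []
emit(("-", ("+", ("num", 1), ("num", 2)), ("num", 3)), code)
print("\n".join(code))
```

This mirrors what Crenshaw does on the 68000, where intermediate results are
pushed with MOVE D0,-(SP) and the assembler handles the actual encoding.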

Adding floating point arithmetic needs another dozen or so instructions. The
x87 opcodes (FLD, FST, FADD etc.) are very easy to program against, since it's
a stack machine - a post-order traversal of an expression tree is usually
sufficient if the stack won't overflow, though SSE2 instructions are more
usual for x64.
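
The post-order point can be sketched like this (again my own illustration,
with made-up memory slot names c0, c1, ... standing in for where the constants
live):

```python
# Illustrative sketch: a post-order traversal of an expression tree mapped
# straight onto x87 stack-machine instructions. FADDP/FSUBP/FMULP each
# combine the top two stack entries and pop, so no register allocation is
# needed -- as long as the 8-deep x87 stack doesn't overflow.

def emit_x87(node, out):
    op, *args = node
    if op == "const":
        # assume constant i has been placed in a memory slot labelled c<i>
        out.append(f"fld qword [c{args[0]}]")
    else:
        emit_x87(args[0], out)           # left subtree
        emit_x87(args[1], out)           # right subtree
        out.append({"+": "faddp", "-": "fsubp", "*": "fmulp"}[op])

code = []
emit_x87(("*", ("const", 0), ("+", ("const", 1), ("const", 2))), code)
print(code)
# ['fld qword [c0]', 'fld qword [c1]', 'fld qword [c2]', 'faddp', 'fmulp']
```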

Large portions of the full instruction set are vector operations which you
wouldn't realistically be emitting for a didactic toy compiler. You can get by
perfectly well without the string operations too.

~~~
fizixer
I agree with you. However, a quick personal anecdote:

I tried to write an "assembler" in Python for just one instruction: MOV. And
not even the 64- or 32-bit forms (it was either 16-bit or 8-bit).

After scratching my head over all the corner cases and handling about 80% of
them, my Python code looked like a complete mess.
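
To give an idea of what a full encoder is up against, here is a sketch (mine,
not the parent's code) of just one of MOV's many forms, the 8-bit
register-to-register encoding 0x88 /r. Real MOV also has the 0x8A direction,
the 0xB0+r immediate forms, memory addressing modes, segment-register moves
and more, hence the explosion of corner cases:

```python
# One corner of an 8-bit MOV encoder: the register-to-register form,
# opcode 0x88 (MOV r/m8, r8), followed by a ModRM byte with
# mod=11 (register direct), reg=source, rm=destination.

REG8 = {"al": 0, "cl": 1, "dl": 2, "bl": 3, "ah": 4, "ch": 5, "dh": 6, "bh": 7}

def encode_mov_r8_r8(dst, src):
    """Encode MOV dst, src for two 8-bit registers using the 0x88 /r form."""
    modrm = 0b11000000 | (REG8[src] << 3) | REG8[dst]
    return bytes([0x88, modrm])

print(encode_mov_r8_r8("bl", "al").hex())  # mov bl, al -> "88c3"
```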

There is an article floating around (posted here multiple times) that says MOV
is Turing complete [0].

[0]
[https://www.google.com/search?q=MOV+is+turing+complete](https://www.google.com/search?q=MOV+is+turing+complete)

~~~
wolfgke
> There is an article floating around (posted here multiple times) that says
> MOV is turing complete [0].

Don't believe this claim - any computer has access only to a _finite_ amount
of memory, while a Turing machine needs an unbounded tape to work. So the
proof must be false.

In other words: only a mathematically idealized version of MOV might be Turing
complete - but that cannot be the MOV that x86 implements.

~~~
chrisseaton
Come on... _anyone_ reading the claim knows it has a silent ‘(but obviously
with a machine-bound tape)’ at the end of it.

~~~
wolfgke
The class of machines with the ‘(but obviously with a machine-bound tape)’
caveat is even weaker than finite-state machines (because the number of states
is bounded by a fixed constant instead of "just" assumed to be finite).

So if you argue that the silent ‘(but obviously with a machine-bound tape)’
assumption is OK, you argue that it is OK to identify Turing machines with a
class of machines that is even weaker than finite-state machines.

Any professor of (theoretical) computer science will be horrified by such
claims.

~~~
chrisseaton
If your program finishes without reaching the limit of tape then it doesn’t
matter that the tape was limited.

It’s not about classes of machines. A machine with a limited tape is a perfect
stand-in for a machine with an unlimited tape as long as no program you care
about reaches the limit.

Professors won’t be horrified - they use the same silent parenthetical
constraint. Look, for example, at formal publications like Dolan’s and see how
academics talk about it.

------
jimws
This tutorial uses Pascal as the implementation language. Is there a similar
tutorial done in a language that is mainstream today? Like Python? Go? Rust?

~~~
mhh__
Modern Compiler Implementation in [X] follows a similar structure (in much
more depth) and can be consumed in Java, C, or (the correct choice) ML.

------
einpoklum
I would be a bit wary of a 30-year-old book on compiler construction. If it
were more on the theoretical side then fine, but this sounds kind of hands-on.
"Let's build a compiler with 1980s tools!" doesn't sound very appealing.

That is just a shallow impression though. Convince me I'm wrong?

~~~
makotoNagano
Sometimes learning on older technology teaches you more of the fundamentals
than going straight for practical skills.

For example, I studied how to program an STM32F0 microcontroller in assembly.
I would never do that in practice, but it worked wonders in teaching me the
intricacies of a processor at a very low level.

~~~
Athas
Old techniques are not necessarily more fundamental, they are just more
primitive. (And often, but not always, more efficient.)

In particular, old compiler texts will often do things like emit code directly
from the parser, try to minimise the number of AST passes, or keep complex
global symbol tables separate from the AST. These are mostly workarounds for
technical limitations that are no longer relevant, and in fact only obscure
the principles being taught (code generation during parsing is my pet example
of this).

~~~
bakery2k
Isn't code generation during parsing still common today? In particular,
bytecode generation in interpreters (and JIT compilers) for scripting
languages, e.g. Lua?

~~~
Athas
It's sometimes a good idea to do it that way in practice, but it's still a
conflation of two conceptually distinct processes. I think it is a bad
approach when teaching compiler implementation, as it means you avoid the
extremely core concept of an abstract syntax tree.

~~~
badsectoracula
But it isn't a core concept if you do not need it. And an AST builder can be
"injected" between the parser and codegen at a later point in time, if needed.
You do not even need to do it in one go, e.g. if your compiler has something
like a "ParseExpression" (assuming recursive descent parsing that spits out
code as it parses), you can start by making a partial AST just for the
expressions and leave everything else (e.g. declarations, control structures,
assignments - assuming those aren't part of an expression, etc) as-is.

This is useful for both practical and teaching purposes: practical because it
keeps things simple when the additional complexity isn't needed (e.g.
scripting languages), and teaching because the student learns _both_ ways
(each used on real-world problems) while also learning why one might be
preferable to the other. And if you do the partial-AST step, you introduce the
idea of an AST gradually, building on knowledge and experience the student has
already acquired.

