
How to Write Your Own Compiler (2009) - yinso
http://staff.polito.it/silvano.rivoira/HowToWriteYourOwnCompiler.htm
======
bollu
Shameless plug: I've been writing a WIP series about writing a tiny
_optimising_ compiler:
[https://github.com/bollu/tiny-optimising-compiler](https://github.com/bollu/tiny-optimising-compiler).
It tries to model as much as possible, and the aim is to show off the power
that modern compiler ideas bring: SSA and polyhedral compilation.

~~~
chubot
This looks cool, thanks for pointing it out.

I'm somewhat familiar with SSA, but I don't know anything about polyhedral
compilation. What does it buy you? Is there a concise intro you could point
to?

(I thought I had heard someone say it makes your compiler slower for marginal
benefit, but they/I could be wrong about that.)

FWIW I found this introduction to the new Go SSA back end pretty useful as an
overview:

[https://www.youtube.com/watch?v=uTMvKVma5ms](https://www.youtube.com/watch?v=uTMvKVma5ms)

One thing that was interesting is that they seemed to have an LLVM TableGen-
like "rules" DSL for both architecture-independent and -dependent
optimizations / code gen.

I don't think polyhedral compilation was mentioned and I don't think they use
it.

~~~
gone35
See:

[https://en.m.wikipedia.org/wiki/Polytope_model](https://en.m.wikipedia.org/wiki/Polytope_model)
[https://www.cs.indiana.edu/~achauhan/Teaching/B629/2010-Fall...](https://www.cs.indiana.edu/~achauhan/Teaching/B629/2010-Fall/StudentPresns/PolyhedralModelOverview.pdf)
[http://web.cs.ucla.edu/~pouchet/lectures/doc/888.11.5.pdf](http://web.cs.ucla.edu/~pouchet/lectures/doc/888.11.5.pdf)

------
mtkd
One of the best resources I've seen on this is Capon and Jinks 'Compiler
Engineering Using Pascal' (1988; ISBN 0-333-47155-5) if you can get hold of it

[http://www.cs.man.ac.uk/~pjj/book.html](http://www.cs.man.ac.uk/~pjj/book.html)

~~~
jonsen
Also _Brinch Hansen on Pascal Compilers_ :

[https://www.amazon.com/Brinch-Hansen-Pascal-
Compilers/dp/013...](https://www.amazon.com/Brinch-Hansen-Pascal-
Compilers/dp/0130830984/)

~~~
pjmlp
> 1 New from $3,000.00

Wow! I should take care of my old books.

~~~
robertbaruch
Haha, don't forget there has to be a seller AND a buyer for a price to hold
true.

~~~
kazinator
There also has to be _volume_ for a price to hold true. Even if someone buys
that one for $3000, that doesn't mean it's the price if it only happens once.

------
andreasgonewild
May I suggest considering Forth as a substrate? Or will that get me voted
straight into neckbeard land?

I've had a lot of fun hacking Forths; writing my own languages didn't really
click until I finally had a serious look. Forth skips most of the complexity
of getting from bytes to semantics (even Lisp is complex in comparison), yet
it is still powerful enough to do interesting things, and it is the
best foundation for DSL's I've come across.
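
To make the "bytes to semantics" point concrete, here is a minimal sketch of a Forth-like evaluator in Python. It is not Snabel and not a real Forth: the word set (`+`, `*`, `dup`, `swap`, `drop`) and the `run` function are assumptions chosen for illustration. The point is only that "parsing" reduces to splitting on whitespace.

```python
def run(source, stack=None):
    """Evaluate a whitespace-separated, Forth-like program; return the stack."""
    stack = [] if stack is None else stack

    # Hypothetical word set: each word is a function that mutates the stack.
    words = {
        "+": lambda s: s.append(s.pop() + s.pop()),
        "*": lambda s: s.append(s.pop() * s.pop()),
        "dup": lambda s: s.append(s[-1]),
        "swap": lambda s: s.extend([s.pop(), s.pop()]),
        "drop": lambda s: s.pop(),
    }

    for token in source.split():      # the whole "parser"
        if token in words:
            words[token](stack)       # known word: execute it
        else:
            stack.append(int(token))  # anything else must be an integer literal
    return stack


result = run("2 3 + dup *")  # (2 + 3) squared
```

There is no grammar, no AST, and no precedence to resolve, which is the simplicity being described; anything more (definitions, control flow) would be layered on as extra words.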

My latest rabbit hole is called Snabel: [https://github.com/andreas-gone-
wild/snackis/blob/master/sna...](https://github.com/andreas-gone-
wild/snackis/blob/master/snabel.md)

~~~
vidarh
Forth is one of those things that always fascinates me but ultimately always
ends up feeling too hard to read.

Snabel looks like an interesting experiment in that regard..

~~~
andreasgonewild
Lisp and Forth, everyone gets that reaction on first contact; the real problem
is that most popular languages look the same these days. Popularity has very
little to do with earning it. And it's getting worse with more and more
crapscript lately.

I'm not very much into dogmatics and orthodoxy, Forth wasn't the final answer
to anything. Nothing ever is. Implementing the same thing over and over again
doesn't make sense to me; I have plenty of experience of my own that thinks
differently, things I always wanted to work differently. So that's what I'm
doing, my best to improve the status quo.

~~~
vidarh
I've spent a _lot_ of time over the years trying to get used to Lispy and
Forth-like languages, and the syntax is a much greater barrier than you think.

Most languages look pretty much the same because we do have half a century of
experience with what people get comfortable with. There's further centuries of
experience with what people find readable in natural language. People seem to
infer a lot more information from layout and structure than some seem to
think.

But there _is_ a split there between those who are fine with dense syntax and
those who are not. Lisp, Forth, and many functional languages fall in a
category that tends to be popular with people who also e.g. find mathematical
notation straightforward and are happy to decipher symbol after symbol.

But for many of us, being able to instantly get an overview is a visual
process that requires more distinct syntax. I can remember pages of code I saw
20-30 years ago by the overall shape of the code, and layout, and even the
font, but if I'm asked to recall code by a sequence of tokens I had to read
token by token, I'd draw a blank over code I saw days ago.

That's why you see languages like e.g. Ruby, that in themselves bring very
little new on the semantics side, but that popularise ideas from other
languages (e.g. Smalltalk, Self for Ruby, along with a lot of syntactical
baggage from Perl).

In recent years a lot of ideas have been lifted from Lisp, but some of them
are linked quite closely to Lisp's syntax. We've started seeing languages
tackle this, but it's still a tricky balance to get right.

But I think Forth lacks a language that can do this for it. Retaining the
ideas of Forth while making it readable to a bigger audience seems like a hard
problem, but it's been (reasonably well) tackled for Smalltalk and Lisp.
That's what piqued my interest in your experiment.

~~~
zeveb
> Most languages look pretty much the same because we do have half a century
> of experience with what people get comfortable with.

I think it's more about familiarity rather than any inherent quality of the
syntax itself.

> But for many of us, being able to instantly get an overview is a visual
> process that requires more distinct syntax.

I think colour and shape can go a long way here, too — and that applies to
both Forth & Lisp.

~~~
andreasgonewild
Or choice of symbols and identifiers. People love to bash Perl, but Larry is
definitely on to something when it comes to names and symbols, however
flawed/misused the implementation in Perl may be; I feel like we're chasing
the same ghosts.

An example is Snabel's use of "|" for resetting the stack, "_" for dropping the
top and "(..)" for grouping modifications; it smells a tiny bit like Perl in
that it's not afraid of using what's available to get the wanted experience,
but has a dramatic effect on code shape compared to regular mumble-Forth.

------
whitten
I appreciate the effort to have a web page talking about writing your own
compiler. Right now, I'm looking for an easy way to add define-use chains to a
compiler. I know it involves breaking code into blocks and tracing through the
code looking for use of the same variables. This tutorial is good because it
adds a symbol table on each block level, which helps in differentiating names
that are re-used in a block and don't refer to other variables of the same
name. Does anyone know of code that makes clear what is involved in def-use
for a variable without saying "this is an exercise for the reader" ?

~~~
tom_mellior
> a symbol table on each block level, which helps in differentiating names

Yes, that's something that every compiler (for languages with scopes that
allow hiding) needs. Typically you also never compare variables by name but by
pointer to a symbol structure or some numeric ID.
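
A minimal sketch of that idea in Python, assuming a simple stack of per-block dictionaries. The class and method names (`Symbol`, `ScopedSymbolTable`, `enter_block`, and so on) are made up for illustration and do not come from the tutorial.

```python
import itertools


class Symbol:
    """One declared variable; identity is the numeric id, not the name."""
    _ids = itertools.count()

    def __init__(self, name):
        self.name = name
        self.id = next(Symbol._ids)


class ScopedSymbolTable:
    def __init__(self):
        self.scopes = [{}]           # innermost scope is last

    def enter_block(self):
        self.scopes.append({})

    def leave_block(self):
        self.scopes.pop()

    def declare(self, name):
        sym = Symbol(name)
        self.scopes[-1][name] = sym  # hides any outer symbol with this name
        return sym

    def lookup(self, name):
        # Search from the innermost block outwards.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise KeyError(name)


table = ScopedSymbolTable()
outer_x = table.declare("x")
table.enter_block()
inner_x = table.declare("x")         # same name, different symbol
found_inside = table.lookup("x")
table.leave_block()
found_outside = table.lookup("x")
```

After name resolution, later passes would carry `Symbol` objects (or their ids) around, so two variables that happen to share a name never get confused.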

> Does anyone know of code that makes clear what is involved in def-use for a
> variable without saying "this is an exercise for the reader" ?

It's not code, but chapter 2 of Nielson/Nielson/Hankin's Principles of Program
Analysis discusses how to do this. See accompanying slides on the book's
website:
[http://www.imm.dtu.dk/~hrni/PPA/ppasup2004.html](http://www.imm.dtu.dk/~hrni/PPA/ppasup2004.html)

In a sense, this _must_ be an exercise for the reader, since if you are
working on your own compiler, nobody's code will be compatible with yours.
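
That said, here is a rough sketch of the usual approach: compute reaching definitions over the control-flow graph, then link each use to the definitions that reach it. It is in Python and assumes a toy representation where each statement is a pair `(defined_variable_or_None, [used_variables])`; the function name and data layout are illustrative assumptions, not taken from the book or the tutorial.

```python
from collections import defaultdict


def def_use_chains(blocks, succs):
    """blocks: {block: [(target_or_None, [used vars]), ...]}
    succs:  {block: [successor blocks]}
    Returns {def site: set of use sites}; a site is (block, statement index)."""
    preds = defaultdict(list)
    for b, ss in succs.items():
        for s in ss:
            preds[s].append(b)

    def merge(block, out_sets):
        # Definitions reaching the block entry: union over all predecessors.
        merged = defaultdict(set)
        for p in preds[block]:
            for var, defs in out_sets[p].items():
                merged[var] |= defs
        return merged

    def transfer(block, reaching):
        # Each assignment kills earlier definitions of the same variable.
        out = {var: set(defs) for var, defs in reaching.items()}
        for i, (target, _) in enumerate(blocks[block]):
            if target is not None:
                out[target] = {(block, i)}
        return out

    # Iterate to a fixed point (needed because of loops in the CFG).
    out_sets = {b: {} for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            new_out = transfer(b, merge(b, out_sets))
            if new_out != out_sets[b]:
                out_sets[b] = new_out
                changed = True

    # Walk each block once more, linking uses to the definitions that reach them.
    chains = defaultdict(set)
    for b in blocks:
        reaching = merge(b, out_sets)
        for i, (target, uses) in enumerate(blocks[b]):
            for var in uses:
                for d in reaching.get(var, ()):
                    chains[d].add((b, i))
            if target is not None:
                reaching[target] = {(b, i)}
    return dict(chains)


# x = 0; i = 0; loop: x = x + i; i = i + 1; exit: print(x)
blocks = {
    "entry": [("x", []), ("i", [])],
    "loop": [("x", ["x", "i"]), ("i", ["i"])],
    "exit": [(None, ["x"])],
}
succs = {"entry": ["loop"], "loop": ["loop", "exit"], "exit": []}
chains = def_use_chains(blocks, succs)
```

In a real compiler the variables would be symbol ids rather than strings, and the sets would usually be bit vectors, but the structure (per-block transfer, union at joins, iterate until nothing changes) is the same.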

------
peter303
From its earliest years, UNIX/Linux has had utilities, including lex and yacc,
that automate many of these functions. I tended to use awk to create powerful
new command scripts.

~~~
chrisseaton
lex and yacc only automate one tiny part of the front-end of a compiler.

~~~
Xophmeister
Also, if you're learning to write your own compiler, you should, IMO,
implement the lexer/parser (if you haven't done so before) to understand how
they work.
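
As a starting point, a lexer can be quite small. The sketch below is in Python and is regex-driven, which is roughly the job lex automates; the token names and the tiny expression language it accepts are assumptions for illustration only.

```python
import re

# Hypothetical token set for a toy expression language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[-+*/=()]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (kind, lexeme) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


tokens = tokenize("x = 3 * (y + 42)")
```

Writing the matching loop by hand, character by character, is the more instructive exercise; this version mostly shows what the output of that stage looks like before a parser consumes it.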

------
caryhartline
If you're wondering why that page looks the way it does, well:

> <html xmlns:v="urn:schemas-microsoft-com:vml"
> xmlns:o="urn:schemas-microsoft-com:office:office"
> xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882"
> xmlns="http://www.w3.org/TR/REC-html40"><head>
> <meta http-equiv="Content-Type" content="text/html; charset=windows-1252">

~~~
quickben
It's short, informative, and to the point.

~~~
tigroferoce
To give some context, I have studied with that professor and I can say that
the guy is not super up to date with presentation technology: at the time (it
was 2004) he was still using hand-written transparencies projected with an
overhead projector. The readability was close to zero.

