
Join the Compiler Creation Club - skiskilo
http://tech.pro/blog/1733/join-the-compiler-creation-club
======
cjh_
I have been writing a Scheme interpreter in C, and for me the most interesting
aspect so far has been the level at which I am programming.

At the beginning it was very traditional C; symbol and AST manipulation were a
PITA (at that point it was all malloc'd arrays of `expressions`). After I had a
base language working I started to use the language I had implemented so far
to further the implementation. This culminated this weekend in a large
refactor to remove most of my C arrays and replace them with Scheme pairs and
lists.

For example, here [1] I implement the define-function form in terms of lambda,
specifically `new_lambda(env, cons(args, cons(body, null)))`.
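
To make that concrete, the pairs underneath look roughly like the following
(a simplified sketch; the type and names are illustrative, not the actual ones
from plot):

    #include <stdlib.h>

    /* a tagged value: either a pair, a symbol, or the empty list */
    struct value {
        enum { PAIR, SYMBOL, NIL } type;
        union {
            struct { struct value *car, *cdr; } pair;
            const char *symbol;
        } u;
    };

    /* stand-in for the `null` used above */
    static struct value null_obj = { NIL };

    /* cons allocates a fresh pair pointing at its two halves */
    struct value *cons(struct value *car, struct value *cdr) {
        struct value *v = malloc(sizeof *v);
        v->type = PAIR;
        v->u.pair.car = car;
        v->u.pair.cdr = cdr;
        return v;
    }

With cons in place, `cons(args, cons(body, null))` is just a two-element list,
and everything the interpreter previously used C arrays for can be rebuilt
from pairs.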

In hindsight this seems so obvious, but I have found the whole process
extremely interesting, specifically watching how the implemented language
starts to influence the implementation language.

I really cannot stress enough how enjoyable the process of writing my
interpreter has been; I thoroughly recommend it to anyone who is interested
in programming languages.

[1]
[https://github.com/mkfifo/plot/commit/07272bd69e51979ab71fa0...](https://github.com/mkfifo/plot/commit/07272bd69e51979ab71fa08f6415978f46a3b4ee#diff-e2f93fafd9051a63d4bd310620ebd261R105)

------
pjmlp
Advice to anyone jumping into writing compilers: just pick a functional
language, especially one from the ML family.

Symbolic manipulation of data structures is plain joy compared to what is
required in the C or Pascal families of languages.

~~~
rayiner
My recommendation would be to compile a Lisp. See:
[http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf](http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf),
an easy-to-follow tutorial for compiling Scheme to x86.

This advice is especially true if you think syntax and parsing are the boring
parts of writing a compiler.

~~~
cjh_
I have been working on a small R7RS Scheme interpreter [1] and it has been
extremely educational. I had never used Scheme previously, but learning it
from SICP [2] has been rather enlightening; it surprisingly lived up to
expectations.

I haven't decided whether I will go down the compiler route, but either way
that resource looks very interesting, especially the coverage of tail call
optimisation at the assembly level.

Does anyone have any further resources discussing tail call optimisation? I
would like something a bit more in depth.

The other topic I would like to read more on is continuations; as a
still-newbie Schemer I find the idea of implementing them quite daunting.

[1] [https://github.com/mkfifo/plot](https://github.com/mkfifo/plot)

[2] [https://mitpress.mit.edu/sicp/](https://mitpress.mit.edu/sicp/)

~~~
rayiner
The best way to deal with tail call "optimization" is to not treat it as an
optimization at all. Tail calls are a syntactic property of source code. You
can easily determine whether a given call is in a tail position by recursively
traversing the AST. Then you just have to generate tail calls in a way that
uses constant stack space. This is mostly a matter of properly designing the
calling convention. See: www.complang.tuwien.ac.at/schani/diplarb.ps.
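
As a sketch, with a made-up AST (the node kinds here are hypothetical; the
recursive traversal is the whole trick):

    /* Mark calls in tail position by walking a toy AST. Entry point:
       mark_tails(function_body, 1). */
    enum kind { CALL, IF, SEQ, CONST };

    struct node {
        enum kind kind;
        int is_tail;            /* set on CALL nodes in tail position */
        struct node *a, *b, *c; /* children; meaning depends on kind */
    };

    /* `tail` says whether this expression's value is the function's result */
    void mark_tails(struct node *n, int tail) {
        if (!n) return;
        switch (n->kind) {
        case CALL:
            n->is_tail = tail;      /* the call's result is returned as-is */
            mark_tails(n->a, 0);    /* callee expression: not a tail position */
            mark_tails(n->b, 0);    /* arguments: never tail positions */
            break;
        case IF:
            mark_tails(n->a, 0);    /* the condition is not a tail position */
            mark_tails(n->b, tail); /* both branches inherit tail-ness */
            mark_tails(n->c, tail);
            break;
        case SEQ:
            mark_tails(n->a, 0);    /* earlier expressions: not tail */
            mark_tails(n->b, tail); /* the last expression inherits tail-ness */
            break;
        case CONST:
            break;
        }
    }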

~~~
cjh_
Thanks for those words and that link.

I like that way of thinking about it: I see it as just reusing the existing
stack frame, since a call in tail position doesn't need anything from the
current stack (returning to this frame adds no extra meaning to it).

Having never implemented them, though, I feel my ideas still need some
fleshing out; my naive approach seems similar to the idea of trampolining.

Really looking forward to reading the linked document; it is very thorough and
looks to be just what I was looking for (it even covers architecture-dependent
aspects!).

~~~
rayiner
If you've got the freedom to design your calling convention, you don't need
trampolines or anything like that.

The basic idea is to compile every call in a tail position to a jump. You just
need to clean up your stack before making the call, and make sure to arrange
your calling convention such that A can call B, which can jump to C, which can
return directly to A.

In the A -> B -> C example, this means:

1) Using a callee-pops convention so that callees pop arguments off the stack
rather than callers. If B and C take different numbers of arguments, A doesn't
know how many arguments to pop. So functions should pop their arguments off
the stack themselves before they return.

2) B must restore its callee-save registers before jumping to C, which will
save those registers itself if it clobbers them.

3) To accommodate variadic functions, there has to be a hidden argument to let
callees know how many arguments were actually pushed by the caller, so they
can pop the right number.
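
In toy-compiler terms, the code generator's only special case at a tail call
site is to clean up and jump instead of calling. A minimal sketch (the
pseudo-assembly it prints is illustrative, not any real backend):

    #include <stdio.h>

    /* toy emitter: the one decision that implements proper tail calls */
    void emit_call(const char *target, int is_tail) {
        if (is_tail) {
            /* B jumping to C: put C's arguments where B's own arguments
               were, undo B's frame, and jump. C's eventual `ret` then
               returns straight to A, so the stack never grows. */
            printf("  ; move outgoing args over our incoming arg slots\n");
            printf("  ; restore callee-save registers\n");
            printf("  add  sp, locals_size\n");
            printf("  jmp  %s\n", target);
        } else {
            /* an ordinary call pushes a return address and a new frame */
            printf("  call %s\n", target);
        }
    }

    int main(void) {
        emit_call("C", 1); /* a call in tail position */
        emit_call("D", 0); /* an ordinary call */
        return 0;
    }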

------
chrisdotcode
Most "tutorials" I've seen as of late (including this one) seem to walk
through creating the grammar, and then just hand-wave the actual creation of
the AST (and the rest of the steps) away to "yacc magic" or some other friend.

I would not call that building a compiler.

Are there any modern tutorials/references that cover hand-writing the grammar,
hand-coding the parser, and hand-coding whatever comes next (because I have no
idea, thanks to these new-age tutorials) - without a toolchain, so that the
entire process can be seen from start to finish?

~~~
mcpherrinm
Writing a parser isn't a terribly interesting or difficult part of writing a
compiler. You can describe a simple one in a page of pseudocode. If you really
are interested in writing a compiler, I wouldn't get hung up on this point.

If you want more depth, you probably want to pick up a compiler textbook. I
liked "Modern Compiler Implementation" (I've used both the C and ML versions)
in my undergrad.

~~~
mcpherrinm
I've been thinking about what I said in this post and I want to reword it
slightly. Writing a parser _can_ be very interesting, and there are a lot of
neat techniques; since we're talking about learning here and not making
production compilers, it's certainly a worthwhile endeavor.

What I really mean to say is that your parser doesn't have a lot of effect on
the rest of your compiler design. In the end, it's a function that takes input
text to an AST. You can go back and replace a yacc-generated one later, if you
want to know more.

Of course, if you choose a simple-to-parse input language (like Lisp) then you
can write a simple handwritten parser right off the bat.
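
For instance, a bare-bones s-expression reader fits on a screen. One possible
sketch in C (no error handling, all names made up):

    #include <ctype.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* an AST node is either an atom or a list of children */
    typedef struct node {
        char *atom;           /* NULL when this node is a list */
        struct node **kids;
        int nkids;
    } node;

    static const char *p;     /* cursor into the input string */

    static void skip_ws(void) { while (isspace((unsigned char)*p)) p++; }

    static node *parse(void) {
        node *n = calloc(1, sizeof *n);
        skip_ws();
        if (*p == '(') {
            p++;                               /* consume '(' */
            skip_ws();
            while (*p && *p != ')') {
                n->kids = realloc(n->kids, (n->nkids + 1) * sizeof *n->kids);
                n->kids[n->nkids++] = parse(); /* recurse per element */
                skip_ws();
            }
            p++;                               /* consume ')' */
        } else {
            const char *start = p;
            while (*p && !isspace((unsigned char)*p) && *p != '(' && *p != ')')
                p++;
            n->atom = malloc(p - start + 1);
            memcpy(n->atom, start, p - start);
            n->atom[p - start] = '\0';
        }
        return n;
    }

    int main(void) {
        p = "(define (square x) (* x x))";
        node *ast = parse();
        printf("%d top-level elements\n", ast->nkids); /* prints 3 */
        return 0;
    }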

------
computer
For an advanced version: the C++ Grandmaster Certification MOOC, in which
participants build their own complete C++11 compiler in C++11, is still going
strong: [http://cppgm.org/](http://cppgm.org/)

~~~
octo_t
Has _literally_ anyone actually completed that certification?

~~~
madhusudancs
Well, I would extend that and ask: has anybody completed implementing the
lexer/parser phase? Once you get the syntax tree or IR, the complexity is more
or less (if not exactly) the same as for other languages.

~~~
WalterBright
I've written a complete C++98 compiler, front to back.

~~~
pkaye
How long did something like that take you to do?

~~~
WalterBright
10 years

~~~
arcft
Mr. Bright, you're my hero. Really. Not only because you made a C++ compiler
using a recursive descent parser, the first ever to generate native code and
the first commercial one too, and you don't even generate assembly in dmd. You
also made the D language and the dmd compiler, a world-saving tool from C and
C++. It will rock the world. Could you imagine one of the next world-famous
pieces of software, like Windows or *BSD, written in D? Also, I'm on my way
into the compiler business too (joke :-)). (Sorry for my English, it's not my
native language.)

------
lennel
Nitpicking, certainly, but I would say use something like ANTLR rather than
PEG.js.

ANTLR grammars are extremely close to EBNF (with some obvious exceptions) and
the compiler creation club could benefit from that. ANTLR 3 is usable in a
variety of languages (although 4 is a hell of a lot simpler and solves a fair
bit around left-recursive grammars, but non-JVM support is still nonexistent,
I suspect).

------
cocoflunchy
I started writing my own compile-to-JS language a while back after reading
this book: [http://createyourproglang.com/](http://createyourproglang.com/)
and I'm loving it!

However, I'm having a bit of trouble right now with my grammar. I'm using
Jison ([http://jison.org](http://jison.org)) and the error messages are kind
of confusing (I didn't even know they were error messages at first; I thought
they were some kind of logging). I apparently have shift-reduce conflicts just
about everywhere, but I didn't bother solving them while building the rest of
the language, since everything is working fine! (Well, I'm sure there are edge
cases that I haven't run into yet.)

So if anyone here has a good link on the basics of parsing, grammars, LR(1)
and whatnot, I'd like to understand what I'm doing ;)

~~~
pjmlp
Although a bit out of date, the Dragon book is always worth a read.

[http://dragonbook.stanford.edu/](http://dragonbook.stanford.edu/)
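
For a quick taste of what a shift-reduce conflict actually is before diving
in: the classic one is the dangling else. In C syntax (just an illustration;
the conflicts in your Jison grammar may be different):

    #include <stdio.h>

    int main(void) {
        int a = 0, b = 0;
        /* When the parser has read `if (a) if (b) x;` and then sees
           `else`, it can shift (attach the else to the inner if, as C
           does) or reduce (close the inner if and attach the else to
           the outer one). Generators like yacc default to shifting, so
           the else below belongs to the inner if and, with a == 0,
           this program prints nothing. */
        if (a)
            if (b)
                printf("both\n");
            else
                printf("inner else\n");
        return 0;
    }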

------
chewxy
Ugh, Hacker News, you are better than this. THIS was posted yesterday:
[http://news.ycombinator.com/item?id=6792225](http://news.ycombinator.com/item?id=6792225).

It links to the original source. Why are we not giving the original author
credit, instead of linking to blog spam on tech.pro?

~~~
huragok
Original author here - I work at tech.pro and decided to syndicate the post
across to the site (the original is even referenced at the bottom of the
article!).
~~~
chewxy
ack! I seem to have yet again made a fool of myself.

------
caissy
I am currently taking a compiler class, and I must say that I am really amazed
and impressed. For an assignment we of course had to write an interpreter for
our own mini language. The feeling you have after creating such an interpreter
is overwhelming.

For the lexer and parser we used SableCC, an object-oriented framework that
generates a compiler in Java. I've never used anything else (yacc, lex, etc.),
so I can't compare the tools, but it provides a rich, useful and easy-to-use
interface.

~~~
pjmlp
SableCC is quite good.

yacc and lex are handy to have around, but feel like Jurassic tools when
compared to more modern parser-generator tooling like ANTLR.

~~~
Taniwha
Kids these days: when I wrote my first compiler I had to write my own yacc
equivalent, and my own parser.

FYI: my generator allowed for dynamic resolution of shift/reduce conflicts,
which lets you compile some languages yacc won't let you (languages that let
you change the priority of operators, for example).

~~~
pjmlp
Given that one of my CS specialization areas was compiler design, me too.

I implemented a left-recursive parser in x86 assembly for MS-DOS systems.

As I said, yacc and lex are nice to have, but nowadays there is little
incentive to keep using them.

Especially since, as you say, they are not able to parse all types of
languages.

~~~
Taniwha
I did something similar in 6800 assembly; left recursion of course limits you
even more, but it helps fit that tiny, tiny compiler into 2k.

I actually like yacc/bison. They're good for most purposes, and deliberately
designing a non-LALR (or, more so, a non-LR) language on purpose (rather than
because you don't know any better) is usually silly. You do need to 'get' the
concept of building a parse tree from the bottom up, assembling it from larger
and larger snippets as you go.

Oh, and yacc/bison run about 10 times faster than the equivalent I wrote 10
years before they existed, so I'm not complaining.

I tend to use the same bespoke lexical analyser I've used for years and hack
it to suit; it includes support for symbol tables etc. and runs really fast.
No need to reinvent the wheel.

------
j_baker
Nit: Not all languages have a separation between statements and expressions.
Most lisps just have expressions.
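
In C terms: `if` is a statement and yields no value, while the conditional
operator is an expression, which is much closer to how a Lisp `if` behaves.
A small illustration:

    #include <stdio.h>

    int main(void) {
        int x = 1;
        /* C's `if` is a statement: it has no value, so it cannot sit on
           the right-hand side of an assignment. The conditional operator
           is an expression, like (if c a b) in a Lisp: */
        int y = x ? 10 : 20;
        printf("%d\n", y); /* prints 10 */
        return 0;
    }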

~~~
nimble
Bigger nit: If lisps just have expressions (and not statements), then they
clearly have a separation between the two :).

------
edtechdev
You can also learn a lot by studying (and even contributing to) other open
source compilers. Here are about 100 compile-to-JavaScript languages,
including tools for compiler writers at the bottom:
[https://github.com/jashkenas/coffee-script/wiki/List-of-languages-that-compile-to-JS](https://github.com/jashkenas/coffee-script/wiki/List-of-languages-that-compile-to-JS)

