
Lessons learned building a toy compiler - k4rtik
https://jaseemabid.github.io/2017/07/04/compiler.html
======
throwaway2016a
I love seeing articles like this... my compiler design course in college was
one of my favorite. Also, one of the most useful. Parsers and lexers are
useful in so many places besides just code compilers.

With that said, I think there is one small issue in the article:

> The quintessential first step in any compiler is parsing the source string
> into an Abstract Syntax Tree

If there is a quintessential first step in writing a compiler it is doing
lexical analysis with a lexer to break the program up into tokens. Then using
a parser to create the AST. While you could go straight from the raw text to
parsing, it makes it a lot easier if you lex it first.

~~~
auggierose
Well you can do both at the same time, it is called local lexing:
[https://arxiv.org/abs/1702.03277](https://arxiv.org/abs/1702.03277)

~~~
throwaway2016a
I did mention in my post that the separate lexing step is not strictly
necessary. I was just pointing out that a lexer follows the definition of
"quintessential" more than parser does.

That paper is very recent. It sounds like it is still doing both steps just
concurrently and with feedback. It isn't skipping the lexing step altogether.
The benefit of separation is decreased code complexity and easier debugging.

~~~
auggierose
Yes, both stages are still present in that approach, and that is a good thing,
because clearly it is useful to separate lexing and parsing. It's just that
the two stages are intertwined and thus lexing can be dependent on the parsing
context. So it is not really going from "raw text to parsing".

------
lower
It's nice to have a small example of how to compile to LLVM, but the compiler
is a bit more limited than what the blog post makes it appear.

It's not quite `a compiler for simply typed lambda calculus', but only for a
small fragment without higher-order functions. One currently cannot write
lambda terms that take functions as arguments.

I was curious how the compiler represents closures and manages memory, mainly
because I'm looking for a small example of how to do garbage collection in
LLVM. But it turns out that the parser doesn't allow function types yet and
the compiler itself doesn't implement closures yet.

~~~
jaseemabid
You are right, the compiler cannot handle much of higher order functions and
closures yet. I have hinted how to do that with lambda lifting and closure
conversion and I might get it working in the next few weeks. Maybe a part 2
for the blog post.

~~~
lower
Yeah, do it! :)

------
darioush
I think the most underrated part is: "untyped lambda calculus is harder"

~~~
jaseemabid
You read my mind :D

------
jaseemabid
Author of the post here. AMA :)

~~~
bArray
Hello there! Great post :)

I never took a course in writing compilers at University but have always been
interested - where would be the best place to start for somebody interested in
doing this? To dip their toes into the water if you will?

~~~
jaseemabid
Just tinkering with the tools and writing some code helped me a lot more than
the university course I had 6 years ago. There are several beginner friendly
online resources now including

1\. Write a scheme in 48hrs in Haskell

2\. Kaleidoscope tutorial by LLVM developers

3\. [http://www.buildyourownlisp.com](http://www.buildyourownlisp.com)

4\. [http://createyourproglang.com](http://createyourproglang.com)

~~~
catchmrbharath
I would also add
5.[http://dev.stephendiehl.com/fun/](http://dev.stephendiehl.com/fun/)

------
louithethrid
I remember building funny little DSL-Tools with flex and bison.
[https://github.com/westes/flex](https://github.com/westes/flex)

I used it for AutoSar applications, where you create tailored embeded programs
from a DSL specification.

------
krallja
Perhaps before diving into FFI, you could write a small runtime library with
the basic I/O you need (`readline` and `puts`, perhaps?)

~~~
jaseemabid
I did exactly the same last night and you don't need the echo $? hack anymore.

------
tu7001
Great post! Interested, how it's related to, for example, Scheme or Lisp.

