
I Built a Lisp Compiler - kristianp
https://mpov.timmorgan.org/i-built-a-lisp-compiler/
======
dap232
Me too! ;-)

Back in 2006 -- when I was studying CS at Rollins College -- I wrote a Scheme
(subset of Lisp) interpreter that also shows a visual representation of linked
lists and function calls. You can check it out here:
[http://davidpilo.com/pvts/](http://davidpilo.com/pvts/)

The interpreter was written in Java, and while it does not support the full
R5RS grammar, it supports quite a bit of it (see
[http://davidpilo.com/pvts/language.html](http://davidpilo.com/pvts/language.html)).

Cheers!

~~~
msla
> Scheme (subset of Lisp)

There are people who'd fight you over that one. :)

From Wiki:
[http://wiki.c2.com/?IsSchemeLisp](http://wiki.c2.com/?IsSchemeLisp)

> One of the interesting debates in Lisp circles is the question of whether or
> not SchemeLanguage should be considered part of the LispLanguage family.
> There are more than a few Lispers (and perhaps a few schemers) who think
> that the LispSchemeDifferences are sufficiently large that Scheme should not
> be considered to be a "Lisp". This topic has spread across many pages, so
> this page has been created to quarantine this particular HolyWar.

And one interesting reason why not:

> The philosophies of the Lisp and Scheme communities have diverged quite a
> bit. SchemeLanguage focuses much more on FunctionalProgramming; CommonLisp
> on metaprogramming and multi-paradigm programming (especially OO with
> CommonLispObjectSystem).

------
chc4
I did a MAL implementation too, and am currently working on an LLVM compiler,
but dear god I'd never frame it as "spent x years learning to make a
compiler".

MAL is easy and straightforward -- I'm not saying this to devalue your work
(great job!) but to point out that the language is by design simple to
implement and doesn't require any prior knowledge of how an interpreter works.
People who want to write their own implementation should try that right now,
and definitely not be scared off by "I spent 10 years working on this".
Turning that interpreter into a compiler is an extra step on top.

~~~
mamcx
> People who want to write their own implementation should try that right now,
> and definitely not be scared off by "I spent 10 years working on this".

Certainly, but the problem with implementing languages is the HUGE cliff on
complexity after your first "calculator" or lisp.

You can make a simple interpreter in hours, even minutes. Then, suddenly, you
get AMBITIOUS.

That is what takes years!

P.S.: I'm also on the hunt for a relational language
([http://tablam.org](http://tablam.org)), and I'm now, what, 3 years into it?
(Whoa! I searched for my old mentions of it and the first one I found is from
2013!
[https://www.clubdelphi.com/foros/showthread.php?t=84835&high...](https://www.clubdelphi.com/foros/showthread.php?t=84835&highlight=crear+lenguaje))...

~~~
jasim
"Become a spiritual attempt at the dbase/foxpro family of database-oriented
languages, in the sense manipulate data(bases) is natural and integrated"

This is a personal dream for me as well. The commenters in this thread
(@gavinpc, @vidarh) will also agree with you:
[https://news.ycombinator.com/item?id=18634555](https://news.ycombinator.com/item?id=18634555)

~~~
mamcx
How did I miss that thread?

Anyway, if you wanna join the dream, I will not complain ;)

------
Dangeranger
Glad you wrote this set of experiences up.

I’ve followed BuildYourOwnLisp[0] in the past, so it’s cool to see something
that is more focused on the compilation of Lisp rather than its implementation
as a language.

[0] [http://buildyourownlisp.com](http://buildyourownlisp.com)

~~~
blu42
Ha! If only I had known about 'Build Your Own Lisp' three months ago!

I needed a simple language as a vehicle for a compiler talk I'm preparing for
this summer, so I hacked together an extremely reduced LISP (a 'Non-LISP', as I
call it), whose C++ implementation from scratch came out at about 1K -- 1.2K
LOC with the AST optimizations I meant to demonstrate for the talk. Self-
contained code here -- no libs or (much) STL:
[https://github.com/blu/tinl](https://github.com/blu/tinl)

OP, thank you for sharing Tim Morgan's work -- it's a labor of love!

~~~
e12e
> whose C++ implementation from scratch...

You're aware of clasp?

[https://github.com/clasp-developers/clasp](https://github.com/clasp-developers/clasp)

Which builds (in part) on embeddable common lisp:

[https://gitlab.com/embeddable-common-lisp/ecl](https://gitlab.com/embeddable-common-lisp/ecl)

See around the 20-30 minute mark:
[https://youtu.be/8X69_42Mj-g](https://youtu.be/8X69_42Mj-g)

~~~
blu42
Nope, I'm seeing clasp for the first time -- Christian Schafmeister's talk was
extremely nice to listen to!

I did come across some small LISP implementations at the early stages, but by
that time I already had the AST builder done. Maybe that's because I didn't
actively search for LISP implementations -- I didn't need a 'proper' LISP per
se, more of a DSL for quickly writing ASTs of arbitrarily complex
computational expressions. Those ASTs were the final goal ;)

------
mafm
Nice! Lots of really practical advice.

I wish I'd been given that advice before I wrote a compiler. Back then, the
best advice was in Richard Bornat's _Understanding and Writing Compilers_ and
the dragon book - useful, but not a walkthrough.

------
th0br0
> Writing C code and trying to keep it indented was a bit of a pain and I wish
> I would have done something else. I believe some compilers write ugly code
> and then “pretty it up” with a library before writing it out. This is
> something to explore!

There's a tool for this nowadays: `clang-format`.

~~~
eatonphil
+1 for clang-format. I have been using it as a beautifier on generated code in
a TypeScript to C++ compiler.

There are some other features I wish it had for cleaning up my generated code.
For example, it doesn't remove superfluous parentheses. It doesn't remove
unused labels. It doesn't remove superfluous semicolons (a line consisting of
just a semicolon). And so on. (I should, of course, just be building up an
AST and pretty-printing the AST instead.)

~~~
pfdietz
clang-tidy?

~~~
eatonphil
Haven't looked into it. Thanks!

------
misframer
Great post! I started writing a Lisp interpreter in Go with the mal guide but
switched to Norvig's _(How to Write a (Lisp) Interpreter (in Python))_ [0]
since it was more approachable (IMO) as a Lisp beginner. I'm looking forward
to writing a compiler next like the author.

[0] [http://norvig.com/lispy.html](http://norvig.com/lispy.html)

------
afranchuk
Nice write up! It coincides a bit with my progression: I first wrote a scheme
interpreter, then a scheme to LLVM IR compiler, and now I'm working on a
custom lisp JIT compiler (the JIT bit plays well into macro expansion). It's
interesting how that path naturally unfolds.

~~~
aidenn0
I'm curious what you mean by "the JIT bit plays well into macro expansion"
since most lisps will expand macros AOT rather than JIT.

------
nitros
Same here, one of my long ongoing projects is a scheme compiler following the
same ideas as chicken ([https://www.call-cc.org/](https://www.call-cc.org/))
and cyclone
([https://github.com/justinethier/cyclone](https://github.com/justinethier/cyclone)).

The compiler is written in Rust and generates C (currently dependent on being
compiled by GCC or Clang):
[https://github.com/nitros12/some-scheme-compiler](https://github.com/nitros12/some-scheme-compiler)

------
bogomipz
The author states:

>"I saw an x86 assembler written in Ruby which intrigued me, but the thought
of working with assembly gave me pause."

Might anybody know the name of this Ruby-based assembler and/or where to find
it?

~~~
NikkiA
Could be Wilson:

[https://github.com/seattlerb/wilson](https://github.com/seattlerb/wilson)

------
fjfaase
Lately, I have been experimenting with a 'functional' scripting language that
uses lazy evaluation for generating C code and does not do any concatenation.
It has a built-in debugger, which lets you break at output locations -- quite
handy when debugging code that generates code. See:
[https://github.com/FransFaase/cyclonedds/commits/xtypes_ts](https://github.com/FransFaase/cyclonedds/commits/xtypes_ts)

------
codr77
Lisp is a better choice than most for the first compiler/interpreter, but I
would recommend starting even further down the complexity scale with Forth [0]
since it lowers the risk of running out of motivation/losing track of the
goal.

[0] [https://gitlab.com/sifoo/snigl](https://gitlab.com/sifoo/snigl)

~~~
eatonphil
Scheme is wayyyy simpler than Forth IMO (even when you include macros and
call/cc). I wrote a minimal stack-based language years ago and I still cannot
get a handle on Forth. The inner interpreter and immediate words? Even after
reading Let over Lambda, Forth is Greek to me.

But, to be fair, getting to the point of being able to implement a recursive
fibonacci algorithm is not too difficult taking either path.

~~~
kazinator
Forth is really a kind of assembly language for a virtual machine, and not
really a higher level language. It looks higher level because it doesn't have
registers.

Register machine languages are "catenative" also. We can take a "mov eax, ebx"
and catenate that with "mov ecx, $foo" in any order we want. We are
constrained, though, because different sections of code use different input
and output registers (or combinations of input and output registers and stack
locations and such). Forth makes those conventions uniform, and that leads to
some degrees of freedom which give it a higher level flavor.

The thing to do is to make some higher level language that uses Forth as a VM;
then you have understandable code.

------
aasasd
I wonder if the approach of making a compiler from an interpreter leads to
lots of suboptimal code.

One point I can think of is that in an interpreter you're presumably keeping
track of the program's vars in your own data structures instead of just throwing
pointers around. Though, this might be unavoidable with a dynamic source
language.

~~~
whitten
It certainly leads to code for a different purpose. In an interpreter you need
to store enough information so you can choose the right subroutine to call at
runtime. This must also include enough information as inputs to let the
subroutine produce the right results. Generally I think of this as a reference
to code and references to data, and a references to where to store the
results. For a compiler you need all that information to decide what code to
emit but you don’t need to keep it around when the emitted code is used,
unless you want to debug or replicate decisions at runtime that were already
resolved at compile time.

For example, if you had an index into a block of memory that you could prove
would only ever address elements inside the block, there is no need for the
compiler to emit a runtime check that it won't create out-of-range references.
If the same operation were done by an interpreter, you might still check,
because the code doing the interpretation may not be used solely for that one
instance of the indexing/addressing operation, as you can guarantee with code
produced by a compiler.

------
dudouble
Thank you for sharing this, it highly encourages me to embark on a similar
journey.

------
karma_fountain
Hello! Can I just ask, how did you learn to do the incremental bit of this?

------
cwt8805
Since it relies on gcc, why bother with tcc?

~~~
rurban
tcc is much faster, like 20x faster. You don't need gcc's optimizations;
you'll do the optimization yourself in your compiler.

You'd link against libtcc.a and not need to worry anymore.

~~~
cwt8805
I mean that when I execute "./malcc --compile hello.mal hello", it would be
good not to rely on gcc to generate the executable file.

