
Lisp in fewer than 200 lines of C - jfo
https://carld.github.io/2017/06/20/lisp-in-less-than-200-lines-of-c.html
======
sago
Writing your own Lisp-ette is a brilliant evening or weekend project,
regardless of the language. It's some of the simplest non-toy parsing you can
attempt, a bit of light data structure work, and understanding eval/apply is
80% of the work in implementing it. I would highly recommend anyone to have a
go, and try not to follow existing code too closely: figure out the problems
in your language of choice.

The post identifies some of its own weaknesses (memory handling, errors),
which are quite C specific. Or at least easier to handle in other languages,
where you can punt those issues to the host language runtime. But it will be a
fun extension to fix them (a job for the second evening / weekend of coding ;)
)

But, imho, the beauty of writing a Lisp is that there are a bunch of things
you can do from there, some more difficult, but several are achievable step-
by-step in a day or a few days each. I'd first add a few special forms more
than the OP (quote, if, progn, math operations), then my suggestions:

1\. Defining names (if you haven't already), both let and define special
forms.

2\. Lambdas.

3\. Tail call optimisation (I suggest this not because it's an optimisation,
this Lisp doesn't need optimising, but because TCO is a bite-sized extension.)

4\. Lexical scoping of lambdas.

5\. Continuations. call/cc

6\. And if you're really brave (or skilled, or just masochistic), macros.

I was encouraged to do this as a new grad student, and it was one of the most
fun and educational experiences I remember. I didn't get as far as macros back
then, but implementing call/cc was a definite pivot point in my programming
competence.

~~~
DonaldFisk
I found none of these particularly difficult. (I've never attempted 3. or 5.
though.) Functional values, however, are difficult. They work effortlessly in
interpreted code, but in compiled code you're faced with the upwards funarg
problem:
[https://en.wikipedia.org/wiki/Funarg_problem#Upwards_funarg_...](https://en.wikipedia.org/wiki/Funarg_problem#Upwards_funarg_problem)

A solution is needed if you want lazy evaluation.

~~~
sago
How did you do lambdas and lexical scoping without 'functional values'?

~~~
DonaldFisk
That's explained in
[https://en.wikipedia.org/wiki/Funarg_problem#Downwards_funar...](https://en.wikipedia.org/wiki/Funarg_problem#Downwards_funarg_problem)

In addition to return address and dynamic link, you need to store a static
link in each stack frame.
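
A minimal sketch of that idea in C, assuming a compiler that addresses each variable as a (depth, index) pair. `Frame`, `MAX_SLOTS`, and `lookup` are hypothetical names for illustration, not taken from any real implementation:

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_SLOTS = 4 };  /* hypothetical fixed number of locals per frame */

typedef struct Frame {
    struct Frame *dynamic_link;  /* frame of the caller */
    struct Frame *static_link;   /* frame of the lexically enclosing function */
    int slots[MAX_SLOTS];        /* local variables */
} Frame;

/* Resolve a variable addressed as (depth, index): follow `depth` static
   links outwards, then pick slot `index` in the frame reached. */
static int *lookup(Frame *f, int depth, int index) {
    assert(index >= 0 && index < MAX_SLOTS);
    while (depth-- > 0) {
        assert(f->static_link != NULL);
        f = f->static_link;
    }
    return &f->slots[index];
}
```

Note that the lookup never touches the dynamic link, so the caller's locals stay invisible to the callee. This only covers the downwards case: a frame on the stack is gone once its function returns, which is why the upwards case needs environments that outlive the call.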

------
piinbinary
After reading Peter Norvig's post about a lisp implementation [0], I decided
to write one in Python.

I got something working in 65 lines [1]

[0] [http://norvig.com/lispy.html](http://norvig.com/lispy.html)

[1]
[https://gist.github.com/jmikkola/b7c6c644dff1c07891c698f0a52...](https://gist.github.com/jmikkola/b7c6c644dff1c07891c698f0a527a890)

~~~
Wohlf
Thanks for this, I recently did the same and it's nice to have others' work to
compare to.

------
akashakya
If you like this, you might like this Lisp interpreter written in assembly in
a single file. It is some of the best-commented code ever written, imo.

[https://github.com/marcpaq/arpilisp](https://github.com/marcpaq/arpilisp)

~~~
_sdegutis
Since this is written in assembly, is it much faster than a C version, given
that it can manage its own stack frames and stack variables and such? I
always imagined that’s the case and that a Lisp implemented fully in assembly
would be the trick to a super-fast Lisp that can compete with Go.

~~~
ci5er
LISP was created a long time ago. Assembly was the weapon of choice. Thinking
Machines, Symbolics and Macsyma might already say: We did that. But, uh, no.

~~~
klmr
The first LISP implementation was famously done in machine code, not assembly.

------
kindfellow92
Oof, all the macros are broken:

    
    
        #define is_space(x)  (x == ' ' || x == '\n')
        #define is_parens(x) (x == '(' || x == ')')
    

Should be

    
    
        #define is_space(x)  ((x) == ' ' || (x) == '\n')
        #define is_parens(x) ((x) == '(' || (x) == ')')
    

Probably doesn’t matter in practice for this. It could end up being a nasty
source of bugs later on in the project.

~~~
geofft
That still evaluates x twice, which can also be a source of bugs. I usually
take one of two approaches: either decide that this is a weekend hack and
using the macros whenever the expansion isn't obvious in my head is a sign of
too much complexity, or use this GCC extension:

    
    
        #define is_space(x) ({ typeof(x) y = (x); y == ' ' || y == '\n'; })
    

(Or in this case, turn it into an actual function and let the compiler figure
out optimization.)
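
A rough sketch of the plain-function route in standard C, next to the parenthesised macro for comparison. `is_space_macro`, `calls`, and `next_char` are hypothetical names that exist only to make the double evaluation visible:

```c
#include <assert.h>
#include <stdbool.h>

/* Parenthesised macro, renamed here for comparison: the argument
   expression can still be evaluated twice. */
#define is_space_macro(x) ((x) == ' ' || (x) == '\n')

/* Function versions: the argument is evaluated exactly once, and the
   compiler is free to inline them. */
static inline bool is_space(int c)  { return c == ' ' || c == '\n'; }
static inline bool is_parens(int c) { return c == '(' || c == ')'; }

/* Hypothetical side-effecting argument: counts how often it is called. */
static int calls = 0;
static int next_char(void) { calls++; return '\n'; }
```

With these definitions, `is_space_macro(next_char())` calls `next_char()` twice, because the first comparison fails and the second operand is evaluated too, while `is_space(next_char())` calls it once.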

~~~
kindfellow92
GCC extensions make my brain hurt :( Should just use C++ at that point:

    
    
        template<class T>
        bool is_space(const T & x) {
            return x == ' ' || x == '\n';
        }

~~~
pjmlp
Even better if you make it:

    
    
        template<class T>
        constexpr bool is_space(const T & x) {
            return x == ' ' || x == '\n';
        }
    

Debuggable, type safe and same performance as straight C code.

~~~
IvanVergiliev
And throw an `inline` in there just to be more likely to end up with something
macro-like.

~~~
atilaneves
`constexpr` is implicitly `inline`:
[http://en.cppreference.com/w/cpp/language/inline](http://en.cppreference.com/w/cpp/language/inline)

~~~
gpderetta
orthogonally, templates are also implicitly inline.

Then again, inline does not mean what most people think.

------
SonOfLilit
If you like short interpreter implementations, you might like this:
[http://code.jsoftware.com/wiki/Essays/Incunabulum](http://code.jsoftware.com/wiki/Essays/Incunabulum)

~~~
tromp
or this interpreter of a lazy lambda calculus based language in 25 lines of
(obfuscated) C:

[http://www.ioccc.org/2012/tromp/tromp.c](http://www.ioccc.org/2012/tromp/tromp.c)

[http://www.ioccc.org/2012/tromp/hint.html](http://www.ioccc.org/2012/tromp/hint.html)

------
krat0sprakhar
Really cool! A more complete implementation in 1000 lines of C for getting
started: [http://www.buildyourownlisp.com](http://www.buildyourownlisp.com)

~~~
whistlerbrk
Can't recommend it enough, I learned a lot from this tutorial. I hope to go
back and add a lot more language features.

------
jstanley
While this is very cool, it has at least one buffer overflow vulnerability.
There is no bounds checking in gettoken().

Also, the talk about pointers being aligned to "8 bit boundaries" I think
means 8 _byte_ boundaries. Memory is not bit-addressable (at least, not in C,
on anything popular).

But I don't mean to detract from the project! It is very cool nonetheless :)
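
For illustration, a bounds-checked tokenizer could look roughly like the sketch below. It is not the article's `gettoken()`: this version reads from a string and takes the buffer and its capacity as parameters, and the signature and return codes are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static int is_space(int c)  { return c == ' ' || c == '\n'; }
static int is_parens(int c) { return c == '(' || c == ')'; }

/* Read one token from *src into out, which holds cap bytes including the
   terminating NUL, and advance *src past the token.
   Returns 1 on success, 0 at end of input, and -1 if the token did not
   fit (it is truncated in out, and skipped in full in the input). */
static int gettoken(const char **src, char *out, size_t cap) {
    const char *p = *src;
    size_t n = 0;
    int status = 1;

    assert(cap >= 2);  /* room for one character plus the NUL */
    while (is_space((unsigned char)*p)) p++;
    if (*p == '\0') { out[0] = '\0'; *src = p; return 0; }

    if (is_parens((unsigned char)*p)) {
        out[n++] = *p++;
    } else {
        while (*p != '\0' && !is_space((unsigned char)*p)
                          && !is_parens((unsigned char)*p)) {
            if (n + 1 < cap) out[n++] = *p;  /* always leave room for NUL */
            else status = -1;                /* too long: drop the rest */
            p++;
        }
    }
    out[n] = '\0';
    *src = p;
    return status;
}
```

Reporting the overlong token through the return value, rather than silently truncating, lets the caller decide whether to stop with an error.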

------
BMorearty
That's fun. Here's Jisp, my Lisp implementation in a tiny bit of JavaScript.
It supports declaring functions, interop with JavaScript, quoting a list,
variable arg lists, and more:
[http://www.ducklet.com/jisp/](http://www.ducklet.com/jisp/).

~~~
TimTheTinker
Have you taken a look at ClojureScript? You might find it interesting.

------
lisper
Lisp with full lexical closures in ~100 lines of Python:

[http://www.flownet.com/ron/lisp/l.py](http://www.flownet.com/ron/lisp/l.py)

The interpreter itself is 48 lines.

------
raphlinus
See also Ben Lynn's implementation[1] in about 100 lines of Haskell.
Admittedly it's not a totally fair comparison because it's relying on
Haskell's runtime, but it's still an excellent demonstration of Haskell's
power and expressiveness compared with a lower level language.

[1]:
[https://crypto.stanford.edu/~blynn/lambda/lisp.html](https://crypto.stanford.edu/~blynn/lambda/lisp.html)

~~~
vog
To be fair, it is not just that Haskell is a "high-level language", but that
it is a language in the ML family of languages. ML means "Meta Language" and
was specifically designed to be used to implement (other) languages.

------
mweibel
Awesome article, thanks for that. As someone who almost never even looks at C
code, this was very understandable with the inline comments.

What's the reason for using macros instead of real functions? Is this an
optimization because macros get inlined at compile time? Does this really
bring a lot of value?

~~~
klmr
> Does this really bring a lot of value?

In this particular case, none whatsoever. It’s egregious abuse of macros.

------
agrafix
This motivated me to hack together a Haskell implementation, but with a little
better error handling :)
[https://github.com/agrafix/micro-lisp](https://github.com/agrafix/micro-lisp)

------
mar77i
Things that bug me: casts to (long) when they should use intptr_t.

EDIT: And gettoken() should check against buffer: index < sizeof token.

EDIT 2: And I'd store the tag in a separate variable, because bit abuse in a
pointer is plain and simply asking for problems.
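
A sketch of both options, with hypothetical type and function names (`Pair`, `tag_pair`, `untag`, `Obj`) rather than the ones in the article. The first half keeps the one-bit tag but goes through `uintptr_t`; the second stores the tag in its own field. The pointer-to-integer round trip is implementation-defined, and the low bit is only free because the pointed-to struct is aligned to more than one byte.

```c
#include <assert.h>
#include <stdint.h>

/* Option 1: one-bit pointer tag, using uintptr_t instead of long. */
typedef struct Pair { void *car, *cdr; } Pair;

static void *tag_pair(Pair *p) {
    assert(((uintptr_t)p & 1u) == 0);  /* low bit must be free */
    return (void *)((uintptr_t)p | 1u);
}
static int is_pair(const void *v) { return ((uintptr_t)v & 1u) != 0; }
static Pair *untag(void *v) { return (Pair *)((uintptr_t)v & ~(uintptr_t)1); }

/* Option 2: the tag lives in its own field; no bit tricks at all. */
typedef enum { OBJ_ATOM, OBJ_PAIR } ObjKind;

typedef struct Obj {
    ObjKind kind;
    union {
        const char *atom;                    /* valid when kind == OBJ_ATOM */
        struct { struct Obj *car, *cdr; } pair;  /* when kind == OBJ_PAIR */
    } as;
} Obj;
```

Option 2 costs an extra word per object but stays within what the C standard guarantees, which is the trade-off being argued about here.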

------
woadwarrior01
Reminds me of this book[1] I’d read a couple of years ago.

[1]: [http://www.buildyourownlisp.com/](http://www.buildyourownlisp.com/)

------
Sir_Cmpwn
Oof, this is full to the brim of bad C practices. Use macros way less
liberally, don't do this ridiculous pointer tagging thing, and leverage more
of the stdlib.

------
petethepig
For everyone interested in learning about these types of things, check out
Make-A-Lisp project [0]. There are more lines of code, but also more features.
The guide is awesome and split into several self-contained steps.

[0]
[https://github.com/kanaka/mal/blob/master/process/guide.md](https://github.com/kanaka/mal/blob/master/process/guide.md)

------
martyalain
You could also have a look at this essay,
[http://lambdaway.free.fr/workshop/?view=lambdacode](http://lambdaway.free.fr/workshop/?view=lambdacode),
following Peter Norvig's lispy, written in less than 300 lines of plain
Javascript and working in any web browser.

Your opinion is welcome. Alain Marty

------
raldi
Toward the end of `print_obj()`, we see:

    
    
            if (is_pair(cdr(ob))) {
              printf(" ");
              print_obj(cdr(ob), 0);
            }
    

How could this `if` statement ever evaluate to false? We already verified that
`cdr(ob) != 0`, and the CDR can never be a plain old string, so isn't this
`if` superfluous?

~~~
rurban
Nope. It still can be an atom. Only if it's a pair (i.e. a cons with two
cells, not just one) it prints the next.

~~~
raldi
Even if that were possible (and with this codebase, I don't think it is), I
don't see any code in this function that would print the atom in such a case.
It seems like it would just print nothing.

~~~
rurban
Look harder, it does. There are only cons (pair) and atoms.

~~~
raldi
The author just confirmed, the if-statement does nothing:
[https://github.com/carld/micro-lisp/issues/9](https://github.com/carld/micro-lisp/issues/9)

------
rurban
In eval, the dynamic intern("if") calls (and likewise for all the other
interned symbols) should be moved to compile-time global storage of those
interns. Otherwise it looks good. In reality one would use more tag bits, not
just one. Typically 3.
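
One way that could look, as a sketch: the symbols for the special forms are static objects already linked into the intern table, so eval compares pointers instead of calling `intern()` each time. The `Symbol` layout, the list-based table, and `is_special_form` are assumptions for illustration, not the article's data structures.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct Symbol {
    const char *name;
    struct Symbol *next;  /* intern table kept as a simple linked list */
} Symbol;

/* Special-form symbols live in static storage and are linked into the
   table by their initialisers, so no lookup is needed at run time. */
static Symbol sym_quote  = { "quote",  NULL };
static Symbol sym_if     = { "if",     &sym_quote };
static Symbol sym_lambda = { "lambda", &sym_if };
static Symbol *symbols = &sym_lambda;

/* Return the unique Symbol for name, creating it on first use.
   Symbols are never freed: they live as long as the interpreter. */
static Symbol *intern(const char *name) {
    assert(name != NULL);
    for (Symbol *s = symbols; s != NULL; s = s->next)
        if (strcmp(s->name, name) == 0) return s;
    Symbol *fresh = malloc(sizeof *fresh);
    char *copy = malloc(strlen(name) + 1);
    if (fresh == NULL || copy == NULL) abort();
    strcpy(copy, name);
    fresh->name = copy;
    fresh->next = symbols;
    symbols = fresh;
    return fresh;
}

/* What eval would do: a pointer comparison, no string work. */
static int is_special_form(const Symbol *head) {
    return head == &sym_quote || head == &sym_if || head == &sym_lambda;
}
```

The reader still calls `intern()` for every token it sees, but that happens once per token at parse time rather than on every evaluation.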

------
oneplane
But what about the other way around? Would it be possible to get a C compiler
written in Lisp in ~200 lines as well? I mean, not a linker and macro
processor etc, just the compiler to get objects.

------
sigjuice
Nice project that I will spend some time studying. However, _fewer than 200
lines_ is not really a virtue, IMHO.

------
asveikau
No calls to free() to match calloc(), no overflow checking in gettoken() and
then I read this:

> a program with missing or unmatched parenthesis, unresolved symbols, etc
> will likely just result in something like a segmentation fault.

I understand this is just a fun thing to hack on, but this is an irresponsible
way to write software. I hope no one here is reading this and thinking it's
how they should be writing C.

~~~
ac2u
The author isn't responsible if people read it and think it's how they should
be writing C, especially when the objective of writing lisp in as little C as
possible is communicated.

So why is it fair to label it irresponsible?

~~~
chii
it's not irresponsible, but copy/paste proliferation does exist, and happens
surprisingly often with code posted online like this. When you write code for
posting online, you'd want to consider this problem.

~~~
sillysaurus3
If you have to worry about the entire world whenever you do anything, you'll
never get anything done.

------
Kenji
If Lisp is so easy to implement, is it also easy to learn? (I am familiar with
Haskell)

------
Areading314
This is more of historical interest than of technological interest...

~~~
ColinWright
I think you are mistaken. There is much to learn from implementations such as
this, and the techniques used form the basis of much more complex systems. The
technology remains very relevant.

FWIW, I didn't downvote you, as I don't downvote someone simply because I
believe they're wrong, or because I disagree with them.

~~~
sparkie
While I agree that it's not just historical interest - there's still much that
can be learned from Lisp - this implementation is certainly not the place to
learn it. There's no garbage collection, which is really the biggest concern
of a proper lisp implementation in a language like C. Without GC, it's a
gimmick implementation. There's no practical use for it.

~~~
ColinWright
There is much to be learned from incomplete, toy implementations. Not least,
the interested reader can see if they can extend the existing implementation
to include the missing features.

Insisting that learners/students should only ever read and study complete,
perfect implementations is, I think, a mistake. I've learned a great deal from
studying, and subsequently improving and extending, implementations that are
imperfect and incomplete.

