Hacker News new | comments | show | ask | jobs | submit login
Lisp in fewer than 200 lines of C (carld.github.io)
443 points by jfo 10 months ago | hide | past | web | favorite | 102 comments



Writing your own Lisp-ette is a brilliant evening or weekend project, regardless of the language. It's some of the simplest non-toy parsing you can attempt, a bit of light data structure work, and understanding eval/apply is 80% of the work in implementing it. I would highly recommend anyone to have a go, and try not to follow existing code too closely: figure out the problems in your language of choice.

The post identifies some of it's own weaknesses (memory handling, errors), which are quite C specific. Or at least easier to handle in other languages, where you can punt those issues to the host language runtime. But it will be a fun extension to fix them (a job for the second evening / weekend of coding ;) )

But, imho, the beauty of writing a Lisp is that there are a bunch of things you can do from there, some more difficult, but several are achievable step-by-step in a day or a few days each. I'd first add a few special forms more than the OP (quote, if, progn, math operations), then my suggestions:

1. Defining names (if you haven't already), both let and define special forms.

2. Lambdas.

3. Tail call optimisation (I suggest this not because it's an optimisation, this Lisp doesn't need optimising, but because TCO is a bite-sized extension.)

4. Lexical scoping of lambdas.

5. Continuations. call/cc

6. And if you're really brave (or skilled, or just masochistic), macros.

I was encouraged to do this as a new grad student, and it was one of the most fun and educational experiences I remember. I didn't get as far as macros back then, but implementing call/cc was a definite pivot point in my programming competence.


> Writing your own Lisp-ette is a brilliant evening or weekend project, regardless of the language.

Just be careful about scope creep. I started one as a weekend project five years ago and I'm still not finished ;)


I found none of these particularly difficult. (I've never attempted 3. or 5. though.) Functional values, however, are difficult. They work effortlessly in interpreted code, but in compiled code you're faced with the upwards funarg problem: https://en.wikipedia.org/wiki/Funarg_problem#Upwards_funarg_...

A solution is needed if you want lazy evaluation.


> They work effortlessly in interpreted code, but in compiled code you're faced with the upwards funarg problem

No it's not about interpreted or compiled. It's about using the native stack or a heap for stack frames. Compiled code and interpreters can both use either, so that's orthogonal.


Yes, to solve the upwards funarg problem you can't rely on a single stack. The problem is that using the heap for everything instead comes with a heavy computational cost, and using the heap only when you need it means you need to identify those circumstances, which isn't trivial.


How did you do lambdas and lexical scoping without 'functional values'?


That's explained in https://en.wikipedia.org/wiki/Funarg_problem#Downwards_funar...

In addition to return address and dynamic link, you need to store a static link in each stack frame.


Presumably by implementing only the simpler "downwards" funargs: the ability to pass functions into a function scope but never return them.


5 is do-able by an undergrad, though the first time you encounter the concept it can be difficult to grapple how you would implement it.


If you go with (hygenic) fexprs instead of macros, they're both easier to implement and more expressive, at the expense of performance.

Check out John Shutt's thesis for more information on hygenic fexprs: https://web.wpi.edu/Pubs/ETD/Available/etd-090110-124904/unr...


Here's a small implementation of Kernel in around 700 lines of Haskell [https://github.com/vito/hummus/tree/master/src/Hummus]. Note that it's not Kernel proper, as it uses delimited continuations instead of the guarded undelimited continuations in the Kernel report, plus there's a few other differences and some bugs.

Anyone looking for a more complete implementation, check out klisp [http://klisp.org]. It's not 100%, but has some stuff not included in the Kernel report, some of it lifted from R6RS. The main thing it's missing which it desperately needs is an FFI.


On the subject of Haskell, what's the shortest Haskell (possibly Haskell subset) interpreter anyone has written?


Thanks for the tip. If it wasn't the end of the weekend, I'd be tempted to break out emacs.


Another fun one is a Forth interpreter. I tried to make one in Rust once, I can't say I actually succeeded but it's fun to tinker with. It is simple enough that you can probably write a bad one in assembly without that much assembly knowledge.


Or Brainfuck. It's fun and only takes half an hour or so.


Not sure why you're downvoted (people don't like swear words?). As a "my first interpreter" project Brainfuck is really quick, fun and could motivate you to explore the topic further.


Agree.

I'm writing Scheme R5RS in Kotlin (https://github.com/kovrik/scheme-in-kotlin) and have implemented everything except macros (6).

Have no idea how to beat them.


Just think of them as defining functions that take s expressions in and spit s expressions out before “normal” runtime evaluation and which hence only see built in symbols.


Yes, I get that (in general).

But then there are things like hygiene, performance and some tricky edge-cases.

And I couldn't find any standard (and simple) algorithm to implement macros (preferably written in something other than Scheme itself).

Still trying to wrap my head around.


"This paper describes a modified form of Kohlbecker’s algo- rithm for reliably hygienic (capture-free) macro expansion in block-structured languages, where macros are source-to- source transformations specified using a high-level pattern language."

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.464...


Lexical scoping is essential if you want to use the Lisp for anything essential. If you start with dynamic scope for everything and the system develops any real complexity, then switching to lexical scoping becomes very painful. Best to do it right the first time.


7 typing and Hindley-Milner inference


After reading Peter Norvig's post about a lisp implementation [0], I decided to write one in Python.

I got something working in 65 lines [1]

[0] http://norvig.com/lispy.html

[1] https://gist.github.com/jmikkola/b7c6c644dff1c07891c698f0a52...


Thanks for this, I recently did the same and it's nice to have others work to compare to.


If you like this, you might like Lisp interpreter written in assembly in a single file. It is one of the best commented code ever written imo.

https://github.com/marcpaq/arpilisp


Another small lisp implementation: http://piumarta.com/software/lysp/

Also, another, more interesting Lisp by the same author: (http://piumarta.com/software/maru/). Maru is basically a lisp where some of the core functions like eval and apply are extensible from user code. There's basically a global map of types to evaluators and applicators, with some functions for you to register your own types and get the evaluation behavior you want.


It seems like arpilisp is inspired by jonesforth[0]. Although not directly stated, the style is similar and the acknowledgements mentions Richard Jones. Anyone interested in implementing simple programming languages might also want to take a look at jonesforth.

[0]: https://github.com/nornagon/jonesforth


Arpilisp author here.

Yes, jonesforth definitely inspired and influenced me; that's why Jones is first in the Acknowledgements section.

If you're curious, I keep a list of single-file implementations of programming languages (including jonesforth):

https://github.com/marcpaq/b1fipl


Someone once suggested to me the easiest way to "bootstrap the world" is to write a Forth implementation in assembly, and then write your Lisp in Forth.


as there is a project aiming to "bootstrap the world" https://savannah.nongnu.org/projects/stage0/ and they hand wrote a FORTH http://git.savannah.nongnu.org/cgit/stage0.git/tree/stage2/f... and a compacting garbage collecting lisp http://git.savannah.nongnu.org/cgit/stage0.git/tree/stage2/l...

and their opinion was that it was easier to do the lisp than it was to do the FORTH. Although right now they are trying to improve their C Compiler prototype https://github.com/oriansj/M2-Planet before they convert it to assembly


Since this is written in assembly is it much faster than a C version since this one can manage its own stack frames and stack variables and such? I always imagined that’s the case and that a lisp implemented fully in assembly would be the trick to a super fast lisp that can complete with Go.


An interpreter written in assembly is still an interpreter. Yes, it may have control over stack frames. But so does a compiler, and the code generated by a compiler doesn't incur interpretation overhead. If you want performance, compliation is the way to go. And that's what fast Lisps do.


Arpilisp author here.

If you were to implement the same Lisp in C then compare, then maybe the assembly variant would be faster for the reasons you mention. Or maybe not.

Also, I modeled arpilisp after the original Lisp. That's barely a first step, and possibly the wrong first step, for anything non-trivial, including applications requiring a "super fast lisp that can compete with Go."

But I didn't write arpilisp for performance. I wrote it to learn and share. Enjoy!


LISP was created a long time ago. Assembly was the weapon of choice. Thinking Machines, Symbolics and Macsyma might already say: We did that. But, uh, no.


The first LISP implementation was famously done in machine code, not assembly.


Oof, all the macros are broken:

    #define is_space(x)  (x == ' ' || x == '\n')
    #define is_parens(x) (x == '(' || x == ')')
Should be

    #define is_space(x)  ((x) == ' ' || (x) == '\n')
    #define is_parens(x) ((x) == '(' || (x) == ')')
Probably doesn’t matter in practice for this. It could end up being a nasty source of bug later on in the project.


That still evaluates x twice, which can also be a source of bugs. I usually take one of two approaches: either decide that this is a weekend hack and using the macros whenever the expansion isn't obvious in my head is a sign of too much complexity, or use this GCC extension:

    #define is_space(x) ({ typeof(x) y = x; y == ' ' || y = '\n'; })
(Or in this case, turn it into an actual function and let the compiler figure out optimization.)


GCC extensions make my brain hurt :( Should just use C++ at that point:

    template<class T>
    bool is_space(const T & x) {
        return x == ‘ ‘ || x == ‘\n’;
    }


Even better if you make it:

    template<class T>
    constexpr bool is_space(const T & x) {
        return x == ' ' || x == '\n';
    }
Debuggable, type safe and same performance as straight C code.


Constexpr is not necessary. Only required if you want to ensure that a variable is initialized with a pre-computed value. The compiler will optimize in either case if it can, regardless of the constexpr attribute.


> Only required if you want to ensure that a variable is initialized with a pre-computed value

Yep.


And throw an `inline` in there just to be more likely to end up with something macro-like.


`constexpr` is implicitly `inline`: http://en.cppreference.com/w/cpp/language/inline


orthogonally, templates are also implicitly inline.

Then again, inline does not mean what most people think.


“inline” is more about linkage than about actual inlining. The compiler will in-line regardless of that attribute.


The function can be trivially implemented in C, since you don’t need a generic input argument type. No idea why OP chose macros in the first place.


I'd argue that is_space() should be isspace(), the standard from <ctypes.h>. Pretty sure it wouldn't make the semantics of the language worse, but it's one less wheel re-invented and makes the code a smidgen easier to read since there's one less concept to learn in it. Also one less thing to debug ...


Heh. Make that <ctype.h>, of course. D'oh. :)


If you like short interpreter implementations, you might like this: http://code.jsoftware.com/wiki/Essays/Incunabulum


or this interpreter of a lazy lambda calculus based language in 25 lines of (obfuscated) C:

http://www.ioccc.org/2012/tromp/tromp.c

http://www.ioccc.org/2012/tromp/hint.html


I love how it demonstrates how recursion/loops and shape/dimension based programming can be both thin, fast and generic


Really cool! A more complete implementation in 1000 lines of C for getting started: http://www.buildyourownlisp.com


Can't recommend it enough, I learned a lot from this tutorial. I hope to go back and add a lot more languages features.


That's fun. Here's Jisp, my Lisp implementation in a tiny bit of JavaScript. It supports declaring functions, interop with JavaScript, quoting a list, variable arg lists, and more: http://www.ducklet.com/jisp/.


Have you taken a look at ClojureScript? You might find it interesting.


Lisp with full lexical closures in ~100 lines of Python:

http://www.flownet.com/ron/lisp/l.py

The interpreter itself is 48 lines.


While this is very cool, it has at least one buffer overflow vulnerability. There is no bounds checking in gettoken().

Also, the talk about pointers being aligned to "8 bit boundaries" I think means 8 byte boundaries. Memory is not bit-addressable (at least, not in C, on anything popular).

But I don't mean to detract from the project! It is very cool nonetheless :)


See also Ben Lynn's implementation[1] in about 100 lines of Haskell. Admittedly it's not a totally fair comparison because it's relying on Haskell's runtime, but it's still an excellent demonstration of Haskell's power and expressiveness compared with a lower level language.

[1]: https://crypto.stanford.edu/~blynn/lambda/lisp.html


To be fair, it is not just that Haskell is a "high-level language", but that it is a language in the ML family of languages. ML means "Meta Language" and was specifically designed to be used to implement (other) languages.


Awesome article, thanks for that. As someone who almost never even looks at C code, this was very understandable with the inline comments.

What's the reason for using macros instead of real functions? Is this an optimization because macros get inlined at compile time? Does this really bring a lot of value?


> Does this really bring a lot of value?

In this particular case, none whatsoever. It’s egregious abuse of macros.


This motivated me to hack together a Haskell implementation, but with a little better error handling :) https://github.com/agrafix/micro-lisp


Things that bugs me: cast to (long) when they should use intptr_t.

EDIT: And gettoken() should check against buffer: index < sizeof token.

EDIT 2: And I'd store the tag in a separate variable, because bit abuse in a pointer is plain and simply asking for problems.


Reminds of this book[1] I’d read a couple of years ago.

[1]: http://www.buildyourownlisp.com/


For everyone interested in learning about these types of things, check out Make-A-Lisp project [0]. There's more lines of code, but also more features. The guide is awesome and split into several self-contained steps.

[0] https://github.com/kanaka/mal/blob/master/process/guide.md


You could also have a look at this essay, http://lambdaway.free.fr/workshop/?view=lambdacode, following Peter Norvig's lispy, written in less than 300 lines of plain Javascript and working in any web browser.

Your opinion is welcome. Alain Marty


Toward the end of `print_obj()`, we see:

        if (is_pair(cdr(ob))) {
          printf(" ");
          print_obj(cdr(ob), 0);
        }
How could this `if` statement ever evaluate to false? We already verified that `cdr(ob) != 0`, and the CDR can never be a plain old string, so isn't this `if` superfluous?


I don't know about this implementation in particular, but in general a cons cell is just an object with two slots in it. You can use it to represent a list (by putting another cons or nil in the cdr), but you can also use it to represent other things. For example, a pair, by putting an arbitrary object in the car and cdr. This is a pretty common technique, see "a-list" for one application.


Nope. It still can be an atom. Only if it's a pair (i.e. a cons with two cells, not just one) it prints the next.


Even if that were possible (and with this codebase, I don't think it is), I don't see any code in this function that would print the atom in such a case. It seems like it would just print nothing.


Look harder, it does. There are only cons (pair) and atoms.


The author just confirmed, the if-statement does nothing: https://github.com/carld/micro-lisp/issues/9


Can you give me an example program that, when fed as input, would cause this if-statement to evaluate to false?


In the eval the dynamic intern("if") ... for all interned symbols should be moved to compile-time global storage of those interns. Otherwise it looks good. In reality one would use more tag bits, no just one. Typically 3.


But what about the other way around? Would it be possible to get a C compiler written in Lisp in ~200 lines as well? I mean, not a linker and macro processor etc, just the compiler to get objects.


Oof, this is full to the brim of bad C practices. Use macros way less liberally, don't do this ridiculous pointer tagging thing, and leverage more of the stdlib.


Nice project that I will spend some time studying. However, fewer than 200 lines is not really a virtue, IMHO.


No calls to free() to match calloc(), no overflow checking in gettoken() and then I read this:

> a program with missing or unmatched parenthesis, unresolved symbols, etc will likely just result in something like a segmentation fault.

I understand this is just a fun thing to hack on, but this is an irresponsible way to write software. I hope no one here is reading this and thinking it's how they should be writing C.


It's a lot longer if it's done properly. You should use an array of dotted pairs to store your list elements, out of which you build a free list, from which you allocate by calling cons. To garbage collect, you mark all objects in use (on the stack, or accessible from the symbol table) and then you add the unmarked conses to the free list. You should not be using malloc or calloc at all, certainly not to implement cons.

Below is some of the relevant code in C/C++ from the virtual machine of Emblem (the Lisp dialect I'm using to implement inter alia my visual dataflow language, Full Metal Jacket). To keep things short I haven't included initialization or the garbage collector. cons_op takes the top two stack entries (one on the stack, the other in tos), conses them, and returns it in tos.

-------------------

    typedef struct ListStruct {
        void *head;
        void *tail;
    } *List;

    static struct ListStruct consTable[NUM_CONSES];

    static List freeStore;

    #define hd(X) (((List)(X))->head)
    #define tl(X) (((List)(X))->tail)
    #define NIL (void *)&consTable[0]
    #define null(X) ((X) == NIL)

    static void **stack;
    static void *tos; /* Top of stack register. */
    static long sp;

    static void cons_op()
    {
        List x;

        if (null(freeStore)) { cons_gc(); }
        x = freeStore;
        freeStore = (List)tl(freeStore);
        hd(x) = stack[sp++];
        tl(x) = tos;
        tos = x;
    }
----------------------

Alternatively, instead of C you could use a garbage collected language such as Java, C#, or Go, and then not have to worry about memory management.


The author isn't responsible if people read it and think it's how they should be writing C, especially when the objective of writing lisp in as little C as possible is communicated.

So why is it fair to label it irresponsible?


I would add that there is already hundreds of implementations of Lisp in C and hundreds of little tutorials that are extremely similar to this online anyway, it's not like one more is going to make a difference.

Yes, his C code could be a lot better, but at the end of the day it really doesn't matter and nobody is going to be using this for anything important.


> The author isn't responsible if people read it and think it's how they should be writing C,

Seems we disagree on the basic premise then. If someone learns the wrong thing based on my example I view myself as responsible. (I have not been a perfect example all the time either. We owe it to ourselves and others to always improve on that front.)


it's not irresponsible, but copy/paste proliferation does exist, and happens surprisingly often with code posted online like this. When you write code for posting online, you'd want to consider this problem.


If you have to worry about the entire world whenever you do anything, you'll never get anything done.


If people always wrote software responsibly we'd have a great deal fewer interesting projects on GitHub


Would it really be so hard to do this exact same thing in a memory safe language? Would that stifle the author's creativity in any way?


People often do what they love with the tools they love. Complaining that someone wrote a hobby project with C is akin to complaining that your neighbor made a decorative baseball bat with her lathe instead of a 3D printer.


Writing Lisp in 200 lines of Haskell wouldn’t be nearly as interesting. The whole point is (presumably) that it’s fun to code golf in C because it’s not very expressive. Relax, no one is going to use the OP’s code to run a pacemaker.


"but this is an irresponsible way to write software"

Segfault is a totally safe way to terminate a process on a modern desktop os. Besides - there is no 'correct' way to write software.


Sorry, this is wrong. A segfault means you got lucky and your bad memory access hit an unmapped or otherwise invalid page. Other times the program will keep running with incorrect results. Many such issues represent exploitable bugs.

In any case it's a bad issue and should not be a normal failure mode for a syntax error.


"Other times the program will keep running with incorrect results. Many such issues represent exploitable bugs."

None of which matter if it's a prototype, running on a developers machine.

If you want memory safety you run the program through valgrind anyway with a large input dataset. And write unit tests. And integration tests. And so on.

The baggage of production quality software development environment is so high it easily stifles the joy of quick and dirty prototypes.

The main use of prototype is to facilitate understanding. This is the most critical constraint, whose needs drive over anything else.

Besides, who on earth is going to exploit a few hundreds of lines of code a developer runs on his or her own machine?


I don't know if you noticed this but C hackers are a minority on here. Thus it's not unreasonable to think people get an impression of how C is done from submissions here. We owe it to the craft to serve as good examples. It's not as burdensome as you suggest.


I see your point, but respectfully disagree that an incomplete implementation should stop a submitter from sharing his or her discoveries. This is an aesthetic position - I think interesting ideas are more important than clean implementations (It is left as an exercise to the reader... is an ok position, IMO).


I think people were referring to the free-free approach where you just quit a short lived program without deallocating anything.


If Lisp is so easy so implement, is it also easy to learn? (I am familiar with Haskell)


[dead]


somebody's machine learning algorithms need better training.


Wrong thread?


Look at "denisehilton"'s post history, it's all spam


This is more of historical interest than of technological interest...


I think you are mistaken. There is much to learn from implementations such as this, and the techniques used form the basis of much more complex systems. The technology remains very relevant.

FWIW, I didn't downvote you, as I don't downvote someone simply because I believe they're wrong, or because I disagree with them.


While I agree that it's not just historical interest - there's still much that can be learned from Lisp - this implementation is certainly not the place to learn it. There's no garbage collection, which is really the biggest concern of a proper lisp implementation in a language like C. Without GC, it's a gimmick implementation. There's no practical use for it.


There is much to be learned from incomplete, toy implementations. Not least, the interested reader can see if they can extend the existing implementation to include the missing features.

Insisting that learners/students should only ever read and study complete, perfect implementations is, I think, a mistake. I've learned a great deal from studying, and subsequently improving and extending, implementations that are imperfect and incomplete.




Applications are open for YC Winter 2019

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: