Hacker News new | past | comments | ask | show | jobs | submit login
Ask HN: What does a developer need to know to build their own Lisp from scratch?
81 points by stolen_biscuit 34 days ago | hide | past | favorite | 66 comments
Assuming the developer is competent in their language of choice and want to implement lisp in that language. What's at the heart of lisp that makes it so simple and elegant?

This isn't a traditional approach, but you might consider using a JSON parser in your favorite language and then implementing `eval` and `apply` over that data structure [0]. It saves you from dealing with the mundane lexer/parser stuff and lets you dive straight into an actual interpreter:

    ["define", ["factorial", "n"],
        ["if", ["=", "n", 0], 1,
             ["*", "n", ["factorial", ["-", "n", 1]]]]]

    ["display", ["factorial", 10]]
Here, JSON strings are symbols (identifiers). In a language like Python or JavaScript, you wouldn't even need to use the JSON parser. Just skip straight to doing eval and apply over the list/Array, and you can use dict/Object as your environment to hold definitions. You could be up and running in a few hours, and I think that would feel satisfying.

Then again, maybe that's too ugly to consider. :-)

In addition to some of the other book and web site recommendations people have made (SICP, MaL, etc...), I think "Lisp in Small Pieces" by Christian Queinnec is really good if you want to take it further.

[0] https://wiki.c2.com/?EvalApply

A lisp parser is trivial! Don't live with that ugly syntax to save a page of code. It'd be very demotivating.

It's trivial for you, but are you sure you've got the OPs best interest at heart? Remember, you're trying to help someone else here.

I think the magic is in getting the interpreter to work, and depending on the personality, that might be what's motivating.

It's really trivial for everyone. To start with you only need three classes of characters: parentheses, whitespace, and everything else defines a symbol.

I bought the dragon book long ago. I got about halfway through and it was still on parsing! What a waste. That's not a problem for s-expressions.

> I bought the dragon book long ago.

That makes me think you might've forgotten what it's like to be a beginner. Imagine teaching a high school student and that every other word you're using doesn't make complete sense to them yet.

> That makes me think you might've forgotten what it's like to be a beginner.

Let's put things in perspective: the parser bit of putting together your own lisp is by far the easiest task of all.

Complaining that parsing lisp is hard to a beginner but everything else is accessible makes no sense at all. I mean, a parser for s-expressions requires what? Five or six terminal tokens and three or four non-terminal tokens? That's less than 50loc.

Assuming the developer is competent in their language of choice should rule out anyone who can't write a Lisp parser in their language of choice.

If you're not undertaking this to learn how to write a Lisp, which includes writing a parser, what's the point? Get this tribe 3 shit out of here...

> Assuming the developer is competent in their language of choice should rule out anyone who can't write a Lisp parser in their language of choice.

I didn’t write a s-expr parser for the minischeme interpreter I like to play around with but I did use re2c/lemon to generate a parser. Then I got Unicode working…

Couldn’t even imagine what it would take to hand write a lexer that does Unicode properly, even the scheme string and char functions call into generated functions (and a library for utf8 strings) since that stuff is complex.

Also don’t know what tribe 3 is, one of those things the cool kids talk about?

imagine thinking the difference between parsing a string and using a pre-parsed datastructure was "trivial for everyone" because you could count off how many classes of character were involved

one major difference is that you don't have to write the parser at all the other way

> It's trivial for you, but are you sure you've got the OPs best interest at heart?

It's trivial to anyone.

Furthermore, there are already s-expression parsers out there.

Suggesting JSON is simply poor, ill-thought out advice.

I had some trouble fitting strings into such a language because of using JSON strings for both identifiers and LISP strings. Then again, even being stuck with just numbers, this would be completely sufficient to bootstrap a parser to be able to use nicer syntax.

Yeah, I agree that part is kind of a bummer. It's really just a compromise to get up and running quickly. And it's not necessarily the best approach for everyone.

> JSON strings are symbols (identifiers)

Wishing JSON strings were symbols doesn't make them so.

In Lisp, strings and symbols have different type. If you have a symbol, since it is not a string, it is not equal to any string in the system.

On the other hand, two different symbols which are unequal can have the same name. This is if at least one of them is uninterned or, in a Lisp which has symbol packages, they are interned in different packages.

Symbols are compared by identity, not by their name.

> Wishing JSON strings were symbols doesn't make them so.

You misunderstand (or GP was unclear). They don't mean that JSON strings are literally symbols, but in this prototype on-the-fly JSON DLS, the JSON strings represent symbols. A lisp string in this proto-lang would be "\"hello world\"" instead.

He didn't misunderstand. He likes starting arguments, and he's really just waiting for an opportunity to tell you about the silly flavor of lisp he invented.

Being helpful or on-topic is not his goal.

This is on topic for implementing a lisp parser, though. Not sure what your point beyond this ad hominem is.

His point is simple. Argument poster was wrong, isn't being helpful, and has a pattern of doing this.

By the by, there was no ad hominem there. Ad hominem is when you insult someone as a way to avoid what they're saying. What this person is saying has not been avoided.

Probably put the fallacies away: https://laurencetennant.com/bonds/bdksucks.html

Saying that "he just wants to tell you about the silly flavor of lisp he invented" isn't exactly constructive either, especially when it's relevant.

Alternatively, if you want slightly better syntax (something like `(display (factorial 10))`, just like a "normal" lisp), use this regex in a function called `tokenize` and you're half-way to making your own lisp:

Stolen from https://github.com/kanaka/mal/blob/master/process/guide.md#s...

I'm assuming that uses back-references to handle the matching parens, but I don't like reading regexes (even my own). It's funny that paren matching used to be an example of why you needed to use a push down automata instead of a regular language.

I like your idea for making a prettier lisp quickly, but depending on the language/library, isn't this going to give the a user a bunch of nested `match` objects instead of nicely nested and printable lists?

No free lunch - I guess you trade one ugliness for another.

> It's funny that paren matching used to be an example of why you needed to use a push down automata instead of a regular language.

Expressions with parentheses are not a regular language; expressions which can match them are not regular expressions, even if they extend regular expression syntax.

"Regular expression" became the name of a software feature, which retained its name as it was extended beyond regular sets.

Just like "web browser" became the name of a kind of program, and that name still sticks even though it's now a monstrous application platform, not just for browsing.

I honestly can't tell if you intentionally take stuff out of context so you can argue against it, or if your reading comprehension and logic skills are really just that bad.

I said, "instead of a regular language". You even quoted it. You don't even read the bits that you cut and paste to argue against.

> expressions which can match them are not regular expressions

And that's the damned point. Modern "regular expressions" aren't just "regular" any more. They've had back references and other extensions for a while now:




Please - don't reply to me until you take the time to read and understand what I've said.

If "paren matching used to be an example of why you needed to use a push down automata instead of a regular language", isn't that intended to say that this is not the case today? If we change "language" to "expression", then I agree and don't have anything to add. The scope of "regular expression" has increased, in informal usage, but (I strongly suspect) "regular language" still means the same thing as before.

GP calls it a tokenizer (aka lexer) so presumably it doesn't match parens, it just returns a flat array of tokens.

That's disappointing. I don't have any stats, but I imagine 90% of beginners writing an interpreter (much less a compiler) start a lexer and parser and quit before getting to the good stuff.

Skipping the tedium until you've got "hello world" and "factorial working" makes sense to me. You can always go fix the syntax later.

If you don't include the parser, "factorial working" is going to be the same evaluating just about any scripting language. You don't see half the value in Lisp's syntax until you're writing the parser.

Here's how I did it, I would say it's solid, approachable, and enjoyable:

Go through the first half of Crafting Interpreters [0]. And then try to complete mal - Make a Lisp [1]. That's it. You'll only need these two.

P.S. People also say good things about Build Your Own Lisp [2], but I didn't finish it because I find spending quite some time writing C doesn't make me feel I'm enjoying something elegant.

[0]: https://craftinginterpreters.com/ [1]: https://github.com/kanaka/mal [2]: https://buildyourownlisp.com/

I can't recommend Build Your Own Lisp - if there's anything particularly "profound" about Lisp, the book misses it by miles. For example, the book claims that adding a new feature requires "syntax, representation, parsing and semantics" [0], whereas one of the nice things about Lisp is that adding most sorts of features doesn't require new syntax or new representations (and "syntax" and "parsing" are fairly redundant).

The example also indicates how the book conflates language and implementation. It's perhaps not the most important thing to know if one is learning how to design and implement programming languages, but unfortunate as CL and Scheme have independent standards; and separating semantics and implementation makes it possible to discuss how differing implementation techniques can be used for the same language. The language implemented also uses dynamic scoping, which is plain weird nowadays.

The C code is of questionable quality, and the author doesn't cover how to improve your chances of writing working C (e.g. using a debugger, testing with sanitizers, etc). This is more of an issue, because the interpreter copies and mutates in a weird way which appears quite easy to mess up, rather than using a garbage collector. The book uses a parser library by the author, so using e.g. the Boehm GC library doesn't seem too bad, but arguably doing your own memory management is more representative of C code that isn't a language implementation. The approach also makes mutation hard to reason about; it's hard to say what a particular update is actually going to affect.

[0] https://buildyourownlisp.com/chapter10_q_expressions

Norvig's Scheme interpreter is a good starting place as well. Using Python as he did someone can do it in a couple of hours. It might take a bit longer if translating to another language at the same time. Someone can get a taste of what's involved and then move on to those more elaborate implementations.


People who don't have a solid couple of years working in Lisp invariably do a cargo-culted botch job of implementing it.

You see issues in their code like confusing the concept of "atom" and "symbol", or symbols just being strings, evaluation level confusions and things of that nature.

There is more than one elegance in Lisp; what people find elegant is subjective. Some people are charmed by a meta-circular interpreter. Some are charmed by interned symbols: two symbols are equal or not based on just some machine word comparison. Some are charmed by the way lists are made of garbage-collected pairs so you can endlessly rearrange lists like pouring water from one container to another, without having to worry about the storage allocation. Some are charmed by being able to quote any part of a program to turn it into a data literal.

None of these things are exactly simple under the hood.

Many elegant aspects in a mdoern Lisp woudn't have been found in the classic Lisp 1 or 1.5. I think it's elegant that a type error (or any other) results in an exception that can be handled. I think unwinding a stack is elegant. Structural pattern matching is elegant. Bignum integers are elegant. File compilation with fine-grained evaluation control is elegant.

Elegant is about UX, and elegance can come off as simplicity (especially if we examine just one area at a time rather than the integrated whole); but underneath the UX there is a lot of huffing and puffing under the hood.

Why is Lisp/Scheme a good candidate for something to implement oneself? Because it has hardly any syntax, so writing the parser is relatively easy compared to writing the parser for other languages.

> What's at the heart of Lisp that makes it so simple and elegant?

In terms of implementing it, probably the Cons Cell. The insight being that the heart of Lisp is pairs, not lists:



The parser for Scheme is simple because it doesn't have reader macros. The parser for common lisp can't really be simple (see [0] describing building a JSON syntax extension to common lisp within common lisp).

If by "lisp" people just mean s-expressions then yes s-expressions are really easy to parse.

[0] https://gist.github.com/chaitanyagupta/9324402

When a neophyte is excited about Lisp, they mean s-expressions and eval. We shouldn't ask them Lisp-1 or Lisp-2, or Scheme? Or reader macros. We should throw them a helping hand to expand their computational universe.

We should always seek to educate the pupil graciously.

At the simplest:

1. Parse and produce S-Expresseions ("read" and "print").

2. Use the object system of the implementation language as much as possible. If necessary, implement some lisp-specific types like the cons pair, empty list, symbol. First get it to work correctly, you can optimise later.

3. Implement all the "special forms" which can't be implemented using macros in the implementation language.

4. Implement any library functions that can't be implemented in lisp itself using the implementation language.

5. Implement "eval" (evaluates S-Expresseions). Now you've got a working interpreter.

6. Use your interpreter to evaluate macros. Use that to implement any additional "special forms".

7. Write any additional library functions in your own lisp.

That's it, at this point you have a working lisp implementation. There are many opportunities for optimisation, from lambda lifting and continuation passing style to hygienic macros and even just in time compilation.

Happy Hacking!

Not much really; I would recommend working through SICP [0], it is fun and in the end you know a lot and can move on from there to even more advanced topics.

Together with the YouTube recordings of the lectures about the book will give you many AHA moments about why it’s so elegant etc.

Including topics like types and a prolog like logic language.

[0] https://en.m.wikipedia.org/wiki/Structure_and_Interpretation...

Calling SICP "not much" must be quite an understatement. The book is 800+ pages (PDF version from Sara Bander). If you really work through all the exercises without checking solutions online, you are probably gonna be busy for a year or so, assuming you still got other things going on in live.

But I agree with recommending it to everyone, who wants to be educated about computer programming and writing beautiful, readable and stable code, as it introduces the reader to many concepts like abstraction barriers/layers and making things more readable by splitting into well named functions and elegant solutions to problems.

Reading part of SICP (the first 40% roughly) and really working on the exercises on my own, without looking up solutions early, was a breaking point in my computer programming self-education, as I learned lots of things, that I did not learn at university, where none such high quality material was used.

I should have split up that sentence I guess. I didn’t actually mean to say sicp is quick (although I’m wouldn’t say a year if you are already a good developer); implementing a scheme is quick to learn and do, however I don’t think you will intuit the elegance from it that way. After (a portion of) sicp, you probably will.

I don't recommend anyone wanting a quick intro to scheme or lisp or even making a scheme or lisp to read SICP.

If you enjoy SICP that's great but if you don't, don't let it stop you.

There's a book for just that.

The title is "Build Your Own Lisp" and it is free to read [0].

You learn C and build your own Lisp dialect.

[0]: https://buildyourownlisp.com/

I gave that a try but because I didn't have any experience with C before, I found it hard to learn both C and how to build my own lisp. Instead "mal" (discussed here: https://news.ycombinator.com/item?id=31547901) makes no assumption about what language you want to use, and instead explains everything conceptually so you can implement your lisp in any language you want.

The repository of mal also features a implementation for basically any type of language for reference, in case you get stuck.

Fun fact: mal originally was made to see if you could implement a lisp with just GNU Make macro language (spoiler: you can!), after the author saw someone else making a lisp language in bash.

Last time I read about the book (I have it and started working through it.), people did not recommend it (here on HN), due to how the author chooses to implement things later on in the book. Instead people recommended looking at MAL ("Make A Lisp") repos.

I started working through the book using Rust instead of C, which allowed me to parse things in a safe way, making sure, that my grammar covers all cases in a typesafe way. If anyone works through the book, I recommend avoiding the use of C, which is used in the book itself.

However, before reading part of the book, I already had watched part of the Minikanren uncourse, where Will Byrd writes a Scheme parser (well, the basic part of it, not implementing the whole standard), explaining everything quite well, which made me confident, that I could change the language I use to implement the things in the book and still get things right.

So my recommendation is to look at the Minikanren Uncourse videos. It should be in the first 4 videos of the series.

Thank you for your comment.

I did not knew about the existence of MAL.

I will check it out.

Moreover, in line of the original question asked, I would just suggest The Little Schemer. Here you not only learn the language and how to think recursively, but you also learn how to implement a lot of the in-built and library functions that comes with languages.

The Little Schemer was a life changing book for me.

I would also definitely recommend The Little Schemer. Big fan of the 10 rules here : ) It might not be the easiest to understand in some parts though.

the classic introduction to building a lisp (interpreter) in different programming languages is:


This is the best resource I've found as well, for creating a lisp from scratch. Thanks to mal, I created my own lisp in JavaScript in just a week of coding in my free time! Granted, I write Clojure every day for the last years, but I don't think it's really required, but certainly made it easier.

mal has also been "featured" on HN a couple of times in the past. Here are some discussions:

- 2015 - https://news.ycombinator.com/item?id=9121448

- 2017 - https://news.ycombinator.com/item?id=15226110

- 2019 - https://news.ycombinator.com/item?id=21670442

- 2021 - https://news.ycombinator.com/item?id=26924344

According to the schedule, we're bound for another frontpage feature of mal in 2023, looking forward to it :)

See how to fit a Lisp system into a boot sector of a floppy disk:


See also 'LISP from Nothing', a Nils Holm book:


All of Nils Holm's books are gems. Extremely approachable and fun. Both Crafting Interpreters and Ball's books owe a debt of gratitude to Holm.

Before Holm, there was https://compilers.iecc.com/crenshaw/

Look into “Make a Lisp” (MAL). Language agnostic step by step instructions for writing a fully featured interpreter, including tail recursion and reader macro support. Plus, examples in every language. It’s on github.

> It's on github

... at https://github.com/kanaka/mal

Fun fact: I got to work with Joel at a Clojure-based startup, LonoCloud (since acquired by ViaSat). Super sharp dude, and very helpful about MAL. Definitely recommend.

> What's at the heart of lisp that makes it so simple and elegant?

IMO, it's a synergy of several things. Trivial parsing*, homoiconicity, automatic memory management, dynamic types, first class symbols, first class everything†, extensible everything†, heavy reliance on anonymous functions, flexible scope rules.

All of those have since been adopted by other languages, but only Lisps have them all working together. (And not every Lisp has every feature.)

* ... until you consider various extensions

† almost

Oh, and I second the recommendation for Peter Norvig's Lispy. https://norvig.com/lispy.html

Or, if you want to do a low-level implementation, Quiennec's Lisp in Small Pieces.

Pyramid[1] was based on the last chapter of SICP, was executable after a couple weeks, and I had never written any lisp before(outside of a couple hours playing with Scheme once).

Most of that time was the code generator. An interpreter is much simpler and you could have something similar in a couple hours. Especially if you use a JSON parser to start, and switch to S-expressions once you get everything working. Or use an existing Lisp with a (read) primitive that knows the syntax.

As for "why Lisp?", it's a bit like how you never need to read the JSON spec to write JSON in practice, because they unified the syntax into a few core datatypes(primitives, arrays, objects). Besides the simple syntax, they also include a symbol table core concept that tells you how to interpret the minimal syntax.

In most languages, it's a lot of work adding a new feature to the language: Lexer to recognize token, backtracking parser, adding a new internal type, etc. In a Lisp interpreter, you just edit the symbol table to know about a new form and you're done.

Despite the core being simple, a practical Lisp environment is going to have thousands of primitives with a variety of effects on the runtime, and you'll need to look them up. Racket has about 1700[2] implemented in Chez Scheme, for example. But at least they're "only primitives", so there's uniformity in how they're used. Nothing like "What's the syntax for an anonymous function that takes 1 callback as argument that captures a local by reference and 3 other variables by value?"

[1] https://www.michaelburge.us/2017/11/28/write-your-next-ether...

[2] https://github.com/racket/racket/blob/master/racket/src/cs/p...

If by "lisp from scratch" you mean an s-expression interpreter that can run programs like recursive exponentiation or recursive fibonnaci, very little. You can do that in a low few hundred lines of code.

If you mean for the full power of a lisp then you still have to pick: for Scheme that might be call-with-cc and hygienic macros. For Common Lisp that might be regular macros and reader macros. Those take more code and thought.

IMO call-with-cc AND regular macros is what one should implement for toy language. Hygienic macros are too hard to understand let alone implement, while ordinary macros are more powerful (and maintainability does not matter much for toy language) and relatively easy to implement. And call-with-cc is too fun to skip.

> Hygienic macros are too hard to understand let alone implement, while ordinary macros are more powerful (and maintainability does not matter much for toy language) and relatively easy to implement.

Understanding hygienic macros isn’t all that hard but I agree implementing them is difficult. On multiple occasions I’ve tried to plan out how to get them to work but (ironically) everything I can find only explains the theory and not how to practically implement them.

Start by dealing (or not) with "dotted pair" notation up front. eg (a . b) (a b . c) etc.

This confronts a whole bunch of issues directly and immediately. What is a list, really? How do I handle recursion in my parsing--implementation language (fast--but maybe not tail recursive and weak to cycles) or lisp (slower--but probably infinite and handles circular lists)? How do I deal with garbage (long parses kick up a lot of garbage)?

Dealing with dotted pairs is the difference that means you understand implementing lisp rather than are just toying with it.

Code is data.

LISP builds upon a very simple, yet powerful data structure (pairs that can be arranged into binary trees) that has a direct correspondence with its syntax. This means that it is very simple to write a very basic LISP interpreter in almost any programming language, and to use it to bootstrap a more fully-featured one. It's even quite feasible to write one in assembly language, and the very first ones indeed have been implemented that way.

I have experimented with several lisp-like environments. Parsing s-exprs is simple. The choice of primitives, i.e. built-in functions which differ from those in Common Lisp or Scheme, can lead to interesting behavioral dynamics.

I have tended to implement more complex core data structures than cons-cells. E.g native strings, maps and arrays.

Suggest looking at newLisp and picoLisp for ideas on how non-traditional Lisps have been implemented.

University of Maryland's compilers class has the lectures/notes online. The course is about implementing a reduced (but still pretty big) subset of Racket. https://www.cs.umd.edu/class/spring2022/cmsc430/Notes.html.

I recommend Dmitry Soshnikov's video tutorials _Building an Interpreter from scratch_ [0]

[0] https://www.dmitrysoshnikov.education/p/essentials-of-interp...

If you read SICP you are good to go (mitpress.edu/sicp). Or you can read the code of an existing implementation

take a look at my scheme to c compiler written in Common Lisp for reference. I implemented a parser. https://github.com/Jobhdez/scheme-to-c

scheme is kindof the fix point of sequential programming languages

it eats itself into constancy


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact