
Build me a Lisp - KayEss
https://kirit.com/Build%20me%20a%20LISP
======
lisper
This article is chock-full of misinformation.

> an s-expression [is] a fancy term for a list of [cons] cells

No, it's not. An S-expression is a _serialization_ of a (single) cons cell,
whose elements might be other cells.

> We're using a vector instead [of cons cells], but the two are equivalent

No, they are not. When represented as cons cells, CDR is a non-consing O(1)
operation. When represented as vectors, CDR is a copying O(n) operation (and
because it necessarily makes a copy, the semantics are different if you
introduce mutation).

The fact that S-expressions represent cons cells and NOT vectors is crucially
important. It is the feature from which Lisp derives much of its power.

It is possible to make a Lisp-like language where the surface syntax
represents vectors instead of cons cells. In fact, this makes a useful
exercise. If you undertake this exercise you will come to know the answer to
the question: why has this idea (a vector-based Lisp) not gained more wide-
spread adoption?

> There are three different sorts of data structure that we have so far:
> Strings. S-expressions, and Cells

As above, S-expressions are not a data type, they are a serialization of cons
cells, i.e. they are a _mapping_ between cons cells and strings, and this
mapping has a very particular feature from which much of Lisp's power is
derived, which is that it compresses a linked list of cons cells into this:

(1 2 3 4)

instead of this

(1 . (2 . (3 . (4 . nil))))

This language is missing that core feature. (It's also missing _symbols_, without which a language cannot rightfully call itself a Lisp. And first-class functions. And macros.)

While I don't want to discourage any attempt to make Lisp more accessible,
it's important to actually get it right. What this article is describing is
similar in spirit to Lisp, but it's not Lisp. It's a different (toy) language
at a completely different point in the design space.

~~~
bitwize
> If you undertake this exercise you will come to know the answer to the
> question: why has this idea (a vector-based Lisp) not gained more wide-
> spread adoption?

It's gained incredibly widespread adoption. Clojure doesn't use cons cells at
all; lists are trees of vectors internally, and vectors and maps enjoy first-
class status, even as syntactic elements, right alongside lists. Clojure
enjoys far more business interest than Common Lisp or any other Lisp dialect,
meaning that non-cons-cell based Lisp has _won_.

Furthermore, the C++ programmers, being performance minded, have learned
something the Lisp guys still don't seem to grok: on modern architectures
where cache is fast and RAM is slow, the big-O inefficiency of using vectors
vs. linked lists is usually more than made up for by the speed gains of cache
locality, to the extent that for real-world data structures it almost _never_
makes sense to use a linked list rather than a vector.

Anyway, if you represent a list as a slice of an underlying vector, cdr is
still O(1).

~~~
lisper
> non-cons-cell based Lisp has won

In terms of usage, yes. Not in terms of adoption as an implementation
technique. If you _want_ a non-cons-cell-based Lisp (and no one actually
_wants_ this -- the fact that code is represented as vectors is not the reason
people use Clojure) you have only one option.

> vectors and maps enjoy first-class status,

They do in Common Lisp as well.

> even as syntactic elements

Vectors have a standard surface syntax in Common Lisp, and it's trivial to
define a reader macro for maps if you want one. But anything you would do with
a static map is probably better done with a CASE form.

> the big-O inefficiency of using vectors vs. linked lists is usually more
> than made up for by the speed gains of cache locality

All of this only matters when processing _code_, which is usually a tiny
fraction of the CPU time. When processing _data_, Lisp has vectors, and Lisp
programmers can and do use them.

> if you represent a list as a slice of an underlying vector, cdr is still
> O(1).

Yes, but then you still have to allocate memory to store the result.

------
madmax96
Really nice code, as someone who isn't caught up entirely with the recent work
that's gone into C++, it taught me some useful features!

The taxonomy isn't quite right for Lisp. It's actually simpler than what the
post presents. It should be something like this (very minimalistically):

    
    
        <s-expression> ::= <atom> | ( <s-expression> . <s-expression> )
        <atom> ::= <number> | <symbol> | <string>
        ...
    

The difference between this and what the author presents is that everything is
an s-expression. s-expressions are either an atomic value or a pair of
s-expressions. Convenient notation `(a b c ... z)` called "list builder
notation" is provided so that we don't have to write `(a . (b . (c ... (Z .
NIL))))`.

~~~
amelius
Instead of shoehorning everything into a generic "pair" structure, wouldn't it
be much more powerful to build a separate type for everything?

~~~
harperlee
Well the point is that you do not _need_ to; it would just be an optimization
opportunity.

~~~
amelius
But types have more uses than just optimization.

~~~
harperlee
That is completely correct. In Lisp, the fact that syntax is so flexible
enables the addition of a lot of functionality on top of the base language.
Types and related functionality (checks, enforcement, conversion, etc.) are
among those things; but if everything is executed on top of pairs, there is a
performance penalty that you may avoid if you provide more efficient data
structures from the outset with a not-so-minimal implementation (such as
Clojure providing vectors, where you don't search a list in linear time). My
“just” above meant that if you can do it slowly in the upper abstraction
level, providing it in the lower one is basically an optimization.

In any case, on second thought, what I said is not completely true, as there
are things you may need to do at compile time and the minimal implementation
does not provide macros.

------
svat
This is a really nice and illuminating post!

Especially cool how all evaluation is done by line 33. (Minor comment: Line 14
may benefit from using a different variable for the function instead of “c”.)

An “aside” comment, as the title mentions “literate C++” [Edit: This was
submitted as something like “Build me a Lisp: in less than 50 lines of
literate C++”]. This post exemplifies one aspect of literate programming (as
Knuth envisioned it), which is “let us concentrate rather on explaining to
_human beings_ what we want a computer to do.”

But there are two further features that Knuth-style LP systems have, which are
missing here:

• The ability to write your program / explanation in any order (ideally, an
order that is best for presentation, rather than an order demanded by the
compiler). In this case, the program is explained strictly in the order of
lines in the program, which imposes some awkwardness. (Not only in the
explanation, but also in the code: e.g. putting a #include on line 34 of 49.)

• Named chunks. So instead of writing

    
    
        int main() {
            try {
    

in one code block, and having to continue it later, you could write something
like, say,

    
    
        int main() {
            try {
                [[say hello world]]
            } [[handle errors]]
    

and later fill in the corresponding sections by name.

The original examples by Knuth are somewhat hard to read because they were
written in Pascal and the programs were quite low-level but there's a great
example available today (a book which won an Academy Award!), now free online
— [http://www.pbr-book.org/](http://www.pbr-book.org/) (random page:
[http://www.pbr-book.org/3ed-2018/Light_Transport_I_Surface_Reflection/Direct_Lighting.html](http://www.pbr-book.org/3ed-2018/Light_Transport_I_Surface_Reflection/Direct_Lighting.html) )

But then again, the “no-tools” approach here does have the advantage of not
requiring any special build step and not requiring the reader to understand
any such conventions, so I guess that's a tradeoff. (They say one of the
reasons literate programming never took off is that there are as many
literate-programming tools as the number of people trying literate
programming; everyone ends up having to write the tool that works for them.)

~~~
saagarjha
Honestly, I found the code a bit hard to follow, mostly because everything was
so disjoint. It would be nice to have "folding" so I could move comments out
of the way once I'm done with them, to allow me to look at the code as a
whole. Once I know what a snippet does, I can "name" chunks myself mentally;
any further labels, especially multi-line ones that show up in between code,
are unnecessary.

~~~
svat
Your comment illustrates another reason literate programming never took off
:-) Until a reader has spent enough time reading and getting familiar with the
conventions, how to navigate, etc (in my case it was several dozen hours),
they experience a lot of resistance against simply reading the literate
program, and keep wanting to read “the real code”.

Ultimately a program is a set of instructions to a computer, and when we as
humans look at a program we're trying to conceptualize how these instructions
are organized. (Essentially, “translate” them into a mental map.) We have so
much more experience doing this with “real” code, knowing what we can ignore
and when, and being able to do this at will — e.g. with things like function
docstrings or public/private declarations, we're able to read them once and
then have them basically fade away from our consciousness so we don't read
them again. When faced with unfamiliar conventions, it takes a while to get
that ability again.

~~~
javajosh
Perhaps literate programs shine when they help us build that mental map. One
way is positively describing parts of the map with metaphor or intent (where
the OP shines). Another way is by discussing alternatives that weren't used.
Yes, this can be a lot of content and sometimes crowds out the code, as in
this example.

------
sokoloff
Here's a case where the previously submitted title, which included something
like "in less than 50 lines of literate C++", was more indicative/compelling
and I think more useful for news.yc.

I don't think of that as click-bait in the same way as "you won't believe #7",
but rather as additional information which truthfully and transparently makes
it more likely that I'll accurately gauge my interest in the linked article.

"Don't editorialize" in this case is a negative, but perhaps is globally
optimal.

------
tom_mellior
For whatever it's worth, here is the same in 24 lines of OCaml:

    
    
        type cell = Atom of string
                  | Sexpr of cell list
    
        let library = Hashtbl.create 1
    
        let rec eval = function
          | Atom s -> s
          | Sexpr [] -> failwith "Empty sexpr"
          | Sexpr (head :: args) ->
            let fname = eval head in
            match Hashtbl.find_opt library fname with
            | Some builtin -> builtin args
        | None -> failwith (fname ^ " not found")
    
        let main () =
          let prog = Sexpr [Atom "cat"; Atom "Hello "; Atom "world!"] in
          let prog2 = Sexpr [Sexpr [Atom "cat"; Atom "c"; Atom "a"; Atom "t"];
                             Atom "Hello "; Atom "world!"] in
          Hashtbl.add library "cat"
            (fun args -> String.concat "" (List.map eval args));
          Printf.printf "%s\n" (eval prog);
          Printf.printf "%s\n" (eval prog2)
    
        let _ = main ()
    

Using linked lists instead of vectors makes separating an application's head
and args (CAR and CDR, if you prefer) much easier than using clunky C++
iterators. This is because a linked list is built from cons cells, so the
head/tail split is already part of the structure.

------
seletz
If you enjoy reading code and want to know a bit more about lisp, maybe this
is interesting:

"Scheme 9 from Empty Space" --
[https://t3x.org/s9fes/](https://t3x.org/s9fes/)

It's a Scheme built in C using literate programming. You can even buy a
printed version.

Also, there's "Lisp in small pieces" if you want to go deeper --
[https://books.google.de/books/about/Lisp_in_Small_Pieces.htm...](https://books.google.de/books/about/Lisp_in_Small_Pieces.html)

------
mark_l_watson
Thanks, that is a very nice writeup. I have been experimenting a bit with
creating new Lisp-like languages in Racket and the idea of growing a language
to solve specific types of problems is powerful (and is the way I program in
Lisp, from the bottom up). Sorry if this is a bit off topic, but you might
enjoy the experiences of someone building their own Lisp Machine with a eZ80,
complete with hardware hacking support:
[https://youtu.be/Ad9NtyBCx78](https://youtu.be/Ad9NtyBCx78)

~~~
KayEss
Part of the motivation for writing this up is to explain internally about a
DSL used for testing web APIs, for example:
[https://github.com/KayEss/fostgres/blob/master/Example/films/film.t1.fg](https://github.com/KayEss/fostgres/blob/master/Example/films/film.t1.fg)

I expect we'll extend use of this sort of mechanism much further the more
people understand it.

I'll watch that video, sounds interesting. A company I worked for nearly ended
up getting a Symbolics LISP machine for CGI in the early 90s. Would have been
pretty sweet I think.

------
drtse4
Nice post... even if, before clicking through, I expected a post about why
asking "build me a lisp" could be a good interview question :)

As a side note, I wrote a similar post in Swift:
[https://www.uraimo.com/2017/02/05/building-a-lisp-from-scratch-with-swift/](https://www.uraimo.com/2017/02/05/building-a-lisp-from-scratch-with-swift/)

------
zach43
nice post...you can implement a lisp parser in C++ relatively easily too due
to lisp's regular syntax

~~~
agumonkey
any nice cpp PEG library?

------
twoodfin
Can someone familiar with modern C++ explain the type mechanics of std::visit
on line 14? In particular, it seems the “auto” parameter of the lambda
expression could stand in for either type in the variant, dependent on its
runtime value, but I have always thought of “auto” as a compile-time
mechanism.

Is the “auto” in this case being filled in by some compiler-generated proxy
type that can dispatch any common operation available across all variants?

~~~
KayEss
The auto there works like a templated function would (except it's anonymous as
you'd expect from a lambda).

~~~
twoodfin
I’m still confused. Presumably you can’t pass an uninstantiated template to a
function, so what’s the type of the lambda expression?

It’s effectively a callable object with multiple overloads but I didn’t know
lambda syntax could create something like that.

~~~
KayEss
Lambdas have an unutterable type, so the addition of a template-like
mechanism in their arguments is something the compiler can do behind the
scenes and it all just works. Look for "generic" on this page:
[https://en.cppreference.com/w/cpp/language/lambda](https://en.cppreference.com/w/cpp/language/lambda)

------
whitten
For those who can explain the difference is this a LISP 1 or a LISP 2 ?

~~~
thibran
In a Lisp 1, functions and variables share the same namespace.

    
    
        (defvar foo 5)   ; variable with the name foo
        (defun foo () 5) ; function with the name foo
    

In a Lisp 1 the defun would overwrite the previous variable declaration. In a
Lisp 2 though, both expressions coexist.

~~~
sedachv
Last week I came across a Usenet post from Kent Pitman (who helped create
Common Lisp, and coined the terms Lisp1 and Lisp2 in this paper:
[https://www.dreamsongs.com/Separation.html](https://www.dreamsongs.com/Separation.html))
which provides the best explanation I have seen on why Lisp2 is a good idea:

[https://groups.google.com/forum/#!msg/comp.lang.lisp/6TkfPLe...](https://groups.google.com/forum/#!msg/comp.lang.lisp/6TkfPLeWpk0/rAxTREHhlw0J)

Also as pointed out by KMP later in that thread, block and go tags are
separate namespaces, and so are type names. I like to say that it is really
Lisp1 vs LispN. Any language with macros and first-class identifiers is going
to be capable of having arbitrarily many namespaces.

------
SomethingOrNot
> Normally programming language blog posts get started with grammars and
> syntax and parsing and all sorts of stuff like that. I can't be bothered
> with all of that,

And I can’t be bothered with C++ for compiler construction.

