What is the “minimal” set of primitives needed for a Lisp interpreter? (cmu.edu)
67 points by shawndumas on Dec 8, 2014 | 23 comments


There's been a lot of discussion about the minimal set of primitives to build an interpreter in the Forth community (for instance https://groups.google.com/d/msg/comp.lang.forth/KHw28PKZXUk/.... ) Particularly interesting is eForth, a Forth made to be easily ported while still having acceptable performance, and Minimal ANS Forth, a heavily-commented ANS compliant Forth written in Forth (both of these use significantly more than the minimum for practical reasons.) The answer, for both Lisp and Forth, is somewhere between 8 and 11. In "real world" usage, the set of primitives largely depends on the hardware you're targeting and how well you need it to perform.


The Forth exercise is interesting because you end up with a super simple instruction set. If someone ever invents a machine that can do operations really fast, but where implementing complex operations is prohibitive, such exercises could have practical value.


On this site they say there are 10 LISP primitives. The primitives are:

    atom 
    quote 
    eq
    car 
    cdr 
    cons 
    cond 
    lambda 
    label 
    apply
http://hyperpolyglot.wikidot.com/lisp#ten-primitives

Steve Yegge reckons there are seven or five:

    It's part of the purity of the idea of LISP: you only need the seven (or is 
    it five?) primitives to build the full machine.
http://steve-yegge.blogspot.com/2006/04/lisp-is-not-acceptab...

McCarthy had five:

    atom
    eq
    car
    cdr
    cons
http://www-formal.stanford.edu/jmc/recursive.pdf

McCarthy then added another four:

    quote
    cond
    lambda
    label
So according to McCarthy, the answer is nine.

See here for a more in-depth discussion:

http://stackoverflow.com/questions/3482389/how-many-primitiv...
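For illustration, here is a sketch (not McCarthy's own code) of a toy evaluator in Python whose only special forms and functions are those nine primitives; the representation (strings for atoms, tuples for pairs) and the name `evl` are invented for this example.

```python
# A toy evaluator built only from McCarthy's nine primitives (a sketch).
# Atoms are Python strings, pairs are 2-tuples, environments are dicts.

def evl(x, env):
    if isinstance(x, str):                       # a symbol: look it up
        return env[x]
    op = x[0]
    if op == "quote":                            # (quote e) -> e, unevaluated
        return x[1]
    if op == "atom":                             # anything but a pair is an atom
        return not isinstance(evl(x[1], env), tuple)
    if op == "eq":
        return evl(x[1], env) == evl(x[2], env)
    if op == "car":
        return evl(x[1], env)[0]
    if op == "cdr":
        return evl(x[1], env)[1]
    if op == "cons":
        return (evl(x[1], env), evl(x[2], env))
    if op == "cond":                             # (cond (p1 e1) (p2 e2) ...)
        for test, branch in x[1:]:
            if evl(test, env):
                return evl(branch, env)
        return None
    if op == "lambda":                           # (lambda (params ...) body)
        params, body = x[1], x[2]
        return lambda *args: evl(body, {**env, **dict(zip(params, args))})
    if op == "label":                            # (label name fn): recursion
        env = dict(env)
        fn = evl(x[2], env)                      # closure sees env by reference
        env[x[1]] = fn
        return fn
    fn = evl(op, env)                            # otherwise: application
    return fn(*[evl(a, env) for a in x[1:]])
```

For example, `evl(("car", ("cons", ("quote", "a"), ("quote", "b"))), {})` returns `"a"`, and `label` lets a lambda refer to itself for recursion.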


The answer will vary significantly depending on the degree of interpretation required (i.e., how much of your Lisp you want to lower down to something simpler prior to interpretation). That's why some lists include lambda, while some are only listing primitive functions.


Shen[1] is a powerful language built upon Kl, which has only 46 primitives[2].

[1]: http://www.shenlanguage.org/

[2]: http://www.shenlanguage.org/learn-shen/shendoc.htm#The Primitive Functions of K Lambda


pg has an implementation of the Lisp primitives described in McCarthy's 1960 paper.

http://ep.yimg.com/ty/cdn/paulgraham/jmc.lisp


You only need lambda (and primitives for doing I/O). You can define numbers, lists, and all other data types using only lambda. For example you can define booleans like this:

    (define true (lambda (a b) a))
    (define false (lambda (a b) b))
    (define if (lambda (b a b) (b a b)))
Now you can do:

    (if true 4 9) -> returns 4
You can define logical `and` and `or` like this:

    (define and (lambda (p q) (if p q false)))
    (define or (lambda (p q) (if p true q)))
To make that work for (if b (...) (...)) you'd need lazy evaluation, though; otherwise both the then and the else branch would always be evaluated regardless of whether the condition is true or false. Alternatively, you could wrap each branch in a lambda and then call the lambda returned by `if`.
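As a sketch in Python (a strict language, like most Lisps), here is the same encoding, with a thunked variant that avoids evaluating the branch that isn't taken; the names `church_if` and `lazy_if` are invented for this example.

```python
# Church booleans as plain functions (a sketch). In church_if both branches
# are evaluated before the call, so a branch that errors or has a side
# effect must be wrapped in a thunk.

true  = lambda a, b: a
false = lambda a, b: b

def church_if(c, a, b):
    return c(a, b)

def lazy_if(c, then_thunk, else_thunk):
    # Select a branch first, only then call it.
    return c(then_thunk, else_thunk)()
```

With this, `church_if(true, 4, 9)` returns 4, and `lazy_if(true, lambda: 4, lambda: 1 // 0)` also returns 4 without ever running the division by zero.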

- So booleans are the type forall r. r -> r -> r. "To use a boolean, you have to give it a value of type r for the then branch, and a value of type r for the else branch, and then you get a value of type r back".

- You can define natural numbers as functions of type forall r. r -> (r -> r) -> r. "To use a natural number, you should give a value of type r for n=0, and a function of type r -> r for n=k+1, then you get a value of type r back".

- You can define pairs of type (A,B) as forall r. (A -> B -> r) -> r. "To use a pair of type A,B you have to give a function that takes an A and a B and returns an r, then you get an r back".

- You can define sum types A+B (either an A or a B) as forall r. (A -> r) -> (B -> r) -> r.

- You can define least fix points (inductive types) of a type constructor F as forall r. (F r -> r) -> r.

- You can define greatest fix points (coinductive types) of a type constructor F as exists s. (s, s -> F s).

See also http://en.wikipedia.org/wiki/Church_encoding
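For instance, the natural-number encoding above can be sketched in Python (the helper names `succ`, `to_int`, and `add` are invented for this example):

```python
# Church numerals (a sketch): a number n is a function taking a value for
# n=0 and a step function for n=k+1.

zero = lambda z, s: z

def succ(n):
    return lambda z, s: s(n(z, s))

def to_int(n):
    # Interpret with z = 0 and s = (+1) to recover an ordinary integer.
    return n(0, lambda k: k + 1)

def add(m, n):
    # Apply succ m times, starting from n.
    return m(n, succ)

three = succ(succ(succ(zero)))
```

Here `to_int(three)` gives 3 and `to_int(add(three, three))` gives 6.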

Church encodings are a general way to define data types in lambda calculus purely in terms of lambdas, and they are still not fully understood. For example, the relation between Church encodings and mathematical induction is open (in type theory terms, the relation between the non-dependent elimination / recursion principle and the dependent elimination / induction principle). It's also not yet clear how to take a fixed point of a contravariant type constructor, nor how to do Church encodings for higher inductive types. This goes to the very foundations of mathematics: researchers in type theory think that in the coming years we will have breakthroughs that make type theory a viable replacement for set theory as the basis of mathematics, with the benefit that mathematical proofs will be verified by computers.


I like the definition of cons/car/cdr in terms of lambda.

A cons cell is just a function you can call to get back the car or cdr. Once you have 'if' and any two constants, you can do:

    (define mcons
      (lambda (a b)
        (lambda (method)
          (if (= method :car) a (if (= method :cdr) b)))))
    (define (mcar p) (p :car))
    (define (mcdr p) (p :cdr))
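The same idea sketched in Python, dispatching on a string where the Lisp version uses a keyword:

```python
# A cons cell as a closure that dispatches on a "method" argument (a sketch).

def mcons(a, b):
    def cell(method):
        if method == "car":
            return a
        if method == "cdr":
            return b
    return cell

def mcar(p):
    return p("car")

def mcdr(p):
    return p("cdr")
```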


There is also an encoding based directly on lambda:

    (define (cons a b) (lambda (f) (f a b)))
    (define (car c) (c (lambda (a b) a)))
    (define (cdr c) (c (lambda (a b) b)))
No `if`, no `=`, no `:car/:cdr`. Only lambda!
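In Python the same pure-lambda encoding reads (a sketch):

```python
# Church pairs (a sketch): a pair is a function that feeds its two
# components to whatever function it is given.

cons = lambda a, b: (lambda f: f(a, b))
car  = lambda c: c(lambda a, b: a)
cdr  = lambda c: c(lambda a, b: b)
```

So `car(cons(4, 5))` is 4 and `cdr(cons(4, 5))` is 5, with nothing but function application involved.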


Nice.

It looks like that's effectively using true/false (as defined via lambda) as the two constants and inlining the "if true" and "if false"?


Yes, that's easier to see if you wrote your version like this:

    (define mcons (lambda (a b) (lambda (c) (if c a b))))
    (define (mcar p) (p true))
    (define (mcdr p) (p false))
In type theory terms this expresses the isomorphism (A,B) ~= (b:Bool) -> (if b then A else B), where the latter is a type called a 'dependent function'. This is a function where the type of output depends on the value of the input. For example if you had (cons 42 "hello") then that would be a function that takes a boolean, and if the boolean is true then it returns the integer 42, and if the boolean is false then it returns the string "hello". So the type of the output depends on the value of the input. This is what dependent types are all about, but that hasn't yet penetrated into mainstream programming languages.
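Sketched in Python (untyped, so the "dependent" part shows up only in the behaviour): the pair is a function from a Church boolean to one of two components, and those components can have different types.

```python
# The boolean-indexed pair described above (a sketch): mcar/mcdr pass a
# Church boolean, and the type of the result depends on which one is passed.

true  = lambda a, b: a
false = lambda a, b: b

def mcons(a, b):
    return lambda c: c(a, b)   # plays the role of (if c a b)

def mcar(p):
    return p(true)

def mcdr(p):
    return p(false)
```

For `mcons(42, "hello")`, passing `true` yields the integer 42 and passing `false` yields the string "hello".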

Church pairs on the other hand express the isomorphism (A,B) ~= forall r. (A -> B -> r) -> r. For that I find it easier to think in terms of producers/consumers. You can ask yourself "what is a consumer of a pair of type (A,B)?" -> answer: it's a consumer of an A and a B. Therefore we can define cons like this:

    (define (cons car-value cdr-value) (lambda (consumer) (consumer car-value cdr-value)))
Instead of thinking of a cons as a pair, we think of a cons as a function that takes a cons-consumer and it passes its data to the consumer. What kind of consumers can we write? For example, a consumer that sums the components:

    (define summer (lambda (car-value cdr-value) (+ car-value cdr-value)))
Now given a cons, we can sum it like this:

    (define somecons (cons 4 5))
    (somecons summer) -> 9
Now we can define a function sum that sums the components of a cons like this:

    (define (sum c) (c summer))
Similarly, we can define "cdr-ers" and "car-ers":

    (define car-er (lambda (car-value cdr-value) car-value))
    (define cdr-er (lambda (car-value cdr-value) cdr-value))
Now we define car and cdr like this:

    (define (car c) (c car-er))
    (define (cdr c) (c cdr-er))
The same sort of reasoning works for Church booleans, Church numerals, Church lists, Church binary trees, and so forth. You could try to think about this in a dynamically typed framework, but for the more difficult Church encodings that becomes almost impossible. At least for me, types really help to bring order to the madness of lambda. If I try to think about what a type like forall r. (F r -> r) -> r means in dynamic terms, that causes instant brain melt; from the perspective of types it's a short and innocent expression.
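The consumer-style walkthrough above transcribes directly into Python (a sketch; `sum_pair` is an invented name for the `sum` function to avoid shadowing the builtin):

```python
# Church pairs via consumers (a sketch of the walkthrough above).

def cons(car_value, cdr_value):
    return lambda consumer: consumer(car_value, cdr_value)

summer = lambda car_value, cdr_value: car_value + cdr_value
car_er = lambda car_value, cdr_value: car_value
cdr_er = lambda car_value, cdr_value: cdr_value

def car(c):      return c(car_er)
def cdr(c):      return c(cdr_er)
def sum_pair(c): return c(summer)

somecons = cons(4, 5)
```

Then `somecons(summer)` and `sum_pair(somecons)` both give 9.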


Excellent post. Your type-based approach is a very useful tool for thinking about things which I find difficult. Thanks for that.


Correction (can't edit any more):

    (define if (lambda (c a b) (c a b)))


Lambda is a very complicated thing, with extremely twisted name substitution rules. A minimal system needs only one combinator - http://en.wikipedia.org/wiki/Iota_and_Jot


lambda, quote, cons, car, cdr, atom, cond, eq, label

Or with pattern-matching: cons, lambda, label, quote

And really, label is just immediately-invoked lambda, so: cons, lambda, quote

http://bodil.org/building-lisp/#0


The struggle to understand what is a primitive has to take into account what the instruction set architecture (ISA) is for the machine that will be running the system.

This article doesn't even consider the idea that the LISP may be running on any exotic hardware at all.

It might be instructive to look at the ISA for the Lisp Machine from the 1980s.


Why would what counts as a primitive depend on the ISA? If you need car, cdr and cons, it doesn't matter whether each is a single instruction or ten thousand, does it?


Which is quite different from the theoretical minimum number, which is presumably what they're looking at.


Wouldn't the minimal number be 1: NAND?


Or if you want to build things on a purely functional basis all you need is S and K:

http://en.wikipedia.org/wiki/SKI_combinator_calculus

It's straightforward to convert lambda calculus expressions to SK(I) combinators; the fact that you can get recursion via the Y combinator from S and K is trivial to demonstrate but still surprises me!
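As a sketch, S and K (and the derived I) as curried Python closures:

```python
# The S and K combinators as curried closures (a sketch).

S = lambda x: lambda y: lambda z: x(z)(y(z))   # S x y z = x z (y z)
K = lambda x: lambda y: x                      # K x y = x

# I = S K K, since S K K z = K z (K z) = z.
I = S(K)(K)
```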


Then again, you could use a single universal combinator, http://en.wikipedia.org/wiki/Iota_and_Jot .

I find it weird that one can express all of combinator calculus using one function.
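As a sketch, the iota combinator ι = λx. x S K in Python, recovering I, K, and S from it alone (these are the standard derivations from the Wikipedia page):

```python
# Iota: a single combinator from which S and K, and hence everything else,
# can be recovered (a sketch; curried closures, strict evaluation).

S = lambda x: lambda y: lambda z: x(z)(y(z))
K = lambda x: lambda y: x

iota = lambda x: x(S)(K)               # i x = x S K

i_ = iota(iota)                        # I = (i i)
k_ = iota(iota(iota(iota)))            # K = (i (i (i i)))
s_ = iota(iota(iota(iota(iota))))      # S = (i (i (i (i i))))
```

Each derived combinator behaves as expected: `i_` is the identity, `k_` selects its first argument, and `s_` satisfies S x y z = x z (y z).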


Cool - I had heard that there was a single-combinator scheme but hadn't seen a definition of it before.

I had some fun as an undergraduate "compiling" proper code down to, in the worst case, SK combinators, and that was entertainingly inefficient (resulting in a need to become acquainted with implementing garbage collection). I suspect that U might be even more fun...

NB I did my SK stuff on a multi-user Unix minicomputer in '87/'88, so modern machines probably make this a bit more realistic to play with.


Yes and No.

You could build the Lisp primitives out of NAND, but you still need those primitives to build the Lisp implementation, which is what the original question is about. You could also write the Lisp primitives without direct reference to NAND (excluding the NAND gates inside the CPU).
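As a sketch of NAND's functional completeness:

```python
# The other boolean gates written using only nand (a sketch).

def nand(a, b):
    return not (a and b)

def not_(a):
    return nand(a, a)

def and_(a, b):
    return nand(nand(a, b), nand(a, b))

def or_(a, b):
    return nand(nand(a, a), nand(b, b))
```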



