Hacker News new | past | comments | ask | show | jobs | submit login
Ask HN: Guide for Implementing Common Lisp
82 points by HexDecOctBin 10 months ago | hide | past | favorite | 44 comments
I have written compilers and interpreters before, but I find it hard to understand how Common Lisp compilers work – what with their readtables and read macros and normal macros. It's hard to figure out how the compilation pipelines actually works, and how the process of implementing them goes (e.g., what subsets are implemented before the rest, etc.).

This got me thinking: is there any implementation guide on how a Common Lisp is/should be implemented?




This is a very approachable paper from 1990 on one way to do it with a C kernel bootstrapping to Common Lisp: https://www.softwarepreservation.org/projects/LISP/kcl/paper... Kyoto Common Lisp (KCL) is the ancestor of today's Embeddable Common Lisp (ECL).

SICL is probably the best modern version of CL written in CL from a design standpoint, even if it's not taking over SBCL's role anytime soon: https://github.com/robert-strandh/SICL It uses some fancy bootstrapping to have the whole language available early, e.g. their definition of class 'symbol is:

    (defclass symbol (t)
      ((%name :reader symbol-name)
       (%package :reader symbol-package))
      (:metaclass built-in-class))
vs SBCL:

    (define-primitive-object
        (symbol :lowtag other-pointer-lowtag
                :widetag symbol-header-widetag
                :alloc-trans %make-symbol
                :type symbol)
      ...
      (name :ref-trans symbol-name :init :arg)
      (package :ref-trans symbol-package
               :set-trans %set-symbol-package
               :init :null)
      ...)
(That's from one of the papers from the 2019 European Lisp Symposium: https://www.european-lisp-symposium.org/2019/index.html Scroll down to the bottom for proceedings with all the papers in the pdf. Going through the ELS archives and checking out papers and their references can be useful for learning more about this stuff too.)


Also worth mentioning SICL is very modular, and components of it are used in other Common Lisp implementations. If memory serves me right, Clasp uses SICL's Loop module to implement the LOOP macro, and this has allowed both Clasp and SICL devs to uncover lots of codebases with non-conformant usages of LOOP. (One common violation is e.g. WITH-clause after FOR-clause, and the SICL module decides to strictly follow the CL specification and error here).


Lisp from Nothing and Lisp System Implementation from [1].

There are quite a few open source Lisp implementations in addition to SBCL that you can peruse [2] (follow links to CL's supported by Quicklisp). If you are targeting a specific platform (JVM or embedded), then ABCL/ECL; see Clasp for using LLVM.

To reuse modules for writing your own lisp, see SICL [3]

[1] http://t3x.org

[2] https://www.quicklisp.org/beta/

[3] https://github.com/robert-strandh/SICL



Common Lisp compilers do not interact with read macros at all. Read macros are done by the time a compiler gets to the code.

Depending on the design, macros may also be expanded already; the compiler works with macro-expanded code.

The easiest approach, from a boostrapping point of view, is to write an interpreter in some host language. Get macro expansion and read macros working in the interpreter, and then add a compiler.


I see. So what is the output of Read macros? It has to be text, right, since only the compiler will be able to deal with tokens/objects?


Read macros are processed at read time, the result is a change to the way the `read` function operates. The output is a lisp object.

On Lisp Ch 17, page 224: https://sep.turbifycdn.com/ty/cdn/paulgraham/onlisp.pdf

CLTL2 Section 22.1.1: https://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node188.html#... - Goes into a lot of detail on how the reader works.

The macros most people mean when they talk about Lisp macros are discussed in CLTL2 Ch 8. They are expanded during evaluation or compilation.

CLTL2 CH 8: https://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node97.html

Read macros are able to produce (or not produce) lisp objects by taking control of the stream processing for some period of time.


reader macros create Lisp data.

The compiler/interpreter gets as input Lisp data. Let's use READ, DESCRIBE and EVAL to see that working:

    CL-USER 17 > (defun example (&aux input result)
                   (terpri)                    ; newline
                   (write-string "Input> ")
                   (finish-output)
                   (terpri)
                   (setf input (read))         ; reading from input stream
                   (terpri)
                   (describe input)            ; what did we read?
                   (terpri)
                   (write-string "Output>")
                   (setf result (eval input))  ; evaluation
                   (terpri)
                   (write-string "Result>")
                   (print result)              ; printing the result
                   (values))                   ; return nothing
    EXAMPLE

    CL-USER 18 > (example)

    Input> 
    (progn
      (print (+ 1 3))
      (print 'done))

    (PROGN (PRINT (+ 1 3)) (PRINT (QUOTE DONE))) is a LIST
    0      PROGN
    1      (PRINT (+ 1 3))
    2      (PRINT (QUOTE DONE))
    Output>
    4 
    DONE 
    Result>
    DONE
As you can see READ in actual Lisp takes input from the keyboard and returns a list as data.

EVAL is then taking code as a list and evaluates it to a value, as side effects there might be printed output of the program.


The compiler deals with code in S-expression form, represented as lists and atoms. Reader macros produce Common Lisp objects.


The readtables and reader macros are entirely independent of Common Lisp compilers. The compilers work on the representation (that is, trees of cons cells) that is produced by the reader, which is after the readtables and reader macros have done their work.

If you are thinking of a Common Lisp compiler as working on a text file, you're thinking about it wrong. In Common Lisp, you can invoke the COMPILE function on a lambda expression that you cons up at runtime and that never is in a textual representation.

As a general rule, don't reimplement a Lisp except as a learning exercise (in which case, you're not going to reimplement Common Lisp; it's too big.)


> In Common Lisp, you can invoke the COMPILE function on a lambda expression that you cons up at runtime and that never is in a textual representation.

Ah, I see. This actually clarified some of the apparently contradictory stuff I have been reading, thanks.

> As a general rule, don't reimplement a Lisp except as a learning exercise

Oh, I am much too nuts for something as simple as that, I am actually planning to write a GPU-targeted language with ALGOL-syntax and Lisp-metaprogramming and REPL-driven-ish development. Wish me luck!


Why m-expressions (Algol syntax)? McCarthy initially tried m-expressions for Lisp, but it turned out that programmers liked s-expressions better.


In particular, macros work great with s-expressions, not so great with m-expressions.


There has been some new research around this old problem, which inspired me: https://dl.acm.org/doi/10.1145/3622818


I salute your honest madness.


Loko is a pretty new Scheme compiler: https://scheme.fail/

It implements everything down to assembler on bare metal.

You can easily analyze what is going on by disassembling the code:

    > (disassemble car)
    Disassembly for #<procedure car loko/runtime/pairs.sls:81:0>
   
      entry:
       2073E8 83F8F8       (cmp eax #xFFFFFFF8)
       2073EB 0F8506000000 (jnz L0)
     ; (set! rax (car rdi))
       2073F1 488B47FE     (mov rax (mem64+ rdi #x-2))
       2073F5 F8           (clc)
       2073F6 C3           (ret)
      L0:
       2073F7 E9749CFFFF   (jmp (+ rip #x-638C))
    >


> but I find it hard to understand how Common Lisp compilers work – what with their readtables and read macros and normal macros

Read tables and reader macros are used in the reader, which is separate from the compiler. The compiler macroexpands normal macros though.

> what subsets are implemented before the rest

There are no obvious subsets in Common Lisp - see Baker on the matter <https://www.plover.com/~mjd/misc/hbaker-archive/MetaCircular...>.

SBCL is mostly written without CLOS and then brings in a CLOS implementation late into bootstrapping; that of course prevents you from using CLOS in some places, so you may not want to do that though.

> is there any implementation guide on how a Common Lisp is/should be implemented?

You could do a lot better than the existing Common Lisp implementations, especially in compilers and GC, but the issues there are not specific to Common Lisp.


> There are no obvious subsets in Common Lisp - see Baker

i think baker's point, and i'm sure you know it's also been discussed to death on #lisp at some point, that you can't choose one obvious set of fundamental special forms, in terms of which you can express a common lisp, because some of those forms can be in various ways expressed in terms of each other. his other point is that spec's special forms are underspecified. but i don't think that he's making a point that one can't take spec's special forms, treat that set of special forms as arbitrary, but working, and implement a common lisp with them. i've not given this subject enough thought, but do you by any chance have links to somebody making that point, that you can't in fact implement a common lisp using just special forms.


> SBCL is mostly written without CLOS and then brings in a CLOS implementation late into bootstrapping; that of course prevents you from using CLOS in some places, so you may not want to do that though.

This is one of the things that makes Common Lisp implementations tricky. Because the error handling system in CL is based on CLOS, it means you have to build the base system with a crippled error handler and then at some point replace the whole thing with the proper error handler atomically and pray you don't get an error during the switchover.


> Because the error handling system in CL is based on CLOS

This is not the case. CONDITION objects don't have to be CLOS objects (although they are allowed to be).


See here why Conditions is not defined in terms of CLOS:

https://www.lispworks.com/documentation/HyperSpec/Issues/iss...

> The condition system should not be too tightly integrated into CLOS, for two reasons: Some implementations already have a native condition system that is not based on CLOS, and it should be possible to integrate the native conditions and the ANSI CL conditions. Some people would like to define an ANSI Common Lisp subset that does not contain CLOS but does contain conditions.


Both of those reasons are pretty much stale nowadays. Contemporary implementation where conditions are standard objects are CCL, ECL, Clasp, ABCL, CLISP, SICL, LispWorks, and Allegro, leaving out pretty much only SBCL (which does this for bootstrapping reasons, such as having errors much earlier than CLOS) and probably Genera (which is hard to call contemporary). And AFAIK there are no alive subsets of CL whatsoever, let alone those which have a condition system but do not have CLOS.


Lisp in Small Pieces - Christian Queinnec https://christian.queinnec.org/WWW/LiSP.html

Anatomy of LISP - John Allen https://archive.org/details/anatomyoflisp0000alle/


PAIP by Peter Norvig, Chapter 23, Compiling Lisp

https://github.com/norvig/paip-lisp/blob/main/docs/chapter23...



Lisp in Small Pieces is the standard recommendation.


Thanks. I skimmed through it a while back, so my memory might be faulty, but I think it implemented a Lisp in a Lisp. Do they actually implement this whole machinery, or do they simply use the existing mechanisms?


The later chapters talk about compilation to byte-code and IIRC how you implement that in a VM in C. But it doesn't talk about memory management and garbage collection. It has a chapter on macros.

SICP goes a little further from compiling Scheme to virtual assembly, a VM and memory management with garbage collection in their last chapter. It's all implemented in Scheme itself, including the VM, but at that point it's low-level enough that you should be able to make the step to C. It doesn't talk about macros.

https://mitp-content-server.mit.edu/books/content/sectbyfn/b...

Neither one really talks about Common Lisp though and read macros etc. I'm also not sure how similar Scheme and Common Lisp compilers are.


Just for completeness sake, here's "arpilisp: A Lisp interpreter for Raspberry Pi implemented in a single ARM assembly file" : https://github.com/marcpaq/arpilisp

*Includes the classic mark-sweep GC from McCarthy 1960


Starting with the obvious, are you familiar with the HyperSpec? https://www.lispworks.com/documentation/HyperSpec/Front/inde...


Yeah, though I just yesterday discovered Novaspec and CL-community spec which actually makes it readable.


Mark: How do I compile Lisp source code?

Nolan: That's the neat thing -- you don't.

Lisp compilers operate on Lisp data structures -- generally speaking, mostly lists and atoms. The 'read' function is what turns text into these data structures; reader macros alter how this function operates and control what data structures will be emitted by the reader. This really is the neat part about Lisp -- it's much simpler to write a compiler that consumes an AST directly than one that consumes text, and the AST is expressed directly in terms of Lisp's primitive data structures which you can easily write your own functions over. And it's a relative doddle to write a reader for s-expressions compared to other expression formats. Join these two relatively simple pieces together and you have a complete Lisp compiler system.

As for regular macros, they transform the AST before it's evaluated or compiled. So they come into play after the reader has consumed the source and produced an AST, but before the evaluator or compiler can work.


Lisp compilers and interpreters are using s-expressions (-> nested lists of atoms) as source input. But these are not an AST. The compiler/interpreter has to find out what these lists & atoms actually mean and may build an AST.

Macros also see only lists/atoms. Macros then find out on their own what the embedded code should mean. That's different from some other languages, where macros work over an AST. Which means that the code is parsed and needs to follow a syntax. That's not the case in Common Lisp, macro forms can be in any syntax. (infix a + b) is legal syntax in Common Lisp, the macro could for example convert infix syntax to prefix. In a language where the operator + is defined to be a prefix operator (-> syntax), an expression prefix(+ a b) could not parse the thing into an addition call AST node, since it would need to know what syntax rules to use for the code. Somehow it would need to build an AST for a different syntax.


I think bitwize's reply is one of the best comments I've read here. If you don't have the context it's really pretty condescending and snarky. but, hexdecoctbin thanked them for the reply.

based on your username, I'll absolutely stipulate that you're correct. bitwize didn't really answer the question correctly.

I'm not saying you're wrong. I'm saying bitwize said what OP needed to hear to grok the answer, and that seems pretty fucking lispy to me.


> Join these two relatively simple pieces together and you have a complete Lisp compiler system.

I see, this clarifies things a lot. Thanks.


Make a Lisp: https://github.com/kanaka/mal

Make a Lisp is more Clojure-flavored than Common Lisp, but it might still be nice to examine because of how many different implementation languages it has been done in by various authors.

It also breaks down the implementation process into 11 distinct steps which are easy to understand.


I find it strange to think in terms of implementing a parser for lists as producing some kind of AST.

The reader does not return an AST, but is a Lisp procedure which returns nested lists of atoms. As such it is independent of code and it just reads s-expression data. The input to EVAL or COMPILE then is also not an AST, but an s-expression representation. Thus the first thing one would do is to think about a data representation for s-expressions.

READ then needs to turn s-expressions into that data representation. READ then also does not work over a string, but a text stream.

People implement then a Clojure-like language, but miss the implementation implications that the input to EVAL is actually a basic Lisp data structure and they miss the nature of READ producing such a Lisp data structure from basic stream operations like READ-CHAR... Also in many cases it is useful to make the reader table-driven and not hard-wired.

Thus a typical question is to think how much of Lisp do I have to implement in that other language and how much Lisp then can be then implemented in Lisp.

So, one of the really basic ideas of Lisp is that READ is defined to take input from a stream (keyboard, file, ...) and returns basic Lisp data. It is a general tool for de-serialization of Lisp data, not a special tool for reading Lisp code.

READ is not converting a STRING to an AST.


Mal - while nice - has very little to do with common lisp.


I highly recommend anybody interested in Lisp and compilers to check this repo out. I went through it this weekend and built my own Lisp interpreter in Ruby with it, fwimc: https://github.com/Looveh/lovisp


Perhaps reading the source code for SBCL, https://www.sbcl.org/porting.html provide some of the answers you seek.


Compiling a Lisp, the series https://bernsteinbear.com/blog/lisp/


I would suggest reading Lisp In Small Pieces by Christian Quiennec.


Asking the real questions, thanks.


SBCL is open source.

That might be a way to start.

Or not.

Good luck.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: