Hacker News new | past | comments | ask | show | jobs | submit login
Compiling a Lisp to x86-64: Let expressions (bernsteinbear.com)
131 points by todsacerdoti 25 days ago | hide | past | favorite | 39 comments

Thanks to the author for an enjoyable intellectual journey, it's a wonderful series of articles.

Having tried my hand at following the Make a Lisp project ¹, implementing an interpreter in a couple of languages, it's fascinating to see the process of designing the language and compiler from the ground up, discussing various tradeoffs for simplicity, reasons for internal data structure, etc.

In fact, I think I'm learning more about C99 than Lisp - there's something about creating a language (or just following along like me), by necessity it leads to the foundations of computing/programming, the concepts that make up the basis of all languages. It's a perfect (meta)subject for an educational project.

¹ https://github.com/kanaka/mal

I'm so glad you're enjoying them!

I can't recommend Lisp In Small Pieces enough. I prefer it to SICP.

Appreciate the recommendation! I was able to track down the author's page:


..And someone who updated the book's source code to run on modern Schemes:


Critique: the TOC is only shown in first post and not clickable, makes it hard to navigate the series. It would also be nice to have the TOC repeated in each post.

Question: I saw a chapter about unary functions. In some interpreters let bindings are explained as "sugar", or syntactic extensions on top of function calls. For instance `let` is not present in the core of scheme [1]. So in this way one could transform a let binding from:

    (let ((a 1) (b 2)) (+ a b))

    ((((lambda (a) (lambda (b) (+ a b))) 1) 2)

    (define-syntax let
      (syntax-rules ()
        [(_ ((x e) ...) b1 b2 ...)
         ((lambda (x ...) b1 b2 ...) e ...)]))
... then you wouldn't need to compile let bindings, just function application, which some people call "desugaring".

I was wondering if this kind of desugaring is used in practical lisp compilers or not... without knowing too much about lisp compilers, I imagine one reason to not go this way is that it could potentially generate way more code than compiling higher level concepts like `let` directly.

1: https://scheme.com/tspl4/further.html

Yes, real compilers do handle `((lambda (x) ...) ...)` as well as a `let`, because as a Lisp programmer you want to be able to define and use new macros freely, without having to worry too much about low-level optimizations. Kent Dybvig gave a talk on this sort of thing called the "macro-writer's Bill of Rights". It's a similar concept to the recent talk about what optimizations need to be guaranteed for "zero-cost abstraction" in Rust and C++.

By guaranteed optimisations talk, are you referring to this blog post by Robert O'Callahan: https://robert.ocallahan.org/2020/08/what-is-minimal-set-of-...?

Yes, thanks.

Thanks for the hint, lots of interesting talks on the conference.


Most lisp compilers do exactly as you suggest. Generally, lisp is implemented on a small core of maybe 20~30 functions, with everything else as macros. If you're curious about how lisps are implemented, I recommend checking out this series: https://www.youtube.com/watch?v=Wa81OJnlsoI

That's really not true of Common Lisp implementations. These have many more than 20-30 functions, and they all have to be there because one can obtain them as values and pass those values around (and, in the case of standard generic functions, add new methods for them). There may be specific expansions for calls to particular standard functions when the types of the arguments are known, or when particular keyword and optional arguments are or are not present.

Well, of course; many of them will be functions, not macros.

What I mean is, there will only be 20~30 builtins; the rest can be implemented 'in userspace', as it were.

Though specifically in the case of cl, clos might complicate that somewhat, but probably not a ton.

Let's go with this restatement.

It's one of the features of Common Lisp that things that might be buried in the compiler in other languages are surfaced and made available to the user. This means a great deal of the language's standard functionality can be implemented as if it were in "user space". Macros, readtables, method combinations, setf expanders and generalized places, for example.

IMO, if Common Lisp were to be extended (it probably won't be, at least officially, as there's no money for another whack at a standard) I think it would be best even if more "internal" features were surfaced this way, and made available for user extension.

Good idea about the TOC.

Re: lambda: it's important to realize that this series is building up incrementally bigger features from nothing. This Lisp compiler won't have a full implementation of closures or even labeled procedures for some time. So right now we're getting used to name binding and using what we have.

While it's possible to generate the same code I did even when rewriting to lambda, you'll probably have to have some kind of strength reduction afterward.

Well, you've implemented let* :

    (let* ((a 1) (b a)) (+ a b))
is equivalent to:

    ((((lambda (a) (lambda (b) (+ a b))) a) 1)
And anyway, implementing 'letrec' in C/Asm is really not that hard: allocate cells, assign names to them, compute the values. Compare that with the machinery of making closures (flat or linked?) and function invocation.

I personally prefer to have 'letrec' as a primitive and implement non-recursive 'let' on top of it by checking whether any newly bound names are referenced from the bound values.

> Well, you've implemented let* :

    (let* ((a 1) (b a)) (+ a b))
I think this should just be plain let.

If I do:

     (define-syntax let-
      (syntax-rules ()
        [(_ ((x e) ...) b1 b2 ...)
         ((lambda (x ...) b1 b2 ...) e ...)]))
And then try:

       (let- ((a 2) (b a)) (+ a b))
I get:

     ; a: undefined;
     ;  cannot reference an identifier before its definition
     ;   in module: top-level

I don't really understand if you are surprised by that or not. That is the expected behaviour of regular let. Parallel binding. let* is sequential.

I'm not surprised by that. The parent suggested (or I thought that they did) that this was actually the definition for let* . And my point was to show that it does not behave like let* but rather like plain let.

Metoo 1982:

    (ncompile (macroexpand '(let ((x 1)) (+ x 1)))) 
    $09CB:$7188:   JMP  ??      ; see below         
    $09CB:$718B:   CALL $0ECD   ; CALL ONEARG       
    $09CB:$718E:   PUSH [$012C]                     
    $09CB:$7192:   MOV  [$012C],AX                  
    $09CB:$7195:   JMP  ??      ; see below         
    $09CB:$7198:   PUSH [STACKMARK]                 
    $09CB:$719C:   MOV  [STACKMARK],SP              
    $09CB:$71A0:   MOV  AX,[$012C]                  
    $09CB:$71A3:   CALL $0F1D   ; CALL NUMVAL       
    $09CB:$71A6:   MOV  BX,$01                      
    $09CB:$71A9:   ADD  AX,BX                       
    $09CB:$71AB:   CALL $05C9   ; CALL MAKNUM       
    $09CB:$71AE:   JMP  $2080                       
    $09CB:$7195:   JMP  $71B1                       
    $09CB:$71B1:   CALL $7198                       
    $09CB:$71B4:   POP  [$012C]                     
    $09CB:$71B8:   JMP  $1DA7                       
    $09CB:$7188:   JMP  $71BB                       
    $09CB:$71BB:   MOV  AX,$04  ; S-object 1        
    $09CB:$71BE:   CALL $718E                       
    $09CB:$71C1:   JMP  $1DA7                       
    (subru: eval=$7188, compile=$3B6F)
One pass, but corrects forward jumps afterwards.


Aint I clever? Realized rightaway that forwards JMPs to the next address could have been filled with NOPs instead. But this is not a beauty contest.

Ah, right, I didn't look at your name. Have you worked on any other lisps?

Musimp/Mulisp dual syntax was very particular. The Lisp-scene has gone just downhill ever since, imho. http://www.edm2.com/index.php/MuLISP

Even with Racket, Shen, etc.?

Thanks for posting! I'm the author and can answer any questions and receive any constructive criticism.

I don't have any questions, but I just started reading it a few days ago, and I'm re-implementing it in Nim (which is an interesting experience). One issue I ran into right away was that my code from step 1 was segfaulting, and the issue was that I accidentally called `mprotect` before copying the memory into the buffer. Easy mistake to make, so you might want to add a warning for people who aren't just copy-pasting the code.

Excellent! I'm so happy you're following along in another language. If you're developing in the open, can I link to it somewhere on the series?

It's a good point and I'll try and remember to add a notice about order being important later.

Once I'm ready I'm planning on throwing it up on github for sure. I'll make a note to send you an email with it :)

Awesome. Looking forward to it.

I wish these series on compilers would focus more on the harder parts, like JIT compilation or the virtual machine and concurrent garbage collecting.

You're more than welcome to your opinion, but I've explicitly outlined that those are non-goals of this series. Maybe maybe I'll get to caching or optimization.

Seems good and reading. But it should explain or say it will cover the difference to c if c makes a distinction of statement and expression. And why that matter. Not throw a bone into the sky and assume we know it will later become a space station.

Good point! It's sort of a throwaway comment to handwave away linguistic differences. I'll try to make it better when I have access to a computer.

(lets a 1 b 2 (+ a b)) is such a nice macro. Everything but the last expression are bindings.

What? The let way of variable bindings is standard just about everywhere. Why replace it with something that doesn't visually distinguish binding pairs? To my lispy eyes it is harder to navigate than

    (let ((a 1) (b 2))
      (+ a b))
and harder to implement as a syntax rules macro :)

Free yourself from the shackles of parens. Throw off your bindings and run into the streets. It's worth it.

Arc uses (let a 1 (+ a 2)), and lets is simply the logical progression of that.

I have gone the other way. when I first learned macros I spent loads of time trying to make paren-free versions of the standard scheme forms, and now I can't even be assed to do the racketism of surrounding my binding clauses with [].

The parens of let lets me easily distinguish the binding clauses, and lets me navigate it a lot easier in Emacs. The lets macro seems like the opposite of that. Every odd thing except the last element is an identifier and every even thing is that identifiers value.

Why is it nicer than (+ 1 2).

What is gained from an extra layer of indirection?

Fewer parens. It was just a trivial example to illustrate the macro.

(lets (a b) (list 1 2) c (+ a 3) (d e f) (some-func a b c) (* d e f))

It destructures automatically. The equivalent traditional macro might look like

(destructuring-bind* [[(a b) (list 1 2)] [c (+ a 3)] [(d e f) (some-func a b c)]] (* d e f))

which is unreadable. And yeah, normally it's indented properly and yada-yada, and people use editor plugins like paredit to make it not such a pain to type, but still. It adds up.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact