Hacker News new | past | comments | ask | show | jobs | submit login
Roll A Lisp In C – Reading (2020) (swatson555.github.io)
65 points by swatson741 14 days ago | hide | past | favorite | 21 comments



I have been using sbcl for a very long time and never knew that the core is a small optimised c part and the rest all written in Lisp. Not totally self hosted but quite close.

Now I have been searching for an overview of the CL commands that make up that core which would be needed to implement CL (run the rest), but I cannot find any. Anyone here knows?


The C runtime is pretty much all OS-interaction support components.

IIRC, none of the actual ANSI CL code is implemented in C - in fact, it would break one of the core design ideas of SBCL, which is "Sanely Bootstrappable" - SBCL can compile itself on any "complete enough" ANSI CL system - it doesn't even have to implement everything - then uses itself and its own standard library to compile itself again but into actual native code.

Out of memory, the C based runtime in SBCL has the garbage collector, low-level memory interfaces to OS (how to acquire memory, how to release memory, setting memory protection flags), some core FFI to OS which are called pretty deep from standard library (i/o, threads, processes), basic FFI supports (manipulating linkage tables, for example) setting CPU contexts, etc.

On the non-OS-specific side, there's core loading, which is a distant ancestor of Mach dynamic linker.

Some of it is C-side implementation of basic SBCL datatypes which allows both GC and runtime to interact with them safely, but usually the CL code does not call into those.

Some of the low-level manipulation is actually assembler.


Thanks, that’s very helpful! I wish people would write detailed implementation books about these things as it is very hard to get into when just jumping in the deep even for experienced C and Lisp programmers. I guess I need to jump in!


For starters, SBCL's non-CL parts are separated into src/runtime directory, which makes it much easier to figure out what is just "support structure" and what is actual language.

For learning more specifically on SBCL, you can look at following

https://pvk.ca/Blog/2013/04/13/starting-to-hack-on-sbcl/

https://simonsafar.com/2020/sbcl/

And of course SBCL code and comments.

There are also some papers related to CMUCL (and before that, Spice Lisp) that talk about Python Compiler (the compiler used by CMU CL, SBCL, Scieneer CL)


Anyone have a good reference on how write the interpreter itself?

I'm looking to write a really basic one in javascript except using plain arrays. No parsing or tokenizing for me.


While probably a bit overkill for writing a Lisp, Robert Nystrom’s Crafting Interpreters [0] is a really fantastic book on the topic.

[0] https://craftinginterpreters.com/contents.html


Seconding the Make-A-Lisp and Crafting Interpreters recommendations. TFA is also pretty good, it contains a hard written parser that you may learn from. Lisp grammar is regarded as simple enough that full parser generators need not be used.

There are also the Lis.py articles which are incredibly good:

https://norvig.com/lispy.html

https://norvig.com/lispy2.html

You mentioned you want to write your lisp in javascript so you'll be getting quite a lot of functionality for free: memory allocation and garbage collection. That alone solves an enormous amount of problems. Just in case anyone reading is going to write in C, here's an excellent introduction to the topic:

https://journal.stuffwithstuff.com/2013/12/08/babys-first-ga...

Don't forget to spill the registers.


Here's some pseudocode that should help you get started on a very simple thing

  evalExpression:
    switch peekChar
      ( evalFunctionCall
      0-9 parseNumber
      " parseString
      a-zA-Z parseAndGetVariable

  evalFunctionCall
     fn = evalExpression
     arguments = evalExpression  until )
     fn(arguments)


https://sicp.sourceacademy.org/chapters/4.html

This inlines the diffs between the original SICP and the new one that is implemented in Javascript too.


I’m not really sure how you would be able to write an interpreter without a parser, since you need to know what the user is trying to do. For a more complete tutorial there is Make-A-Lisp (mal) [0] that has steps for making a lisp in various (including JavaScript) and has a process guide [1] to get started.

[0] https://github.com/kanaka/mal

[1] https://github.com/kanaka/mal/blob/master/process/guide.md


I can second Make-A-Lisp. I worked through it in C# initially, and I'm now halfway through a Rust implementation. The MAL process guide is a good set of instructions but doesn't go as far as actually presenting pseudocode, so you do have to think, and sometimes 'cheat' by looking at one of the reference implementations. In terms of tokenising and parsing, the MAL guide does include a monster regex statement that recognises the relevant tokens, and the parsing is quite easy due to Lisp's relatively simple syntax. If you choose the right implementation language, it will do some of the heavy lifting for you (notably memory management).

[Edit] Just wanted to reiterate that lexing/parsing Lisp is easy as per the MAL instructions. IIRC the things that I found hardest (in C#, as a C# newby) were understanding how to implement the closures required to support function definition, and then implementing Lisp's macro expansion.


You can create an interpreter in the finally tagless style:

https://okmij.org/ftp/tagless-final/

This embeds the target language as combinators in a host language, so you can construct and evaluate programs.


I have not heard of tagless final style interpreter before and that seems really neat. However, based on the link it seems that tagless final interpreter uses a strong type language like Haskell or ML so I don’t know how well it would translate to Javascript, which is what the GP comment said they were using.


I've done them in C# with slightly less safety.

https://higherlogics.blogspot.com/search/label/tagless%20int...

The style of the construction is what matters most. If you're using JS then you're already giving up types, but the higher order abstract syntax is what lets you embed and play with language semantics.


I think I may have misunderstood what your original comment was referring to. Were you saying that one doesn’t need a parser to make a working interpreter? I want to make sure that I understand the context of your comments because looking at the post you linked where you did this in C# you show you are using a recursive descent parser. It was when I saw the parser that I realized I could be way off base on what I was thinking you were talking about.


Yes, it doesn't need a parser. I didn't link to one post, it's a tag that links a series of posts discussing tagless interpreters. You can use the interpreter in the first post directly without the parser as well, it's a separate component. IQueryLanguage<T> is the interpreter definition.


I should note that the other posts go into more of the theory and background of final tagless encodings, including links to Oleg's site who has done a lot of work on them. They are a very flexible for implementing embedded languages.

Many are based on the SICP metacircular evaluator. Write that in your language of choice then you can bootstrap into scheme from there. Also see the 1986 MIT lecture videos by the authors, to get the spirit of the thing.


The first time I attempted a Lisp back in about 1989 or something, I used the metacircular evaluator from SICP.

A little later I used Norvig's Scheme implementation from chapters 22 and 23 of PAIP (https://norvig.github.io/paip-lisp/#/?id=paradigms-of-artifi...).

I liked Norvig's a little better. It starts with a simple interpreter and adds improvements step-by-step, winding up with an optimizing compiler that targets a bytecode VM.

That approach makes a lot of implementation details manifest in a way that is easy to grasp. For example, I think I understood continuations much better after seeing Norvig's account of how to implement them.


Check out miniMAL from the person behind Make a Lisp. It’s exactly what you want, I think: http://kanaka.github.io/miniMAL/


Tokenizer is unsafe because the loops are not bounded to the array sizes, 128 and 32 respectively.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: