Hacker News new | past | comments | ask | show | jobs | submit login
Compiler basics: Lisp to Assembly (eatonphil.com)
242 points by eatonphil 6 months ago | hide | past | web | favorite | 39 comments

Years ago I read an article or paper about writing a lisp compiler. The compiler started with regular looking lisp, then through macro expansion lisp primitives were defined in terms of each other. Then the small subset of lisp primitives were expanded to goto's and conditionals. The gotos and conditionals were still expressed in lisp S Expressions. Finally the gotos and conditionals were implemented in terms of assembly looking lisp primitives. Then they removed the parentheses and it left an assembly program.

I can't remember the name of the paper. Any ideas? It was such an elegant way of walking through a compiler implementation.

A similar approach is taken in this Obligatory mention: https://compilers.iecc.com/crenshaw/

Thank you. This is such a great resource and very readable on mobile too since it's just txt like a man page.

Hi Hacker News PL enthusiasts. This is off topic, but I hope you will have mercy on me.

I'm reaching out for some help from you. I've been working on interpreters this whole year, taking on no more work for pay than needed for rent and food, so that I can realize my programming language dream. I'm writing a couple of interpreters every week, trying out new ideas. But there is one hurdle I've been unable to overcome for a long time.

I'm trying to write a typechecker for a simple lambda calculus with polymorphic functions. My acceptance criteria is being able to typecheck the K combinator. I've been stuck at this point for months. Have read all I can find on the internet, and have read TAPL many times.

Am honestly feeling really stupid for not having been able to complete this.

What I'm struggling with beta reducing expressions that are type lambdas (i.e. the expression related to `Forall`).

Λ means type lambda, i.e. the set of expressions indexed by type

λ means expression lambda, i.e. the set of expressions indexed by expression

E.g. Λa.λx:a.x is the polymorphic "identity" function, which in javascript looks like `x => x`

k = ΛA.ΛB.λx:A.λy:B.x

k_intint = ((k IntTy) IntTy)

What I have yet been unable to do is type_of k_intint, and this because beta reduction should return an expression, but type_of should return a type.

I'll paste the source code of my latest attempt below, the function in question is `typ_of_expression`. I've written this in OCaml, but would be very thankful for a response in any language, or pseudocode, or just explained in English.

Many thanks for taking the time to read this. For any reply or pointer in the right direction I would be greatly thankful. You know what, I'd be happy to pay €100 for any help that lets me solve this (SEPA, Bitcoin).

  (* λ calculus, with Forall and simple types *)
  type 'a binding =
    | SymBind of 'a
    | QuantifierBind

  type 'a context = (string * 'a binding) list
  type typ =
    | UnitTy
    | IntTy
    | Arrow of typ * typ
    | Forall of string * typ
    (* Thought it simpler to separate Closure and Lambda, for w  hen I later add records/unions etc *)
    | ClosureTy of (typ context) * typ
    | SymbolTy of string
  type expr =
    (* again thought it simpler to separate Closure and Lambda,   for when I later add records/unions etc *)
    | Closure of (expr context) * expr
    | Unit
    | Symbol of string
    | Int of int
    | Lam of string * typ * expr  (* turns into Arrow. Takes ex  pr as argument, returns expr *)
    | App of expr * expr
    | TLam of string * expr       (* expr version of Forall. Ta  kes typ as argument, returns expr *)
    | TApp of expr * typ
  let rec lookup context key =
    match context with
    | [] -> Error "No result"
    | (k, QuantifierBind) :: rest -> lookup rest key
    | (k, SymBind v) :: rest -> if key = k
                                then Ok v
                                else lookup rest key
  let str = String.concat ""
  let rec string_of_typ t = match t with
    | UnitTy -> "UnitTy"
    | IntTy -> "IntTy"
    | Arrow _ -> "Arrow"
    | Forall (s, t) -> str ["Forall "; s; " "; string_of_typ t]
    | ClosureTy (_ctx, t) -> str ["Closure "; string_of_typ t]
    | SymbolTy s -> str ["SymbolTy "; s]
  let string_of_expr e = match e with
    | Unit -> "Unit"
    | Closure _ -> "Closure"
    | Symbol s -> str ["Symbol "; s]
    | Int i -> str ["Int "; string_of_int i]
    | Lam _ -> "Lam"
    | App _ -> "App"
    | TLam _ -> "TLam"
    | TApp _ -> "TApp"
  (* Convert between context types, such as (expr context) or (  typ context) *)
  let rec convert_context converter_fn original_context exprcon  text acc = (* カリたべたい *)
    let convert_context = convert_context converter_fn original  _context in
    match exprcontext with
    | [] -> acc
    | (x, QuantifierBind) :: rest ->
       convert_context rest ((x, QuantifierBind) :: acc)
    | (x, SymBind e) :: rest ->
       (match converter_fn original_context e with
        | Ok t -> convert_context rest ((x, SymBind t) :: acc)
        | Error e -> failwith (str ["typ_context error: "; e]))
  let rec typ_of_expr context e =
    match e with
    | Unit -> Ok UnitTy
    | Int i -> Ok IntTy
    | Closure (ctx, e) ->
       let new_context = convert_context typ_of_expr context ct  x context in
       typ_of_expr new_context e
    | Symbol s -> lookup context s
    | Lam (x, in_typ, body) ->
       (let new_context = (x, SymBind in_typ) :: context in
        match typ_of_expr new_context body with
        | Ok return_typ -> Ok (Arrow (in_typ, return_typ))
        | Error e -> Error (str [ "Error finding return type of   lambda. Error: "
                                ; e]))
    | App (e0,
           e1) ->
       (match (typ_of_expr context e0,
               typ_of_expr context e1) with
        | (Ok (Arrow (t0_0, t0_1)),
           Ok t1) -> if t1 = t0_0
                     then Ok t0_1
                     else Error (str ["Type mismatch"])
        | (Ok (ClosureTy (ctx, Arrow (t0_0, t0_1))),
           Ok t1) -> if t1 = t0_0
                     then Ok t0_1
                     else Error
                               ["Type mismatch. t0_0: "
                               ;string_of_typ t0_0
                               ;", t0_1: "
                               ;string_of_typ t0_1
                               ;", t1: "
                               ;string_of_typ t1
        | (Ok t, _) -> Error (str ["Not given a lambda as first   thing to App. "
                                  ;string_of_typ t])
        | _ -> Error (str ["Error getting typ of expr App"]))
    | TLam (a, body) ->
       (let new_context = (a, QuantifierBind) :: context in
        match typ_of_expr new_context body with
        | Ok body_type -> Ok (ClosureTy (new_context,
                                         Forall (a,
        | Error e -> Error (str [ "Failed to get typ of TLam, a  nd thus can not construct Forall. Error: "
                                ; e]))
    | TApp (TLam (a, body),
            t1) -> let new_context = (a, QuantifierBind) :: con  text in
                   typ_of_expr new_context body
    | TApp (e0, t1) ->
       (match typ_of_expr context e0 with
        | Ok (ClosureTy (ctx,
                         Forall (a, body))) ->
           let new_context = (a, SymBind t1) :: ctx @ context i  n
           Ok (ClosureTy (new_context,
        | Ok (ClosureTy(ctx,
                        t)) ->
           Error (str ["TApp given ClosureTy with non-forall on   left-hand-side. IS: "
                      ;string_of_typ t])
        | Ok (Forall (a, body)) -> Error "TApp given naked Fora  ll as first argument"
        | Ok _ -> Error "TApp given non-Forall first argument"
        | Error e -> Error (str ["Error in TApp - error: "
  let rec eval context e =
    let context: (expr context) = context in
    match e with
    | Unit -> Ok Unit
    | Closure (ctx, e) -> eval (List.append ctx context) e
    | Int i -> Ok (Int i)
    | Lam (x, t, body) -> Ok (Closure (context,
                                       Lam (x, t, body)))
    | Symbol s -> lookup context s
    | App (Closure(ctx,
                   Lam (x, t, body)),
           e1) ->
       let new_context = (x, SymBind e1) :: ctx @ context in
       eval new_context body
    | App (e0, e1) ->
       (match eval context e0 with
        | Ok (Closure (ctx,
                       Lam (x, t, body))) ->
           eval (ctx @ context) (App (Lam (x,
        | Ok _ -> Error "Can't apply non-Lam"
        | Error e -> Error (str ["Apply error: "
    | TLam (a, body) -> Ok (TLam (a, body))
    | TApp (TLam (a, body),
            given_typ) ->
       let new_context = (a, QuantifierBind) :: context in
       eval new_context body
    | TApp (e0, t) ->
       (match eval context e0 with
        | Ok (TLam (a, body)) ->
           let _ = failwith "Not gonna happen" in
           eval context (TApp (TLam (a, body), t))
        | Ok (Closure (ctx,
                       TLam (a,
                             body))) ->
           eval (ctx @ context) body
        | Ok _ -> Error "Can't type-apply non-Λ"
        | Error e ->
           Error (str [ "Error applying type lambda. Error: "
                      ; e]))
  (* helper functions *)
  let tlam a body = TLam (a, body)
  let lam x t body = Lam (x, t, body)
  let app e0 e1 = App (e0, e1)
  let tapp e0 tyT1 = TApp (e0, tyT1)
  (* eval [] (tlam "a" (lam "x"
   *                      (SymbolTy "a")
   *                      (Symbol "x"))) *)
  let k_combinator = (tlam "a" (tlam "_b"
                                     (SymbolTy "a")
                                        (SymbolTy "_b")
                                        (Symbol "x")))))
  let k_intint = (TApp (TApp (k_combinator,
  (* DEAR HN READER: This is what is failing me. *)
  let applied_k = typ_of_expr [] (app k_intint (Int 1))

Are you aware that the TAPL book comes with an implementation of System F?


I'm not gonna read all your code, and I'm not even sure if this is your problem, but just as a pattern match based on what you said: for System F, only the first and second ranks have decidable inference, not the whole thing. Inference in the first rank is often referred to as the "Hindley-Milner" algorithm; you can infer types for the second fragment but I'm not aware of an implementation. After that it's undecidable.

To put that another way, if `forall a.` introduces types (it's like the big lambda), then you can only have a type with `forall`s on the left on the outside. So:

`forall a. forall b. a + b` is a type but

`forall a. a -> forall b. b -> c` is not a type since that `forall` isn't all the way on the left in a chain like the first one.

If you're not in that fragment, and you're trying to infer types, you're gonna have a bad time unless you've got something exceedingly clever up your sleeve.

So `ForAll` should not be one of the "Type" type constructors. It should be its own type, like

    # The int is the number of variables bound, then the type is the type that uses them
    data Scheme = Int Type
    data Type = 0 | 1 | Var Int | Product Type Type | ...

> λ calculus, with Forall and simple types

Just fyi, this is called System F.

I do not remember any OCaml and I get an error when trying to run it (File "./test.ml", line 123, characters 6-7: Error: Syntax error: operator expected.) so I do not know if I can be of much help. What is the result of typ_of_expr [] (app k_intint (Int 1))? I presume that you expect it to be IntTy -> IntTy, right?

Thanks for your reply.

It is indeed system F I'm trying to implement :-)

My formatting for HN seems to have screwed the program up. You'll find the contents of systemf.ml in this paste: https://pastebin.com/gw20LBNR

Here is the result from my OCaml repl:

  # typ_of_expr [] (app k_intint (Int 1));;
  - : (typ, string) result =
  Error "Type mismatch. t0_0: SymbolTy a, t0_1: Arrow, t1: IntTy"
You are correct about what `(k_intint (Int 1))` should be. I hope for `k_intint` to be `IntTy -> IntTy -> IntTy`.

I am pretty sure that the type variable is not binded properly (when type checking App on the part where it says "(Ok (ClosureTy (ctx, Arrow (t0_0, t0_1)))," I get a symbol for t1, you are doing t1 = t0_0 which would be false in any case but the thing is that even when I use lookup in the enviroment I do not find said symbol). I would suggest instead of having closures to replace every occurrence of the type variable (or any variable really) during evaluation. So if you have something like ((\Lambda t (\lambda x t x)) int) to evaluate it into (\lambda x int x) directly instead of (\lambda x (sym "t") x) + a separate environment thing. I can give you my System F implementation in haskell that uses this method if you want.

As far as I see, you don't really seem to even try to infer types. With fixes to `string_of_*` routines (yeah, some cases are missing) the particular failure is as follows:

    ty(TApp ...) = Closure (Arrow (SymbolTy a) (Arrow (SymbolTy _b) (SymbolTy a)))
    ty(Int 1) = IntTy
    ty(App (TApp ...) (Int 1)) = Failure
      "Type mismatch. t0_0: SymbolTy a, t0_1: Arrow (SymbolTy _b) (SymbolTy a), t1: IntTy".
Here `t0_0` is a formal argument type and `t1` is being applied, so even though they are structurally different it should be inferred that `SymbolTy a` equals to `IntTy` (probably by updating the context). But your code doesn't do that.

I strongly recommend to first familize yourself with Hindley-Milner type system (and possibly Algorithm W) which type inference is efficient and easy enough to understand. Even though the decidable type inference in System F is impossible, the basic techniques for (partial) type inference are not too different.

I wrote up some notes on type reconstruction a few years back that you might find helpful: http://www.ccs.neu.edu/home/amal/course/7480-s12/inference-n...

I wrote a bit about how to inter types here: http://jeremymikkola.com/posts/2018_03_25_understanding_algo...

Two things I see missing from your code: A current substitution and a unification function.

This paper might also be helpful to read: http://citeseerx.ist.psu.edu/viewdoc/download?doi=

Why does everyone assume that the OP tries to implement type inference? System F and HM are incompatible with each other.

Why not just compile "primitive" functions call to respective assembly code? (+ => "add")

> In a better compiler, we would not make plus a built-in function. We'd emit code for the assembly instruction ADD. However, making plus a function makes code generation simpler and also allows us to see what function calls look like.


"[Assembly is] 1:1 representation of the binary instructions the CPU can interpret". 99% correct.

Assembly is NOT a 1:1 representation of the binary instructions. It is for the 99% of a given assembly. Pseudo-instructions exist in most assemblers and they usually translate to several machine code instructions.

Thanks for the correction. Do you have an example?

Division and multiplication in some RISC processors, e.g. MIPS.

Compiler without a Grammar...

What do you mean?

The article is about "compiler basics", but it never mentions PLs are parsed according to context-free grammars, let alone giving one. Just throws a bunch of code and says "yeah this'll parse it".

Maybe because compilers nowadays are mostly not about syntax? Parsers are the boring part.

While many implementations of languages use parser generators according to context-free grammars, many still use hand-written parsers like mine. The major differences are safety, error-handling, and functionality. I discounted those up front in my post.


Calling this a lisp is a bit of a stretch. S-expressions aren't lisp if they don't have lisp semantics.

What part of the code that will compile under this simple compiler (i.e. `(+ 2 (+ 3 4))`) _doesn't_ have lisp semantics?

Lisp uses S-expressions but not all the S-expressions are lisp. If you cannot define functions with free variables then is not lisp.

I would say this:

- tokens are just collections of non-space characters, passed raw through to an assembler, where it is assumed that they denote machine words. Whatever happens, happens. The "symbol" eax will refer to a register, for instance. (If this project had stuck with AT&T assembly syntax, at least that would have been qualified as %eax%).

- functions aren't registered under Lisp symbols but are just global references to assembly-language labels. It seems we can just do (1234 4) and it will just call address 1234.

- no LIS)t P)rocessing to be seen here.

That said, there is no language that is just "Lisp", and so there is no prescribed set of semantics. We can make a prefix expression preprocessor for x86 assembly language and call it "Phil Lisp" (or whatever qualifier).

Eval... as in REPL, is tough without an interpreter in process.

Integer overflow.

Not all historic dialects of Lisp have bignum integers. I don't know what the current situation is in Emacs Lisp, but it had only fixnums at one point.

For those who don't know. What exactly are the semantics of lisp?

This is really a minimal translator from s-exprs to assembly. Which is still an interesting project.

To be a lisp, you need to be able to declare and call functions, and construct and manipulate lists. Once you have that, you can then manipulate the code itself since s-exprs are just lists of a certain construction.

Ok, I get this is a DIY post, but wouldn't it make much more sense to use LLVM here? That way, you get low-level optimizations for free, and this allows you to focus on the rest of the language. You can always implement the code-generation part if you want to and have the time for it.

I think you answered your own question with the last sentence.

You could learn more from targeting an intermediate language than from targeting a specific architecture, imho.

Maybe you learn "more". But you learn very different things.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact