
Compiling Dynamic Languages (2013) [pdf] - pcr910303
http://www2.imm.dtu.dk/pubdb/views/edoc_download.php/6620/pdf/imm6620.pdf
======
rtjalkbo
Co-author of the thesis here - if anybody is interested in the source code,
let me know.

Keep in mind that JS has changed quite a bit in the years that have passed and
that certain aspects ('eval' in particular) were excluded from the project.

Still, it may be an interesting starting point for anybody who wants to play
around with compiling JS.

~~~
chrisseaton
> certain aspects ('eval' in particular) was excluded from the project

If you’re interested in what it takes to compile dynamic languages, why did
you leave out such a notable dynamic language feature?

~~~
avidmoon
Because it is not really possible to "compile" an eval expression in the
general case.

By definition, a compiler translates symbols to code ahead of time. But eval
needs to do this at runtime. This means that to compile a program like
"eval(input_string)", the output program would need to contain the entire
compiler, which is then used to translate the eval input before running it.

So in practice you strap a kind of interpreter to the output program. This
is often inefficient and hackish, and thus it is more desirable to just not
have an eval feature.
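
To make that concrete, here is a minimal JavaScript sketch (the function names
are hypothetical, chosen only for illustration). The source text passed to eval
is assembled at run time, so there is nothing for an ahead-of-time compiler to
translate in advance:

```javascript
// Builds source text from run-time pieces; imagine `op` comes from user input.
function buildExpression(op, left, right) {
  return `${left} ${op} ${right}`;
}

// Indirect eval: the string is parsed and evaluated as global code at run time.
// A compiled program would have to ship a parser and evaluator to support this.
function runUserExpression(inputString) {
  return (0, eval)(inputString);
}

const source = buildExpression("*", 6, 7);
console.log(runUserExpression(source)); // prints 42
```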

~~~
chrisseaton
Yeah I get it's problematic.

But if you're going to start cutting out the dynamic features, and those you
don't think are desirable, then why are you challenging yourself to write a
compiler for a dynamic language in the first place? You could cut out
all these troublesome features if you just picked a static language.

~~~
coldtea
Because you don't need eval to do hugely interesting things in dynamic
languages.

Heck, you don't even particularly need reflection, but eval even less so.

------
divs1210
Looks like the authors completely ignored Lisp related work spanning almost
half a century. They will have their minds blown if they ever check out SBCL,
for example.

~~~
msla
> Looks like the authors completely ignored Lisp related work spanning almost
> half a century. They will have their minds blown if they ever check out
> SBCL, for example.

So help me, I think it's the name.

Or, rather, the fact the name hasn't changed in a half-century. LISP is Lisp
is Lisp, right? SBCL's optimized Common Lisp must be the same as mid-1960s
MACLISP must be the same as LISP 1.5 directly from the Stone Tablets, or
academia, whichever's closer.

Compare Algol: It was Algol up until PL/I, then it did a quick detour of being
called Ada, and now it's C and C++ and Java and C# and Javascript and Python
and Go and Rust. All-New All-Different, see? They must be greatly innovative,
changing names so much.

------
lispm
> Classically, dynamic languages have been interpreted, that is, evaluated at
> run-time by an interpreter program. In recent years, dynamic languages have
> also been compiled, that is, translated to another language such as native
> code before being executed.

Strange, I thought compilers for languages like BASIC had been around for a
long time. The first LISP compiler is from the early 60s:
[http://www.bitsavers.org/pdf/mit/ai/aim/AIM-039.pdf](http://www.bitsavers.org/pdf/mit/ai/aim/AIM-039.pdf)

Scheme compilers are from the 70s. Rabbit for Scheme:
[https://dspace.mit.edu/handle/1721.1/6913](https://dspace.mit.edu/handle/1721.1/6913)

> We conclude that ahead-of-time compilation is a viable alternative to
> interpretation of dynamic languages

Especially when the AOT compiler can be used incrementally and the program can
be extended/modified incrementally. See SBCL.

> The project then showed that it is possible to compile this dynamic language
> ahead-of-time to a native binary, using a statically typed language as an
> intermediate step.

This has been known for a long time.

For example Kyoto Common Lisp from 1985:
[http://www.softwarepreservation.org/projects/LISP/kcl/doc/kc...](http://www.softwarepreservation.org/projects/LISP/kcl/doc/kcl-report.pdf)

Design and Implementation of KCL:

[http://www.softwarepreservation.org/projects/LISP/kcl/paper/...](http://www.softwarepreservation.org/projects/LISP/kcl/paper/kcl-paper.pdf)

KCL has evolved into newer implementations like ECL and CLASP (which targets
LLVM) over the years.

Or CLICC:

[http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=522...](http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=522E39082F536812D84D2879C35004B8?doi=10.1.1.38.1282&rep=rep1&type=pdf)

Or Stalin, an attempt at a highly optimizing whole-program compiler for
Scheme: [https://github.com/barak/stalin](https://github.com/barak/stalin)

> Overall the project found that compiling a dynamic language ahead-of-time is
> a viable alternative to interpreting the language

See SBCL: [http://www.sbcl.org](http://www.sbcl.org)

~~~
rtjalkbo
Co-author of the thesis here - we wrote it almost 6 years ago, so my memory
may be a little rusty :-)

The question we dealt with was how to compile a weakly, implicitly and
dynamically typed language (see the definitions in the thesis, but basically a
language where the variable types cannot be statically determined in the
general case and will be coerced if the run-time type does not match operator
requirements) in a manner that is more efficient than simply interpreting the
program source code.
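
A small JavaScript illustration of that definition (the function names are
made up for this sketch): the same function body behaves differently depending
on the run-time type of its argument, and mismatched operands are coerced
rather than rejected.

```javascript
// `x` has no static type; the meaning of `+` depends on the run-time value.
function addOne(x) {
  return x + 1;
}

// `*` requires numbers, so a string operand is coerced to a number.
function twice(x) {
  return x * 2;
}

console.log(addOne(41));   // prints 42 (numeric addition)
console.log(addOne("41")); // prints 411 (string concatenation)
console.log(twice("21"));  // prints 42 (string coerced to number)
```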

I do not recall seeing any related work with regards to BASIC and Lisp at the
time, however, we may very well have overlooked something. Thank you for the
references!

~~~
lispm
Lisp is strongly, implicitly and dynamically typed, with type coercions. For
example it provides generic arithmetic:

    
    
        * (let ((a 1)              ; integer
                (b 2.0)            ; default float
                (c 1/3)            ; ratio
                (d #c(1.2 2.3)))   ; complex number
            (+ a b c d))
        #C(4.5333333 2.3)
    

Variables A, B, C, D have no type declarations. The values usually carry type
information. The + operation will take any numeric value and create a result
value of a type it chooses. For example, (+ 1/2 1/2) will result in 1. Adding
ratios might create a ratio or an integer.

There is a whole bunch of literature on compiling Scheme:

[https://github.com/scheme-live/library.readscheme.org/blob/m...](https://github.com/scheme-live/library.readscheme.org/blob/master/page8.md)

~~~
rtjalkbo
Thanks for the link - a lot of interesting literature to dig into!

Generally with unknown (at compile-time) variable types, you need to box the
variables (carry type information in addition to the value). The operators may
then either work on the boxed variables and choose behaviour based on the type
information or the operators may be specialized in many versions to work on
the unboxed variables (this requires the run-time to dispatch to the correct
specialized version, if the types cannot be determined statically).

This is generally a trade-off between space and execution time - if the number
of possible types is low (either because of a limited type system or because
the possible different types can be determined statically), then it may make
sense to specialize.
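
The two strategies can be sketched roughly as follows. This is only an
illustration, not the thesis implementation: the tag constants, `box`,
`genericAdd`, `ADD_TABLE` and `dispatchAdd` are all hypothetical names, and
only two types are modelled.

```javascript
// A boxed value carries a type tag in addition to its payload.
const TAG_NUMBER = 0;
const TAG_STRING = 1;

function box(tag, payload) {
  return { tag, payload };
}

// Strategy 1: one generic operator that inspects the tags on every call.
function genericAdd(a, b) {
  if (a.tag === TAG_NUMBER && b.tag === TAG_NUMBER) {
    return box(TAG_NUMBER, a.payload + b.payload);
  }
  // JS-like fallback: any other combination concatenates as strings.
  return box(TAG_STRING, String(a.payload) + String(b.payload));
}

// Strategy 2: specialized operators on unboxed payloads, one per type pair.
// The table grows with the square of the number of types (the space cost).
const addNumbers = (x, y) => x + y;
const addAsStrings = (x, y) => String(x) + String(y);
const ADD_TABLE = [
  [addNumbers, addAsStrings],   // left operand is a number
  [addAsStrings, addAsStrings], // left operand is a string
];

// Run-time dispatch, needed when the types are not known statically.
function dispatchAdd(a, b) {
  const result = ADD_TABLE[a.tag][b.tag](a.payload, b.payload);
  return box(typeof result === "number" ? TAG_NUMBER : TAG_STRING, result);
}

console.log(genericAdd(box(TAG_NUMBER, 1), box(TAG_NUMBER, 2)).payload);    // prints 3
console.log(dispatchAdd(box(TAG_NUMBER, 1), box(TAG_STRING, "a")).payload); // prints 1a
```

If the compiler can prove both operands are numbers, it could call
`addNumbers` directly and skip both the boxes and the dispatch.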

In JS, in addition to mutable types and values for each variable, you also
have a challenge with variable scope. It is possible to introduce new
variables in the global scope from a local scope, so depending on run-time
values, a variable for a given statement may or may not have been declared.

An example from the thesis:

    
    
      function f(){
        a = 5;
      }
    
      function g(){
        console.log(a); 
      }
    
      if(x){
        f();
      }
    
      g();
    
    

Assuming that 'a' was not declared elsewhere, the call to 'g()' will either
print the value of 'a' (i.e. "5") or result in a run-time error.
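
One way a compiler could handle this, sketched under assumptions (this is not
taken from the thesis, and `globals`, `setGlobal`, `getGlobal` and `run` are
hypothetical names): treat the global scope as a table that is filled at run
time, and turn every access to a possibly undeclared global into a checked
lookup.

```javascript
// Hypothetical run-time support: the global scope as a table.
const globals = new Map();

function setGlobal(name, value) {
  globals.set(name, value);
}

function getGlobal(name) {
  if (!globals.has(name)) {
    throw new ReferenceError(`${name} is not defined`);
  }
  return globals.get(name);
}

// Lowered form of: function f(){ a = 5; }
function f() {
  setGlobal("a", 5);
}

// Lowered form of: function g(){ console.log(a); } (returns instead of logging)
function g() {
  return getGlobal("a");
}

// Lowered form of the top-level code, parameterized on the run-time value x.
function run(x) {
  globals.clear();
  if (x) {
    f();
  }
  try {
    return String(g());
  } catch (e) {
    return e.name;
  }
}

console.log(run(true));  // prints 5
console.log(run(false)); // prints ReferenceError
```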

~~~
lispm
> you need to box the variables (carry type information in addition to the
> value).

Dealing with boxing/unboxing and trying to minimize boxing operations in
computations is regularly done in Lisp compilers.

> It is possible to introduce new variables in the global scope from a local
> scope

Just like in Lisp. Generally, Javascript and Lisp have a lot in common. Scoping
rules are different, though, and Lisp is not object-oriented at its core, but
it provides closures or adds object-oriented extensions like CLOS, which are
semi-optional.

    
    
      * (defun f ()
          (setf a 5))
      ; in: DEFUN F
      ;     (SETF A 5)
      ; ==>
      ;   (SETQ A 5)
      ; 
      ; caught WARNING:
      ;   undefined variable: COMMON-LISP-USER::A
      ; 
      ; compilation unit finished
      ;   Undefined variable:
      ;     A
      ;   caught 1 WARNING condition
      F
    
    
      * (defun g ()
          (print a))
      ; in: DEFUN G
      ;     (PRINT A)
      ; 
      ; caught WARNING:
      ;   undefined variable: COMMON-LISP-USER::A
      ; 
      ; compilation unit finished
      ;   Undefined variable:
      ;     A
      ;   caught 1 WARNING condition
    
    

As you can see, the compiler warns about undefined variables but still
compiles the functions.

    
    
      * (if (> (random 1.0) 0.5) (f))
      5
      * (g)
    
      5 
      5
    

If we remove the binding of A, then we get a runtime error.

    
    
      * (makunbound 'a)
      A
      * (g)
    
      debugger invoked on a UNBOUND-VARIABLE in thread
      #<THREAD "main thread" RUNNING {10005205B3}>:
        The variable A is unbound.
    
      Type HELP for debugger help, or (SB-EXT:EXIT) to exit from SBCL.
    
      restarts (invokable by number or by possibly-abbreviated name):
        0: [CONTINUE   ] Retry using A.
        1: [USE-VALUE  ] Use specified value.
        2: [STORE-VALUE] Set specified value and use it.
        3: [ABORT      ] Exit debugger, returning to top level.
    
      (G)
         source: (PRINT A)
      0] 
    

As one can see, both F and G are actually compiled machine code functions.
Both functions were directly AOT compiled to machine code when I entered them
at the prompt.

    
    
      * (disassemble #'f)
      ; disassembly for F
      ; Size: 38 bytes. Origin: #x226D3C9C
      ; 9C:       498B4540         MOV RAX, [R13+64]                ; no-arg-parsing entry point
                                                                    ; thread.binding-stack-pointer
      ; A0:       488945F8         MOV [RBP-8], RAX
      ; A4:       488B15A5FFFFFF   MOV RDX, [RIP-91]                ; 'A
      ; AB:       BF0A000000       MOV EDI, 10
      ; B0:       B904000000       MOV ECX, 4
      ; B5:       FF7508           PUSH QWORD PTR [RBP+8]
      ; B8:       B898914F22       MOV EAX, #x224F9198              ; #<FDEFN SET>
      ; BD:       FFE0             JMP RAX
      ; BF:       0F0B0F           BREAK 15                         ; Invalid argument count trap
      NIL
      * (disassemble #'g)
      ; disassembly for G
      ; Size: 57 bytes. Origin: #x226D3D3C
      ; 3C:       498B4540         MOV RAX, [R13+64]                ; no-arg-parsing entry point
                                                                    ; thread.binding-stack-pointer
      ; 40:       488945F8         MOV [RBP-8], RAX
      ; 44:       488B05A5FFFFFF   MOV RAX, [RIP-91]                ; 'A
      ; 4B:       8B50F5           MOV EDX, [RAX-11]
      ; 4E:       4A8B142A         MOV RDX, [RDX+R13]
      ; 52:       83FA61           CMP EDX, 97
      ; 55:       480F4450F9       CMOVEQ RDX, [RAX-7]
      ; 5A:       83FA51           CMP EDX, 81
      ; 5D:       7412             JEQ L0
      ; 5F:       B902000000       MOV ECX, 2
      ; 64:       FF7508           PUSH QWORD PTR [RBP+8]
      ; 67:       B838555422       MOV EAX, #x22545538              ; #<FDEFN PRINT>
      ; 6C:       FFE0             JMP RAX
      ; 6E:       0F0B0F           BREAK 15                         ; Invalid argument count trap
      ; 71: L0:   0F0B17           BREAK 23                         ; UNBOUND-SYMBOL-ERROR
      ; 74:       00               BYTE #X00                        ; RAX
      NIL

------
home_project123
Shouldn't there be a lot of previous work on this?

I just skimmed the TOC, and it looks interesting, but where is a survey of
existing work?

~~~
rtjalkbo
See "2.6 related works" :-)

Though other comments here have suggested some missing references for Lisp in
particular.

~~~
p_l
Lisp, Smalltalk, Self, and more. There is a huge body of research that
apparently nobody pointed out to the authors back when this was written (it's
a BSc work if I understand correctly, and the supervisor should have pointed
the literature out...)

Good work otherwise :-)

------
faissaloo
This is super cool, I worked on something similar a while back
([https://github.com/faissaloo/Layne](https://github.com/faissaloo/Layne)).
Didn't finish it, however, because I found D, which did a lot of what I wanted
better.

------
ptr
Interesting to see this paper here. I remember reading (and citing) it in my
thesis, where I implemented various optimizations in an AoT LLVM compiler for a
dynamic language. The largest improvements came from “inlining” values into the
address of a reference; this eased the pressure on the GC tremendously.

Was a lot of fun.
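
One common reading of "inlining" values into a reference is pointer tagging,
where small values are stored in the reference word itself instead of on the
heap. Whether the thesis mentioned above did exactly this is an assumption.
The sketch below models it in JavaScript with a 32-bit word and an array
standing in for the heap; `heap`, `encode` and `decode` are hypothetical names.

```javascript
// Stand-in for heap-allocated boxes that a GC would have to manage.
const heap = [];

// Encode a value into a 32-bit word.
// Low bit 1: the word holds a small integer directly (no allocation).
// Low bit 0: the word holds a heap index shifted left by one.
function encode(value) {
  if (Number.isInteger(value) && value >= -(2 ** 30) && value < 2 ** 30) {
    return (value << 1) | 1;
  }
  heap.push(value);
  return (heap.length - 1) << 1;
}

// Decode a word back into the value it represents.
function decode(word) {
  return (word & 1) === 1 ? word >> 1 : heap[word >> 1];
}

const small = encode(21);   // stored inline, heap untouched
const large = encode(3.5);  // not a small integer, goes to the heap
console.log(decode(small), decode(large), heap.length); // prints 21 3.5 1
```

Every value that fits inline is one allocation the garbage collector never
sees, which matches the reduced GC pressure described above.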

------
blacksqr
Also see the quadcode compiler for Tcl: it compiles Tcl bytecode into LLVM IR
(and thence to native code).

[https://wiki.tcl-lang.org/page/tclquadcode](https://wiki.tcl-lang.org/page/tclquadcode)

