
Reple: “Replay-Based” REPLs for Compiled Languages - xiii1408
https://people.eecs.berkeley.edu/~brock/blog/reple.php
======
lispm
The believe that REPLs are based on interpreters is strange. The Lisp read-
eval-print-loop has been using any mix of compiled and interpreted versions
for several decades.

Something like ECL can compile Lisp incrementally to C, calls the C compiler
and loads the compiled code into Lisp.

SBCL for a long time only had a compiler. The compiler then either compiles to
memory or to files. These files can be loaded into Lisp. These system maintain
much of the state between compiles.

There is zero reason why EVAL can't be implemented by direct-to-machine-code
compiler - like in SBCL.

The SBCL Read-Eval-Print-Loop:

    
    
        * (flet ((my-function (a)
                   (+ a 42)))
            (disassemble (function my-function))
            (my-function 0))
    
        ; disassembly for (FLET MY-FUNCTION)
        ; Size: 37 bytes. Origin: #x1002FC5E98
        ; 98:       498B4D60         MOV RCX, [R13+96]                ; no-arg-parsing entry point
                                                                      ; thread.binding-stack-pointer
        ; 9C:       48894DF0         MOV [RBP-16], RCX
        ; A0:       48895DE8         MOV [RBP-24], RBX
        ; A4:       BF54000000       MOV EDI, 84
        ; A9:       488BD3           MOV RDX, RBX
        ; AC:       FF1425A000B021   CALL QWORD PTR [#x21B000A0]      ; GENERIC-+
        ; B3:       488B5DE8         MOV RBX, [RBP-24]
        ; B7:       488BE5           MOV RSP, RBP
        ; BA:       F8               CLC
        ; BB:       5D               POP RBP
        ; BC:       C3               RET
        42
    

If one wants to execute whole programs, this is also what Lisp usually
supports. One types to an editor and compile-and-load-buffer is a typical
operation. This first compiles the buffer and then loads the code into the
current Lisp. Sometimes output goes to temporary windows.

Some Lisp IDEs queue the editor buffer expressions into a REPL for execution
and may compile them piece by piece.

There is usually very little use for modifying existing output for changes -
though some implementations can do that - like Symbolics Genera, where output
is recorded and deltas can computed to generate minimum updates. This can be
used in custom applications, which have a command loop and an output recording
window.

Usually this is not done, because both the internal and the external state is
changing so much that in general updating the REPL output is useless. Might as
well clear the existing command loop / repl output and generate fresh output.

~~~
mamcx
This is a thing that interest me much (I'm building a relational lang in
rust), I haven't found enough info or help (I have asked like a dozen times
already!) in how do it with compiled languages (the answer is always "do read
this large codebase that solve it)".

If I understand you well, this demand the compiler to be a JIT? And be
responsible to emit and understand assembler?

Is possible to make a generalized solution, that for example, could work for a
transpiler?

Is viable for a compiled language in multiple platforms or arches without a
huge codebase?

What could be a minimal example that could work?

~~~
lispm
These are not JIT compilers. They are AOT compilers. Code gets immediately
compiled to machine code - either automatically or on direct request by the
user.

~~~
mamcx
But then what are the steps for make it real-able? I can see, like this
project, the idea to compile everything and run, but the incremental aspect of
a real is what I can't see how make it.

~~~
klibertp
It's actually quite simple: all you need to do is to have your compiler be
able to compile things smaller than the whole program. In Lisp, the unit of
compilation is function (I think, lispm will correct me if I'm wrong), in
Forth it's a word, in Prolog a rule and in CoffeeScript expression.

If you have that, the only thing that's left is being able to inject the
compiled code into the running REPL process. This is more complicated nowadays
because of restrictions placed on memory by the OS, ie. marking some segments
of memory as executable and non-executable, but fundamentally compiled code is
just a chunk of bytes, so "loading" here means allocating space for it and
placing a pointer to the first byte anywhere reachable by the rest of the
program, probably in some kind of a symbol table. The usual language rules
take care of searching for the symbol's definition, then the implementation
arranges the stack (or does anything else it wants with regards to arguments)
and makes execution jump to the compiled code.

If you used GDB, you'd notice that it's possible to call natively compiled
functions from its prompt, as well as to examine the symbols at the break-
point (at least the ones not compiled away). GDB prompt is a REPL itself,
although it only supports simple expressions IIRC, and the compilers of
supported languages are written to work on the file level, not on the smaller
chunks of code. But if you design a new language and implement a compiler for
it, there's nothing stopping you from re-using compiler internals when writing
a REPL.

The difference between JIT and AOT compilation is that with JIT you first
execute the code, then compile it, while Ahead-of-Time (as the name implies)
compiles code first and executes it later. What Lisp or Forth do is AOT
compilation; for example, SBCL (IIRC) is a compiler-only implementation of
Lisp, so it couldn't do JIT even if it wanted to, because there's no way for
it to execute the code before it gets compiled. It - as mentioned - has a
proper REPL and you'd be hard-pressed to find a difference in its look&feel
compared to Python's.

BTW, writing an interpreter is very similar to writing a compiler: after
lexing and parsing code into an AST you walk it, and then either perform
required actions directly (that's an interpreter) or emit the code which will
perform those actions when handed to a lower-level interpreter (like the CPU,
or a VM). If you design your language from the beginning for it, you should be
able to write both at the same time without much trouble. As an example, my
favorite non-Racket Scheme implementation, Chicken Scheme, has both an
interpreter and AOT compiler, with extensive support for dynamic behavior
(like eval) in both compiled and interpreted code.

Anyway, this is my understanding, although I'm yet to write a compiler, so I'm
not too sure about all the details, but the general outline should be more or
less correct. I hope :)

EDIT: there was a book (which was _not_ SICP) which taught writing
interpreters and compilers and the relation between them, but I've been trying
for half an hour now to recall what it was without success :( I should have
the pdf somewhere, will try to dig it up later.

~~~
mamcx
> If you have that, the only thing that's left is being able to inject the
> compiled code into the running REPL process.

This is alike JIT, where you MMAP and the mark that as executable?

Now thinking a little, could be easier using DLLs/dynamic linking, but how
model the API is something I can't yet figure.

P.D: I think TinyCC is able to compile on memory, but I'm using rust for my
project and probably the gymnastics of this are too much for now.

------
joe_the_user
A shell that reruns the entirety of a compiled language each time you enter a
line.

 _" This has obvious performance disadvantages if you're actually doing
computation inside your REPL instance—for each new line, you'll have to redo
all old computation. However, this technique has other distinct advantages
that make it very desirable for certain use cases."_

Are there any cleaver increment compile/run tricks that could speed this up?

~~~
yorwba
Instead of putting all lines into one big chunk of code that is recompiled and
rerun every time, one could instead compile it into a dynamic library with a
single function that gets the current variable assignments passed as
arguments, executes the single line of code and then waits for input to
generate the next dynamic library, which it then loads and calls with the new
variable assignment.

However, that solution would require the same level of language-specific
knowledge as the modification mentioned below "Limitations" in the blog post.
That's also why I never implemented the idea, despite having carried it around
in my head for a few years now. Hopefully someone else will eventually get
around to implementing it.

~~~
evmar
This is how
[http://neugierig.org/software/c-repl/](http://neugierig.org/software/c-repl/)
worked. (See "How it works" in the README:
[https://github.com/evmar/c-repl/blob/master/README](https://github.com/evmar/c-repl/blob/master/README)
)

~~~
rgoulter
I made use of ANTLR4 for parsing C to write a "worksheet" (like Swift's
Playgrounds) tool: [https://github.com/rgoulter/c-worksheet-
instrumentor](https://github.com/rgoulter/c-worksheet-instrumentor)

It's an example of what you can do with "specific to a language", although
it'd suffer the same limitation of "slower after a big computation".

------
rgoulter
In terms of "value for effort", this sounds like a cool idea.

I'd guess this approach also struggles with multiple-line input like if/else,
loops, functions.

I'm reminded of [https://github.com/jcartledge/sublime-
worksheet](https://github.com/jcartledge/sublime-worksheet) which is language-
agnostic on the other end of things. The two ideas might work well together in
limited cases.

~~~
xiii1408
It's actually pretty easy to detect multi-line inputs as long as they have
clear enclosers (like '{}' for a block in C/C++). This is what reple does.

This doesn't require a parser--you just have to know what the enclosers are
(one of the things that goes in a reple config file) and then count them to
see if they're matched or not.

For languages without Python-like syntax or with enclosers that can be
ambiguous (some dialects of Fortran), this would be pretty difficult.

------
mark_l_watson
OK for some use cases, but not mine. I often run long running computations in
repos, and like to be able to add/modify code. A replay based repl would not
work at all for me.

