
Bytecode interpreters for tiny computers (2007) - johkra
http://www.bentwookie.org/blog/kragen-tol/2007-September/000871.html
======
kragen
It's nice to see that people are interested in this stuff, but it's
disappointing to see that both of the top-level comments are attacking me for
bothering to point out that cons structures are not as good a way to represent
code as some other approaches. I knew this piece was kind of disorganized, but
I didn't realize it was _that_ bad.

So I'll try to summarize here, a bit.

Question: How compact can you make a full-featured interactive programming
environment? And what techniques would you use to do so?

Methods: Quantitative comparison of some sample code written for 17 different
virtual machines, quantitative analysis of the code in a Squeak image to see
what operations need to be most compact, and an implementation of a simple
dictionary.

Tentative answer: A stack-based bytecode with local variables and special
bytecodes for common operations, like Squeak's, provides the best density for
high-level code. Polymorphic method dispatch improves code density. You can
implement the virtual machine for the Squeak-style bytecode more compactly in
Forth-style threaded code than directly in machine code. So you should be able
to get an entire working interactive interpreter for a flexible, convenient
language into 2000 to 6000 bytes.

Does that help?

I'm sorry my logic appeared to be so "lame", but I think the fault is more in
the comprehensibility of the article than in its logic.

------
oconnore
What? There are zero lisp compilers that store COMPILED lisp code as cons
structures. He might as well reference GCC's abstract syntax tree, notice that
it's huge, and quickly dismiss C as appropriate for embedded work.

Most lisp compilers go to machine code. If you want an interesting comparison,
you would have to look at something like CLISP, which has a byte code
representation.

~~~
johkra
How does one represent Lisp as byte code anyway? I had a look at
<http://clisp.cons.org/impnotes/instr-set.html> and this is my idea:

(1 2 3 4) => Push 4; Push 3; Push 2; Push 1

In other words: Lists are simply pushed onto the stack, probably in reverse
order or car would be difficult.

(+ 1 2) => Push 2; Push 1; Call _n_ 2 (where n is the number of the
'+'-function)

Looking at my understanding of the problem, it's directly translatable from
list-based to stack-based and hence equivalent to Forth, isn't it? Please
correct me if I'm wrong.

Edit: Thinking about it, I have no clue how to represent a nested list on a
stack. How would you represent ((1 2) (3 4))?

~~~
kragen
> Thinking about it, I have no clue how to represent a nested list on a stack.
> How would you represent ((1 2) (3 4))?

In PostScript, it looks like this:

    
    
        GS>[[1 2] [3 4
        GS<5>pstack
        4
        3
        -mark-
        [1 2]
        -mark-
    

The [ pushes a mark onto the stack, and ] allocates an array of the
appropriate size, sticks stuff into the array from the stack, and leaves the
array on the stack. (A pointer to it, if you want to get nitty-gritty.) Two
more ]'s would pack the above up into a single nested array. Perl uses a
similar strategy.
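The mark-and-pack strategy is easy to sketch outside PostScript. Here it is in Python (the function names are made up; PostScript's real operators are `mark`/`[` and `]`):

```python
# Sketch of the PostScript strategy: '[' pushes a sentinel mark,
# ']' pops items down to the nearest mark and packs them into a
# list, leaving that list on the stack.

MARK = object()  # unique sentinel, like PostScript's -mark-

def open_bracket(stack):
    stack.append(MARK)

def close_bracket(stack):
    items = []
    while stack[-1] is not MARK:
        items.append(stack.pop())
    stack.pop()       # discard the mark
    items.reverse()   # restore left-to-right order
    stack.append(items)

# Building [[1 2] [3 4]]:
s = []
open_bracket(s)                                   # outer [
open_bracket(s); s += [1, 2]; close_bracket(s)    # [1 2]
open_bracket(s); s += [3, 4]; close_bracket(s)    # [3 4]
close_bracket(s)                                  # outer ]
print(s)  # [[[1, 2], [3, 4]]]
```

Note that nesting falls out for free: an inner `]` packs its elements into a list that is just another item on the stack for the outer `]` to pack.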

On the other hand, if you're representing (car '((1 2) (3 4))) in Lisp
bytecode (e.g. elisp), it looks more like this:

    
    
        pushconst 83258023
        car
    

That's the approach my Ur-Scheme takes in machine code, too; I just posted the
assembly it generates for that expression at <http://gist.github.com/427828>.
(Ur-Scheme is a very dumb, naïve compiler, like something you might write for
an exercise in a compilers class, and it uses the x86 as a stack machine.)

Finally, if you are representing (list (list 1 2) (list 3 4)) in stack
bytecode, you end up with something like this:

    
    
        pushconst 1
        pushconst 2
        pushconst #list
        call 2
        pushconst 3
        pushconst 4
        pushconst #list
        call 2
        pushconst #list
        call 2
    

You can also use a PostScript-style stack mark instead of an argument count,
and sometimes (e.g. commonly in Smalltalk) the function and argument count are
all packed up in a single bytecode. If the argument count is implicit in the
function (e.g. it's checked at compile-time) you don't need the argument count
at all, but that doesn't allow you to write variadic functions like "list".
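A sketch of how an interpreter might execute the explicit-argument-count bytecode above, in Python (the opcode names follow the example; the dispatch-table contents and encoding are my own illustration):

```python
# Sketch: 'call n' pops the callee, then pops n arguments, applies
# the function, and pushes the result. Because n travels with the
# call site rather than the function, variadic callees like #list
# work without any per-function arity metadata.

def run(program, functions):
    stack = []
    for op, arg in program:
        if op == "pushconst":
            stack.append(arg)
        elif op == "call":                      # arg = argument count
            fn = stack.pop()                    # e.g. #list
            args = [stack.pop() for _ in range(arg)][::-1]
            stack.append(functions[fn](*args))
    return stack

functions = {"#list": lambda *xs: list(xs)}
program = [
    ("pushconst", 1), ("pushconst", 2), ("pushconst", "#list"), ("call", 2),
    ("pushconst", 3), ("pushconst", 4), ("pushconst", "#list"), ("call", 2),
    ("pushconst", "#list"), ("call", 2),
]
print(run(program, functions))  # [[[1, 2], [3, 4]]]
```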

------
jmcguckin
Yes, this blog entry is old (ca. '07), but the reasoning is pretty lame. In
particular, the paragraph where he uses a LISP syntax tree as the runtime
representation is a hoot.

I'd suggest he sit down with a copy of LISP IN SMALL PIECES to see how a Lisp
(or any bytecoded vm for that matter) interpreter can be implemented
efficiently.

~~~
kragen
Thanks for the suggestion! I still haven't read Queinnec's book after all
these years. Are you suggesting that I might come to different conclusions
after reading his book?

My conclusion was:

> With this approach, it should be possible to get a very slow language,
> with flexibility something like Python's, into maybe 2000-6000 bytes
> of a microcontroller's ROM. This should allow you to interactively
> get out-of-memory errors with great convenience and flexibility.

