
The Fastest VM Bytecode Interpreter (2010) - wtbob
http://byteworm.com/2010/11/21/the-fastest-vm-bytecode-interpreter/
======
gopalv

      for (pr = pn, ip = i;; ip++) {
          op = &optable[ip->op];

...

That looks like a switch-loop interpreter, which is considerably slower than a
direct threaded interpreter.

GCC supports computed goto ("goto *"), a special mechanism that allows the
inner loop to go even faster.
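
As an illustration (not the article's code), here is a minimal computed-goto
dispatch loop for a hypothetical three-opcode stack machine; the opcode names
and encoding are made up for the example. Note that each handler ends with its
own dispatch rather than jumping back to a shared one:

```c
#include <assert.h>

/* Hypothetical mini-ISA for illustration: PUSH takes an inline
   immediate, ADD pops two values and pushes their sum, HALT stops. */
enum { OP_PUSH, OP_ADD, OP_HALT };

static int run(const int *code) {
    /* GCC extension: a table of label addresses, indexed by opcode. */
    void *optable[] = { &&do_push, &&do_add, &&do_halt };
    int stack[16], *sp = stack;
    const int *ip = code;

    goto *optable[*ip++];        /* initial dispatch */

do_push:
    *sp++ = *ip++;               /* push the inline immediate */
    goto *optable[*ip++];        /* dispatch replicated per handler */
do_add:
    sp -= 1;
    sp[-1] += sp[0];
    goto *optable[*ip++];
do_halt:
    return sp[-1];               /* result is the top of stack */
}
```

Compared with a switch-loop, there is no bounds check and no jump back to a
central dispatch point; each handler falls straight into the next one's code.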

You can see a very similar comparison, comparing the Mono JIT vs an
interpreter from the early 2000s here - [http://rhysweatherley.sys-
con.com/node/38831](http://rhysweatherley.sys-con.com/node/38831)

~~~
phkahler
I wrote a direct threaded interpreter where the code block for each opcode
included the fetch, increment, and jump at the end. One possible problem is
that execution never passes through a central point any more, which is of
course why it's so fast. Then a friend pointed out that the fastest way to
read an opcode, get the branch target, and jump is to replace all the opcodes
(create a second table) with the addresses of their implementations. This
obviously saves an extra lookup, but the real eye-opener was that you can do
PC++, fetch, and jump all with one instruction. Use the host stack pointer as
the PC in the interpreter and this becomes ret - return grabs an address,
increments the pointer, and branches, all in one. Of course you then get
program corruption on interrupts. It needs to be either a system you have
complete control over, or one with a second stack pointer.
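
The "second table" idea (minus the ret trick, which needs assembly and
interrupt control) can be sketched in GCC C: translate the opcode stream into
a stream of handler addresses once, leaving immediates inline, so dispatch is
a single indirect jump with no table lookup. The mini-ISA here is invented
for the example:

```c
#include <assert.h>
#include <stdint.h>

enum { OP_PUSH, OP_ADD, OP_HALT };

static int run_threaded(const int *code) {
    void *optable[] = { &&do_push, &&do_add, &&do_halt };

    /* The "second table": rewrite opcodes into the addresses of their
       implementations up front; PUSH immediates stay inline as data. */
    intptr_t threaded[32];
    int n = 0;
    for (const int *p = code; ; ) {
        int op = *p++;
        threaded[n++] = (intptr_t)optable[op];
        if (op == OP_PUSH)
            threaded[n++] = *p++;
        if (op == OP_HALT)
            break;
    }

    int stack[16], *sp = stack;
    const intptr_t *ip = threaded;
    goto *(void *)*ip++;         /* no optable lookup at runtime */

do_push:
    *sp++ = (int)*ip++;
    goto *(void *)*ip++;
do_add:
    sp -= 1;
    sp[-1] += sp[0];
    goto *(void *)*ip++;
do_halt:
    return sp[-1];
}
```

The inner loop is now just "increment, load, indirect jump" - exactly the
sequence that a ret instruction performs in one go when the PC lives in the
stack pointer.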

~~~
peterfirefly
[http://flint.cs.yale.edu/jvmsem/doc/threaded.ps](http://flint.cs.yale.edu/jvmsem/doc/threaded.ps)

You should also try to play nice with the branch prediction machinery. Doing
unbalanced rets isn't good. Doing all the jumps or calls from a single jump or
call site isn't good, either. Spread them around so you only have one
destination per call site. The method in the above paper is pretty much the
ultimate you can easily do without rolling out the really heavy machinery.
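
For contrast, this is the shared-dispatch shape the comment warns about:
every handler funnels back through one indirect branch, so the predictor
sees a single site with many targets instead of one likely target per site.
(Same invented mini-ISA as above; functionally correct, just a worse layout
for branch prediction.)

```c
#include <assert.h>

enum { OP_PUSH, OP_ADD, OP_HALT };

static int run_shared(const int *code) {
    void *optable[] = { &&do_push, &&do_add, &&do_halt };
    int stack[16], *sp = stack;
    const int *ip = code;

dispatch:
    /* Single indirect-branch site shared by all handlers. */
    goto *optable[*ip++];

do_push:
    *sp++ = *ip++;
    goto dispatch;
do_add:
    sp -= 1;
    sp[-1] += sp[0];
    goto dispatch;
do_halt:
    return sp[-1];
}
```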

~~~
agumonkey
online viewer ->
[http://view.samurajdata.se/psview.php?id=abaee002&page=1&siz...](http://view.samurajdata.se/psview.php?id=abaee002&page=1&size=full&all=1)

~~~
peterfirefly
Right. I forget that some people use Windows and that Windows machines usually
don't have PostScript viewers installed.

~~~
agumonkey
It's funny that it's such a nice, simple and old format, yet it's rarely
supported natively. I wonder if browser pdf plugins can manage ps.

------
MaulingMonkey
Historical link to mono's interpreter source (first link in the article tries
to find it in master, where it's been removed):

[https://github.com/mono/mono/blob/e85609a84907dc3a919bac289a...](https://github.com/mono/mono/blob/e85609a84907dc3a919bac289a379f388b5eab1f/mono/interpreter/transform.c)

------
ajbonkoski
Argh, I really dislike this article. It spreads a LOT of misinformation:

    
    
      1) His friend did NOT write a bytecode interpreter, he wrote a bytecode compiler: there is a HUGE difference between those two.
    
      2) No, it wasn't faster than assembly code. And the arrogance and audacity to make such a claim is obnoxious. Unless you gained access to Intel microcode, this is just plain wrong.
    

I wish I could downvote this: it's just bad for aspiring compiler/interpreter
developers. If you want some quality discussion see Mike Pall:
[http://article.gmane.org/gmane.comp.lang.lua.general/75426](http://article.gmane.org/gmane.comp.lang.lua.general/75426)

------
swah
2010 (the link to the Mono source is broken now)

~~~
dang
Thanks, added.

