
The Fastest VM Bytecode Interpreter - m0th87
http://byteworm.com/2010/11/21/the-fastest-vm-bytecode-interpreter/
======
Groxx
Oh come on.

    
    
      gcc simvm.c
      time ./a.out
      857419840
    
      real	0m0.374s
      user	0m0.359s
      sys	0m0.003s
    

meanwhile:

    
    
      gcc -O3 simvm.c -o fast.out
      time ./fast.out 
      857419840
      
      real	0m0.006s
      user	0m0.001s
      sys	0m0.003s
    

Comparing optimized compiled code against unoptimized compiled code is
worthless. .NET does some optimizations as it runs, and JITs; your standard
`gcc` invocation does very few.

/me goes to comment on the blog.

edit: bah! Foiled!

> _Hmmm, your comment seems a bit spammy. We're not real big on spam around
> here. Please go back and try again._

edit2: holy cow, that spam filter _hates_ me. I can't get _anything_ to
post...

~~~
FooBarWidget
-O3 is too smart: it optimizes away the loop and transforms the code into printf(the_result).

~~~
Groxx
Hah, nice. Didn't check the resulting code. Know what the other optimization
levels do? I don't really know x86 assembler.

I do still side with the optimization though, in that the .NET-VM could do
this as well. The loop is pretty simple, and _can_ be optimized away. Why
shouldn't it?

~~~
bad_user
Your answer received lots of votes for what are basically assumptions.

The author of the article updated his post for -O3 by using a volatile
variable for the for-loop test, which makes the compiler not optimize away the
loop.

For .NET you can look at the generated assembly, and if it does optimize away
the loop you can use the same trick.

No reasons for guessing. We are software engineers after all.

~~~
JoachimSchipper
"volatile" forces the C compiler to issue load and store instructions for each
access (although real-world C compilers are notoriously buggy here), so it
makes the loop _much_ slower. That's not a fair comparison.

------
jasonwatkinspdx
This is a compiler, not an interpreter, as it uses .NET's dynamic assembly api
to emit .NET IL which is then JIT'd to native code. The benchmark is also
likely optimized away by static analysis.

For an actual benchmark of various bytecode interpretation schemes see:

<http://www.complang.tuwien.ac.at/forth/threading/>

These micro benchmarks also take some care to attempt realistic branch
prediction rates.

Running these on current hardware, switch based interpreters still perform
quite well. Direct threading gets you slightly more performance. I'd say that
sticking to ANSI c and just using switch is a good plan. If you need more
performance then you likely should go ahead and implement a native JIT of some
sort or use JVM/.NET.

~~~
stcredzero
_This is a compiler, not an interpreter_

This phrase, what does this mean anymore? Do these things exist in the form of
real, non-toy software anymore? What's the difference between compiling to an
AST, byte code, or LLVM intermediate? I think these words have more to do with
cultural expectations than with VM/compiler technology.

~~~
another
I'm sorry, but there _is_ a qualitative difference between, eg, a Java
application running on a modern JVM and a Python application executing through
the CPython runtime, neither of which are toys.

(Although it's true that CPython is one of the few extant members of a dying
breed.)

The connotations of "compiled" and "interpreted" might have plenty to do with
vague expectations, but the words still mean something in the context of
language implementation: even though it is reasonable to call the situations
you list "compilation", it would not be accurate to call a language that was
compiled to LLVM IR, then interpreted with lli, "compiled".

~~~
stcredzero
_it would not be accurate to call a language that was compiled to LLVM IR,
then interpreted with lli, "compiled"._

Why not? I wouldn't see that as being too different from compiling a Java
application and running it on "a modern JVM." With most examples of that I've
seen, I think I could fairly characterize them as being "compiled."

For some reason, people think of things like MRI 1.8 as being "interpreted"
and expect those things to be slow. One could just as well take the same
language and run it on a tracing JIT VM (after some considerable
engineering). Semantically, the thing would still be "interpreting
bytecodes," just doing it in a highly optimized way. Note that the step
where the MRI reads the source and creates an AST is fundamentally the same
as parsing source code and outputting bytecode. There is nothing somehow
special or sacred about an intermediate language in the form of bytecode.

Given the compiler/VM technology we have today, even the degree to which
things are (or aren't) late-bound is more flexible than in the past.

~~~
regularfry
> people think of things like MRI 1.8 as being "interpreted" and expect those
> things to be slow. One could just as well take the same language and run it
> on a tracing JIT VM. (after some considerable engineering)

Off topic, but in case you weren't aware, that "considerable engineering"
exists, and is called JRuby. It works (almost) exactly as you describe. And
yes, it is rather fast :-)

~~~
riffraff
hotspot is not a tracing jit, I believe parent^2 was thinking of
tamarin/tracemonkey & the likes.

~~~
regularfry
That'll teach me to reply before coffee :-)

------
steveklabnik
So... who wants to take a crack at why this would be?

~~~
stefano
The VB "interpreter" emits opcodes for the .NET VM. This code is then
optimized and JITted. This means that both these interpreters run native
code in the end, with the difference that the VB version can take advantage
of the optimizations implemented by the .NET JIT.

~~~
rbarooah
Indeed - frankly, when I saw it I was surprised the guy had accepted this as
valid in his contest - if he's allowed to use an offboard compiler, the
'interpreter' could just as well emit some generated C to a file and pass it
through GCC -O3.

------
fleitz
His 'VB' emits byte code for the .NET VM which is highly optimized. Also,
VB.NET runs on the same VM. VB.NET has a lot of 'features' that are
performance nightmares but it's not that slow if you know what to avoid.

Oh, I didn't see that the VB was run on Mono; run that code on a Windows
machine and I bet it will beat the -O3 optimized C.

------
pmjordan
Anyone got a cache of this? Site seems to be getting hammered.

~~~
Groxx
try googling for:

    
    
      cache:url
    

:)

~~~
pmjordan
That does indeed work now (though it didn't when I originally checked).

------
zokier
There is one important lesson to be learned here: doing stuff at a low level
(asm/C) doesn't automagically make your code fast.

