
Implementing Fast Interpreters - malloc47
http://nominolo.blogspot.com/2012/07/implementing-fast-interpreters.html
======
silentbicycle
This post by Mike Pall has a bit more depth about the assembly interpreter
details, particularly about stuff like branch prediction:
<http://article.gmane.org/gmane.comp.lang.lua.general/75426>

~~~
reginaldo
This article is awesome. I had read it before but lost track of it. It was
nice reading it again. Mike's code is a joy to read; I don't know how to
explain it, but it shows such a clear way of thinking. I believe one of the
best ways to learn about dynamic language implementation these days is by
reading his code and the things he writes when participating in discussion
forums. He is _the real deal_.

For an example see: <http://lambda-the-ultimate.org/node/3851> (he goes by
MikePall - without spaces)

~~~
gruseom
His writing is exceedingly lucid as well. Thanks for that link – there is a
lot of good stuff in that thread. I found it interesting, for example, that
Mike said that there's no intrinsic reason a JS JIT couldn't compete with
LuaJIT.

~~~
mraleph
You might be surprised, but V8 these days is pretty close to LJ2 performance.
LJ2 does an amazingly good job on loopy code but is easily outperformed by V8
on OOPy polymorphic code: e.g. the DeltaBlue benchmark ported to Lua was
something like 5-10 times slower on LJ2 than on V8 last time I checked.

[here is my port of DeltaBlue: <https://github.com/mraleph/deltablue.lua>, I
admit that I might have screwed up porting it, but internal benchmark
verification checks did not catch anything]

~~~
gruseom
So the difference between V8 and LJ2 is more in what kinds of optimization
they prioritize, than in overall superior performance by LJ2? Yes, I am
surprised to hear that, in a good way. The interesting thing about LJ2 (as
distinct from Lua the language) is that it broke such impressive new ground
for what is possible in optimizing dynamic languages.

------
yoklov
Interesting article. If you're interested in the labels-as-values approach he
mentioned, there's a recent article by Eli Bendersky on it here:
<http://eli.thegreenplace.net/2012/07/12/computed-goto-for-efficient-dispatch-tables/>

------
SoftwareMaven
How do projects like PyPy help with this? I know PyPy aims to let you build
fast, cross-platform interpreters, but how close to reality is that?

~~~
corysama
You can write an interpreter in RPython and PyPy will automatically generate a
just-in-time compiler for your interpreter.

<http://morepypy.blogspot.com/2011/04/tutorial-writing-interpreter-with-pypy.html>

<http://morepypy.blogspot.com/2011/04/tutorial-part-2-adding-jit.html>

<http://tratt.net/laurie/tech_articles/articles/fast_enough_vms_in_fast_enough_time>

<http://tratt.net/laurie/research/talks/2012/kent_fast_enough.pdf>
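A minimal sketch of what such an RPython interpreter looks like (the bytecodes
and names here are made up for illustration; the tutorials above are the real
thing). The `JitDriver` hints mark which variables identify a position in the
guest program ("greens") versus mutable state ("reds"); PyPy's translator
derives a tracing JIT from the annotated dispatch loop:

```python
# Sketch of an RPython-style interpreter with JIT hints (illustrative).
try:
    from rpython.rlib.jit import JitDriver
except ImportError:
    # Stub so the same file also runs as plain, untranslated Python.
    class JitDriver(object):
        def __init__(self, greens, reds): pass
        def jit_merge_point(self, **kwargs): pass

# greens = where we are in the guest program; reds = mutable VM state.
driver = JitDriver(greens=['pc', 'program'], reds=['acc'])

ADD, SUB, JNZ, HALT = range(4)

def interpret(program):
    pc = 0
    acc = 0
    while True:
        driver.jit_merge_point(pc=pc, program=program, acc=acc)
        op, arg = program[pc]
        if op == ADD:
            acc += arg
        elif op == SUB:
            acc -= arg
        elif op == JNZ:
            if acc != 0:
                pc = arg      # backward jump: the loop the JIT will trace
                continue
        elif op == HALT:
            return acc
        pc += 1

# Count down from 5, then add 42: returns 42.
prog = [(ADD, 5), (SUB, 1), (JNZ, 1), (ADD, 42), (HALT, 0)]
```

Untranslated, this runs as ordinary Python thanks to the stub; run through
PyPy's translation toolchain, the same source gains a generated tracing JIT.
(Real RPython adds constraints this sketch glosses over, e.g. green variables
must be immutable values.)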

------
chj
"LuaJIT 2 has interpreters for 6 architectures of around 4000K lines per
architecture (ARMv6, MIPS, PPC, PPCSPE/Cell, x86, x86-64)."

Is that number 4000K correct? That sounds awfully big to me.

~~~
ctz
400K or 40K seems more likely. Firstly, WebKit is ~4M lines. Secondly, nobody
would write '4000K' instead of '4M'.

~~~
sanxiyn
It's actually 4K.

------
_sh
Andy Wingo's exploration of V8's Lithium interpreter [1] describes the
reasoning behind the Ruby offline assembly generator.

Actually, if you're reading the original linked article, you should also be
reading Andy's series on V8.

[1] <http://wingolog.org/archives/2012/06/27/inside-javascriptcores-low-level-interpreter>

~~~
mraleph
The article you are linking to is about LLInt --- JavaScriptCore's
interpreter.

V8 does not have an interpreter. Lithium is V8's (low level) intermediate
representation.

~~~
_sh
Thanks for that clarification--I find myself confusing V8 for JSC every now
and then.

------
znmeb
Anton Ertl and David Gregg cracked this nut a _long_ time ago with gForth /
vmgen! It's only when you add registers to their basic two-stack model, like
Parrot does, that you gain any more speed.

