

The HipHop Virtual Machine - kmavm
https://research.facebook.com/publications/665463800219144/the-hiphop-virtual-machine/

======
rurban
Some comments from the maintainer of parrot, p2 and perl B::CC which do
similar things:

Tracelets instead of basic-block analysis sound interesting, but PHP is
still doomed by not allowing optional types. In-house code can easily be
optimized with explicit types. The AUTOLOAD problem is a big one, and I am just
planning to tackle it, but I mostly came to the same design decisions. We are
compiling modules and files, as this is easiest to handle. My p2 JIT has no type
guards or separate specialized methods yet; instead I support optional early
binding, a jitted method cache, and small tagged data, which doesn't fill up
the cache that much. It outperforms Java and the CLR by far; only LuaJIT is ahead.

With the static B::CC, type inference has the same problem as in PHP, and the
same performance advantages as hphpc, but I added special syntax for typed and
sized arrays, and to disallow too much runtime magic. The current production
compiler at cPanel only uses better data layout to get its performance boost
in startup time and overall memory usage: read-only strings and hash keys,
mostly. Perfect hashes not yet. IMHO the most important thing is smaller data
and op overhead, not the optimizer.

------
joeguilmette
I'm migrating to HHVM across my whole stack. Massive performance gains; the
only issues have been with some plugins, which I was able to work around
pretty easily.

Good stuff! Uncached, HHVM outperforms my cached php-fpm sites.

~~~
wldlyinaccurate
I've been doing the same and so far have seen big performance gains as well.
Using HHVM also has the benefit of being able to "tack on" any Hack code in
the future if (for example) you have some important code which could benefit
from static typing.

------
fijal
Question to the authors: Any reason why HippyVM is not included for comparison
in the paper? It usually outperforms HHVM on those benchmarks (but not on
real-world use cases, which is maybe a good reason to include more real-world
use cases in such papers).

------
thejosh
We're trialing HHVM; so far the results are between 2x and 20x faster.

Great leaps have been made over the last year, and it is now very stable.

------
zaptheimpaler
This is awesome! I am not familiar with how compilers/VMs are generally
implemented, but the tracelets and guards idea strikes me as very general - in
particular, it looks like this approach could be applied to provide type
inference to any dynamically typed language. Has this kind of thing been tried
before?

In terms of optimization, is it possible to create tracelets that are not
contiguous regions of code? Roughly, if you identify two non-contiguous
tracelets that have the same inputs, and can guarantee the inputs haven't
changed in between, then you could merge them together, because bigger
tracelets would mean fewer guards and better performance.
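
To make the "fewer guards" intuition concrete, here is a toy model in Python. It is only a sketch under stated assumptions, not how HHVM works: `Tracelet`, `run`, and `guard_checks` are hypothetical names, the only cost counted is one type check per guarded input, and it simply assumes the inputs cannot change between the two regions.

```python
# Toy model (hypothetical): a "tracelet" is a specialized body plus the
# input types that body was specialized for.
class Tracelet:
    def __init__(self, guard_types, body):
        self.guard_types = guard_types  # tuple of exact types assumed on entry
        self.body = body

    def guard(self, args):
        # Passes only if every argument has exactly the assumed type.
        return tuple(type(a) for a in args) == self.guard_types


guard_checks = 0  # counts individual type checks performed


def run(tracelets, args):
    """Run the first tracelet whose guard passes; return None if none match."""
    global guard_checks
    for t in tracelets:
        guard_checks += len(t.guard_types)
        if t.guard(args):
            return t.body(*args)
    return None  # a real JIT would fall back or compile a new specialization


# Two short regions, each guarded separately on an int input...
double = Tracelet((int,), lambda x: x * 2)
inc = Tracelet((int,), lambda x: x + 1)
# ...versus one merged region guarded once.
double_then_inc = Tracelet((int,), lambda x: x * 2 + 1)

guard_checks = 0
separate = run([inc], (run([double], (5,)),))
separate_checks = guard_checks

guard_checks = 0
merged = run([double_then_inc], (5,))
merged_checks = guard_checks
```

Both paths compute the same value, but the merged region performs one type check where the two separate regions perform two. A string input fails the guard and `run` returns `None`.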

~~~
rayiner
EDIT: That's what I get for not reading the paper first.

~~~
apaprocki
The tracing JIT was removed from SpiderMonkey because it was brittle and
didn't perform as well as the traditional JIT compiler(s) (Jaeger..., Ion...)
that replaced it.

------
__Joker
I assume latent type simply means implicit typing, i.e. you declare a
variable with a value, and the type is inferred from that value rather than
from an explicit type declaration.

~~~
kmavm
Hi, I'm one of the paper's co-authors.

As usual, the abstract really isn't enough to draw big conclusions from. The
concept of a latent type applies to arbitrary expressions, not just variables.
All the values flowing through a PHP program implicitly share the same union
type; they can be floats, strings, arrays, etc. The latent types are the
narrower types that can actually flow through the program in practice.

To be clear, it is more general than just things like:

    
    
      $a = 0; // $a is an int!
    

It includes learning that:

    
    
      g(foo() . bar());
    

foo() and bar() return strings. Since none of this information is marked
syntactically in PHP, and since it might actually be undecidable because of
dynamic control flow in the callees, dynamic binding, etc., you really need to
see the program run to do this stuff.
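
A minimal sketch of the "see the program run" idea, written in Python rather than PHP so it is self-contained. The names `observe` and `observed` are hypothetical, `foo`, `bar`, and `g` are stand-ins for the functions in the example above, and this only illustrates recording the types that actually flow through each expression; it makes no claim about how HHVM collects them.

```python
from collections import defaultdict

# Hypothetical profile: expression label -> set of type names seen at runtime.
observed = defaultdict(set)


def observe(site, value):
    """Record the runtime type of `value` at `site`, then pass it through."""
    observed[site].add(type(value).__name__)
    return value


# Stand-ins for the PHP functions above. Nothing in their definitions
# declares what they return.
def foo():
    return "hip"


def bar():
    return "hop"


def g(s):
    return len(s)


# Equivalent of g(foo() . bar()), with each sub-expression observed.
for _ in range(3):
    result = g(observe("concat", observe("foo()", foo()) + observe("bar()", bar())))

# Each expression has seen exactly one type: that is its latent type.
latent = dict(observed)
```

After the loop, every observed expression maps to the single type `str`, which is the kind of narrowing a specializing JIT could then protect with a guard.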

~~~
amelius
I'm wondering: how does this compare speed-wise to other real-world JITs, for
example node.js (V8)? Are there any tricks used in your VM which are not used
by V8, and conversely?

------
beagle3
So, a couple of questions:

a. Does the HHVM JIT do anything that LuaJIT doesn't? I assume you are
familiar with LuaJIT, as it is mentioned in the paper - and from a quick scan,
the only two things I didn't recognize from LuaJIT were the refcount
optimizations (not required by Lua GC) and guard relaxation.

b. Is HHIR tied to PHP, or is it usable as a general-purpose JIT backend?
LuaJIT's IR is (unfortunately for other languages) tied very strongly to Lua
semantics.

Thanks for an interesting read!

~~~
nly
It's a shame. As far as I can tell, the only proven multi-language JIT for
FOSS platforms is still the JVM? (and Mono maybe?)

Despite a lot of work that's gone into Parrot, PyPy, V8, etc., none of these VMs
seem to have _really_ taken off beyond the language they were intended for.
Somewhat sadder is that pretty much all de facto default runtimes for
popular dynamic languages (except JavaScript) are still interpreters... the
cost of being portable.

~~~
beagle3
I wouldn't say the JVM is a proven multi-language JIT, FOSS or not.

It works for multiple languages that go the extra mile - e.g. Jython pays very
dearly in performance because of the object model mismatch between Python and
Java. And I haven't looked closely recently, but when Clojure first came out,
the impedance mismatch between Clojure's persistent data structures and Java's
mutable ones also had a ridiculous performance cost.

Mono is perhaps more deserving of that title, especially together with IKVM,
(so, everything JVM), F#, IronPython and Boo. But note that everything is
still shackled to the underlying object model.

But if anything, LLVM is the only proven multi-language JIT for FOSS
platforms; it's method-at-a-time, which is great for statically typed languages
(whether those types are declared upfront or inferred), not so much for
extremely dynamic languages.

~~~
papercrane
Scala is pretty widely used. JRuby seems to do quite well as well.

The future should be interesting as well. I'd be interested to know if the
work on value types in the JVM would be useful for Clojure.

------
q_no
I've been using HHVM in production for a few weeks now and all I can say is
I'm very happy with the results. It's running stably (compiled under CentOS 7)
and I was able to cut response times in half (~145ms with php-fpm, ~70ms with
HHVM).

~~~
reeze_xia
What version of PHP are you running?

~~~
q_no
    [root@hostname:~]# php -v
    PHP 5.5.18 (cli) (built: Oct 16 2014 12:09:38)
    Copyright (c) 1997-2014 The PHP Group
    Zend Engine v2.5.0, Copyright (c) 1998-2014 Zend Technologies

~~~
reeze_xia
Great! With opcache enabled, right?

------
Shish2k
I've been trying this on my few remaining PHP based sites and finding it
great. Can we now get the same for Python? ;-)

~~~
jhgg
Have you taken a look at PyPy? It's even mentioned in the paper!
http://pypy.org/

