
Speeding up PHP with the HipHop VM - kmavm
https://www.facebook.com/notes/facebook-engineering/speeding-up-php-based-development-with-hiphop-vm/10151170460698920
======
mcmire
Does anyone else think this is a tremendous waste of time for Facebook? I
mean, obviously PHP powers a lot of stuff at Facebook, I'm sure there are
zillions of lines of code they can't just replace today, and now they're
forced to make it scale. I don't mean to be negative -- building faster,
better systems is inspiring, and the stuff they are doing with PHP is pretty
neat, and there are really smart people trying to figure this stuff out. But,
you have to ask, why are they still using PHP?? Why not use some of the new
stuff that's out there now or heck, why not go with the JVM instead of
reinventing the wheel here?

No, I'm genuinely asking. Isn't some of the stuff here already being done by
other languages, or is Facebook really breaking new ground here? (Yes, lbrandy,
I saw your earlier comment.)

~~~
mitchellh
My personal opinion is that this is in fact a very efficient way of
doing things at Facebook. Facebook probably doesn't have more than a couple of
people working on HHVM. Let's say their salaries are around $100,000 a year
each. That's cheap for them compared with the value they're getting out of
it.

Facebook probably has millions of lines of PHP code, PHP code proven to work.
This isn't just Facebook.com, but internal websites, their bug tracker
software (that is open source), analytical software for PHP, etc. etc.
Rewriting all that would be _expensive_.

Running it on normal Zend PHP, based on the benchmarks they get on HipHop,
requires _way_ more hardware at the scale of Facebook, so again, very
expensive.

PHP developers are abundant, PHP is a relatively easy language to teach, and
resources are everywhere. Because of that abundance, PHP developers (even at
Facebook quality) are probably less expensive than developers in other
languages. Easier hiring and lower-cost developers make it cheaper.

------
ck2
By the way, there is an alternative to HipHop that is actually easier to
adopt because it compiles to PHP extensions.

It's called PHC <http://phpcompiler.org> and it's open source.

It was Paul Biggar's PhD thesis:
<http://blog.paulbiggar.com/archive/a-rant-about-php-compilers-in-general-and-hiphop-in-particular/>

It just needs some community contributions to catch up.

~~~
pbiggar
While you are very kind to say so, even the original HipHop blew past what phc
could do. Importantly, phc did not support the full PHP language (we had basic
object support, and no support for magic methods).

There were two cool things about phc: that it had a really advanced static
analyzer, and that it compiled to modules compatible with Zend. However, these
are somewhat mutually exclusive, and making them work well together is an open
problem (though I had some ideas).

I moved on from phc, and there's no-one left in the community really. I now
run <https://circleci.com> (applying my compiler knowledge in a different
way). I tried to pass the reins on, but nobody stepped up. It's hard to find
people who love PHP but are also able to hack on compilers.

~~~
runekaagaard
Slightly OT, sorry. phc was such a nice project, and vastly underappreciated.
Your cleanup of PHP's grammar was a work of art that I spent a long time
poring over and trying to understand! phc could have greatly benefited
the PHP community as a whole, and it's such a shame it couldn't gain (enough)
traction!

Never got a chance to say thx, so THX!

~~~
pbiggar
I'm glad you liked it. FYI, I can't claim credit for the parser. The front end
(indeed it was a work of beauty) was done by phc's founders (Edsko de Vries
and John Gilbert) before I got there.

------
pbiggar
This is curious. Their trace-based approach looks like Mozilla's TraceMonkey,
even using the same terminology, such as side exits. Mozilla discontinued
TraceMonkey because it was really good for deep loops and not much else. They
moved to JaegerMonkey, which is a method-compiling VM like V8 (at the time),
and are now moving to IonMonkey, which is a best-of-all-worlds design.
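To make the "side exit" terminology concrete, here is a toy sketch in Python. The names `interpret_sum` and `int_trace_sum` are hypothetical, and real tracing JITs such as TraceMonkey or HHVM emit machine code rather than Python; this only shows the shape of the idea, assuming a trace is a loop body specialized to the types seen while recording, protected by guards that hand control back to the general interpreter when they fail.

```python
from typing import List, Tuple, Union

Number = Union[int, float]


def interpret_sum(xs: List[Number], start: int, acc: Number) -> Number:
    """Slow, fully general path; stands in for the interpreter."""
    for x in xs[start:]:
        acc = acc + x  # generic add: handles int, float, or a mix
    return acc


def int_trace_sum(xs: List[Number]) -> Tuple[Number, bool]:
    """Toy 'trace' of a summing loop, specialized to int elements.

    Every iteration checks a type guard first. On the first non-int element
    the trace takes a side exit: it hands the live loop state (index i and
    accumulator acc) back to the interpreter, which finishes the loop.
    Returns (result, took_side_exit).
    """
    acc = 0
    for i, x in enumerate(xs):
        if type(x) is not int:                      # guard failed
            return interpret_sum(xs, i, acc), True  # side exit
        acc += x                                    # specialized int + int path
    return acc, False
```

For example, `int_trace_sum([1, 2, 3])` stays on the trace and returns `(6, False)`, while `int_trace_sum([1, 2.5, 3])` exits at the float and returns `(6.5, True)`. The sketch also hints at the weakness described above: the payoff comes from hot loops that stay on the recorded path, and code that keeps failing guards spends its time back in the interpreter.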

So I'd love to hear why a circa-2009 technology was the right choice. Is
PHP sufficiently different from JavaScript for this to make sense (as someone
who has worked on VMs for both, I think there's a good chance of that)? Why
not use a method compiler instead? I'm very interested in the answers and in
comparisons to other JITs out there, if any HHVM people are here.

~~~
pbiggar
One more thing - the use of a stack-based bytecode is also curious. PHP itself
(the Zend engine) uses a register-based bytecode (well, it uses a horror-show
of an in-memory bytecode-like-thing, which could best be described as a
register-based bytecode). The PHP engine details leak into the language a lot,
so the difficulties they describe aren't unexpected.

Not only that, but bleeding-edge JITs typically use register-based bytecode.
Java's JIT and all the modern versions of JavaScript VMs use register-based
representations (in many cases actually translating from the stack-based
bytecode that the parser outputs to register-based bytecode consumed by the JIT
compilers). Since 2003, when my PhD advisor and his co-authors (Google "David
Gregg" and "Anton Ertl") showed register VMs could be much more efficient than
stack-based ones, the field has settled on register-based VMs.
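A minimal sketch of the difference, in Python, for the expression `a = b + c * d`. The opcode names (`LOAD`, `STORE`, `ADD`, `MUL`) and the tuple encoding are hypothetical and not taken from Zend, HHVM, or any real VM. In the stack form operands are implicit (top of stack); in the register form each instruction names its destination and sources explicitly.

```python
from typing import Dict, List, Tuple

# Hypothetical stack bytecode for  a = b + c * d
STACK_CODE: List[Tuple[str, ...]] = [
    ("LOAD", "b"),
    ("LOAD", "c"),
    ("LOAD", "d"),
    ("MUL",),        # pops d and c, pushes c * d
    ("ADD",),        # pops c*d and b, pushes b + c*d
    ("STORE", "a"),
]

# Hypothetical register bytecode for the same expression
REG_CODE: List[Tuple[str, str, str, str]] = [
    ("MUL", "t0", "c", "d"),   # t0 = c * d
    ("ADD", "a", "b", "t0"),   # a  = b + t0
]


def run_stack(code, env: Dict[str, int]) -> Dict[str, int]:
    env = dict(env)            # don't mutate the caller's variables
    stack: List[int] = []
    for op, *args in code:     # one dispatch per instruction
        if op == "LOAD":
            stack.append(env[args[0]])
        elif op == "STORE":
            env[args[0]] = stack.pop()
        elif op in ("ADD", "MUL"):
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs + rhs if op == "ADD" else lhs * rhs)
        else:
            raise ValueError(f"unknown opcode {op}")
    return env


def run_register(code, env: Dict[str, int]) -> Dict[str, int]:
    regs = dict(env)           # variables live directly in "registers"
    for op, dst, src1, src2 in code:
        if op == "ADD":
            regs[dst] = regs[src1] + regs[src2]
        elif op == "MUL":
            regs[dst] = regs[src1] * regs[src2]
        else:
            raise ValueError(f"unknown opcode {op}")
    return regs
```

With `env = {"b": 2, "c": 3, "d": 4}` both interpreters compute `a == 14`, but the stack version dispatches six instructions and the register version two, because the register form needs no separate loads and stores. Fewer dispatches is the efficiency argument; the cost is that every register instruction must spell out its operands.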

So I'm curious as to why a stack-based instruction set was used in the VM
design?

~~~
dparoski
As noted elsewhere in this thread, a stack-based design typically produces
more compact bytecode. Compactness was a concern for us because of the size of
FB's PHP code base. Also, generally speaking a stack-based design tends to be
easier to deal with when working to get a prototype VM up and running quickly.

Many of the advantages of register-based designs (ability to optimize by
rewriting the program at the bytecode level, ability to write faster
interpreters, ability to map bytecode registers to physical registers, etc.)
weren't particularly attractive to us because we knew we were going to build
an x64 JIT that did its own analysis and optimization to take advantage of
type information observed at run time.

Thus, we drafted a stack-based design for HipHop bytecode. It captured PHP's
semantics correctly and happened to fit in fairly well with PHP's evaluation
order, so we ran with it and here we are.

~~~
pbiggar
Makes total sense, thanks!

Would using a register-based bytecode not have been useful for the x64 JIT?

~~~
kmavm
Check out our HHIR work. It's an SSA-based IR that gets us most of the
advantages of a register representation. But it is at a much lower level than
the bytecode.
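One common way an SSA IR recovers register-style benefits from stack bytecode is to simulate the operand stack at translation time, pushing the names of values instead of the values themselves. The Python sketch below illustrates that general technique for straight-line code only (no branches, so no phi nodes, and no type information); the instruction set and naming scheme are hypothetical, and it is not a description of how HHIR is actually built.

```python
import itertools
from typing import Dict, List, Tuple


def stack_to_ssa(code: List[Tuple[str, ...]]) -> List[str]:
    """Lower toy stack bytecode to SSA-style three-address instructions.

    The operand stack holds SSA value *names*, not runtime values, so each
    computed result gets a fresh temporary that is assigned exactly once.
    Each STORE creates a new numbered version of the source variable.
    """
    stack: List[str] = []
    current: Dict[str, str] = {}   # latest SSA name for each source variable
    version: Dict[str, int] = {}   # how many times each variable was stored
    temps = itertools.count()
    out: List[str] = []

    for op, *args in code:
        if op == "LOAD":
            var = args[0]
            stack.append(current.setdefault(var, f"{var}0"))  # incoming value
        elif op in ("ADD", "MUL"):
            rhs, lhs = stack.pop(), stack.pop()
            dst = f"t{next(temps)}"
            out.append(f"{dst} = {op.lower()} {lhs}, {rhs}")
            stack.append(dst)
        elif op == "STORE":
            var = args[0]
            version[var] = version.get(var, 0) + 1
            name = f"{var}{version[var]}"
            out.append(f"{name} = {stack.pop()}")
            current[var] = name
        else:
            raise ValueError(f"unknown opcode {op}")
    return out
```

Lowering `a = b + c * d; a = a * b` (as LOAD/MUL/ADD/STORE sequences) yields `t0 = mul c0, d0`, `t1 = add b0, t0`, `a1 = t1`, `t2 = mul a1, b0`, `a2 = t2`: every value has a single explicit definition, which is what later passes (and a register allocator) can work with, even though the input bytecode never named its intermediates.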

------
nicholassmith
Amazed at all the 'engineer X policy or feature request slackers'. Can't say
I'm the biggest fan of Facebook as a service but what their engineers are
doing in terms of pushing the state of the art is fantastic.

Out of curiosity, if there's anyone at Facebook working on HipHop here: has there
ever been a discussion about just shifting from PHP to a more performant
language, or is it a case of still reaping the benefits of PHP in terms of
dropping a developer in and not worrying about skill sets?

~~~
lbrandy
> ...about just shifting from PHP to a more performant language

We already shift lots of things out of PHP into more performant (C++, Java,
etc.) languages when we are building services and/or extensions and/or other
computationally intensive things.

But I think more to your point (and if not, let me indulge in the strawman),
since this comes up frequently on Reddit/HN: PHP is not somehow -uniquely-
broken. While PHP might be uniquely broken as a language (ha ha), it's not
-uniquely- broken as a platform/runtime. Everything you think is performant is
broken at a large enough scale. Put another way, it's not just about the
switching costs of rewriting in some other language. If we could magically
snap our fingers and convert the entire codebase from php to some other
language, that's just the beginning of understanding, tweaking, and
occasionally rebuilding everything about the runtime/libraries/etc to make it
work.

At the end of the day the language itself is given way too much attention in
discussions like this.

~~~
dkhenry
So is your argument, then, that Facebook has invested so much in dealing
with and optimizing for PHP at scale that the real cost would be acquiring
that knowledge a second time on a different platform, since you _will_ run into
scale problems no matter what language you use?

From my experience dealing with PHP performance warts, you can make it
somewhat fast, but man, the insides are so messed up that every time I get into
the core and try to do anything I want to rip my hair out.

------
fomojola
I'm curious if they were still evolving HPHPc at the same time as they were
evolving HHVM: the chart shows HPHPc as having flat performance over time,
while HHVM was getting better over a 7-month period. Could HPHPc have achieved
the same performance gains if the same effort was expended?

~~~
kmavm
Hey, I'm an HHVM engineer.

We just normalized HPHPc performance to make the graph easier to read. HPHPc
actually got considerably faster over the same period as well; since both
systems share a runtime, many changes helped both.

HPHPc probably got about 20% faster over 2012, even though nobody was actively
working on it, mostly through happy side effects of work that was directed at
HHVM.

------
maratd
Things make sense now. I remember a while back Facebook invested in an
experiment with PHP on PyPy. They didn't pursue it, even though it produced
impressive results. It seems their own in-house JIT has better performance?

~~~
kmavm
HHVM was pretty far along when we started talking to the PyPy folks; it was
already able to run the site, and hosting internal development at Facebook.
Our interest in PyPy wasn't an immediate, drop-everything-and-change-to-PyPy
kind of interest. It was a research project, which had a positive outcome.
Making a production-ready, PyPy-based system for PHP would still be an
enormously big undertaking, though.

PyPy is taking a radically different approach from what HHVM is doing (and
really from what almost all other dynamic language systems are doing), and
it's a fascinating system. Part of what's exciting about it is that it seems
like it should be applicable to other languages with less effort than most
other JITs, and we wanted to understand its potential for a language like PHP.
We asked Maciej Fijalkowski (hi, Maciej, if you're reading!) to help us do a
research prototype to see what the first few roadblocks would look like, and
Maciej did a great job. Just because we didn't scrap our current project and
shift all our resources to PyPy should not be seen as a negative reflection on
PyPy at all.

~~~
bascule
I'm kind of surprised you didn't pursue the JVM and InvokeDynamic for this
purpose instead of PyPy.

~~~
aardvark179
I think there is a project of that sort going on; there was a guy from
Facebook at the JVM Language Summit this year.

------
maratd
For those who want to get their hands dirty:

CentOS 6.3 x64:
<https://github.com/facebook/hiphop-php/wiki/Building-and-installing-HHVM-on-CentOS-6.3>

Ubuntu 12.04 x64:
<https://github.com/facebook/hiphop-php/wiki/Building-and-installing-HHVM-on-Ubuntu-12.04>

~~~
snissn
Instructions for Ubuntu 12.04 x64: <http://pastie.org/5456298>

------
rynop
First off, thanks for all the hard work. Do you have a list of the PHP
extensions you support? I'm wondering if things like cURL, memcache, PDO MySQL,
etc. are supported. I found
<https://github.com/facebook/hiphop-php/wiki/Extensions-and-modules-roadmap>
but it's a bit outdated (last modified 2 years ago).

Also wondering what APC methods you support.

~~~
rynop
Found the answer here:
<https://github.com/facebook/hiphop-php/tree/master/src/runtime/ext>

------
timdorr
What version of PHP does HPHP match? 5.3 or 5.4? I ask because I've become
very accustomed to the short array syntax ([1,2,3]) and other goodies in 5.4.

~~~
kmavm
It is closer to 5.3, though we've adopted some 5.4 features: traits, our
closures' treatment of $this, and f()[$x] syntax.

Edit: notably not short array syntax, at least yet.

~~~
dkhenry
This makes me sad. That's one of the nicest features of 5.4: not needing to
type array() all over the place.

~~~
debacle
It's a relatively trivial feature to implement (really just syntactic
sugar); it probably just wasn't prioritized.

------
olaf
"So, when you combine XHP with HipHop PHP you can start to imagine that the
performance penalty would be a lot less than 75% and it becomes a viable
approach." <http://toys.lerdorf.com/archives/54-A-quick-look-at-XHP.html>

~~~
debacle
That's from 2 years ago. What is its relevance now?

------
bretthellman
Will FB ever invest and move off PHP? Hiring talent at FB has to be getting
harder and PHP probably isn't helping.

~~~
untog
I think you underestimate the number of PHP developers out there. There aren't
many on Hacker News because we're all _too cool_ for that stuff, but in the
developer world in general there are a ton.

~~~
MichaelGG
I think his point is that real top-tier developers are going to despise
working with such a grotty language as PHP, so it hurts their hiring process
that way. Whereas if they worked in a superior language (an ML, Haskell, a
Lisp...) they'd attract talent. Who wants to deal with a language with such
inane design decisions all day?

I'd imagine people signing up to work at FB do so despite the fact they use
PHP, not because of it.

~~~
Firehed
I just don't understand the hate. Like any language it has some syntactic
quirks and inconsistencies. There's no way to objectively say that one
language is superior to another, because there are so many metrics that
contribute to superiority and every person in the world is going to weight
those various metrics slightly differently.

I imagine it's neither "despite" nor "because" of PHP use; any top-tier
developer is going to use the tools available to him/her, rather than pick an
employer in order to use a specific set of tools. Lower-quality developers may
do so, but that's probably because they're less likely to know the right tool
for the job (e.g. someone may hate functional programming and avoid jobs that
use it, but that's because they don't know when FP is the ideal way to solve
the problem).

~~~
hottyuilio
You can't objectively say that one language is better than another, but you
can't deny that a language such as Haskell has a level of consistency and
mathematical rigour that PHP will now never attain. Top developers who
Facebook target (normally top CS students from top colleges and PhD programs)
strive to be consistent and mathematically rigorous themselves, and poor
design and math annoy them.

~~~
Firehed
Totally valid point. One of my housemates loves Haskell for very much the same
reasons, and all other things being equal I'd prefer that as well (who
wouldn't?).

But of course, anyone who's worked on production systems knows that all the
mathematical ideals in the world don't stop bugfixes from becoming a giant
hairy mess. It's more your culture, skill, and process that allow or prevent
that from getting refactored into a better solution and how long that takes.
Any language can be a part of a clean or messy codebase, although some tend to
skew towards one side or another - but I'd wager that's as much a factor of
barrier to entry as anything else.

------
lectrick
Like putting a rocket on a turd.

------
3825
Why is the blog on facebook?

~~~
kmavm
We work for Facebook, and make a lot of announcements on the engineering blog.
We also have a group blog on WordPress-in-HHVM running here:
<http://www.hiphop-php.com/wp/>

~~~
3825
Didn't realize that until I actually went to the page. I need to learn to keep
my pie hole shut more often.

