Speeding up PHP with the HipHop VM (facebook.com)
203 points by kmavm 1639 days ago | 118 comments

Does anyone else think this is a tremendous waste of time for Facebook? I mean, obviously PHP powers a lot of stuff at Facebook, I'm sure there are zillions of lines of code they can't just replace today, and now they're forced to make it scale. I don't mean to be negative -- building faster, better systems is inspiring, and the stuff they are doing with PHP is pretty neat, and there are really smart people trying to figure this stuff out. But, you have to ask, why are they still using PHP?? Why not use some of the new stuff that's out there now or heck, why not go with the JVM instead of reinventing the wheel here?

No, I'm genuinely asking. Isn't some of the stuff here already being done by other languages, or is Facebook really breaking new ground here? (Yes lbrandy, I saw your earlier comment.)

My personal opinion is that this is in fact a very efficient way of doing things at Facebook. Facebook probably has no more than a couple of people working on HHVM. Say their salaries are around $100,000 a year each. That's cheap for them versus the value they're getting out of it.

Facebook probably has millions of lines of PHP code, PHP code proven to work. This isn't just Facebook.com, but internal websites, their bug tracker software (that is open source), analytical software for PHP, etc. etc. Rewriting all that would be _expensive_.

Based on the benchmarks they get with HipHop, running it on stock Zend PHP would require _way_ more hardware at Facebook's scale, so again, very expensive.

PHP developers are abundant, PHP is a relatively easy language to teach, and resources are everywhere. Because of that abundance, PHP developers (even at Facebook quality) are probably less expensive than developers in other languages. Easier hiring and lower-cost developers: cheaper.

> But, you have to ask, why are they still using PHP??

If you read the comments, you'll find the answer further down. They do address it. At their scale, all languages/platforms break and they would need to invest the same amount of time and effort getting it to work.

It won't make a difference if you switch to the latest hipster language. You'll still need an effort of this magnitude to get it to scale the way they need it to.

Furthermore, rewriting an entire code-base is a monumental undertaking that is both expensive and complex. Especially for something as large and diverse as Facebook. It also makes zero business sense and Facebook is a business.

Why should they take a risk with a new code-base, when what they're doing is clearly working? PHP just works.

Because you also need to consider the human factor.

Changing languages at the scale of Facebook means that everyone, internal or external, needs to learn the new language.

Such a transition is a big process, which usually costs much more money in training and lost productivity than having a dedicated team improve the performance of existing tools.

By the way, there is an alternative to HipHop that is actually easier to deploy, because it compiles PHP to extensions compatible with the Zend engine.

It's called PHC http://phpcompiler.org and it's open source.

It was Paul Biggar's PhD thesis http://blog.paulbiggar.com/archive/a-rant-about-php-compiler...

It just needs some community contributions to catch up.

While you are very kind to say so, even the original Hiphop blew past what phc could do. Importantly, phc did not support the full PHP language (we had basic object support, and no support for magic methods).

There were two cool things about phc: that it had a really advanced static analyzer, and that it compiled to modules compatible with zend. However, these are somewhat mutually exclusive, and making them work well together is an open problem (though I had some ideas).

I moved on from phc, and there's no-one left in the community really. I now run https://circleci.com (applying my compiler knowledge in a different way). I tried to pass the reins on, but nobody stepped up. It's hard to find people who love PHP but are also able to hack on compilers.

Maybe you were just half a decade before your time and the rest of us had to catch up.

I worry the php community is losing or has lost some of the really smart folks.

Suhosin (Stefan Esser) is also fading away without any updates to work with php 5.4 and that's really sad, especially considering the radical performance improvements and memory-use reductions found in 5.4 and 5.5 vs 5.2 - 5.3

The PHP internals community is quite a poisonous place, IMO. It's a community that destroyed itself by being hostile to everyone who wasn't one of them.

I think that's a function of their inability to let go of their own work. The internals of PHP need a _complete_ overhaul to make PHP competitive on a speed/memory basis with the python and rubys of the world, but that would mean throwing away a lot of work by the current contributors and probably re-architecting things like zvals and internal opcode representation. That's something I don't think they are willing to do, and I don't know if there is enough community around the zend core to really get it done even if they wanted to.

I wouldn't know if that's the reason, but I totally agree that the internals need a total overhaul. The entire codebase was a complete mess: hacks built on hacks built on hacks. Each variable takes what, 96 bytes I think, on 64-bit machines? The opcodes are terrible. The interpreter dispatch is switch-based, and funnily enough that's irrelevant, because the rest of the engine is so slow (function dispatch in particular) that interpreter dispatch doesn't even matter.

Disclaimer: I don't think I've looked at PHP source in 3 years; it might have changed significantly, but I suspect it has not. I'd love to be corrected with some detail!

What's wrong with switch based dispatch? Lua's dispatch is switch based, and it is the fastest mainstream bytecode-based VM out there last I checked.

Lua's dispatch wasn't chosen because it was the fastest. If you read their Lua5 paper, they discuss how they prioritized being portable ANSI C over speed. Indirect threading is the fastest simple technique, but you can do even better. See section 3.3 of http://www.cs.tcd.ie/publications/tech-reports/reports.07/TC... for a discussion.

If you're giving up portability, you might as well go all the way and create a JIT. Techniques like what the paper calls "inline-threaded dispatch" seem of limited usefulness, since they give neither the speed of a JIT nor the portability of a bytecode VM.

My main point was that switch()-based dispatch isn't that bad. I'm surprised that this recent literature distinguishes between switch dispatch and token-threaded dispatch, since as Mike Pall notes, "Tail-merging and CSE will happily join all these common tails of each instruction and generate a single dispatch point," so the goto-spaghetti of token-threaded dispatch is likely not even worth it. (http://article.gmane.org/gmane.comp.lang.lua.general/75426)

> Indirect threading is the fastest simple technique

I'm confused; the paper itself says: "However, because of a level of indirection, indirect-threaded dispatch is not be as efficient as direct-threaded dispatch."

Even direct-threaded dispatch still results in an indirect branch for every VM instruction, it just tries to save a table lookup over token-threading.

Ultimately the most common and practical dispatch techniques seem to boil down to either a single indirect branch or replicated indirect branches.

You're not really "giving up" portability, only with ANSI C. For all architectures where GCC is available, its labels-as-values support will happily compile and hence give you a fully portable instruction dispatch.

Regarding the threaded code: the literature seems to be horribly inconsistent in this regard. Token threaded code might actually refer to indirect-threaded code depending on what the author has read. I was actually quite surprised to read what "indirect threaded code" originally meant when I read Debaere and van Campenhout's 1990 book "Interpretation and Instruction Path co-processing".

Regarding the effect of CSE & tail-merging: I have only seen GCC do this once, by disabling basic block reordering (-fno-reorder-blocks, due to a so-called "software trace cache"); otherwise this has never happened to me. Since I have nothing but the highest respect for Mike Pall, I guess that he might have experienced this for Lua, which has, to the best of my knowledge, fewer than 40 instructions. I will try to verify this on the weekend.

> You're not really "giving up" portability

I should have been more clear; that comment was in reference to some of the techniques the paper discusses which do give up portability, namely "inline-threaded dispatch" and "context-threaded dispatch."

My overall points are: switch() isn't that bad, and most practical dispatch techniques ultimately boil down to either a single indirect branch or replicated indirect branches.

> Regarding the effect of CSE & tail-merging: I have only seen GCC to do this once

I've never seen it happen with my own eyes, but it appears the Python guys experienced this also at one point and had to tweak the flags -fno-crossjumping and -fno-gcse. http://bugs.python.org/issue4753

> I will try to verify this on the weekend.

Do you have a blog? I'd love to hear what you discover.

> Do you have a blog? I'd love to hear what you discover.

I second this. I'd be very interested to know who you (sb) are in real life.

I'm not sure where you're going with this, but you seem to have an emotional attachment to switch statements, which is otherwise unsupported by reality :)

> If you're giving up portability, you might as well go all the way and create a JIT.

Well no. A JIT is a massive massive amount of work, an interpreter is not. Don't throw the baby out with the bathwater. Besides, you can make a damn portable token-threaded dispatch for supported platforms (which means anything using GCC, LLVM or Visual Studio, which means basically anything that exists today), falling back to switch statements for unsupported platforms.

> Techniques like what the paper calls "inline-threaded dispatch" seem of limited usefulness, since they give neither the speed of a JIT nor the portability of a bytecode VM.

I forget the numbers, but let's say it's 20%. It gives a 20% speed boost for a few dozen lines of code. Hardly "limited usefulness".

> My main point was that switch()-based dispatch isn't that bad.

It's portable and easy to implement. It is not the fastest dispatch mechanism. It will generally be the bottleneck in an otherwise-efficient interpreter. It does not compare well to most other dispatch techniques, and is only useful if you demand that absolutely only ANSI C will do.

> I'm surprised that this recent literature distinguishes between switch dispatch and token-threaded dispatch, since as Mike Pall notes, "Tail-merging and CSE will happily join all these common tails of each instruction and generate a single dispatch point," so the goto-spaghetti of token-threaded dispatch is likely not even worth it. (http://article.gmane.org/gmane.comp.lang.lua.general/75426)

Firstly, the work on interpreters was done in the 80s and 90s. Really awesome optimizing compilers post-date that considerably.

Secondly, Pall does not say those dispatching techniques are equivalent. What he says is that interpreters in C are difficult, and you have to fight the optimizer. If you wrote them in assembly, as Pall did, you would not find that the techniques were equivalent.

> Ultimately the most common and practical dispatch techniques seem to boil down to either a single indirect branch or replicated indirect branches.

It's certainly common to use a single indirect branch. However, it is suboptimal. People like speed.

Finally, note that my original point was that switch-based dispatch is slow, but that better techniques are not worth it in PHP, because the rest of the interpreter is so much slower.

Wow, you read a way more argumentative tone into my comment than I intended. It's all good, I'm trying to expand my understanding. :) Though I'm not sure I agree with your characterization of my points as "nonsense."

> [Inline-threaded dispatch] gives a 20% speed boost, for a few dozen lines of code. Hardly "limited usefulness".

Unless I'm missing something, I think you would be very hard-pressed to generate basic blocks of native code in only a few dozen lines of platform-dependent code. I'm not seeing how you could do it without implementing your own code generator (which would take far more than a dozen lines). Is there some clever way of leveraging the C compiler's output that I'm missing?

Also, once you are generating basic blocks of native code, I would consider it a JIT at that point, even if it is not as sophisticated as a JIT that's doing register allocation and optimization. Writing this sort of JIT that uses a static register allocation and a fixed instruction sequence for each bytecode op doesn't seem that much more work than writing an interpreter in assembly language (especially if you use Mike Pall's excellent framework DynASM: http://luajit.org/dynasm.html).

> Its certainly common to use a single indirect branch. However, it is suboptimal.

I don't believe that a single indirect branch is a priori slower; while replicated dispatch makes better use of the branch predictor, it also results in less optimal i-cache usage (the LuaJIT code notes this trade-off, but observes that replicated dispatch is 20-30% faster in practice: http://repo.or.cz/w/luajit-2.0.git/blob/2ad9834df6fe47c7150d...).

I'm not saying switch is better, what I'm saying is that you can't reliably replicate dispatch in C (at least according to Mike's message that I cited earlier), which makes switch() a totally reasonable choice for interpreters written in C.

By the way, I'm not just making stuff up, I have written a JIT of my own: https://github.com/haberman/upb/blob/master/upb/pb/decoder_x...

> Wow, you read a way more argumentative tone into my comment than I intended. It's all good, I'm trying to expand my understanding. :) Though I'm not sure I agree with your characterization of my points as "nonsense."

Indeed, that was much too harsh and I retracted it. I thought I might have retracted it before anyone noticed, but I guess not. My bad, sorry about that.

> Unless I'm missing something, I think you would be very hard-pressed to generate basic blocks of native code in only a few dozen lines of platform-dependent code.

Whoops, I thought we were still talking about token-threading. I believe people implement token threading by fiddling with the compiler output. John Aycock had a paper about doing this for Perl or Python as I recall (also, since you're into JITs, you might enjoy his "a brief history of Just-in-time").

> Also, once you are generating basic blocks of native code, I would consider it a JIT at that point,

Yes, people often consider this to be a JIT. However, it's much, much easier to write than a "real" JIT. The complexity of writing a method JIT or trace JIT is very, very high. This could probably be done in an afternoon by a great hacker.

Don't underestimate the work of writing the assembly language interpreter like Mike Pall did. Most incredibly hardcore compiler guys with whom I have discussed it were simply blown away by it. It might be one of the most hardcore feats in VM implementation history. He manually register allocated across dynamic flow control boundaries for gods sake!!

> reasonable choice for interpreters written in C

But not a fast one.

> also, since you're into JITs, you might enjoy his "a brief history of Just-in-time"

Indeed, I've bookmarked it, thanks! I couldn't find the other paper you mentioned about fiddling with compiler output; would be very interested in seeing that.

LuaJIT2 was the first interpreter and JIT that I ever studied deeply, so while I am very impressed by it, I don't have a great understanding of how it differs from previous work. How is its interpreter substantially different or more impressive than previous interpreters written in assembly language? Don't all assembly language interpreters need a fixed register allocation?

I've been a big fan of Mike's work for a long time and helped him get some Google funding: http://google-opensource.blogspot.com/2010/01/love-for-luaji...

By the way, when I was searching for the John Aycock paper I came across your dissertation which I've also bookmarked. I see that you went to Trinity College Dublin -- I visited Dublin a few years ago and what a beautiful university that is! Looks like there's a lot of interesting JIT-related research coming from there lately.

> How is its interpreter substantially different or more impressive than previous interpreters written in assembly language?

Traditionally, there were two implementation strategies: compilers and interpreters. JITs didn't become mainstream until Sun bought HotSpot and put it into the second version of Java.

So you had to choose between compilers and interpreters. Compilers led to optimization, fast object code, but were complex to implement (especially multiplatform). Interpreters were simple to implement, very portable, but were very very slow.

Obviously there is more to both, but until recently, basically this is how people thought about language implementation. Considerations like a REPL, slow compilation speed, ease of prototyping, etc, were a complete side show (perhaps people cared, but you'd rarely see it discussed).

When all of the dynamic languages were implemented, they all used interpreters, written in C. They all used a simple dispatch loop over the opcode implementations, and let the C compiler do the heavy lifting. All the research into fast interpreters (the David Gregg/Anton Ertl work, for example) looked at instruction set design (register vs. stack) and dispatch type. So when making interpreters fast, there were only four strategies:

- make a compiler,

- use better dispatch,

- rewrite using a register-based instruction set,

- make a JIT.

Making a JIT is lunacy, of course: JITs are ridiculously hard, and they're not portable. So the fact that Pall was making a one-man JIT (LuaJIT1) was incredible.

But that he made an interpreter that was as fast as a JIT was even more insane. In Trinity, all of us language/JIT/scripting-language people were in one room, and when we heard about this we were just amazed. Nobody had even thought about this stuff; it was all brand new and novel in a field that had barely seen anything novel in decades! Until that point, basically all interpreters were one big while loop.

> How is its interpreter substantially different or more impressive than previous interpreters written in assembly language?

I wouldn't know, since I've not heard of any mainstream interpreters in assembly. I can only imagine that they were exactly the same as C interpreters: essentially a while loop with a switch statement, just written in assembly.

I find it amusing that you started at LuaJIT2. I would liken it to studying modern war, then wondering "why didn't they just use drone strikes at the Somme" :) Looking back from LuaJIT2, interpreters must seem really, really primitive.

I don't think interpreters written in assembly were that bad. LuaJIT 2 uses direct threading (not new at all), register-based bytecode (relatively new), and manually optimised register assignment (perhaps new). AFAICT, the first key innovation is that he did not use Lua 5.1's register-based bytecode format, but simplified it even further so it can be decoded efficiently on x86. The second is that he pre-decodes the following instruction in order to take advantage of out-of-order execution. This technique also required fixing some registers' roles.

Don't get me wrong, I think LuaJIT2's interpreter is great, but interpreters before LuaJIT2 weren't complete crap, either. Many emulators, for example, have very good interpreters written in assembly (some aim to be cycle-accurate).

I was trying to describe how it looked from an academic standpoint. Direct threading and register bytecode was well known (register stuff is actually very old, but the jury was out until about 2003), but everything else Pall did was basically new to programming language researchers and implementers.

Aycock paper is available here: http://pages.cpsc.ucalgary.ca/~aycock/

title: "Converting Python Virtual Machine Code to C."

I don't know about the exact definition of "mainstream bytecode-based VM", but just for the record: both the OCaml interpreter and the GForth interpreter usually score pretty well, too.

Probably the de facto fastest interpreter is the one in Sun's HotSpot Java virtual machine. There is a paper by Robert Griesemer detailing some of it, but AFAIR it is a hand-coded, optimized assembly interpreter that does not do too badly. LuaJIT's interpreter is doing quite well, too. (Mike Pall said in the LtU thread about trace-based compilation that the LuaJIT2 interpreter is mostly on par with the LuaJIT1 compiler, or at least not much slower.)

EDIT: replaced "LuaJIT1 interpreter" with more accurate compiler as pointed out by haberman.

Ah, sorry I should have been more specific and said "for dynamic languages."

The LuaJIT2 interpreter is on par with the LuaJIT1 JIT compiler. I don't believe LuaJIT1 had a separate interpreter -- it was just a JIT compiler that used the main Lua interpreter.

Ok, well even for dynamic languages threaded code is likely to give you a noticeable speedup. The only negative example I'm aware of was adding threaded code (+superinstructions) to the Tcl interpreter, which resulted in slowdowns sometimes.

Even though the potential is not going to be as big as for Java, Forth, and OCaml interpreters (where people frequently report 2x speedups), for example Python gains between 20 and 45%. But somebody already replied to a similar inquiry and said that ANSI C compatibility is more important than the increase in performance. (Python uses conditional compilation to achieve both.)

Genuine question: Couldn't we all just move to HipHop if the PHP internals are so bad?

We're getting there, that's why this is so exciting!

Uhh, to the extent that it makes sense to compare speeds of programming languages, PHP is faster and uses less memory than both ruby and python in many or most cases:

Python: http://shootout.alioth.debian.org/u64q/benchmark.php?test=al...

Ruby: http://shootout.alioth.debian.org/u64q/benchmark.php?test=al...

Did you look at the source?

For example for the pidigits bench:

The PHP version uses the GMP extension http://shootout.alioth.debian.org/u64q/program.php?test=pidi...

And the Ruby version does it in pure Ruby http://shootout.alioth.debian.org/u64q/program.php?test=pidi...

So basically it's a C vs Ruby benchmark. Many people complained on the forum; take a look.

The Debian shootout is interesting, but don't take it as an absolute truth.

The Debian shootout has made some very questionable decisions, and the author has defended them by saying that you cant rely on the results. Which is definitely true.

>> the author has defended them by saying that <<

Please show where I said that, or admit that you are putting words into my mouth.

Do you believe that the shootout results reliably show that one language is faster than another?

You claimed I said "that you cant rely on the results".

Please show where I said that, or admit that you are putting words into my mouth.

This is a discussion forum, not a court of law. I'm not about to trawl through 5 or 6 years of reddit and hacker news comments to find an exact quote.

If you believe the shootout results can reliably show that one language is faster than another, say so. Otherwise, de facto you are saying you cant rely on the results.

People here are smart enough to understand that you want to claim "you cant rely on the results" and to make your claim seem stronger you put your words in my mouth.

Who's putting words in whose mouth now?

You told us "you cant rely on the results" and gave emphatic support to that claim -- "Which is definitely true."


Your words -- not mine.

You're going a long way to avoid saying that you can rely on the results. If you felt we could, this discussion would be over a long time ago.

I'm stopping this dance here. Feel free to make a claim about the reliability of the results, and whether you can use them to say that one language is faster than another. Then we can dance some more, but otherwise I'm out.

You can rely on the results, the results are what they claim to be on the home page -- "the time taken for this task, by this program, when compiled with this compiler, with these options, on this machine, with these workloads."

> Feel free to make a claim...

The only point of "this dance" was to help people understand that you are putting words into my mouth.

Exactly. You've said it yourself - you cannot rely on the results to tell if a language is faster than another. That was the original context of my claim, and you are admitting it yourself.

I said - "You can rely on the results."

Please say what you mean by "a language is faster than another".

Please say what in The Java™ Language Specification tells us that "language is faster than another."


Exactly. You can't use the results to say that one language is faster than another. It doesn't even make sense.

(That's the context of the conversation, see http://news.ycombinator.com/item?id=4851781)

>> You can't use the results to say that... <<

That's a different claim.

>> It doesn't even make sense <<

That depends on what `notJim` meant by "to compare speeds of programming languages".

`notJim` might have thought that everyone would understand he was talking about particular language implementations and particular tasks and particular programs.

The benchmarks game is not "The Debian ..." anything -- just one of 960 hosted projects.

The pi-digits page also shows a PHP program which does not use the GMP extension.

Why do you think no one has contributed a Ruby program that uses GMP? Is that too hard to do using Ruby?

The benchmarks game is the absolute truth about the reported measurements.

Slightly OT, sorry. phc was such a nice project, and vastly unappreciated. Your cleanup of PHP's grammar was a piece of art which I spent a long time perusing and trying to understand! phc could have largely benefited the PHP community as a whole, and it's such a shame it couldn't gain (enough) traction!

Never got a chance to say thx, so THX!

I'm glad you liked it. FYI, I can't claim credit for the parser. The front end (indeed it was a work of beauty) was done by phc's founders (Edsko de Vries and John Gilbert) before I got there.

Wow, thanks! John and I spent a long time thinking about it :)

This is curious. Their trace-based approach looks like Mozilla's TraceMonkey, even using the same terminology, like side exits. Mozilla discontinued TraceMonkey because it was really good for deep loops and not much else. They moved to JaegerMonkey, which is a method-at-a-time VM like V8 (at the time), and are now moving to IonMonkey, which is a best-of-all-worlds version.

So I'd love to hear why using a circa-2009 technology was the right one? Is PHP sufficiently different from Javascript for this to make sense (as someone who has worked on VMs for both, I think there's a good chance of that)? Why not use a method-compiler instead? Very interested in the answers and comparison to other JITs out there, if any HHVM people are here.

It isn't tracing. Don't be fooled by the name "tracelet".

Unlike trace trees, tracelets contain no control flow. A tracelet is just a type-specialized basic block. This means that the JIT output does not combinatorially explode when we encounter polymorphism. In many ways, our approach is the opposite of tracing; we JIT the first time we hit any code at all. And we look terrible on loopy microbenchmark code, but run FB's multi-million-LOC application very well.

Very interesting. Rereading with that insight makes your approach clearer - side exits aren't control flow, they're for the weird shit that PHP can throw at you. Aliased local variables are one of the major differences between PHP and Javascript, and that's a really good way of handling them.

So it avoids trace explosion, and I would describe it as a sort of local (in the compiler sense of "basic block") version of a method compiler. Can you compare the approaches?

So it's closer to V8's "basic" JIT?

Our approach is different from other dynamic language JITs that I'm aware of. While there is lots of diversity out there, most other dynamic language JITs can be roughly categorized as having either a method-at-a-time strategy or a tracing strategy. Our system is neither a tracing JIT nor method-at-a-time, but basic-block-at-a-time.

The systems that it has the most in common with are actually binary translators; e.g., VMware's software x86 hypervisor, or the Dynamo system. Those systems run basic block at a time to solve a number of problems; e.g., disambiguation of code and data. We're basic-block-at-a-time for a different reason: closure under type inference.

I'm not an expert, but I think V8 compiles methods that contain control flow even in the basic compiler.

One more thing - the use of a stack-based bytecode is also curious. PHP itself (the Zend engine) uses a register-based bytecode (well, it uses a horror-show of an in-memory bytecode-like-thing, which could best be described as a register-based bytecode). The PHP engine details leak into the language a lot, so the difficulties they describe aren't unexpected.

Not only that, but bleeding-edge JITs typically use register-based bytecode. Java's JIT and all the modern versions of JavaScript VMs use register-based representations (in many cases actually translating from the stack-based bytecode that the parser outputs to register-based bytecode consumed by the JIT compilers). Since 2003, when my PhD advisor and his co-authors (Google "David Gregg" and "Anton Ertl") showed register VMs could be much more efficient than stack-based ones, the jury has settled on register-based VMs.

So I'm curious as to why a stack-based instruction set was used in the VM design?

As noted elsewhere in this thread, a stack-based design typically produces more compact bytecode. Compactness was a concern for us because of the size of FB's PHP code base. Also, generally speaking a stack-based design tends to be easier to deal with when working to get a prototype VM up and running quickly.

Many of the advantages of register-based designs (ability to optimize by rewriting the program at the bytecode level, ability to write faster interpreters, ability to map bytecode registers to physical registers, etc.) weren't particularly attractive to us because we knew we were going to build an x64 JIT that did its own analysis and optimization to take advantage of type information observed at run time.

Thus, we drafted a stack-based design for HipHop bytecode. It captured PHP's semantics correctly and happened to fit in fairly well with PHP's evaluation order, so we ran with it and here we are.

Makes total sense, thanks!

Would using a register-based bytecode not have been useful for the x64 JIT?

Check out our HHIR work. It's an SSA-based IR that gets us most of the advantages of a register representation. But it is at a much lower level than the bytecode.

AFAICT, two reasons come to mind:

1) A stack-based ISA is still more space-compact. (AFAIR the Shi et al. paper mentions something like a 40+% increase in space requirement for the instructions.)

2) The performance improvement of a register-based ISA is only visible for interpreters that suffer most from instruction dispatch [1]. PHP is a rather complex programming language that is most certainly not bound by instruction dispatch at all. So I guess it could very well make sense to stick with a simple stack-based ISA, which incidentally is also easier to compile from the AST.

[1] for the sake of completeness: the stack architecture emits many instructions to push operands onto the operand stack. A register-based interpreter does not need those. Hence, the overall number of dispatches is lower, and if dispatch cost is your bottleneck the overall performance increases. OTOH, you need more space, because in addition to the opcodes, you need to specify/encode which registers to take operands from and put results to. (Hint: quadruples and the like.)
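A toy Python sketch of the dispatch argument for `r = (a + b) * c` (opcode names and encodings invented for illustration):

```python
# Stack form: operands are pushed explicitly, so more instructions,
# and therefore more dispatches through the interpreter loop.
stack_code = [("PUSH", "a"), ("PUSH", "b"), ("ADD",),
              ("PUSH", "c"), ("MUL",), ("STORE", "r")]

# Register form ("quadruple" style): each instruction names its
# operand registers, so far fewer dispatches -- but every instruction
# must encode those register operands, which is where the extra
# bytecode space goes.
reg_code = [("ADD", "t", "a", "b"),   # t = a + b
            ("MUL", "r", "t", "c")]   # r = t * c

print(len(stack_code), len(reg_code))  # 6 2
```

Six dispatches versus two for the same expression; if dispatch dominates your interpreter's cost, the register form wins, and if it doesn't (as the parent argues for PHP), the savings are largely moot.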

1) Yes, that's my recollection too. I think 40% is exactly right.

2) But this is exactly my point. Zend PHP uses a register-based bytecode. The HipHop JIT could choose anything, so why choose stack-based? It's not only different, but also worse! If I had to guess, I'd say it was chosen to make the HipHop interpreter easier to implement, and then was legacy code when creating the JIT, and it was easier to keep it than to replace it.

> all the modern versions of Javascript VMs use register-based (in many cases actually translating from stack-based bytecode that the parser outputs, to register-based bytecode consumed by the JIT compilers)

Pretty sure V8 doesn't, as it has no interpreter at all; it has a fast JIT and a slow JIT, but it only ever runs native code.

Sorry, I didn't mean to imply they all use interpreters. The more compilery parts use three-address code and SSA, which are IRs that are conceptually similar to a register-based bytecode.

Java's HotSpot is a stack machine, AFAIK. Dalvik is a register machine, but I wouldn't want to point to that as a positive. ;)

Internally, Hotspot doesn't use a stack representation. It converts it to an IR (intermediate representation) which is not stack based, which is my point. See Fig 2 from http://www-leland.stanford.edu/class/cs343/resources/java-ho...

Huh, so it's not. Thanks for the interesting read!

This is somewhat different from my memory. AFAIR, TraceMonkey was a trace-based JIT. JaegerMonkey was a JIT (however not just-in-time like V8 with a template-based JIT, but still using an interpreter initially). IonMonkey is--to the best of my knowledge--JaegerMonkey plus type inference (there was a paper by two Mozilla employees at PLDI'12 about their type inference.) So I guess it's not really using anything from trace compilation.

I think a trace-JIT still gives you a lot of bang for the buck and is (in theory at least) easier to implement. Two known projects using trace-compilation are LuaJIT2 (usually well known) and Dalvik VM's JIT compiler (not so well known, needed to watch the Google I/O 2010 announcement.)

There's a bunch of not-quite-correct information here... Let me try to summarize. Jaegermonkey is a JIT. It takes spidermonkey bytecode, and when functions grow hot, it compiles them to native code in a way quite similar to v8's full AST compiler (with no optimizations).

IonMonkey is also a JIT. However, instead of just doing translations directly from bytecode to native code, it builds up an SSA-based IR, does some optimizations (last I checked, loop-invariant code motion, range analysis, global value numbering, on-stack replacement to be able to jump to JITted code in the middle of a hot loop, and some other small optimizations). It then does register allocation using LSRA. It takes advantage of type inference to aggressively type-specialize these optimizations.
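For anyone unfamiliar with one of those optimizations, here's loop-invariant code motion applied by hand in Python (the function names and example are mine; a JIT like IonMonkey does the equivalent transformation automatically on its IR):

```python
import math

# Before LICM: the invariant sub-expression math.sqrt(k) is
# recomputed on every iteration, even though it never changes.
def before(xs, k):
    out = []
    for x in xs:
        out.append(x * math.sqrt(k))
    return out

# After LICM: the invariant computation is hoisted out of the loop
# and done once, because it doesn't depend on the loop variable.
def after(xs, k):
    s = math.sqrt(k)
    out = []
    for x in xs:
        out.append(x * s)
    return out

xs = [1.0, 2.0, 3.0]
assert before(xs, 4.0) == after(xs, 4.0)  # same result, one sqrt instead of three
```

Type inference is what makes this sort of thing safe to do aggressively in a dynamic language: if the engine knows `k` is a double, hoisting the computation can't change observable behavior.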

This is all more expensive than just blatting out native code, but the code performs better.

In IonMonkey-enabled Firefox (Firefox 18+), the compilation strategy is to interpret, then use JM on hot functions. Particularly hot functions will get IM compiled.

This is similar to what "crankshaft", v8's version, does. The compilation strategy is quite similar, but v8 has no interpreter--it has two compilations: a fast, non-optimized one and a slow, optimized one.

Trace-based compilation is... I would say... more complicated than a traditional compiler approach. It is certainly differently-complicated than the usual compiler problem. Recording traces requires some relatively complicated state, and significantly muddles up the interpreter. Removing the tracer from the Mozilla code base made things substantially cleaner--the tracer was holding back the method-based JITs.

Note that Dalvik is actually not a very good trace JIT. There's a recent paper comparing Dalvik to an embedded version of Hotspot. The Dalvik interpreter was slightly faster (~5%), but the JIT speedup was only 3-4x whereas it was ~12x for the method JIT. The reason is that Dalvik's traces are way too short and cannot even span method boundaries. That means little scope for optimisation and a lot of code just to load the state from the stack into registers and then write it back. Weirdly enough, Dalvik also generates more code than the method JIT, even though low compilation overhead was their main goal.

Fortunately for Dalvik, most applications actually spend most of their time in native code.

The paper (behind ACM paywall): http://dl.acm.org/citation.cfm?doid=2388936.2388956

Amazed at all the 'engineer X policy or feature request' slackers. Can't say I'm the biggest fan of Facebook as a service but what their engineers are doing in terms of pushing the state of the art is fantastic.

Out of curiosity if there's anyone involved in Facebook on HipHop, has there ever been a discussion about just shifting from PHP to a more performant language, or is a case of still reaping the benefits of PHP in terms of dropping a developer in and not worrying about skill sets?

> ...about just shifting from PHP to a more performant language

We already shift lots of things out of PHP into more performant (c++, java, etc) languages when we are building services and/or extensions and/or other computationally intensive things.

But I think more to your point (and if not, let me indulge in the strawman), since this comes up frequently on reddit/hn: php is not somehow -uniquely- broken. While PHP might be uniquely broken as a language (ha ha), it's not -uniquely- broken as a platform/runtime. Everything you think is performant is broken at a large enough scale. Put another way, it's not just about the switching costs of rewriting in some other language. If we could magically snap our fingers and convert the entire codebase from php to some other language, that's just the beginning of understanding, tweaking, and occasionally rebuilding everything about the runtime/libraries/etc to make it work.

At the end of the day the language itself is given way too much attention in discussions like this.

So is your argument then that Facebook has so much invested in dealing with and optimizing PHP at scale that the real cost would be acquiring that knowledge a second time on a different platform, since you _will_ run into scale problems no matter what language you use?

From my experience dealing with PHP performance warts: you can make it somewhat fast, but man, the insides are so messed up that every time I get into the core and try to do anything I want to rip my hair out.

Thanks for the response.

That's a big part of my point, but Facebook seems to have invested a lot in the HipHop compiler and VM as a way to tackle the rough spots from using PHP. All languages are rough in places in their own way, but I was more curious about whether anyone has gone "this is great, but maybe we're treating the symptoms and not the cause". But you're right about the total investment being way, way larger than just a code-base rewrite.

Interesting to learn that you do use multiple languages for different things. A lot gets made of HipHop and Facebook's use of PHP, so I didn't think to guess that different levels of the service might be getting built with different tools, at least not at scale.

>> Can't say I'm the biggest fan of Facebook as a service but what their engineers are doing in terms of pushing the state of the art is fantastic.

They let people upload comments and pictures. Anything fancy is done mostly in the client or am I overlooking something?

> They let hundreds of millions of people upload billions of comments and pictures

Fixed it for you. Scale makes simple things difficult.

Well, for a non-client example they do a lot of analytics, probably more than we know. Plus they created their own highly optimised php compiler from scratch that isn't just production ready, but battle hardened.

HipHop alone should get Facebook's development team praise.

The PHP language performs just fine, though it has a few warts (which could be solved by making HHVM support a superset of PHP).

The problem is that the interpreter is shite.

I'm curious if they were still evolving HPHPc at the same time as they were evolving HHVM: the chart shows HPHPc as having flat performance over time, while HHVM was getting better over a 7-month period. Could HPHPc have achieved the same performance gains if the same effort was expended?

Hey, I'm an HHVM engineer.

We just normalized HPHPc performance to make the graph easier to read. HPHPc actually got considerably faster over the same period as well; since both systems share a runtime, many changes helped both.

HPHPc probably got about 20% faster over 2012, even though nobody was actively working on it, mostly through happy side effects of work that was directed at HHVM.

I'm not on the HPHP team, but when discussing these results internally, they mentioned that HPHPc performance was improving during this time as well (as you would expect we would need to keep running the site efficiently), and the graph showed point-in-time comparisons of their performance, not from a baseline.

As to whether it would have been possible to get better performance on HPHPc, I'll let people with more knowledge answer.

The benefit of having a single runtime for development and production (and production-like staging environments) is huge, though. There may also be benefits in the push process, which, while already fast compared to most other companies', will continue to see gains from increased speed.

Things make sense now. I remember a while back Facebook invested in an experiment with PHP on PyPy. They didn't pursue it, even though it produced impressive results. It seems their own in-house JIT has better performance?

HHVM was pretty far along when we started talking to the PyPy folks; it was already able to run the site, and hosting internal development at Facebook. Our interest in PyPy wasn't an immediate, drop-everything-and-change-to-PyPy kind of interest. It was a research project, which had a positive outcome. Making a production-ready, PyPy-based system for PHP would still be an enormously big undertaking, though.

PyPy is taking a radically different approach from what HHVM is doing (and really from what almost all other dynamic language systems are doing), and it's a fascinating system. Part of what's exciting about it is that it seems like it should be applicable to other languages with less effort than most other JITs, and we wanted to understand its potential for a language like PHP. We asked Maciej Fijalkowski (hi, Maciej, if you're reading!) to help us do a research prototype to see what the first few roadblocks would look like, and Maciej did a great job. Just because we didn't scrap our current project and shift all our resources to PyPy should not be seen as a negative reflection on PyPy at all.

I'm kind of surprised you didn't pursue the JVM and InvokeDynamic for this purpose instead of PyPy

I think there is a project of that sort going on; there was a guy from Facebook at the JVM Language Summit this year.

Instructions for Ubuntu 12.04 x64: http://pastie.org/5456298

First off thanks for all the hard work. Do you have a list of the PHP extensions you support? I'm wondering if things like cURL, memcache, PDO MySQL etc. are supported. I found https://github.com/facebook/hiphop-php/wiki/Extensions-and-m... but it's a bit outdated (last mod 2yrs ago).

Also wondering what APC methods you support.

What version of PHP does HPHP match? 5.3 or 5.4? I ask because I've become very accustomed to the short array syntax ([1,2,3]) and other goodies in 5.4.

It is closer to 5.3, though we've adopted some 5.4 features: traits, our closures' treatment of $this, and f()[$x] syntax.

Edit: notably not short array syntax, at least yet.

This makes me sad. That's one of the nicest features of 5.4: not needing to type array() all over the place.

It's a relatively trivial feature to implement (really just syntactic sugar); it probably just wasn't prioritized.
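To illustrate that it really is just sugar, here's a toy textual desugaring in Python (flat literals only, helper name invented). Notably, a naive textual rewrite would also mangle subscripts like `$a[0]`, which is exactly why the real change belongs in the parser, not in a preprocessor:

```python
import re

# Toy desugaring of PHP 5.4's short array syntax into array(...)
# calls. Flat, non-nested literals only -- this is a sketch to show
# the feature adds no new semantics, not a working transpiler.
# (It would wrongly rewrite subscripts like $a[0] too, so a real
# implementation does this on the AST during parsing.)
def desugar(src):
    return re.sub(r"\[([^\[\]]*)\]", r"array(\1)", src)

print(desugar("$a = [1, 2, 3];"))  # $a = array(1, 2, 3);
```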

"So, when you combine XHP with HipHop PHP you can start to imagine that the performance penalty would be a lot less than 75% and it becomes a viable approach." http://toys.lerdorf.com/archives/54-A-quick-look-at-XHP.html

That's from 2 years ago. What is its relevance now?

Will FB ever invest and move off PHP? Hiring talent at FB has to be getting harder and PHP probably isn't helping.

PHP is used for a subset of development at Facebook. I'm not sure what the split is, but I would not be surprised if PHP is the primary language used by less than 60% of Software Engineers at Facebook. The rest code in C++, Java, Python, and then a bunch of other languages that are less used.

The PHP development environment at Facebook is unlike that found at any other company I'm aware of, and common PHP pitfalls and legacy code are actively "linted" out of the system (as an automated part of the code review process). It still wouldn't be my choice of language to use outside of Facebook, but I have no issue working on the PHP code base here (although I rarely venture into it).

Over time, much logic and processing has been offloaded to backend services, with PHP increasingly playing a (largely parallelised) dispatch, post-process, and render role. That will likely continue.

With all that said, it is questionable whether it would be an investment to move off of it - which language/environment would you think a good target would be, and why?

Rewriting the code of any big project is insane. You are going to face the same walls you faced in the past, again. By this logic you would change the whole codebase every few years... ridiculous.

Amazon did it in the 2002/2003 timeframe when they moved from Obidos (C++) to Gurupa (Perl/Mason).

I thought Amazon still used C++ for their main infrastructure.

I'm sure having some PHP code at a company like Facebook isn't deterring any serious candidates. Working for a company with almost a billion users that has created software like Cassandra, Thrift, HipHop, and Haystack, writing loosely coupled systems that run across thousands of machines, and analyzing tens of PB of data every day with Hadoop, there are definitely very specific and challenging tasks that would attract tons of talent. I think having a PHP codebase there isn't indicative that your new job would be sitting and writing crappy PHP code templates all day and wanting to kill yourself because it is so boring. Just my 2c.

I think you underestimate the number of PHP developers out there. There aren't many on Hacker News because we're all too cool for that stuff, but in the developer world in general there are a ton.

I think his point is that real top-tier developers are going to despise working with such a grotty language as PHP, so it hurts their hiring process that way. Whereas if they worked in a superior language (an ML, Haskell, a Lisp...) they'd attract talent. Who wants to deal with a language with such inane design decisions all day?

I'd imagine people signing up to work at FB do so despite the fact they use PHP, not because of it.

It's not like people who work at Facebook have to use PHP anyway - there are quite a few who work in "superior" languages, although most just use C++, Java, and Python.

In my experience, the top-tier developers enjoy challenges and solving problems, and languages are just tools they use to defeat those challenges and solve those problems. Sometimes it's a language they really enjoy using. Sometimes it's a situation where that language isn't a suitable one.

I just don't understand the hate. Like any language it has some syntactic quirks and inconsistencies. There's no way to objectively say that one language is superior to another, because there are so many metrics that contribute to superiority and every person in the world is going to weight those various metrics slightly differently.

I imagine it's neither "despite" nor "because" of PHP use; any top-tier developer is going to use the tools available to him/her, rather than pick an employer in order to use a specific set of tools. Lower-quality developers may do so, but that's probably because they're less likely to know the right tool for the job (eg someone may hate functional programming and avoid jobs that use it, but that's because they don't know when FP is the ideal way to solve the problem)

You can't objectively say that one language is better than another, but you can't deny that a language such as Haskell has a level of consistency and mathematical rigour that PHP will now never attain. The top developers whom Facebook targets (normally top CS students from top colleges and PhD programs) strive to be consistent and mathematically rigorous themselves, and poor design and sloppy math annoy them.

Totally valid point. One of my housemates loves Haskell for very much the same reasons, and all other things being equal I'd prefer that as well (who wouldn't?).

But of course, anyone who's worked on production systems knows that all the mathematical ideals in the world don't stop bugfixes from becoming a giant hairy mess. It's more your culture, skill, and process that allow or prevent that from getting refactored into a better solution and how long that takes. Any language can be a part of a clean or messy codebase, although some tend to skew towards one side or another - but I'd wager that's as much a factor of barrier to entry as anything else.

I don't know, working on HHVM certainly seems interesting enough to attract good developers.

Facebook lives in a SOA. PHP is probably the best choice for dispatching services written in more "advanced" language. The beauty of SOA is that you can use the right tool for the right job.

>I think his point is that real top-tier developers are going to despise working with such a grotty language as PHP

Actual top-tier developers don't fret much about it. Complaining about some language's warts is mostly for the bike-shedding and blogging types. PG, who writes about blub languages and how cool Lisp is, used Perl at Viaweb too (which is as inelegant a mess as PHP is, but at the time was good for web work).

Top tier developers get shit done in whatever language. And most of them rarely ever comment or blog.

Where do you get your estimates from?

First, it's really hard to do a large rewrite of software. Netscape, for example, made the mistake of trying to rewrite their browser from scratch, and it killed them [1].

Second, on what evidence do you base your statement "hiring talent at FB has to be getting harder"? If you're talking about market cap / upside, Google has had no problems hiring (well, relative to other tech companies, anyway), despite a market cap 3.5x+ what FB's is today.

Lastly, FB is not all PHP. A lot of services are written in C++ or Java, and use Thrift to talk to the frontend [2]. Thrift allows a more gradual movement off PHP, by moving isolated components to other languages individually.

[1] http://www.joelonsoftware.com/articles/fog0000000069.html [2] http://www.quora.com/Facebook-Engineering/What-is-Facebooks-...

Like putting a rocket on a turd.

Why is the blog on facebook?

We work for Facebook, and make a lot of announcements on the engineering blog. We also have a group blog on WordPress-in-HHVM running here: http://www.hiphop-php.com/wp/

Didn't realize that until I actually went to the page. I need to learn to keep my pie hole shut more often.

HipHop is created by Facebook engineering staff.

HipHop was developed by Facebook?

I thought it was DJ Kool Herc.
