Facebook releases HHVM, 60 percent faster than its current PHP interpreter (facebook.com)
164 points by adeelarshad82 on Dec 13, 2011 | 143 comments

Hats off.

I have no love for Facebook's product or business model, but their engineering is diesel, and it's very cool that they're sharing such a deep piece of work.

I know some great Facebook engineers and some terrible ones. An ex-colleague of mine who loves writing spaghetti php code joined Facebook almost a year ago.

I would say that 30% of Facebook's engineers are top notch, and 70% are just average php and javascript developers.

Hey, I work at Facebook on HHVM.

I actually view Facebook as the opposite of what you're describing. VMware, for instance, really did have a bimodal engineering organization; some pieces were amazing, but many were so-so. That can have its strengths, too, and it has worked out ok for VMware to date, but Facebook has the absolute opposite philosophy. For instance, you don't have a "hiring manager" here, because the selection process is decoupled from the allocation process. This means a uniform bar across groups; e.g., I interview people destined for all different parts of the stack, and I expect the same level of skill for all of them.

Hi, Some questions:

Wasn't it kind of a waste of time to develop a static compiler plus an interpreter (while ending up writing a JIT anyway)? Why did you not start out with a JIT? The problem of static compilation of dynamic languages (especially PHP) was well known (phc!). As far as I can see, the kind of JIT described in the post contains nothing that was not already known when HipHop was originally developed.

What kind of optimizations are you guys doing? Since the scope of your optimization is only one basic block (did I understand that right?), how much do these optimizations actually help compared to unoptimized translation to assembly?

Have you thought about using the PyPy toolchain? The PyPy JIT already does quite a lot of optimizations (not to mention the much wider scope for optimization on "real" traces), and it would be much less work to implement.

This might be getting less interesting to the general audience, so we can take it offline if you have any more questions. I'm my initials (Keith Michael Adams) @ fb.com.

The static compiler that preceded HHVM was started in a different time; it was 2008, and the JavaScript wars were just getting started. Conventional wisdom (to anybody who hadn't worked on Self in the late '90's) was that running dynamic languages fast was just kind of a lost cause. Since the static compiler was started earlier, and was inherently a more tractable system, it was done sooner. This is something that is hard for a lot of systems people to accept, but software has time value; even if HHVM ends up being faster than the static compiler, without the two years that HPHPc has been providing 2-4x improvements, Facebook would have been less able to make products. And keep in mind, the static compiler is beating our jit at most benchmarks at this writing; improving on it will take some doing.

The only optimizations we're doing are those that bottom-up type inference makes possible; so, we can generate code that knows that strings are strings, arrays are arrays, ints are ints, doubles are doubles, etc. This doesn't sound like much, but avoiding the double-dispatch on things like $a + $b is significant. We're still using the static compiler's front-end, so all of the things it can do (e.g., CSE, constant-folding, deduplicating identical data, compile-time binding of functions and methods) we hope to "inherit" when running in production mode.
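
A rough illustration of the double dispatch mentioned above, sketched in Python rather than PHP or machine code. The names `generic_add` and `specialize_add` are hypothetical, and the coercion rules are a deliberately simplified stand-in, not PHP's actual semantics or HHVM's implementation.

```python
def generic_add(a, b):
    """Untyped path: inspect the type of each operand on every call."""
    if isinstance(a, int):
        if isinstance(b, int):
            return a + b
        if isinstance(b, float):
            return float(a) + b
    elif isinstance(a, float):
        if isinstance(b, (int, float)):
            return a + float(b)
    raise TypeError("unsupported operand types in this sketch")


def specialize_add(type_a, type_b):
    """Pick the add routine once, after the operand types have been observed."""
    if type_a is int and type_b is int:
        return lambda a, b: a + b              # no type checks left on the hot path
    if {type_a, type_b} <= {int, float}:
        return lambda a, b: float(a) + float(b)
    return generic_add                          # unknown types: keep the generic path


add_int_int = specialize_add(int, int)
print(generic_add(2, 3), add_int_int(2, 3))
```

The specialized routine only stays correct while the observed types hold, which is why generated code of this kind normally sits behind a type guard.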

PyPy is a very interesting project, with a beautiful approach. I'd be very, very curious to see what an HHVM interpreter written in RPython could do under PyPy. Unfortunately PHP is not scheme; it is a big language, with many non-orthogonal parts, so it would be a significant effort to answer this question. If I were starting over today, I'd take a much closer look at the PyPy approach, but that is just my opinion.

     > This might be getting less interesting 
     > to the general audience, so we can take
     > it offline if you have any more questions.
This is Hacker News. I, for one, would much rather read about the nitty-gritty details of this project than debate about how great Facebook's engineers are or read another opinion on whether or not PHP is a good language for X.

I think it's important to ask everything in these threads. I often find interesting questions answered in these threads (on reddit too). For JIT fans I recommend online-stalking Mike Pall, here on Hacker News and on reddit.

Well, I think if you have a task like that, a team should look into what other people were doing. Self, LuaJIT, Parrot, some Smalltalk systems, and Strongtalk were all already done or on the way in 2008. In a talk about phc (https://www.youtube.com/watch?v=kKySEUrP7LA), Paul Biggar talks about why AOT might be better than JIT. His reasoning was that JITs suffer from bad startup times, and PHP is a language where the common use case is scripts that only run a short time.

Anyway, the static compiler certainly wasn't a bad way to get some speed fast.

Getting some types in there is probably the best thing you can do :) Where can I get some more information about tracelets? I have never seen literature about them. Did you guys invent that yourselves?

I have been thinking about something similar. If you have a language that is dynamic but is suited for static compiling (Dylan, for example), one could write a compiler that does all kinds of optimization and then spits out a bytecode that is suited for the JIT. Then at runtime you let the JIT do the rest. I'm not sure if this is such a good idea, because the static compiler would take away some opportunities for the JIT to do an even better job. Thoughts, anybody?

That's the problem I always have. For every language I think about how well it would work with PyPy. Somebody started to work on a Clojure implementation in PyPy and I want to start hacking on that. Never having done Python is holding me back a little.

More Questions:

For how long and with how many people have you been working on HHVM?

Why is the bytecode stack-based? From what I read, stack-based codes aren't really suited to JITing on a register machine. Does the JIT compile everything to SSA first? I think that's what HotSpot does (not sure, does anybody know?).

HHVM is a follow-on effort to the static compiler. We got started long after HPHPc was in production; I started playing around in early 2010, and three of us (Jason, myself, and Drew Paroski) started earnestly putting in full-time work in summer of 2010.

I made up the term "tracelet." Do not use it when trying to sound intelligent. It roughly means "typed basic block," though the compilation units aren't quite basic blocks, and what we're doing isn't quite tracing, and ... so we thought it would be less confusing to just make up a name. Think "little tiny traces."

The bytecode is stack-based for a couple of different reasons, but it will probably remain stack-based because of compactness. Since instructions' operands are implicit, a lot of opcodes are 1, 3, or 5 bytes long. Facebook's codebase is large enough that its sheer compiled bulk can be problematic.
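
To make the size argument concrete, here is a small Python sketch of one plausible way instruction lengths of 1, 3, and 5 bytes could arise: a one-byte opcode followed by no immediate, a 16-bit immediate, or a 32-bit immediate. The opcode names, numbers, and immediate widths are all assumptions for illustration, not HHVM's real encoding.

```python
import struct

# Hypothetical opcode table: name -> (opcode byte, struct format of the immediate).
OPCODES = {
    "Add":     (0x01, ""),    # operands are implicit: the top two stack slots
    "PushLoc": (0x02, "<H"),  # 16-bit local-variable index
    "PushInt": (0x03, "<i"),  # 32-bit integer literal
}


def encode(name, *immediates):
    """Encode one instruction as an opcode byte followed by its immediate, if any."""
    opcode, fmt = OPCODES[name]
    return bytes([opcode]) + (struct.pack(fmt, *immediates) if fmt else b"")


program = [("PushLoc", 0), ("PushInt", 40000), ("Add",)]
encoded = [encode(*instr) for instr in program]
print([len(e) for e in encoded], sum(len(e) for e in encoded))
```

Because `Add` names no operands, it costs a single byte; a register-style encoding would have to spend extra bytes naming its sources and destination.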

The translator doesn't turn the stack into SSA, instead turning it into a graph where the nodes are instructions and the edges are their inputs/outputs. You can sort of squint at this system and call it SSA where the edges are the "single assignments."
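
One way to picture the graph described above is to run the bytecode against a symbolic stack that holds node indices instead of values. This Python sketch uses a made-up three-opcode bytecode (`push`, `add`, `mul`); the function name and tuple layout are hypothetical and only meant to show how stack positions turn into edges.

```python
def build_dataflow_graph(bytecode):
    """Simulate the operand stack symbolically and record which node feeds which."""
    nodes = []   # nodes[i] = (opcode, argument, [indices of input nodes])
    stack = []   # holds node indices instead of runtime values
    for op, arg in bytecode:
        if op == "push":
            inputs = []
        elif op in ("add", "mul"):
            right = stack.pop()
            left = stack.pop()
            inputs = [left, right]
        else:
            raise ValueError(f"unknown opcode: {op}")
        nodes.append((op, arg, inputs))
        stack.append(len(nodes) - 1)   # this node's result is now on top of the stack
    return nodes


# (a + b) * c in the hypothetical stack bytecode
code = [("push", "a"), ("push", "b"), ("add", None), ("push", "c"), ("mul", None)]
for index, node in enumerate(build_dataflow_graph(code)):
    print(index, node)
```

Each node's result is produced exactly once and referenced by index, which is the sense in which the edges behave like single assignments.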

Let me give you a description of the compiler and then you can tell me if I understand everything correctly.

You have an interpreter that starts to record a trace for every basic block. If you walk into a basic block (or code block) for the Xth time with the same types, you compile it for those types and you set a type guard. Now every time the interpreter goes through there again, it does a type check and then either keeps interpreting or jumps into the compiled block. So if you have a basic block in a loop, you have to type check every time you go through the loop.

Pretty close. Once we've translated to machine code, the interpreter doesn't do type checks; it blindly enters the translation cache for anything that has a translation, and the guards live in the tracelets themselves. The tracelets directly chain to one another in machine code, so they need embedded guards.
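
A toy model of that arrangement, in Python: the dispatcher enters whatever translation exists without checking types itself, and each translation carries its own guard. The `Tracelet` class, the `translation_cache` dictionary, and the translate-after-first-execution policy are assumptions made for this sketch; direct chaining between tracelets in machine code is not modeled.

```python
class Tracelet:
    """A translated block specialized for one tuple of input types."""

    def __init__(self, expected_types, body):
        self.expected_types = expected_types
        self.body = body

    def run(self, *args):
        # The guard is embedded in the translation, not in the interpreter.
        if tuple(type(a) for a in args) != self.expected_types:
            return False, None
        return True, self.body(*args)


translation_cache = {}   # bytecode offset -> list of Tracelets


def interpret_add(a, b):
    """Stand-in for the slow, fully general interpreter path."""
    return a + b


def execute(offset, a, b):
    # Enter any existing translation blindly; its own guard decides whether it applies.
    for tracelet in translation_cache.get(offset, []):
        hit, value = tracelet.run(a, b)
        if hit:
            return value, "translated"
    # No translation matched: interpret, then translate for the observed types.
    value = interpret_add(a, b)
    types = (type(a), type(b))
    translation_cache.setdefault(offset, []).append(Tracelet(types, lambda x, y: x + y))
    return value, "interpreted"


print(execute(0, 1, 2), execute(0, 3, 4), execute(0, "a", "b"))
```

The second call with two ints takes the translated path, while the first call with strings fails the int guard and falls back to the interpreter.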

I see. Thx for answering all my questions. Good luck with outperforming the static compiler :)

> The static compiler that preceded HHVM was started in a different time; it was 2008, and the JavaScript wars were just getting started. Conventional wisdom (to anybody who hadn't worked on Self in the late '90's) was that running dynamic languages fast was just kind of a lost cause.

To be fair, LuaJIT has been blazing trails in running dynamic languages fast since 2005. But maybe it wasn't well-known enough to make its way into conventional wisdom.

Of course. I have kind of a difficult line to walk here; I would like to be respectful to both the authors of the HPHPc compiler, which is a really impressive achievement, and to the many, many efforts past and present at running dynamic languages. LuaJIT, various Smalltalk systems, Self systems, Erlang, HPHPc, the JVM body of work, V8, hardware-level binary translators, ... are all among the shoulders we are standing on.

All that said: I get the impression reading the comments here of an emerging misconception that "everybody knows" the right way to run these languages fast. We emphatically do not know. It is still a research problem. Heck, garbage collection, a proper subset of the problem area we're talking about here, is still pretty wide open. Given the smart people who have given decades of their career to it, we should not expect there to be a single, silver-bullet technique that cleanly maps all these dynamic languages to high-performance machine code.

So we need more than one project in the air. Some of those projects should explore whole-program analysis, like HPHPc and some of Agesen's offline work on Self, and decades of Lisp compilers. Some of them should explore runtime techniques like tracing and inline-caching. Some of them should explore unifying these approaches. We need them all.

Just wanted to say thanks for answering all of the nitty-gritty questions.

HackerNews should have Reddit-like AMAs for software projects. "I work on HHVM/Linux Kernel/Scala/etc. Ask me anything..."

Well, purely emphatically, we know nothing about compilers. Since I know what a JIT is, I really like them, so I'm biased.

I think there has been so much work put into static compilers for Lisps and other dynamic languages, but there is not that much to show for it. I mean, sure, the Lisp compilers are really fast, but most of the time they get there by throwing away safety and providing type hints. Meanwhile, the work on Self was done by one research team. The same goes for LuaJIT: one guy writes something for a dynamic language and, without adding type hints, it starts to perform like crazy.

I think the Lisp community and the rest of the world really missed out on the JIT advancements. Self was already fast, but Sun picked Oak (Java) to be their web language (anybody know more about that?).

WRT Self: some of Agesen's work on Self was on doing type inference for conventional, ahead-of-time compilation, and they got pretty good results this way! More interestingly, the opportunities the AOT compiler found were different than the opportunities the JIT compiler found, so there is some hope that they will combine constructively. See, e.g., http://www.cs.ucla.edu/~palsberg/paper/spe95.pdf

Cool, a Self paper that I have not seen before, nice (actually reading all of them is another matter).

> "the opportunities the AOT compiler found were different than the opportunities the JIT compiler found"

Good to know.

"This might be getting less interesting to the general audience ..."

Anything that you can say about actual engineering details at FB would certainly be of interest to many of the people at HN. Large-scale social networks are like supercomputing. Few people get to see the inside of an operation which pushes the software/hardware envelope of size and performance.

You already got an excellent reply, but I'd also like to add that a foo -> c++ -> binary compiler chain is considerably simpler than a foo -> x86 jit. Certainly both are within facebook's capabilities, but the former is both quick to write and leverages the capabilities of the c++ optimizer and c/c++ runtime. The latter requires understanding the instruction architecture and writing your own runtime support.

Matt Might has a couple nice posts on compiling toy schemes to c or java.

Sometimes static compilation is the only way to get certified on certain devices (iPhone, Xbox, Google's NativeClient, etc.) - e.g. places where dynamic compilation (jit) is not allowed (security, or other reasons).

Not a problem for Facebook, because they only want their servers to run fast. They are not interested in making a fast portable compiler.

And it seems that the 30% creates tools so the 70% can continue writing sub-optimal code that works well.

That is actually the principle behind enterprise java, I thought. Force everything into its own isolated and replaceable small black box so that your grunt development can be done by "commodity" programmers.

This may actually be a sound business model, regardless of its unpleasant aftertaste.

For a company that size, 30% is amazing.

I don't think I'd call it amazing, but Facebook isn't as big as you'd think (especially compared to Google). It's around 3,000 total, with some fraction of that being engineers. Engineering currently makes up 64 of 402 open positions.

> Engineering currently makes up 64 of 402 open positions.

There are 64 unique job titles in Engineering (and 56 unique job titles in Technical Operations, which also includes software development job titles), but that represents a lot more than 64 (or 120) open engineering positions.

That's not to say that Facebook isn't as big as many people think, though.

(I work at Facebook.)

I would say that 30% of Facebook's engineers are top notch, and 70% are just average php and javascript developers.

Pet peeve of mine: why is the word "engineer" thrown around so much on HN? Is it meant to distinguish a certain class of people from mere coders? Or is it just a fancy word of saying "coder"?

I've always referred to myself as a coder. Should I change this?

I am genuinely curious, not trying to put anyone down.

It's how Facebook refer to their 'coders', and in many ways it's a better word because it suggests something more than just hacking [insert language] all day. For example, an engineer might be concerned about how their code fits into the bigger picture, whether there is a better way of doing things etc., rather than thrashing out code for a daily rate.

A coder is somebody who writes code. Engineering is much broader discipline than just writing the code - it means writing it well, writing it to be concise, readable, maintainable, standards-compliant, specification-compliant, reliable, well-performing, possibly scalable, user-friendly and everything else that what is essentially a robust machine needs to be. In addition, software engineers can specialise in development, testing, design/architecture, performance, UI and probably many other things I can't think of right now.

What I'm saying is that making software good is hard. It takes a lot of different skills in addition to mere coding. Actually writing the code - pressing buttons on a computer to enter text into an editor - is one stage in a long process, and all of these stages are as important as each other.

Some of these are skills which you probably have! So calling yourself a mere coder is underselling yourself.

I personally find the word coder derogatory. It is looked upon that way by someone who thinks of the act as an assembly-line creation process. Programmer and Engineer are more tasteful replacements.

agreed, and it is not only a deep piece of work, but the core infrastructure that powered their business.

Agreed. I don't like what Facebook is doing with the social network portion of its business whatsoever, but their engineering is brilliant and it's awesome for them to release something as amazing as this software.

...with the social network portion of its business...

Isn't that the entirety of their business?

Yeah, I phrased that awkwardly thinking back to it.

I find it interesting that the economics of inertia work out in favor of Facebook expending so much effort in improving the performance of a historically technologically poor language like PHP.

Jason Evans is the author of jemalloc (and more). Given all the things he would be capable of applying himself to, it's surprising to see him working on PHP runtime performance (though, the problems are interesting).

I'm not sure that I'd refer to PHP as a "historically technologically poor language". I'm not even quite sure what that means, exactly, but PHP has proven itself over and over again to be more than capable of working in environments with heavy load and numerous simultaneous users. In fact, in comparison to something like Ruby, it historically destroys it.

Now, if you want to get into the details of syntax, naming conventions, etc.. sure, PHP is not everyone's favorite language to work with, but it is far from "technologically poor".

It's always hard to talk about these differences because 90% of the time we're actually just chest beating and comparing the length of our respective community's e-penises.

When people complain about PHP they often mean that in an aesthetic, and in an academic sense, the designers of PHP have, arguably, made a lot of poor choices.

What they usually mean is, due to the low barrier of entry coupled with poor aesthetic design choices the PHP community at large seems to have poor engineering standards.

These languages all being Turing-complete, though, nothing prevents quality engineering from being applied to PHP, as is the case at Facebook.

So, the corollary to "only a poor craftsman blames his tools" is "when you only have a hammer everything starts looking like a nail".

I wonder how many people are paid to directly hack on a Ruby implementation vs Python (yay Google) or PHP (yay Facebook).

Isn't that what Matz is doing now at Heroku?

I think Engine Yard pays 3 people to be fulltime on the JRuby core team and 2 (not sure on that, certainly one) to be fulltime on Rubinius. Microsoft was supporting IronRuby until a couple years ago.

Absolutely. PHP just works and it's easy to make it work. That's the most important thing when choosing a platform. Aesthetics are always secondary. It's certainly a pain to deal with inconsistent syntax, but fighting your toolchain is a far worse fate and with PHP, that's rarely an issue.

It doesn't "just work" for Facebook though; look at all the effort they have exerted to be able to use it. They don't get the main benefit of PHP, but still get all of its flaws.

Nothing out there "just works" when you are at Facebook scale.

For a hack, maybe.

Fighting your toolchain is an upfront penalty. Once things start moving smoothly, they tend to continue to work smoothly. And that's a problem that can be solved by further development of the toolchain.

Poor aesthetics/readability? You pay that price every second you work on the code.

> Once things start moving smoothly, they tend to continue to work smoothly.

This is flat out wrong. If you're on a language all the cool kids are using, you'll find those cool kids have an obsession with perfection. To the point where they deprecate things instantly and break everything that worked before. See NodeJS and Python for examples.

If you have 10K lines of code, not so bad. So you do a rewrite every year. When you have 100K lines of code or more, their obsession with perfection will destroy your business.

A rewrite becomes an immense undertaking. You end up running old versions. Those old versions require dependencies which require you to run old operating systems. Eventually the entire ecosystem implodes.

But guess what? Code that was written in PHP3 over 10 years ago works just as well today under PHP5. You can even mix and match. Does that make the syntax of the language ugly? Sure. It even encourages bad practices. So what? You're running a business, not an art gallery. Code should be beautiful, but not at the expense of having a successful business and a working product.

> But guess what? Code that was written in PHP3 over 10 years ago works just as well today under PHP5.

Sorry, but having felt the pain of keeping legacy installs of old PHP versions around to run business-critical processes, I have to call bullshit on that one.

> legacy installs of old PHP versions around to run business-critical processes

Why? Your code should work in the latest version. Can you give an example?

But, I don't use the languages those cool kids are using. :) I imagine I would have to formulate a very defensive strategy to deal with breaking changes that would gobble up my productivity gains from using said cool language, just like the people that use Mongo have to DIY the checks that RDBMSes get for free.

I use Java and C++, and I don't rush to update the second new libraries become available. My team has a large regression suite, too, so it's been less painful (can't say painless) to detect breakage when our dependencies need to be updated.

> I use Java and C++

Comparing Java and C++ to PHP is the equivalent of comparing a Ford Mustang to a Mac Truck. Yes, they are both vehicles. However, they are designed to do very different functions. Java and C++ are statically typed, compiled languages. PHP is not.

Yes, if you drive a Mac Truck, you can bully and make fun of the guy driving a regular car, but if you're using your Mac Truck to commute to work, you're a lunatic.

For every problem, a proper tool.

When you're trying to ferry boatloads of bits down the information superhighway, the Mack truck starts to look pretty good. Sure, the automatic transmission Mustang is cheap, quick to acquire, and easy to find drivers for, but you have to get 10 of them for every Mack truck's worth.

Given that FB's solution to their PHP problem was to discover the static typing, I'd say that PHP is not the proper tool for the current job.

Mack Truck

"See NodeJS and Python for examples."

What's an example of the Python community doing this? Usually Python gets criticised for being overly conservative, if anything.

Funny, my experience is that learning to read the idioms (and write the most convenient idioms) of the language is an upfront penalty, and once you've made the investment things start moving smoothly.

Of course, it's everyone's privilege to decide which penalty they'd prefer to pay, and that doesn't mean there aren't language features that can put it on a new order of expressiveness.

But are we talking about those features? Or are we talking about "aesthetics/readability"? If it's the latter, that's probably a different thing, and I have my doubts that it's anything objective. Particularly if it has much to do with which specific tokens indicate the beginning/ending of blocks or enclosing arguments to a function.

(You know, the funny thing with downvotes is that you never know if you just made someone uncomfortable, or if there's a credible argument against what you just said. Though one tends to assume that if the latter were true it would have been employed along with -- perhaps even instead of -- the downvote.)

The best way to deal with a downvote is to assume that it was an accident.

I'm referring to the quality of the implementation, its performance profile, the soundness of the type system and syntax, and -- like the original post -- measuring relative to the state of the art in JITs, including the JVM.

If the JVM is state of the art, how many applications of the scale of Facebook are using it? A serious question, as I don't know of any.

The JVM is one of the most advanced production-quality JITs available.

As for who is using it, everyone from Google to Twitter to Apple, as well a huge number of high-transaction business systems. The JVM is something that tends to get used quietly and widely.

It's also in my experience a massive memory hog, makes deploying your web application a pain (restarts anyone?), and is much more complex to develop for. If you're Facebook and you're trying to get more bang-for-your-buck from your existing hardware, do you think it's a good choice to migrate everything to the JVM (including a full rewrite to Java/JRuby/Scala etc.)?

> It's also in my experience a massive memory hog

It's not that Java is a memory hog per se, but rather that the total VM memory allocation must be specified at launch, and the VM will not (rarely?) give that memory back to the OS.

This is largely an artifact of Sun's generational GC design+implementation, and has some interesting performance wins; namely, object allocation and deallocation are incredibly cheap, because in the fast case, all allocation requires is updating a single pointer.

The last time I profiled Java object allocation, creating short-lived objects turned out to be insanely cheap compared to other allocators/runtimes.
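
For readers who haven't seen it, the "updating a single pointer" fast path looks roughly like the following Python toy. `BumpAllocator` is a hypothetical name, and this models only the allocation side of a generational nursery; it is not how the JVM is implemented and it does no collection of its own.

```python
class BumpAllocator:
    """Toy nursery: allocation is a bounds check plus one pointer increment."""

    def __init__(self, capacity):
        self.heap = bytearray(capacity)   # reserved up front, like a fixed-size heap
        self.top = 0                      # the single pointer that allocation moves

    def allocate(self, size):
        if self.top + size > len(self.heap):
            raise MemoryError("nursery full; a real collector would run a GC here")
        address = self.top
        self.top += size                  # fast path: bump the pointer
        return address

    def reset(self):
        """Discard everything at once, as after survivors have been copied out."""
        self.top = 0


nursery = BumpAllocator(64)
first = nursery.allocate(16)
second = nursery.allocate(24)
print(first, second, nursery.top)
```

Freeing short-lived objects is equally cheap in this model, since `reset` reclaims the whole region without visiting individual objects.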

> Makes deploying your web application a pain (restarts anyone?)

This largely depends on how you structure your web application.

> If you're Facebook and you're trying to get more bang-for-your-buck from your existing hardware, do you think it's a good choice to migrate everything to the JVM (including a full rewrite to Java/JRuby/Scala etc.)?

That depends largely on how much they're spending on hardware and JIT/language engineers, versus how much a rewrite would cost, amortized over the reasonable lifetime of their code base.

You have to factor in the fact that they've built an entire organization around PHP, and it's not just a technological migration problem; they might also have to retrain or outright replace a large number of existing engineers.

I'm inclined to say that it was a mistake to choose PHP, and moreover, to continue to use PHP as they scale. However, funding a better PHP runtime implementation may be the most fiscally conservative decision left available to them.

> the performance of a historically technologically poor language like PHP

One of the biggest reasons PHP supplanted Perl as the de facto web programming language ~10 years ago was that Perl running through CGI was incredibly slow compared to PHP.

Sorry, but that's just historically inaccurate. PHP _10_ years ago was popular because it was a templating language that could be embedded in HTML, in which case it supplanted SSI and really was a competitor to ASP.

mod_perl was spanking PHP 10 years ago from a performance standpoint, but the democratization of the web meant that a lot of people that started with basic HTML layout then learned to extend it with PHP in templates, then moved on from there. Much the same as Perl became a dominant CGI language in mid/late 90s because a lot of sysadmins and hackers transitioned to back-end development due to web programming demands.

No, it's not historically inaccurate. mod_perl was fast, but no shared hosts offered it. So unless you wanted to shell out big bucks for a dedicated server, mod_perl wasn't much of an option. And if you wanted to write software to give away or sell to others, mod_perl certainly wasn't much of an option unless you wanted to only focus on a very small part of the market.

I personally transitioned from writing Perl CGI scripts to writing PHP scripts about (pause while I look through my records) 9 years ago, and it was exclusively for performance reasons (I still prefer Perl as a language). I was not alone in my thinking back then.

As an aside, there are an enormous number of ways to deploy a modern perl application. For anyone who is curious, see slides 30 through 79 of http://www.slideshare.net/miyagawa/deploying-plack-web-appli...

Perl had a reputation for being more complex for a beginner too and leading to code that was difficult to maintain.

I always thought it was because basic PHP was so intuitive for anyone familiar with a C-like language.

There's a ton of reasons PHP took off, including off the top of my head

- cruftiness of Perl

- good timing, few alternatives (C and Perl basically) at a key point in the growth of the internet and web developers.

- very easy migration from a static only site

- easy to deploy, leading to...

- wide availability of PHP shared hosting

- in depth docs

- commenting on docs (cut-and-paste programming ;))

- thousands of builtin functions: first real batteries-included language

- recognizable syntax (esp vs Perl) for Java/C programmers

- not awful performance

- builtin MySQL support from an early stage

- oh, it was free (and also Free)

- dynamic typing

- weak typing (as in, values coerced to other types easily, I know this isn't a real term)

> - commenting on docs (cut-and-paste programming ;))

Very true and rarely mentioned. PHP docs never had the fancy wiki/social/javadoc features you see in many languages, just a primitive comments system - and it was perfect.

When you were picking it up back when "PHP3" yielded 0 results in Amazon (and we had to change the oil on our desktops every week) the docs were your bible, not merely in the sense that they occasionally contradicted themselves but also in having examples, Q&A and recipes for common tasks posted by your peers in the comments, much faster than doc writers could catch up with the language's growth.

> good timing, few alternatives (C and Perl basically) at a key point in the growth of the internet and web developers.

Two web language competitors to add here: MS .ASP and ColdFusion. PHP broke out by being both free and Free (the REPL helped too).

mod_perl was there; it performed better but had a slightly larger memory footprint.

I don't know why mod_perl was known to so few while PHP became popular.

mod_perl required real programming techniques due to the shared namespace. So many cgi scripts were written with no localization and so many programmers of that era had embraced the sloppiness of lack-of-persistence that it was hard to port existing cgi to mod_perl even assuming they bothered to try and grok the "voodoo" of persistence in a single finite process CGI world.

Shared hosting services offered php. You can't do shared hosting with mod_perl.

Umm, actually you can... it just requires some skills on the apache config side.

mod_perl also had serious performance and memory leak issues.

PHP sucked at the time too, but it sucked in ways that were easier to track down and fix.

They've spoken to this directly. Facebook has tried to migrate to something else (generally with a small team leading the charge) a few times.

Let's say that there's 1M lines of PHP code right now handling Facebook's site. In the time it takes the team to re-write those million lines of code, another million have been written. So they can never catch up. Almost like Zeno's paradox, except the turtle moves faster :).
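
The catch-up arithmetic can be written down in a few lines of Python. The function name and the monthly rates below are invented for illustration; only the 1M-line starting point comes from the comment.

```python
def months_to_catch_up(backlog_lines, rewrite_rate, growth_rate):
    """Months until the rewrite covers the whole codebase, or None if it never does.

    Rates are in lines per month; the backlog shrinks by (rewrite_rate - growth_rate).
    """
    net_rate = rewrite_rate - growth_rate
    if net_rate <= 0:
        return None
    return backlog_lines / net_rate


# Rewriting exactly as fast as new code appears: the gap never closes.
print(months_to_catch_up(1_000_000, 50_000, 50_000))
# Rewriting 50% faster than new code appears: the gap closes, but slowly.
print(months_to_catch_up(1_000_000, 75_000, 50_000))
```

Under these made-up rates the second case still takes 40 months, which is the practical version of the paradox: the rewrite team has to outpace everyone else by a wide margin for the effort to finish in reasonable time.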

All that aside, you can teach PHP to pretty much anyone, and it's rather effective at solving web problems.

For quite a while they had some really smart people working on APC to help with performance, they've since switched horses to work on HipHop and the like to get the performance they need.


Since they're hiring so many Googlers one presumes Python would be a top choice, but if you need an opportunity to whine about Ruby/Rails go for it.

Good to see that Facebook's engineers still have a startup-y sense of humor.

By which you mean juvenile. I don't mean that in a condescending way. No one wants to work with a bunch of uptight folks. I think it's good too, but let's call a spade a spade.

Believe it or not, my $hit meter was too insensitive to notice the visual similarity to a four letter word until it was repeatedly pointed out by readers. Now that we're all on the same wavelength, let's call a spade a shovel, or even a $hovel, and, well, now we have $hovel and $hit to work with. Imagine the possibilities.

I would prefer juvenile and brilliant (but productive) programmers to boring men in suits sitting in dark cubicles, lifelessly writing code day after day, and who have no sense of humor or fun whatsoever.

Interesting that so far most of the comments are pretty positive. Every time Facebook gives a HipHop talk at my university, it seems the general feeling in the room is "ugh, why?" The PL people wonder why they're bothering with such a broken language, and other language zealots get angry that they aren't using [insert-favorite-language-here].

I wonder if there is a point where they can't squeeze anything more out of HipHop and have to do a drastic rewrite.

The thing is, when they can't squeeze any more out of HipHop, the last thing they are likely to do is use one of those [insert-favorite-language-here], because HipHop will be so much faster than any implementation of those other languages. An important point to remember is that languages, as interesting as they are, are simply means to translate ideas into machine code. If they translate it into really fast machine code, it doesn't really matter that much (except academically) how "correct" the language is. I do not enjoy writing in PHP, but I have a lot of respect for the real world work that Facebook is doing with it.

If you want fast, why would you use PHP and then write a new VM, when you could just use Java or Common Lisp or Haskell or C? Those already exist and are already damn fast.

You could also write your software in assembly and it'll be faster than all of those you mentioned. The point is to select a language that is feasible for a business. Facebook is not an academic exercise. PHP has a lower barrier to entry, which in practical terms reduces the wages that need to be paid. You then get a few guys who are great at C (PHP is written in C) and fix whatever parts are getting in the way of great performance. Which is exactly what Facebook is doing. Remember, Facebook is a business.

You're getting downvoted by people who disagree with you, not because you aren't contributing to the discussion. I just wanted to make it clear that they are wrong for doing that, and that your comment, whether I agree or not, is appreciated.

Maybe if it's a video decoder that uses special video decoding instructions. For real world code, though, a good JIT is going to make most code very fast, and you will actually be able to maintain it. I don't think Java is any harder to maintain than PHP, but it runs much faster. (I think Perl and Python are much easier to maintain, however, so if that was what we were arguing about, then it would make sense. PHP is objectively the worst "P" language.)

You're missing my point so completely that it becomes a fine example of why engineers make poor businessmen.

The best business solution is the one that accomplishes the task at hand with the least resources. By that metric, PHP is far superior to Java, regardless of the technical merits at hand.

It costs less to hire an army of PHP devs and a small team to optimize PHP, than it does to hire an army of Java developers.




The difference in pay is around $20K. If you have 500 developers, that's a difference of $10 mil per year. Over the span of a decade, you're talking about over $100 million in savings.
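The arithmetic behind that claim, as a quick Python check. The $20K gap and the 500 developers are the figures from the comment above and are not verified; `payroll_savings` is a hypothetical helper name.

```python
def payroll_savings(pay_gap_per_dev, developers, years):
    """Total savings if every developer costs pay_gap_per_dev less per year."""
    return pay_gap_per_dev * developers * years


per_year = payroll_savings(20_000, 500, 1)
per_decade = payroll_savings(20_000, 500, 10)
print(f"${per_year:,} per year, ${per_decade:,} per decade")
# $10,000,000 per year, $100,000,000 per decade
```

With a flat headcount and a flat pay gap this comes to exactly $100 million over ten years; anything "over" that assumes the team or the gap grows.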

As I said, Facebook is a business, not a pet project or an academic exercise. Decisions on which platform to use and keep are made based on dollars and cents ... not on which is more elegant or technically superior.

> It costs less to hire an army of PHP devs and a small team to optimize PHP, than it does to hire an army of Java developers.

Is that what Facebook is doing? Everything I've read about interviewing and getting hired at Facebook leads me to believe that they are not hiring an army of cheap PHP developers.

Edit: to avoid a totally uninformative comment, as I see no links to previous discussions, this one: http://news.ycombinator.com/item?id=3335217

The 2 URLs in the parent show the limit at which adding an ellipsis sometimes makes the inner text of links longer instead of shorter.

My impression is that Facebook has plenty of people who would not find learning Haskell, OCaml or Lisp particularly difficult, and would thereafter be more productive than in PHP. In fact, I know for a fact that some of their internal tools are written in Haskell and some variant of ML.

If there is any company that should not be worried about a language's barrier to entry or trying to get cheap programmers, it's Facebook--my impression is that they have an engineering team more like Google than your favorite nameless corporation.

I think what Facebook is doing is working with a ton of legacy PHP code. In fact, from what I've heard, they're definitely open to using other languages in new projects; it's just that rewriting all the PHP code they already have would be horribly wasteful.

Fully agree, given the code base that exists the best ROI likely comes from improving execution speed of PHP rather than rewriting the codebase.

Facebook does use Java (entire Hadoop infrastructure, HBase deployments, likely more), Haskell (AST manipulation, afaik), OCaml (with a fairly interesting library for static analysis: https://github.com/facebook/pfff ), C++, and C (fairly self explanatory).

The explanation they seem to give is that there are a lot of people who can write PHP, and can write PHP code very fast. Because PHP allows their engineers to iterate and prototype new features very quickly, any attempt to convert to a new language is thwarted: those writing in PHP outproduce the conversion process by so much that it will never catch up.

I'm not saying it's a great explanation, but it is what it is.

If the people you find to write code aren't capable of learning something other than PHP, you should find new people. People who write code for Facebook should be good at programming, not just "good" at PHP.

But the language itself is conducive to writing a lot of code rather quickly. There is no memory management to worry about, there isn't a bunch of verbose boiler plate like in Java. You just write code and even if it's not perfect, you can iterate on it quickly.

That's not to say that's how I'd run Facebook given the opportunity, but from talking with people who have presented HipHop here, these are the reasons they give.

My understanding is that Facebook rarely hires PHP developers, it hires other developers and teaches them PHP.

This is based on a conversation from 2009, when they had just 200 programmers, so it is likely no longer true. However, I think it's highly likely that Facebook developers know a lot more than PHP.

Because you can get a web application up and running quickly in PHP, then tune it later if and when your application needs to scale to 500 million users (which it probably won't, because only a handful of sites reach that size).

There's also a far bigger pool of talent to draw from if you want developers. I don't think there are many people who know how to write web applications in Common Lisp or Haskell.

Facebook really doesn't care what languages new hires know. Only a small number of people I started with had any experience with PHP. Anyone who works here is capable of becoming proficient in the language during bootcamp (http://www.quora.com/How-does-Facebook-Engineerings-Bootcamp...).

Facebook already got started.

Let's be honest. 99% of the people that start a PHP project have it fail before it ever needs to scale. The people that do need to scale, though, are up a creek, because if you started your project with PHP, you probably don't know how to write a better VM for it. Consider, as proof, the fact that there is no better VM for PHP.

Yes, there is no gigantic pool of experienced Haskell or Common Lisp developers.

However, I suspect Facebook has a disproportionate amount of them. Additionally, anybody working at Facebook is very likely able to pick up Lisp or Haskell relatively easily even if they do not know it already.

Facebook is exactly the sort of company that does not look for developers of a particular language. I know a bunch of people who have interned there and several (ones a year or two ahead of me) who are going to work there full time. None of them had any PHP experience. Some of them still don't (the one I know best used C++ there, if I remember correctly).

I think the PHP at Facebook is nothing more than a legacy of how it started.

I believe this is a fallacious argument; performance inefficiency introduces significant costs at lower scales, too.

>Because you can get a web application up and running quickly in PHP

You can get a web app up and running quickly in good languages too, and then you also don't have ongoing maintenance nightmares and the massively increased hardware and hosting costs that come with PHP.

I don't know where you're getting this "massively increased hardware and hosting costs" from - we run our trading platform on bog standard x86_64 hardware, as has every other PHP site I've worked on (some even run in VMs with 500MB of memory).

Because you are buying 10 times as many servers, and paying for power for 10 times as many servers. Where on earth did this "bog standard x86_64 hardware" business come from?

It's not like those requirements arose all at once; at first they wanted to get stuff done using tools they already knew. Then, somewhere north of a million LoC they were like "oh right, speed". At that point, it's no surprise that building a VM was cheaper than a full port.

Why? HipHop is already faster than your Dad's fancy sport car.

Their logging server Scribe, which they also open sourced, is equally awesome. Kudos to Facebook for releasing such cool technology.

The discussion of this over on Ars Technica centered around their current PHP interpreter being twice as slow as the Zend PHP engine so a 60% improvement would still leave HHVM slower.

What am I missing?

Ignoring the fact that these are all running on the standard PHP interpreter, not one of Facebook's creation...

The benchmarks linked ran on a 4 core machine with varying amounts of parallelism, generally with Java being more parallelized. When running a server farm, as Facebook does, aggregate CPU time is more important than wall time^, since one can easily scale the number of processes to fit the machine. By CPU time the difference is down a factor of two at least, possibly more.

Be careful with benchmarks. They distill a comparison into a single value for easy use, but it's not going to be the most appropriate metric in all cases.

^So long as wall time is "sufficiently small", which it generally is.
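A small Python sketch of the CPU time versus wall time distinction. The timings below are hypothetical numbers picked for illustration; they are not taken from the benchmark site, and `summarize` is a made-up helper.

```python
def summarize(cpu_seconds_per_core, elapsed_seconds):
    """Aggregate CPU time is the sum over cores; wall time is what a stopwatch sees."""
    return {"cpu": sum(cpu_seconds_per_core), "wall": elapsed_seconds}


# Hypothetical runs of the same task (illustrative numbers only).
parallel = summarize([3.0, 3.0, 3.0, 3.0], elapsed_seconds=3.2)  # spreads over 4 cores
serial = summarize([10.0], elapsed_seconds=10.1)                 # stays on 1 core

wall_ratio = serial["wall"] / parallel["wall"]  # how much faster the parallel run looks
cpu_ratio = serial["cpu"] / parallel["cpu"]     # how the two compare in CPU consumed
print(f"wall-time ratio: {wall_ratio:.2f}x, CPU-time ratio: {cpu_ratio:.2f}x")
```

In this made-up case the parallel program wins by about 3x on the stopwatch yet burns more total CPU, and on a farm that already keeps every core busy the second number is the one that decides how many machines you buy.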

> a 4 core machine with varying amounts of parallelism

Let's force the programs onto one-core.


> aggregate CPU time is more important than wall time

Both CPU secs and Elapsed secs were shown.

> Be careful with benchmarks. They distill a comparison into a single value for easy use, but it's not going to be the most appropriate metric in all cases.

afsina pointed to a program-by-program comparison - not something distilled down to a single value.

Meta comment: HN is becoming an incredibly sad place when the developer of the shootout benchmark site makes a technically valid and polite point - and people mark him down simply because they don't agree with it.

And the irony is the shootout site is written in PHP - so it's not like he is especially biased against it.

So check the single core version: http://shootout.alioth.debian.org/u32/benchmark.php?test=all...

Oh look, PHP is still 30 times slower than java. A 60% improvement to that still won't get it within an order of magnitude. And PHP is not significantly faster or easier to develop in than java, so you are losing execution speed to gain nothing.
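A quick Python check of that arithmetic, under both common readings of "60% faster". The 30x figure is the reading of the benchmark page given in the comment above (and those runs used stock PHP); `remaining_gap` is a hypothetical helper.

```python
def remaining_gap(slowdown, improvement):
    """Return the slowdown factor left over under two readings of 'X% faster'."""
    throughput_reading = slowdown / (1 + improvement)  # does 1.6x the work per second
    runtime_reading = slowdown * (1 - improvement)     # takes 60% less time per task
    return throughput_reading, runtime_reading


by_throughput, by_runtime = remaining_gap(slowdown=30, improvement=0.60)
print(f"{by_throughput:.2f}x or {by_runtime:.2f}x slower")
# 18.75x or 12.00x slower
```

Either way the gap stays above 10x, so the "not within an order of magnitude" part holds for the numbers as quoted.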

> And PHP is not significantly faster or easier to develop in than java...

Yes it is easier and faster to develop in (speaking as someone who has done both commercially). Examples of productivity gains:

1. No need to compile your code. Build cycle vastly improved (no time wasted with Ant/Maven/take your pick).

2. Less verbosity.

3. No need to restart your web server every time you deploy some new feature, just hit F5 on your browser and you're done.

4. No classpath/jar file fun to slow you down.

5. You can hack a code fix in situ on a dev/server (or production if you're brave/insane, I did say hack!). All you need is SSH.

6. Connecting to a database is trivial, support for most databases built in (no messing about with JDBC).

7. Deployments are a breeze: just drop the source code on your server, and you're done.

I could go on.

> so you are losing execution speed to gain nothing.

I wouldn't call a much faster development cycle "nothing", and if you understand Facebook's culture of "move fast" as laid down by Zuckerberg, they value development time higher than execution time (and rightly so).

* Edited to add line returns.

if you've done Java commercially you've used sub-par tools.

1. any decent IDE will compile as you type

2. I'll give you this one

3. Java supports dynamic class reloading, it works out of the box with jetty.

4. you just press "run" in the ide.

5. same applies to java, just requires you take your (automatically built) .class file

6. I'm not sure how difficult you think JDBC is.. you just give it the connection string and the driver name.

7. just drop the war into your servlet directory, and you're done

> if you've done Java commercially you've used sub-par tools.

I spent 3 years doing Java on large-scale telco projects using IntelliJ; I'm not an amateur Java dev. I now code PHP full-time on a social network.

> 1. any decent IDE will compile as you type

Fine, but my original point was about building for a production environment.

> 2. I'll give you this one


> 3. Java supports dynamic class reloading, it works out of the box with jetty.

You using Jetty with dynamic class loading enabled in production?

> 4. you just press "run" in the ide.

Once again, your local dev environment has nothing to do with production servers (where you obviously won't have an IDE).

> 5. same applies to java, just requires you take your (automatically built) .class file

And what if that .class file is buried in a .jar somewhere, which is in turn buried in a .war somewhere else? Not quite as easy as opening a .php file with vi on the terminal, is it?

> 6. I'm not sure how difficult you think JDBC is.. you just give it the connection string and the driver name.

...and the driver has to be on the class path, and the driver might have local binary dependencies (depending on the vendor), and you have to be careful to ensure that you are using the correct driver .jar version for your version of the database, and ensure that it is packaged up with all of your other project dependencies when you build your .war/.ear etc.

> 7. just drop the war into your servlet directory, and you're done

...after you have built the .war using a build technology like Ant or Maven (I refer you back to my original point 1 about the Java build/deploy cycle being more complex).

so the war is just built trivially by the IDE when you use a webapp type project. no messing around with the classpath as the war contains all dependent jars.

if you want clean builds you set up something like jenkins (with the automatically generated exported build files).

1) and 3) I was referring to development, you can just code away and hit F5 to see your results the same way in Java you can with PHP.

wrt database jars/binaries: a modern IDE does this all for you too, and is all output into the target war.

You could go on? You are at 1/7 so far, perhaps you should quit while you're ahead. How many decades ago were you developing in java professionally? You don't manually build your code, you don't restart your development web server, are you doing your java development in notepad or what? I've been the sysadmin for big java projects, and a developer on them. I've never had any problems with DB access or "classpath/jar file fun" or deployment in either role. I genuinely have no idea what you are even talking about with the "classpath/jar file" thing, it honestly sounds like something someone who has never used java before might make up.

The only valid point you have is that java is slightly more verbose. But of course, it also prevents a number of bugs that slip into PHP code bases, which I find balances it out. And keep in mind, this is coming from someone who finds java offensive, and refuses to use it any more. You're not dealing with someone who is in love with java, but rather someone who thinks java is a terrible language that should never have been made. It is simply not as bad as PHP.

Java is a compiled language. PHP is not. If you need to do a benchmark to realize that a compiled language is faster than its dynamic counterpart, then I really don't know what else to say to you.

Some compiled languages are faster, some aren't. Speed of execution is largely related to the optimizations the interpreter / compiler can do, and in many practical cases the qualities of the default library. The other problem is that many interpreted languages are implemented in C leading to the usual performance implications of inner-platform effect. Another problem is the underlying CPU architecture, the vast majority of CPUs are designed to execute C-like languages.

What is your point? That does not change the fact that using PHP for a huge site like Facebook is a poor choice. A very poor choice indeed. They must be wasting energy on hundreds of servers because of this choice. One can hope that they are using PHP only for the very basic front-end operations; otherwise they are royally screwed.

I don't know why you are saying anything to me since you are responding to something I didn't say and that has nothing to do with the post I made. Is "dim witted attempts to respond to poorly thought out strawmen" really the level of discourse here?

What? Who are you responding to?

You. You may want to consider that if there's two replies to your post, and both are "your post is entirely irrelevant", then perhaps your post genuinely is entirely irrelevant. Protip: read the whole thread from the top.

Where did I say "dim witted attempts to respond to poorly thought out strawmen" or "your post is entirely irrelevant"? Either you're confusing me with someone else or you don't understand the concept of quotes.

Again, read the entire thread. I am not confusing you with anyone else, and I understand the concept of quotes. You are failing to grasp the context of posts, and assuming everything anyone puts in quotes must be a statement you made. There are other people who post, why on earth would you think "your post is entirely irrelevant" is something I claim you said? I very clearly pointed out that two people replied to you saying that your post is irrelevant.

> I very clearly pointed out that two people replied to you saying that your post is irrelevant.

So you're just repeating what other people said? To make yourself feel better? Still a bit confused.

>Still a bit confused

Clearly that is not a situation I can remedy for you. I suggested you read the thread, as that would resolve the confusion for anyone with basic reading comprehension skills. If that is not sufficient for you, then you'll need to ask an adult to explain the words you don't understand.

I'll probably get dinged for making a sophomoric observation.

Their example code makes use of the variable $hit, performs a var_dump($hit) and then returns $hit. Am I the only one that noticed this?

How easy/hard/logical would it be to implement this for a medium-size website with about 250k monthly visits?

I don't think it's logical. Stock PHP can easily handle a website with 250k monthly visits. Or, rather, your speedups will most likely come from talking to the database less, doing less IO, that sort of thing. CPU probably isn't your bottleneck now.
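For what "talking to the database less" looks like, here is a minimal Python sketch of in-process caching with the standard library's `functools.lru_cache`. The `fetch_profile_from_db` function is a hypothetical stand-in for a real query; a PHP site would more likely reach for APC or memcached, and a real cache also needs an invalidation story.

```python
from functools import lru_cache

QUERY_COUNT = 0  # counts simulated database round trips


def fetch_profile_from_db(user_id):
    """Hypothetical stand-in for a real database query."""
    global QUERY_COUNT
    QUERY_COUNT += 1
    return {"id": user_id, "name": f"user{user_id}"}


@lru_cache(maxsize=1024)
def get_profile(user_id):
    """Serve repeat lookups from memory instead of querying again."""
    return fetch_profile_from_db(user_id)


for _ in range(100):
    get_profile(42)  # 100 page views of the same profile

print(QUERY_COUNT)  # 1
```

One hundred views cost one query here, and no interpreter speedup gets you a 100x saving on that path.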

Fair enough answer. Thanks! I'll report back when we hit 800m. ^^

The ironic thing about Facebook releasing this is that they (and a handful of other sites) are the only ones that would really benefit from it. I do love that they release it as open source.

At my last company, we handled about 20 million API requests a day through a PHP API. Granted, the request and response were smaller than a standard webpage, but that was using stock PHP (hell it wasn't even custom compiled, it came straight from the Debian packages).

So, you need to hit a pretty huge amount of traffic before something like this makes sense. Of course, it's always neat technology to play with.

Not everybody using PHP is serving web pages. Zynga use PHP to do server side game simulation, I expect there are a myriad of smaller companies doing CPU intensive tasks in PHP that will welcome these developments.

Surely you used opcode caching?

Presumably he just installed the php-apc package

Even when you hit 800m, you'll be fine. Your performance gains will be outside of PHP via good cache usage (opcode caching as well, but that should go without saying), avoiding hitting the disk or the DB, etc.

More details: That's about one hit every ten seconds. While I know that you'll get bursts, you still shouldn't be exceeding 10 hits per second very often. Any web framework can handle that; the reasons why you can't sustain that or can't get good performance out of that are going to be found in your code, not the framework code, and linear speedups probably won't be the difference between "working" and "not working".
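The numbers behind that estimate, as a short Python check. It assumes a 30-day month and treats one visit as one hit, which is a simplification; the helper names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def seconds_between_hits(monthly_visits, days_in_month=30):
    """Average gap between hits if traffic were spread perfectly evenly."""
    return days_in_month * SECONDS_PER_DAY / monthly_visits


def hits_per_second(monthly_visits, days_in_month=30):
    """Average request rate over the month."""
    return monthly_visits / (days_in_month * SECONDS_PER_DAY)


print(round(seconds_between_hits(250_000), 1))  # 10.4
print(round(hits_per_second(800_000_000)))      # 309
```

So 250k visits a month averages roughly one hit every ten seconds, while 800m a month would average around 309 hits per second before accounting for bursts.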

Are you finding that your visits are starting to actually make a performance hit on the servers? For that kind of scale, assuming your code isn't doing weird things, the lowest tier Linode VPS or similar should be more than capable of serving the pages.

No, we are not experiencing any trouble at all. I'm just always looking for new ways to optimize and keep me and my colleagues busy. Playing with this kind of stuff seems like fun.

Others answered the logical part of the question. I would be interested in how easy/hard it is. If anyone tried to get it running.

I have two reactions:

1. it's still PHP

2. why are they still using PHP in a world where things like Python, Ruby and Java exist?
