I have no love for Facebook's product or business model, but their engineering is diesel, and it's very cool that they're sharing such a deep piece of work.
I actually view Facebook as the opposite of what you're describing. VMware, for instance, really did have a bimodal engineering organization; some pieces were amazing, but many were so-so. That can have its strengths, too, and it has worked out ok for VMware to date, but Facebook has the absolute opposite philosophy. For instance, you don't have a "hiring manager" here, because the selection process is decoupled from the allocation process. This means a uniform bar across groups; e.g., I interview people destined for all different parts of the stack, and I expect the same level of skill for all of them.
Wasn't it kind of a waste of time to develop a static compiler plus an interpreter (while ending up writing a JIT anyway)? Why did you not start out with a JIT? The problem of static compilation of dynamic languages (especially PHP) was well known (phc!). The kind of JIT that is described in the post does (as far as I can see) nothing that was not already known when HipHop was originally developed.
What kinds of optimization are you guys doing? Since the scope of your optimization is only one basic block (did I understand that right?), how much do these optimizations actually help compared to unoptimized translation to assembly?
Have you thought about using the PyPy toolchain? The PyPy JIT already does quite a lot of optimizations (not to mention the much wider scope for optimization with "real" traces), and it would be much less work to implement.
The only optimizations we're doing are those that bottom-up type inference makes possible; so, we can generate code that knows that strings are strings, arrays are arrays, ints are ints, doubles are doubles, etc. This doesn't sound like much, but avoiding the double-dispatch on things like $a + $b is significant. We're still using the static compiler's front-end, so all of the things it can do (e.g., CSE, constant-folding, deduplicating identical data, compile-time binding of functions and methods) we hope to "inherit" when running in production mode.
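To make the double-dispatch point concrete, here is a rough Python sketch contrasting a generic `+` that inspects both operand types on every call with a specialized version picked once, when inference has already proved the types. All names are hypothetical and the coercion rules are heavily simplified relative to PHP; this is not HHVM's code.

```python
from typing import Callable, Union

Number = Union[int, float]

def to_number(v: Union[int, float, str]) -> Number:
    """Simplified PHP-style numeric coercion (ignores leading-numeric strings, warnings, etc.)."""
    if isinstance(v, str):
        try:
            return int(v)
        except ValueError:
            return float(v)  # in this sketch, non-numeric strings raise ValueError
    return v

def generic_add(a, b) -> Number:
    """Slow path: inspect both operand types on every call (the 'double dispatch')."""
    if isinstance(a, str) or isinstance(b, str):
        return generic_add(to_number(a), to_number(b))
    if isinstance(a, int) and isinstance(b, int):
        return a + b  # real PHP would also check for overflow into a double here
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return float(a) + float(b)
    raise TypeError(f"unsupported operands: {type(a).__name__}, {type(b).__name__}")

def add_int_int(a: int, b: int) -> int:
    """Fast path emitted when inference has proved both operands are ints."""
    return a + b

def add_double_double(a: float, b: float) -> float:
    """Fast path for two doubles."""
    return a + b

def select_add(type_a: type, type_b: type) -> Callable:
    """Pick a specialized implementation at compile time, else fall back to generic_add."""
    specialized = {(int, int): add_int_int, (float, float): add_double_double}
    return specialized.get((type_a, type_b), generic_add)

print(generic_add("1", 2))
print(select_add(int, int)(2, 3))
```

The point of the design is that `select_add` runs once at translation time, so the type checks inside `generic_add` disappear from the hot path whenever the types are known.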
PyPy is a very interesting project, with a beautiful approach. I'd be very, very curious to see what an HHVM interpreter written in RPython could do under PyPy. Unfortunately PHP is not Scheme; it is a big language, with many non-orthogonal parts, so it would be a significant effort to answer this question. If I were starting over today, I'd take a much closer look at the PyPy approach, but that is just my opinion.
> This might be getting less interesting to the general audience, so we can take it offline if you have any more questions.
Well, I think if you have a task like that, a team should look into what other people were doing. Self, LuaJIT, Parrot, some Smalltalk systems and Strongtalk were all already done or on the way in 2008. In a talk about phc (https://www.youtube.com/watch?v=kKySEUrP7LA), Paul Biggar talks about why AOT might be better than JIT. His reasoning was that JITs suffer from bad startup times, and PHP is a language where the common use case is scripts that only run for a short time.
Anyway, the static compiler certainly wasn't a bad way to get some speed fast.
Getting some types in there is probably the best thing you can do :) Where can I get more information about tracelets? I have never seen literature about them. Did you guys invent that yourselves?
I have been thinking about something similar. If you have a language that is dynamic but suited to static compilation (Dylan, for example), one could write a compiler that does all kinds of optimizations and then spits out bytecode that is suited to the JIT. Then at runtime you let the JIT do the rest. I'm not sure if this is such a good idea, because the static compiler would take away some opportunities for the JIT to do an even better job. Thoughts, anybody?
That's the problem I always have: for every language, I think about how well it would work with PyPy. Somebody started to work on a Clojure implementation in PyPy and I want to start hacking on that. Never having done Python is holding me back a little.
How long, and with how many people, have you been working on HHVM?
Why is the bytecode stack-based? From what I've read, stack-based bytecode isn't really suited to JIT compilation on a register machine. Does the JIT compile everything to SSA first? I think that's what HotSpot does (not sure, does anybody know?).
I made up the term "tracelet." Do not use it when trying to sound intelligent. It roughly means "typed basic block," though the compilation units aren't quite basic blocks, and what we're doing isn't quite tracing, and ... so we thought it would be less confusing to just make up a name. Think "little tiny traces."
The bytecode is stack-based for a couple of different reasons, but it will probably remain stack-based because of compactness. Since instructions' operands are implicit, a lot of opcodes are 1, 3, or 5 bytes long. Facebook's codebase is large enough that its sheer compiled bulk can be problematic.
The translator doesn't turn the stack into SSA, instead turning it into a graph where the nodes are instructions and the edges are their inputs/outputs. You can sort of squint at this system and call it SSA where the edges are the "single assignments."
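A toy illustration of that kind of translation, assuming a made-up stack instruction set (not HHVM's): simulate the stack symbolically, pushing the node that produced each value rather than the value itself, so that every pop turns into an input edge. `Node`, `build_graph` and the opcode names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

@dataclass(eq=False)  # eq=False: nodes compare by identity, one node per produced value
class Node:
    op: str
    arg: Optional[Union[int, str]] = None
    inputs: List["Node"] = field(default_factory=list)  # edges to the producers of our operands

def build_graph(code) -> List[Node]:
    """Turn hypothetical stack bytecode into a dataflow graph.

    The symbolic stack holds the node that produced each value, so every pop
    becomes an input edge and every push defines a value exactly once (the
    "single assignment").
    """
    stack: List[Node] = []
    nodes: List[Node] = []
    for op, *args in code:
        arg = args[0] if args else None
        if op in ("PUSH_CONST", "LOAD"):      # push a literal or a local variable
            node = Node(op, arg)
            stack.append(node)
        elif op in ("ADD", "MUL"):            # pop two operands, push the result
            rhs = stack.pop()
            lhs = stack.pop()
            node = Node(op, None, [lhs, rhs])
            stack.append(node)
        elif op in ("STORE", "RET"):          # consume the top of stack, push nothing
            node = Node(op, arg, [stack.pop()])
        else:
            raise ValueError(f"unknown opcode: {op}")
        nodes.append(node)
    return nodes

# return ($a + 2) * $a;
graph = build_graph([("LOAD", "a"), ("PUSH_CONST", 2), ("ADD",),
                     ("LOAD", "a"), ("MUL",), ("RET",)])
for i, n in enumerate(graph):
    print(i, n.op, n.arg, [graph.index(p) for p in n.inputs])
```

Note that the stack itself vanishes from the result: only the instructions and the edges between them remain, which is what makes the graph easy to map onto registers later.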
You have an interpreter that starts recording a trace for every basic block. If you walk into a basic block (or code block) for the Xth time with the same types, you compile it for those types and set a type guard. Now every time the interpreter goes through there again, it does a type check and then either keeps interpreting or jumps into the compiled block. So if you have a basic block in a loop, you have to type-check every time you go through the loop.
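The scheme described in this question can be modeled roughly as below. `BlockRunner`, the hotness threshold and the dictionary-based cache are hypothetical simplifications; as noted earlier in the thread, HHVM's units aren't quite basic blocks, and a real JIT emits machine code rather than Python callables.

```python
from collections import Counter
from typing import Callable, Dict, Tuple

TypeKey = Tuple[type, ...]

class BlockRunner:
    """Toy model of one code block with a type-guarded cache of specialized versions."""

    def __init__(self, interpret: Callable, specialize: Callable[[TypeKey], Callable],
                 threshold: int = 3):
        self.interpret = interpret    # slow path: the generic interpreter for this block
        self.specialize = specialize  # "compiles" a version of the block for given input types
        self.threshold = threshold    # how many runs with the same types before compiling
        self.counts: Counter = Counter()
        self.translations: Dict[TypeKey, Callable] = {}

    def run(self, *args):
        key: TypeKey = tuple(type(a) for a in args)  # the type guard checks these types
        compiled = self.translations.get(key)
        if compiled is not None:
            return compiled(*args)    # guard passed: jump into the specialized code
        self.counts[key] += 1
        if self.counts[key] >= self.threshold:
            self.translations[key] = self.specialize(key)
        return self.interpret(*args)  # not hot yet: keep interpreting
```

In this sketch every call pays for the lookup at the top of `run`, which corresponds to the per-iteration type check the question asks about.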
To be fair, LuaJIT has been blazing trails in running dynamic languages fast since 2005. But maybe it wasn't well-known enough to make its way into conventional wisdom.
All that said: I get the impression reading the comments here of an emerging misconception that "everybody knows" the right way to run these languages fast. We emphatically do not know. It is still a research problem. Heck, garbage collection, a proper subset of the problem area we're talking about here, is still pretty wide open. Given the smart people who have given decades of their career to it, we should not expect there to be a single, silver-bullet technique that cleanly maps all these dynamic languages to high-performance machine code.
So we need more than one project in the air. Some of those projects should explore whole-program analysis, like HPHPc and some of Agesen's offline work on Self, and decades of Lisp compilers. Some of them should explore runtime techniques like tracing and inline-caching. Some of them should explore unifying these approaches. We need them all.
HackerNews should have Reddit-like AMAs for software projects. "I work on HHVM/Linux Kernel/Scala/etc. Ask me anything..."
I think there has been so much work put into static compilers for Lisps and other dynamic languages, but there is not that much to show for it. I mean, sure, the Lisp compilers are really fast, but most of the time they get there by throwing away safety and providing type hints. Meanwhile, the work on Self was done by one research team. The same with LuaJIT: one guy writes something for a dynamic language and, without adding type hints, it starts to perform like crazy.
I think the Lisp community and the rest of the world really missed out on the JIT advancements. Self was already fast, but Sun picked Oak (Java) to be their web language (anybody know more about that?).
> "the opportunities the AOT compiler found were different than the opportunities the JIT compiler found"
Good to know.
Anything that you can say about actual engineering details at FB would certainly be of interest to many of the people at HN. Large-scale social networks are like supercomputing. Few people get to see the inside of an operation which pushes the software/hardware envelope of size and performance.
Matt Might has a couple of nice posts on compiling toy Schemes to C or Java.
There are 64 unique job titles in Engineering (and 56 unique job titles in Technical Operations, which also includes software development job titles), but that represents a lot more than 64 (or 120) open engineering positions.
That's not to say that Facebook isn't as big as many people think, though.
(I work at Facebook.)
Pet peeve of mine: why is the word "engineer" thrown around so much on HN? Is it meant to distinguish a certain class of people from mere coders? Or is it just a fancy way of saying "coder"?
I've always referred to myself as a coder. Should I change this?
I am genuinely curious, not trying to put anyone down.
What I'm saying is that making software good is hard. It takes a lot of different skills in addition to mere coding. Actually writing the code - pressing buttons on a computer to enter text into an editor - is one stage in a long process, and all of these stages are as important as each other.
Some of these are skills which you probably have! So calling yourself a mere coder is underselling yourself.
Isn't that the entirety of their business?
Jason Evans is the author of jemalloc (and more). Given all the things he would be capable of applying himself to, it's surprising to see him working on PHP runtime performance (though, the problems are interesting).
Now, if you want to get into the details of syntax, naming conventions, etc., sure, PHP is not everyone's favorite language to work with, but it is far from "technologically poor".
When people complain about PHP they often mean that in an aesthetic, and in an academic sense, the designers of PHP have, arguably, made a lot of poor choices.
What they usually mean is, due to the low barrier of entry coupled with poor aesthetic design choices the PHP community at large seems to have poor engineering standards.
All of these languages being Turing-complete, though, doesn't prevent quality engineering from being applied to them, as is the case at Facebook.
So, the corollary to "only a poor craftsman blames his tools" is "when you only have a hammer everything starts looking like a nail".
I wonder how many people are paid to directly hack on a Ruby implementation vs Python (yay Google) or PHP (yay Facebook).
Fighting your toolchain is an upfront penalty. Once things start moving smoothly, they tend to continue to work smoothly. And that's a problem that can be solved by further development of the toolchain.
Poor aesthetics/readability? You pay that price every second you work on the code.
This is flat out wrong. If you're on a language all the cool kids are using, you'll find those cool kids have an obsession with perfection. To the point where they deprecate things instantly and break everything that worked before. See NodeJS and Python for examples.
If you have 10K lines of code, not so bad. So you do a rewrite every year. When you have 100K lines of code or more, their obsession with perfection will destroy your business.
A rewrite becomes an immense undertaking. You end up running old versions. Those old versions require dependencies which require you to run old operating systems. Eventually the entire ecosystem implodes.
But guess what? Code that was written in PHP3 over 10 years ago works just as well today under PHP5. You can even mix and match. Does that make the syntax of the language ugly? Sure. It even encourages bad practices. So what? You're running a business, not an art gallery. Code should be beautiful, but not at the expense of having a successful business and a working product.
Sorry, but having felt the pain of keeping legacy installs of old PHP versions around to run business-critical processes, I have to call bullshit on that one.
Why? Your code should work in the latest version. Can you give an example?
I use Java and C++, and I don't rush to update the second new libraries become available. My team has a large regression suite, too, so it's been less painful (can't say painless) to detect breakage when our dependencies need to be updated.
Comparing Java and C++ to PHP is the equivalent of comparing a Ford Mustang to a Mack truck. Yes, they are both vehicles. However, they are designed to do very different things. Java and C++ are statically typed, compiled languages. PHP is not.
Yes, if you drive a Mack truck, you can bully and make fun of the guy driving a regular car, but if you're using your Mack truck to commute to work, you're a lunatic.
For every problem, a proper tool.
Given that FB's solution to their PHP problem was to discover the static typing, I'd say that PHP is not the proper tool for the current job.
What's an example of the Python community doing this? Usually Python gets criticised for being overly conservative, if anything.
Of course, it's everyone's privilege to decide which penalty they'd prefer to pay, and that doesn't mean there aren't language features that can put it on a new order of expressiveness.
But are we talking about those features? Or are we talking about "aesthetics/readability"? If it's the latter, that's probably a different thing, and I have my doubts that it's anything objective, particularly if it has much to do with which specific tokens indicate the beginning/ending of blocks or enclose arguments to a function.
As for who is using it, everyone from Google to Twitter to Apple, as well as a huge number of high-transaction business systems. The JVM is something that tends to get used quietly and widely.
It's not that Java is a memory hog per se, but rather that the total VM memory allocation must be specified at launch, and the VM will not (rarely?) give that memory back to the OS.
This is largely an artifact of Sun's generational GC design+implementation, and has some interesting performance wins; namely, object allocation and deallocation are incredibly cheap, because in the fast case, all allocation requires is updating a single pointer.
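A toy model of that fast path, assuming a single contiguous young-generation region (the "nursery") measured in bytes. `BumpAllocator` and its methods are hypothetical; HotSpot's real fast path additionally works per thread, via thread-local allocation buffers (TLABs), so it needs no locking.

```python
from typing import Optional

class BumpAllocator:
    """Toy nursery: allocation is one bounds check plus one pointer bump."""

    def __init__(self, size: int):
        self.size = size  # capacity of the nursery in bytes
        self.top = 0      # offset of the next free byte

    def allocate(self, nbytes: int) -> Optional[int]:
        """Return the offset of a new object, or None if the nursery is full."""
        if self.top + nbytes > self.size:
            return None   # a real VM would run a minor (young-generation) collection here
        addr = self.top
        self.top += nbytes  # the "single pointer" update
        return addr

    def reset_after_minor_gc(self) -> None:
        # Once survivors have been copied out, the whole nursery is reclaimed at once,
        # so dead short-lived objects are never freed one by one.
        self.top = 0
```

This is also why deallocation is cheap in the common case: dead young objects are simply never visited.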
The last time I profiled Java object allocation, creating short-lived objects turned out to be insanely cheap compared to other allocators/runtimes.
> Makes deploying your web application a pain (restarts anyone?)
This largely depends on how you structure your web application.
> If you're Facebook and you're trying to get more bang-for-your-buck from your existing hardware, do you think it's a good choice to migrate everything to the JVM (including a full rewrite to Java/JRuby/Scala etc.)?
That depends largely on how much they're spending on hardware and JIT/language engineers, versus how much a rewrite would cost, amortized over the reasonable lifetime of their code base.
You have to factor in the fact that they've built an entire organization around PHP, and it's not just a technological migration problem; they might also have to retrain or outright replace a large number of existing engineers.
I'm inclined to say that it was a mistake to choose PHP, and moreover, to continue to use PHP as they scale. However, funding a better PHP runtime implementation may be the most fiscally conservative decision left available to them.
One of the biggest reasons PHP supplanted Perl as the de facto web programming language ~10 years ago was that Perl running through CGI was far slower than PHP.
mod_perl was spanking PHP 10 years ago from a performance standpoint, but the democratization of the web meant that a lot of people started with basic HTML layout, then learned to extend it with PHP in templates, then moved on from there. Much the same way, Perl became a dominant CGI language in the mid-to-late 90s because a lot of sysadmins and hackers transitioned to back-end development due to web programming demands.
I personally transitioned from writing Perl CGI scripts to writing PHP scripts about (pause while I look through my records) 9 years ago, and it was exclusively for performance reasons (I still prefer Perl as a language). I was not alone in my thinking back then.
- cruftiness of Perl
- good timing, few alternatives (C and Perl basically) at a key point in the growth of the internet and web developers.
- very easy migration from a static only site
- easy to deploy, leading to...
- wide availability of PHP shared hosting
- in depth docs
- commenting on docs (cut-and-paste programming ;))
- thousands of built-in functions: first real batteries-included language
- recognizable syntax (esp vs Perl) for Java/C programmers
- not awful performance
- built-in MySQL support from an early stage
- oh, it was free (and also Free)
- dynamic typing
- weak typing (as in, values coerced to other types easily, I know this isn't a real term)
Very true and rarely mentioned. PHP docs never had the fancy wiki/social/javadoc features you see in many languages, just a primitive comments system - and it was perfect.
When you were picking it up back when "PHP3" yielded 0 results on Amazon (and we had to change the oil on our desktops every week), the docs were your bible, not merely in the sense of occasionally contradicting themselves but also in having examples, Q&A and recipes for common tasks posted by your peers in the comments, much faster than the doc writers could keep up with the language's growth.
Two web language competitors to add here: MS ASP and ColdFusion. PHP broke out by being both free and Free (the REPL helped too).
I don't know why mod_perl was known to very few and PHP become popular.
PHP sucked at the time too, but it sucked in ways that were easier to track down and fix.
Let's say that there's 1M lines of PHP code right now handling Facebook's site. In the time it takes the team to re-write those million lines of code, another million have been written. So they can never catch up. Almost like Zeno's paradox, except the turtle moves faster :).
All that aside, you can teach PHP to pretty much anyone, and it's rather effective at solving web problems.
For quite a while they had some really smart people working on APC to help with performance, they've since switched horses to work on HipHop and the like to get the performance they need.
I wonder if there is a point where they can't squeeze anything more out of HipHop and have to do a drastic rewrite.
The best business solution is the one that accomplishes the task at hand with the least resources. By that metric, PHP is far superior to Java, regardless of the technical merits at hand.
It costs less to hire an army of PHP devs and a small team to optimize PHP, than it does to hire an army of Java developers.
The difference in pay is around $20K. If you have 500 developers, that's a difference of $10 million per year. Over the span of a decade, you're talking about $100 million in savings.
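Spelled out with the commenter's own figures (the $20K gap and the 500 developers are their assumptions, not measured data):

$$
\$20{,}000 \times 500 = \$10{,}000{,}000 \text{ per year}, \qquad \$10{,}000{,}000 \times 10 = \$100{,}000{,}000 \text{ per decade}
$$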
As I said, Facebook is a business, not a pet project or an academic exercise. Decisions on which platform to use and keep are made based on dollars and cents ... not on which is more elegant or technically superior.
Is that what Facebook is doing? Everything I've read about interviewing and getting hired at Facebook leads me to believe that they are not hiring an army of cheap PHP developers.
The 2 URLs in the parent show the limit at which adding ellipsis sometimes make the inner text of links longer instead of shorter.
If there is any company that should not be worried about a language's barrier to entry or trying to get cheap programmers, it's Facebook--my impression is that they have an engineering team more like Google than your favorite nameless corporation.
I think what Facebook is doing is working with a ton of legacy PHP code. In fact, from what I've heard, they're definitely open to using other languages in new projects; it's just that rewriting all the PHP code they already have would be horribly wasteful.
I'm not saying it's a great explanation, but it is what it is.
That's not to say that's how I'd run Facebook given the opportunity, but from talking with people who have presented HipHop here, these are the reasons they give.
This is based on a conversation from 2009, when they had just 200 programmers, so it is likely no longer true. However, I think it's highly likely that Facebook developers know a lot more than PHP.
There's also a far bigger pool of talent to draw from if you want developers. I don't think there are many people who know how to write web applications in Common Lisp or Haskell.
Let's be honest. 99% of the people that start a PHP project have it fail before it ever needs to scale. The people that do need to scale, though, are up a creek, because if you started your project with PHP, you probably don't know how to write a better VM for it. Consider, as proof, the fact that there is no better VM for PHP.
However, I suspect Facebook has a disproportionate amount of them. Additionally, anybody working at Facebook is very likely able to pick up Lisp or Haskell relatively easily even if they do not know it already.
Facebook is exactly the sort of company that does not look for developers of a particular language. I know a bunch of people who have interned there and several (ones a year or two ahead of me) who are going to work there full time. None of them had any PHP experience. Some of them still don't (the one I know best used C++ there, if I remember correctly).
I think the PHP at Facebook is nothing more than a legacy of how it started.
You can get a web app up and running quickly in good languages too, and then you also don't have ongoing maintenance nightmares and the massively increased hardware and hosting costs that come with PHP.
What am I missing?
So, 30x more to go...
The benchmarks linked ran on a 4-core machine with varying amounts of parallelism, generally with Java being more parallelized. When running a server farm, as Facebook does, aggregate CPU time is more important than wall time^, since one can easily scale the number of processes to fit the machine. By CPU time the difference is down by a factor of two at least, possibly more.
Be careful with benchmarks. They distill a comparison into a single value for easy use, but it's not going to be the most appropriate metric in all cases.
^So long as wall time is "sufficiently small", which it generally is.
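A small sketch of the CPU-time argument, with made-up numbers (not taken from the linked benchmarks), assuming CPU-bound, independent requests and enough processes to keep every core busy. `jobs_per_second` is a hypothetical helper.

```python
def jobs_per_second(cores: int, cpu_seconds_per_job: float) -> float:
    """Steady-state throughput of a machine kept fully busy with independent jobs.

    Assumes jobs are CPU-bound and enough processes run to saturate every core,
    so wall time per job no longer matters; only CPU time per job does.
    """
    return cores / cpu_seconds_per_job

# Hypothetical numbers:
# program A: 1.0 s elapsed but 4.0 s CPU (spread over 4 cores)
# program B: 2.0 s elapsed and 2.0 s CPU (single-threaded)
a = jobs_per_second(cores=4, cpu_seconds_per_job=4.0)
b = jobs_per_second(cores=4, cpu_seconds_per_job=2.0)
print(f"A: {a:.1f} jobs/s, B: {b:.1f} jobs/s")
```

By wall time A looks twice as fast, but on a saturated machine B completes twice as many jobs per second.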
Let's force the programs onto one-core.
> aggregate CPU time is more important than wall time
Both CPU secs and Elapsed secs were shown.
> Be careful with benchmarks. They distill a comparison into a single value for easy use, but it's not going to be the most appropriate metric in all cases.
afsina pointed to a program-by-program comparison - not something distilled down to a single value.
And the irony is the shootout site is written in PHP, so it's not like he is especially biased against it.
Oh look, PHP is still 30 times slower than java. A 60% improvement to that still won't get it within an order of magnitude. And PHP is not significantly faster or easier to develop in than java, so you are losing execution speed to gain nothing.
Yes it is easier and faster to develop in (speaking as someone who has done both commercially). Examples of productivity gains:
1. No need to compile your code. Build cycle vastly improved (no time wasted with Ant/Maven/take your pick).
2. Less verbosity.
3. No need to restart your web server every time you deploy some new feature; just hit F5 in your browser and you're done.
4. No classpath/jar file fun to slow you down.
5. You can hack a code fix in situ on a dev server (or production if you're brave/insane, I did say hack!). All you need is SSH.
6. Connecting to a database is trivial, with support for most databases built in (no messing about with JDBC).
7. Deployments are a breeze: just drop the source code on your server, and you're done.
I could go on.
> so you are losing execution speed to gain nothing.
I wouldn't call a much faster development cycle "nothing", and if you understand Facebook's culture of "move fast" as laid down by Zuckerberg, they value development time higher than execution time (and rightly so).
1. any decent IDE will compile as you type
2. I'll give you this one
3. Java supports dynamic class reloading, it works out of the box with jetty.
4. you just press "run" in the ide.
5. same applies to java, just requires you take your (automatically built) .class file
6. I'm not sure how difficult you think JDBC is.. you just give it the connection string and the driver name.
7. just drop the war into your servlet directory, and you're done
I spent 3 years doing Java on large-scale telco projects using IntelliJ; I'm not an amateur Java dev. I now code PHP full-time on a social network.
> 1. any decent IDE will compile as you type
Fine, but my original point was about building for a production environment.
> 2. I'll give you this one
> 3. Java supports dynamic class reloading, it works out of the box with jetty.
You using Jetty with dynamic class loading enabled in production?
> 4. you just press "run" in the ide.
Once again, your local dev environment has nothing to do with production servers (where you obviously won't have an IDE).
> 5. same applies to java, just requires you take your (automatically built) .class file
And what if that .class file is buried in a .jar somewhere, which is in turn buried in a .war somewhere else? Not quite as easy as opening a .php file with vi on the terminal, is it?
> 6. I'm not sure how difficult you think JDBC is.. you just give it the connection string and the driver name.
...and the driver has to be on the class path, and the driver might have local binary dependencies (depending on the vendor), and you have to be careful to ensure that you are using the correct driver .jar version for your version of the database, and ensure that it is packaged up with all of your other project dependencies when you build your .war/.ear etc.
> 7. just drop the war into your servlet directory, and you're done
...after you have built the .war using a build technology like Ant or Maven (I refer you back to my original point 1 about the Java build/deploy cycle being more complex).
if you want clean builds you set up something like jenkins (with the automatically generated exported build files).
1) and 3) I was referring to development, you can just code away and hit F5 to see your results the same way in Java you can with PHP.
wrt database jars/binaries: a modern IDE does this all for you too, and is all output into the target war.
The only valid point you have is that java is slightly more verbose. But of course, it also prevents a number of bugs that slip into PHP code bases, which I find balances it out. And keep in mind, this is coming from someone who finds java offensive, and refuses to use it any more. You're not dealing with someone who is in love with java, but rather someone who thinks java is a terrible language that should never have been made. It is simply not as bad as PHP.
So you're just repeating what other people said? To make yourself feel better? Still a bit confused.
Clearly that is not a situation I can remedy for you. I suggested you read the thread, as that would resolve the confusion for anyone with basic reading comprehension skills. If that is not sufficient for you, then you'll need to ask an adult to explain the words you don't understand.
Their example code makes use of the variable $hit, performs a var_dump($hit) and then returns $hit. Am I the only one that noticed this?
At my last company, we handled about 20 million API requests a day through a PHP API. Granted, the request and response were smaller than a standard webpage, but that was using stock PHP (hell it wasn't even custom compiled, it came straight from the Debian packages).
So, you need to hit a pretty huge amount of traffic before something like this makes sense. Of course, it's always neat technology to play with.
1. it's still PHP
2. why are they still using PHP in a world where things like Python, Ruby and Java exist?