Facebook makes HipHop for PHP open source (facebook.com)
315 points by ivankirigin on Feb 2, 2010 | 190 comments

"HipHop for PHP isn't technically a compiler itself."


"HipHop programmatically transforms your PHP source code into highly optimized C++"

Sounds like a compiler to me. Compilers don't have to target machine code.

Here's a quote from an alleged Facebook employee on Reddit regarding this matter.

This entire thread needs to go fall in a well and die. If they had just called it a compiler, you nerds would be arguing about the same thing and saying it was a source code transformer. They just called it that because it translates PHP to C++, and to most people who don't have Asperger's that's not the same as compiling.

How do I know this? I sit 20 feet from the guys who wrote it.


I'm not sure how working 20 feet from the guy makes him an authority on how we "nerds" would react.

Anyway, the thing that got me was the "technically", otherwise I may have let it slide. You could argue it's not a compiler in the commonly accepted sense of the word, but technically it most certainly is a compiler.

Exactly my thoughts. To further support your statement, theoretically, it should be possible to create a CPU that interprets C++ source code as machine code (not sure why anyone would do that, though).

By that logic then PHP itself is also assembly because someone could make a machine that executes it natively. We might as well throw all distinctions away if we go down this road.

I'm not sure if I understand what you're trying to suggest. Indeed, someone could make a machine that executes PHP natively. It's a Turing-complete language, just like C++ or x86 assembly for that matter so it should be possible to build a (weak) universal machine that executes them directly.

My point is that what someone could create shouldn't have any bearing on the discussion. They are using c++ as an intermediate language, fine. Just because we could make a machine that executes it shouldn't alter that distinction. We can still call that compiling, but hypothesizing new machines doesn't help.

A better example is modern x86 CPUs. You feed them x86 instructions, but that is not the native language of the CPU; everything is translated into the real underlying (RISC) language as the x86 instructions are encountered.

Or so I'm told. I've never read Intel's source code :)

I doubt it would be this elegant ;-)

Exactly. The textbook definition of a compiler is a program that translates a program from one language to another.
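To make that definition concrete, here is a minimal sketch of a source-to-source translator in Python. It is not how HipHop works; the function name `to_cpp` and the tiny input language (integer literals, variables, and + - * /) are assumptions chosen for illustration. It reads an arithmetic expression and emits C++ source text rather than machine code, which still fits the textbook definition.

```python
import ast

# Map Python AST operator nodes to their C++ spelling.
_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def _emit(node):
    """Recursively turn a small expression AST into C++ source text."""
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return f"({_emit(node.left)} {_OPS[type(node.op)]} {_emit(node.right)})"
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return str(node.value)
    raise ValueError(f"unsupported construct: {ast.dump(node)}")


def to_cpp(name, params, expr):
    """Translate `expr` (a hypothetical toy input language) into a C++ function."""
    body = _emit(ast.parse(expr, mode="eval").body)
    args = ", ".join(f"long {p}" for p in params)
    return f"long {name}({args}) {{ return {body}; }}"


print(to_cpp("area", ["w", "h"], "w * h + 2"))
```

The output is plain C++ text that a separate C++ compiler would still have to turn into machine code, which is the same division of labour the announcement describes.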

Remember when the meme was "90% of web apps will not reach a point where scaling is an issue"? Well, of those who do reach it, 90% will not reach a point when PHP's execution (assuming you use one of the existing code accelerators, all of which are free IIRC and one will be bundled with PHP6) is the bottleneck.

It's a cool hack, but really targets Facebook-level needs. For most users, the downside (eg compatibility issues) will far outweigh the benefits.

Actually it's an efficiency gain, rather than scalability. Facebook already have a system they can add more servers to, but this is designed to let them use fewer and save money as a result.

Also at least 90% of web apps are tiny or are startups that never made it, either way they don't need this for a long time, like you say.

efficiency -> scalability, in the sense that serving the request faster will free the server to serve another request.

My point is that for nearly all apps, executing PHP is a very small part of the total request handling time, so even a significant improvement will have little overall effect. I bet many people excited by HipHop could get a much bigger performance gain by adding some simple caching.
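A rough way to see this is Amdahl's law: if PHP execution is a fraction p of the request time and it gets s times faster, the whole request only speeds up by 1 / ((1 - p) + p / s). The sketch below uses made-up fractions (10% and 80%) purely as assumptions; the 2x speedup is the figure mentioned elsewhere in this thread.

```python
def overall_speedup(php_fraction, php_speedup):
    """Amdahl's law: speedup of the whole request when only the PHP part gets faster."""
    if not 0.0 <= php_fraction <= 1.0:
        raise ValueError("php_fraction must be between 0 and 1")
    if php_speedup <= 0:
        raise ValueError("php_speedup must be positive")
    return 1.0 / ((1.0 - php_fraction) + php_fraction / php_speedup)


# Hypothetical request profiles; the fractions are illustrative, not measured.
for fraction in (0.10, 0.80):
    print(f"PHP = {fraction:.0%} of request time -> "
          f"{overall_speedup(fraction, 2.0):.2f}x overall")
```

With PHP at 10% of the request, doubling its speed gives about 1.05x overall; at 80% it gives about 1.67x, which is why the benefit depends so heavily on where the time actually goes.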

Not really, that type of efficiency is not predictable.

From Wikipedia: "scalability is a desirable property of a system, a network, or a process, which indicates its ability to either handle growing amounts of work in a graceful manner or to be readily enlarged. For example, it can refer to the capability of a system to increase total throughput under an increased load when resources (typically hardware) are added."

I realize we're nitpicking, but I don't really get it: Why isn't it predictable (in the sense that any execution time for an HTTP request ever is)? You run http_load for x seconds and see how many requests got served, then make change y and do it again. Obviously there are other variables but it's possible to get a fair idea.

Because as you described you have to try it to find out how much it will save you, by which point you could probably just roll it out already.

Say you have 10 servers now and you expect business to grow by 10x in next year, so you put 6 months of dev. time into implementing caching. Then you get 10x the traffic in a month and you get V.C. investment as a result. You then have a problem, people are getting a bad experience and the development won't be done for another 5 months. Your CEO buys a rack full of servers, but you can't do anything with them because you don't have a scalable system. Alternatively you could find out after 6 months that you can't make that 10x performance jump after all.

Twitter had this problem a while back: essentially their DB server got overloaded, but it was already an 8-CPU, 64GB machine, and despite having lots of money they couldn't quickly solve the problem, because you couldn't really get a bigger system.

Facebook knows how to scale, they want to cut costs where they can to become (more) profitable.

It's more like scalability is the derivative of efficiency, for N_1 and N_2 user counts,

scalability = (efficiency(N_2) - efficiency(N_1))/(N_2 - N_1)
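As a sketch of that formula in Python, taking `efficiency` to be any function of user count (the comment does not define it, so the sample function below is purely hypothetical):

```python
def scalability(efficiency, n1, n2):
    """Finite-difference slope of efficiency between user counts n1 and n2."""
    if n1 == n2:
        raise ValueError("n1 and n2 must differ")
    return (efficiency(n2) - efficiency(n1)) / (n2 - n1)


def sample_efficiency(users):
    """Hypothetical model: requests served per server-second, degrading with load."""
    return 100.0 - 0.001 * users


# A slope near zero means efficiency barely changes as users are added.
print(scalability(sample_efficiency, 1_000, 10_000))
```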

You might be surprised in some cases.... We recently did some tsung testing on an optimization system we've developed (purposely general here).

Running it on the developer's workstation w/ APC and Memcache and local Percona MySQL, it maxed at 300 requests/s. (It's just a workstation, remember). Anyway, there may be some observer effects here, but the profiler indicated that most of the time was being spent on PHP functions and system calls.

So we might be in the 1% you're describing. Or _maybe_ it's more than 1% after all.

Though I agree that most single sites won't benefit from this. We have a single site serving 100 pages/sec and PHP is not even close to becoming a bottleneck.

The other application where this may be helpful is the cloud. In a system like this you have 100s or 1000s of sites on a single machine that may reach the combined traffic where this optimization can come into play.

Or when you pay per cpu cycles / memory. If you can reduce that usage, then your costs are directly affected.

Some details not in the official announcement:

Currently works off of the PHP 5.2 language specs

Compiles to a stand-alone libevent-driven web server or CLI executable

Not based on Zend's runtime/code

Currently no windows support

Intended for 'drop in' PHP replacement - if it runs on PHP and doesn't use a few things like eval() it will compile with HipHop

Did they reimplement the entire standard PHP library bug-for-bug in C++? I'm curious about that aspect of it..

Why would they need to do that? The standard library is a mix of PHP, which they can already handle, and C, which doesn't have to be rewritten.

First, they didn't call the PHP standard library: the poster of the root comment we are replying to has stated they didn't use the Zend runtime or code.

The PHP standard library has a huge number of functions that are mostly useless permutations of one another. As opposed to, say, Erlang's meager but usable lists and string module, PHP has such high-level meta-randomness as stristr() (case-insensitive search), str_word_count(), and str_shuffle(). So if you were going to convert that directly to C++, there wouldn't necessarily be a direct replacement in strings.h. Therefore I'd guess they either have to reimplement all of those in PHP, and compile that to C++, or write them in C++. Furthermore, some of these functions have unusual quirks, so some kind of framework would be required I think.

I'm sorry, that's indeed what the root comment said. Thanks for the correction.

Still, I imagine they must use a lot of the same code - half the PHP standard library is just a thin wrapper of some C library, and reimplementing all of them sounds stupid.

Then again, they may just not support the majority of such libraries.

Intended for 'drop in' PHP replacement - if it runs on PHP and doesn't use a few things like eval() it will compile with HipHop

This sounds a lot like Python's 2to3. This technique worked so well that the Python 2 series is a distant memory of the past.

(Oh wait... the opposite.)

So it converts PHP into C++?!? Woaw.

Didn't people accuse Joel Spolsky of jumping the shark for doing that kind of thing (Wasabi script => PHP or VBScript)?

.....wait this means that Wasabi can now technically convert into C++! :D

I think where most people choke on Wasabi is the perception of compiling from an "insane" language to a sane language. No matter how cool it is to work at Fog Creek, it's very 1965 to work in a proprietary language, especially for web development. If Joel wasn't famous, anyone leaving Fog Creek would have serious problems finding a new job.

Facebook compiles from a "sane" language (PHP) to a less sane, but twice as fast language (C++). The alternative was to get their devs to work in C++, which they decided was an unattractive solution -- this seems reasonable.

PHP is somehow a more sane language than classic ASP? They are contemporaries of each other, and both were pretty popular in their day. PHP has continued, while ASP has been replaced by something significantly better than either.

P.S. If the company you're applying to is upset by you having worked in a proprietary language, they're hiring for the wrong things.

> If the company you're applying to is upset by you having worked in a proprietary language, they're hiring for the wrong things.

In a perfect world, sure. But in the real world, the guy with actual experience in the language/framework used gets an interview first. Then the guy with experience in any language/framework the interviewer immediately recognizes as relevant. Then the guy with experience in something so obscure it's not even on Wikipedia (which Wasabi wouldn't be had Joel and Fog Creek not been famous).

I don't think you quite understand how Wasabi works. Wasabi compiles VBScript (written mostly like normal VBScript with some conventions to ease the work on the compiler) to PHP. At least, that's how it originally worked. Now it compiles the same code to .NET IL which runs on Windows and Mono. They're not writing code in some obscure language.

If there's some company out there that doesn't want to hire people capable of writing their own compilers to more easily add cross-platform support to their software the guys at Fog Creek aren't going to want to work there anyway.

According to Spolsky, Wasabi is backwards compatible with VBScript, but "we like closures, active records, lambdas, embedded SQL a la LINQ, etc. etc. and so those are the kinds of features we put into Wasabi."

If they'd been working in vanilla VBScript in 2010, it'd be even more insane, and also, I think Joel would have some serious problems holding on to employees.


> from a "sane" language (PHP) to a less sane, but twice as fast language (C++)

C++ is much faster than PHP, not just twice as fast.

Hand-coded C++ is much more than twice as fast, but 2x is what you get from machine-converted HipHop at this point.

Agreed. Scripted/interpreted languages are, in general, one to two orders of magnitude slower than a compiled language such as C/C++. It's damn difficult to go much better than 10x slower due to the nature of an interpreter--even fast opcode-based interpreters. There are some extensions to the GNU C/C++ compilers that can help here, though.

Languages aren't "fast" or "slow", it's programs that can be fast or slow.

If a higher-level language lets you implement a smarter algorithm, your program will run faster.

> Languages aren't "fast" or "slow", it's programs that can be fast or slow.

We can say that "C++ is faster than PHP", because most, if not all, programs using the same algorithms will run faster when written in C++ and compiled to native code.

> If a higher-level language lets you implement a smarter algorithm, your program will run faster.

PHP is generally easier to master than C++, but both are high level enough to provide adequate tools to write efficient algorithms with comparable amount of effort, provided one is competent with the language at hand.

PHP is sane?

I'd say PHP is more sane from a business and coding standpoint. Need to find a PHP developer? No problem. Want a PHP IDE? No problem. A large community to handle bugs and any future development? Yes.

It's obvious that Wasabi has been good to Fog Creek, but that doesn't make it a sane decision, regardless of the quality of the syntax and semantics of the language itself.

What does "sane" mean in this thread? I can't figure it out other than "something that makes sense to the poster".

>> Need to find a PHP developer? No problem.

Not being able to hire just anyone to code for you is a good thing just as much as it is bad. It means you're going to have some pretty bright people working for you, which apparently is something Fog Creek values and can afford to do.

> Want a PHP IDE? No problem.

Most hackers have their own preferred editors anyways, and Emacs/vi/etc work well with any language if that's the editor you're experienced with.

>> A large community to handle bugs and any future development? Yes.

Again, that's a hindrance just as much as it is a benefit. If you need to make a change or fix a bug in the language, you're going to have much better luck if you know a bunch of people who are intimately familiar with how that language works, especially if they're right down the hall from you.

In response to your first comment I think the idea that developers that don't use PHP are better or brighter is absurd.

Not that it matters anyhow: for most roles you neither need nor can afford to hire the best and brightest away from their current role, which they probably love a lot (being the best gives you the ability to choose great places to work). So unless you're looking for a tech lead, you're going to be looking for great, much better than average, but not specifically the best, and for those conditions PHP will give you a much larger developer base to choose from every day of the week.

As for the last point I think you're really reaching here to find a negative. A large community to handle bugs a hindrance? Seriously? And your alternative ideal solution is some people down the hall in your office that know how to use the language well?

Uhm, if the guy down the hall WROTE the language, that's gonna be a hell of a lot more useful than posting to a mailing list. Especially if you can't post your secret source code, so instead you have to spend a lot of time coming up with a simple test case you can post which demonstrates the problem -- whereas with the guy down the hall, he's allowed to see the code.

That's great if you work down the hall from the guy that wrote the language, but unless you're talking about a niche in-house language, when it comes to web development you're talking about, what, half a dozen people that you can claim that against? And how many of those work at Google and the like?

The vast majority of people will never have the opportunity to be down the hall from that guy and for the vast majority of people a large helpful community will be infinitely more valuable.

>> The vast majority of people will never have the opportunity to be down the hall from that guy and for the vast majority of people a large helpful community will be infinitely more valuable.

Which means what to Fog Creek?

And what happens when the guy down the hall leaves the company?

Ah, well, then you're fucked. My point is not that you should consider it a good idea to make your own proprietary language, just that that situation can sometimes be a really good, productive one.

(I'm dealing with this in my work -- we use a proprietary internal scripting language, and the guy who wrote it is not around. I'm slowly becoming the local expert on the language -- very slowly, though, because opportunities to work on it don't come up too often.)

I think I introduced the word. I used it to describe the perceived relative merit of PHP vs. Wasabi/C++, where, in the example at hand, the sanity of PHP exceeds that of the alternatives for the given situation.

More sane than any proprietary language, yeah. And more accessible than C++.

You're comparing apples and oranges. PHP is no more sane than classic ASP. They're both pretty awful.

I think that's unfair. PHP sucks some hairy donkey's balls, but VBScript is significantly worse. Dim, Redim Preserve, ByRef, arrays that are basically C arrays (see also Redim), a ridiculous Object/Value dichotomy (for those that aren't aware, you need to use the Set keyword to assign an object to a variable, but not for a value; the distinction bleeds all over your code and makes things hurt) and friends are evil sons of bitches that don't belong in a scripting language.

Hello pot. This is kettle. You're black.

Would you like to expand on your claim? Be quite clear that I dislike both languages. VBS is the worse of the two though.

Classic ASP is a whole class of awful all to itself. PHP has far more in common with other scripting languages than it does with Classic ASP. The fact that they both support embedding code into HTML is their only similarity.

Wasabi >> Classic ASP, if that's what you're referring to. But it's a whole lot more, and on top of that it's proprietary. PHP isn't, and that alone makes it more sane.

> No matter how cool it is to work at Fog Creek, it's very 1965 to work in a proprietary language, especially for web development

So you're basically against the use of Domain Specific Languages on the whole? Or only those whose syntax you don't like?

When the domain is general web development, I'm against it. It's an instance of the "right tool for the job"-problem.

If you're hiring developers based on experience with a specific language, you're probably doing it wrong anyway.

Hire based on experience with a specific paradigm. OO. Functional. Etc. Or experience with a specific domain problem. Compilers. DSLs. Operating Systems. Etc.

I wish someone would invent some sort of C-like language that wasn't as complicated.

Maybe it could even compile itself on the fly to machine code to continually optimize itself!

Javascript? :)

We Fog Creek developers were just mentioning this hilarious possibility.

There's a pretty big difference between converting a very widely used language to C++ for performance reasons and inventing an entirely new language (which converts to some other, very similar language) for the purposes of tackling the unique and fundamental problems that arise when writing a bug tracker. Difference being only one of these is tremendously silly.

Well, from what I've heard it's more like FB had a mountain of PHP they couldn't ever rewrite, so they made PHP faster by turning it into C++.

Fog Creek rewrote everything in a brand new language they invented that compiles down into multiple other languages.

So, in actuality, the projects are the exact opposite of each other.

Cute, but completely wrong.

Wasabi started out as a VBScript -> PHP compiler, allowing us to keep the existing FogBugz source, but run it on Unix and Linux, where ASP wasn't available. Once we had that, it hit us that it'd be pretty easy to add a VBScript -> VBScript compiler, at which point we could add language extensions. So we did. This allowed new code to take advantage of things like macros, better declaration syntax, lambdas and so on, without rewriting any existing code. Finally, because it was clear that ASP was dead, and specifically because we didn't want to rewrite FogBugz from scratch, we modified Wasabi to target .NET.

The whole point of Wasabi was to avoid rewriting FogBugz to support other platforms and technologies. I have no idea where you got the idea we rewrote the entire program; if we'd done that, we'd have just done it in C#.

I have no idea where you got the idea we rewrote the entire program ...

I think the mythology of Wasabi has become larger than life and pretty heavily distorted.

Yep. Mainly because people like to think that Joel is a completely incompetent idiot that quite naturally would do something like inventing a new language for absolutely no reason whatsoever, never mind the fact that he's running a successful company making a successful product.

Fog Creek didn't really rewrite anything. In fact, that was the whole point of Wasabi, to not rewrite anything.

Wasabi did extend VBScript to add objects and other language features, which were gradually incorporated into the code base, but just on changes going forward.

So, really, it is almost exactly the same thing.

Except for the part where Facebook runs this in a production environment to serve hundreds of millions of requests and Wasabi is a toy.

Except for the part where Wasabi powers FogBugz, the main product of Fog Creek Software, and enough people pay for this software to keep that company running very profitably for 10+ years.

Hmmm, getting nasty there

Both companies had huge investments in a code base and found a geeky way to keep their code base while reaping modern benefits.

Most things are toys compared to Facebook, Google or Twitter. That doesn't make them pointless.

What is nasty about pointing out this difference in scope?

Wasabi is an app-specific dialect of VBScript used by one company to deploy a bug tracker to small teams. HipHop is a general-purpose PHP compiler whose output is used at scale by the world's largest social networking site.

I can write a file system that will work fine on my computer, with my workload, and maybe even write an essay about how productive that makes me. But I don't think you'd want to install it.

These two projects are not in the same league - they're not even in the same sport - and so I am baffled to see Wasabi come up in this thread at all.

You've introduced a new idea here by comparing FogBugz to Facebook.

Those earlier comments are comparing the strategy that Fog Creek took versus the one that Facebook has taken. The strategies are similar, and so the comparison makes sense.

It should be noted that Wasabi is a general purpose programming language. It is Turing complete (up to relatively trivial finite-memory issues, just like any other general purpose programming language). Like C, for example. It should also be noted that Wasabi is a .NET language. It has full access to the .NET Framework and all of its classes.

One could use Wasabi to write a C compiler.

[NB: I am a Fog Creek developer, working on the FogBugz team.]

What is nasty about pointing out this difference in scope?

In this case, your tone.

[citation needed]

What evidence do you have that Wasabi is a toy? I seriously doubt you have any knowledge on the topic either way.

Actually, the point of Wasabi was to avoid rewriting anything.

What's even worse is Fog Creek uses .NET now, so it makes even less sense to have your own proprietary language.

So it converts PHP into C++?!? Woaw.

Joey Lawrence?!

I see this as being useful if you already have these two things:

1) A large PHP codebase and

2) A CPU-bound app with scaling issues

If you're starting a new app, I don't see why you would choose PHP. And I think most web apps are database-bound rather than CPU-bound. HipHop seems incredibly useful for Facebook, but are other people here on HN excited about using it?

I think this is where many run into issues with this story. This project was never really meant to be something to help the PHP community as a whole. It was meant to be a tool to help Facebook and to that end they've succeeded very well. 3 engineers + 2 years = 50% CPU improvement without having to rewrite the whole PHP codebase from scratch? That's a great achievement for Facebook but YMMV.

Seriously. $100k * 3 devs * 2 years = $600k to buy half as many app servers in the next year(s)? They saved Facebook tens of millions of dollars. I should imagine their next performance evaluation will be quite smooth.

This is what people mean when they talk about great hackers being 10x as productive...

It would have been a great freedom for the devs; at a company of Facebook's size it wouldn't have been disastrous had they not achieved a great enough performance boost to justify its use.

I have a CLI application that's IO-bound. I'm extremely excited. Furthermore, I'm sure this is exciting to anyone who has deployed Memcache, and now has an application that is CPU-bound.

I wish memcache changed my IO bound app to CPU bound.

You can optimize on multiple fronts. Just because Facebook needs to scale data access doesn't mean it shouldn't scale processing time too. As the post mentioned, this has had a huge impact. [I work at Facebook, comments on this site are my own opinion, and I'm not intimately involved in the project]

But generally I agree about new projects. If you have a favorite language, use it.

There is no need to switch to PHP only for this reason, but there is a huge number of apps already using PHP. It will be a great benefit to them. Also, it will become another factor for people considering developing a new application: if a language can quickly be converted to C++, this is easier than writing extensions in C/C++ directly.

This sounds rather similar in principle to Python's Shedskin compiler, which turns ordinary Python code into C++ code, using a C++-based runtime in place of the ordinary Python runtime. This works extremely well on a certain class of problems. http://code.google.com/p/shedskin/

Does this make more sense than targeting LLVM? It doesn't seem easier to maintain, and you won't get the advantages of the optimizations/work already done.

According to the Unladen Swallow team [1], LLVM's "just-in-time infrastructure was relatively untested and buggy" when they started working with it. So, I think not using LLVM was a pragmatic decision. In addition, debugging C++ code is much easier than dealing with the intricacies of dynamically generated machine code.

[1]: http://www.python.org/dev/peps/pep-3146/#performance-retrosp...

Yes, but it's not now. Unladen Swallow isn't exactly alone in blazing this trail, Rubinius is using LLVM as well.

Also MacRuby.

HipHop is a static compiler, though. That part of LLVM seems to have been heavily exercised by GCC and Clang.

Well, maybe the Facebook guys were not only after speed optimization, but memory too. No matter how great the LLVM JIT is, it probably uses some memory to compile the code, thereby wasting some. Static compilation would not waste that memory.

I'm actually a proponent of JIT over static compilation, but only for my little pet projects.

I don't know much about the web, but for game development, static compilation is sometimes the only way to go (or script interpretation). Sometimes there is just not enough memory, and sometimes it's restricted (hell, it's so restricted that you can't change the VTABLE at runtime in a C++ class even if you want to).

Was just thinking the same thing. "Man, I guess it works, but isn't it weird that they're generating what's probably hard to read C++ just to get fast machine code? Eventually they should make a language just for the purpose of optimizing. Oh wait."

I doubt he was somehow ignorant of the fact that g++ optimizes its generated code. I believe his point was that great work has already been done on optimizing compilers for dynamic languages targeting LLVM (c.f., Unladen Swallow and Rubinius).

I wonder if it'll let you convert things out of the box or if it was optimized for the way php is written at facebook. (eg. it doesn't support eval() but eval() is bad anyway)

Also, how hard would it be to do the same thing for Ruby and Perl? I'm sure there have been other attempts at compilers, but as noted elsewhere, having FB behind it makes HipHop more stable...

Oh, and at least one thing from this anonymous interview came true: http://therumpus.net/2010/01/conversations-about-the-interne...

I wrote a PHP->C++ compiler and bits and pieces of the standard library many moons ago (BinaryPHP: http://sourceforge.net/projects/binaryphp/ ), and it really wasn't terribly difficult, even for a complete compiler novice. We didn't do the fancy optimization that HipHop does, but it worked fairly well at the time. The actual compiler would probably only take a month or two (disregarding real optimizations) for a talented developer or two, but the runtime (assuming it doesn't share anything with PHP proper -- don't know if this does or not) would take much, much longer.

Doing it for Ruby or Python would be fairly straightforward to get running initially, but doing Perl<=5 would be impossible (for all cases) due to it being proven impossible to statically parse.

You can compile any code - full static parsing doesn't have anything to do with that. If you have a place that is not possible to parse one way, you do a switch on the condition that causes the parsing choice. Then you compile each branch assuming that code was parsed in a specific way.
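A minimal sketch of that idea in Python, using a made-up toy ambiguity rather than real Perl: the source text `name - 1` means a call `name(-1)` if `name` turns out to be a function, and a subtraction otherwise. All names here are hypothetical. The compiler prepares both variants ahead of time and leaves only the choice between them to run time.

```python
def compile_ambiguous(name):
    """Compile the toy source `name - 1` without knowing how it should parse."""

    def as_call(env):
        # Parse 1: `name` is a function, so the source means name(-1).
        return env[name](-1)

    def as_subtraction(env):
        # Parse 2: `name` is a number, so the source means name - 1.
        return env[name] - 1

    def compiled(env):
        # The switch on the condition that decides the parse happens at run time;
        # both branches were already compiled under their own assumption.
        if callable(env[name]):
            return as_call(env)
        return as_subtraction(env)

    return compiled


run = compile_ambiguous("x")
print(run({"x": 10}))   # x is a number
print(run({"x": abs}))  # x is a function
```

The cost is that the generated code grows with every ambiguous spot, so this only stays practical when such spots are rare.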

how hard would it be to do the same thing for Ruby... ?

Much harder, but not impossible. The two issues are (a) how much of Ruby the language can be compiled to C++ using Type Inference, and (b) For the stuff that can't be compiled, how often does it show up in production code?

FB's PHP solution relies on having a large code base with few instances of code that doesn't compile cleanly.

With Ruby, I suspect both issues are a problem. I think there's a lot more of Ruby the language that will be difficult to infer, and furthermore I think Ruby programmers tend to use those features a lot.

That's just a guess, mind you. You could probably do a lot if you make some major breaks with tradition. For example, if instead of compiling Ruby source code you load Ruby classes, run all the meta-programming, then reverse-engineer the source (ParseTree can do this with Ruby 1.8 but not Ruby 1.9), you could basically do the meta-programming in interpreted mode but run the resulting code in compiled mode.

I suspect that would work well for a lot of Rails programs where the hairiest stuff is run when classes are loaded, not on the fly when responses are being served.

Wouldn't the challenge in Ruby be the prevalence of things you just can't and don't do in PHP? Things like instance_eval, class_eval, etc....in Ruby projects I've worked on (and especially in Rails), these are all over the place.

Of course, assuming these calls all happen on the front end (source load) and not downstream in response to some triggering event, your approach would probably work.

`instance_eval` and `class_eval` are very hairy if you pass strings (which you MUST do for some kinds of template metaprogramming). But they aren't hairy at all if you pass blocks, where all you are doing is (essentially) setting `this`.

I really could not justify not doing web stuff in PHP anymore. PHP is THE web language. No, Ruby is not going to take over, and no, python is not going to take over.

Those two would have been the main contenders for the space, but they both had their shot and their 15 minutes of fame, and rather than rapidly growing to dominate web development, they nichified.

PHP however is being improved, being cleaned up, and most importantly, being speed optimised.

PHP is what you can trivially easily outsource - try to do that with python or ruby.

This right here is basically the move that is going to cement PHP as the web language. The largest sites are using PHP and are going to stay with PHP. Nothing will take over from PHP. The battle is over.

PHP is what you can trivially easily outsource

Yes, you can... but be careful - you might end up with this: http://paste.lisp.org/display/76132

That's real production code I removed from an application I was hired to extend and debug. Eventually, I convinced the client to scrap it and let me write a replacement with Rails.

Sadly that resembles most of the ASP/PHP code I've ever worked with.

The very low barriers to entry of PHP and ASP means you have (had) a lot of very inexperienced folks writing code.

That said, I agree with the original point that PHP can be very productive. No compiles, no builds... just edit code/refresh browser to immediately see the effect of a change. That's nice if you have the discipline to not do stuff like the linked example.

It's important to note that the above-linked code could be written in any language that has printf or something like it. That includes Lisp, Python and Haskell.

Of course, people who know those languages don't write things like that. I think it's due to pg's "Python Paradox"; I'm inclined to think outsourcing is actually easier with "smart" languages because the percentage of bad providers will be very low.

I think one of the problems comes from trying to use PHP too heavily as a template language, and then people try and do the shortest thing possible even if their source looks horrible. (I know I've done some icky things in the past to make the HTML look nice at the expense of the PHP file looking worse, or vice versa.) It's very easy to fail in separating code and visible content in PHP, which is great for some cases but bad for others.

What I initially wrote to replace The Printf wasn't exactly pretty; it was a PHP function containing mostly HTML, with a few snippets of PHP interspersed. I consider that sort of thing excusable, especially if there's external time pressure. There's not much of an excuse for using an impenetrable 22-arg printf call to generate a table row though.

I think for people looking to outsource, it probably comes down to price: on average it's going to be cheaper to get some PHP that works than to get some Python that works.

Just look at rentacoder and the like; most of those people don't care how it's written or think about how much trouble the code base could give them in the future, as long as it gets done and works.

That hasn't been the case with most projects I work on, but most people I work with have been burned by taking the cheap route in the past.

Well, me neither, but just from some of the random bid invites you get sent, you can see that's what a lot of people are going for.

There are a few narrow cases where I'd choose to do a project in PHP, but very few.

It is possible to write nice clean code in PHP, but few apps I've ever seen do so. Most use includes like function calls all over the place, so there is no notion of what variables will be used or defined inside the included code.

Ruby to some extent, and especially Python, are designed so that your code looks good and is easy to read. Admittedly some people can write highly obfuscated Ruby, but that (fortunately) went out of style a few years ago.

The average programmer does not use Ruby or Python. He uses Java, PHP or C. And the programming world is dependent on the average programmer.

I think the problem is you state your opinion as absolute fact. Your model for programming appears to be 10,000 monkeys typing away at the specifications handed down from On High. PHP does work in that scenario. It's simple. It's easy to think in because it's so simple. You can do complex things, but it'll be by gluing infinitesimally simple parts together over and over and over again. (Think of PHP programs as an essay written by a 2nd grader. Lots of simple sentences. Lots of repetition.)

If all you have available is a bulk of mediocre people, then PHP is a great way of making sure they produce what you want right now. Of course, in this process, they will remain mediocre and never become more than your "average programmer" stereotype due to being forced to only think in the absolute simplest terms.

The writing is on the wall, javascript is THE web language, php will be THE legacy web language.

The train is just leaving the station, be sure to get the best seats.

I actually agree with you. Javascript is next in line. But it's not yet its turn. It will come soon, though, but what's the point in jumping off the dock on a boat when the boat has not arrived yet?

Are you serious? I'm not trying to be flippant, but I literally can't tell if you're joking or not.

Yes I am serious. The most widely used and supported web language is likely to remain the dominant language on the web and will not be surpassed by tiny niche languages.

That statement is one I would describe as reasonable.

That statement is indeed reasonable, but some of the other stuff you said wasn't. For instance, your assertion that Ruby and Python are "tiny niche languages" is false.

You also used hyperbole like: "Nothing will take over from PHP. The battle is over." It seems hard to believe that this was being said with a straight face.

I know how you like to jerk the Hacker News community around at times, so I thought it prudent to ask if you're being serious. :)

* http://news.ycombinator.com/item?id=1038779

I wish I could find more of these, but ruby and python are indeed tiny compared to PHP: http://www.zacker.org/files/market_share_graph.jpg

My judgement call is that the battle is over. I may be wrong, but I am on the right side of wrong. If I turn out to be wrong in the future, I'll look back and still think I made the right call given the data I had.

It has been a while since I decided to learn python and django. At the time I thought they would take over. Well, I was wrong. They have not - PHP has continued to grow faster than they have.

So this time my call is that PHP is the thing to stick to, particularly when I see moves like this one facebook has. In any case, this choice I made is the safe one. I can't lose by making this decision.

I wouldn't make a blanket assumption based on a graph like that. The number of PHP installations is higher for 2 useless reasons:

1. It is trivial to see if PHP is installed on a webserver. It is not so trivial to see if Python or Ruby is.

2. PHP has a strong presence in shared hosting, which is popular for hobbyist or low traffic websites.

When you take into account that big websites like Youtube, Reddit, and FriendFeed are written in Python, these graphs are downright useless.

I don't know anything about that graph, but I'd be happy to cede the point that there are many more installations of PHP than Ruby/Python on the web as long as you'll cede the point that this fact doesn't relegate them to "tiny niche languages." If you don't agree, let me know and I'll argue that point.

Obviously what you choose to use for development is your decision. That said, I am confused as to why it's so important to you that you use the most popular language. After a certain threshold of popularity, it doesn't seem that it should matter. PHP, Python, and Ruby all have the critical mass of developers and library support needed to render language popularity concerns irrelevant.

Okay, I'll cede that point. Python (don't know much about ruby) has the support you need to write your own web apps. But try to find people who you can pay to continue your work - very difficult and very expensive. That's my main reason.

That's a lovely graph you've got there from 2006.

I don't agree with you, but I find it sad that your comment has been voted down below 0. Is this what HN has come to? We just vote stuff we don't agree with down?

People might have voted the comment down because they thought it was obvious trolling. Personally I'm not interested in yet another conversation on merits of PHP vs other scripting languages.

That's the point of the karma system.

I can't tell if you're trolling, but that's not the point of the karma system; that being to vote up "good contributions" that you would like to see more of at HN and vote down "bad contributions" which you would like to see less of at HN, regardless of whether you agree with them or disagree with them.

Nice troll, but there are a ton of websites being developed in Ruby/Rails these days. I don't think it's nichified at all, and the number of job ads is going up, not down.

Ruby is being improved and sped up too, in case you hadn't noticed.

As always, use the best tool for the job. For an awful lot of small business sites - where PHP got its start - that is now Rails, and it's slowly emerging in larger projects too. I think Ruby will definitely give PHP a run for its money in the fullness of time.

That's what you believe. I believe something different. I believe that these languages had the chance, and it didn't happen. Technology rarely gets a second chance without some type of major change.

Yeah, that's why I said "I think ..". Your statements were in the tone of stating facts, hence my rebuttal.

Anyway, I think your dismissal of the newer languages is very premature. PHP took a long, long time to get where it is. 10 years ago you would have been saying "Perl has WON!!!"

You realize Python is older than PHP and Ruby is the same age right?

Yeah. PHP and Ruby are contemporaries; I shouldn't have described it as "newer". But I would argue that unlike PHP, which was crude but useful right from the beginning, Ruby took a long time to gestate. It's only really in the last few years that Ruby's made the transition from enjoyable academic plaything to serious tool. And let's face it, we needed to wait that long for computers to become fast enough to run it ... ; )

But now, there's an astonishing amount of development and innovation surrounding Ruby. Look at MacRuby, for example - if that isn't a vote of confidence I don't know what is. In my opinion, the language is only just beginning to properly hit its stride.

Who knows what will happen. Ruby might fail to gain critical mass, of course. But my point was that the OP's assertion that Ruby had tried and failed to make its mark is very premature. The Rails hype might have died down but from my perspective Ruby has more momentum than ever.

About MacRuby -- I think it will start infringing on Objective-C turf soon.

There's also Rubinius. I think an exhaustive test suite for the language spec is a good thing.

This is truly the PHP solution to the problem. Design a virtual machine that uses runtime information to generate highly-optimized native code? Nahh.... just regex the source into C++, compile with g++, and enjoy!

(To be fair, some language implementations have achieved good results with this method; notably Chicken scheme. But I was expecting something better and potentially more useful to the language implementation community at large.)

C++ is known to be fast, so what advantage could a more complex solution provide?

Not having to wait fifteen minutes for your app to compile and link, for one thing. But seriously, the C++ implementation comes at the cost of many of PHP's features; a "more complex solution" would allow those features to be kept. And while C++ is fast, a script translated into C++ is not going to be as fast as if you had written the application directly in C++, unless the compiler is very, very smart. (I have not seen any implementation that does this; GHC's native code generator produces faster code than its "via-C" code generator, for example.)

Also, static compilation precludes runtime-based optimization techniques, which has made a lot of code run "faster than C" (including C, ironically; see LLVM).

So anyway, all this is is a way to compile a limited subset of PHP to C++. In doing so, you have to write a special dialect of PHP, without PHP's easy deployability and code/test/debug cycle.

In that case, why are you using PHP to begin with, when a language implementation like GHC or SBCL would run faster and without any compromises?

Not even the Linux kernel takes 15 minutes to compile and link anymore. My desktop machine does it in under 10 minutes, but I realize you weren't being entirely serious.


Umm...note in the post the mention of HPHPi, which gives you the code/test/debug cycle of regular PHP but using (apparently) HipHop.

It seems odd to me that anyone thinks PHP is "easy".

The syntax is odd, as is the behavior of the built-in functions (many retain serious warts from older versions of PHP).

It's "easy" in the sense that you just edit one file and the webpage changes, but if that's a stumbling block for your developers then you have bigger problems.

PHP is "easy" because anyone can grab some code snippets, rewrite some mysql_query's and have a terrible little "my first dynamic website" going in a couple of hours.

Developing something on a large scale or even developing something of a normal size while maintaining good programming techniques and structure can be difficult.

It is really easy if you already have experience with C++. Also, it is not very different from UNIX shell, and it is clearer than Perl, so I don't get why people complain so much about it.

Everything is easy if you have experience with C++.

I fail to see how PHP is clearer than Perl. It may have fewer special characters, but everything is done with oddly-named functions that take positional arguments in an inconsistent order, return error codes instead of throwing exceptions, and randomly spew warning messages.

From a language design standpoint... there is no design. Perl may not be perfect either, but the design has been flexible enough to radically modify as the community sees fit. Or rather, how you choose to modify it, as most modifications are modular and only affect a single lexical scope. Oh.

> everything is done with oddly-named functions that take positional arguments in an inconsistent order, return error codes instead of throwing exceptions, and randomly spew warning messages

These are all anti-features for programmers who already know how to program but they're quite helpful or moot when you're a complete noob. You're going to be spending at least a third of your time in the manual so guessable function names aren't as important as findable ones, you'll be doing lots of "what the?" style debugging so verbose warnings right on the page save you one fewer step in the debugging process.

Additionally it's easy to find a PHP hosting provider for peanuts, it will happily scale to moderate use without overmuch cleverness architecturally and there are literally thousands of noob-level tutorials that get you from "I want to build a website" to something that works rapidly. You don't feel the pain of the language until you're pretty far down the road but as Flickr and (obviously) Facebook have shown it's not too difficult to limit the pain and keep growing.

There is just so much needless complexity and oddness. Consider the way arrays contain a pointer (so that they can work like a stack), which has crazy odd behavior (which changes depending on which version of PHP you use) when you copy the array.

Hmm, this is one of those rare occasions where something has been released where I have a domain name that could be useful - hiphoperator.com

Now I just have to figure out what to do with it :D

at least 1 youtube video

I can't wait to try this out on a large PHP codebase like Drupal and then run some benchmarks. It'd be interesting to see how much faster it is when compared to vanilla PHP vs opcode cached PHP.

I am sad to see Facebook helping to perpetuate PHP.

On the bright side though, even if HipHop yields a 50% speedup, it is unlikely to be faster than Python, Javascript V8, or LuaJit.



What's with the downvotes? I can't mention my opinions on php?

Use your imagination. Perhaps other languages can be compiled down with HipHop. Perhaps other languages can call PHP libraries compiled with HipHop. Then you might have an end to end solution to transition from PHP to Python.

This is obviously just a fantasy of mine. Back to coding PHP...

> Perhaps other languages can be compiled down with HipHop.

There's already something like HipHop for Python called Cython. You can compile any pure python code into a static C shared library (with the exception of generators).

I suspect for interoperability, using a single framework is the most likely to succeed. But this is the whole point of open source, right? Let the community enhance it in awesome ways.

Are you serious? A magical framework that translates PHP/Python/etc into C++ code? I've got to be misinterpreting what you're saying...

Then again, you might be drinking a bit too much of that Facebook juice... :)

Yes, I can easily imagine even people outside facebook adding to the HipHop effort in adding other languages. Easily imagined and easily executed aren't the same thing, unfortunately. By the way, I'm out of my element when it comes to compilers.

I can imagine it really being a very open project with outside contributors, but I'm not sure that the exact same tools will work with other languages.

For example, I think that dealing with something like Python would require a completely different set of translations because it is mainly dictionary based.

That's not to say that doing an X to C++ converter isn't possible or a good idea, I just think that the HipHop techniques may not apply to other languages very easily.

GitHub were crushed by the release of HipHop. Too bad, I wished to test the HipHop translator.

Their file servers crashed before the Hip Hop announcement, and Hip Hop hasn't even been pushed to GitHub yet, according to this announcement.


So, does this mean that PHP has officially forked? I mean who is going to want to keep writing in the interpreted version as opposed to the hiphop-able version? How many changes are there? I've only seen mentions of ditching eval, but there are probably more. Will people be debugging their code in both vanilla PHP and HPHP?

How well does this support other projects, like wordpress (for example)?

Another issue this raises is who is now in control of PHP? Not in legal terms, but in terms of mindshare. It's one thing for a student to write a PHP translator, but quite another for Facebook to do it. I'll be interested to see how Zend responds to this. At least when Google started unladen swallow, they had Guido working for them.

No, you'll probably see no changes in your day to day life as a result of this. There's no "interpreted version as opposed to the hiphop-able version", in terms of the code you write.

As the article mentions, almost no one is really running straight PHP, they're running it through FastCGI or are at least using an opcode cache like APC. So you could be swapping out APC for HPHP, but this is at a level of abstraction below the one you're working in.

Regarding future control of PHP, this does not change anything. It's a compiler for the language, but does not propose any changes to the language itself.

you'll probably see no changes in your day to day life as a result of this

I'm fairly confident about that... I gave up PHP years ago :)

Where is the link to download it or just the documentation? On the announcement page there is only a Wiki link (besides wikipedia entries and other php-to-other-lang compilers) to a Github page that redirects to github.com.

Some notes from Facebook's UStream event this evening: http://www.recessframework.org/page/notes-from-facebooks-hip...

High points:

the performance numbers were against PHP with APC op-code caching enabled

eval, create_function, and preg_replace with /e (basically eval), and conditional flow based on the existence of functions are the only features not supported.

Currently PHP5.2 with a roadmap of 5.2.12 compliant in the next month and 5.3 being the focus after that.
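For anyone wanting a quick first pass over a codebase for the unsupported features in those notes, here is a rough Python sketch. Everything in it is an assumption made for illustration: the regexes are crude line-based heuristics, treating a `function_exists` call as a sign of "conditional flow based on the existence of functions" is my guess at what that means, and this is not a tool that ships with HipHop. A real check would need a proper PHP parser.

```python
import re

# Crude, hypothetical patterns; they will miss cases and can misfire.
UNSUPPORTED_PATTERNS = {
    "eval": re.compile(r"\beval\s*\("),
    "create_function": re.compile(r"\bcreate_function\s*\("),
    # A quoted pattern whose trailing modifiers include "e".
    "preg_replace with /e": re.compile(
        r"""\bpreg_replace\s*\(\s*(['"])(.)[^'"]*?\2[a-zA-Z]*e[a-zA-Z]*\1"""
    ),
    "function_exists check": re.compile(r"\bfunction_exists\s*\("),
}


def find_unsupported(php_source):
    """Return (line_number, label) pairs for lines that look problematic."""
    findings = []
    for line_number, line in enumerate(php_source.splitlines(), start=1):
        for label, pattern in UNSUPPORTED_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, label))
    return findings


sample = r"""<?php
$total = array_sum($prices);
eval('$x = 1;');
$fn = create_function('$a', 'return $a * 2;');
echo preg_replace('/(\w+)/e', 'strtoupper("$1")', $text);
echo preg_replace('/(\w+)/i', 'x', $text);
if (function_exists('json_encode')) { echo json_encode($total); }
"""

findings = find_unsupported(sample)
for line_number, label in findings:
    print(f"line {line_number}: {label}")
```

On the sample above it flags lines 3, 4, 5 and 7 and leaves the ordinary `preg_replace` call on line 6 alone.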

What CGI mechanism are they using to host a blended PHP/C++ code base?

"It has it's own built-in web server when you compile it, via libevent." from Ben Ramsey who attended the demo at FB Campus.

I guess they transform complete PHP libraries to C++, they build them and then they "require" them from some other PHP code.

With HipHop we've reduced the CPU usage on our Web servers on average by about fifty percent, depending on the page.

Hmm. I'm guessing they're comparing it to their PHP stack that has already had the living daylights optimized out of it (with cached bytecode, etc.) and/or a large proportion of their pages doesn't hit much code (lots of stuff cached?). I'd have thought there would be more mileage in this.

After this, can we still make fun of C++? I use it daily and love it. Sure, it takes someone with a brain to use it, but once you learn C++, nothing else comes close for speed and expressiveness... seems Facebook just verified that. So while people who cannot code in C++ will still talk about it as if they know it, it continues to kick ass in solving real-world computing problems.

Facebook appears to be using C++ the way C++ devs use assembly. We can definitely continue making fun of it.

How templatey is the generated code? How big are the resulting binaries?

Amazon's original core platform (obidos) was written in C++ and deployed as one monolithic binary. I've heard that one of the major things pushing their Java rewrite (dp) was that the binary no longer fit comfortably into memory!

Just curious, but how is Java going to solve that problem? Now you've got the Java opcodes, plus any JIT information/native code, plus the JVM. Seems like that would be even larger!

Will Wikipedia, Wikia, and other major MediaWiki sites use this? (Does MediaWiki code use eval()?)

I would bet that Wikipedia is more database bound than CPU bound. HipHop would probably help, but not tremendously.

You reckon? I would have thought Wikipedia a textbook case for a great big cache. I would have guessed that the great majority of their views were read-only by unregistered users, with the pages in question following a power law distribution. Hm, would be interested to find out more about that.
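To make the power-law intuition concrete, here is a small Python sketch that computes what fraction of requests a cache would serve if it held only the most popular pages. The numbers (one million pages, a cache of ten thousand, a Zipf exponent of 1) are illustrative assumptions, not measurements of Wikipedia's traffic, and `zipf_hit_ratio` is a name made up for this example.

```python
def zipf_hit_ratio(total_pages, cached_pages, exponent=1.0):
    """Fraction of requests served by a cache holding the most popular pages.

    Assumes request frequency for the page of a given rank is proportional
    to 1 / rank**exponent (a Zipf-style power law).
    """
    weights = [1.0 / rank ** exponent for rank in range(1, total_pages + 1)]
    return sum(weights[:cached_pages]) / sum(weights)


# Illustrative numbers only: cache the top 1% of a million pages.
ratio = zipf_hit_ratio(total_pages=1_000_000, cached_pages=10_000)
print(f"hit ratio: {ratio:.2f}")
```

Under these assumptions, caching 1% of the pages covers roughly two thirds of the requests, which is why read-heavy sites with skewed popularity tend to benefit so much from caching.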

update: I went and looked it up. They do indeed have a great big cache, many of them in fact. In fact their Amsterdam presence runs only cache! From what I can tell, it seems their bottleneck is actually Apache, and presumably a lot of that is PHP.

Interesting presentation here: http://wikimania2009.wikimedia.org/w/index.php?title=Media:R...

Also check this out: http://ganglia.wikimedia.org/pmtpa/?gw=fwd&gs=Wikimedia%...

Go down to the MySQL section. Plenty of headroom, but the Apaches are working hard.

So... stupid question - Where's the code? It's the next day and no sign of it on github.

It would seem they forgot to make the project public on github.

Third paragraph from the bottom of the post says they'll be releasing the source in Github this evening (probably pacific time).

Ahhh, they should have held off on that wiki link.

If this makes a big difference to memory footprint, won't it be rather handy to embedded devs?

If this is adopted widely hosting costs will drop off considerably, no?

Will this be faster than, let's say... python?

They probably just use a python script and type

from php import cppCompiler

not sure why that took 2 years

Damnit - a cool project name wasted on a php to c++ compiler?? DJ Kool Herc is disgusted with Facebook.

Facebook's lack of engineering prowess continually amazes me.

There were quite a few other options aside from writing a translator.

Are you living in an alternate universe? Facebook are scarily competent. They have executed brilliantly and their engineering is top notch.

Agreed. I'm impressed by the great stuff coming out of Facebook. Cassandra, Scribe, Thrift, etc.

Really? You mean like their ultra-reliable chat server they wrote?

Or maybe the way their replication works, so when you post things get out of order depending on what server you are looking at?

With criticisms like that, I think your expectations are unrealistic. Considering the size of their userbase, their chat server is quite an accomplishment. And minor replication artefacts are hardly a big issue.

In short, if FB had incompetent or even "just" competent engineering, they would not have close to 400 million users. And yet they do, and I have never noticed the site performing slowly. They're not Google, fine, but they have an excellent team and if you're not impressed by their performance I wonder what you would be impressed by.

Having a large user base is not an indication of technical authority.

Myspace had (and still does to a lesser extent) an extremely large user base, and we know they had some stability issues (Honestly though, I've listened to their tech presentations and am more impressed than the things I've seen Facebook do).

No, I don't imagine Facebook is programmed by a bunch of incompetent monkeys, but compared to Google, Amazon, Yahoo, or many other companies that have had to deal with large scaling issues, their approach seems amateur.

"300,000 lines of code and more than 5,000 unit tests."

Both values sound good in theory, but are both meaningless. How is the code coverage? How many man months? How many developers?

I find it ironic that you feel LOC is meaningless and then asked for man-months, an even more useless unit of measurement :D
