"HipHop programmatically transforms your PHP source code into highly optimized C++"
Sounds like a compiler to me. Compilers don't have to target machine code.
This entire thread needs to go fall in a well and die. If they had just called it a compiler, you nerds would be arguing about the same thing and saying it was a source code transformer. They just called it that because it translates PHP to C++, and to most people who don't have Asperger's that's not the same as compiling.
How do I know this? I sit 20 feet from the guys who wrote it.
Anyway, the thing that got me was the "technically", otherwise I may have let it slide. You could argue it's not a compiler in the commonly accepted sense of the word, but technically it most certainly is a compiler.
Or so I'm told. I've never read Intel's source code :)
It's a cool hack, but it really targets Facebook-level needs. For most users, the downsides (e.g. compatibility issues) will far outweigh the benefits.
Also at least 90% of web apps are tiny or are startups that never made it, either way they don't need this for a long time, like you say.
My point is that for nearly all apps, executing PHP is a very small part of the total request handling time, so even a significant improvement will have little overall effect. I bet many people excited by HipHop could get a much bigger performance gain by adding some simple caching.
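To put rough numbers on that, here is a minimal Amdahl's-law style sketch. The 20% PHP share and the 2x speedup are made-up illustrative figures, not measurements from any real app, and `overall_speedup` is just a name picked for this example.

```python
def overall_speedup(php_fraction, php_speedup):
    """Estimate whole-request speedup when only the PHP share gets faster."""
    # Time left after the speedup, as a fraction of the original request time.
    remaining = (1.0 - php_fraction) + php_fraction / php_speedup
    return 1.0 / remaining

# Hypothetical request: 20% of the time in PHP, the rest waiting on DB/network.
print(round(overall_speedup(0.20, 2.0), 3))  # 1.111

# Hypothetical CPU-bound request: 90% of the time in PHP.
print(round(overall_speedup(0.90, 2.0), 3))  # 1.818
```

So under these assumed numbers, doubling PHP speed buys about 11% for the first app and about 82% for the second, which is why the PHP share of request time matters so much here.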
"scalability is a desirable property of a system, a network, or a process, which indicates its ability to either handle growing amounts of work in a graceful manner or to be readily enlarged. For example, it can refer to the capability of a system to increase total throughput under an increased load when resources (typically hardware) are added."
Say you have 10 servers now and you expect business to grow by 10x in next year, so you put 6 months of dev. time into implementing caching. Then you get 10x the traffic in a month and you get V.C. investment as a result. You then have a problem, people are getting a bad experience and the development won't be done for another 5 months. Your CEO buys a rack full of servers, but you can't do anything with them because you don't have a scalable system. Alternatively you could find out after 6 months that you can't make that 10x performance jump after all.
Twitter had this problem a while back: essentially their DB server got overloaded, but it was already an 8-CPU, 64GB machine, and despite having lots of money they couldn't quickly solve the problem, because you couldn't really get a bigger system.
Facebook knows how to scale, they want to cut costs where they can to become (more) profitable.
scalability = (efficiency(N_2) - efficiency(N_1))/(N_2 - N_1)
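As a quick sketch of that formula: the comment doesn't define efficiency, so the example below assumes it means requests/s per server, and the measurements are invented for illustration.

```python
def scalability(n1, eff1, n2, eff2):
    """Slope of efficiency between two cluster sizes N_1 and N_2."""
    if n1 == n2:
        raise ValueError("need two different node counts")
    return (eff2 - eff1) / (n2 - n1)

# Hypothetical measurements: requests/s per server at 10 and at 20 servers.
print(scalability(10, 300.0, 20, 280.0))  # -2.0
```

Read this way, a value near zero would mean per-server efficiency holds up as machines are added, and a negative value means each extra server costs a little efficiency.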
Running it on the developer's workstation w/ APC and Memcache and a local Percona MySQL, it maxed out at 300 requests/s. (It's just a workstation, remember.) Anyway, there may be some observer effects here, but the profiler indicated that most of the time was being spent on PHP functions and system calls.
So we might be in the 1% you're describing. Or _maybe_ it's more than 1% after all.
The other application where this may be helpful is the cloud. In a system like this you have 100s or 1000s of sites on a single machine that may reach the combined traffic where this optimization can come into play.
Currently works off of the PHP 5.2 language specs
Compiles to a stand-alone libevent-driven web server or CLI executable
Not based on Zend's runtime/code
Currently no windows support
Intended for 'drop in' PHP replacement - if it runs on PHP and doesn't use a few things like eval() it will compile with HipHop
The PHP standard library has a huge number of functions that are mostly useless permutations of one another. As opposed to, say, Erlang's meager but usable lists and string module, PHP has such high-level meta-randomness as stristr() (case-insensitive search), str_word_count(), and str_shuffle(). So if you were going to convert that directly to C++, there wouldn't necessarily be a direct replacement in strings.h. Therefore I'd guess they either have to reimplement all of those in PHP, and compile that to C++, or write them in C++. Furthermore, some of these functions have unusual quirks, so some kind of framework would be required I think.
Still, I imagine they must use a lot of the same code - half the PHP standard library is just a thin wrapper of some C library, and reimplementing all of them sounds stupid.
Then again, they may just not support the majority of such libraries.
This sounds a lot like Python's 2to3. This technique worked so well that the Python 2 series is a distant memory of the past.
(Oh wait... the opposite.)
Didn't people accuse Joel Spolsky of jumping the shark for doing that kind of thing (Wasabi script => PHP or VBScript)?
.....wait this means that Wasabi can now technically convert into C++! :D
Facebook compiles from a "sane" language (PHP) to a less sane, but twice as fast language (C++). The alternative was to get their devs to work in C++, which they decided was an unattractive solution -- this seems reasonable.
P.S. If the company you're applying to is upset by you having worked in a proprietary language, they're hiring for the wrong things.
In a perfect world, sure. But in the real world, the guy with actual experience in the language/framework used gets an interview first. Then the guy with experience in any language/framework the interviewer immediately recognizes as relevant. Then the guy with experience in something so obscure it's not even on Wikipedia (which Wasabi wouldn't be had Joel and Fog Creek not been famous).
If there's some company out there that doesn't want to hire people capable of writing their own compilers to more easily add cross-platform support to their software the guys at Fog Creek aren't going to want to work there anyway.
If they'd been working in vanilla VBScript in 2010, it'd be even more insane, and also, I think Joel would have some serious problems holding on to employees.
C++ is much faster than PHP, not just twice as fast.
If a higher-level language lets you implement a smarter algorithm, your program will run faster.
We can say that "C++ is faster than PHP", because most, if not all, programs using the same algorithms will run faster when written in C++ and compiled to native code.
> If a higher-level language lets you implement a smarter algorithm, your program will run faster.
PHP is generally easier to master than C++, but both are high level enough to provide adequate tools to write efficient algorithms with a comparable amount of effort, provided one is competent with the language at hand.
It's obvious that wasabi has been good to Fog Creek, but that doesn't make it a sane decision, regardless of the quality of the syntax and semantics of the language itself.
>> Need to find a PHP developer? No problem.
Not being able to hire just anyone to code for you is a good thing just as much as it is bad. It means you're going to have some pretty bright people working for you, which apparently is something Fog Creek values and can afford to do.
> Want a PHP IDE? No problem.
Most hackers have their own preferred editors anyways, and Emacs/vi/etc work well with any language if that's the editor you're experienced with.
>> A large community to handle bugs and any future development? Yes.
Again, that's a hindrance just as much as it is a benefit. If you need to make a change or fix a bug in the language, you're going to have much better luck if you know a bunch of people who are intimately familiar with how that language works, especially if they're right down the hall from you.
Not that it matters anyhow: for most roles you neither need nor can afford to hire the best and brightest away from their current role, which they probably love a lot (being the best gives you the ability to choose great places to work). So unless you're looking for a tech lead, you're going to be looking for great, much-better-than-average people but not specifically the best, and for those conditions PHP will give you a much larger developer base to choose from every day of the week.
As for the last point I think you're really reaching here to find a negative. A large community to handle bugs a hindrance? Seriously? And your alternative ideal solution is some people down the hall in your office that know how to use the language well?
The vast majority of people will never have the opportunity to be down the hall from that guy and for the vast majority of people a large helpful community will be infinitely more valuable.
Which means what to Fog Creek?
(I'm dealing with this in my work -- we use a proprietary internal scripting language, and the guy who wrote it is not around. I'm slowly becoming the local expert on the language -- very slowly, though, because opportunities to work on it don't come up too often.)
So you're basically against the use of Domain Specific Languages on the whole? Or only those whose syntax you don't like?
Hire based on experience with a specific paradigm. OO. Functional. Etc. Or experience with a specific domain problem. Compilers. DSLs. Operating Systems. Etc.
Maybe it could even compile itself on the fly to machine code to continually optimize itself!
Fog Creek rewrote everything in a brand new language they invented that compiles down into multiple other languages.
So, in actuality, the projects are the exact opposite of each other.
Wasabi started out as a VBScript -> PHP compiler, allowing us to keep the existing FogBugz source, but run it on Unix and Linux, where ASP wasn't available. Once we had that, it hit us that it'd be pretty easy to add a VBScript -> VBScript compiler, at which point we could add language extensions. So we did. This allowed new code to take advantage of things like macros, better declaration syntax, lambdas and so on, without rewriting any existing code. Finally, because it was clear that ASP was dead, and specifically because we didn't want to rewrite FogBugz from scratch, we modified Wasabi to target .NET.
The whole point of Wasabi was to avoid rewriting FogBugz to support other platforms and technologies. I have no idea where you got the idea we rewrote the entire program; if we'd done that, we'd have just done it in C#.
I think the mythology of Wasabi has become larger than life and pretty heavily distorted.
Wasabi did extend VBScript to add objects and other language features, which were gradually incorporated into the code base, but just on changes going forward.
So, really, it is almost exactly the same thing.
Both companies had huge investments in a code base and found a geeky way to keep their code base while reaping modern benefits.
Most things are toys compared to Facebook, Google or Twitter. That doesn't make them pointless.
Wasabi is an app-specific dialect of VBScript used by one company to deploy a bug tracker to small teams. HipHop is a general-purpose PHP compiler whose output is used at scale by the world's largest social networking site.
I can write a file system that will work fine on my computer, with my workload, and maybe even write an essay about how productive that makes me. But I don't think you'd want to install it.
These two projects are not in the same league - they're not even in the same sport - and so I am baffled to see Wasabi come up in this thread at all.
Those earlier comments are comparing the strategy that Fog Creek took versus the one that Facebook has taken. The strategies are similar, and so the comparison makes sense.
It should be noted that Wasabi is a general purpose programming language. It is Turing complete (up to relatively trivial finite-memory issues, just like any other general purpose programming language). Like C, for example. It should also be noted that Wasabi is a .NET language. It has full access to the .NET Framework and all of its classes.
One could use Wasabi to write a C compiler.
[NB: I am a Fog Creek developer, working on the FogBugz team.]
In this case, your tone.
What evidence do you have that Wasabi is a toy? I seriously doubt you have any knowledge on the topic either way.
1) A large PHP codebase and
2) A CPU-bound app with scaling issues
If you're starting a new app, I don't see why you would choose PHP. And I think most web apps are database-bound rather than CPU-bound. HipHop seems incredibly useful for Facebook, but are other people here on HN excited about using it?
This is what people mean when they talk about great hackers being 10x as productive...
But generally I agree about new projects. If you have a favorite language, use it.
I'm actually a proponent of JIT over static compilation, but only for my little pet projects.
I don't know much about the web, but for game development, static compilation (or script interpretation) is sometimes the only way to go. Sometimes there is just not enough memory, and sometimes it's restricted (hell, it's so restricted that you can't change the vtable of a C++ class at runtime even if you want to).
Also, how hard would it be to do the same thing for Ruby and Perl? I'm sure there have been other attempts at compilers, but as noted elsewhere, having FB behind it makes HipHop more stable...
Oh, and at least one thing from this anonymous interview came true: http://therumpus.net/2010/01/conversations-about-the-interne...
Doing it for Ruby or Python would be fairly straightforward to get running initially, but doing Perl<=5 would be impossible (for all cases) due to it being proven impossible to statically parse.
Much harder, but not impossible. The two issues are (a) how much of Ruby the language can be compiled to C++ using Type Inference, and (b) For the stuff that can't be compiled, how often does it show up in production code?
FB's PHP solution relies on having a large code base with few instances of code that doesn't compile cleanly.
With Ruby, I suspect both issues are a problem. I think there's a lot more of Ruby the language that will be difficult to infer, and furthermore I think Ruby programmers tend to use those features a lot.
That's just a guess, mind you. You could probably do a lot if you make some major breaks with tradition. For example, if instead of compiling Ruby source code you load Ruby classes, run all the meta-programming, then reverse-engineer the source (ParseTree can do this with Ruby 1.8 but not Ruby 1.9), you could basically do the meta-programming in interpreted mode but run the resulting code in compiled mode.
I suspect that would work well for a lot of Rails programs where the hairiest stuff is run when classes are loaded, not on the fly when responses are being served.
Of course, assuming these calls all happen on the front end (source load) and not downstream in response to some triggering event, your approach would probably work.
Those two would have been the main contenders for the space, but they both had their shot and their 15 minutes of fame, and rather than rapidly growing to dominate web development, they nichified.
PHP however is being improved, being cleaned up, and most importantly, being speed optimised.
PHP is what you can trivially easily outsource - try to do that with python or ruby.
This right here is basically the move that is going to cement PHP as the web language. The largest sites are using PHP and are going to stay with PHP. Nothing will take over from PHP. The battle is over.
Yes, you can... but be careful - you might end up with this: http://paste.lisp.org/display/76132
That's real production code I removed from an application I was hired to extend and debug. Eventually, I convinced the client to scrap it and let me write a replacement with Rails.
The very low barriers to entry of PHP and ASP means you have (had) a lot of very inexperienced folks writing code.
That said, I agree with the original point that PHP can be very productive. No compiles, no builds... just edit code/refresh browser to immediately see the effect of a change. That's nice if you have the discipline to not do stuff like the linked example.
Of course, people who know those languages don't write things like that. I think it's due to pg's "Python Paradox"; I'm inclined to think outsourcing is actually easier with "smart" languages because the percentage of bad providers will be very low.
Just look at rentacoder and the like: most of those people don't care how it's written or think about how much trouble the code base could give them in the future, as long as it gets done and works.
It is possible to write nice clean code in PHP, but few apps I've ever seen do so. Most use includes like function calls all over the place, so there is no notion of what variables will be used or defined inside the included code.
Ruby to some extent, and especially Python, are designed so that your code looks good and is easy to read. Admittedly some people can write highly obfuscated Ruby, but that (fortunately) went out of style a few years ago.
If all you have available is a bulk of mediocre people, then PHP is a great way of making sure they produce what you want right now. Of course, in this process, they will remain mediocre and never become more than your "average programmer" stereotype due to being forced to only think in the absolute simplest terms.
The train is just leaving the station, be sure to get the best seats.
That statement is one I would describe as reasonable.
You also used hyperbole like: "Nothing will take over from PHP. The battle is over." It seems hard to believe that this was being said with a straight face.
I know how you like to jerk the Hacker News community around at times, so I thought it prudent to ask if you're being serious. :)
My judgement call is that the battle is over. I may be wrong, but I am on the right side of wrong. If I turn out to be wrong in the future, I'll look back and still think I made the right call given the data I had.
It has been a while since I decided to learn Python and Django. At the time I thought they would take over. Well, I was wrong. They have not - PHP has continued to grow faster than they have.
So this time my call is that PHP is the thing to stick to, particularly when I see moves like this one from Facebook. In any case, this choice I made is the safe one. I can't lose by making this decision.
1. It is trivial to see if PHP is installed on a webserver. It is not so trivial to see if Python or Ruby is.
2. PHP has a strong presence in shared hosting, which is popular for hobbyist or low traffic websites.
When you take into account that big websites like Youtube, Reddit, and FriendFeed are written in Python, these graphs are downright useless.
Obviously what you choose to use for development is your decision. That said, I am confused as to why it's so important to you that you use the most popular language. After a certain threshold of popularity, it doesn't seem that it should matter. PHP, Python, and Ruby all have the critical mass of developers and library support needed to render language popularity concerns irrelevant.
Ruby is being improved and sped up too, in case you hadn't noticed.
As always, use the best tool for the job. For an awful lot of small business sites - where PHP got its start - that is now Rails, and it's slowly emerging in larger projects too. I think Ruby will definitely give PHP a run for its money in the fullness of time.
Anyway, I think your dismissal of the newer languages is very premature. PHP took a long, long time to get where it is. 10 years ago you would have been saying "Perl has WON!!!"
But now, there's an astonishing amount of development and innovation surrounding Ruby. Look at MacRuby, for example - if that isn't a vote of confidence I don't know what is. In my opinion, the language is only just beginning to properly hit its stride.
Who knows what will happen. Ruby might fail to gain critical mass, of course. But my point was that the OP's assertion that Ruby had tried and failed to make its mark is very premature. The Rails hype might have died down but from my perspective Ruby has more momentum than ever.
There's also Rubinius. I think an exhaustive test suite for the language spec is a good thing.
(To be fair, some language implementations have achieved good results with this method; notably Chicken scheme. But I was expecting something better and potentially more useful to the language implementation community at large.)
Also, static compilation precludes runtime-based optimization techniques, which has made a lot of code run "faster than C" (including C, ironically; see LLVM).
So anyway, all this is is a way to compile a limited subset of PHP to C++. In doing so, you have to write a special dialect of PHP, without PHP's easy deployability and code/test/debug cycle.
In that case, why are you using PHP to begin with, when a language with a compiler like GHC (Haskell) or SBCL (Common Lisp) would run faster and without any compromises?
The syntax is odd, as is the behavior of the built-in functions (many retain serious warts from older versions of PHP).
It's "easy" in the sense that you just edit one file and the webpage changes, but if that's a stumbling block for your developers then you have bigger problems.
Developing something on a large scale or even developing something of a normal size while maintaining good programming techniques and structure can be difficult.
I fail to see how PHP is clearer than Perl. It may have fewer special characters, but everything is done with oddly-named functions that take positional arguments in an inconsistent order, return error codes instead of throwing exceptions, and randomly spew warning messages.
From a language design standpoint... there is no design. Perl may not be perfect either, but the design has been flexible enough to radically modify as the community sees fit. Or rather, how you choose to modify it, as most modifications are modular and only affect a single lexical scope. Oh.
These are all anti-features for programmers who already know how to program, but they're quite helpful or moot when you're a complete noob. You're going to be spending at least a third of your time in the manual, so guessable function names aren't as important as findable ones, and you'll be doing lots of "what the?" style debugging, so verbose warnings right on the page save you a step in the debugging process.
Additionally it's easy to find a PHP hosting provider for peanuts, it will happily scale to moderate use without overmuch cleverness architecturally and there are literally thousands of noob-level tutorials that get you from "I want to build a website" to something that works rapidly. You don't feel the pain of the language until you're pretty far down the road but as Flickr and (obviously) Facebook have shown it's not too difficult to limit the pain and keep growing.
Now I just have to figure out what to do with it :D
What's with the downvotes? I can't mention my opinions on php?
This is obviously just a fantasy of mine. Back to coding PHP...
There's already something like HipHop for Python called Cython. You can compile any pure python code into a static C shared library (with the exception of generators).
Then again, you might be drinking a bit too much of that Facebook juice... :)
For example, I think that dealing with something like Python would require a completely different set of translations because it is mainly dictionary based.
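A small sketch of what "dictionary based" means in practice (plain CPython behavior; the `User` class is just a made-up example). Because fields and methods live in dictionaries that can change at runtime, a translator can't easily pin down a fixed C++ struct layout ahead of time.

```python
class User:
    def __init__(self, name):
        self.name = name

u = User("alice")
print(u.__dict__)          # {'name': 'alice'}

# Attributes can appear at runtime, so the set of fields is not fixed up front.
u.__dict__["karma"] = 42
print(u.karma)             # 42

# Methods are found by name on the class as well, and can be added or replaced later.
User.greet = lambda self: "hi " + self.name
print(u.greet())           # hi alice
```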
That's not to say that doing an X to C++ converter isn't possible or a good idea, I just think that the HipHop techniques may not apply to other languages very easily.
How well does this support other projects, like WordPress (for example)?
Another issue this raises is who is now in control of PHP? Not in legal terms, but in terms of mindshare. It's one thing for a student to write a PHP translator, but quite another for Facebook to do it. I'll be interested to see how Zend responds to this. At least when Google started unladen swallow, they had Guido working for them.
As the article mentions, almost no one is really running straight PHP, they're running it through FastCGI or are at least using an opcode cache like APC. So you could be swapping out APC for HPHP, but this is at a level of abstraction below the one you're working in.
Regarding future control of PHP, this does not change anything. It's a compiler for the language, but does not propose any changes to the language itself.
I'm fairly confident about that... I gave up PHP years ago :)
the performance numbers were against PHP with APC op-code caching enabled
eval, create_function, preg_replace with /e (basically eval), and conditional flow based on the existence of functions are the only features not supported.
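If you wanted a rough idea of whether a codebase trips over that list, a naive scan along these lines might be a starting point. This is only a sketch: it is not part of HipHop, it is plain text matching rather than a PHP parser (so comments and string literals can cause false hits), and treating `function_exists()` as the marker for "conditional flow based on the existence of functions" is my assumption.

```python
import re

# Rough textual patterns for the constructs listed above (all hypothetical
# heuristics, not HipHop's own checks).
UNSUPPORTED = {
    "eval": re.compile(r"\beval\s*\("),
    "create_function": re.compile(r"\bcreate_function\s*\("),
    # Assumes a '/'-delimited pattern in a quoted literal, e.g. '/x/e'.
    "preg_replace /e": re.compile(
        r"\bpreg_replace\s*\(\s*(['\"])/.*?/[a-zA-Z]*e[a-zA-Z]*\1"
    ),
    "function_exists": re.compile(r"\bfunction_exists\s*\("),
}

def find_unsupported(php_source):
    """Return (line number, construct name) pairs for suspicious lines."""
    hits = []
    for lineno, line in enumerate(php_source.splitlines(), start=1):
        for name, pattern in UNSUPPORTED.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits

sample = """<?php
$x = eval('return 1;');
echo preg_replace('/(\\d+)/e', '$1 * 2', $s);
if (function_exists('json_encode')) { echo 'ok'; }
echo strtoupper('fine');
"""
print(find_unsupported(sample))
# [(2, 'eval'), (3, 'preg_replace /e'), (4, 'function_exists')]
```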
Currently PHP5.2 with a roadmap of 5.2.12 compliant in the next month and 5.3 being the focus after that.
Hmm. I'm guessing they're comparing it to their PHP stack that has already had the living daylights optimized out of it (with cached bytecode, etc.) and/or a large proportion of their pages doesn't hit much code (lots of stuff cached?). I'd have thought there would be more mileage in this.
Amazon's original core platform (obidos) was written in C++ and deployed as one monolithic binary. I've heard that one of the major things pushing their Java rewrite (dp) was that the binary no longer fit comfortably into memory!
update: I went and looked it up. They do indeed have a great big cache, many of them in fact. In fact their Amsterdam presence runs only cache! From what I can tell, it seems their bottleneck is actually Apache, and presumably a lot of that is PHP.
Interesting presentation here: http://wikimania2009.wikimedia.org/w/index.php?title=Media:R...
Also check this out: http://ganglia.wikimedia.org/pmtpa/?gw=fwd&gs=Wikimedia%...
Go down to the MySQL section. Plenty of headroom, but the Apaches are working hard.
from php import cppCompiler
not sure why that took 2 years
There were quite a few other options aside from writing a translator.
Or maybe the way their replication works, so when you post things get out of order depending on what server you are looking at?
In short, if FB had incompetent or even "just" competent engineering, they would not have close to 400 million users. And yet they do, and I have never noticed the site performing slowly. They're not Google, fine, but they have an excellent team and if you're not impressed by their performance I wonder what you would be impressed by.
Myspace had (and still does to a lesser extent) an extremely large user base, and we know they had some stability issues (honestly though, I've listened to their tech presentations and am more impressed than by the things I've seen Facebook do).
No, I don't imagine Facebook is programmed by a bunch of incompetent monkeys, but compared to Google, Amazon, Yahoo, or many other companies that have had to deal with large scaling issues, their approach seems amateur.
Both values sound good in theory, but are both meaningless. How is the code coverage? How many man months? How many developers?