

Alex Gaynor - Thoughts on HipHop PHP - twampss
http://alexgaynor.net/2010/feb/02/thoughts-hiphop-php/

======
ericb
_Almost every single website on the internet is I/O bound, not CPU bound_

This claim is patently false.

Under load, a surprisingly large percentage of applications are CPU bound.
Disclaimer: I make my living doing web application load tests on a
consultative basis--I don't have a study to cite offhand, just experience. I
_should_ publish a study, but that's a different issue.

~~~
ehsanul
That's interesting. I'm sure some people here, like me, believe the opposite,
since I/O is so much slower than processors now. Can you give a bit more of an
explanation as to what kind of applications are CPU bound and why?

~~~
wmf
I/O is slow _if you actually do any_, but RAM is so cheap that many Web sites
shouldn't need much I/O.

~~~
nixy
I think that by I/O in this case people mean network I/O, not disk I/O.

Edit: Sorry, maybe I was jumping to conclusions. Databases and files are of
course not always cached in-memory.

------
scott_s
_Because of differences like this ... I believe that the work done on HipHop
represents a fundamentally smaller challenge than that taken on by the teams
working to improve the implementations of languages like Python, Ruby, or
Javascript._

I think this is true. On the other hand, it's not Facebook's job to do
research. If they can benefit equally from solving a simpler problem, then
that's what they should do.

I agree with the author's points, I just want to make explicit that Facebook
didn't make wrong choices. It's just that their work is probably of limited
value to everyone else.

~~~
pvg
There are many more people working on PHP apps than there are people working
on dynamic language runtimes. Facebook's work is of limited value to everyone
else for a limited value of 'everyone else'.

~~~
scott_s
I'm taking the author's arguments at face value, that their project is of
limited value to others because most PHP apps aren't CPU bound, and that they
only handle a subset of the language.

------
jrockway
My thoughts exactly; this is really not too exciting. (I will disagree that
web apps are never CPU-bound. That has always been the bottleneck for me.)

I haven't seen the source code yet, but I think Chicken Scheme did everything
HipHop does, and many years earlier. And there are some hard problems to solve
when implementing Scheme, notably handling continuations. PHP is much simpler
in comparison.

I do see good things coming out of this project, though. As people are enticed
by the biggest temptress in computing, speed, they'll learn to run and deploy
web applications that aren't as simple as "ftp this HTML file to a directory".
Once PHP's only advantage over other languages is gone, and developers realize
that it wasn't that big of an advantage, they'll switch to better programming
languages. Then we can forget PHP ever happened, and the field can move on!

~~~
pvg
You think PHP is more likely to decline in popularity because there's now a
way to run it faster? I suspect you are in for a very long surprise.

------
coffeemug
_Firstly, there's the question of what problem HipHop solves._

It reduces infrastructure expenses of the top 100 web properties so much that
the pain of building it, testing it, rolling it out, administering it, and
maintaining it is worth it.

------
barrkel
I wonder where this myth comes from, that "[websites] on the internet [are]
I/O bound, not CPU bound", seemingly implying that optimizing CPU usage is a
waste of time.

Since the I/O bound generally comes from latency, rather than total
throughput, the number of concurrent connections a single webserver can handle
is often proportional to how much memory and CPU resources each connection
uses. If you have an asynchronous design for your server, concurrent
connections don't cost physical threads - they just cost the bookkeeping
overhead in the kernel. The faster you can switch between those connections,
and get finished with work on them when I/Os complete, the fewer physical
machines you need to serve a website to a given number of users.

Or to put it another way, the saturation point of processing asynchronous I/Os
is CPU-bound, even when speeding up individual requests is I/O-bound.
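
A minimal sketch of that event-loop shape (in Python's `selectors` module,
with illustrative names and simulated clients via socketpairs; not a real
webserver) might look like:

```python
import selectors
import socket

def serve(n_conns=3):
    # One selector tracks every connection; the loop only spends CPU when an
    # I/O completion is actually due. Its saturation point is therefore the
    # CPU, even though each connection spends most of its life on the network.
    sel = selectors.DefaultSelector()
    clients = []
    results = {}
    for i in range(n_conns):
        server_side, client_side = socket.socketpair()
        server_side.setblocking(False)  # async design: never block on one socket
        sel.register(server_side, selectors.EVENT_READ, data=i)
        client_side.send(b"request %d" % i)  # simulate a slow client's request arriving
        clients.append(client_side)
    pending = n_conns
    while pending:
        # select() parks the process until some I/O is due; between wakeups
        # the kernel is idle rather than "waiting" per-connection.
        for key, _events in sel.select():
            payload = key.fileobj.recv(1024)  # handle the completion immediately
            results[key.data] = payload
            sel.unregister(key.fileobj)
            key.fileobj.close()
            pending -= 1
    for c in clients:
        c.close()
    return results
```

The per-connection cost here is just the selector's bookkeeping entry, which
is what lets one machine juggle many slow connections at once.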

~~~
ubernostrum
Well...

In a dynamic web application, at first, you will nearly always be database
bound. Faster algorithms and faster programming-language implementations on
web servers will do nothing whatsoever for this (and increasing the concurrent
load of requests the web server can handle will in fact only overload the DB
even more).

That's when you start doing caching, and that's why caching has such dramatic
effects.

Once you've got your caching going nicely and your DB humming along, you will
end up bandwidth bound. Not by your own pipe, but by clients; you may have a
nice fat line running out of the data center, but your users may be on
anything down to mobile phones or even dialup, and you'll only be able to push
responses to them at the speed they can handle. This is the spoon-feeding
problem, and once again algorithms and language implementations on the server
can't do anything at all to help it.

That's when you start putting fast, light, highly-concurrent reverse proxies
(nginx appears to be winning the market share battle) in front of your actual
web servers, and once again you will see a drastic effect. Or you combine
caching and proxy into one component and do Varnish.

Once you've done this, you might finally start to reach a point where you're
genuinely I/O or CPU bound on a server that's actually running your
application code. Or you might not; there are other roadblocks you might run
into first.

At any rate, optimizing CPU usage is, for the vast majority of websites, a
waste of time at least until you've been through the phases I've outlined
above. And, generally, I think you'll find that's the advice (the "myth")
you've been hearing: fiddling with programming languages and algorithms is
literally a net loss of performance until you've dealt with quite a few other
(and more important, performance-wise) things.

~~~
barrkel
> _Once you've got your caching going nicely and your DB humming along, you
> will end up bandwidth bound. Not by your own pipe, but by clients [...] This
> is the spoon-feeding problem, and once again algorithms and language
> implementations on the server can't do anything at all to help it._

Here. Here is where you made the mistake in your assertions.

> _That's when you start putting fast, light, highly-concurrent reverse
> proxies (nginx appears to be winning the market share battle) in front of
> your actual web servers_

My point is that the number of web servers you need in this spoon-feeding case
is inversely proportional to how CPU-optimized your servers are. The spoon-
feeding problem, as you put it, is just bookkeeping in the kernel (keeping
track of open sockets) and iteratively processing I/Os when they come due, and
is _CPU-bound_, unless you're actually approaching 64K open sockets, in which
case a reverse proxy won't do; you'll need DNS tricks etc.

~~~
mfukar
You have misunderstood the problem. The fact that clients sit behind
low-bandwidth, high-latency lines means that your webserver spends its time
waiting for them. That is the very definition of I/O bound processing. Solving
the spoon-feeding issue is much more complex than simple bookkeeping in the
kernel, because it is independent (from the angle we're looking at it) of the
server's design and operation.

~~~
barrkel
And I think you've completely missed my point.

My assumptions:

* Server built around async I/O

* Clients with low bandwidth and high latency, but in aggregate insufficient to saturate the server's bandwidth

* Caching etc. on the server side so that server-side I/O bandwidth isn't the limiting factor

* Sufficient concurrent connections that you need more than one webserver

Under these assumptions, each webserver doesn't _wait_, in the OS blocking
sense, for any given connection. It processes I/O completions _as they come
due_, as fast as it can. It is this process that is CPU-bound; if less than
100% CPU is utilized, it means that there are periods where no I/O completions
are currently due.

There are memory costs per concurrent connection: whatever is needed to pick
up processing as the related I/O completes, and for some structures
representing the open socket in the kernel.

But these costs don't magically add up to "waiting". To the degree that the
webserver is tied up in "waiting", it is that the kernel is idle between CPU
interrupts generated from the networking hardware. In other words, it's doing
nothing, and if you have too many machines spending their time doing nothing,
you can eliminate some of them.

Now consider the same assumptions, except in the synchronous case: what this
does is move the memory and CPU costs around and increase them. The memory
cost per concurrent connection increases, to store the stack etc. The CPU
costs increase because now context switches are required in between each I/O
completion. These costs can be substantial; they can add up to being a
limiting factor in themselves - but as part of either the memory or CPU limits
of the machine, _not I/O_.
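
A back-of-the-envelope comparison of the two models (all numbers here are
platform-dependent assumptions, not measurements):

```python
import sys
import threading

# Synchronous model: one OS thread per connection, each reserving a stack.
# threading.stack_size() returns 0 for "platform default"; on Linux that
# default is commonly 8 MB of reserved address space (an assumption here).
ASSUMED_DEFAULT_STACK = 8 * 1024 * 1024
per_conn_sync = threading.stack_size() or ASSUMED_DEFAULT_STACK

# Asynchronous model: per connection you keep only a small bookkeeping
# record (socket fd, state, partial buffer) plus the kernel's socket struct.
per_conn_async = sys.getsizeof({"fd": 42, "state": "sending", "buf": b""})

# Connections an async design fits into one synchronous thread's memory.
ratio = per_conn_sync // per_conn_async
```

The exact ratio is hand-wavy, but the direction of the asymmetry is the
point: the synchronous model pays per-connection costs in memory and context
switches, which show up as machine limits, not as "I/O".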

Consider this: why would one need more than one webserver if the servers were
"I/O bound", and not on server-side bandwidth, but rather on spoon-feeding
clients?

Since it's not server-side bandwidth, it can only be because of either memory
or CPU bounds. And where do those bounds come from? They come from the CPU or
memory cost associated with concurrent connections, as well as the CPU and
memory costs of processing any given request. You can minimize the costs
associated with concurrent connections by leveraging an async I/O design of
the server. But minimizing the CPU and memory costs of per-request processing
is pure gravy in terms of reducing the number of machines you need to keep
running to process X concurrent requests.

And it is here that the myth lies. Just because you may need to keep spoon-
feeding very slow clients, such that decreasing the CPU cost of any given
request would not visibly affect the client's perceived latency, that doesn't
mean that optimizing the server for CPU usage is pointless. The less CPU usage
you spend, the less hardware you need for the same load; likewise for memory.

Another way to think of it: how can e.g. nginx work as a reverse proxy, if it
runs on a single machine and is spoon-feeding lots of slow clients? Async
design, and offloading CPU/memory requirements to other machines, that's how.

------
leej
He may have meant "it is not as CPU-bound as in the case of rendering or
en/decoding video".

