You aren't pegging the CPU, are you? Right now, you're probably using 0% CPU or thereabouts, and you don't have a problem caching can solve. What you do have is a limited pool of connections. Having KeepAlive on causes any browser that grabs a connection for any reason to hold onto it for a few extra seconds, to improve its performance on subsequent requests, e.g. for static resources. Unfortunately, if you have more clients than you can service, the rest all wait in a queue until a keepalive period expires and a connection is released. That connection immediately satisfies one additional client with cached data, and then sits idle for another few seconds. To the people in the queue, this looks like your server died.
Apache KeepAlive defaults to on in several popular distributions, including Ubuntu last time I checked. That default interacts very, very poorly with common conditions for blogs in 2010.
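To put rough numbers on the argument above, here is a back-of-envelope sketch. Every figure in it is a hypothetical assumption for illustration (150 worker connections, 50 ms to serve a cached page, a 5 second keepalive hold), not a measurement from the site in question, and the model is deliberately crude: one connection per client, and each browser holds its connection idle for the full keepalive period.

```python
def clients_per_second(workers: int, service_time_s: float, keepalive_timeout_s: float) -> float:
    """Rough upper bound on how many new clients the pool can accept per second.

    Each connection slot is tied up for the time spent serving the request
    plus the idle keepalive hold before it can be handed to the next client.
    """
    return workers / (service_time_s + keepalive_timeout_s)


# Hypothetical numbers, chosen only to illustrate the effect.
WORKERS = 150              # assumed size of the connection pool
SERVICE_TIME_S = 0.05      # assumed time to serve a cached page
KEEPALIVE_TIMEOUT_S = 5.0  # assumed idle hold per connection

with_keepalive = clients_per_second(WORKERS, SERVICE_TIME_S, KEEPALIVE_TIMEOUT_S)
without_keepalive = clients_per_second(WORKERS, SERVICE_TIME_S, 0.0)

print(f"KeepAlive on:  ~{with_keepalive:.0f} new clients/s")
print(f"KeepAlive off: ~{without_keepalive:.0f} new clients/s")
```

Under these assumptions the pool spends almost all of its time holding idle connections, which is why the CPU can sit near 0% while visitors see a dead site. Real browsers open several connections each and may close them early, so treat the ratio as the point, not the absolute numbers.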
(Edit: I don't think this was the original problem, since when the site went down the CPU was in fact pegged. Regardless, it sounds like I should have KeepAlive off.)