

Surviving a traffic surge: Three techniques to scale your site fast - RiderOfGiraffes
http://matt.might.net/articles/how-to-emergency-web-scaling/

======
mixu
These are decent tips, but the real fix for me when I got a (minor) traffic
surge was changing KeepAliveTimeout from 15 (sec) to 2 (sec). Basically, due
to the high default keep alive time for requests, most Apache threads were
waiting for the timeout.

So the number of threads you have (e.g. setting StartServers,
MaxSpareServers/MaxSpareThreads) is way less important than the keepalive
timeout: you can/should start enough threads to use all the available
resources, but it will only make a difference if you aren't idling all those
threads with a high KeepAliveTimeout. Apparently, that's the Apache default
setting.

~~~
patio11
Ding ding ding, we have a winner. KeepAlive can kill your blog even at 2
seconds (achievement unlocked 4 times over the last year).

Edit to elaborate:

Sorry, was eating dinner and Kindle is not exactly made for typing on
technical documentation. This comment is an abbreviated version of
[http://www.kalzumeus.com/2010/06/19/running-apache-on-a-memo...](http://www.kalzumeus.com/2010/06/19/running-apache-on-a-memory-constrained-vps/)
-- read that if you want a longer spiel. (It is my most
cited blog post on HN. I don't know whether to be happy or sad about that.)
cited blog post on HN. I don't know whether to be happy or sad about that.)

Basically, there are a couple of Apache MPMs available. You may have the
prefork MPM installed. You can check by running "apache2 -l". If you see
prefork in the output, take a look at your config file (quite possibly
/etc/apache2/apache2.conf) and check for the setting "KeepAlive On". If
KeepAlive is on, your blog is broken and you just haven't found the failure
condition yet.

On my server (Ubuntu, has gone from Dapper to Lucid over the years), the
package default Apache2 settings for the prefork MPM are: 15 second keepalive,
150 MaxClients. If your server has enough RAM to support 150 processes for
Apache (and, if you're on a VPS, you probably don't), that will let you
process a _hard theoretical maximum_ of 600 clients per _minute_. There are
many, many things you can do to exceed that maximum bound: getting on the
front page of Reddit or getting retweeted by Jimmy Wales at the right hour of
the day both qualify.

Calculation: any client requesting any file, _regardless_ of whether it is
dynamic, static, cached, generated by a PHP monstrosity, whatever, occupies
one process for a hard minimum of 15 seconds. 4 clients saturate one process
for 1 minute.

With special attention to fellow VPS owners: after having died hard several
times when apache2 decided to use up all available RAM and then swap the
machine to death, I eventually tweaked the MaxClients setting down to 24. This
means that, even with KeepAlive at 2 seconds, my max throughput was 720
clients per minute. Again, that number is achievable under very plausible
circumstances for a personal blog in 2010/2011.
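The arithmetic in the paragraphs above fits in a tiny model. This is a sketch under the same simplifying assumption used there: every client ties up one prefork process for exactly the KeepAlive timeout. Real requests also spend time being generated and sent, so actual throughput sits below this bound. The function name is made up for illustration.

```python
def max_clients_per_minute(max_clients: int, keepalive_timeout_s: float) -> float:
    """Upper bound on clients served per minute by prefork Apache.

    Assumes each client occupies one process for at least
    `keepalive_timeout_s` seconds and ignores the time spent actually
    generating and sending the response.
    """
    if max_clients < 0 or keepalive_timeout_s <= 0:
        raise ValueError("need max_clients >= 0 and a positive timeout")
    clients_per_process_per_minute = 60 / keepalive_timeout_s
    return max_clients * clients_per_process_per_minute


if __name__ == "__main__":
    print(max_clients_per_minute(150, 15))  # package defaults described above -> 600.0
    print(max_clients_per_minute(24, 2))    # tuned VPS settings -> 720.0
```

Note that the timeout sits in the denominator: dropping it from 15 to 2 seconds multiplies the bound by 7.5, which is why it matters more than the process count.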

There are a variety of countermeasures one can take against this. One is not
using the prefork MPM, but you have to be a configuration Jedi to figure out
how to actually do this and still run PHP on your server. "apt-get install
apache2 libapache2-mod-php5", which is what substantially all guides will tell
you to do, will force you to use the prefork MPM. If you had been using the
worker MPM instead, you would have a much, much harder time crashing your
server serving static content.

Another alternative: switch to Nginx. This problem goes away instantly. (If I
didn't have 15 config files I would have to migrate, I would have done this
years ago.)

The easiest alternative: turn off KeepAlive. This will give you a very modest
throughput hit, but I'll trade "Blog stays up if mentioned in the NYT" for
that hit any day of the week.

~~~
mbowcock
Do you run your site with KeepAlive set low all the time or only when needed?

~~~
patio11
KeepAlive is off 100% of the time because "Wake up at 3 AM, in
response to my cell phone playing Ride of the Valkyries because a server is
offline, to tweak the config file locking out thousands of people who want to
see my writing" is sensible precisely 0% of the time.

------
krobertson
Wow, I was scared he was talking about scaling an application through most of
that post, only to realize towards the end that it's about a blog.

It's drastically oversimplified if you need to scale an application. "Step 2:
Make content static". For an actual application, there is far more to say than
4 sentences.

In my opinion:

* There are few cases where Apache is better than nginx. I don't run PHP, so that may still be one of them.
* Varnish is awesome, use it, love it.
* Purely static blogs, like Jekyll, are great.

------
ceejayoz
600 MaxClients on a Linode box with 512MB RAM? At this point, it became quite
clear that this article isn't going to be hugely useful.

~~~
hybrid11
what would you recommend instead?

~~~
ceejayoz
Apache's memory usage should not exceed the available RAM on the machine. If
you do that, it starts having to use swap, which drastically slows things down
- if you're already getting lots of hits, it'll start a death spiral.

Apache's memory usage varies based on what modules are enabled and the code
they're serving and a number of other factors.

The rule of thumb is to take the average free RAM when Apache isn't running,
divide it by the average RAM usage of a single Apache process on your system,
and set MaxClients to a couple under that value.

For example, on a 512MB Linode box, if you've got 450MB free when Apache isn't
running, and Apache takes up 12MB per process, you'd allow about 35 at the
most.
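The rule of thumb turns into a one-line helper. A minimal sketch, assuming you have already measured free RAM with Apache stopped and the typical resident size of one Apache process (for example with `ps` or `top`); the function name and the default headroom of 2 processes are illustrative choices, not anything Apache defines.

```python
def suggest_max_clients(free_mb: float, per_process_mb: float, headroom: int = 2) -> int:
    """Rule-of-thumb MaxClients: how many Apache processes fit in free RAM,
    minus a couple spare so the box never has to swap."""
    if per_process_mb <= 0:
        raise ValueError("per-process memory must be positive")
    fits = int(free_mb // per_process_mb)  # whole processes that fit
    return max(1, fits - headroom)


if __name__ == "__main__":
    # 512MB Linode example: 450MB free without Apache, ~12MB per process.
    print(suggest_max_clients(450, 12))  # 450 // 12 = 37, minus 2 -> 35
```

Measure per-process size under realistic load; modules like mod_php can make processes much larger than an idle one.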

~~~
hybrid11
Makes sense, thanks for clarifying!

------
jasonkester
No. Stop it. Never ever scale your blog.

It sounds like the author was making the same mistake that pretty much
everybody makes: Treating your blog as though it were dynamic content. But
it's not. It's static HTML, and you should never have to make any
modifications to anything to make it scale.

Step one: Have your blog export all entries to plain HTML.

Step two (optional): move your imagery out to S3/Cloudfront.

That's it. That will allow your little out-of-the-box slice to handle all the
traffic that we can throw your way.

Scaling is an issue that you're meant to have with your _product_. Because
your product actually needs to talk to databases and _do things_, it may have
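If your blog engine can't export on its own, step one boils down to writing each entry out as an HTML file. A minimal Python sketch, assuming entries arrive as dicts with hypothetical `slug`, `title` and `body_html` keys, and that `body_html` is already trusted HTML from the blog engine (only the title is escaped):

```python
import html
import tempfile
from pathlib import Path


def render_entry(title: str, body_html: str) -> str:
    """Wrap one entry in a minimal standalone HTML page."""
    safe_title = html.escape(title)
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{safe_title}</title></head>\n'
        f"<body><h1>{safe_title}</h1>\n{body_html}\n</body></html>\n"
    )


def export_entries(entries, out_dir) -> list:
    """Write each entry to <out_dir>/<slug>.html and return the written paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for entry in entries:
        path = out / f"{entry['slug']}.html"
        path.write_text(render_entry(entry["title"], entry["body_html"]), encoding="utf-8")
        written.append(path)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        demo = [{"slug": "hello-world", "title": "Hello & welcome", "body_html": "<p>First post.</p>"}]
        for p in export_entries(demo, d):
            print(p.name)  # hello-world.html
```

Point the web server's document root (or a location block) at the output directory and the per-request cost becomes a file read.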
trouble doing those things when lots of people hit it at once. A website
hosting a blog, on the other hand, needs to serve files. And that's been a
solved problem for fifteen years.

~~~
sp332
If the blog supports comments, it needs to talk to databases.

 _Step one: Have your blog export all entries to plain HTML._

He did that.

 _A website hosting a blog, on the other hand, needs to serve files. And
that's been a solved problem for fifteen years._

"Too few Apache threads" is a known problem, which he recognized as soon as he
saw the load numbers.

~~~
JoachimSchipper
> If the blog supports comments, it needs to talk to databases.

Sure, but it doesn't have to fall over. Outsourcing to Disqus is the easiest
solution, but you can build your own AJAXy solution. Or just write out a new
static file for each comment (if you're really overloaded, comments may take a
while to be processed, but you can just serve the old page in the interim.)
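The "new static file per comment" approach can be sketched as: re-render the page, write it to a temporary file in the same directory, then atomically rename it over the old file, so the web server keeps serving the old page until the new one is complete. This assumes a POSIX filesystem where `os.replace` within one directory is atomic; the function names are hypothetical.

```python
import html
import os
import tempfile
from pathlib import Path


def write_atomically(path, text: str) -> None:
    """Replace `path` with `text`; readers see the old or the new file, never a partial one."""
    path = Path(path)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=".tmp-", suffix=".html")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
        os.chmod(tmp_name, 0o644)  # mkstemp creates 0600; the web server must be able to read it
        os.replace(tmp_name, path)  # atomic rename within one POSIX filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def rebuild_with_comments(path, article_html: str, comments: list) -> None:
    """Re-render a post plus its (escaped) comments into a static file."""
    items = "\n".join(f"<li>{html.escape(c)}</li>" for c in comments)
    write_atomically(path, f'{article_html}\n<ul class="comments">\n{items}\n</ul>\n')
```

In practice the rebuild would run from a background job or queue, which is what lets comments lag a little when the site is overloaded while the old page keeps being served.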

------
mootothemax
My number 1 tip: ditch Apache.

One of my web apps has the occasional spike in traffic that previously caused
Apache to consume vast amounts of memory on my VPS, eventually crashing it due
to lack of memory.

After reading many guides, experimenting, and generally getting quite
frustrated (and working out what VPSs I could afford to upgrade to), I tried
setting up Nginx on a separate port. It took maybe an hour for me to have my
former LAMP stack set up and working, so I put it live, and haven't looked
back since.

If you're on a VPS, use Nginx. The config file is wildly different to that of
Apache, and you'll no doubt spend a few minutes cursing trying to figure out
how to port over your rewrite rules, but after that it's plain sailing.

~~~
adambard
This. As an added bonus, I've also found that I prefer Nginx's configuration
files.

Worst case, you can serve static files with Nginx and route dynamic requests
to your Apache instance (I still do this with a few old PHP apps I have).

------
inji
Using a reverse proxy (e.g. nginx's built-in one, or maybe Varnish) is a much
easier and more dynamic solution than rendering things to static files.
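To illustrate what a cache in front of the app buys you, here is an in-process sketch of the idea (not a proxy): a rendered page is reused until its time-to-live expires, so the expensive render runs at most once per TTL per URL. The class name and the injectable clock are illustrative assumptions; nginx and Varnish do this outside your application process.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Reuse a rendered page until it expires: the core idea behind a caching reverse proxy."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so behaviour can be tested without sleeping
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str, render: Callable[[], str]) -> str:
        now = self.clock()
        cached = self._entries.get(key)
        if cached is not None and cached[0] > now:
            return cached[1]  # still fresh: skip the expensive render
        body = render()
        self._entries[key] = (now + self.ttl, body)
        return body
```

With a 60-second TTL, a page getting hundreds of hits per second is rendered roughly once a minute.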

~~~
brycethornton
Agreed. Heroku makes it dead simple to use their Varnish proxy. Just set one
http header and they'll cache the page for you. After that, your app doesn't
do anything until the cache expires. This obviously won't work for highly
dynamic pages, but for semi-static front pages/blog entries it can be a
lifesaver.
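On the app side, "one HTTP header" usually means a standard `Cache-Control` header marking a response as cacheable by shared caches. A minimal standard-library sketch, assuming a 5-minute lifetime is acceptable for a front page; whether a particular host's Varnish honors exactly this header is worth checking in that host's docs. The handler and helper names are hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict


def cache_headers(max_age_seconds: int) -> Dict[str, str]:
    """Headers allowing a shared cache (e.g. a Varnish front end) to reuse this response."""
    return {"Cache-Control": f"public, max-age={max_age_seconds}"}


class BlogHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"<html><body>Front page</body></html>"  # stand-in for a rendered page
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        for name, value in cache_headers(300).items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8000), BlogHandler).serve_forever()
```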

------
yuvadam
I'm not at all familiar with Linode's iPhone app - but the bottom graph looks
an awful lot like a system load graph, and not a CPU utilization graph.

"CPU utilization never exceeded 3%"? Really? Maybe the system load was at 3.0
for a few days?

~~~
edu
Completely agree, that's a system load graph, not CPU usage.

~~~
commx
No, it's CPU usage, _not_ a load average.

------
zavulon
> With amazon's EC2 service, I'll be able to deploy as many temporary mirrors
> as I need in just a few minutes

Can you go into detail on how that works? I thought you had to have your
site originally hosted on EC2 to do that ...

~~~
city41
I'm curious about this too. I'm imagining he adds a redirect to the EC2
instance in an .htaccess file for the page that is getting hit hard?

------
coderdude
>>(Step 1: Cut image quality) Page load time dropped from 24 to 12 seconds.

Wow. I had no idea that could make such a difference. I suppose the issue was
with the low number of threads set in the Apache configuration. The server was
spending its time sending out static content when it could have been doing
more important things? I signed up with S3 to serve up my static content to
keep that load off my server. I should probably be using CloudFront instead
though.

I'm interested in looking for a backup host just in case. I currently use
WebFaction and I love them to death -- but I'm worried that under incredible
stress the shared hosting won't hold. With Linode, do you start from scratch
with a blank OS and just install everything you need from there (Apache,
mod_wsgi, etc. kind of thing) or do they have preset installs? With WebFaction
I can select a particular setup and I'm up and running in minutes.

~~~
JshWright
Linode offers StackScripts, which can automate a lot of the initial deployment
steps. <http://www.linode.com/stackscripts/>

------
blhack
One [probably naive] thing that I've done in the past has been to use
mod_rewrite to serve people a static version of a page.

Something like this would work in .htaccess

RewriteEngine On
RewriteRule ^t/item/4372/$ /static/4372.html

It's saying "Hey, apache, if you see somebody asking you for
website.tld/t/item/4372/, serve them website.tld/static/4372.html instead"

A blog post I wrote got about 100k hits in a day a few weeks ago, and using
mod_rewrite in this fashion, I was able to keep the site running for the
entire day.
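A Python sketch of the same mapping, generalized with a capture group so any item with a pre-rendered snapshot is served statically and everything else falls through to the dynamic app. The function name and snapshot directory layout are assumptions; in mod_rewrite itself, the existence check is usually expressed with a `RewriteCond` using the `-f` test.

```python
import re
import tempfile
from pathlib import Path

# The RewriteRule above, generalized: capture the item id instead of hard-coding 4372.
ITEM_PATH = re.compile(r"^t/item/(\d+)/$")


def resolve(path: str, static_root: Path) -> str:
    """Return the static URL for an item page that has a snapshot on disk;
    otherwise return the path unchanged so the dynamic app handles it.
    `path` is relative, as mod_rewrite sees it in .htaccess (no leading slash)."""
    m = ITEM_PATH.match(path)
    if m and (static_root / f"{m.group(1)}.html").is_file():
        return f"/static/{m.group(1)}.html"
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "4372.html").write_text("<p>snapshot</p>")
        print(resolve("t/item/4372/", root))  # /static/4372.html
        print(resolve("t/item/9999/", root))  # t/item/9999/ (no snapshot)
```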

------
joakin
Nice post. It would be nice to read a comparison between Apache and nginx in
these cases.

------
bmelton
Those tips are obviously worthwhile, but they don't address massive scale for
heavily dynamic sites. These should definitely be at the beginning of any
optimization checklist.

One thing I found interesting was this remark of his:

 _Google Analytics failed to detect the surge: page load time was so high that
visitors were closing the page before analytics could load._

He then remarks that it took Analytics 15 hours to detect the spike, but isn't
that true of all Analytics instances? I'm not sure if the author is mistaking
Google Analytics' delayed reporting as a fault or if I'm missing something.

~~~
JoachimSchipper
These tips are for people getting slashdotted, not people building the next
Google. Your average blog simply isn't "heavily dynamic", and if you're
building Google/Facebook/... there are better resources.

Google Analytics is Javascript-based and thus vulnerable to people closing the
page before it loads. It's actually near-realtime, but just doesn't work if
you're having this problem. I'd imagine watching free Apache
instances/threads, outgoing bandwidth or probably even the Apache logs would
be more useful.
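Watching the Apache logs can be as simple as counting requests per minute. A sketch, assuming the common or combined log format (timestamps like `[10/Oct/2000:13:55:36 -0700]`) and log lines in chronological order; the script is hypothetical.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# Timestamp field of Apache's common/combined log formats, captured down to the minute.
TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}):\d{2} [+-]\d{4}\]")


def requests_per_minute(lines: Iterable[str]) -> Counter:
    """Count log lines per minute; lines without a timestamp are skipped."""
    counts = Counter()
    for line in lines:
        m = TIMESTAMP.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Counter keeps insertion order, so chronological input prints chronologically.
    for minute, n in requests_per_minute(sys.stdin).items():
        print(minute, n)
```

For example, `tail -n 5000 /var/log/apache2/access.log | python3 hits_per_minute.py` (the log path is the Debian/Ubuntu default; adjust for your setup). Unlike JavaScript analytics, this counts requests even when the visitor gives up before the page finishes loading.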

~~~
jedsmith
> Google Analytics is Javascript-based and thus vulnerable to people closing
> the page before it loads.

And people with Javascript disabled, which is a startling number. The
disparity between reported views in Analytics and what I can observe from my
Web server logs is amazingly large.

~~~
JoachimSchipper
Are you sure those aren't bots? Analytics packages vary widely in their
ability to filter out bots.

~~~
jedsmith
Bots have Javascript disabled too. :)

You're right, of course. I considered it and started grepping them out, but
got bored and did something else interesting. From a casual glance, there were
a lot more Mozilla UAs than reported views, though.
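The grep described above might look like this: take the user agent from each combined-format log line and count those that start with `Mozilla` but don't contain obvious crawler markers. Googlebot's user agent also starts with `Mozilla`, which is why the second filter matters. The marker list is a crude, hypothetical heuristic, not a real bot database.

```python
import re
from typing import Iterable, Optional

# In the combined log format the user agent is the last double-quoted field.
LAST_QUOTED = re.compile(r'"([^"]*)"\s*$')
BOT_HINTS = ("bot", "crawl", "spider", "slurp")  # crude heuristic; real bot lists are much longer


def user_agent(line: str) -> Optional[str]:
    m = LAST_QUOTED.search(line)
    return m.group(1) if m else None


def looks_like_browser(ua: str) -> bool:
    lowered = ua.lower()
    return ua.startswith("Mozilla") and not any(hint in lowered for hint in BOT_HINTS)


def count_browser_hits(lines: Iterable[str]) -> int:
    count = 0
    for line in lines:
        ua = user_agent(line)
        if ua and looks_like_browser(ua):
            count += 1
    return count
```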

------
luckytaxi
Where the hell have I been? I didn't realize Linode has an iPhone app.

