

How to Gracefully Degrade Web 2.0 Applications to Maintain Availability - lmacvittie
http://devcentral.f5.com/weblogs/macvittie/archive/2010/01/27/how-to-gracefully-degrade-web-2.0-applications-to-maintain-availability.aspx

======
kordless
Back when I was at Splunk, we had a bunch of customer conversations about this
type of use case. The basic recipe people implement goes something like this:

1\. build a centralized logging store (definitely was a challenge for Twitter)

2\. every N minutes, a saved search/script runs over the last X lines of
logs, looking for 5xx responses (or whatever)

3\. once the number of hits crosses a threshold, another script is fired off
to extract the URL methods (the endpoints) responsible for the errors

4\. using the extracted method, make a call to the load balancer to redirect
that method's URLs to a fail page

5\. alert an admin
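
Steps 2 and 3 can be sketched roughly like this. It's a minimal Python sketch, not what any particular customer ran: it assumes the logs are in the common/combined access log format, and the names `failing_endpoints`, `LINE_RE` and the default threshold are made up for illustration.

```python
import re
from collections import Counter

# Assumption: lines look like the common/combined access log format, e.g.
# 1.2.3.4 - - [27/Jan/2010:10:00:00 +0000] "GET /api/timeline?x=1 HTTP/1.1" 503 0
LINE_RE = re.compile(r'"[A-Z]+ (?P<path>\S+) [^"]*" (?P<status>\d{3})\b')


def failing_endpoints(lines, threshold=3):
    """Return {path: error_count} for paths with at least `threshold` 5xx responses."""
    errors = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip lines that don't look like access log entries
        if match.group("status").startswith("5"):
            # Drop the query string so /x?a=1 and /x?a=2 count as one endpoint.
            path = match.group("path").split("?", 1)[0]
            errors[path] += 1
    return {path: count for path, count in errors.items() if count >= threshold}
```

You'd feed it the last X lines from the central log store every N minutes, and anything it returns is a candidate for step 4.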

One caveat here: if you find yourself in the cloud trying to do this, there
are fewer tools available to you. Amazon's load balancer doesn't support layer
7, so you are left implementing a software load balancer yourself on a
single box that can itself fail under load.

F5 doesn't have a virtual appliance on Amazon (although they do on GoGrid),
but you can run Zeus' ZXTM product, or a free one like nginx, which can do
very basic redirects, to accomplish the same thing.
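
For the nginx route, the "call to the load balancer" in step 4 could be as simple as generating a config fragment and reloading. Here's a rough Python sketch of the generating part, under some assumptions: the function name `nginx_fail_rules`, the `/fail.html` page and the path whitelist are all hypothetical, and it only uses nginx's `location =` exact match and `return` directives.

```python
import re

# Assumption: only plain paths are allowed into the generated config, so that
# nothing read from the logs can inject extra nginx directives.
SAFE_PATH = re.compile(r"^/[A-Za-z0-9_\-./]*$")


def nginx_fail_rules(paths, fail_page="/fail.html"):
    """Render one exact-match nginx location block per failing path."""
    blocks = []
    for path in sorted(paths):
        if not SAFE_PATH.match(path):
            raise ValueError(f"refusing to emit rule for unusual path: {path!r}")
        # Temporary redirect, so clients go back to the real URL once the rule is removed.
        blocks.append(f"location = {path} {{\n    return 302 {fail_page};\n}}")
    return "\n".join(blocks)
```

The output would be written to a file that the server block includes, followed by a config reload. Make sure the fail page itself never ends up in the list of paths, or you get a redirect loop.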

It would rule if Amazon added layer 7 support to their ELB offering.

