

100% Uptime for Web Applications - greentrack
http://www.opstools.com/2012/05/100-uptime-for-web-applications/

======
Smerity
Luckily, when a client demands something outrageous, they tend to discover that
it comes with a formidable price tag. Uptime requires redundancy, and redundancy
requires a substantial increase in cost, even before factoring in the
additional development time.

A theoretical 100% uptime (theoretical because you can't guarantee it anyway)
requires a ridiculous amount of redundancy (e.g. you now need to factor in
every idiot with a backhoe[1], car[2] or boat[3]). At 100% uptime you have to
consider not just your site being up, but all of the infrastructure between your
servers and the given user being up as well.

What's better is discussing with your client what happens when they hit
downtime and how the negative impact can be mitigated. Can you run a limited
service that covers the core functionality of the website (i.e. only make the
core components redundant[4])? What parts of the service are absolutely
desired? How much money will the client lose (either directly or through lost
customer faith)?

You'll tend to find the client will realise the time is better spent
developing features or improving the user experience than worrying over that
last 0.1% between 99.9 and 100.
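
That last 0.1% also maps to a surprisingly small downtime budget. A rough
back-of-the-envelope sketch (just arithmetic, Python only for illustration):

    # Yearly downtime allowed at a given uptime target
    HOURS_PER_YEAR = 365.25 * 24

    for uptime in (0.999, 0.9999, 0.99999, 1.0):
        downtime_minutes = (1 - uptime) * HOURS_PER_YEAR * 60
        print(f"{uptime:.3%} uptime -> {downtime_minutes:.1f} minutes of downtime per year")

At 99.9% you get roughly 8.8 hours of slack a year; at 100% you get none, which
is why the cost curve blows up.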

[1]: <http://www.wired.com/science/discoveries/news/2006/01/70040>

[2]: <http://www.datacenterknowledge.com/archives/2010/05/13/car-crash-triggers-amazon-power-outage/>

[3]: <http://en.wikipedia.org/wiki/2008_submarine_cable_disruption>

[4]: <http://techblog.netflix.com/2011/07/netflix-simian-army.html>

------
sicxu
To some extent, 100% uptime is like perfect security. There is no perfect
security; the closer you get to it, the more expensive it becomes. At some point,
you have to draw a line and design your system to deal with the downtime.

~~~
davyjones
I read somewhere that for each 9 added after the decimal point, the cost goes
up exponentially. Can't recall the link/study.

~~~
astrodust
A rule of thumb might be at least a factor of two to three for each additional
nine.

For instance, first you start with internal redundancy (RAID, redundant power)
which can easily double your cost.

Then you have redundant systems, at least twice as many, which may more than
double your cost. Software licenses and support contracts may need to be
upgraded. A more sophisticated configuration will require training or hiring
better qualified ops people.

Then you need redundant data-centers, which is obviously at least twice the
cost.

Beyond that you will want to rack up redundant provider feeds, layer on more
monitoring systems, have more ops people on standby in case of trouble, and so
on.

Each nine is like a whole new world.

Nearly anyone can do 90% uptime. 99% isn't that hard. 99.9% is when things
require a more disciplined approach.
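
As a rough sketch of how that rule of thumb compounds (the base cost and the
exact multiplier here are made-up assumptions, purely for illustration):

    # "Two to three times per extra nine" rule of thumb
    base_cost = 100_000     # assumed yearly cost of the 90%-uptime setup
    multiplier = 2.5        # assumed cost factor per additional nine

    for extra_nines, target in enumerate(["90%", "99%", "99.9%", "99.99%", "99.999%"]):
        cost = base_cost * multiplier ** extra_nines
        print(f"{target:>8} uptime -> roughly ${cost:,.0f} per year")

Even with a modest multiplier the fifth nine costs an order of magnitude more
than the first.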

------
maytc
I believe every client would like to have a system with 100% uptime. You
should instead frame the question as what epsilon of error is acceptable, which
in your case is 0.02%.
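
If that 0.02% is read as a yearly budget, the arithmetic is roughly:

    # 0.02% of a year expressed as downtime (assuming a yearly error budget)
    minutes_per_year = 365.25 * 24 * 60
    print(f"{0.0002 * minutes_per_year:.0f} minutes of downtime per year")   # ~105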

------
mVChr
This could easily turn into a thread of "when clients say X what they really
mean is Y... and you better clarify it up front!"

------
GMali
The article says that "In 2007, only 3 of the top 20 websites were able
to achieve 5 nines or 99.999% uptime." Among the top 3 was Myspace, which is
funny because you need traffic to worry about uptime...

~~~
SoftwareMaven
Myspace was #3 by traffic, not by uptime. Yahoo, Comcast, and AOL were the
three in the top 20 that hit five nines of availability.

Zero downtime from Yahoo is, quite frankly, astounding. Kudos to their devops
team! Not even Google made that (though I bet they can now).

