

Getting Google Apps to 99.99% - abraham
http://googleenterprise.blogspot.com/2011/01/destination-dial-tone-getting-google.html

======
ghshephard
99.984% Availability in 2010 is an absolutely incredible accomplishment, and
the Google Team deserves a ton of accolades.

They are making two changes for 2011:

      o Removing the exemption for scheduled downtime - it now counts
        against their SLA. That's huge.
      o Removing the exemption for transient outages - an outage of
        a few minutes is now included in their downtime calculations.

This sets the bar for the industry in terms of the availability that
enterprise-class SaaS needs to shoot for. Nice to see Google pushing the
envelope here.
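
To put the percentages in perspective, here is a rough sketch that converts an availability figure into minutes of downtime per year. It assumes a 365-day year and a simple time-based definition of downtime; the function name is just for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a non-leap year


def downtime_minutes_per_year(availability_percent):
    """Minutes of downtime per year implied by an availability percentage."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


for pct in (99.984, 99.99):
    minutes = downtime_minutes_per_year(pct)
    print(f"{pct}% -> {minutes:.1f} min/year, {minutes / 12:.1f} min/month")
```

Under those assumptions, 99.984% works out to roughly 84 minutes a year, and 99.99% to a bit under 53.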

~~~
eitally
What they haven't published is their definition of downtime. With close to 200
million users globally, how many have to be affected and for how long before
it's considered an outage? If you're a business customer with 20,000 users and
you regularly have one or two suffering the 24hr account outage for
maintenance, that's a tiny percentage of Google's population, but if it hits
the wrong individuals it can wreak havoc.

~~~
brown9-2
Sounds to me like they're counting up all requests in total, determining
which of them experienced some issue, and labeling that fraction as downtime.
The figure in minutes is really a conversion from a different kind of unit.

The fact that they're including delays of just a few seconds is what makes me
think this.
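
A minimal sketch of what a request-based calculation like that might look like. This is a guess, not Google's published method; the function names and the sample numbers are made up for illustration.

```python
def request_availability(total_requests, failed_requests):
    """Fraction of requests that succeeded (hypothetical definition)."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return 1 - failed_requests / total_requests


def equivalent_downtime_minutes(availability, period_minutes):
    """Express a request failure rate as 'minutes of downtime' over a period."""
    return (1 - availability) * period_minutes


# Made-up numbers: 160 failed requests out of 1,000,000 in a 30-day month.
availability = request_availability(1_000_000, 160)
minutes = equivalent_downtime_minutes(availability, 30 * 24 * 60)
print(f"{availability:.5%} available, about {minutes:.1f} equivalent minutes")
```

With a definition like this, a few seconds of errors for a few users still moves the number, which would fit with them counting very short delays.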

------
mikeknoop
This article made me contemplate that pure uptime percentage is not the best
way to measure availability. Surely having 5 to 10 second outages 10 times a
month is better than having a single outage for 4 hours?

~~~
lutorm
I don't understand your point: 10 x 10s outages = 100s downtime. One 4h outage
= 14400s downtime. Uptime percentage agrees with you that the former is
better.

~~~
wmf
The original example wasn't very good, but try this one: would you prefer 240
1-minute outages or one 4-hour outage?
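
A quick sketch of why this example is the interesting one: both patterns add up to the same total downtime, so uptime percentage alone cannot tell them apart. It assumes a 30-day month, and the `summarize` helper is just for illustration.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month


def summarize(outage_seconds):
    """Summarize a list of outage durations (in seconds) for one month."""
    total = sum(outage_seconds)
    return {
        "incidents": len(outage_seconds),
        "longest_s": max(outage_seconds),
        "downtime_s": total,
        "uptime_pct": 100 * (1 - total / SECONDS_PER_MONTH),
    }


many_short = summarize([60] * 240)      # 240 one-minute outages
one_long = summarize([4 * 60 * 60])     # a single four-hour outage

# Same uptime percentage, very different incident count and longest outage.
print(many_short)
print(one_long)
```

Incident count and longest single outage are the numbers that differ, so any comparison between the two has to look at those rather than at the percentage.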

~~~
nickolai
I'm not sure that beyond 1 minute or so it actually matters whether it is 2
minutes or 2 hours. The system is down. Period. The average user will
acknowledge that the system is down and will try again in several hours. From
this perspective, it is definitely better to have one big 2-hour outage than a
ton of small 2-minute ones.

(EDIT) Saw someone post something like "multiple small downtimes erode the
user's belief in system stability". Too bad he deleted the message: it was
spot on!

~~~
smackfu
I don't know if I agree that the average user will try again in several hours.
There is a big difference if my email is down for 5 minutes or an hour, if I
need to check something now, since "now" includes 5 minutes from now but
likely not an hour.

------
melling
I want the bleeding edge version that is only up 90% of the time where they
keep adding features. Let's call it the nightly build version. Google might
also consider letting people subscribe to a version that only supports very
modern browsers. FF4, IE9, etc. This will create a beta channel for "release
early, release often."

------
hughw
"Email is much more complex than your home phone..." That doesn't seem like a
slam dunk obvious truth to me. The PSTN is no mean feat either.

------
vegai
Microsoft BPOS -- I can come up with a few ways to expand that.

