Getting Google Apps to 99.99% (googleenterprise.blogspot.com)
29 points by abraham on Jan 14, 2011 | 19 comments


99.984% availability in 2010 is an absolutely incredible accomplishment, and the Google team deserves a ton of accolades.

They are making two changes for 2011:

  o Removing the scheduled-downtime exemption - scheduled
    downtime now counts under their SLA.  That's huge.
  o Removing the transient-outage exemption - an outage of
    a few minutes is now included in their downtime
    calculations.
This sets the bar for the industry in terms of the availability providers need to shoot for in enterprise-class SaaS. Nice to see Google pushing the envelope here.


What they haven't published is their definition of downtime. With close to 200 million users globally, how many have to be affected, and for how long, before it's considered an outage? If you're a business customer with 20,000 users and you regularly have one or two suffering a 24-hour account outage for maintenance, that's a tiny percentage of Google's user base, but if it's the wrong individuals it can wreak havoc.


Sounds to me like they're counting up all requests and determining which of them experienced issues, then labeling that downtime. Expressing it in minutes is really a conversion from a different kind of unit.

The fact that they're including delays of just a few seconds is what makes me think this.
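A minimal sketch of that request-counting interpretation, assuming availability is simply the fraction of successful requests, re-expressed as minutes over a 365-day year. The function names and the request counts are made up for illustration; Google hasn't said this is how they compute it.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: a 365-day year


def request_availability(total_requests: int, failed_requests: int) -> float:
    """Fraction of requests that succeeded, between 0.0 and 1.0."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return (total_requests - failed_requests) / total_requests


def equivalent_downtime_minutes(availability: float,
                                period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Re-express an availability fraction as minutes of downtime over a period."""
    return (1.0 - availability) * period_minutes


# Made-up numbers: 160 failed requests out of a million.
a = request_availability(total_requests=1_000_000, failed_requests=160)
print(f"availability {a:.3%}, about {equivalent_downtime_minutes(a):.0f} min/year")
```

With these invented counts the headline comes out at 99.984%, roughly 84 "minutes", even though no single minute of the year needs to have been fully down.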


For anyone curious, 99.984% uptime works out to 84 minutes of downtime over one year.


Ask Google "(100-99.984)% of 1 year in minutes" and it will tell you: (100 - 99.98400)% of (1 year) = 84.1518026 minutes

Many here probably already know you can do this, but I do so many calculations like this with Google that I expect it will benefit someone reading this.
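The same conversion as a small Python sketch. The exact figure depends on the year length you assume: a 365-day year gives about 84.10 minutes, while Google's 84.1518026 appears consistent with a mean tropical year of roughly 365.2422 days. The `downtime_minutes` helper is hypothetical.

```python
def downtime_minutes(availability_pct: float, days_per_year: float = 365.0) -> float:
    """Minutes of downtime per year implied by an availability percentage."""
    minutes_per_year = days_per_year * 24 * 60
    return (100.0 - availability_pct) / 100.0 * minutes_per_year


# Compare a few common definitions of "one year".
for label, days in [("365-day year", 365.0),
                    ("Gregorian year (365.2425 days)", 365.2425),
                    ("tropical year (~365.2422 days)", 365.242199)]:
    print(f"{label}: {downtime_minutes(99.984, days):.4f} minutes")
```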


This article made me think that pure uptime percentage is not the best way to measure availability. Surely having 5- to 10-second outages 10 times a month is better than having a single 4-hour outage?


I don't understand your point: 10 x 10s outages = 100s of downtime. One 4h outage = 14,400s of downtime. Uptime percentage agrees with you that the former is better.


The original example wasn't very good, but try this one: would you prefer 240 1-minute outages or one 4-hour outage?
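Uptime percentage alone can't tell these two cases apart. A quick sketch (the `OutageProfile` class is purely illustrative) shows both giving identical yearly uptime, while the outage count and mean outage length differ by a factor of 240:

```python
from dataclasses import dataclass

MINUTES_PER_YEAR = 365 * 24 * 60


@dataclass
class OutageProfile:
    name: str
    outage_minutes: list[float]  # duration of each outage during one year

    def total_downtime(self) -> float:
        return sum(self.outage_minutes)

    def uptime_pct(self, period_minutes: float = MINUTES_PER_YEAR) -> float:
        return 100.0 * (1.0 - self.total_downtime() / period_minutes)

    def mean_outage(self) -> float:
        if not self.outage_minutes:
            return 0.0
        return self.total_downtime() / len(self.outage_minutes)


many_short = OutageProfile("240 one-minute outages", [1.0] * 240)
one_long = OutageProfile("one 4-hour outage", [240.0])

for p in (many_short, one_long):
    print(f"{p.name}: uptime {p.uptime_pct():.4f}%, "
          f"{len(p.outage_minutes)} outage(s), mean {p.mean_outage():.0f} min")
```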


I'm not sure that beyond 1 minute or so it actually matters whether it is 2 minutes or 2 hours. The system is down. Period. The average user will acknowledge that the system is down and will try again in several hours. From this perspective, it is definitely better to have one big 2-hour outage than a ton of small 2-minute ones.

(EDIT) Saw someone post something like "multiple small downtimes erode the user's belief in system stability". Too bad he deleted the message: he nailed it!


I don't know if I agree that the average user will try again in several hours. If I need to check something now, there is a big difference between my email being down for 5 minutes and for an hour, since "now" includes 5 minutes from now but probably not an hour from now.


Clients automatically retry. The real issue is timeliness of email delivery - as long as you can compose and submit email, all that's left is getting it read.

I noticed they don't mention the reliability of delivery: how many emails are lost? That's the elephant in the room.
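One way to see why automatic retries absorb short outages but not long ones: a sketch of an exponential backoff schedule, assuming (hypothetically) a first retry after 1 minute, doubling each time, capped at 8 attempts, with the outage starting at the first send attempt. Real mail clients and servers use their own schedules; `delivery_delay` and its parameters are illustrative only.

```python
from typing import Optional


def delivery_delay(outage_min: float, first_retry_min: float = 1.0,
                   backoff_factor: float = 2.0, max_attempts: int = 8) -> Optional[float]:
    """Minutes after the first send attempt at which an attempt succeeds.

    Hypothetical model: the outage begins at the first attempt (t = 0) and lasts
    outage_min minutes; each retry waits backoff_factor times longer than the last.
    Returns None if every attempt falls inside the outage.
    """
    t = 0.0
    wait = first_retry_min
    for _ in range(max_attempts):
        if t >= outage_min:  # the service is back up when this attempt is made
            return t
        t += wait
        wait *= backoff_factor
    return None


for outage in (5, 120, 200):
    print(f"{outage}-minute outage -> delivered after {delivery_delay(outage)} min")
```

With these made-up settings, attempts happen at 0, 1, 3, 7, 15, 31, 63 and 127 minutes: a 5-minute outage delays delivery by 7 minutes, a 2-hour outage by 127 minutes, and anything longer than 127 minutes exhausts the retries.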


Sorry for the poor math: you would need roughly 100 ten-second outages each month to approach one 4-hour outage a year.
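The exact break-even point is easy to check: spreading a single 4-hour outage evenly over 12 months gives 1,200 seconds a month, or 120 ten-second outages. A throwaway sketch:

```python
SECONDS_PER_HOUR = 3600

annual_downtime_s = 4 * SECONDS_PER_HOUR     # one 4-hour outage per year
monthly_downtime_s = annual_downtime_s / 12  # the same downtime spread over 12 months
short_outage_s = 10                          # length of each short outage

break_even_per_month = monthly_downtime_s / short_outage_s
print(f"{break_even_per_month:.0f} ten-second outages per month")
```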


While your second statement is true, uptime percentage remains the best way until somebody suggests a new (feasible) method.


A combination of uptime percentage and mean outage duration (or mean time between outages... which, given the uptime percentage, is essentially the same thing).
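A sketch of why the two means carry the same information once uptime percentage is known, assuming "mean time between outages" means the mean up-time between outages (definitions vary). Availability can then be recovered from the two means alone as MTBF / (MTBF + MTTR), so fixing availability and one mean fixes the other. The `outage_stats` function is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def outage_stats(period_min: float, outages_min: list[float]) -> dict[str, float]:
    """Summarise a list of outage durations observed over period_min minutes."""
    if not outages_min:
        raise ValueError("need at least one outage to compute the means")
    n = len(outages_min)
    downtime = sum(outages_min)
    uptime = period_min - downtime
    mttr = downtime / n  # mean outage duration
    mtbf = uptime / n    # mean up-time between outages (one common definition)
    return {
        "availability": uptime / period_min,
        "mean_outage_min": mttr,
        "mean_time_between_outages_min": mtbf,
        # The same availability, recovered from the two means alone.
        "availability_from_means": mtbf / (mtbf + mttr),
    }


for label, outages in [("240 x 1 min", [1.0] * 240), ("1 x 240 min", [240.0])]:
    s = outage_stats(MINUTES_PER_YEAR, outages)
    print(label, {k: round(v, 6) for k, v in s.items()})
```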


Yes, that's why many SLAs do not count short outages.


I'd be pretty mad if that was in my SLA, unless it was very carefully worded. If the system's going down for a minute every hour or two, it's probably more of a disruption to me than if it goes down for an hour or two once a month.
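A sketch of that scenario under a hypothetical SLA that ignores outages of 5 minutes or less (the threshold is an assumption; real SLA terms vary): a one-minute outage every two hours over a 30-day month adds up to 360 minutes of real downtime, yet none of it counts, while a single 2-hour outage counts in full.

```python
def sla_counted_downtime(outages_min: list[float], threshold_min: float) -> float:
    """Sum only outages longer than threshold_min (hypothetical exclusion rule)."""
    return sum((d for d in outages_min if d > threshold_min), 0.0)


THRESHOLD_MIN = 5.0  # assumption: outages of 5 minutes or less are not counted

frequent_blips = [1.0] * (30 * 24 // 2)  # a 1-minute outage every 2 hours for 30 days
monthly_outage = [120.0]                 # one 2-hour outage in the month

for name, outages in [("frequent blips", frequent_blips),
                      ("one long outage", monthly_outage)]:
    print(f"{name}: actual {sum(outages):.0f} min, "
          f"counted by SLA {sla_counted_downtime(outages, THRESHOLD_MIN):.0f} min")
```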


I want the bleeding-edge version that is only up 90% of the time but keeps adding features. Let's call it the nightly build version. Google might also consider letting people subscribe to a version that only supports very modern browsers. FF4, IE9, etc. This will create a beta channel for "release early, release often."


"Email is much more complex than your home phone..." That doesn't seem like a slam dunk obvious truth to me. The PSTN is no mean feat either.


Microsoft BPOS -- I can come up with a few ways to expand that.



