- YouTube returning errors
- Gmail returning 502
- Docs returning 500
- Drive not working
Status page now reflecting the outage: https://www.google.com/appsstatus
Services look to be restored.
Services were down from ~12:55pm to ~1:52pm, so about 57 minutes. Thanks hiby007
Which is 0.22% of their COH this quarter...
I bet if you personally can't use it, but their overall reliability meets the bar, then they're within SLA.
Don't ask why I know this.
What I believe is that customers will probably get free GCP credits and that's it; everything goes back to how it was before.
So then I jokingly responded with that being like going to a restaurant, getting massive food poisoning, almost dying, ending up with a $150,000 hospital bill and then the restaurant emails you with "Dear valued customer, we're sorry for the inconvenience and have decided to award you a $50 gift card for any of our restaurants, thanks!".
If your SLA agreement is only for precisely calculated credits, that's not really going to help in the grand scheme of things.
IANAL, but I negotiate a lot of enterprise SaaS agreements. When considering the SLA, it is important to remember it is a legal document, not an engineering one. It has engineering impact and is up to engineering to satisfy, but the actual contents of it are better considered when wearing your lawyer hat, not your engineering one.
e.g., What you're referring to is related to the limitation of liability clauses and especially "special" or "consequential" damages -- a category of damages that are not 'direct' damages but secondary. 
Accepting _any_ liability for special or consequential damages is always a point of negotiation. As a service provider, you always try to avoid it because it is so hard to estimate the magnitude, and thus judge how much insurance coverage you need.
Related, those paragraphs also contain a limitation of liability clause, often capped at X times annual cost. Doesn't make much sense to sign up a client for $10k per year but accept $10M+ liability exposure for them.
This is just scratching the surface -- tons of color and depth here, nuanced for every company and situation. It's why you employ attorneys!
1 - https://www.lexisnexis.com/lexis-practical-guidance/the-jour...
Businesses do this all the time, this is how they make money. And they use a combination of insurance and not %@$#@*! up.
Granted, there probably aren't many businesses losing major revenue because Slack's down for half an hour, but it's nice to at least see them acknowledge that one minute down deserves more than one minute of refunds!
They won't show up in automated systems aimed at SMEs, but anybody taking out an "enterprise plan" with tailored pricing from a SaaS will likely ask for tailored SLA conditions too (or rather, should ask for them).
Not sure that exists for businesses, but I'd expect you'd need to go shopping separately if you want that.
Seems like a good business idea if it doesn't exist.
They have other incentives, obviously, like if everyone talks about how Google is down then that's bad for future business. But when thinking of SLAs I'm always surprised when they're not more drastic. Like "over 0.1% downtime: free service for a month".
Would they gain or lose market share?
I don't think it's obvious one way or the other.
It's even slightly worse than that. SLAs generally refund you the prorated portion of your monthly fee for the time the service was out, so it's more like "here's a gift card for the exact value of the single dish we've determined caused your food poisoning." Hehe.
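To make the proration concrete, here's a back-of-the-envelope sketch in Python (the fee is a made-up figure; the 57 minutes is from upthread):

    # Hypothetical prorated SLA credit for a ~57-minute outage.
    monthly_fee = 12.00              # USD/user/month -- made-up figure
    minutes_in_month = 30 * 24 * 60  # 43,200
    outage_minutes = 57

    credit = monthly_fee * outage_minutes / minutes_in_month
    print(f"credit = ${credit:.2f}")  # ~$0.02 -- even the $50 gift card looks generous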
Enterprise level SLAs are crafted by lawyers in negotiations behind the scenes and are not the same as what you see on random public services. Our customers have them with us, and we have them with our vendors. Contract negotiations take months at the $$$$ level.
What if the majority of your users can access the service, but one of your BGP peers is not routing properly and some of your users are unable to access?
In answer to your question, they'll accept evidence from your own monitoring system when you claim on the SLA. They pair that up to their own knowledge about how the system was performing, then make the grant.
Google are exceptionally good at this, from my experience. Far better than most other companies, who aim to provide as little detail as possible while getting away with 'providing an SLA'.
(I don't think SLAs are BS, btw)
A few years ago I shipped a bug to production that prevented users from logging into our desktop app. It affected ~1k users before we found out and rolled back the release.
I still remember a very cold feeling in my belly, barely could sleep that night. It is difficult to imagine what the people responsible for this are feeling right now.
The answer was "well, if you don't do anything, you make NO money".
> Recently, I was asked if I was going to fire an employee who made a mistake that cost the company $600,000. No, I replied, I just spent $600,000 training him. Why would I want somebody to hire his experience?
That isn't to say it wouldn't also affect my sleep quality.
That was the only way to help people develop routine regarding big production deployments.
I got to see a lot of people pass through this rite of passage, and it was always fun to watch. Everyone would take it incredibly seriously, some VP would invariably yell at them, but at the end of the day their managers and all their peers were smiling and clapping them on the back.
During the outage though, no one (obviously) had time for me. This was a very important server. The tension and anxiety on the remediation call was through the roof. Every passing hour someone even more important in the chain of command was joining the call. At that time I thought I was done for...
At AWS, I once took down an entire AZ of a public-facing production service (with a mistyped command), but that was nothing compared to when I accidentally deleted an entire region via an internal console (too many browser tabs). Thank goodness it turned out to be an unused/unlaunched, non-production stack. I felt horrible for hours despite zero impact (in both cases).
So sure, color-code your environments, but if you find someone about to do something to a red environment that they clearly should only be doing to a green environment, just check if they're seeing what you're seeing before you sack them ;)
It seems like a design flaw for actions like that to be so easy. E.g.
> Hey, we detected you want to delete an AWS region. Please have an authorized coworker enter their credentials to second your command.
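That kind of guard is cheap to build. A minimal sketch in Python of the "retype the name, get a second approver" pattern (all names are hypothetical; real tooling would actually verify the approver's credentials):

    # Sketch: block destructive ops behind a retyped name and a second approver.
    def confirm_destructive(action: str, resource: str) -> bool:
        print(f"You are about to {action} {resource!r}.")
        if input("Type the resource name to confirm: ") != resource:
            print("Name mismatch; aborting.")
            return False
        approver = input("Second approver username: ")
        # A real implementation would check the approver's credentials/2FA here.
        return bool(approver)

    if confirm_destructive("delete", "region-stack-us-east-1"):
        print("Proceeding...")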
The service stack for the region (and not an entire region itself) looked like prod, but wasn't. It made me feel like shit anyway.
Another workflow, though cumbersome, is: Search for a username on hn.algolia, select "comments" and "past months" as filters, then press enter.
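(If you'd rather script it, the same search is exposed by the public hn.algolia.com API; the username below is a placeholder:)

    # Fetch a user's recent comments via the public HN Algolia API.
    import requests

    resp = requests.get(
        "https://hn.algolia.com/api/v1/search_by_date",
        params={"tags": "comment,author_someuser", "hitsPerPage": 10},
        timeout=10,
    )
    for hit in resp.json()["hits"]:
        print(hit["created_at"], (hit.get("comment_text") or "")[:80])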
That's even more surprising to me.
I feel for the engineer who has to calculate the cost of this bug.
"This past Tuesday morning Pacific Time an Amazon Web Services engineer was debugging an issue with the billing system for the company’s popular cloud storage service S3 and accidentally mistyped a command. What followed was a several hours’ long cloud outage that wreaked havoc across the internet and resulted in hundreds of millions of dollars in losses for AWS customers and others who rely on third-party services hosted by AWS."
 - https://www.datacenterknowledge.com/archives/2017/03/02/aws-...
You can rely on Google outages being very few and far between, and recovering pretty fast. For the benefits you get from such a connected ecosystem, I'm not sure anyone is net positive from using a variety of different tools rather than Google supplying many of them.
It's obviously subjective, but even with our entire work leaning on Google (from Gmail, Drive and Google Docs through to all our infrastructure being in GCP), today's outage just meant everyone took an hour break. History suggests we won't see another of these for a year, so everyone taking a collective 60m break has been minimally impactful vs many smaller, isolated outages spread over the year.
...like I did a dozen+ years ago: https://antipaucity.com/2008/01/09/what-if-google-took-the-d...
Same thing happened to me but with CI, which felt bad enough already.
source: am Engineer =).
Perhaps my industry is a little more security conscious (I don't know which industry you're talking about), but this doesn't seem like good practice.
Unless it's mandated, as in something like banking, following best practice to the letter is unacceptably slow for most industries.
In some industries, security and customer requirements will at times mandate that developer workstations have no access to production. Deployments must even be carried out using different accounts than those used to access internal services, for security and auditing purposes.
There are of course good reasons for this: accidents, malicious engineers, overzealous engineers, lost/stolen equipment, risk avoidance, etc.
When you apply this rule it makes for more process, and perhaps slower response times to problems, but the accidents and other internal issues mentioned above drop to zero.
Given how easy it is to destroy things these days with a single misplaced Kubernetes or Docker command, safeguards need to be put in place.
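One cheap safeguard along those lines is a wrapper that refuses destructive verbs whenever the active context looks production-like. A rough Python sketch (the context-naming convention is an assumption):

    # Sketch: kubectl wrapper that blocks destructive verbs on prod-looking contexts.
    import subprocess, sys

    DESTRUCTIVE = {"delete", "drain", "replace"}
    PROD_MARKERS = ("prod", "production")  # assumes contexts are named this way

    ctx = subprocess.check_output(
        ["kubectl", "config", "current-context"], text=True
    ).strip()
    args = sys.argv[1:]
    if args and args[0] in DESTRUCTIVE and any(m in ctx for m in PROD_MARKERS):
        sys.exit(f"Refusing '{args[0]}' against {ctx!r}; run kubectl directly if you really mean it.")
    subprocess.run(["kubectl", *args])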
Let me tell you a little story from my experience:
I built myself a custom keyboard from a numpad kit. I had gotten tired of typing so many Docker commands every day and I had the desire to build something. I built this little numpad into a full-blown Docker control centre using QMK. A single key-press could deploy or destroy entire systems.
One day, something slid off of something else on my desk, onto said keyboard, pressing several of the keys while I happened to have an SSH session to a remote server in focus.
Suffice it to say, that little keyboard has never been seen since. On an unrelated topic, I don't have SSH access to production systems.
Consider an exactly one minute outage that affects multiple things I use for work.
First, I may not immediately recognize that the outage is actually with some single service provider. If several things are out I'm probably going to suspect it is something on my end, or maybe with my ISP. I might spend several minutes thoroughly checking that possibility out, before noticing that whatever it was seems to have been resolved.
Second, even if I immediately recognize it for what it is and immediately notice when it ends, it might take me several minutes to get back to where I was. Not everything is designed to automatically and transparently recover from disruptions, so I might have had things in progress when the outage struck that need manual cleanup and restarting.
World GDP (via Google): $80,934,771,028,340
Minutes per year: 365 * 24 * 60 = 525,600
Divide and you get $153,985,485 per minute.
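For anyone checking the arithmetic:

    world_gdp = 80_934_771_028_340        # USD, nominal 2017, per the figure above
    minutes_per_year = 365 * 24 * 60      # 525,600
    print(world_gdp // minutes_per_year)  # 153985485 -> ~$154M of world GDP per minute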
A billion is a number with two distinct definitions:
- 1,000,000,000, i.e. one thousand million, or 10^9, as defined on the short scale. This is now the meaning in both British and American English.
- 1,000,000,000,000, i.e. one million million, or 10^12, as defined on the long scale. This is one thousand times larger than the short scale billion, and equivalent to the short scale trillion. This is the historical meaning in English and the current use in many non-English-speaking countries, where billion (10^12) and trillion (10^18) maintain their long scale definitions.
Nevertheless, almost everyone uses 1B = 10^9 in technical discussions.
World's GDP is $80,934,771,028,340 (nominal, 2017).
Nobody would argue world GDP is anything "billion"; that's crazy.
In France, they use milliard and billion.
I’d guess actual losses to the world economy were more on the order of about $100K per minute, or about 1/3 of Google’s revenue. MAYBE a few hundred thousand per minute, though that seems unlikely with Search being unaffected, and everything else coming back. Certainly a far cry from billions per minute :)
Nothing is operating at minute margins unless it's explicitly priced on a minutely basis, like a cloud service. Even if a worker on a conveyor belt can't produce paperclips without looking at a Google Docs sheet all the time, this will be absorbed by the buffers down the line. And only if the worker fails to meet her monthly target because of it might a loss of revenue occur. But in that case the service has to be down for weeks.
For more complex conversions of time into money, as in most intellectual work, it is even less obvious that short downtimes cause any measurable harm.
In my defence, the cert was not labeled properly, nor was it used properly, and there was no documentation. It took us 2 days to create a new cert, apply it to our software and deliver it to the customer. Those were 2 days I'll never get back. However, by the time I was finished the process was documented and the cert was labeled, so I guess it's a win.
Edit: the downvotes have started again! Thanks to everyone "supporting freedom of expression" :)
Many years ago, I was directly responsible for causing a substantial percentage of all credit/debit/EBT authorizations from every WalMart store world-wide to time out, and this went on for several days straight.
On the ground, this kind of timeout was basically a long delay at the register. Back then, most authorizations would take four or five seconds. The timeout would add more than 15 seconds to that.
In other words, I gave many tens of millions of people a pretty bad checkout experience.
This stat (authorization time) was and remains something WalMart focuses quite heavily on, in real time and historically, so it was known right away that something was wrong. Yet it took us (Network Engineering) days to figure it out. The root cause summary: I had written a program to scan (parallelized) all of the store networks for network devices. Some of the addresses scanned were broadcast and network addresses, which caused a massive amplification of return traffic which flooded the satellite networks. Info about why it took so long to discover is below.
Back in the 1990s, when this happened, all of the stores were connected to the home office via two way Hughes satellite links. This was a relatively bandwidth limited resource that was managed very carefully for obvious reasons.
I had just started and co-created the Network Management team with one other engineer. Basically prior to my arrival, there had been little systematic management of the network and network devices.
I realized that there was nothing like a robust inventory of either the networks or the routers and hubs (not switches!) that made up those networks.
We did have some notion of store numbers and what network ranges were assigned to them, but that was inaccurate in many cases.
Given that there were tens of thousands of network ranges in question, I wrote a program creatively called 'psychoping' that would ICMP scan all of those ranges with adjustable parallelism.
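(In hindsight, the dangerous part was enumerating the targets. Today, Python's ipaddress module makes it trivial to skip exactly the network and broadcast addresses that caused the amplification; the prefixes below are made up:)

    # Enumerate scan targets, skipping network and broadcast addresses.
    import ipaddress

    for prefix in ["10.12.0.0/24", "10.13.0.0/24"]:        # made-up store ranges
        for host in ipaddress.ip_network(prefix).hosts():  # excludes .0 and .255
            pass  # ping(host) here, with bounded parallelism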
I ran it against the test store networks, talked it over with the senior engineers, and was cleared for takeoff.
Thing is, I didn't start it right away; some other things came up that I had to deal with. I ended up starting it over a week after the review.
Why didn't this get caught right away? Well, when timeouts started to skyrocket across the network, many engineers started working on the problem. None of the normal, typical problems were applicable. More troubling, none of the existing monitoring programs looked for ICMP at all, which is what I was using exclusively.
So of course they immediately plugged a sniffer into the network and did data captures to see what was actually going on. And nothing unusual showed up, except a lot of drops.
We're talking > 20 years ago, so know that "sniffing" wasn't the trivial thing it is now. Network Engineering had a few extremely expensive Data General hardware sniffers.
And to these expensive sniffers, the traffic I was generating was invisible.
Two things: the program I wrote to generate the traffic had a small bug and was generating very slightly invalid packets. I don't remember the details, but it had something to do with the IP header.
These packets were correct enough to route through all of the relevant networks, but incorrect enough for the Data General sniffer to not see them.
So...there was a lot of 'intense' discussions between Network Engineering and all of the relevant vendors. (Hughes, ACC for the routers, Synoptics and ODS for the hubs)
In the end, a different kind of sniffer was brought in, which was able to see the packets I was generating. I had helpfully put my userid and desk phone number in the packet data, just in case someone needed to track raw packets back to me.
Though the impact was great, and it scared me to death, there were absolutely no negative consequences. WalMart Information Systems was, in the late 1990s, a very healthy organization.
And I have never seen them load so fast before: the Gmail progress bar was visible for barely a fraction of a second, whereas I'm used to seeing it for multiple seconds (2-3 sec) until it loads.
I observe the same anecdotal speedup on other sites... Drive, YouTube, Calendar. I wonder if they are throwing all the hardware they have at their services, or if I am hitting underutilized servers, since it is not fixed for everyone.
It is nice to experience (even if it is short-lived) the snappiness Google services would have if they weren't so multi-tenanted.
a) users haven't all come back yet
b) Google is throttling how fast users can access services again to prevent further outages
c) to reduce load, apps have features turned off (which might make things directly faster on the user's end or just reduce load on the server side)
I hope they make their learnings, post-mortem, etc. public so that we can all learn from it.
My engineer hat is saying - "damn, I wish I was part of fixing this outage at their scale."
My product owner hat is saying - "Aaaaaaaaaaaaaaa......Aaaaaaaaaaaaaaa...."
Of course it will (at least, it had better), but what if it doesn't? And if it does, are you going to take countermeasures in case it happens again, or is it just going to be 'back to normal' again?
Everybody uses it, so if, say, Gmail loses all the emails, we're all in the same state, and the consequences will be more bearable and socially normal.
Most people are fine with accepting that whatever future thing will happen to most people will also happen to them. Because then the consequences will also be normal.
If the apocalypse comes, it comes for almost all of us and that's consolation enough.
For me, backing up to the Cloud is fine, because I find the risk of my home being broken into and everything stolen AND the cloud goes down AND the cloud services are completely unrecoverable is a small enough risk to tolerate.
I don't think it's possible to have permanently indestructible files in existence over a given time period.
Most of the things I backed up with Google remain largely accessible, except on an occasion like this.
It's rare that any services I operate solo come back this quickly after an outage.
Cloud storage is still useful of course, but I prefer to view it as a cache rather than as a dependable backup.
I highly suggest everyone disable this setting on their own, but also on their (perhaps less technical) friends' and relatives' devices. Otherwise, if anything happens to your account or - less likely - the storage provider or their hardware, your data could very well be gone forever. I can't believe anyone would want that.
Much less chance of that happening than my local backups getting borked...
Both have vastly different failure modes, and a typical backup strategy should use both of them.
This way, if all my backups are gone, I likely have far more important issues than the loss of files.
(and yes, my backups are encrypted)
Sure, you can argue "move to Fastmail/Protonmail/Hey/whatever", but those can also go down on you just like Google is down now. And self-hosting email is apparently not a thing, due to complexity and having to forever fight with being marked as spam (note: not my personal experience, I never tried self-hosting; I'm just relaying what I read here on HN when the topic comes up).
So, yeah, what do we do about email? I feel like we should have a solution to this by now, but somehow we don't.
That's _much_ better than trying to host my own email server.
As I said (literally in the second sentence), I don't rely on Google for everything, as you mention. I don't actually rely on Google for anything other than Gmail, and even with that I am unhappy. The point I was trying to make is that there aren't really alternatives, and I was hoping someone might come up with a suggestion about how to overcome that problem.
You can do split delivery and have your email be delivered to two different destinations. It's less common than it used to be but it's trivial.
You can still use Gmail and fall back to connecting directly to your server if Gmail is down.
Some mails might be flagged as spam if the IP/domain has no reputation, but that quickly passes, at least that's my experience.
Nice and simple! :D
I haven't had any issues with new domains being marked as spam, but I always make sure the SPF, DKIM and DMARC records are set up.
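For anyone setting this up, they're just DNS TXT records, roughly like these (example.com, the selector, and the key are placeholders):

    example.com.                  TXT "v=spf1 mx include:_spf.mailhost.example ~all"
    sel1._domainkey.example.com.  TXT "v=DKIM1; k=rsa; p=<public-key>"
    _dmarc.example.com.           TXT "v=DMARC1; p=quarantine; rua=mailto:dmarc@example.com"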
If the question is "does anybody still feel like arguing that a single provider is a viable back-up?", then it's yes for most cases. A better strategy is of course to use multiple providers; the chance that it never comes back is much lower.
There was actually a project called "Spinnaker" that was supposed to solve this problem.
Whether the cost of paying 2 or more cloud providers is worth it for most companies is up in the air.
Full disclosure: I work for Azure. Don't work on Arc tho. Don't have experience being a customer for these products
They seem to have figured out the hard parts already.
Same question for non-cloud.
/s - for now ;)
At this point the only reason I use it is that I'm grandfathered in on an old plan and it's still free; if that changes I'll go elsewhere.
It's very comforting to have a local copy of everything important in situations like this one.
I had already imagined the only solution left was to write a Medium post and hope it gets some traction on Hacker News so that Google support steps in.
Thinking to myself I was an idiot for knowing all this and still thinking it wouldn't happen to me.
And even though it turned out to be an outage, it gave me a bad enough feeling to start using a domain name I own for my email.
Obviously not relevant for this kind of outage, but in the scenario outlined by GP - Google randomly kills you off, and there is nothing you can do - this is at least an emergency strategy.
all green, which does not reflect reality for me (e.g. Gmail is down)
edit: shows how incredibly difficult introspection is
Also wondering if this is perhaps the fastest upvoted HN post ever? 8 mins -> ~350 votes, 15 mins -> ~750 votes. I wonder if @dang could chime in with some stats?
Update: looks like it hit 1000 upvotes in ~25 mins!
Update: 1500 in ~40 mins
Update: 2000 in ~1 hour 20 mins (used the HN API for the timestamp)
Stats right now: 1985 points | 1 hour ago | 597 comments
There is a public API on Firebase, but AFAIR it's just a mirror rather than the main storage.
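The Firebase mirror is easy to poll for exactly these stats, e.g. (the item ID below is a placeholder):

    # Poll the official HN Firebase API for a story's score and comment count.
    import requests

    item_id = 123456  # placeholder story ID
    item = requests.get(
        f"https://hacker-news.firebaseio.com/v0/item/{item_id}.json", timeout=10
    ).json()
    print(item["score"], item["descendants"], item["time"])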
I love (and am deeply scared by) the dependence on Google and the conflation of it with the entire internet.
>I’m sitting here in the dark in my toddler’s room because the light is controlled by @Google Home. Rethinking... a lot right now.
Some people are compiling more relevant events: https://twitter.com/internetofshit
Fallback to 'classical mode' works for me.
But funnily enough, a lot of the votes come from traffic that searches for "is ____ down?" on Google. XD
Do love how consumer services (ISPs &c) always have some report of being down somewhere, but it means nothing unless there's a big spike.
If you are logged in, the page crashes with an error.
You can still browse all services from Incognito (which for some is not an option).
Also, you can’t use many parts of Gmail, Drive, Photos, etc, without being logged in.
But I guess it's technically not part of /appsstatus
Kinda weird it would totally break if the auth failed, unlike other services like Search.
I mean, that must be a generous definition of "works"! :)
edit: parent updated their comment
> It is all green if you do not need to be log in.
Giving the impression that if you were logged in, or didn't need to log in, everything would be OK.
I wrote too fast because I thought it could help people work around the problem.
The small print says: The problem with Gmail should be resolved for the vast majority of affected users. We will continue to work towards restoring service for the remaining affected users...
At Google scale, the "remaining affected users" probably number in the tens of millions. Sucks to be one of them, tho.
But hey, it happens. As a SaaS maintainer, I can sympathise with any SREs involved in handling this, and know that no service can be up 100% of the time.
The issue isn't "negative" realities; it's saying something mid-investigation that might break contracts, only to find out later that it wasn't true.
Monitoring is very simple; I even learned this from a document released by the Google devops team many years ago.
Always alert from the end user's perspective. In other words: have an external server test logging in to Gmail. Simple as that.
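A minimal version of that external probe might look like this (the URL, threshold, and paging hook are placeholders):

    # Black-box probe: exercise the path a real user takes, from outside.
    import time, requests

    PROBE_URL = "https://mail.example.com/login"  # placeholder endpoint

    def probe() -> bool:
        start = time.monotonic()
        try:
            ok = requests.get(PROBE_URL, timeout=5).status_code == 200
        except requests.RequestException:
            ok = False
        print(f"ok={ok} latency={time.monotonic() - start:.2f}s")
        return ok

    if not probe():
        pass  # page the on-call / flip the status page here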
They manually update that status page to not scare away stockholders.
Faster than a free third-party website’s response time. Google should know they are down and tell people about it before Hacker News, Twitter, etc. Google should be the source of truth for Google service status, not social media.
> And what level of detail?
Enough to not tell people that there are “No issues” with services.
> I'd much rather have their team focus on fixing this as fast as possible than trying to update that dashboard in the first 5 minutes.
Google employs enough people to do both.
It's not like they would be working on the status page right now; that work should have been done a long time ago...