More On Gmail’s Delivery Delays (googleenterprise.blogspot.com)
64 points by daigoba66 on Sept 24, 2013 | 32 comments



A ~2.5 hour delay that affected ~1.5% of messages -- I've never had a mail provider so perfect that a delay of a couple hours didn't very occasionally happen.

As far as they're telling us, the failure didn't cascade, causing their queues to fill up and other horrible things to happen. As far as we know, no mail got bounced, no mail got dropped on the floor and lost permanently, etc.

A 2.6 second delay isn't even worth mentioning. How could one even tell the difference between a 2.6 second delay due to a problem on Google's side and routine problems in the email network outside of Google, which could easily lead to a delay of a second or so?

Did people really get upset about this? Was it worse than it seems because of details they are omitting? Are they preemptively advertising how well they dealt with this barely-a-problem-noticeable-to-users problem, because the message is really "see, look how damn stable gmail is, our worst problem ever is barely noticeable"?

Or, what?

Maybe those 1.5% of messages were all to the same users, so some people had no delay, and some people had ALL their mail delayed a couple hours? That would make it more noticeable.

But in general, occasional (like once a year or less) couple-hour mail queue delays should probably be expected from any mail provider, no?


We use Google Apps for Business and a fair number of our accounts were "heavily" affected, with mail taking upward of half an hour to deliver.

Which is still really fast, especially when you know how mail works and remember the "old days". I guess people got used to email being an instantaneous form of communication, which says a lot about how well it works most of the time, given that instantaneous is not in email's job description.


> Which is still really fast, especially when you know how mail works and remember the "old days".

Mmm, dialing in to QuantumLink at 300 baud... ahh, the good^Wold days.


My business sends a lot of emails. Yesterday morning we were scheduled to send an email to ~8,000 of our subscribers. I sent a test email to a test list, and it did not go through. This test email went to 4 different Gmail addresses. None of them received it. Around 40 minutes later, one of the Gmail accounts finally received the message. At this point I sent another email to the test list. Still nothing. I tried emailing from a Yahoo account to Gmail, and it went through. However, for over 3 hours I couldn't send an email from my server to Gmail. Emails from other servers were able to go through.

The problem does not appear to be with individual user accounts, but with Google's ability to receive emails from certain parts of the internet universe.

The result was lost time and lost money on my end. I imagine this had very little impact on the experience of individual users -- but for businesses that depend on getting emails to their customers, this had a much bigger impact.

My experience was this:

1. This problem was not a 2.6 second delay. It was a multi-hour delay.

2. The problem was not with 1.5% of Gmail users, but with Google's ability to receive messages from other parts of the internet. (A major pipe was down, according to the OP.)

3. No Gmail user had all their incoming mail delayed for a couple hours -- Certain senders to Gmail had all of their mail delayed for several hours.


> This problem was not a 2.6 second delay. It was a multi-hour delay.

Technically this is not a problem. SMTP does not guarantee instantaneous delivery, just that if an SMTP server accepts the message it will be delivered.

In reality, this 'delay' could be perfectly normal and is just fine according to spec.
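
To illustrate why a multi-hour delay can sit within normal SMTP behaviour: a sending server that hits a temporary failure keeps the message queued and tries again later. RFC 5321 suggests a retry interval of at least 30 minutes and a give-up time of at least 4-5 days. Below is a rough sketch of such a schedule; the function name, the doubling backoff and the 4-hour cap are assumptions for illustration, not the defaults of any particular mail server.

```python
from datetime import timedelta

def retry_schedule(first_retry=timedelta(minutes=30),
                   max_interval=timedelta(hours=4),
                   give_up_after=timedelta(days=5)):
    """Return elapsed times at which a hypothetical sender retries delivery.

    All parameters are illustrative assumptions, not a real MTA's defaults.
    """
    attempts = []
    elapsed = timedelta(0)
    interval = first_retry
    while elapsed + interval <= give_up_after:
        elapsed += interval
        attempts.append(elapsed)
        # Double the wait after each failed attempt, up to the cap.
        interval = min(interval * 2, max_interval)
    return attempts

schedule = retry_schedule()
# Print the first few retry times, measured from the first failed attempt.
for t in schedule[:4]:
    print(t)
```

With these assumed settings the first retries fall at 30 minutes, 1.5 hours, 3.5 hours and 7.5 hours, so a message that fails twice is already a couple of hours late without anything being out of spec.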


> But in general, occasional (like once a year or less) couple-hour mail queue delays should probably be expected from any mail provider, no?

I've hosted a dozen or so mailboxes at Rackspace Mail (formerly Mailtrust) for several years now. I have experienced zero delayed delivery or service failures of any kind in all those years.

For $2/mo/mailbox ($10/mo minimum total), you get a 100% uptime SLA, 25GB per mailbox, IMAP-push/POP3, good spam filtering and 24/7/365 support. That's 100% uptime or you get paid, and a real person will pick up the phone if you have a problem at 3AM on Christmas, for $2.


Getting an email server to work in 2013 is relatively easy. Fixing it quickly without losing anything when a problem arises and things pile up is not.

Having your worst damage in case of critical failure be only 2.5 hours of delay, on a very large scale architecture? That's genuinely great work. 2.5 hours is not even a failure in email terms.


The SLA isn't really great (but that's not exclusive to RS). Under the RS managed hosting SLA, 30 minutes of network downtime means 5% of the monthly fee back.

So if this counted as downtime (and that's unlikely) and your email was down for 3 hours during your business hours, and if the mail SLA is the same as managed hosting's network SLA you'd get back 60 cents per user.
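
The 60 cent figure can be reproduced with a few lines of arithmetic. This is only a sketch of the rule as described in this comment (5% of the monthly fee per 30 minutes of downtime, applied to the $2 mailbox price quoted upthread); the function name, the whole-block rounding and the 100% cap are assumptions, not Rackspace's actual SLA terms.

```python
def sla_credit(monthly_fee, downtime_minutes,
               credit_per_block=0.05, block_minutes=30):
    """Credit owed under a hypothetical '5% per 30 minutes of downtime' rule."""
    # Assumption: only complete 30-minute blocks earn credit.
    blocks = downtime_minutes // block_minutes
    # Assumption: the credit can never exceed the monthly fee itself.
    fraction = min(blocks * credit_per_block, 1.0)
    return monthly_fee * fraction

# 3 hours of downtime on a $2/month mailbox.
print(round(sla_credit(2.00, 180), 2))
```

Three hours is six 30-minute blocks, so 30% of $2, which matches the 60 cents per user above.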

Anyway, Google Apps also has an SLA which would credit 10% of the monthly fee for less than 99.9% -- but again, that's uptime, so "some users not getting email on time" is not going to count.


We use email communication for our help ticket systems and all our users were affected (~20k).

At 9am it was a ~10min delay.

10am about 21min

noon about 1h

3pm emails with ~3h delay arrived.

Then it got better. We weren't able to help our users, and of course they blamed us for responding slowly. Actually a bounce message would've been preferable, since then they could've used a phone or stopped by.

You don't really expect your company @example.org to take 3h. Either bounce within ~3-4min or you expect it to be delivered by that time.


> We use email communication for our help ticket systems and all our users were affected (~20k).

After this event are you considering a different architecture?


We noticed similar delays at our office (a lot fewer people though). Somehow all the test emails we sent seem to have ended up in the 1.5%. Odd.


This. Absolutely this. Considering hosting your own service involves paying a competent system administrator at least $75K/year in addition to buying hardware and network and power, $5/user/month is more than reasonable for such an extremely high level of service. If some mission critical email was delayed for a little over an hour, pick up the phone. This is the very definition of a mountain out of a molehill.


While "1.5% of messages" doesn't sound like that much, it probably added up to millions of e-mails over that ~2.5 hour period.

(Also: it's probably not feasible to do for everyone at your companies, but offlineimap[0] is a great tool for ensuring that you always have a local copy of your mailbox.)

[0]: http://offlineimap.org/


>Maybe those 1.5% of messages were all to the same users, so some people had no delay, and some people had ALL their mail delayed a couple hours?

Probably this. Gmail was just about unusable for me yesterday.

Not that I'm complaining, for exactly your reasons.


Definitely this. I manage a 20,000+ GApps domain and a lot of folks weren't affected at all, but some people (myself included) only received a trickle of messages all day, with a flood coming in late afternoon.


I'm seeing quite a few comments across the web from people saying they were in the 1.5%, which makes me wonder how accurate that number is. Then again, Gmail has a LOT of users, so that could be a very accurate number.

We use Google Apps for Business and were experiencing delays of several hours, and I did receive some bounces when sending to others within my organization. Strangely, most of this happened when using Outlook with Google Apps Sync - not when using the web interface.

Still, we've been very happy with the service - things like this happen every once in a while. We made do until it was straightened out.


How many of the other 98.5% weren't even aware that anything happened and, thus, haven't complained about it? My mother probably doesn't even have a clue that there was an issue.


No one was in the 1.5%. It was 1.5% of messages. For example, maybe 1/4 of my incoming emails had delays. If that was true of everyone who had delays, now you're looking at 6% of people affected.


Hi,

I guess the people who got upset are those whose email got delayed for hours (that tiny percentage of users which amounts to some dozens).

"A 2.6sec delay" is just the median (as per the OP), which tells you little if you do not know the mean or other statistics. What was the maximum delay?

Just trying to clarify that the fact that you are not upset does not mean the 50% of people above 2.6 secs should not be.
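
A small illustration of why a single figure like 2.6 seconds hides the tail. The data here is synthetic and made up for the example (985 messages delayed 2.6 s and 15 delayed two hours, mirroring the 1.5% figure); it is not Google's data.

```python
import statistics

# Synthetic delays in seconds, for illustration only.
delays = [2.6] * 985 + [7200.0] * 15

print("median:", statistics.median(delays))
print("mean:  ", round(statistics.mean(delays), 1))
print("max:   ", max(delays))
```

The median stays at 2.6 seconds while the mean is pulled up to nearly two minutes and the maximum is two hours, so the three numbers describe very different experiences.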


In a 300,000 user mail system I've seen mail queues grow by many thousands of messages a minute causing hundreds of gigabytes of email to be queued on the disks.

Once the issue is resolved and the queue starts flowing again it can take a very long time for these messages to be sent successfully, especially when you need to scan each one for viruses and account for disk latency that doesn't usually occur on these servers.
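
A back-of-the-envelope sketch of why draining takes so long: new mail keeps arriving, so the backlog only shrinks at the difference between the processing rate and the arrival rate. All figures below are hypothetical and chosen only to show the shape of the calculation.

```python
def drain_time_minutes(backlog, arrival_rate, service_rate):
    """Minutes needed to clear a backlog while messages keep arriving.

    Rates are in messages per minute. Returns None if the queue never drains.
    """
    net_rate = service_rate - arrival_rate
    if net_rate <= 0:
        return None
    return backlog / net_rate

# Hypothetical: 500,000 queued, 5,000/min still arriving, 6,000/min processed.
print(drain_time_minutes(500_000, 5_000, 6_000))
```

Even with 20% spare capacity in this made-up case, the queue takes over eight hours to empty, and any extra per-message work such as virus scanning lowers the service rate further.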


I have a hard time believing these statistics. Unless of course the 1.5% happened to be every single person who happened to have a Twitter, HN, Facebook, or Reddit account and was willing to post about it.


The missing statistic is what % of users are active.

If 97% of users are inactive, and this affected 1.5% of users, then 50% of active users were affected.


Note that they didn't say 1.5% of users, they said 1.5% of messages were affected.


I'm totally aware of how they said what they said, I just don't necessarily believe it.


As a counterpoint, nobody I know experienced any problems with gmail yesterday. Including coworkers.


The date on this appears wrong - I think they mean yesterday, Sep 23. I must be one of the 1.5% affected - my gmail messages arrived 5 hours late.


That's not what it says. It says that 1.5% of messages arrived at least 2 hours late. If you got 50 messages yesterday then you more likely than not got at least one that was 2 hours late.
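
A quick way to check this kind of claim, under the simplifying assumption that each message is delayed independently with probability 1.5% (the reports in this thread suggest delays were actually clustered by sender or recipient, so treat it as a rough guide): the chance of at least one delayed message among n is 1 - 0.985^n.

```python
def p_at_least_one_delayed(n_messages, p_delayed=0.015):
    """Chance that at least one of n messages is delayed, assuming independence."""
    return 1 - (1 - p_delayed) ** n_messages

for n in (10, 40, 50, 100):
    print(n, round(p_at_least_one_delayed(n), 2))
```

Under that assumption the probability passes one half at about 46 messages.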


Ah. Better drink more coffee I guess...


Based on the percentage of people I know who use GMail and/or Google Apps, I really feel like they're low-balling the affected population. Just my thoughts.


1.5% of messages may include 30% of users. Each affected user may have had some messages delivered quickly and others delayed. If 30% of users had 5% of their messages delayed, then about 1.5% of total messages would be delayed, not considering the difference in volume between users.
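
The relationship between the share of messages delayed and the share of users affected can be written down directly. This sketch assumes every user receives the same volume of mail, which the comment above already flags as a simplification; the function names are made up for the example.

```python
def delayed_message_share(user_share, per_user_delay_share):
    """Overall share of delayed messages, assuming equal volume per user."""
    return user_share * per_user_delay_share

def affected_user_share(message_share, per_user_delay_share):
    """Inverse: share of users affected for a given overall delayed share."""
    return message_share / per_user_delay_share

# 30% of users each seeing 5% of their mail delayed.
print(round(delayed_message_share(0.30, 0.05), 4))
# 1.5% of messages delayed, with each affected user seeing 1/4 of mail delayed.
print(round(affected_user_share(0.015, 0.25), 4))
```

The first call gives the 1.5% of messages from this comment; the second reproduces the 6% of users estimated earlier in the thread.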


Same here, gmail addresses of our company were affected


The problem for us wasn't so much the email delay, it was that whenever a user would send a message from their iPhone, they'd get an error message "Server rejected mail." If gmail had accepted the message but delayed it on the server side, we probably wouldn't have noticed an issue at all.



