
Gmail's Second Major Problem - kapilkale
http://www.kapilkale.com/blog/gmails-second-major-problem/
======
hurstdog
The issue that you're complaining about was due to the incident linked at [1].

I can't go into the technical details as to why this happened, but I can
roughly explain that it was due to the CAP theorem, essentially "Consistency,
Availability, Partition Tolerance. Choose two." [2]

Furthermore, you have to choose partition tolerance [3]. The delivery delays
that were seen yesterday happened because we choose consistency over
availability in our systems.

In fact, most of the outages I see people complain about on Hacker News
related to Gmail are because we won't sacrifice consistency of user accounts.
It's a different problem than huge scale serving of web search indexes or
facebook timelines because in those cases if you're missing a few entries most
people won't notice or care. When you're searching for an email, you know what
email you expect to find and you'll get angry if it isn't there.

Users won't stand for an email showing up one day, disappearing next hour, and
then coming back later (which is what could happen in some designs for
eventual consistency when serving from different datacenters).
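
To make the trade-off concrete, here is a toy sketch of a consistency-first write. It is not how Gmail works internally; the `Replica` class, the `quorum_write` function, and the datacenter names are all made up for illustration. The only idea it shows is that a write is refused (and so delayed) when a majority of replicas cannot be reached, instead of being accepted by a minority that might later disagree with the rest.

```python
class Unavailable(Exception):
    """Raised when a write cannot reach a majority of replicas."""


class Replica:
    def __init__(self, name):
        self.name = name
        self.reachable = True   # set to False to simulate a network partition
        self.data = {}


def quorum_write(replicas, key, value):
    """Consistency-first write: succeed only if a majority can accept it."""
    reachable = [r for r in replicas if r.reachable]
    needed = len(replicas) // 2 + 1
    if len(reachable) < needed:
        # Choosing consistency: refuse the write rather than let a minority
        # of replicas hold data the others have never seen.
        raise Unavailable(f"only {len(reachable)} of {len(replicas)} replicas reachable")
    for r in reachable:
        r.data[key] = value
    return len(reachable)


replicas = [Replica("dc-a"), Replica("dc-b"), Replica("dc-c")]
quorum_write(replicas, "msg-1", "hello")   # all three reachable: accepted

replicas[1].reachable = False
replicas[2].reachable = False              # a partition isolates dc-a
try:
    quorum_write(replicas, "msg-2", "world")
    delayed = False
except Unavailable:
    delayed = True                         # the write has to wait and be retried
```

In this sketch the second message is late but never half-visible, which is the behaviour described above: a delivery delay instead of mail that appears and disappears.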

Thus, Gmail availability is lower sometimes because we make sure that all of
your data is there all the time. We're insane about it, and we have huge jobs
that run constantly on our systems to ensure that we're even resilient to bad
hardware. With those we regularly find single bit errors and bad CPUs.
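
As a rough illustration of what such a verification job could look like (an assumption for the sake of example, not a description of Google's real jobs), the sketch below stores a SHA-256 checksum next to each blob and later rescans everything to find records whose bytes no longer match. The `store` and `scrub` names are hypothetical.

```python
import hashlib


def store(blob):
    """Return the blob together with a checksum recorded at write time."""
    return {"data": bytes(blob), "sha256": hashlib.sha256(blob).hexdigest()}


def scrub(records):
    """Return the indexes of records whose data no longer matches its checksum."""
    return [i for i, rec in enumerate(records)
            if hashlib.sha256(rec["data"]).hexdigest() != rec["sha256"]]


records = [store(b"message one"), store(b"message two")]

# Simulate a single flipped bit in the second record.
corrupted = bytearray(records[1]["data"])
corrupted[0] ^= 0x01
records[1]["data"] = bytes(corrupted)

bad = scrub(records)   # indexes of records that fail verification
```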

So, as a Gmail engineer, I'm sorry that there were delivery delays yesterday,
and all I can say is that every time these happen we tweak and redesign our
systems to make these more rare and to improve Gmail's uptime. We'll never
have the snappy response and perfect uptime[4] of a computer under your desk.
But at the same time a hurricane could take out one of our datacenters and we
won't lose your data.

-Andrew, a Gmail Engineer.

1\. [http://www.google.com/appsstatus#hl=en&v=issue&ts=13...](http://www.google.com/appsstatus#hl=en&v=issue&ts=1359100799000&iid=8a775c169a6d52d33eea2ba9c2919a6e)

2\. <http://en.wikipedia.org/wiki/CAP_theorem>

3\. <http://codahale.com/you-cant-sacrifice-partition-tolerance/>

4\. For some definitions of perfect.

~~~
kapilkale
Thanks Andrew- this is really helpful.

------
neya
The author is furious about a technology that was not designed for
instantaneous delivery (SMTP) and fumes about it, because it isn't
instantaneous. What's worse is, he quickly goes into some kind of
'super-hero mode' and makes a pretty heavy claim that this is Gmail's _second
biggest problem_.

I think this is the best approach - If someone hands you a free glass of
beer, which they've tried their level best to make perfect, you just drink
it instead of trying to suddenly become a food critic and blame the person who
gave you the beer. If you don't like it, don't drink it. Buy your own beer
from somewhere else. As simple as that!

Tell you what, you should try Yahoo!, I bet.

~~~
brown9-2
Isn't the 115 minute lag in the screenshot the lag between one Google SMTP server
and another Google SMTP server? That seems like a problem Google should be
able to address.

~~~
blablabla123
> That seems like a problem Google should be able to address.

This is part of the problem.

SMTP and the whole E-Mail thing is inherently unreliable and slow. Speaking
for myself, but suspecting this is true for most heavy Gmail users, Gmail
feels so good because it seems more reliable than using any other Web mail
service. It is even more reliable when both ends use Gmail -- attachments are
almost a non-issue within Gmail. Before Gmail, mails could take ages to be
delivered, get silently lost in spam filters/folders, or not arrive at all.

When we ask Google to be more reliable in that part, we ask them to not do
E-Mail but proprietary Magic-Gmail-Messaging. This is part of the problem, not
the solution, this is why Facebook is doing E-Mail now. (Ironically FB
Messaging seems in some way even less reliable than the Gmail Messaging.)

BTW, what's up with this X.400 thing?

------
psynix
Your problem is that you're relying on a method of communication that was
never guaranteed to be instantaneous. Maybe you should investigate other forms
of communication within your team besides (or in conjunction with) email.

~~~
derfniw
100% agreed.

I try to treat email as any other letter I receive(1). Meaning that the sender
can only expect an instant reply if I know beforehand the mail is coming and
urgent. The same way one would rapidly reply to an urgent letter one is
expecting, but easily take a day or more to reply to, or even open, a non
urgent letter.

If you need to get a hold of someone directly you should use your phone (to
call them). If they don't pick up their phone, leave a voicemail or send an
sms.

(1) Registration confirmations / new password emails are an obvious exception
here.

~~~
snogglethorpe
...and of course even voicemail / SMS are not guaranteed to be seen anytime
soon!

[If at all; I know plenty of people who simply never check their voicemail...]

------
teilo
I pay over $800 a month for Gmail (ok, for apps for business for many users),
and we STILL have this issue, and while we can indeed call Ireland to get
support, the answer is always, "We are having a delivery issue. Our engineers
are working on it."

So paying doesn't help.

~~~
ceejayoz
> So paying doesn't help.

Well, you can now say "Google knows about the issue" to your clients/bosses
rather than "fuck if I know".

------
buster
Whaaat, SMTP != Instant Delivery!? You must be joking!

As much as i can understand the pain of an email not being delivered after 2
hours this seems to be too much drama. "happens 2 times in a month" is not a
"major problem" for a free service especially since i suppose this user is
part of a rather small minority. I never noticed substantial delivery lags
myself over the past years.

Anyway, if you say "i'd pay 50$/month" please email me, i'll be happy to
provide you with a very overpriced mail account with same-second delivery! :)

~~~
sskates
As funny as you think it might be to charge $50 a month for guaranteed quick
delivery, this is exactly what he actually wants. Large random delays in
delivery are costing him way more, so he's willing to pay to fix the pain.
Instant delivery isn't in the RFC, but conformance with the spec is not
something he cares about. He cares about fast delivery.

~~~
buster
Just as you say: What he wants and what he has have nothing to do with each
other. I want my pocket to contain 1 million dollars. It's not how it works,
unfortunately :(

Look, there are many many reasons why an email may not arrive in time or at
all at the destination. To say "i lose so much money but i rely on such an
unreliable protocol" is just not the mistake of google.

And as i said, for 50$ you can easily buy 2 virtual machines in two
different locations, one domain and have a virtually no-downtime mail service
of your own. Add some roundcube, clamav, spamassassin and there you go. Setup
takes a bit, fine, but that's about it. I guess it wouldn't even take that
long. It won't guarantee delivery as well, but mails won't be stuck in some
MTA ;)

~~~
mseebach
That's assuming your time is worth nothing. Also, there's much more to
reliability than having servers in more than one location.

------
DangerousPie
Looks like Greylisting to me: <https://en.wikipedia.org/wiki/Greylisting>
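
For anyone unfamiliar with the technique, a minimal sketch of the idea follows. It is a toy model, not Gmail's implementation: the `Greylist` class, the 300-second window, and the addresses are assumptions chosen for illustration. The first delivery attempt from an unseen (client IP, sender, recipient) triplet gets a temporary 4xx failure, and the message is only accepted when the sending server retries after the window has passed.

```python
class Greylist:
    """Toy greylisting: temp-fail mail from a triplet until it has aged enough."""

    def __init__(self, min_delay_seconds=300):
        self.min_delay = min_delay_seconds
        self.first_seen = {}   # (client_ip, sender, recipient) -> first attempt time

    def check(self, client_ip, sender, recipient, now):
        triplet = (client_ip, sender, recipient)
        first = self.first_seen.setdefault(triplet, now)
        if now - first < self.min_delay:
            return "451 try again later"   # temporary failure; sender should retry
        return "250 OK"


gl = Greylist(min_delay_seconds=300)
first_try = gl.check("203.0.113.7", "a@example.org", "b@example.com", now=0)
early_retry = gl.check("203.0.113.7", "a@example.org", "b@example.com", now=60)
later_retry = gl.check("203.0.113.7", "a@example.org", "b@example.com", now=900)
```

The actual delay a user sees then depends on the retry schedule of the sending server, which is why greylisted mail can show up minutes or hours late.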

~~~
boogah
Yep. I used to work at DreamHost and our customers would run into issues with
Gmail greylisting incoming mail. In fact, they do it like crazy.

In a lot of cases, adding an SPF record for the sending domain would clear
things up.
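
For reference, an SPF policy is published as a DNS TXT record on the sending domain and starts with `v=spf1`. The record below is a hypothetical example (the address range and include target are made up), and `parse_spf` is just an illustrative helper that splits such a record into its mechanisms; it does not do a DNS lookup or a full SPF evaluation.

```python
def parse_spf(txt_record):
    """Split an SPF TXT record into its mechanisms; return None if it is not SPF."""
    parts = txt_record.split()
    if not parts or parts[0].lower() != "v=spf1":
        return None
    return parts[1:]


# Hypothetical record: allow one address range, defer to another domain's
# policy, and soft-fail everything else.
record = "v=spf1 ip4:192.0.2.0/24 include:_spf.example.net ~all"
mechanisms = parse_spf(record)

# A TXT record that is not an SPF policy is ignored.
not_spf = parse_spf("google-site-verification=abc123")
```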

~~~
UnoriginalGuy
It is disturbing how many DNS/domain hosting companies still have zero support
for SPF.

It is A, MX, CNAME, and MAYBE a couple of other common record types. But a TXT
record? No chance...

I had to move to Route53 just for this functionality.

------
alxndr
I had seen massive delays in delivery to Gmail when sending newsletters at my
last $dayjob, and I had always assumed it was part of their spam filtering.
For example, if it were me and I was getting thousands of nearly-identical
emails sent to Gmail addresses, it seems like a good idea to let 1% of them
through to see if end users mark them as spam or not. If they all get marked
as spam, I don't need to let the rest through.

~~~
icelancer
Likewise. I used a script to recurse over a long list of names, and putting in
a small sleep() delay in-between emails ended up fixing the problem
immediately.
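
A minimal sketch of that kind of throttling, assuming a one-second pause (the original script and its delay value are not shown here, so the names and numbers below are placeholders). The `send` callable stands in for whatever actually delivers the message.

```python
import time


def send_throttled(recipients, send, delay_seconds=1.0, sleep=time.sleep):
    """Call send(recipient) for each recipient, pausing between messages."""
    sent = 0
    for i, recipient in enumerate(recipients):
        if i > 0:
            sleep(delay_seconds)   # pause between messages, not before the first
        send(recipient)
        sent += 1
    return sent


# Stand-in sender and sleeper that only record what would have happened.
outbox = []
pauses = []
count = send_throttled(
    ["a@example.com", "b@example.com", "c@example.com"],
    send=outbox.append,
    delay_seconds=0.5,
    sleep=pauses.append,
)
```

Passing `sleep` in as a parameter is only there so the loop can be exercised without actually waiting; in a real script the default `time.sleep` would be used.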

------
hnolable
Are you hitting any of their receiving limits?
[http://support.google.com/a/bin/answer.py?hl=en&answer=1...](http://support.google.com/a/bin/answer.py?hl=en&answer=1366776)

Because this is exactly what happens when you do.

~~~
kapilkale
Not even close.

------
plg
i wish some major computer company would release an easy-to-use plug and play
server product that would enable consumers to run their own mail servers on
cheap hardware, just by toggling a switch

just imagine... no ads, no privacy concerns, complete control.

if they were smart they would include some sort of automatic backup program
that runs in the background and saves everything every once in a while, a
backup that could be restored with a click of a button... like going back in
time ... like a .....

~~~
rz2k
How come Zimbra seems to be so widely disparaged?

Spam control requires a little more intervention than with Gmail, but I've
found it straightforward enough to set up once, then do nothing else for a
couple years.

~~~
nwh
Straightforward? I'd love to know your secret.

Compile this odd package with this patch. Now these three more. Oh they don't
compile? That's right, they haven't been maintained for 3 years. Etc, etc.

~~~
rz2k
I created a VPS with CentOS. I think I started with CentOS 5.6 and Zimbra 6.8
Community Version. I last updated it about two years ago, so that it is now
CentOS release 5.8 and Zimbra 7.1.1_GA_3196.RHEL5_64.

It intermittently was used by different groups of about 50 people who accessed
it using IMAP and the web portal without any problem.

Now all it is doing is continuing to collect email subscriptions from software
vendors I was trying out at the time, though some other people may still be
using it. I don't think anyone else even knew how to access the administration
console, and I haven't logged into the admin console in over a year.

Occasionally, I log into the shell, because of a bug I never bothered to
address. It slowly collects a lot of temporary files. Speaking of which I
should do that now:

`Last login: Sat Sep 1 19:27:28 2012 from __`.

Then I ran:

    
    
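        # Remove jna*.tmp files in /tmp older than 30 minutes,
        # one leading digit at a time, printing progress after each pass.
        for i in {0..9}
        do
            find /tmp/jna${i}*.tmp -cmin +30 -exec rm {} \;
            echo "$i out of 9" "$(date)"
        done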
    

To be honest, that _is_ ugly, and I should upgrade. The script took 30 minutes
to run, and found 20GB that had accumulated in under 5 months. However, it
isn't critical for anyone, and < 3hrs/year is a nice level of admin effort.

To answer your question: I'm sure not every module is functioning perfectly,
and though it hasn't needed much maintenance over the past year, it hasn't
been under much load either.

------
codeka
Even though I agree with people here saying that you can't expect 100%
instantaneous delivery from SMTP, this incident was actually posted on the
Google Apps status dashboard
([http://www.google.com/appsstatus#hl=en&v=issue&ts=13...](http://www.google.com/appsstatus#hl=en&v=issue&ts=1359118799000&iid=8a775c169a6d52d33eea2ba9c2919a6e)).

So it's not even like this is some common problem that happens all the time
and Google is ignoring. No service is perfect and outages happen.

~~~
kapilkale
Ah- there IS a status page. Thank you.

------
sergiotapia
My main gripe against GMail is the visual noise and clutter. There are way too
many buttons and gradients and shadows now.

I remember in private beta, its cleanliness was heralded as the second
coming, but now they seem to be adding features 90% of users don't need.

I've switched to Outlook.com and haven't looked back. Back to cleanliness and
non-intrusive buttons and popups.

------
kaolinite
I'd say the #1 problem is spam filtering, which for me has gotten worse and
worse. Recently I have had PayPal payment notifications going into spam! It's
a shame because Gmail used to have incredibly good spam filtering - good
enough that I didn't check it, as I knew I could trust it - however now I
don't trust it at all.

~~~
snogglethorpe
I'm not sure what you're doing (it certainly depends on what your legitimate
email stream looks like), but for me, gmail's spam filters are still
incredibly good, probably as close to perfect as I've ever seen.

It's impossible for such a thing to be 100% perfect, of course (even your own
eyes will deceive you occasionally!), but gmail's filters are good enough that
spam is not an issue for me any longer.

------
Karunamon
kapilkale, for the price you say you'd pay a month, you could get Google Apps
for Business and spend that much per year. And get live support and other
goodies.

Have you thought about this yet?

~~~
georgemcbay
Having used both gmail as an individual and gmail via paid Google Apps for
Business, I'm inclined to believe that neither has any advantage when it comes
to mail delivery speed, either in terms of declared SLA or real-world
difference.

I've seen unexplained lags of the type mentioned here on both about equally.
In some cases I've had someone who was on the very same Google Apps for
business plan email me, and then send a forward of the original when I
reported I had not seen the first mail 10 minutes later, after which I
received the 2nd mail immediately and still didn't see the original mail until
like 30+ minutes after that.

I never bothered to follow up with Google's support on apps for business with
the lag issue because it didn't happen that often (at least not that I
noticed) and also I view Internet email as an inherently laggy system (though
in practice it is near instant most of the time), so maybe if you do complain
they'll do something, but I don't think just switching to the Google Apps for
Business plan is an immediate cure for this guy's problem.

------
jf22
$50 -> $1000 is one order of magnitude.

Not "orders".

Orders of magnitude hyperbole needs to stop.

Why not write "some would pay $1000 a month" when quoting the Graham article.
You didn't even show you have users willing to pay $5000 a month for gmail
which would be the "orders of magnitude" you wrote.
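
The arithmetic can be checked with a base-10 logarithm of the ratio between the two prices. The helper name below is made up for illustration.

```python
import math


def orders_of_magnitude(low, high):
    """Number of powers of ten separating two positive amounts."""
    return math.log10(high / low)


to_1000 = orders_of_magnitude(50, 1000)   # ratio of 20, a bit over one order
to_5000 = orders_of_magnitude(50, 5000)   # ratio of 100, two orders
```

So $50 to $1000 is about 1.3 orders of magnitude, and it takes $5000 to reach two.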

------
dsr_
95% of non-spam email is delivered within 5 minutes of being sent. This number
is made up for the purposes of argument, but I think it's fairly accurate.
I've administered mail machines for many years.

There are all sorts of reasons why the other 5% doesn't zip along, and some of
those reasons are persistent, some are fixable, and some of them are
essentially never going to be tracked down. Does Gmail have an internal
problem? Maybe, maybe not, but there's not enough data here to find out.

If you want instant communication, use a direct connection under your control.
It's still not guaranteed but at least you'll see the progress or lack
thereof.

------
armored_mammal
I've had massive delivery delays every so often with gmail and they're a
little irritating, sure, but quite frankly email makes no guarantees about
delivery time. If you need guaranteed fast delivery maybe email isn't the
answer.

------
RyanZAG
Google's servers are incredibly complex. Probably too complex - the more
complicated they make their infrastructure (datacenter failover, region
failover, bla bla) the more unstable it seems to get.

Google uses a very bureaucratic code commit system that requires sign offs
from different people. This process takes a long time, and devs can't move
onto the next step until the previous step has been accepted [1]. While this
system is awesome for catching the localized bugs (no buffer overflow is going
to get past that kind of code review), there is a major tradeoff. A dev can
only keep so much state in mind when building architecture. If he is only
working on the problem once a week with large time gaps, is he not going to
lose track of important pieces of the puzzle?

This is probably the age old problem - if you make something that is too
clever for even the creator to fully understand, how are you possibly going to
make sure it is bug free? The problem being some delay between Google servers
hints at an inter-region datacenter problem. I wonder if anybody at Google
even understands the entire failover and interlinked data center system
completely?

[1] [http://www.splinter.com.au/2012/12/26/behind-enemy-lines-goo...](http://www.splinter.com.au/2012/12/26/behind-enemy-lines-google/)

~~~
rachelbythebay
That link seems to be dead.

I wish I could share with you the pictures of "the big picture" in which every
piece of proprietary tech was given its own little circle on a whiteboard and
then was connected to everything else which it uses or which uses it.

To say it was huge was an understatement.

------
sakopov
Never had any latency issues or lag in delivery. I wonder if this is some sort
of regional issue.

------
kunle
I experience this same issue a couple of times a month as well. I happen to
have a few plugins that I use in GMAIL (Xobni/Tout/Base etc) so I assumed that
might be part of the problem? Are you running Gmail clean or do you have a
similar situation?

~~~
kapilkale
Running clean.

------
hayksaakian
Gmail seems pretty good to me at least. The worst trouble I've had is a few
seconds of delay to receive an email without a refresh. Usually my phone and
tablet get it first.

~~~
aeturnum
Delivery is usually instantaneous for me, but on occasion I won't get email
for 6+ hours. It's a lot easier for me to notice the delay when communicating
with co-workers, as the sender often wants to know why I haven't responded to
the mail he sent 4 hours ago.

------
sonabinu
try using Yahoo, Gmail will suddenly feel like a supersonic jet

------
afterburner
Anecdotally, I wonder if the reason I don't see this as much as I used to is
because the quick email discussions I have happen almost entirely between
Gmail users.

------
tomovo
The solution is cheap and simple: FAX. It starts coming out on the other end
before you're finished putting it in!

------
tbirdz
I've been using Zoho mail myself for a while, and it's worked out fairly well.

------
thorin_2
Something else to lament: their complete lack of support for webhooks.

------
wei2012
Try Yahoo or Hotmail (Outlook now), then you may get the answer.

------
jf22
"Take my money. I’d pay $50 / month to get reliable service; others would be
willing to pay orders of magnitude more."

Should be "order of magnitude more".

Getting really tired of tech writers using "orders of magnitude" hyperbole
when it's not really the case.

\----

Also don't like him complaining about not receiving an "urgent" email in time.
Urgent communications require phone calls.

~~~
freehunter
Well, there are many reasons you could need an urgent email for something that
wouldn't be satisfied over the phone. Sending a contract or a statement of
work, for example. Sure it could be faxed, if both parties have fax. It could
be put on Dropbox, if both parties have Dropbox, etc.

If it's not just a communication but rather an exchange of data, a phone call
won't suffice. The author even mentioned that he _had_ called the person, who
resent the email 4 times. Obviously a phone call isn't what needed to happen.
There's just a lack of good file transfer solutions on the Internet.

