
Google OAuth Login Issues - derwiki
https://status.cloud.google.com/incident/cloud-iam/19001
======
aaronharnly
Our incident management includes chatops automation that, among other things

\- generates a per-incident Google Doc

\- opens a per-incident Google Hangout

\- pages appropriate people via PagerDuty

During this incident (which affected our customers who use Google to log in),
people had difficulty accessing all three (our PagerDuty uses Google OAuth).

This was a good reminder to have alternatives for all crucial incident
management services. (We have backup plans in case Slack or our automation
scripts are down, but didn’t have backups for Google...)
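A minimal sketch of the "have an alternative for each crucial service" idea, assuming each integration can be wrapped in a plain function. The names `first_working`, `open_hangout`, and `open_backup_bridge` are hypothetical stand-ins, not part of any real chatops tool:

```python
from typing import Callable, Sequence, Tuple

def first_working(options: Sequence[Tuple[str, Callable[[], str]]]) -> Tuple[str, str]:
    """Try each (name, action) in order; return the first that succeeds."""
    errors = []
    for name, action in options:
        try:
            return name, action()
        except Exception as exc:  # any failure means "try the next option"
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all options failed: " + "; ".join(errors))

# Hypothetical stand-ins for real integrations.
def open_hangout() -> str:
    raise ConnectionError("Google login unavailable")

def open_backup_bridge() -> str:
    return "dial-in bridge opened"

used, result = first_working([
    ("hangout", open_hangout),
    ("backup bridge", open_backup_bridge),
])
print(used, "->", result)
```

The fallback only helps if it shares no dependency with the primary; a backup that also signs in through Google would fail at the same moment.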

~~~
shhehebehdh
Good advice. Within Google, they have backups that depend on no production
infrastructure. Something as simple as an IRC server that is always running on
prem can be invaluable.

------
ereyes01
For me the problem is Google-wide: I got kicked out of my Gmail accounts and
can't sign in again. I get an error telling me to clear cookies and cache
(which doesn't help).

Lots of people having the same issue:
[https://support.google.com/chrome/thread/4399961?hl=en](https://support.google.com/chrome/thread/4399961?hl=en)

~~~
jwilliams
In fact it makes it just slightly worse, as deleting cookies potentially logs
the user out of other services (some of which require Google).

~~~
raxxorrax
Google should host backup services in AWS. I like Cognito very much, even if
it still lacks some features.

------
sethvargo
Hey everyone - Seth from Google here. We are aware of this issue and our team
is working on it.

~~~
chris_wot
Your dashboard that shows outages doesn't seem to show OAuth. Has caused us
some issues working out what's going on in our organization!

~~~
sethvargo
You can see our outages here:
[https://status.cloud.google.com/](https://status.cloud.google.com/). OAuth is
included under "Identity & Security":
[https://status.cloud.google.com/incident/cloud-iam/19001](https://status.cloud.google.com/incident/cloud-iam/19001)

------
shereadsthenews
This kind of thing has the tendency to paralyze my organization. And these
outages happen with surprising frequency. Isn't this terribly disruptive to
lots of people?

~~~
toomuchtodo
Kinda dispels the myth that Google is better at uptime than your own org. Sure
they have SREs, sure they have global infrastructure, but they still have
outages like everyone else just the same.

~~~
dekhn
ex-SRE here. Of course Google has outages, but it would be crazy to think that
nearly any smaller outfit could manage the level of service $GOOG provides.

~~~
toomuchtodo
The gap between 3 9s and 5 9s isn't that wide. Gmail itself has already had
~4-5 hours of downtime for the year [1] [2]. A couple more hours and they'll
hit 3 9s (99.9%) for 2019.
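For reference, the arithmetic behind the "nines" can be sketched as below. This assumes availability is measured over a 365-day calendar year and counts only whole-service downtime; the function names are made up for illustration. Under those assumptions three nines allows roughly 8.76 hours of downtime per year.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years (assumption)

def downtime_budget_hours(availability_percent: float) -> float:
    """Hours of downtime per year allowed at a given availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)

def availability_percent(downtime_hours: float) -> float:
    """Availability percentage implied by a given number of downtime hours per year."""
    return 100 * (1 - downtime_hours / HOURS_PER_YEAR)

for target in (99.9, 99.99, 99.999):
    print(f"{target}% allows {downtime_budget_hours(target):.2f} hours/year")

# The ~4-5 hours of downtime mentioned above, taken as 4.5 hours.
print(f"4.5 hours down is {availability_percent(4.5):.3f}% availability")
```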

A small org can do about the same, and they're not paying what Google pays for
infra and staff. Does Google have it harder at their scale? Of course, but
they also argue they're better than most at it, at scale.

[1]
[https://downdetector.com/status/gmail/archive](https://downdetector.com/status/gmail/archive)

[2]
[https://www.theguardian.com/technology/2019/mar/13/googles-g...](https://www.theguardian.com/technology/2019/mar/13/googles-gmail-and-drive-suffer-global-outages)
(Google's Gmail and Drive suffer global
outages; Users in Australia, the US, Europe and Asia report problems with
various applications for several hours)

~~~
dekhn
How could a smaller org run a million+ QPS service with petabytes of storage
and have the same uptime as Google? That doesn't even make sense.

~~~
kerng
Commenter is highlighting that a small org doesn't need that scale, so it's
easier, maybe also cheaper, to run yourself.

Also worth highlighting that for the "email user experience" pretty much any
system is better than Gmail these days.

~~~
dekhn
Personally, I (and I think many others) consider "availability" to be
uptime * bytes, not just uptime. Any true distributed system is going to have
partial availability all the time (small ranges of data which cannot be
fetched within a reasonable latency limit, or somebody's ISP being down).
Google does not count tiny ranges of data being unavailable for limited times
as a real service outage, while the service being hard down for 25+% of users
does count as one.

It doesn't even really make sense to compare. We're dealing with cloud
services where the large players, with billions of users, increasingly
provide these services to everybody from small shops (I work for a startup
and we use G Suite) to enterprises (many large companies host all their email
on one of 2-3 major email providers).

Nothing I said above has anything to do with user experience as you describe
it; that's a non sequitur.

~~~
jjeaff
It absolutely makes sense to compare. Because the comparison is generally made
in the context of using Google services or hosting your own or going with a
much smaller provider. One could easily host their own email or other similar
service for a very small org and likely have better uptime.

------
zelon88
I couldn't even get to google[.]com for about 5 minutes yesterday.

