Ask your friend to write a post about what happened. I can imagine the heat in the online conference: it's one thing to have all the nodes of one service go down; it's another to have all the servers go down at the same time.
Clearly they're using the same infrastructure for everything which IMHO is a huge mistake.
Let's not forget that most Google services have exceptional reliability and things like today's outage are incredibly rare. I'd bet that failures due to interoperability between disparate services would cause a lot more downtime than Google suffers right now if teams ran different infrastructures.
If they had multiple instances of infrastructure, then yes, the outage wouldn't have been prevented, but at least it would have been minimized, as only a few services would be down instead of the whole thing.
It's easy to prove: when Google was down, except for those that depend directly on it, how many other services on the internet went down? None. If Google were truly distributed internally like the internet is, then they wouldn't have this problem. This is clearly a symptom of a poorly thought-out infrastructure. I know folks who work there, and if you think Google's services architecture is great then you'd better think twice. There are many smaller businesses with much better availability than Google.
Be aware that Search wasn't down, so it clearly doesn't share whatever all the other services have in common.