As the updates to  say, we're working to resolve a networking issue. The Region isn't (and wasn't) "down", but obviously network latency spiking up for external connectivity is bad.
We are currently experiencing an issue with a subset of the fiber paths that supply the region. We're working on getting that restored. In the meantime, we've moved almost all Google.com traffic out of the Region to prefer GCP customers. That's why the latency increase is subsiding, as we're freeing up the fiber paths by shedding our traffic.
Edit: (since it came up) that also means that if you’re using GCLB and have other healthy Regions, it will rebalance to avoid this congestion/slowdown automatically. That seemed the better trade-off given the reduced network capacity during this outage.
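As a rough illustration of what "rebalance to avoid this congestion" could mean, here is a toy sketch of latency-aware region selection. It is not how GCLB is implemented; the `Region` type, the `pick_region` function, the region names, and the 200 ms threshold are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Region:
    name: str          # hypothetical region label
    healthy: bool      # whether backends in the region pass health checks
    latency_ms: float  # recently observed client-facing latency


def pick_region(regions: List[Region], max_latency_ms: float = 200.0) -> Optional[Region]:
    """Prefer healthy regions under the latency threshold.

    Falls back to the least-slow healthy region if every healthy
    region is congested, and returns None if none are healthy.
    """
    healthy = [r for r in regions if r.healthy]
    if not healthy:
        return None
    uncongested = [r for r in healthy if r.latency_ms <= max_latency_ms]
    candidates = uncongested or healthy
    return min(candidates, key=lambda r: r.latency_ms)


# Assumed numbers: region-a is up but congested, the others are fine.
regions = [
    Region("region-a", healthy=True, latency_ms=850.0),
    Region("region-b", healthy=True, latency_ms=40.0),
    Region("region-c", healthy=True, latency_ms=70.0),
]
chosen = pick_region(regions)
print(chosen.name if chosen else "no healthy region")
```

Note that the congested region is never marked unhealthy here; it simply loses out to regions with better observed latency, which matches the point that the Region was slow rather than "down".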
As one of my old bosses said: I don't care that the site/service is technically running, if the customers can't reach it, then IT'S DOWN.
My customers don't care that the network is down, the servers are down, or aliens have landed. The severity is the same and our infrastructure, regardless of the cause, was down.
During the impacted time period, we did a full DR failover to appengine instances we spun up in west2. This was not a minor hiccup.
But the people who have to fix it desperately care about which specific part is down. That's just about the highest-priority information they need. Homing in on where the problem lies is one of the few ways to get to fixing it. Having a boss shout that "everything is down, it's all broken" is the opposite of identifying the problem.
I find the idea it was "a ridiculous time to nitpick" hilarious.
What? You lost critical business functionality for 5 hours, and you'd rather the boss was shouting at the workers because the wording used doesn't accurately reflect the boss's understanding, instead of the workers working on solving the problem?
"OK, we have databases up, load balancers responding, DNS records check out, last change/deployment was at this time, all these services are up, and the latest test suite is running all green, this narrows down the places where a failure might be with some useful differential diagnosis, now we can move attention to.."
"I DON'T CARE THAT YOU THINK THINGS ARE WORKING, IF THE CUSTOMER CANNOT GET TO IT, IT'S DOWN"
"Thanks for that helpful input, let's divert troubleshooting attention from this P1 incident, and have a discussion about what "DOWN" means. You want me to treat the working databases as down because the customer can't get to them? Even though they're working?"
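The differential diagnosis in that exchange can be sketched as a list of independent checks, where every check that passes narrows down where the failure might be. This is a minimal sketch: the `narrow_down` helper is hypothetical, and the lambdas are stand-ins for real probes, with results made up to mirror the scenario above.

```python
from typing import Callable, List, Tuple

Check = Tuple[str, Callable[[], bool]]


def narrow_down(checks: List[Check]) -> Tuple[List[str], List[str]]:
    """Run every check and split component names into (passing, failing).

    A probe that raises is counted as failing, so one broken probe
    does not stop the rest of the diagnosis.
    """
    passing: List[str] = []
    failing: List[str] = []
    for name, probe in checks:
        try:
            ok = probe()
        except Exception:
            ok = False
        (passing if ok else failing).append(name)
    return passing, failing


# Stand-in probes; real ones would query the actual systems.
checks: List[Check] = [
    ("database", lambda: True),
    ("load balancer", lambda: True),
    ("DNS records", lambda: True),
    ("test suite", lambda: True),
    ("external network path", lambda: False),
]

passing, failing = narrow_down(checks)
print("ruled out:", passing)
print("look here:", failing)
```

Knowing that four components are up is not a claim that the service is fine; it is what lets attention move to the one remaining suspect.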
It's like the hatred for "works on my machine". "WELL I'M NOT RUNNING ON YOUR MACHINE". No you aren't, but this demonstrates the current build works, the commands you're using are coherent and sensible, excludes many possible causes of failure, and adds useful information to the situation.
For troubleshooting and internal use of course you want to describe the outage in precise terms (while being very sure you are not downplaying the impact).
For talking to customers, a sufficiently slow response is the same as no response, and nothing is more irritating than being told 'it's not really down' when they can't use the service.
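Both views can live in one probe result: a customer-facing status that treats "too slow" as "down", plus a precise reason for the people doing the troubleshooting. A minimal sketch, assuming a hypothetical `customer_status` helper and an arbitrary 2-second latency budget:

```python
from typing import Tuple


def customer_status(responded: bool, latency_s: float, budget_s: float = 2.0) -> Tuple[str, str]:
    """Return (customer-facing status, internal reason) for one probe.

    A response slower than the budget counts as "down" for customers,
    while the reason keeps the precise cause for internal use.
    """
    if not responded:
        return ("down", "no response")
    if latency_s > budget_s:
        return ("down", f"responded in {latency_s:.1f}s, over the {budget_s:.1f}s budget")
    return ("up", "ok")


print(customer_status(responded=True, latency_s=8.0))
print(customer_status(responded=False, latency_s=0.0))
print(customer_status(responded=True, latency_s=0.3))
```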
In my case, Cloud PR knows me, but I also knowingly risk my job (I clearly believe I have good enough judgment in what I post). If Urs and Ben think I should be fired, I'm okay with that, as it would represent a significant enough difference in opinion, that I wouldn't want to continue working here anyway.
Finally, for what it's worth, I have been reported before for "leaking internal secrets" here on HN! It turned out to be a totally hilarious discussion with the person tasked with questioning me. Still not fired, gotta try harder :).
Whenever I talk about the inner workings of Google I try to refer to external talks, books, or white papers to go along with my comments. Luckily a lot has already been said externally about how Google works.
If you haven't read them, you have to!
I would love to understand the thought process of someone going out of their way to remove someone’s livelihood from them because of a comment on HN (when applied in a normal circumstance of adding additional information or correcting a misconception; I’m clearly not saying that bonehead comments shouldn’t have consequences).
Maybe the person making the report said "Hey, I found some internal details on this external site. I'm not sure if this is allowed. Maybe someone who knows more should take a look at it, here's the link to the page."
Submitting a complaint to an internal review because “you’re not sure it’s allowed” is really petty.
In my opinion, and experience, folks who have good intentions usually pull you to the side to get a feel for a situation before filing a formal complaint.
This is not so difficult though. You just need to first adjust your starting point to someone who doesn't like boulos. That's not so difficult IMO; it's a large org and boulos seems to be a fairly prolific commenter here.
He certainly shares stuff I wouldn't be comfortable sharing, but then again he's a lot better connected and in the know than I am.
On the other hand, to anonymously submit a complaint feels, to me, like a personal attack from someone who simply doesn’t like them for whatever reason. To me, that action seems petty.
One of the things I really like about working at Google is that they place a lot of trust in the judgement of the individual employees. I generally make it clear when I'm stating my personal opinion versus the "official" (for whatever that means given how informal the project is) one, but I don't have to carefully go through an approved list of talking points, run my HN comments by the legal department, etc.
Obviously, in certain situations, things get more official and formal. For example, when I went to Google IO to give a talk, we did have some documentation and coaching beforehand about how to handle various questions we might get about non-public stuff, other projects related to ours, etc. We are also expected to run any slides by legal before being publicly shown in a venue with a wide audience like IO. But, even then, the legal folks I've worked with have been a pleasure to talk to.
The company's culture is basically "We hired you because you're smart. We trust you to use your brain." It would be squandering resources to not let their employees use their own intelligence and judgement.
I work at another FANG with a roughly equal engineering community and I don’t see my kind commenting as much at all!
It's probably okay to say that we know the problem and here are the steps we're taking to mitigate it. It would not be okay to say something with large-scale stock price implications for Google or another publicly traded corporation. For instance, a Google employee shouldn't say something like "faulty solar panels fried Google's 10 largest data centers and twelve others have been lost to rebel drone strikes", even if false, since it could have a drastic impact on the earnings and future value of Google, Google's customers, and Google's competitors.
Even less obvious things like Google's plans for adding privacy features to the Chromium open source project can have a serious impact (see https://www.barrons.com/articles/google-chrome-privacy-quest...).