Hacker News | palcu's comments

And the queries flow through.


My adjacent teams in London who work in SRE on Google Cloud (GCE) got some well deserved doughnuts today for rolling out the patches on time.


Yeah yeah, they've broken the old TweetDeck. You need to wait for the pop-up to ask you to transition to the new TweetDeck. Or search the internet for the JavaScript variable you have to change in your console.

The more important issue is that they've removed the Activity feed, where you could see likes from other people. It was like a realtime feed of what your friends were doing on the website. The website is way more boring now.


[disclaimer: SRE @ Google, I was involved with the incident, obvious conflicts of interest]

Hey Dang, thanks for cleaning up the thread. One thing to note is that the title is not correct. The entire region is not currently down, as the regional impact was mitigated as of 06:39 PDT, per the support dashboard (though I think it was earlier). The impact is currently zonal (europe-west9-a), so having zone in the title as opposed to region would reflect reality more closely.

Finally, there's lots of good feedback on this thread and on the previous one (https://news.ycombinator.com/item?id=35711349), so we obviously have a lot of lessons to learn.


Would you be able to comment a bit on the emotional (perhaps there’s a better word) aspect of the response?

Was there a lot of anxiety? Panic? Or was it just a “woof that sucks. Time to follow a checklist and then do a bunch of paper work” ?

What I’m curious about is what it feels like on a team at a company like Google when there is a major system failure.


There's not much emotion as the core team working on the huge outages is more like an "SRE for SRE". They are all people who've been with the company for a long time and they've been in the secondary seat for at least one previous big rodeo. Not to mention that we're all running a checklist that has been exercised multiple times and there's always somebody on the call who could help if a step fails.

Personally, I wasn't part of the actual mitigation of the overall Paris DC recovery this time, as I was busy with an unfortunate[0] side effect of the outage. Those generate more anxiety, as being woken up at 6am and being told that nobody understands exactly why the system is acting this way is not great. But then again, we're trained for this situation and there are always at least several ways of fixing the issue.

Finally, it's worth repeating that incident management is just a part of the SRE job and after several years I've understood that it is not the most important one. The best SREs I know are not great when it comes to a huge incident. But their work has avoided the other 99 outages that could have appeared on the front page of Hacker News.

[0]: https://news.ycombinator.com/item?id=35734224


I appreciate your insight into this. Thanks!


Who trains the trainers?


Life and experience, if you're looking for a short answer. For example, last year we had an outage in London[0] and the folks who worked on it learnt a lot. Now they've applied those learnings in this incident.

[0]: https://news.ycombinator.com/item?id=32161755


I absolutely love the answer to any of the other obvious questions that one would have about the membership.

https://www.bitsaboutmoney.com/memberships


Not ironic, just nominative determinism.


But it's more fun to say irony when the topic is iron



Sure, but ironically sounds a whole lot better than nominative deterministically.


Hijacking the article, but has somebody managed to find a good iPhone/iPad keyboard for coding? I'm still able to do Python with the default keyboard, but I have to press the symbol key a lot.

The Wolfram Alpha custom keyboard on Android is absolutely the best keyboard for coding.


FWIW I travel with a Keychron K3


I tried getting into the Crowdcube investment round, but after they got the allocations, the whole deal fell through and I was refunded.

Needless to say I’m pretty happy they didn’t take my money in the end.


Rest of World has a really good article about people from emerging markets who chose Luna because their home currency is unstable.

https://restofworld.org/2022/argentina-nigeria-terra-crash/


That's just a fancy way of saying "dumb people got scammed"


Took it to the office today and I must say, this is a beautiful piece of engineering. Cheers to all the people that made it possible and here's to hundreds of years of continuous service.

