- HN errored on all authenticated requests with 502 Bad Gateway. It did still respond to a limited amount of unauthenticated requests with presumably cached pages, which did not get updated. The last post on /newest claimed "0 minutes ago", but was actually much older (1:32:57 PM GMT) and not the newest post.
- This status page actually identified the outage: https://hackernews.onlineornot.com/ - Pages by Hund and Statuspal did not show the outage.
- The last post before the outage was https://news.ycombinator.com/item?id=46301823 (1:39:59 PM GMT). The last comment was https://news.ycombinator.com/item?id=46301848 (1:41:54 PM GMT).
- There was an average of ~4 seconds per comment just prior to the outage. Based on this, HN likely went down at 1:41:58 PM GMT.
(The reason I did that is that the anti-crawler protections also unfortunately hit some legit users, and we don't want to block legit users. However, it seems that I turned the knobs down too far.)
In this case, though, we had a secondary failure: PagerDuty woke me up at 5:24am, I checked HN and it seemed fine, so I told PagerDuty the problem was resolved. But the problem wasn't resolved - at that point I was just sleeping through it.
I'll add more as we find out more, but it probably won't be till later this afternoon PST.
reply