Though the article is correct, with everything else that was going on, response codes and cache headers were the least of my worries.
I think the best takeaway is that you will go down at some point, so it's best to have a reasoned plan in place for when you do. Handling it in the heat of the moment means you'll miss things.
I didn't post to try to make HN look bad; using the wrong HTTP status code isn't a huge deal or anything. I just wanted to take the opportunity to discuss HTTP response codes, an issue near to my heart. (In my day job at an academic library, the fact that most of the vendors we deal with deliver error pages with 200s does interfere with things we'd like to do better.)
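For anyone wondering what the alternative to a 200 error page looks like, here is a minimal sketch using only the Python standard library. It is not how HN or any vendor actually serves pages; the handler name, the `make_server` helper, the one-hour `Retry-After` value, and the page body are all made up for illustration. The idea is just that a maintenance page goes out with a 503, a hint about when to retry, and a header telling caches not to keep it.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical maintenance page body, for illustration only.
MAINTENANCE_BODY = b"<html><body><h1>Down for maintenance</h1></body></html>"


class MaintenanceHandler(BaseHTTPRequestHandler):
    """Answers every GET/HEAD with 503 instead of a 200 error page."""

    def _send_unavailable(self, include_body=True):
        self.send_response(503)                        # Service Unavailable
        self.send_header("Retry-After", "3600")        # assumed value: retry in an hour
        self.send_header("Cache-Control", "no-store")  # keep caches from storing the page
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(MAINTENANCE_BODY)))
        self.end_headers()
        if include_body:
            self.wfile.write(MAINTENANCE_BODY)

    def do_GET(self):
        self._send_unavailable()

    def do_HEAD(self):
        self._send_unavailable(include_body=False)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this sketch


def make_server(port=0):
    # Port 0 asks the OS for any free port; pass a real port to serve for real.
    return HTTPServer(("127.0.0.1", port), MaintenanceHandler)


if __name__ == "__main__":
    make_server(8080).serve_forever()
```

A crawler or link checker hitting this sees the 503 and knows the page it got is not the real content, which is exactly the signal a 200 error page throws away.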
Thanks for the reply!
edit: Ack, I just realized that item_id got reset back to 7015126 on the reboot. My data matches HN up to 7015125, and then diverges after that.
After a semi-random sampling, the comments file appears to contain nothing but comments pre-crash.
The submissions, however, got clobbered a little by the crawler at some point. There are some pre-crash submissions in there and some post-crash ones; I think everything's OK from 7015172 on, which only leaves 15 possibly damaged rows, and of those, I'd expect most didn't have id collisions. Sorting out the old stuff from the new stuff could be done manually.
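If anyone wants to pull out the suspect rows before checking them by hand, something like the sketch below would do it. It assumes each row has been loaded as a dict with an integer `id` field, which is a guess about the file layout, and it reads the numbers above as meaning that ids from 7015126 through 7015171 are the window where old and new items could collide. The function and bucket names are made up.

```python
# Boundaries taken from the ids mentioned above; the interpretation of the
# window between them as "needs checking" is an assumption.
LAST_MATCHING_ID = 7015125   # data matches HN up to and including this id
FIRST_OK_ID = 7015172        # believed OK from this id on


def classify(item_id):
    """Return which bucket an item id falls into."""
    if item_id <= LAST_MATCHING_ID:
        return "matches"
    if item_id >= FIRST_OK_ID:
        return "ok"
    return "check"  # inside the window where ids may have been reused


def split_rows(rows):
    """Group rows (dicts with an integer 'id') into the three buckets."""
    buckets = {"matches": [], "check": [], "ok": []}
    for row in rows:
        buckets[classify(row["id"])].append(row)
    return buckets
```

Only the rows that land in `check` would need a manual look against the live site.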
(Please let me know if there's anything I should be concerned about in those, or if they shouldn't be posted for some reason, or something. I'm recovering from flu and am still not entirely all here.)