As much as it sucks when GitHub goes down, there's a sense in which I look forward to it happening. It's yet another chance for me to get a peek at their architecture. As a high school student, I find it pretty disappointing how hard it is to find fully laid-out architectural plans for large websites, so I appreciate sites like GitHub and Ravelry being open with that information.
Except this has been going on for months, has happened a number of times, and occurs in violation of standards like robots.txt that exist precisely to prevent it. I'd expect that from an individual or a startup, but not from a large company that's been doing spidering for a long time.
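For context, a compliant crawler is expected to consult robots.txt before fetching anything. Here's a minimal sketch using Python's standard urllib.robotparser; the URL, path, and bot name are just placeholders, not any real crawler:

    from urllib.robotparser import RobotFileParser

    # A well-behaved crawler reads robots.txt first and honors its rules.
    rp = RobotFileParser()
    rp.set_url("https://github.com/robots.txt")
    rp.read()

    # Placeholder bot name and path, purely for illustration.
    if rp.can_fetch("ExampleBot/1.0", "https://github.com/some/repository"):
        print("robots.txt allows this fetch")
    else:
        print("robots.txt disallows this fetch; a compliant crawler stops here")

The standard is purely voluntary, though, which is the whole problem: nothing enforces it on a crawler that chooses to ignore it.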
If I were GitHub, I think I would have waited to mention the cause of outage number 3 until after the rate-limiting fixes were in place, which is not the impression I got from the article. It's not hard to write a basic HTTP request spoofer, and it sounds like they're still vulnerable.
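To illustrate that point: if the rate limiting keys on something client-controlled like the User-Agent header, spoofing it is a few lines of code. A minimal sketch with Python's urllib (the target URL and header value are made up for illustration, not anything GitHub actually checks):

    import urllib.request

    # The User-Agent header is entirely client-controlled, so any scraper
    # can claim to be an ordinary browser with a single header override.
    url = "https://github.com/"  # placeholder target
    req = urllib.request.Request(
        url,
        headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"},
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.status, resp.headers.get("Server"))

Which is why rate limiting generally has to key on something harder to fake, like source IP ranges or authenticated identity, rather than headers alone.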