On the whole, GitHub is doing an awesome job. While this may be embarrassing for the team responsible for the feature, I found the writeup to be honest and thorough. I've made plenty of mistakes in my career, and for me, being honest about mistakes goes a lot further than the week or so of downtime.
> We did not sufficiently test the 0.20.2 release of elasticsearch on our infrastructure prior to rolling this upgrade out to our code search cluster, nor had we tested it on any other clusters beforehand.
This doesn't pass the smell test for me. Not that I know tons about ElasticSearch, but couldn't disk space have been consumed by failed replication attempts?
Generally it takes a catastrophic failure under load for you to discover that 'everyone' (everyone else) uses these!