So why spend so much time trying to shift blame to the vendor? They could've just started the article with something like:
> Due to circumstances beyond our control the DC lost all power. We are still working with our vendors to investigate the cause. While such a failure should not have been possible, our systems are supposed to tolerate a complete loss of a DC.
Because a small handful of decisions probably led to the Clickhouse and Kafka services still being non-redundant at the datacenter level, which added up to one mistake. But a small handful of mistakes were made by the vendor. Calling out each one of them was bound to take up more page space.
The ordering that they list the mistakes would be a fair point to make though, in my opinion. They hinted at a mistake they made in their summary, but don't actually tell us point blank what it was until they tell us all the mistakes that their vendor made. I'd argue that was either done to make us feel some empathy for Cloudflare as being victims of the vendor's mistakes, misleading us somewhat. Or it was done that way because it was genuinely embarrassing for the author to write and subconsciously they want us to feel some empathy for them anyway. Or some combination of the two. Either way, I'll grant that I would have preferred to hear what went wrong internally before hearing what went wrong externally.
> Due to circumstances beyond our control the DC lost all power. We are still working with our vendors to investigate the cause. While such a failure should not have been possible, our systems are supposed to tolerate a complete loss of a DC.