As the owner of the linked GitHub repo (also rendered at https://k8s.af, thanks to Joe Beda), I highly encourage everyone to contribute their failure stories (I'm still looking for the first production service mesh failure story).
Also be aware of availability bias: Kubernetes enables us to collect failure stories in a (more or less) consistent way, which was not easily possible before (think of on-premise failures, other fragmented orchestration frameworks, etc.). I'm pretty sure there are many more failure stories in total about other things (like enterprise software), but we will never hear about them because they are buried inside orgs.
BTW, I also have a small post on why I think Kubernetes is more than just a "complex scheduler": https://srcco.de/posts/why-kubernetes.html