We had been running another large-footprint container management system (not K8s, but also popular), and when its DNS component started eating all the CPU on every node, the best I could do quickly was to scrap the whole thing and replace it with some quick-and-dirty Compose files and manual networking. At least we were back to normal in an hour or so. The obvious steps (recreating nodes) failed, the logs looked perfectly normal, a quick strace/ltrace gave no insight, and debugging the problem in detail would have taken more time.
But that was only possible because all we ran was a small 2.5-node system, not even properly HA or anything. And it resembled Compose closely enough.
Since then I've been really wary of using large black boxes for critical parts. The Linux kernel and Docker alone can bring enough headaches, and K8s on top of that looks terrifying. Simplicity has value. GitHub can afford to deal with a lot of complexity, but a tiny startup probably can't.
Or am I just unnecessarily scaring myself?
It's a great system, but it's also relatively new, and most issues aren't well documented. You'll spend a lot of time in GitHub issues or asking for help in the (very active, and often very helpful) community.
If you have a valid use case, I wouldn't steer you away from it, but your fears are well founded.