Pods losing connection to each other is still very common in Azure.
To see the positive side of it, it’s a Chaos Monkey test for free. Everything you deploy must be hardened with reconnections, something you should be doing anyway.
What’s frustrating is that it happens rarely enough to give the illusion of being stable, making it easy for PM to postpone hardening work, yet often enough to put a noticeable dent in your uptime if you don’t address it. Perfect degree of gaslighting.
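For anyone wondering what "hardened with reconnections" looks like in practice, here's a minimal sketch, assuming your client exposes some callable that opens the connection or makes the call and raises `ConnectionError` or `TimeoutError` when the network drops. The name `call_with_reconnect` and all the defaults are made up for illustration, not taken from any Azure or Kubernetes library.

```python
import random
import time


def call_with_reconnect(connect, max_attempts=5, base_delay=0.5,
                        max_delay=30.0, sleep=time.sleep):
    """Call connect() until it succeeds, backing off between failures.

    `connect` is a hypothetical zero-argument callable standing in for
    whatever opens your connection or performs the request.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts:
                # Out of attempts: surface the last error to the caller.
                raise
            # Exponential backoff capped at max_delay, with full jitter so
            # many pods do not all retry at the same instant.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
```

You'd wrap the flaky call, e.g. `call_with_reconnect(lambda: open_db_connection())`, where `open_db_connection` is whatever your own code uses. Tune the attempt count and delays to how long your service can tolerate being disconnected.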
> To see the positive side of it, it’s a Chaos Monkey test for free. Everything you deploy must be hardened with reconnections, something you should be doing anyway.
Keep in mind that in Azure it's a must-have, whereas everywhere else it's either a nice-to-have or a sign your system is broken.
Having worked on the infra side of Azure, I'm not surprised. Networking is centrally managed and that team was a nightmare to deal with. Their ticket queue was so bad they only worked on sev 1 and the occasional 0. Nothing else got touched without talking to a VP, and even then it often didn't change things.