Where have you run into this?
I don't want to have to restart the whole thing on each site every time it happens. I'd like a deployment/orchestration system that can work in such scenarios, showing a node as unreachable but then back online when it gets network back.
Isn't that exactly what happens with K8s worker nodes? They show as "NotReady" while they are disconnected and go back to "Ready" once connectivity is restored.
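To illustrate the idea, here is a rough Python sketch of heartbeat-based readiness. It is a toy model, not Kubernetes code: the `Node` class, the `GRACE_PERIOD_S` constant, and the 40-second value are all assumptions made up for this example. The control plane's real logic is more involved.

```python
from dataclasses import dataclass

# Assumed grace period for this sketch only; not read from any real cluster.
GRACE_PERIOD_S = 40.0


@dataclass
class Node:
    """Hypothetical stand-in for a worker node tracked by a control plane."""
    name: str
    last_heartbeat: float  # time of the last heartbeat received, in seconds

    def heartbeat(self, now: float) -> None:
        # Record that the node reported in at time `now`.
        self.last_heartbeat = now

    def status(self, now: float) -> str:
        # A node counts as ready only if it reported in recently enough.
        if now - self.last_heartbeat <= GRACE_PERIOD_S:
            return "Ready"
        return "NotReady"


edge = Node("edge-1", last_heartbeat=0.0)
print(edge.status(now=10.0))  # heartbeat is 10 s old: Ready
print(edge.status(now=60.0))  # link down, heartbeat is 60 s old: NotReady
edge.heartbeat(now=65.0)      # link restored, node reports in again
print(edge.status(now=70.0))  # heartbeat is 5 s old: Ready
```

The point is only that the node's state is derived from how recently it reported in, so nothing has to be restarted for it to show up again.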
EDIT: Just saw that the intention is to have a single K8s cluster spanning both locations, with some nodes in a DC, some nodes at the edge, and an unreliable network in between. No idea how badly the cluster would react to this.