What has changed to improve k8s for applications where workers at the edge have poor connectivity to the control plane in the cloud?



Other than fixes for bugs where bad connections caused hangs (the kubelet is now less vulnerable to pathological networking failures that stall it), nothing significant.

Kube is designed for nodes to have continuous connectivity to the control plane. If connectivity is disrupted and the machine restarts, none of the workloads will be restarted until connectivity is restored.

I.e., if network disruption can last up to 10m, then at worst a restart/reboot will take 10m to restore the apps on that node.
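
One partial escape hatch is static pods: the kubelet reads them from a local manifest directory and manages them itself, without the API server, so they do come back after a reboot even while disconnected. A minimal sketch, assuming the default staticPodPath of /etc/kubernetes/manifests (the image name is a placeholder):

    # /etc/kubernetes/manifests/edge-agent.yaml -- the kubelet watches this
    # directory (staticPodPath) and manages the pod directly from disk, so it
    # restarts after a node reboot with no control plane connectivity.
    apiVersion: v1
    kind: Pod
    metadata:
      name: edge-agent
      namespace: kube-system
    spec:
      restartPolicy: Always
      containers:
      - name: agent
        image: registry.example.com/edge-agent:1.0   # placeholder image
        resources:
          requests:
            cpu: 100m
            memory: 64Mi

The tradeoff is that static pods can't reference API objects like ConfigMaps, Secrets, or service accounts, so this only covers fairly self-contained workloads.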

Many other components (networking, storage, per-node workers) will likely also have issues if they aren't tested in those scenarios (I've seen some networking plugins hang or otherwise fail).

That said, there are lots of people successfully running clusters like this as long as worst-case network disruption is bounded, and it's a solvable problem for many of them.

I doubt we'll see a significant investment in local resilience in the kubelet from the core project (because it's a lot of work), but I do think it might eventually get addressed in the community (lots of OpenShift customers have asked for that behavior). The easier path today is to run single-node clusters at the edge, but then you have to invent new distribution and rollout models on top (i.e. gitops / custom controllers, as sketched below) instead of being able to reuse daemonsets.
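
To make the gitops fan-out concrete, here's a minimal sketch assuming Argo CD's ApplicationSet with its cluster generator (repo URL and path are placeholders): each edge cluster registered with Argo CD gets its own Application synced from the same manifests, standing in for what one daemonset would have given you on a single big cluster:

    # Hypothetical fan-out: one ApplicationSet stamps out an Argo CD
    # Application per registered cluster.
    apiVersion: argoproj.io/v1alpha1
    kind: ApplicationSet
    metadata:
      name: edge-agent
      namespace: argocd
    spec:
      generators:
      - clusters: {}        # one entry per cluster registered with Argo CD
      template:
        metadata:
          name: 'edge-agent-{{name}}'
        spec:
          project: default
          source:
            repoURL: https://git.example.com/platform/edge-agent.git  # placeholder
            targetRevision: main
            path: manifests
          destination:
            server: '{{server}}'
            namespace: edge-agent
          syncPolicy:
            automated: {}

The downside is exactly the one described above: rollout status, maxUnavailable-style pacing, and so on have to be reinvented on top, since each cluster syncs independently.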

We are experimenting in various ecosystem projects with patterns that would let you map a daemonset on one cluster to smaller, distributed daemonsets on other clusters (which gets you local resilience).


Your reply provides a lot of context. Thanks!



