
I thought Kubernetes is not great for environments with poor network connectivity, which is quite common when dealing with Edge and IoT scenarios. Has that changed?



Yes, it's changed massively, but it's recent - only since 2019.

I am the creator of the k3sup (k3s installer) tool that was mentioned and have a fair amount of experience with K3s on Raspberry Pi too.

You might also like my video "Exploring K3s with K3sup" - https://www.youtube.com/watch?v=_1kEF-Jd9pw

https://k3sup.dev/


What has changed to improve k8s for applications with poor connectivity between workers on the edge and the control plane in the cloud?


Other than fixes for bugs where bad connections caused hangs (the kubelet is now less vulnerable to pathological networking failures causing it to stall), nothing significant.

Kube is designed for nodes to have continuous connectivity to the control plane. If connectivity is disrupted and the machine restarts, none of the workloads will be restarted until connectivity is restored.

I.e. if you can have up to 10m of network disruption, then at worst a restart / reboot will take 10m to restore the apps on that node.

Many other components (networking, storage, per-node workers) will likely also have issues if they aren't tested in those scenarios (I've seen some networking plugins hang or otherwise fail).

That said, there are lots of people successfully running clusters like this as long as worst case network disruption is bounded, and it’s a solvable problem for many of them.
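For concreteness, one knob commonly used on the eviction side (separate from the kubelet restart behaviour described above) is per-workload tolerations for the not-ready/unreachable taints, sized to the expected worst-case disruption. A minimal sketch only, assuming a ~10 minute worst case and hypothetical names/images:

    # Pod spec fragment (illustrative, not a recommendation).
    # The taint keys are the standard ones applied by the node lifecycle
    # controller; 600s is just an example matching a ~10m worst case.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: edge-app            # hypothetical name
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: edge-app
      template:
        metadata:
          labels:
            app: edge-app
        spec:
          containers:
          - name: app
            image: example.com/edge-app:latest   # placeholder image
          tolerations:
          # Keep the pod bound to its node for up to 10 minutes of
          # unreachability (the injected default is ~5 minutes), so a
          # short partition doesn't trigger rescheduling.
          - key: node.kubernetes.io/unreachable
            operator: Exists
            effect: NoExecute
            tolerationSeconds: 600
          - key: node.kubernetes.io/not-ready
            operator: Exists
            effect: NoExecute
            tolerationSeconds: 600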

I doubt we’ll see a significant investment in local resilience in the kubelet from the core project (because it’s a lot of work), but I do think it might eventually get addressed in the community (lots of OpenShift customers have asked for that behavior). The easier path today is to run single-node clusters at the edge, but then you have to invent new distribution and rollout models on top (i.e. gitops / custom controllers) instead of being able to reuse daemonsets.

We are experimenting in various ecosystem projects with patterns that would let you map a daemonset on one cluster to smaller / distributed daemonsets on other clusters (which gets you local resilience).


Your reply provides a lot of context. Thanks!


Do you mean connectivity to the outside, or inside the cluster? The examples of Kubernetes and similar systems I've seen in such scenarios usually had stable connectivity between nodes. E.g. an edge scenario would be one tiny, well-connected cluster per location, remote-controlled over the (bad) external link through the API.


I meant intra-cluster communication between nodes, when some nodes are on the Edge and some are inside the datacenter. The Edge may have a pretty good overall connection to the DC, but it has to cope with intermittent connectivity problems (like dropping packets for several minutes) without going crazy.


> I meant intra-cluster communication between nodes, when some nodes are on the Edge and some are inside the datacenter.

Don't do this. Have two K8s clusters. Even if the network were reliable, you might still have issues spanning the overlay network geographically.

If you _really_ need to manage them as a unit for whatever reason, federate them (keeping in mind that federation is still not GA). But keep each control plane local.

Then set up the data flows as if K8s wasn't in the picture at all.
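For illustration, with KubeFed (the federation v2 project, which is presumably what "federate them" means in practice today) you declare the workload once on a host cluster and place it onto member clusters by name, while each member keeps its own local control plane. This is only a rough sketch; the exact API group/version depends on the KubeFed release, and the cluster names and image are hypothetical:

    # KubeFed sketch (assumptions: API version, cluster names, image).
    apiVersion: types.kubefed.io/v1beta1
    kind: FederatedDeployment
    metadata:
      name: edge-app
      namespace: demo
    spec:
      template:
        # An ordinary Deployment body goes here; each member cluster runs
        # its own copy under its own local control plane.
        metadata:
          labels:
            app: edge-app
        spec:
          replicas: 1
          selector:
            matchLabels:
              app: edge-app
          template:
            metadata:
              labels:
                app: edge-app
            spec:
              containers:
              - name: app
                image: example.com/edge-app:latest   # placeholder
      placement:
        clusters:
        - name: dc
        - name: edge-site-1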


Yeah, spanning a cluster from DC to Edge is probably not a good idea, but also generally not what I've seen suggested.


Edge<->DC "dropping packets for several minutes"?

Where have you been suffering from this?


Real case: shops have "terminals" installed on site at various physical locations. Some of them have networking glitches 1-2 times a month on average, usually lasting a couple of minutes.

I don't want to have to restart the whole thing on each site every time it happens. I'd like a deployment/orchestration system that can work in such scenarios, showing a node as unreachable but then back online when it gets network back.


> showing a node as unreachable but then back online when it gets network back.

Isn't that exactly what happens with K8s worker nodes? They will show as "not ready" but will be back once connectivity is restored.

EDIT: Just saw that the intention is to have some nodes in a DC and some at the edge, with a single K8s cluster spanning both locations over an unreliable network in between. No idea how badly the cluster would react to this.


> I thought Kubernetes is not great for environments with poor network connectivity,

No, it's ok. What you don't want to have is:

* Poor connectivity between K8s masters and etcd. The backing store needs to be reliable or things don't work right. If it's an IoT scenario, it's possible you won't have multiple k8s master nodes anyway. If you can place etcd and the k8s master on the same machine, you are fine.

* A horrible connection between masters and workers. If connectivity gets disrupted for a long time and nodes start going NotReady, then, depending on how your cluster and workloads are configured, K8s may start shuffling things around to work around the (perceived) node failure (which is normally a very good thing). If this happens too often and for too long, it can be disruptive to your workloads. If it's sporadic, it can be a good thing to have K8s route around the failure.

So, if that is your scenario, then you will need to adjust. But keep in mind that no matter what you do, if the network is really bad, you will have to mitigate the effects regardless, Kubernetes or not. I can only really see a problem if a) the network is terrible and b) your workloads are mostly compute-bound and don't rely on the network (or they communicate in bursts). Otherwise, a network failure means you can't reach your applications anyway...
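To make the "depending on how your cluster is configured" part concrete: how quickly nodes are marked NotReady and how long pods stay bound to an unreachable node before eviction are both tunable. On a kubeadm-based cluster (an assumption; other distributions expose the same flags differently) that might look roughly like this, with made-up values:

    # Illustrative kubeadm ClusterConfiguration fragment. The flags are the
    # standard kube-controller-manager / kube-apiserver flags; the values
    # are examples, not recommendations.
    apiVersion: kubeadm.k8s.io/v1beta2
    kind: ClusterConfiguration
    controllerManager:
      extraArgs:
        # How long to wait without node heartbeats before marking the
        # node NotReady (default 40s).
        node-monitor-grace-period: "2m"
    apiServer:
      extraArgs:
        # Default tolerationSeconds injected into pods for the not-ready
        # and unreachable taints (default 300), i.e. how long pods stay
        # bound to a NotReady node before being evicted.
        default-not-ready-toleration-seconds: "600"
        default-unreachable-toleration-seconds: "600"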



