Kubernetes uses etcd's watch to track changes to the desired vs. actual state of the cluster, and etcd is pretty efficient at watching a large number of entries. I wonder how an out-of-process watcher that continuously queries the database (which I assume is what they're doing) will perform as the cluster size increases.
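For reference, this is roughly what the in-process version looks like with the official etcd Go client: one server-pushed stream per key prefix, no repeated queries (the endpoint and prefix below are made up):

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"}, // assumed local etcd
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// A single watch covers every key under the prefix; etcd pushes
	// events as they happen instead of the client polling for them.
	for resp := range cli.Watch(context.Background(), "/registry/pods/", clientv3.WithPrefix()) {
		for _, ev := range resp.Events {
			fmt.Printf("%s %q (mod revision %d)\n", ev.Type, ev.Kv.Key, ev.Kv.ModRevision)
		}
	}
}
```

A polling emulation of this has to re-run a query (or scan a change table) on some interval for every watched prefix, which is the part I'd expect to scale poorly.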
You can subscribe to Postgres' WAL to track changes without having to poll for them. But from a quick look, Kine (the etcd-to-RDBMS translation tool used in OP's article) doesn't use that.
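A minimal sketch of that approach, assuming a Postgres with wal_level=logical, a user allowed to create replication slots, and the built-in test_decoding output plugin (the connection string and slot name are made up):

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq" // Postgres driver
)

func main() {
	db, err := sql.Open("postgres", "postgres://user:pass@localhost/kine?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Create the slot once; Postgres then retains decoded WAL for us
	// until we consume it.
	if _, err := db.Exec(
		`SELECT pg_create_logical_replication_slot('kine_watch', 'test_decoding')`,
	); err != nil {
		log.Fatal(err)
	}

	// Fetch (and consume) every change committed since the last call.
	rows, err := db.Query(
		`SELECT lsn, xid, data FROM pg_logical_slot_get_changes('kine_watch', NULL, NULL)`,
	)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var lsn, xid, data string
		if err := rows.Scan(&lsn, &xid, &data); err != nil {
			log.Fatal(err)
		}
		fmt.Println(lsn, xid, data) // one line per INSERT/UPDATE/DELETE
	}
	if err := rows.Err(); err != nil {
		log.Fatal(err)
	}
}
```

(A real implementation would stream over a replication connection rather than calling pg_logical_slot_get_changes in a loop, but the SQL-function form is the easiest to show.)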
Interesting write-up! Would love to learn more about the ‘why’. The post describes one reason (that etcd has low usage), but that alone doesn't seem like a compelling enough reason to do this.
I don't know for a fact, but I would guess there are multiple reasons, all connected to how complex etcd is to operate.
1. Most people will need to run three etcd nodes (and the member count should never be even: quorum is a majority, floor(n/2)+1, so four nodes tolerate no more failures than three), and the nodes have to be set up to replicate data, etc.
2. IIRC, recovering etcd from a total failure of the cluster (e.g. if you shut down all nodes) is not fun. It's generally a manual process that pretty much amounts to restoring from a backup with etcdctl snapshot restore.
3. A lot of people know how to manage Postgres in production. Far fewer know how to manage etcd.
4. Tooling for Postgres (to operate it, back it up and restore it, monitor it, etc.) is a lot more advanced.
5. Pretty much all cloud providers offer managed Postgres services. I don’t believe any one of them offers managed etcd.
> Pretty much all cloud providers offer managed Postgres services. I don’t believe any one of them offers managed etcd.
I think this is key. PostgreSQL is boring for sure (in a good way), but there is huge operational value in being able to point at a managed endpoint and move on to higher-value work.
Distributed, consensus-based DBs are incredibly complicated and fail in spectacular ways. It makes sense to me that you would want your infrastructure's base layer to be relatively boring technology.
FoundationDB is not easy to set up. And it ships with next to nothing out of the box; you have to build layers atop its foundation to do nearly anything. But the polish is through the roof, and it's one of the least likely systems on the planet to turn into a live hand grenade.
Etcd is an excellent piece of technology that is actually, relatively speaking, quite easy to set up, is quite polished, and all in all has few drawbacks. It's great tech. But software alone isn't going to change the fact that you're operating a distributed system.
The biggest problem, in my view, is that there are so few opportunities to get any real experience with most of these systems. You kind of need to be running thousands of decent-sized instances all at once to begin to appreciate the weirder sides of what can happen and what you need to do in response. For most people, many of these distributed DB systems operate just fine. Until one day they don't, and then they are totally hosed, either suffering extended outages, rollbacks, or worse. Simple things like node rotations usually go smoothly, but they don't generate the same kind of hard-fought experience.

Your ask about starting out feels like it's asking for the safest, most secure route, but only battle-hardened, rainy-day ordeals are ever going to actually get you to a place of comfort.
Sometimes you don't need to make your entire infra distributed, just parts of it. I'm comfortable having my cluster's database be something like RDS while I benefit from the distributed nature of the compute I get from k8s.
I've stood up multiple k8s clusters, and so far the most fickle component has been etcd. My last cluster is k3s on Postgres (pointed at it via k3s's --datastore-endpoint flag), and it has been running in production for a very long time with very little babysitting.
If I may probe: why do you think a distributed DB is needed for a control-plane store? In many situations you can tolerate unavailability there, and control planes generally don't reach big-data scales.
Kubernetes' metadata storage has to be resilient. If a single node can provide that, fair enough. If it needs replicas and consensus, so be it.
I feel K8S chose etcd because of its clustering support and availability guarantees. And Kubernetes should not need a full-fledged RDBMS: a lightweight KV store like etcd should suffice, without the overhead of a full relational engine.
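For what it's worth, the storage surface the apiserver actually leans on is small: gets, puts, watches, and compare-and-swap on a key's revision for optimistic concurrency. etcd exposes that last piece directly as a transaction; a minimal sketch with the official Go client (endpoint, key, and value are made up):

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"}, // assumed local etcd
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx := context.Background()
	key := "/registry/pods/default/example" // made-up key

	// Read the current value and remember the revision we saw.
	get, err := cli.Get(ctx, key)
	if err != nil {
		log.Fatal(err)
	}
	if len(get.Kvs) == 0 {
		log.Fatal("key not found")
	}
	rev := get.Kvs[0].ModRevision

	// Compare-and-swap: the put only happens if nobody else has
	// modified the key since we read it.
	txn, err := cli.Txn(ctx).
		If(clientv3.Compare(clientv3.ModRevision(key), "=", rev)).
		Then(clientv3.OpPut(key, "updated-value")).
		Commit()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("swap succeeded:", txn.Succeeded)
}
```

Kine's whole job is essentially to emulate this revision-and-watch surface on top of SQL tables, which is where the mismatch discussed upthread comes from.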