Migrating ZooKeeper into Kubernetes

zek · on April 9, 2020

I work at HubSpot (on Kafka), so I was a "user" of this migration because Kafka uses ZooKeeper for coordination. It's pretty amazing how convenient Kube services made this whole transition, and we learned a lot from it; we will likely end up applying similar strategies when migrating other services onto Kube. Allowing Kube services to point to either external resources or pods/internal ones is probably the best feature I have found in Kube so far (and there are a lot of great features).
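
A rough sketch of that idea, not necessarily how HubSpot did it: the same Service name can first resolve to an external host and later select in-cluster pods, so clients never change their connection string. The manifests are built as plain Python dicts; the host `zk.legacy.example.com`, the `app: zookeeper` label, and both helper functions are made up for illustration.

```python
def external_service(name, external_host):
    """Service that resolves to a host outside the cluster (type ExternalName)."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {"type": "ExternalName", "externalName": external_host},
    }


def pod_backed_service(name, selector, port=2181):
    """Service that routes to in-cluster pods matching a label selector."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "selector": dict(selector),
            "ports": [{"name": "client", "port": port, "targetPort": port}],
        },
    }


# Before the migration: "zookeeper" points at the old, external ensemble.
# The hostname is a placeholder, not a real endpoint.
before = external_service("zookeeper", "zk.legacy.example.com")

# After the migration: same Service name, now backed by pods in the cluster.
after = pod_backed_service("zookeeper", {"app": "zookeeper"})

# Clients keep using the same name throughout.
print(before["metadata"]["name"], after["metadata"]["name"])
```

The point is only that the name stays fixed while what sits behind it changes; a real cutover would also have to deal with ZooKeeper quorum membership, which this sketch ignores.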

klysm · on April 9, 2020

It’s a little funny to think that if you’re running ZooKeeper in Kubernetes, you’re using etcd to manage the state of the servers of your state management servers.

hinkley · on April 11, 2020

Circular dependencies at the bottom of your tech stack are trouble.

I'm having a slow motion argument with a coworker about a piece of code I maintain. It provides bootstrapping data that nearly all of our code uses in some way. Everything from stats and logging up to user-visible functionality.

Every time I run into a hiccup, he's there asking why I'm not using our telemetry or networking code for the internals. It's true that if you are very, very careful, you can manage circular startup dependencies, but if anyone sneezes, your app won't start or, worse, drops into an infinite loop. Either you build on simpler building blocks with similar functionality, or you find a different way to organize the code.
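
To make the failure mode concrete, here is a small sketch (my own illustration, with hypothetical component names) that treats startup as a dependency graph. With the bootstrap code at the bottom, a valid start order exists; once it leans on telemetry, which itself needs bootstrap, no order exists at all. It uses `graphlib` from the Python 3.9+ standard library.

```python
from graphlib import CycleError, TopologicalSorter


def startup_order(deps):
    """Return a valid start order; deps maps a component to what must start first."""
    return list(TopologicalSorter(deps).static_order())


# Bootstrap sits at the bottom and depends on nothing else.
healthy = {
    "bootstrap": set(),
    "telemetry": {"bootstrap"},
    "networking": {"bootstrap"},
    "app": {"telemetry", "networking"},
}
order = startup_order(healthy)

# Same graph, except bootstrap now uses telemetry for its internals.
circular = dict(healthy, bootstrap={"telemetry"})
try:
    startup_order(circular)
    cycle_found = False
except CycleError:
    # No start order exists: bootstrap waits on telemetry, which waits on bootstrap.
    cycle_found = True

print(order, cycle_found)
```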

FBISurveillance · on April 9, 2020

In the context of Kafka, hopefully KIP-500 [1] will get implemented sometime soon.

If you're feeling lucky, you can also use zetcd [2] to connect ZK apps to etcd. I've been able to actually run Kafka with it as a toy project a little while ago.

[1] https://cwiki.apache.org/confluence/display/KAFKA/KIP-500%3A...

[2] https://github.com/etcd-io/zetcd

takeda · on April 9, 2020

What is the problem with ZK though? The only thing I can think of is that it requires the JVM, but other than that it is quite solid as far as the locking service is concerned, and since you are using Kafka you are already using the JVM.

FridgeSeal · on April 10, 2020

Because if I already have a K8s cluster and I want to run an application that requires ZK, I now have to dedicate resources to running a ZK cluster within my K8s cluster just to run the one application I cared about.

I also personally find configuring and running Java applications confusing AF. Why are the configs seemingly split into different places and environment variables? Does it get clearer after dealing with Java things for a while?

klysm · on April 9, 2020

Given the close similarity in guarantees, this kind of thing seemed possible but I didn’t know it actually existed! Would it be sketchy to use the same etcd instance as k8s though? It seems desirable to keep that isolated so you don’t fuck up your whole cluster

jpgvm · on April 9, 2020

I would caution against using zetcd specifically with Kafka however as we ran into issues with this in the past.

hinkley · on April 11, 2020

So do none of Consul, ZooKeeper, and etcd have a tool for migrating from one of their competitors?

I suppose you end up with ZooKeeper running in Kubernetes because the only way to migrate service discovery is to have all machines report to both clusters and then start moving to reading from the new one.
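
A minimal sketch of that "report to both, then move the reads" approach, assuming two in-memory stand-ins for the old and new clusters. `Registry` and `MigratingClient` are hypothetical names, not the API of any real ZooKeeper, etcd, or Consul client.

```python
class Registry:
    """In-memory stand-in for a service-discovery cluster (hypothetical)."""

    def __init__(self):
        self._entries = {}

    def register(self, service, address):
        self._entries.setdefault(service, set()).add(address)

    def lookup(self, service):
        return sorted(self._entries.get(service, set()))


class MigratingClient:
    """Writes to both registries; reads from whichever one the flag selects."""

    def __init__(self, old, new, read_from_new=False):
        self.old = old
        self.new = new
        self.read_from_new = read_from_new

    def register(self, service, address):
        # Phase 1: every machine reports to both clusters.
        self.old.register(service, address)
        self.new.register(service, address)

    def lookup(self, service):
        # Phase 2: flip the flag per client to move reads to the new cluster.
        source = self.new if self.read_from_new else self.old
        return source.lookup(service)


old_cluster, new_cluster = Registry(), Registry()
client = MigratingClient(old_cluster, new_cluster)
client.register("kafka", "10.0.0.1:9092")

reads_before = client.lookup("kafka")   # served by the old cluster
client.read_from_new = True
reads_after = client.lookup("kafka")    # served by the new cluster
print(reads_before == reads_after)
```

Because both registries hold the same registrations during the overlap, readers can be switched over one at a time, and the old cluster can be retired once nothing reads from it.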

hinkley · on April 11, 2020

Looks like etcd has a module that implements the ZooKeeper API, but the logistics of moving a bunch of services (without an outage) still seem massive to me, because old servers still want to discover in the old registry, not the new one.

You can’t just bridge two Raft protocols. If the bridge goes down even once, good luck getting consensus again. And based on the benchmarks I can find, it seems the wire protocol is part of the secret sauce for at least etcd.