
ZooKeeper vs. Doozer vs. Etcd - hunvreus
http://devo.ps/blog/2013/09/11/zookeeper-vs-doozer-vs-etcd.html
======
alecthomas
The story of Doozer is a classic example of how _not_ to steward an open
source project.

It was released by two Heroku engineers who promptly _completely_ abandoned
it. By completely I mean did not respond to any communication whatsoever for a
year or so, despite a very active community that had sprung up around the
project in terms of users and forks. I don't begrudge them their lives (or
whatever drew them away), but there were people/companies willing and able to
take over maintenance, but not even that happened. It probably would have
taken just a few hours to hand over maintainership. Instead, just radio
silence.

Eventually there was some movement and the most active group of fork
maintainers were given commit access, but by that time any enthusiasm over
Doozer was long dead and gone.

~~~
Juha
I agree. It's a pity Doozer got to this point. I was happy to see that as
recently as this February, proactive developers on the Google Group
([https://groups.google.com/forum/#!topic/doozer/fVcS0y3KuHQ](https://groups.google.com/forum/#!topic/doozer/fVcS0y3KuHQ))
tried to get the project active and coordinated, but I guess merging the
codebases from the different forks was too big a challenge.

~~~
johnbellone
What it really needs is a company like this to get behind it and steward it.

------
russell_h
One point the article didn't cover is clients. Making a good Zookeeper client
is hard, for two reasons:

1\. The protocol is difficult to implement. In theory you could just use Jute
to codegen this part, but that assumes Jute supports the language you need.
Doozer improves on this with a simple text-based protocol, and etcd goes a
little further with an HTTP API.

2\. The primitives Zookeeper exposes are very low-level. Implementing higher-
level abstractions such as locking or leader election on top of znodes is easy
to get wrong. Just this year Curator has fixed critical bugs in both of those
algorithms:
[https://github.com/Netflix/curator/blob/master/CHANGES.txt](https://github.com/Netflix/curator/blob/master/CHANGES.txt)

My feeling is that etcd and doozer fare a little better on #2 just because
their primitives are slightly easier to understand, but fundamentally the
problem still exists. I'm looking forward to seeing more innovation in this
area.
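As an illustration of why these recipes are subtle: the core decision in ZooKeeper's sequential-znode lock recipe can be sketched in a few lines of Python (all server interaction omitted; the function and node names are my own). A classic mistake is watching the whole lock directory instead of only the next-lower node, which causes a thundering herd on every release:

```python
def lock_decision(my_node, children):
    """Decide, from the children of the lock directory, whether my_node
    (an ephemeral sequential znode like 'lock-0000000003') holds the lock.

    Returns (held, node_to_watch). The node with the lowest sequence
    number holds the lock; everyone else should watch only their
    immediate predecessor, never the whole directory."""
    seq = lambda name: int(name.rsplit("-", 1)[1])
    ordered = sorted(children, key=seq)
    idx = ordered.index(my_node)
    if idx == 0:
        return True, None           # lowest sequence: we hold the lock
    return False, ordered[idx - 1]  # otherwise watch our predecessor only
```

Even this toy version hides the hard parts (session expiry, reconnects, watch semantics), which is exactly why libraries like Curator exist.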

~~~
tomjohnson3
one other thing to keep in mind is that the underlying algorithms (zab and
raft) provide different guarantees.

for example, zookeeper/zab allows reading directly from followers with a
guarantee to get at least a past value that won't be rolled back. this was one
reason zookeeper didn't use paxos:

[https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab+vs...](https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab+vs.+Paxos)

in my understanding, raft doesn't allow reading directly from followers,
because the follower logs may get repaired/rolled-back when a new leader is
elected. (though i'm sure an implementation can tweak the protocol to provide
this support.)

that said, raft has a _lot_ of interesting applications, and, in my opinion,
is definitely more understandable than the many versions of paxos.
(implementing zab yourself, at this point, would be a futile exercise.)

i found the videos from the raft user study to be very well done (and easier
to understand than even their paper):

raft:
[http://www.youtube.com/watch?v=JEpsBg0AO6o](http://www.youtube.com/watch?v=JEpsBg0AO6o)
paxos:
[http://www.youtube.com/watch?v=YbZ3zDzDnrw](http://www.youtube.com/watch?v=YbZ3zDzDnrw)

...however, i think they did paxos a disservice by not just focusing on
multi-paxos (which is probably the most common implementation). but, it's
certainly fair to say that info about paxos is spread out far and wide...with
perhaps too many knobs to turn and implementation-related details to fill in
yourself.

as a side note: i've just started implementing raft in a set of libraries
(multiple languages) that will be open source - along with other protocols.

~~~
babo
> raft doesn't allow reading directly from followers,

In Raft, all client connections to followers are redirected to the current
leader.

> i think they did paxos a disadvantage by not just focusing on multi-paxos

Raft is equivalent to (multi-)Paxos.

~~~
nieksand
Having followers redirect to the leader is how the paper describes the
algorithm.

But I don't think there is anything stopping you from having followers service
committed log entry reads, provided you're willing to live with being out-of-
date.

~~~
tomjohnson3
i don't think this is the case. here is a link to a video by one of the
authors about log repair during leader election:

[http://www.youtube.com/watch?feature=player_detailpage&v=YbZ...](http://www.youtube.com/watch?feature=player_detailpage&v=YbZ3zDzDnrw#t=1481)

log entries on a follower _may_ get rolled back _and thrown out_, since
they were not accepted by a majority of the cluster.

------
alanctgardner2
They make some cogent points about Zookeeper, but is javaphobia really a valid
concern here? Yes, you have to install a JVM, and Oracle doesn't make that as
friendly as it could be. But in my experience ZK doesn't bring along "a ton of
dependencies". Likewise, I'm skeptical that the performance of Java v. Go in
this case makes a huge difference: you're only spinning up the JVM once, at
startup.

Maybe I'm too technically conservative, but it seems like the advantages of
Zookeeper's maturity and support base outweigh the ickiness of Java and the
whizbang factor of Go. Complaining about the politics of the ASF doesn't
really factor in; they're still much more likely to be around in 10 months.

~~~
johnbellone
Even though Zookeeper is "mature" it is definitely a beast to fine tune and
wade through. Zookeeper definitely does not bring along a ton of dependencies.
If you're deploying it as documented it should be running on its own dedicated
machines. I don't see why dependencies are a problem there especially with
Chef/Puppet nowadays.

~~~
alanctgardner2
Exactly. A "devops" company shouldn't really be complaining about dependencies
anyways, they're a fact of life. I admit my experience with ZK has been 50%
administering it with Cloudera Manager, which basically abstracts away all the
nastiness. I'm curious about specific issues people have had though; I've used
ZK with Hadoop and Kafka without any major issues.

~~~
sandfox
Why shouldn't a 'devops' company complain about dependencies? It's their domain
completely. Treating anything as a "fact of life" is not going to make the
problem go away... And what about anyone out there trying to use them who
isn't a 'devops' thingy. A painful dependency graph is going to be something
that influences their decision on what tool they choose.

~~~
hunvreus
Pretty much what I thought. The leaner the better.

Moreover, in our specific case, we did not want to introduce dependencies on
hosts that are managed by our customers, so as not to run into conflicts with
their own stack.

------
peterwwillis
DIY is the pink elephant in the room.

Every company in the world that isn't specifically a software-oriented tech
company uses some form of DIY model. Actually, strike that, even they use a
DIY model. These tools are the proof!

Look at the origins for every modern open source management framework or tool,
and it was just a DIY tool that some startup-turned-huge-company developed out
of their own needs, then cleaned up a lot and released to the world. Your
needs may not match those of the company who developed the tool, so it may not
work for you. But _I guarantee you_ that _no_ tool will work for every
situation.

Pick the tool that best fits your needs and then fork it and maintain it
internally. You'll be doing it anyway. (Unless you don't hire software
developers, in which case you'll want to pay for a real product with a support
contract.) Once you've done that, stop writing navel-gazing blog posts about
your infrastructure that won't apply to 99% of us.

~~~
jeremyjh
I think it's more like a white elephant. I totally understand the motivation
to write and blog about these vanity infrastructure software projects, but I
agree it is hard to stay enthused about reading about yet another one in the
language du jour.

------
joevandyk
I use postgresql + listen/notify to share and push configuration out to
applications.

Each application starts a thread that LISTENs for NOTIFYs from postgresql.

I have a settings table (name text, value text). Configuration data is stored
there.

    
    
    insert into settings (name, value) values
      ('sites.my-site.authnet.api_key', 'asdfasdf');

There's a trigger on that table that issues a NOTIFY to all the clients. When
the clients receive the NOTIFY, they query the table and store all of the
settings in memory.

It works great.

No additional moving parts. All my configuration is stored in the database,
not in files that need to be protected. No security concerns about API keys
stored in a separate service. My settings are backed up along with the rest of
my data.

~~~
enigmo
If you already have to maintain a database and don't need lease/lock
management or simple failover... broadcasting configuration values is pretty
simple, sure.

------
maplebed
I chose Zookeeper to achieve this same goal a while ago (before I heard of
etcd). I have been pleasantly surprised at how useful having a coordination
service is in my infrastructure in addition to a configuration management
service. Because of this, even though etcd looks like it serves distributed
configuration management better (aka simpler), I'm happy with my choice of
Zookeeper.

Two examples of how a coordination service has been useful:

* cluster-wide throttles to help protect overwhelmable backends

* redundancy in maintenance cronjobs that really only want to be run once per cluster per time period

(edited for formatting)

~~~
SEJeff
How do you power cron from ZK? Do you use something like airbnb's chronos[1]?

[1] [http://nerds.airbnb.com/introducing-
chronos](http://nerds.airbnb.com/introducing-chronos)

~~~
maplebed
No, I took a different approach. Any normal cron job that wants to be run only
once (per cluster) grabs a zookeeper lock. Those that fail to acquire the lock
exit and try again during the next interval. This is implemented with a
wrapper around the cron job that takes care of locking:
[https://github.com/ParsePlatform/Ops/blob/master/tools/get_z...](https://github.com/ParsePlatform/Ops/blob/master/tools/get_zk_lock).
More details here: [http://blog.parse.com/2013/03/11/implementing-failover-
for-r...](http://blog.parse.com/2013/03/11/implementing-failover-for-random-
cronjobs-with-zookeeper/)

------
philsnow
You can add YouTube to the list of big players that use ZooKeeper. We use
zkocc [0] to scale (readonly) clients.

[0]
[http://godoc.org/code.google.com/p/vitess/go/zk/zkocc](http://godoc.org/code.google.com/p/vitess/go/zk/zkocc)

~~~
Juha
Thanks, didn't know that. Yep, YouTube definitely counts as a big player.
Added you to the list.

------
kbd
Sorry if this is a dumb question but I've never understood the purpose of
these configuration stores. If you're not running in the cloud, but have
servers in a datacenter that all mount an NFS share, is there any benefit of
these over simply reading a json/yaml file off of an NFS mount?

~~~
regularfry
Resiliency to your NFS server falling over is one. Sane atomic updates and
locking are another.

~~~
vidarh
HA NFS is not hard. Using DRBD to replicate the underlying block device, or
Gluster to replicate the underlying filesystem, plus IP takeover with e.g.
keepalived, is fairly trivial to set up. Running NFS on a single server is a
bit like running one of these configuration management systems on a single
server: they won't be resilient then either. Using Gluster directly is
another easy way of getting resiliency.

Atomic updates and locking is another matter, but for a lot of setups it's
simply not needed.

~~~
regularfry
Or you can run a single tool that is designed for this one job.

Gluster in particular isn't a panacea for resiliency, you've got to really
know where it departs from POSIX to not create problems for yourself.

~~~
Diederich
Are the GlusterFS people mistaken?

[http://gluster.org/community/documentation/index.php/Gluster...](http://gluster.org/community/documentation/index.php/GlusterFS_General_FAQ#What_file_system_semantics_does_GlusterFS_Support.3B_is_it_fully_POSIX_compliant.3F)

"GlusterFS is fully POSIX compliant."

~~~
regularfry
Ooh, news to me. That certainly wasn't the case 2 years ago.

------
hardwaresofton
Hey, I recently developed a service devoted to nothing but storing
configuration files! It seems like I'm kind of in the same space (barely), and
I'd appreciate some feedback:

[https://configr.io](https://configr.io)

------
ballard
If infrastructure (e.g. system) configuration is the goal, there's already a
much more accessible project with backend support (flexibility to hack up
support for LDAP, ZK, etc.):

[https://github.com/puppetlabs/hiera](https://github.com/puppetlabs/hiera)

Intro:
[http://www.devco.net/archives/2011/06/05/hiera_a_pluggable_h...](http://www.devco.net/archives/2011/06/05/hiera_a_pluggable_hierarchical_data_store.php)

Note: Does not depend on puppet, so it'll work with chef. A hiera-databag
adapter would make sense.

------
bradhe
This article is really bad. The arguments against ZK are completely
subjective. The pros for Doozer are "it works", which is hardly an evaluation.
The features in the pros for etcd are all present in ZK.

------
johnbellone
About a year ago I deployed Zookeeper to support the Redis failover gem. All
of the problems were a result of my lack of knowledge on how to deploy and
configure it. My guess is that it's mostly used right now in Hadoop/Solr
installations and is configured specifically for those.

I've been using Doozer at home on a few side projects and, mainly because it's
written in Go, have enjoyed using it a lot more. The point on security is spot
on. After reading this post I am definitely going to take a look at etcd.

------
badman_ting
I kind of can't believe they used the Apache Foundation as a negative bullet
point for Zookeeper. Come on, just say you felt like writing your own thing.

~~~
vidarh
They're not the only ones that have gotten a negative impression of the Apache
Foundation. Other than the web server, the Apache Foundation seems to largely
be a burial ground for open sourced corporate Java projects.

It may very well be they don't deserve that, but if so they really need to
improve their PR.

------
SamWhited
"and is pretty fragmented (150 forks…)"; I don't think they understand how
GitHub works... (a "fork" here does not equal a fork of a project in the
traditional sense)

~~~
mnutt
But in this case I think they're correct. If the project is abandoned and
forks don't ever get merged back in, development is fragmented and you end up
having to pick and choose commits from various forks.

------
zimbatm
Or just push your config to S3?

Easy to use and set up, reliable, good docs, supports ACLs.

The only downside is that it doesn't broadcast changes.

~~~
vidarh
People use services like Doozer, Zookeeper and etcd because of their
consistency+availability guarantees. S3 will give you the availability, but
not the consistency. If you can sacrifice consistency, there are tons of
trivial ways of doing this (in addition to your suggestion of S3):

* DNS with slaves

* LDAP with slaves

* Rsync of a directory of config files

* Pair of web servers or NFS servers or Samba servers using either rsync or a redundant network filesystem (e.g. GlusterFS) or block device (e.g. DRBD) to back it.

Solving _that_ problem is easy. It's once you need/want the consistency
guarantees that things get dicey.

------
vbit
There's also [http://arakoon.org/](http://arakoon.org/)

------
jokull
DNS on its own can be enough in some scenarios.

~~~
rkoz
Care to explain how DNS alone can do distributed config management?

~~~
alanctgardner2
Oh! Treat records as key-value pairs, and encode the value into the IP. If
your timeout is 10 seconds, add a DNS record timeout.server.yourdomain which
resolves to an IPv6 address with value 10. It gets tougher with ASCII strings,
but you could support multi-record configs as well. Then your application just
uses nslookup to download the config when you reload it.

If someone builds this I will be their best friend
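For what it's worth, the encoding half of this scheme is easy to sketch in Python (the `fd00::` ULA prefix and the function names are my own invention; actually publishing the AAAA records in DNS is left out):

```python
import ipaddress

# Pack a config value into the low 64 bits of a ULA (fd00::/8) address.
PREFIX = int(ipaddress.IPv6Address("fd00::"))
LOW64 = (1 << 64) - 1

def encode_value(value):
    """Turn an integer config value into an IPv6 address string."""
    return str(ipaddress.IPv6Address(PREFIX | (value & LOW64)))

def decode_value(addr):
    """Recover the integer config value from the address."""
    return int(ipaddress.IPv6Address(addr)) & LOW64
```

So `timeout.server.yourdomain` would get an AAAA record of `fd00::a` for a 10-second timeout, and the client decodes the value after a lookup.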

~~~
darkarmani
You can also use DNS like a distributed cache. This is useful when you have
millions of clients because their local DNS server will do caching for you.
You can also use it like a bloom filter where cache hits are true positives
and anything that misses might or might not be a valid key.

~~~
ecopoesis
That sounds like the opposite of a Bloom filter. In a Bloom filter, you can
have false positives, but never false negatives.
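A minimal sketch of that property in Python (the sizes and the salted-hash scheme are arbitrary choices of mine): a key that was added always tests positive, while a test for an absent key may occasionally return a false positive.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit array packed into one big int

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False is definitive (no false negatives); True means "maybe".
        return all(self.bits & (1 << pos) for pos in self._positions(key))
```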

~~~
darkarmani
I meant to say "except" where I went on to explain the opposite kind of
lookup.

