
ZooKeeper: Wait-free coordination for Internet-scale systems - mad44
http://muratbuffalo.blogspot.com/2014/09/paper-summary-zookeeper-wait-free.html
======
TallGuyShort
Obligatory shout-out to Apache Curator:
[http://curator.apache.org/](http://curator.apache.org/)

Curator implements a bunch of algorithms often implemented on top of
ZooKeeper. I like to think of ZooKeeper as the distributed systems equivalent
of peer-reviewed implementations of cryptographic primitives. Curator is
like a whole cryptographic protocol / cryptosystem. In both cases: don't
implement your own!

~~~
ddispaltro
That is a fantastic analogy (and a great library). In my experience it works
well but takes effort in ongoing maintenance (log cleaning, etc). Also, it
acts as a canary in the coal mine for other networking problems.

I've also used it in a WAN setup for low-throughput, transactional data for
which I needed solid exactly-once semantics. Docs for some of the WAN settings
were non-existent, but eventually it worked as intended, thanks to the help of
the community.

------
yourabi
I use ZooKeeper in production for snitch.io.

There are some interesting new alternatives such as etcd / serf/consul - but
at the time ZooKeeper had the best track record (under Jepsen analysis).
Things might have changed since then.

Aphyr has done a bunch of analysis of these systems as part of his Jepsen tool:
[http://aphyr.com/tags/jepsen](http://aphyr.com/tags/jepsen) and
[http://aphyr.com/posts/291-call-me-maybe-
zookeepe](http://aphyr.com/posts/291-call-me-maybe-zookeepe)

If you are going to use ZooKeeper I strongly suggest looking at both Apache
Curator and Netflix Exhibitor (they are complementary).

The examples bundled with ZK don't handle all errors/edge cases...

Curator is a library of common patterns available to use mostly out of the
box.

Exhibitor is a ZooKeeper "aware" supervisor system:
[https://github.com/Netflix/exhibitor](https://github.com/Netflix/exhibitor)

Also, always remember that your ensemble should have an odd number of nodes
(3, 5, 7).
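
The reasoning behind the odd-number advice is majority-quorum arithmetic: an ensemble stays available only while more than half of its nodes are up, so adding a fourth node to a three-node ensemble buys no extra fault tolerance. A rough sketch of that arithmetic (the function names are made up for illustration, not part of any ZooKeeper API):

```python
def quorum_size(n):
    """Smallest strict majority of an n-node ensemble."""
    return n // 2 + 1


def tolerated_failures(n):
    """How many nodes can be down while a majority is still reachable."""
    return n - quorum_size(n)


# Compare odd and even ensemble sizes side by side.
for n in range(3, 8):
    print(f"nodes={n} quorum={quorum_size(n)} tolerated={tolerated_failures(n)}")
```

Under this model, 3 and 4 nodes both tolerate one failure, and 5 and 6 both tolerate two, so the even sizes add a machine without adding resilience.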

------
jayunit
If you enjoyed this, I highly recommend Mikito Takada's "Distributed systems
for fun and profit"
[http://book.mixu.net/distsys](http://book.mixu.net/distsys)

The "Partition-tolerant consensus algorithms: Paxos, Raft, ZAB" section is
relevant, along with the "Further Reading" which follows it.

------
directionless
You may also be interested in [http://aphyr.com/posts/291-call-me-maybe-
zookeeper](http://aphyr.com/posts/291-call-me-maybe-zookeeper)

------
tachion
I've used ZooKeeper not only as a service registry but also as a fairly small
message queue. I wanted to be sure that my messages would be delivered at
least once, and thanks to the LockingQueue recipe in Kazoo (a Python ZK
library) I was able to get what I wanted really easily, with all the benefits
of ZK's clustering nature.
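
For readers unfamiliar with the pattern, at-least-once delivery usually comes from separating "hand out an item" from "acknowledge it": an item is removed only after the consumer confirms it was processed. The toy below is an in-memory sketch of that idea only. It is not Kazoo's implementation and does not talk to ZooKeeper; `ToyLockingQueue` and `process_all` are hypothetical names, and the get/consume/release shape is only loosely modeled on the recipe.

```python
from collections import deque


class ToyLockingQueue:
    """In-memory stand-in for a get-then-acknowledge queue (illustration only)."""

    def __init__(self):
        self._items = deque()
        self._held = None  # item handed out but not yet acknowledged

    def put(self, value):
        self._items.append(value)

    def get(self):
        # Hand out the head item without removing it from the queue.
        if self._held is None and self._items:
            self._held = self._items[0]
        return self._held

    def consume(self):
        # Acknowledge the held item; only now is it removed for good.
        if self._held is None:
            return False
        self._items.popleft()
        self._held = None
        return True

    def release(self):
        # Models a consumer giving up: the item becomes available again.
        self._held = None


def process_all(queue, handler):
    """Drain the queue, redelivering an item whenever the handler raises."""
    attempts = 0
    while True:
        item = queue.get()
        if item is None:  # toy simplification: None means "queue is empty"
            return attempts
        attempts += 1
        try:
            handler(item)
        except Exception:
            queue.release()  # not acknowledged, so it is delivered again
            continue
        queue.consume()
```

The flip side of this design is that a handler can see the same message more than once (for example, if it crashes after doing the work but before acknowledging), so handlers should be idempotent.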

------
judk
The article doesn't give any background: ZK is a sort of Chubby, Google's
distributed lock manager. Locks are small files.

------
eik3_de
Has anyone had real-world experience comparing ZooKeeper with Consul? Is
Consul considered production-ready yet?

~~~
s0l1dsnak3123
I can't speak for ZooKeeper, but I've been using Consul in production since
1.0. I've never had an issue with it - it works great for me! I've written a
Ruby gem to interface with its API called Diplomat (here:
[https://github.com/WeAreFarmGeek/diplomat](https://github.com/WeAreFarmGeek/diplomat))
and a whole bunch of Ansible scripts to set up checks for Postgres, nginx, etc.

