Etcd v0.1.0 release (coreos.com)
83 points by polvi on Aug 12, 2013 | 28 comments


Interesting timing. I was playing with Etcd just this morning. I'm glad there are more options in this space. I haven't been happy with any setup so far.

Doozer (https://github.com/ha/doozerd) got me excited. It's small, fast, and written in Go. Unfortunately, its development seems quiet and fragmented. Its lack of TTL-style values made it a pain to build a distributed lock service without a sweeper to clean up dead locks.

Zookeeper (http://zookeeper.apache.org) is much more fully featured and mature, but felt way too heavy compared to my nimble Go stack. Installing and maintaining a JVM just for Zookeeper made me uncomfortable.

Etcd is interesting. It has TTLs, it's small and fast, easy to pick up and learn, and it's in active development (and it's tied to CoreOS and Docker, so it's bound to get some reflected love).
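
For example, giving a key a TTL is just an extra flag on the write. An untested sketch against the v1 HTTP API, assuming a local node on the default client port 4001:

  # set a key that expires on its own after 10 seconds
  $ curl http://127.0.0.1:4001/v1/keys/lock -d value=held -d ttl=10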


> Zookeeper (http://zookeeper.apache.org) is much more fully featured and mature, but felt way too heavy compared to my nimble Go stack. Installing and maintaining a JVM just for Zookeeper made me uncomfortable.

I've spent almost a year dealing with a large, high-traffic Zookeeper installation and I agree. In fact, etcd seems to have some features I would love to have in Zookeeper.

I maintain several Zookeeper ensembles which I want to be highly available. Any time I need to swap out a node, increase or decrease the size of the quorum, change a node's port(s), or change a node between voting and observing, I have to draw out a diagram where I keep track of which nodes "know" what at which times. If I skip doing that, I run into situations where fewer than N/2 + 1 nodes agree on the current state of the ensemble, and they fail-stop and don't serve traffic.

Here's a specific example of an issue that seems blindingly obvious to me, but clearly wasn't to whoever implemented it: if you want to specify that a Zookeeper node is an observer (it doesn't participate in leader elections), you have to put that in the config file in two places: once on that node's line in the section that tells all the nodes where all the other nodes are (like [0]), and again as a separate "peerType=observer" line. That second part means you can't use the same zoo.cfg file for your observer nodes and your voter nodes: you have to keep two zoo.cfgs (both sketched at [0] below), make your init script or whatever pick the correct one, and keep the files semantically in sync if you ever have to make further changes. What each node should do is look at [0] and say "oh, I'm server.4, so I'm supposed to be an observer".

It's piles and piles of little annoyances like that that make me dislike Zookeeper. I'll be watching etcd.

[0] zoo.cfg

    [snip]
    server.1=hostname1:port1:port2
    server.2=hostname2:port3:port4
    server.3=hostname3:port5:port6
    server.4=hostname4:port7:port8:observer
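
And the observers' copy needs the extra directive on top of that same list, which is exactly what forces the second file. A sketch (not a complete config) of what the observers' zoo.cfg ends up looking like:

    [snip]
    # only the observers' file carries this line
    peerType=observer
    server.1=hostname1:port1:port2
    server.2=hostname2:port3:port4
    server.3=hostname3:port5:port6
    server.4=hostname4:port7:port8:observer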


The next version of Zookeeper (3.5) allows for dynamic reconfiguration, so it'll be much easier to reconfigure your cluster online. Hopefully we'll also have self-repairing ZK clusters and you won't have to manage Zookeeper unless things are going seriously wrong (e.g. you only have 2 machines online across your whole cluster).

Here's the bug: https://issues.apache.org/jira/browse/ZOOKEEPER-107

It may have taken 5 years, but this will be fixed.

I love etcd & Zookeeper!


Zookeeper has a feature I particularly like that etcd does not (for now at least): it's possible to write to a node from a client such that the node will disappear if the client disconnects. This plus watching makes for a great liveness check between different machines.
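
In ZooKeeper's shell the pattern looks roughly like this (a sketch using the old zkCli syntax, where a trailing 'true' on a read sets a watch; paths are made up):

  # client A: create an ephemeral znode tied to this session
  create -e /alive/worker1 ""
  # client B: watch it; a notification fires when A's session dies
  stat /alive/worker1 true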


Yes, that's true. It is pretty easy to simulate this with a short TTL and a heartbeat though.
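
Something like this, say. A rough sketch against etcd's v1 HTTP API (the node address and key names are made up):

  # refresh a 5-second-TTL key every 2 seconds; if this process dies, the
  # key expires, and anyone blocking on /v1/watch/alive/worker1 finds out
  while true; do
    curl -s http://127.0.0.1:4001/v1/keys/alive/worker1 -d value=up -d ttl=5 >/dev/null
    sleep 2
  done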


This looks pretty neat. Is anyone using it? The Go implementation of Raft that the docs link to also looks pretty nice.


The project is pretty young, but we are already seeing a strong community coming together:

http://coreos.com/docs/etcd/#libraries-and-tools

We intend to make it a primitive in CoreOS, so plan on it getting faster (we recently hit 20k atomic writes/s, so it is already pretty fast) and more reliable.


What are the reasons this isn't just a json-serving front-end to Redis? Is it largely because Redis clustering isn't baked yet?


Because a configuration service has to maintain consensus [0] in the face of arbitrary network partitions. Unfortunately, Redis is unable to do so in a clustered configuration [1]. Attaining this level of fault tolerance in a distributed environment requires implementing a consensus algorithm, such as Paxos (more realistically, Multipaxos) [2], ZAB [3], or the recently published Raft [4].

As for etcd, it certainly looks promising, but I wouldn't advocate using it over ZooKeeper until it has been significantly battle-tested in a production environment. Consensus algorithms are notoriously difficult to implement, and when implementations fail, they often do so catastrophically. For example, Google engineers ran into many challenges while implementing Paxos [5] for Chubby [6], their distributed lock service (and the inspiration for ZooKeeper, and subsequently doozerd and etcd). While Raft was intentionally designed to be easier to implement than Paxos, relying on alpha- or beta-quality software for a configuration and distributed lock manager is probably a path to unpleasantness.

[0] - https://en.wikipedia.org/wiki/Consensus_(computer_science)

[1] - http://aphyr.com/posts/283-call-me-maybe-redis

[2] - http://research.microsoft.com/en-us/um/people/lamport/pubs/l...

[3] - https://www.usenix.org/legacy/event/usenix10/tech/full_paper...

[4] - https://ramcloud.stanford.edu/wiki/download/attachments/1137...

[5] - http://www.cs.utexas.edu/users/lorenzo/corsi/cs380d/papers/p...

[6] - http://research.google.com/archive/chubby.html


We reviewed the current options at one point (Zookeeper, Doozer, and something from Netflix) but ended up just using DNS (Route 53) for our config. This, however, looks great, and supports nested keys (trees), where DNS comes up short, among other aspects.


How is 1000s of writes a second fast? Especially for a key/value store? Also, why is it returning redundant information?

  $ curl -L http://127.0.0.1:4002/v1/keys/foo
  {"action":"GET","key":"/foo","value":"bar","index":5}
You already know the action and the key; all it needs to return is "bar" and the index, though even the index might be unnecessary. While I'm at it, why is 'curl' returning JSON? I don't know a lot of Unix commands that take JSON input.

While I'm throwing around my valueless opinions: this thing is wholly uncomfortable and over-engineered for its supposed purpose. Further complications from unnecessary requirements like the Raft protocol (what the fuck does autonomous distribution of resource management have to do with sharing configuration?!? It's like building X.509 into Telnet) make this thing's hinges groan from feature bloat.

Yet more blather: Why do you have to configure each host's address and a unique port? Isn't etcd supposed to support automatic service discovery? Zeroconf (among many, many others) has had this working for years and it's not hard to use the existing open-source implementations. And why is HTTPS an advanced use?


If you want to ask about things you do not know much about, you should be more humble.

1. etcd is not a key-value store like Redis or memcached, whose aim is to provide high performance. etcd aims to provide highly reliable storage for a small amount of consistent data across multiple machines.

2.1 We have a command-line client tool, which does not return redundant information. 2.2 I do not think you know exactly what the index is.

3. Raft is a distributed consensus protocol, which helps with fault tolerance and consistency. When sharing configuration through etcd, we can assume the configuration server will be robust. The essential goal is to remove the single point of failure.

4. It is not. It is just hard for our usage, because it is HTTPS.


Still looks overly complex. As far as I can tell, the true purpose of the thing isn't even put down on paper. "A highly-available key value store for shared configuration and service discovery"? Could be mDNS! The reliability can, in a way, be provided by CurveCP. So again, I'm a little mystified as to why you need to implement Raft or produce this tool to accomplish what's been in use for over 10 years. It does seem more complicated to use, so if that was your goal, then kudos :)


People usually do not need to worry about implementation difficulties unless they want to contribute. If you want to, I am happy to talk with you more about the details.

I do not think it is difficult to use, unless people do not want to look at the docs. If you go through the docs and still cannot figure out how to use it, I would like to help.

If you do not want to learn the context, then Raft/ZK/etcd will probably remain a mystery to you for a long time.


The direct competitor here is Apache ZooKeeper, which this is probably simpler than.

If you care, you should do some research, because there are smart people excited about this. I'd start by looking at which software uses zookeeper and why. You may also find the following wiki articles useful:

http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect

http://en.wikipedia.org/wiki/Serializability


Zookeeper is a kludge derived from a different kludge because nobody could continue to maintain it. The description and example use of this project is definitely simpler, but in no way encompasses (or describes) the half-dozen different uses of Zookeeper. I can't decide how to respond to your ironic meta-insult, so here's a bunny with a pancake on its head: http://4.bp.blogspot.com/_0B0VEm_uRcA/TPLhoaUdLEI/AAAAAAAAAA...


You might want to read http://static.googleusercontent.com/external_content/untrust...; it's useful context.


I do think this looks interesting, but the second bullet point makes me cringe (my emphasis):

- Secure: *optional* SSL client cert authentication

Reading the section on SSL in the guide makes me feel slightly better, although I'm worried that it doesn't mention anything about revoking certificates and/or online status checking.

Perhaps we need a (better?) library for TLS-PSK and/or TLS+Kerberos for these kinds of uses of HTTPS? That, or a compact stand-alone CA that simplifies certificate management and enrolment to the point where it is usable, deployable, and reasonably secure.

I'm imagining a compact "master CA" service that only deals with maintaining (optionally) off-line root certs, which in turn only certify intermediary on-line CA(s), which handle enrolment and revocation of service and client (i.e. "principal") certs.

Of course, at that point you've pretty much created a Kerberos work-a-like on top of TLS (for some extra spiffiness, set the intermediary CAs to issue certs with 10-minute lifetimes...) -- and I'm not sure such a system would really be better than just using Kerberos in the first place...

[edit: formatting]

Maybe the ease of interop with other rest/http-based services and clients would be worth it -- maybe not.


We want to make the SSL setup even easier. The big hurdle is the current lack of easy-to-set-up CA management tools.

Any suggestions on tools? openssl's CA tools work but are pretty difficult to set up and use. It would be great if someone built CA tooling on top of Go's native crypto/x509 API.
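
For comparison, even the minimal happy path with stock openssl is a pile of incantations. A rough sketch (all file names and subjects are made up):

  # one-time: a self-signed root CA
  openssl genrsa -out ca.key 2048
  openssl req -x509 -new -key ca.key -days 3650 -subj "/CN=etcd-ca" -out ca.crt
  # per node: key, CSR, and a cert signed by the CA
  openssl genrsa -out node1.key 2048
  openssl req -new -key node1.key -subj "/CN=node1" -out node1.csr
  openssl x509 -req -in node1.csr -CA ca.crt -CAkey ca.key \
      -CAcreateserial -days 365 -out node1.crt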

I would love to have a discussion (or patches!) around revocation and status checking in the next version. The project is young but we want to get security right from the beginning.


The last time I checked, I couldn't find any projects that looked very appealing. There's the (now defunct) Python CA project: http://www.pyca.de/

Then there's http://www.openca.org/ (which I've not got any experience with), and of course plain openssl -- which I've always felt is more of a proof of concept than anything really useful.

We need a simple, opinionated project, something along the lines of salt/NaCl -- and I suspect Go would be a great implementation platform.

The only sane option I know of would be Microsoft AD with a CA setup -- but that's not Free, not open, and not really a good fit for this purpose either [edit: to clarify - it's much too big, and does much more than needed. I'm also not sure how well it works in a setup where you want to keep the root keys offline as much as possible].


https://fedorahosted.org/certmaster -- it's Python, but this is actually really good stuff.


That looks interesting. I will check it out. I will collect other ideas over here: https://github.com/coreos/etcd/issues/99


I am still a little confused. Is CoreOS designed to be used inside a container, or as a host OS to other containers?


CoreOS is like dom0 and your containers are domUs. Shared kernel, but LXC for isolation.

We need a better way to explain this...


Reusing the VM terminology "host" and "guest" would have made it clear for me.

Does it include a zero-copy network driver?


Optionally, by giving the container direct access to the NIC. Docker will do more iptables/veth-pair magic, but if you need direct access you can use other methods. We like systemd-nspawn, since it integrates nicely with systemd.
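
The direct-access case boils down to moving the interface into the container's network namespace. A sketch, assuming you know the PID of the container's init:

  # hand the physical NIC eth1 to the container; it disappears from the host
  ip link set eth1 netns $CONTAINER_PID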


Off topic, but what are the pros of going this route instead of bare metal or virtual machines?

Don't get me wrong, it's cool to see some innovation in this space.

These made me a little bit wiser :) http://en.wikipedia.org/wiki/LXC http://en.wikipedia.org/wiki/Docker_%28Linux_container_engin...


A host OS.



