

Etcd 0.3.0 – Improved Cluster Discovery, API Enhancements and Windows Support - polvi
https://coreos.com/blog/etcd-0.3.0-released/

======
vidarh
The discovery mechanism looks clumsy to me.

There's no way we'd rely on a public discovery service, for example. If we're
going to hardcode configuration information - such as the URL of a discovery
service that may or may not be up or reachable, we might as well hardcode the
addresses of a few peers.

And running a second etcd cluster to bring up the main one seems pointless.
Either it's turtles all the way down, or you then need to hardcode config
information for the second cluster, in which case it serves little purpose.

I'd rather have a mechanism where each peer takes a list of possible peers and
tries to connect, with a method for deciding when there is quorum to elect an
initial leader and start allowing writes. That's easy enough: introduce a
config option marking a peer as "blessed" to take part in the initial
leadership election, and require some number of blessed peers to be connected
before an election can happen. That number just needs to be a majority of the
blessed peers, to prevent more than one subset from electing a leader before
they all manage to connect.
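The "blessed majority" rule could be sketched in a few lines (purely a hypothetical illustration of the proposal, not etcd code; all names are made up):

```python
def can_elect(blessed_peers, connected_peers):
    """Allow the initial leader election only once a majority of the
    'blessed' peers are connected. Any two majorities of the blessed
    set overlap, so two disjoint subsets can never both reach quorum
    and elect rival leaders."""
    needed = len(blessed_peers) // 2 + 1
    present = len(set(blessed_peers) & set(connected_peers))
    return present >= needed
```

With three blessed peers, any two of them connected is enough to elect; a lone blessed peer never is, no matter how many unblessed peers it can see.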

Am I missing something?

~~~
ballard
This is why top-down, single-source of truth is mostly the best way for core
infrastructure. No pun intended. It's important for some parts to not be
infinitely reconfigurable or dynamic because it would create chaos and service
dependency deadlock/DoS. Most people not running bare metal don't know the
lessons of how and why things underneath are the way they are, or the sensible
limits on what's possible.

It's also more secure because if it's possible to dictate all of the service
details, it's much easier to run a lean and locked-down infrastructure.

CoreOS by itself looks great. More JEOS distros need to happen. Trying to
bundle more unproven things just makes the product look marketing-led.

~~~
nl
_This is why top-down, single-source of truth is mostly the best way for core
infrastructure._

What happens when your single source of truth goes down?

etcd attempts to provide a source of truth by using the Raft[1] consensus
algorithm to determine that "truth" in an environment where relying on just
one place is too risky.

The Raft and Paxos[2] algorithms (Paxos underlies Google Chubby; ZooKeeper
uses the closely related ZAB protocol) give _guarantees_ for consensus in an
environment with unreliable hardware. That is better than a _single_ source of
truth, where you get nothing when that source disappears.
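The core of both guarantees is the majority-quorum property: any two majorities of the same cluster share at least one node, so two partitioned groups can never both elect a leader or commit conflicting values. A quick check of that property (an illustration only, not etcd code):

```python
from itertools import combinations

nodes = set(range(5))            # a five-node cluster
quorum = len(nodes) // 2 + 1     # 3 of 5

# Enumerate every possible majority of the cluster.
majorities = [set(c)
              for size in range(quorum, len(nodes) + 1)
              for c in combinations(nodes, size)]

# Every pair of majorities overlaps in at least one node.
assert all(a & b for a in majorities for b in majorities)
```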

[1] [http://raftconsensus.github.io/](http://raftconsensus.github.io/)

[2]
[http://en.wikipedia.org/wiki/Paxos_(computer_science)](http://en.wikipedia.org/wiki/Paxos_\(computer_science\))

------
nl
Discovery looks interesting. Can it be used for a client to discover a
cluster?

I've been digging into Docker link containers a bit.

I'm not entirely comfortable with how they work, and I'm not really sure why.
The only thing I can put my finger on is that I feel like discovery is a
separate concern from deployment. But at the same time they are so closely
linked I can understand why Docker needs to tackle it.

Is there a way etcd can work better with Docker links? Maybe it could
automatically read/write Docker published ENV variables or something? Though I
don't think that will quite work across physical machines without some
additional work.

~~~
philips
> Discovery looks interesting. Can it be used for a client to discover a
> cluster?

It certainly could be extended for that. We haven't done anything with
discovery to help with that process yet though.

> Is there a way etcd can work better with Docker links?

There is a lot of room to explore here. Including:

\- Automatically registering exposed ports from running containers into etcd
for service discovery.

\- Using information in etcd to manage a smart proxy that would proxy
requests. This would be the "Link via an Ambassador" but etcd would help
coordinate.

Both of these would be good things to build and see how they work and feel in
practice.
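The first idea could start out as small as this (a sketch only; the /services key layout and function name are invented for illustration, not an existing CoreOS tool):

```python
def registration_keys(service, host_ip, exposed_ports):
    """Map a container's exposed ports to etcd keys that a discovery
    client could read or watch, e.g. /services/web/80 -> 10.0.0.5:80.
    The key scheme is an assumption, not a standard."""
    return {"/services/%s/%d" % (service, port): "%s:%d" % (host_ip, port)
            for port in exposed_ports}
```

Writing each key with a TTL and refreshing it while the container is alive would make stale registrations expire on their own when a container dies.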

There are also some interesting alternatives to doing the links thing such as
using some network namespace tricks to avoid passing metadata around via
environment variables and simply expose services directly into a container:
[https://coreos.com/blog/Jumpers-and-the-software-defined-localhost/](https://coreos.com/blog/Jumpers-and-the-software-defined-localhost/)

~~~
nl
> Discovery looks interesting. Can it be used for a client to discover a
> cluster?

>> It certainly could be extended for that. We haven't done anything with
>> discovery to help with that process yet though.

So what is the current best practice for etcd discovery? Environment
variables? mDNS might be nice too (though I guess mDNS is kinda, almost, an
alternative for a subset of what etcd does).
[http://blogs.gnome.org/danni/2014/02/02/a-libnss-plugin-for-docker/](http://blogs.gnome.org/danni/2014/02/02/a-libnss-plugin-for-docker/)
looks interesting too.

I read the Jumpers post when it was on HN. I need to think about that some
more.

------
bkirwi
I'm curious to know more about the garbage collection of stale peers.

\- AFAIK, etcd is built on Raft, which relies on a 'joint majority' method of
transitioning cluster membership. Are there any issues forming agreement on
what the new membership should be when it's unclear which nodes are still
supposed to be part of the cluster?

\- In the land of ZooKeeper, cluster configurations are typically very, very
stable, so tracking membership takes very little information. Is etcd targeted
at more dynamic environments where the garbage generated by entering and
leaving nodes is significant?

~~~
philips
etcd isn't designed for significantly dynamic environments but we certainly
want to make it easy for people to add and remove machines at runtime. We have
basic support for runtime cluster configuration but are building out a more
complete API. The current proposal that is being built can be found over here:
[http://thread.gmane.org/gmane.comp.distributed.etcd/168](http://thread.gmane.org/gmane.comp.distributed.etcd/168)

As you point out there is no magic garbage collection of stale peers. You have
to explicitly delete nodes that you don't expect to be participating any
longer.

------
baghali
Has anyone checked out Serf ([http://www.serfdom.io/](http://www.serfdom.io/))?
If so, what are the pros and cons of etcd versus Serf?

------
aabalkan
The Windows support in particular is cool news. I've been following the etcd
project; it's a serious effort and could change the server configuration
world. Thanks Alex!

------
cenkalti
Congratz guys, this is a big improvement over 0.2.0. Keep up the good work!

------
ballard
This is a pure marketing story pushing a bad solution.

Hiera, a simple hierarchical property distribution system using a ZooKeeper
backend, plus Puppet or Chef, is far superior. Etcd is the PHP of
configuration management.

~~~
danieldk
Wow. I know neither etcd nor Hiera. But please provide some arguments against
etcd if you think it is bad, rather than just bashing it.

It would be interesting to know the cons and pros of both solutions.

~~~
ballard
I'm bashing it because it oversimplifies the problem and claims to be a
solution that doesn't address the irreducibility of the problem.

It's great for a host file but scp can do that.

It doesn't coordinate the transition of services to other states. Puppet does
an incredible amount of work (dependency satisfaction) to get things where you
want them to be. Configuration management is as much about managing the order
of operations as it is about what files should contain. There's not much
separating the two, because you usually have to change both between states.
Chef does this too, but in a simpler way.

CFEngine was a past attempt, but it really complicates things by duplicating
the work between what runs on the server and what runs on a managed node.

~~~
nl
I'm not sure what problem you think etcd is attempting to solve, but if your
solution involves Puppet or Chef then it isn't surprising you find etcd isn't
sufficient for your purposes.

etcd is a highly available configuration store. That's all. It's designed for
applications to use to store configuration, so they can be started & stopped
without relying on file based storage. Generally speaking these applications
have to be coded to use etcd (which is completely different to Chef/Puppet,
which are Ops tools). Sure, you could do something to write etcd values to a
file, and then have them used by any program, but that misses the point.
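For a flavour of what "coded to use etcd" means: the client API is plain HTTP over keys. A minimal sketch of the requests an application would issue (assuming etcd's v2 keys API on the default client port 4001; actually sending these requires a running etcd):

```python
import urllib.parse

ETCD = "http://127.0.0.1:4001"  # assumed local etcd on the default client port

def put_request(key, value):
    """Build the request that stores a value: PUT /v2/keys/<key>
    with the value form-encoded in the body."""
    return ("PUT", "%s/v2/keys/%s" % (ETCD, key),
            urllib.parse.urlencode({"value": value}))

def get_request(key):
    """Build the request that reads the value back (etcd replies
    with JSON)."""
    return ("GET", "%s/v2/keys/%s" % (ETCD, key), None)
```

An app can also long-poll a key with `?wait=true` to get notified of changes, which is what makes etcd usable for coordination rather than just storage.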

Your criticisms are so completely irrelevant they aren't even wrong.

~~~
uniclaude
No bashing against etcd from me, but can you point me to a reason why a
regular database wouldn't work for this type of thing?

High availability, throughput and data storage seem to fit into what a
database is made for.

~~~
vidarh
You'd think so, but most RDBMSes are a lot of work to set up in a way that
provides the kind of consistency guarantees and availability that a
configuration store like this does.

E.g. take Postgres. You can't connect to the slaves and write, so your client
will need to know how to identify which server is currently the master. And if
the master fails, you need to implement a leadership election method, promote
the chosen slave, and resync your other slaves from the new master.

Before you know it you've reimplemented the configuration store with a
different data storage mechanism. There's very little intersection between the
typical RDBMS and a configuration store like this.

~~~
uniclaude
Well, even with your point there are a couple reasons why I could imagine
databases making sense here.

One is that you might already have people on your team who are used to dealing
with them in a highly available manner. HA with Postgres is not exactly an
area of research anymore. Most people already using Postgres in their stack
know about tools like pgbouncer.

Another is that there are other databases, like Riak, that can talk HTTP and
have those problems solved.

I'm here more trying to understand the features that would make one pick etcd
over a highly available database. Thanks for taking the time to explain.

~~~
nl
They solve different problems.

In CAP theorem[1] terms, etcd provides Consistency and Partition-Tolerance,
while databases generally provide Consistency and Availability.

That's important, because it shows the kind of problem each is trying to
solve.

[1]
[http://en.wikipedia.org/wiki/CAP_theorem](http://en.wikipedia.org/wiki/CAP_theorem)

~~~
uniclaude
Great, that's exactly the answer I was looking for. Now everything makes
sense.

------
Touche
I can't figure out what etcd is. Any help?

~~~
jarito
It is a distributed configuration store for clusters, like ZooKeeper. It
simplifies creating and sharing (mostly configuration) data in clustered or
distributed systems.

~~~
Touche
I don't know what that means or what zookeeper is... Can you explain the
problem it solves?

~~~
ballard
ZooKeeper is a very specialized distributed filesystem for saving and
delivering important details for very large systems. Getting concurrent
distributed systems like this right is the equivalent of summiting Mount
Everest, starting by walking from Paris. It was mostly developed at Yahoo,
spun off as an Apache project under Hadoop, and commercialized by Cloudera.
Netflix uses ZK, for example.

~~~
ithkuil
so you are basically saying that you don't trust the people who designed etcd.

