Red Hat contributes etcd to the Cloud Native Computing Foundation (redhat.com)
256 points by pplonski86 on Dec 14, 2018 | 35 comments

> it demonstrated its quality early on

I find that slightly revisionist though I suppose it depends on your definition of "early on".

To my recollection etcd had a very rough patch early on, until they overhauled their raft subsystem. HashiCorp caught some flak at the time for giving that raft implementation a hard pass and writing their own. Go ecosystem fragmentation was particularly bad and a very hot topic at the time. I believe that etcd's newfound stability and subsequent track record after the new implementation vindicated HashiCorp somewhat.

Let me share a little bit more story here.

The original go-raft was one of the first raft implementations. At that time, the raft paper had not even been officially published. Many other attempts from around that time were not very successful, go-raft included.

Making a production-ready consensus implementation is not easy: https://www.cs.utexas.edu/users/lorenzo/corsi/cs380d/papers/.... Things like pipelining, batching, flow control, and asynchronous snapshots had not been extensively explored in the context of raft. And not much effort had been put into testing, due to the immaturity of raft's applications at the time.

We realized the problem a few months after the etcd alpha was initially released and became popular. However, I went back to CMU for a year to continue my master's degree, which slowed down the progress.

After I came back from school, together with Blake and Yicheng from CoreOS, and later on Ben from Cockroach Labs, we built a solid raft implementation as our first priority. Once we put etcd/raft inside etcd2, the stability of etcd greatly improved. That was about 1.5 years after the initial release. Now etcd/raft powers many production-grade distributed systems: TiKV, CockroachDB, Dgraph, and many others.

Over the last couple of years, the focus of etcd/raft has always been stability and nothing else (although people blame us for usability :P).

etcd has succeeded as a piece of distributed systems infrastructure beyond our wildest expectations. When Alex Polvi, Xiang Li, and I started the project as a README in the summer of 2013, we identified that there still was no consensus database that was developer friendly, easily secured, production ready, and based on a well-understood consensus algorithm. And largely we got lucky with good market timing, the invention of the Raft algorithm, and the explosion of good tooling around the Go language. This led to early success as etcd was used in locksmith, SkyDNS, and the vulcan load balancer.

As the years went on we got lucky again when the Kubernetes project chose etcd as its primary key-value database. This helped establish the project as a must-use piece of infrastructure software, which went on to influence the technology selection of storage, database, networking, and many other projects. Just check out all of the stickers of projects relying on etcd that I could find at KubeCon here in Seattle: https://twitter.com/BrandonPhilips/status/107370136987218739...

For a sense of all of the projects that use etcd, check out this list we maintain in the project: https://github.com/etcd-io/etcd/blob/master/Documentation/in...

Some notable projects include: Kubernetes, Rook, CoreDNS, Uber M3, Trillian, Vitess, TiDB, and many many others.

Moving into the CNCF will help to bring a few things to the project:

- Funding and resources to complete regular third-party security audits and correctness audits

- On-call rotation and team for the discovery.etcd.io system

- Assistance in maintaining a documentation website

- Resources to fund face-to-face meetup groups and maintainer meetings

As a closing remark I want to thank the over 450 contributors and the entire maintainer team for bringing the project to this point. We are solving an important distributed systems problem with a focused piece of technology.

In fact, in Seattle this week we all got together as a maintainer team for the first time ever: https://twitter.com/sp_zala/status/1073239003330015233

If you want to learn more about the history of the project, check out this other blog post: https://coreos.com/blog/history-etcd

It was a lot of fun watching the project and community evolve. I think you and the team did an excellent job. I remember a huge spike in users around when discovery.etcd.io launched. It was really a game changer for us building large-scale multi-data-center telecom systems. I still remember bootstrapping the first cluster in a 24-data-center test and having things blow up, particularly in higher-latency (cross-DC) environments.

Fast-forward 4 months, and the project had grown and scaled to support the influx of new curious devs and use cases that stretched the bounds of what was possible at the time. At the end of those 4 months, we had a 128-node cluster that stayed up for years and still powers all of the emergency notifications in a few US states!

Woah! I would love to get this testimonial in our production users doc!


Docker Swarm Mode also embeds etcd.

(The embedding mechanism is copy-paste, which I find both ingenious and a bit distasteful. Maybe I’m just sore I didn’t think of it first)

I wrote the initial implementation of the raft subsystem, and it was definitely not a copy/paste. We started from scratch (using etcd's core raft) with the transport layer being gRPC. My initial experiment can be found in this repository [1]. I then took the code from my initial experiment and included it in SwarmKit [2]. From there we went through many iterations on the initial code base and improved the UI with Docker swarm `init`/`join`/`leave` to make the experience of managing the cluster "friendly".

We spent quite some time evaluating different raft and paxos implementations (mostly the Consul and etcd raft libraries), and found etcd's to be the most stable and flexible for our use case. It was very easy, for example, to swap the transport layer to use gRPC. The fact that the etcd implementation is represented as a simple state machine also makes it much easier to reason about under complex scenarios for debugging purposes, instead of digging into multiple layers of abstraction.
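The "simple state machine" point can be illustrated with a self-contained sketch (my own toy code, not etcd's or SwarmKit's): every replica deterministically applies the same ordered log of entries, so replicas that have consumed the same log prefix hold identical state, which is exactly what makes debugging tractable.

```go
package main

import "fmt"

// entry is one committed log record; in a real system it would come
// out of the raft library's committed-entries channel.
type entry struct{ key, val string }

// stateMachine is a minimal replicated state machine: a key-value map
// plus a cursor into the log.
type stateMachine struct {
	applied int
	kv      map[string]string
}

func newStateMachine() *stateMachine {
	return &stateMachine{kv: map[string]string{}}
}

// apply consumes committed entries in log order. Determinism here is
// what guarantees that independent replicas converge.
func (s *stateMachine) apply(log []entry) {
	for ; s.applied < len(log); s.applied++ {
		e := log[s.applied]
		s.kv[e.key] = e.val
	}
}

func main() {
	log := []entry{{"a", "1"}, {"b", "2"}, {"a", "3"}}
	r1, r2 := newStateMachine(), newStateMachine()
	r1.apply(log)
	r2.apply(log)
	// Both replicas applied the same log, so they agree on every key.
	fmt.Println(r1.kv["a"], r2.kv["a"])
}
```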

In retrospect, this came with quite a learning curve. We had to deal with issues caused by our own misunderstandings of how to use the library properly. At the same time, the fact that the developers favored stability over user friendliness was exactly what we found attractive about etcd's raft. Additionally, the CoreOS developers were super friendly and helped us fix these issues. We reported and fixed some bugs as well. Kudos to them for all the help they provided at the time.

[1] https://github.com/abronan/proton [2] https://github.com/docker/swarmkit/commit/89de50f2092dfd2170...

I apologise for my misunderstanding.

What I remember is, during DockerCon in June 2016, I went into the code to see how it worked, and I found a top-level file setting up data structures and handlers that seemed to be 90% the same as the equivalent file in etcd. And the underlying implementation was reused via vendoring.

Maybe this rings a bell with you and you can tell me what I saw, because I can't find it now.

Maybe I dreamed the whole thing.

I did, and still do, think integrating etcd into Swarm Mode was a masterstroke; we had spent the previous two years working to avoid "first you must install etcd" in a different way that nobody got. Afterwards we created kubeadm to ape the 'init' and 'join' functionality.

Are you sure? I’ve spent quite some time playing with the internals of Docker Swarm / swarmkit last year and I’m quite confident it wasn’t true then. As far as I know they call go-raft directly because they only need a fraction of the features offered by etcd.

It has used etcd/raft from the beginning.

It is indeed work that you and your team should be proud of.

Any thoughts on rkt?

rkt was needed to push a number of ideas forward in the ecosystem at the time (4 years ago, 2014) and part of its legacy is the creation of technologies that provided plugin interfaces for the container ecosystem.

The Container Networking Interface was directly created by the work in rkt and continues on today inside of Kubernetes and the CNCF. This work made it possible for an ecosystem of networking solutions to exist that could take advantage of everything Linux has to offer.

The creation of the Kubernetes Container Runtime Interface (CRI) was also spawned, in part, by the existence of rkt and the need to consider container runtimes for use with Kubernetes. It was a long hard engineering effort but I think the separation that CRI forced the kubelet to go through and the competition of various runtimes is good for the ecosystem and the resilience of the Kubernetes project.

It is very unlikely that rkt will be part of the Kubernetes ecosystem at this point, given the existence of containerd and CRI-O as Kube CRI solutions on Linux. And there were missed opportunities on a variety of fronts along the way. But rkt continues to be used by many organizations for other niche container use cases, and the shifts that rkt caused above were positive improvements for the Kubernetes ecosystem.

Thanks for the thoughtful reply.

How does etcd compare to zookeeper? What made etcd the choice for kubernetes and cncf?

A few reasons we didn't use ZK at the time (some of these are out of date). ZooKeeper had:

- No TLS security story

- An abandoned RPC/serialization system that was hard to use in other languages

- A consensus algorithm that differed from systems described in literature

- A large RAM footprint

A while ago some etcd engineers in fact ran an experiment to try running the ZK client protocol on etcd with a proxy:

https://github.com/etcd-io/zetcd https://coreos.com/blog/introducing-zetcd

Today, etcd performs much better than ZK and I believe it is much more widely deployed with a wider set of engaged users.


Pre-3.5.0, ZooKeeper reconfiguration of a running cluster was also much harder; that was a significant discussion point on Kubernetes when we had the etcd vs. (anything) discussions early after open sourcing.

I still think etcd's total ordering over history also made reasoning about changes in the system easier while we were writing the first versions of the controllers, caches, and list-watch loops. ZK had partial order, and I was leery of that at the time.
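A self-contained sketch of why a single total order simplifies a list-watch cache (my own illustration, not Kubernetes or etcd code): when every event carries one global revision, the cache needs only a single monotonic check to reject stale or duplicate deliveries, instead of reasoning about per-key orderings.

```go
package main

import "fmt"

// event models a watch notification carrying a single, globally
// ordered revision, as etcd's mvcc store provides.
type event struct {
	rev      int64
	key, val string
}

// cache is a watch-driven cache tracking the highest revision applied.
type cache struct {
	rev int64
	kv  map[string]string
}

// apply accepts an event only if it advances the global revision,
// so stale and duplicate deliveries are trivially detected.
func (c *cache) apply(e event) error {
	if e.rev <= c.rev {
		return fmt.Errorf("stale event at rev %d (cache at rev %d)", e.rev, c.rev)
	}
	c.kv[e.key] = e.val
	c.rev = e.rev
	return nil
}

func main() {
	c := &cache{kv: map[string]string{}}
	events := []event{{1, "a", "x"}, {2, "b", "y"}, {2, "b", "y"}}
	for _, e := range events {
		if err := c.apply(e); err != nil {
			fmt.Println("rejected:", err)
		}
	}
	fmt.Println("cache at rev", c.rev)
}
```

With only a partial order, the "is this event stale?" question has no single-comparison answer, which is the leeriness described above.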

ZooKeeper uses a coordination kernel, while etcd uses a replicated state machine.

What are the pros and cons? Should etcd be the default whenever one is thinking about using something like that?

It really depends on your use case but one of the main "pros" of etcd is the narrow latency band when writing.

This article is likely biased toward the good parts of etcd, as it's written by CoreOS, but you can see how the latency of writes in etcd is very consistent compared to the wide range of latencies experienced writing to ZooKeeper or Consul:


There are other "pros" related to the fact that it's been designed for "cloud native" architectures like Kubernetes. For example, FoundationDB can perform on average at sub-millisecond latency for writes (https://apple.github.io/foundationdb/benchmarking.html) versus 1.6ms on etcd; however, configuring FoundationDB programmatically is challenging, as it was designed in an environment where ops people rack physical servers.

All key/value stores have good points and bad points, but that's in relation to your use case. If write or read throughput isn't the most important thing (say it's consistency or availability), you may make a different choice about what are "pros" and what are "cons".

Another "pro" or "con" may be the language its written in or how it runs or deploys. If you run a Java shop and have tons of experience writing and deploying Java code, it may be in your best interest to be able to have more control by using a project written in Java. conversely, if you have all go engineers, you may want a project written in go. If you only have junior engineers, you may want whatever is easiest to operate and deploy.

I was wondering when this was going to happen. It's such a critical piece of Kubernetes.

This is exactly the kind of thing that should be a layer on top of FoundationDB. Too many state stores all doing largely the same thing.

FoundationDB has only been open source for 8 months?? etcd is like 5 years old.

FoundationDB started in 2009 and released their 1.0 in 2013. Source: https://en.wikipedia.org/wiki/FoundationDB

From the same source:

> On April 19, 2018, Apple open sourced the software, releasing it under the Apache 2.0 license.

I meant going forward, not saying that etcd should have originally been built on top of it.

I wonder if this is, at its core, Red Hat knowing that the clock is ticking and making sure that critical software they worked on remains available in an open fashion. One only has to look at Sun and MySQL to see what can happen to a once-vibrant open source offering after acquisition.

I don't understand this line of thinking. IBM has plenty of people contributing to open-source projects. I wouldn't be surprised if they contributed to etcd even before the acquisition. When it comes to their open-source track record, IBM and Oracle are nothing alike.

Red Hat does not own anything that's valuable aside from their developers, who chose to work at Red Hat due to their pro-FOSS positioning. If IBM chose to start shutting projects like Fedora down or move in the direction of closed-source, these developers would have no desire to remain, and would leave, making that $40B acquisition worthless.

Red Hat has extensive customer relationships, customer databases, contracts for future revenue, partnerships, operational processes, . . . product IP is just a slice of the pie.

It was proposed in July (https://github.com/cncf/toc/issues/136) and voted on in September (https://github.com/cncf/toc/pull/143), all before the IBM/Red Hat deal was announced or known to the staff.

The announcement was held for publicity at KubeCon.

Glad to see that Red Hat is still committed to the Open Source movement after so many naysayers predicted that all contributions going forward would be stymied due to the announcement of the acquisition.

Not saying it will happen (just that it usually does), but I almost never see acquired companies immediately turn into the parent. Usually the acquired maintains its course until attrition and cross-pollination replace its original culture with that of the parent. The acquired company eventually exists only as a collection of intellectual property and history. Could take many years depending on how tightly IBM squeezes. That and the acquisition isn't even finalized yet.

I wonder if "We should get this taken care of before IBM takes over" was part of the thought process here.

The acquisition hasn't even happened yet, so there's no influence to see just yet. In fact, it's illegal for Red Hat to make changes on IBM's behalf prior to the acquisition closing.

How is etcd security these days? Authentication still off by default?

