
The etcd operator: Simplify etcd cluster configuration and management - polvi
https://coreos.com/blog/introducing-the-etcd-operator.html
======
darren0
I would love to understand the design rationale for why a custom controller
is needed to run etcd, as opposed to leveraging existing k8s constructs such
as replicasets or petsets. While this is a very useful piece of technology,
it gives me the wrong impression that if you want to run a persistent or
"more complicated" workload then you must develop a significant amount of
code for that to work on k8s. I don't believe that is the case, which is why
I'm asking why this route was chosen.

~~~
theptip
The FAQ at the end of the OP addresses this:

"Q: How is this different than StatefulSets (previously PetSets)?

A: StatefulSets are designed to enable support in Kubernetes for applications
that require the cluster to give them "stateful resources" like static IPs and
storage. Applications that need this more stateful deployment model still need
Operator automation to alert and act on failure, backup, or reconfigure. So,
an Operator for applications needing these deployment properties could use
StatefulSets instead of leveraging ReplicaSets or Deployments."

There is inevitably some app-specific logic required to modify a complex
stateful deployment; the Operator encapsulates this logic so that the external
interface is a simple config file.

~~~
philips
Darren, the FAQ is at the overview post here:
[https://coreos.com/blog/introducing-
operators.html](https://coreos.com/blog/introducing-operators.html)

~~~
darren0
Thanks, but this alludes to more operators coming. Why is such a thing
really needed? The proliferation of this approach seems like a potential
downfall of k8s. It is similar to Mesos, where a framework is quite powerful
but the cost of developing one is too high. This blog post basically implies
that the k8s base constructs can't run postgres, redis, prometheus, etcd,
cassandra, etc. But why? Are we saying that stateful services fundamentally
require one-off, domain-specific logic to run in k8s?

~~~
philips
I don't think stateful applications require the use of something like an
Operator. It really just comes down to where the state lives. For example,
if you want to run your database on top of EBS, a SAN, or something like
that, it is no problem to just throw up a StatefulSet and go for it.

However, if you start to think about things like orchestrating the scaling
of databases that have an administrative tool, like Cassandra, Vitess,
RethinkDB, read-replicated Postgres, etc., you need some glue that
integrates those admin tools with the Kubernetes APIs. And that is what a
minimal Operator should do.
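
The core of that glue is just a reconcile step. A minimal sketch, where the
ClusterSpec type and the admin API are made up to stand in for the database's
real administrative interface:

    package operator

    // ClusterSpec is the desired state declared in the third party resource.
    type ClusterSpec struct {
        Size int // desired number of members
    }

    // AdminAPI stands in for the database's own administrative tool.
    type AdminAPI interface {
        ListMembers() ([]string, error)
        AddMember() error
        RemoveMember(id string) error
    }

    // reconcile compares the declared size with what the database reports
    // and nudges the cluster toward the desired state.
    func reconcile(spec ClusterSpec, admin AdminAPI) error {
        members, err := admin.ListMembers()
        if err != nil {
            return err
        }
        switch {
        case len(members) < spec.Size:
            return admin.AddMember() // and create the backing pod via the k8s API
        case len(members) > spec.Size:
            return admin.RemoveMember(members[len(members)-1]) // and delete its pod
        }
        return nil // converged; nothing to do
    }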

The other thing Operators can do is glue existing software to the Kubernetes
APIs. Which is what the Prometheus Operator does. Prometheus has its own
configuration system for finding monitoring targets; instead of forcing the
user to drop down into that different format the Operator adapts Kubernetes
concepts like label queries and generates the equivalent Prometheus config.
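
In sketch form, that translation looks roughly like this (illustrative only;
the real operator generates a complete Prometheus config, but
kubernetes_sd_configs and keep-style relabeling are standard Prometheus
features):

    package main

    import "fmt"

    // scrapeConfigFor turns a Kubernetes label query into a Prometheus scrape
    // config that keeps only endpoints whose Service carries those labels.
    func scrapeConfigFor(job string, serviceLabels map[string]string) string {
        cfg := fmt.Sprintf("- job_name: %s\n  kubernetes_sd_configs:\n  - role: endpoints\n  relabel_configs:\n", job)
        for name, value := range serviceLabels {
            cfg += fmt.Sprintf("  - source_labels: [__meta_kubernetes_service_label_%s]\n    regex: %s\n    action: keep\n", name, value)
        }
        return cfg
    }

    func main() {
        fmt.Print(scrapeConfigFor("web", map[string]string{"app": "frontend"}))
    }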

Overall I don't think this is required for every application. Static
databases persisting to shared storage, stateless applications, and
cluster-wide daemons all fit nicely into the Kubernetes abstractions that
existed before Operators. But there is a class of clustered applications
that are served well by this pattern.

~~~
tristanz
I think philips is exactly right. The whole design of Kubernetes is geared
toward letting users write their own controllers for advanced use cases. You
could view something like Jenkins or Vitess as a controller, because they
spawn Kubernetes pods on demand. The beauty is that Kubernetes gives you
great primitives, so you will often be controlling these objects, not the
underlying pods. Of course, many simple applications don't need a
controller, although I suspect more and more simple use cases will be
managed by an external controller like Helm, which orchestrates the
lifecycle of applications.

The question is: why isn't this just called a controller? Why the new term
Operator?

~~~
philips
Controller didn't quite capture the combination of an application-specific
controller and a third-party resource used to manage a collection of
user-created application instances.

So we arrived at Operator. It felt like a good term that we could put after
X and that encapsulates the intent of the pattern: it helps you operate
instances of an application.

------
hatred
The concept of custom controllers looks similar to what schedulers are in
Mesos. It's nice to see the two communities taking a leaf out of each
other's books, e.g., Mesos will introduce experimental support for task
groups (aka Pods) in 1.1.

Disclaimer: I work at Mesosphere on Mesos.

~~~
ideal0227
Yea. They are similar in functionality.

But they work differently. The Operator does not really “schedule”
containers; it implements its control logic through the Kubernetes APIs. For
example, it uses native Kubernetes health checking, service discovery, and
deployments. It works completely on top of the Kubernetes API, so no
specialized scheduler, executor, or proxy is needed, compared to
[https://github.com/mesosphere/etcd-mesos/blob/master/docs/ar...](https://github.com/mesosphere/etcd-mesos/blob/master/docs/architecture.md).
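
For example, member discovery can just be an ordinary Kubernetes Service
created through the standard API. A rough sketch, assuming a recent
client-go (the label keys are illustrative and error handling is minimal):

    package main

    import (
        "context"
        "log"

        corev1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/rest"
    )

    func main() {
        cfg, err := rest.InClusterConfig()
        if err != nil {
            log.Fatal(err)
        }
        client, err := kubernetes.NewForConfig(cfg)
        if err != nil {
            log.Fatal(err)
        }

        // An ordinary Service selecting the etcd member pods gives clients a
        // stable discovery endpoint; no custom proxy is needed.
        svc := &corev1.Service{
            ObjectMeta: metav1.ObjectMeta{Name: "example-etcd-client"},
            Spec: corev1.ServiceSpec{
                Selector: map[string]string{"app": "etcd", "cluster": "example"},
                Ports:    []corev1.ServicePort{{Name: "client", Port: 2379}},
            },
        }
        if _, err := client.CoreV1().Services("default").Create(
            context.TODO(), svc, metav1.CreateOptions{}); err != nil {
            log.Fatal(err)
        }
    }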

The advantage of Mesos is that it exposes lower-level APIs and resources to
allow more control. The etcd operator we built does not really need that.
Building this kind of application operator may be simpler on k8s than on
native Mesos.

Disclaimer: I work at CoreOS on Kubernetes and etcd.

------
ex3ndr
Can someone clarify some points?

* Isn't etcd2 required to start Kubernetes? I found that if etcd2 is not healthy or the connection is temporarily lost, then k8s just freezes its scheduling and API. So what if the Operator and etcd2 are running on one node and that node goes down? I also found that etcd2 freezes even when one node is down. Isn't that an unrecoverable situation?

* The k8s/CoreOS manuals recommend keeping etcd2 servers fairly close to each other, mostly because etcd has very strict network requirements (around 5 ms ping) that some pairs of servers couldn't meet.

* What if we lose ALL nodes and the Operator creates an almost-new cluster from backups? And what if we need to restore the latest version, not one from 30 minutes ago?

~~~
philips
1) Yes, Kubernetes relies on etcd as its primary database. Right now the etcd
Operator does not tackle trying to manage the etcd that Kubernetes relies on.
But! We are working on that as part of our self-hosted work
[https://coreos.com/blog/self-hosted-
kubernetes.html](https://coreos.com/blog/self-hosted-kubernetes.html). Stay
tuned.

2) etcd can deal with latencies up to seconds long for, say, a globally
replicated etcd. But! You need to tune etcd to expect that latency so it
doesn't trigger a leader election. See the tuning guide:
[https://coreos.com/etcd/docs/latest/tuning.html](https://coreos.com/etcd/docs/latest/tuning.html)
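
The rule of thumb from that guide, roughly sketched: set the heartbeat
interval near the worst-case round-trip time between members and the
election timeout to about ten times that. The numbers below are
illustrative, but --heartbeat-interval and --election-timeout are the real
etcd flags:

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        // Measured worst-case round-trip time between the farthest members.
        maxPeerRTT := 200 * time.Millisecond

        heartbeat := maxPeerRTT    // heartbeat interval roughly at the peer RTT
        election := 10 * heartbeat // election timeout roughly 10x the heartbeat

        // Both flags take milliseconds.
        fmt.Printf("etcd --heartbeat-interval=%d --election-timeout=%d\n",
            heartbeat.Milliseconds(), election.Milliseconds())
    }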

3) The backups are something that we are just getting to with the etcd
Operator. Our intention is to help you create backups and create new clusters
from arbitrarily old backups, but that work hasn't started yet.

------
jbpetersen
Being someone who's been getting more familiar with backend engineering
lately and has been trying to make sense of the various options, I've got a
strong enough impression of CoreOS that I'm betting my time that it'll be
dominating the next few years.

I also can't wait to see an open version of AWS Lambda / Google Functions
appear.

~~~
duaneb
There are already lambda implementations available; I can't speak to google
functions.

~~~
jbpetersen
Is there a significant difference?

------
russell_h
I've been thinking about implementing a custom controller that would use
Third Party Resources as a way to install and manage an application on top
of Kubernetes. The way that Kubernetes controllers work (watching a
declarative configuration and "making it so") seems like a great fit for the
problem.

It's exciting to see CoreOS working in the same direction - this looks much
more elegant than what I would have hacked up.

~~~
theptip
I've been thinking the same way; the k8s Third Party Resource API really
enables some clever solutions.

While most k8s users are (from what I can tell) currently writing YAML
config files and loading them by hand (encouraged by tools like Helm and
Spread), I think that the k8s apps of the future will be more like the
Operator:

1) The 'deploy scripts' are controllers that run in your k8s cluster and
dynamically ensure the rest of your code is running, and the primitives that
you operate on will be your custom ThirdPartyResources.

2) All of the config for your app is wrapped in a domain-specific k8s
object spec; instead of writing a YAML file and uploading it as a raw
Deployment, you would create a FooService API object with just the
parameters that you actually care about for configuring your service
(sketched below).
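
To make that concrete, a hypothetical FooService object might look something
like this (all field names invented for illustration, not any real
operator's schema):

    package main

    import (
        "encoding/json"
        "fmt"
    )

    // FooService is a hypothetical third-party resource: the deployment is
    // driven by a handful of domain-specific parameters instead of a raw
    // Deployment manifest.
    type FooService struct {
        APIVersion string            `json:"apiVersion"`
        Kind       string            `json:"kind"`
        Metadata   map[string]string `json:"metadata"`
        Spec       FooServiceSpec    `json:"spec"`
    }

    type FooServiceSpec struct {
        Replicas int    `json:"replicas"`
        Version  string `json:"version"`
        Plan     string `json:"plan"` // e.g. "small" or "large"
    }

    func main() {
        obj := FooService{
            APIVersion: "example.com/v1alpha1",
            Kind:       "FooService",
            Metadata:   map[string]string{"name": "payments"},
            Spec:       FooServiceSpec{Replicas: 3, Version: "1.2.0", Plan: "small"},
        }
        out, _ := json.MarshalIndent(obj, "", "  ")
        // The in-cluster controller watches these objects and "makes it so".
        fmt.Println(string(out))
    }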

Right now it's a pain and a lot of code (>10kloc of Go for the etcd-operator!)
but I'm sure that a bunch of that could be abstracted out into a framework
that makes it easy to generate/build operators for a variety of application
use-cases.

Currently the solutions that build and deploy your code for you in k8s seem to
be PaaS replacements (Deis, Openshift), which take a very generic approach to
bundling your code. That's probably going to work for common use-cases, but I
suspect the more bespoke deployments will need something more like the
Operator approach, and I'm looking forward to seeing what tooling evolves in
this area.

------
dantiberian
This sounds a lot like Joyent's Autopilot Pattern
([http://autopilotpattern.io](http://autopilotpattern.io)), but will be more
integrated with Kubernetes, rather than being agnostic.

~~~
doublerebel
Thanks, I remember seeing the autopilot pattern mentioned on Joyent's blog,
but haven't seen that website. The lifecycle [0] looks remarkably similar to
the build and deployment steps outlined in Distelli's manifest [1]. I use
Distelli+Consul on Joyent so I suppose I've been doing the autopilot pattern
without realizing it!

I know that much of Distelli's workflow comes from the founders' experience at
AWS, so I wonder where the root of this pattern lies. Perhaps that would help
unify these similar methods.

[0]: [http://autopilotpattern.io/#how-do-we-do-
it](http://autopilotpattern.io/#how-do-we-do-it)

[1]: [https://www.distelli.com/docs/manifest/deployment-
types](https://www.distelli.com/docs/manifest/deployment-types)

~~~
0x74696d
I'm the lead developer for Joyent of ContainerPilot, which is the tool at the
core of our Autopilot Pattern implementation examples. The lifecycle events
you recognize in Distelli are definitely similar. And Chef's new tool Habitat
has a supervisor that was independently developed but ended up having
interesting parallels with ContainerPilot. So there's a universal idea lurking
under there, which is why we called Autopilot a "Pattern" rather than a tool
in itself.

But it's not clear to me from a casual glance at the docs whether Distelli
lives inside the container during those hooks? Part of the distinction of
the Autopilot Pattern is making the higher-level orchestration layer as thin
as possible.

(As far as the root, some of it is derived from my experiences as a
perhaps-foolishly-early adopter of Docker in prod at my previous gig at a
streaming media startup. The rest is derived from both the principles on
which Joyent's own Triton infra is built and our experiences speaking with
enterprise devs and ops teams.)

~~~
kt9
I'm the founder at Distelli and I just want to clarify that the Distelli
agent doesn't typically live inside the container, though it can. It's used
to orchestrate the container lifecycle on the VM itself.

However, if you're building Docker containers and deploying them, we
recommend using Kubernetes, which is something that Distelli now supports
out of the box -
[https://www.distelli.com](https://www.distelli.com)

------
adieu
This is great news. We developed an internal controller that manages the
etcd cluster used by the Kubernetes apiserver, also using a third party
resource. The control loop design pattern works really well.

------
why-el
Somewhat unrelated, but I am just curious. For those who use etcd (and this
is coming from a place of ignorance), does the key layout (which keys are
currently stored, how they are structured) get out of hand? Meaning, does it
get to a place where a dev working with etcd might not have an idea of what
is in etcd at any given time? Or do teams enforce some kind of policy (in
documentation or code) that everyone must respect?

I am asking because I was in a situation where I was introduced to other
key-value stores, and because the team working with them was big and no
process was followed to group all the keys in one place, it was hard to know
"what is in the store" at any moment, short of exhausting all the entry
points in the code.

------
NegatioN
I see it mentioned in the article that they have created a tool similar to
Chaos Monkey for k8s, but I don't see any resources linking to it.

Will this at some point be available publicly? Although k8s ensures pods
are rescheduled, many applications do not handle it well, so I think a lot
of teams could benefit from having something like that.

~~~
ideal0227
The "Chaos Monkey" lives inside the project as a sub-pkg right now:
[https://github.com/coreos/etcd-
operator/tree/master/pkg/chao...](https://github.com/coreos/etcd-
operator/tree/master/pkg/chaos).
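
Conceptually it boils down to something like this sketch (not the actual
pkg/chaos code, just the idea, assuming a recent client-go):

    package main

    import (
        "context"
        "log"
        "math/rand"
        "time"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/rest"
    )

    func main() {
        cfg, err := rest.InClusterConfig()
        if err != nil {
            log.Fatal(err)
        }
        client, err := kubernetes.NewForConfig(cfg)
        if err != nil {
            log.Fatal(err)
        }

        for {
            // Pick a random pod matching the label selector and delete it; the
            // Operator should notice the missing member and repair the cluster.
            pods, err := client.CoreV1().Pods("default").List(context.TODO(),
                metav1.ListOptions{LabelSelector: "app=etcd"})
            if err == nil && len(pods.Items) > 0 {
                victim := pods.Items[rand.Intn(len(pods.Items))].Name
                if err := client.CoreV1().Pods("default").Delete(context.TODO(),
                    victim, metav1.DeleteOptions{}); err != nil {
                    log.Print(err)
                }
            }
            time.Sleep(5 * time.Minute)
        }
    }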

We plan to make it a separate project once we feel good about its
functionality and reliability.

If you have any potential use cases or requirements in mind, please tell
us. :)

------
hosh
This is brilliant. It's like the promise-theory-based convergence tools
(CFEngine, Puppet, Chef) on top of K8S primitives. Better yet, the extension
works like other K8S addons -- you start it up by scheduling the controller
pod. That means that, potentially, I could use it in, say, GKE, where I
might not have direct control over the kube-master.

I wonder if it is leveraging PetSets. I also wonder how this overlaps or
plays with Deis's Helm project.

I'm looking forward to seeing some things implemented like this:
Kafka/ZooKeeper, PostgreSQL, MongoDB, Vault, to name a few.

I also wonder whether this means something like Chef could be retooled as a
K8S controller.

~~~
philips
All of your questions are answered in the FAQ section of the overview post:
[https://coreos.com/blog/introducing-
operators.html](https://coreos.com/blog/introducing-operators.html)

~~~
hosh
I don't think my specific questions are answered by the FAQ on that page.

The only answer I found that addresses one part of what I'm wondering about is
"How is this different from configuration management like Puppet or Chef?"
However, I did not ask that question.

If you read some of Mark Burgess's "Promise Theory: Principles and
Applications", you'll realize that Operators (and Kubernetes controllers,
for that matter) are applications and implementations of specific parts of
promise theory. The idea that Puppet or Chef is "configuration management"
is a story sold to non-technical people. I would argue that Operators may be
a _better_ application of promise theory than the previous-generation tools.

Puppet or Chef running as a Kubernetes controller might be able to twiddle
things. It's not exactly a great fit, because both would be calling their
respective servers rather than using Third Party Resources on the kube
master (and as such would be unwieldy). The DSL in each would have to be
extended with things useful for controlling a Kubernetes cluster, but once
that was in place, it could do exactly what the etcd Operator does: converge
on the desired state by managing memberships and doing cleanups.

Don't get me wrong: I like the CoreOS technology as well as Kubernetes. I've
deployed on CoreOS and Kubernetes before. I get that companies have a
responsibility to control the story and the messaging ... but what I am asking
are questions that are bigger than any single technology or company, and I
like to make up my own mind about things.

------
otterley
Where is the functional specification for an Operator? It sounds like a K8S
primitive; is that in fact true? If not, why does this post make it sound like
one?

