
The Basics: Kubernetes, Mesosphere, and Docker Swarm - CrankyBear
https://insights.hpe.com/content/hpe-nxt/en/articles/2017/02/the-basics-explaining-kubernetes-mesosphere-and-docker-swarm.html
======
stuntkite
I've since moved on to Kubernetes and SaltStack, but I got into production
Docker containers with Deis[1] a year ago. With full Heroku buildpack support,
a Kubernetes backend, and great container tools, it's sad that this project
doesn't get more love. It's a 20-minute deploy to GKE or AWS (I recommend
GKE).

If you run OS X, TheNewNormal's solo[2] and cluster[3] Kube setups are a
breeze and include the Deis Workflow as a one-command install, so you get
full local parity with your deployment.

When I started with straight Docker, figuring out the transition to the real
world was really daunting. Getting into a real production environment that
I could compost quickly as I learned the paradigms was invaluable.
Articles like this seem to buzzword their way into making the path muddier
than it actually is.

Another big help for me: any time I needed to set something up, the first
thing I'd do was look at how it was being done on the bleeding edge with
Alpine. In practice you might not always run that light, but that community
has all the answers.

[1] [https://deis.com/](https://deis.com/)
[2] [https://github.com/TheNewNormal/kube-solo-osx](https://github.com/TheNewNormal/kube-solo-osx)
[3] [https://github.com/TheNewNormal/kube-cluster-osx](https://github.com/TheNewNormal/kube-cluster-osx)

~~~
tedmiston
When I first heard about Deis it was heavily compared to Flynn [1]. It's a
little hard to tell from the outside if the two are still competitors or if
Deis is focusing on a lower level of abstraction.

[1] [https://flynn.io/](https://flynn.io/)

~~~
hackerboos
They both support Heroku-like buildpacks. Major difference IMO is that Flynn
supports databases and other stateful services.

I evaluated Flynn, Deis, and Dokku and wasn't entirely satisfied with any of
them, opting for Heroku instead.

I've promised people a blogpost on this and I will do it once I get time to
evaluate RancherOS too.

~~~
baronseng
Can you evaluate Kontena as well while you are at it?

------
jaaron
Title really shouldn't include "Mesosphere." It should be "Marathon" or
"DC/OS". Or, if you keep Mesosphere, it should be "Google, Mesosphere, and
Docker."

Aside from that, it's not a bad overview. I particularly like this quote:

“Mesos and Kubernetes are largely aimed at solving similar problems of running
clustered applications; they have different histories and different approaches
to solving the problem. Mesos focuses its energy on very generic scheduling
and plugging in multiple different schedulers.” In contrast, says Burns,
“Kubernetes was designed from the ground up to be an environment for building
distributed applications from containers. It includes primitives for
replication and service discovery as core primitives, whereas such things are
added via frameworks in Mesos. … Swarm is an effort by Docker to extend the
existing Docker API to make a cluster of machines look like a single Docker
API.”

Mesos is useful when you have requirements that don't fit in k8s' rather
opinionated world view. If you can use k8s, it's great, but it's also changing
rapidly, so be aware you're on the bleeding edge. Docker Swarm is Docker
finding a business model. (I'm biased here.)

~~~
lowbloodsugar
Better bleeding edge than end-of-life. We're using Marathon, and suddenly the
entire UI is end-of-life, and your options are "use Marathon back-end only
without a UI" or "migrate to DC/OS" which is even more opinionated than K8s. I
think the real tell is that Mesosphere don't have the cash/people to maintain
a non-DC/OS UI for Marathon, and there isn't a community able/willing to do it
for them.

~~~
XorNot
About a year ago there was a decision made in our project to "just use
Marathon" (our technical lead has a bit of a tendency to declare things as
simple, without thinking too hard about operational problems) - a year later
(...don't ask...) and I very much have the feeling we've backed the wrong
horse.

Conversely... I'm deeply suspicious of excited discussions of distributed
systems these days, because it doesn't seem like a lot of people are truly
running them "at scale" - there aren't a lot of write-ups about how people
handle 300+ node clusters, and probably more importantly _how_ they handle
them (how do their internal users use them).

------
webo
We've been running Kubernetes in production for over a year. It has a steep
learning curve and it takes a lot of time to fine-tune the underlying
infrastructure (we use CoreOS) to make it production-ready. There still seem
to be a lot of shortcomings in k8s that should have been addressed by now:

- it is impossible to trigger a rescheduling / rebalancing. When a new node
comes in (via an AutoScaling policy or whatever), Kubernetes doesn't do anything.
Thus, the new nodes can be sitting there doing nothing.

- once a pod has been scheduled onto a node, k8s never reschedules it anywhere
else. The node may be experiencing problems, and thus the pod is affected. k8s
doesn't do anything to heal that -- it could simply delete the pod so it is
rescheduled somewhere else (a rough sketch of that workaround is below).

- docker itself constantly ships with a lot of bugs. To this day (we are on
docker 1.12.6), we constantly have problems with the docker daemon hanging or
becoming unresponsive. I'm not sure if k8s can do much about this, but I feel
like it should, since we don't directly control docker.

- it doesn't integrate more tightly with the OS / cloud provider. For example,
it could perform health checks and decide whether the node should be restarted,
terminated, or left idle.

All of our services are stateless, so it would be nice to have all of the
above as options, especially since k8s started out as the solution for
stateless apps.
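
For the rescheduling point, the workaround is basically "delete the pods on
the suspect node and let the scheduler place them again". A minimal sketch
with the Python kubernetes client, assuming the pods are managed by
Deployments/ReplicaSets (so deleted copies get recreated), that you already
cordoned the node, and that the node name is a made-up example:

    from kubernetes import client, config

    # Sketch, not production code: evict every pod from a suspect node so
    # the scheduler places them elsewhere.
    config.load_kube_config()            # or config.load_incluster_config()
    v1 = client.CoreV1Api()

    BAD_NODE = "node-3"                  # hypothetical node name

    pods = v1.list_pod_for_all_namespaces(
        field_selector="spec.nodeName=" + BAD_NODE).items

    for pod in pods:
        print("deleting", pod.metadata.namespace, pod.metadata.name)
        v1.delete_namespaced_pod(
            name=pod.metadata.name,
            namespace=pod.metadata.namespace,
            body=client.V1DeleteOptions())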

~~~
cddotdotslash
We've been using Docker as well and I can attest to the sheer number of bugs
and constant regressions. I don't think a release has gone by without a bug
that prevents us from updating. From slow pulls, to slow extractions, to
issues with SELinux, to namespace incompatibilities, to the daemon just
hanging repeatedly. It's frustrating because the technologies around Docker
(Mesos, Kubernetes, Marathon, etc.) seem to be getting better and more stable
while Docker itself just continues to have issues.

~~~
webo
Also, in our case it doesn't help when CoreOS auto-upgrades Docker versions :(

I'm really looking forward to rkt so that we can finally have a solid
alternative to docker.

------
tom_pulo
With all of this talk about Docker, Kubernetes and the rest, I feel peer
pressured into ditching my monolithic Heroku Rails app and switching to the
distributed-services heaven that Docker seems to advertise.

Can anybody that has made the switch give me a convincing argument for why I
should switch (or not)? My feeling is that Docker is great if you are VP of
Engineering at Netflix, but is probably not the best thing if you are starting
a startup and just need to get things done.

Disclaimer: I'm not religious about this and I'm totally open to being
convinced that I'm wrong.

~~~
cookiecaper
The ecosystem around containerization is still emerging and is in a rapid
state of flux. I really wouldn't recommend getting production anywhere near it
for the next 2 years minimum, and realistically, you probably want to wait
more like 5.

Both Docker and k8s change quickly and both lack functionality that most
people would consider pretty basic. Google may have transcended into a plane
where persistent storage and individually-addressable servers are a thing of
the past, but the rest of us haven't. _Many_ things that an admin takes for
granted on a normal setup are missing, difficult, convoluted, or _flat out
impossible_ on k8s/Docker.

We're converting our 100+ "traditional" cloud servers into a Docker/k8s
cluster now, and it's a nightmare. There's really no reason for it. The
biggest benefit is a consistent image, but you can get that with much less
ridiculous tooling, like Ansible.

My opinion on the long-term: containers will have a permanent role, but I
don't think it will be nearly as big as many think. Kubernetes will be refined
and become the de-facto "cluster definition language" for deployments and will
take a much larger role than containers. It will learn to address all types of
networked units (already underway) and cloud interfaces/APIs will likely just
be layers on top of it.

The hugely embarrassing bugs and missing features in both k8s and Docker will
get fleshed out and fixed over the next 2-3 years, and in 5 years, just as the
sheen wears off of this containerland architecture and people start looking
for the next premature fad to waste millions of dollars blindly pursuing, it
will probably start to be reasonable to run _some_ stable, production-level
services in Docker/k8s. ;) It will never be appropriate for everything,
despite protests to the contrary.

I think the long-term future for k8s is much brighter than the future for
Docker. If Docker can survive under the weight of the VC investments they've
taken, they'll probably become a repository management company (and a possible
acquisition target for a megacorp that wants to control that) and the docker
engine will fall out of use, mostly replaced by a combination of container
runtimes: rkt, lxc, probably a forthcoming containerization implementation
from Microsoft, and a smattering of smaller ones.

The important thing to remember about Docker and containers is that they're
not really new. Containers used to be called jails, zones, etc. They didn't
revolutionize infrastructure then and I don't think they will now. The hype is
mostly because Docker has hundreds of millions of VC money to burn on looking
cool.

If Docker has a killer feature, it's the image registry that makes it easy to
"docker pull upstream/image", but the Dockerfile spec itself is too sloppy to
really provide the simplicity that people think they're getting, the security
practices are _abysmal_ and there will be large-scale pwnage due to it
sometime in the not-too-distant future, and the engine's many quirks, bugs,
and stupid behaviors do no favors to either the runtime or the company.

If Docker can nurse the momentum from the registry, they _may_ have a future,
but the user base for Docker is pretty hard to lock in, so I dunno.

tl;dr Don't use either. Learn k8s slowly over the next 2 years as they work
out the kinks, since it will play a larger role in the future. In 5 years, you
may want to use some of this on an important project, but right now, it's all
a joke, and, in general, the companies that are switching now are making a
very bad decision.

~~~
XorNot
I'm actually swinging hard against containers at the moment.

I've been playing around with the runv project in Docker (definitely not
production ready) and running containers in optimized virtual machines... and
it just seems like the better model? Which, following the logic through,
really means I just want to make my VMs fast with a nice interface for users -
and I can: with runv I can spin up KVM fast enough not to notice it.

Basically... I'd really rather just have VMs, and pour effort into optimizing
hypervisors (and there's been some good effort along these lines - the DAX
patches and memory dedupe in the Linux kernel).

------
bogomipz
>" I’ve spoken with several DevOps who argue that compared with running a
Kubernetes or Mesos cluster, Docker Swarm is a snap."

I'm always leery of statements like that. My experience with cluster
managers (YARN, Mesos) and distributed systems in general is that they are
almost never "a snap" to run once you move past trivial workloads and
requirements.

~~~
amouat
The author is (mis)quoting me there. I meant the initial process of installing
and getting Swarm running is much easier than k8s or Mesos, not that on-going
maintenance was easier.

Also, since I wrote that, k8s has done a lot of work to simplify installation
with kubeadm and other improvements.

~~~
bogomipz
Ah OK. Yeah, doing a greenfield deployment is one thing; "operationalizing" it
is often more involved.

For Mesos, if anyone is interested and you are an Ansible shop, you can look
at Ansible Shipyard to simplify an install.

[https://github.com/AnsibleShipyard](https://github.com/AnsibleShipyard)

Kubeadm looks interesting. Thanks.

------
sitepodmatt
Quoting from the disadvantages section: "Second, Kubernetes excels at
automatically fixing problems. But it’s so good at it that containers can
crash and be restarted so fast you don’t notice your containers are crashing."

A key part of Kubernetes is to bring the state of the system to that of the
spec, through a series of watches - and in some places polling (config map
watching in kubectl as of 1.4.x). This may look like magic, but it's not
fixing problems per se; it's that your desired state of the system (the spec)
is different from what is observed (the status). This is not a disadvantage.
Kubernetes is not for someone sitting at the terminal expecting a bell or a
stop-the-world when problems happen, although I guess you could configure it
that way and fight against the system.
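
If you want to see that reconciliation rather than take it on faith, a
minimal sketch with the Python kubernetes client just streams pod events and
prints how the observed status converges back toward the spec (the namespace
and timeout here are placeholders):

    from kubernetes import client, config, watch

    # Sketch: stream pod events and watch k8s drive observed status back
    # toward the declared spec (e.g. a crashed container being restarted).
    config.load_kube_config()
    v1 = client.CoreV1Api()

    w = watch.Watch()
    for event in w.stream(v1.list_namespaced_pod, namespace="default",
                          timeout_seconds=120):
        pod = event["object"]
        print(event["type"], pod.metadata.name, pod.status.phase)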

------
technologyvault
Great article! I just made a point to get caught up on Docker and related
elements of the containerization world.

Mesosphere looks like it works similarly to Nanobox
([http://nanobox.io](http://nanobox.io)).

------
lowbloodsugar
Unfortunately, Mesosphere is no longer focusing on Marathon as an Apache Mesos
framework and is instead integrating it into DC/OS, their commercial offering.
Open-source Mesos+Marathon no longer appears to be an option.

~~~
nemothekid
Isn't DC/OS now open source?
([https://github.com/dcos/dcos](https://github.com/dcos/dcos))

~~~
lowbloodsugar
So DC/OS is a "one solution for all the things" offering, like K8s. It's
"magical" and "monolithic": if it happens to do what you want, out of the box,
then you're in luck. If it doesn't, or more likely, if you can't figure out
how to make it do what it says it does, then there is a small community that
can help you. And that's if you use the "magical" installer. I wouldn't even
consider building the i-don't-know-how-many-but-more-than-ten (it's 30)
different "open-source" packages that make up DC/OS myself, let alone deploy
them.

If you are searching for a product in this space, then K8s is the way to go.
It has a huge community.

If, on the other hand, you are looking for a small, non-monolithic, resource
scheduler + some basic frameworks (i.e. Apache Mesos + Marathon) then DC/OS is
_not that_.

But since Mesosphere is no longer supporting work on Marathon outside of
DC/OS, or indeed any framework outside of DC/OS, Mesos/Marathon is
effectively dead: while Mesos is open source, the only major supporter
of frameworks for it is Mesosphere, and you won't get them unless you use
DC/OS.

~~~
jaaron
I think it's a matter of scope.

For some relatively small projects, k8s is sufficient. But pretty quickly you
end up needing more and more functionality and you end up in a larger stack
like OpenShift.

That's the place that Mesos and DC/OS (IMHO) shine: when you're working with
large clusters (1000+) running very different workloads. Because in that
scenario, k8s is still pretty immature and you're going to inevitably need to
solve all the other problems that OpenShift, DC/OS, Rancher or other stacks
solve. Just like Linux distros, in theory, anyone can do it. In practice, it's
a pain and you want to pick up a standard distro. That's what these stacks
provide: a pre-configured suite of open source tools that get you way more
than k8s or Mesos or Marathon on their own provide.

~~~
lowbloodsugar
Sure, but DC/OS has yet to demonstrate that it can do any of that. _Mesos_ has
demonstrated that it can handle clusters at scale, but I don't know of anyone
running the full smorgasbord of all 30 services on a cluster that big, and
it's the other 29 things that are going to fail at scale.

And again, if you are going to be running DC/OS at that scale, then you are
going to be running the enterprise version, because you aren't going to be
running that much magical shit on million dollar hardware without someone
being paid to troubleshoot.

That might be me in a few years, except that Mesosphere seems to be attempting
to kill me in the short term by killing non-DC/OS mesos/marathon while
crippling non-enterprise DC/OS. So looks like we'll be migrating to K8s and
hoping that in a few years K8s scales. The "K8s doesn't scale" argument is
losing ground with every new release btw.

~~~
jadbox
All of Siri and Azure run on top of DC/OS; it's certainly demonstrated that it
can be used for massive data centers.

~~~
ninkendo
Siri doesn't run on DC/OS. They use mesos (ie. the open source apache
foundation project that marathon and later DC/OS are built on top of), but
write their own framework(s) that run on top of it.

Source: I worked on the team, although we had talked about this at mesoscon in
the past.

------
ovidiup
I built Jollyturns ([https://jollyturns.com](https://jollyturns.com)) which
has a fairly large server-side component. Before starting the work on
Jollyturns I worked at Google (I left in 2010) on a variety of infrastructure
projects, where I got to use first hand Borg and all the other services
running on it.

When I started more than 5 years ago few of the options mentioned in the
article were available, so I just used Xen running on bare metal. I just
wanted to get things done as opposed to forever experimenting with
infrastructure. Not to mention the load was nonexistent, so everything was easy
to manage.

After I launched the mobile app, I decided to spend some time on the
infrastructure. 2 years ago I experimented with Marathon, which was originally
developed at Twitter, probably by a bunch of former Google employees. The
reason is Marathon felt very much like Borg: you could see the jobs you
launched in a pretty nice Web interface, including their log files, resource
utilization and so on. Deploying it on bare metal machines however was an
exercise in frustration since Marathon relied heavily on Apache Mesos. For a
former Googler, Mesos had some weird terminology, and it was truly difficult to
understand what the heck was going on. Running Marathon on top of it had major
challenges: when long-running services failed you could not figure out why
things were not working. So after about 2 weeks I gave up on it, and went back
to manually managed Xen VMs.

Around the same time I spent a few days with Docker Swarm. If you're familiar
with Borg/Kubernetes, Swarm is very different: at least when it started, you
had to allocate services to machines by hand. I wrote it off
quickly since allocating dockerized services on physical machines was not my
idea of cluster management.

Last year I switched to Kubernetes (version 1.2) since it's the closest to
what I expect from a cluster management system. The version I've been using in
production has a lot of issues: high availability (HA) for its components is
almost non-existent. I had to set up Kube in such a way that its components
have some resemblance of HA. In the default configuration, Kubernetes installs
its control components on a single machine. If that machine fails or is
rebooted your entire cluster disappears.

Even with all these issues, Kubernetes solves a lot of problems for you. The
flannel networking infrastructure greatly simplifies the deployment of docker
containers, since you don't need to worry about routing traffic between your
containers.

Even now Kubernetes doesn't do HA:

[https://github.com/kubernetes/kubernetes/issues/26852](https://github.com/kubernetes/kubernetes/issues/26852)

[https://github.com/kubernetes/kubernetes/issues/18174](https://github.com/kubernetes/kubernetes/issues/18174)

Don't be fooled by the title of the bug report: the same component used in
kubectl to implement HA could be used inside Kube's servers for the same
reason. I guess these days the Google engineers working on Kubernetes have no
real experience deploying large services on Borg inside Google. Such a pity!

~~~
justinsb
FYI, Kubernetes can do HA: the issues you pointed to are both about client
connectivity to multiple API servers, which is typically solved today either
using a load-balancer, or by using DNS with multiple A records (which actually
works surprisingly well with Go clients). The issues you pointed to are about
a potential third way, where you would configure a client with multiple server
names/addresses, and the client would failover between them.
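
That third way is essentially client-side failover. A toy sketch of the idea
in Python (the server addresses are made up, and a real client would also
handle auth and proper TLS verification):

    import requests

    # Sketch: try a hypothetical list of apiservers in order and use the
    # first one that reports healthy.
    API_SERVERS = [
        "https://master-a:6443",
        "https://master-b:6443",
        "https://master-c:6443",
    ]

    def pick_apiserver():
        for server in API_SERVERS:
            try:
                # /healthz is the apiserver's health endpoint
                r = requests.get(server + "/healthz", timeout=2, verify=False)
                if r.status_code == 200:
                    return server
            except requests.RequestException:
                continue
        raise RuntimeError("no healthy apiserver found")

    print(pick_apiserver())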

~~~
ovidiup
As I said, it's not only kubectl that has the problem. None of the services
implemented by Kubernetes are HA: kubelet, proxy, and the scheduler. For a
robust deployment you need these replicated.

Using DNS might work for the K8s services, but at least in version 1.2, SkyDNS
was an add-on to Kubernetes. This should really be part of the deployed K8s
services. Hopefully newer versions fixed that, I didn't check.

Preferably, the base K8s services would implement HA natively. Deploying a
separate load balancer is just a workaround for the problem.

FYI Google's Borg internal services implement HA natively. Seems to me the
Kubernetes team just wanted to build something quick, and never got around to
doing the right thing. But I think it's about time they do it.

~~~
justinsb
I think this was true around kubernetes 1.2, but is no longer the case. etcd
is natively HA. kube-apiserver is effectively stateless by virtue of storing
state in etcd, so you can run multiple copies for HA. kube-scheduler & kube-
controller-manager have control loops that assume they are the sole
controller, so they use leader-election backed by etcd: for HA you run
multiple copies and they fail-over automatically. kubelet & kube-proxy run
per-node so the required HA behaviour is simply that they connect to a
different apiserver in the event of failure (via load-balancer or DNS, as you
prefer).

kube-dns is an application on k8s, so it uses scale-out and k8s services for
HA, like applications do. And I agree that it is important, I don't know of
any installations that don't include it.

I think the right things have been built. We do need to do a better job
documenting this though!

~~~
snambi
etcd itself cannot be horizontally scaled because of its architecture. etcd's
leader model does not allow you to go beyond a certain number of nodes in a
cluster: the leader would be overloaded.

~~~
untoreh
I think federation allows you to scale horizontally beyond the limitation of a
single etcd cluster. OTOH, the fact that ZK/etcd/Consul are all leader-based is
probably the reason Flynn "simply" uses Postgres.

