
Lessons learned from using Docker Swarm mode in production - gkze
http://blog.bugsnag.com/container-orchestration-with-docker-swarm-mode
======
the_common_man
I am kind of surprised by the negativity about Docker and Swarm. Both of them
put UX first and technology second. This is the correct approach to get
adoption (and also the reason why they are popular). Getting started with
Docker and Swarm is really really simple and it's very hard to dislike simple
things. Compared to that, Mesos and OpenStack and other tech stacks like that
are gargantuan and imo have very bad UX.

~~~
derefr
Consider that they have different target audiences.

Docker itself is mostly targeted as a tool for _developers_ : you, the
developer, _dockerize_ your application, resulting in a container-image. Sure,
that container-image then has to get _deployed_ by someone (who isn't
necessarily you), but the reason it's getting deployed at all is that a
developer, at some point, made a decision to use Docker as part of the
development process. Everyone else has to just deal with that.

Docker Swarm, meanwhile, is _infrastructure_ , pure and simple. Developers
don't touch it; ops people do. And ops people have very different opinions on
what makes for a good piece of software than developers do. "Good UX" comes
second to things like "stable" and "low overhead" and "predictable failure
modes" and "configurable from a central source of truth."

~~~
pmontra
In big companies ops trumps devs, and it's correct because they develop for
6-12 months and then operate that sw for 6 or 12 years. Been there, saw that
(a mobile phone operator). So if ops says that docker is no-no for deployment,
the dev has to work with another technology or convince them that everybody is
going to benefit from it.

Startups begin as small companies and small companies have a single team that
decides how to develop and deploy. Usually developers deploy and take care of
production too. What's convenient for development often trumps what's
convenient for production, at least for the first months or years.

~~~
derefr
You're presuming that the same company develops and deploys the software. My
company runs many third-party Docker container-images in production—precisely
because a Docker container-image is the only format that software comes in.

~~~
pmontra
I've seen ops agree to run a couple of services on Windows and Linux at a
time when they were all HP-UX and Solaris. There were no good alternatives for
those services so ops were not happy but had to learn how to operate those
servers. Can I suppose you went through the same?

------
leetrout
I am a broken record lately on here... Nomad is really awesome for avoiding
configuration hell and having to manage multiple services for container
orchestration. It's a single binary and very very easy to set up and run. I
actually prefer it to Swarm but YMMV.

It uses Consul under the hood and has so far been bulletproof. (They all have
their drawbacks / idiosyncrasies).

[https://www.nomadproject.io/](https://www.nomadproject.io/)

~~~
andrenth
The one feature that to me seems to be essential but appears to be missing
from all these container orchestrators is the ability to tie a remote volume
(Ceph/Gluster/Lustre/etc) to a container so that if a container is scheduled
to run on a certain node, the volume will automatically be mounted on the same
node.

It seems from the mailing list that at least Nomad will have that at some
point, but I have not seen much talk about it from Kubernetes or Docker Swarm.

~~~
gkze
We tried using EFS for shared storage but quickly depleted our I/O bursting
credits and our throughput ground to a halt, because our app's
workload is both disk write and read intensive. No solutions yet.

~~~
objectivefs
If you need more performance than EFS for your shared filesystem storage, you
could give our ObjectiveFS
([https://objectivefs.com](https://objectivefs.com)) a try. We see
significantly higher read/write performance, especially for smaller files.

------
nzoschke
This is a great write up, thanks for sharing the lessons learned.

I wonder if the open questions about instance management are solved by the
"Docker for AWS" beta.

We are entering the commodity phase for orchestration software.

Blogs and HN comments are full of success stories on Swarm, Kubernetes, Mesos,
Nomad and ECS.

There are also a few warnings, like the routing issue in this review, but it's
simply a matter of time before those get sorted.

What's really going on here is that we are all learning how to handle
complexities of distributed systems in the cloud. These new foundations mean
we can run more sophisticated apps more easily and more reliably.

------
bandrami
Nothing whose version ends in "-rc4" is used in "production". You're using it
in a very hot beta test.

~~~
gkze
We ran RC4 in production and we're running 1.12.1GA in production right now as
well. We have been making money while running this and serving live customer
traffic so we consider it production :)

~~~
toomuchtodo
I hope you have a plan for your paying users when it breaks in production.

------
jahewson
Honestly, you should try kubernetes. The experience is pretty much the same
and the feature set is much more mature.

~~~
madmax96
My experience with Kubernetes is limited, so take all this with a grain of
salt. I've been using the integrated swarm since beta.

The new integrated swarm is a real game-changer in that it is much simpler to
use compared to other solutions. With swarm, it's simply:

    docker swarm init
    docker swarm join --token <blah> <blah:2377>

That being said, I found that Kubernetes offers more granularity in the level
of control over the cluster. That's not something that __I__ need necessarily,
though obviously YMMV.

~~~
ovi256
I've had the same great, simple experience with Flynn. The terminal commands
you give are barely different from Flynn's. And I like Flynn's capabilities
better, at least for now. They've already figured out the routing fabric
(unlike Docker, per the article) and they have a great redundant DB
capability, sorely missing from other PaaSes, even k8s.

I have no doubt that docker will _eventually_ catch up though.

~~~
madmax96
That's cool. I've never looked at Flynn, but you've motivated me to give it a
look!

------
SloopJon
The big takeaway for me is that first impressions matter. Although the bulk of
the post is about hard-earned knowledge and workarounds for completely
unreliable features, getting a proof of concept in twenty minutes sealed the
deal.

------
rb808
While we're on the topic can anyone recommend a system for rolling out (Java)
applications across server farms that doesn't use containers? We have a bunch
of shell scripts that are pretty horrible.

We could containerize, but we don't need that right now.

~~~
tdurden
Ansible. It isn't perfect, but is far better than shell scripts for
application deployment.

~~~
kkirsche
This is what I use. Other players include Puppet, Salt, and Chef

------
jacques_chester
> _This might be a wishlist item (since we don’t find ourselves doing it
> frequently enough to merit an automated solution), but it would be very nice
> to be able to simply bake a new AMI, the completion of which would trigger a
> job that could swap out instances one or several at a time, such that we
> would be able to perform zero-downtime upgrades automatically. This can
> still be done, but right now it’s by hand._

BOSH[0] does rolling deploys, with canaries, out of the box.
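
To make the idea concrete, here is a minimal Python sketch of a canary-first rolling replacement, along the lines of what the quoted wishlist describes. It is not how BOSH or any AWS service implements it; `rolling_replace` and the `launch`, `is_healthy` and `terminate` callbacks are hypothetical names standing in for whatever your infrastructure provides.

```python
def batches(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def rolling_replace(old_instances, launch, is_healthy, terminate, batch_size=2):
    """Replace old instances with new ones: one canary first, then in batches.

    All callbacks are hypothetical stand-ins for real infrastructure calls.
    Raises RuntimeError and stops as soon as a replacement fails its health
    check, leaving the not-yet-replaced old instances running.
    """
    if not old_instances:
        return []

    # The first group holds a single instance: the canary.
    plan = [old_instances[:1]] + list(batches(old_instances[1:], batch_size))

    new_instances = []
    for group in plan:
        # Bring up replacements before removing anything, to avoid downtime.
        replacements = [launch() for _ in group]
        if not all(is_healthy(n) for n in replacements):
            # Clean up the failed replacements and halt the rollout.
            for n in replacements:
                terminate(n)
            raise RuntimeError("replacement failed health check; rollout halted")
        for old in group:
            terminate(old)
        new_instances.extend(replacements)
    return new_instances
```

The point of the canary is that a bad image takes out at most one replacement attempt before the rollout stops, while the rest of the old fleet keeps serving.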

At Pivotal we completely upgrade Pivotal Web Services to the latest Cloud
Foundry within about a day of it releasing. PWS is our dogfooding the hard
way: with a flagship platform that some of our customers have sue-you-to-dust-
if-it-fails support contracts for.

Thousands of apps, tens of thousands of containers, thousands of VMs.

None of whom know that we restarted the entire infrastructure beneath them.

Disclosure: I guess that to the degree that Docker Inc realises that platforms
are where the money is, my employers at Pivotal are competitors. But BOSH is
still a fit for what you want.

[0] [http://bosh.io/](http://bosh.io/)

~~~
nzoschke
I respectfully disagree.

When running on AWS you want CloudFormation controlling an AutoScaling Group
for this.

~~~
jacques_chester
OK, for disposable units, it will make sense. I'm less comfortable with
entrusting highly stateful services to AWS alone.

As a footnote, BOSH works on AWS, OpenStack, vSphere, Azure, GCP and there are
experimental CPIs for RackHD and Photon.

------
colemickens
As long as you're willing to re-architect your app so that workers pull web
requests from a queue rather than expecting to just... serve traffic normally.

Or did I misunderstand that section?

~~~
gkze
Incorrect - that's what we're running currently but as soon as the routing
mesh issues are resolved you can start running apps that listen on ports too

~~~
abrookewood
Docker have been copping some flak recently for releasing things that seem
rushed and incomplete (the routing mesh being a good example) - doesn't it
worry you that you're using a Release Candidate in Production? It seems risky
considering their official releases are still pretty buggy.

