

Docker Containers at Scale: Our Take on Docker Swarm - presspot
http://mesosphere.com/2015/02/26/deploying-with-docker-swarm/

======
phildougherty
If you've spent any significant amount of time with Mesos and Marathon, you've
probably found that it's buggy (Marathon at least), complicated, and hard to
work with. Zookeeper is just one part of the complicated web of infrastructure
you need to stand up. To top it off, you end up needing to run something like
Netflix's Exhibitor to have any shred of confidence that if ZK goes away the
state of your cluster will be recoverable. It definitely "works", but
certainly leaves something to be desired in the stability and ease of use
department.

~~~
23david
Yep. In my experience, I've seen that the issue isn't with Mesos the Apache
project, but rather with Marathon and Chronos, which are the applications
built by Mesosphere.

There's a big naming and marketing confusion at the moment with the open-
source project and the VC-funded startup having such a similar name. It's
unfortunately giving a bad reputation to the Mesos project.

Curious why there actually isn't a trademark issue here...?

~~~
phildougherty
Because Mesosphere (the VC backed startup) hired the creator of Mesos (the
Apache project), and is now trying to capitalize on the open source project as
one of their "products".

~~~
presspot
Mesosphere is arguably the largest contributor to Mesos, certainly on par with
Twitter, especially when you consider all the surrounding ecosystem. The
company also secured permission from the Apache Foundation to use the
trademark when the company was founded. It's good for the open source
ecosystem to have companies productize and support projects, particularly when
they are plowing millions of dollars back into open source.

------
rcarmo
I'm a bit concerned about the network layer (or, rather, lack thereof). I've
looked at one or two of the "overlay network" approaches (that essentially
have a container act as a router/tunneller between hosts), and wish there was
something in Swarm that let me do basic port-mapping/load balancing across
containers on multiple hosts (preferably with auto-scaling) without funkiness
and undue overhead.

------
carlivar
If only Mesos would drop the Zookeeper requirement. We've found Zookeeper to
be overly complicated and difficult to troubleshoot.

~~~
lclarkmichalek
Really? I've found it a lot easier to manage than etcd. The fact that you
manually specify all of your nodes in the config file removes a whole host of
errors you can create in etcd with its fancy discovery stuff.
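For reference, the static membership lclarkmichalek describes lives in
Zookeeper's `zoo.cfg`: every ensemble member is listed explicitly up front, so
there's no runtime discovery step to get wrong. A minimal example (the
hostnames are placeholders):

```
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
# Every ensemble member is enumerated; nothing is discovered at runtime.
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
```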

What's your set up? How many zookeeper nodes do you run? What problems have
you run into?

~~~
carlivar
We would get increased latency interacting with Zookeeper over time until
eventually it would completely fail. The log messages when this happened
(either the latency or the failure) were extremely unhelpful; in fact, I found
Zookeeper's server-side logging downright terrible.

We wound up proactively restarting the ZK cluster regularly, which improved
stability.

Granted, it was our own software written to use it, and we suspect there were
problems with the way it was written. It was easier for us to just rip it out
than debug, however. I find it overly complicated to write against given the
need for thick clients.

Consul phrases it well
([https://consul.io/intro/vs/zookeeper.html](https://consul.io/intro/vs/zookeeper.html)):

"ZooKeeper provides ephemeral nodes which are K/V entries that are removed
when a client disconnects. These are more sophisticated than a heartbeat
system, but also have inherent scalability issues and add client side
complexity. All clients must maintain active connections to the ZooKeeper
servers, and perform keep-alives. Additionally, this requires "thick clients",
which are difficult to write and often result in difficult to debug issues."
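As a rough illustration (a toy sketch in plain Python, not ZooKeeper's actual
implementation), the ephemeral-node semantics the quote describes boil down to
entries bound to a client session, where the client must keep sending
keep-alives or its entries are removed:

```python
import time

class EphemeralStore:
    """Toy sketch of ZooKeeper-style ephemeral entries: each key is tied
    to a client session and is removed once that session stops renewing
    itself within the timeout window."""

    def __init__(self, session_timeout=2.0):
        self.session_timeout = session_timeout
        self._data = {}       # key -> (session_id, value)
        self._last_beat = {}  # session_id -> last keep-alive timestamp

    def create(self, session_id, key, value):
        self._last_beat[session_id] = time.monotonic()
        self._data[key] = (session_id, value)

    def heartbeat(self, session_id):
        # The "thick client" burden: every client must keep calling this
        # over an active connection, or its entries disappear.
        self._last_beat[session_id] = time.monotonic()

    def expire_sessions(self, now=None):
        # Drop every key owned by a session that missed its keep-alives.
        now = time.monotonic() if now is None else now
        dead = {s for s, t in self._last_beat.items()
                if now - t > self.session_timeout}
        self._data = {k: v for k, v in self._data.items()
                      if v[0] not in dead}
        for s in dead:
            del self._last_beat[s]

    def get(self, key):
        entry = self._data.get(key)
        return entry[1] if entry else None


store = EphemeralStore(session_timeout=2.0)
store.create("session-a", "/leader", "node-1")
store.expire_sessions()                            # session alive: key survives
assert store.get("/leader") == "node-1"
store.expire_sessions(now=time.monotonic() + 10)   # session missed its beats
assert store.get("/leader") is None
```

The keep-alive loop is exactly the client-side complexity the Consul
comparison is pointing at: every consumer has to manage a live session rather
than just reading and writing keys.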

Edit: in retrospect our problems might have been solved by turning syncing
off, as described here:

[http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/zo...](http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/zookeeper_psuedo_scalability_and_absolute)

But you can figure out from the above how scalable Zookeeper is... not very.
We run in physical datacenters and I certainly wouldn't be thrilled about
building out snowflake RAID systems just for our ZK clusters (we generally try
to use whitebox commodity hardware and we want individual nodes to be as
disposable as possible).

It's ironic that ZK requires such consistency when the goals of Mesos are
exactly the opposite.

~~~
mdellabitta
> It's ironic that ZK requires such consistency when the goals of Mesos are
> exactly the opposite.

Is it ironic? Seems to me that if you want to have a distributed cluster that
can deal with worker failure and still be useful, you need to rely on
something to durably maintain your state at a lower level.

~~~
carlivar
I agree and I want that thing not to be ZK. Consul or etcd would be fine.

~~~
mdellabitta
Well, you've described how zk doesn't meet the needs of _your_ software as
written, but I'm not sure you've established why Mesos would be better off
without it...

~~~
carlivar
Yes, I have. Zookeeper's logging (and thus troubleshooting ability) is
atrocious.

~~~
mdellabitta
And yet somehow they seem to have gotten it to work. I run a few Zookeeper-
based systems where ZK just seems to work and I don't have to look at logging
statements, either. I've never dealt with Zookeeper downtime that wasn't
Amazon's fault.

~~~
carlivar
So you are having ZK downtime? Such a system shouldn't ever really go down,
even with Amazon problems. We strive for software in which individual nodes
can be flaky or down without impacting the uptime of the whole cluster. ZK
doesn't fit this criterion, as you have just stated. I prefer systems that
combine gossip with Raft, for example (which is why I like Consul so much;
we're also big Riak users).

~~~
mdellabitta
> So you are having ZK downtime?

No, individual nodes have gone down based on hardware problems. The system
stays up. Jepsen has given Zookeeper probably the most ringing endorsement of
anything it's tested, so I don't know what you're on about.

------
afarrell
Why is it called Docker Swarm rather than Docker Fleet or something similarly
logistics-related?

~~~
rubiquity
CoreOS already has a project named Fleet[0].

[0] - [https://github.com/coreos/fleet](https://github.com/coreos/fleet)

------
polynomial
If you're in NYC, there will be a talk tonight on Building & Deploying
Applications to Apache Mesos at the Digital Ocean meetup:
[http://eventhunt.io/node/5984](http://eventhunt.io/node/5984).

