
Docker, Mesos, Marathon, and the End of Pets - ddispaltro
http://blog.factual.com/docker-mesos-marathon-and-the-end-of-pets
======
KaiserPro
I think the main problem with all of these systems is that they are just so
damn complex.

Docker, it's great if you have no state. But then if you have no state shit is
easy. Mapping images to high-speed storage securely and reliably is genuinely
hard. (unless you use NFSv4 and Kerberos)

Mesos is just overkill for everything. How many people actually need shared-
image programs bigger than a 64-core machine with 512 GB of RAM? (and how good
are you at juggling NUMA or NUMA-like interfaces?)

I can't help thinking that what people would really like is just a nice, easy-
to-use distributed CPU scheduler. Fleet, basically, just without the theology
that comes with it.

Seriously, mainframes look super sexy right now. Easy resource management,
highly scalable real UNIX. (no need to spin up/down, just spawn a new process)

~~~
pibefision
Docker is also great for hiding complexity during implementation. Discourse.org
is doing great work "enveloping" their complex Rails app in containers to
ease the install process. And it's not stateless.

~~~
sytse
By shipping only in a Docker container you will limit the audience for your
app. That is the reason we ship GitLab as Omnibus packages (deb/rpm).

~~~
stephenr
While deb/rpm packages are better than a monolithic container, having used one
of your rpms, I'd say they're only just barely better.

~~~
sytse
What didn't you like about the Omnibus rpms?

~~~
edgan
Omnibus packages are a horrible idea. They are security issues waiting to
happen.

A great example is the Chef server rpm. It is a 500 MB mini-distribution in one
package. It has copies of Perl, Python, Ruby, and Erlang in it. If any of
these has a security vulnerability, I have to wait on the maintainer to
release a new version, and hope it includes the security fixes.

They also tend to include things like python header files for no reason. You
wouldn't compile against an Omnibus package, but they are there anyway.
Examples of this are Sumologic's and Datadog's agents.

~~~
sytse
We are aware that we'll have to patch any security issues and have done so
reliably. I agree it is not ideal and we'll always be slower than the
distribution packages. On the other hand the installation is much faster to
perform (2 minutes instead of 10 pages of copy-pasting) and we're able to ship
with very secure settings for the integration points (sockets, etc.). But we
recognize that some people will prefer native packages and are sponsoring work
to make native Debian packages.

------
steveb
The real problem is going from tutorial to something you would use in
production. Throw in logging, security and service discovery and you can have
a few engineers hacking away for months.

So I want to plug a project I've been contributing to:
[https://github.com/CiscoCloud/microservices-infrastructure](https://github.com/CiscoCloud/microservices-infrastructure)

We're trying to make it super easy to deploy these tools. For example, every
time you launch a Docker container, it will register with Consul and be added
to HAProxy. The nice thing about using Mesos is we can support data workloads
like Cassandra, HDFS, and Kafka on the same cluster you run Docker images on.

We use Terraform to deploy to multiple clouds so you don't get locked into
something like CloudFormation.

~~~
bkeroack
This is basically why Kubernetes exists: for all the plumbing, discovery, etc.
required on top of bare containers.

It still requires work to go from zero to production-quality stack, of course.

~~~
steveb
We like Kubernetes (and are looking to add it to our project), but our goal is
to integrate building blocks that allow us to run many different types of
workloads. Think of our project as more like an Ubuntu for distributed systems.

Kubernetes may eventually spread out beyond Docker, but for today we need to
support things like Kafka and Spark.

As others have noted, we've had things like CloudFoundry, OpenShift and
Heroku, and these all-in-one frameworks tend not to extend outside their
original domain.

~~~
jacques_chester
You should look at Cloud Foundry again, particularly with the introduction of
Lattice. It used to be tied to apps, now it basically thinks about tasks and
processes in a completely generic way.

------
sytse
The Circle CI post [http://blog.circleci.com/its-the-future/](http://blog.circleci.com/its-the-future/)
reads like a parody of this one.

~~~
phildougherty
Wow. That was hilarious! So much truth in that.

------
Wilya
Is anyone running Marathon in production? _Real_ production. The kind where
any downtime means lost money.

I see a lot of intro-level tutorials, but almost nothing on the more advanced
side.

My (completely casual) experience with Marathon is pretty bad, with the main
process crashing quite regularly even under no load, so I'm wondering if
people who write about these systems have actually used them for non-trivial
tasks. And for something as critical as Marathon, which is supposed to
handle... well... all my services, I'd rather be sure that the system is rock
solid.

(This is specifically about Marathon. Mesos itself has proven more reliable.)

~~~
steve0ps
I've been running Marathon in production (real production) to power more than
100 applications for the past six months. I chose it because it seemed like
the most stable thing at the time; however, I quickly found it was not
production-ready. While many of the original issues I encountered in 0.7.x
were resolved with the 0.8.x release, 0.8.x brought new issues such as stuck
deployments. Additionally, I have found the upgrade path to be obtrusive
and frankly scary. I am actively moving away from Marathon because of these
issues.

Marathon does not make using Docker or building microservices simple. There
are many important pieces that Marathon does not provide. Sure, your operations
team can tie in Mesos-DNS / Bamboo / Consul / whatever else, but it's going to
take time, requires a specialized team, and leaves you feeling nervous about
what happens if everything crashes in the middle of the night. Even when tying
in these third-party tools, it is likely you will have to make significant
code updates to use features such as service discovery / SRV records. You
will inevitably end up with a cobbled-together system that needs serious
support from your operations team.

I am fairly frustrated with Mesosphere as a whole, and expected more from a
company that raised so much capital.

~~~
nemothekid
I wouldn't expect Marathon to do service discovery for you, as I believe that
is better left to something like Mesos-DNS/Consul, which Marathon can supervise
for you, and Docker integration has been fairly simple.

In any case, I found my Marathon was not without issues, like failover causing
every application to restart (I think this was fixed in 0.8.2), or the fact
that Marathon tends to use 2x as much RAM as ZooKeeper or Mesos-Master (I run
all three on the same node).

Have you seen the Apache Aurora project? It solves the same problems as
Marathon, and its creators claim it was built with stability in mind. I
originally chose Marathon because JSON configuration over REST was easier to
wrap my head around, but is Aurora something you tried, and how did it work
for you?
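
For reference, the JSON-over-REST style in question: a minimal Marathon app
definition of the sort you POST to its /v2/apps endpoint (the image name,
sizes, and ports here are purely illustrative):

    {
      "id": "/my-service",
      "instances": 2,
      "cpus": 0.5,
      "mem": 256,
      "container": {
        "type": "DOCKER",
        "docker": {
          "image": "nginx:1.7",
          "network": "BRIDGE",
          "portMappings": [{ "containerPort": 80 }]
        }
      }
    }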

------
jacques_chester
Factual have done what lots of people do, which is invent the first 20% of a
PaaS.

PaaSes are awesome. They also, once you go past the basics, require enormous
engineering effort. And that's the problem: engineering effort spent on
curating your own homegrown PaaS is engineering effort not available for
creating user value.

5 years ago rolling your own was a source of competitive advantage. Today you
can get an installable PaaS (Cloud Foundry or OpenShift) off the shelf and run
it. In 2 years Docker, Mesos and CoreOS will probably all have PaaSes of their
own.

Interesting times.

~~~
samkone
I know two companies running running openshift on Mesos. The reason modern
Paas aren't enough, is the factat some scale you're not just running web
services. But also rather complex data pipelines involving distributed data
systems like kafka, spark, cassandra, etc .. and a simple paas has issues
handling those workloads. That's where Mesos shines. As for the user value, I
consider having an uptime system, providing reliable and intelligent service
based on data processing, by efficiently using your resources is has an
important business value.

~~~
jacques_chester
I find Mesos the most interesting because it's a kind of mirror image of
Lattice[1], particularly Diego[2][3][4]. Both of them push intelligence out to
the edges, but in different ways.

Diego pushes the scheduling problem out to the executors themselves through an
auction mechanism. Mesos delegates it back to the requestors, as I understand
it.

The way it was described to me by Onsi Fakhouri is that Diego is "demand-
driven" and Mesos is "supply-driven". Diego grew from the lessons learnt on
Cloud Foundry v2, so it favours fast placement of requests over perfect fit.
But in the face of the network fallacies, it probably turns out to be a good
approach anyhow.

By way of warning, it has been a few months since I read the relevant papers
and my memory is fuzzy.

Edit: and it looks as though the auction mechanism was moved away from. Hm.

[1] [http://lattice.cf/](http://lattice.cf/)

[2] [https://github.com/cloudfoundry-incubator/diego-design-notes](https://github.com/cloudfoundry-incubator/diego-design-notes)

[3]
[https://www.youtube.com/watch?v=1OkmVTFhfLY](https://www.youtube.com/watch?v=1OkmVTFhfLY)
(an excellent overview)

[4]
[https://www.youtube.com/watch?v=SSxI9eonBVs](https://www.youtube.com/watch?v=SSxI9eonBVs)
(an excellent update)

------
copsarebastards
To be honest, I still have yet to see one of these systems that beats simply
using Bash. They're all trying to make a scripting problem into a
configuration problem. That's sometimes a reasonable idea, when what you're
doing is common enough that only a few things here and there need to diverge
from the defaults. But _every_ image I've ever had to create contained far
more edge cases than default cases, and half of those edge cases are things
nobody thought of and therefore their system doesn't handle them. Rather than
trying to fight with one of these systems to get them to do something I could
easily do with a few Bash commands, I find it easier to just script the setup
in Bash to begin with.

Obviously that doesn't work for Windows systems.

~~~
tomjen3
In that case you might want to look at Ansible: it does exactly what such a
system should (logs in with SSH and runs some scripts) but does it in a smart
way (e.g. you can configure certain boxes to be web servers and have it run a
script on all web-server boxes). It does have a somewhat weird config format,
but it does also allow you to run scripts.
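
For instance, a minimal (hypothetical) playbook that runs a setup script only
on the hosts you've grouped as web servers might look like:

    # site.yml -- hosts in the [webservers] inventory group get the script
    - hosts: webservers
      tasks:
        - name: run web setup script
          script: setup_web.sh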

~~~
vezzy-fnord
I'm quite curious whether, if we had more robust shell languages, like
Inferno's es, which uses functional expressions (though it's some 20+ years
old by now, kept alive by some enthusiasts), much of the rationale behind the
more basic configuration management and provisioning would go away.

The chief rationale for embarking on the CM route is idempotence. Though it
seems to me having a basic, lexer-only language simply for chain-loading
commands to exec() without much of the state and environment baggage behind
full command shells could work as an alternative. execline works like this:
[http://skarnet.org/software/execline/](http://skarnet.org/software/execline/)
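
A sketch of that chain-loading style (the shebang path and directories here
are assumptions; execline install layouts vary):

    #!/command/execlineb -P
    # Each program execs into the next; no shell state carries over.
    foreground { echo "configuring" }
    cd /srv/app
    ./run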

------
justizin
This is poetic:

    "Kubernetes has a Clintonesque inevitability to it"

:-P

