Docker: it's great if you have no state. But then, if you have no state, shit is easy. Mapping images to high-speed storage securely and reliably is genuinely hard (unless you use NFSv4 and Kerberos).
Mesos is just overkill for everything. How many people actually need shared-image programs bigger than a 64-core machine with 512 GB of RAM? (And how good are you at juggling NUMA or NUMA-like interfaces?)
I can't help thinking that what people would really like is just a nice, easy-to-use distributed CPU scheduler. Fleet, basically, just without the theology that comes with it.
Seriously, mainframes look super sexy right now. Easy resource management, highly scalable real UNIX. (no need to spin up/down, just spawn a new process)
That's definitely an opinion. :)
I have seen Mesosphere deployed with great success.
As for state, this is one reason I'm not crazy about CoreOS - I feel more comfortable containerizing the application tier than the data tier, though both are certainly possible.
I'm really not eager to replace a highly tuned MySQL or Postgres machine with a container environment experiencing several levels of abstraction and redirection. I get frustrated enough trying to align partitions with block boundaries through RAID controllers.
But if you have 20 front-end app servers and 5 machines that run cron jobs, container services can help you utilize your capacity much better. I can't count how many times I've worked somewhere where we desperately needed capacity, but didn't have the budget to expand until we cleaned up a bunch of machines that were vastly underutilized.
Anyway, Mesosphere isn't perfect, and I've only used it moderately, but there's a lot of tooling out there that we can use.
Definitely agree on the weird theology of fleet, but also, more generally, it just doesn't do enough for me. It's way too much fucking trouble to say, "Run an HTTP proxy on each physical machine".
Basically, everyone is racing back to PaaSes. Heroku pioneered it and are still out there. Red Hat have OpenShift and are making noises about turning it into a Docker+Kubernetes thing in version 3. Cloud Foundry has been around for a few years now. There are other also-rans.
The thing is that apart from Heroku, you've not heard of installable PaaSes because they're being pitched to the Fortune 500s.
I've worked on Cloud Foundry and I work for the company which donates the most effort to the Foundation. It's been surreal to watch other people introduce pieces of a PaaS and see the excitement about the pieces. Meanwhile, we literally have an entire turnkey system already. If you need a full PaaS -- push your app or service and have it running in seconds, with health management, centralised logging, auto-placement, service injection, the works -- we built it already. Free and opensource, owned by an independent Cloud Foundry Foundation.
Anyhow, I'm obviously biased, YMMV etc etc. But I'd play with Lattice, to get the hang of things.
They've already decided that, and have even reached code-freeze. Their conference is later this month, so that's when it's going to all be announced and rollout plans detailed.
That's why I built ShutIt, which we've used to encapsulate complex legacy environments to produce stateless builds:
For example, teams can have a development environment (with _everything_ in it) rebuilt daily. As everyone uses it, everyone curates it, and they're all talking about the same thing - one pet if you like, rather than n, where n is the number of developer/development envs.
You mean hide ugly sprawling uninstallable messes of rube goldberg code? :)
I'm referring to things that require PHP+MySQL, Node, Redis, and an old JVM process running Struts/Spring, all managed by nginx, except for that one situation where Apache2 .htaccess semantics are required for rewrite rules, in which case it runs Apache proxied by nginx.
A great example is the Chef server RPM. It is a 500 MB mini-distribution in one package. It has copies of Perl, Python, Ruby, and Erlang in it. If any of these has a security vulnerability, I have to wait on the maintainer to release a new version and hope it includes the security fixes.
They also tend to include things like Python header files for no reason. You wouldn't compile against an Omnibus package, but they are there anyway. Examples of this are Sumo Logic's and Datadog's agents.
The result is that their forum software requires another operating system to run. Had they been more disciplined in their development approach, Docker would have been merely a convenient way to test Discourse, and not the only supported option.
So I want to plug a project I've been contributing to: https://github.com/CiscoCloud/microservices-infrastructure
We're trying to make it super easy to deploy these tools. For example, every time you launch a Docker container, it will register with Consul and be added to HAProxy. The nice thing about using Mesos is we can support data workloads like Cassandra, HDFS, and Kafka on the same cluster you run Docker images on.
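To make that concrete, here's a minimal sketch of roughly what that registration step amounts to, using Consul's HTTP agent API. The service name, address, port, and health-check path are illustrative; the project wires this up for you, so this is just to show the moving parts:

    # Register a freshly launched container with the local Consul agent
    # so consul-template/HAProxy can pick it up. Values are illustrative.
    import json
    import urllib.request

    def register_service(name, address, port, consul="http://127.0.0.1:8500"):
        payload = {
            "ID": "%s-%d" % (name, port),
            "Name": name,
            "Address": address,
            "Port": port,
            # Health check so Consul drops the instance when it stops answering.
            "Check": {"HTTP": "http://%s:%d/health" % (address, port),
                      "Interval": "10s"},
        }
        req = urllib.request.Request(
            consul + "/v1/agent/service/register",
            data=json.dumps(payload).encode(),
            method="PUT",
        )
        urllib.request.urlopen(req)

    register_service("web", "10.0.0.12", 8080)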
We use Terraform to deploy to multiple clouds, so you don't get locked into something like CloudFormation.
It still requires work to go from zero to production-quality stack, of course.
Kubernetes may eventually spread out beyond Docker, but for today we need to support things like Kafka and Spark.
As others have noted, we've had things like Cloud Foundry, OpenShift and Heroku, and these all-in-one frameworks tend not to extend outside their original domain.
I see a lot of intro-level tutorials, but almost nothing on the more advanced side.
My (completely casual) experience with Marathon is pretty bad, with the main process crashing quite regularly even under no load, so I'm wondering if people who write about these systems have actually used them for non-trivial tasks. And for something as critical as Marathon, which is supposed to handle... well... all my services, I'd rather be sure that the system is rock solid.
(This is specifically about Marathon. Mesos itself has proven more reliable)
Marathon does not make using Docker or building microservices simple. There are many important pieces that Marathon does not provide. Sure, your operations team can tie in Mesos-DNS / Bamboo / Consul / whatever else, but it's going to take time, requires a specialized team, and leaves you feeling nervous about what happens if everything crashes in the middle of the night. Even when tying in these third-party tools, it is likely you will have to make significant code updates to use features such as service discovery / SRV records. You will inevitably end up with a cobbled-together system that needs serious support from your operations team.
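To give a feel for the kind of code change I mean: once something like Mesos-DNS is in place, services stop using hard-coded host:port pairs and start resolving SRV records instead. A rough sketch, assuming the dnspython package and Mesos-DNS's default marathon.mesos domain (the service name here is made up):

    # Look up one instance of a Marathon task via Mesos-DNS SRV records.
    import random
    import dns.resolver  # pip install dnspython

    def lookup_service(name, domain="marathon.mesos"):
        answers = list(dns.resolver.resolve("_%s._tcp.%s" % (name, domain), "SRV"))
        # Pick one instance at random; a real client would cache and retry.
        rr = random.choice(answers)
        return str(rr.target).rstrip("."), rr.port

    host, port = lookup_service("web")
    print("talking to %s:%d" % (host, port))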
I am fairly frustrated as a whole with Mesosphere, and expected more from a company that raised so much capital.
In any case, I found my Marathon was not without issues, like failover causing every application to restart (I think this was fixed in 0.8.2), or the fact that Marathon tends to use twice as much RAM as ZooKeeper or the Mesos master (I run all three on the same node).
Have you seen the Apache Aurora project? It solves the same problems as Marathon, and its creators claim it was built with stability in mind. I originally chose Marathon because JSON configuration over REST was easier to wrap my head around, but is this something you tried, and how did it work for you?
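For anyone unfamiliar, the JSON-over-REST workflow is basically: write an app definition and POST it to Marathon's /v2/apps endpoint. A minimal sketch (the Marathon URL, image, and resource numbers are made up for illustration):

    # POST a Docker app definition to Marathon's REST API.
    import json
    import urllib.request

    app = {
        "id": "/web",
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": "nginx:1.9",
                "network": "BRIDGE",
                "portMappings": [{"containerPort": 80, "hostPort": 0}],
            },
        },
        "cpus": 0.25,
        "mem": 128.0,
        "instances": 3,
    }

    req = urllib.request.Request(
        "http://marathon.example.com:8080/v2/apps",
        data=json.dumps(app).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    print(urllib.request.urlopen(req).read().decode())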
It has worked remarkably well and allowed us to scale up/down during peak hours or unexpected high traffic peaks.
There were indeed issues with the 0.7.x series of Marathon, but we've made a big effort to focus on stability and performance in 0.8.x, and onward. As with any new software project, there tend to be issues in early releases.
PaaSes are awesome. They also, once you go past the basics, require enormous engineering effort. And that's the problem: engineering effort spent on curating your own homegrown PaaS is engineering effort not available for creating user value.
5 years ago rolling your own was a source of competitive advantage. Today you can get an installable PaaS (Cloud Foundry or OpenShift) off the shelf and run it. In 2 years Docker, Mesos and CoreOS will probably all have PaaSes of their own.
Diego pushes the scheduling problem out to the executors themselves through an auction mechanism. Mesos delegates it back to the requestors, as I understand it.
The way it was described to me by Onsi Fakhouri is that Diego is "demand-driven" and Mesos is "supply-driven". Diego grew from the lessons learnt on Cloud Foundry v2, so it favours fast placement of requests over perfect fit. But in the face of the network fallacies, it probably turns out to be a good approach anyhow.
By way of warning, it has been a few months since I read the relevant papers and my memory is fuzzy.
Edit: and it looks as though the auction mechanism was moved away from. Hm.
 https://www.youtube.com/watch?v=1OkmVTFhfLY (an excellent overview)
 https://www.youtube.com/watch?v=SSxI9eonBVs (an excellent update)
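If it helps, here's a toy model of the demand-driven auction idea, just to show the shape of it. This is not Diego's actual algorithm or code; the cells, scoring, and resource figures are invented for the example:

    # Toy auction: each cell bids on incoming work based on its free capacity,
    # and the work is placed on the best bidder.
    class Cell(object):
        def __init__(self, name, free_mem, free_disk):
            self.name, self.free_mem, self.free_disk = name, free_mem, free_disk

        def bid(self, mem, disk):
            """Return a score, or None if this cell can't fit the work."""
            if mem > self.free_mem or disk > self.free_disk:
                return None
            # Prefer the emptiest cell so load spreads out.
            return (self.free_mem - mem) + (self.free_disk - disk)

    def place(cells, mem, disk):
        bids = [(c.bid(mem, disk), c) for c in cells]
        bids = [(score, c) for score, c in bids if score is not None]
        if not bids:
            raise RuntimeError("no cell can fit this work")
        _, winner = max(bids, key=lambda b: b[0])
        winner.free_mem -= mem
        winner.free_disk -= disk
        return winner.name

    cells = [Cell("cell-1", 4096, 10240), Cell("cell-2", 2048, 10240)]
    print(place(cells, mem=512, disk=1024))  # -> cell-1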
Obviously that doesn't work for Windows systems.
The chief rationale for embarking on the CM route is idempotence. Though it seems to me that a basic, lexer-only language simply for chain-loading commands into exec(), without much of the state and environment baggage behind full command shells, could work as an alternative. execline works like this: http://skarnet.org/software/execline/
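To illustrate the chain-loading idea (execline itself is a set of tiny C programs; this is just the concept sketched in Python): each link consumes its own arguments, makes one change, and exec()s into the rest of its argv, so no shell state carries over between links.

    # Toy chain-loader: setvar.py NAME VALUE prog [args...]
    # Sets one environment variable, then replaces itself with the next command.
    import os
    import sys

    def main():
        name, value = sys.argv[1], sys.argv[2]
        rest = sys.argv[3:]
        os.environ[name] = value
        os.execvpe(rest[0], rest, os.environ)  # this process becomes the next link

    if __name__ == "__main__":
        main()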
"Kubernetes has a Clintonesque inevitability to it"