
Introduction to Apache Mesos - aant
http://www.antonlindstrom.com/2015/03/29/introduction-to-apache-mesos.html
======
CoffeeDregs
As noted in another comment, we're a growing startup with 5 people working on
software out of 15 people total, and we run a Mesos cluster with a couple of hundred
machines. By FAR our largest challenge has been to adapt our thinking to break
the "this machine is 'production', that machine is 'staging'" mindset. We have
a production compute infrastructure and you're welcome to launch 'production',
'staging' or whatever jobs into it. The friction around "running Mesos" has
mostly been the friction of the air from exhalations of joy buffeting our
esophagi...

We have a separate, much smaller cluster to test new Mesos (and Chronos and
Marathon) versions. But the distinctions "production", "staging" and "dev"
have become much more nuanced, so we've settled on discussing the
"application" environment versus the "infrastructure" environment. Much as, as
a startup on AWS, you wouldn't distinguish the RDS instance of your database
(e.g. 9.3.1), you would distinguish the version of your database on an RDS
instance, we distinguish the versions of our apps on the production cluster
and not the version of the production cluster. One of the team was an ex-
Googler and he said that Google did much the same.

The one thing Marathon and Chronos currently lack is a prioritization
mechanism so we're building that as a Chronos task that monitors and scales
down/up Marathon tasks by their priority (as represented in their id or tag).
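A minimal sketch of the kind of priority-based scaler described above, assuming the priority lives in a label on each app. The function, field names, and thresholds here are all hypothetical; the actual scale-down would be a `PUT /v2/apps/<app_id>` to Marathon with the new instance count.

```python
def plan_scaling(apps, cpus_to_free):
    """Decide new instance counts, shedding lowest-priority apps first.

    `apps` is a list of dicts with hypothetical fields: id, priority
    (lower number = less important, shed first), instances, and cpus
    per instance. Returns {app_id: new_instance_count} for apps touched.
    """
    plan = {}
    freed = 0.0
    for app in sorted(apps, key=lambda a: a["priority"]):
        instances = app["instances"]
        # Drop instances one at a time until we've freed enough CPU.
        while instances > 0 and freed < cpus_to_free:
            instances -= 1
            freed += app["cpus"]
        # In the real Chronos task you would now send each new count to
        # Marathon: PUT /v2/apps/<app_id> with body {"instances": N}.
        plan[app["id"]] = instances
        if freed >= cpus_to_free:
            break
    return plan
```

Scaling back up when capacity returns would be the same walk in reverse, highest priority first.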

~~~
KMag
> The friction around "running Mesos" has mostly been the friction of the air
> from exhalations of joy buffeting our esophagi...

Is there some common phrase relating belching and being happy that I haven't
heard?

~~~
CoffeeDregs
Not that I know of, but it'd be awesome to have a word for joyous belching...

------
rckrd
An interesting talk about Docker and Mesos by a former coworker, who also
contributed to 0.22:

[https://www.youtube.com/watch?v=ZZXtXLvTXAE](https://www.youtube.com/watch?v=ZZXtXLvTXAE)

------
wmf
Maybe this is a good place to ask a question that I've been pondering for a
while: Why is Mesos based on the concept of resource offers? AFAIK this is
backwards compared to other schedulers where you ask for resources and the
scheduler gives them to you (or not).

~~~
sabraham
To put the logic of deciding which resources it wants into the framework
(application). The scheduler shouldn't care about what you get, just about
fairness. Two-tiered scheduling achieves this:

 _Mesos decides how many resources to offer each framework, while frameworks
decide which resources to accept and which computations to run on them._

[http://mesos.berkeley.edu/mesos_tech_report.pdf](http://mesos.berkeley.edu/mesos_tech_report.pdf)
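In code, the framework's half of that two-tier split is just a callback that inspects each offer and accepts or declines it. The real Mesos bindings deliver protobuf offers to a `resourceOffers` callback; the plain dicts below are a stand-in to show the shape of the decision:

```python
def resource_offers(offers, cpus_needed, mem_needed):
    """Framework side of two-tier scheduling: Mesos decides what to
    offer us; we decide what to take. Offer dicts are illustrative.
    Returns (accepted, declined) lists of offer ids."""
    accepted, declined = [], []
    for offer in offers:
        if offer["cpus"] >= cpus_needed and offer["mem"] >= mem_needed:
            accepted.append(offer["id"])
        else:
            # Declined resources go back into the pool for other frameworks.
            declined.append(offer["id"])
    return accepted, declined
```

The point is that placement knowledge (data locality, co-scheduling, etc.) stays inside the framework, while Mesos only has to reason about fairness.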

~~~
wmf
I guess that makes sense if frameworks care about placement (which I generally
don't) because otherwise they'd have to express placement constraints to
Mesos.

I still can't find documentation about what fairness policies are actually
implemented, though.

~~~
sabraham
Mesos uses dominant resource fairness:

[https://www.cs.berkeley.edu/~alig/papers/drf.pdf](https://www.cs.berkeley.edu/~alig/papers/drf.pdf)
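The core idea of that paper fits in a few lines: each framework's dominant share is its largest fractional use of any single resource, and the next offer goes to the framework with the smallest dominant share. A toy illustration (cluster totals and allocations made up):

```python
def dominant_share(allocated, total):
    """Dominant share = max over resource types of allocated/total."""
    return max(allocated[r] / total[r] for r in total)

def next_offer(allocations, total):
    """DRF offers resources to the framework with the lowest dominant share."""
    return min(allocations, key=lambda f: dominant_share(allocations[f], total))
```

So a CPU-heavy framework and a memory-heavy framework are each measured against the resource they stress most, which is what makes the policy strategy-proof in the paper's analysis.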

------
domenp
Allow me to take the opportunity and ask if anybody here is running
Elasticsearch with Mesos (and Marathon) in a Docker container?

I'm running Elasticsearch nodes on dedicated Mesos slaves and I'm still not
sure how much memory I should allocate to the Elasticsearch task, e.g. all
available memory as reported by Mesos or some smaller amount to leave
something for the system? Please note that I'm asking about the memory
allocated to the Marathon task, not about JVM heap size.

~~~
nemothekid
I've taken the view of treating our database slaves as special and allocating
all the memory on the slave to ES. That way ES (or any other db) always runs
on the right slave, and no other potentially long-running/resource-hungry
process runs alongside it (we leave just enough that Jenkins tasks and Chronos
tasks can still run if need be).

AFAIK, there isn't a right way to run databases because the persistent
storage layer in Mesos is still being baked.

~~~
dingdingdang
> the persistent storage layer into mesos is still being baked

That's a pretty important point; I would have thought persistent storage would
have been the first thing to get working for a project like Mesos. Also, their
homepage ( [http://mesos.apache.org/](http://mesos.apache.org/) ) outright
states: "Apache Mesos abstracts CPU, memory, storage, and other compute
resources away from machines [...]". What storage are they talking about if
not persistent storage?

~~~
nemothekid
This Mesos 0.22 talk goes into a bit more detail on what they are imagining -
[http://mesosphere.com/2015/03/27/mesos-0-22-0-released/](http://mesosphere.com/2015/03/27/mesos-0-22-0-released/)

Essentially, the current Mesos disk quotas just ensure that tasks run with a
minimum amount of disk space. I think what they are trying to accomplish in
future releases is some way you could build EBS-style disk management on top
of Mesos.

~~~
SEJeff
Yes, they are calling it dynamic reservations:

[https://issues.apache.org/jira/browse/MESOS-2018](https://issues.apache.org/jira/browse/MESOS-2018)

With a first draft of the user documentation at:

[https://gist.github.com/mpark/e8ee4eb9671bdb252c4f](https://gist.github.com/mpark/e8ee4eb9671bdb252c4f)

It will be really slick once this all makes it into Mesos 0.23.

------
andyidsinga
Question/thought: it seems like Mesos could evolve to be an alternative to
OpenStack, especially if a tenant layer is developed. Yes/no?

~~~
wmf
Probably. And OpenShift 3.0 (also on the front page today) is a PaaS (Cloud
Foundry competitor) built on Mesos.

~~~
josephjacks
OpenShift is most definitely not built on Mesos. RedHat's plans over time are
to integrate with Mesos at a lower level for fine grained resource allocation,
but for now Kubernetes is used as the cluster scheduling and orchestrations
runtime.

------
StavrosK
Okay, maybe I'm getting old, but I can't really see the point of Mesos. It
feels like it's adding too much magic to the mix, and I can't see myself using
it to deploy web app servers/databases. Is there a different intended purpose
that I'm missing? I suspect it's great for deploying task workers, for
example.

~~~
justrudd
You can use it for web apps/services
([https://mesosphere.com/docs/getting-started/service-discovery/](https://mesosphere.com/docs/getting-started/service-discovery/)).
But I'm with you on databases. I guess I just like knowing that those hosts
are being managed and cared for (pets vs. cattle).

With that said, I often wonder how many people are using Mesos/Marathon before
they have any need for it? Using it for 4 hosts vs. 40 or 400?

~~~
tekacs
There's nothing wrong with Mesos on 4 hosts, though...

Anything more than a single host requires additional co-ordination - picking
Mesos is mostly no different than rolling alternate methods for doing so.

It's not the choice with the least overhead, but it's a possibility for those
for whom it has the right conveniences...

------
lobster_johnson
I've looked at Mesos briefly, but it seems completely JVM-based, and if
you're not prepared to build your whole stack around the current Java stack
(Hadoop, Spark, Storm, Kafka, Zookeeper, etc.), Mesos has zero utility. Is
this accurate?

As someone working on scaling microservices, I keep being disappointed by
potentially useful services that turn out to require a JVM-based language such
as Java or Scala. For example, Kafka looks very decent, but the high-level
client is written in Java; if you're not on the JVM, you're stuck implementing
a lot of the client yourself. As far as I know (from the last time I looked at
this stuff), the Zookeeper client is similar, whereas Spark and Storm both
require that you write processing code on the JVM, and libhdfs is apparently
still JNI-based, not native.

For someone using Docker, is there anything competing with Mesos that _isn't_
wedded to Javaland?

~~~
CoffeeDregs
> [Not using Java -> ] Mesos has zero utility. Is this accurate?

Not at all. My company has a cluster with a few hundred Slaves and we're
mostly a Python shop, with some C++ for machine vision.

Mesos certainly has its issues (as does everything), but it's awfully nice
for micro-services: if you can package your service into a Docker container,
then you can launch it into the cluster and Chronos/Marathon/Mesos will take
care of making sure that it's running.

I missed the deadline for the Mesos conference (I was unaware of the
deadline), but I'm trying to squeak in a talk about "Using Mesos at [small]
Scale" because we're a small company and Mesos has allowed us to do a bunch of
big company stuff.

>is there anything competing with Mesos that isn't wedded to Javaland?

Yes, I can look at the source code and see that it uses the JVM (Chronos uses
Scala), but, AFAICT, Mesos isn't "wedded" to anything. All of the components
are API-driven. I apt-get install it, I run it, I send jobs to it, it works
and it behaves well. Better yet, I can poke at the APIs of any of the
services to find out what is happening. So we use Marathon for service
discovery and run Chronos, a framework, under Marathon. That makes finding
Chronos, which could be on any one of 200 machines, quick and easy.
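The "find Chronos through Marathon" trick above boils down to one API call: Marathon's `/v2/apps/<id>/tasks` endpoint reports which host and port each task landed on. A sketch of the lookup, with the app id and Marathon hostname invented for illustration and the HTTP call left as a comment so the parsing can stand alone:

```python
import json
# from urllib.request import urlopen

def task_endpoints(tasks_json):
    """Parse a Marathon /v2/apps/<id>/tasks response into host:port pairs."""
    doc = json.loads(tasks_json)
    return ["%s:%d" % (t["host"], t["ports"][0]) for t in doc["tasks"]]

# In practice, something like:
#   body = urlopen("http://marathon:8080/v2/apps/chronos/tasks").read()
#   print(task_endpoints(body))
```

Any service in the cluster can do the same lookup, which is why Marathon doubles as a crude service-discovery layer here.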

~~~
lobster_johnson
Thanks for the detailed reply!

I have to say I'm wary of big frameworks like this that insert themselves as a
kind of monolithic control structure for everything.

My ideal setup is always one where I pick and mix the best modules for the
job, and where I can write some interface glue to let my apps slot into the
system, as opposed to writing my apps for a specific API (as tends to be the
case with, say, Hadoop, and which of course would tie the whole platform to
that API, making it hard to migrate to something else).

Sounds like Mesos is pretty modular and open in that respect?

~~~
CoffeeDregs
Once you get to know it, Mesos is really quite simple: slaves emit events
about their capabilities and running processes; masters collect and distribute
an inventory of slaves and their processes and of events that occur on slaves.
The frameworks (i.e. Marathon for long-running jobs; Chronos for scheduled
jobs) then listen to those inventories/events and ask the Mesos master to add
and delete processes from slaves. So Mesos is quite modular.
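That flow can be condensed into a toy round trip: slaves report capacity, the master turns spare capacity into offers, and a framework asks to launch on one of them. All field names below are invented for illustration; only CPU is modeled:

```python
def make_offers(slaves, running):
    """Master side: free capacity per slave = capacity minus running tasks."""
    offers = []
    for sid, capacity in slaves.items():
        used = sum(t["cpus"] for t in running if t["slave"] == sid)
        free = capacity - used
        if free > 0:
            offers.append({"slave": sid, "cpus": free})
    return offers

def launch(offers, task_cpus):
    """Framework side: take the first offer big enough for the task."""
    for offer in offers:
        if offer["cpus"] >= task_cpus:
            return {"slave": offer["slave"], "cpus": task_cpus}
    return None  # no offer fits; wait for the next round of offers
```

Everything else (frameworks, executors, the HTTP APIs) is layered on top of this simple loop, which is what keeps Mesos modular.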

------
twoy
If you don't want to manage ZooKeeper manually to bootstrap a Mesos cluster,
you need another Mesos cluster, haha.

~~~
eropple
Or just use Exhibitor. I don't remember the last time I manually anything'd
with Zookeeper.

Between Exhibitor and Curator, I honestly find Zookeeper so straightforward
and easy to work with that I don't quite understand the popularity of etcd.

~~~
toomuchtodo
I'm running a small dockerized Exhibitor/Zookeeper cluster (5 nodes, several
hundred clients), and it's extremely straightforward. Etcd isn't what we'd
consider production-grade yet.

~~~
saryant
Having worked with etcd in production for the last few months, I have to
agree. The CoreOS stack needs some more time to marinate.

~~~
toomuchtodo
Thanks for this comment. Glad to know I made the right choice.

I'm not saying etcd won't ever overshadow Zookeeper, it probably will with the
momentum behind it, but as an ops guy, I wasn't willing to bet production
application service discovery on it.

~~~
eropple
My distaste for the Go community is pretty well-established in these parts; I
think worse-is-better is screwing us all, and etcd seems to me to be the
worse-is-better Zookeeper. And for things that don't _matter_ , sure, worse-
is-better your life away; a Rails app can be whatever you want, but the
infrastructure _I_ manage had better be bulletproof. I won't say etcd will
never be _competitive_ , but without some significant changes, I don't see it
getting my vote--and those changes are largely around the parts of the feature
set that etcd doesn't support, at which point...why use it, anyway?

------
preillyme
I'm curious which of the Marathon bugs caused your downtime?

~~~
aant
I'm afraid I don't remember exactly which one it was but after an upgrade of
the framework it worked perfectly again.

~~~
florianleibert
The 0.7X releases contained a number of bugs. 0.8X has been significantly
better!

