
Mesosphere Announces First Data Center OS and $36M in Funding - preillyme
https://gigaom.com/2014/12/07/mesospheres-new-data-center-mother-brain-will-blow-your-mind/
======
brandonb
Congrats to the team! Ben was one of the most brilliant students in my
undergrad class and it's awesome to see him employ his talents in such a big
way.

I worked on healthcare.gov last year and it's hard to overstate the potential
impact of a tool like DCOS. At one point, we had 2000+ VMs, most manually
configured, with no monitoring, and completely different configs between dev,
test, and prod (not intentionally). Straightforward operations like migrating
half of the database servers from one VLAN to the other took months, small
mistakes like changing a database password could result in hours-long outages,
and simply getting the data that Mesosphere displays automatically would
sometimes take weeks and other times simply be impossible.

Of course, clean devops hygiene would have eliminated much of the pain in the
first place, but not every organization has the expertise to do things right.
In fact, most don't, and the solution for most organizations is good tooling
that automates as much of the system as possible and provides good development
discipline for the rest.

~~~
23david
Having 2000+ VMs without automation or monitoring in place is pretty unusual.
Especially for a site like healthcare.gov which must have had all sorts of
HIPAA and data security requirements.

But it's honestly not hard to hook all those machines up to a system like
Chef/Puppet/Saltstack/Ansible and start automating common tasks within a few
days.

Migrating databases between networks and rotating passwords would generally be
outside of the scope of a tool like Mesosphere. Again, this is something that
can be easily handled by existing automation tools. With most databases
though, there's nothing straightforward about migrating data to nodes on
different vlans or password rotations. If it's a one-time task, I recommend
hiring a database consulting firm to do the migration or rotation.

I think that having good defaults and enforcing best practices is a good idea.
But I think that a lot of this can be achieved with existing tools. IMO, it
makes more sense for organizations to automate and orchestrate Mesos
deployments via existing/mature DevOps tools like
Chef/Puppet/Ansible/Saltstack. Would also be exciting to see deployments
working via NixOps/NixOS.

------
KaiserPro
Older person here.

What's nice is that people are thinking about supercomputers again, even if
they insist on calling them "the cloud".

First things first, look up beowulf
([http://en.wikipedia.org/wiki/Beowulf_cluster](http://en.wikipedia.org/wiki/Beowulf_cluster))
which is a suite of tools that implements a multi-machine scheduler and a
message-passing interface. What's nice is that if one host is overloaded, it can
migrate the process to another. (however I'm not sure what performance is like
nowadays)

In the world of VFX for movies, we've been dealing with schedulers for years.
Programmes like alfred, tractor, qube and deadline can dispatch tasks, and
deal with dependencies at massive scale.

The first thing to note is that "DCOS" is really a discrete set of parts; a
scheduler, machine state enforcement, network config, Storage, and the
underlying OS.

With careful planning, the state enforcement tool (puppet and the like) can
take care of all of these tasks except a global scheduler.

The beauty of the VFX schedulers is that they understand dependencies really
well. (I need x to complete before I run y, I need feature z to run) A lot of
newer schedulers really don't understand this concept well.
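
To make the "x before y" idea concrete, here is a minimal sketch of dependency-aware dispatch in Python. It is not how alfred, tractor, qube or deadline work internally; the job names and the `dispatch_order` helper are made up for illustration, and it assumes Python 3.9+ for the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter


def dispatch_order(deps):
    """Group tasks into batches; every task in a batch has all of its
    dependencies finished, so the whole batch could be dispatched at once.

    `deps` maps each task name to the set of tasks it must wait for.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)  # pretend the batch finished, unblocking dependents
    return batches


# Hypothetical VFX-style job graph: task -> tasks it depends on.
jobs = {
    "simulate": set(),
    "render": {"simulate"},
    "denoise": {"render"},
    "composite": {"render"},
}

for batch in dispatch_order(jobs):
    print(batch)
```

A real scheduler would also match each task against host features (the "I need feature z to run" part) before dispatching it; that is left out here.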

~~~
KaiserPro
It's really important to understand that puppet and the like cannot (without
heavy engineering) act as a task dispatcher. The big feature of "DCOS" is task
distribution.

In Linux terms it's like comparing the CPU scheduler to chmod. Yes you can make
a program run by chmoding a file to +x, but it's the scheduler that is
responsible for making sure the programme has CPU time.
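
As a rough illustration of what "task distribution" means here, the sketch below places tasks on whichever host still has spare CPU. It is a toy first-fit placer, not Mesos's actual mechanism (Mesos hands resource offers to frameworks, which then decide what to run); the `Host` class, the `place` function and the node names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    free_cpus: float
    tasks: list = field(default_factory=list)


def place(tasks, hosts):
    """First-fit placement: put each (task_name, cpus) pair on the first
    host with enough free CPU. Returns the names of tasks that fit nowhere."""
    unplaced = []
    for name, cpus in tasks:
        for host in hosts:
            if host.free_cpus >= cpus:
                host.free_cpus -= cpus
                host.tasks.append(name)
                break
        else:
            # No host had room for this task.
            unplaced.append(name)
    return unplaced


hosts = [Host("node-a", 2.0), Host("node-b", 4.0)]
unplaced = place(
    [("web", 1.5), ("db", 3.0), ("batch", 1.0), ("big", 8.0)],
    hosts,
)
for host in hosts:
    print(host.name, host.tasks)
print("unplaced:", unplaced)
```

A config management tool can install and configure the thing that runs this loop, but it does not make the placement decision itself, which is the distinction being drawn above.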

~~~
23david
Task distribution primitives are handled by Mesos and implemented in the
various apps running on Mesos (Marathon, Chronos, etc.)

It seems inaccurate to call this a feature of DCOS.

Puppet and the like can easily configure and manage Mesos and the various apps
running on top, giving all the task dispatch functionality goodness.

DCOS will hopefully be a great integrated Mesos distribution. Seems to me that
by supporting machine provisioning and having a per-node licensing model, it's
being positioned as a direct competitor to OpenStack and VMware.

------
kelseyhightower
Disclosure: CoreOS employee; Kubernetes contributor.

After reading the title I was a bit tempted to call BS since I think
Kubernetes should have the rights to "The first Datacenter OS" tagline[1].
However, after reading up on the project details at
[https://mesosphere.com/learn](https://mesosphere.com/learn) I can see how
Mesosphere came to the conclusion of being a DCOS, if not necessarily the
first. Mesosphere goes a bit further than Kubernetes and offers a solution to
the storage problem and attempts to address other "userland" concerns by
shipping Apache Spark, Cassandra, Kafka, and Hadoop. So maybe it would be more
accurate to call this a datacenter distro on top of the Kubernetes kernel?

Regardless, I think the concept of a datacenter OS will be the key to
commoditizing IaaS providers and leveling the playing field in terms of
features and usability for those who have not given up on the dream of running
a "private" cloud.

Why will the DCOS work where others have failed?

Current solutions aimed at taming the datacenter operate at the machine/VM
level, which exposes the OS for each machine, and completely punts on the
application. Guess who gets to stitch it all back together? A DCOS is designed
to manage applications directly, commonly via application containers, which
means we can treat the OS running on the underlying machines like firmware and
limit our interactions to basic updates and minimal configuration -- think
CoreOS.

What about PaaS?

That's a topic worthy of a lengthy discussion, but I think it boils down to
the lack of control found in most PaaS platforms[2]. In order for a PaaS
offering to be successful it must make opinionated decisions about how to
deploy and run applications; a bit too inflexible for most people. On the
other hand, a DCOS seems to hit the sweet spot between IaaS and PaaS.

[1] I'm sure you can make an argument for Joyent's SmartDataCenter
([https://www.joyent.com/private-cloud](https://www.joyent.com/private-cloud))
as well. [2] Deis ([http://deis.io/overview](http://deis.io/overview))
attempts to address this issue.

~~~
Zariel
Isn't it running on top of the Mesos kernel, which has been around for longer
than Kubernetes?

~~~
presspot
A bit of the history:

The Mesosphere DCOS is built around the Apache Mesos kernel.

The Mesos kernel was developed at UC Berkeley in 2009 [1].

Spark was written as a sample app on top of it [2].

Ben Hindman and his colleagues at the UC Berkeley AmpLab had always envisioned
Mesos as a kernel inside of a full-blown operating system [3]. They finally
brought it to market.

[1]
[https://www.usenix.org/legacy/event/nsdi11/tech/full_papers/...](https://www.usenix.org/legacy/event/nsdi11/tech/full_papers/Hindman.pdf)

[2] "We have implemented Mesos in 10,000 lines of C++. The system scales to
50,000 (emulated) nodes and uses ZooKeeper for fault tolerance. To evaluate
Mesos, we have ported three cluster computing systems to run over it: Hadoop,
MPI, and the Torque batch scheduler. To validate our hypothesis that
specialized frameworks provide value over general ones, we have also built a
new framework on top of Mesos called Spark, optimized for iterative jobs where
a dataset is reused in many parallel operations, and shown that Spark can
outperform Hadoop by 10x in iterative machine learning workloads." ibid.

[3]
[http://people.csail.mit.edu/matei/papers/2011/hotcloud_datac...](http://people.csail.mit.edu/matei/papers/2011/hotcloud_datacenter_os.pdf)

~~~
Zariel
Yep, also had input from Google around the time they were deploying (or
building) Omega, their new scheduler to replace Borg.

------
michaelsbradley
So, the commoditization of on-demand and highly scalable virtual computing
infrastructures, together with the rising popularity of "containerization" for
app and service composition, seems to be creating an "orchestration crisis",
or an "orchestration business opportunity," depending on your vantage point.

Are we about to see the emergence of what might be termed "new wave mainframe"
computing?

~~~
randomsearch
> be creating an "orchestration crisis"

Spot on, and very well put.

The problem with existing orchestration tools, and tools like chef, puppet,
etc. is that they're all a bit piecemeal and complicated. What we need is a
step up the abstraction hierarchy, and some standardisation.

We'll probably know when we've got there if companies no longer have any real
idea of how many servers or VMs from different providers that they're
utilising. They'll just know which applications they're running and how much
it's costing them.

------
larryweya
I'm a recent mesos convert but I think "first" is a tad inaccurate if you
consider Joyent's SmartOS and its recently open-sourced Smart Data Center.

~~~
presspot
There are a lot of components to an operating system. It's not just the
technology components, it's the product components and the business
components. E.g., Does it have an API? Does it have an SDK? Does it have a
user interface? Does it have an init system, a cron, a storage system,
service discovery? Does it have an ecosystem of third party developers? I
posit that the OS Checklist is fairly long and that none of the other systems
you mention have the complete OS package.

~~~
dmpk2k
_I posit that the OS Checklist is fairly long and that none of the other systems
you mention have the complete OS package._

They do.

E.g. SmartOS is a UNIX. It'd be pretty odd if it didn't have an _init system_.
Or storage. Or cron. Or POSIX. Or whatever else you'd like to add to the list
that UNIX systems usually have...

~~~
justincormack
I assumed the OP meant a distributed init system, distributed storage, etc.

------
hendzen
Mesosphere should really evangelize libprocess [0] more. Probably one of the
cooler C++ libraries out there.

[0] -
[https://github.com/apache/mesos/tree/master/3rdparty/libproc...](https://github.com/apache/mesos/tree/master/3rdparty/libprocess)

~~~
corysama
OK. But, what is it? I tried reading the documentation, but all it said was
"readme: this is the readme for libprocess."

~~~
adamnemecek
"Libprocess is a library written in C/C++ that provides an actor style
message-passing programming model that leverages efficient operating system
event mechanisms. Libprocess is very similar to Erlang's process model,
including basic constructs for sending and receiving messages. I'm excited
about giving people an opportunity to use this software, so look for lots more
details to be added here shortly!"

[http://www.eecs.berkeley.edu/~benh/libprocess/](http://www.eecs.berkeley.edu/~benh/libprocess/)
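
For anyone unfamiliar with the actor style that description refers to, here is a minimal Python sketch of the general idea: each actor owns its state and only touches it while draining its own mailbox. This is not libprocess's API (libprocess is C++); the `Actor` class and its method names are invented for illustration.

```python
import queue
import threading


class Actor:
    """A toy actor: private state plus a mailbox drained by one thread."""

    def __init__(self):
        self.mailbox = queue.Queue()
        self.received = []  # only ever mutated by the actor's own thread
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def send(self, message):
        """Asynchronous send: enqueue the message and return immediately."""
        self.mailbox.put(message)

    def _run(self):
        # Receive loop: handle one message at a time, in arrival order.
        while True:
            message = self.mailbox.get()
            if message is None:  # sentinel used here to mean "shut down"
                break
            self.received.append(message)

    def stop(self):
        """Ask the actor to shut down and wait for its thread to finish."""
        self.send(None)
        self._thread.join()


actor = Actor()
for n in range(3):
    actor.send(("ping", n))
actor.stop()
print(actor.received)
```

Because messages are processed one at a time, the actor needs no locks around its own state, which is the main appeal of the model.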

------
cookrn
A well discussed and related story from a few days ago:
[https://news.ycombinator.com/item?id=8694940](https://news.ycombinator.com/item?id=8694940)

~~~
preillyme
But this is our official announcement of the DCOS project. The other post was
about Ben's ideas that helped drive the creation.

~~~
throwaway892348
I don't think there was any confusion but thanks for making sure it stays that
way.

------
bc1323
The DCOS project looks amazing. The command line interface looks like a heroku
toolbelt for your very own servers. Cool server usage visualizations too.

------
tomcart
DCOS is an interesting description, as the idea of a data centre (to my tiny
mind at least) is made more fuzzy by concepts like AWS AZs.

Do people expect that the 'DC' will span AZs, regions even? Or is the
separation of these things valuable in some way?

How about the idea of dev vs prod environments? Will the isolation provided be
strong enough that we'll happily drop everything onto a single cluster of
machines?

------
dang
We changed the url from
[http://techcrunch.com/2014/12/07/mesosphere-releases-first-d...](http://techcrunch.com/2014/12/07/mesosphere-releases-first-data-center-os-and-announces-36m-in-funding)
because this one is a somewhat more
substantive article (though not the title). Via
[https://news.ycombinator.com/item?id=8715055](https://news.ycombinator.com/item?id=8715055).

~~~
23david
lol. mind blown.

"Mesosphere Announces First Data Center OS And $36M In Funding" - Techcrunch

"Mesosphere’s new data center mother brain will blow your mind" - GigaOM

