
Nomad, a cluster manager and scheduler - craneca0
https://www.hashicorp.com/blog/nomad.html
======
SEJeff
I'm genuinely not sure why they're trying to compete with the likes of Mesos
or Kubernetes, or what they're really trying to achieve here. There is simply
no way they'll build a community around Nomad half as big as either of the
aforementioned, even if the software is really good.

~~~
fapjacks
I'm sure that to a fan who "really freaking loves Mesos" it's hard to imagine
other software competing with Mesos, but Hashicorp writes excellent software
generally. Serf and Consul have completely sold me on Nomad, and I haven't
even used it yet. I think there will be a similar draw for many people who
have used Hashicorp's software before.

~~~
SEJeff
No, it isn't about koolaid; it's about technology. Consul, Vagrant, and Vault
are all _excellent_ Hashicorp technologies I've used. They're great, and some
of the best in the industry for their problem spaces.

You've got Mesos scaling to 10,000+ physical-node clusters today in production
at companies like Apple and Twitter. You've got Kubernetes being adopted and
developed by pretty much all of the open source heavyweights, and it came from
the experienced Google developers who built... Google. Kubernetes on top of
Mesos is basically the holy grail in my personal opinion, where you get the
best Ops (Mesos) story mixed with the best Dev (k8s) story. I guess we'll see
how much Nomad takes off :)

Hashicorp isn't a huge company; it seems to me their best bet is keeping the
focus relatively small so they can be the best at what they do. Even if Nomad
is a huge hit and is amazing, it still seems kind of sad that they couldn't
simply double down and help out with Kubernetes. I do find it ironic that they
talk about how Nomad is for microservices and then make a dig at the several
microservices that k8s is made up of.

~~~
Rapzid
I'm sort of surprised to hear this view of the situation, which is about the
complete opposite of the one I've held for the past few years. I've felt it's
a shame that Hashicorp has created Consul, Vault, a new Raft implementation
(which they got tons of flak for; then etcd made THEIR OWN rewrite, which
fixed a ton of issues this year...), Serf, etc., and nobody has adopted or
built on them. K8s's current secrets solution is a bit underwhelming TBH, as
an example.

Kubernetes lists support for 250-node clusters among its blocking features:
[https://github.com/kubernetes/kubernetes/blob/master/docs/ro...](https://github.com/kubernetes/kubernetes/blob/master/docs/roadmap.md#blocking-features)

------
cpitman
This one is interesting to me. I've been a big proponent of Hashicorp's other
tooling, but this seems like an area that other projects are already
addressing (and doing well in). Choice is great, but I think I would have
preferred it if they'd joined up with Kubernetes/Mesos/etc.

Also, their messaging seems a little disingenuous. Otto talks about how
important it is to support microservice development and deployment, but Nomad
lists as a con that Kubernetes has too many separately deployed and composed
services.

PS, I do work for Red Hat, so maybe I'm a little biased.

~~~
rgarcia
_Also, their messaging seems a little disingenuous. Otto talks about how
important it is to support microservice development and deployment, but Nomad
lists as a con that Kubernetes has too many separately deployed and composed
services._

This is consistent with a (reasonable) belief that microservice architecture
is an important design pattern to support, but may not be the best approach
for all problems. From reading the docs, my sense is that Nomad takes the
position that for a cluster scheduler, fewer moving parts leads to lower
operational overhead, which outweighs any benefit that microservices may
bring. E.g., it's more difficult to deploy a microservice platform like Nomad
if the platform itself is deployed as a set of microservices.

~~~
illamint
I think there's definitely a bootstrapping problem here: microservices are
great if you have something like Kubernetes, Nomad, Mesos, etc. on which to
run and deploy them, but you have to run your platform on something, and be
able to bring it back up if it goes down. That's where I think Nomad might
have the edge.

~~~
smarterclayton
Agree (Kubernetes and OpenShift dev here) - OpenShift is actually bundled as
a monolithic Go binary that contains the full Kubernetes stack and client, the
OpenShift admin client, user client, and JS web console, for exactly that
reason (even though it is all technically microservices on the server side).
The single binary comes with downsides (the binary is 95M), but it makes the
"try it out" flow much, much easier for seeing it all working. But the
converse is true: you have to be able to decouple those bits at scale, and you
eventually will want to start leveraging the platform to run itself.

------
fidget
No solution for persistent/stateful applications, which is a real
disappointment, seeing as this is where I see orchestration systems currently
breaking new ground and coming up with interesting solutions. The constraint
system also doesn't look too impressive; can I do the equivalent of Marathon's
GROUP_BY constraint (i.e. AZ GROUP_BY 2 -> ensure I have instances running on
>=2 machines w/ different AZ values)?
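
For concreteness, here's a toy sketch of the spread semantics I mean (invented
Go types; nothing to do with Marathon's or Nomad's actual APIs):

    package main

    import "fmt"

    // Node is a stand-in for a cluster machine with scheduler-visible
    // attributes, such as its availability zone.
    type Node struct {
        Name string
        AZ   string
    }

    // satisfiesGroupBy reports whether a placement spreads instances
    // across at least min distinct values of the AZ attribute -- the
    // effect of a ["AZ", "GROUP_BY", "2"] constraint in Marathon.
    func satisfiesGroupBy(placement []Node, min int) bool {
        seen := map[string]bool{}
        for _, n := range placement {
            seen[n.AZ] = true
        }
        return len(seen) >= min
    }

    func main() {
        placement := []Node{
            {Name: "web-1", AZ: "us-east-1a"},
            {Name: "web-2", AZ: "us-east-1b"},
            {Name: "web-3", AZ: "us-east-1a"},
        }
        fmt.Println(satisfiesGroupBy(placement, 2)) // true: two distinct AZs
    }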

Also no maintenance primitives, but that's just me being in love with Mesos.

~~~
maximegarcia
That's always my first question. Say we have a pgsql "job", or a set of
MongoDB instances in a cluster, backed by Docker. How do we orchestrate the
data side? Do we need special nodes where the data lives, and constraints set
for that?

~~~
filearts
When the Hashicorp folk have a few minutes to breathe at their conf, I really
hope they can address the question of stateful apps.

I imagine that users of Nomad also have persistent state to deal with; surely
a pattern has already emerged to solve this?

------
larryweya
I've been using Mesos with Aurora in production for almost a year, and while
it's very stable (zero downtime so far, with loss of hosts a couple of times),
the setup process was quite tedious and not something I'm looking forward to
doing again. It also has a lot of moving parts: understand and set up
ZooKeeper to get Mesos up, understand and set up Aurora, use something for
service discovery (I use Airbnb's Synapse for this). Plus I prefer tools I can
look under the hood of and perhaps contribute to, which is a bit intimidating
with both Mesos and Aurora (C/C++ and Scala). Because of this, I'm keen to try
out Nomad, mostly because of the promise of a single binary with no other
dependencies, plus the use of a single language across the stack: Go.

------
lobster_johnson
Having looked at both Mesos (with its various frameworks) and Kubernetes, this
immediately looks more attractive to me.

No external dependencies (which arguably simplifies ops), a competitive
feature set, a nice job language, the fact that Docker is optional, polished
documentation, etc. Having used some of Hashicorp's other products, I've grown
to expect a level of quality and pragmatism that seems to be present here,
too. Not being JVM-based is a big plus in my book as well.

How much production use has Nomad seen, I wonder?

------
pm90
> _Nomad is designed to be a global state, optimistically concurrent
> scheduler. Global state means schedulers get access to the entire state of
> the cluster when making decisions enabling richer constraints, job
> priorities, resource preemption, and faster placements._

Can anyone shed light on how that's possible? I was under the impression that
consistent global state in distributed systems wasn't possible.

~~~
josh2600
[http://research.google.com/pubs/pub41684.html](http://research.google.com/pubs/pub41684.html)

Basically, there's a server that holds state for the cluster. When a
scheduler attempts to load a job into the cluster, it grabs state from the
aforementioned server, performs its job placement calculations, and then tries
to submit its answer to the master state. Since there are many such
schedulers, and the amount of time it takes to place a job is non-trivial,
it's possible that while placement is being calculated, another scheduler has
consumed the requested resources. In that case, the first scheduler will
provision whichever services are not in conflict for placement location, and
perform a new calculation with the new state to place the remaining
conflicted services.
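
A minimal sketch of that optimistic commit loop, with made-up types (not
Nomad's actual internals, just the shape of the idea):

    package main

    import (
        "fmt"
        "sync"
    )

    // ClusterState is a versioned snapshot of cluster resources; the
    // version lets the server detect submissions based on stale state.
    type ClusterState struct {
        Version   uint64
        FreeNodes int
    }

    // Server holds the authoritative state. Commit is effectively a
    // compare-and-swap keyed on the state version.
    type Server struct {
        mu    sync.Mutex
        state ClusterState
    }

    func (s *Server) Snapshot() ClusterState {
        s.mu.Lock()
        defer s.mu.Unlock()
        return s.state
    }

    // Commit applies a placement only if the scheduler's snapshot is
    // still current and the resources are still free; otherwise it
    // reports a conflict so the scheduler can retry with fresher state.
    func (s *Server) Commit(base ClusterState, nodesWanted int) bool {
        s.mu.Lock()
        defer s.mu.Unlock()
        if base.Version != s.state.Version || nodesWanted > s.state.FreeNodes {
            return false // another scheduler won the race
        }
        s.state.FreeNodes -= nodesWanted
        s.state.Version++
        return true
    }

    // schedule re-plans from fresh state after each conflict; the hope
    // is that conflicts are rare enough that the optimism pays off.
    func schedule(s *Server, nodesWanted int) {
        for {
            snap := s.Snapshot()
            // ...the expensive placement calculation happens here...
            if s.Commit(snap, nodesWanted) {
                return
            }
        }
    }

    func main() {
        s := &Server{state: ClusterState{FreeNodes: 10}}
        var wg sync.WaitGroup
        for i := 0; i < 5; i++ {
            wg.Add(1)
            go func() { defer wg.Done(); schedule(s, 2) }()
        }
        wg.Wait()
        fmt.Println("free nodes left:", s.Snapshot().FreeNodes) // 0
    }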

The general idea is that most services don't have such a high affinity that
the components need to be started all at the same time (a MapReduce job on
10,000 nodes can still run with 5,000 nodes while the second 5k are
provisioning). The tradeoff here is that you have to manage conflicts, but the
hope is that there are few enough that your optimism is rewarded.

Does that make sense?

~~~
derefr
Sounds more like "greedy" than "optimistic" scheduling: the early jobs get the
worms.

In most scheduling systems, two 10,000-node jobs that each would saturate a
cluster on their own will time-share if submitted together, with each job
acting in practice more like 10,000 single-node jobs. The result is usually
each job getting a probabilistic 50% share of the cluster while they're both
running, and then whichever one runs longer saturating the cluster once the
other ends.

This scheduling system, meanwhile, would seem to just hand the 10,000 nodes
over to job A, and then sleep job B until job A is done.

Admittedly, in the case where the jobs _aren't_ submitted at the same time,
and job A has already grabbed and saturated the cluster, the two cases
collapse together: job B must wait (unless you want to schedule processes
rather than containers; then you just degrade the cluster's performance.) But
for batch-processing applications, you'd usually schedule everything to start
at once, precisely so that the scheduler could interleave the jobs.

~~~
josh2600
Job swapping is a different sort of problem. In your example, cluster
saturation is a concern. Google would love to saturate their cluster, but I
would wager they are usually nowhere near saturation (scheduling a job on
Borg takes something like 90 seconds at minimum; there's a lot of ground to
cover). The innovation in optimistic schedulers is that you can have multiple
schedulers running against a master state, in contrast to a single-scheduler,
single-state system like Mesos (forgive me if I've misspoken about Mesos, but
this is how I remember it).

~~~
krenoten
Mesos is a two-level scheduler, where the elected master is more of a broker
for offers from agents to external schedulers that decide to act on the offer
or not. By default it's pessimistic, but this is pluggable and there's no
reason you can't write an optimistic allocator module that will hand out
offers to multiple schedulers.
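
To make the two-level model concrete, here's a toy version of the offer flow
(invented Go types; not Mesos' actual API):

    package main

    import "fmt"

    // Offer is a resource offer from the master to one framework
    // scheduler: "here are some resources; take them or leave them."
    type Offer struct {
        AgentID string
        CPUs    float64
        MemMB   int
    }

    // FrameworkScheduler is the second level: per offer, it decides
    // whether to launch work or decline so the master can re-offer
    // the resources to another framework.
    type FrameworkScheduler interface {
        ResourceOffer(o Offer) (accepted bool)
    }

    // pessimisticBroker offers each agent's resources to one framework
    // at a time -- the default behavior described above. An optimistic
    // allocator would hand the same offer to several frameworks at once
    // and resolve the resulting conflicts.
    func pessimisticBroker(offers []Offer, frameworks []FrameworkScheduler) {
        for _, o := range offers {
            for _, f := range frameworks {
                if f.ResourceOffer(o) {
                    break // resources claimed; stop offering this one
                }
            }
        }
    }

    // greedyFramework accepts any offer with at least one CPU.
    type greedyFramework struct{ name string }

    func (g greedyFramework) ResourceOffer(o Offer) bool {
        if o.CPUs >= 1 {
            fmt.Printf("%s launches a task on %s\n", g.name, o.AgentID)
            return true
        }
        return false
    }

    func main() {
        offers := []Offer{
            {AgentID: "agent-1", CPUs: 4, MemMB: 8192},
            {AgentID: "agent-2", CPUs: 0.5, MemMB: 1024},
        }
        pessimisticBroker(offers, []FrameworkScheduler{
            greedyFramework{name: "framework-A"},
            greedyFramework{name: "framework-B"},
        })
    }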

~~~
josh2600
Part of the beauty of Omega is that there aren't just multiple schedulers,
but schedulers with different profiles (this solves the head-of-line blocking
problem for long-running jobs while accelerating small jobs). Mesos, per the
Borg paper[0], seems to have head-of-line issues (including going so far as
to silently fail jobs without queueing them).

FWIW, Omega seems like the answer if the overhead for resolving
inconsistencies becomes low enough such that the benefits of optimistic
scheduling outweigh the costs.

[0] [http://www.slideshare.net/sameertiwari33/scheduling-on-large...](http://www.slideshare.net/sameertiwari33/scheduling-on-large-clusters) See slide 6.

~~~
krenoten
That link isn't working for me, unfortunately. I'm assuming that the HOL
blocking you're referring to stems from the default DRF allocator's
assumption of fast decisions, which can cause issues for mixed workloads if
there's a lot of hoarding happening. This is something that could be
addressed with an allocator module, but to my knowledge nobody has done so
yet, leading me to think it's not a massive problem for many people. I think
the people who care about this use case end up running multiple Mesos
clusters.

One note on priorities a la Borg (which I think is what people are actually
using at Google today, though I don't know to what extent Omega was merged
with it after the Omega paper, or whether Borg currently runs as an Omega
scheduler) is that they sometimes cause annoyance for people at Google: the
optimal priority is not always assigned to certain workloads, leading to
starvation or overpreemption of things that weren't given a healthy number. I
believe Mesos' work on oversubscription took concerns around this issue into
consideration.

It's really fascinating to compare the architecture of Nomad vs. K8s, because
both claim heavy descent from Omega and Borg. If I squint, Nomad looks more
like a direct implementation of those papers. K8s more explicitly modularizes
its components, which I think is why a lot of companies are jumping behind
it: they can mold it in the ways they want.

------
bluecmd
That's... no small release. If it indeed has everything they claim, that's
extraordinary. What's the catch? Why haven't they made more noise about this?

~~~
jedberg
I think this is their noise. :)

Their big conference is today, so I expect a few more announcements.

------
mapunk
Side note to Hashicorp devs: The Products section of your homepage is
virtually unreadable on Windows/Chrome:
[https://i.imgur.com/8st8HQk.png](https://i.imgur.com/8st8HQk.png)

~~~
nailer
Hashi folk: this actually hit my own site a few weeks ago. Learning from the
experience: OS X renders fonts better than Windows does, even on the same
non-retina display. If you have something at font-weight: 200 or less, it's
fine on OS X but completely unusable on Windows.

~~~
mapunk
A 400 weight made it look a bit clearer for me.

------
a-priori
Anyone able to give a compare/contrast here with other cluster management
systems... Apache Mesos or Kubernetes, for example?

~~~
nathankleyn
Hashicorp themselves have published comparisons to Kubernetes, Mesos, et al on
the Nomad site[1]. They look well written and generally not too biased.

[1]:
[https://www.nomadproject.io/intro/vs/](https://www.nomadproject.io/intro/vs/)

~~~
covi
[https://www.nomadproject.io/intro/vs/mesos.html](https://www.nomadproject.io/intro/vs/mesos.html)

I'm not sure I understand what is being said on this page.

Is the Nomad scheduler centralized? If so, it has been demonstrated that
distributed scheduling (e.g. Mesos) leads to better throughput and
availability, while achieving placements close to those of a centralized
approach.

------
cheeseprocedure
Nomad appears to share some of Consul's internals, but it does not seem
possible to use an existing Consul cluster as backing KV store/lock
provider/etc. I'd like to understand why.

------
artursapek
Hashicorp is all in on Golang.

~~~
AYBABTME
Go is a pretty good choice of language to build stuff like that. It's easy to
distribute, it's easy to write tools for, it's well known by a bunch of people
in the domain.

------
porker
Congrats to Hashicorp, it's always exciting to see another release from you.
Now I just have to find an excuse to use them... ;-)

------
sybhn
They're building a nice little ecosystem...

------
EFruit
This might stray a bit from the core topic, but does anyone have resources
(preferably free) about the theory behind what Nomad covers? Batch/job/task
scheduling, etc.

------
dedene
Does anyone know how Nomad plays together with CoreOS?

