You've got Mesos scaling to 10,000+ physical-node clusters today in production at companies like Apple and Twitter. You've got Kubernetes being adopted and developed by pretty much all of the open source heavyweights, and it came from the experienced Google developers who built... Google. Kubernetes on top of Mesos is basically the holy grail in my personal opinion: you get the best Ops (Mesos) story mixed with the best Dev (k8s) story. I guess we'll see how much Nomad takes off :)
Hashicorp isn't a huge company; it seems to me their best bet is keeping the focus relatively small so they can be the best at what they do. Even if Nomad is a huge hit and is amazing, it still seems kind of sad that they couldn't simply double down and help out with Kubernetes. I do find it ironic that they talk about how Nomad is for microservices and then make a dig at the several microservices that k8s is made up of.
Kubernetes has support for 250-node clusters in its list of blockers: https://github.com/kubernetes/kubernetes/blob/master/docs/ro...
Mesos is the only player right now that's trying to do multitenancy: separating the concerns of resource management from scheduler algorithms, UIs, etc. You put Mesos on your hardware as your datacenter kernel, then you play around with various schedulers to support different workloads.
Done properly, I'd much rather see Nomad exist as a Mesos framework, so I could colocate it alongside Chronos/Spark/Marathon and see how it fits in. Instead, if I wanted to run this, I'd have to partition up my hardware and dedicate some of it to Nomad, which is the whole thing cluster managers are supposed to save you from having to do (by running multiple workloads on the same hardware).
I get that this is operationally simpler to set up and install, but with that simplicity comes a huge amount of lock-in, because you can't do anything else with your hardware other than run Nomad on it.
From your other comment:
> I do find it ironic that they talk about how nomad is for microservices and then make a dig at the several microservices that k8s is made up of.
Being for microservices doesn't mean you should be a microservice. With Kubernetes, your control plane involves 5 services (kubelet, proxy, Docker, replication controller, etcd) that need to be up and running before you've even started an app. Then the question is what happens to your system if one or two of those go down, become bottlenecks, or need to be upgraded, and so on.
Having evaluated both Mesos and Kubernetes, Nomad is a lot more attractive to me due to its simplicity and "turnkey" approach.
It does win on setup, for sure. This isn't an easy problem to solve, however.
Moving parts are moving parts no matter how well they are designed. More of them always add complexity, by definition; they never reduce it.
Also, their messaging seems a little disingenuous. Otto talks about how important it is to support microservice development and deployment, but Nomad lists as a con that Kubernetes has too many separately deployed and composed services.
PS, I do work for Red Hat, so maybe I'm a little biased.
This is consistent with a (reasonable) belief that microservice architecture is an important design pattern to support, but may not be the best approach for all problems. From reading the docs, my sense is that Nomad takes the position that for a cluster scheduler, fewer moving parts leads to lower operational overhead, which outweighs any benefit that microservices may bring. E.g., it's more difficult to deploy a microservice platform like Nomad if the platform itself is deployed as a set of microservices.
Also no maintenance primitives, but that's just me being in love with Mesos.
I imagine that users of Nomad also have persistent state to deal with and there must be a pattern that has emerged to solve this already?
No external dependencies (which arguably simplifies ops), a competitive feature set, a nice job language, the fact that Docker is optional, polished documentation, etc. Having used some of Hashicorp's other products, I've grown to expect a level of quality and pragmatism that seems to be present here, too. Not being JVM-based is a big plus in my book as well.
How much production use has Nomad seen, I wonder?
Can anyone shed light on how that's possible? I was under the impression that global state in distributed systems was not possible?
Basically, there's a server that holds state for the cluster. When a scheduler attempts to load a job into the cluster, it grabs state from the aforementioned server, performs its job-placement calculations, and then tries to submit its answer to the master state. Since there are many such schedulers, and the amount of time it takes to place a job is non-trivial, it's possible that while one scheduler was calculating a placement, another scheduler consumed the requested resources. In that case, the first scheduler will provision whichever services are not in conflict for placement location, and perform a new calculation against the new state to place the remaining conflicted services.
The general idea is that most services don't have such a high affinity that the components need to be started all at the same time (a MapReduce job on 10,000 nodes can still run with 5,000 nodes while the second 5k are provisioning). The tradeoff here is that you have to manage conflicts, but the hope is that there are few enough that your optimism is rewarded.
Does that make sense?
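To make the idea concrete, here's a minimal single-threaded sketch of that optimistic loop (hypothetical names throughout; this is not Nomad's or Omega's actual code, just the snapshot/plan/commit/retry pattern described above):

```python
# Toy optimistic scheduler: each scheduler reads a snapshot of cluster
# state, plans placements against it, then tries to commit. A commit
# that raced against another scheduler succeeds only for the
# non-conflicting placements; the rest are retried against fresh state.

class ClusterState:
    def __init__(self, free_nodes):
        self.free_nodes = set(free_nodes)

    def snapshot(self):
        # A scheduler's (possibly stale) view of the cluster.
        return set(self.free_nodes)

    def try_commit(self, placements):
        """Atomically claim nodes; return (claimed, conflicted)."""
        claimed = {n for n in placements if n in self.free_nodes}
        self.free_nodes -= claimed
        return claimed, set(placements) - claimed


def schedule(state, tasks_needed, max_retries=10):
    placed = set()
    for _ in range(max_retries):
        if len(placed) >= tasks_needed:
            break
        snap = state.snapshot()                          # read shared state
        plan = list(snap)[: tasks_needed - len(placed)]  # compute placement
        claimed, conflicted = state.try_commit(plan)     # optimistic commit
        placed |= claimed   # keep what we won; retry only the conflicts
    return placed


state = ClusterState(range(10))
a = schedule(state, 6)
b = schedule(state, 6)   # only 4 nodes remain, so placement is partial
print(len(a), len(b))    # 6 4
```

The second scheduler still makes partial progress with whatever it can claim, which mirrors the "start 5,000 of the 10,000 nodes now" behavior described above.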
In most scheduling systems, two 10,000-node jobs that each would saturate a cluster on their own will time-share if submitted together, with each job acting in practice more like 10,000 single-node jobs. The result is usually each job getting a probabilistic 50% share of the cluster while they're both running, and then whichever one runs longer saturating the cluster once the other ends.
This scheduling system, meanwhile, would seem to just hand the 10,000 nodes over to job A, and then sleep job B until job A is done.
Admittedly, in the case where the jobs aren't submitted at the same time, and job A has already grabbed and saturated the cluster, the two cases collapse together: job B must wait (unless you want to schedule processes rather than containers; then you just degrade the cluster's performance.) But for batch-processing applications, you'd usually schedule everything to start at once, precisely so that the scheduler could interleave the jobs.
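The difference between the two policies is easy to see in a toy model (purely illustrative; neither scheduler literally works this way). Two jobs each need the whole cluster, and we track when each finishes:

```python
# Toy model: jobs A and B each need the entire cluster and carry
# `wa` / `wb` units of work. Returns (finish time of A, finish time of B).

def time_share(wa, wb):
    # Fair-share: both jobs run concurrently at half speed until one
    # finishes, then the survivor gets the whole cluster to itself.
    first = 2 * min(wa, wb)
    rest = max(wa, wb) - min(wa, wb)
    if wa <= wb:
        return first, first + rest
    return first + rest, first

def exclusive(wa, wb):
    # Run-to-completion: A saturates the cluster; B sleeps until A is done.
    return wa, wa + wb

print(time_share(10, 10))  # (20, 20)
print(exclusive(10, 10))   # (10, 20)
```

Both policies finish all the work at the same time here, but run-to-completion gets job A out the door in half the time, while fair-share keeps both jobs making progress throughout, which matters when the jobs are interactive or feed downstream stages.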
FWIW, Omega seems like the answer if the overhead of resolving inconsistencies becomes low enough that the benefits of optimistic scheduling outweigh the costs.
http://www.slideshare.net/sameertiwari33/scheduling-on-large... See slide 6.
One note on priorities à la Borg (which I think is what people are actually using at Google today, though I don't know to what extent Omega was merged into it after the Omega paper, or whether Borg currently runs as an Omega scheduler): it sometimes causes annoyance for people at Google, because the optimal priority is not always assigned to certain workloads, leading to starvation or over-preemption of things that weren't given a healthy number. I believe that Mesos' work on oversubscription took concerns around this issue into consideration.
It's really fascinating to compare the architecture of Nomad vs. K8s, because both claim heavy descent from Omega + Borg. If I squint, Nomad looks more like a direct implementation of those papers. K8s more explicitly modularizes its components, which I think is why a lot of companies are jumping behind it: they can mold it in the ways they want.
Their big conference is today, so I expect a few more announcements.
I'm not sure I understand what is being said in this page.
Is the Nomad scheduler centralized? If so, it has been demonstrated that distributed scheduling (e.g. Mesos) leads to better throughput and availability, while achieving a placement close to that of a centralized approach.