Nomad, a cluster manager and scheduler (hashicorp.com)
263 points by craneca0 on Sept 28, 2015 | 46 comments

I'm genuinely not sure why they're trying to compete with the likes of Mesos or Kubernetes, or what they're really trying to achieve here. There's simply no way they'll build a community around Nomad half as big as either of those, even if the software is really good.

To a fan who "really freaking loves Mesos", I'm sure it's hard to imagine other software competing with Mesos, but Hashicorp generally writes excellent software. Serf and Consul have completely sold me on Nomad, and I haven't even used it yet. I think there will be a similar draw for many people who have used Hashicorp's software before.

No, it isn't about Kool-Aid; it's about technology. Consul, Vagrant, and Vault are all excellent Hashicorp technologies I've used. They are great, and some of the best in the industry for their problem spaces.

You've got Mesos scaling to 10,000+ physical-node clusters today in production at companies like Apple and Twitter. You've got Kubernetes being adopted and developed by pretty much all of the open source heavyweights, and it came from the experienced Google developers who built... Google. Kubernetes on top of Mesos is basically the holy grail, in my personal opinion: you get the best Ops story (Mesos) mixed with the best Dev story (k8s). I guess we'll see how much Nomad takes off :)

Hashicorp isn't a huge company; it seems to me their best bet is keeping the focus relatively small so they can be the best at what they do. Even if Nomad is a huge hit and is amazing, it still seems kind of sad that they couldn't simply double down and help out with Kubernetes. I do find it ironic that they talk about how Nomad is for microservices and then make a dig at the several microservices that k8s is made up of.

I'm sort of surprised to hear this view of the situation, which is about the complete opposite of the one I've held for the past few years. I've felt it's a shame that Hashicorp have created Consul, Vault, a new Raft implementation (which they got tons of flak for; then etcd did their own rewrite this year, which fixed a ton of issues), Serf, etc., and nobody has adopted or built on them. K8s's current secrets solution is a bit underwhelming, TBH, as an example.

Kubernetes lists support for 250-node clusters among its blockers: https://github.com/kubernetes/kubernetes/blob/master/docs/ro...

It's just sad that there's yet another player that wants to be at the very bottom of the clustering stack. For physical resource management (memory, CPUs, etc.), you by definition have to pick one solution and stick with it, because you don't want multiple cluster managers thinking they exclusively own the same hardware.

Mesos is the only player right now that's trying to do multitenancy: separating the concerns of resource management from scheduler algorithms, UIs, etc. You put Mesos on your hardware as your datacenter kernel, then you play around with various schedulers to support different workloads.

I'd much rather see Nomad exist as a Mesos framework, so I could colocate it alongside Chronos/Spark/Marathon and see how it fits in. Instead, if I wanted to run this, I'd have to partition my hardware and dedicate some of it to Nomad, which is exactly what cluster managers are supposed to save you from having to do (by running multiple workloads on the same hardware).

I get that this is operationally simpler to set up and install, but with that simplicity comes a huge amount of lock-in, because you can't do anything else with your hardware other than run Nomad on it.

Yeah, it feels almost like the mess of JavaScript libraries of the last five years: Everybody is dropping a hook into the water with some newfangled containerization technology, hoping to make it big.

This is a space where you can compete on operational complexity. Mesos and Kubernetes are fairly complicated to set up and run, with dependencies and intricacies that Nomad just doesn't have.

From your other comment:

> I do find it ironic that they talk about how nomad is for microservices and then make a dig at the several microservices that k8s is made up of.

Being for microservices doesn't mean you should be a microservice. With Kubernetes, your control plane involves five services (kubelet, proxy, Docker, replication controller, etcd) that need to be up and running before you've even started an app. Then the question is what happens to your system if one or two of those go down, or become bottlenecks, or need to be upgraded, and so on.

Having evaluated both Mesos and Kubernetes, Nomad is a lot more attractive to me due to its simplicity and "turnkey" approach.

When they become bottlenecks, you scale them out. That's the beauty of the design, vs. a monolith like Nomad :)

It does win on setup for sure. This isn't an easy problem to solve however.

Not all bottlenecks scale linearly (and "bottleneck" was just one of several words I used). I'm sure Kubernetes is well designed and battle-tested, but to most developers these are black boxes that need to be studied and learned, each with its own set of complexities, workarounds, and warts. And so on.

Moving parts are moving parts no matter how well they are designed. More of them always add complexity, by definition; they never reduce it.

This one is interesting to me. I've been a big proponent of Hashicorp's other tooling, but this seems like an area that other projects are already addressing (and doing well in). Choice is great, but I think I would have preferred if they joined up with Kubernetes/Mesos/etc.

Also, their messaging seems a little ingenious. Otto talks about how important it is to support microservice development and deployment, but Nomad lists as a con that Kubernetes has too many separately deployed and composed services.

PS, I do work for Red Hat, so maybe I'm a little biased.

> Also, their messaging seems a little disingenuous. Otto talks about how important it is to support microservice development and deployment, but Nomad lists as a con that Kubernetes has too many separately deployed and composed services.

This is consistent with a (reasonable) belief that microservice architecture is an important design pattern to support, but may not be the best approach for all problems. From reading the docs, my sense is that Nomad takes the position that for a cluster scheduler, fewer moving parts leads to lower operational overhead, which outweighs any benefit that microservices may bring. E.g., it's more difficult to deploy a microservice platform like Nomad if the platform itself is deployed as a set of microservices.

I think there's definitely a bootstrapping problem here: microservices are great if you have something like Kubernetes, Nomad, Mesos etc. on which to run and deploy them, but you have to run your platform on something and be able to bring it back up if it goes down and that's where I think Nomad might have the edge.

Agree (Kubernetes and OpenShift dev here). OpenShift is actually bundled as a monolithic Go binary that contains the full Kubernetes stack and client, the OpenShift admin client, the user client, and the JS web console, for exactly that reason (even though it is all technically microservices on the server side). The single binary comes with downsides (the binary is 95M), but it makes the "try it out" flow much, much easier: you see it all working. But the converse is true: you have to be able to decouple those bits at scale, and you'll eventually want to start leveraging the platform to run itself.

Did you mean ingenious? Or disingenuous?

Ha, you are correct, I meant disingenuous.

No solution for persistent/stateful applications, which is a real disappointment, seeing as this is where I see orchestration systems currently breaking new ground and coming up with interesting solutions. The constraint system also doesn't look too impressive; can I do the equivalent of Marathon's GROUP_BY constraint (i.e. AZ GROUP_BY 2 -> ensure I have instances running on >=2 machines w/ different AZ values)?

Also no maintenance primitives, but that's just me being in love with Mesos.
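For reference, the check that a Marathon-style GROUP_BY expresses can be sketched like this (illustrative Python only, not Marathon's or Nomad's actual implementation):

```python
# Sketch of a GROUP_BY-style spread check: the placements chosen for a
# job must span at least `min_distinct` different values of some node
# attribute (e.g. availability zone). All names here are hypothetical.

def satisfies_group_by(placements, attribute, min_distinct):
    """placements: list of node-metadata dicts chosen for the job."""
    distinct = {node[attribute] for node in placements}
    return len(distinct) >= min_distinct

nodes = [
    {"host": "a", "az": "us-east-1a"},
    {"host": "b", "az": "us-east-1a"},
    {"host": "c", "az": "us-east-1b"},
]

assert satisfies_group_by(nodes, "az", 2)          # spans two AZs: ok
assert not satisfies_group_by(nodes[:2], "az", 2)  # both in the same AZ
```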

That's always my first question. Let's have a PostgreSQL "job", or a set of MongoDB instances in a cluster, backed by Docker. How do we orchestrate the data side? Do we need special nodes where the data lives, and constraints to match?

When the Hashicorp folk have a few minutes to breathe at their conf, I really hope that they could address the question of stateful apps.

I imagine that users of Nomad also have persistent state to deal with and there must be a pattern that has emerged to solve this already?

I work for ClusterHQ and we make a tool called Flocker which aims to solve this problem.

I've been using Mesos with Aurora in production for almost a year, and while it's very stable (zero downtime so far, despite losing hosts a couple of times), the setup process was quite tedious and not something I'm looking forward to doing again. It also has a lot of moving parts: understand and set up ZooKeeper to get Mesos up, understand and set up Aurora, use something for service discovery (I use Airbnb's Synapse for this). Plus, I prefer tools I can look under the hood of and perhaps contribute to, which is a bit intimidating with both Mesos and Aurora (C/C++ and Scala). Because of this, I'm keen to try out Nomad, mostly because of the promise of no other dependencies / a single binary, plus a single language across the stack: Go.

Having looked at both Mesos (with its various frameworks) and Kubernetes, this immediately looks more attractive to me.

No external dependencies (which arguably simplifies ops), a competitive feature set, a nice job language, the fact that Docker is optional, polished documentation, etc. Having used some of Hashicorp's other products, I've grown to expect a level of quality and pragmatism that seems to be present here, too. Not being JVM-based is a big plus in my book as well.

How much production use has Nomad seen, I wonder?

> Nomad is designed to be a global state, optimistically concurrent scheduler. Global state means schedulers get access to the entire state of the cluster when making decisions, enabling richer constraints, job priorities, resource preemption, and faster placements.

Can anyone shed light on how that's possible? I was under the impression that global state in distributed systems was not possible?


Basically, there's a server that holds state for the cluster. When a scheduler attempts to load a job into the cluster, it grabs state from that server, performs its job-placement calculations, and then tries to submit its answer to the master state. Since there are many such schedulers, and the time it takes to place a job is non-trivial, it's possible that while one scheduler was calculating a placement, another scheduler consumed the requested resources. In that case, the first scheduler will provision whichever services are not in conflict over placement location, and perform a new calculation against the new state to place the remaining conflicted services.

The general idea is that most services don't have such a high affinity that the components need to be started all at the same time (a MapReduce job on 10,000 nodes can still run with 5,000 nodes while the second 5k are provisioning). The tradeoff here is that you have to manage conflicts, but the hope is that there are few enough that your optimism is rewarded.
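Very roughly, that loop can be sketched in Python like this (all names hypothetical; this illustrates optimistic concurrency in general, not Nomad's actual code):

```python
# Hypothetical sketch of an optimistically concurrent scheduler: grab a
# snapshot of cluster state, compute a placement, then try to commit it
# with a compare-and-swap against the state's version number.

class StateServer:
    """Single authoritative, versioned copy of cluster state."""
    def __init__(self, free_cpus):
        self.version = 0
        self.free_cpus = free_cpus

    def snapshot(self):
        return self.version, self.free_cpus

    def try_commit(self, seen_version, cpus_wanted):
        # Commit only if nobody else changed state since our snapshot.
        if seen_version != self.version or cpus_wanted > self.free_cpus:
            return False
        self.free_cpus -= cpus_wanted
        self.version += 1
        return True

def schedule(server, cpus_wanted, max_retries=10):
    """Each scheduler retries placement against fresh state on conflict."""
    for _ in range(max_retries):
        version, free = server.snapshot()
        if cpus_wanted > free:
            return False          # cluster genuinely full
        if server.try_commit(version, cpus_wanted):
            return True           # our placement won the race
        # Another scheduler committed first: loop, re-read, retry.
    return False

server = StateServer(free_cpus=100)
assert schedule(server, 60)       # placed
assert not schedule(server, 60)   # only 40 CPUs left: rejected
```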

Does that make sense?

Sounds more like "greedy" than "optimistic" scheduling: the early jobs get the worms.

In most scheduling systems, two 10,000-node jobs that each would saturate a cluster on their own will time-share if submitted together, with each job acting in practice more like 10,000 single-node jobs. The result is usually each job getting a probabilistic 50% share of the cluster while they're both running, and then whichever one runs longer saturating the cluster once the other ends.

This scheduling system, meanwhile, would seem to just hand the 10,000 nodes over to job A, and then sleep job B until job A is done.

Admittedly, in the case where the jobs aren't submitted at the same time, and job A has already grabbed and saturated the cluster, the two cases collapse together: job B must wait (unless you want to schedule processes rather than containers; then you just degrade the cluster's performance.) But for batch-processing applications, you'd usually schedule everything to start at once, precisely so that the scheduler could interleave the jobs.
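To make the contrast concrete, here's a toy sketch (hypothetical numbers, not any real scheduler's code) of fair time-sharing vs. first-come-first-served placement on a saturated cluster:

```python
# Toy comparison of two allocation policies for a 10,000-node cluster
# with two jobs that each want the whole cluster. Names are made up.

CLUSTER = 10_000  # total nodes

def fair_share(jobs):
    """Each running job gets an equal slice of the cluster."""
    share = CLUSTER // len(jobs)
    return {job: share for job in jobs}

def greedy(requests):
    """The first job to be scheduled grabs everything it asked for."""
    allocation, free = {}, CLUSTER
    for job, wanted in requests:
        granted = min(wanted, free)
        allocation[job] = granted
        free -= granted
    return allocation

print(fair_share(["A", "B"]))                   # {'A': 5000, 'B': 5000}
print(greedy([("A", 10_000), ("B", 10_000)]))   # {'A': 10000, 'B': 0}
```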

This job-swapping problem is a different sort of problem. In your example, cluster saturation is a concern. Google would love to saturate their cluster, but I'd wager they're usually nowhere near saturation (scheduling a job on Borg takes something like 90 seconds at minimum; there's a lot of ground to cover). The innovation in optimistic schedulers is that you can have multiple schedulers running against a master state. This is in contrast to a single-scheduler, single-state system like Mesos (forgive me if I've misspoken about Mesos, but this is how I remember it).

Mesos is a two-level scheduler, where the elected master is more of a broker for offers from agents to external schedulers that decide to act on the offer or not. By default it's pessimistic, but this is pluggable and there's no reason you can't write an optimistic allocator module that will hand out offers to multiple schedulers.

Part of the beauty of Omega is that there aren't just multiple schedulers, but schedulers with different profiles (this solves the head of line problem for long-running jobs while accelerating small jobs). Mesos, from the Borg paper[0], seems to have issues with head of line problems (including going so far as to silently fail jobs without queueing them).

FWIW, Omega seems like the answer if the overhead for resolving inconsistencies becomes low enough such that the benefits of optimistic scheduling outweigh the costs.

[0]: http://www.slideshare.net/sameertiwari33/scheduling-on-large... (see slide 6)

That link isn't working for me, unfortunately. I'm assuming the HOL blocking you're referring to stems from the default DRF allocator's assumption of fast decisions, which can cause issues for mixed workloads if there's a lot of hoarding happening. This is something that could be addressed with an allocator module, but nobody has done so yet, to my knowledge, which leads me to think it's not a massive problem for many people. I think the people who care about this use case end up running multiple Mesos clusters.

One note on priorities à la Borg (which I think is what people are actually using at Google today, though I don't know to what extent Omega was merged into it after the Omega paper, or whether Borg currently runs as an Omega scheduler): it sometimes causes annoyance for people at Google, as the optimal priority is not always assigned to certain workloads, leading to starvation or over-preemption of things that weren't given a healthy number. I believe Mesos' work on oversubscription took concerns around this issue into consideration.

It's really fascinating to compare the architecture of Nomad vs. k8s, because both claim heavy descent from Omega and Borg. If I squint, Nomad looks like a more direct implementation of those papers. K8s more explicitly modularizes its components, which I think is why a lot of companies are jumping behind it: they can mold it in the ways they want.

It's possible but there are tradeoffs depending on what you mean by "global" and "state". If the state is strongly consistent (CP) then during a network partition the minority side of the partition will basically shut down and stop working. If the state is eventually consistent (AP) then it will become inconsistent during a network partition and have to be reconciled after the partition heals.
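A minimal illustration of the CP case (not Nomad's actual code): a write needs acknowledgements from a majority of all servers, not just the ones currently reachable, so the minority side of a partition has to refuse writes:

```python
# Why the minority side of a CP store shuts down during a partition:
# committing a write requires a quorum (strict majority) of ALL servers
# in the cluster, not just those on your side of the partition.

def has_quorum(reachable_servers, total_servers):
    return reachable_servers > total_servers // 2

# A 5-server cluster split 3/2 by a network partition:
assert has_quorum(3, 5)       # majority side keeps accepting writes
assert not has_quorum(2, 5)   # minority side must refuse writes
```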

Why would you say that global state in a distributed system is not possible? The entire internet is a distributed system with global state. The state needs to be aware of the context and access, that's all.

That's... no small release. If it indeed does everything they claim, that's extraordinary. What's the catch? Why haven't they made more noise about this?

I think this is their noise. :)

Their big conference is today, so I expect a few more announcements.

Side note to Hashicorp devs: The Products section of your homepage is virtually unreadable on Windows/Chrome: https://i.imgur.com/8st8HQk.png

Hashi folk: this actually hit my own site a few weeks ago. Learning from the experience: OS X renders fonts better than Windows does, even on the same non-Retina display. If you use font-weight: 200 or less, it's fine on OS X but completely unusable on Windows.

A weight of 400 made it look a bit clearer for me.

Anyone able to give a compare/contrast here with other cluster management systems... Apache Mesos or Kubernetes, for example?

Hashicorp themselves have published comparisons to Kubernetes, Mesos, et al. on the Nomad site[1]. They look well written and generally not too biased.

[1]: https://www.nomadproject.io/intro/vs/


I'm not sure I understand what is being said in this page.

Is the Nomad scheduler centralized? If so: it has been demonstrated that distributed scheduling (e.g. Mesos) leads to better throughput and availability, while achieving placements close to those of a centralized approach.

Kelsey Hightower plans on giving a talk on this tomorrow at HashiConf.

Nomad appears to share some of Consul's internals, but it does not seem possible to use an existing Consul cluster as backing KV store/lock provider/etc. I'd like to understand why.

Hashicorp is all in on Golang.

Go is a pretty good choice of language to build stuff like that. It's easy to distribute, it's easy to write tools for, it's well known by a bunch of people in the domain.

Congrats to Hashicorp, it's always exciting to see another release from you. Now I just have to find an excuse to use them... ;-)

They're building a nice little ecosystem...

This might stray a bit from the core topic, but does anyone have resources (preferably free) about the theory behind what Nomad covers? Batch/job/task Scheduling, etc.

Does anyone know how Nomad plays together with CoreOS?
