For some reason Nomad seems to get noticeably less publicity than some of the other Hashicorp offerings like Consul, Vault, and Terraform. In my opinion Nomad is right up there with them. The documentation is excellent. I haven’t had to fix any upstream issues in about a year of development on two separate Nomad clusters. Upgrading versions live is straightforward, and I rarely find myself in a situation where I can’t accomplish something I envisioned because Nomad is missing a feature. It schedules batch jobs, cron jobs, long-running services, and system services that run on every node. It has a variety of job drivers besides Docker.
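To give a feel for what that looks like in practice, here is a minimal sketch of a periodic (cron-style) batch job in Nomad's HCL job-spec format; all names, the image, and the schedule are illustrative, not from a real deployment:

```hcl
# Hypothetical nightly batch job; every name and value here is made up.
job "nightly-report" {
  datacenters = ["dc1"]
  type        = "batch"

  # Run on a cron schedule; Nomad handles the scheduling itself.
  periodic {
    cron             = "0 2 * * *"  # every night at 02:00
    prohibit_overlap = true
  }

  group "report" {
    task "generate" {
      driver = "docker"
      config {
        image = "example/report-generator:latest"
      }
      resources {
        cpu    = 500  # MHz
        memory = 256  # MB
      }
    }
  }
}
```

Changing `type` to "service" or "system" gives you the long-running and run-on-every-node variants mentioned above, from the same job-spec format.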
Nomad, Consul, Vault, and the Consul-aware Fabio load balancer run together to form most of what one might need for a cluster scheduler based deployment, somewhat reminiscent of the “do one thing well” Unix philosophy of composability.
Certainly it isn’t perfect, but I’d recommend it to anyone who is considering using a cluster scheduler but is apprehensive about the operational complexity of the more widely discussed options such as Kubernetes.
With the velocity of k8s it's hard to imagine how Nomad could catch up, let alone keep up. K8s has operators, Helm, etc. That means you can add battle-tested components off the shelf with a single command. So, less wheel-reinventing and boilerplate-writing for us.
With the backing of a much larger community and set of corporate entities, it also feels like I’m less likely to be the first one to discover a new bug. Red Hat or Google or one of their customers will have hit and fixed it already, and my production platform keeps humming along nicely. K8s has simply had more flight time and exposure to crazy environments and workloads, so more kinks have been ironed out.
I always did like the “do one thing right” unixy approach of Hashicorp’s toolset, and that you can pick the pieces you like. But (sadly for them) that means I can now pick Vault or Consul and run it on top of Kubernetes (re-using k8s' internal etcd is not recommended) if I wanted. I'm actually not overly sorry for them, seeing how they're locking up more and more features behind enterprise products. I haven't checked in a while, but I wouldn't be surprised if they already had a Nomad Enterprise. Nothing wrong with HashiCorp wanting to make money, but there's also k8s without those restrictions…
Kubernetes seems to be a lot of magic and NIH and tries to do everything itself, whereas Mesos and Nomad are nicely composable and easy to reason about.
Nomad's biggest benefit for me is its very tight integration with Vault (and Consul): I can have Nomad ask for a secret specific to one container instance, which Vault generates on demand and immediately revokes once that container dies. Maybe this is possible with Kubernetes, but I have not seen anything that tight yet.
IAM instance profiles are nice, but they are instance-wide. Giving each container a unique, short-lived, and properly scoped set of secrets, injected at the last possible moment and revoked immediately afterwards, makes me feel all warm and fuzzy inside.
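The flow described above can be sketched with Nomad's `vault` and `template` stanzas; the policy name, secret path, and image below are hypothetical placeholders:

```hcl
# Hypothetical task stanza; policy names and secret paths are illustrative.
task "api" {
  driver = "docker"
  config {
    image = "example/api:1.0"
  }

  # Nomad obtains a Vault token scoped to these policies for this task
  # instance; when the task stops, the token and any leases tied to it
  # (e.g. dynamic DB credentials) are revoked.
  vault {
    policies = ["api-db-readonly"]
  }

  # Render a dynamic database credential into the task's secrets dir.
  template {
    data = <<EOH
{{ with secret "database/creds/api-readonly" }}
DB_USER={{ .Data.username }}
DB_PASS={{ .Data.password }}
{{ end }}
EOH
    destination = "secrets/db.env"
    env         = true
  }
}
```

The point is that the credential never exists outside the lifetime and scope of that one task instance, which is exactly the "injected late, revoked immediately" property described above.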
Not heard that criticism before; what are you referring to in particular? The NIH part seems incongruous to me, since Google was a major contributor to inventing warehouse-scale computing and cluster schedulers (cf. the Borg and Omega papers, etc.).
I would have to put so much effort into convincing customers and management not to go the (now almost default?) Kubernetes route that it's risky trying something else. A small hiccup in Nomad would be enough for the pitchforks to come out.
The biggest benefits seem to be:
(1) simplicity, but GCE and minikube are easy enough to learn in a day, and
(2) the ability to run non-containers, but Docker containers are generic; they can run Java apps just fine.
Nomad is operationally simple: you can run it within your normal devops roles, and you don't need dedicated staff, mostly because you can pretty easily wrap your head around what it does and how it works.
This saves you bundles of cash and time.
I hope whatever you are running under k8s isn't crucial or important, and I really hope I'm not a customer of whatever you "operate".
Maintenance is real; that applies to everything you want to work reliably for any length of time. There are various ways to handle it: do a little consistently and constantly (what most of us professionals do), or do large bulk replacements every so often (like when stuff crashes and burns and nobody can remember how to fix it, so they just replace it with whatever is new and shiny).
Upgrading it is hard, especially with stateful stuff like Kafka/ZooKeeper running on a K8s cluster.
I am a one-man shop. I manage my cluster in about 10 minutes per month.
AWS is the new one, just started a few weeks ago.
Additionally, in the early days some tools were missing (like modifying the Raft peer membership online), but those are all there now.
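For reference, that kind of server-membership surgery now lives under `nomad operator raft` in current releases; a rough sketch (the peer address below is a placeholder):

```shell
# List the current Raft peers of the Nomad server cluster.
nomad operator raft list-peers

# Remove a dead or stuck server by its RPC address (placeholder address).
nomad operator raft remove-peer -peer-address=10.0.0.5:4647
```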
Running in production and very happy with it!