Nomad v1.0 release – workload orchestration (github.com/hashicorp)
71 points by jrnkntl on Dec 8, 2020 | 26 comments



There's a blog post as well. I'm unsure what makes for the most relevant link: https://www.hashicorp.com/blog/announcing-general-availabili...

I'm the Nomad Team Engineering Lead and would be happy to answer any questions people might have.


Ah, I submitted this as soon as I saw the notification of the new release on GitHub; the blog post may indeed have been the more relevant link.

Congratulations on this milestone release! We've been using Nomad since March this year on a single 'bare-metal' server, and it serves our needs perfectly. We set it up with a simple gorilla/mux API in front and use the Nomad API to submit jobs from all our other applications, and it works flawlessly.
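
For anyone curious, a minimal sketch of the kind of job spec this involves (the job name and image are made up):

    job "worker" {
      datacenters = ["dc1"]
      type        = "batch"

      group "worker" {
        task "run" {
          driver = "docker"

          config {
            image = "example/worker:latest" # illustrative image
          }

          resources {
            cpu    = 500 # MHz
            memory = 256 # MB
          }
        }
      }
    }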

With regard to 1.0's features:

HCL2 is a welcome addition for us, since we had a lot of repetition in our job files from per-task artifact stanzas.
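
For example, a pile of near-identical artifact stanzas can now collapse into a dynamic block over a variable. A rough sketch (variable name and URLs invented):

    variable "artifact_sources" {
      type = list(string)
      default = [
        "https://releases.example.com/app.tar.gz",
        "https://releases.example.com/config.tar.gz",
      ]
    }

    # inside the task stanza:
    dynamic "artifact" {
      for_each = var.artifact_sources
      content {
        source = artifact.value
      }
    }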

Also, the addition of the PostStop lifecycle hook couldn't have come at a better time; we were discussing workarounds for this recently.
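
If I read the docs right, it's just a lifecycle stanza on the cleanup task, something like (task name and image are illustrative):

    task "cleanup" {
      driver = "docker"

      # runs after the other tasks in the group have stopped
      lifecycle {
        hook = "poststop"
      }

      config {
        image = "example/cleanup:latest"
      }
    }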

One area of potential improvement would be the behaviour of file/directory permissions across different task drivers. I know this is heavily dependent on the particular driver in use, but we bumped into it more than once while setting up our jobs (and others have too [1][2]).

[1] https://github.com/hashicorp/nomad/issues/2625 [2] https://github.com/hashicorp/nomad/issues/8892

Thanks for all the work your team did. I am a big fan of the Hashicorp ecosystem.


> One area of potential improvement would be the behaviour of file/directory permissions across different task drivers. I know this is heavily dependent on the particular driver in use, but we bumped into it more than once while setting up our jobs (and others have too [1][2]).

Thanks for mentioning these. Everyone interested should definitely +1 them as we do use reaction emoji during prioritization.

The task driver dependent nature does make this tricky, but since the Nomad agent is usually run as root and controls the allocation/task directory tree we should have some options here.


I'm a big fan conceptually of Nomad, and I love the streams that the Nomad team does to talk about development and answer questions.

My only comments would be:

I wish there was more content available (maybe on HashiCorp Learn Nomad) for working with single-instance Nomad clusters.

And a demo of a real-world dev-to-prod workflow. Something like: "Okay, here's a local Docker Compose setup with Postgres, a backend API, and a frontend web app, and here's the workflow for getting it into production with Nomad."


Thanks, that's great feedback, and you're not alone in that desire.

We do "small" users a pretty big disservice by effectively dismissing single-server deployments and jumping straight to "real" production deployments (3+ servers distinct from the nodes that run workloads).

We have people who use Nomad locally as a systemd alternative. We have people who use single-scheduler-multiple-nodes clusters at home or at work, and it works quite well! If the scheduler crashes, work continues to run uninterrupted, so there's little reason not to start small and scale up.
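
For anyone who wants to try this, a single node can run both roles with an agent config along these lines (a sketch, not an official reference config; the path is illustrative):

    # nomad.hcl
    data_dir = "/opt/nomad/data"

    server {
      enabled          = true
      bootstrap_expect = 1 # single scheduler, no HA
    }

    client {
      enabled = true # run workloads on this same node
    }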

The problem is largely that it's tricky to separate out and accurately address these various personas. People looking for a systemd alternative are obviously highly technical and will likely figure everything out through experimentation. However, "small" cluster users need to be carefully educated on the differences between an HA cluster (3+ schedulers) and a single scheduler cluster.

Not only that, but we would need to automate testing for each of these recommendations to ensure changes don't break an official recommendation.


For single-node or small-cluster deployments, one issue with k8s is the significant CPU and RAM usage even on a fully idle system with no workloads. How is Nomad in that regard? Are the controllers also implemented as a control loop that is constantly polling and diffing reported and desired state?


Granted this is anecdotal, but I found Nomad to have much more reasonable resource usage compared to K3s on my Pi cluster.


Thanks. Seems like it’s worth looking into, then. Still curious about the reasons why it’s more efficient.


Nomad was designed for efficiency (linear scalability) and performance from the very beginning. One of the major aspects of this is that it is monolithic. The single nomad binary contains:

- The CLI for interacting with Nomad clusters via their HTTP API

- The HTTP API

- The "server" agent which contains the scheduler, state store, and Raft implementation

- The "client" agent which runs workloads and has builtin task drivers for docker, plain processes, and more.

Not only is there no dependency on an external state store like etcd, but there's no pluggable abstraction for state storage. The tradeoff is that all cluster state must fit in memory. Pi clusters should use far less than 100 MB of memory, while something like C2M (6,100 nodes, 2 million containers) used just under 100 GB. Memory use scales linearly with cluster size (nodes + containers).
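
As a quick illustration of the monolithic design, dev mode runs all of the above in a single process:

    # starts an in-memory server + client; HTTP API on :4646
    nomad agent -dev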

Since Raft and the scheduler are colocated within a single server agent process, many operations can be performed in the replicated FSM as a simple function call, rather than incurring the networking overhead of talking to Raft as an external process.

I'm not very familiar with k8s internals, so I'm afraid I can't offer a very detailed direct comparison.


Thanks for that detailed explanation. The fact that all the main functions live in a single process probably helps a lot with CPU and RAM usage, compared to k8s, where etcd, the API server, the scheduler, and the controller manager live in independent processes, which means serialization/deserialization overhead and fewer optimization opportunities.


Yeah, I mean also from a financial/business standpoint I totally get it.

The incentive is to target content efforts towards enterprise and large-scale orgs. Because who cares if the indie dev or small startup is running your stack, let's be real.


I’d be happy to know more about the single node use case. I considered Nomad in the past for this, but it seemed optimized for a cluster, and I ended up using Docker Swarm instead.


Same for me; I stuck with docker-compose. A production-grade single-server deployment would be a really interesting case.


The big problem for me with Docker Compose is the lack of zero-downtime / seamless deployments. First it stops the containers running the old version, then it starts new containers running the new version, instead of doing the reverse.
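
If I'm reading the Nomad docs right, its update stanza controls exactly this ordering; with a canary, the new version starts, gets health checked, and only then replaces the old one. A sketch:

    update {
      max_parallel     = 1
      canary           = 1    # start the new version alongside the old
      auto_promote     = true # swap over once it's healthy
      min_healthy_time = "10s"
      healthy_deadline = "3m"
      auto_revert      = true # roll back if it never becomes healthy
    }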


First off, very cool. I'm a big Hashicorp fan, and have been following Nomad for a while, so it's exciting you hit 1.0!

It seems to me one of the advantages K8s has is that there are multiple "Kubernetes as a Service" offerings out there, such as EKS on AWS, Google Cloud's offering, even DigitalOcean.

Are there any plans to make Nomad Enterprise more accessible? It seems a managed Nomad hits the sweet spot of simple scheduling, which removes a lot of the K8s bloat for many people.


Thanks! I think you really hit the nail on the head. In a Twitter thread I talk about how Nomad 1.0 represents us implementing the "table stakes", the base foundation, for an orchestrator. However, I think what you're noticing is how the table stakes have changed from 5 years ago: orchestrators need hosted options. k8s has push-button support from every major cloud. We can't just ignore that.

And we aren't ignoring that. I have no idea what I can say publicly, so I'll just link to what our cofounder/CTO has already said:

> And hosted Nomad clusters are on the way

https://twitter.com/mitchellh/status/1334198278225682434


Thanks! I wish Hashicorp the best of luck getting hosted clusters out the door, totally agree with you about table stakes.


I really like the idea of Nomad as a simpler alternative to Kubernetes. The documentation is great and provides a lot of examples, but I wish I could find a starter guide on how to deploy a complete application with it and Consul. Something like, here’s how to deploy a Rails app with Sidekiq workers and database migrations.

This along with Waypoint seems like a great solution for smaller side projects.


I like it, too, but with the integration of CNI it's no longer so much simpler than Kubernetes. Where Kubernetes has etcd, Nomad requires Consul, which arguably might be simpler.

To compete with Kubernetes, Nomad will most likely acquire more features (e.g. storage) until it's as complex as Kubernetes, because that's just what some businesses seem to want.

On the other hand Kubernetes gets simpler with projects like k3s or k0s.


Nomad has supported CSI for storage since 0.11. It is definitely a challenge, since despite CSI being an orchestrator-agnostic standard, storage vendors often assume k8s and support only it.

That being said, Nomad's CSI support doesn't impact clusters that don't opt in to use it. Jobs that use host volumes or ephemeral volumes still work. Only using Nomad for stateless workloads still works. We try very hard to introduce new features in a way that only impacts people who use them.
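
For reference, the job side of a host volume is just a couple of stanzas (a sketch; the volume and image names are illustrative):

    group "db" {
      volume "data" {
        type   = "host"
        source = "pgdata" # matches a host_volume entry in the client config
      }

      task "postgres" {
        driver = "docker"

        volume_mount {
          volume      = "data"
          destination = "/var/lib/postgresql/data"
        }

        config {
          image = "postgres:12"
        }
      }
    }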

While the principle is the same for CNI, our migration to group networks ("groups" in Nomad are like "pods" in k8s) and away from task networks has been more painful than we had hoped. Existing jobs should still work with task networks, and we're rapidly trying to fix the differences between the two approaches.
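
To make that concrete: networking now lives on the group, so every task in it shares one network namespace, much like containers in a k8s pod. A sketch:

    group "web" {
      network {
        mode = "bridge"

        port "http" {
          to = 8080 # container port; the host port is allocated dynamically
        }
      }

      task "app" {
        driver = "docker"

        config {
          image = "example/web:latest" # illustrative
          ports = ["http"]
        }
      }
    }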

Nomad's Consul dependency does introduce complexity. The migration to group networks actually included a change that made service addressing available to servers in such a way that Nomad could offer native service discovery. It's still being discussed whether we want to pursue that since offering multiple solutions has obvious downsides as well.


Thanks for your comprehensive answer. As I said, I like Nomad, and in fact I'd probably prefer to use it if Kubernetes hadn't become kind of an industry standard. When people are free to choose, they should by all means use Nomad :)

That said, Kubernetes is also simple with stateless workloads or host-local storage. It becomes complex when you are using some kind of cluster-managed storage, and I guess there will be demand for that on Nomad too (if you are lucky), with no way around that complexity.

Eventually there will also be things like Nomad Operators and such to handle the increasing complexity :)

Anyway keep up the good work!


Podman can also deploy from Kubernetes configs: https://github.com/containers/podman/blob/master/docs/source...


There's no requirement for Consul when deploying Nomad, and storage support is in beta right now.


It's an opportunity for someone/something to take Docker Swarm's place, since the complexity of Kubernetes might scare a lot of people off.


Moving namespaces from Enterprise to open source is a big step forward, kudos! In the past, many discussions ended with just choosing Kubernetes because isolating containers was not possible within Nomad.


You do not need Nomad too!



