Cloudflare Uses HashiCorp Nomad (2020) (cloudflare.com)
150 points by saranshk 46 days ago | 29 comments

2020 indicator would be great.

I recently looked more closely at my billing data for GKE (Google managed Kubernetes) and was astounded by the overhead taken up just by the Kubernetes internals: something like 10-20%. It might be better with bigger node types.

How does Nomad compare on this front?

Nomad is very lightweight. It also does nowhere near as much as Kubernetes does. But if all you need is to schedule containers (or other binaries), it does so quite nicely.

You generally want to pair it with Consul (as Cloudflare has done) to get service discovery. Consul is also quite lightweight.

Having run Kubernetes (GKE) in production and AKS/EKS in trials, I'd say that overhead is a fixed per-node cost, not one that grows with node size. When running a small cluster with 1-3 nodes there are reasons to scale out first (reliability), but margins will improve more quickly by scaling up nodes.

GCP makes that conveniently pretty easy with their pricing model for CPU and memory, so it is possible to do this incrementally.

For capacity planning you have to look at worst case behaviors, and we have a lot of Pollyannas running around crowing about best and average case behaviors.

A consensus protocol has O(log n) behavior on any network that displays any of the 8 fallacies of distributed computing. But the larger the cluster, the more fallacies you're likely to have to deal with in a given day, and O(log n) is too optimistic. If the cost is ever 'marginal', it's in little islands of stability that won't last long.

What I see over and over again is people expending huge opportunity costs trying to keep their brittle system in one of these local maxima as long as they can. I think it's because they fear that once they slip out of that comfy spot, people will see they aren't some miracle worker, they're just slightly above average and really good at storytelling.

Hey, I think there may have been a misunderstanding. Definitely agree with you on the complexity of weighing O notation costs for distributed systems.

What I'm talking about, and I think what the OP I replied to is referring to, is the monetary and compute cost in CPU and memory overhead of the kubelet and its associated processes on real-world deployments. There are plenty of other costs associated with Kubernetes, of course.

GKE charges a fixed cost to operate the consensus protocol (etcd) and control plane (kube-apiserver) on their own systems.

On the nodes the user operates, the costs then are relatively fixed, that is, there is some amount of CPU time and memory spent per node on logging, metrics, and so on, but as the node gets larger, that quantity does not increase. (It's definitely sublinear.)

Or in other words: you can increase the usable capacity of a GKE cluster more by doubling the node size (CPU/memory) than by doubling the number of nodes. Scaling up will increase gross capacity by 100% while increasing overhead by a few percent at most; scaling out will yield the same increase in gross capacity but will also increase the amount spent on node overhead by 100%.
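
A rough back-of-the-envelope version of that arithmetic (a minimal sketch; the 0.5-CPU per-node overhead figure is an assumption for illustration, not a measured GKE number):

    # Compare scaling up vs. scaling out under a fixed per-node overhead.
    # The 0.5-CPU overhead figure is an illustrative assumption.
    PER_NODE_OVERHEAD_CPU = 0.5  # kubelet, logging and metrics agents, etc.

    def usable_cpu(nodes, cpus_per_node):
        """CPU left for workloads after per-node overhead."""
        return nodes * (cpus_per_node - PER_NODE_OVERHEAD_CPU)

    base       = usable_cpu(4, 8)   # 30.0 usable CPUs, 2.0 CPUs overhead
    scaled_up  = usable_cpu(4, 16)  # 62.0 usable CPUs, overhead still 2.0
    scaled_out = usable_cpu(8, 8)   # 60.0 usable CPUs, overhead now 4.0
    print(base, scaled_up, scaled_out)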

I'm honestly not sure what your point is. The cloud provider literally charges a fixed cost irrespective of the number of nodes for managed kubernetes. Sure they may need to deal with scaling issues and costs but that's not the concern of a user who pays a flat fee.

If the hosting dollar figure for K8s is the only cost you can think of, you're down at the bottom of a well of magical thinking that I'm not qualified to hoist you out of.

Huh? This is a thread discussing the hosting costs of kubernetes. I get it, you've got an axe to grind and will apparently do so at any opportunity but I don't see how it's relevant to the discussion.

The only axe I have in this story is people who downplay the costs of solutions as part of a discussion of pros and cons.

We regularly call out pharmaceutical and petrochemical companies here for doing that. I don’t know why you would expect tech to get a free pass.

The most important thing is that you don’t fool yourself, and you are the easiest person to fool.

To be fair, average case is useful. It's the reason probabilistic algorithms like quicksort (worst case O(N^2)) and hash tables (worst case O(N)) can work at all in practice. The catch is that you have to know what distribution your average is being taken over. (Hash tables can be assumed to see a uniform distribution if your hash function is good enough; quicksort can see a uniform distribution, but no one writes it that way until after it blows up in their face, if then.)

You do have to know about worst case behaviors.

There are no circumstances where best case behavior is worth even calculating for anything other than curiosity, unless you're trying to deny said behavior to a cryptographic adversary (so it's actually your worst case).
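
To make the quicksort point concrete, here's a minimal sketch: a naive first-element pivot is fine on shuffled input but degenerates to O(N^2) on already-sorted input, which is exactly the distribution nobody plans for until it bites them.

    # Naive quicksort with a first-element pivot: O(N log N) on average
    # over shuffled inputs, but already-sorted input makes every partition
    # maximally unbalanced, giving O(N^2) time and recursion depth N
    # (so keep N modest to stay under Python's recursion limit).
    import random

    def quicksort(xs):
        if len(xs) <= 1:
            return xs
        pivot, rest = xs[0], xs[1:]
        left = [x for x in rest if x < pivot]
        right = [x for x in rest if x >= pivot]
        return quicksort(left) + [pivot] + quicksort(right)

    data = list(range(500))
    random.shuffle(data)
    quicksort(data)          # balanced partitions on average: fast
    quicksort(sorted(data))  # 500 recursion levels: the degenerate case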

I spun up a 3-node k8s cluster a year back, and was shocked by how much CPU it used when doing nothing. It just felt so utterly wasteful when compared to the likes of Docker Swarm.

I totally understand there are use cases for which k8s is a great fit because it has so many capabilities, but I often see folk on HN advocating for using it everywhere, because it's "so simple" once you understand it. I just don't get it.

For work I can totally understand this trade-off but this fact makes k8s pretty much a non-starter on side projects or personal sites because running a small cluster on small nodes leaves you so little headroom for your code that it's just not worth it. Swarm/Nomad/pacemaker all sip resources by comparison.

The history of that is actually a bit funny.

You see, there used to be no such "fixed overhead" in early versions (including post 1.0) of k8s.

This turned out to be a worse idea for less involved or experienced operators than a fixed overhead: people would run k8s nodes way underpowered, load them with a ton of workload, then have lots of outages as they starved critical system components of resources.

Because of that, a (settable) "overhead minimum" was added to the calculation of available resources, IIRC originally reserving 0.9 CPU cores plus some amount of memory I don't recall. This still allowed running a 2-core node for experimentation (though it's really not recommended), while greatly lowering the chances of issues that are obscure to newbies. It didn't prevent them entirely (PSA: if another team provisions your cluster, check what resources they provisioned...), but it makes it much harder to fail.
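
A minimal sketch of how that kind of reservation feeds into what the scheduler sees (the reserved figures below are illustrative assumptions, not actual kubelet defaults):

    # Allocatable = capacity - reserved: the fixed reservation is noise on
    # a big node but eats a large fraction of a small one. Figures below
    # are illustrative assumptions, not real kubelet defaults.
    def allocatable_cpu(capacity_cores, kube_reserved, system_reserved):
        return capacity_cores - kube_reserved - system_reserved

    # ~0.9 core reserved on a 2-core node leaves barely over 1 core for
    # workloads; the same reservation on 16 cores is barely noticeable.
    print(allocatable_cpu(2.0, 0.6, 0.3))   # ~1.1
    print(allocatable_cpu(16.0, 0.6, 0.3))  # ~15.1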

I got all set up with a Pi-clone cluster and then K3s decided that an extra 100MB of memory savings was not worth maintaining a separate implementation of part of the services.

On a 16GB machine that's not a big deal. On a 2GB machine that's a significant fraction of your disk cache, for a very slow disk. On a 1GB machine that's just make or break.

ETA: And you may think that these toys don't matter, but people have to learn and buy-in somewhere, and the fact of the matter is that I have production services right now that mostly need bandwidth, not CPU or memory, and once I tune them to the most appropriate AWS instance for that workload, the Kubernetes overhead would become the plurality of resource usage on these boxes (if I were using k8s, which we are not at present). K8s only scales by getting into bin packing, where this latency-sensitive service runs with substantially more potentially noisy neighbors.

I mean for GKE, there is a free tier that covers the cost of an Autopilot cluster[0], which means basically no overhead cost for running on GKE instead of a Compute Engine node.

[0]: https://cloud.google.com/kubernetes-engine/pricing#cluster_m...

Actually, that is not really true. I strongly urge you to try out http://k3s.io/ or https://k0sproject.io/

These are full-fledged, certified k8s distributions that run on a Raspberry Pi as well as all the way up to production.



While Cloudflare does run Nomad, a lot of services run on Kubernetes as well[1]. Different tools for different use cases.

(Disclaimer: engineer at Cloudflare, and author of blog post linked below.)

[1]: https://blog.cloudflare.com/high-availability-load-balancers...

The developers at Riot Games are also always swooning over Nomad and have published quite a few talks on YouTube as well as articles on their dev blog.

I’ve looked into kubernetes but after I got it all going I decided it’s not worth the complexity. Same experience as installing gentoo :-)

I decided not to learn k8s because I really think something should be coming soon to replace it (or hide it). Kind of like how DistBelief became Tensorflow, but now we have keras and other friendly NN libraries on top.

I think Borg was DistBelief, Kubernetes is TensorFlow, and I'm waiting for Keras. Helm isn't there yet IMHO, so I'm still waiting.

Do you have any links? I had a look at their dev blog, but everything I read was about an in-house orchestration system that they wrote - I couldn't find anything about them using Nomad.

Sorry for the late reply.

Yes, they had their own system before, but they replaced it with something else. However, it turns out that I read this too long ago and mistook DC/OS for Nomad. My apologies!

Well written article and great reminder that Kubernetes is not the answer to everything.

Out of curiosity, has anyone here used Nomad for a home server? Right now my home "server" is six NVidia Jetsons running Docker Swarm, which works but I feel like Swarm isn't getting the same priority from Docker as some of the other orchestration things out there.

I tried using Kubernetes, but that was a lot of work for a home server, and so I went back to Swarm. Is Nomad appreciably easier to set up?

Yes, you can be up and running with a multi-machine lab install in a couple of hours, even if you're rolling the component installation and configuration yourself. In K8s production cloud workloads this is usually offloaded to GKE/EKS/Kops, but the equivalent manual Kubernetes installation is appreciably more involved.

I attempted to use Nomad for a couple of months before going back to Kubernetes via k3s. The main issue I ran into was that there is no out-of-the-box solution for ingress, and it ended up being quite difficult to get something working reliably. With k8s, I was able to hook it up to Cloudflare Tunnel using ingress-nginx very easily, and it was all integrated into the standard way of doing things.

This is exactly the reason why I'm having such a tedious time needing to learn k8s for my work :(
