Google admits Kubernetes container tech is too complex (theregister.com)
573 points by pjmlp 9 months ago | 432 comments

I work on a 3-person DevOps team that just finished migrating ~20 services from GCE VMs running docker-compose to GKE.

It's taken us a little over a year. Partly because K8s has a steep learning curve, but also because safely transitioning services without disrupting product teams adds a lot of overhead.

The investment is already yielding great returns. Developers are happy. Actual quote: "Kubernetes is the biggest quality-of-life improvement I've experienced in my career."

The same is true for me as a DevOps engineer. I never have to write a script to orchestrate a rolling deployment ever again. I'll never have to touch Puppet or wrangle OS upgrades. I no longer need to use Terraform to scale out a service.

Now that we've completed this transition, one of my goals for the next year is to get the team to a place where we spend 75% of our time on higher-level projects -- working on developer tools and Kubernetes extensions and setting site reliability standards, instead of the firefighting and operational work that has traditionally taken up so much of our time. Couldn't be happier about this change.

... That is until management decides we're overhead and fires us to free up budget for feature developers. ;)

... Until they hire you back because entropy exists :P

I'm on the DevOps side as well, going through the same transition. K8s also allows insane customization, and I have some colleagues that are delaying our rollout unintentionally so they can play around with developing more tooling for deployments which is really frustrating. The k8s scene seems to be filled with constant scope creep and refactoring to get it just perfect before use. Either way, I agree the benefits far outweigh this annoyance that I've experienced. I'm excited to spend my time developing tooling instead.

However, I don't think we're entirely free from managing servers the old way with Chef / Puppet / Ansible. Unless you're purely hosted, there's still the rule of thumb that you shouldn't run services that hold state in k8s. With persistent volumes I do see that changing, though I'm not sure everyone agrees that's a good idea.

"I have some colleagues that are delaying our rollout unintentionally so they can play around with developing more tooling for deployments which is really frustrating."

My impression is that the primary purpose of Kubernetes is to give SRE teams political air cover to rewrite a lot of their existing processes. Whether Kubernetes is actually required for that, or even net superior, seems questionable. This unsexy work becomes justifiable because it's coupled to a mainstream, accepted tech modernization.

You see this same phenomenon with database migrations, where what the team really needed to do was just rewrite an app to use the existing database properly. But no one is going to approve that work. So what happens is people convince themselves that the existing tech sucks and use that to rationalize doing the rewrite. The result ends up not always being net superior, because sure, you did the rewrite, but you are also eating the operational cost of integrating a new technology into the org.

Yes, the number of proposed db switches I’ve seen is remarkably high. I once interviewed for a role as a database developer and was confused to find out they didn’t have the database that the role pertained to. One of the early questions in the interview was how quickly I could migrate production from ms sql to pg. Needless to say that was a gigantic red flag and I hope they found the right person for that job.

I’ve also seen a switch from rdbms to Hadoop because a company had “millions” of rows. Luckily on this one I only had to rewrite a handful of queries.

I've got relatively modestly-specced SQL Servers handling tables with hundreds of millions and even billions of rows without breaking a sweat. Somebody either just really wanted a new toy to play with, or has no idea what indexes are.

Exactly, I’ve seen SQL Servers handle billions of rows with 2 thousand columns. I also think people who work too long at one company don’t realize how problems were solved elsewhere.

> I’ve also seen a switch from rdbms to Hadoop because a company had “millions” of rows. Luckily on this one I only had to rewrite a handful of queries.

Wat. That's gross. It probably costs them more per query now than the rdbms did.

Didn’t even think about that part because I don’t know too much about Hadoop other than it seemed impractical.

I guess millions sounded like a lot to a decision maker :)

> So what happens is people convince themselves that the existing tech sucks and use that to rationalize doing the rewrite.

That certainly is a thing that happens, but you could use that to dismiss any technology at all. In the case of Kubernetes, it makes operations a lot easier to the (important) effect that the development teams can do a lot of their own operations work. This is important since they're the ones who are empowered to solve operations problems and it also eliminates the blame game between ops and dev. Further, it eliminates a lot of coordination with a separate ops team--the dev teams aren't competing to get time from an ops team; they can solve their own problems, especially the most common ones. This also has the nice property of freeing the SREs to work on high-level automation, including integrating tools from the ecosystem (e.g., cert-manager, external-dns, etc).

Kubernetes certainly isn't the final stage in the evolution, but it's a welcome improvement.

> but you could use that to dismiss any technology at all

No, you can't; you need three (-ish) factors:

1. The technology is sufficiently incompatible with what you're currently using that you need a rewrite to use it (eg, this generally doesn't happen with gcc -> llvm).

2. The technology is sufficiently (faux-)popular that it's possible to convince a pointy-haired boss that you need to switch to it (eg, this won't work with COBOL anymore, though unfortunately its successor Java is still going).

3. The technology sucks.

And really, if you want to dismiss a technology, point 3 ought to be enough all on its own (particularly since that's presumably the reason you want to dismiss that technology).

I think in your eagerness to 'gotcha' me, you missed my point. :)

Anyway, we're trying to assess Kubernetes' value proposition (i.e., to answer "does it suck?"). If your system for answering that question depends on already knowing the answer, it's not a very useful system.

> we're trying to assess Kubernetes' value proposition (i.e., to answer "does it suck?").

Well, I'm not, since I already know that, but if you don't know that yet, then your position makes more sense. (That is, using "dismiss" in the sense of finding out that it sucks, rather than (as I read it) in the sense of justifying a refusal to use technology that you already know sucks.)

Unfortunately, due to market-for-lemons dynamics, it's usually not possible to convey knowledge that a particular technology sucks until things have already gone horribly wrong. See eg COBOL or (the Java-style corruption of) Object Oriented Programming.

> you are also eating the operational cost of integrating a new technology into the org.

and in short order you will reap the savings of being able to hire people who already know your devops/infra tech stack, and can hit the ground running. not to mention being able to benefit from the constant improvements that come from outside your org.

Maybe. I don't buy that just because people are on Kubernetes they won't still kludge it up with custom in-house scripts or "extensions". Give it time.

Ha, are you me? I really pushed for us to follow the "change-as-little-as-possible and ship to prod quickly" route. Prod is where things get hard, and it's better to find out what's hard sooner rather than later.

We are running a handful of stateful services in K8s (things like MongoDB for which GCP doesn't have a compelling and affordable managed offering). It's definitely more complex than transitioning a stateless service, but so far our experiences with StatefulSets and PersistentVolumes have been good. And this allows us to sunset Puppet/OS management completely. I should note that we _are_ being extremely careful about backups. We also run each stateful service in a dedicated node pool for isolation. Who knows, maybe a year from now we'll be shaking our heads and saying "that was a TERRIBLE idea" but for now, so far so good.

We're running on GKE, so lots of things that would be hard in on-prem environments (ingress, networking, storage) are easy.

> We're running on GKE, so lots of things that would be hard in on-prem environments (ingress, networking, storage) are easy.

Agreed. The on-prem story is still really messy, but I think there's a lot of third-party work to build on-prem distributions that are cut and dry. Unfortunately, there are lots of them right now and it's not clear what the advantages and pitfalls are of each. Things will settle and this problem will be solved with time, but for now it's quite a pain point.

Management never hires people back, that's admitting failure.

They hire other people with supposedly the same skill stack and then have them rebuild it from scratch.

"Kubernetes is the biggest quality-of-life improvement I've experienced in my career."

Can you elaborate? What exactly about Kubernetes improved the developers' lives?

1. Reliable rolling deployments.

We didn't have these before. Yes, you can implement them without K8s (I have, at other companies) but to get the full set of features K8s provides, such as deploying N services in parallel, taking no more than X% of your capacity offline at a time, short-circuiting in the event the app is dead on arrival, connection draining with timeouts, you end up with a VERY complicated multi-threaded codebase.

2. Seamless horizontal scale-out.

Want to scale up your app in a test environment from 3 replicas to 6 to do some performance testing? This used to be a DevOps ticket that would take a few days -- DevOps engineer tries running Terraform, but oh no, a CentOS package update seems to have broken our Puppet manifests and we have to fix that. Now the developer makes a PR to the GitOps repo where they adjust a single YAML setting.

3. GitOps/ArgoCD.

ArgoCD is possibly my favorite piece of software of all time. It provides incredible visualizations for what's happening with your Kubernetes infrastructure. It really increases your confidence and trust in the system to be able to watch a deployment rollout or scaling operation happen in real time. ArgoCD makes it spectacularly obvious when something has gone wrong -- you still sometimes need to go spelunking through the GCP console or use kubectl to inspect resources, but to a much lesser degree. I cannot emphasize enough how magical it is.

These are all things that our initial implementation has delivered. We are also planning to try to leverage K8s for things like on-demand pull request preview environments, hosted developer environments, and canary deploys, which are MUCH harder to implement in a world without K8s (trust me, I've done it).

Yeah, this list is not surprising, and none of the things you mentioned exclusively require Kubernetes or running containers in prod. Like I mentioned in another comment, it feels like the purpose of Kubernetes is that its "brand" provides political cover to introduce these practices to the engineering org.

I don't know what to tell you except that with years of experience implementing this infrastructure outside Kubernetes, letting Kubernetes handle it is cheaper. That doesn't mean Kubernetes is a good fit for every organization or workload.

You can roll your own anything with enough time and manpower. Whether it makes sense to do so depends on your circumstances.

Well for example, I'm a lead devops on a team of 2-3. We have 6 core services.

For the last several years, due to static load we've been fine with 2 instances per service. Now that we want autoscale (even though all the real load is really just the DB), it seems we could get ourselves onto AWS autoscale in ~1 month, though it would require some coding.

Spending 3 devops for 1 year on something is a red flag to me. If your changes save devs 1 hour per deploy, it'll pay off 6,000 deploys from now.
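The 6,000-deploy figure works out if you assume roughly 2,000 working hours per engineer per year. That number is an assumption on my part; the comment doesn't state it. A quick sketch of the arithmetic, with hypothetical names:

```python
# Back-of-the-envelope break-even for an infrastructure migration.
# HOURS_PER_ENGINEER_YEAR is an assumed round number, not taken from the thread.
HOURS_PER_ENGINEER_YEAR = 2000


def break_even_deploys(engineers: int, years: float, hours_saved_per_deploy: float) -> float:
    """Deploys needed before the time saved equals the time invested."""
    invested_hours = engineers * years * HOURS_PER_ENGINEER_YEAR
    return invested_hours / hours_saved_per_deploy


print(break_even_deploys(engineers=3, years=1, hours_saved_per_deploy=1))  # 6000.0
```

Whether 6,000 deploys is a lot depends entirely on deploy frequency: at, say, 20 deploys per working day across all teams, that is 300 working days.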

It's not the only thing we've been doing, just the most important.

I think it's great you guys have 2.5 dedicated people operating 6 services across 12 prod instances. With that kind of ratio our team would be 15 people!

I agree with your point overall, I just want to add that some DevOps efforts aren't to save DevOps time, but to prevent errors.

When you make a change manually, there's a chance that you forget something, have a typo, etc. Those problems disappear when those same things are automated.

None of it requires kubernetes; kubernetes just does it out of the box better than any hand-rolled custom stack that tried to do the same. It's a true joy to work with compared to the cobbled together stuff I've built and/or used in the past.

I agree that ArgoCD is amazing. What's so beautiful is how well it integrates into K8s. Every element of ArgoCD is a CRD (custom resource definition). It's less of a piece of software (other than the interface) and more of an extension to K8s to provide Continuous Delivery. Saying ArgoCD does not require K8s is like saying you don't need Photoshop to run a Photoshop plugin.

I've used Puppet, Chef, and Ansible in production over the last decade. For me, Ansible replaced Puppet and Chef five or six years ago. For the last two years the only thing I have used Ansible for is to manage desktops and Android boxes AT HOME. Configuration Management systems like Ansible are fun and cool, but have been mostly obsoleted by the combo of Terraform and K8s. If you're on AWS, Kops can replace a lot of what Terraform does.

Kubernetes makes running services easier and more reliable. I've spent more time learning Istio than K8s itself. The learning curve of K8s is minimal compared to Drupal.

It is not a brand, it's when all the capabilities come together in a single package with one general way to do things that you get the benefits. You could do all this stuff with IaaS but there were too many ways and not enough lines in the sand between dev and ops in the IaaS world. IaaS should be considered a legacy approach.

You’re using hosted k8s! That explains everything! I was so confused how a devops person wasn’t cursing the name. Implementing and managing k8s is where all the complexity and headache lives.

Updated original comment to clarify :)

Although, to be fair, the article is about GKE.

Have you seen kops? rancher? loft? I don't think it's that terrible to manage k8s at all using the tools available to you. k8s definitely used to be difficult to manage but that isn't the case anymore.

But literally every cloud provider out there has a managed solution so at this point you really only need to do it for DC work or if you like to do it.

Amazon - EKS
Google - GKE
Azure - AKS

anything beyond those 3 is a rounding error but...

Linode - LKE
DigitalOcean - DigitalOcean Kubernetes
IBM Cloud Kubernetes Service
Rackspace - KAAS

I've used Rancher and while it is very nice it still punts a lot of the hard bits like interacting with the outside network, vm creation, and storage (especially shared persistent storage) to the ops team to figure out.

Creating a big-ole cluster of app servers was never really the hard ops problem which is what k8s does really well.

Creating the cluster of servers isn't the value of kubernetes. We all had clusters of servers well before k8s, openstack, and things like AWS.

The benefit of k8s is the orchestration of those clusters. Spinning up 6 new http servers and getting them added to the load balancer automatically. Generating 4 new memcached nodes and getting them registered in DNS so clients pick them up and add them to the hashring.
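For anyone unfamiliar with the hashring mentioned here, a minimal consistent-hashing sketch shows why adding memcached nodes only remaps a fraction of the keys. The `HashRing` class, the node names, and the virtual-node count are illustrative assumptions, not how any particular memcached client implements it.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring; real clients differ in hashing and replica details."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes  # virtual points per node, smooths the distribution
        self._ring = []       # sorted list of (hash, node) points
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def get_node(self, key: str) -> str:
        # First point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["memcached-0", "memcached-1"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.get_node(k) for k in keys}

ring.add_node("memcached-2")  # e.g. a newly registered node that clients pick up
moved = [k for k in keys if ring.get_node(k) != before[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

Every key that moves lands on the new node; keys owned by the existing nodes otherwise stay put, which is the property that makes automatic registration of new cache nodes tolerable.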

The benefit of k8s is the scaling and elastic capabilities. It can trigger vertical scaling by spinning up larger pods or horizontal scaling by adding/removing pods.

Anyone thinking that people are using k8s because it can create app servers doesn't understand why anyone is using k8s. If all we needed to do is create a cluster of app servers, we wouldn't be using k8s.

That being said, a cluster of app servers still needs orchestration and config management and we had a ton of crazy solutions for that prior to k8s.

I previously managed my company's Kubernetes clusters using Kops on AWS. My company switched to Google, so I had to move everything to GKE. I miss Kops, it gave me more control and made life easier. I don't want to use the old crappy deprecated DNS server, but with GKE that's what you get. I don't get to control what goes on the masters on my clusters because they're not mine anymore. Having GKE do the masters means more headaches and complexity because I no longer have control. I love K8s, but GKE does a pretty poor job implementing it. It boggles my mind that the main company behind K8s is so incompetent in implementing it.

It's a bit of a pain, but you can definitely switch to CoreDNS on GKE if you need to. Also, as someone who has used GKE, EKS, MicroK8s, and Minikube, I would say that the easiest to use implementation of Kubernetes as a Service is GKE. Not saying it's easy or the world's most thought-out product, but I gotta give Google just a tiny bit of credit for at least being easier to use than the competition.

If your company is implementing and managing their own k8s clusters they are doing it wrong. Use a cloud hosted solution.

Can only speak for myself personally, but having gone through a similar moment, it was 50% docker QoL improvements and 50% k8s QoL improvements -- most significantly around prod/dev parity. Every engineer in my org really did gain a significant amount of confidence around being able to spin up and deploy new services (when necessary) without being nervous about something going wrong in prod that wasn't configured properly in a lower environment.

I'm really happy with the local development story too. We use skaffold to pick and choose which resources we run locally, and then we use the same definition to generate our artifacts which keeps things in line.

It reminds me of a docker-compose file, except I can publish it too.

> "Kubernetes is the biggest quality-of-life improvement I've experienced in my career."

Just wait until the developers discover Google App Engine, Heroku, or DigitalOcean App Platform.

Yeah, I think that's the point though. Kubernetes enables the orchestration and observability of a PaaS in a much more flexible way so that you can get all of that while still matching the requirements of your business.

I think Heroku and DigitalOcean App Platform are still going to be popular for small setups (as will things like Amplify) but when you outgrow those (or realize you are paying too much for them) then Kubernetes is a reasonable option.

+1. One benefit I see of Kubernetes is that it can handle pretty much whatever you throw at it, so you can run everything in the one service.

Need to run a bespoke database, yeah it can do that. Need to migrate an old service running in a VM that needs a disk, it can do that too.

Oh man, I can't believe we didn't think of migrating our mature 20-service stack with complex hardware, security, and compliance requirements to App Engine. Dang it. I should've known to consult HN first!

How much of this is specific to Kubernetes and how much is more accurately because you moved from managing your own VM infrastructure to using managed services? Cloud providers have (often long had) ways of saying "run this code" without managing a VM and without involving kubernetes, including with docker support. And they are often very easy to use, though not without their faults. Azure App Service is one example.

I do agree that kubernetes is a pleasant experience from an application developer perspective with an existing cluster, but in my experience it was not without excessive pain and long hours by those administering the cluster. A year doesn't surprise me in your case, which brings to mind this question.

> I no longer need to use Terraform to scale out a service.

Are you still using Terraform to specify your deployments? From experience, you need something there to manage all your yamls: deployments, configmaps, secrets. Especially if you have multiple environments.

As someone with limited experience with containers, how does K8s allow you to move away from things like Puppet for configuration management? Does it offer some substitute that alleviates the need for something like Puppet or Ansible?

Yes, because whatever you used for app-specific configuration like libraries and packages is now done in the Dockerfile and containerized. So the same thing run locally is run in the cloud. Then as far as the infrastructure for running code goes, such as load balancers, service discovery, docker: that is all given to you just by running K8s. So you are more concerned with shipping immutable containers to k8s than provisioning "machines".

Then you can focus on containers which can be run, tested and built wherever without the fear of broken updates or one thing stepping on another. We found back in the days of ansible and chef that we had very low confidence in upgrading hosts live. So we would then do immutable hosts and blue green deploy them to production. But why think in the scope of hosts and VMs when really you have some application that needs to run somewhere.

K8s IMO isn't the end all. I think eventually we will get to something that doesn't need containers at all and you just run processes. But it is a good step for now. Also, once you have your stuff containerized it makes other non-k8s stuff easy, like AWS Lambda.

Edit: Also yes, you can use those to set up generic k8s nodes, but when we ran bare metal we used kubeadm to make CoreOS immutable nodes. I don't think that is used anymore (I haven't checked), but really the best way to set up k8s is to deploy really thin hosts that have nothing but Docker and k8s. VMware and others have solutions for this too where you don't have to mess with building hosts.

Thanks for the detailed response. It looks like I've still got a lot to learn - I've just lately been playing with LXC to get more familiarised with containers. I've previously looked at Helm apps and they seemed to be very similar to Puppet manifests. From what you said it seems like the approach is to have immutable containers for each application, set up via Dockerfiles, which somehow also simplifies the upgrade process? Does that mean you just deploy a new version/container of an application linked to the same underlying database (for example) when you need to run an upgrade?

So if you had a fleet of ten containers running the same application in a load balanced config, I'm guessing you'd need to upgrade all of them at once (with downtime) rather than upgrading them one by one (because then the database would be inconsistent)? I'm assuming that since the containers are immutable the data is stored elsewhere.

Helm is so bad it raises my blood pressure just hearing the name. Helm tries to apply the old way of doing things (like as you said Puppet) and makes it worse than ever. K8s yaml config is simple and elegant, don't try hiding it under templates. Kustomize is the proper way of working with K8s yaml. Helm fights against it.

Helm isn't great (god, Go templates, shudder), but I love being able to bundle all the manifests for an application together with well-documented parameters/values... If you want to do something like conditionally switch from a LoadBalancer service to Ingress in a test environment... I have no idea how you'd handle that in Kustomize, but it's straightforward in Helm. Ultimately it seems like you end up with the same complexity, just expressed via 100 layers of Kustomize transforms instead of with a conditional in your template.

There are two different types of patches in Kustomize. It does take a bit to get used to it as it's different. jid (Interactive jq) makes the process a lot easier. It's actually pretty easy to make changes like this.

> So if you had a fleet of ten containers running the same application in a load balanced config, I'm guessing you'd need to upgrade all of them at once (with downtime) rather than upgrading them one by one (because then the database would be inconsistent)?

That depends entirely on your application and the upgrade itself. Assuming we are discussing 10 different containers (e.g. 10 micro-services), k8s will normally update them in parallel but not atomically; it would be up to application or deployment-time logic to ensure they are updated as a single 'transaction'. If they are 10 copies of the same container, then k8s itself has tools for rolling upgrades where you can control the rollout.

Also, depending on application logic, the upgrade could be done in such a way that there is no need to synchronize the services, they could work with the DB as is.

In Kubernetes, deploying 10 containers takes a minute or two. I haven't worked with incredibly large deployments, but really deploying any amount of containers could easily take a minute or two if you have enough nodes. There is no downtime. Database inconsistency can cause problems but also any problems like that can be mitigated by doing a two-phase change to the database, and such changes are pretty rare and also devs instinctively avoid making those kinds of schema changes.
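The comment doesn't spell out what a "two-phase change" looks like; one common reading is the expand/contract pattern, where the schema is first widened so that old and new pods can run side by side, and narrowed only in a later release. A minimal sketch under that assumption, using an in-memory SQLite database with hypothetical table and column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, fullname TEXT)")
conn.execute("INSERT INTO users (fullname) VALUES ('Ada Lovelace')")

# Phase 1 (expand): add the new column alongside the old one.
# Old pods that only know `fullname` keep working during the rollout.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")

# Old code path, still running on not-yet-replaced pods.
conn.execute("INSERT INTO users (fullname) VALUES ('Grace Hopper')")

# New code path writes both columns.
conn.execute(
    "INSERT INTO users (fullname, display_name) VALUES (?, ?)",
    ("Alan Turing", "Alan Turing"),
)

# Backfill rows written before or during the rollout.
conn.execute("UPDATE users SET display_name = fullname WHERE display_name IS NULL")

# Phase 2 (contract): once no running version reads `fullname`, a later,
# separate release can drop it. That step is deliberately omitted here.
rows = conn.execute("SELECT display_name FROM users ORDER BY id").fetchall()
print(rows)
```

The point of splitting the change is that no single deploy ever requires all pods to switch versions at the same instant.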

You get the load balancing for free in K8s, and rolling deploys. What you do is upgrade the deployment with a new docker image, and yes, it is replaced immutably. In the case of an HTTP service, k8s will wait until a pod (container) responds healthy before it is put in the loop. Then it steps down old pods according to your rolling deploy settings and replaces them. You can define what that looks like, for example a max number of pods with a minimum number of pods up.
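A rough simulation of that loop may help. The parameter names loosely mirror the `maxSurge` and `maxUnavailable` settings of a Kubernetes Deployment, but the function itself is a hypothetical sketch and not the controller's actual algorithm; in particular, a real rollout stalls on an unhealthy pod rather than raising an error.

```python
def rolling_update(replicas, max_surge, max_unavailable, is_healthy):
    """Simulate replacing `replicas` old pods with new ones.

    Returns a list of (old_pods, new_pods) snapshots, one per step.
    `is_healthy(i)` reports whether the i-th new pod passes its health check.
    """
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("max_surge and max_unavailable cannot both be 0")

    old, new = replicas, 0
    min_ready = replicas - max_unavailable  # pods that must keep serving
    max_total = replicas + max_surge        # hard cap on pods at any time
    history = [(old, new)]

    while old > 0 or new < replicas:
        # Step down old pods as far as the availability floor allows.
        old -= min(old, old + new - min_ready)
        # Start new pods up to the cap; each must be healthy before it
        # counts as serving traffic.
        for _ in range(min(replicas - new, max_total - (old + new))):
            if not is_healthy(new):
                raise RuntimeError(f"rollout halted: new pod {new} is unhealthy")
            new += 1
        history.append((old, new))
    return history


print(rolling_update(4, max_surge=1, max_unavailable=1, is_healthy=lambda i: True))
```

With 4 replicas, a surge of 1, and 1 unavailable, the total never exceeds 5 pods and never drops below 3 serving pods at any step.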

> So the same thing run locally is run in the cloud

Who is preparing Dockerfiles? Developers and system administrators / security people do not generally prioritize the same things. We do not use k8s for now (therefore I know very little about it), so this might not be relevant, but how do you prevent shipping insecure containers?

Generally developers. When running in a container most of the attack surface is the app itself, and if it is compromised the damage is supposed to be limited to the container. There have been container escape exploits in the past though. But with a container you treat the container as the thing that you run and give resources to and don't trust it just like if you were running an application. All of the principles of giving an application resources such as least privilege apply to containers too.

But since you are not running multiple things or users in one space in a container, something such as an out of date vulnerable library can't be leveraged to gain root access to an entire host running other sensitive things too.

In Kubernetes and docker in general one container should not be able to compromise another, or k8s. But there are other issues if an attacker can access a running container such as now having network access to other services like databases. But again these are all things that can be locked down and should be even if provisioning hosts running things.

You still need something to provision the base OS and all the stuff under K8s (docker daemon, ntp, storage, networking, etc.) that it relies on, unless you go with a fully hosted solution.

Ansible or Puppet still excel at that kind of work.

And it looks like the parent went with a hosted solution which explains everything. Having to manage all the underlying services that k8s glues together is a huge PITA.

It enables 12 factor apps, and if you are using a cloud provider there is no infra setup, so you do not need puppet/ansible/etc. It's a better way to deploy apps hands down.

One great boon is to use Kotlin code to factor deployment descriptors into reusable parts that are somewhat statically type checked. I wasn't responsible for the migration from the legacy system to Kubernetes, but that seemed to me a big win. That's aside from being able to update the whole system, or only parts of it, so trivially.

For me the scariest part would be that it's not tech I would use (i.e. update the descriptors) regularly, so if something goes wrong, how quickly would I be able to identify the problem? I have no answer to that because I'm off the project.

> ... That is until management decides we're overhead and fires us to free up budget for feature developers. ;)

It's a real risk, at least perceptually. Mitigate it by documenting the number of developers' hours you save with automation and better infrastructure. It's pretty hard to argue against a trend of increased developer productivity. Include your own hours; you're targeting saving 30 hours a week of your own time and that frees you up to improve other things even faster.

You spent more than 3 person-years migrating twenty services from one container orchestration system to another and you think it was worth it? Not sure that seems worth it to me...

Nope! We went from not having a container orchestration system to having one.

If you're determined to be skeptical of the value-add from Kubernetes, my random post on HN ain't gonna convince you :) The developer experience improvement is obvious to people actually interacting with K8s.

I do want to clarify that this isn't the only thing we accomplished in the past year. Just the thing we shipped that was our biggest priority and had (by far) the biggest impact.

If you have about 20 developers spending 10% of their time fighting with your current system it adds up fast. I mean 10% may be high but 20 may be low. It depends a lot on the exact situation if it was worth the time.

That was my experience with Kubernetes when I set it up for a QA cluster a few years back. That learning curve is more like a sheer cliff face but once you get to the top the view is really nice. But getting there is a hell of a hike. It's easier now since every major cloud has a managed offering but my journey was pre-EKS so it was doubly intense for me.

> That is until management decides we're overhead and fires us to free up budget for feature developers.

Snake eating its own tail.

But not only do I not understand K8s, I'm also an idiot.

I’m tired of the complainers of the complainers. First a technology seems cool, then after engineers become experienced with the tool the warts show themselves and you get a vocal group of complainers.

It doesn’t end there... after complaining for a really long time two things happen. First off so much time has passed that you get these domain experts (Devops people) whose entire job is to mess with kubernetes. Second the complainers have been complaining so long that people get tired of it.

You now get people whose entire job revolves around kubernetes who are so tired of listening to people complain that they start complaining about the complainers.

It happened with JavaScript. JavaScript was around for so long that people started complaining about the complainers. In fact it’s been around so long that a whole generation of people who’ve never used any other language was literally born. These people started off as the complainers against the complainers but now they outnumber the complainers so you rarely see people talk shit about JavaScript anymore.

Actually JavaScript has been around so long that the entire language has changed and part of the terribleness was fixed by making another language (TypeScript) compile into JavaScript.

Which brings me back to kubernetes. Kubernetes is a bad tool with no alternative precisely because it requires a 3 man dedicated team a year to get things up and running.

A good tool would be something that allows me to get it up and running in a week just by reading some docs. Even better, an hour. Could such a tool exist and replace Kubernetes? Yes. Does such a tool exist? No.

I am a complainer and you are a complainer of complainers. What will likely happen some time from now is two possible things. Kube will be so integrated into the infrastructure ecosystem that wrappers will be written on top of kube just like how react and typescript have replaced JavaScript. If that doesn’t happen then a whole new tool will replace it.

I’m sorry to say but the ideal we are shooting for here is a tool that will ultimately make Devops a general thing that all developers can deal with rather than an entire specialist team. Again no such tool exists yet but it certainly can exist, especially when the inventor of the tool has become a complainer.

If the inventor of the tool becomes a complainer, that validates the complainers. And now the complainers of the complainers have nothing left to say.

There already are a number of platforms that make deploying and operating apps very simple! Heroku, AppEngine, Lambda, so many others.

Your 5-person startup does not need a DevOps engineer. Your 50-person org should start thinking about it. Your 500-person company probably needs multiple DevOps engineers -- having every dev team independently figure out how to handle things like deployments, reliability, and security is chaotic and wasteful. There are a lot of details that only start to matter at scale, and both Kubernetes and dedicated infrastructure teams are for this use case.

Yeah my point is, if you’re migrating to kube without using some service that handles all the details, even the 5 person start up needs a dedicated devops team.

Nice! Thanks to you we just reached level 3: The complainers of complainers of complainers! Level 4 is coming!

> A good tool would be something that allows me to get it up and running in a week just by reading some docs. Even better, an hour. Could such a tool exist and replace Kubernetes? Yes. Does such a tool exist? No.

Instead of complaining, why don't you build this tool? That's the problem I have with complainers.

You're complaining about me complaining.

Instead of complaining why don't you build me the tool to stop me from complaining? It's the same reason why I'm not building the tool.

That's the problem I have with complainers complaining about other complainers. Why don't you guys do something about my complaining rather than complain about it?

Why complain about the complainers? Because some complainers will never be happy no matter what the state of the world is. Ironically your solution (build more things) is the exact thing the complainers complain about with regard to the frontend world. They complain too many things have been built. You can’t ever satisfy everyone.

I understand their rationale. We manage thousand Kubernetes clusters and end-users can find lots and lots of creative ways to shoot themselves in the foot:

- I can store anything in a secret? Let's have thousands of cat images. Etcd then stops working because we have over 2GB of funny cats in the key store.

- I can run a root Pod? Let's mount the docker socket and start building images with it. Oh and by the way, I never clean those up and my Node simply fills up. Also I add some additional docker networks that break Pod to Pod networks.

- Istio is nice - why don't we add automatic injection for Pods in all namespaces? Including kube-system? And then they brick kube-proxy and the cluster stops working.

- I can use validating webhooks for better security? Let's watch all resources. To keep it more secure, let's set the failure policy of the webhook to Fail, so we never admit any modification without the apiserver calling our webhook. What's that? My single-replica webhook was evicted from the Node (we didn't add any resource requests and limits) and now it cannot even be created or scheduled, because kube-controller-manager and kube-scheduler cannot update their leases, have lost leadership and are now idling, effectively bricking the entire cluster.
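
The webhook footgun in the last bullet can be caught with a fairly small check. Below is a minimal sketch, assuming the configuration has already been loaded into a plain dict shaped like a ValidatingWebhookConfiguration (the `webhooks`, `failurePolicy`, `rules`, `resources` and `namespaceSelector` fields). The function name and the example webhook are hypothetical, and treating a missing `failurePolicy` as `Fail` is my assumption about the v1 default.

```python
def risky_webhook_settings(config: dict) -> list:
    """Return warnings for webhook settings that can lock up a cluster."""
    warnings = []
    for hook in config.get("webhooks", []):
        name = hook.get("name", "<unnamed>")
        # Assumption: an unset failurePolicy behaves like "Fail".
        fails_closed = hook.get("failurePolicy", "Fail") == "Fail"
        matches_everything = any(
            "*" in rule.get("resources", []) for rule in hook.get("rules", [])
        )
        # A missing or empty selector means every namespace, kube-system included.
        unscoped = not hook.get("namespaceSelector")
        if fails_closed and matches_everything:
            warnings.append(f"{name}: fails closed on all resources")
        if fails_closed and unscoped:
            warnings.append(f"{name}: no namespaceSelector, so kube-system is included")
    return warnings

# Hypothetical configuration reproducing the scenario from the bullet above.
example = {
    "webhooks": [
        {
            "name": "validate-everything.example.com",
            "failurePolicy": "Fail",
            "rules": [{"operations": ["*"], "resources": ["*"]}],
        }
    ]
}

for warning in risky_webhook_settings(example):
    print(warning)
```

This only flags the combination described above; it says nothing about replica counts or resource requests, which were the other half of that outage.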

Google would reduce the pain points with this change, however they would still face countless other issues with Kubernetes.

I fully agree with your points and would sum them up as "Kubernetes has a steep learning curve, a (quite) large interface and ample opportunities to shoot yourself in the foot with it" (plus, they're very funny).

However playing the devil's advocate here: If you actually took the steps of learning the basic abstractions, then for me it's really hard to see what you could still get rid of.

If you actually go all-in and fit your application to the principles of Kubernetes-native applications (instead of the other way around), then it is nothing short of amazing.

We're running 120 microservices in GKE and the difference to our custom-built setup before is night and day. I let my Infra team go surfing together for two weeks because without changes it flies mostly on autopilot.

Let's not kid ourselves, distributed computing is _hard_ and Kubernetes is a testament to that. I'm not saying it can't be made more accessible by further standardization, but there are fundamental limits to how easy it can be made.

Which by the way is leading to my only pet peeve with it: I feel most of the complexity of K8S comes from the fact that it got hyped as an enterprise product and then lots of features were built that support shoving your non cloud-native workload into Kubernetes even if it was never designed for it.

If you don't do or need all of that, the amount of interface, complexity and footguns shrinks significantly. Maybe it's time to better pull them apart in the documentation.

> If you actually took the steps of learning the basic abstractions, then for me it's really hard to see what you could still get rid of.

This argument basically boils down to "Developers just need discipline, and should stop blaming the tools". While this is a sound argument on paper, the intrinsic complexity of software systems makes it hard to pin the blame on developers. BTW, this is the same argument Uncle Bob makes, which is not so popular with many mainstream developers.

You're right about feature creep in k8s though.

I get what you're saying, but my point is a bit more nuanced:

If your goal is to build highly reliable and available services to end users that are secure and scalable with a team of more than 10 engineers, eventually you will run into more than 50% of the concepts in Kubernetes anyway and end up re-inventing them.

Scaling up and down, node draining, finding out whether services are healthy, RBAC, resource distribution, secrets management, service hardening, introspection capabilities, explicit declaration of dependencies and endpoints and many, many more.

My point is: Sure, if your goal isn't that, it doesn't make sense to start out using Kubernetes.

But if at least eventually that's what you need, imho it's way preferable to just learn and apply well-proven abstractions instead of reinventing the wheel along the way and ending up with a less maintainable, capable and standardized solution that you won't find anyone to maintain.

When I read some of the comments here suggesting "just spinning up docker-compose with Traefik in front" (disclaimer: I really like Traefik), it reminds me of how some of the ops mess that I historically had to care for got started.

Agreed, the truth usually lies somewhere in between, and my point was that we can't absolve the tools/ecosystems and put the blame squarely on the devs. That doesn't absolve teams either: they need to do their homework before jumping on the bandwagon. K8s is great if you know what you're signing up for.

We are sometimes too proud to admit we simply "are not good enough at something".

It’s always easier to shift the blame somewhere else.

That's why some radical but correct concepts are so hard to push.

IMO fixing the BEAM VM to easily work on a cluster could be better for distributed systems than k8s.

That would do nothing to help the 99% of users who don't use BEAM VM languages

They'd have a reason to start using BEAM VM languages, though.

To a person who isn't using Erlang or ASP.net, suggesting that we should use either of those language packages and it will solve any of our problems without creating a thousand new ones sounds equally non-starterish to me.

To add a counter-example, I have lots of Ruby experience and I've just joined a Go team. I won't tell them to use Ruby, I will just do it where it makes sense and saves us time. (And then we'll have two problems... enter "limiting blast radius")

Point of my counter-example is, I'm extremely skeptical that all the world's problems can be solved by adopting a new mono-culture, whatever it is. There are 100% always gonna be some problems that are better solved in a different language. PHP is the best way to run Wordpress, for example (ok, so it's the only way to run Wordpress, but you get the idea... "Wordpress is the best way to..."), but I've been in high-functioning IT organizations that won't touch that with a ten foot pole, because "it's another language to support, and PHP is icky."

We also got rid of a perfectly fine Wiki in favor of centralized Knowledge-base software for similar reason. "Better to just have one KB. We don't need to be hosting another thing." So the chances of moving everything over to BEAM VM are next to nil, unless you are a product-focused company with just one product, or happen to have an absolute champion leading the effort to migrate all the things. For all the other things, you need to have a consistent answer too.

No tool is one-size-fits-all. Where Kubernetes shines most is under any environment that isn't running a single monolith or building a software monoculture and/or can't manage that for whatever reason (because those are all basic use cases that are frankly easy enough to manage without adding on top the additional complexity of Kubernetes; don't need it, don't use it!) IMHO, diversity in infrastructure is a plus though, and Kubernetes is a technology that it turns out enables this.

The BEAM VM is fantastic but the scope of Kubernetes + docker is completely different.

For example you still need a way to get BEAM onto hosts, still need to manage the OS on the host, still need to setup networking, RBAC etc.

Ok, but we have a similar setup that runs on GCE instances. Deploying involves building an image and pushing a button. We don't really have the need for an Infrastructure team.

I once misconfigured iptables and locked myself out of our buildserver. Had to call lab support in a different country. Is Linux too complicated? Joyent famously took down their whole region by rebooting wrong nodes. It’s almost like running distributed networks of supercomputers at scale is hard or something...

> Joyent famously took down their whole region by rebooting wrong nodes.

In case anyone's interested, here's a pretty funny and educational talk by Bryan Cantrill about that particular incident:

GOTO 2017 • Debugging Under Fire: Keep your Head when Systems have Lost their Mind


For two systems of equal functionality, the one that allows or encourages fewer footbullets is the better design. Not all complexity is essential.

What kubernetes complexity is non-essential and can be replicated by a simpler solution?

I once did an `apt autoremove` on a custom install of CentOS handed to my team. Apt uninstalled python (and a lot more), and apt depends on python to run, so that was a bummer. The easiest way out was to reinstall the OS.

These are some delightfully specific hypotheticals.

There is a benefit: fixing each of these issues fixes them for everyone using K8S.

One of the goals of K8S is normalization/standardization of a complex topic to better share knowledge.

> I can store anything in a secret? Let's have thousands of cat images.

Why would someone want to store non-secret information as a secret?

"A common mistake that people make when trying to design something completely foolproof is to underestimate the ingenuity of complete fools"

— Douglas Adams

Because mutable persistence in Kubernetes can be super annoying to manage and people might grasp for whatever lifeline they can find.

If you have a managed object store or even relational database outside of k8s, the thought of storing arbitrary data in secrets probably doesn't come to mind. But if your enterprise spools up a cluster and tells you to use nfs PVCs with no other storage solution, suddenly you might start getting creative.

I think it's just a cute way of saying "data". Like saying you're seeding Linux ISOs when you're actually seeding pirated movies on BitTorrent.

What makes you think Kubernetes "secrets" are appropriate for storing secret information? They're not secure (not without adding a bunch of other nonsense on top of them).

> Why would someone want to store non-secret information as a secret?

Top reason given to me by developers: "I don't want to spend time thinking about the distinction."

> - I can run a root Pod? Lets mount the docker socket and start building images with it.

Just in case you haven't figured out the proper way to do this, you should use docker:dind.

kaniko is an even better way

If only it didn't have weird issues when you start stacking in a Dockerfile.

like what? I'd like to hear your experiences with it

Oh yes, secret management with kubectl is needlessly complicated.

Sure, just put your secret data in a file, then we'll use your file name as the key of the secret.

Cronjobs sometimes have weird bugs as well.

A lot of its complexity is due to the fact that it's an evolving system, and that's fine. But I see that some things end up way more complex or unreliable than they need to be, due to overengineering or use cases no one needs.

I remember trying out docker sometime back in late 2013. Something about it never fully stuck with me. I always felt like the final boundary for a piece of software should be the process, not the computer. Plopping an entire VM into a zipfile and saying "here's your software" felt like lazy engineering to many of us at the time (and still today).

For our current stack, the answer has been to make the entire business application run as a single process. We also use a single (mono) repository because it is a natural fit with the grain of the software.

As far as I am aware, there is no reason a single process cannot exploit the full resources of any computer. Modern x86 servers are ridiculously fast, as long as you can get at them directly. AspNetCore + SQLite (properly tuned) running on a 64 core Epyc serving web clients using Kestrel will probably be sufficient for 99.9% of business applications today. You can handle millions of simultaneous clients without blinking. Who even has that many total customers right now?

Horizontal scalability is simply a band-aid for poor engineering in most (not all) applications. The poor engineering, in my experience, is typically caused by underestimating how fast a single x86 thread is and exploring the concurrent & distributed computing rabbit hole from there. It is a rabbit hole that should go unexplored, if at all possible.

Here's a quick trick if none of the above sticks: If one of your consultants or developers tells you they can make your application faster by adding a bunch of additional computers, you are almost certainly getting taken for a ride.

I agree with you. However, I spent years and years trying to compile software that came with cryptic install instructions, or having the author insist that since it works on their machine I'm just doing something stupid. Docker was largely able to fix that.

It's a somewhat odd solution for a too common problem, but any solution is still better than dealing with such an annoying problem. (source: made docker the de facto cross-team communication standard in my company. "I'll just give you a docker container, no need to fight trying to get the correct version of nvidia-smi to work on your machine" type of thing)

It probably depends on the space and types of software you're working on. If it's frontend applications, for example, then it's overkill. But if somebody wants you to, let's say, install multiple elasticsearch versions + some global binaries for some reason + a bunch of different gpu drivers on your machine (you get the idea), then docker is a big net positive. Both for getting something to compile without drama and for not polluting your host OS (or VM) with conflicting software packages.

Completely agree. The whole compile chain for most software and reliance on linked libraries, implicit dependencies like locale settings changing behavior, basically decades of weird accidents and hacks to get around memory and disk size limits, can be a nightmare to deal with. If using slow dynamic languages, or modern frontend bundlers, all the implicit C extension compilations and dependencies can still be a pain.

The list goes on and on, it’s bizarre to me to think of this as the true, good way of doing software and think of docker as lazy. Docker certainly has its own problems too, but does a decent job at encapsulating the decades of craziness we’ve come to heavily rely on. And it lets you test these things alongside your own software when updating versions and be sure you run the same thing in production.

If docker isn’t your preferred solution to these problems that’s fine, but I don’t get why it’s so popular on HN to pretend that docker is literally useless and nobody in their right mind would ever use it except to pad their resume with buzzwords.

I don’t know. I can’t take anyone seriously who says it’s hard to type ./configure && make at a prompt, then install missing libs. Unless you have to maintain updates regularly, “It’s just so hard” seems like a damn meme.

When library versions start to cause issues: like V3.5 having a bug, so you need to roll back to V3.4... that's when ./configure && make starts to have issues.

Yeah, it happens with .so files, .dlls ("dll hell"), package managers and more. But that's where things like containers come in to help: "I tested Library Foo version V3.4 and that's what you get in the docker". No issues with Foo V3.5 or V3.6 causing issues... just get exactly what the developer tested on their box.

Be it a .dll, a .so, a #include library, some version of Python (2.7 with import six), some crazy version of a Ruby Gem that just won't work on Debian for some reason (but works on Red Hat)... etc. etc.

There are basically two options; maintain up-to-date dependencies carefully (engineer around dll-hell with lots of automated testing and be well-versed in the changelogs of dependencies) or compile a bunch of CVEs into production software.

There really isn't any middle ground (except to not use third-party libraries at all).

That assumes you are using free software without a support contract where the vendor has no incentive to maintain long term support for libs by only applying security patches but not adding any features to old versions. I understand this goes against the culture of using only the latest or "fixing" vulns by upgrading to a more recent version (which may have a different API or untested changes to existing APIs).

That makes sense for a hobbyist community but not so much for production.

In a former job we needed to fork and maintain patches ourselves, keeping an eye on the CVE databases and mailinglists and applying only security patches as needed rather than upgrading versions. We managed to be proactive and avoid 90% of the patches by turning stuff off or ripping it out of the build entirely. For example with openSSH we ripped out PAM, built it without LDAP support, no kerberos support etc. And kept patching it when vulns came out. You'd be amazed at how many vulns don't affect you if you turn off 90% of the functionality and only use what you need.

We needed to do this as we were selling embedded software that had stability requirements and was supported (by us).

It drove people nuts as they would run a Nessus scan and do a version check, then look in a database and conclude our software was vulnerable. To shut up the scanners we changed the banners but still people would do fingerprinting, at which point we started putting messages like X-custom-build into our banners and explained to pentesters that they need to actually pentest to verify vulns rather than fingerprinting and doing vuln db lookups.

Point being, at some point you need to maintain stuff and have stable APIs if you want long lasting code that runs well and addresses known vulns. You don't do that by constantly changing your dependencies, you do it by removing complexity, assigning long-term owners, and spending money to maintain your dependencies.

So either you pay the library vendor to make LTS versions, or you pay in house staff to do that, or you push the risk onto the customer.

> then install missing libs. Unless you have to maintain updates regularly

You lost me there already. Why should there be missing libs, and why would you not have to maintain updates regularly in production environment?

So let me see if I got this right, it's basically:

1. ./configure 2. make 3. ??? 4. profit!

Doesn't sound like a perfectly good solution to me.

Aren't we conflating compile complexities with runtime complexities here? There are plenty of open-source applications that offer pre-compiled binaries.

That difference isn't as black and white as you're making it out to be, sometimes it's just a design decision whether certain work is done at compile time or runtime. And both kinds of issues, runtime or compile-time, can be caused by the kinds of problems I'm talking about like unspecified dependencies.

This is why I wish github actually allowed automated compilation. That way we could all see exactly how binaries are compiled and don't need to setup a build environment for each open source project we want to build ourselves.

I am totally on board with the idea of improving productivity. The issue I see is that this is avoiding a deeper problem - namely that the software stack requires a max-level wizard to set up from scratch each time.

Refactoring your application so that it can be cloned, built and run within 2-3 keypresses is something that should be strongly considered. For us, these are the steps required to stand up an entirely new stack from source:

0. Create new Windows Server VM, and install git + .NET Core SDK.

1. Clone our repository's main branch.

2. Run dotnet build to produce a Self-Contained Deployment

3. Run the application with --console argument or install as a service.

This is literally all that is required. The application will create & migrate its internal SQLite databases automatically. There is no other software or 3rd party services which must be set up as a prerequisite. Development experience is the same, you just attach debugger via VS rather than start console or service.

We also role play putting certain types of operational intelligence into our software. We ask questions like "Can our application understand its environment regarding XYZ and respond automatically?"

The issue Docker solves for me is not the complexity or number of steps but the compatibility.

I built a service that is installed in 10 lines that could be run through a makefile, but I assume specific versions of each library of the system and don’t intend to test against the hundreds of possible system dependency combinations or assume it will surely be compatible anyway.

The dev running the container won’t be building their own debian install with the specific versions required in my doc just to run the install script from there; they just instantiate the container and run with it.

> Plopping an entire VM into a zipfile and saying "here's your software" felt like lazy engineering to many of us at the time (and still today).

At the risk of nitpicking, docker images aren't the equivalent of VM images, as they don't include a kernel.

This isn't nitpicking at all, it's an important distinction!

Docker is not virtualization, it's just an abstraction that makes some Linux process isolation features easier to manage.

It also allows you to bundle whatever dependencies you have in the same bundle, but that is not the same as having a VM.

Linux containers and equivalent technologies are virtualisation (specifically OS virtualisation[1]), just not a VM. Hardware virtualisation (VMs) isn't the only kind of virtualisation that exists.

[1]: https://en.wikipedia.org/wiki/Operating_system-level_virtual...

By that logic, processes are arguably virtualisation too. They do after all use virtual memory.

Threads, processes and containers exist on a continuum.

The key is that "containers" don't actually exist -- they're just processes running under a variety of different namespaces.

It's true that Docker isn't a first-class abstraction at the level of the Linux kernel, but BSD has jails, and Solaris has Zones. This is important in some respects, but I don't see that it informs things here. Containers are still 'a thing' regardless of how they're implemented.

Curious to learn more about how jails + zones are implemented. In Linux land, I find the notion that containers are a coherent abstraction really hinders developers from understanding how their application is deployed.

Indeed they are! The notion that each process has its separate address space is called virtual memory for that reason.

See also cgroups: while this feature is used by the container run times, it predates Docker, and can be used standalone with normal processes.

Indeed. I made a website for testing npm packages inside a cgroup/unshared "container" - about 6 months before docker came out.

If only I had realised that could have been useful for more than testing npm packages...

OCI “docker” containers are at this point a description of a process. How it’s realized is up to the implementor. runc realizes the container with kernel namespacing and runv realizes the container with hardware virtualization.

Both if implemented to spec will be logically equivalent and drop-in replacements for one another.

I don't know if that is the most important reason it is not equivalent. It also doesn't have any system processes; there is no systemd, sshd, no crond, etc. It doesn't need its own firewall rules configured or its own security managed. I could go on but I think you already get the point.

Indeed - essentially, a container is a glorified chroot with a few (important) bells and whistles attached to it.

I agree with the other commenter, this is an important distinction. It has no hypervisor and is really just a normal process using standard kernel features: cgroups (resource limiting) and namespaces (resource isolation). It's really not so different from chroot.
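
You can see this for yourself from any ordinary process on Linux. A minimal sketch, assuming a Linux-style /proc (on other systems both helpers simply return nothing); the function names are made up for illustration.

```python
import os

def current_namespaces() -> dict:
    """Map each namespace type (e.g. 'pid', 'net') to its kernel identifier.

    Returns an empty dict where /proc/self/ns is unavailable (non-Linux).
    """
    ns_dir = "/proc/self/ns"
    namespaces = {}
    if not os.path.isdir(ns_dir):
        return namespaces
    for entry in sorted(os.listdir(ns_dir)):
        try:
            # Each entry is a symlink whose target names the namespace instance.
            namespaces[entry] = os.readlink(os.path.join(ns_dir, entry))
        except OSError:
            continue
    return namespaces

def current_cgroups() -> list:
    """Return the cgroup membership lines for this process, if readable."""
    try:
        with open("/proc/self/cgroup") as handle:
            return [line.strip() for line in handle if line.strip()]
    except OSError:
        return []

for name, identifier in current_namespaces().items():
    print(f"{name:20} {identifier}")
for line in current_cgroups():
    print(line)
```

Run it on the host and then inside a container: the same kinds of entries show up in both places, only the identifiers differ, which is the whole point being made above.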

Containers solve the dependency problem by simply pretending it doesn't exist.

I used to do UNIX integration work in the late 1990s and early 2000s, when containers weren't really a thing. So you had to make sure libs from one program didn't crap on another program. And developers had to be conscious of what dependencies they included in their code. Nowadays they don't have to care as much because of containers. Every program can have its own dependencies, thereby solving the integration problem.

A better solution would be to actually integrate programs and their dependencies into working systems, but no one has time for that. Software bloat is fine. Computers are cheap and fast. And actually understanding what we're doing would be too expensive. So just wrap all your half-finished crapware up in a giant black box and dump it on a server.

I'm very interested in understanding what I'm doing and what I'm bundling into my software.

What I'm not interested in, is this kind of walking uphill in the snow both ways:

> So you had to make sure libs from one program didn't crap on another program. And developers had to be conscious of what dependencies they included in their code.

ie. having to understand what everybody else is doing in order for my software to run properly. No thanks. That's not why I'm here.

I'll put the exact dependencies I want, in the versions which work best for my software, into a Docker image or whatever tool offers a similar level of isolation, and I'll be working on my code while everybody else spends their time fighting over the ABI compatibility of C system libraries.

Why is this a better solution?

Separating the dependencies between programs allows you to test and release independently to allow incremental upgrades. IMO that is better.

> Horizontal scalability is simply a band-aid for poor engineering

And don't even get me started on having instances labeled "large" that have less memory and CPU capacity than my personal backup laptop (currently on loan to my 8yo for COVID reasons)...

But that doesn’t make any sense. We’re not talking about physical hardware we’re talking about tiny tiny slices of it. When VMs are the logical isolation boundary in your infra they get really small — 512 MB is a lot of memory for a single purpose server.

> 512 MB is a lot of memory for a single purpose server.

Maybe. But when the time comes 512 MB doesn't seem like much anymore, what do you do? Do you pick the next larger instance or do you split the load across more 512 MB slices of a computer?

Don't forget the ridiculously low and restricted IOs unless you pay a lot.

But you can use other cloud providers with better value.

wrt horizontal scaling. Do recall that the motivation for this strategy by Google was cheap-as-possible servers that failed constantly. Back when they built racks using legos the problem was hardware reliability. They had to 'scale' horizontally for reliability (given cost constraints) as much as load.
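
The reliability argument is easy to put numbers on. A minimal sketch, assuming servers fail independently of each other (a big assumption in practice) and using a made-up per-server availability of 95% for the cheap hardware:

```python
def availability(per_server: float, replicas: int) -> float:
    """Probability that at least one replica is up, assuming independent failures."""
    return 1.0 - (1.0 - per_server) ** replicas

# Hypothetical flaky commodity server that is up 95% of the time.
for replicas in (1, 2, 3, 5):
    print(f"{replicas} replica(s): {availability(0.95, replicas):.6f}")
```

Each extra cheap box removes most of the remaining downtime, which is why scaling out made sense for reliability even before load entered the picture.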

People have since bought into the marketing reasons for 'being in the cloud' and having 'infinite scalability' but that largely misses the point (and the pain) that caused many of these technologies and patterns to be developed in the first place.

The best example of how to scale without buying into this pattern I know of is Stack Overflow. At least circa 2016 - https://nickcraver.com/blog/2016/02/17/stack-overflow-the-ar...

> Horizontal scalability is simply a band-aid for poor engineering in most (not all) applications.

Maybe if your app handles < 10k concurrent connections. Otherwise it is the most cost efficient solution and exists because it solves the scaling problem in the best way as of today.

When you approach the limits of what your kernel can handle, then it may be time to split your workload across boxes or to carve smaller boxes out of your metal (and probably directly attach NICs to the VMs so the host OS doesn't have to deal with them). Making your workload horizontally scalable is always a sound engineering choice.


If you split a horizontally scalable workload across a dozen virtual servers that are barely larger than the smallest laptop you can get from Best Buy, you are just creating self-inflicted pain. Chances are the smallest box you can get from Dell can comfortably host your whole application.

The fact remains the odds of you needing to support more than 10K simultaneous connections are vanishingly small.

> When you approach the limits of what your kernel can handle

Even before. If you want low latency. And banks handle more than 10K concurrent every day.

Cost example: https://pt.slideshare.net/markmyers106/vertical-vs-horizonta...

Yes, some banks need it, as do giants like Google, Facebook, etc.

Chances are very high that the problem domain you are working in does not.

The author said “1%”; I think it is maybe 5% to 7%.

The point being that masses of software are developed every day on a cargo-cult adoption of solutions they do not require.

> The point being that masses of software are developed every day on a cargo-cult adoption of solutions they do not require.

This is certainly true, but there is a possible benefit: standardization. Having a standard skillset allows employees greater flexibility since they can jump employers and still expect to be rapidly useful. Similarly, if your company uses a standard toolkit, there's going to be less training overhead for new hires. Now, the devil is in the details, and I'm inclined to agree that you'd be better off hiring someone that can think outside the box and keep the tooling simpler. But using the standard toolkit will work reasonably well across several orders of magnitude in scale.

10k? It's not 1999 anymore. Look at Netflix to see the state of the art in saturating NICs with commodity hardware and off the shelf NGINX plus FreeBSD.

Does anyone know of any raw stats at what one beefy server can handle with a typical HTTP CRUD app? <10k doesn't seem right?

10k concurrent idle connections is no problem, but 10k/rps is a decent amount of traffic. What is a typical CRUD app? It really depends on what software you're using too. You can do >10k/rps with DB read/write on every request with PHP on a low-end server, but if you throw a heavy framework into the mix then that would not be possible.

Agree with you. When you say cost efficient, it means that we can scale out our poorly written, slow software to more servers to handle increased traffic, instead of hiring more expensive engineers and rewriting the software properly so that it could handle the increased load from a single instance.

VMs also autoscale.

I get where you're coming from, but it turns out that plopping the entire VM into a zipfile ends up being a good way to have the kind of reproducibility that makes your operations sane. Do you pin specific versions of dependencies for your install scripts? Might as well pin the whole image. It's like 88% of the benefit of reproducible builds, at 3% of the cost, and that's not nothing.

Mind you, that's still Docker, though, not Kubernetes.

> If one of your consultants or developers tells you they can make your application faster by adding a bunch of additional computers, you are almost certainly getting taken for a ride.

Eh. There's a redundancy play in there somewhere too, if you know how to pull it off. (Big if.)

Horizontal scaling brings down total compute use if you have many distinct uses.

Imagine if your only unit of compute is a single bulky machine. You don't fully saturate it, but you need a second machine to avoid downtime anyway. Now you spin up a second or third service and suddenly you need 5 or 10 machines and your compute utilization is 20%. You can pack things in tighter. But then you have a knapsack problem, and that's easier to solve efficiently with many small blocks even if it costs you a 1% overhead or whatever.
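To make the packing argument a bit more concrete, here is a rough sketch using first-fit-decreasing, a standard bin-packing heuristic. The core counts, the 32-core "bulky" machine and the 8-core "small block" are all made-up numbers for illustration, and the sketch ignores real-world constraints such as keeping replicas of one service on different hosts.

```python
def first_fit_decreasing(demands, capacity):
    """Pack workload demands (e.g. CPU cores) into machines of a given capacity.

    Returns a list of machines, each a list of the demands placed on it.
    """
    machines = []
    for demand in sorted(demands, reverse=True):
        if demand > capacity:
            raise ValueError(f"workload of size {demand} does not fit on one machine")
        for machine in machines:
            # Place the workload on the first machine that still has room.
            if sum(machine) + demand <= capacity:
                machine.append(demand)
                break
        else:
            # No existing machine had room, so provision a new one.
            machines.append([demand])
    return machines


def utilization(machines, capacity):
    """Fraction of the total provisioned capacity that is actually used."""
    used = sum(sum(machine) for machine in machines)
    return used / (len(machines) * capacity)


# Hypothetical fleet: 5 services, 2 replicas each, demands in cores.
demands = [6, 6, 3, 3, 5, 5, 2, 2, 4, 4]

# One bulky 32-core machine per replica versus packing onto 8-core blocks.
dedicated = [[d] for d in demands]
packed = first_fit_decreasing(demands, capacity=8)

print(f"dedicated: {len(dedicated)} machines, {utilization(dedicated, 32):.1%} used")
print(f"packed:    {len(packed)} machines, {utilization(packed, 8):.1%} used")
```

With smaller blocks the packer has more ways to fill each machine, which is the point about many small blocks being easier to use efficiently.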

> Plopping an entire VM into a zipfile and saying "here's your software" felt like lazy engineering to many of us at the time (and still today).

I've seen this (a long time ago) in the education world market. Very small school with a STEM program. They had specific scientific software they wanted undergrads to use (and some of it was pretty proprietary and used to interface with lab equipment) + a pre-configured IDE.

Instead of going through the compatibility matrix of OSes and their versions they just gave all students a VM image that would "just work". Everyone could bring in their own devices and as long as you could run a hypervisor everything would "just work".

Personally I think there is beauty to one text file that defines a reproducible build environment. Carefully controlled dependencies are quite nice.

That said, I haven't really used docker as a runtime container.

Horizontal scaling is not a _performance_ tactic, but rather about availability and cost. Having higher availability means trading off consistency, or in other words, using distributed systems. Also, you can not elastically scale vertically, without also scaling horizontally. In other words, horizontal systems are cheaper if you have fluctuations in traffic.

Performance is not the only reason to scale. Redundancy / failure tolerance.

Cassandra is a big PITA, but it hasn't gone down (knock on wood) in the 6 years I've been using it. PARTS of it have...

How many distinct services do you have in your monorepo? A monorepo is fine...

until it isn't.

> Something about it never fully stuck with me. ... Plopping an entire VM...

There's your first miss.

After that single node crashes, the app you're running on that one server will run a little more slowly..

How big should the one server that serves the whole of netflix be..?

Stackoverflow has always run on a couple of IIS instances. If you’re not bigger than them don’t worry.

You can pretend to be Netflix or Google, and build your tech-stack like they do. Or you can stop wasting your resources setting up a tech stack that you’re never going to get a return of investment on.

Why a few though? Couldn't they do with just 1? That would make people on HN happier, it seems.

Stack Overflow is not a unit of measurement that anyone would be able to take seriously or find useful? How many stack overflows is one asana? Or how many stack overflows is one trello?

Horizontal scaling, Docker, and K8s have their own benefits that are many and obvious to the industry. You don't need to be Google to deploy and use them. If you deploy one server for each app and each team vs deploying a common K8s cluster, where is the higher investment? You claim more ROI with more physical hardware and more servers?

> Why a few though?

Because it’s less dangerous and cheaper?

> Horizontal scaling, Docker, and K8s have their own benefits that are many and obvious to the industry.

Which is why SO runs on more than one IIS...

You don’t need a tech-stack, that is apparently even too complex for google considering the article, to scale horizontally.

> If you deploy one server for each app and each team vs deploying a common K8s cluster where is the higher investment?

The investment comes from the complexity. We’ve seen numerous proofs of concept in my country, and in my sector of work, where different IT departments spent one or two 2-5 full years worth of man hours trying to adopt a perfect devops tech-stack.

Maybe that’s because they were incompetent, you’re free and possibly right to claim so, but that’s still professional teams expending real world resources and failing.

From a management perspective, and this is where I’m coming from much more than a technical perspective mind you, the most expensive resource you have is your employees. If software is so complex that I need one or two full time operators to run it, well, let’s just say I could run more than a million azure web apps, and have our regular Microsoft certified operators handle it.

> You claim more ROI with more physical hardware and more servers?

I haven’t owned my own iron since 2010. All our on-prem servers, and we still do have those, are virtual and running on rented iron.

I think we may be speaking past each other though. My point is financial and yours appears to be mostly technical. If you can set up and run your K8s without expending resources, then good for you. A lot of companies and organisations have proven to be unable to do that though, and in those cases, I think they would’ve been better off not doing it until they needed to.

Kubernetes is not too complex. There are things to learn no doubt, but it's easy to reason about once you cross the initial learning curve.

Of course transitions can fail. People can think yea let's do this small thing and end up chewing off a much bigger problem than they thought they were getting into. But that problem is in the whole of tech. "Let's just use our present people and switch from all proprietary to all open source in 3 months.." yea, best of luck with that... You need a solid team and going all in on K8s is hard, you need technical talent and leadership to drive this.

Agreed, it may not be for everyone. Benefits are both technical and financial: less compute resources used, more reliable deploys, more resilient services. The problems being solved by this are not trivial. There are tangible benefits. Is it a risk? Of course it is. The risk is not in the technology, the risk is in the competence of the team deploying it. If it can't change and adapt, maybe a lot more fundamental things need to change in that organization than just deploying a new orchestration layer.

My only point is, this shift from dedicated servers to VMs and now to containers is a fundamental shift in how things are done. People can hate on it all they like, but it's a better way of doing things and everyone will catch-up eventually.

> Or how many stack overflows is one trello?

I shall not express any opinion on this topic other than to say that Trello is not a good example to bring up. The entire customer base of Trello is not using a single shared board and thus they could scale in any direction they wanted to maximize ROI.

"You can handle millions of simultaneous clients without blinking. Who even has that many total customers right now?"

Apparently netflix has that many customers. Then again, if you split Netflix into regions and separate all the account logic from the streaming, the recommendations engine and the movie-content, you could perhaps run the account logic for one region in one server.

Besides just giving you a warm fuzzy feeling of running only one server.

What is the point of running one server?

Do you also object to them running in the cloud in VMs and not on physical hardware that they own? Sounds like an old man's "kids these days" rant..

Of course, everything runs on servers. Question is, who owns and maintains the hardware.

The link you shared just says they manage their OS layer, of course they do. Everyone running on AWS VMs is responsible for their own OS layer. Whether they want precise control over their OS doesn't change their preference for who owns and manages the hardware..

Well, in the case of Netflix - Netflix runs the servers - from hardware to the custom FreeBSD build, as they explained in multiple talks.

Even if you run one container, containers let you use off the shelf apps instead of only off the shelf libs.

I'm not interested in being "not lazy." I only care about user value and ability to provide user value (tech debt/cost/velocity).

For the price of an M5A.16xlarge to get those cores, I can get like 27 m4.larges and have way more fault tolerance.

I think your disdain is misplaced.

Processes were the original containers.

I suppose your apps ran on Windows. It isn't a problem (or at least it's a smaller problem) in the Windows ecosystem, especially the enterprise one, since usually you're using the same version, with installation steps handled by infra with AD. Even without AD, the installer and Windows version are usually the same, and Microsoft is usually great at backwards compatibility.

But it's not the case on Linux, or at least on non-enterprise / LDAP Linux. Installing mysql/redis/elastic/dotnet core/etc on each machine may be different, with a different installer available.

With docker I just need to instruct them to install docker, setup docker compose and everything is handled via containerization.

Back when I was doing big boys UNIX, the respective package managers took care of everything.

Which we later replicated in Red-Hat with RPM.

Bare bones OS install + bunch of OS packages => done.

And in what concerns containers I was working with HP-UX Vault in 1999.

Agreed to an extent. Even outside RHEL, you have yum for RPM-based distros, Aptitude for deb-based distros, pacman for Arch, etc.

You need to start with the same base operating system, and you need to make sure that you pull in the same versions of packages in case there are backwards-incompatible bugs, or version bumps such that the dynamically-loaded library is no longer detected (hence the common albeit dangerous workaround of "add a symlink").

(If you're using rpm directly, then you need to bundle the actual packages that you're installing, or point to specific packages that you're confident won't change. And at that point, what's the difference between your approach and Docker?)

The challenge that I believe Docker solves (or, at least, attempts to) is environment reproducibility: without it, you have dependency hell.

Nixos fixes this issue (and then some). I wish it had won instead of docker. Maybe it took too long to become stable.

Nix and Docker are complementary, not enemies.

We use both together, since Nix is the only sane way to package Docker images, in my opinion.

Forgive my ignorance, but can't that be solved by statically linking things?

Don't you just end up with a less hackjob version of a container when you do that?

Putting on my Asbestos Longjohns: The more I look into the k8s ecosystem, the more I'm convinced that it's one of those things that suits FAANG etc, but the regular Joe developer has caught on to the fad and wants to add it to his repertoire, even though it's overkill. After all, no one got fired for buying IBM and recommending Kubernetes. Most teams need simpler deployment strategies, which others have succinctly mentioned elsewhere on this page.

Kubernetes solves a very real and significant problem. But before you start using it, make sure you have the problem it solves.

If you're looking to have your small app eventually grow into a large one, read up on K8s and just make sure you're not blocking future-you from making your app work on it. E.g., work well in a container (which is useful for automated testing, deps management, etc), have a simple 'ping' endpoint to make sure the app is up, have a better config story than "recompile to change these variables", use a logging library, and tolerate any other services you're using to sometimes be down.

All useful things for a grown-up app to do anyways, all a bit of a PITA, and all better than trying to operate an app that doesn't do them.
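For what it's worth, here is a minimal sketch of a few items from that list (a 'ping' endpoint, config that doesn't need a recompile, a logging library) using only the Python standard library. The `/ping` path and the `APP_PORT` / `APP_LOG_LEVEL` variable names are assumptions made up for this example, not any standard.

```python
import json
import logging
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

log = logging.getLogger("app")


def load_config(env=os.environ):
    """Read settings from environment variables (names are hypothetical)."""
    return {
        "port": int(env.get("APP_PORT", "8080")),
        "log_level": env.get("APP_LOG_LEVEL", "INFO").upper(),
    }


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/ping":
            # Cheap liveness check: only proves the process can answer HTTP.
            self._reply(200, {"status": "ok"})
        else:
            self._reply(404, {"error": "not found"})

    def _reply(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        # Send the built-in access log through the logging library
        # instead of writing straight to stderr.
        log.info("%s - %s", self.address_string(), fmt % args)


def make_server(config):
    logging.basicConfig(level=config["log_level"])
    return HTTPServer(("0.0.0.0", config["port"]), Handler)


def main():
    make_server(load_config()).serve_forever()
```

Calling `main()` starts the server. Nothing here is Kubernetes-specific, which is sort of the point: the same process is just as happy under docker-compose or systemd.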

Exactly this. Kubernetes is a service orchestrator, not a hosting platform. Those getting caught up in it just want a hosting platform, but those getting value out of it want a service orchestrator.

If you have one monolithic backend service (and most web applications really should start out this way), Kubernetes offers almost no benefits over alternatives.

Putting on my tinfoil hat: it suits FAANG to have potential competitors burn their runways on baroque tech fads like Kubernetes or <insert-react-state-management-architecture-of-the-month-here>. Extra credit if they end up hosting their overly complex solution on your platform.

I was once told that development teams smaller than 20 developers have no business using k8s, due to the complexity it brings. If something as essential as the infra is so complex that it is not readily understood by everyone on the team, a few (more than one) team members need to become the experts on the matter. For small teams this is simply not worth it.

As part of a small team currently using Kubernetes, I suspect it’s more how you use it - the tools and ecosystem have matured immensely in the last couple of years since I’ve first started using it.

I don’t think it suits all teams and use cases, but for us it’s absolutely fantastic and without going down the rabbit-hole of cloud-provider specific tools and recreating half the issues it solves, I’m not super sure what we’d use.

Agreed. I'm a solo technical founder and have been using k8s for all my hosting for 3+ years. It's so easy (for me) that I'm fine paying a premium for the managed service (GCP) since it saves me lots of time, my most valuable resource.

I've already climbed most of the learning curve so YMMV, but as a team of one with dozens of WordPress, MySQL, and bespoke app servers, Kubernetes makes ops manageable so I can spend time on things that really matter.

Deploying new web apps is trivial, declarative manifests are easy to reason about, TLS certs are issued and renewed automatically (cert-manager), backups are cheap and reliable (daily GCP snapshots), making changes to the cluster via declarative terraform is a breeze, etc etc. No way I could manage all the ops without leaning so heavily on the core foundation provided by k8s.

Curious: do you have a single workload (like a WP site) that requires more than one physical computer in resources?

I think that's the first thing with k8s: it all starts with an app that requires several physical nodes.

As in - a single workload that can't fit on a single physical machine? No I don't, although I certainly could if I needed to. Most of my workloads are either low-traffic WP sites or bespoke web-based business tools for clients with very bursty traffic.

Most of the value I get from k8s is the hands-off nature of it - I get slack notifications (prometheus+alertmanager) if anything is happening I need to address (e.g. workload down, node down, API not responding, etc). Otherwise I can safely ignore my cluster and know everything's good. Spinning up a new WP site takes 10m with backups, TLS, monitoring, etc built in.

I’ve run a production kubernetes cluster that was hosting a DGraph cluster of 3 machines on its own, some ML workloads, and 4-5 products (each consisting of multiple services) and that was more than a single machine would have been able to handle.

Well _technically_, sure, we could have run a bunch of those products on a single machine, but there goes your durability, and the memory overhead on some of them was quite huge. Properly fitting them onto a single machine would have required more optimisation and technical skills than the devs I was working with had or were inclined to apply.

Definitely; if you as a company want to do Kubernetes or even cloud services (beyond the easy managed service like Beanstalk or GCE), you need to have a dedicated expert on it. Or more abstractly, one full time unit (can be distributed). If it's some guy's part time hobby it will not work.

I think this is what went wrong with k8s. I saw lots of interest from hobbyists, and people proposing k8s in small teams. It became common to hear "you do containers in production? use k8s!" That's just a big disappointment waiting to happen.

If something as essential as the infrastructure is so complex that you need a dedicated expert on it, it's bad infrastructure. To take an offhand analogy, you don't need a dedicated highway maintenance engineer in order to drive your car.

I think Kubernetes in principle gets a lot of things very right - but it has over time grown into this huge amorphous blob of complexity that makes it very easy to shoot yourself in the foot with, as many people said :)

That issue is not endemic to Kubernetes, but rather to any larger system past a certain age, you learn stuff as you go along and would do stuff differently if you did it again today - but you can't easily, because you cannot break compatibility for everybody using your stuff.

As a concrete example from the Kubernetes world, there is a talk by Tim Hockin [1] about how today, they would fundamentally design the api-server differently and base pretty much everything on CRDs.

[1] https://www.youtube.com/watch?v=ji0FWzFwNhA

The industry and the k8s project are still figuring out the right way to do things that don't require the organization, size, and technical choices Google made.

A friend of mine is a contributor to k8s itself, and of course, this all comes incredibly easy to them. Following their recommendation, I gave it a shot for my single-person, single-node (!) homelab, all without using MicroK8s, k3s or similar.

After a week of almost full-time work, I threw in the towel. Admittedly, I also had to learn concepts like reverse proxies alongside, too, so I was by no means well-equipped to begin with.

Yet, tossing together some docker-compose.yml files and "managing" them with a Python script has worked very well. Kubernetes really scarred me in that sense, and I am healed! Also, Caddy has helped me in actually enjoying configuring the webserver.
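For anyone wondering what "managing them with a Python script" can look like, here is a rough sketch and not the poster's actual script. It assumes a made-up layout of one directory per service, each holding a `docker-compose.yml`, and that the `docker compose` CLI is on the PATH.

```python
import subprocess
from pathlib import Path


def find_compose_files(root):
    """Return every docker-compose.yml exactly one level below root, sorted."""
    return sorted(Path(root).glob("*/docker-compose.yml"))


def compose_command(compose_file, action="up"):
    """Build the docker compose command line for one stack."""
    base = ["docker", "compose", "-f", str(compose_file)]
    if action == "up":
        return base + ["up", "-d"]  # start (or update) in the background
    if action == "down":
        return base + ["down"]
    raise ValueError(f"unsupported action: {action!r}")


def run_all(root, action="up", runner=subprocess.run):
    """Apply the action to every stack under root, stopping on the first failure."""
    for compose_file in find_compose_files(root):
        runner(compose_command(compose_file, action), check=True)
```

Something like `run_all("/srv/stacks")` (path hypothetical) would then bring everything up. The `runner` parameter only exists so the command construction can be checked without Docker installed.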

Wait so running a kubeadm init and then removing a master taint took you whole week to figure out? How long ago was that?

No, setting up ingress did. Couldn't get reverse-proxying to work.

Ah ok. For a single node homelab setups I just throw everything on hostNetwork, second choice is NodePort (if there are port conflicts). In general k8s ingress on baremetal requires deeper understanding of its network design

I would (probably) spin up an ingress-controller on ports 80 and 443, using hostNetwork, then use Ingresses from then on (and as it's a single-node cluster, just create a wildcard DNS A record, and possibly an anchor for other CNAMEs to point at (depending on DNS server) pointing at the IP said ingress-controller is running on).

Does mean that anything that upsets the ingress controller is an outage, but for experimentation, that's probably OK.

Yeah, of course running random docker compose files and containers from the internet and blissfully exposing your mongodb or whatnot service unsecured to the whole world seems like an easy, non-complicated alternative. Kubernetes has a few shitty defaults, like exposing a service account for all pods by default or allowing to mutate pod image tags, but most of the functionality it provides is a must have when you actually care about your SLA. Rolling updates with health check and configured back-off time? Separate ingress for OAM and live traffic with automatic HTTPS, etc? I could go on.

Are you having a bad day?

I am talking about a homelab, a single server at home, for home use. It's much safer now, with Docker compose, because I understand it and I wrote the core exposed part's configuration, the Caddyfile, myself, manually. I know exactly what's exposed, and it's exactly right the way it is!

The remaining risk comes from the services themselves having security holes, but k8s has that very same risk.

From my experience it's actually sold as a simpler alternative to other infra provisioning. So you end up with situations where a team deploys whatever with a helm chart, and it sets up the stuff like magic and they build on it. Then when something goes wrong they literally have no idea how to fix anything and it becomes a waking nightmare.

As a longtime and frequent user of k8s, I stay away from helm charts. I tried them out when they first got popular but I found they introduced more friction than they solved on the whole.

Not every addon/tool for the k8s ecosystem is worth it. I also don't bother with the ever-growing list of service meshes... not enough value to me for the overhead.

K8s is definitely the simpler alternative for me but there is still a lot of essential complexity in k8s due to the nature of the problems it's trying to solve. Mostly I like building on top of a solid foundation of standardized k8s API objects (pods, services, volumes, etc).

Tldr; Bring in only the add-ons and tools you really need so you don't add more complexity than necessary. Don't get swept up in the hype and marketing from other devs and cloud vendors.

This is such a great point and so frequently skimmed over in k8s discussions. We as tech folks tend to focus on the front page blog posts about 1000s of nodes and all of the orchestration that goes into complicated top 1% high-traffic/high-complexity use-case setups. In reality there are a lot of profitable businesses out there happily running a simple cluster set up with a single-digit number of deployments chugging along on it with zero down-time.

Really when looking at tools in the k8s ecosystem, it's better to approach it as you would importing a new library into your application. Most decent devs wouldn't blindly import a new lib so that they can copy/paste a single line of code they found online for a business critical function, and k8s tools should be no different. We must think about what value does a given tool bring, and is it worth the cost of learning/maintenance? Sometimes the answer is a resounding "yes", but too often the question isn't even asked.

I like Kubernetes. I don't overly like Helm charts because yes, they work, but you can install one without having to think about what it's putting in your cluster.

Also, I don't much like Go's templating syntax.

> The more I look into k8s ecosystem, the more I'm convinced that it's one of those things that suits FAANG etc, but the regular Joe developer has caught on the fad and wants to add it to his repertoire, even though it's an overkill.

You are absolutely spot on because this is how not to pass the behavioral interview for Engineering Manager.

but the regular Joe developer has caught on the fad and wants to add it to his repertoire, even though it's an overkill

There are very strong financial incentives for every individual developer and sysadmin to adopt Kubernetes, regardless of the impact it has on the organisation as a whole. In a sense this is engineering reaching the level of corporate maturity of the sales department who will optimise everything for their commission regardless of the organisations ability to deliver it at a profit, or even at all.

I'm sure there's a name for this phenomenon. Companies want stable software, and Regular Joe wants better pay, but companies won't pay unless Joe starts doing crazy complex stuff that complicates things further.

Regular Joe learns complex stuff at your expense. He then leaves for greener pastures and higher pay thanks to the boost to his resume. You are then left with complex stuff you need to maintain and so you have to hire another Regular Joe for a higher salary than your first Regular Joe.

The name is "poor management of resources". Regular Joe should move on before trying this masochism.

> There are very strong financial incentives for every individual developer and sysadmin to adopt Kubernetes, regardless of the impact it has on the organisation as a whole.

Then that organization is doing a terrible job of aligning incentives. I'm guessing their pay structure isn't terribly merit-based nor high enough that people aren't constantly thinking about other jobs.

If this is about FAANG (your comment wasn't, but others were), perhaps part of this is exposing larger problems in many smaller orgs. (note: I'm ex-FAANG and happily so)

Sorry to have to be the one to tell you: sometimes architectural decisions are driven by factors other than YAGNI. Right now you have throngs of young developers paying $50k+ a year for the privilege to learn how to use Docker and Kubernetes while in college, and when 90% of them inevitably get rejected from FAANG after graduation, you'll be able to hire them on the cheap and entice them with development stacks they're comfortable with.

In my opinion, k8s starts to shine when you have to manage hundreds of containers. When you have just dozens of them it's an overkill, but there's no way to smoothly slot in another solution between "docker-compose up -d" and spinning up a k8s cluster: you will (or think you will) hit a maintainability ceiling again and have to migrate to k8s.

There actually is, Hashicorp Nomad fits solidly in between those two options.

Nomad is way simpler to get a cluster up and running, has a great configuration syntax (I'll take HCL over YAML any day) and has first class Terraform/Consul/Vault integrations.

Onboarding devs is fairly straightforward, if they can write a docker-compose.yml, it's an easy transition to a nomad job specification.

It took me by myself ~4 months to get our current hashistack(Vault/Consul/Nomad) stood up using Terraform+ansible. Two members of my team have been working to replace the hashistack with a self hosted K8's deployment and they just went over the 1 year mark and we still do not have something capable of hosting the workload currently running on the Hashistack.

This got a little long-winded, but I feel like this "it's docker compose or K8's, take your pick" mentality has led to a bunch of needless time being spent by smaller teams/companies on solutions that just aren't right for them.

What I think K8S (EKS, GKE, DO hosted environments at least) provides is a nice way to integrate things like Gitlab. This gives you a really easy to use CI/CD pipeline for very little work and configuration. This allows you to deploy production from your main branch and spin up feature branches that can be tested by the people that requested the feature very easily. This does not require any additional effort once the system is set up.

Also you can get red/green deployments and rolling deployments with little to no effort, which can be very nice to have.

I think that an important distinction is between deploying on k8s and operating it. For a small team (not measured in the dozens), the latter is unaffordable but the working style of the former is still powerful.

This feature helps a lot with that problem by bringing GCP closer to where AWS has been with Fargate. k8s will still be more work than using AWS ECS but it might also be preferable if you dislike using the provider’s components and want the control of, for example, doing your own load balancing and storage management.

k8s and its ecosystem represent a data center in software. Data centers are fairly complex constructs. It is then to be expected that this complexity will shine through in k8s' API, UI, UX. k8s' main mission seems to be to provide a complete digital data center, not an easy to use one, and I would argue that that is exactly the right choice. Over time, as the core of the beast is figured out, there will be (as there have been) more and more opportunities taken to actually help users navigate that complexity and/or resolve it into more natural and less error-prone interfaces. But in the meantime it seems like it's mostly on the community to provide (usually temporary) solutions for the most pressing usability concerns.

That reminds me of Basecamp's article about "The majestic monolith" https://m.signalvnoise.com/the-majestic-monolith/

Kubernetes has to be the most complex software I've ever tried to learn. I eventually gave up and decided to stick with simple single machine docker-compose deployments. I figure by the time any of my personal projects actually need to scale beyond 1 machine, I'd probably have enough revenue that I can afford to hire someone else to worry about it.

It's the right attitude. The management fees alone for this autopilot thing are 0.10$ per hour. Or about 70$/month. It's a bargain considering all the hidden costs kubernetes imposes in terms of requiring people that know how to tame the complexity associated with it (i.e. very expensive devops people costing magnitudes more than that). Automating those people away is worth money.

I like Cloud Run for the same reason because I can use it without needing a lot of devops skills in my team or without sacrificing my own time (because I have those skills but have more valuable things to do). It allows me to focus on keeping my CI/CD pipeline (Cloud Run sets that up with a button click) busy with new functionality. And our hosting costs are close to 0$ because we stay below the freemium layer until we actually need to scale.

Edit. corrected the typo 700->70
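For anyone checking the arithmetic on the 0.10$ per hour figure, a quick sketch follows. The only assumption is using an average month of 365 * 24 / 12 = 730 hours.

```python
HOURLY_FEE_USD = 0.10
HOURS_PER_MONTH = 365 * 24 / 12  # 730 hours in an average month


def monthly_fee(hourly_fee, hours_per_month=HOURS_PER_MONTH):
    """Flat hourly fee multiplied out over one average month."""
    return hourly_fee * hours_per_month


print(f"{monthly_fee(HOURLY_FEE_USD):.2f} USD/month")
```

So "about 70$/month" is the right order of magnitude, a shade over it rather than under.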

Hey, it's William from Google here. You're right about the costs, I just wanted to point out that Autopilot does include one cluster in GKE's free tier. So you'll only pay the ~$73/month if you have more than 1 cluster.

There's (almost) no limit to what you can run in one cluster too, and Kubernetes namespaces can help to separate different environments to allow for sharing.

Cloud Run sounds like the perfect solution for your workloads though!

yep, cloud run is pretty good. unfortunately, it doesn't cover all cases (e.g. stateful stuff like websockets and chunking, and recurring jobs).

for these cases, i still have a gke cluster around.

Nitpick FYI: WebSocket is coming. https://cloud.google.com/blog/products/serverless/cloud-run-...

(Still agree that Cloud Run isn't for everyone.)

I think you mean $70, not $700.

Eh yeah, thanks for correcting me.

A long time ago I worked on HA setups for telecom (WiMAX/LTE) equipment. Kubernetes is complicated but has nothing on those systems. Just to give you some idea: https://www.metaswitch.com/hs-fs/hubfs/Blogs/3gpp-ts-23-228-... (doesn’t even cover everything)

The very term "HA" still gives me nightmares. It can be very hard to get HA to work correctly. Many years ago, I worked in a startup and one of our main offerings was an HA network device. It was unbelievably finicky to get it to work in the first place and even harder to update the software on an HA cluster.

It's HA by intimidation. The cluster is complex enough that nobody even wants to touch it, and since human errors are the most common type of error out there, it breaks much less often.

Yes, I believe this is why you see things like k3s in some IoT/edge deployment scenarios: other alternatives for HA, like OpenSAF, have been severely lacking for years.

I’d recommend giving Docker Swarm + Traefik a shot. It’s dead simple to set up manually and has very little “magic” in how things work under the hood. Plus much of your existing Docker Compose config will work out of the box. It vastly simplified the deployment process too.

I previously avoided Docker Swarm for ages since I assumed it involved the same level of complexity as k8s. I also initially figured that managed k8s would be a safer bet than managing my own Swarm cluster, but if you’ve used anybody’s managed k8s (or read https://k8s.af), you’ll realize that every cloud provider has their own closed source fork of k8s with plenty of nasty bugs that you can’t do anything about.

Docker Swarm is pretty much abandonware/on life support at best, so one should avoid using it for new stuff.

HashiCorp's Nomad is the best choice on the complexity-for-features scale IMHO, and that's why i'm writing an article on how great it is, how much easier some things are, and what's missing compared to Kubernetes.

And yet, Docker Compose is pretty popular for local development, so much so, that it's not uncommon to find a docker-compose.yml in the repositories for many open source projects. And Docker Swarm builds on that, by bridging the gap between Docker Compose and multi-server deployments, with tools like Swarmpit and Podman for easier management of it as well, much like Rancher does for Kubernetes. I agree that Docker Swarm isn't developed as actively as it should be, but disagree that it should be avoided and disagree that it should be allowed to die. In my opinion, it's a more minimalistic and more sane approach to container orchestration with minimal up front investment (just install Docker and edit your Compose files a bit with deploy constraints, you're ready to go).

HashiCorp's Nomad is good if you have a strong engineering department or need to run mixed workloads (e.g. both containers and native processes) because its abstractions are well suited for this, but HCL, their DSL for describing deployments, doesn't map nicely to Docker or Docker Compose files, knowledge bases, or tutorials. Nomad's integration with Consul is a major boon, but the need to run your own CA for safe communication between nodes, Nomad's read-only Web UI, and the oddity of HCL at times also make it a non-starter for me and some other people.

At the end of the day, these are just two data points. Sadly, the job market for Kubernetes also dwarfs everything else, and many companies will be burned by this and will learn nothing. Ideally, i think that the best route would be evaluating the orchestrators and other technologies that you want to use by doing pilot projects and looking at them in real-world circumstances, to determine their fit for your goals and needs (Web UIs will matter for some, but not for others, for example; as will onboarding and the need for long-term investment vs plug and play).

Edit: as for Kubernetes, personally i find the K3s distribution to be an almost reasonable alternative to Swarm/Nomad, if the situation calls for it: https://k3s.io/

Edit #2: it would actually be pretty awesome to read more about your experience in this article that you're writing!

Absolutely. We use Swarm in production at ecoeats and it's a dream for simple clustering with multiple services. Using Hetzner Cloud's volume plugin gives EBS-like functionality too.

> Nomad's read-only Web UI,

Nomad's UI has a good number of functions that can be controlled through it. Sure, there are some lower-level operations that are CLI-only (though this seems to be something they are actively working to improve), but most of those probably won't be needed by someone just trying to run a couple of containers on a single node.


> Swarmpit and Podman

I actually meant Swarmpit ( https://swarmpit.io/ ) and Portainer ( https://www.portainer.io/ ).

Podman is another container runtime that acts as an alternative to Docker (even if it is not feature complete), so i misspoke.

We use swarm for a small cluster in production as well. Extremely easy, and zero-downtime deployments are fantastic. I can explain it to someone else and quickly get them up to speed. Having said that, the fact that it seems to be on life support has made me look at other alternatives, even a simple docker-compose per node.

I quite like swarm for its compatibility with docker-compose, in particular to have the option of zero downtime deploys. If I wanted to manage a real cluster though I'd probably use nomad or GKE to avoid getting burned when the system is under load.

I use Swarm at home (only on a single node, because it turns out 3 are overkill for my needs) and it's been running great for 6 months so far. Before that I tried various incarnations of k8s and eventually they'd just destroy themselves and require a rebuild (the main issue was persistent storage).

My only complaint with Swarm is there isn't an easy way to expose containers directly on the network (like host networking). I have a few containers (wireguard and minidlna) which need this, so those are running through docker-compose. I've tried macvlan but wasn't able to get that working in Swarm mode.

Did you try this?

    - host

Maybe it has changed since I built it, but I wasn't able to get this working with Swarm services. I had to convert them to docker-compose to make it work. The docs suggest it should work with Swarm mode though, so maybe I need to try again.

Ideally I'd like to give each service its own IP on the network, which was possible with how I had k8s set up.

Mind sharing a bit more on your home setup?

Asking because I never saw the point in multiple container replicas for simple self-hosted stuff. One container each has served me well so far [0] (Nextcloud, Bitwarden, GitLab), and if they crash, they just get restarted. Multiple containers increase throughput, supporting more users, is that it? It just sounds nightmarish in regards to storage and parallel, conflicting writes.

[0]: One container per component (web, db, cache, ...).

I do have just one container of each thing, but I was originally planning to have multiple nodes. I have 3x ThinkCentre Tiny M73 (with the Pentium CPU) and thought one would be a bit underpowered for everything I wanted to run (it was for k8s), so I was planning to distribute services automatically across the swarm. One node is more than enough though, so I'd actually be fine with just docker-compose, but splitting everything up into separate 'services' is nice.

This is an honest question, but may I ask what you found to be hard about it?

For a long time I was scared of it because so many people say it's crazy complex. But actually it took no more than a dozen hours for me to learn it and get a working setup on AWS.

Maybe being full stack and having a strong knowledge of Linux and Docker helped.

Now I'm not pretending to be an expert with it, and there are certainly traps and mistakes that I didn't experience yet. But I don't understand what people find to be hard about it.

Maybe your starting instances are workhorses, but if you go from something like one t3.medium to two, it’s a ~$60/month increase... Not something I’d personally optimize for.

Also why Docker in the first place? I’m genuinely wondering - in the stacks I run (Express / Python) it doesn’t seem necessary at low scale. Elastic Beanstalk, Heroku, Digital Ocean etc all offer facilities for single-command deploys that work out of the box.

I like Docker for the 'keeping it clean' aspect. Install php, composer and stuff just because one of the projects you host requires it? Nope. Have to do excessive configuration on a system component just to run another application? Nope. Forgot how you set it up and now you are struggling on the new machine? Use the Dockerfile.

vagrant and ansible can help.

I use Docker without the network virtualization as a package manager.

Docker makes it easy to run the same version of code in different places and lets things run next to each other without version conflicts.

Also, I think you’re in a very small minority not to care about $720/yr increases in your hobbies.

GP was talking about projects that had revenue, and about "hiring someone" past a single instance.

I replied that beyond a single instance, you can probably get away with not hiring a K8s devops person and just spinning another instance. I'm not sure you've read this whole thing right.

And yes, I certainly wouldn't mind paying an additional $720 / yr for a project that had revenue; I almost certainly wouldn't want to spend money hiring a specialist, or spend time hyperoptimizing that myself. I make that in about a dozen hours of work, so considering how far one can go down the rabbit hole of optimizing server costs, and the associated opportunity cost, the economics are crystal clear.

I don't have any successful personal projects but I have significant experience working with clients, and they are sold on the reasoning pretty much every single time ("I can charge you $3,000 for developing this feature, or we can use a paid service for $720 a year").
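As a rough sketch of that trade-off, using only the figures quoted in this thread ($60/month extra, $3,000 one-off) and ignoring maintenance, discounting and price changes, which is a big simplification; the function name is made up for illustration:

```python
def breakeven_years(one_off_cost: float, yearly_fee: float) -> float:
    """Years of paying the yearly fee before it adds up to the one-off cost."""
    return one_off_cost / yearly_fee


monthly_increase = 60            # extra instance, USD per month (figure from this thread)
yearly_fee = monthly_increase * 12
years = breakeven_years(3000, yearly_fee)
print(f"${yearly_fee}/yr; break-even after {years:.1f} years")
```

On those numbers the recurring fee takes a bit over four years to catch up with the one-off development cost.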

I also don't see how Docker is going to save you that much money; if you need a certain amount of compute, you need a certain amount of compute. AWS Elastic Beanstalk, for instance, charges nothing extra for spinning up an additional instance compared to plain EC2; there is no overhead for the PaaS aspect of it, like there would be with Heroku. DigitalOcean App Platform is the same as EB, AFAIK.

GP talked about using Docker to do complex single host deployments until they needed horizontal scaling, which given max VM power, is after they can afford someone to manage it for them.

That makes me think they have multiple services of variable workload packed onto a single host, eg web server, async, and DB all on a single host via Docker.

That’s the antithesis of EB, which can only do horizontal scaling. Docker provides a way to replicate those multiple services in a deployment configuration when you want to set up a host image.

Not using Docker as an easy way to pack it all onto a single box as long as they can is just wasted expense.

I’m leaning towards using Ubuntu/systemd to start all my services, and using a single container per project with SQLite.

This way:

- I can easily move from dev to prod by using the private container registry

- I have apt-get on the server and just use the default Ubuntu

- No distributed/network state

I used to use CoreOS, but all these container OSes are here today and gone tomorrow, and normally have their own config standard. At least with Ubuntu they have LTS and a bunch of Google pages for fixing stuff.

systemd-nspawn is the most underrated piece of software on any modern Linux system. It just works and does exactly what it needs to do. Not to mention using a directory structure as a container image?! What is this sorcery?!

Yes, k8s is not built for solo outfits and small projects.

In places with 10+ developers and 10+ services on a single cluster it works surprisingly well from the user (developer) side.

Hmm, I guess I should shut down my business then since I'm a solo founder using k8s exclusively for my hosting for 3+ years...

Definitely a learning curve but honestly not bad if you like ops. Absolutely possible and beneficial for any size team. As always, it depends on what your goals are and how you want to use k8s.

In my opinion (and I'm biased as I work on GKE Autopilot), GKE is viable for a 1 developer project, especially with Autopilot mode (I have my own hobby projects deployed on GKE).

If you were self-hosting k8s, then I'd agree with you.

I think it is more like: 10+ servers OR 10+ developers. you can have 10+ services as a single developer ;)

docker swarm is a good alternative for you then, comes out of the box with docker and you only need to add a few lines to your docker compose files to make them swarm compatible
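To give an idea of what those "few lines" might look like, here is a minimal sketch that adds a `deploy` section to a Compose service, modelled as a plain Python dict. The service name, image and values are hypothetical; `deploy`, `replicas`, `placement.constraints` and `update_config` are keys from the Compose file format that Swarm reads when deploying a stack.

```python
import copy
import json


def add_swarm_deploy(service: dict, replicas: int = 2,
                     constraint: str = "node.role == worker") -> dict:
    """Return a copy of a Compose service definition with a `deploy` section added."""
    swarm_service = copy.deepcopy(service)
    swarm_service["deploy"] = {
        "replicas": replicas,
        "placement": {"constraints": [constraint]},
        # start-first starts the new task before stopping the old one
        "update_config": {"parallelism": 1, "order": "start-first"},
    }
    return swarm_service


# Hypothetical service, as it might appear in an existing docker-compose.yml
web = {"image": "example/web:1.0", "ports": ["80:8080"]}
stack = {"version": "3.8", "services": {"web": add_swarm_deploy(web)}}
print(json.dumps(stack, indent=2))
```

The rest of the service definition stays as it was; only the `deploy` block is new, and a stack file like this is normally handed to `docker stack deploy`.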

> simple ... docker-compose deployments

So, not simple.
