> I still don't totally get why the shift happened when it did. Five years ago all three camps were doing fine. Now the VM+systemd crowd has basically disappeared from job postings, serverless stayed niche, and K8s just won.
>
> My best guesses: managed K8s (EKS, GKE, AKS) got mature and the talent pool flipped: enough people learned it that hiring for anything else became the harder choice. And Helm made "just use someone else's chart" a real option. But I'm not certain. If you were there for the shift and have a better theory, I'd genuinely like to know.
Pretty much, almost. I've spent a bunch of time in my career working on "VM + systemd" setups, stuff running on a rack or on an EC2 instance in the cloud, and managed Kubernetes is a lot better for me than those cobbled-together messes. There are "easier" setups, but they usually end up costing me a lot more in time and $.
To answer simply, it became good + convenient. I could complain about plenty, and people here like to, but honestly you couldn't pay me to go back to the old way. The one legitimate gripe is that the upgrade schedule is exhausting: on AWS it's about every 6 months before you go into extended support. I also hate being at the mercy of arbitrary decisions like "ok we know a huge chunk of the web going back a decade has architected off our Ingress API, but recently we decided we don't really like that way anymore and we want you to use Gateway API instead, so, um, like ya we know it just killed off one of the most used open source ingress controllers (ingress-nginx) but yea trust us bro this is going to be so much better" kind of thing.
The upgrade cycle is a feature, not a bug. If (when) you need to do a big lift and shift, or there's some 0 day CVE, push buttan, get security update. You CAN drift behind but there's a real $$$ cost to that now. Every three months I toss opus at my k8s stack and verify it's compliant with k8s v1.xx.y and then push the upgrade button on my staging cluster, and then a week later I push the upgrade button on my prod cluster. What used to be two days of maintenance every quarter is now more like 2-5 minutes spread across the two upgrades.
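Part of that "verify it's compliant with k8s v1.xx.y" step is mechanical enough to script. A minimal Python sketch, assuming the manifests are already parsed into dicts: `REMOVED_IN` is a small hand-picked table (not a complete list of removals), and `find_removed_apis` is a hypothetical helper name, not part of any real tool.

```python
# Illustrative, incomplete table: (apiVersion, kind) -> first Kubernetes
# minor release in which that API version is no longer served.
REMOVED_IN = {
    ("extensions/v1beta1", "Ingress"): (1, 22),
    ("networking.k8s.io/v1beta1", "Ingress"): (1, 22),
    ("batch/v1beta1", "CronJob"): (1, 25),
    ("policy/v1beta1", "PodDisruptionBudget"): (1, 25),
    ("autoscaling/v2beta2", "HorizontalPodAutoscaler"): (1, 26),
}


def parse_minor(version: str) -> tuple[int, int]:
    """Turn 'v1.25.3' or '1.25' into (1, 25)."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def find_removed_apis(manifests: list[dict], target_version: str) -> list[str]:
    """List manifests whose apiVersion is gone in the target release."""
    target = parse_minor(target_version)
    problems = []
    for manifest in manifests:
        api_version = manifest.get("apiVersion")
        kind = manifest.get("kind")
        removed = REMOVED_IN.get((api_version, kind))
        if removed is not None and target >= removed:
            name = manifest.get("metadata", {}).get("name", "<unnamed>")
            problems.append(
                f"{kind}/{name}: {api_version} removed in v{removed[0]}.{removed[1]}"
            )
    return problems
```

Run it against the rendered manifests for the staging cluster before pushing the button; an empty list only means nothing in this small table matched, not that the upgrade is safe.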
I'll admit I'm dreading switching over to the gateway api, but by the time I get forced off ingresses it should be a stable/mature ecosystem. That's still a ways out though.
I don't know anyone still dealing with VMs anymore, except our IT guy who manages a couple of pet servers for random executives from the before times. In the last year k8s has started absorbing executive pet processes and the number of VMs our IT guy manages has dropped by about half.
While I'm here spouting stuff: yeah, hiring for k8s is real easy. If our SRE gets hit by a bus, he can be replaced in a week, and we can probably struggle through using opus until that happens. K8s being the lingua franca of GitOps IaC makes it real easy for the new guy to parachute in and start working. Every VM thing is going to be totally bespoke and have the personality of the guy who designed it, which is rarely a good thing.
The gateway api people have clearly won and I can't truly complain because I'm not a maintainer, but I have contributed in the past to a nontrivial part of the tooling built off this ecosystem. The issues with snippets/annotations are a core deficiency in k8s design, and eliminating this api creates more problems than it supposedly solves. I have been working on solutions of my own to prepare for this inevitability, but it's rough. Ingress annotations, like it or not, run the modern infra tech stack. If they become persona non grata at any point, a lot of people are going to have a lot of urgent consulting problems in the near to mid-term future.
To this date I have not seen a viable drop-in replacement, built on the gateway api, for how I've seen big orgs use the ingress controller stack, and as I currently understand it InGate is basically DOA.
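To make the annotation problem concrete, here is a rough Python sketch of the easy half of a migration. It assumes a `networking.k8s.io/v1` Ingress already parsed into a dict, with numeric service ports and `Prefix`/`Exact` path types only. `ingress_to_httproutes` and the Gateway name passed in are hypothetical; this is not how any real converter works. The structural fields map over cleanly; the annotations are just handed back as "someone has to deal with these by hand".

```python
# Ingress pathType -> HTTPRoute path match type. ImplementationSpecific has
# no direct equivalent, so this sketch falls back to PathPrefix for it.
PATH_TYPES = {"Prefix": "PathPrefix", "Exact": "Exact"}


def ingress_to_httproutes(ingress: dict, gateway_name: str) -> tuple[list[dict], list[str]]:
    """Return (HTTPRoute dicts, annotation keys that were NOT translated)."""
    meta = ingress["metadata"]
    routes = []
    # One HTTPRoute per Ingress rule, so paths stay tied to their own host.
    for index, rule in enumerate(ingress["spec"].get("rules", [])):
        route_rules = []
        for path in rule.get("http", {}).get("paths", []):
            service = path["backend"]["service"]
            route_rules.append({
                "matches": [{
                    "path": {
                        "type": PATH_TYPES.get(path.get("pathType"), "PathPrefix"),
                        "value": path.get("path", "/"),
                    }
                }],
                # Assumes a numeric port; named ports would need a lookup.
                "backendRefs": [{"name": service["name"], "port": service["port"]["number"]}],
            })
        spec = {"parentRefs": [{"name": gateway_name}], "rules": route_rules}
        if "host" in rule:
            spec["hostnames"] = [rule["host"]]
        routes.append({
            "apiVersion": "gateway.networking.k8s.io/v1",
            "kind": "HTTPRoute",
            "metadata": {
                "name": f"{meta['name']}-{index}",
                "namespace": meta.get("namespace", "default"),
            },
            "spec": spec,
        })
    # Controller-specific behaviour lives here and has no generic mapping.
    untranslated = sorted(meta.get("annotations", {}))
    return routes, untranslated
```

On a real big-org Ingress the second return value is the long one, which is the whole complaint.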
I somewhat agree with you... but it's not like you don't need some actual experts who know what they're doing, especially when stuff goes bonkers and it will go bonkers.
Even on AWS EKS, you will run into bullshit with their network overlay. Egress policies are a mess (at least half a year ago, you were not able to say something like "allow pod A to egress traffic to service (!) B", despite a service resolving down to an IP address in the end).
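For context, the standard `networking.k8s.io/v1` NetworkPolicy only lets an egress rule name peers by `podSelector`, `namespaceSelector`, or `ipBlock`, so "egress to service B" has to be spelled as "egress to the pods behind B". A minimal Python sketch that builds such a policy as a dict; `egress_policy` and the labels are made up for illustration, and whether a given CNI actually enforces it is a separate question.

```python
import json


def egress_policy(name: str, namespace: str, from_labels: dict,
                  to_labels: dict, port: int) -> dict:
    """Allow pods matching from_labels to reach pods matching to_labels."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # The pods this policy applies to ("pod A").
            "podSelector": {"matchLabels": from_labels},
            "policyTypes": ["Egress"],
            "egress": [{
                # No Service peer exists, so select B's backing pods by label.
                "to": [{"podSelector": {"matchLabels": to_labels}}],
                # This is the pod's port (the Service's targetPort), not the Service port.
                "ports": [{"protocol": "TCP", "port": port}],
            }],
        },
    }


if __name__ == "__main__":
    policy = egress_policy("a-to-b", "default", {"app": "a"}, {"app": "b"}, 8080)
    print(json.dumps(policy, indent=2))  # JSON is accepted by kubectl apply -f
```

Note that once pod A has any egress policy, DNS lookups are blocked too unless a second rule allows them, which is one more way this gets messy.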
And that's before going into the unholy mess that is getting connectivity to and from the external world to your cluster. Cloudfront, ACM certificates, ALB, ALB-EKS integration, Route53, Route53-EKS integration, EFS, EFS-EKS integration, EBS, EBS-EKS integration, RDS, RDS-EKS integration, IAM-EKS integration, SSM, SSM-EKS integration, autoscaling... and if you want more pain and don't already wince, try setting that up across regions or, as I had to do once, across account boundaries.
Kubernetes is powerful. But do not make the mistake of assuming it's easy to get started with, at least on the admin side. Even if you've got prior AWS experience, getting it all integrated into EKS so you don't have to deal with both Terraform and Helm/k8s for a full deployment of a piece of software will take you an awful lot of time.
For users though? It's a breeze, I will admit as much. Everything down to the firewall rules can be encoded in k8s spec files.
If you struggle with any of that (a lot of what you listed is not strictly necessary to running managed kubernetes, specifically EKS) you are also going to struggle with a lot of other things on AWS, or wrangling a VM setup at any kind of scale.
> a lot of what you listed is not strictly necessary to running managed kubernetes, specifically EKS
Oh it's not necessary per se, but if you want to host a web service with any sort of state and not have to do stuff in parallel either by hand or by terraform, I'd consider the integrations pretty vital.
It's easy enough (well, it's still addons whose versions you have to keep updated each on their own) once it is set up, but getting to the point where you have something reproducibly running for the first time is annoying as hell.
I think the best supported and most mature pattern on most big cloud providers is precisely
> do stuff in parallel either by hand or by terraform
…specifically by terraform. Making k8s own the provisioning and management of external infrastructure on principle (as opposed to when that makes sense, e.g. load balancers/gateway/CSI providers) is not a good approach. Sure, it feels unified, but the cost of unification is incredibly not worth it.
> Sure, it feels unified, but the cost of unification is incredibly not worth it.
That's the cost I was talking about. It is indeed annoying and time-consuming to get it set up once, but once it works... it is amazing for developers to be able to spin up an environment completely identical to prod for a hotfix branch to test stuff out, with no involvement from ops or anyone else.
And also, it's much easier IMHO to get a mental image of how a system is constructed when it's one architecture - no matter if it's k8s/helm or Terraform. But as soon as you have both in the mix, you get friction issues, you have to pass stuff from Terraform to Helm or vice versa... and may God have mercy upon you if you also have Ansible in the mess, I had to do that once for a piece of proprietary dependency that would not have been supported by the vendor in any place other than a SLES bare metal server.
Yea, I used to believe this too, and still sort of agree - I got so tired of the argument over maintaining k8s infra in terraform that I gave up and wrote what is essentially a terraform wrapper module around helm. The charts break terraform quite a bit sometimes, so you have to keep it simple, and god help you if you want to use CRDs; the HashiCorp providers seem to have the notion that no one actually needs those.
I had dismal hopes of it working for very long but it's remained mostly untouched going on 3 years now which really surprised me, and it's been easy to work with. I think if you run EKS resources like node groups, autoscalers, LB type of resources in the same state file as helm deployments you're going to have a very bad time though.
What I've seen more than anything else is that Kubernetes built an ecosystem (of contributors and users, but also of companies invested in its success) that none of its competitors could or would. There was apparently a faction within Google that believed open-sourcing Kubernetes was a mistake because Google would have made more money keeping it in-house, but in terms of the success of the project I think it was entirely the right call, as was creating a foundation to maintain and promote it. Look at the history of its competition:
* DC/OS was always its own thing and as time went on, eventually Mesosphere was basically the sole maintainer of the underlying Mesos. Very little external contribution.
* OpenShift was different from Mesos and basically maintained only by Red Hat from the Makara acquisition (sometime in 2010 I think) to about mid-2015 (i.e. the point where they ripped out most of the OpenShift-native process isolation and orchestration and replaced it with Docker and Kubernetes). Pre-Kubernetes OpenShift frankly struggled to catch on and again, basically everybody who cared about developing it worked for one company.
* CoreOS was developing fleet in the open but dropped it outright when Kubernetes was released. The phrase I heard there was "We started to say something and Google finished our sentence." They pivoted to Kubernetes for orchestration so hard it was kind of awkward talking to customers who used fleet after that. In theory somebody could have picked it up like Kinvolk picked up rkt for a while (and later CoreOS Linux as Flatcar), but as far as I know nobody ever made a serious effort to do so.
* Docker released Docker Swarm shortly after Kubernetes was released -- yet another one-company product. (I still don't really understand why they released Swarm -- for simple workloads, Docker Engine and Docker Compose were enough, and for more complex ones Docker Engine was, at that time, still the sole underlying runtime in Kubernetes. There were already two distinct orchestrators on the market, one from a much larger company with a lot more operational experience running containerized workloads than Docker had. What was their thought process?)
* HashiCorp released Nomad well after Kubernetes but not only was it another sole-corporate-maintainer orchestrator, it deliberately omitted a lot of the basics Kubernetes included like service discovery in an effort to stay simple -- so in very few cases was Nomad alone actually enough to orchestrate workloads (nor was it intended to be, as the Nomad engineers in the ~1.0 days would have been first to tell you). Past a point this made Nomad more work to get running and keep running than Kubernetes was.
The flip side is, I don't think a purely community-developed orchestrator would have won, even with a foundation backing it. It's not the corporate backing that's the issue, it's the lack of diversity in that corporate backing.