It's taken us a little over a year. Partly because K8s has a steep learning curve, but also because safely transitioning services without disrupting product teams adds a lot of overhead.
The investment is already yielding great returns. Developers are happy. Actual quote: "Kubernetes is the biggest quality-of-life improvement I've experienced in my career."
The same is true for me as a DevOps engineer. I never have to write a script to orchestrate a rolling deployment ever again. I'll never have to touch Puppet or wrangle OS upgrades. I no longer need to use Terraform to scale out a service.
Now that we've completed this transition, one of my goals for the next year is to get the team to a place where we spend 75% of our time on higher-level projects -- working on developer tools and Kubernetes extensions and setting site reliability standards, instead of the firefighting and operational work that has traditionally taken up so much of our time. Couldn't be happier about this change.
... That is until management decides we're overhead and fires us to free up budget for feature developers. ;)
I'm on the DevOps side as well, going through the same transition. K8s also allows insane customization, and I have some colleagues who are unintentionally delaying our rollout so they can play around with developing more tooling for deployments, which is really frustrating. The k8s scene seems to be filled with constant scope creep and refactoring to get it just perfect before use. Either way, I agree the benefits far outweigh this annoyance. I'm excited to spend my time developing tooling instead.
However, I don't think we're entirely free from managing servers the old way with Chef / Puppet / Ansible. Unless you're purely hosted, there's still the rule of thumb that you shouldn't run services that hold state in k8s. But with persistent volumes I do see that changing, though I'm not sure everyone agrees that's a good idea.
My impression is that the primary purpose of Kubernetes is to give SRE teams political air cover to rewrite a lot of their existing processes. Whether Kubernetes is actually required for that, or even net superior seems questionable. This unsexy work becomes justifiable because it's coupled to a mainstream accepted tech modernization.
You see this same phenomenon with database migrations, where what the team really needed to do was rewrite an app to use the existing database properly. But no one is going to approve that work. So people convince themselves that the existing tech sucks and use that to rationalize the rewrite. The result isn't always net superior, because sure, you did the rewrite, but you're also eating the operational cost of integrating a new technology into the org.
I’ve also seen a switch from rdbms to Hadoop because a company had “millions” of rows. Luckily on this one I only had to rewrite a handful of queries.
Wat. That's gross. It probably costs them more per query now than the rdbms did.
That certainly is a thing that happens, but you could use that to dismiss any technology at all. In the case of Kubernetes, it makes operations a lot easier to the (important) effect that the development teams can do a lot of their own operations work. This is important since they're the ones who are empowered to solve operations problems and it also eliminates the blame game between ops and dev. Further, it eliminates a lot of coordination with a separate ops team--the dev teams aren't competing to get time from an ops team; they can solve their own problems, especially the most common ones. This also has the nice property of freeing the SREs to work on high-level automation, including integrating tools from the ecosystem (e.g., cert-manager, external-dns, etc).
Kubernetes certainly isn't the final stage in the evolution, but it's a welcome improvement.
No, you can't; you need three (-ish) factors:
1. The technology is sufficiently incompatible with what you're currently using that you need a rewrite to use it (this generally doesn't happen with gcc -> llvm, for example).
2. The technology is sufficiently (faux-)popular that it's possible to convince a pointy-haired boss that you need to switch to it (eg, this won't work with COBOL anymore, though unfortunately its successor Java is still going).
3. The technology sucks.
And really, if you want to dismiss a technology, point 3 ought to be enough all on its own (particularly since that's presumably the reason you want to dismiss that technology).
Anyway, we're trying to assess Kubernetes' value proposition (i.e., to answer "does it suck?"). If your system for answering that question depends on already knowing the answer, it's not a very useful system.
Well, I'm not, since I already know that, but if you don't know that yet, then your position makes more sense. (That is, using "dismiss" in the sense of finding out that it sucks, rather than (as I read it) in the sense of justifying a refusal to use technology that you already know sucks.)
Unfortunately, due to market-for-lemons dynamics, it's usually not possible to convey knowledge that a particular technology sucks until things have already gone horribly wrong. See eg COBOL or (the Java-style corruption of) Object Oriented Programming.
and in short order you will reap the savings of being able to hire people who already know your devops/infra tech stack, and can hit the ground running. not to mention being able to benefit from the constant improvements that come from outside your org.
We are running a handful of stateful services in K8s (things like MongoDB for which GCP doesn't have a compelling and affordable managed offering). It's definitely more complex than transitioning a stateless service, but so far our experiences with StatefulSets and PersistentVolumes have been good. And this allows us to sunset Puppet/OS management completely. I should note that we _are_ being extremely careful about backups. We also run each stateful service in a dedicated node pool for isolation. Who knows, maybe a year from now we'll be shaking our heads and saying "that was a TERRIBLE idea" but for now, so far so good.
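To make that concrete, here is a minimal sketch of the kind of StatefulSet described above (names, image, and sizes are illustrative, not our actual manifests). Each replica gets a stable identity and its own PersistentVolumeClaim, and a nodeSelector pins it to the dedicated node pool:

    # Illustrative sketch only -- not the actual production manifest.
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: mongodb
    spec:
      serviceName: mongodb          # headless Service giving each pod stable DNS
      replicas: 3
      selector:
        matchLabels:
          app: mongodb
      template:
        metadata:
          labels:
            app: mongodb
        spec:
          nodeSelector:
            workload: stateful      # hypothetical label on the dedicated node pool
          containers:
            - name: mongodb
              image: mongo:4.4
              volumeMounts:
                - name: data
                  mountPath: /data/db
      volumeClaimTemplates:         # one PersistentVolumeClaim per replica
        - metadata:
            name: data
          spec:
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 100Gi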
We're running on GKE, so lots of things that would be hard in on-prem environments (ingress, networking, storage) are easy.
Agreed. The on-prem story is still really messy, but I think there's a lot of third-party work to build on-prem distributions that are cut and dry. Unfortunately, there are lots of them right now and it's not clear what the advantages and pitfalls are of each. Things will settle and this problem will be solved with time, but for now it's quite a pain point.
They hire other people with supposedly the same skill stack and then have them rebuild it from scratch.
Can you elaborate? What exactly about Kubernetes improved the developers' lives?
1. Zero-downtime rolling deployments. We didn't have these before. Yes, you can implement them without K8s (I have, at other companies), but to get the full set of features K8s provides -- deploying N services in parallel, taking no more than X% of your capacity offline at a time, short-circuiting in the event the app is dead on arrival, connection draining with timeouts -- you end up with a VERY complicated multi-threaded codebase.
2. Seamless horizontal scale-out.
Want to scale up your app in a test environment from 3 replicas to 6 to do some performance testing? This used to be a DevOps ticket that would take a few days -- DevOps engineer tries running Terraform, but oh no, a CentOS package update seems to have broken our Puppet manifests and we have to fix that. Now the developer makes a PR to the GitOps repo where they adjust a single YAML setting (a sketch follows at the end of this comment).
ArgoCD is possibly my favorite piece of software of all time. It provides incredible visualizations for what's happening with your Kubernetes infrastructure. It really increases your confidence and trust in the system to be able to watch a deployment rollout or scaling operation happen in real time. ArgoCD makes it spectacularly obvious when something has gone wrong -- you still sometimes need to go spelunking through the GCP console or use kubectl to inspect resources, but to a much lesser degree. I cannot emphasize enough how magical it is.
These are all things that our initial implementation has delivered. We are also planning to leverage K8s for things like on-demand pull request preview environments, hosted developer environments, and canary deploys, which are MUCH harder to implement in a world without K8s (trust me, I've done it).
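For illustration, the kind of manifest that lives in the GitOps repo might look roughly like this (names, image, and numbers are made up, not our actual setup). The replica bump from item 2 is a one-line change, and the rollout behavior from item 1 is just declarative fields instead of a custom orchestration script:

    # Hypothetical Deployment fragment in a GitOps repo; all names are illustrative.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-service
    spec:
      replicas: 6                   # was 3; bumped for the performance test
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxUnavailable: 25%       # never take more than 25% of capacity offline
          maxSurge: 1               # allow one extra pod during the rollout
      selector:
        matchLabels:
          app: my-service
      template:
        metadata:
          labels:
            app: my-service
        spec:
          containers:
            - name: my-service
              image: gcr.io/example/my-service:1.2.3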
You can roll your own anything with enough time and manpower. Whether it makes sense to do so depends on your circumstances.
For the last several years, due to static load we've been fine with 2 instances per service. Now that we want autoscale (even though all the real load is really just the DB), it seems we could get ourselves onto AWS autoscale in ~1 month, though it would require some coding.
Spending 3 DevOps engineers for 1 year on something is a red flag to me. If your changes save devs 1 hour per deploy, it'll pay off 6,000 deploys from now.
I think it's great you guys have 2.5 dedicated people operating 6 services across 12 prod instances, with that kind of ratio our team would be 15 people!
When you make a change manually, there's a chance that you forget something, have a typo, etc. Those problems disappear when those same things are automated.
I've used Puppet, Chef, and Ansible in production over the last decade. For me, Ansible replaced Puppet and Chef five or six years ago. For the last two years the only thing I have used Ansible for is to manage desktops and Android boxes AT HOME. Configuration Management systems like Ansible are fun and cool, but have been mostly obsoleted by the combo of Terraform and K8s. If you're on AWS, Kops can replace a lot of what Terraform does.
Kubernetes makes running services easier and more reliable. I've spent more time learning Istio than K8s itself. The learning curve of K8s is minimal compared to Drupal.
Although, to be fair, the article is about GKE.
But literally every cloud provider out there has a managed solution so at this point you really only need to do it for DC work or if you like to do it.
Amazon - EKS
Google - GKE
Azure - AKS
anything beyond those 3 is a rounding error but...
Linode - LKE
DigitalOcean - DigitalOcean Kubernetes
IBM Cloud Kubernetes Service
Rackspace - KAAS
Creating a big ol' cluster of app servers was never really the hard ops problem; the hard part is what k8s actually does really well.
The benefit of k8s is the orchestration of those clusters. Spinning up 6 new http servers and getting them added to the load balancer automatically. Generating 4 new memcached nodes and getting them registered in DNS so clients pick them up and add them to the hashring.
The benefit of k8s is the scaling and elastic capabilities. It can trigger vertical scaling by spinning up larger pods or horizontal scaling by adding/removing pods.
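As a rough sketch of the horizontal case, assuming a Deployment named "web" and CPU-based scaling (names and thresholds here are illustrative), the autoscaling piece is a single declarative object:

    # Minimal HorizontalPodAutoscaler sketch (autoscaling/v2); values are illustrative.
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: web
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web
      minReplicas: 2
      maxReplicas: 10
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 70   # add/remove pods to hold ~70% CPU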
Anyone thinking that people are using k8s because it can create app servers doesn't understand why anyone is using k8s. If all we needed to do is create a cluster of app servers, we wouldn't be using k8s.
That being said, a cluster of app servers still needs orchestration and config management and we had a ton of crazy solutions for that prior to k8s.
It reminds me of a docker-compose file, except I can publish it too.
Just wait until the developers discover Google App Engine, Heroku, or DigitalOcean App Platform.
I think Heroku and DigitalOcean App Platform are still going to be popular for small setups (as will things like Amplify) but when you outgrow those (or realize you are paying too much for them) then Kubernetes is a reasonable option.
Need to run a bespoke database, yeah it can do that. Need to migrate an old service running in a VM that needs a disk, it can do that too.
I do agree that kubernetes is a pleasant experience from an application developer perspective with an existing cluster, but in my experience it was not without excessive pain and long hours by those administering the cluster. A year doesn't surprise me in your case, which brings to mind this question.
Are you still using Terraform to specify your deployments? From experience, you need something there to manage all your yamls: deployments, configmaps, secrets. Especially if you have multiple environments.
Then you can focus on containers which can be run, tested and built wherever without the fear of broken updates or one thing stepping on another. We found back in the days of ansible and chef that we had very low confidence in upgrading hosts live. So we would then do immutable hosts and blue green deploy them to production. But why think in the scope of hosts and VMs when really you have some application that needs to run somewhere.
K8s IMO isn't the end-all; I think eventually we will get to something that doesn't need containers at all and you just run processes. But it is a good step for now. Also, once you have your stuff containerized, it makes other non-k8s things like AWS Lambda easy.
Edit: Also, yes, you can use those to set up generic k8s nodes, but when we ran bare metal we used kubeadm to make immutable CoreOS nodes. I don't think that's used anymore (haven't checked), but really the best way to set up k8s is to deploy really thin hosts that have nothing but Docker and k8s. VMware and others have solutions for this too, where you don't have to mess with building hosts.
So if you had a fleet of ten containers running the same application in a load balanced config, I'm guessing you'd need to upgrade all of them at once (with downtime) rather than upgrading them one by one (because then the database would be inconsistent)? I'm assuming that since the containers are immutable the data is stored elsewhere.
That depends entirely on your application and the upgrade itself. Assuming we are discussing 10 different containers (e.g. 10 micro-services), k8s will normally update them in parallel but not atomically; it would be up to application or deployment-time logic to ensure they are updated as a single 'transaction'. If they are 10 copies of the same container, then k8s itself has tools for rolling upgrades where you can control the rollout.
Also, depending on application logic, the upgrade could be done in such a way that there is no need to synchronize the services, they could work with the DB as is.
Who is preparing the Dockerfiles? Developers and system administrators / security people do not generally prioritize the same things. We do not use k8s for now (therefore I know very little about it), so this might not be relevant, but how do you prevent shipping insecure containers?
But since you are not running multiple things or users in one space in a container, something such as an out of date vulnerable library can't be leveraged to gain root access to an entire host running other sensitive things too.
In Kubernetes and docker in general one container should not be able to compromise another, or k8s. But there are other issues if an attacker can access a running container such as now having network access to other services like databases. But again these are all things that can be locked down and should be even if provisioning hosts running things.
Ansible or Puppet still excel at that kind of work.
For me the scariest part would be that it's not tech I would use (i.e. update the descriptors) regularly, so if something goes wrong, how quickly would I be able to identify the problem? I have no answer to that because I'm off the project.
It's a real risk, at least perceptually. Mitigate it by documenting the number of developers' hours you save with automation and better infrastructure. It's pretty hard to argue against a trend of increased developer productivity. Include your own hours; you're targeting saving 30 hours a week of your own time and that frees you up to improve other things even faster.
If you're determined to be skeptical of the value-add from Kubernetes, my random post on HN ain't gonna convince you :) The developer experience improvement is obvious to people actually interacting with K8s.
I do want to clarify that this isn't the only thing we accomplished in the past year. Just the thing we shipped that was our biggest priority and had (by far) the biggest impact.
Snake eating its own tail.
But not only do I not understand K8s, I'm also an idiot.
It doesn't end there... after complaining for a really long time, two things happen. First, so much time has passed that you get these domain experts (DevOps people) whose entire job is to mess with Kubernetes. Second, the complainers have been complaining so long that people get tired of it.
You now get people whose entire job revolves around Kubernetes who are so tired of listening to the complaints that they start complaining about the complainers.
Which brings me back to Kubernetes. Kubernetes is a bad tool with no alternative, precisely because it requires a dedicated three-person team and a year to get things up and running.
A good tool would be one that lets me get it up and running in a week just by reading some docs. Even better, an hour. Could such a tool exist and replace Kubernetes? Yes. Does such a tool exist? No.
I'm sorry to say, but the ideal we are shooting for here is a tool that ultimately makes DevOps a general thing all developers can deal with, rather than an entire specialist team. Again, no such tool exists yet, but it certainly can exist, especially when the inventor of the tool has become a complainer.
If the inventor of the tool becomes a complainer, that validates the complainers. And now the complainers of the complainers have nothing left to say.
Your 5-person startup does not need a DevOps engineer. Your 50-person org should start thinking about it. Your 500-person company probably needs multiple DevOps engineers -- having every dev team independently figure out how to handle things like deployments, reliability, and security is chaotic and wasteful. There are a lot of details that only start to matter at scale, and both Kubernetes and dedicated infrastructure teams are for this use case.
Instead of complaining, why don't you build this tool? That's the problem I have with complainers.
Instead of complaining why don't you build me the tool to stop me from complaining? It's the same reason why I'm not building the tool.
That's the problem I have with complainers complaining about other complainers. Why don't you guys do something about my complaining rather than complaining about it?
- I can store anything in a secret? Let's have thousands of cat images. Etcd then stops working because we have over 2GB of funny cats in the key store.
- I can run a root Pod? Let's mount the docker socket and start building images with it. Oh, and by the way, I never clean those up, so my Node simply fills up. Also, I add some additional docker networks that break Pod-to-Pod networking.
- Istio is nice -- why don't we add automatic injection for Pods in all namespaces? Including kube-system? And then it bricks kube-proxy and the cluster stops working.
- I can use validating webhooks for better security? Let's watch all resources. To keep it more secure, let's set the failure policy of the webhook to Fail (sketched below), so we never admit any modification without the apiserver making a call to our webhook. What's that? My single-replica webhook Pod got evicted (we didn't add any resource requests and limits) and now it cannot even be recreated or scheduled, because kube-controller-manager and kube-scheduler cannot update their leases, lost leadership, and are now idling, effectively bricking the entire cluster.
Google would reduce the pain points with this change; however, they would still face countless other issues with Kubernetes.
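For anyone who hasn't hit that last one, a rough sketch of what such a footgun configuration looks like (all names are made up, and this is deliberately the broken pattern, not something to copy). It matches every resource and uses failurePolicy: Fail, so the moment the single webhook pod dies, the control plane can no longer admit anything -- including the webhook's own replacement:

    # Deliberately broken example: a catch-all validating webhook with failurePolicy: Fail.
    apiVersion: admissionregistration.k8s.io/v1
    kind: ValidatingWebhookConfiguration
    metadata:
      name: validate-everything
    webhooks:
      - name: validate.example.com
        failurePolicy: Fail            # reject anything the webhook can't answer
        rules:
          - apiGroups: ["*"]
            apiVersions: ["*"]
            operations: ["*"]
            resources: ["*"]           # watches ALL resources, including leases
        clientConfig:
          service:
            namespace: default
            name: webhook              # backed by a single-replica Deployment
            path: /validate
        admissionReviewVersions: ["v1"]
        sideEffects: None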
However playing the devil's advocate here: If you actually took the steps of learning the basic abstractions, then for me it's really hard to see what you could still get rid of.
If you actually go all-in and fit your application to the principles of Kubernetes-native applications (instead of the other way around), then it's nothing short of amazing.
We're running 120 microservices in GKE and the difference to our custom-built setup before is night and day. I let my Infra team go surfing together for two weeks because without changes it flies mostly on autopilot.
Let's not kid ourselves, distributed computing is _hard_ and Kubernetes is a testament to that. I'm not saying it can't be made more accessible by further standardization, but there are fundamental limits to how easy it can be made.
Which, by the way, leads to my only pet peeve with it: I feel most of the complexity of K8s comes from the fact that it got hyped as an enterprise product, and then lots of features were built to support shoving your non-cloud-native workload into Kubernetes even if it was never designed for it.
If you don't do or need all of that, the amount of interface, complexity and footguns shrinks significantly. Maybe it's time to better pull them apart in the documentation.
This argument basically sums up to "Developers just need discipline, and should stop blaming the tools". While this is a sound argument on paper, the intrinsic complexity of software systems makes it hard to pin the blame on developers. BTW, this is the same argument Uncle Bob makes, which is not so popular with many mainstream developers.
You're right about feature creep in k8s though.
If your goal is to build highly reliable and available services to end users that are secure and scalable with a team of more than 10 engineers, eventually you will run into more than 50% of the concepts in Kubernetes anyway and end up re-inventing them.
Scaling up and down, node draining, finding out whether services are healthy, RBAC, resource distribution, secrets management, service hardening, introspection capabilities, explicit declaration of dependencies and endpoints and many, many more.
My point is: Sure, if your goal isn't that, it doesn't make sense to start out using Kubernetes.
But if at least eventually that's what you need, imho it's way preferable to just learn and apply well proven abstractions instead of reinventing the wheel along the way and end up with a less maintainable, capable and standardized solution you won't find anyone for maintaining.
If I hear some of the comments here suggesting to "just spin up docker-compose with Traefik in front" (disclaimer: I really like Traefik), it reminds me of how some of the ops messes that I historically had to care for got started.
It’s always easier to shift the blame somewhere else.
That's why some radical but correct concepts are so hard to push.
To add a counter-example, I have lots of Ruby experience and I've just joined a Go team. I won't tell them to use Ruby, I will just do it where it makes sense and saves us time. (And then we'll have two problems... enter "limiting blast radius")
Point of my counter-example is, I'm extremely skeptical that all the world's problems can be solved by adopting a new mono-culture, whatever it is. There are 100% always gonna be some problems that are better solved in a different language. PHP is the best way to run Wordpress, for example (ok, so it's the only way to run Wordpress, but you get the idea... "Wordpress is the best way to..."), but I've been in high-functioning IT organizations that won't touch that with a ten foot pole, because "it's another language to support, and PHP is icky."
We also got rid of a perfectly fine wiki in favor of centralized knowledge-base software for similar reasons. "Better to just have one KB. We don't need to be hosting another thing." So the chances of moving everything over to the BEAM VM are next to nil, unless you are a product-focused company with just one product, or happen to have an absolute champion leading the effort to migrate all the things. For all the other things, you need to have a consistent answer too.
No tool is one-size-fits-all. Where Kubernetes shines most is in any environment that isn't running a single monolith or building a software monoculture, or can't manage that for whatever reason (those are all basic use cases that are frankly easy enough to manage without adding the additional complexity of Kubernetes on top; if you don't need it, don't use it!). IMHO, diversity in infrastructure is a plus, and Kubernetes turns out to be a technology that enables it.
For example you still need a way to get BEAM onto hosts, still need to manage the OS on the host, still need to setup networking, RBAC etc.
In case anyone's interested, here's a pretty funny and educational talk by Bryan Cantrill about that particular incident:
GOTO 2017 • Debugging Under Fire: Keep your Head when Systems have Lost their Mind
One of the goals of K8s is the normalization/standardization of a complex topic to better share knowledge.
Why would someone want to store non-secret information as a secret?
— Douglas Adams
If you have a managed object store or even relational database outside of k8s, the thought of storing arbitrary data in secrets probably doesn't come to mind. But if your enterprise spools up a cluster and tells you to use nfs PVCs with no other storage solution, suddenly you might start getting creative.
Top reason given to me by developers: "I don't want to spend time thinking about the distinction."
Just in case you haven't figured out the proper way to do this, you should use docker:dind.
Sure, just put your secret data in a file, then we'll use your file name as the key of the secret.
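Something like the following, purely as an illustration of the pattern being described (the name and contents are made up). It works, which is exactly the problem; note that individual Secrets are capped at roughly 1MiB because they live in etcd, so this does not scale as a general-purpose object store:

    # Hypothetical example of abusing a Secret as a file store; don't do this.
    apiVersion: v1
    kind: Secret
    metadata:
      name: misc-files
    type: Opaque
    stringData:
      report.csv: |              # the file name becomes the key
        date,value
        2021-01-01,42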
Cronjobs sometimes have weird bugs as well.
A lot of its complexity is due to the fact that it's an evolving system, and that's fine. But I see that some things end up way more complex or unreliable than they need to be, due to overengineering or use cases no one needs.
For our current stack, the answer has been to make the entire business application run as a single process. We also use a single (mono) repository because it is a natural fit with the grain of the software.
As far as I am aware, there is no reason a single process cannot exploit the full resources of any computer. Modern x86 servers are ridiculously fast, as long as you can get at them directly. AspNetCore + SQLite (properly tuned) running on a 64 core Epyc serving web clients using Kestrel will probably be sufficient for 99.9% of business applications today. You can handle millions of simultaneous clients without blinking. Who even has that many total customers right now?
Horizontal scalability is simply a band-aid for poor engineering in most (not all) applications. The poor engineering, in my experience, is typically caused by underestimating how fast a single x86 thread is and exploring the concurrent & distributed computing rabbit hole from there. It is a rabbit hole that should go unexplored, if ever possible.
Here's a quick trick if none of the above sticks: If one of your consultants or developers tells you they can make your application faster by adding a bunch of additional computers, you are almost certainly getting taken for a ride.
It's a somewhat odd solution for a too-common problem, but any solution is still better than dealing with such an annoying problem. (source: made docker the de facto cross-team communication standard in my company. "I'll just give you a docker container, no need to fight trying to get the correct version of nvidia-smi to work on your machine" type of thing)
It probably depends on the space and the types of software you're working on. If it's frontend applications, for example, then it's overkill. But if somebody wants you to, let's say, install multiple elasticsearch versions + some global binaries for some reason + a bunch of different gpu drivers on your machine (you get the idea), then docker is a big net positive. Both for getting something to compile without drama and for not polluting your host OS (or VM) with conflicting software packages.
The list goes on and on, it’s bizarre to me to think of this as the true, good way of doing software and think of docker as lazy. Docker certainly has its own problems too, but does a decent job at encapsulating the decades of craziness we’ve come to heavily rely on. And it lets you test these things alongside your own software when updating versions and be sure you run the same thing in production.
If docker isn’t your preferred solution to these problems that’s fine, but I don’t get why it’s so popular on HN to pretend that docker is literally useless and nobody in their right mind would ever use it except to pad their resume with buzzwords.
At a prompt, then install missing libs. Unless you have to maintain updates regularly, “It’s just so hard” seems like a damn meme.
Yeah, it happens with .so files, .dlls ("dll hell"), package managers and more. But that's where things like containers come in to help: "I tested Library Foo version V3.4 and that's what you get in the docker". No issues with Foo V3.5 or V3.6 causing issues... just get exactly what the developer tested on their box.
Be it a .dll, a .so, a #include library, some version of Python (2.7 with import six), some crazy version of a Ruby Gem that just won't work on Debian for some reason (but works on Red Hat)... etc. etc.
There really isn't any middle ground (except to not use third-party libraries at all).
That makes sense for a hobbyist community but not so much for production.
In a former job we needed to fork and maintain patches ourselves, keeping an eye on the CVE databases and mailing lists and applying only security patches as needed rather than upgrading versions. We managed to be proactive and avoid 90% of the patches by turning stuff off or ripping it out of the build entirely. For example, with OpenSSH we ripped out PAM, built it without LDAP support, no kerberos support, etc., and kept patching it when vulns came out. You'd be amazed at how many vulns don't affect you if you turn off 90% of the functionality and only use what you need.
We needed to do this as we were selling embedded software that had stability requirements and was supported (by us).
It drove people nuts as they would run a Nessus scan and do a version check, then look in a database and conclude our software was vulnerable. To shut up the scanners we changed the banners but still people would do fingerprinting, at which point we started putting messages like X-custom-build into our banners and explained to pentesters that they need to actually pentest to verify vulns rather than fingerprinting and doing vuln db lookups.
Point being, at some point you need to maintain stuff and have stable APIs if you want long lasting code that runs well and addresses known vulns. You don't do that by constantly changing your dependencies, you do it by removing complexity, assigning long terms owners, and spending money to maintain your dependencies.
So either you pay the library vendor to make LTS versions, or you pay in house staff to do that, or you push the risk onto the customer.
You lost me there already. Why should there be missing libs, and why would you not have to maintain updates regularly in production environment?
So let me see if I got this right, it's basically:
Doesn't sound like a perfectly good solution to me.
Refactoring your application so that it can be cloned, built, and run within 2-3 keypresses is something that should be strongly considered. For us, these are the steps required to stand up an entirely new stack from source:
0. Create new Windows Server VM, and install git + .NET Core SDK.
1. Clone our repository's main branch.
2. Run dotnet build to produce a Self-Contained Deployment
3. Run the application with --console argument or install as a service.
This is literally all that is required. The application will create & migrate its internal SQLite databases automatically. There is no other software or 3rd-party service which must be set up as a prerequisite. The development experience is the same; you just attach the debugger via VS rather than starting the console or service.
We also role play putting certain types of operational intelligence into our software. We ask questions like "Can our application understand its environment regarding XYZ and respond automatically?"
I built a service that is installed in 10 lines that could be run through a makefile, but I assume specific versions of each library on the system and don't intend to test against the hundreds of possible system dependency combinations, or to assume it will surely be compatible anyway.
The dev running the container won't be building their own Debian install with the specific versions required in my doc just to run the install script from there; they just instantiate the container and run with it.
At the risk of nitpicking, docker images aren't the equivalent of VM images, as they don't include a kernel.
Docker is not virtualization, it's just an abstraction that makes some Linux process isolation features easier to manage.
It also allows you to bundle whatever dependencies you have in the same bundle, but that is not the same as having a VM.
Threads, processes and containers exist on a continuum.
See also cgroups: while this feature is used by the container run times, it predates Docker, and can be used standalone with normal processes.
If only I had realised that could have been useful for more than testing npm packages...
Both if implemented to spec will be logically equivalent and drop-in replacements for one another.
I used to do UNIX integration work in the late 1990's early 2000's and containers weren't really a thing. So you had to make sure libs from one program didn't crap on another program. And developers had to be conscious of what dependencies they included in their code. Nowadays they don't have to care as much because of containers. Every program can have its own dependencies. Thereby solving the integration problem.
A better solution would be to actually integrate programs and their dependencies into working systems, but no one has time for that. Software bloat is fine. Computers are cheap and fast. And actually understanding what we're doing would be too expensive. So just wrap all your half-finished crapware up in a giant black box and dump it on a server.
What I'm not interested in, is this kind of walking uphill in the snow both ways:
> So you had to make sure libs from one program didn't crap on another program. And developers had to be conscious of what dependencies they included in their code.
ie. having to understand what everybody else is doing in order for my software to run properly. No thanks. That's not why I'm here.
I'll put the exact dependencies I want, in the versions which work best for my software, into a Docker image or whatever tool offers a similar level of isolation, and I'll be working on my code while everybody else spends their time fighting over the ABI compatibility of C system libraries.
Separating the dependencies between programs allows you to test and release independently to allow incremental upgrades. IMO that is better.
And don't even get me started on having instances labeled "large" that have less memory and CPU capacity than my personal backup laptop (currently on loan to my 8yo for COVID reasons)...
Maybe. But when the time comes 512 MB doesn't seem like much anymore, what do you do? Do you pick the next larger instance or do you split the load across more 512 MB slices of a computer?
But you can use other cloud providers with better value.
People have since bought into the marketing reasons for 'being in the cloud' and having 'infinite scalability' but that largely misses the point (and the pain) that caused many of these technologies and patterns to be developed in the first place.
The best example of how to scale without buying into this pattern I know of is Stack Overflow. At least circa 2016 - https://nickcraver.com/blog/2016/02/17/stack-overflow-the-ar...
Maybe if your app handles < 10k concurrent connections. Otherwise it is the most cost efficient solution and exists because it solves the scaling problem in the best way as of today.
By splitting a horizontally scalable workload across a dozen virtual servers that are barely larger than the smallest laptop you can get from Best Buy, you are just creating self-inflicted pain. Chances are the smallest box you can get from Dell can comfortably host your whole application.
The fact remains the odds of you needing to support more than 10K simultaneous connections are vanishingly small.
Even before. If you want low latency. And banks handle more than 10K concurrent every day.
Chances are very high that the problem domain you are working in does not.
Like the author said, "1%" -- I think maybe 5% to 7%.
The point being that masses of software are developed every day on a cargo-cult adoption of solutions they do not require.
This is certainly true, but there is a possible benefit: standardization. Having a standard skillset allows employees greater flexibility since they can jump employers and still expect to be rapidly useful. Similarly, if your company uses a standard toolkit, there's going to be less training overhead for new hires. Now, the devil is in the details, and I'm inclined to agree that you'd be better off hiring someone that can think outside the box and keep the tooling simpler. But using the standard toolkit will work reasonably well across several orders of magnitude in scale.
Mind you, that's still Docker, though, not Kubernetes.
> If one of your consultants or developers tells you they can make your application faster by adding a bunch of additional computers, you are almost certainly getting taken for a ride.
Eh. There's a redundancy play in there somewhere too, if you know how to pull it off. (Big if.)
Imagine if your only unit of compute is a single bulky machine. You don't fully saturate it, but you need a second machine to avoid downtime anyway. Now you spin up a second or third service and suddenly you need 5 or 10 machines and your compute utilization is 20%. You can pack things in tighter, but then you have a knapsack problem, and that's easier to solve efficiently with many small blocks, even if it costs you a 1% overhead or whatever.
I've seen this (a long time ago) in the education world. Very small school with a STEM program. They had specific scientific software they wanted undergrads to use (some of it was pretty proprietary and used to interface with lab equipment) + a pre-configured IDE.
Instead of going through the compatibility matrix of OSes and their versions, they just gave all students a VM image that would "just work". Everyone could bring in their own devices, and as long as you could run a hypervisor, everything would "just work".
That said, I haven't really used docker as a runtime container.
Cassandra is a big PITA, but it hasn't gone down (knock on wood) in the 6 years I've been using it. PARTS of it have...
How many distinct services do you have in your monorepo? A monorepo is fine...
until it isn't.
There's your first miss.
How big should the one server that serves the whole of netflix be..?
You can pretend to be Netflix or Google, and build your tech-stack like they do. Or you can stop wasting your resources setting up a tech stack that you’re never going to get a return of investment on.
Stack Overflow is not a unit of measurement that anyone would be able to take seriously or find useful? How many stack overflows is one asana? Or how many stack overflows is one trello?
Horizontal scaling, Docker, and K8s have many benefits that are obvious to the industry. You don't need to be Google to deploy and use them. If you deploy one server for each app and each team vs deploying a common K8s cluster, where is the higher investment? You claim more ROI with more physical hardware and more servers?
Because it’s less dangerous and cheaper?
> Horizontal scaling, Docker, and K8s have many benefits that are obvious to the industry.
Which is why SO runs on more than one IIS...
You don't need a tech stack -- one that is apparently too complex even for Google, considering the article -- to scale horizontally.
> If you deploy one server for each app and each team vs deploying a common K8s cluster where is the higher investment?
The investment comes from the complexity. We've seen numerous proofs of concept in my country, and in my sector of work, where different IT departments spent 2-5 full years' worth of man-hours trying to adopt a perfect devops tech stack.
Maybe that's because they were incompetent -- you're free, and possibly right, to claim so -- but that's still professional teams expending real-world resources and failing.
From a management perspective, and this is where I'm coming from much more than a technical perspective mind you, the most expensive resource you have is your employees. If software is so complex that I need one or two full-time operators to run it, well, let's just say I could run more than a million Azure web apps and have our regular Microsoft-certified operators handle it.
> You claim more ROI with more physical hardware and more servers?
I haven’t owned my own iron since 2010. All our on-prem servers, and we still do have those, are virtual and running on rented iron.
I think we may be speaking past each other though. My point is financial and yours appear to be mostly technical. If you can set up and run your K8s without expending resources, then good for you, a lot of companies and organisations have proven to be unable to do that though, and in those cases, I think they would’ve been better off not doing it, until they needed to.
Of course transitions can fail. People can think "yeah, let's do this small thing" and end up biting off a much bigger problem than they thought they were getting into. But that problem exists across the whole of tech. "Let's just use our present people and switch from all proprietary to all open source in 3 months..." yeah, best of luck with that... You need a solid team, and going all-in on K8s is hard; you need technical talent and leadership to drive this.
Agreed, it may not be for everyone. The benefits are both technical and financial: fewer compute resources used, more reliable deploys, more resilient services. The problems being solved by this are not trivial. There are tangible benefits. Is it a risk? Of course it is. The risk is not in the technology; the risk is in the competence of the team deploying it. If it can't change and adapt, maybe a lot more fundamental things need to change in that organization than just deploying a new orchestration layer.
My only point is, this shift from dedicated servers to VMs and now to containers is a fundamental shift in how things are done. People can hate on it all they like, but it's a better way of doing things and everyone will catch-up eventually.
I shall not express any opinion on this topic other than to say that Trello is not a good example to bring up. The entire customer base of Trello is not using a single shared board and thus they could scale in any direction they wanted to maximize ROI.
Apparently netflix has that many customers. Then again, if you split Netflix into regions and separate all the account logic from the streaming, the recommendations engine and the movie-content, you could perhaps run the account logic for one region in one server.
What is the point of running one server?
Do you also object to them running in the cloud in VMs and not on physical hardware that they own? Sounds like an old man's "kids these days" rant..
The link you shared just says they manage their OS layer; of course they do. Everyone running on AWS VMs is responsible for their own OS layer. Whether they want precise control over their OS doesn't change their preference for who owns and manages the hardware.
I'm not interested in being "not lazy." I only care about user value and ability to provide user value (tech debt/cost/velocity).
For the price of an m5a.16xlarge to get those cores, I can get something like 27 m4.larges and have way more fault tolerance.
I think your disdain is misplaced.
But that's not the case on Linux, or at least non-enterprise / non-LDAP Linux. Installing mysql/redis/elastic/dotnet core/etc. on each machine may be different, and each has a different installer available.
With Docker I just need to instruct them to install Docker and set up docker-compose, and everything is handled via containerization.
Which we later replicated in Red-Hat with RPM.
Bare bones OS install + bunch of OS packages => done.
And in what concerns containers I was working with HP-UX Vault in 1999.
You need to start with the same base operating system, and you need to make sure that you pull in the same versions of packages in case there are backwards-incompatible bugs, or version bumps such that the dynamically-loaded library is no longer detected (hence the common albeit dangerous workaround of "add a symlink").
(If you're using rpm directly, then you need to bundle the actual packages that you're installing, or point to specific packages that you're confident won't change. And at that point, what's the difference between your approach and Docker?)
The challenge that I believe Docker solves (or, at least, attempts to) is environment reproducibility: without it, you have dependency hell.
We use both together, since Nix is the only sane way to package Docker images, in my opinion.
Don't you just end up with a less hackjob version of a container when you do that?
If you're looking to have your small app eventually grow into a large one, read up on K8s and just make sure you're not blocking future-you from making your app work on it. E.g., work well in a container (which is useful for automated testing, deps management, etc), have a simple 'ping' endpoint to make sure the app is up, have a better config story than "recompile to change these variables", use a logging library, and tolerate any other services you're using to sometimes be down.
All useful things for a grown-up app to do anyways, all a bit of a PITA, and all better than trying to operate an app that doesn't do them.
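A minimal sketch of how those habits pay off later, assuming a simple HTTP app (the names, image, and port are made up): configuration arrives via the environment and the 'ping' endpoint becomes the health check Kubernetes uses.

    # Illustrative Pod spec; not prescriptive.
    apiVersion: v1
    kind: Pod
    metadata:
      name: small-app
    spec:
      containers:
        - name: small-app
          image: example/small-app:1.0.0
          env:
            - name: DATABASE_URL               # config from the environment, not a recompile
              value: postgres://db.internal:5432/app
          readinessProbe:
            httpGet:
              path: /ping                      # the simple 'ping' endpoint
              port: 8080
          livenessProbe:
            httpGet:
              path: /ping
              port: 8080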
If you have one monolithic backend service (and most web applications really should start out this way), Kubernetes offers almost no benefits over alternatives.
I don’t think it suits all teams and use cases, but for us it’s absolutely fantastic and without going down the rabbit-hole of cloud-provider specific tools and recreating half the issues it solves, I’m not super sure what we’d use.
I've already climbed most of the learning curve so YMMV, but as a team of one with dozens of WordPress, MySQL, and bespoke app servers, Kubernetes makes ops manageable so I can spend time on things that really matter.
Deploying new web apps is trivial, declarative manifests are easy to reason about, TLS certs are issued and renewed automatically (cert-manager), backups are cheap and reliable (daily GCP snapshots), making changes to the cluster via declarative terraform is a breeze, etc etc. No way I could manage all the ops without leaning so heavily on the core foundation provided by k8s.
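For a flavor of the cert-manager part, a hedged sketch (hostnames and issuer are placeholders, and it assumes a ClusterIssuer already exists): annotating an Ingress is enough for the TLS certificate to be issued and renewed automatically.

    # Illustrative Ingress; cert-manager creates and renews the referenced Secret.
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: blog
      annotations:
        cert-manager.io/cluster-issuer: letsencrypt
    spec:
      tls:
        - hosts: ["blog.example.com"]
          secretName: blog-tls
      rules:
        - host: blog.example.com
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: blog
                    port:
                      number: 80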
I think that's the first thing with k8s: it all starts with an app that requires several physical nodes.
Most of the value I get from k8s is the hands-off nature of it - I get slack notifications (prometheus+alertmanager) if anything is happening I need to address (e.g. workload down, node down, API not responding, etc). Otherwise I can safely ignore my cluster and know everything's good. Spinning up a new WP site takes 10m with backups, TLS, monitoring, etc built in.
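The Slack-notification side of that can be as small as an Alertmanager receiver along these lines (webhook URL and channel are placeholders); Prometheus alert rules for things like "workload down" or "node down" get routed through it:

    # Sketch of an Alertmanager config fragment; not a complete config.
    route:
      receiver: slack-notifications
    receivers:
      - name: slack-notifications
        slack_configs:
          - api_url: https://hooks.slack.com/services/XXX/YYY/ZZZ
            channel: '#homelab-alerts'
            send_resolved: true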
Well _technically_, sure, we could have run a bunch of those products on a single machine, but there goes your durability, and the memory overhead on some of them was quite high; properly fitting them onto a single machine would have required more optimisation and technical skill than the devs I was working with had or were inclined to apply.
That issue is not endemic to Kubernetes, but rather to any larger system past a certain age, you learn stuff as you go along and would do stuff differently if you did it again today - but you can't easily, because you cannot break compatibility for everybody using your stuff.
As a concrete example from the Kubernetes world, there is a talk by Tim Hockin  about how today, they would fundamentally design the api-server differently and base pretty much everything on CRDs.
After a week of almost full-time work, I threw in the towel. Admittedly, I also had to learn concepts like reverse proxies alongside, too, so I was by no means well-equipped to begin with.
Yet, tossing together some docker-compose.yml files and "managing" them with a Python script has worked very well. Kubernetes really scarred me in that sense, and I am healed! Also, Caddy has helped me in actually enjoying configuring the webserver.
Does mean that anything that upsets the ingress controller is an outage, but for experimentation, that's probably OK.
I am talking about a homelab, a single server at home, for home use. It's much safer now, with Docker compose, because I understand it and I wrote the core exposed part's configuration, the Caddyfile, myself, manually. I know exactly what's exposed, and it's exactly right the way it is!
The remaining risk comes from the services themselves having security holes, but k8s has that very same risk.
Not every addon/tool for the k8s ecosystem is worth it. I also don't bother with the ever-growing list of service meshes... not enough value to me for the overhead.
K8s is definitely the simpler alternative for me but there is still a lot of essential complexity in k8s due to the nature of the problems it's trying to solve. Mostly I like building on top of a solid foundation of standardized k8s API objects (pods, services, volumes, etc).
Tldr; Bring in only the add-ons and tools you really need so you don't add more complexity than necessary. Don't get swept up in the hype and marketing from other devs and cloud vendors.
Really, when looking at tools in the k8s ecosystem, it's better to approach them as you would importing a new library into your application. Most decent devs wouldn't blindly import a new lib just to copy/paste a single line of code they found online for a business-critical function, and k8s tools should be no different. We must ask what value a given tool brings, and whether it's worth the cost of learning/maintenance. Sometimes the answer is a resounding "yes", but too often the question isn't even asked.
Also, I don't much like Go's templating syntax.
You are absolutely spot on because this is how not to pass the behavioral interview for Engineering Manager.
There are very strong financial incentives for every individual developer and sysadmin to adopt Kubernetes, regardless of the impact it has on the organisation as a whole. In a sense this is engineering reaching the level of corporate maturity of the sales department who will optimise everything for their commission regardless of the organisations ability to deliver it at a profit, or even at all.
Then that organization is doing a terrible job of aligning incentives. I'm guessing their pay structure isn't terribly merit-based nor high enough that people aren't constantly thinking about other jobs.
If this is about FAANG (your comment wasn't, but others were), perhaps part of this is exposing larger problems in many smaller orgs. (note: I'm ex-FAANG and happily so)
Nomad is way simpler to get a cluster up and running, has a great configuration syntax (I'll take HCL over YAML any day), and has first-class Terraform/Consul/Vault integrations.
Onboarding devs is fairly straightforward, if they can write a docker-compose.yml, it's an easy transition to a nomad job specification.
It took me, by myself, ~4 months to get our current hashistack (Vault/Consul/Nomad) stood up using Terraform + Ansible. Two members of my team have been working to replace the hashistack with a self-hosted K8s deployment; they just went over the 1-year mark and we still do not have something capable of hosting the workload currently running on the hashistack.
This got a little long-winded, but I feel like this "it's docker-compose or K8s, take your pick" mentality has led to a bunch of needless time being spent by smaller teams/companies on solutions that just aren't right for them.
Also, you can get blue/green deployments and rolling deployments with little to no effort, which can be very nice to have.
This feature helps a lot with that problem by bringing GCP closer to where AWS has been with Fargate. k8s will still be more work than using AWS ECS but it might also be preferable if you dislike using the provider’s components and want the control of, for example, doing your own load balancing and storage management.
I like Cloud Run for the same reason: I can use it without needing a lot of devops skills on my team or sacrificing my own time (because I have those skills but have more valuable things to do). It allows me to focus on keeping my CI/CD pipeline (Cloud Run sets that up with a button click) busy with new functionality. And our hosting costs are close to $0 because we stay below the freemium layer until we actually need to scale.
Edit. corrected the typo 700->70
There's (almost) no limit to what you can run in one cluster too, and Kubernetes namespaces can help to separate different environments to allow for sharing.
Cloud Run sounds like the perfect solution for your workloads though!
For these cases, I still have a GKE cluster around.
(Still agree that Cloud Run isn't for everyone.)
I previously avoided Docker Swarm for ages since I assumed it involved the same level of complexity as k8s. I also initially figured that managed k8s would be a safer bet than managing my own Swarm cluster, but if you’ve used anybody’s managed k8s (or read https://k8s.af), you’ll realize that every cloud provider has their own closed source fork of k8s with plenty of nasty bugs that you can’t do anything about.
Hashicorp's Nomad is the best choice on the complexity-for-features scale IMHO, and that's why I'm writing an article on how great it is, how much easier some things are, and what's missing compared to Kubernetes.
Hashicorp's Nomad is good if you have a strong engineering department or need to run mixed workloads (e.g. both containers and native processes), because its abstractions are well suited for this. But HCL, their DSL for describing deployments, doesn't map nicely to Docker, Docker Compose files, knowledge bases, or tutorials. Nomad's integration with Consul is a major boon, but the need to run your own CA for safe communication between nodes, Nomad's read-only Web UI, and the oddity of HCL at times also make it a non-starter for me and some other people.
At the end of the day, these are just two data points. Sadly, the job market for Kubernetes also dwarfs everything else, and many companies will be burned by this and will learn nothing at the end of the day. I think the best route would be evaluating the orchestrators and other technologies you want to use by doing pilot projects and looking at them in real-world circumstances, to determine their fit for your goals and needs (Web UIs will matter for some, but not for others, for example; as will onboarding and the need for long-term investment vs plug and play).
Edit: as for Kubernetes, personally I find the K3s distribution to be an almost reasonable alternative to Swarm/Nomad, if the situation calls for it: https://k3s.io/
Edit #2: It would actually be pretty awesome to read more about your experience in this article that you're writing!
Nomad's UI has a good number of functions that can be controlled through it. Sure, there are some lower-level operations that are CLI-only (though this seems to be something they are actively working to improve), but most of that probably won't be needed by someone just trying to run a couple of containers on a single node.
> Swarmpit and Podman
I actually meant Swarmpit ( https://swarmpit.io/ ) and Portainer ( https://www.portainer.io/ ).
Podman is another container runtime that acts as an alternative to Docker (even if it is not feature-complete), so I misspoke.
My only complaint with Swarm is there isn't an easy way to expose containers directly on the network (like host networking). I have a few containers (wireguard and minidlna) which need this, so those are running through docker-compose. I've tried macvlan but wasn't able to get that working in Swarm mode.
Ideally I'd like to give each service its own IP on the network, which was possible with how I had k8s setup.
Asking because I never saw the point in multiple container replicas for simple self-hosted stuff. One container each has served me well so far  (Nextcloud, Bitwarden, GitLab), and if they crash, they just get restarted. Multiple containers increase throughput, supporting more users, is that it? It just sounds nightmarish in regards to storage and parallel, conflicting writes.
One container per component (web, db, cache, ...).
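In docker-compose terms, "one container per component" might look roughly like this (images and names here are placeholders):

    # Hypothetical docker-compose.yml: one container per component, no replicas.
    services:
      web:
        image: nextcloud:latest
        depends_on: [db, cache]
        ports:
          - "8080:80"
      db:
        image: postgres:13
        volumes:
          - dbdata:/var/lib/postgresql/data
      cache:
        image: redis:6
    volumes:
      dbdata: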
For a long time I was scared of it because so many people say it's crazy complex. But actually it took me no more than a dozen hours to learn it and get a working setup on AWS.
Maybe being full stack and having a strong knowledge of Linux and Docker helped.
Now I'm not pretending to be an expert with it, and there are certainly traps and mistakes that I didn't experience yet. But I don't understand what people find to be hard about it.
Also why Docker in the first place? I’m genuinely wondering - in the stacks I run (Express / Python) it doesn’t seem necessary at low scale. Elastic Beanstalk, Heroku, Digital Ocean etc all offer facilities for single-command deploys that work out of the box.
Docker makes it easy to run the same version of code in different places and lets things run next to each other without version conflicts.
Also, I think you’re in a very small minority not to care about $720/yr increases in your hobbies.
I replied that beyond a single instance, you can probably get away with not hiring a K8s devops person and just spinning up another instance. I'm not sure you've read this whole thing right.
And yes, I certainly wouldn't mind paying an additional $720/yr for a project that had revenue; I almost certainly wouldn't want to spend money hiring a specialist, or spend time hyperoptimizing that myself. I make that in about a dozen hours of work, so counting how far one can go down the rabbit hole of optimizing server costs, and the associated opportunity cost, the economics are crystal clear.
I don't have any successful personal projects but I have significant experience working with clients, and they are sold on the reasoning pretty much every single time ("I can charge you $3,000 for developing this feature, or we can use a paid service for $720 a year").
I also don't see how Docker is going to save you that much money; if you need a certain amount of compute, you need a certain amount of compute. AWS ElasticBeanstalk for instance charges nothing for spinning up an additional instance compared to EC2; there is no overhead for the PaaS aspect of it, like there would be in Heroku. Digital Ocean app platform is the same as EB, AFAIK.
That makes me think they have multiple services of variable workload packed onto a single host, eg web server, async, and DB all on a single host via Docker.
That’s the antithesis of EB, which can only do horizontal scaling. Docker provides a way to replicate those multiple services in a deployment configuration when you want to set up a host image.
Not using Docker as an easy way to pack it all onto a single box as long as they can is just wasted expense.
- I can easily move from dev to prod by using the private container registry
- I have apt-get on the server and just use the default Ubuntu
- No distributed/network state
I used to use Core OS, but all these container OSes are here today and gone tomorrow, and normally have their own config standard. At least with Ubuntu they have LTS and a bunch of Google pages for fixing stuff.
In places with 10+ developers and 10+ services on a single cluster it works surprisingly well from the user (developer) side.
Definitely a learning curve but honestly not bad if you like ops. Absolutely possible and beneficial for any size team. As always, it depends on what your goals are and how you want to use k8s.
If you were self-hosting k8s, then I'd agree with you.
So, not simple.