I'm a huge fan of keeping things simple (vertically scaling one server with Docker Compose and scaling horizontally only when necessary), but having learned and used Kubernetes recently for a project, I think it's pretty good.
I haven't come across many other tools that were so well thought out while also guiding you through how to break down the components of "deploying".
Concepts like pods, deployments, services, ingresses and jobs are super well thought out and flexible enough to let you deploy many types of workloads, yet the abstractions are good enough that you can also hide a ton of complexity once you've learned the fundamentals.
For example, once you set up a decently tricked-out Helm chart, you can write about 15 lines of straightforward YAML configuration to deploy any type of stateless web app. That's complete with running DB migrations in a sane way, updating public DNS records, SSL certs, CI/CD, live-preview pull requests that get deployed to a subdomain, zero-downtime deployments and more.
I don't disagree but this condition is doing a hell of a lot of work.
To be fair, you don't need to do much to run a service on a toy k8s project. It just gets complicated when you layer on all the production-grade stuff: load balancers, service meshes, access control, CI pipelines, o11y, etc.
The previous reply is based on a multi-service, production-grade workload. Setting up a load balancer wasn't bad. Most cloud providers that offer managed Kubernetes make it pretty painless to get their load balancer set up and working with Kubernetes. On EKS that meant using the AWS Load Balancer Controller and adding a few annotations, which covers HTTP-to-HTTPS redirects, www-to-apex-domain redirects, etc. On AWS it took a few hours to get it all working, complete with ACM (SSL certificate manager) integration.
The cool thing is when I spin up a local cluster on my dev box, I can use the nginx ingress instead and everything works the same with no code changes. Just a few Helm YAML config values.
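For reference, here's a sketch of what that Ingress can look like with the AWS Load Balancer Controller (the annotation keys are the controller's, but the hostname, certificate ARN and service name are placeholders):

```yaml
# Illustrative Ingress for the AWS Load Balancer Controller.
# Hostname, certificate ARN and service name are placeholders.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
    alb.ingress.kubernetes.io/ssl-redirect: "443"
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:123456789012:certificate/example
spec:
  ingressClassName: alb
  rules:
    - host: example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp
                port:
                  number: 80
```

On a local cluster, pointing the same Ingress at ingress-nginx is mostly a matter of swapping `ingressClassName` and dropping the `alb.*` annotations, which is what those "few Helm YAML config values" end up controlling.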
Maybe I dodged a bullet by starting with Kubernetes so late. I imagine 2-3 years ago would have been a completely different world. That's also why I haven't bothered to look into using Kubernetes until recently.
> I don't disagree but this condition is doing a hell of a lot of work.
It was kind of a lot of work to get here, but it wasn't anything too crazy. It took ~160 hours to go from never using Kubernetes to getting most of the way there. This also includes writing a lot of ancillary documentation and wiki style posts to get some of the research and ideas out of my head and onto paper so others can reference it.
Likewise, "i18n" (internationalization/internationalisation) and "l10n" (localization/localisation) avoid the confusion over whether it's "ize" or "ise", which is literally the problem those concepts try to solve.
I can somewhat excuse "k8s" with "nobody can remember how kubernetes is spelled, let alone pronounced" (Germans insist on pronouncing the "kuber" part the same way "kyber/cyber" is pronounced in other Greek loanwords, with a German "ü" umlaut), but I admit that one is a stretch, and "visual puns" like "k0s" ("minimal", you see?) and "k3s" (the digit 3 looks like half of an 8 so it's "lightweight", right?) are a bit beyond the pale for me.
There are at least a dozen languages where the English word "accessibility" translates to the same word spelled slightly differently.
I'm not saying it's difficult to understand. I'm saying it's an unwieldy word and "a11y" is easier to remember and write correctly.
Also, "a11y" looks too much like the English word "ally". That, IMO, is more likely to cause reading difficulties, particularly with non-native speakers and people with dyslexia.
How?! Or is that more a "you provide the safe way, k8s just runs it for you" kind of thing, than a freebie?
For saFeness it's still on us as developers to do the dance of making our migrations and code changes compatible with running both the old and new version of our app.
But for saNeness, Kubernetes has some neat constructs to help ensure your migrations only get run once even if you have 20 copies of your app performing a rolling restart. You can define your migration in a Kubernetes job and then have an initContainer trigger the job while also using kubectl to watch the job's status to see if it's complete. This translates to only 1 pod ever running the migration while other pods hang tight until it finishes.
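A minimal sketch of that pattern, assuming a hypothetical app called `myapp` (the migrate command is whatever your framework provides, and the pod's service account needs RBAC permissions to read Job status):

```yaml
# The Job runs the migration exactly once per release.
apiVersion: batch/v1
kind: Job
metadata:
  name: myapp-migrate
spec:
  backoffLimit: 3
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: registry.example.com/myapp:1.2.3
          command: ["bin/migrate"]  # placeholder for your framework's migrate command
---
# Meanwhile, every app pod gates its own startup on the Job:
#
#   initContainers:
#     - name: wait-for-migrations
#       image: bitnami/kubectl
#       command: ["kubectl", "wait", "--for=condition=complete",
#                 "--timeout=300s", "job/myapp-migrate"]
```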
I'm not a grizzled Kubernetes veteran here but the above pattern seems to work in practice in a pretty robust way. If anyone has any better solutions please reply here with how you're doing this.
A much simpler way is to run the migration in the init container itself. Most SQL migration frameworks know about locks and transactions, so concurrent migrations won't run anyway.
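As a sketch, that collapses the whole thing down to a pod template like this (image and command are placeholders; it leans entirely on the framework's own locking, e.g. Flyway's or Liquibase's):

```yaml
# Deployment pod template fragment: the migration runs in an init
# container, and the framework's lock serializes concurrent attempts.
spec:
  initContainers:
    - name: migrate
      image: registry.example.com/myapp:1.2.3
      command: ["bin/migrate"]  # e.g. a Flyway/Liquibase invocation
  containers:
    - name: app
      image: registry.example.com/myapp:1.2.3
```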
I think the value in the init+job+watcher approach is that you don't need to depend on a framework being smart enough to lock things, which makes it suitable and safe to run with any tech stack, worry-free. It also avoids potential edge cases if a framework's locking mechanism fails, and an edge case in this scenario could be really bad.
But it does come at the cost of a little more complexity (a 30-line YAML job plus ClusterRole/ClusterRoleBinding resources for the RBAC stuff on the watcher). Fortunately that's a one-time setup.
I understand you might outsource the Helm chart creation but this sounds like oversimplifying a lot, to me. But maybe I'm spoiled by running infra/software in a tricky production context and I'm too cynical.
That's values like the number of replicas, which Docker image to pull, resource limits and a couple of timeout-related values (probes, database migration, etc.). Before you know it, you're at 15-ish lines of really straightforward configuration like `replicaCount: 3`.
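As an illustration, such a values file might look like this (every key is something the hypothetical chart defines, not a Kubernetes built-in):

```yaml
# Hypothetical values.yaml consumed by a tricked-out in-house chart.
replicaCount: 3
image:
  repository: registry.example.com/myapp
  tag: "1.2.3"
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    memory: 256Mi
probes:
  readinessTimeoutSeconds: 5
migration:
  enabled: true
  timeoutSeconds: 300
```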
It's just not finished yet. With < 0.01% of the funding kube has, it has many times more design and elegance. Help us out. Have a look and tell me what you think. =D
It also has all the hallmarks of a high-churn product where you need to piece together your solution from a variety of lower-quality information sources (tutorials, QA sites) rather than a single source of foolproof documentation.
Consider the source of the project for your answer (mainly, but not entirely, bored engineers who are too arrogant to think anybody has solved their problem before).
> It also has all the hallmarks of a high-churn product where you need to piece together your solution from a variety of lower-quality information sources (tutorials, QA sites) rather than a single source of foolproof documentation.
This describes 99% of the open source libraries in use. The documentation looks good because auto-doc tools produce a prolific amount of boilerplate documentation. In reality the result is documentation that's very shallow, often just a restatement of the APIs. The actual usage documentation of these projects is generally terrible, with few exceptions.
This seems both wrong and contrary to the article (which mentions that k8s is a descendant of Borg, and in fact, if memory serves, many of the k8s authors were Borg maintainers). So they clearly were aware that people had solved their problem before, because they maintained the tool that had solved the problem for close to a decade.
- containers focus on what you can do, easy to understand and you can start in 5 minutes
- kubernetes is the opposite: verbose tutorials lose time explaining to me how it works rather than what I can do with it.
https://news.ycombinator.com/item?id=27910481 - weird comparison to systemd
https://news.ycombinator.com/item?id=27910553 - another systemd comparison
https://news.ycombinator.com/item?id=27913239 - comparing it to git
For engineering, the common way is to use a couple of descriptive words plus a basic noun, so things do get boring quite quickly but are very easy to understand; say, Google 'Cloud Container Orchestrator' instead of Kubernetes.
The concepts and constructs do not usually change in breaking ways once they reach beta status. If you learned Kubernetes in 2016 as an end user, there are certainly more features but the core isn’t that different.
There's a simpler and more powerful security model: capabilities. Capabilities fix 90% of the problems with *nix.
There's currently no simple resource model. Everything is an ad-hoc, human-driven heuristic for allocating resources to processes and threads, and it's a really difficult problem to solve formally because it has to go beyond algorithmic complexity and care about the constant factors as well.
The other *nix problem is "files". Files were a compromise between usability and precision but very few things are merely files. Devices and sockets sure aren't. There's a reason the 'file' utility exists; nothing is really just a file. Text files are actually text files + a context-free grammar (hopefully) and parser somewhere, or they're human-readable text (but probably with markup, so again a parser somewhere).
Plenty of object models have come and gone; they haven't simplified computers (much less distributed computers), so we'll need some theory more powerful than anything we've had in the past to express relationships between computation, storage, networks, and identities.
I really dislike when people assume containers give them security, it’s the wrong thing to think about.
Containers allowed us to deploy reproducibly, that’s powerful.
Docker replaced .tar.gz and .rpm, not chroots.
Most of the time the chroot functionality of Docker is a hindrance, not a feature. We need chroots because we still haven't figured out packaging properly.
(Maybe Nix will eventually solve this problem properly; some sort of docker-compose equivalent for managing systemd services is lacking at the moment.)
Disagree. Containers are primarily about separation and decoupling. Multiple services on one server often have plenty of ways to interact and see each other and are interdependent in non-trivial ways (e.g. if you want to upgrade the OS, you upgrade it for all services together). Services running each in its own container provides separation by default.
OTOH, containers as a technology have nothing to do with packaging, reproducibility and deployment. It's just that these changes arrived together (e.g. with Docker), so they are often associated, but you can have e.g. LXC containers that are managed in the same way as traditional servers (by ssh-ing into a container).
The former were built with security in mind. The latter was most assuredly not.
It would be good to have containers aim to provide the maximum possible isolation.
To be fair, there is lots of published text around suggesting that this _is_ the case. Many junior to semi-experienced engineers I've known have at some point thought it's plausible to "ssh into" a container. They're seen as lightweight VMs, not as what they are: processes.
> Containers allowed us to deploy reproducibly, that’s powerful.
and it was done in the most "to bake an apple pie from scratch, you must first create the universe" approach.
You just need to install sshd and launch it. You also need to create a user and set a password if you want to actually log in.
Why? Because containers aren't a single process. It's a group of processes sharing a namespace.
And you can totally use a container as a lightweight VM. While most containers have bash or your application as pid 1, there is nothing stopping you from launching a proper init system as pid 1, and the container will act much like a proper OS.
Though, just because you can, doesn't mean you should.
What do you think about using file descriptors as capabilities? Capsicum (for FreeBSD, I think) extends this notion quite a bit. Personally I feel it is not quite "right", but I haven't sat down and thought hard about what is missing.
> we'll need some theory more powerful than anything we've had in the past to express relationships between computation, storage, networks, and identities.
Do you have any particular things in mind which points in this direction? I would like to understand what the status quo is.
To be effective capabilities also need a way to be persistent so that a server daemon doesn't have to call cap_enter but can pick up its granted capabilities at startup. Capsicum looks like a useful way to build more secure daemons within Unix using a lot of capability features.
I also think file descriptors are not the fundamental unit of capability. Capabilities should also cover processes, threads, and the objects managed by various other syscalls.
> Do you have any particular things in mind which points in this direction? I would like to understand what the status quo is.
Unfortunately I don't have great suggestions. The most secure model right now is seL4, and its capability model covers threads, message-passing endpoints, and memory allocation (subdivision) and retyping as kernel memory to create new capabilities and objects. The kernel is formally verified but afaik the application/user level is not fleshed out as a convenient development environment nor as a distributed computing environment.
For distributed computing, a capability model would have to solve distributed trust issues, which probably means capabilities based on cryptographic primitives, which for practical implementations would have to extend full trust between kernels in different machines for speed. But for universality it should be possible to work with capabilities at an abstraction level that allows both deep-trust distributed computers and more traditional single-machine trust domains without having to know or care which type of capabilities to choose when writing the software, only when running it.
I think a foundation for universal capabilities needs support for different trust domains and a way to interoperate between them:
1. Identifying the controller for a particular capability, which trust domain it is in, and how to access it.
2. Converting capabilities between trust domains as the objects to which they refer move.
3. Managing any necessary identity/cryptographic tokens necessary to cross trust domains.
4. Controlling the ability to grant or use capabilities across trust domains.
The processes may not live on the same machine.
The processes may not be in the same trust domain.
The resulting object may be on a third machine or trust domain.
The caller may have inherited privacy enforcement on all owned capabilities that necessitates e.g. translating the binary code of the second process into a fully homomorphically encrypted circuit which can run on a different trust domain while preserving privacy and provisioning the necessary keys for this in the local trust domain so that the capability to the new object can actually read it.
The process may migrate to a remote machine in a different trust domain in the middle of processing, in which case the OS needs to either fail the call (making for an unfortunately complicated distributed computer) or transparently snapshot or rollback the state of the process for migration, transmit it and any (potentially newly encrypted) data, and update the capabilities to reflect the new location and trust domain.
Basically if the capability model isn't capable of solving these issues for what would be very simple local computing then it's never going to satisfy the OP's desire for a more simple distributed computation model.
But you'll note no one is really deploying Windows workloads to the cloud. Why? Well, because you'd still have to build a framework for managing all those permissions, and it hasn't been done. Also, you might end up with the SVCHOST problem, where you host many different services/apps/whatever in one very threaded process because you can.
Capabilities aren't necessarily simpler. Especially if you can delegate them without controls -- now you have no idea what the actual running permissions are, only the cold start baseline.
No, I think the permissions thing is a red herring. Very much on the contrary, I think workload division into coarse-grained containers are great for permissions because fine-grained access control is hard to manage. Of course, you can't destroy complexity, only move it around, so if you should end up with many coarse-grained access control units then you'll still have a fine-grained access control system in the end.
Files aren't really a problem either. You can add metadata to files on Linux using xattrs (I've built a custom HTTP server that takes some response headers for static resources, like Content-Type, from xattrs). The problem you're alluding to is duck-typing as opposed to static typing. Yes, it's a problem -- people are lazy, so they don't type-tag everything in highly lazy typing systems. So what? Windows also has this problem, just a bit less so than Unix. Python and JS are all the rage, and their type systems are lazy and obnoxious. It's not a problem with Unix. It's a problem with humans. Lack of discipline. Honestly, there are very few people who could use Haskell as a shell!
> Plenty of object models have come and gone;
Yeah, mostly because they suck. The right model is Haskell's (and related languages').
> so we'll need some theory more powerful than anything we've had in the past ...
I think that's Haskell (which is still evolving) and its ecosystem (ditto).
But at the end of the day, you'll still have very complex metadata to manage.
What I don't understand is how all your points tie into Kubernetes being today's Multics.
Kubernetes isn't motivated by Unix permissions sucking. We had fancy ACLs in ZFS in Solaris and still also ended up having Zones (containers). You can totally build an application-layer cryptographic capability system, running each app as its own isolated user/container, and to some degree this is happening with OAuth and such things, but that isn't what everyone is doing, all the time.
Kubernetes is most definitely not motivated by Unix files being un-typed either.
I hope readers end up floating the other, more on-topic top-level comments in this thread back to the top.
See prior discussion here:
You'd have to learn AWS Auto Scaling groups (proprietary to AWS), Elastic Load Balancing (proprietary to AWS) or HAProxy, blue-green deployments or phased rollouts, Consul, systemd, Pingdom, CloudWatch, etc.
Are you saying you don't use any of your cloud vendor's supporting services, like CloudWatch, EFS, S3, DynamoDB, Lambda, SQS, SNS?
If you're running on plain EC2 and have any kind of sane build process, moving your compute stuff is the easy part. It's all of the surrounding crap that is a giant pain (the aforementioned services + whatever security policies you have around those).
Everyone's situation is different, of course, but there is a reason that cloud providers have these supporting services and there is a reason people use them.
In my experience it is less work than keeping up with a cloud provider's changes. You can stay with a version of Kafka for 10 years if it meets your requirements. When you use a cloud provider's equivalent service you have to keep up with their changes, price increases and obsolescence. You are at their mercy. I am not saying it is always better to set up your own equivalent using OSS, but I am saying it makes sense for a lot of things. For example Kafka works well for me, and I wouldn't use Amazon SQS instead, but I do use Amazon SES for emailing.
> cloud provider's equivalent service you have to keep up with their changes, price increases and obsolescence
AWS S3 and SQS have both gone down significantly in price over the last 10 years and code written 10 years ago still works today with zero changes. I know because I have some code running on a Raspberry Pi today that uses an S3 bucket I created in 2009 and haven't changed since*.
(of course I wasn't using an rPi back then, but I moved the code from one machine to the next over the years)
I build AMIs for most things on EC2. That interface never breaks. There is exactly one service on which provisioning is dependent: S3. All of the code (generally via Docker images), required packages, etc are baked in, and configuration is passed in via user data.
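As a sketch, the user-data half of that setup might be a cloud-config along these lines (paths, image name and values are invented; the image itself is already baked into the AMI):

```yaml
#cloud-config
# Illustrative EC2 user data for an AMI with the app image pre-baked.
write_files:
  - path: /etc/myapp/env
    permissions: "0600"
    content: |
      DATABASE_URL=postgres://app:secret@db.internal:5432/app
runcmd:
  - docker run -d --restart=always --env-file /etc/myapp/env myapp:baked
```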
EC2 is what I like to call a "foundational" service. If you're using EC2 and it breaks, you wouldn't have been saved by using EKS or Lambda instead, because those use EC2 somewhere underneath.
Re: services like SQS, we could choose to roll our own but it's not really been an issue for us so far. The only thing we've been "forced" to move on is Lambda, which we use where appropriate. In those cases, the benefits outweigh the drawbacks.
It can be simple but first you have to learn it.
Given that life is finite and you want to accomplish some objective with you company (and it’s not training dev ops professionals), it’s quite interesting having the ability to outsource a big part of the problems needed to be solved to get there.
Given this perspective, much better to use managed services. Let’s you focus on the code (and maintenance) specific to your problem.
In terms of the effort to deploy something new, for my organization it's low. We have a Terraform module that creates the infrastructure, glues the pieces together, tags stuff, and makes sure everything is configured uniformly. You specify some basic parameters for your deployment and you're off to the races.
We don't need to add yet more complexity with Kubernetes-specific cost tracking software; AWS does it for us automatically. We don't have to care about how pods are sized and how those pods might or might not fit on nodes. Autoscaling gives us consistently sized EC2 instances that, in my experience, have never run into issues because of a bad neighbor. Most importantly of all, I don't have the upgrade anxiety that comes from having a ton of services stacked on one Kubernetes cluster, all of which may suffer if an upgrade does not go well.
You're saying that the solution to k8s is complicated and hard to debug is to move to another cloud and hope that fixes it?
Not in the slightest. I'm saying that building a platform against k8s lets you migrate between cloud providers, because the cloud provider's system might be causing you problems. Those problems are probably related to your platform's design and implementation, which is causing an impedance mismatch with the cloud provider.
This isn't helpful knowledge when you've only got four months of runway and fixing the platform or migrating from AWS would take six months or a year. It's not like switching a k8s-based system is trivial but it's easier than extracting a bunch of AWS-specific products from your platform.
I mean, yeah, that’s exactly what’s required to happen, and it’s a good thing because only your system engineers need to do most of the legwork. If you have a team of system engineers, you probably have a much bigger cohort of application engineers.
A lot of things go from not viable to viable if you have the luxury of allocating an entire team to it.
Oh it's Wednesday, ALB controller has shat itself again!
I like using the primitives the cloud provides, while also having a path to run my software on bare metal if needed. This means: VMs, decoupling logging and monitoring from the cloud services (use a good library that can send to CloudWatch, for example; prefer open-source solutions when possible), doing proper capacity planning (with the option to automatically scale up if the flood ever comes), etc.
Learning Heroku and starting to use it takes maybe an hour. It's more expensive and you won't have as much control as with Kubernetes, but we used it in production for years for a fairly big microservice-based project without problems.
I understand that k8s does many things, but it's also how you look at the problem. k8s does one thing well: managing complex distributed systems, such as knowing when to scale up and down if you so choose and when to start up new pods when they fail.
Arguably, this is one problem that is made up of smaller problems, each solved by smaller services, just like how systemd works.
Sometimes I wonder if the Perlis-Thompson Principle and the Unix Philosophy have become a way to force a legalistic view of software development or are just out-dated.
The end result of systemd for the average administrator is that you no longer need to write finicky init scripts of tens or hundreds of lines. They're reduced to unit files which are often just 10-15 lines. systemd is designed to replace old stuff.
The result of Kubernetes for the average administrator is a massively complex system with its own unique concepts. It needs to be well understood if you want to be able to administrate it effectively. Updates come fast and loose, and updates are going to impact an entire cluster. Kubernetes, unlike systemd, is designed to be built _on top of_ existing technologies you'd be using anyway (cloud provider autoscaling, load balancing, storage). So rather than being like systemd, which adds some complexity and also takes some away, Kubernetes only adds.
Here are some bits of complexity that managed Kubernetes takes away:
* SSH configuration
* Key management
* Certificate management (via cert-manager)
* DNS management (via external-dns)
* Process management
* Host monitoring
* Infra as code
* Instance profiles
* Reverse proxy
* HTTP -> HTTPS redirection
So maybe your point was "the VMs still exist" which is true, but I generally don't care because the work required of me goes away. Alternatively, you have to have most/all of these things anyway, so if you're not using Kubernetes you're cobbling together solutions for these things which has the following implications:
1. You will not be able to find candidates who know your bespoke solution, whereas you can find people who know Kubernetes.
2. Training people on your bespoke solution will be harder. You will have to write a lot more documentation whereas there is an abundance of high quality documentation and training material available for Kubernetes.
3. When something inevitably breaks with your bespoke solution, you're unlikely to get much help Googling around, whereas it's very likely that you'll find what you need to diagnose / fix / work around your Kubernetes problem.
4. Kubernetes improves at a rapid pace, and you can get those improvements for nearly free. To improve your bespoke solution, you have to take the time to do it all yourself.
5. You're probably not going to have the financial backing to build your bespoke solution to the same quality caliber that the Kubernetes folks are able to devote (yes, Kubernetes has its problems, but unless you're at a FAANG then your homegrown solution is almost certainly going to be poorer quality if only because management won't give you the resources you need to build it properly).
> SSH configuration
Do you mean the configuration for sshd? What special requirements would have that Kubernetes would help fulfill?
> Key management
Assuming you mean SSH authorized keys since you left this unspecified. AWS does this with EC2 Instance Connect.
> Certificate management (via cert-manager)
AWS has ACM.
> DNS management (via external-dns)
This is not even a problem if you use AWS cloud primitives. You point Route 53 at a load balancer, which automatically discovers instances from a target group.
AWS already does this via autoscaling.
> Process management
systemd and/or docker do this for you.
AWS can send instance logs to CloudWatch. See https://docs.aws.amazon.com/systems-manager/latest/userguide....
> Host monitoring
In what sense? Amazon target groups can monitor the health of a service and automatically replace instances that report unhealthy, time out, or otherwise.
> Infra as code
I mean, you have to have a description somewhere of your pods. It's still "infra as code", just in the form prescribed by Kubernetes.
> Instance profiles
Instance profiles are replaced by secrets, which I'm not sure is better, just different. In either case, if you're following best practices, you need to configure security policies and apply them appropriately.
> Reverse proxy
AWS load balancers and target groups do this for you.
AWS load balancers and CloudFront do this for you; ACM issues the certificates.
I won't address the remainder of your post because it seems contingent on the incorrect assumption that all of these are "bespoke solutions" that just have to be completely reinvented if you choose not to use Kubernetes.
You fundamentally misunderstood my post. I wasn't arguing that you had to reinvent these components. The "bespoke solution" is the configuration and assembly of these components ("cloud provider primitives" if you like) into a system that suitably replaces Kubernetes for a given organization. Of course you can build your own bespoke alternative--that was the prior state of the world before Kubernetes debuted.
You still need to figure out where your persistent storage is.
You still have to send logs somewhere for aggregation.
You have the added difficulty of figuring out cost tracking in Kubernetes since there is not a clear delineation between cloud resources.
You have to configure an ingress controller.
You want SSL? Gotta set that up, too.
You have to figure out how pods are assigned to nodes in your cluster, if separation of services is at all a concern (either for security or performance reasons).
Kubernetes is no better with the creation of "bespoke solutions" than using what your cloud provider offers.
Compare this tutorial for configuring SSL for Kubernetes services to an equivalent for configuring SSL on an AWS load balancer. Is Kubernetes really adding value here?
Yes, there is choice and variety among Kubernetes extensions, but they all have fundamental operational assumptions that are aligned because they sit inside the Kubernetes control and API model. It is a golden era to have such a rich set of open and elegant building blocks for modern distributed systems platform design and operations.
> You still need to figure out where your persistent storage is.
Managed Kubernetes comes with persistent storage solutions out of the box. I don't know what you mean by "figure out where it is". On EKS it's EFS, on GKE it's FileStore, and of course you can use other off-the-shelf solutions if you prefer, but there are defaults that you don't have to laboriously set up.
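For example, with the EFS CSI driver on EKS, wiring a claim up looks roughly like this (the file system ID and names are placeholders):

```yaml
# StorageClass backed by the EFS CSI driver (dynamic provisioning
# via access points); fileSystemId is a placeholder.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-0123456789abcdef0
  directoryPerms: "700"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myapp-data
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi
```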
> You still have to send logs somewhere for aggregation.
No, these too are automatically sent to CloudWatch or equivalent (maybe you have to explicitly say "use cloudwatch" in some configuration option when setting up the cluster, but still that's a lot different than writing ansible scripts to install and configure fluentd on each host).
> You have the added difficulty of figuring out cost tracking in Kubernetes since there is not a clear delineation between cloud resources.
This isn't true at all. Your cloud provider still rolls up costs by type of resource, and just like with VMs you still have to tag things in order to roll costs up by business unit.
> You have to configure an ingress controller.
Nope, this also comes out of the box with your cloud provider. It hooks into the cloud provider's layer 7 load balancer offering. It's also trivial to install other load balancer controllers.
> You want SSL? Gotta set that up, too. ... Compare this tutorial for configuring SSL for Kubernetes services to an equivalent for configuring SSL on an AWS load balancer. Is Kubernetes really adding value here?
If you use cert-manager and external-dns, then you'll have DNS and SSL configured for every service you ever create on your cluster. By contrast, on AWS you'll need to manually associate DNS records and certificates with each of your load balancers. Configuring LetsEncrypt for your ACM certs is also quite a lot more complicated than for cert-manager.
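Concretely, with cert-manager and external-dns installed, per-service DNS and TLS comes down to a couple of annotations on the Ingress (issuer name, hostname and service name are placeholders):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod  # cert-manager issues the cert
    external-dns.alpha.kubernetes.io/hostname: myapp.example.com  # external-dns creates the record
spec:
  tls:
    - hosts:
        - myapp.example.com
      secretName: myapp-tls  # cert-manager stores the issued cert here
  rules:
    - host: myapp.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp
                port:
                  number: 80
```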
> Kubernetes is no better with the creation of "bespoke solutions" than using what your cloud provider offers.
I hope by this point it's pretty clear that you're mistaken. Even if SSL/TLS is no easier with Kubernetes than with VMs/other cloud primitives, we've already addressed a long list of things you don't need to contend with if you use managed Kubernetes versus cobbling together your own system based on lower level cloud primitives. And Kubernetes is also standardized, so you can rely on lots of high quality documentation, training material, industry experience, FAQ resources (e.g., stack overflow), etc which you would have to roll yourself for your bespoke solution.
k8s ... I think is often overkill in a way that simply doesn't apply to systemd.
By comparison, systemd feels like a poor return on investment given all the heat it has generated relative to the benefit.
Wouldn't the hundreds of lines of finicky, bespoke Ansible/Chef/Puppet configs required to manage non-k8s infra be the equivalent to this?
Honestly most of the annoyance is Azure stuff. Kubernetes stuff is pretty joyful and, unlike Azure, the documentation sometimes even explains how it works.
Kubernetes cluster changes potentially create issues for all services operating in that cluster.
Provisioning logic that is baked into an image means changes to one service have no chance of affecting other services (app updates that create poor netizen behavior, notwithstanding). Rolling back an AMI is as trivial as setting the AMI back in the launch template and respinning instances.
There is a lot to be said for being able to make changes that you are confident will have a limited scope.
Yes, there is a trade off here. You are trading a staggeringly complex external dependency for a little bit of configuration you write yourself.
The Kubernetes master branch weighs in at ~4.6 million lines of code right now. Ansible sits at ~286k on their devel branch (this includes the core functionality of Ansible but not every single module). You could choose not to even use Ansible and just write a small shell script that builds out an image which does something useful in less than 500 lines of your own code, easily.
Kubernetes does useful stuff and may take some work off your plate. It's also a risk. If it breaks, you get to keep both of the pieces. Kubernetes occupies the highly unenviable space of having to do highly available network clustering. As a piece of software, it is complex because it has to be.
Most people don't need the functionality provided by Kubernetes. There are some niceties. But if I have to choose between "this ~500 line homebrew shell script broke" and "a Kubernetes upgrade went wrong" I know which one I am choosing, and it's not the Kubernetes problem.
Managed Kubernetes, like managed cloud services, mitigates some of those issues. But you can still end up with issues like mismatched node sizes and pod resource requirements, leaving a bunch of unused compute.
TL;DR of course there are trade-offs, no solution is magic.
There’s a lot to unpack in that sentence, which is to say there’s a lot of complexity it removes.
Agreed that it adds some, too.
I’m not convinced k8s is a net increase in complexity after everything is accounted for. Authentication, authorization, availability, monitoring, logging, deployment tooling, auto scaling, abstracting the underlying infrastructure, etc…
Does it really do that if you just use it to provision an AWS load balancer, which can do health checks and terminate unhealthy instances for you? No.
Sure, you could run some other ingress controller but now you have _yet another_ thing to manage.
Kubernetes has readiness checks and health checks for a reason. The readiness check is a gate for "should receive traffic" and the health check is a gate for "should be restarted".
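In Kubernetes terms these are the `readinessProbe` and `livenessProbe` on a container. A minimal sketch of the distinction (paths, ports, and image are placeholders):

```yaml
# Sketch: "should receive traffic" vs "should be restarted" as two probes.
containers:
  - name: web
    image: example/web:1.0   # placeholder image
    ports:
      - containerPort: 8080
    readinessProbe:          # failing => removed from Service endpoints
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
    livenessProbe:           # failing => container is restarted
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 10
```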
Myself, I need to set up a bunch of other cloud services for day-2 operations.
And I need to do it consistently across clouds. The kind of clients I serve won’t use my product as a SaaS due to regulatory/security reasons.
That said, there are relatively few organizations that actually require it.
K8s does the very simple stateless case well, but for anything more complicated you are on your own. Stateful services are still a major pain, especially those with leader elections. There is no feedback to k8s about the application-level state of the cluster, so it can't know which instances are less disruptive to shut down or which shard needs more capacity.
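This gap is visible in what the built-in primitives can express. A StatefulSet gives you stable identity and a PodDisruptionBudget limits voluntary evictions, but neither carries any application-level signal (names and image below are placeholders):

```yaml
# Sketch: what k8s does offer stateful workloads - stable identity and a
# blunt disruption budget - but nothing application-aware.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db            # stable DNS names: db-0.db, db-1.db, ...
  replicas: 3
  selector:
    matchLabels: {app: db}
  template:
    metadata:
      labels: {app: db}
    spec:
      containers:
        - name: db
          image: example/db:1.0
---
# Limits voluntary evictions to one pod at a time, but k8s still cannot tell
# the leader replica from a follower when choosing which pod to evict.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: db
spec:
  maxUnavailable: 1
  selector:
    matchLabels: {app: db}
```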
Also, in the sense of "many small components that each do one thing well", k8s is even more Unix-like than Unix in that almost everything in k8s is just a controller for a specific resource type.
Orchestration has a political and business problem, too. How does Amazon feel about something that runs most jobs on your own bare metal servers and rents extra resources from AWS only during overload situations? This appears to be the financially optimal strategy for compute-bound work such as game servers. Renting bare iron 24/7 at AWS prices is not cost effective.
Having had a play with a few variants on this theme, I think kernel based abstractions are the mistake here. It's too low level and too constrained by the low-level details of the API, as you've said yourself.
If you look at something like PowerShell, it has a variant of this abstraction that is implemented in user mode. Within the PowerShell process, there are provider plugins (DLLs) that implement various logical filesystems like "environment variables", "certificates", "IIS sites", etc...
These don't all implement the full filesystem APIs! Instead they implement various subsets. E.g.: some providers only implement atomic reads and writes, which is what you want for something like kernel parameters, but not generic data files.
Hashicorp's stack, using Nomad as an orchestrator, is much simpler and more composable.
I've long been a fan of Mesos' architecture, which I also think is more composable than the k8s stack.
I just find it surprising an article that is calling for an evolution of the cluster management architecture fails to investigate the existing alternatives and why they haven't caught on.
Getting _something_ up and running quickly isn't necessarily a good indicator of how well a set of tools will work for you over time, in production workloads.
Things might have improved massively for Nomad since but I honestly have no desire to learn. Having used other Hashicorp tools since, I see them make the same mistakes time and time again.
Now I'm not the biggest fan of K8s either. I completely agree that they're hugely overblown for most purposes despite being sold as a silver bullet for any deployment. But if there's one thing K8s does really well it's describing the different layers in a deployment and then wrapping that up in a unified block. There's less of the "this thing is working but is this other thing" when spinning up a K8s cluster.
On the other side, K8s had a steep learning curve with lots of options and terms to learn, but there was never a point in the whole exploration where I was stuck. The docs are great, the community is great, and the number of examples available lets us mix and match lots of different approaches.
This leads to unnecessarily heavy systems - you do not need a container to host a server socket.
Industry puts algorithms and Big O on a pedestal. Most software projects start as someone building algorithms, with deployment and interactions only getting late attention. This is a bit like building the kitchen and bathroom before laying the foundations.
Algorithm-centric design creates mathematically elegant algorithms that move gigabytes of I/O across the network for every minor transaction. Teams wrap commodity resource schedulers around carefully tuned worker nodes, and discover their performance is awful because the scheduler can't deal in the domain language of the big-picture problem.
I think it is interesting that the culture of Big O interviews and k8s both came out of Google.
Kubernetes becomes a problem when people who are not operations people with many years of experience try to do all this while learning it at the same time. The related problem is that having people spend time on this is orders of magnitude more expensive than running the actual cluster, which is also not cheap.
A week of devops time easily equates to months or years of cloud hosting costs for a modestly sized setup using e.g. Google Cloud Run. And let's face it, it's never just a week. Many teams have full-time devops people costing $100-200K/year, each. Great if you are running a business generating millions of revenue. Not so great if you are running a project that has yet to generate a single dollar of revenue and is a long time away from actually getting there. That describes most startups out there.
I actually managed to stay below the Cloud Run freemium tier for a while, making it close to free. Took me 2 minutes to set up CI/CD. Comes with logging, auto scaling, alerting, etc. Best of all, it freed me up to do more interesting things. Technically I'm using Kubernetes. Except of course I'm not. I spent zero time fiddling with Kubernetes-specific config. All I did was tell Google Cloud Run to create a CI/CD pipeline from this git repository and scale it. A 3-minute job to click together. The service was up and running right after the build succeeded. Great stuff. That's how devops should be: spend a minimum of time on it in exchange for acceptable results.
This is the fundamental disagreement. DevOps was a reaction to developers that build software that was nearly impossible to operate because they treated Ops as servants that paid to do the dirty work, rather than peers with a set of valuable skills that cover a scope beyond what many Dev teams have. And it was a reaction to Ops being ground down into becoming the "department of no", when really they should be at the table with the development team as a way towards a collaborative reality check. A model where one team gets to completely ignore the complexities of operational reality is a broken, inhumane, and unsustainable model.
That said, it's also unsustainable to expose all complexity to dev teams that don't have the skills or incentive to manage this. Progressive disclosure and composable abstractions are the tool to remedy this. Kubernetes was never intended to be exposed directly to app developers, it was a system developer's platform toolkit. Exposing it is misunderstanding + laziness on the part of some operations teams. The intent was always to build higher PaaS-like abstractions such as Knative (which is what Google Cloud Run is based on).
But it is a totally different experience doing this with App Engine, Heroku, Tsuru, etc... than with a custom in-house-built Kubernetes plus a thousand custom home-made tools, 10 different repositories with custom undocumented YAML files, and another 3000 "gotchas" of things that don't work yet, we're on it, we need to migrate to the new version, etc.
So I sympathize with the parent comment in the sense that, with this custom-built mountain of stuff, I don't want to do devops... if you give me an easy-to-use, well-tested, well-documented, stable production infrastructure like the ones I mentioned, then I'm all in.
I also agree with you on your last paragraphs about not exposing the raw thing to the developers. This is the key.
The problem is when the systems gurus want you to understand to the same level everything they understand, your frontend coworkers want you to be on the latest of every library, your product manager wants you to perfectly understand the product, your manager expect you to be the best at dealing with people, and you still have to smile and be happy about team building... oh, and don't forget the Agile Coach expecting you to also be good at all the team dynamics and card games.
I'm all in in operating the applications my team builds. Having to operate custom in house kubernetes clusterfucks is not my job.
But the market overwhelmingly decided it wanted to play with a lower level foundation (those CF instances mostly are still chugging along running hundreds of thousands of containers, but they’re in their own world… “legacy”?).
Let’s own it and not delude ourselves that the current state of Kubernetes is the end state. It’s like saying the Linux syscall interface is too complex for app developers. Well yes! It’s for system developers. We as an industry are working to improve that.
It's not even great in that situation. Millions in profit, perhaps, but that $200k+ would probably better be spent elsewhere - enhancing functionality, increasing sales, support, etc.
By contrast, k8s is wildly popular. I have no idea how many installations of it exist in the world, but it probably numbers into the millions.
I'll take two pretty different contexts to illustrate why for me k8s makes sense.
1- I'm part of the cloud infrastructure team (99% AWS, a bit of Azure) for a pretty large private bank. We are in charge of security and conformity of the whole platform while trying to let teams be as autonomous as possible. The core services we provide are a self-hosted Gitlab along with ~100 CI runners (Atlantis and Gitlab-CI, that many for segregation), SSO infrastructure and a few other little things. Team of 5, I don't really see a better way to run this kind of workload with the required SLA. The whole thing is fully provisioned and configured via Terraform along with its dependencies, and we have a staging env that is identical (and the ability to pop another at will or to recreate this one). Plenty of benefits like almost-zero-downtime upgrades (workloads and cluster), off-the-shelf charts for plenty of apps, observability, resource optimization (~100 runners mostly idle on a few nodes), etc.
2- Single-VM projects (my small company infrastructure and home server) for which I'm using k3s. Same benefits in terms of observability, robustness (at least while the host stays up...), IaC, resource usage. A stable, minimalist, hardened host OS with the ability to run whatever makes sense inside k3s. I had to set up similarly small infrastructures for other projects recently with the constraint of relying on more classic tools so that it's easier for the next ops to take over, and I ended up rebuilding a fraction of the k8s/k3s features with much more effort (did that with Docker and directly on the host OS for several projects).
Maybe that's because I know my hammer well enough for screws to look like nails but from my perspective once the tool is not an obstacle k8s standardized and made available a pretty impressive and useful set of features, at large scale but arguably also for smaller setups.
I love Nomad's flexibility and ease of use: a simple HCL file and I (and all the devs) can debug and understand what is going on with the deployment without wasting a whole sprint; debugging and understanding the systems is trivial. However, I agree parts of the documentation should be fixed and can confuse people who want to get started, and it's also relatively "new" insofar as there is a small but growing community around it.
I love Kubernetes because of the community, if there's a Helm chart for a service, it's going to work in 80% of the cases. If however there are bugs in the helm chart, or something is quite not on the beaten path, then good luck. Most of the time wasted on Kubernetes was the inexperience of the operators and also the esoteric bugs that can happen now and then. Building on top of things that have been done before is a great way to win time and flexibility but it shouldn't be an excuse to not understand them (helm charts as an example).
In both cases, you always need an ops team to take care of the clusters. For Nomad, 2-3 people are enough. For Kubernetes you will need 5+ people depending on the size and locality of the cluster, if you want to do things right, that is. If your dev team is managing them, it's already game over and just a question of time until you've made yourself more real problems than you initially had.
What bugs me the most, however, is the cargo culting around the tools serving as a "beating around the bush" technique to avoid doing actual work. They're just that, tools. If you have to deploy a Rails or Django app with an SQLite database, just do it on metal with a two-liner "CI/CD" and grow from there. If it gets bigger, sure, go for Kubernetes to manage the deployments and auto scale, but be damn sure that you can debug anything that goes wrong within minutes/hours. If things go wrong and there's no hit for your googled error code, you essentially fall from your highest level of abstraction and are at the mercy of consultants that will both waste your time in writing requirements and waste your money by taking more time than was initially planned and agreed upon (my experience, sample size N=6).
I have been working for a firm that has been onboarding multiple small-scale startup or lifestyle businesses to Kubernetes. My opinion is that if you have a Ruby on Rails or Python app, you don't really need Kubernetes. It is like bringing a bazooka to a knife fight. However, I do think Kubernetes has some good practices embedded in it, which I will always cherish.
If you are not operating at huge scale, in operations and/or team size, it actually comes at a high cost in productivity and tech debt. I wish there were an easier tech that would bridge going from VMs to a bunch of VMs, and from a bunch of containers to Kubernetes.
Prove it. Create something simpler, more elegant and more principled that does the same job. (While you're at it, do the same for systemd which is often criticized for the same reasons.) Even a limited proof of concept would be helpful.
Plan9 and Inferno/Limbo were built as successors to *NIX to address process/environment isolation ("containerization") and distributed computing use cases from the ground up, but even these don't come close to providing a viable solution for everything that Kubernetes must be concerned with.
I can also claim humans will have longer lifespans in the future. I don't need to develop a life extending drug before I can hold that assertion.
Kubernetes is complex. Society used to work on simpler systems before we added layers of complexity. There are dozens of layers of abstraction above the level of transistors; it is not a stretch to think that there is a more elegant abstraction yet to be designed, without it having to "prove" itself to zozobot234.
To me, Kubernetes is the new UNIX, centered around a small number of core ideas: controller loops, Pods, level-triggered events, and a fully open, well-standardized, declarative, and extensible RESTful API.
The various clouds and predecessor cloud orchestrators were the infinitely complicated beasts.
OP just linked to a few rants about the complexity of the CNCF ecosystem (not Kubernetes), and an extended cranky rant / thought exercise by the MetalLB guy. The latter is the closest to an actual argument against Kubernetes, but there's a LOT to disagree with in that post.
Comments are also easier to write than code. He really does seem obligated to prove Kubernetes is our generation's Multics, and that's a good thing.
Probably a language with good IPC (designed for real distributed systems that handle failover), some unified auth library, and built-in metrics and logging.
A lot of real-life k8s complexity is trying to accommodate many supplemental systems for that stuff. Otherwise it's a job scheduler and haproxy.
But Kubernetes is already this. Sure, the core is a lot bigger than something like Nomad, but some of it is replaceable, and there are plenty of simpler alternatives to the built-ins.
And anyway, my point still stands. What's the point of having 20 different independent systems that address the aspects K8s is trying to solve versus one big system that addresses all the headaches? To me having 20 different systems that potentially have many fundamental differences is more complex than a single system that has the same design philosophies and good integration across the board.
For local development (a must imo), just rock a docker-compose.yml that emulates your Cloud orchestrated with terraform/cloudformation.
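A sketch of that local setup, with stand-ins for the managed services the app would talk to in the cloud (service names, images, and credentials here are illustrative only):

```yaml
# Sketch: docker-compose.yml that emulates the cloud dependencies locally.
services:
  app:
    build: .
    ports: ["8080:8080"]
    environment:
      DATABASE_URL: postgres://dev:dev@db:5432/app
      S3_ENDPOINT: http://minio:9000   # local stand-in for S3
    depends_on: [db, minio]
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: dev
      POSTGRES_PASSWORD: dev
      POSTGRES_DB: app
  minio:
    image: minio/minio
    command: server /data
```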
I am sort of a k8s hater myself, because I've seen very simple and straightforward production pipelines, reasonably well understood by admins, turn into over-complicated shit with buggy deploy pipelines literally 10 times slower that no one really understands. All of this to manage maybe 10 nodes per service. All of that said, I cannot deny that these new solutions offer something the previous generation of Ansible scripts and AWS primitives did not: now we can move all of it to pretty much any infrastructure without changing much. And as much as I hate it, I don't really have an answer to "what else, if not Kubernetes?" that doesn't feel a little bit dishonest. I seriously would like to hear one.
So if you build the right interface abstractions around those components, it gets you a long way.
Like Yolodyne Cybernetrix
It is a fascinating dynamic, however, that generates these outcomes where large numbers of people collectively settle on something that the majority of them seem to hate.
Kubernetes is a relatively simple system with few concepts. You have manifests stored in etcd, behind the API server, and various controllers that act on these manifests. Some controllers (Deployment, StatefulSet, etc.) come standard out of the box, some are custom and added later. The basic unit of computation is a Pod, and DNS is provided with Services. Cluster administrators need to worry about the networking and storage layers, not cluster users. Honestly, that's pretty much it! Really not so complicated.
Now, does that help you write a manifest for the Deployment controller? No, and neither does it help you autoscale the Deployment via writing a manifest for the HorizontalPodAutoscaler controller, or setting up a load balancer by writing a manifest for the Ingress controller. But I wouldn't call the UNIX model complex because Linux distributions and package managers add complexity.
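For concreteness, a minimal pair of those manifests might look like this (names, image, and thresholds are placeholders):

```yaml
# Sketch: a Deployment manifest, reconciled into Pods by the Deployment
# controller out of the box.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels: {app: web}
  template:
    metadata:
      labels: {app: web}
    spec:
      containers:
        - name: web
          image: example/web:1.0
          ports: [{containerPort: 8080}]
---
# And the autoscaling manifest mentioned above, acted on by the
# HorizontalPodAutoscaler controller.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef: {apiVersion: apps/v1, kind: Deployment, name: web}
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target: {type: Utilization, averageUtilization: 70}
```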
k8s is complex not unnecessarily, but because k8s is solving a large host of problems. It isn't JUST solving the problem of "what should be running where". It's solving problems like "how many instances should be where? How do I know what is good and what isn't? How do I route from instance A to instance B? How do I flag when a problem happens? How do I fix problems when they happen? How do I provide access to a shared resource or filesystem?"
It's doing a whole host of things that are often ignored by shade throwers.
I'm open to any solution that's actually simpler, but I'll bet you that by the time you've reached feature parity, you end up with the same complex mess.
The main critique I'd throw at k8s isn't that it's complex, it's that there are too many options to do the same thing.
Unfortunately unless you've got a lot of k8s experience that scale/complexity lower bound isn't super obvious. It's also possible to have your scale/complexity accelerate from "k8s isn't worthwhile" to "oh shit get me some k8s" pretty quickly without obvious signs. That just compounds the TMTOWTDI choice paralysis problems.
So you get people that choose k8s when it doesn't make sense and have a bad time and then throw shade. They didn't know ahead of time it wouldn't make sense and only learned through the experience. There's a lot of projects like k8s that don't advertise their sharp edges or entry fee very well.
Maybe compared to Heroku or similar, but compared to a world where you're managing more than a couple of VMs I think Kubernetes becomes compelling quickly. Specifically, when people think about VMs they seem to forget all of the stuff that goes into getting VMs working which largely comes with cloud-provider managed Kubernetes (especially if you install a couple of handy operators like cert-manager and external-dns): instance profiles, AMIs, auto-scaling groups, key management, cert management, DNS records, init scripts, infra as code, ssh configuration, log exfiltration, monitoring, process management, etc. And then there's training new employees to understand your bespoke system versus hiring employees who know Kubernetes or training them with the ample training material. Similarly, when you have a problem with your bespoke system, how much work will it be to Google it versus a standard Kubernetes error?
Also, Kubernetes is really new and it is getting better at a rapid pace, so when you're making the "Kubernetes vs X" calculation, consider the trend: where will each technology be in a few years. Consider how little work you would have to do to get the benefits from Kubernetes vs building those improvements yourself on your bespoke system.
If you have just a single monolith app (such as a wordpress app) then sure, k8s is overkill. Even if you have 1000 instances of that app.
It's once you start having something like 20+ distinct services that k8s starts paying for itself.
Are you referring to instances of your application, or EC2 instances? If instances of your application, in my experience it doesn't really do much for you unless you are willing to waste compute resources. It takes a lot of dialing in to effectively colocate multiple pods and maximize your resource utilization. If you're referring to EC2 instances, well, AWS autoscaling does that for you.
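Most of that dialing-in happens in the per-container resource stanza, since the scheduler bin-packs on requests while limits cap actual usage (the numbers below are illustrative only):

```yaml
# Sketch: requests drive scheduling/bin-packing; limits are hard ceilings.
containers:
  - name: api
    image: example/api:1.0   # placeholder image
    resources:
      requests:        # what the scheduler packs nodes against
        cpu: 250m
        memory: 256Mi
      limits:          # CPU overage is throttled; memory overage is OOM-killed
        cpu: "1"
        memory: 512Mi
```

Set requests too high and nodes sit half-empty; too low and pods get throttled or evicted, which is exactly the tuning burden described above.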
Amazon and other cloud providers have the advantage of years of tuning their virtual machine deployment strategies to provide maximum insulation from disruptive neighbors. If you are running your own Kubernetes installation, you have to figure it out yourself.
> How do I know what is good and what isn't?
Autoscaling w/ a load balancer does this trivially with a health check, and it's also self-healing.
> How do I route from instance A to instance B?
You don't have to know or care about this if you're in a simple VPC. If you are in multiple VPCs or a more complex single VPC setup, you have to figure it out anyway because Kubernetes isn't magic.
> How do I flag when a problem happens?
Probably a dedicated service that does some monitoring, which as far as I know is still standard practice for the industry. Kubernetes doesn't make that go away.
> How do I fix problems when they happen?
This is such a generic question that I'm not sure how you felt it could be included. Kubernetes isn't magic, your stuff doesn't always just magically work because Kubernetes is running underneath it.
> How do I provide access to a shared resource or filesystem?
Amazon EFS is one way. It works fine. Ideally you are not using EFS and prefer something like S3, if that meets your needs.
> It's doing a whole host of things that are often ignored by shade throwers.
I don't think they're ignored; I think you assume they are because those things aren't talked about. They aren't talked about because they aren't an issue with Kubernetes.
The problem with Kubernetes is that it is a massively complex system that needs to be understood by its administrators. The problem it solves overlaps nearly entirely with existing solutions that it depends on. And it introduces its own set of issues via complexity and the breakneck pace of development.
You don't get to just ignore the underlying cloud provider technology that Kubernetes is interfacing with just because it abstracts those away. You have to be able to diagnose and respond to cloud provider issues _in addition_ to those that might be Kubernetes-centric.
So yes, Kubernetes does solve some problems. Do the problems it solves outweigh the problems it introduces? I am not sure about that. My experience with Kubernetes is limited to troubleshooting issues with Kubernetes ~1.6, which we got rid of because we regularly ran into annoying problems. Things like:
* We scaled up and then back down, and now there are multiple nodes running 1 pod and wasting most of their compute resources.
* Kubernetes would try to add routes to a route table that was full, and attempts to route traffic to new pods would fail.
* The local disk of a node would fill up because of one bad actor and impact multiple services.
At my workplace, we build AMIs that bake-in their Docker image and run the Docker container when the instance launches. There are some additional things we had to take on because of that, but the total complexity is far less than what Kubernetes brings. Additionally, we have the side benefit of being insulated from Docker Hub outages.
This raises the question of whether there is a right or wrong way of doing things, and whether a single system can adapt fast enough to the rapidly changing underlying strategies, protocols, and languages to always be at the forefront of what is considered best practice at all levels of development and deployment.
These unified approaches usually manifest themselves as each cloud provider's best-practice playbooks, but each public cloud is different. Unless something like Kubernetes can build a unified approach across all cloud providers and self-hosting solutions, it will always be overly complex, because it will always be changing for each provider to maximize their interest in adding their unique services.
And then there are all of the different kinds of resources and the general UX problem of managing errors ("I created an ingress but I can't talk to my service" is a kind of error that requires experience to understand how to debug because the UX is so bad, similarly all of the different pod state errors). It's not fundamentally complex, however.
The bits that are legitimately complex seem to involve setting up a Kubernetes distribution (configuring an ingress controller, load balancer provider, persistent volume providers, etc) which are mostly taken care of for you by your cloud provider. I also think this complexity will be resolved with open source distributions (think "Linux distributions", but for Kubernetes)--we already have some of these but they're half-baked at this point (e.g., k3s has local storage providers but that's not a serious persistence solution). I can imagine a world where a distribution comes with out-of-the-box support for not only the low level stuff (load balancers, ingress controllers, persistence, etc) but also higher level stuff like auto-rotating certs and DNS. I think this will come in a few years but it will take a while for it to be fleshed out.
Beyond that, a lot of the apparent "complexity" is just ecosystem churn--we have this new way of doing things and it empowers a lot of new patterns and practices and technologies and the industry needs time and experience to sort out what works and what doesn't work.
To the extent I think this could be simplified, I think it will mostly be shoring up conventions, building "distributions" that come with the right things and encourage the right practices. I think in time when we have to worry less about packaging legacy monolith applications, we might be able to move away from containers and toward something more like unikernels (you don't need to ship a whole userland with every application now that we're starting to write applications that don't assume they're deployed onto a particular Linux distribution). But for now Kubernetes is the bridge between old school monoliths (and importantly, the culture, practices, and org model for building and operating these monoliths) and the new devops / microservices / etc world.
I've been trying nomad lately and it's a bit more direct.
I've had a similar experience with Cassandra. Using Cassandra at Netflix was a joy because it always just worked. But there was also a team of engineers who made sure that was the case. Running it elsewhere was always fraught with peril.
I do understand people's complaints, however.
Setting up "the rest" of the system involves making a lot of decisions. Observability requires application support, and you have to set up the infrastructure yourself. People generally aren't willing to do that, and so are upset when their favorite application doesn't work their favorite observability stack. (I remember being upset that my traces didn't propagate from Envoy to Grafana, because Envoy uses the Zipkin propagation protocol and Grafana uses Jaeger. However, Grafana is open source and I just added that feature. Took about 15 minutes and they released it a few days later, so... the option is available to people that demand perfection.)
Auth is another issue that has been punted on. Maybe your cloud provider has something. Maybe you bought something. Maybe the app you want to run supports OIDC. To me, the dream of the container world is that applications don't have to focus on these things -- there is just persistent authentication intrinsic to the environment, and your app can collect signals and make a decision if absolutely necessary. But that's not the way it worked out -- BeyondCorp style authentication proxies lost to OIDC. So if you write an application, your team will be spending the first month wiring that in, and the second month documenting all the quirks with Okta, Auth0, Google, Github, Gitlab, Bitbucket, and whatever other OIDC upstreams exist. Big disaster. (I wrote https://github.com/jrockway/jsso2 and so this isn't a problem for me personally. I can run any service I want in my Kubernetes cluster, and authenticate to it with my FaceID on my phone, or a touch of my Yubikey on my desktop. Applications that want my identity can read the signed header with extra information and verify it against a public key. But, self-hosting auth is not a moneymaking business, so OIDC is here to stay, wasting thousands of hours of software engineering time a day.)
Ingress is the worst of Kubernetes' APIs. My customers run into Ingress problems every day, because we use gRPC and keeping HTTP/2 streams intact from client to backend is not something it handles well. I have completely written it off -- it is underspecified to the point of causing harm, and I'm shocked when I hear about people using it in production. I just use Envoy and have an xDS layer to integrate with Kubernetes, and it does exactly what it should do, and no more. (I would like some DNS IaC though.)
Many things associated with Kubernetes are imperfect, like Gitops. A lot of people have trouble with the stack that pushes software to production, and there should be some sort of standard here. (I use ShipIt, a Go program to edit manifests https://github.com/pachyderm/version-bump, and ArgoCD, and am very happy. But it was real engineering work to set that up, and releasing new versions of in-house code is a big problem that there should be a simple solution to.)
Most of these things are not problems brought about by Kubernetes, of course. If you just have a Linux box, you still have to configure auth and observability. But also, your website goes down when the power supply in the computer dies. So I think Kubernetes is an improvement.
The thing that will kill Kubernetes, though, is Helm. I'm out of time to write this comment but I promise a thorough analysis and rant in the future ;)
Let me rephrase that. ONE of Helm's biggest problems is that it uses text-based templating, instead of some sort of templating system that understands the thing it's actually trying to template.
This makes some things much, MUCH harder than they need to be.
It makes it really hard to have your configuration bridge things like "you have this much RAM" or "this is the CPU you have" to flags or environment variables that your code can understand.
It also makes it hard to compose configuration.
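A tiny, hypothetical illustration of the text-templating problem (Python's string.Template standing in for Go's text/template): because substitution is purely textual, a multi-line value silently breaks the YAML structure it lands in.

```python
from string import Template  # stand-in for Helm's Go text/template

# A chart-style template that splices a value into YAML by plain text
# substitution, with no awareness of YAML structure.
manifest = Template("""env:
  $extra""")

# A multi-line value: only the first line inherits the template's
# indentation, so the rendered document is no longer valid YAML.
extra = "- name: A\n  value: '1'\n- name: B\n  value: '2'"
rendered = manifest.substitute(extra=extra)
print(rendered)
```

This is exactly why Helm charts end up littered with `indent`/`nindent` pipeline calls: the templating layer has to be told, manually, how the text it emits will nest.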
As much as I don't like BCL, it is depressingly good at being a job configuration language for "run things in the cloud".
On top of all that, the value that Helm delivers to people is "you don't have to read the documentation for Deployment to make a Deployment". But then you have to debug that, and you have another layer of complexity bundled on top of your already weak understanding of the core.
Like I get that Kubernetes asks you a lot of questions just to run a container. But they are all good questions, and the answers are important. Just answer the questions and be happy. (Yes, you need to know approximately how much memory your application uses. You needed to know that in the old pet computer era too -- you had to pick some amount to buy at the memory store. Now it's just a field in a YAML file, but the answer is just as critical. A Helm chart can set guesses, and if that makes you feel better, maybe that's the value it delivers. But one day, you'll find the guess is wrong, and realize you didn't save any time.)
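For reference, the memory question shows up as just a few lines in the pod spec -- the numbers below are illustrative guesses, which is rather the point of the comment above:

```yaml
resources:
  requests:
    memory: "256Mi"   # what the scheduler reserves for the container
    cpu: "250m"
  limits:
    memory: "512Mi"   # exceed this and the container is OOM-killed
```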
For gRPC and HTTP/2: you're doing end-to-end gRPC (i.e., the TCP connection goes from a user's browser all the way to your backend, without being terminated or proxied)?
At work we have a service that communicates to clients over gRPC (the CLI app is a gRPC client). We typically deploy that as two ports on the load balancer, one for gRPC and the other for HTTPS. Again, the TCP connection isn't actually preserved while transiting the load balancer, but it's logically a L4 operation -- one client channel is one server channel. If the backend becomes unhealthy, you'll have to open a new channel to the load balancer to get a different backend. (This doesn't really come up for us, because people mostly run a single replica of the service.)
There is a lot of innovation possible in this space.
Too much of a cliffhanger! Now I want to know your POV :)
So, yes, we need to know.
Granted, I have to assume that borg-sre, etc. etc. are doing a lot of the necessary basic work for us, but as far as the final experience goes?
95% of cases could be better solved by a traditional approach. NixOps maybe.
The only example I can think of where a modern community is actively seeking to simplify things is Clojure. Rich Hickey is very clear on the problem of building more and more complicated stuff and is actively trying to create software by composing genuinely simpler parts.
Nobody is puppeteering some grand master plan, we're on a journey of discovery. When we're honest with ourselves, we realize nobody knows what will stick and what won't.
Discovery is very rarely an accidental process so we can't take for granted that it will be inevitable.
I think it's important to recognize that most people are not interested in discovery at all. Practitioners are often not explorers, and that's okay. They may find incremental improvements through their practice, but paradigm shifting innovation comes from those willing to swim against the stream of popular opinion.
Discovery has to be an intentional pursuit of those brave enough to imagine a future beyond Multics/Kubernetes/etc despite the torrent of opinionated naysayers telling them they are foolish for even trying.
Nobody gets anything difficult right on the first try, and there’s an arrogance in thinking we could.
> Essentially, this means that it [k8s] will have fewer concepts and be more compositional.
Well, that's already the case ! At its base, k8s is literally a while loop that converges resources to wanted states.
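A minimal sketch of that while loop (hypothetical names, reduced to replica counts): compare desired state to observed state and emit whatever actions converge the two.

```python
def reconcile(desired, actual):
    """One pass of a Kubernetes-style control loop: diff desired
    replica counts against observed ones and emit corrective actions."""
    actions = []
    for name, want in desired.items():
        have = actual.get(name, 0)
        if have < want:
            actions.append(("create", name, want - have))
        elif have > want:
            actions.append(("delete", name, have - want))
    return actions

# Desired state comes from the API server; actual state from
# watching the world. A real controller runs this forever.
print(reconcile({"web": 3, "worker": 2}, {"web": 1, "worker": 4}))
```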
You CAN strip it down to your liking. However, as it is usually distributed, it would be useless to distribute it with nothing but the scheduler and the API ...
I do get the author's point. At a certain point it becomes bloated. But I find that when used correctly, it is adequately complex for the problems it solves.
We are really at the infancy of containerization. Kube is a springboard for doing the next big thing.
Kubernetes reminds me a lot of XML; there are too many decision points adding unnecessary complexity for the average user's needs. Too many foot guns. Too many unintuitive things.
People keep on describing it as "declarative", which seems to be about as true as saying that Java is a functional language. Hopefully someday we'll have something actually declarative, and much more intuitive, something more like AWS's CDK.
I don’t disagree about the exposed complexity, that’s a fundamental decision Kubernetes made about openness and extensibility. Everything is on a level playing field, there are no private APIs.
In my experience Terraform and CDK are much more declarative: you never issue commands to delete a pod or a load balancer or similar. Instead you describe what you want, and their engine figures out what it needs to add, remove, or change to get to that state.
For example if you edit a deployment, it will create a new ReplicaSet and new pods and do a gradual rollout from the old one.
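That rollout behavior is itself declared on the Deployment; a typical zero-downtime setting looks something like this (values illustrative):

```yaml
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1         # at most one extra pod during the rollout
    maxUnavailable: 0   # never dip below the desired replica count
```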
There’s corner cases where a controller won’t let you edit certain fields of a resource because they didn’t cover that case, but that’s relatively rare.
Deleting a pod, which IME isn't too common day to day but can be useful to recover from some failure conditions (usually low-level problems with the node, storage, or network), is also a demonstration of declarative reactions at work: if it was created by a controller, it will be immediately recreated. Pods are meant to be ephemeral.
Terraform certainly is declarative, but it isn't typically used as an engine that enables high availability and autoscaling by scanning its declarative state and comparing it to the real world. This is what Kubernetes excels at: continually scanning and reacting to changes in the world. Terraform I have found to be tricky to run continuously; any out-of-band state change can lead to it blowing away your resources.
Example case: DevOps pushed out a new version of Istio (without talking with anyone) and even though the container configs are referencing the new version of Istio, only half of the pods in the namespace got restarted, so we get paged because a number of services can't make any network connections with the other services. Had to manually delete all the pods, and then the new pods all came up with the right version of Istio and are able to communicate again.
On a side note: how is it at all acceptable to have a networking "mesh" that isn't backwards compatible? I can count on no hands the number of times that my fargate/lambda services couldn't communicate because half of my fleet is running a different version of VPC. Thus far my experience with Istio is that it has never added any business value (for projects I've been involved in), and only adds complexity, headaches, and downtime.
Back to the declarative thing: I'm fairly confident I've edited service configs, added service configs, edited the container image, and container environment variables, and never saw kubernetes restart anything automatically; had to manually delete.
The issue there is that it literally needs to rewrite the pod YAML to inject the sidecar envoy proxy. So say you want to upgrade Istio. Well Istio needs to change the Pod spec, and it doesn’t do this automatically. If you look at the upgrade instructions here: https://istio.io/latest/docs/setup/upgrade/in-place/#upgrade...
Step 6 is “After istioctl completes the upgrade, you must manually update the Istio data plane by restarting any pods with Istio sidecars:
$ kubectl rollout restart deployment”
Istio can be useful (most security teams want it for auto-mTLS, it can also save you from firewall hell by using layer 7 authorization policies, and it can do failover across DCs pretty well) but it is crazy to use on its own as unsupported vanilla OSS without a distro like Solo, Tetrate, Tanzu, Kong, etc., or without significant automation to make upgrades transparent. Istio is often very frustrating to me because of cases like yours: it's too easy to make a mess of it. There are much easier approaches that cover 80% (an ingress controller like Contour or nginx + cert-manager).
On editing configs, one area Kubernetes does NOT react to is ConfigMaps and Secrets being updated. Editing an Image or Env var in a ReplicaSet or Deployment will definitely trigger a pod recreate (I see this daily).
Though take a look at Kapp (https://carvel.dev/kapp/) which provides clearer rollout visibility and can version ConfigMaps + trigger reactions to them updating, also there is Reloader https://github.com/stakater/Reloader
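If you're on Helm, the widely used workaround is to hash the ConfigMap into a pod-template annotation, so any config change alters the pod spec and forces a rollout -- a sketch of the checksum pattern from Helm's own tips (file names here are hypothetical):

```yaml
# deployment.yaml (Helm template)
spec:
  template:
    metadata:
      annotations:
        checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
```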
I really enjoy the Oil Blog, and when I clicked the link I was really looking forward to some good, real criticism. But it feels to me like most of the criticism I see: highly emotional, really averse/afraid/reactionary. It wants something easier and simpler, which is so common.
I cannot emphasize enough, just do it anyways. There's a lot of arguments from both sides about trying to assess what level of complexity you need, about trying to right size what you roll with. This outlook of fear & doubt & skepticism I think does a huge disservice. A can do, jump in, eager attitude, at many levels of scale, is a huge boon, and it will build skills & familiarity you will almost certainly be able to continue to use & enjoy for a long time. Trying to do less is harder, much harder, than doing the right/good/better job: you will endlessly hunt for solutions, for better ways, and there will be fields of possibilities you must select from, must build & assemble yourself. Be thankful.
Be thankful you have something integrative, be thankful you have common cloud software you can enjoy that is cross-vendor, be thankful there are so many different concerns that are managed under this tent.
The build/deploy pipeline is still a bit rough, and you'll have to pick/build it out. Kubernetes manifests are a bit big in size, true, but it's really not a problem; they're there for basically good purpose, and some refactoring wouldn't really change what they are. There are some things that could be better. But getting started is surprisingly easy, surprisingly not heavy. There's a weird emotional war going on; it's easy to be convinced to be scared, to join in with reactionary behaviors, but I really have seen nothing nearly so well composed, nothing that fits together so many different pieces well, and Kubernetes makes it fantastically easy IMO to throw up a couple containers and have them just run, behind a load balancer, talking to a database, which covers a huge amount of our use cases.
TBH I think the graphical web browser is the current generation's Multics. Something that is overly complex, corporatised, and capable of being replaced by something simpler.
I am not steeped in Kubernetes or its reason for being but it sounds like it is filling a void of shell know-how amongst its audience. Or perhaps it is addressing a common dislike of the shell by some group of developers. I am not a developer and I love the shell.
It is one thing that generally does not change much from year to year. I can safely create things with it (same way people have made build systems with it) that last forever. These things just keep running from one decade to the next no matter what the current "trends" are. Usually smaller and faster, too.
If you use the stable APIs, your code will run for decades. My hypothetical deployment from 2016 will not need touching (beyond image updates for CVEs) to keep running in 2026 or 2036.
However Multics didn't offer automatic/elastic cloud scaling, which seems to be the main selling point of modern, usually very complicated, container orchestration systems, nor was it designed for building distributed systems.
However, if modern Linux had a Multics-style ring architecture, it could replace many of the uses for virtualization and containers.
"Since we chose the path of virtualization and containerization we've allowed the multi-tenancy facilities in Unix to atrophy and it would take a little bit of work to bring them back into form."
Unfortunately many of us are suffering with Kube.
Your own custom-built solution will work, but what about in 5 years? 10 years? When it all becomes legacy, what then?
Will you find the talent who'll want to fix your esoteric environment, just like those COBOL devs?
Will anyone respond to your job posts to fix your snowflake environment? Will you pay above-average wages to fix your snowflake ways of solving problems that k8s standardized?
I bet your C-level is thinking this. What's to say they won't rip out all of your awesomeness and replace it with standard k8s down the line, as it's dominating the market share?
When you're laid off in the next recession, is your amazing problem-solving on your snowflake environment going to help you when everyone else is fully well versed with k8s?
And honestly its complexity is way overblown. There's like 10 important concepts and most of what you do is run "kubectl apply -f somefile.yaml". I mean, services are DNS entries, deployments are a collection of pods, pods are a self contained server. None of these things are hard?
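As a rough illustration of those concepts (names and image are made up), the whole thing for a small stateless service is about this much YAML:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 2
  selector:
    matchLabels: {app: web}
  template:
    metadata:
      labels: {app: web}
    spec:
      containers:
      - name: web
        image: nginx:1.25          # placeholder image
        ports: [{containerPort: 80}]
---
apiVersion: v1
kind: Service                      # a stable DNS name for the pods above
metadata:
  name: web
spec:
  selector: {app: web}
  ports: [{port: 80}]
```

`kubectl apply -f` that file and you have a replicated server behind a cluster DNS entry.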
Oh deploying on the cloud? Cloudformation/AzureRM as well.
Pretty easy. No damn complex k8s needed.
Wikipedia: Written in PL/I, Assembly language
Additionally, you can go over to Multicians and read the security assessment reports of Multics vs. UNIX done by the DoD, back in the day.
And the alternative of doing everything yourself isn't too much better either, you need to learn all sorts of cloud concepts.
The better alternative is a higher level abstraction that takes care of all of this for you, so an average engineer building an API does not need to worry about all these low level details, kind of like how serverless completely removed the need to deal with instances (I'm building this).
The new concepts are leaky abstractions -- they wrap the old ones badly. You still have to understand both to understand the system. Networking in k8s seems to really suffer from this.
And the new concepts and old concepts don't compose. They create combinatorial problems, i.e. O(M*N) amounts of glue code.
It's even better when it's a busybox-based image, for that Linksys-router/80s-Unix troubleshooting experience.
It's an argument about avoiding O(M*N) glue code. O(M*N) amounts of code are expensive to write, and contain O(M*N) numbers of bugs.
But...there are tons of reliable systems at Google, all using Borg, and that has a lot of features Kubernetes doesn't have.
Stripping down Kubernetes doesn't reduce complexity. It just shifts it.
I also disagree that the systems are reliable. From the outside, most of the stateless services are fast and reliable; the stateful ones less so. From the inside, no: internal services were unreliable and slow. (This could have changed in the last 5 years, but there was a clear trend in one direction in my time there.) There were many more internal services on Borg than external ones.
Your UNIX system runs many daemons you don't have to care about. Whereas something like lockserver configuration is still a thing you have to care about if you're running Kubernetes.
Key insight can be summarized as "code the perimeter"
Sketch of the argument here, with links: http://www.oilshell.org/blog/2021/07/blog-backlog-1.html#con...
Here's my comment which links the "Unix vs. Google" video (and I very much agree based on my first hand experience with Google's incoherent architecture, which executives started to pay attention to in various shake-ups.)
It links to my comment about the closely related "narrow waist" idea in networks and operating systems. That is a closely related concept regarding scaling your "codebase" and interoperability.
I have been looking up the history of this idea. I found a paper co-authored by Eric Brewer which credits it to Kleinrock:
http://bnrg.eecs.berkeley.edu/~randy/Papers/InternetServices... (was this ever published? I can't find a date or citations)
But I'm not done with all the research. I'm not sure if it's worth it to write all this, but I think it's interesting I will learn something by explaining it clearly and going through all the objections.
I'm definitely interested in the input of others. I have about 10 different resources where people are getting at this same scaling idea, but I can use more arguments / examples / viewpoints.
> The industry is full of engineers who are experts in weirdly named "technologies" (which are really just products and libraries) but have no idea how the actual technologies (e.g. TCP/IP, file systems, memory hierarchy etc.) work. I don't know what to think when I meet engineers who know how to setup an ELB on AWS but don't quite understand what a socket is...
> Look closely at the software landscape. The companies that do well are the ones who rely least on big companies and don’t have to spend all their cycles catching up and reimplementing and fixing bugs that crop up only on Windows XP.
Who, today, can write or optimize assembly by hand? How about understand the OS internals? How about write a compiler? How about write a library for their fav language? How about actually troubleshoot a misbehaving *nix process?
All of these were table stakes at some point in time. The key is not to understand all layers perfectly. The key is to know when to stop adding layers.
Regarding your points, I actually would expect a non-junior developer to be able to write a library in their main language and understand the basics of OS internals (to the point of debugging and profiling, which would include troubleshooting *nix processes). I don't expect them to know assembly or C, or be able to write a compiler (although I did get this as a take-home test just last week).
Being able to glue frameworks together to build systems is actually not a negative. If you're a startup, you want people to leverage what's already available.
I like to get deep into low level stuff, but my employer doesn't care if I understand how a system call works or whether we can save x % of y by spending z time on performance profiling that requires good knowledge of Linux debugging and profiling tools. It's quicker, cheaper and more efficient to buy more hardware or scale up in public cloud and let me use my time to work on another project that will result in shipping a product or a service quicker and have direct impact on the business.
My experience with the (startup) business world is that you need to be first to ship a feature or you lose. If you want to do something then you should use the tools that will allow you to get there as fast as possible. And to achieve that it makes sense to use technologies that other companies utilise because it's easy to find support online and easy to find qualified people that can get the job done quickly.
It's a dog-eat-dog world and startups in particular have the pressure to deliver and deliver fast since they can't burn investor money indefinitely; so they pay a lot more than large and established businesses to attract talent. Those companies that develop bespoke solutions and build upon them have a hard time attracting talent because people are afraid they won't be able to change jobs easily and these companies are not willing to pay as much money.
Whether you know how a boot process works or how to optimise your ELK stack to squeeze out every single atom of resource is irrelevant. What's required is to know the tools to complete a job quickly. That creates a divide in the tech world where on one side you have high-salaried people who know how to use these tools but don't really understand what goes on in the background and people who know the nitty-gritty and get paid half as much working at some XYZ company that's been trading since the 90s and is still the same size.
My point is that understanding how something works underneath is extremely valuable and rewarding but isn't required to be good at something else. Nobody knows how Android works but that doesn't stop you from creating an app that you will generate revenue and earn you a living. Isn't the point of constant development of automation tools to make our jobs easier?
Let's say you have a network performance issue because the framework you were using was misusing epoll, set some funky options with setsockopt, or turned on Nagle's algorithm. A person can figure it out, but it's gonna be a slog, whereas if they had experience working with the lowest-level tools, the person could have an intuition about how to debug the issue.
An engineer doesn't have to write everything with the lowest-level primitives all the time, but if they have NEVER done it, then IMO that's an issue.
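As a small example of the kind of low-level poking meant here: checking (or setting) Nagle's algorithm on a socket from Python, a knob a framework might well flip silently on your behalf.

```python
import socket

# Create a TCP socket and disable Nagle's algorithm (TCP_NODELAY),
# trading batching efficiency for lower per-write latency.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Reading the option back is how you'd verify what a framework did.
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
print("TCP_NODELAY enabled:", bool(nodelay))
sock.close()
```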
The point being that maybe it’s fine if there are a lot of people who only know how to glue frameworks together if they know enough to build useful products. Let all of them try; some of them might very well make it.
1. Working as a programmer perspective: I worked at a company with good practices but so-so revenue. What happens: horribly underpaid salary, nice laptop (but not the one I want), nice working conditions. I am now working at a company with pretty great revenue and mediocre practices. What happens: good salary, I get the laptop I want (not the one I need), working conditions are mediocre.
2. UX perspective (I did a bootcamp for fun): UX'ers make throwaway prototypes all the time in order to validate a certain hypothesis. When that's done, they create the real thing (or make another bigger throwaway prototype).
I feel this is the best approach, from a business standpoint. This also means you have different kind of developers and it depends on the stage what kind they are. I'd separate it as prototype stage, mid-stage and massive scale stage.
This is what most software development is becoming. We are no longer building software, we are gluing/integrating prebuild software components or using services.
You no longer solve fundamental problems unless you have a very special use case or for fun. You mostly have to figure out how to solve higher level problems using off-the-shelf components. It's both good and bad if you ask me (depends at what part of the glass you're looking at).
Thankfully I can use these cool processors to build the next CandyCrush and shine in our modern and innovative society.
Some argue it's unnecessary complexity, but I don't think that's correct. Even individuals want more than a basic geo cities website. Businesses want uptime, security, flashy, etc... in order to stand out.
That's what I expect from someone who graduated from a serious CS/Engineering program.
I'm going to parrot the GP: "That's what I expect from someone who graduated from a serious CS/Engineering program."
I know there are a lot of really bad CS programs in the US, but some experience implementing OS components in a System course so that they can "hack into the OS when needed" is exactly what I would expect out of a graduate from a good CS program.
That's surprising. Recent grads?
The discussion centers on the following expectation of graduates from strong CS programs.
> having working knowledge and being able to hack into the OS when needed.
Now, the courses from the listed schools may prepare some students, but I am simply reporting that I have met numerous graduates who state very explicitly:
- they are not comfortable with a variety of operating system concepts
- they are not comfortable interacting with operating systems in any depth
I don't have a big diverse data set, but the impression given is that if you expect this level of expertise you will be disappointed regularly. If the strongest CS programs pre-selecting for smart and driven students can't reliably impart that skillset, why would I expect other schools to?
For context, the original quote was:
> How about understand the OS internals? How about write a compiler? How about write a library for their fav language? How about actually troubleshoot a misbehaving *nix process?
Writing a compiler, writing a library for their fav language, and troubleshooting a misbehaving *nix process are all examples of things I would definitely expect a CS major to have done at some point.
A SoTA compiler for Rust or whatever? Ok, no. But, you know, a compiler.
Ditto for library -- better than the standard lib? Ok, no. But, you know, a standard lib that's good enough.
Ditto for debugging *nix processes. Not a world-class hacker, just, you know, capable of debugging a process.
I guess the other examples in that quote seem to suggest that "OS internals" probably means something like "knowledge at the level of a typical good OS course".
And who knows what those people meant by "comfortable interacting with operating systems in any depth". There could also be some reverse D-K effect going on here... "I got a B- in CMU's OS course" still puts you very well into the category of "understand the OS internals", IMO.
Ex-Amazon here. You are describing standard skills required to pass an interview for a SDE 2 in the teams I've been in at Amazon.
Some candidates know all the popular tools and frameworks of the month but do not understand what an OS does, or how a CPU works or networking and do not get hired because they would struggle to write or debug internal software written from scratch.
[added later] This was many years ago when the bar raiser thing was in full swing and in teams working on critical infrastructure.
Also, gate-keeping is not helpful.
This term is really getting over-used. The purpose of job interviews is to decide who gets to pass through the gate. It is literally keeping of a gate.
Software engineers, even the ones that are so superpowered that they :gasp: got a job at Amazon once in their life, can go an entire successful career without knowing how to use a kernel debugger, or understand iptables or ifconfig, or understand how virtual memory works.
Some engineers might need to know some of those things, but it is absolutely bonkers to claim that you could never progress past level 2 at Amazon without knowing such things. I know this because I once taught a senior principal engineer at Amazon how to use traceroute.
For many roles in Amazon (particularly the tens of thousands of SDE positions that will end up working with the JVM all day long), asking such low level questions about how OSes work is about as useful of a gatekeeping device as asking them whether white cheese tastes better than yellow cheese. And that's why the term gatekeeping is used.
Yes, if there is a nasty issue that needs to be debugged, understanding the lower layers is super helpful, but even without that knowledge you can figure out what's going on if you have general problem-solving abilities. I certainly have figured out a ton of issues in the internals of tools that I don't know much about.
Get off your high horse.
Playing devil's advocate, I guess it depends on what sort of software you're writing. If you're a JS dev then I can see why you might not care about pointers in C. I know for sure as a Haskell/C++ dev I run from JS errors like the plague.
However, I do think that people should have a basic understanding of the entire stack from the OS up. How can you be trusted to choose the right tools for a job if you're only aware of a hammer? How can you debug an issue when you only understand how a spanner works?
I think there's a case for an engineering accreditation (which isn't a CS degree) as we become even more dependent on software.
If someone is trying to debug that LB and doesn't know what a socket is, or debug latency in apps in the cluster and doesn't know how scheduling and perf-engineering tools work, then it's going to be hard for them. It's extremely likely that they will just jam 90% solution around 90% solution, enlarging the frame to do more and more, instead of actually fixing things, even if their specific problem was easy to fix and would have had a big payoff.
Erlang is what you can get when you try to design a coherent solution to the problem from a usability and first-principles sort of idea.
But some combination of Worse is Better, Path Dependence, and randomness (hg vs git) has led us here.
At least, as far as I've read about its design philosophy.
Complex problems often have complex solutions, the algorithm we need to run as developers is - what's the net complexity cost of my system if I use this tool?
If the tool isn't removing more complexity than it's adding, you probably shouldn't use it.
I have written C and C++ for decades, deployed it in production, and barely ever looked at assembly language.
Kubernetes isn't a good abstraction for what's going on underneath. The blog post linked to direct evidence of that which is too long to recap here; I worked with Borg for years, etc.
Source: I was a Google SRE for 5 years (Ads, Traffic). I ran the in-house Kubernetes clusters at a company for 3 years (so, no, no hosted Kubernetes; we stood them up either on pretty naked VMs or bare metal).
I think learning AWS/kubernetes/docker/pytorch/whatever framework is buzzing is easy if you understand Linux/networking/neural networks/whatever the underlying less-prone-to-change system is.
You could also read some books. Rami Rosen's "Linux Kernel Networking - Implementation and Theory" is quite detailed.
The "UNIX and Linux System Administration Handbook" (Nemeth et al.) covers a lot superficially and will point you in the right direction to continue studying. It's very practical-minded.
For low-level socket programming, you can probably read "Advanced Programming in the UNIX environment". It might be more detail than you need though.
At the other extreme, if you want to study distributed systems, you could read van Steen & Tanenbaum's "Distributed Systems".
I'm totally self-taught and have never worked a programming job (only programmed for fun). Do professional SWEs not actually understand or have the capability to do these things? I've hacked on hobby operating systems, written assembly, worked on a toy compiler and written libraries... I just kind of assumed that was all par for the course
This manifested in my frustration when I led the building of a new transport layer using just sockets. While the people working with me were smart, they had limited low-level experience to debug things.
Professional SWEs are professional in the sense that they know what needs to happen to get the job done (but I am not surprised when someone else doesn't get or know something that I consider "fundamental").
Pretty much any command I don't run several times a month, I look up. Unless ctrl+r finds it in my history.
All of these were still table stakes when I graduated from a small CS program in 2011. I'm still a bit horrified to discover they apparently weren't table stakes at other places.
Any one of the undergraduates who take the systems sequence at my University should be able to do all of this. At least the ones who earn an A!
But developers should understand what assembly is and what a compiler does. Writing a library for a language you know should be a common development task. How else are you going to reuse a chunk of code needed for multiple projects?
Certainly also need to have a basic understanding of unix processes to be a competent developer, too, I would think.
I understand how a car engine works. I could actually explain it to someone who doesn't know what's under the hood. Does that make me a car mechanic? Hell no. If my car breaks down, I go to the dealership and have them fix it for me.
My car/car engine is ASM/OS Internals/writing a compiler/etc.
Quickly getting up to speed on something you don't know yet is probably the single most critical skill to be a good engineer.
Once again: if you don't know your stack, you're just wasting performance everywhere, and you're just a code plumber.
Isn't that why MTU discovery exists?
> Write your software to take advantage of the platform it's on, and the stack beneath it
Sure, but those bits are usually still abstracted away; otherwise cross-compatibility or migrating to a different stack becomes a massive pain.
> The simple fact of using http2 might change your organisation from one fat file served from a CDN, into many that load in parallel and quicker.
Others have pointed out things like h2 push specifically; that was kind of what I meant with the "(much)" in my original comment. Even then, with something like nginx supporting server push on its end, whatever it's fronting could be effectively HTTP/2-unaware and still reap some of the benefits. I imagine it won't be long before there are smarter methods to transparently support this stuff.
God knows how a UDP-based HTTP is going to work, but these are considerations a 'Software Engineer' who works on web systems should think about.
In 95% of cases if you want to get something/anything done you will need to work at an abstraction layer where a lot of things have been decided already for you and you are just gluing them together. It's not good or bad. It is what it is.
Which says a lot about the situation we find ourselves in, I guess.
It is a process of commodification.
The problem with software development as a discipline is that it's all so new we don't have proper division of labor and professional standards yet. It's like if the people responsible for modeling structural integrity in the foundation of a skyscraper and the people who specialize in creating office furniture were all just called "construction engineers" and expected to have some common body of knowledge. Software systems span many layers and domains that don't have all that much in common with each other, but we all pretend we're speaking the same language to each other anyway.
I sometimes hate-joke/fantasize about nailing an SE candidate with an obscure BGP or esoteric DNS question and then being outwardly disappointed in his response, watching him realize he's going to lose this job over something I found completely reasonable to ask, but ultimately entirely useless to his position.
Developers who can only configure AWS are software operators using a product, not software engineers. There’s nothing wrong with that but if no one learns to build software, we’ll all be stuck funding Mr Bezos and his space trips for a long time.
This is absolutely fine, and one can draw parallels in software, as a mid level software engineer working in an AWS based environment wont generally need to know how to parse TCP packet headers, despite the software/infrastructure they work on requiring them.
Wait, what? Are you telling me that jet turbine blades are one single crystal instead of having the usual crystal structure in the metal?
"I don't know what to think when I meet engineers who know UNIX but don't quite understand assembly."
What you quoted is tantamount to the lament of a dinosaur that has ample time to observe the meteor approaching and yet refuses to move away from the blast zone.
Less facetiously: the history of progress in most domains, and especially computing, is in part a process of building atop successive layers of abstraction to increase productivity and unlock new value. Anyone who doesn't see this really hasn't been paying attention.
Can we provide an example that isn't also a big company? I'm not really thinking of big companies that don't either dogfood their own tech or rely on someone bigger to handle things they don't want to (Apple spends 30m a month on AWS, as an example). You could also make the argument that kind of no matter what route you take you're "relying on" some big player in some big space. What OS are the servers in your in-house data center running? Who's the core maintainer of whatever dev frameworks you might ascribe to (note: An employee of your company being the core maintainer of a bespoke framework that you developed in house and use is a much worse problem to have than being beholden to AWS ELB, as an example).
This kinda just sounds like knowledge and progress. We build abstractions on top of technologies so that every person doesn't have to know the nitty gritty of the underlying infra, and can instead focus on orchestrating the abstractions. It's literally all turtles. Is it important, when setting up a MySQL instance, to know how to write a lexer and parser in C++? Obviously not. But lexers and parsers are a big part of MySQL's ability to function, right?
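To make that turtle concrete, here's a toy lexer for a SQL-ish fragment (purely illustrative; MySQL's real lexer is hand-written C++ and far more involved):

```python
import re

# Token spec for a tiny SQL-like fragment. Order matters: earlier
# alternatives in the combined regex win.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_][A-Za-z0-9_]*"),
    ("SYMBOL", r"[*,;=()]"),
    ("WS",     r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))
KEYWORDS = {"SELECT", "FROM", "WHERE"}

def lex(sql: str):
    tokens = []
    for m in MASTER.finditer(sql):
        kind, text = m.lastgroup, m.group()
        if kind == "WS":
            continue  # whitespace separates tokens but isn't one
        if kind == "IDENT" and text.upper() in KEYWORDS:
            kind = "KEYWORD"  # promote reserved words
        tokens.append((kind, text))
    return tokens

print(lex("SELECT id FROM users WHERE id = 42;"))
```

A parser would then turn that flat token list into a tree; the point is that each layer only has to speak to the one directly beneath it.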
I know how to use it certainly, but how the hell it is implemented is more or less black magic to me.
Now that’s not to say I couldn’t learn how a socket works. It’s just never been at all relevant to performing my job.
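For what it's worth, the core lifecycle (socket, bind, listen, accept / connect, send, recv) is small enough to sketch. This is a loopback echo in Python, not anything tied to a particular job:

```python
import socket
import threading

def echo_once(srv: socket.socket) -> None:
    # Block until a client connects, then echo one read back.
    conn, _addr = srv.accept()
    with conn:
        conn.sendall(conn.recv(1024))

# create_server() wraps socket() + bind() + listen(); port 0 means
# the OS picks a free port for us.
srv = socket.create_server(("127.0.0.1", 0))
port = srv.getsockname()[1]
threading.Thread(target=echo_once, args=(srv,), daemon=True).start()

# create_connection() wraps socket() + connect() on the client side.
with socket.create_connection(("127.0.0.1", port)) as cli:
    cli.sendall(b"ping")
    reply = cli.recv(1024)
print(reply)  # b'ping'
srv.close()
```

What the kernel does underneath (TCP handshakes, buffers, retransmits) is a much deeper rabbit hole, but the API surface itself is learnable in an afternoon.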