Reasons Kubernetes is so complex (buttondown.email/nelhage)
428 points by bumbledraven on Jan 27, 2022 | hide | past | favorite | 277 comments



""" One could imagine a very imperative “cluster operating system,” like the above, which exposed primitives like “allocate 5 CPUs worth of compute” or “create a new virtual network,” which in turn backed onto configuration changes either in the system’s internal abstractions or into calls into the EC2 API (or other underlying cloud provider). """

There is a name for this: Single System Image. There is no need to write the operating system layer - Linux works well.

I built an SSI. Each grid can host several apps, 'gridapps'. As priorities change during the day, an operator can rebalance resources using a bash-like shell. Each gridapp has (1) core nodes, which need to keep running, and (2) scale nodes, which can be started and stopped. Nodes can be configured so they are taskset-bound to an individual processor, or share a processor with several other nodes. There are equivalent primitives for memory and persistent storage. Commercial, 1000 cores.

The author contrasts this with K8s by saying that K8s is declarative. This is not the important distinction. My system had declarative configuration, and did a fitting at startup.

The key difference is motive. The design above is the natural choice when you build a grid computer from first-principles.

K8s is designed to solve a different problem: scaling up the traditional model of programming. Here, a developer writes an algorithm on their workstation. The deployment circumstances and interactions with other systems are an afterthought. Often this developer does not deploy the code themselves; that is done by another team who don't understand the internals of the code.


>The author contrasts this with K8s by saying that K8s is declarative. This is not the important distinction.

I feel like declarative has been some kind of technical-manual meme for the last few years. I've lost count of how often I've read that something is better because it's modern and declarative, or that it's easy to understand because it's declarative, etc.

There's a similar thing with "functional". In that case I at least understand the intuition for why it's better: I/O behavior is consistent and you don't have to think about internal state, like in OOP. It is indeed much easier to divide and conquer when you can understand the parts of the whole individually. And as soon as you understand the parts you can piece them together.

With "declarative" I have yet to find the intuition for WHY it's supposedly better; until then it's kind of like the 90s, when the washing machine was "better" because it had "fuzzy logic".


> I don't know at this point how often I've read that something is better because it's modern and declarative. Why it's easy to understand because it's declarative, etc.

I feel most comments here miss the point of declarative systems - resilience.

You write a script to stand up an application on a clean system; a declarative system does the same. But what if the current state is not a blank slate? A declarative system is meant to be smart enough to reach the target state even if the machine crashed halfway through the setup last time, or already has a different config running.

Writing a shell script that addresses all edge cases is hard. Now, declarative systems do fail, but that's the promise.


I agree with the ‘promise’ of the declarative. A useful analogy is human teams. Effective leaders are often good at the declarative - they provide a vision or end state and the creative problem solving team members determine how to implement the stated vision in relation to the current state. The result is determined by the problem solving ability of the element to which the end state was given.


In other words, a declarative system has the capability to do a diff with the current system and apply changes to eliminate those differences.


No - imperative systems can do that too. Take Chef (an imperative tool) for example - it can trivially examine a tracked file and mutate the state if the state is not correct. This satisfies the "diff" criteria you mentioned above. Isn't Chef looking sort of declarative?

The nuance is subtle but important.

Let's say you stop tracking that file. In Chef, the file will simply remain in its last known state, because Chef forgot about it. Likewise, if you do the same process on other systems, files will be left in various states - causing configuration drift. This is what a declarative tool is meant to solve.

A proper declarative tool would _delete_ the file, because it's no longer tracked, to ensure that the system is in the state the user expects.
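
The distinction can be sketched in a few lines of Python. This is a toy model, not how Chef or any real configuration-management tool is implemented; the `reconcile` function and its arguments are invented for illustration:

```python
import os
import tempfile

def reconcile(root, desired):
    """Toy declarative reconciler: make `root` contain exactly the
    files declared in `desired` (a dict of filename -> contents)."""
    current = set(os.listdir(root))
    # Create/update tracked files so contents match the declaration.
    for name, contents in desired.items():
        with open(os.path.join(root, name), "w") as f:
            f.write(contents)
    # A declarative tool also removes what is no longer declared;
    # an imperative run-the-steps tool would simply leave it behind.
    for name in current - set(desired):
        os.remove(os.path.join(root, name))

root = tempfile.mkdtemp()
reconcile(root, {"a.conf": "x=1", "b.conf": "y=2"})
reconcile(root, {"a.conf": "x=1"})   # b.conf stops being tracked
print(sorted(os.listdir(root)))      # ['a.conf'] -- b.conf was deleted
```

The second `reconcile` call is the interesting one: the file drops out of the declaration, so the tool removes it rather than leaving it to drift.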


> With "declarative" I still have to find out what's the intuition WHY it's supposedly better,

Declarative is better because you write what you want to _have_, not what you want to change, which means you don't need explicit knowledge of the current state of the system, and you leave the complexity of figuring out the in-between to the tool you're using.


The problem with most black-box declarative interfaces is that you often do care about how things will happen. The real-world limitations often leak through the declarative interface, because you have to care about things like performance, or do things that don't quite fit into the opinionated declarative interface, or have other weird constraints that the authors didn't think of.

And then you're in a world of pain, where you have to not only know how to do what you want, but then also how the system will map from declarative->imperative, and then how to massage your declarative input so that it does the right imperative thing. And your teammates too. And you've all got to think about it every time you change the code or upgrade.

I've run into this problem in nearly every declarative system I've ever used (examples: SQL, Prometheus, infrastructure-as-code frameworks, dependency injection frameworks, ORMs, k8s, and so on).

Neither approach is "better". They have tradeoffs.


> The real-world limitations often leak through the declarative interface, because you have to care about things like performance, or do things that don't quite fit into the opinionated declarative interface, or have other weird constraints that the authors didn't think of.

But this is the case with opinionated imperative interfaces as well. It's irrelevant to the declarative vs imperative issue. This is a fundamental aspect of abstractions. There will always be things you can't do unless you're writing assembly on bare metal. And even then, assembly is an abstraction over microcode, and what the actual hardware is doing.


Any API can have issues, but an imperative API has some advantages in my opinion.

The boundaries of an imperative API are atomic and well-defined. When something goes wrong, you can look at the methods individually and figure out exactly where expectations and reality diverged.

With a declarative system you just put out everything there as one big chunk of state and hope the right thing happens. When the wrong thing happens, it's usually not clear if the problem is a bug with the engine or your understanding of it, and also not clear how to go about determining where the problem is.

To be clear, I am not totally against declarative systems, but I would be hesitant to use any declarative system if it isn't mature and well-tested, documented with priority, and in a space where the use cases are predictable.


I think it gets worse as an interface gets more declarative and tries harder to hide machinery and state (or tries less to expose it), either on principle or because it allows for extra safety or optimizations or evolution.


Probably the sweet spot is declarative with an imperative escape hatch.

The vast majority of time, you just want your SQL query to give you back the right rows and don't care how the DB engine manages it. In the rare case where it takes too long, then you want to be able to dig into the query plan, change the indexes, maybe even deploy a custom function written in a different language in extreme cases.
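
That escape hatch can be shown with SQLite (Python's stdlib `sqlite3` module, standing in for any SQL engine; table and index names are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (id INTEGER, state TEXT)")
con.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "CA"), (2, "FL"), (3, "CA")])

query = "SELECT id FROM customers WHERE state = 'CA'"

# Declarative: just ask for the rows; how is the engine's problem.
rows = con.execute(query).fetchall()

# Imperative escape hatch: inspect how the engine plans to run it...
plan = con.execute("EXPLAIN QUERY PLAN " + query).fetchall()
print(plan)  # a full table scan

# ...and nudge it with an index, without changing the query itself.
con.execute("CREATE INDEX idx_state ON customers (state)")
plan = con.execute("EXPLAIN QUERY PLAN " + query).fetchall()
print(plan)  # now searches via idx_state
```

The query text never changes; only when the declarative layer underperforms do you drop down and steer the plan.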


Both React and Webpack could be considered hybrid declarative/imperative approaches and have been very successful in the FE ecosystem


> The problem with most black-box declarative interfaces is that you often do care about how things will happen. The real-world limitations often leak through the declarative interface, because you have to care about things like performance, or do things that don't quite fit into the opinionated declarative interface, or have other weird constraints that the authors didn't think of.

Can you give an example of this for k8s?


> Can you give an example of this for k8s?

Deployment.spec.strategy is a common purely "how" configuration switch.

Explicit PodDisruptionBudget is less common but exists only to configure a "how". A smarter (e.g. layer 6+ aware) system should be able to derive this for me.

You might respond "well, that's what an operator is for" and in some sense you're correct, but every operator I've read is a mess of the most imperative code I've ever seen; and then that leaks out as soon as the operator's defaults don't work well for your case.


As someone who has spent the last 7-8 years working on declarative infrastructure tooling: this is only better if the tool can work out how to get from where things are to where you want them to be without being destructive. It’s not like these tools are composing pure functions, sadly.


> this is only better if the tool can work out how to get from where things are to where you want them to be

That's a bit like saying a car is only better than a horse if the engine works. The tool has one job.

They do of course fail regularly; I feel that devops tools are 10 years behind our development tooling.


Part of the reason they fail so regularly is because so many are wedded to the declarative model, in order that they don't have to understand what they are managing and how it operates.


Except that with k8s, it's only somewhat declarative. A number of things are immutable, or are mutable but changes to them will never be picked up. So you have to jump through hoops in those spots to make things declarative, like ensuring things have unique names (like config maps with mounts that have sub-keys), and orchestrating pod restarts. Kustomize/argo will do this orchestration for you, in a way that makes your config look like it's declarative.

Also, you really do need to know how things are going to change when they're changed. By default when you update your deployments in k8s, you're probably going to have a brief amount of downtime. Argo rollouts and such let you define this declaratively, in a way that avoids downtime, but again, more complexity.


A common thread I've seen is that preference for declarative or functional often correlates with how you approach systems.

Many of our traditional operators / SysDE greatly prefer reams of declarative configuration. They like to be able to look at a system configuration and understand it in totality. Even if there are abstractions. These folks tend to run other people's software, be great at reading manuals, and debugging systems they can't see code for. The reasoning for a given setup is usually in a wiki or is common knowledge (to some extent) in their heads and the organization expertise of running that software for years.

Most of our engineers (SDE) hate declarative as it's not DRY, really boring, and doesn't encode the intent as you said. Code based solutions let a developer (who groks code) understand some of the rules behind WHY (to some extent), and developers tend to be lazy and want to move that knowledge out of their head and into computers so they don't have to deal with it.

Both sets of people can be right.


It really doesn't have to be either/or. It's possible to build quite nice abstractions in a declarative style, using real code.

This is actually the next generation of devops tools... CDK, Pulumi, etc.


Been using CDK for years, since pre-alpha. I love it and my company is moving very heavily towards CDK, a thing I helped drive.

Some of our ops folks still prefer pure declarative. Like down to the explicit yaml and knowing exactly what AWS flags are ticked in what way. They're also much less comfortable with code.

It's not an argument, it's a preference; it's how their brains think and their comfort zone. They would literally rather see 1.5 MB of zipped yaml than read through CDK libraries. They literally look at the generated CFN over the CDK.


Pulumi looks really interesting.


I think this is the most satisfying answer I got in this thread yet, the WHY makes sense, and I like the WHY, because it is the simplest way to understand the how. I guess I'm in camp 2.


Imperative config means you run the code and it does what the code says.

Declarative means "some system reads and parses the config, can do clever things to change what happens in different circumstances, show you a preview of what happens, track changes to the config in a straightforward way, audit the config for consistency across several systems or for adherence to selected principles, and do other smart things". These are possible because the config is a data structure now. You can do things with a data structure you can't do with a shell script. The script is too opaque and our computers are not yet smart enough to analyze it in these ways.
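
For instance, because the config is data, a preview/diff is nearly a one-liner. This is a toy sketch with invented field names, not any particular tool's format:

```python
desired = {"replicas": 3, "image": "web:1.4", "port": 8080}
running = {"replicas": 2, "image": "web:1.3", "port": 8080}

def diff(current, target):
    """Return {key: (current_value, target_value)} for every field
    that would change -- a preview a tool can show before applying.
    Try doing this to a shell script."""
    return {k: (current.get(k), v)
            for k, v in target.items() if current.get(k) != v}

print(diff(running, desired))
# {'replicas': (2, 3), 'image': ('web:1.3', 'web:1.4')}
```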

There are great advantages to the simplicity of imperative configuration scripts, which is why shell scripts will not soon die. There is some peril (shell scripting in particular is full of tricksy traps; alternative languages may mitigate this.) There is great power and some peril in imperative config. It’s a choice.


>I feel like declarative is some kind of technical manual meme [...] With "declarative" I still have to find out what's the intuition WHY it's supposedly better,

> [...] why it's implicitly assumed to be better is what eludes me.

I think the common generalization of why declarative is almost always better than imperative is simply this:

  strlen(declarative_statement) < strlen(imperative_statements)
Basically, a shorter declaration of a goal/objective is easier to comprehend than a longer list of step-by-step instructions. With imperative code, it's harder to see the forest because you're in the details of the trees.

So a more concrete example is:

  strlen("SELECT * FROM customers where state='CA'") < strlen("open binary db file, seek to disk sector xyz, btree search index id, allocate memory buffer for cursor results, etc, etc")
An interesting insight about the declarative -vs- imperative category is that a particular abstraction can simultaneously be declarative AND imperative because its status is relative to other abstractions above and below it. E.g. SQL is typically thought of as declarative but it can be reframed as imperative when it's used to fulfill a higher declarative objective:

  strlen("We want to increase sales of suntan lotion.") < strlen("SELECT * FROM customers where state='CA' or state='FL'; select * from products where category_tag = 'suntan_lotion'; [...more SQL instructions...]")
A common definition of declarative is "the what" and imperative is "the how" -- but that's not the full story because an abstraction layer's "the what" can itself become "the how" to a higher level abstraction.


This is a tangent

> [in functional programming] you don't have to think of an internal state, like in OOP.

You still have to, but less than with OOP languages. For example I've got an Elixir GenServer that uploads files to Google Storage and keeps retrying if something goes wrong. The internal state of the server is the list of files to upload. Files get added to the list when the server receives a message with a file and removed when the file is uploaded. Tests have to deal with the internal state, as in OOP.

This kind of stateful "function" is implemented by recursively passing the state to the same function. Tail call optimization helps avoid overflowing the stack.

Of course we have state only where it matters so the servers are few and sparse. But sometimes we have to pass large data structures between functions. They would be the internal state of an object in OOP. Check Ecto's Multi for transactions.


And Elixir/Erlang is actually more imperative and stateful than some other functional languages, because of the process model. Sending messages imposes an ordering on the receiver's state. One way you can think of a mailbox is as a continuously-updating imperative program that runs on the "CPU" of its process.


Declarative here means you’re describing the end state and letting the system decide how to transition from the current state.

For instance, you can declare that you need two X (e.g. containers, disks, …) and pass that configuration to the framework. If the framework sees you currently have 1 X in production, it allocates another. If there are three, the framework kills one. And if there are 2, it’s a no-op. The beauty is that if you express what you need declaratively, you’re shielded from deciding the state transition.
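
A toy version of that convergence rule, with made-up action names rather than any real k8s API:

```python
def reconcile(desired, current):
    """Compare the declared count against what exists and return the
    actions that close the gap. Illustrative only."""
    if len(current) < desired:
        # Too few: allocate the difference.
        return [("create", None)] * (desired - len(current))
    if len(current) > desired:
        # Too many: kill the surplus.
        return [("delete", c) for c in current[desired:]]
    return []  # exactly right: no-op

print(reconcile(2, ["x1"]))              # [('create', None)]
print(reconcile(2, ["x1", "x2", "x3"]))  # [('delete', 'x3')]
print(reconcile(2, ["x1", "x2"]))        # []
```

The caller only ever states `desired`; the transition logic lives in one place instead of in every deployment script.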


It's what you want vs how to do it:

Imperative: open the top drawer, then take out a knife, then take out a board, then open the basket, take out a potato, slice the potato with the knife 4 times in parallel, then turn it 90 degrees and do it again.

Declarative: make some diced potatoes however you like, whenever you like. Have it ready for when the frying pan is hot.

Imperative lets you be very precise about what to do and when to do it, what resources you need, etc. Declarative lets the system decide things like execution plans (and thus algorithm and resources) so you can spend your time specifying the target state rather than all the steps in getting there.


I know the difference, why it's implicitly assumed to be better is what eludes me.

In my experience, describing the end result in a way the program understands is sometimes much harder than telling the program "do this".

Don't get me wrong I think the longevity of SQL shows that declarative can be the right way. However I think there's a logical leap with declarative = better.


Declarative is better because it’s simpler. For instance let’s say you have a system and you want to add a new resource to it. If you’re using an imperative paradigm, then you’ll write a script to transition the state from the current state to the desired state. Now let’s say you want to spin up a new cluster in a different region, you need to write another script to go from the empty state to your final state from before. With a declarative paradigm, both modifying and creating resources are the same.

It’s also easier to understand the current state of the system. In an imperative style, you need to understand every operation that has been performed to know what the current state is. In a declarative system, you just need to look at the most recent declaration.

In short, declarative is more scalable and understandable.

Another way to look at your question is to ask what is better about the imperative style? Having used both, I can’t think of any advantages. Maybe you could say you have more control with imperative, but is that control really necessary? In my experience, it is not. The additional control just makes things more complex.

And of course you can make a overly complex, hard to understand system using a declarative paradigm. But the simplest declarative system can be simpler than the simplest imperative system for achieving the same configuration.


> declarative is more scalable and understandable.

If you don't have to debug your system in the large (why is my outcome not what I expect)

If your operators are written correctly and you don't have to debug your system in the small


What you’re saying is equivalent to “buggy declarative code is harder to debug than working imperative code.” Obviously that’s true, but not exactly a convincing argument that imperative is better than declarative.

Your underlying assumption is that it’s easier to write correct imperative configuration than it is to write correct declarative configuration. In my experience, this is not the case. Consider a basic task like provisioning a fleet of hosts and deploying some code to the fleet. In an imperative approach, you need two distinct steps. One for provisioning, and one for deploying. Both operations are multi-step processes that are required to be idempotent because the number of possible failures between the start of provisioning and the end of deployment is quite large. It’s not easy to write a system that does this, as there are a lot of steps that need to be enumerated via imperative scripts for both provisioning and deployment, and it’s even harder to do in a failure tolerant way.

Compare the above to a Kubernetes deployment. The entire configuration is 30 lines of yaml. There is no provisioning step. Kubernetes will make sure the resources you need are provisioned and it will do so in a failure tolerant way.
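
For reference, a minimal Deployment along those lines looks something like this (the name, labels, and image are illustrative, not a recommendation):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web            # illustrative name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.21   # illustrative image
          ports:
            - containerPort: 80
```

That is the whole declaration; scheduling, restarts, and replacement of failed pods are the controller's job.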


Nope. Buggy declarative code is harder to debug than buggy imperative code.


I think the core thing is that if you have a bug in the declarative code, it usually affects everything and can be fixed in the execution engine, or you didn't actually declare what you wanted but something else.

By contrast, bugs in imperative code can manifest simply because you happened to write it in such a way that relied on some assumption that no longer holds and your imperative code does something it should not do because its preconditions are invalid.

Declarative approaches are generally more robust because they by necessity tend to be designed to handle more diverse initial states and still reach the correct result.


You still are implicitly assuming it’s easier to write bug-free imperative configuration compared to declarative. It doesn’t really matter which is harder to debug if it’s much easier to write bug-free code using declarative code. Again, some specific examples that underpin your assumption would be helpful in making your argument more convincing.


Read beefwellington's SQL EXPLAIN PLAN comment. I've also watched my roommate have a four-hour argument with her coworkers about one line in their kubernetes deployment[0]. I've also spent days trying to fix a declarative Ansible playbook, only to table-flip and rewrite it from scratch in 20 minutes (this was a MySQL install-and-configure task where the MySQL version had drifted underfoot).

[0] iirc this was not a bug in their kubernetes operator (or maybe it was, I don't recall exactly) but a situation where they didn't understand why their containers were exhibiting the anti affinity properties they were expecting out of their declarations.


edit: were not exhibiting the anti-affinity. They were trying to make sure distributed postgres replica nodes weren't on the same physical hardware, for obvious reasons.


This is an often overlooked thing.

Anyone who's had to do an EXPLAIN PLAN or equivalent to figure out why a SQL query is suddenly taking 40x the time to execute and return results can attest to this I think.


You're right, it isn't always better. But generally in a lot of problems these days we already have a load of generic methods for the implementations, and you just want to specify the end result. There's enough resources that we don't care how the thing is done, the engine will probably do it in an acceptable manner, so you save a lot of code just telling it what you want.

It's really just a preference for convenience over control.


You not liking it is a "you problem". The reason people keep insisting that declarative is better, is because it just is. Assume a program that is much better than you can ever hope to be at optimization at compile time. A real AI, much more intelligent than you and any other human being could ever be. Would you still describe the minutia of how you want to do things? Knowing full well it won't do that. And will chose a better option. Or would you just tell it what you want to be done, and not care about how it achieves this?

If it can achieve whatever you want to be done, is it not more logical, natural, and human to just describe what you want the end result to be? The answer is an unequivocal "yes".

You've already received the answer. Here in this thread, but also, I'm guessing a great many times. You just refuse to accept it. As I've said, "a 'you' problem".


I turned the knife 90° and it stopped cutting; that imperative recipe doesn't work.


> Why it's easy to understand because it's declarative

Whoever came up with that ... XSLT and Prolog are declarative, yet I'd argue both of them are hard to understand.


I'm fairly certain people like me are confused by Prolog not because it's declarative but because it's a logic programming language. Which is more exotic than a functional one with even more esoteric syntax.

I have a feeling XSLT would be better known (i.e. be able to justify investment into learning) if not for json/yaml popularity. Though XML is not particularly human readable anyway. The next logical step after https://json-schema.org/ would be JSLT I imagine.

On a related note, Maven is a great example of a declarative build system. Both Gradle and SBT were gigantic steps back.


But on the flip side, Excel spreadsheets are declarative, and they're the most commonly used programming language in the world. Declarative programs can be simpler, when simplicity is the goal.


Very good examples. I would put SQL in the "good" column for example. I'm not saying declarative = bad either, just that I don't see the causation.


> With "declarative" I still have to find out what's the intuition WHY it's supposedly better

Declarative is generally better because you move complexity out of your code to someone else's code. In other words, it reduces the LOC you write and have to maintain.

For example, say you are writing some code to query Postgres and aggregate a column. The imperative approach would be to query Postgres and do the aggregation yourself. The declarative approach would be to query Postgres and ask it to do the aggregation via GROUP BY.

Aside from reducing complexity, it's also usually more efficient to take a declarative approach. Dozens if not hundreds of people have been working on Postgres for almost 20 years now. The chances of your own aggregation algorithm being more efficient than Postgres's is practically zero.
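
The contrast can be shown with SQLite (stdlib `sqlite3`, standing in for Postgres; the table is invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (state TEXT, amount INTEGER)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("CA", 10), ("CA", 5), ("FL", 7)])

# Imperative: pull every row over the wire and aggregate yourself.
totals = {}
for state, amount in con.execute("SELECT state, amount FROM sales"):
    totals[state] = totals.get(state, 0) + amount

# Declarative: describe the result and let the engine do the work.
grouped = dict(con.execute(
    "SELECT state, SUM(amount) FROM sales GROUP BY state"))

print(totals == grouped)  # True: same answer, far less code to maintain
```

The GROUP BY version also lets the engine use its own indexes and memory management, which your hand-rolled loop cannot.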


> The imperative approach would be to query Postgres and do the aggregation yourself. The declarative approach would be to query Postgres and ask it to do the aggregation via GROUP BY.

These solutions don't necessarily have anything to do with the concepts as you name them.


>These solutions don't necessarily have anything to do with the concepts as you name them.

Why not? A "GROUP BY" syntax is -- relatively speaking -- more declarative than handwritten imperative code looping through a cursor and dynamically adding to a keys-values data structure to accomplish the same idea.


That's not "more declarative", that's just "more high-level". Using higher-order functions does not suddenly turn imperative programming into declarative programming.


>Using higher-order functions does not suddenly turn imperative programming into declarative programming.

I wasn't talking about strict definitions of "declarative programming" vs "imperative programming" to satisfy language lawyers.

It was a continuation of the subthread discussion started by carlmr asking why declarative is better: https://news.ycombinator.com/item?id=30098793

So it was just talking about aspects of "declarative" in general...

The idea is at a small scale, one can compare an imperative loop construct to the more "declarative" syntax that accomplishes the same task. Yes, the "more declarative" syntax will be a higher-level abstraction.

Another example of the micro-instead-of-macro type of discussion was C# when it added LINQ (declarative quasi-SQL) and async/await (declarative futures). The language designers and tutorials used the word "declarative" to describe those new language features even though C#-as-a-whole is not formally categorized as a declarative language. Does using LINQ turn C# into a declarative language? No, but nobody claimed that.


A higher-order function is a function that takes a function as an argument, which doesn't have anything to do with the discussion at hand.

But you are right that it is higher level in that it abstracts away the logic behind GROUP BY. In this sense, all declarative programming is higher level.


"Promise theory" by Mark Burgess tries to explain why a declarative approach may be desirable for those kinds of systems.


> an operator can rebalance

So, you're not talking about the same thing as OP, because one of the points being made here is that Kubernetes intentionally tries to get rid of human operators for day-to-day decision making. If you have a machine making operations decisions instead of humans, then a "bash-like shell" is worse than watch/subscribe mechanisms with full type safety, plus standard conventions for reporting events and communicating status.


The paper presented declarative configuration as a contrast between the SSI approach and the k8s approach. I don't think it is.

It is useful to have both the fitted arrangements and operator interaction.

In addition to giving operators power to make changes in production, the command line is useful for development. Devs can develop in a one-host grid and interact with it in a repl-like way. Contrast this to traditional workflow where the deployment environment is quite different to the development environment. This encourages devs to work with the grain of the platform and discourages the "it works on my machine" dynamic.


If a human operator can make changes with a computer language, then a human programmer can make a computer operator that makes changes autonomously


Theoretically yes (of course) but Bash can make it tricky.


The purpose of the shell is to allow users to interact with the grid. You can't do this in a conventional OS shell. Example: if you are using /bin/sh on linux-host-a and want to affect something running on linux-host-b, you can't. In an SSI shell, you can affect resources across a set of machines from a single shell.

You could offer something like shell scripting on that, but my system did not and there are better options. For example: have the resource manager expose a command that offers a DSL for making system-wide changes.


> a DSL for making system-wide changes

Like... Kubernetes? :)


Operator in the Kubernetes world is a different thing from a human operator of a computer system.

The Kubernetes operator is software that continually compares what-is to what-should-be and makes a change that changes the state of the world to be closer to what-should-be. You specify what-should-be and (if all is well with the world) the system morphs itself into that in a more or less expeditious way.

The original article described this pretty well.


Surely k8s benefits from being a standardized way to do this. Yes you can roll your own version but it'll be different in every company.


> There is a name for this: Single System Image. There is no need to write the operating system layer - Linux works well.

Wouldn't a true SSI need to include some version of software-implemented distributed shared memory, which in turn would need to be achieved by involving the kernel? Linux now has process checkpoint and restore which will actually work seamlessly if all resources are properly namespaced ("containerized"), but not yet DSM, so far as I can tell.

(The only alternative I can think of would involve hacking around the debug infrastructure to "attach" to other processes and somehow handle VM faults in userspace, which sounds positively terrible.)


We don't need Linux, a type 1 hypervisor for Cloud infrastructure is good enough.


Question about drivers. If you had DMA-oriented/kernel-bypass network cards or wanted to resource-manage GPUs, would a type 1 hypervisor be able to support this?


I don't see why not, type 1 hypervisors are the modern version of what started on System/360, naturally depending on specific one the APIs for such stuff will be different.


>K8s is designed to solve a different problem: scaling up the traditional model of programming.

Yep. K8s is a hell spawn of ITIL (a decades-old IBM-style take on BigCo/government IT infrastructure management). Google has been practicing ITIL; Borg came out as a result, and it took the shape of K8s outside of Google. That is one of the main reasons why management likes K8s: ITIL was created by managers for managers. Approaches like SSI, by contrast, shift the balance toward developers/engineering and are thus instinctively resisted by managers.


> Google has been practicing ITIL, Borg comes out as a result,

Having been a member of the team that built Borg, I can assure you that those two things were unrelated. I can't say for sure whether the initial idea/impetus for Borg came from the management or the developer side, but once the project was started it was very much engineer-driven, and pretty organically so. The Infrastructure team at Google at the time (of which Borg was a part, along with several other related cloud projects) had a single manager for what eventually grew to 100+ engineers (mostly senior/staff level, and highly independent) before Google started creating/hiring managers more aggressively, so we were pretty much self-managing. The design and implementation of Borg was horizontal, both within the team and in coordination with the engineers (developers and operations teams) who would actually need to use the system. I'd say it was at least three years in before management really got involved in design decisions, by which time Borg was running most of Google's production services.

It was a bit of a Wild West, but compared to some later major from-scratch design/development projects I've been involved in since, it was a Belle Époque.


ITIL configuration management has nothing to do with K8S cluster management. ITIL configuration management is process-driven, whereas K8S is declarative. It's in the names of the processes: Incident Management, Problem Management, Change Management.

K8S has controllers that are given a configuration and work out how to deploy it, taking the current state into account. That is, it functionally takes the state of the system now, S0, and applies modifications to return S1. The issue is that the modifications aren't "pure" in the functional sense, because there's a temporal aspect that can make it difficult to reach an equilibrium at S1.


(Disclaimer: I have never used k8s, so cannot assess the factual accuracy of this piece.)

If Kubernetes hasn’t yet incorporated this article into their documentation somewhere prominent, they should. This is an excellent, and crucial, piece of documentation: providing a mental model and explaining key design decisions. All complex systems should have a document like this one. As a newcomer, this is what I want to read first.

Bravo, author. I hope you get rewarded.


The most important thing for me about Kubernetes is that it makes things fun again. (YMMV) The Kubernetes Slack is super friendly and helpful! (also YMMV ;) )

The article is factually accurate and the mental model is consistent with other K8s docs, but I'm not sure I've seen it put so concisely from an outside perspective before.

The ¹ footnote should have been in-line, because "front-loading" complexity is a great explanation: distributed systems are hard, and require a whole bunch of decisions. It's intimidating to be faced with all of those decisions up front, but at least they're all visible, and it forces you to be very clear on what you want, or you get nothing. I'd rather be surprised and confused (which is to say, have to think) now rather than in 6 months when I realize I had not accounted for something.

Finally ... I would be lost without k9s: https://k9scli.io/


> The most important thing for me about Kubernetes is that it makes things fun again.

Because you have to re-learn things you were already doing, and there are new bugs and edge cases to fix or waiting to be implemented, etc. I understand the fun of tinkering with a new toy, but it can't be the only factor taken into account when making a decision.


Not only k8s, but other CNCF and Kubernetes native software, too. Layer on fluentd/bit, Prometheus, etc and you can stand up a production ready compute platform with observability, monitoring, and alerting in a day.


Have you run into any breaking issues with k9s, or is it stable for production use?


Used it pretty heavily at a previous job, especially given that our log shipping was... "deficient" is being nice about it.

Thus k9s with log view on multiple containers while we were debugging something hairy (which turned out to be a problem on the client's end, not in our k8s cluster).


K9s is just a convenient interface to kubernetes, I've not done much with it in production but if anything were to happen I would just use the kubectl commands directly. It's like emacs magit vs git itself.

And just like git GUIs, you should also learn the underlying tool first.


The article is only a few hours old, so K8s devs would have to be very quick on the uptake to have it incorporated.

But I agree that it provides a great mental model for K8s.


I would add one more thing: K8s is complex because OCI containers are complex.

Running a process in isolation on Linux means dealing with many different, resource-specific APIs. Unfortunately, any Linux container orchestrator will inherit the resulting complexity.


Actually running the container is a small part of kubernetes.


Agreed; this is abundantly clear when attempting to get similar value on a FreeBSD system using jails.


Kubernetes is complex because coordinating potentially hundreds of machines with potentially tens of thousands of containers, so that it all flows smoothly and expediently and still allows change, is a complex problem.


Not mentioned in this writeup is the added complexity of not using IPv6 inside.

IPv6 has an on-demand, probabilistically unique, unroutable address-assignment model called ULA, and it was made for something like this. Instead of NAT and the like, this framework could have used semantic address models in the bits of the IPv6 address space and delivered v6 to the edge.
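For reference, RFC 4193 derives such a prefix from a pseudo-random 40-bit Global ID; a rough sketch of the construction (simplified from the RFC's suggested algorithm):

```python
import hashlib
import os
import time

# Rough sketch of RFC 4193 ULA prefix generation (simplified): fd00::/8
# plus a pseudo-random 40-bit Global ID yields a /48 prefix that is
# unroutable but probabilistically unique, so merging two networks
# later rarely produces a collision.
def make_ula_prefix() -> str:
    seed = os.urandom(8) + time.time_ns().to_bytes(8, "big")
    global_id = hashlib.sha1(seed).digest()[-5:]   # low-order 40 bits
    prefix = bytes([0xFD]) + global_id             # fd + Global ID = 48 bits
    groups = [int.from_bytes(prefix[i:i + 2], "big") for i in (0, 2, 4)]
    return ":".join(f"{g:04x}" for g in groups) + "::/48"
```

Every run produces a fresh prefix like `fdxx:xxxx:xxxx::/48`, which an orchestrator could carve into per-node or per-pod subnets without any NAT.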

But they went v4 only, and acquired all the growing pains of making that work.


If Kubernetes were IPv6-first, it would just add complexity for newcomers. IPv6 adoption is actually impressive, but it's still not really something a junior dev or admin knows; they do know the basics of IPv4.

The whole article is kind of nice, but I don't think Kubernetes itself is that hard. It's large, but after the initial shock it's also very logical and well documented.

I think people are afraid of k8s because it forces them to actually implement concepts that they had ignored and neglected. Suddenly it becomes apparent that they didn't understand how storage works (and k8s has nothing new to offer there). Or what their network admins were doing when they set up those "VIP" addresses.

Watched a large webinar on "kubernetes" just yesterday. It was aimed at developers and had a huge positive response. The presenter basically did not say more than 2-3 sentences on Kubernetes itself. It was all regular concepts that have been with us all along: state/stateless, synchronous/async, tracing, monitoring. It even ignored storage completely, yet people were still shocked at "how much useful Kubernetes they learned".

Kubernetes exposes how little we actually knew about system architecture, as our poor designs are revealed quickly.


Yes and no.

There's a tendency in common distributions to pick complex base components to support seldom-used features, at the expense of turning K8s from understandable into magic.

I can tell you the main modes of operation and how everything fits together with flannel. Ask me to compare Canal to Calico or explain Cilium and I have no idea anymore. These systems don't explain themselves well, either. I have to dig pretty deeply into the code and docs to see it's BGP and eBPF, two technologies I wouldn't expect your average network-literate admin to know.

And don't even get me started on Istio.


Istio has lots of support for retrying and recovering from network errors, which is fortunate because adding it to our infra introduced a lot of transient network errors


I think that's the other understated thing: a lot of these underlying choose-your-own-adventure pieces are also at wildly different levels of maturity when it comes to observability and debugging. You can spend a while pursuing one solution only to find that one of the pieces simply doesn't have the functionality, or it's not achievable, or (the big thing for me) it can't be secured in any meaningful way, or it's a TODO with no sign of completion.

That's still the main thing that gives me pause: it ends up being very unreliable when you are trying to secure it or simply put walls up around things. Finding out that your network plugin doesn't support the thing you need is often not obvious until you have put in a massive amount of resources and work, and sometimes it's not reconcilable. The idea of k8s is good, and the model is good, but the observability of features and working through the operational costs are still really raw, with a lot of room for growth.


I think this complexity is part of the business model. You either pay the cloud vendor to solve this issue, or you pay enterprise vendors for storage/network licenses, etc.

J2EE was similar, from what I saw. In theory, you could run the open source stuff. In practice, IBM and Oracle made $$$ selling and supporting a proprietary version that supposedly worked out of the box.


I don't see vanilla Kubernetes as complex.

It is the whole ecosystem around it that is complex, especially as there are no 2 setups done the same way. Also each third party operator has its own set of pros/cons/limitations/learning curve.

This is why following a tutorial on publishing application X using Helm on your local minikube/k3s setup on your own laptop usually works fine, but doesn't necessarily translate quickly into deploying a production-ready app of your own on your company's managed k8s setup.


Kubernetes Dual-stack is enabled by default... https://kubernetes.io/docs/concepts/services-networking/dual...


> still not really something a junior dev or admin knows

I don't think junior devs should deal with k8s but of course, it depends on how your teams are structured.

Some people think juniors should get involved in everything so they can better choose what to specialize in, and I partially agree.

It's just that, before you touch k8s, you'd better already be involved in everything and have good communication with the bigger stakeholders.


It's very specific to the company/environment you are in. Plain Kubernetes can be clean and a pleasure to work with until you hit the level that "sascha_sl" is describing. At conferences I've met a few guys who considered themselves "junior/mid admins" and did all the infra (k8s based) AND desktop support for devs in 40-50-person software houses. Sounds crazy, but some people just don't complicate their lives the way larger companies love to.

Some make it a silo and a complex, huge piece of internal IaaS, some just do "fly my dear bird" like on the popular comic [1]

You can manage your cluster externally and have an k8s API ready for you with (some of...) popular clouds with barely any knowledge, including ingress on a public facing ip and a DNS pointing to it. A junior can surely do it. "Does it have backups? YES. So it's production ready".

[1] https://www.bluelock.com/wp-content/uploads/2017/06/enterpri...


Second that - any chance there's a link to the webinar?


Well, it was in Polish :) https://m.youtube.com/watch?v=7NXJHM5oetg

It's "How to implement microservice applications on Kubernetes". It's a cool run through many topics, but it barely touches Kubernetes, or even microservices proper as I understand them. I find those topics quite universal, which is why I chose that example.


Is there a link maybe?


Please see my other comment. Unfortunately it's Polish only.


Kubernetes isn't IPv4-only. It makes very few assumptions about the network, and can easily be configured to use IPv6 (either directly or in an overlay).

I wrote a small tutorial on exactly this: https://john-millikin.com/stateless-kubernetes-overlay-netwo...


Kubernetes isn't IPv4 only. https://kubernetes.io/docs/concepts/services-networking/dual... Dual stack is stable and has been around a while.

Also you don't have to use NAT even with IPv4, it's just that most CNIs implement Kubernetes networking with non-addressable pod addresses. It's perfectly possible to make service and pod addresses routable from outside the cluster.
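For what it's worth, opting into dual-stack is then per-Service once the cluster supports both families; a sketch following the shape of the linked docs (the service name and label are hypothetical):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: demo                        # hypothetical service name
spec:
  ipFamilyPolicy: PreferDualStack   # fall back to single-stack if needed
  ipFamilies: [IPv4, IPv6]          # order picks the primary family
  selector:
    app: demo
  ports:
    - port: 80
```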


> Also you don't have to use NAT even with IPv4, it's just that most CNIs implement Kubernetes networking with non-addressable pod addresses. It's perfectly possible to make service and pod addresses routable from outside the cluster.

This is the default on Google Cloud now. Also an option on Azure.

At first I found it to be a strange decision, but then I realized the beauty of it: having pods as first-class citizens of the cloud.


Yes! At home I have the pod IP space routed and can connect to pods without any port-forwarding.


> But they went v4 only, and acquired all the growing pains of making that work.

Because the major clouds didn't speak IPv6 well enough, back in the time - Kubernetes dates back to 2014 after all. And from what I've read on various threads here on HN and on Reddit, all three major clouds still have weird bugs and IPv4-only services...


Yes. Another thing that they made (in my opinion) poor choices on.


We have been running Kubernetes IPv6-only for some time now, using the Cilium CNI (software-defined networking) plugin.

Works great after finding workarounds for some minor issues. Much less hassle than with the tight/inflexible IPv4 address space.


I also love how the IPv6 docs say to give each pod a /64. Like, wtaf is a pod going to do with that many IP addresses?


Might be for ease of use (fitting MAC addresses with spare bits) and/or security, much as residential subnets are supposed to get one /64 each (with anywhere from a /56 to a /48 per customer).
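The MAC-fitting point refers to SLAAC's Modified EUI-64 scheme (RFC 4291), which consumes the entire 64-bit interface-identifier half of the address; a sketch of the mapping:

```python
# Sketch of the Modified EUI-64 mapping (RFC 4291): a 48-bit MAC
# becomes a 64-bit interface identifier by flipping the
# universal/local bit and inserting ff:fe in the middle. This is the
# scheme that assumes a full /64 per link.
def eui64_interface_id(mac: str) -> bytes:
    octets = bytearray(bytes.fromhex(mac.replace(":", "")))
    octets[0] ^= 0x02  # flip the universal/local (U/L) bit
    return bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
```

So the /64 isn't about a pod needing that many addresses; it's the smallest prefix the autoconfiguration machinery was designed around.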


Just a funny thought exercise I did recently: I looked at our sizeable Kubernetes cluster of 30ish VMs and added up the core counts. The total came to 98. You can easily get a box that has this many (or more) cores. I wouldn't be surprised if our cloud provider analyzed the topology of the connections, and decided to put the whole thing on the same server.


Sure, but then you can't, say, restart the server with no app downtime (for patches, maintenance, etc). You'd also probably have to setup cgroups yourself if you don't want to risk a rogue process taking up all the resources and starving other processes.

For AWS and GCP, there are zone-specific VM scaling groups that ensure you're in multiple AZs, so you'd know for sure you're not on a single physical server.


What is that nonsense? I literally restart apps in a rolling fashion on VMs all the time, no Kubernetes needed and no downtime in sight. This includes a full reboot of the VM itself.


They are talking about restarting the single server that all of your VMs are running on.


Which can also be done with zero downtime to the VM.


How do you restart a single server hosting VMs with zero downtime to the VMs?


Did I misunderstand that the original post was regarding being able to do it with Kubernetes, not a few physical hosts? I didn't see "single" in the comment when I originally responded.


Yeah, Top comment says

"You can easily get a box that has this many (or more) cores. I wouldn't be surprised if our cloud provider analyzed the topology of the connections, and decided to put the whole thing on the same server."

And the comment you replied to is referring to "the server"


VMware can do it by live-migrating the VM, though you will incur a short pause, and the networking is a bit tricky to set up. This of course doesn't work for unexpected downtime; in that case it's a cold boot on another node.


If you migrate a VM to a second hardware server then you have, by definition, a second server.

The question was how you reboot the posited singular hardware server with no downtime to any VMs running on it.


I have never seen this go smoothly on a production server; it's always WAY slower than expected (if you use any significant amount of memory) and something always gets f'd up wrt the network connectivity, broken caches, etc.


if I understand correctly, this is happening (nearly) transparently on GCP all the time. https://cloud.google.com/compute/docs/instances/live-migrati...

It involves copying the whole VM image over, and rewiring the network connections virtually on the fly.

I say (nearly) because it's not 100% transparent, but my understanding is that it works properly the vast majority of time.


Probably true, but I'm also pretty sure Google isn't using VMware for live migrations under the hood.


And what do you do when you need to restart the actual VM itself? You know, for when you need to patch the kernel and such...


VMWare has had a high availability mode for over a decade. It keeps checkpoints of system state elsewhere and synchronously replicates them.

If the primary catches fire or crashes, the secondary boots the VM quickly without data loss.

If the primary reboots, it checkpoints RAM to the secondary, pauses the VM, and unpauses it on the secondary a few milliseconds later.

Note that this transparently handles storage replication, which is something that is notoriously difficult in Kubernetes.

If your cluster fits on one machine, you've paid for a lot of (currently) unnecessary complexity up front, both in dev time and in hardware cost.

If your app scales to need the cluster, congrats. Sometimes delaying time to market to allow for a smoother ramp makes sense. Sometimes it does not.

(Wikipedia is a good example of succeeding without ever needing to scale out the back end. I doubt they'd have won out over their competition if they took an extra 12-24 months to launch.)


> VMWare has had a high availability mode for over a decade. It keeps checkpoints of system state elsewhere and synchronously replicates them.

So you don't have one VM, but two. In the OP's scenario you are paying for 196 CPUs instead of 98.

> If the primary reboots, it checkpoints RAM to the secondary pauses the VM and unpauses it on the secondary a few milliseconds later.

So a very small downtime, but not 0.


It depends on your SLA. Under 10ms 99.9% of the time is pretty tight. Even if the migration takes a second, you'll meet your SLA for the hour.


So not 0 down time then


It's just like flying- throw yourself at the ground and miss...


I responded above, either I missed the word "single" or it's been edited (but doesn't show it as being edited?) -- live migration between hypervisors is a thing.


Also curious how this is possible


I don't recall seeing "single" in the comment I replied to. You can live migrate VMs between hypervisors. Either I missed it or it's a dirty edit.


This is the relevant part of the root comment:

> You can easily get a box that has this many (or more) cores. I wouldn't be surprised if our cloud provider analyzed the topology of the connections, and decided to put the whole thing on the same server.


Can you do that without knowing what apps are running on them? Your entire platform say "do a rolling update" and don't look back?

I wouldn't want to have paid for the engineering that went into that if you could actually do that.


> Your entire platform say "do a rolling update" and don't look back?

Yes. GCP literally has a button that allows you to Replace/Restart all the instances in an Instance Group. Our deployment system just pushes a new GCP template and GCP does all the work rolling it out. No Kubernetes.


Yes, there are other pre-engineered solutions out there, but he was talking about VMs and apps and rebooting them. Doesn't sound like instance groups to me.


Only if things are architected to be updated in a rolling fashion. If it's some single-node app, you're screwed. But we're talking about K8s, so I would hope you're using it for its purpose and not as some tiny cluster for a 3-requests-per-minute app.



I was talking about doing this on non-k8s platforms :)


Yes. You can do this without knowing about or modifying the applications. There was a big push for hardware consolidation a decade or so ago, where legacy apps were crammed on to fewer and fewer machines, and were cheaper/faster/more reliable as a result.

However, a permanent machine failure or unexpected reboot implies a few seconds or minutes of downtime.

These systems usually use a local disk, a synchronously updated disk across town (low latency, but on a different power grid, outside most natural disaster blast radii), and a far away asynchronously updated disaster recovery disk.

The disaster recovery disk provides a crash-consistent state that's less than a few seconds out of date.

These days, the "disks" tend to be deduped and compressed SSD's that use parity encoded raid. Their hardware cost is lower than a single copy on ext4, but they come with an enterprise tax / support contract, etc.

In practice, it's durable unless corporate HQ is wiped out. If that happens, it won't be the weak link in your business continuity plan (but a few customers' orders might've been dropped).

This stuff is tremendously unsexy, but it'll plug along just fine until you have some workload that can't be partitioned, and that needs more than roughly 40-100gbit of network/disk bandwidth, 128 cores, or 1TB of DRAM.

(Edit: Forgot to mention that the apps run inside a VM infrastructure that supports live migration.)


Well, I didn't claim it was impossible, I only claimed I wouldn't want to pay for the engineering that would go into enabling this when building that from scratch. But there are of course alternatives to k8s which also enable this.


For stateless apps (with HA, capacity, etc.) it's easy.


> you can't, say, restart the server with no app downtime

You can, on a properly configured server. The cloud is called a cloud partly because it allows such things: migrating a VM from one host to another seamlessly.


This would all depend on your risk tolerance and fail-over plan. On Azure, for example, K8S node pools run on VM Scale Sets. By enabling Availability Zone distribution for those VMs, you could at least keep part of your node pool up if one zone in the region went down:

https://docs.microsoft.com/en-us/azure/aks/availability-zone...


So if that host goes down, you are screwed? That would not be a wise decision from your cloud provider. Although, thinking about it twice, the cloud provider's response in such a case would usually be a shrug, and that's all...


I very much doubt that would happen. Even if it did, at the very least the control plane would be distributed, so a host going down would just cause another underlying VM to spawn and everything to be recreated.


I guess it's all relative; our largest cluster scales up to around 3,000 vCPUs, and there are plenty of much larger ones out there.

You might be able to get better utilisation of your VMs by running a smaller number with more cores each; I'd try 16-vCPU nodes.


At my previous workplace, one of the clusters I herded was 60-odd (physical) machines, 72 or 96 cores (honestly can't recall at this point) and 384 GB RAM (IIRC) per physical server. But, that place was a wee bit strange, in that we ran all our kubernetes in-house and started on bare metal.

And, yes, as a general recommendation, I would tend towards bigger, rather than smaller, nodes. There's a trade-off to be had, but you want each node to be at LEAST as large as the largest pod you intend to deploy, and you probably want some spare capacity on top of that.


And managing dozens of apps on a giant VM using raw chroots and cgroups and your own scripting is desirable to Kubernetes?

I don't see what you win here. I don't want to have to write my own rolling updates, canary deployments, networking/traffic/load balancing scripts, configuration/secret management, dependency management, log aggregation, user management, etc.

I just spin up a Kubernetes cluster, create service accounts for my users, and let them use the k8s docs and declarative primitives to ship their code. If I'm lucky they are already familiar with the system from their last job, so they can bring expertise with them.


Kubernetes gives you the ability to prevent this, or at the very least to tell your cluster operator not to do it.

You can use inter-pod affinity and anti-affinity[1] to basically say "don't run the instances of this scaling group on the same physical nodes". Then, when the physical VMs are upgraded for whatever reason, the cluster knows not to bin all the Pods hosting your blog onto the same physical node.
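As a sketch (the `app: blog` label is hypothetical), that "don't co-locate my replicas" rule from [1] looks roughly like this in a pod template:

```yaml
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: blog                           # hypothetical workload label
        topologyKey: kubernetes.io/hostname     # at most one replica per node
```

Using `preferredDuringSchedulingIgnoredDuringExecution` instead makes it a soft preference, so scheduling still succeeds when there aren't enough nodes.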

Combined with how, e.g., Google handles node upgrades, this means I can keep services up and running pretty seamlessly without a lot of work. In fact, I haven't had downtime due to an upgrade of the underlying nodes or VMs in over 6 years, despite Google regularly updating the underlying nodes to newer host OS and kubelet versions.

You can also use node selectors[2] and related configurations to run specific pods on specific nodes or types of nodes (e.g., only run this pod on a node with a fast disk).

We do this with all of our Jobs. They are assigned to nodes in an autoscaling nodepool that scales down to zero. So, those nodes are only ever in use when we have jobs running, and jobs coming and going doesn't create any thrashing amongst our long running services.

Of course, your cluster operator could ignore all that somehow, but then when you have unexpected downtime during a machine or VM host upgrade, you'll be pretty peeved.

You do bring up a good point regarding node core counts. If you are running "30ish VMs ~= 98 cores", then you are probably running 2 and/or 4 core nodes. You may actually want to run "fatter" nodes with higher core counts. Keep in mind that each node is going to have dedicated resources for the kubelet, and any daemonset pods, which can add up pretty quickly sometimes. You may actually be wasting quite a bit of resources to overly redundant overhead. You may also suffer from slower scale-ups because you have to add a lot more individual nodes when things start to ramp up. There's a balance to find, not just with perf and overhead but with resiliency, but it's something I see a lot of folks overlook because they think they want to have super fine-grained node scaling.

[1] https://kubernetes.io/docs/concepts/scheduling-eviction/assi...

[2] https://kubernetes.io/docs/concepts/scheduling-eviction/assi...


You'd probably need fewer cores for the same performance. It's never entirely clear what a "vCPU" is.


It's one hyperthread on x86 and a full core on ARM.


In AWS EC2 you can avoid that not just by spreading across different zones, but also by using placement groups so nodes are placed in different racks.


If your cloud provider is scheduling everything on the same server, they're doing something very, _very_ wrong.


I would be astonished if what you say is true


The posters here who are commenting on what should be (often largely what Kubernetes already is) would do well to read the paper about the Omega scheduler.

https://static.googleusercontent.com/media/research.google.c...

It is almost the rule that none of us has the breadth of experience to understand all of the problems that need to be solved. In fact, an explicit goal of Kubernetes is that almost all users should not need to understand much of how the system actually works.

Frankly, most users have neither the time, the inclination, nor the intellectual capacity to understand more than a tiny part of the systems they work on. That includes most people who say they do understand those systems completely... a few short questions can generally expose how little they know about important aspects.

That ignorance is a measure of success, btw, not failure.


The problem here is the law of leaky abstractions: https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-a...

Systems that abstract away the complexity for you work fine up to a point... but inevitably you'll end up debugging something caused by a lower layer than the one you are familiar with, at which point you need to dig in and figure out what that abstraction is doing for you.

It's not possible to understand everything, but the more you understand the more productive and effective you'll be.

I see my overall career partly as an ongoing quest to deepen my understanding of the various abstractions I rely on.


Wasn't Omega the successor to Borg? I'd be curious what its status is at Google today.


I believe there are several public articles on what happened. Some Omega features were folded back into Borg, and Omega development pivoted to k8s development. Most of Google still runs on Borg internally, but more and more people are moving projects to GCP-hosted cloud. That only really applies to things that can run Outside, and for which k8s is "good enough". Borg will continue to exist, roughly in its current form, for at least 20 more years if Google doesn't get shut down by the DOJ. It would simply not be cost-effective to migrate large parts of Google's infra off internal Borg.

In my experience, absolutely nothing in the world matches the experience inside of Google: an enormous number of enormous clusters and gobs of networking within and between them, plus a heavy set of existing workloads constantly testing it, all while people constantly deploy new versions of Borg, filesystems, databases, and applications from a single codebase. Just as importantly, there was a massive culture of cluster nerds (SRE) that ran the clusters very efficiently, but also with care and attention to a wide range of users, including ones that weren't on Big Teams. Anywhere else, I never, ever would have been able to build and run Exacycle, an idle-cycle harvester that provided at least 4 significant contributions to science.


Are you ex-Google then? Could you share more detail on what Exacycle is/was? It sounds very interesting.



I wasn't a big fan of k8s when I first started using it (note: I was on the Borg team at Google when Omega was being built, and know the original k8s developers from that time). When I started using it, it was still pretty unstable (this was at least 5 years ago).

Since then I've been using it more heavily and all I can say is this will likely be the system I spend most of my cluster work in for the next 10-20 years (after that I'll be retired!). It's enough like borg that I can translate over nearly everything, it's not broken in any fundamental way (although there are some serious bugs from 2015 that I never expect to get fixed).

One thing about Borg that was always controversial was the scripting language created to submit and update jobs (BCL/GCL). It's quirky, but I have not seen a better system for inheriting and updating templates. Even so, I'm glad the k8s authors force you to use a simple config language with minimal (or no) logic in it.


Could you elaborate on the serious bugs you think are unlikely to get fixed? I think they might help me understand the current limits of k8s better.


When people ask for something to be lightweight, what they really mean is "keep the things I need and remove the things I don't." A lot of these "why you don't need Kubernetes" articles should really be "do you really need containers for your 10-user app?"


I think one aspect your reasoning does not properly respect is that running things in containers is not all about scaling, but also about ease of deployment. A 10-user app probably doesn't need k8s, but the advantage of packing it into a container and just starting that container (on a single machine) is very nice. It's so easy to have multiple apps with different languages (or even language versions) on a single host without worrying how to install 3 different PHP versions on that machine.


Back in the day we used to call it EAR files, or static linking with resource files embedded.


Thank you. If I had a dime for every time I've uttered, how is this easier or more efficient than "deploy war", over the past 5-7 years I'd be a very wealthy man.


To me, it really comes back to how badly the Java system was managed with regard to the quality of its tools. Portability and security would always have been a draw for every other language but even the Java projects I know used containers for basically two reasons: Tomcat was unrewarding to use and the Java web stack had a lot of complexity which users were forced to deal with. If Sun had spent, say, 5% of their marketing budget on making the user experience better I don't think there would have been anywhere near the pressure to avoid it.


Somehow I get the feeling that most Java haters never had the pleasure to deal with CORBA and COM/DCOM or MTS.

So those newcomers never got the point why many of us actually enjoyed using Application Servers.


I did, actually, have the misfortune of using COM (and touched CORBA enough not to want to use it more).

I don’t hate Java but, having started using it in 1995, it felt like the community was prone to sweeping ambitions which exceeded the available time or appetite to improve further. One challenge is that while doing everything yourself allows the most control, it also means that improvements only come from within the smaller community of Java users.


Well I wouldn't call that community small, given that it grew to overtake C++ on enterprise distributed computing.


It's definitely not small but it's a subset of the entire development community. That's not wrong – maybe it allows iterating faster without supporting other languages, for example – but it can also reduce the number of people interested in working on something. Just as a simple example, a Python user can benefit from an improvement in Apache httpd made by a Java, C, PHP, etc. shop but that happens a lot less frequently with Tomcat.


That Python user can benefit from Jython and GraalVM.


Kind of, but in practice they probably aren't going to use those unless they're already committed to the Java world because they're slower, less compatible with the extensions they're probably using, and fewer people use them so you're more likely to find bugs or gaps in documentation. Again, I'm not saying that Java is terrible or that nobody uses it but simply that the community is smaller than the entire tech world — especially after Oracle's sales pressure — and that means that any given project has a greater chance of not having enough support unless a major company backs it.


Each community is tiny when measured against the whole tech world, not really getting the point.

A C++ shop couldn't care less about what a Swift shop is doing and so forth.


The point being that Java-specific things like WARs, Tomcat, etc. were almost exclusively used by Java shops whereas more general tools such as httpd are used across multiple communities. A C++ shop can host their app behind httpd/nginx/etc. the same as a Swift shop even if they don't otherwise use the same tools.


A C++ shop was most likely using COM or CORBA back in those days, I really don't get that insistence with httpd.

So, our startup created a mod_tcl similar to AOLServer, basically a TCL Application Server; no one else got to use our tooling besides our customers.


My first job involved CORBA in Java. Tomcat was better, I guess.


Well, for one thing, Docker Hub now has a helpful web badge telling you whether or not each container contains a vulnerable copy of log4j.

I'm not sure why it's taking the J2EE containers so much longer to patch than the good 'ol PHP stuff or shiny new Go apps.

/s


> I'm not sure why it's taking the J2EE containers so much longer to patch than the good 'ol PHP stuff or shiny new Go apps.

I was wondering what it'd look like if they used these badges for every language. Maybe like 5% of the outside containers I look at don't have significant unpatched CVEs, even for relatively popular projects.


I agree, yes. My history is less Java and more PHP.


...or a tarball, or a deb file or an rpm.


Perhaps as a result of incorrectly classifying K8S or K8SaaS as a PaaS when it’s very far from it. Heroku offers the same - or better - devex without the operational burden.


It's worth noting that advantage (and in my belief the large adoption of containers generally) is only valuable to interpreted languages that don't allow static linking.

For statically compiled binaries I’ve not worried about different versions of libraries for my binaries for 20 years (and the technology is decades older than that).

I do appreciate the docker file format but largely because it’s become used for true virtualization systems like firecracker. Plain Linux “containers” largely offer me very little.


Containers are still valuable for statically-linked code, as ways of comprehensively namespacing resources. (In fact, the converse is true; containers are built on the resource-namespacing features of modern Linux, which in turn were inspired by Plan 9, a distributed operating system built on *nix.)


Right. But I’ve been able to use resource limits for my Linux processes for far longer than containers were popularized. With my processes anyway that’s the big win with namespaces.

That is to say, if you control the hosts and process then the container features are available with much less ceremony while if you are hosting someone else’s process (or they you) containers don’t provide enough isolation.


You're thinking of cgroups, which is what gives you per-process-group resource limits. Namespacing gives you quite a bit more than that. You can have all services running as the same user but still not able to see each other's data. You can have all services get a unique IP address even though they're on the same kernel. You can have them all listen on port 80. They can all read from the same socket file from their perspective but it won't actually be the same file. You can have them all use different DNS providers even though they're just delegating to /etc/resolv.conf. You can't do that with just static linking. Containers give you quite a bit more than incompatible dependencies being able to run on the same host, including features that can only be provided by a kernel, not a language runtime. They allow you to use arbitrarily many different languages. You can run third-party applications you can't rewrite in Java to get a WAR or force them to statically link.
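The port-80 point is easy to demonstrate with nothing but iproute2 (a rough sketch: it needs root, and the namespace names are made up for the example):

```shell
# Each network namespace gets its own loopback, routing table,
# and port space, so both servers can bind port 80 (run as root).
ip netns add svc-a
ip netns add svc-b
ip netns exec svc-a ip link set lo up
ip netns exec svc-b ip link set lo up
ip netns exec svc-a python3 -m http.server 80 &
ip netns exec svc-b python3 -m http.server 80 &
```

Container runtimes do essentially this on every container start, plus mount/PID/user namespaces and cgroups on top.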


I think the more fundamental advantage is on the dev side:

How do you reproducibly build that static binary in a dozen different developer's environments, and in CI?

In fact, containers kind of suck in prod environments.

For example:

- Kernel/binary mismatches break vDSO, and gettimeofday becomes 100x more expensive.

- Current apache can't run in Synology Docker any more, because they require a new kernel secure RNG (even for http).

- There's no reasonable built in story for persistent storage.

VM's and bare metal solved these issues long ago.
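The vDSO cost is visible from userspace with a quick microbenchmark (numbers are machine-dependent, and the slow path only shows up when a container's libc can't use the host's vDSO, so on a healthy host you only see the fast side):

```python
import time
import timeit

# time.time() resolves via clock_gettime(); with a working vDSO this
# never enters the kernel and costs on the order of tens of ns per
# call. When the vDSO is broken, the same loop pays a full syscall
# per iteration and gets dramatically slower.
calls = 200_000
per_call_ns = timeit.timeit(time.time, number=calls) / calls * 1e9
print(f"time.time(): ~{per_call_ns:.0f} ns/call")
```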


Worse, except for stuff like what was being done at Xerox, static linking was the only way for compiled languages.

Also on desktop systems, embedding resources into the binaries has existed since the early 1980's; or just deliver everything in the same directory like NeXTSTEP bundles, or what is called xcopy deployment in the PC world, in homage to how we used to do it in MS-DOS days.


K8s makes small-scale hard, but large-scale easy. So no, for a 10 user app you probably won't need it, and would make your life a lot harder.

But one of the things many people don't grasp is that scale comes in a few forms. The obvious one everyone thinks about is the millions-of-users thing, but from what I've seen, that's not the most important one that k8s enables: it's the organisational thing. Enabling self-servicing for a dozen+ dev teams, having to manage 20+ totally different applications, or just a microservice platform?

That's where it becomes more practical to think about Kubernetes, because it forces the applications (and everybody involved) to streamline things, and you have to treat it as cattle, not pets. Metrics, logging, monitoring, failover, zero-downtime upgrades, scaling, managing secrets, network policies, auditing and security, platform upgrades, even the "replace & scrap" kind become WAY easier to manage by a very small team of people if they know what they're doing (that's another subject though).


Scheduling a large heterogeneous workload on a single cluster is both a very deep problem, and a very lucrative one for large companies to solve, in order to manage their capacity/utilization.

Many Kubernetes deployments are for a homogeneous workload, or a heterogeneous workload is split up into many Kubernetes clusters. In these cases you are bringing in a whole lot of very complex infrastructure for bin-packing, isolating workloads from each other, etc. that you are not benefitting from.


Yeah I run kube for all my small apps and it’s great. It still seems far better than the alternatives.

I don’t buy this “kube is only for enterprise” story


Kubernetes is so complex for 2 reasons:

1. A distributed system with as many design considerations as K8s will always be highly complex.

2. K8s was poorly designed by committee by people who didn't really understand how to build what they were building. It still lacks things like true multitenancy (I'm aware of the SIG) and its security model is a farce. Everything from its configuration format to its RPC to its roles and beyond are unnecessarily complex and painful. It should have been redesigned twice over by now but that will never happen at this point.

The time is right for someone to write a Kubernetes replacement and disrupt the lumbering joke that is the K8s ecosystem.


Part 1 is not true. You can implement consensus, leader election, DHTs using few simple and small libraries.
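To illustrate how small the core of one of those primitives is, here is a toy bully-style leader election in plain Python (the real work a library does for you is timeouts, partitions, and message loss, none of which this sketch handles):

```python
# Toy bully-style election: a node that sees no live node with a
# higher id declares itself leader; otherwise it defers to the
# highest-numbered live node.
def elect_leader(node_id, live_nodes):
    """Return the leader as seen from node_id (node_id must be live)."""
    higher = [n for n in live_nodes if n > node_id]
    if not higher:
        return node_id          # nobody outranks us: we lead
    return max(live_nodes)      # defer to the highest live node

live = {1, 3, 7}                # node 5 has crashed
# Every live node converges on the same answer:
assert all(elect_leader(n, live) == 7 for n in live)
```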


Those are only 3 of like 50 different design requirements of k8s. It really is a ton of functionality when you implement it all, and how it all has to combine together is quite complex. Just trying to keep a mental model of the actual execution paths to run a distributed microservice using all of k8s is more than most people could reason about without flow charts. But that's no excuse for poor design, which is what we got


I think this is related to, but not the same as, the OP's point: something you don't think of much is that this was invented by Google for Google-scale issues.

What does this mean?

Lots of things that would be relatively easy for Google make it harder for much smaller companies. Logging? Google can build a log aggregator. Hardware? Google have 100s of 1000s of servers. Documentation? Originally, certain knowledge could be kept in small teams who worked on it, the documentation then had to be generated after-the-fact, which is perhaps why the docs are quite factual and not so good at introducing some of the concepts. You want GUI tools for setting up e.g. rbac and certs? Great but Google don't because they have people who can write the relevant automation scripts in their sleep so don't need what us mere mortals might prefer when new to it.

I do like it though and use it for some of our systems.


Although people love to think of Kubernetes as “the thing google runs on that was open sourced”, it is actually a simplified version, written from scratch, based on the same principles as the tool google actually runs on.

Or used to run on, because it seems that there is no public information about the new platform after Borg.

As a comparison, Mesos was built with the same goal (an open source version inspired by Borg) and made some design choices Kubernetes hasn't.

UPDATE: I don't really know why Google spent the effort on building Kubernetes, but I believe it was for Google Cloud. And it needed to be open sourced to form a community.


> Or used to run on, because it seems that there is no public information about the new platform after Borg.

I left Google quite a few years ago, but my impression is that Borg is still going strong for many of Google's services. There was an attempt to replace it with a new system named Omega, but that ended up being more of an experiment - some of the findings of which were rolled back into future Borg development.


My understanding is that it's still borg. All the cool features from omega were backported into borg, since they were needed to support (insert product here), and in the end some of the omega team got an OK to write and open-source kubernetes based on the findings from omega.

When I left, the big ongoing project in that space was an effort to replace babysitter (most people in Google do not have to worry about babysitter, and for 98% of the time, I didn't, but the 2% where I needed to meant that I kept myself informed of changes in that space).


Your last sentence with respect to Apache Mesos led me to this previous HN discussion: Apache Mesos to be moved to Attic [1].

1: https://news.ycombinator.com/item?id=26713082


I always thought Google built Kubernetes as a commoditize your complement strategy: https://www.gwern.net/Complement


Maybe I shouldn't be surprised that so many don't understand where k8s came from.

Google doesn't run on k8s. k8s was _not_ invented or designed to solve Google-scale problems.

Since its inception, the open source community and several corporate contributors have made k8s much closer to something that can solve google scale problems, but that's definitely not where it started, or arguably, where it is now.


We moved all our apps to AWS ECS. Developers happy. Kube was installed by hired consultants, after they left it was real struggle. Then we moved to ECS.


Huh, we are moving our apps from ECS to kube and are super happy with it.

I’m not sure why anyone would use ECS. Their APIs are terrible and the tooling is nowhere near what k8s offers.

Want to have a rapid development experience like tilt/skaffold/devspace? Have fun

Want to add Prometheus and Grafana? Have fun

Want to create development environments where an app declared its dependencies and they are installed alongside and can communicate? Also have fun

There are many more of these…

Kube is cloud Linux, and ECS, in classic Amazon style, is just a terrible version of it.


> Want to have a rapid development experience like tilt/skaffold/devspace? Have fun

Wtf is tilt? I made it so after code is pushed to repo, CI/CD updates ECS Service in about a minute. Rapid enough. No third party tools used.
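That whole deploy step can be one CLI call in the pipeline (the cluster and service names here are made up; this also assumes the task definition already points at the freshly pushed image tag):

```shell
# Force ECS to roll the service onto new tasks pulling the new image
aws ecs update-service \
  --cluster prod \
  --service web-api \
  --force-new-deployment
```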

> Kube is cloud Linux

Nah. Kube smells like Google, looks like Google and is fat like Google.


tilt is a local development tool for containerized apps that handles things like file watching, auto-rebuilds, etc. It can do k8s, multiple docker-compose setups, whatever. We've liked it so far: https://tilt.dev/


I agree. I don't understand why you would use k8s on AWS. It's an abstraction on top of an abstraction doing nothing but adding complexity. If your k8s pods have to interact with S3 buckets, IAM profiles, and load balancers anyway then you can't ignore the platform. It's just somebody chasing the cloud agnosticism fallacy or doing resume driven development.


We have to support cloud providers in other countries due to local data protection laws (not every country has an AWS data center); doesn't it make sense to make your infrastructure management more or less uniform no matter where it's hosted? In some countries, providers may have only a single availability zone (in reality), or they can get very unstable from time to time (everything is down and god knows for how long), so it's natural for us in such countries to migrate our whole production between completely different providers several times a year. As an ordinary engineer whose job sometimes involves investigating problems in production or configuring something here and there (I'm not a full-time DevOps or SRE myself), I don't have to learn the details of whatever is the current cloud provider this week in the affected country N; I access the production system through a familiar interface which looks more or less the same everywhere (k8s), including our test environments, too. Am I missing something? I am pretty curious what "cloud agnosticism fallacy" means. How would you do it differently?


In your case it makes sense because you don't run primarily on AWS. I work with people who only host in AWS and still use k8s + other native AWS constructs.


I did the same for every project where k8s was used and started with ECS where k8s was proposed. Even ECS is too complex for my taste, but who am I to dictate what clients want.


I am a huge fan of ECS. It just works. You have to write a tiny bit of tooling, but it's the kind of glue code you'd probably write on top of kube anyway. Failures are straightforward and visible. There's no debugging layer after layer of network redirects. It's much harder for an errant config to "helpfully" punch a hole through your firewall. You get the same uptime and scalability benefits as k8s or EKS. Everything is still a container.


ECS or EKS?


ECS. EC2 and Fargate types.


There is no reason for Kubernetes not to be just as simple as Heroku or its many alternatives, none.

The only reason I‘d believe is that it‘s a conspiracy for long lasting job security for anyone in devops.

They managed to shit out an application server even more complex than enterprise websphere, which is both fascinating and abhorrent.

Same for terraform and all the other overengineered cloud crap companies pay their devops wizards for.


Not sure why you get the downvotes, all I can imagine is that there are a lot of people here who learned on K8s and have never tried anything different, or had to manage infrastructure at scale (In my experience scale being 30k+ bare metal machines)

99.9% of the time I see a k8s implementation it's a shop pushing a ridiculously low amount of traffic and then needing to scale their application across decades-old hardware provided by a cloud provider. It's so bad that a single, modern 4U server, like I used to manage working at a large company, could have replaced their entire infrastructure. The k8s users wind up with thousands of lines of YAML to solve a problem that could have been solved with better design decisions. The abstractions upon abstractions also prevent developers from truly understanding what is going on in the real world; for example your cloud provider's hypervisor doesn't align with your para-virtualization hypervisor, so you end up with all sorts of issues with affinity and noisy neighbors on the same hardware that you can't even see, or your racks aren't splayed right so failures wipe out a disproportionate amount of instances, and k8s then takes ages to rebalance things.

As far as the statement "there's no reason it shouldn't be as simple as Heroku", I agree completely. I imagine in a few years something like AWS Firecracker+ a simpler control plane will end up superseding k8s, and k8s will end up in the graveyard of dying virtualization orchestrators like OpenStack and CloudStack.


> I imagine in a few years something like AWS Firecracker+ a simpler control plane will end up superseding k8s, and k8s will end up in the graveyard of dying virtualization orchestrators like OpenStack and CloudStack.

Past experience makes me pessimistic that we will ever get something "easy" that everyone uses.


Doing all of that is my job and you are 1000% correct. I will always have job security now because of how ridiculously over-complicated, clunky, rigid, flaky, and bizarre all the DevOps tools are. K8s Admin is now the new DBA. Writing Terraform is a programming job in itself. It's like somebody decided we don't have enough useless people in tech so they made all this new tech be more annoying than it has to be. We could probably replace it all with a couple shell scripts and it'd work just as well.


To add to that, I find very few engineers actually "writing" anything in these IaC languages. It's like a big game of copy/paste where people simply keep repeating bad patterns that use a library that uses an API they don't understand, maintained by someone working at a cloud provider who barely understands how their own service works. Like, there are so many turtles all the way down, when you see this happen long enough it drives you crazy.


I am like that. To get new applications working in my company, I have to run a proprietary shell script that creates about 20 terraform files that I then later have to manually adapt via copy and pasting. Then I have to enter several configs/settings into our config server, etc.

In the first company I ever worked for, we could run one-click jobs that would build and deploy .war-packed applications to our Tomcats instead. They were based on 1) building the application with mvn and 2) pushing the .war to the Tomcats. Each version was running in parallel, and the version-agnostic path routing to the relevant version could be set via a dropdown.


> I have to run a proprietary shell script that creates about 20 terraform files that I then later have to manually adapt via copy and pasting.

Would it be possible to replace this with modules and variables? I've done this a few times although my scale is incredibly small at the moment.


Yes and no. There's a diminishing return after a certain amount of modules and variables. It's because the tool is shitty and limited. For example there's no such thing as inherited or global variables between Terraform modules, you have to explicitly pass them, or else pass these clunky object variables that embed lots of data and types but not the actual variable concept. And at the end of the day you need one final root module which requires a bunch of files, so you might as well use a template.
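To make the "explicitly pass everything" point concrete, here's roughly what a root module ends up looking like (all names invented): with no inherited or global variables, every value is re-declared and threaded through by hand for each module.

```hcl
# Nothing is inherited: the root module must declare and forward
# every value each child module needs.
variable "environment" { type = string }
variable "aws_region"  { type = string }

module "network" {
  source      = "./modules/network"
  environment = var.environment   # passed explicitly...
  aws_region  = var.aws_region
}

module "app" {
  source      = "./modules/app"
  environment = var.environment   # ...and again for every module
  aws_region  = var.aws_region
  subnet_ids  = module.network.subnet_ids
}
```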


Thanks for the self awareness. I do not hate devops engineers, I am fully onboard with engineering for job security, but I am against anonymously pretending it's anything but.


As soon as we brought our redis and mongo in house, k8s became easier and cheaper than heroku (and 2 other PAAS solutions that we had experimented with).

After the switch (and a single open source tool) we still didn't need any specialized devops knowledge, we just have developers in house.


A lot of easy things suddenly become very complex if you need to deal with high availability and/or a lot of distributed computers.


Like counting distributed streaming data with high accuracy and low latency - or especially calculating quantiles - for example.


Not at all. I worked on HA and distributed systems for 10+ years and the best distibuted designs are by far the simplest.


Why are you so bitter about devops people?


I am not bitter about them, personally. If I could pick one area to work in, in the field of the technical side of software engineering I would go for devops. I applaud everyone who earns their money with this dreck!

But taking a step back, it's madness. I have seen mindbogglingly complex setups 10 years ago, when I had to go begging to the old websphere greybeards when I needed anything sysadmin/devops related, yet in that time it has gotten worse. 4 of the 4 kubernetes projects I have seen personally involved highly customized code for deployment and infrastructure, that wasn't comprehensible for anyone besides the relevant devops teams. From the perspective of efficiency and the company owner(s) it appeared borderline malicious.


None of that explains why it’s so, how do I put it, fiddly? A flexible tool could still have reasonable defaults. If you rent a managed K8s cluster from AWS, you’re of course fine, but why does deploying it yourself have to be so complicated that the recommendation is now to use a “Kubernetes distribution” like K3s? Why can’t it just be a daemon you install using standard mechanisms?


Because it was quickly thrown together in an experimential/evolving language, in a design by committee way, by a very non-diverse group.

And it's getting much better at ergonomics/UX/DX (go is also getting much better), but it's very similar to OpenStack. (Python 2, resource hog, impossibly fragile, basic setup works, but uselessly incomplete without certain critical 3rd party components - eg. without Ceph and ovs it was just too big of a mess, similarly the CNI stuff and the LoadBalancers for k8s are the important missing batteries.)

That said both microk8s (snap packaged by Canonical) and kind (kubernetes in docker) are turnkey solutions. k3s is more of a vanguard :)

.. also, it can be (is already) packaged into a simple RPM/DEB easily, like the GitLab Omnibus.


There are 3 node clusters on RPi and 15000 node bare metal clusters. For which will you optimize?


First of all, are you sure there isn't a way to improve the ergonomics of one without harming the other?

Second of all, to the extent that there is a tradeoff, this is essentially a solved problem in every other kind of daemon (from web servers to databases). Make the simple case secure and simple, while making the complicated case (which will require a lot of custom configuration and reading the documentation anyway, no matter what you do) as straightforward as possible.


Probably because it was designed for scale and big companies that can afford lots of hardware don't need to invest on standalone instances.


"Kubernetes is a cluster operating system" makes sense. Some of the complexity of Kubernetes also feels like the early days of Linux where you had to basically build your own Linux distribution. Even with managed solutions like EKS, you still need to install and learn about cert-manager, how to pass the IAM credentials around, monitoring kinda sucks so you want to replace it, ...


Note that there are other cluster operating systems which make different tradeoffs than Kubernetes. Some of these are true Single System Image (SSI), but even something as comparatively simple as Plan9 is closer to the SSI end of the scale than the less hands-on, more 'declarative'-focused k8s.


I explain qualitatively to people that the 'cloud' provider is doing a lot on the control plane below the hypervisor.

K8s is a 'puff' within the 'cloud', and a bunch of additional control plane activity going on below the 'cloud cover' of the hypervisor line is now yours.

Happy days have arrived with the added IP management, scaling, DNS, &c requirements.


I personally feel like Kubernetes tries to be all the things, which creates feature and complexity bloat to be able to accommodate any possible workload you can throw at it. “Cloud Native” is a bit of a misnomer. It’s more like trying to run your own cloud versus relying on a service provider.

Sure you can run a stateful, highly available database on k8s. But you’re going to have to manage the complexity of making it work on k8s. Or you could just use a managed database service of your cloud provider (which has existed even before k8s).


i don't think k8s is particularly difficult to use/understand. at least from app developer perspective. maybe setting it up and operating it is a different story, but deploying apps to it is fairly simple.


Kubernetes is one of those products designed to appear simple and easy to get started with, to drive adoption. Only later does one find how unavoidably complex running such a system is (usually when something awful happens and nobody knows how to fix it). In that respect I miss Mesos: it was in-your-face complex and as a result had far fewer (zero for me) surprises.


Actually Kubernetes is not complex.

The requirements of most software developers are complex. Running small, self-contained, isolated applications in a highly scalable manner; scaling them automatically; killing and replacing dead services; automating a lot of dev ops problems with health, readiness, and liveness checks; load balancing things publicly as well as privately in a private VPN; creating complex ingress rules to route traffic from one load balancer to different smaller load balancers that distribute traffic across different apps; and automatically managing SSL certificates for public services via an ACME provider: those are VERY COMPLEX requirements.

If you want to run your stuff in such an advanced fashion then Kubernetes makes it EXTREMELY SIMPLE.

If you don't care about these things then don't complain about Kubernetes. Just run your shit on a single VM and deal with everything manually as and when you need it.
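To show how much of that list is declared rather than built, here's a minimal, purely illustrative Deployment (the name, image, and paths are made up) covering the health-check and self-healing pieces:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                     # name and image are illustrative
spec:
  replicas: 3                   # dead pods get replaced automatically
  selector:
    matchLabels: {app: web}
  template:
    metadata:
      labels: {app: web}
    spec:
      containers:
        - name: web
          image: example.com/web:1.0
          ports: [{containerPort: 8080}]
          readinessProbe:       # no traffic until the app is ready
            httpGet: {path: /healthz, port: 8080}
          livenessProbe:        # restart the container if it hangs
            httpGet: {path: /healthz, port: 8080}
```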


I think a reasonable design approach is to consider Kubernetes an application management system, just as Postgres is a database management system. Put your stateless pods in there, wire them together with DNS, and inject config into them.

Having that philosophy, and optimising that case so it's easy for coders to deploy into it is a big win IMO.


Am I wrong when I say this type of opinion normally comes from infrastructure personnel? I say that because I've never seen a standard so easy and seamless for a developer to get his code running.


> I say that because I've never seen a standard so easy and seamless for a developer to get his code running.

You deploy using extremely long YAML files, your app interacts with a lot of complex software (secret storage, ingress controller etc) and accessing your app for debugging is very non-straightforward. Once you understand the concepts and stack it's great, but when you're coming from locally running your app or even docker, it's as opaque as it gets.

In my experience, it's actually worse for developers, because infra people are forced to learn k8s as direct part of their job, while for developers it's more of a sidenote.
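To be fair, once learned, the debugging loop does collapse to a handful of standard kubectl verbs (the pod/service names below are invented):

```shell
kubectl logs deploy/web --tail=100     # container stdout/stderr
kubectl exec -it deploy/web -- sh      # shell inside a running pod
kubectl port-forward svc/web 8080:80   # app on localhost:8080
kubectl describe pod web-abc123        # events, probe failures, etc.
```

But none of that is discoverable the way `docker logs` / `docker exec` on a local daemon is, which is the learning-curve complaint.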


I agree it is a steep learning curve coming from docker only. But if you have to scale your app or other resources quickly and easily, I think k8s is the best of the bunch. I have found other solutions more difficult (docker swarm, paas platforms like heroku, a custom nginx and a lot of docker hosts etc..).

If you don't need that easy scaling, then I don't think there is a need for k8s.


> I say that because I've never seen a standard so easy and seamless for a developer to get his code running.

"deploy war" was pretty damn easy.


Very seriously: Do yourself a favor and try out a real PaaS like Heroku sometime. Running a real, production-ready system on Heroku is truly as easy as "git push", and the concepts are way, way simpler to understand. It's not going to be the right fit for everybody, but as far as just pure devex goes, it's truly excellent.


People have mentioned here that a Heroku-like layer would be incompatible with k8s. Based on the description of k8s as a distributed OS, I don't understand how to reconcile these things.

I'd want to deploy a k8s infrastructure, but have an easy experience on top for developers that might not maximize k8s use initially, but allow devs to move down a layer eventually. The whole time the ops folks would be running k8s and limited layer above that'd keep devs in a particular lane.

Isn't this feasible?


I think several companies are trying to sell exactly that. if you look up VMWare's "TAP" offering it's attempting to provide a wide umbrella of components that move in that direction. Others in the space include Redhat, Replicated, Rancher.

A few individual components that try to bring back the ease of rapid/easy deploys include: Flux, Tilt, Argo, kapp-controller.

They all have slightly different priorities and make different compromises, but anyway, just to assure you, there are quite a few people trying to build and sell the layer you're describing.


> I've never seen a standard so easy and seamless for a developer to get his code running.

Docker Compose is a breeze compared to Kubernetes.
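For comparison, a complete local deployment in Compose is about a dozen lines. This is just an illustrative sketch; the service names, ports, and images are placeholders:

```yaml
# docker-compose.yml -- a complete local deployment: one app, one database
services:
  web:
    build: .            # build the app image from the local Dockerfile
    ports:
      - "8080:8080"     # expose the app on localhost:8080
    depends_on:
      - db              # start the database first
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: example
```

`docker compose up` and you're running; the Kubernetes equivalent needs at minimum a Deployment, a Service, and usually an Ingress.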


Really? I miss WebSphere dearly when doing Kubernetes as a developer.


> "you will not, in general, see that error at creation time. "

I agree, this was daunting for me too.

I use werf.io now (an open-source project which I am not affiliated with; I am just a user). It deploys my resources and waits to validate whether the deploy succeeded. They call it "giterminism".

You can do this yourself with scripts for sure, but I have not needed to extend any of their built in functionality yet.
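For reference, a bare-bones "deploy and wait" script can be done with stock kubectl alone; the deployment name and manifest file here are placeholders:

```shell
#!/bin/sh
set -e

# Apply the manifests, then block until the rollout actually succeeds
# (exits non-zero on timeout, so CI fails instead of silently passing)
kubectl apply -f deploy.yaml
kubectl rollout status deployment/myapp --timeout=120s

# Alternatively, wait on a resource condition directly
kubectl wait --for=condition=Available deployment/myapp --timeout=120s
```

This covers the common case; tools like werf add richer tracking (events, logs from failing pods) on top of the same idea.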

Highly recommended for anyone using k8s with CD.

https://github.com/werf/werf


Let me SSH into a cluster and navigate the configuration database like a virtual filesystem. Make it feel more like a traditional operating system. That will make Kubernetes more approachable.


What you just said is fun, because that is a design choice in Kubernetes.

You DO have a virtual hierarchical configuration. It's just that it's not a text file, it's a JSON object.

The whole Kubernetes thing is actually that: a single JSON document defining a supercomputer. Changes to the JSON become changes in the supercomputer.

The whole complexity aims at just that: making that task easy.

You have a lot of tools to interact with that JSON, but you only really need an HTTP client, because it is exposed as an API.
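To make that concrete, here is a sketch (Python, stdlib only) of the JSON document for a Pod and the API path any HTTP client would POST it to. The cluster address and authentication are omitted; the pod name and image are placeholders:

```python
import json

# The entire "desired state" of this Pod is one JSON document.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "hello", "namespace": "default"},
    "spec": {
        "containers": [{"name": "hello", "image": "nginx:1.25"}],
    },
}

# Any HTTP client can create it -- no kubectl required:
#   POST https://<apiserver>/api/v1/namespaces/default/pods
# (with an Authorization header and Content-Type: application/json)
body = json.dumps(pod)
print(body)
```

kubectl, Helm, and the rest are conveniences layered over exactly this request.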



Without looking into it, I bet there are companies in the CNCF landscape selling exactly that.


I was thinking the exact same thing. Something like /proc or /dev.


What we need is a real distributed OS.


Plan 9 was made for this.


why is explaining why kubernetes is so complex, so complex?

could somebody ELI-RP? (start with a collection of raspeberry pi's sitting on my desk and take it from there)


Kubernetes is very complex because the problem it is solving is far more complex than that. It is actually a simplification of the problem, with a (somewhat) simple solution.

The problem is: the datacenter operating system.

That is, in other words: a distributed operating system, or global-scale software scheduling (because an operating system exists to run software).

Why do we need something like that? Because it is better (today) to connect millions of small computers than to build one very big one. But you still need to operate and manage all those small computers.

A distributed operating system is a way to abstract away the fact that you have millions of small computers instead of one big one.

Now, bear in mind that this is a very hard problem, one people have been thinking about and trying solutions for decades.

Kubernetes is a solution to a simplified version of that problem, built while informed by the experience gathered by Google while running their own solution, called Borg.

With the experience and knowledge gained from 12-plus years running Borg, Google built a very good solution to the problem.

As a user, I believe Kubernetes is the closest we can get (right now) to a user interface where I can just ask for my software to run.

Why is this not as simple as a double-click on an icon somewhere? Because software meant to run at global scale is not simple software (it is probably complex software composed of several small, simple parts). So the interface is tuned for more complex software, and is thus more complex.

As a system administrator, Kubernetes is very complex. But smartphone hardware and software are far, far, far more complex than Kubernetes; it's just that you are not exposed to it.

So, is Kubernetes complex? Yes. Can you probably figure out a simpler way to run a container? Yes.

Can you build a simpler solution to global scale software scheduling? Well, you can try.

So you don't need to run a global-scale computer, just a tens-of-servers computer (or a thousands-of-servers computer)? Maybe Kubernetes is overkill for you, but if you use it, you'll benefit from the whole community built around it [1].

1 - Meaning: your own solution for a tens-of-servers computer may be very good, but it will be complex enough that you will have a harder time finding people willing to use it, or teaching people how to use it, than you would have finding people already familiar with Kubernetes.


I think this is a very good explanation.

I'd add a caveat though. Kubernetes is designed for extremely high reliability at datacenter scale. If you have a suite of applications that run at that kind of scale and require that level of uptime, then kubernetes is an excellent tool for managing that complexity.

If your application doesn't need that scale and reliability, Kubernetes is probably overkill and adds more complexity than it saves. And frankly, your application probably doesn't need to operate at that scale, and it probably doesn't need that level of uptime.

Of course, your mileage may vary. If you can handle Kubernetes' upfront complexity, then scaling up is a solved problem.


thanks, lots of good pointers to follow up on, though I suspect there is a load of terminology whose precise meaning is more context-dependent. For example, the Wikipedia distributed operating system [0] entry does not seem to really apply.

Turns out somebody did build (some version of) a Kubernetes cluster on Raspberry Pi [1], so I might get to the bottom of it :-)

[0] https://en.wikipedia.org/wiki/Distributed_operating_system

[1] https://ubuntu.com/tutorials/how-to-kubernetes-cluster-on-ra...


It applies actually. I forgot to explain why Kubernetes solves a simplified version of the problem.

The simplification is this: you could build a distributed operating system that completely hides the fact that you have millions of computers. That is, let's say, freaking hard.

But you can build one where this fact is not hidden, just alleviated. That is only super hard. This is what Kubernetes does.

Also, you can break those millions of computers into smaller groups of thousands. This is what Kubernetes is aimed at. But you can still make those groups work together using external tools, like load balancers, global DNS zones, and networks.

UPDATE:

A very interesting explanation of the problem and the solution used by Borg and Kubernetes is in this lecture: https://youtu.be/0W49z8hVn0k It's quite insightful and actually easy to understand.


Excellent question. Because it has plenty of unnecessary complexity.


Unnecessary complexity -for your use case-. Not for the set of use cases it's trying to solve. It's hard to view complexity caused by stuff you don't care about as anything but unnecessary.


it may be. can't really tell until somebody maps it to a universe that (I think) I understand...

but I suspect, this being a hype of sorts within certain tech circles (with associated professional incentives, etc.), there aren't many around who have both the in-depth knowledge and the desire to really de-obfuscate it and break it down


Because it's taking care of a complex problem with lots of necessary complexity (though how complex it seems depends on whether you learn it bottom up or top down, IMO)


> hype of sorts within certain tech circles

The downvotes on my comments and the complaints are telling.


Telling you that you don't know what you're talking about.


That's what you want to believe.


Kubernetes is complex because it was built for "The Datacenter as a Computer" http://www.cs.yale.edu/homes/yu-minlan/teach/csci599-fall12/... (PDF)


I would love to see an article from the same author about nomad, and comparison between the two…


I have barely any Kubernetes experience, but what struck me is that it seems to involve a lot of interfaces that need to be implemented for different environments. So you need to know a lot about those specific implementations.


I have an axiom I mention on almost every occasion I can when people ask for more granularity on a specific request. "A system that is infinitely configurable is infinitely complex."


Does Kubernetes have some Java or Angular code in it? (end of joke)

I remember somebody telling me that the problem with YaCy (a free-software alternative to Google Search) was that it was written in Java.


Complexity slows down future potential competition.


Kubernetes is so complex because vendors can make huge piles of money selling solutions to handle the complexity.


Declarative = goodFunctionName(Imperative)
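Tongue-in-cheek, but that "good function" has a name in Kubernetes: the reconcile loop. Here is a toy sketch of the idea in Python; the data shapes are made up for illustration and this is not the real controller API:

```python
# Toy reconcile loop: turn a declarative "desired" map into imperative
# create/update/delete calls against the "actual" state. This is the
# shape of a Kubernetes controller, drastically simplified.
def reconcile(desired: dict, actual: dict) -> dict:
    for name, spec in desired.items():
        if actual.get(name) != spec:
            actual[name] = spec          # imperative: create or update
    for name in list(actual):
        if name not in desired:
            del actual[name]             # imperative: delete
    return actual

cluster = {"web": {"replicas": 2}}
wanted = {"web": {"replicas": 3}, "db": {"replicas": 1}}
print(reconcile(wanted, cluster))
# {'web': {'replicas': 3}, 'db': {'replicas': 1}}
```

The user states the "what" (desired); the loop owns the "how" (the imperative steps), and re-running it is always safe.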


Didn't K8s add an autopilot or similar feature to make it easier to grok?


Google's Kubernetes as a service (GKE) has a "simpler" Autopilot mode:

https://cloud.google.com/kubernetes-engine/docs/concepts/aut...


To summarize, are you trying to say it's hard to debug problems on Kubernetes if you have deployed a controller/operator that is not well written? If so, does that mean Kubernetes is complex? (I agree k8s is complex.)


off topic, but buttondown.email looks very good for doing newsletters


Agreed. In fact, I signed up last week mainly because it is one of the few tools that makes it easy to add footnotes to your writing (although as a reader, the anchor links for footnotes don't work reliably in my email client).

Until I discovered that Substack also supports footnotes :).



