I'll stand by my assertion that for 99% of users (maybe even 99.99%), Kubernetes offers entirely the wrong abstraction. They don't want to run a container, they want to run an application (Node, Go, Ruby, Python, Java, whatever). The prevailing mythology is you should "containerize" everything and give it to a container orchestrator to run, but why? They had one problem, "Run an app". Now they have two, "Run a container that runs an app" and "maintain a container". Just give the app to a PAAS, and go home early.
Most startups - most large companies - would be far better served by a real PAAS rather than container orchestration. My experience with container orchestrators is that ops teams spend inordinate amounts of time trying to bend them into a PAAS, rather than just starting with one. This is why I don't understand why this article lumps, e.g., Cloud Foundry in with K8S - they solve entirely different problems. My advice to almost every startup I speak to is "Just use Heroku; solve your business problems first".
The article also mentions it enables a "new set of distributed primitives and runtime for creating distributed systems that spread across multiple processes and nodes". I'll throw out my other assertion, which I always thought was axiomatic - you want your system to be the least distributed you can make it at all times. Distributed systems are harder to reason about, harder to write, and harder to maintain. They fail in strange ways, and are so hard to get right, I'd bet I can find a hidden problem in yours within an hour of starting code review. Most teams running a non-trivial distributed system are coasting on luck rather than skill. This is not a reflection on them - just an inherent problem with building distributed logic.
Computers are fast, and you are not Google. I've helped run multiple thousand TPS using Cloudfoundry, driving one of Europe's biggest retailers using just a few services. I'm now helping a startup unpick its 18 "service" containerised system back to something that can actually be maintained.
TLDR; containers as production app deployment artefacts have, in the medium and long term, caused more problems than they've solved for almost every case I've seen.
Containerization helps with one thing: end-to-end dependency hell management. You get the same executable artifact in prod and on every dev machine. You get to share arcane tricks required to bootstrap library X. You get to share the complete recipe of building your OS image. Hopefully, you pin versions so your build is not subject to the whims of upstream.
Kubernetes helps with one thing: taking your container and running it on a fleet of machines.
Building 18 services is an architectural choice made by the team. It has nothing to do with containerization or Kubernetes. For a single team, a monolith just works most of the time. You may consider multiple services if you have multiple [large] teams, think Search vs. Maps. Even then, consider the trade-offs carefully.
I deploy code with all of the DLLs in separate folders. The executables/services don't share any DLLs. I kept asking the "consultants" who were trying to push us to Docker: what is the business value over raw executables + Nomad?
The build server creates one zip file that is stored as an artifact that gets decompressed and released in each environment - in a separate folder.
> what is the business value over raw executables + Nomad.
It's not a given that any of the major business value generators are relevant to your shop, your domain, and your business demands. KISS is always good advice.
Low hanging fruit: Nomad (backed by HashiCorp) is a direct competitor to Kubernetes (backed by Google). One of those solutions is available turn-key on every major cloud provider and is also the premier enterprise VM management solution. The other is called Nomad ;)
Raw executables pack up very nicely into containers, so if you're able to exist happily with just apps, then just apps in containers won't change much (and will therefore look like extra work)... For numerous domains, raw executables are just a percentage of the deployment. Be it third-party apps/drivers that need to be installed, registry fixes, or whatever else Ops demands for server maintenance, raw executables alone are a non-starter. And then things like load balancing and dynamic scaling pop up...
More importantly, for what I do, the binary validation of an immutable server in multiple zones is critical to ensuring security. Nothing can be changed, nothing shall change, and every point of customization will be scripted, or else it can't get near our data.
Cross-platform and legacy scenarios are major players. More pressing, though, are the application-level primitives that k8s provides in a cross-platform, cross-cloud manner (which can also be federated...), so that your scaling story is adequately handled and your local apps become much more robust and cloud-native.
Bottom line: it's not a given that k8s will improve your life, here and now, apps + Nomad is viable. For the broader eco-system though the "other stuff" in k8s, and the rigidity/stability of dependency graphs in containers, are clear value drivers and highly meaningful.
Yeah, KISS was very important when I first started working at my current company. I was hired to set up a modern development shop with three database developers who were just learning C#, no source control, no CI/CD, and basically no development "process" - they ran a lot of things manually.
I was going to be introducing a lot of changes.
Every decision I made was based on keeping things as simple as possible to keep them from getting frustrated. If that weren't the case, I would have gone straight to Docker. Knowing that I might need that flexibility later but didn't want to commit right now, I chose Nomad because I knew it could both handle phase 1 and allow us to move to Docker once appropriate.
But now that we are in AWS, there is a big push to get to the next level of cloud maturity - not just moving VMs to the cloud, but figuring out how to take a "cloud first" approach and actually take advantage of some of the features that AWS offers.
So in that vein, there is a need for Docker to go "serverless". Lambda is not an option - we have long running processes.
Even when we do go to Docker, we will probably make a transition from Nomad straight to Amazon's Fargate.
I see a path where we move from .Net 4.6 to .Net Core and Docker with Nomad to Fargate.
The only issue with Fargate for us now is the added complexity that Fargate only supports Linux containers. I don't know how much of a lift that would be. Theoretically it shouldn't be much with a pure .Net Core solution.
You are remarkably well positioned to take advantage of any solution, from what you've told us.
My group is skipping Kubernetes to go straight to Fargate and we are... not so well positioned as you happen to be.
Much to my chagrin, as a newbie to AWS who has loads of homegrown experience with Kubernetes and its predecessors (Fleet, etcd), I am wholly reliant on the AWS solutions engineers we have in-house to help me navigate this thing via CloudFormation and friends. It's too much for one person to figure out in 20 hours during a pilot/assessment study.
I am an application developer who learned Kubernetes in his free time over the past 3 years because it was free. There are thousands of us, with computers in our basements, learning these systems on our own, with no institutional support. Sure, I needed lots of help, but I didn't have to spend money on cloud instances just to learn, or be sure to remember to terminate them when the experiment was over.
By contrast, AWS has only just made Amazon Linux 2 available to run on your own machines less than two months ago. There is still no way to set up ECS or Fargate on our own metal, and probably never will be, because Amazon does not see a reason for it.
Vendor lock-in is real and it has casualties! There are real negative effects that you don't see. If you say "I would not hire someone like you because you have specific skills I won't take advantage of," you have to ask yourself whether that's because of something I've done or something Amazon is doing.
If you're juggling multiple incompatible versions of application servers across multiple platforms across multiple datacenters and multiple cloud providers with multiple dev teams... you're gonna see some real value in those kinds of "solutions". It's not random that this tech is coming from cloud leaders and Enterprise shops; they don't address problems common to development, they address problems common to cloud apps and cloud-heavy Enterprise shops.
I think Assembler looks like ass and it doesn't add much to how I want to program... It's still frequently used, though, because it solves problems other than the ones I have.
We have a bunch of apps that run based on some type of external event - a time interval (Nomad supports cron like scheduling across an app pool), a file coming in, an external event, etc.
We submit a job via the API and it runs the job on whatever server has available resources. We specify the minimum amount of RAM and CPU needed to run a job. If too many jobs are queued on a regular basis, we can either add more RAM or CPU to an existing instance or add another instance and install a Nomad agent.
Yes, I know k8s can do the same thing, but we don't have to use Docker - though we can.
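To give a feel for it, a job submission against Nomad's HTTP API looks roughly like this (a rough sketch only - the job name, command, schedule and resource numbers are all made up):

    # Register a periodic batch job with Nomad (POST /v1/jobs). Values are illustrative.
    import requests

    job = {"Job": {
        "ID": "nightly-import",
        "Name": "nightly-import",
        "Type": "batch",
        "Datacenters": ["dc1"],
        # cron-like scheduling handled by Nomad itself
        "Periodic": {"Enabled": True, "SpecType": "cron", "Spec": "0 2 * * *"},
        "TaskGroups": [{
            "Name": "import",
            "Tasks": [{
                "Name": "import",
                "Driver": "raw_exec",                        # plain executable, no Docker
                "Config": {"command": "/opt/app/import.exe"},
                "Resources": {"CPU": 500, "MemoryMB": 512},  # minimum MHz / MB the scheduler must find
            }],
        }],
    }}

    requests.post("http://localhost:4646/v1/jobs", json=job).raise_for_status()

The scheduler then places the task on any node that can satisfy those resources, which is exactly the behaviour described above.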
That is a solid solution. As long as everyone in the ecosystem is on JVM, more power to you. If, for example, one needs to interface with some cutting edge DL modules written in Python, one needs something else. The transitive closure over "something else" is called "containerization".
PS. Maybe "EAR" also supports Python. But then I'd argue "EAR" is a "container".
With python you then use wheels + virtualenv, for Ruby you can use bundler. Each language has this issue solved.
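For Python, that boils down to something like the following (a minimal sketch with illustrative paths; on Windows the pip path differs):

    # Create an isolated environment and install pinned wheels into it - no OS image involved.
    import subprocess
    import venv

    venv.create(".venv", with_pip=True)   # isolated interpreter + site-packages
    subprocess.check_call([".venv/bin/pip", "install", "-r", "requirements.txt"])
    # requirements.txt pins exact versions, so the build isn't at the mercy of upstream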
Using containers is essentially:
- uh, I have problem with these dependencies, dealing with RPMs is such a nightmare, I need to generate OS specific packages and there might be conflicts with existing packages that are used by the system...
- oh I know, let's just bundle our app with the entire OS!
If they are minimalistic and hold just the app, then this makes sense, and containers become essentially a unified packaging format that is accepted by "serverless" public clouds. This provides value because you can then easily run your application on multiple providers, wherever it is cheaper at a given time.
I'm thinking that in the future your IDE could just compile your project into a single file that you can then upload anywhere and just run.
But Docker was promoted as something different, with the union FS, NATing, etc. That works fine for development, but it's a bit problematic to operationalize.
This is what I noticed as well. Most of the things containers are advocated for are already solved.
The selling point of containers is to solve certain issues (seems like package management, removing dependency on the OS etc are the most popular).
To me it looks like instead of fixing the actual issues, we are throwing a blanket over all of that crap and building our beautiful solution on top of it.
We have a beautiful world with unicorns on top of a dumpster fire of mixing system dependencies with our application dependencies.
Also, yesterday I found something amusing: a coworker was complaining that putting a small app into the standard base container resulted in an image that was almost 1GB in size, compared to ~50MB when using a minimalistic one. When I asked why not just use the minimalistic one, I learned that it was mandated to use the standard image for everything.
To me this is absurd, since by doing that aren't we essentially coming full circle?
Absolutely. I think the actual issue is the OSes' directory structure (FHS, for example), which impedes having libs/packages isolated and coexisting with their different versions.
Containers add a heavy abstraction on top of that. For me the simpler/better dependency management solution is nix.
No, and it makes perfect sense because
1. Container size is a minor issue, docker images are layered so you only fetch the diff of what's on top anyway
2. Standardization simplifies knowledge sharing, i.e. someone else can help you
As usual, "it depends." JEE isn't magic and app servers have their own issues. I think you're better off packaging Java web apps as self-contained fat jars (see Dropwizard, Spring Boot).
For the dependency hell management part, nix is a solution that operates at a lower-cost level of abstraction: it doesn't emulate the whole OS (avoiding overhead) and keeps dependencies isolated at the filesystem level (under /nix).
I think that for reproducible development environments it is a much simpler solution.
I tend to agree with you and it's one of the biggest reasons that I'm a fan of Elixir.
Here's the path that leads to K8s too early.
1. We think we need microservices
2. Look how much it will cost if we run ALL OF THESE microservices on Heroku
3. We should run it ourselves, let's use K8s
One of the big "Elixir" perks is that it bypasses this conversation and lets you run a collection of small applications under a single monolith within the same runtime...efficiently. So you can build smaller services...like a monolith...with separate dependency trees...without needing to run a cluster of multiple nodes...and still just deploy to Heroku (or Gigalixir).
Removes a lot of over-architectural hand-wringing so you can focus on getting your business problem out the door but will still allow you to separate things early enough that you don't have to worry about long term code-entanglement. And when you "need" to scale, clustering is already built in without needing to create API frontends for each application.
It solves a combination of so many short term and long term issues at the same time.
100% agreed. A lot of the cloud computing products are simply re-implementations of what was created in the Erlang/BEAM platform, but in more mainstream languages. IMO it's cheaper to invest in learning Erlang or Elixir than investing in AWS/K8s/etc.
Elixir and Erlang are basically a DSL for building distributed systems. It doesn't remove all of the complications of that task, but gives you excellent, battle tested, and non-proprietary tools to solve them.
This is also true of Erlang, for those not aware that Elixir runs on the Erlang Virtual Machine (BEAM).
You do get a lot of cool things with clustered nodes though (Node monitors are terrific) and tools like Observer and Wobserver have facilities for taking advantage of your network topology to give you more information.
Not going to lie, Java app servers basically had me predisposed to see the appeal of Elixir. When I was spending a lot of time with Ruby I got really into Torquebox (Ruby-ized JBoss) specifically for the clustering aspects, ability to spread workers and clustered cache with Infinispan.
Elixir has a lot in common, but it takes it to another level. You can call functions from those other applications on the server with nothing more than a Module.function(arguments). You can call a function on another node in the cluster by just sending the node + module, function and arguments.
Because of immutability and message passing, this just works everywhere. With Java, a similar implementation would have to guard against memory references and mutex locks that wouldn't behave the same way on different nodes.
Interesting, I didn't know that about Elixir. Do you ever have to break them up into smaller Elixir apps or can you stick with that pseudo-monolith for good?
Each of the three digital production agencies I've worked with has the same problem: jobs come and go all the time, often have varied tech stacks (took over a project from a different company, resurrected 5yr old rotting dinosaur, one team prefers Node, another Django, etc), each project requires a dev/staging/live environment (and sometimes more than that, e.g. separate staging for code / content changes), and so on... In one shop we went thru 500 git repos in 4 years.
One day I spun up a k8s cluster on GKE and just started putting all projects there. This cluster enabled huge cost savings (running a fleet of 3 VM's instead of ~50), allowed cheap per-feature dev/staging environments, forced developers to consider horizontal scaling BEFORE we needed to scale (read: when we missed our only shot), and overall reduced ops workload tenfold. It wasn't without a few challenges of its own, but I would never go back.
I think you've hit on the major issue with the "anti-hype" around kubernetes and related products: they're not something you need, per se, to develop an app. They are something you need to manage multiple parallel development processes.
For devs stuck in a silo it's a little like putting margarine on butter. For DevOps looking at hundreds of little silos it's the foundation of operational sanity.
To sort of echo what you're saying, most of these articles seem to suggest that containers solve a technical problem. More often than not, I've seen them as a solution to an organizational problem.
Kubernetes has helped to make our app less distributed.
Parts of the system were distributed not for capacity, but for HA reasons. So where before we had two instances of beanstalkd with their own storage and clients had logic to talk to both, we now have a single instance of beanstalkd backed by distributed storage and a Kubernetes service that points to it.
And I think we get more benefit deploying dependencies than we do our own apps. If one of them is low volume and needs mysql, just `helm install mariadb`. No complicated HA setup, no worries about backups, we already know how to backup volumes.
> I'll stand by my assertion that for 99% of users (maybe even 99.99%), Kubernetes offers entirely the wrong abstraction. They don't want to run a container, they want to run an application (Node, Go, Ruby, Python, Java, whatever)
I agree completely and your comment gives me the perfect opportunity to praise how much I love the flexibility of Hashicorp's Consul+Nomad.
Nomad lets you run almost anything - Docker containers, executables (the raw_exec driver), jar files, etc.
Dead simple to set up - one self-contained < 20MB executable that can be used in client, server, or dev mode (client + server); configuration is basically automatic, as either a single server or a cluster, if you are using Consul.
The stock UI is weak, but the third-party HashiUI is great.
I played with Vault and it wasn't quite as simple as Consul and Nomad. My major issue was trying to figure out how I would "unseal" the Vault automatically in case of a system restart.
I punted for now and just stored sensitive values directly in Consul encrypted.
since you mentioned Cloudfoundry...
I think it's a thousand times easier to get up and running with k8s, than with Cloudfoundry on Bare-Metal (no Cloud).
It's also a thousand times easier to maintain. (Thanks CoreOS)
Basically, if you want a managed, simple, no-maintenance, no-cost bare-metal K8S installation, you just use Tectonic/kubeadm and you get something which is self-contained, or close to self-contained.
and the only things you need to get it done are actually way easier than reading through the CF docs (I'm pretty sure bare-metal isn't even supported that easily).
Amazon's EKS is still in preview. I wouldn't expect it to be generally available (that is, stable) for several months at least. I've also heard reports that getting into the preview is really difficult at the moment.
> Alpha This is an experimental release as part of the Amazon EKS Preview. Interfaces and functionality may change. Expect bugs (and please help us squash them). DO NOT use for production workloads.
I agree that most startups should work at a Heroku level of abstraction.
You mention 18 microservices; I think that small teams are better off with a monolith.
I would see Kubernetes as a new machine level. We're moving from bare metal, to VMs, to container schedulers.
Heroku was one of the first companies that ran a container scheduler internally. So I think we agree that is the future.
But a small team probably doesn't need to work at that abstraction level.
At GitLab we think most teams will want to work at a higher abstraction layer. Just push your code and have it deployed on Kubernetes. Without having to write a dockerfile or helm chart yourself.
The funny thing is I have 3 courses on Docker and I'm a Docker Captain but I pretty much agree with what you wrote about container orchestration.
A lot of people forget that you can just put your application up on 1 server and serve hundreds of thousands or millions of requests a month without breaking a sweat.
For that type of use case (1 box deploys), Docker is still amazingly useful so I would 100% containerize your apps for that purpose, but I agree, Kubernetes and container orchestration in general is overkill for so many projects.
I agree with this for the most part, but wanted to point out that docker's first big success was as a dev tool. Solving the "it works on my machine" problem, or the "oh you forgot to install v13.1.2 of X and then uninstall it and then install v12.4 because that's the only way it works for some reason" problem. So, avoiding k8s in order to avoid docker seems odd.
That said, a good number of projects don't require anything special about the environment other than a runtime for the app's language, where the remaining dependencies can be explicitly included in the build. For those, I agree, jumping on docker/k8s right away is overkill.
An additional benefit of working with something like Heroku initially, is that it will help guide your devs to sticking with more tried and trusted stacks rather than everyone pulling in their own pet project into the business's critical path.
I agree with pretty much everything you said and it's very heartening to not be the token Cloud Foundry person in the comments.
As a nitpick:
> This is why I don't understand why this article lumps, e.g. Cloud Foundry in with K8S - they solve entirely different problems.
In fairness, the reference was to Cloud Foundry Diego, which is the component most analogous to Kubernetes. And they are of comparable vintage. Diego never found any independent traction outside of CFAR.
> I've helped run multiple thousand TPS using Cloudfoundry, driving one of Europe's biggest retailers using just a few services.
We have customers doing millions of payments per hour, billions of events per day. Running tens of thousands of apps, thousands of services, with thousands of developers, deploying thousands of times per week.
CFAR doesn't get much press out of enterprise-land, but it works really well.
Disclosure: I work for Pivotal. We have commercial distributions of both Cloud Foundry (PAS) and Kubernetes (PKS).
There's even a higher-level desire: what users really want isn't a place to run their app, but the function the app provides. E.g., in a more microservices-like environment with a service that simply looks things up in a database, what they really want is just query access to the data. But now they have the data in some DB, the DB in some container, some API written using the latest API style, some software to provide the API (also in a container), some container orchestration to coordinate everything, load balancers, caches and so on.
So there's all these layers of stuff that sit between the user and the data just to make the act of asking WHERE DATATHING="STUFF" convenient.
The root of this is really people making distributed systems when they don't need to. This microservices trend really is a massive waste of resources for most smaller teams that get caught up in it.
You should check out Docker Swarm. The UX of Swarm is brilliant - use a 10-line compose.yml file to get a stack up and running. Lets you specify tons of stuff if you want to.
The batteries-included nature of Swarm is a huge help as well - with k8s, you have to muck around with overlay networks, ingress, etc.
However, I think the writing on the wall is clear - k8s has won. Probably even to Docker Inc, given the Kubernetes integration they are building into Swarm now.
I think Docker Swarm can exist as an opinionated distro of k8s. I wouldn't mind paying money for that.
This is why I primarily see Kubernetes as a set of low-level primitives for a PaaS to build upon.
We don't use Kubernetes at my shop; we've begun to use OpenShift though, which layers PaaS tooling on top of it, and the developers on my team love it. They create a deployment, point it at the git repository containing their code, set their configuration, and the app is live - the underlying primitives are still available if we need them, but that's for me to worry about as the DevOps guy and not the developers.
The Kubernetes team often says that one of its goals is to be a "low level" project that additional tools/services/... build on under the hood.
Helm (https://helm.sh/) allows you to define an app as a collection of K8S components and then to manage (= deploy, update, ...) your app as a standalone component.
Clarification: 18 containerised services can absolutely be the right choice. It’s just my experience says the trade off between the costs of maintaining that versus a smaller PAASed system rarely come out in favour of it.
I think it's overrated though - not open source, doesn't have an ecosystem... the dev experience is subpar - services take too long to come up even with a one-node cluster on a beefy laptop. Plus you cannot run the service outside of SF as an exe now.
I migrated a decent sized solution still in dev from SF to .netcore and SF - 10/10 would do it again. Not to mention that you also end up saving 50% $$$ on vm costs with linux vms (not considering SF on Linux)
Thanks for sharing your experience; I did not understand it in full.
Do you recommend using SF or not? You mention that you would do it again - was that only about moving from Windows .NET to .NET Core on Linux (i.e. .NET Core rocks?) while the rest about SF is crap, or would you recommend SF in general for any future work (instead of, for example, Akka.NET for service coordination in a cluster)?
> Kubernetes takes you to serverless, where you don't care about the hardware.
Serverless isn't a good name - but it doesn't stand for "don't care about the hardware". Devs already stopped caring about hardware with the move to VMs.
What serverless removes is the abstraction level of a server/vm/container.
A simple example is scaling your stateless components. In a serverless FaaS, functions are scaled for you. You don't have to do anything to handle a peak in web traffic. You don't have to do anything to handle a peak of msgs in your MQ.
In k8s, you still have to go and fumble around with CPU/memory limits and had better get them right. k8s also doesn't scale your containers based on the msgs in your MQ out of the box. You have to build and run that service yourself (or ask GCP to whitelist you, should you be running their MQ: https://cloud.google.com/compute/docs/autoscaler/scaling-que... ).
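The kind of glue service you end up running yourself looks roughly like this (a minimal sketch using the official kubernetes Python client; queue_depth() is a stand-in for whatever your MQ exposes, and the deployment name and thresholds are made up):

    # Poll queue depth and resize a Deployment accordingly. Illustrative, not production code.
    import time
    from kubernetes import client, config

    def queue_depth():
        return 0  # placeholder: poll your MQ's management/metrics API here

    config.load_incluster_config()   # assumes this runs as a pod inside the cluster
    apps = client.AppsV1Api()

    while True:
        replicas = min(20, max(1, queue_depth() // 100))   # crude scaling heuristic
        apps.patch_namespaced_deployment_scale(
            name="worker", namespace="default",
            body={"spec": {"replicas": replicas}})
        time.sleep(30)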
AWS Lambda has had that since 2015...
Well, yes - the problem is that the JVM was too big and too platform-independent. We don't want JVM-everywhere; really, we want POSIX-everywhere. The JVM is also this weird level of statically-typed hyper-extensibility - it's Greenspun's tenth rule in action, and the result is typically in really terrible taste. The end result is a JVM which is really, really impressive, but appallingly ugly.
And yet finding people who can reliably install K8s from scratch, who understand what's going on under the hood, remains remarkably close to 0.
How many people can, within a few hours, tell you how Kubernetes runs DNS, and how it routes packets between containers by default? How do you run an integrated DNS which uses, say, my_service.service.my_namespace instead of my_service.my_namespace?
I've found that most installs of k8s have been made using defaults, using tooling that Google has provided. We hired one such administrator, but when asked anything outside of how to run kubectl, they just shrugged and said "it never came up".
The codebase is vast, complicated, and there are few experts who live outside of Google. And it's getting more vast, more complicated on a quarterly basis.
It bothers me how far operations has gone from "providing reliable systems on which we run software" to "offload work onto the developer at any cost".
</rant>
I realize that a lot of this is because of scarcity. The good devops folks (i.e. those who are both competent generalist sysadmins and competent generalist programmers) are few and expensive. That makes pre-packaged "full stack" solutions like GAE, Kubernetes, and Fargate very appealing to leadership.
"You don't need an operations department to act as a huge drain on your revenue, just re-use your developers" holds a lot of appeal for those high up in the food chain. It's even initially appealing to developers! But in the end, it makes as much sense as re-using your developers to do customer service.
This isn't a problem unique to Kubernetes; it's an issue in general within the industry. There are very few competent operations people, and you'd think they'd be in high demand, but in actuality operations groups are heavily mistreated compared to their software development peers.
I've abandoned operations as a career path and have now gone into product management, but I was an operations person for more than 12 years. In that time frame I learned very quickly that upper management considered the operations teams to be "system janitors" and that developers considered operations engineers to be their inferiors. The "move fast and break things" attitude is great sometimes, except it gives license to shortsightedness.
The reality is that operations is not a specialized skillset, in fact it's a generalized skillset made up of being a specialist in multiple facets of complex systems. There's simply not that many people out there who have that level of knowledge and understanding, and the industry has both perpetuated this problem by treating operations people terribly and worked around this problem by focusing on building stacks that require minimal operational overhead. Any good operations person could have been a software developer, but wanted to get beneath the abstraction layers. Instead, we get treated worse, paid less, and have less job demand despite being more competent. Most of the best ops people I've worked with ended up either leaving ops entirely, like myself, or becoming software developers to get a pay bump.
Luckily I got to work for a few decent companies along the way in my career that treated me well, and I made a lot of life-long friendships with very smart people as well. So don't read the above as some deep complaint. It's just an observation of the reality that the incentives aren't there for smart and talented people to invest their energy in operations. I advise most of the young people who pass my way to become software developers. They'll have more autonomy, get paid more, have higher job demand, and get treated better in general.
How many people understand how the Linux kernel works from top to bottom? There are more than a handful of cloud providers (AWS, Microsoft Azure, Alibaba, etc.) that offer a completely managed Kubernetes experience. For most folks this will be good enough, and you don't need to understand everything in order to take advantage of Kubernetes, similar to how you don't need to understand how the kernel (think POSIX) works: https://www.cncf.io/certification/software-conformance/
You're right. You don't have to know anything about Linux to run software on it... until you do. Until you have to understand and modify swap. Until you have to understand and change the various schedulers (for both processes and disk operations). Until you have to troubleshoot networking problems. Until you have to change a kernel setting to avoid a 0-day exploit. Until you have to encrypt all communication because a client said so.
Being on AWS or Azure or Microsoft doesn't shield you from these needs.
The job isn't typically to be an expert from day one, the job is to learn and develop as things come up. Field experience is how you, over time, build those skills.
If you're going distributed in the first place, doesn't that often imply a big team / big codebase?
In those cases, I'd argue things will likely come up quite quickly. Kubernetes is a key component of a platform, but not a PaaS, i.e. you are required to understand the low-level stuff, even if it's managed by a public cloud provider.
Cue out-of-memory apps, recovering GB+ JVM thread dumps out of a transient container, a lack of troubleshooting tools, the kind of stuff falcolas said above, plus high pressure to resolve it because it's a highly visible production app, and you've got a recipe for sadness.
Even at Google, AFAIK, K8S ran in the context of Borg/Borgmon and a host of other internal tools.
Most teams shouldn't install Kubernetes from scratch, but use a PaaS distribution like OpenShift, preferably with commercial support.
You need much more than Kubernetes: a secure (!) container registry, a container build system, deployment, log management, metrics...
It's fun to set up k8s from scratch, but there's little business value in reinventing the wheel all over again. Just like you wouldn't build your own Linux distro, you shouldn't do it with Kubernetes.
I've seen startups waste SO much time reinventing basic infrastructure instead of focusing on their product.
Honestly, I'm not even talking about startups here - it's established companies who have grown too big for the PaaS offerings, or who have specialized needs that PaaS providers don't offer. Such as an HTTPS enabled Redis cluster in AWS. Just recently started to become available, after years of our insistence for it.
Not to mention, the costs for PaaS providers don't scale up well (if they can even handle the load). They're great for startups on VC, but deadly for companies who want positive cash flow.
E.g. on AWS you might have all of a node's pod IPs on a bridge interface, then you talk to pods on other nodes thanks to VPC route table entries that the AWS cloud provider manages. NAT happens only when talking to the outside world or for traffic to Amazon DNS servers, which don't like source IP addresses other than those from the subnet they live in.
My memory is a touch fuzzy, but to route traffic out of a container in AWS, you have to either NAT through the instance's network adapter, or attach an ENI to the container. However, you only get one ENI per vCPU in a VM (at least until Amazon finishes its custom NICs). What I'm really fuzzy on is whether the instance itself consumes one of those allocated ENIs.
That is, if you're running off a m4.2xLarge instance, you get a maximum of 8 ENIs - 8 containers if you want to use only VPC routing. For some services, this may be OK, but for many others (most?), it's far too few.
What's the destination? If it's the outside world, yes, you need NAT for state tracking and address rewriting, since the rest of the AWS infrastructure knows nothing about the pod CIDR (I guess you could set up a subnet for it and run a GW there).
For pod to pod, if you're OK with the limitations of 50 routes per VPC route table (you can open a ticket to bump that to 100, at the cost of some unspecified performance penalty), then you don't need NAT.
Otherwise, you can use something like Lyft's plugin, which does roughly what you describe. On a m4.2xlarge you only get 4 ENIs, but each of them can have 15 IPv4 and 15 IPv6 addresses, which the plugin manages. They assign the default ENI to the control plane (Kubelet and DaemonSets), so you should get 45 pods.
In my experience NAT is almost always involved in a Kubernetes setup (for on-prem).
The container network is generally not routable to the wider corporate WAN (it'll use RFC1918 addresses by default). You typically get one set of addresses for the main container network, a different set of addresses for the service IPs, and then a routable set on the ingress.
What you describe is not NAT, the containers network segment is a separate network segment which is not accessible from outside the cluster, not directly and not through address translation. The ingress and service addresses are externally reachable addresses that expose services. NAT is not required for the setup.
That's inbound traffic coming from the outside world. You need NAT because the load balancer only knows about nodes, not individual pods (perhaps you can pull it off with e.g. ELBv2, but definitely not with v1).
There's more iptables magic if you talk to a service's virtual cluster IP, because of the load balancing, but from pod to pod, which is what I thought you were referring to, NAT is usually not involved.
Are you referring to the service cluster IPs? Those are great for short lived or low volume connections. If you want to balance load over long lived connections or have high volume, you really want to know the addresses of all your backends, whether that's done in your code or in a sidecar like Istio's.
A lot of it is due to an effort to make it work in as many environments with as few external dependencies (and environment control) as possible. The "simplest solution which could possibly work".
Personally, I'd rather just bring on ipv6. But, in my case, we don't have enough people who understand ipv6 (and it's barely supported in AWS) to use it ourselves.
Because that's the easiest thing to do when you don't know anything about networking. Ironically this also makes everything else much more complex and failure prone.
This is the answer surprisingly missing in the industry overall. It amazes me that I work with highly educated folks who cannot grasp some of the fundamental issues with k8s and the container ecosystem.
Because NATting encapsulates while routing doesn't? And encapsulation is the whole idea behind containers. Until everything is ready for IPv6 (lol, yeah right), NATting seems the only way to me.
The reason why there are no people like that is that the vast majority of K8s adoption is driven by teams that try to mask their lack of understanding of systems (cloud or non-cloud).
Building containers that contain an entire operating system gives no wins. In fact, it adds an additional layer that will create issues, will break in different ways, etc.
The current love of modern orchestration systems among management is similar to the mid-nineties love of the "compute management packages" running on SGI that showed you "flying" through from one server to the other.
> I've found that most installs of k8s have been made using defaults, using tooling that Google has provided. We hired one such administrator, but when asked anything outside of how to run kubectl, they just shrugged and said "it never came up".
What is up with this? The last time I tried to learn kubernetes I couldn't find any information about how to set it up. Just some set up tools from google. I guess it is still like this? Is there really no one running kubernetes infrastructure with config management or anything?
The post you're replying to is absolute hyperbole. If you're hiring k8s guys who don't know etcd and the backend of k8s (we're not going to understand every single gear, I constantly forget how k8s garbage collects, I never have to interact with it) then you're not hiring Seniors who have worked on k8s for several years. That's no different from hiring a linux admin who only knows how to fix Cpanel. You made a bad hire or your budget wasn't high enough to attract experienced talent.
I'm one of the most frequent commenters on #kubernetes-users so I'm very aware of the questions and issues that come in from new k8s users and I'd say an absolutely massive majority of the users are running in baremetal via kubeadm/kops/etc. Typically on AWS (NOT EKS). The #gke channel is literally 1/10th the size of the #kubernetes-users channel.
If you have questions about k8s post in #kubernetes-users. The community is extremely helpful.
A LOT of people deploy K8s clusters via Terraform/Ansible, as well.
Why are professionals who know k8s back and forth less common? 2 years ago k8s was 1.1 and we had no idea where the market was going and if it would take off like it did. It takes time to build up the community and expertise. There are a LOT of very experienced k8s users nowadays whereas there were not 2 years ago. Finding someone with 2+ years of k8s experience who isn't a Xoogler is fairly rare right now because 2 years ago it wasn't the market behemoth that it is right now. I don't work with Google but I just happened to get involved with k8s almost 3 years ago. We are out there.
If you can't find an answer ping me @mikej and I'll try to get you going in the right direction.
You call it hyperbole, yet you just also verified that the scarcity is a real problem.
> If you're hiring k8s guys who don't know etcd and the backend of k8s [...] then you're not hiring Seniors who have worked on k8s for several years.
> Finding someone with 2+ years of k8s experience who isn't a Xoogler is fairly rare right now because 2 years ago it wasn't the market behemoth that it is right now.
Indeed, there's not enough people who know how to run it for as broadly as it's spread; for how much it's hyped.
If there's a thousand people out there who have that level of experience, I'd be surprised. And in an industry running hundreds of thousands of clusters (or more!), that's just too few people.
Yeah, I'm not buying that there are thousands of k8s "Seniors" out there, and they just aren't the people that are being hired. I've been in operations for 20 years, and I think I would qualify for what you would call "good devops" in that I am a generalist with a wide breadth of experience in systems as well as programming. Kubernetes is a beast. We run it from scratch, in production, and just keeping up with the changes since the project inception can be maddening.
I did much of the early research POCs for my company when the idea of containerization really took off, and my deployments would seem to not have a shelf life of more than a few days before I would have to conform to some new method they came up with. I was using Tectonic when it was first released and the documentation would change underneath me as I would try to set up the clusters. It's a LOT to keep up with.
I can understand and explain to someone every protocol or idea underlying Kubernetes, sure, because they build upon standards that we have all used before in operations. But to try to understand how it is all working together within Kubernetes, and then add in the complex interplay if you are like us and integrate with non-k8s systems that have complex firewall and routing rules now to allow the intercommunication... add in Calico or Flannel... Docker under the sheets with all its warts... it's a lot to manage. You need people that are engaged with the k8s project at a level that would normally be reserved for Googlers working on it.
Don't get me wrong... I like Kubernetes for the most part. I do agree that if you are planning to run in-house, you are in for some challenges, and that you will need a very high caliber of operations team to deploy and maintain it.
There are almost 24k people in the k8s.slack.io #kubernetes-users channel and it's only a small portion of the actual community. For instance I rarely see people from other huge K8s consumers, like Lyft, Zalando, Walmart, etc in the channel (high probability they just don't mention where they are or I never noticed).
I won't deny actual k8s experts are low in abundance right now. It's a complex platform in its near infancy. There are people brand new to k8s embarking on the journey to learn it every day in slack. Give them 6mo+ or another year and you'll have a few near experts and a ton of just generally experienced admins.
I don't feel as though it's any different from when I was working on my CCIE. When I was working on that there were only 18k other people out there who had CCIEs. I very rarely met one or even someone with just that level of skill (I'm not in the Bay Area/NYC). I had to ask questions through newsgroups and IRC and in IRC there were maybe 4-10 people at that level out of thousands. You could say there are thousands of networks out there that need a CCIE to run them but that isn't ever what happened; you'd have a CCIE basically lead from the top and their skill/experience would trickle down to lesser experienced netengs or they'd be brought in as consultants when necessary. I've worked with very few CCIEs. I see k8s going the same way. Every State will likely have a handful of experts on the subject while there will remain a ton of CCNA/CCNP level k8s admins and you just need to determine how complex your k8s infrastructure is and what level you'll need to hire to effectively manage it.
Brendan Burns' goal is to democratize ephemeral infrastructure so that anyone who can code can manage it. That's another topic entirely, but the community is starting to output enough general guidance in the form of blogs, books, slack, et al. that hopping into the ecosystem now is basically a breeze compared to what it was when I got involved in the 1.x days.
Could just be that I'm a masochist and like learning painful things.
My biggest complaint about k8s right now is the lack of real-world production knowledge being distributed. A lot of people set up a cluster and leave it and never optimize it or make it actually production-ready. My goal is to significantly accelerate that through training, blogs, etc.
The first real CCIE certificate awarded was in 1993, #1025. 18k was something in like the year 2005 when I was in my early 20s.
Kubernetes was barely a blip on the map until 2017. There was Cisco routing and switching hardware LONG before there was even a CCIE certification.
This is a new ecosystem. Of course it takes time to develop experts.
A CCIE not touching a network for 10 years? That's ludicrous. I was a network engineer. I wish that was even remotely the case. I was constantly fighting firmware/bugs/etc. In fact I swore off Cisco and began working on my JNCIE at some point to stick with Juniper, which of course had its own issues.
I've run k8s 1.5 in production for 2 years on various clusters. That is almost a 2 year old release. I've had zero k8s specific problems. I recently migrated those clusters to 1.9 and apart from updating some API endpoints that changed over the major releases, and a lot of annotations that changed, it was very little actual work. It was mostly tedious "find & replace" work.
I'm not going to bullshit people. K8s has its bugs, quirks and is complex, but there seem to be a huge number of people who run away in fear on HN.
It can be understood. I didn't even know how to use Docker before I jumped into learning k8s.
As someone who was raised in Operations but fully bought into the dev/ops kool-aid, I'd argue that most of the unhappiness I've felt in operations positions has been due to being the bottleneck in organizations with lots of development teams that are depending upon our services. It is this, more than any technical benefit, that I think systems like Kubernetes provide. This doesn't really answer your "not many people know how to run Kubernetes" point, but I might argue it is when the cost of managing the infrastructure beneath lots of different applications exceeds the cost of learning Kubernetes that one should make the switch. I think this is probably somewhere around 25+ development teams.
A consequence of the "Kubernetes Effect" is that while distributed systems are easy to build and use, a lot of developers lose sight of the fundamental problems which make distributed systems difficult.
For example, the sidecar in a sidecar pattern might fail while the application is running and the system can get stuck in weird states. The developer still needs to understand fundamentally how the system works.
Eschewing deeper knowledge just because it is easy to use is a trap in this case. While the article compares Kubernetes to the JVM, Kubernetes can fail in a lot more hard-to-debug ways than the JVM right now. I don't know if this semantic gap between distributed systems like Kubernetes and monolithic systems like the JVM can ever be bridged.
> A consequence of the "Kubernetes Effect" is that while distributed systems are easy to build and use, a lot of developers lose sight of the fundamental problems which make distributed systems difficult.
I would extend this to cloud as well. The more prevalent cloud becomes, the more ignorant developers become. It's like: I have Mathematica license, who cares how to calculate function derivative?
More generally, society's achievements currently rely on a workforce that gets more and more specialized.
We are bound to fragment every sector into sub-niches where specialists in functions, general programming and infrastructure resources cooperate at their boundaries without quite being able to understand what the others are doing.
> ... distributed systems are easy to build and use ...
I would not say distributed systems are easy to build or use. I think Kubernetes makes distributed systems _easier_ but definitely not easy in general, or at scale. Just easier than doing it all by hand/manually.
Kubernetes by itself may be daunting for most teams.
But I'm not sure I understand the backlash. Once you've built your application and it's been packaged (containerized) and deployed, why would anyone care how it's run? Also, running a container in production and orchestration seem to be conflated somewhat in this thread, and the use cases are very different.
You can think of Kubernetes as an Automated SysAdmin. This is a bit reductive, I know, but it is useful to think of it this way. You ask the sysadmin to run something for you and they tell you how to package it (tgz, war, zip etc) and they run it for you on hardware.
The level of engagement that a dev has with getting his app running on hardware is no different to that of dealing with a sysadmin, with the admin requesting that your app is packaged in a container.
Kubernetes out of the box will give you most of this functionality as long as you keep state outside of the cluster. There are also options for how to make the experience smoother. There are also these tools to help:
* Openshift
* Kubernetes + Rancher
* Mesos
These can help if you need orchestration and scheduling. So I am a little perplexed by the backlash.
I led the expert certification you're referring to, so I'll show some restraint and not talk it up too much.
But I will mention that we're aware of some consulting organizations that are requiring that new employees take the exam after they are hired, as it gives both the engineer and their manager confidence in their understanding.
The exam has only been around for 5 months, but it's already gone through 3 versions and is based now on K8s 1.9. Also note that it's a proctored, online exam where you configure 7 clusters over the course of 3 hours. There's no multiple choice.
Registered nurses have been around for decades, so we have a long way to go to catch up to their recognizability. But we do see the Certified Kubernetes Administrator as a core building block for the cloud native ecosystem.
I've been working with kubernetes since 2015 and running production workloads on GKE since 2016. Since you asked for opinions, mine is that the certs don't matter very much. CNCF plays them up, and they will probably have some impact in larger orgs as enterprises get on the train, but in the open source community from which most of the kubernetes momentum emanates there has never really been a ton of respect for formal certification programs, and this doesn't feel any different to me.
Just wanted to chime in to generally second this opinion, but with one exception as it relates to hiring.
While I don't think having a cert would help you get hired necessarily, it would probably influence a decision to get an interview. What really matters is if you know how to do real-world operational tasks with the knowledge, which will show up if your technical interviewers know k8s. If you are the first person they are hiring at the company to begin their k8s project, then you might have a real advantage with a cert.
Personally, I've never been one to give undue respect to many of the certs on the basis of having them alone, but it can depend on where you interview. Some places love certs.
Yeah, but nurses have real professional licensing exams and maintenance requirements; in technology we don't, and instead every employer is trying to incorporate (often, cargo culting) the equivalent basic competency evaluation as well as any unique employer-specific requirements into their own hiring process, so it's at best cumbersome and redundant and at worst absurdly perverse.
My wife has been a nurse for 12 years and there's some truth to that, but "license" in their case goes way beyond kubernetes certification :). It's like if someone hands you an up-to-date pilot's license and log book you know they can fly proficiently. We don't have that in software, as far as I know, so that automatic respect for the document isn't there.
Yes, and you also don't have nurses having to train for different "flavors" of humans (I'm speaking in a general practice sort of way, of course there are speciality nurses). If you know how to draw blood, you don't have to learn a whole new system based on the type of hypodermic needle you use. In the Kubernetes world, you'd need to know how to use AWS, GKE, your own custom stack, etc, in addition to knowing how to draw blood on this particular version of this flavor of human.
I've worked with some people (present company excluded, of course ;) ) on the operations and development side that seem like they didn't pass much more of an interview to get their position... ;)