A Gentle introduction to Kubernetes with more than just the basics (github.com)
540 points by feross 34 days ago | 147 comments

After having just spent most of the day yesterday trying to nurse a failing Kubernetes cluster back to health (taking down all of our production websites in the process), I’ve come to completely loathe it. It felt more like practicing medicine than engineering: try different actions to suppress symptoms, instead of figuring out a root cause to actually prevent this from happening again. While it is a pretty sweet system if it does run, I would strongly advise against anyone trying to manage their own cluster, as it is simply too complex to debug on your own, and there is precious little information out there to help you.

I wrote this whole thing below as a reply to someone stating that I should just stop complaining and figure it out, but that comment has since been deleted. Figured sharing my frustration might be cathartic anyways :)

If only I could!! That’s exactly the frustrating part: there seems to be no way of grokking what goes on under the hood, and there are so many different ways of setting up a cluster and very few have any information about them online whatsoever.

As a practical example, what happened yesterday was that all of a sudden my pods could no longer resolve DNS lookups (took a while to figure out that that was what was going on, no fun when all your sites are down and customers are on the phone). Logging into the nodes, we found out about half of them had iptables disabled (but still worked somehow?). You try to figure out what’s going on, but there are about 12 containers running in tandem to enable networking in the first place (what’s Calico again? KubeDNS? CoreDNS? I set it up a year ago, can’t remember now...) and Googling is to no avail, because your setup is unique and nobody else was harebrained enough to set up their own cluster and blog about it. Commence the random sequence of commands I’ll never remember until by some miracle things seem to fix themselves. Now it’s just waiting for this to happen again, and being not one step closer to fixing it.

Well that sounds like a problem of "I run my own cloud". And not a problem of Kubernetes. You don't remember how you set it up, oh well.

If you use a managed kubernetes like gke or aks (not in aws since they suck, eks is not really managed), then you skip the whole "there is a problem in my own cloud of my own making".

btw, I also encountered DNS problems in kubernetes, on ACS, it took 5-10 minutes to resolve, and was caused by ACS not having services enabled to restart dns upon reboot, lol.

The whole promise of Kubernetes (and, coincidentally, containers) is that you are not locked to a platform/provider/OS.

Reading this comment made me realise that often new technology is adopted because it is optional and promises options. But those options quickly shrink away and suddenly you’re locked into it.

Not to invoke a controversial name. But this is what happened with systemd.

Yes! You are not locked to a platform/provider/OS. Your GKE cluster operates more or less like your EKS cluster operates more or less like your cluster in Azure, DigitalOcean, etc. Kubernetes is a deployment platform you target and complaining that the things two layers under the hood of that are different is a false analogy.

Moving from one Kubernetes Provider to another is not zero time. You need to learn some differences in the way GKE ingresses vs AWS ELBs work, etc. It is a substantially more tractable problem than the differences between Cloud Bigtable and DynamoDB, and that one is still a tractable problem.

The way to fight lock-in is not, and has never been, "These two providers offer exactly the same service". It has been about avoiding "These two providers offer nothing that is analogous, and their documentation is directly written to encourage using practices that do not port". It has never been an all-or-nothing thing.

Would you be willing to tell what kind of business you are in, where you have few enough customers that they can reach you by phone, but still need such a large number of machines that you need kubernetes?

There's the rub: we don't actually need Kubernetes at all! Just a case of resume-driven development by a predecessor

To me it sounds like the problem wasn't kubernetes, the problem was that you (your predecessor) rolled your own instead of going down the path of using something with a support number. Redhat obviously comes to mind first but there are countless options with enterprise support included.

Have you considered rebuilding/moving the containers onto something more "enterprisey"?

A 'support number' does not solve the technical issues with kubernetes, it just kicks the responsibility down the line so that someone else has to deal with it.

A "support number" gets you access to an expert. No offense to OP, but it doesn't sound like he's got a deep knowledge of the internals of Kubernetes. Which is fine, because it also sounds like that's not his main job, just something he's tasked with taking care of. That's literally why enterprise support exists. If we all had infinite time and brainpower to be experts in everything, we could just roll-our-own. But we don't. Which is why AWS exists, and IBM bought Redhat for billions of dollars.

Now that the myth of experts has been accurately described, let's say a few words about the reality.

If IBM wanted the experts, they would have hired or grown their own. What they wanted was, I guess, the contacts (actual and prospective).

My experience with support contacts is that you often have access only to someone who is not any more expert than you, and who cares much less for your customers than you do. On several occasions it turned out the "expert" had been the one benefiting from the teaching of the in-house "not supposed to be expert but knows more than the expert" guy (and yes also, and maybe especially, with "reputable" large companies like IBM or Oracle).

I can even remember a particular instance when the expert had no access to his company's internal documentation to get details about a specific error message we were hitting, and we had to find a pirate copy of some internal manual from some Russian website and hand it over to him.

It makes sense to have a service contract when you have really no knowledge at all of the domain, but as soon as it's related to your daily job then you will quickly realise that experts are mythical characters whom your contractor has no better access to than any other company, including your own.

>It makes sense to have a service contract when you have really no knowledge at all of the domain, but as soon as it's related to your daily job then you will quickly realise that experts are mythical characters whom your contractor has no better access to than any other company, including your own.

I would seriously question who you're doing business with. Anytime I have had a significant issue, the escalation path is to the guy who wrote the code. To imply that enterprise support is just a bunch of people who don't know more than the average guy off the street is ridiculous. Tier 1 support? Sure, but you don't stay there long if you're a clued user with a real issue.

That example was from a time I worked on a project for a big Telco, and support was (supposed to be) provided by a very large database company whose database and file system we were using. I think the guy who wrote that code was very long gone.

Escalating to 'the guy who wrote the code' does not scale beyond a one digit customer count.

That's practically 99% of the time really. Even in enterprise scenarios Kubernetes is very, very rarely required.

I've yet to come across a single instance where such rapid scaling happens and stays consistently high.

Most of the time you know well in advance when your resources will be put to the test.

I’m not trying to defend kubernetes complexity or saying it should be used for deployment of all server-side software, but I’d push back on the oft-repeated idea that kubernetes is just solving for scale. It’s a model of deployment that happens to make scaling easy, but the model has many advantages other than scale.

Resume-driven development is unfortunate; but a valid strategy when employers who run everything on one server that hasn't been updated since 2003 start listing 5 years of Kubernetes experience as a requirement in their job posting.

There's the start-up kind - "We need to be able to scale! We will potentially need to process billions of users!" ...waiting...

I love Kubernetes and think it solves my problems very well. Although I think I have problems that generally require it due to scale and failover.

But I have also had a number of DNS problems that we still haven't resolved, and they sometimes go away on their own. Same for IP tables rules issues. This is of course on a hosted kubernetes cluster at a large supercomputing center. (I didn't set it up, I just have to fix it. Ugh.) At Google, it's been great and we've had no networking problems, but they almost certainly run their own overlay network driver.

The various networking solutions you can plug into kubernetes seem pretty spotty, and they are very hard to debug. I still haven't figured it out myself. But I am trying to not throw the baby away with the bathwater. I think the networking (and storage) parts will get better.

I feel for your troubles. Letting you know you can move it into EKS or Google cloud could probably save you a lot of headaches in the long run.

Definitely can sympathize with you on this, having spent plenty of time myself fighting some clusters that ended up in a broken state, and trying to get them going again.

I think that this pain is sometimes more severe in the context of automated provisioning tools out there and the trend towards immutable infrastructure - folks tend to not have the know-how to dig in and mutate that state if need be.

It's really important to have a story within teams, though, about either investing in the knowledge needed to make these fixes, or to have the tooling in place to quickly rebuild everything from scratch and cutover to a new, working production cluster in a minimal amount of time.

I'm just beginning my journey into the vanilla Kubernetes world.

As I build my knowledge I am also building Ansible playbooks and task files. After each iteration I shut down my cluster, do an automated rebuild and test, delete the original cluster and start my next iteration.

I have an admin box with everything I need to persist between builds (Ansible, keys, configuration files, etc) and can deploy whatever size and quantity of workers (VM) needed.

It has been a good process so far. I haven't yet put things in an unrecoverable state, but if that happens I can rebuild the cluster to my most recent save and try again.

I don't see it taking a lot of resources to have a proving ground. I would definitely not feel comfortable going to production without the ability to reproduce the production clusters' exact state.

I anticipate exactly what you describe as a roll back mechanism. At all times I want to be able to automate the deployment of clusters to an exact known state.

I think building a cluster, walking away from it for a year, and then coming back to it for a break fix/update/new deployment is a huge gamble.

Hi, I'm not sure if you saw my comment below, but this is 100% the use case Sugarkube [1] was designed for. Depending on where you are in setting things up it might save you time to give it a try. There are some non-trivial tutorials [2] and sample projects you can use to kickstart your development. It currently only works with EKS, Kops and Minikube though, so it wouldn't be suitable if you're using something else to create your K8s cluster.

[1] https://www.sugarkube.io/

[2] https://docs.sugarkube.io/getting-started/tutorials/local-we...

This is my thinking too. Build a new cluster and push it all over to a new cluster. If you feel like understanding the old (and can afford it), keep it around and try to figure it out.

Clusters are cattle, not pets.

Very much agree, but never managed to reach this point. One reason is that the amount of hardware needed for this is pretty prohibitive. Second is that configuring a new cluster (last time I did it) was so much work, and I never managed to automate the process, so there was simply no way I could have created a new cluster in time to get our websites back up.

I've heard estimates along the lines of two person years of work, and $500 000.

We need to do self hosted Kubernetes and after evaluating it (not deployed to production yet), we considered the training and costs of running, and came to the conclusion that our needs are met by Nomad [1].

It is also a cluster management tool, but much simpler and can be combined with other tools to make it just as powerful as Kubernetes.

[1] https://www.nomadproject.io

That looks very interesting. I like their solution to running stateful applications. They only schedule containers on nodes that have the requested volume. Of course the downside is that you need to manage the volumes manually which is perfectly reasonable because I only need 3 nodes for high availability.

Out of interest, what was wrong with it and how did you fix it?

In 4 years I've never come across a cluster I was unable to fix, nor has it really broken without someone taking an unadvisable action on it. This may simply be because I started early enough that I was forced to manually configure the components and thus understand the underlying system well enough.

Over time I have seen some interesting things though:

- Changing the overlay network on running servers is probably the silliest thing I've done. This wasn't on production, but figuring out where all the files are and deleting them was something pretty undocumented.

- A few years back somebody ran a HA cluster without setting it as HA which resulted in occasional races where services keep changing IP addresses. I believe the ability to do this was patched out.

- An upgrade caused a doubling of all pods once. This was back when deployments were alpha/beta and they changed how they were referenced in the underlying system, causing deployments to forget their replicasets, etc.

Overall though, in 4 years I've spent very little time debugging clusters and more time debugging apps, which is what we want.

> nor has it really broken without someone taking an unadvisable action on it

You’re basically saying “the tool X is fine, you’re just inexperienced/undisciplined and using it wrong”. Which is fair critique if I was an intern, but I have a decade+ experience in development and operations and I look at kubernetes in disbelief - why should things be that complicated? I get it, everything is pluggable and configurable, but surely this must be balanced out by making it more approachable and convenient?

You can’t sneeze in kubernetes without it requiring you to generate some ssl certs, to the point where it’s just cargo cult without any consideration of purpose and security.

And what’s up with dozens and dozens of bloated yamls and golang files? The fresh 30-odd commits ”official” flink operator is 3 THOUSAND lines of Go and 5 THOUSAND lines of yamls. How is that reasonable? In which universe is that reasonable? all it does is a for-loop that overwrites a bunch of pods to keep their spec in sync with desired config. There’s like 1000:1 boilerplate ratio in kubernetes and it’s considered good somehow?
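For reference, the "for-loop that keeps pods in sync with the desired config" idea can be sketched in a few lines. This is only an illustration of the general reconcile pattern, not the Flink operator's actual code; the function name `reconcile`, the pod names and the spec dictionaries are all made up for the example, and a real operator also has to handle ordering, retries and status reporting.

```python
from typing import Dict, List, Tuple

# Hypothetical types: a pod spec is just a plain dictionary here.
Spec = Dict[str, str]


def reconcile(desired: Dict[str, Spec], actual: Dict[str, Spec]) -> List[Tuple[str, str]]:
    """Return the (action, pod_name) steps needed to make `actual` match `desired`."""
    actions: List[Tuple[str, str]] = []
    # Create pods that are missing, replace pods whose spec has drifted.
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("replace", name))
    # Delete pods that exist but are no longer wanted.
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions
```

Calling it with a desired state and an observed state yields the list of steps; running the same call again after applying them would return an empty list, which is the property that makes the loop safe to repeat.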

Sorry for the rant, I’m just angry that we’re six decades into software engineering and the newest hottest project in the newest hottest line of work behaves like everybody should be paid per line of code they produce.

Not sure I'd actually even responded to you, but that's not at all what I was saying.

You can have a decade tech experience and still not know another system well. We all forget the learning we did to get to where we are, but I'm sure all the old reliable tools were frustrating at one point too.

Personally, I don't find kubernetes that complex, but then I did write and set up a scheduler for an early IaaS provider, so maybe I'm just comfortable with the problem, or maybe it's simply because I've been using it for several years.

Flink is shit software. You're right, those things are ridiculous. They're an indication that something is wrong, and you pushed ahead anyway. Your problems are your own.

I wrote a long reply to someone else’s question below that should answer your question :)

It's interesting, because many of your problems there are relatable to the simpler deploy discussed by the parent. I'd be no wiser debugging your bespoke ansible script, and likely neither would you, if not for the fact you've written it.

Don't get me wrong, debugging overlay networking issues isn't something to love, but it's also not all that complex:-

- There's a worker daemon on every box that manages the local configuration, whether that's iptables, IPVS, BPF or something else. There may be a separate worker for service IP addresses than for pod IP addresses.

- There's a controller that does the actual figuring out what things should be doing and lays out the rules for the workers. This might include a network policy controller, but this might be in a separate daemon.

This setup enables Service IPs, Pod IP addresses & Network Policy.

Obviously in ansible you can just write your own firewall rules, but as soon as you step away from running every app on every box, you'll either be relying on something as complex (but managed by someone else) like the cloud providers SDN, or you'll need to run your own system that does the same.

As much with anything, it depends what you're doing, but I like auto recovery, app level health checks, infrastructure as code, namespaces, resource quotas, and don't want to force my dev teams to couple their network policies with infrastructure details, so I'm fairly happy with the abstraction.

Would you say that you would be happier if you put a bunch of websites/web applications behind a load balancer like haproxy - all in a few VMs or even bare-metal servers instead of taking on the complexity of Kubernetes?

No snark or pushing opinion; I’m genuinely wondering how it is from someone who went through this path.

As a sysadmin who cares more about the reliability of services, still managing critical services outside of Kubernetes, I’m wondering what I’m missing out with Kubernetes.

Infinitely happier, because if something goes wrong, you can usually figure out what it was, and fix it or even prevent it from happening again.

Sure, the blue-green automatic deployment in k8s is cool, but a bit of clever Ansible scripting should get you there as well. It might be more busywork, but the amount of time spent nursing my k8s cluster in no way amounts to a time saving.

This is yet another one of those "introductory" articles.

The whole field it seems everywhere is filled with "introductory" "gentle" books/articles, and then "this is how you do reusable rocket science with x".

Pro tip: to understand kubernetes in between, go read the manual pages of Linux networking and get a really good grip on iptables. Go read the manual pages for linux namespaces, cgroups and containers with lxc.

Why don't people get the basics of the parts of the tool they are going to use, first, instead of trying to "understand a tool"? You won't succeed doing anything with kubernetes if you come from, say, a macOS or Windows environment and have no clue how/what kubernetes is built on.

Sure, I know iptables, namespaces, cgroups and containers and lxc. I stopped right here and did not bother to learn Kubernetes because all of this knowledge gives me enough to run anything I want (websites, applications, hadoop clusters and so on) without it. I can debug my stack and resolve any issues with confidence, do upgrades and do autoscaling (using AWS). Why should I even think about Kubernetes? Now the real question is, if you need to learn all these to run Kubernetes, then is it really the tool they are trying to sell it as? One of the promises I often hear is that you don’t need systems knowledge or SREs to run it. Yet I read these stories, and many customers reach out for help because they have an undebuggable problem in production and nobody can help. I learned in the last 20 years working as a systems engineer that the best tools or stacks are ones easy to understand, debug and fix. Black box computing is the worst, when things happen for no good reason and it is hard to find out the root cause.

> One of the promises I often hear is that you don’t need systems knowledge or SREs to run it.

Maybe if you use GKE that might be the case, but otherwise, running Kubernetes is a fool’s errand if you don’t have extensive experience with systems and SRE, imho, and anyone telling you otherwise is selling snake oil. Sure, you might get lucky and never have a major issue, but do you really want to depend on luck?

> I learned in the last 20 years working as a systems engineer that the best tools or stacks are ones easy to understand, debug and fix.

Sure, that’s an ideal goal and one I strive for too when possible, but there’s a reason why many SRE folks make a very large salary compared to national averages. Hard work is expected from time to time.

How is AWS not black box computing? Isn't it worse than a black box? Because it's a black box that's sitting in a black building that you don't have access to?

Disclaimer: I'm a n00b when it comes to web stuff.

You know when it breaks (built in monitoring) and there are SLAs. If it is broken Amazon will fix it. I can work around it, using caching proxies for S3 for example. There are other approaches to be more resilient against cloud outages. The most important difference between self hosted Kubernetes and cloud native services is who will fix it.

If there's a problem with an aws service then aws will fix it. Of course there's always this weird bug that you will run into even in their services, but at the least, if there's an API that's documented, it works as documented for the documented scenario.

An autoscaling group in AWS might have issues if you complicate it a bit and run into edge case bugs, but otherwise it's most definitely going to scale your instances up and down (because they bill you to do so)

What AWS does is easy in the small scale but hard on a large scale. The core tech is about giving you a seemingly unlimited supply of object storage, disks, virtual machines and networks to connect and isolate them. This is something a hundred companies have been doing for years but AWS is in the lead thanks to their level of tech that has made the buying of computer resources fast and easy.

Simply said: buying basic VM/dedicated machines has been around long enough that it is not a black box for professionals.

I feel mostly convinced that your point here is the sane one.

One nitpick though:

> do autoscaling (using AWS)

But that would mean that you're partially locked in on AWS? Only when it comes to auto scaling, but still...

Yeah, I am willingly locked in to AWS, which is a stable, proven, compliant solution with sophisticated SLAs and even paid support. I can create reliable distributed highly available performant services on top of it. That is all I need. As far as locking in on Kubernetes goes I am not sure.

Someone has to have physical servers somewhere right, even with kubernetes?

You're either locked into a provider for that or locked into the team that runs your own datacenters.

Sure but if all your infrastructure scripts are built with kubernetes in mind switching from one provider to another becomes much less labor intensive than if they are built with Azure or Amazon in mind. At least in theory.

> Why don't people get the basics of the parts of the tool

not saying that you are incorrect, but "basic" is relative.

To understand Linux networking should we also understand the basics of the Linux kernel? Or basic chemistry to understand what happens inside the CPU?

We should all stand on the (stable) shoulders of giants. I would call that good (also relative indeed) documentation that abstract us from the level below/above.

If you are working on fabs, yes you should understand basic chemistry. If you are working on distributed stateless cloud-agnostic applications, yes you should understand both the networking and kernel of your OS on which said applications run.

Even if there is an abstraction layer with nice boundaries that is well documented, you still cannot ignore the layer below and pretend it's raining.

When you are making a web app (for advertising purposes since this is HN), can you ignore understanding HTTP and TCP and IP? Would you hire a web-application developer who didn't know how to set up a web-server or load-balance his app?

> Would you hire a web-application developer who didn't know how to set up a web-server or load-balance his app?

My company does this all the time.

In fact, in the absence of our step-by-step guides, I would estimate that maybe 5% of the development team could successfully configure their local web server on their own.

Understanding the fundamentals is indeed very important. Not that you need to be an expert in it, but knowing basic concepts is extremely helpful for troubleshooting, and I would argue is the thing that makes the difference between a junior and senior person.

I don’t use most of the fundamentals I learned in school directly on a daily basis, but I do use them almost every day to inform the decisions I make about higher level things.

To bake a pie, you must first invent the universe.

The challenge with Kubernetes is the number of layers that are being created. Whilst each individual layer can be understood, it takes time.

Additionally, these layers are prone to rapid change. To take one example, Cilium and other players in the container world are looking to replace iptables with eBPF. So learning iptables becomes obsolete.

For me, Kubernetes is cool and if you have enough scale to need it, very useful. The problem is (like most IT hype cycles) it's getting used in inappropriate places, where simpler more basic solutions could work just fine.

Because Kubernetes is the new NoSQL, there are consulting services, conference talks and books to be sold.

And SREs to resist. :) I refuse to put Kubernetes into our infra.

I wanted to, then I read about it.

I'm wondering if Kubernetes would be the right choice in our use case. Or if something like Ansible would be better suited and easier to setup and use?

I'll soon have to manage 50 remote bare-metal servers, all of them in different cities. Each one of them will run the same dockerized application. It's not 1 application distributed in 50 servers; it's 50 instances of the same application running in 50 servers, independently.

A frequent use case is to deploy a new version of this application on every server (nowadays I do it manually, it's OK since I manage only like 10 servers).

A nice-to-have would be to avoid downtimes also, when I update the application (nowadays I have a downtime of 2-5 minutes (when it goes well), which matters for us).

No, as @dragonsh said, Kubernetes will be a bad choice for you. Kubernetes is complex, especially its network stack, and you'll need to set up a VPN from each site to your control plane site to keep some basic functionality working (kubectl logs, for example).

If you don't care about a centralized API to probe status and manage each instance, Ansible should be enough to orchestrate these installations and with little effort (that also depends on the application at hand) getting zero-downtime rollouts with Docker can be easily done with it.
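As a rough illustration of what such a rollout has to do, whichever tool drives it: update one server at a time and stop as soon as one fails its health check, so a bad release never reaches the whole fleet. This is a minimal sketch, not an Ansible feature; `rolling_update`, `deploy` and `healthy` are hypothetical names, and the two callables stand in for whatever actually pulls the new image and probes the application.

```python
from typing import Callable, Iterable, List


def rolling_update(
    hosts: Iterable[str],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
) -> List[str]:
    """Update hosts one at a time and halt at the first failed health check.

    Returns the hosts that were updated and passed the check.
    """
    done: List[str] = []
    for host in hosts:
        deploy(host)  # e.g. pull the new image and restart the container
        if not healthy(host):
            # Stop here so the remaining hosts keep running the old version.
            raise RuntimeError(f"{host} unhealthy after deploy; updated so far: {done}")
        done.append(host)
    return done
```

Note that this only limits the blast radius across servers; avoiding downtime on each individual server still needs something like a blue/green switch on that machine.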

However, if you want a single control place to probe status and want to avoid writing your own rollout scripts, Hashicorp's Nomad [1] might be a good solution for this. It is a lot simpler than Kubernetes while still giving you nice primitives to describe jobs/services, health checks, rollout strategies, etc. Treat every site as a datacenter of its own, set up a job of type "system" (akin to Kubernetes DaemonSets) and all you need on these sites is internet access to the HTTPS endpoint of your Nomad control plane.

If you want to talk more about this, hit me up on Twitter or Telegram, I use @rochacon as my handle virtually everywhere.

[1] https://www.nomadproject.io

Edit: grammar and typos

Agreed, Nomad seems perfect for most use cases where cluster management, deployments and rollouts is required.

It's a bit overlooked now because every DevOps person nowadays seems to think Kubernetes is the only rational thing as it will look good on a CV.

I predict Nomad will be on the upswing the next few years as people realize Kubernetes is extremely complicated to self host.

How can zero downtime deploys be done with Docker?

I had to write my own custom blue/green deploy script to hot reload traffic to proxy_pass definitions in nginx upstream configs since I don’t use Docker.
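The core of a blue/green switch like that is small. The following is a sketch under assumptions, not the script described above: the colour names, addresses and the helper names `render_upstream` and `next_active` are made up. It renders an nginx `upstream` block for the active colour and only flips colours if the idle one passes a health check; writing the file and reloading nginx (for example with `nginx -s reload`) is left out.

```python
from typing import Callable, Dict, Tuple

Backend = Tuple[str, int]  # (host, port)


def render_upstream(name: str, backends: Dict[str, Backend], active: str) -> str:
    """Render an nginx upstream block pointing at the active colour only."""
    host, port = backends[active]
    return f"upstream {name} {{\n    server {host}:{port};\n}}\n"


def next_active(
    backends: Dict[str, Backend],
    active: str,
    healthy: Callable[[Backend], bool],
) -> str:
    """Return the colour that should serve traffic after a deploy attempt."""
    idle = "green" if active == "blue" else "blue"
    # Only switch if the freshly deployed (idle) side is actually up.
    return idle if healthy(backends[idle]) else active
```

A deploy would start the new version on the idle colour, call `next_active`, and rewrite the upstream file only if the returned colour differs from the current one.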

IIRC, if you use docker swarm it will handle the routing for you. So if an app is running on port 8080 on two swarm nodes, if you access port 8080 on each node, you might actually be accessing another node.

Since docker swarm knows if an instance is down, it will know to not use that instance.

See https://docs.docker.com/engine/swarm/ingress/

I run on Docker Swarm and there are hiccups as one container goes up and another goes down during deployment.

I think you can try a tool like Envoy for doing routing between versions plus some other custom stuff.

At my job, the devops team had to write some kind of special program that maps whether a Marathon/Mesos instance is up when dealing with Envoy. Not sure if the same is required for just plain Docker?

I would also investigate Terraform if the Ansible path appeals to you.

No, kubernetes will be a bad choice for your use case. But if you learn all the intricacies and manage a behemoth besides your own application, it will help your resume to brag that you are a modern devops and a fashionable one using kubernetes.

If you need a practical suggestion, use a simple solution like LXD [1]. But as you are already using docker, stay with it and ansible.

As you are using ansible you are already doing infrastructure as code; you are way ahead of the curve already.

Probably when you grow a bit bigger use packer [2] and terraform [3], but I think ansible will do just fine.

Kubernetes was designed for Google's kind of problem. The burden to maintain it is quite high, with lots of moving parts, unless you use Google GKE or Amazon or another managed kubernetes service. So don't fall for it unless you need it for your resume.

[1] https://linuxcontainers.org/

[2] https://www.packer.io/intro/

[3] https://www.terraform.io/

GKE and EKS are very expensive which in my opinion makes using them unattractive for smaller scale projects. But Digital Ocean now offers managed Kubernetes for free, I've played around with it and it actually seems really easy to set up and manage. Would be great if the competition would follow suit.

> GKE and EKS are very expensive

> Digital Ocean now offers managed Kubernetes for free

GKE is just as "free" as DO, in that you only pay for the nodes you actually use (those nodes are probably a bit more expensive, admittedly). EKS is the outlier in this trio in that you also have to pay (a lot) for the master. Could you clarify what GKE could do to "follow suit"?

It seems, nothing! I was just wrong about GKE being expensive, not sure why I thought that (though it seems that until late 2017 they did have costs per cluster).

This sounds like Chick-fil-A's use case where they decided to put a k8s cluster in each restaurant. https://medium.com/@cfatechblog/edge-computing-at-chick-fil-... https://medium.com/@cfatechblog/bare-metal-k8s-clustering-at...

If you have a single node not a cluster in each location then k8s buys you a lot less. Some people still advocate single-node k8s "clusters" for benefits like rolling deployment.

I’m still stuck on the Chick-fil-A part. Haha.

It's handy to have all those k8s instances scale to zero on Sundays.

I am fairly certain the second the old dude dies they are going to be like "Motion to open on Sundays..." in the board room.

S. Truett Cathy (who I assume is the old dude you mentioned) passed away in September of 2014.

Ah I did not know that. I figured he was still hanging out and getting a kick out of refusing to open on Sundays, given how much of a meme it has become.

I've seen rolling deployment using plain distribution upgrade happily working on larger fleets than that, but that's borderline uncivilized.

actually rolling deployments are painful. it's like THE feature of k8s. and since k3s is so simple, it's also not too painful to run it.

Echoing others, you really don't want Kubernetes here.

I only have limited experience, but Kubernetes works very hard to abstract away the actual machines, and it works best that way: you just say "Deploy 20 instances of job X", and k8s will somehow find the machines to deploy them. You don't care where they are running - k8s handles that.
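To make the "Deploy 20 instances of job X" idea concrete, here is a minimal sketch in Python that builds an apps/v1 Deployment as a plain dict and prints it as JSON (kubectl accepts JSON as well as YAML). The name `job-x` and the image `example.com/job-x:1.0` are made up for illustration; note that nothing in the manifest says which machines the 20 copies land on.

```python
import json


def deployment_manifest(name, image, replicas):
    """Build a minimal apps/v1 Deployment manifest as a plain dict."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            # How many copies to run; the scheduler decides where they go.
            "replicas": replicas,
            # The selector must match the labels on the pod template.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical job name and image, for illustration only.
    manifest = deployment_manifest("job-x", "example.com/job-x:1.0", 20)
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be piped to `kubectl apply -f -` against a test cluster.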

Once you start to care about actual machines and which jobs are running where, k8s starts to make less sense. You're paying for huge complexity (required to abstract away the machines), but you're getting none of the benefits - it just becomes a glorified wrapper around Docker daemon.

Docker in production is a potential disaster: it runs everything as root from the daemon. We are heavy k8s users, and we are transitioning to rootless cri-o containers specifically because of this. Investigate podman, which is a docker cli clone that builds cri-o containers from Dockerfiles and allows you to run them under a designated user id. The only restriction is that in rootless mode you can't map ports lower than 1024; for that you need to run them under sudo, which defeats the purpose. However, we use an ingress controller to map inbound requests to containers, so it can do the port mapping. We used to use traefik, but now we use nginx as ingress, because we can run the excellent lua-waf module.

We run on multiple clouds, so vendor apis and k8s implementations are not useful to us.

I struggle to understand when Docker is supposed to be introduced to a stack.

If you have a 100 user SaaS platform, do you run Linux containers on top of Linux just for isolation?

If you need strong guarantees of isolation and if every user has their own dedicated installation. If you are a general service where isolation is enforced at the application level then just deploy straight away.

Docker is useful when you don't want to care about the underlying operating system or you might be deploying on different operating systems etc. Also if you want to quickly switch providers (like going from Ubuntu 18.04 on digital ocean to Windows Server on Azure) since it's a self contained image.

This self contained image is useful during development too. I can cleanly separate dependencies between applications and dev/test/prod. For example I develop a node app based on node 12 + pg 11 but I tinker with ghost too which requires node 10 + mysql. Having this in Docker keeps cruft and complexity off my macOS installation at the cost of space. Having multiple versions of a runtime/database etc can quickly become a nightmare to manage on local operating systems.

Operationally if you can control deployment and you don't need many servers, deploying to the base operating system makes sense. Docker/Kubernetes shine when you have scale + need to provision many servers across different platform providers.

> Operationally if you can control deployment and you don't need many servers, deploying to the base operating system makes sense. Docker/Kubernetes shine when you have scale + need to provision many servers across different platform providers.

On that same note... when do Terraform / Ansible start to make sense?

I have found that Ansible is necessary if you have more than one server and useful even for less than one (eg managing a service account on an old legacy shared machine owned by another team).

I use terraform mainly for provisioning infrastructure (how many VMs on which platform, how many databases etc) while Ansible for configuring the software side of the deployment - like the dependencies, which version of code to pull, configuring nginx, cache etc

It can be introduced at many points.

in dev: You can setup dependencies without worrying about underlying os. So it is easier for building a new dev machine, adding a new team member, etc.

in integration test: you can setup multiple integration test environment as long as you have enough resource. Running multiple integration tests without interfering other team members is the major benefit here.

in prod: as you may know, for isolation and horizontal scaling. Easier roll back and update?

Containers are fantastic at isolating one software application from another. All their dependencies - runtime, library, packages - live in separate chrooted filesystems.

User-to-user isolation is hard, and extremely hard if any of the users could be malicious. Docker is broadly neutral here - adds a few tools, might add a few vulnerabilities.

Ansible all the way.

I'm really upset with the way Google let their marketing team run roughshod all over the place with that software. Kubernetes is almost never the tool to use. It's entirely insecure, overly complicated and almost never delivers the supposed benefit. Worse, it feels like the entire CNCF ecosystem is run entirely by marketing people with "developer evangelists" that have never written a single line of code in their day - it's a real shame and quite honestly an insult to professional engineers.

I don’t understand when to use Ansible, Puppet, Chef, or Terraform...

Terraform: First you use this to create VMs.

Ansible/Puppet/Chef/Salt: Then you use this to install your stuff in the VMs. Just pick one of these and stick with it.

Install k8s with a/p/c/s. :)

K8s is not about configuration management, it’s about dynamic application management. Some parts infringe on areas where CM tools work as well, but k8s is all about managing containerized applications.

Trying to set up flows for ”works on my laptop”-dev, ci/cd, loadtesting, a/b tests, canary releases, autoscaling and rollbacks for multiple teams of devs? K8s really simplifies these things.

The idea is that you have one api spec to rule the whole stack (from a dev perspective). If you go down a more light-weight stack, a lot can still be achieved. More duct tape required though. That being said- I love duct tape!

Ansible can be used to create VMs too. I used to use it to provision AWS instances and to configure them after.


Never use Ansible, Puppet or Chef. Those are old dead tools.

Those are tools for configuration management. If you instead use packer or docker, you can build your vm/image at build time, and use Terraform to set up a vm with that image. Use etcd (with the image set to pull config from it) or a similar key-value store for distributing configuration.

Not "set up base vm with terraform" and then "ansible to install and configure it". Just build your vm/container image with the software you want already installed, and an etcd or other pull-configuration from a pre-set source. Done.

Now you don't need configuration management or dynamically changing infrastructure, since you moved it to the build step.

I don't think configuration management tools are dead. I use puppet to build my image with packer. It's way easier than build with a bunch of shell scripts. I agree that you shouldn't use it on live servers.

Very curious about what kind of service has these requirements! Servers in 50 different cities suggest you likely care A LOT about latency. Going bare metal likely means you’re dealing with huge amounts of computation. But tolerating 2-5 mins of downtime on every deploy (until now) means availability isn’t a big deal.

I’ve never seen this combo of requirements on any service before, what is it? Or are my assumptions just wrong about what these things suggest about your business?

We detect and monitor commercials airing on terrestrial TV networks. :)

Please tell us you’re planning to monitor extra-terrestrial TV networks. It’ll make this year seem so much cooler tech wise, you know using spacex’s new satellites.

Ok, joking aside, intriguing, as there seem to be a few companies providing monitoring services for various things. How did you all find the market for this?

Pretty cool stuff. I'm assuming you have to set up antennas where you have your servers colocated and monitor the transport streams going through?

Obviously you don't have to spill the company sauce but I'm trying to imagine the benefits of having all of this data.

Hah very cool! I was indeed way off on my assumptions :) I guess the 50 different cities is for picking up different regional broadcasts? And bare metal over cloud compute is because you have to connect special hardware?

Cool, so, looks like the product generates daily reports for advertisers about performance of ad spots over a trailing 1 day, lining up info about (a) what ad ran in what markets at what times plus (b) what increase in mobile/web traffic/conversion events happened from those markets coincidental with advertising, if I understand https://www.decidata.tv/advertisers.html right. Looks like a complementary product (maybe running as a second process on the same nodes, or maybe running as part of the logic of the thing that parses the video) also produces quality reports about aired content ( https://www.decidata.tv/operators.html ). I don't really know video very well, probably some very clever stuff you can do here to parallelize the processing.

So hmmmm, a few things to think about for the tooling:

() How quickly can you notice a single-node service outage and react / restore service?

() How much downtime is ok? Is it ever ok to fail to process data for an entire day for one of the ~50 markets (can you just say "this report is incomplete" and reprocess it later)? How much wiggle room do you have (how long does it take 1 instance of the application to do its local processing job on 24 hours of data, and how long does the downline data pipeline have?)

() Are you responsible for the part that watches TV + saves it to disk, and ALSO the part that batch-processes the saved video and produces the ad metadata / quality metadata? Are those different services deployed separately?

() Yikes, is there, like, a person on call in each of 50 places who can unplug the thing that's watching TV and plug it back in, if a wire burns out or if the machine that's watching TV fails? That sounds like it could be an operational nightmare but maybe this is a solved problem / maybe you can buy watch-TV-and-save-it-to-disk as a service?

() How do you make sure that every node is running the same version of the software with the same version of all the configuration?

() How do you release new versions of the software? Do you (a) have a non-production environment which runs side-by-side at every node, processing the same data, and measure that the output is equally correct or better, the performance is the same or better, and there are no new errors or failures -- then promote the non-production code to production automatically after a while? Or do you (b) deploy a "canary" to one of the 50 nodes and watch how it performs for a while, then deploy to the rest if they all behave okay? Or do you (c) just ship the latest code to all 50 nodes on Friday nights when no one's looking, and check some health metrics over the weekend, and roll it back on Monday if a metric looks bad?

() How do you keep track of what software versions you released when? If someone asks "What changed on 'X date'" does your tooling let you tell them pretty quickly and pretty accurately?

() If you discover a bad bug (the parser is corrupting data; the new parsing job has a slow memory leak and fails catastrophically after 21 days of uptime) and you need to make a change in a hurry, do your runbooks let you make that kind of change quickly?

Doesn't seem like you really need k8s or ansible or chef or whatever for this, as much as you need to write down your operational requirements and decide what technology will meet them.
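Option (b) above, the canary, is small enough to sketch without committing to any tool. This is only an illustration in Python: `deploy`, `healthy` and `rollback` are hypothetical callbacks you would back with whatever you actually use (an Ansible run, an ssh command, a metrics query), and the node names in the usage note are invented.

```python
def canary_rollout(nodes, deploy, healthy, rollback):
    """Deploy to one canary node first, then to the rest.

    Returns the list of nodes that ended up on the new version.
    Stops at the first unhealthy node and rolls that node back.
    """
    nodes = list(nodes)
    if not nodes:
        return []

    canary, rest = nodes[0], nodes[1:]
    deploy(canary)
    if not healthy(canary):
        # The canary failed: undo it and leave every other node untouched.
        rollback(canary)
        return []

    updated = [canary]
    for node in rest:
        deploy(node)
        if not healthy(node):
            # Stop the rollout here; earlier nodes stay on the new version.
            rollback(node)
            break
        updated.append(node)
    return updated
```

For example, `canary_rollout(["nyc", "chi", "sfo"], deploy, healthy, rollback)` touches only "nyc" if the canary check fails. In practice `healthy` would wait and watch for a while before answering, which is where most of the real work is.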

(ouch, I ended up formatting this in a way I didn't quite expect...)

This is a pretty typical retail backoffice recipe as well (changes only during maintenance windows).

I almost want to suggest Windows Group Policy for something like this...

I do this on ~30 servers with GuixSD, so there is a single file to configure all of them. To update the application I wrote a Guix pkg for it that pulls from our repository (simple to set up with Guix). Everything is controlled in emacs by me to change/start/stop all the servers. https://news.ycombinator.com/item?id=17084561 is similar to what I'm doing. Caveat: you have to go on Google Scholar and read the guix and nix papers to understand the system architecture, then their documentation. Also, downtimes aren't an issue for us since these are internal servers and this is done during non-critical time, but it's usually just restarting the new app, so a few seconds. Other caveat: for security fixes we just edit in changes rather than run guix pull, but I also have that automated now after mailing list help. You may have totally different requirements; otherwise look into nix/guix. Guix the pkg manager runs on any distro.

Kubernetes is complicated, it's a whole new world of terminology and abstractions that you need to learn, and it changes quickly.

That said, maintaining a large cluster is complex too. Often, there is much more involved than making sure each image's config files are up to date. Kubernetes shines when you have a number of different docker images, where each needs its own unique network, scaling, cpu, and disk settings, i.e., hardware requirements that can't really be controlled with just config files.

In your case, the fact that all of your images are identical probably cuts against using Kubernetes. However, the fact that these images are running on different networks where each may require their own tweaking leans towards it. If you expect to start adding many different docker images to your cluster, with different hardware requirements, it's probably worth the learning curve to get on to Kubernetes.

Disclaimer: I've never run Kubernetes on bare metal, just hosted environments. Bare metal may make Kubernetes much less attractive.

At a high level, your use-case maps well to a Kubernetes DaemonSet, which is a resource that represents the desire to run a copy of something containerized on every node within your cluster. Kubernetes would give you some things that might be useful:

* A unified control plane (monitoring, statusing)

* An automated rolling-upgrade system

* "self-healing" in the form of automatically restarting your containers if they go awry

However, for your use-case, it is likely more effort than it's worth:

* Setting up and maintaining Kubernetes on bare metal servers is a huge amount of work

* A lot of Kubernetes features are not needed for your use-case

* The Kubernetes networking model essentially requires you to set up a private network spanning all of your sites in order to work, but provides almost no benefit for your use-case
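For reference, a DaemonSet for this would look roughly like the following. It is a minimal sketch written as a Python dict and dumped to JSON; the name `tv-monitor` and the image are invented for illustration. There is no replica count: one pod is scheduled per eligible node, and the rolling update strategy replaces them one node at a time.

```python
import json


def daemonset_manifest(name, image):
    """Build a minimal apps/v1 DaemonSet manifest as a plain dict."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name},
        "spec": {
            "selector": {"matchLabels": labels},
            # Replace pods one node at a time when the template changes.
            "updateStrategy": {
                "type": "RollingUpdate",
                "rollingUpdate": {"maxUnavailable": 1},
            },
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical application name and image.
    print(json.dumps(daemonset_manifest("tv-monitor", "example.com/tv-monitor:1.0"), indent=2))
```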

Technically you could use a DaemonSet for that, but that is not the primary use-case for DaemonSets. They should be used for "daemon" containers, like logging daemons (e.g. fluentd), network plugin daemons like calico or flannel, or monitoring like prometheus. If you want to run multiple instances of your application, you should define a ReplicaSet or Deployment with Pod affinity set instead.

That's super fiddly though: you will be leaning on (anti) affinity and counts to retain the 1:1 relationship between hosts and pods. I would argue that daemonsets aren't for any specific workload, they're more about how you want scheduling to happen (1 per node, on some subset of your nodes).
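To show what "leaning on (anti) affinity and counts" means in practice, here is a rough Python sketch of a Deployment that emulates 1-per-host. The names are hypothetical. The fiddly part is visible in the signature: `node_count` has to be kept in sync with the real number of nodes by hand, which a DaemonSet does for you.

```python
def one_per_host_deployment(name, image, node_count):
    """Deployment that emulates 1-pod-per-node via required pod anti-affinity."""
    labels = {"app": name}
    anti_affinity = {
        "podAntiAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": [
                {
                    # Never co-locate two pods carrying these labels...
                    "labelSelector": {"matchLabels": labels},
                    # ...where "co-located" means "on the same hostname".
                    "topologyKey": "kubernetes.io/hostname",
                }
            ]
        }
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            # Must be updated manually whenever nodes are added or removed.
            "replicas": node_count,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "affinity": anti_affinity,
                    "containers": [{"name": name, "image": image}],
                },
            },
        },
    }
```

If `node_count` is larger than the number of schedulable nodes, the extra pods simply stay Pending; if it is smaller, some nodes run nothing.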

This is where K8S becomes messy because of its rapid development history.

Applications should just be a `Deployment` manifest with the type of distribution being a child setting (X replicas with affinity or 1-per-host). There's no reason to have competing ReplicaSets and DaemonSets just for this detail.

I disagree on this one, DaemonSets are really for a different use-case. There is no replicas field you can set, and strictly 1 Pod will run on 1 node. A Deployment is more flexible, because you can run even more Pods in different cities if necessary. What if there are more or fewer nodes in different cities, or nodes with different specs, which could and should run more Pods? You can't fine-tune DaemonSets for those cases, it really is for daemons.

It's not about use-case, it's that "X number of replicas with affinity" OR "1-per-node" should just be a setting within a Deployment. No need to use a completely separate DaemonSet definition just to run 1-per-node.

Kubernetes is supposed to be installed on nodes close to each other, typically in the same server room or datacenter, otherwise the latency would kill the etcd cluster. Ansible is perfectly fine and simple for your use case, why would you bother even thinking about overcomplicating it? Simplicity is king!

You could rather easily manage this flow with ansible. K8s abstracts a lot of the pieces in your ansible plays away, but comes back at you in form of a rather complex stack. Say you want to host instances of this app in a DC/cloud, k8s would be more appropriate, and perhaps then it would make sense to go single node “micro-k8s” style (for single server locations), to give you a common api for managing deployments, network configuration, security etc.

Win with k8s is that once you’re tuned, a lot of “config crap” is abstracted away, basically. But - there's a reason ansible is supported specifically through the “operators” SDK, something we use to manage seamless upgrades of a k8s deployed app:


Edit - forgot that if metrics-based autoscaling of the app is a requirement, my experience is that k8s makes this a breeze. Probably not a req if using single node hosts though...

I haven't used ansible in a few years, but I sorta loathed the transport after a time with even dozens of boxes -- constantly creating ssh connections was slowing things down significantly.

I switched to saltstack some time back and it's much better while having a similar sorta interface. The docs are worse than ansible imo tho. ymmv

You shouldn't be creating ssh connections all the time, but instead using persistent connection to each server that ansible opens once at the beginning, and reuses for all subsequent tasks.

See ControlPersist/ControlPath OpenSSH client configuration.
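As a rough illustration, these are the OpenSSH client options involved, assembled in Python as an argument list. The 60 second persist time and the socket location are arbitrary choices for the example; the same option string is what you would put in `ssh_args` under `[ssh_connection]` in ansible.cfg.

```python
def multiplexed_ssh_args(persist_seconds=60, control_dir="~/.ssh"):
    """OpenSSH client options that reuse one connection per host."""
    return [
        # Open a master connection on first use, reuse it afterwards.
        "-o", "ControlMaster=auto",
        # Keep the master alive in the background after the last session.
        "-o", f"ControlPersist={persist_seconds}s",
        # One socket per remote user (%r), host (%h) and port (%p).
        "-o", f"ControlPath={control_dir}/cm-%r@%h:%p",
    ]


if __name__ == "__main__":
    # Prints the option string, e.g. for pasting into ansible.cfg.
    print(" ".join(multiplexed_ssh_args()))
```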

And for even more speedup, use Mitogen https://mitogen.networkgenomics.com/ansible_detailed.html

Never found this incredibly reliable - and if you change networks, sleep, or power off your machine, those connections die. Salt is just better at all of this imho.

Do you often do these things in the middle of running a playbook? I would hope not. :)

It's not meant to have the connections persist forever, just long enough for repeated short ssh connections to have almost no overhead, and therefore be much faster.

Yeah, good point.

I mean, I guess if you use a dedicated host, it's more reliable - but I've always just pushed from the client using Ansible whereas with Salt you have the master running the cloud running somewhere most usually.

You could also go full docker and experiment with docker swarm. I have little experience with swarm itself, but AWS ECS has a deployment strategy for this. You could say you want 50 container instances on a fleet of 50 machines with a deployment strategy of spreading across all machines, and you would guarantee that every single machine is populated with a docker container. Maybe try to look for similar functionality, thinking about your bare metal machines as a single fleet of instances. Do you manage your containers' configs with environment variables? I think docker also lets you share the host's environment with the docker container, so you can pre-provision the server with the necessary information and, once the swarm is deployed, the container figures out what to do with the host's env. Apply that to the spread-across-machines deployment strategy and you solve your problem.

I would personally go with ansible and consul if your application is something like a rest api, should be trivial to setup.

Summary: 50 production instances of a single application.

No, you don't need Kubernetes (though fwiw if you used it what you'd have would be 50 nodes with a single daemonset, 1 pod of the app per node, if that helps understand) - but I'd suggest not using Ansible.

Instead, use Packer (or similar) to create a machine image/snapshot/whatever your server provider calls it, and then deploy that same image on all 50.

You might have some small amount of host-specific things left to configure, and by all means use ansible for that if you want, but there's no need for error-prone running of the same large playbook 50 times.

Personally, I find that final setup is little enough to do it with Terraform (which is provisioning the servers anyway).

I've used SaltStack in a similar setup with some success, it's easy to keep all the servers in sync and prevent configuration drift.

Check out BalenaCloud. It's a container orchestration service for edge devices.

A major problem I've seen using Kubernetes is it's difficult to bring up an entire cluster from scratch, play with it and tear it down. So I wrote Sugarkube [1] which lets you launch an ephemeral cluster, install all your applications into it (including creating any dependent cloud infrastructure) and tear it down again. This means each dev can have their own isolated K8s cluster that's in parity with production. And you can release the same code through different environments. In fact your prod environments can become ephemeral too - instead of doing complicated in-place upgrades you could spin up a sister cluster, direct traffic to it and then tear down the original. Or you could spin up a cluster to test the upgrade before repeating it in your live environment.

Based on my own experience I believe ephemeral clusters can solve a huge number of problems dev teams face using Kubernetes.

Sugarkube currently supports launching clusters using Minikube, EKS and Kops and we'll be adding provisioners for GKE and Azure in future. Sugarkube also works with existing clusters, so you can use it to deploy your applications. It's a sane way of managing how to ship infrastructure code (e.g. Terraform configs) with your applications that need them.

Sugarkube also supports parameterising applications differently per environment - you could almost view it as something like Ansible but that was written with Kubernetes in mind. And it's in Go so it's way faster than Ansible (an early POC was actually written in Ansible and it was very slow).

I've just finished intro tutorials for deploying Wordpress to Minikube [2] and EKS [3]. I'd be keen to hear feedback. We just tagged our first proper release earlier this week and it's ready to try now.

[1] https://docs.sugarkube.io/introduction/ephemeral-clusters/

[2] https://docs.sugarkube.io/getting-started/tutorials/local-we...

[3] https://docs.sugarkube.io/getting-started/tutorials/dev-web/

I definitely agree with you - my team switched to an "ephemeral cluster" model which allows us to very quickly spin up an entirely new cluster and drain traffic to it as needed.

It's something that we've ended up implementing on our own with a lot of Terraform, but that's had its own obstacles and is something of a small maintenance burden. I'll be taking a look at sugarkube!

Awesome. If you need any help or if anything's unclear please get in touch: https://www.sugarkube.io/#contact-us


Not quite the same, but good for CI jobs


Oh my - there are two distinct k8s bootstrapping projects named "kind" :(


This sounds like one of those "Why didn't somebody do this sooner?" moments.


Not sure I understand. Do you mean local dev clusters? Because that is trivial to do using K3S/minikube/kind.

No, more than that. To Sugarkube, the actual type of cluster (minikube, EKS, Kops) is just an implementation detail. Actually, when you think about it, even the region you deploy in or your cloud provider are really just implementation details. Provided your applications are portable you could create a local minikube cluster that just includes the subset of stuff from your cluster that you care about for whatever you're working on. E.g. imagine you're in an ops team, and the ops cluster runs monitoring stuff and Jenkins, you could create a local cluster just running Jenkins and its dependencies (e.g. tiller, cert-manager, etc.). This is possible because Sugarkube understands your application's dependencies [1].

But the cool thing is that you could then go to the cloud whenever you wanted to. If you have some hard dependency on some cloud infrastructure, you could just use Sugarkube to spin up an isolated dev cluster on EKS for example. The EKS dev cluster could look the same as your prod cluster in terms of versions of software installed, but just use fewer, smaller EC2s, and again, perhaps just running a subset of the applications in your overall ops cluster.

Once you've developed in isolation, you could then deploy to a staging cluster which could again have been brought up - either just for this task, or more likely at the start of the day/week. Finally you could then promote your updated version of Jenkins into your prod cluster. For major upgrades to the prod cluster you could use Sugarkube to spin up a brand new sister cluster and then start to gradually migrate prod traffic over to it. Once all the traffic is going to your new cluster, just tear down the old one. If there's a problem, back out by sending all traffic back to the old cluster. Of course this last ability depends on something at the perimeter like Akamai (I think AWS have something similar?), and is a lot easier if your state is outside the cluster (e.g. in hosted databases, S3, etc.) but it'd be doable.
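The sister-cluster migration with a back-out path boils down to something like the sketch below. This is not Sugarkube code, just an illustration in Python: `set_new_cluster_weight` and `new_cluster_healthy` are hypothetical hooks standing in for whatever sits at your perimeter, and the 10/50/100 steps are arbitrary.

```python
def shift_traffic(set_new_cluster_weight, new_cluster_healthy, steps=(10, 50, 100)):
    """Move traffic to the new cluster in steps; back out fully on failure.

    Returns True if all traffic ended up on the new cluster.
    """
    for percent in steps:
        set_new_cluster_weight(percent)
        if not new_cluster_healthy():
            # Back out: send everything to the old cluster again.
            set_new_cluster_weight(0)
            return False
    return True
```

Only once this returns True would you tear down the old cluster.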

On projects I've worked on I've seen so many problems that basically came down to long-lived clusters where someone set them up a year ago and left/forgot how they worked. Or because performing upgrades was a nightmare they weren't done, etc. 100% automation of ephemeral clusters just solves all those problems.

[1] https://docs.sugarkube.io/concepts/kapps/dependencies/

There are advantages to using minikube for local development if you are interested in replicating and testing the whole flow. I think these are most important for small teams of 1-5 people where everyone needs to understand the whole infrastructure stack and regularly performs changes down to the kubernetes resources. That said, I've found that for bigger teams where you have a few people concerned with keeping k8s stable, and most people developing the actual application logic, docker-compose is still the most no-frills-easy-to-understand option for local development.

I've seen a deployment of a large number of ec2 instances + other Amazon services handled with Puppet+Ansible, and I've also seen a somewhat smaller but still fairly sizeable deployment using a custom Kube cluster. The custom Kube cluster was cool and definitely seemed to make those working with it happy, but I was much happier working with the Puppet+Ansible bare metal setup. It's just easier to figure out what's wrong. I am now working with hosted Kubernetes, which seems to be much less of a mental headache.

Is there a really abbreviated reference to all of core kubernetes? Like, one page with just all the categories, services, objects, attributes, schemas, etc, for every component. I know it would be 1000 pages long, but it would allow quickly skimming as a reference, as well as getting a very quick low-level glance at everything involved.

Guides like OP's are great to get started, but I want to know that weird_flag_that_almost_nobody_ever_sets exists in random_service_parameter.

+1 for the reference documentation.

I've often been able to write complex and correct manifests, just by recursively following the links and filling in the YAML as appropriate.

i mean reference documentation is nice if you understand the core concepts. my struggle is with grasping those core concepts and how they all fit together to solve the orchestration problem.

for now, a few books (kubernetes patterns is the current one i am reading) and articles are helpful. i would like to note that some of these concepts had their names changed over time. i suppose there will be more changes as the platform stabilizes. mind you guys that kubernetes is only 5 years old...

`kubectl explain` is also very nice.
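A small sketch of how that can be used, written as a Python helper that only builds the command line. The field path `deployment.spec.strategy` is just an example; `--recursive` prints the whole field tree under a path, which gets fairly close to the skimmable reference asked for above.

```python
def explain_command(field_path, recursive=False):
    """Build a `kubectl explain` command line for a resource field path."""
    cmd = ["kubectl", "explain", field_path]
    if recursive:
        # Print every nested field name instead of one level of docs.
        cmd.append("--recursive")
    return cmd


if __name__ == "__main__":
    # Example field paths; any resource known to the cluster works.
    print(" ".join(explain_command("deployment.spec.strategy")))
    print(" ".join(explain_command("pod.spec", recursive=True)))
```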

Yes! If this could be mashed up with different levels of diagrams I think this would be extremely useful for visualizing the components

How about a children's story? https://youtu.be/R9-SOzep73w

I hate the fact that most of my time is spent configuring rather than building.

I have a short text on kubernetes operators. https://github.com/MoserMichael/cstuff/releases/download/kbo... learning this stuff got me something of an overview of what is going on there.

Tried to set up a Kubernetes cluster with IPv6 the other day. These hipster tools don't care about IPv6 :(

A generic engineering rule: fight complexity, a.k.a KISS principle. It takes additional effort and courage. You'd need to fight for that project time too. You'll live longer (and your systems too), but probably on your next job :)

Is mesos dead ?

no, It's very much alive, there's just no hype around it anymore
