If only I could!! That’s exactly the frustrating part: there seems to be no way of grokking what goes on under the hood, and there are so many different ways of setting up a cluster and very few have any information about them online whatsoever.
As a practical example: yesterday, all of a sudden, my pods could no longer resolve DNS lookups (it took a while to figure out that that was what was going on; no fun when all your sites are down and customers are on the phone). Logging into the nodes, we found out about half of them had iptables disabled (but still worked somehow?). You try to figure out what's going on, but there are about 12 containers running in tandem just to enable networking in the first place (what's Calico again? KubeDNS? CoreDNS? I set it up a year ago, can't remember now...), and Googling is no help, because your setup is unique and nobody else was harebrained enough to set up their own cluster and blog about it. Commence the random sequence of commands I'll never remember, until by some miracle things seem to fix themselves. Now it's just waiting for this to happen again, not one step closer to fixing it.
If you use a managed Kubernetes like GKE or AKS (not in AWS, since they suck; EKS is not really managed), then you skip the whole "there is a problem in my own cloud of my own making" category.
btw, I also encountered DNS problems in Kubernetes, on ACS; it took 5-10 minutes to resolve, and was caused by ACS not having the services enabled to restart DNS upon reboot, lol.
Reading this comment made me realise that often new technology is adopted because it is optional and promises options. But those options quickly shrink away and suddenly you’re locked into it.
Not to invoke a controversial name. But this is what happened with systemd.
Moving from one Kubernetes Provider to another is not zero time. You need to learn some differences in the way GKE ingresses vs AWS ELBs work, etc. It is a substantially more tractable problem than the differences between Cloud Bigtable and DynamoDB, and that one is still a tractable problem.
The way to fight lock-in is not, and has never been, "these two providers offer exactly the same service". It has been about avoiding "these two providers offer nothing that is analogous, and their documentation is directly written to encourage practices that do not port". It has never been an all-or-nothing thing.
Have you considered rebuilding/moving the containers onto something more "enterprisey"?
If IBM wanted the experts, they would have hired or grown their own. What they wanted was, I guess, the contacts (actual and prospective).
My experience with support contracts is that you often have access only to someone who is no more of an expert than you, and who cares much less about your customers than you do. On several occasions it turned out the "expert" had been the one benefiting from the teaching of the in-house "not supposed to be an expert but knows more than the expert" guy (and yes, also, and maybe especially, with "reputable" large companies like IBM or Oracle).
I can even remember a particular instance when the expert had no access to his own company's internal documentation to get details about a specific error message we were hitting, and we had to find a pirated copy of some internal manual on some Russian website and hand it over to him.
It makes sense to have a service contract when you have no knowledge at all of the domain, but as soon as it's related to your daily job you will quickly realise that experts are mythical characters to whom your contractor has no better access than any other company, including your own.
I would seriously question who you're doing business with. Anytime I have had a significant issue, the escalation path is to the guy who wrote the code. To imply that enterprise support is just a bunch of people who don't know more than the average guy off the street is ridiculous. Tier 1 support? Sure, but you don't stay there long if you're a clued-in user with a real issue.
I've yet to come across a single instance where such rapid scaling happens and stays consistently high.
Most of the time you know well in advance when your resources will be put to the test.
But I have also had a number of DNS problems that we still haven't resolved, and they sometimes go away on their own. Same for iptables rules issues. This is of course on a hosted Kubernetes cluster at a large supercomputing center. (I didn't set it up, I just have to fix it. Ugh.) At Google, it's been great and we've had no networking problems, but they almost certainly run their own overlay network driver.
The various networking solutions you can plug into Kubernetes seem pretty spotty, and they are very hard to debug. I still haven't figured it out myself. But I am trying not to throw the baby out with the bathwater. I think the networking (and storage) parts will get better.
I think this pain is sometimes more severe in the context of the automated provisioning tools out there and the trend toward immutable infrastructure: folks tend not to have the know-how to dig in and mutate that state if need be.
It's really important to have a story within teams, though, about either investing in the knowledge needed to make these fixes, or to have the tooling in place to quickly rebuild everything from scratch and cutover to a new, working production cluster in a minimal amount of time.
As I build my knowledge I am also building Ansible playbooks and task files. After each iteration I shut down my cluster, do an automated rebuild and test, then delete the original cluster and start my next iteration.
I have an admin box with everything I need to persist between builds (Ansible, keys, configuration files, etc) and can deploy whatever size and quantity of workers (VM) needed.
It has been a good process so far. I haven't yet put things in an unrecoverable state, but if that happens I can rebuild the cluster to my most recent save and try again.
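As a sketch, one rebuild-and-verify pass can be driven by a short playbook like this (every path, role and variable name here is a placeholder, not my actual setup):

```yaml
# Hypothetical rebuild-and-verify pass, run from the admin box:
# provision fresh workers, apply the cluster roles, then smoke-test
# before the old cluster is deleted.
- hosts: localhost
  tasks:
    - name: Provision fresh worker VMs
      command: ./provision-workers.sh --count "{{ worker_count | default(3) }}"

- hosts: new_cluster
  become: true
  roles:
    - kubernetes_node

- hosts: localhost
  tasks:
    - name: Smoke-test the new cluster before tearing down the old one
      command: kubectl --kubeconfig new-cluster.conf get nodes
      changed_when: false
```

The point is that the whole cycle is one command, so there's never a cluster whose state you can't reproduce.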
I don't see it taking a lot of resources to have a proving ground. I would definitely not feel comfortable going to production without the ability to reproduce the production clusters' exact state.
I anticipate exactly what you describe as a rollback mechanism. At all times I want to be able to automate the deployment of clusters to an exact known state.
I think building a cluster, walking away from it for a year, and then coming back to it for a break fix/update/new deployment is a huge gamble.
Clusters are cattle, not pets.
It is also a cluster management tool, but much simpler, and it can be combined with other tools to make it just as powerful as Kubernetes.
In 4 years I've never come across a cluster I was unable to fix, nor has one really broken without someone taking an inadvisable action on it. This may simply be because I started early enough that I was forced to manually configure the components, and thus understand the underlying system well enough.
Over time I have seen some interesting things though:
- Changing the overlay network on running servers was probably the silliest thing I've done. This wasn't on production, but figuring out where all the files are and deleting them was pretty much undocumented.
- A few years back somebody ran an HA cluster without setting it up as HA, which resulted in occasional races where services kept changing IP addresses. I believe the ability to do this was patched out.
- An upgrade once caused a doubling of all pods. This was back when Deployments were alpha/beta and a change in how they were referenced in the underlying system caused Deployments to forget their ReplicaSets, etc.
Overall though, in 4 years I've spent very little time debugging clusters and more time debugging apps, which is what we want.
You're basically saying "the tool X is fine, you're just inexperienced/undisciplined and using it wrong". Which is a fair critique if I were an intern, but I have a decade-plus of experience in development and operations and I look at Kubernetes in disbelief: why should things be that complicated? I get it, everything is pluggable and configurable, but surely this must be balanced out by making it more approachable and convenient?
You can't sneeze in Kubernetes without it requiring you to generate some SSL certs, to the point where it's just cargo-culting without any consideration of purpose or security.
And what's up with the dozens and dozens of bloated YAMLs and Go files? The fresh, 30-odd-commit "official" Flink operator is 3 THOUSAND lines of Go and 5 THOUSAND lines of YAML. How is that reasonable? In which universe is that reasonable? All it does is run a for-loop that overwrites a bunch of pods to keep their spec in sync with the desired config. There's something like a 1000:1 boilerplate ratio in Kubernetes and it's considered good somehow?
Sorry for the rant, I'm just angry that we're six decades into software engineering and the newest, hottest project in the newest, hottest line of work behaves like everybody should be paid per line of code they produce.
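To be fair to the underlying idea, the loop itself really is tiny. A minimal sketch of the sync pattern being described, with every name invented for illustration (this is not the Flink operator's actual code):

```python
# Minimal sketch of the controller "reconcile" pattern: diff desired
# state against observed state and emit the actions needed to converge.
# All names and types here are illustrative, not any real operator's API.

def reconcile(desired: dict, observed: dict) -> list:
    """Both arguments map pod name -> spec; returns (verb, name, spec) actions."""
    actions = []
    for name, spec in desired.items():
        if name not in observed:
            actions.append(("create", name, spec))
        elif observed[name] != spec:
            actions.append(("update", name, spec))
    for name in observed:
        if name not in desired:
            actions.append(("delete", name, None))
    return actions
```

The other several thousand lines in a real operator go into watches, retries, status reporting and partial-failure handling, which is arguably exactly where the boilerplate complaint bites.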
You can have a decade tech experience and still not know another system well. We all forget the learning we did to get to where we are, but I'm sure all the old reliable tools were frustrating at one point too.
Personally, I don't find Kubernetes that complex, but then I did write and set up a scheduler for an early IaaS provider, so maybe I'm just comfortable with the problem, or maybe it's simply because I've been using it for several years.
Don't get me wrong, debugging overlay networking issues isn't something to love, but it's also not all that complex:
- There's a worker daemon on every box that manages the local configuration, whether that's iptables, IPVS, BPF or something else. There may be a separate worker for service IP addresses than for pod IP addresses.
- There's a controller that does the actual figuring out of what things should be doing and lays out the rules for the workers. This might include the network policy controller, but that might live in a separate daemon.
This setup enables Service IPs, Pod IP addresses & Network Policy.
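As a rough mental model (a sketch only, not kube-proxy's actual code), the worker boils down to rendering service and endpoint state into per-node rules:

```python
# Illustrative sketch of the worker's job: turn "service IP -> backend
# pod IPs" state handed down by the controller into load-balancing
# rules. Real implementations render these as iptables/IPVS/BPF
# entries; here a rule is just a tuple. Simplified: iptables actually
# uses cascading match probabilities rather than a flat 1/n per rule.

def render_rules(services: dict, endpoints: dict) -> list:
    """services: name -> cluster IP; endpoints: name -> list of pod IPs."""
    rules = []
    for name, svc_ip in services.items():
        backends = endpoints.get(name, [])
        for pod_ip in backends:
            rules.append((svc_ip, pod_ip, 1.0 / len(backends)))
    return rules
```

Once you see it this way, "half the nodes had iptables disabled" translates to "the worker on those nodes stopped rendering its rules", which narrows down where to look.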
Obviously in Ansible you can just write your own firewall rules, but as soon as you step away from running every app on every box, you'll either be relying on something just as complex (but managed by someone else), like the cloud provider's SDN, or you'll need to run your own system that does the same.
As with anything, it depends what you're doing, but I like auto-recovery, app-level health checks, infrastructure as code, namespaces, resource quotas, and not forcing my dev teams to couple their network policies to infrastructure details, so I'm fairly happy with the abstraction.
No snark or pushing opinion; I’m genuinely wondering how it is from someone who went through this path.
As a sysadmin who cares more about the reliability of services, still managing critical services outside of Kubernetes, I'm wondering what I'm missing out on with Kubernetes.
Sure, the blue-green automatic deployment in k8s is cool, but a bit of clever Ansible scripting should get you there as well. It might be more busywork, but the time spent nursing my k8s cluster in no way amounts to a time saving.
The whole field, it seems, is everywhere filled with "introductory", "gentle" books/articles, and then "this is how you do reusable rocket science with X".
Pro tip: to understand the Kubernetes in between, go read the manual pages on Linux networking and get a really good grip on iptables. Go read the manual pages for Linux namespaces, cgroups and containers with LXC.
Why don't people get the basics of the parts of the tool they are going to use first, instead of trying to "understand a tool"? You won't succeed at doing anything with Kubernetes if you come from, say, a macOS or Windows environment and have no clue how Kubernetes works or what it is built on.
Maybe if you use GKE that might be the case, but otherwise, running Kubernetes is a fool's errand if you don't have extensive experience with systems and SRE, imho, and anyone telling you otherwise is selling snake oil. Sure, you might get lucky and never have a major issue, but do you really want to depend on luck?
> I learned in the last 20 years working as a systems engineer that the best tools or stacks are ones easy to understand, debug and fix.
Sure, that’s an ideal goal and one I strive for too when possible, but there’s a reason why many SRE folks make a very large salary compared to national averages. Hard work is expected from time to time.
Disclaimer: I'm a n00b when it comes to web stuff.
An autoscaling group in AWS might have issues if you complicate it a bit and run into edge-case bugs, but otherwise it's most definitely going to scale your instances up and down (they bill you for it, after all).
Simply said: buying basic VM/dedicated machines has been around long enough that it is not a black box for professionals.
One nitpick though:
> do autoscaling (using AWS)
But that would mean you're partially locked in on AWS? Only when it comes to the autoscaling bit, but still...
You're either locked into a provider for that or locked into the team that runs your own datacenters.
Not saying that you are incorrect, but "basic" is relative.
To understand Linux networking, should we also understand the basics of the Linux kernel? Or basic chemistry, to understand what happens inside the CPU?
We should all stand on the (stable) shoulders of giants. I would call that good (also relative, indeed) documentation that abstracts us from the level below/above.
Even if there is an abstraction layer with nice boundaries that is well documented, you still can't ignore the layer below and pretend it's raining.
When you are making a web app (for advertising purposes, since this is HN), can you ignore understanding HTTP, TCP and IP? Would you hire a web-application developer who didn't know how to set up a web server or load-balance his app?
My company does this all the time.
In fact, in the absence of our step-by-step guides, I would estimate that maybe 5% of the development team could successfully configure their local web server on their own.
I don’t use most of the fundamentals I learned in school directly on a daily basis, but I do use them almost every day to inform the decisions I make about higher level things.
Additionally, these layers are prone to rapid change. To take one example, Cilium and other players in the container world are looking to replace iptables with eBPF, so learning iptables becomes obsolete.
For me, Kubernetes is cool and if you have enough scale to need it, very useful. The problem is (like most IT hype cycles) it's getting used in inappropriate places, where simpler more basic solutions could work just fine.
I'll soon have to manage 50 remote bare-metal servers, all of them in different cities. Each one of them will run the same dockerized application. It's not 1 application distributed in 50 servers; it's 50 instances of the same application running in 50 servers, independently.
A frequent use case is to deploy a new version of this application on every server (nowadays I do it manually, it's OK since I manage only like 10 servers).
A nice-to-have would also be to avoid downtime when I update the application (nowadays I have a downtime of 2-5 minutes (when it goes well), which matters for us).
If you don't care about a centralized API to probe status and manage each instance, Ansible should be enough to orchestrate these installations, and with little effort (that also depends on the application at hand) zero-downtime rollouts with Docker can easily be done with it.
However, if you want a single control plane to probe status and want to avoid writing your own rollout scripts, Hashicorp's Nomad might be a good solution for this. It is a lot simpler than Kubernetes while still giving you nice primitives to describe jobs/services, health checks, rollout strategies, etc. Treat every site as a datacenter of its own, set up a job of type "system" (akin to Kubernetes DaemonSets), and all you need on those sites is internet access to the HTTPS endpoint of your Nomad control plane.
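To give a flavour, a system job for that use case could look roughly like this (job name, datacenter and image are made up):

```hcl
# Hypothetical Nomad system job: one instance on every node that joins,
# with rolling updates and automatic revert on failure.
job "site-app" {
  datacenters = ["site-01"]
  type        = "system" # like a Kubernetes DaemonSet: one per node

  update {
    max_parallel = 1
    auto_revert  = true
  }

  group "app" {
    task "app" {
      driver = "docker"
      config {
        image = "registry.example.com/site-app:1.4.2"
      }
    }
  }
}
```

Bumping the image tag and re-submitting the job gives you the rolling deploy across all sites.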
If you want to talk more about this, hit me up on Twitter or Telegram, I use @rochacon as my handle virtually everywhere.
Edit: grammar and typos
It's a bit overlooked now because every DevOps person nowadays seems to think Kubernetes is the only rational choice, as it will look good on a CV.
I predict Nomad will be on the upswing the next few years as people realize Kubernetes is extremely complicated to self host.
I had to write my own custom blue/green deploy script to hot reload traffic to proxy_pass definitions in nginx upstream configs since I don’t use Docker.
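For anyone in the same non-Docker spot, the moving part is just an upstream block like this (ports and names are examples):

```nginx
# Whichever color is live is the one the upstream points at; the deploy
# script rewrites this block (or flips a symlink to an included file).
upstream app_live {
    server 127.0.0.1:8001;  # blue; the script swaps this to :8002 (green)
}

server {
    listen 80;
    location / {
        proxy_pass http://app_live;
    }
}
```

After rewriting, `nginx -t && nginx -s reload` swaps traffic gracefully without dropping in-flight connections.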
Since docker swarm knows if an instance is down, it will know to not use that instance.
If you need a practical suggestion, use a simple solution like LXD. But as you are already using Docker, stay with it and Ansible.
As you are using Ansible, you are already doing infrastructure as code; you are way ahead of the curve already.
Probably when you grow a bit bigger, use Packer and Terraform, but I think Ansible will do just fine.
Kubernetes was designed for Google-sized problems, and the burden of maintaining it is quite high, with lots of moving parts, unless you use Google's GKE, Amazon's, or another managed Kubernetes service. So don't fall for it unless you need it for your resume.
> Digital Ocean now offers managed Kubernetes for free
GKE is just as "free" as DO, in that you only pay for the nodes you actually use (those nodes are probably a bit more expensive, admittedly). EKS is the outlier in this trio in that you also have to pay (a lot) for the master. Could you clarify what GKE could do to "follow suit"?
If you have a single node, not a cluster, in each location, then k8s buys you a lot less. Some people still advocate single-node k8s "clusters" for benefits like rolling deployments.
I only have limited experience, but Kubernetes works very hard to abstract away the actual machines, and it works best that way: you just say "Deploy 20 instances of job X", and k8s will somehow find the machines to deploy them. You don't care where they are running - k8s handles that.
Once you start to care about actual machines and which jobs are running where, k8s starts to make less sense. You're paying for huge complexity (required to abstract away the machines), but you're getting none of the benefits; it just becomes a glorified wrapper around the Docker daemon.
We run on multiple clouds, so vendor APIs and k8s implementations are not useful to us.
If you have a 100 user SaaS platform, do you run Linux containers on top of Linux just for isolation?
Docker is useful when you don't want to care about the underlying operating system or you might be deploying on different operating systems etc. Also if you want to quickly switch providers (like going from Ubuntu 18.04 on digital ocean to Windows Server on Azure) since it's a self contained image.
This self contained image is useful during development too. I can cleanly separate dependencies between applications and dev/test/prod. For example I develop a node app based on node 12 + pg 11 but I tinker with ghost too which requires node 10 + mysql. Having this in Docker keeps cruft and complexity off my macOS installation at the cost of space. Having multiple versions of a runtime/database etc can quickly become a nightmare to manage on local operating systems.
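As an illustration, a compose file pins each project's stack; the node 12 + pg 11 app described above might look like this (service names and settings are just examples):

```yaml
# docker-compose.yml for the node 12 + Postgres 11 project. The ghost
# project gets its own file pinning node:10 and mysql, so neither
# stack leaks onto the host macOS install.
services:
  app:
    image: node:12
    working_dir: /app
    volumes:
      - ./:/app
    command: npm start
    depends_on:
      - db
  db:
    image: postgres:11
    environment:
      POSTGRES_PASSWORD: dev-only-password
```

`docker compose up` in one project directory and `docker compose down` in the other keeps the two runtimes fully separate.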
Operationally if you can control deployment and you don't need many servers, deploying to the base operating system makes sense. Docker/Kubernetes shine when you have scale + need to provision many servers across different platform providers.
On that same note... when do Terraform / Ansible start to make sense?
in dev: you can set up dependencies without worrying about the underlying OS, so it is easier to build a new dev machine, add a new team member, etc.
in integration test: you can set up multiple integration test environments as long as you have enough resources. Running multiple integration tests without interfering with other team members is the major benefit here.
in prod: as you may know, for isolation and horizontal scaling. Easier rollbacks and updates?
User-to-user isolation is hard, and extremely hard if any of the users could be malicious. Docker is broadly neutral here - adds a few tools, might add a few vulnerabilities.
I'm really upset with the way Google let their marketing team run roughshod all over the place with that software. Kubernetes is almost never the tool to use. It's entirely insecure, overly complicated and almost never fits the supposed benefit. Worse, it feels like the entire CNCF ecosystem is run by marketing people, with "developer evangelists" who have never written a single line of code in their lives. It's a real shame and quite honestly an insult to professional engineers.
Ansible/Puppet/Chef/Salt: Then you use this to install your stuff in the VMs. Just pick one of these and stick with it.
K8s is not about configuration management, it’s about dynamic application management.
Some parts infringe on areas where CM tools work as well, but k8s is all about managing containerized applications.
Trying to set up flows for "works on my laptop" dev, CI/CD, load testing, A/B tests, canary releases, autoscaling and rollbacks for multiple teams of devs?
K8s really simplifies these things.
The idea is that you have one api spec to rule the whole stack (from a dev perspective).
If you go down a more light-weight stack, a lot can still be achieved. More duct tape required though. That being said- I love duct tape!
Those are tools for configuration management. If you instead use Packer and Docker, you can build your VM/image at build time, and use Terraform to set up a VM with that image. Use etcd (with the image set to pull its config) or a similar key-value store for distributing configuration.
Not "set up a base VM with Terraform" and then "Ansible to install and configure it". Just build your VM/container image with the software you want already installed, plus etcd or another pull-based configuration mechanism pointed at a pre-set source. Done.
Now you don't need configuration management or dynamically changing infrastructure, since you moved it all to the build step.
I’ve never seen this combo of requirements on any service before, what is it? Or are my assumptions just wrong about what these things suggest about your business?
Ok, joking aside, this is intriguing, as there seem to be a few companies providing monitoring services for various things. How did you all find the market for this?
Obviously you don't have to spill the company sauce but I'm trying to imagine the benefits of having all of this data.
So hmmmm, a few things to think about for the tooling:
- How quickly can you notice a single-node service outage and react / restore service?
- How much downtime is OK? Is it ever OK to fail to process data for an entire day for one of the ~50 markets (can you just say "this report is incomplete" and reprocess it later)? How much wiggle room do you have (how long does it take 1 instance of the application to do its local processing job on 24 hours of data, and how long does the downstream data pipeline have)?
- Are you responsible for the part that watches TV + saves it to disk, and ALSO the part that batch-processes the saved video and produces the ad metadata / quality metadata? Are those different services deployed separately?
- Yikes, is there, like, a person on call in each of 50 places who can unplug the thing that's watching TV and plug it back in, if a wire burns out or if the machine that's watching TV fails? That sounds like it could be an operational nightmare, but maybe this is a solved problem / maybe you can buy watch-TV-and-save-it-to-disk as a service?
- How do you make sure that every node is running the same version of the software with the same version of all the configuration?
- How do you release new versions of the software? Do you (a) have a non-production environment which runs side-by-side at every node, processing the same data, and measure that the output is equally correct or better, the performance is the same or better, and there are no new errors or failures, then promote the non-production code to production automatically after a while? Or do you (b) deploy a "canary" to one of the 50 nodes and watch how it performs for a while, then deploy to the rest if they all behave okay? Or do you (c) just ship the latest code to all 50 nodes on Friday nights when no one's looking, check some health metrics over the weekend, and roll it back on Monday if a metric looks bad?
- How do you keep track of which software versions you released when? If someone asks "What changed on date X?", does your tooling let you tell them pretty quickly and pretty accurately?
- If you discover a bad bug (the parser is corrupting data; the new parsing job has a slow memory leak and fails catastrophically after 21 days of uptime) and you need to make a change in a hurry, do your runbooks let you make that kind of change quickly?
Doesn't seem like you really need k8s or ansible or chef or whatever for this, as much as you need to write down your operational requirements and decide what technology will meet them.
That said, maintaining a large cluster is complex too. Often, there is much more involved than making sure each image's config files are up to date. Kubernetes shines when you have a number of different docker images, where each needs its own unique network, scaling, cpu, and disk settings, i.e., hardware requirements that can't really be controlled with just config files.
In your case, the fact that all of your images are identical probably cuts against using Kubernetes. However, the fact that these images are running on different networks where each may require their own tweaking leans towards it. If you expect to start adding many different docker images to your cluster, with different hardware requirements, it's probably worth the learning curve to get on to Kubernetes.
Disclaimer: I've never run Kubernetes on bare metal, just hosted environments. Bare metal may make Kubernetes much less attractive.
* A unified control plane (monitoring, statusing)
* An automated rolling-upgrade system
* "self-healing" in the form of automatically restarting your containers if they go awry
However, for your use-case, it is likely more effort than it's worth:
* Setting up and maintaining Kubernetes on bare-metal servers is a huge amount of work
* A lot of Kubernetes features are not needed for your use-case
* The Kubernetes networking model essentially requires you to set up a private network spanning all of your sites in order to work, but provides almost no benefit for your use-case
Applications should just be a `Deployment` manifest with the type of distribution being a child setting (X replicas with affinity or 1-per-host). There's no reason to have competing ReplicaSets and DaemonSets just for this detail.
The win with k8s is that once you're tuned, a lot of "config crap" is abstracted away, basically.
But there's a reason Ansible is supported specifically through the "operators" SDK, something we use to manage seamless upgrades of a k8s-deployed app:
Edit: forgot that if metrics-based autoscaling of the app is a requirement, my experience is that k8s makes this a breeze. Probably not a req if using single-node hosts though...
I switched to Saltstack some time back and it's much better while having a similar sorta interface. The docs are worse than Ansible's imo tho. YMMV.
See ControlPersist/ControlPath OpenSSH client configuration.
It's not meant to have the connections persist forever, just long enough for repeated short ssh connections to have almost no overhead, and therefore be much faster.
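Concretely, the relevant client options go in `~/.ssh/config`:

```
Host *
    ControlMaster auto
    ControlPath ~/.ssh/cm-%r@%h:%p
    ControlPersist 60s
```

The first connection becomes the master; later sessions multiplex over its socket, and the master lingers for 60 seconds after the last session closes, which is exactly the short-lived reuse that makes repeated Ansible tasks fast.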
No, you don't need Kubernetes (though fwiw if you used it what you'd have would be 50 nodes with a single daemonset, 1 pod of the app per node, if that helps understand) - but I'd suggest not using Ansible.
Instead, use Packer (or similar) to create a machine image/snapshot/whatever your server provider calls it, and then deploy that same image on all 50.
You might have some small amount of host-specific things left to configure, and by all means use ansible for that if you want, but there's no need for error-prone running of the same large playbook 50 times.
Personally, I find that final setup is little enough to do it with Terraform (which is provisioning the servers anyway).
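(For comparison, the single-daemonset shape mentioned above is short; the name and image below are placeholders:)

```yaml
# One pod of the app on every node in the cluster.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: site-app
spec:
  selector:
    matchLabels:
      app: site-app
  template:
    metadata:
      labels:
        app: site-app
    spec:
      containers:
        - name: app
          image: registry.example.com/app:1.0.0
```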
Based on my own experience I believe ephemeral clusters can solve a huge number of problems dev teams face using Kubernetes.
Sugarkube currently supports launching clusters using Minikube, EKS and Kops, and we'll be adding provisioners for GKE and Azure in future. Sugarkube also works with existing clusters, so you can use it to deploy your applications. It's a sane way of managing how to ship infrastructure code (e.g. Terraform configs) with the applications that need it.
Sugarkube also supports parameterising applications differently per environment. You could almost view it as something like Ansible, but written with Kubernetes in mind. And it's in Go, so it's way faster than Ansible (an early POC was actually written in Ansible and it was very slow).
I've just finished intro tutorials for deploying Wordpress to Minikube and EKS. I'd be keen to hear feedback. We just tagged our first proper release earlier this week and it's ready to try now.
It's something that we've ended up implementing on our own with a lot of Terraform, but that's had its own obstacles and is something of a small maintenance burden. I'll be taking a look at sugarkube!
But the cool thing is that you could then go to the cloud whenever you wanted to. If you have some hard dependency on some cloud infrastructure, you could just use Sugarkube to spin up an isolated dev cluster on EKS for example. The EKS dev cluster could look the same as your prod cluster in terms of versions of software installed, but just use fewer, smaller EC2s, and again, perhaps just running a subset of the applications in your overall ops cluster.
Once you've developed in isolation, you could then deploy to a staging cluster which could again have been brought up - either just for this task, or more likely at the start of the day/week. Finally you could then promote your updated version of Jenkins into your prod cluster. For major upgrades to the prod cluster you could use Sugarkube to spin up a brand new sister cluster and then start to gradually migrate prod traffic over to it. Once all the traffic is going to your new cluster, just tear down the old one. If there's a problem, back out by sending all traffic back to the old cluster. Of course this last ability depends on something at the perimeter like Akamai (I think AWS have something similar?), and is a lot easier if your state is outside the cluster (e.g. in hosted databases, S3, etc.) but it'd be doable.
On projects I've worked on I've seen so many problems that basically came down to long-lived clusters where someone set them up a year ago and left/forgot how they worked. Or because performing upgrades was a nightmare they weren't done, etc. 100% automation of ephemeral clusters just solves all those problems.
Guides like OP's are great to get started, but I want to know that weird_flag_that_almost_nobody_ever_sets exists in random_service_parameter.
I've often been able to write complex and correct manifests, just by recursively following the links and filling in the YAML as appropriate.
For now, a few books (Kubernetes Patterns is the current one I am reading) and articles are helpful. I would like to note that some of these concepts have had their names changed over time, and I suppose there will be more changes as the platform stabilizes. Mind you guys, Kubernetes is only 5 years old...