That said, for many people taking some downtime on their Kube API server isn't the end of the world. The system, by design, can work OK for some time in a degraded state: workloads keep running.
A few things that I do want to try to clarify:
1) The strict documented upgrade path for etcd exists because testing matrices just get too complicated. There aren't really technical limitations so much as a desire to ensure recommendations are based on things that have been tested. The documentation is all here: https://github.com/coreos/etcd/tree/master/Documentation/upg...
2) Live etcd v2 API -> etcd v3 API migration for Kubernetes was never a priority for the etcd team at CoreOS because we never shipped a supported product that used Kube + etcd v2. Several community members volunteered to make it better but it never really came together. We feel bad about the mess but it is a consequence of not having that itch to scratch as they say.
3) Several contributors, notably Joe Betz of Google, have been working to keep older minor versions of etcd patched. For example 3.1.17 was released 30 days ago and the first release of 3.1 was 1.5 years ago. These longer lived branches intend to be bug fix only.
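To make point 1 concrete, here is a minimal sketch of what a "strict upgrade path" looks like in practice. It assumes the documented path means stepping through one minor release at a time, which is my reading of the upgrade guides rather than something spelled out in this thread. The function name `minor_upgrade_path` is made up for illustration.

```python
def minor_upgrade_path(current, target):
    """List the minor versions to step through, one at a time.

    Assumption: upgrades are only tested between adjacent minor
    releases, so 3.0 -> 3.3 means 3.1, then 3.2, then 3.3.
    """
    cur_major, cur_minor = (int(p) for p in current.split(".")[:2])
    tgt_major, tgt_minor = (int(p) for p in target.split(".")[:2])
    if cur_major != tgt_major:
        raise ValueError("this sketch only handles one major version")
    if tgt_minor < cur_minor:
        raise ValueError("downgrades are not covered by this sketch")
    # Every minor release after the current one, up to and including the target.
    return [f"{cur_major}.{m}" for m in range(cur_minor + 1, tgt_minor + 1)]


print(minor_upgrade_path("3.0", "3.3"))
```

Patch versions are ignored on purpose: going from 3.1.17 to another 3.1 release yields an empty list, because no minor-version hop is involved.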
I understand that the implementation of live upgrades for a distributed database will be complex but this post is about the user experience. Given enough resources, is there a reason that it can't be a single "upgrade now" command? Or maybe slightly more real-world, a 3 step process like: "stage update" -> "test update" -> "start update".
for each member in the cluster:
    replace the etcd binary
    restart the etcd process
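The loop above can be sketched a bit more carefully. This is only an illustration under stated assumptions: `replace_binary`, `restart` and `is_healthy` are hypothetical callbacks you would have to supply yourself (they are not etcd APIs), and the quorum check simply uses the usual majority rule for a Raft-style cluster.

```python
def quorum(n):
    """Majority of an n-member cluster."""
    return n // 2 + 1


def rolling_upgrade(members, replace_binary, restart, is_healthy):
    """Upgrade one member at a time; stop rather than lose quorum.

    replace_binary, restart and is_healthy are hypothetical callbacks
    supplied by the caller, each taking a member name.
    """
    upgraded = []
    for member in members:
        # The remaining members must still form a majority while this one is down.
        others_healthy = sum(1 for m in members if m != member and is_healthy(m))
        if others_healthy < quorum(len(members)):
            raise RuntimeError(f"refusing to restart {member}: quorum would be lost")
        replace_binary(member)
        restart(member)
        if not is_healthy(member):
            raise RuntimeError(f"{member} did not come back healthy; stopping")
        upgraded.append(member)
    return upgraded


# Toy run with in-memory stand-ins for the real operations.
log = []
done = rolling_upgrade(
    ["etcd-0", "etcd-1", "etcd-2"],
    replace_binary=lambda m: log.append(("replace", m)),
    restart=lambda m: log.append(("restart", m)),
    is_healthy=lambda m: True,
)
print(done)
```

The point of the health checks is the "test update" step asked for earlier: the loop refuses to continue as soon as one member misbehaves, instead of marching through the whole cluster.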
When going through this, I did test it and saw that the upgrade appeared to work until, I think, 3.3, where it would panic. But I didn't want to rely on undefined / untested behaviour, even if it seemed to work in a lab. The interview was long, so as we were cutting it down I think this aspect got lost.
And thanks for the hard work on etcd.
Not only is the app now different, but the upgrade itself is going to be dangerous. The idea that you can just "upgrade a running cluster" is a bit like saying you can "perform maintenance on a rolling car". It is physically possible. It is also a terrible idea.
You can do some maintenance while the car is running. Mainly things that are inside the car that don't affect its operation or safety. But if you want to make significant changes, you should probably stop the thing in order to make the change. If you're in the 1994 film Speed and you literally can't stop the vehicle, you do the next best thing: get another bus running alongside the first bus and move people over. Just, uh, be careful of flat tires. (https://www.youtube.com/watch?v=rxQI2vBCDHo)
Physical systems, like a car, have obvious limitations on what can be modified when. Similarly, software will have some limitations on what happens when you are updating. But accepting "upgrades can't be done easily" for software puts many more limitations on the software than makes sense.
I mean, if you use a plug-in style system, you can program it to block operations while a module is reloaded or something. But most software is not designed this way. Especially with ancient monolithic models like Go programs.
Upgrades just can't be done easily in a complex system. You can do them without concern for their consequences, but that doesn't mean they're safe or reliable methods.
If your app needs a container to run properly, it’s already a mess.
While what K8s has done for containers is freaking impressive, to me it does not make a lot of sense unless you run your own bare metal servers. Even then, the complexity it adds may not be worth it. Did I mention that the tech is not mature enough to just run on autopilot and now instead of worrying about the “devops” for your app/service you are playing catch-up with upgrading your K8s cluster?
If you’re in the cloud, VMs + autoscaling or fully managed services (eg S3, lambda, etc) make more sense and allow you to focus on your app. Yes there is lock-in. Yes, if not properly architected it can be a mess.
I wish we would live in a world where people pick simple over complex and think long term vs chasing the latest hotness.
Anyway go and try and build a typical application with Lambda/SAM. It is a nightmare of complexity and all you are doing is moving your logic to AWS where you pay 100x the cost of just running it yourself in a container.
And the idea that Kubernetes isn't mature is pretty laughable. It's used every day by Netflix, eBay, Apple, Microsoft, IBM, Lyft, Uber, Square, Google, Pinterest, Stripe, Airbnb, Yahoo, Salesforce etc. And with AWS you have EKS which allows you to run containers in a HA and managed way.
EKS is still... early. I'd rather manage myself with Kops or Terraform in its current stage. And in fact, that's what we do at my company.
I manage a few hundred worker nodes, they are heavily loaded at all times but happily chugging along. My pager is silent and has been for a while now. Last k8s issue we saw was actually our own fault, due to a misconfiguration.
I'm the kind of guy who enjoys Python 2.7 because it's deprecated, it's done changing :)
I try to remember the maxim, Happiness is a Boring stack.
Also if you look at a container long enough and squint you will see the words serverless emerge.
I also had the “pleasure” of working with K8S. If anything, this looks like a pretty good play from Google to get you to eventually run your workloads in GCP.
I've just rolled off a project (line of business web app + cluster of workers for background job processing), this more or less describes what we have running.
Some of the architecture is a bit wrong (it wasn't designed for AWS but shifted there after it had been running for a year or so) but the system works well enough to deliver value to the business.
As a developer I hate state, things that aren't properly isolated, ill-defined system boundaries -- but it's not obvious to me what the business case would be to containerise everything.
Containers allow you to move apps trivially between environments and guarantee that they will just work. It allows you to isolate dependencies between apps e.g. Python 2 versus Python 3. It allows you to move apps between cloud providers or between on premise and cloud. With platforms like Kubernetes it allows you to easily scale and self heal when nodes die.
And compared to rewriting your app in Lambda which is expensive and complex it is simple to build a container as almost every language has automated tooling.
Assuming your staff can keep your cluster un-screwed, am I right?
Technologies like vmware (which also give you os independence, not just linux flavors), if we're talking on premises, also allow you to move apps between environments. It's trivial to, gasp, push a machine out of vmware player to the cloud even. Pick your poison: vmware, virtualbox, azure, aws, gcp.
I'll run kubernetes as provided by GCP or AWS for my clients if it's warranted (ohhh, you wanted that in "webscale", got it. ;) ), but I really feel sorry for all of the on-prem "enterprise" shops that have taken the hype bait and are now paying the maintenance burden under all of the false pretenses that are flying around. "Horrors of Upgrading Etcd Beneath Kubernetes" with dozens of production application instances, with uptime SLAs and real customer business impact? Indeed. Fraught with peril, and without the right staff onboard, disastrous.
Folks be like, hey a team of consultants just finished building out our new kubernetes cluster, now we want to run our mission critical oracle/mssql servers on it. They said it should work "fine". Too bad they all got jobs at insert mega capacity company here right after we cut the invoice.
Y'all remember when everyone would line up to get the latest microsoft windows beta? Flashback. Yo yo yo, XML is all the rage! Not. Relational databases are dead! Um, no.
Maybe this needs some time to travel down and back up the hype cycle curve. Power to all of you beta testers! You're truly doing god's work :)
I love all of the new development, we're going great places, we just need to prescribe the right medicine for the patients so to speak. Cocaine was great in coca-cola before folks realized what it was doing to people. :)
edit: now it's a club! http://boringtechnology.club/
> Let's say every company gets about three innovation tokens. You can spend these however you want, but the supply is fixed for a long while.
> If you choose to write your website in NodeJS, you just spent one of your innovation tokens. If you choose to use MongoDB, you just spent one of your innovation tokens. If you choose to use service discovery tech that's existed for a year or less, you just spent one of your innovation tokens. If you choose to write your own database, oh god, you're in trouble.
> There is technology out there that is both boring and bad. You should not use any of that. But there are many choices of technology that are boring and good, or at least good enough. MySQL is boring. Postgres is boring. PHP is boring. Python is boring. Memcached is boring. Squid is boring. Cron is boring.
- First off we went from manually managed servers to chef managing servers. That was good progress, because it allowed us to scale a growing application on a cloud provider due to a large new contract.
- Then we added vault in order to simplify secret generation, management and rotation in chef. It's cool, because now we have a secure secret storage. We can give our devs specific access to the secrets of clusters they manage but not other clusters. We can script a lot of stuff around vault.
- Then we added terraform to manage VMs easier. We should have done that earlier, I suppose, but hindsight.
- And now our devs are having large issues with their docker-based test setups, so we can open up the consul cluster and deploy nomad for this use case. We'll probably migrate some other services into that nomad cluster so we can get them loadbalanced with little effort. We'll probably shuffle some annoying things in chef around and use consul-template there.
I like that approach, because it is problem-driven and converges to simplify existing problems. For example, we have an elastic stack, and we won't move the elasticsearch cluster or the influxdbs around it away from chef on bare metal. It's a solid and stable setup, why change it.
Not to mention that a lot of people don’t understand what a container is.
That's part of the appeal, the abstraction away of all those pesky, irrelevant details. Developers just want their app to run, as has been mentioned elsewhere in the thread. That desire is understandable.
So long as the abstraction isn't too leaky and nothing underneath breaks, there's no downside. It's all upside in terms of human productivity and time to market.
Even introduced inefficiency (if any) is unimportant if VC money is fueling the auto-scaling. There's a popular aphorism about premature optimization. Humans are also, in general, far more expensive than machines, especially at scale, and even a mature/traditional company would ignore this at their peril.
If problems do eventually crop up with containers or the toolsets around them, chances are, by then, their sheer popularity will ensure the availability of a cadre of experts who can troubleshoot. They may even understand what a container is, even if it's terribly unfashionable to admit, as is the case with operating systems today.
Every new technology/tool has some growing pains and is subject to what some consider misuse (due to ignorance). That doesn't necessarily mean it's best to reject it outright.
However, having acquired a strong affinity for startups, I also accept that risk (even complete, betting-the-farm risk) is totally OK, so long as that risk is taken in an informed manner. Startups, especially early ones, are a risky proposition from the get-go, and VC money tends to amplify that.
The market doesn't much punish a web startup that grew like crazy but lost some user-generated content by using some "NoSQL" database when it first came out. It does, however, punish the one that failed to grow by being too conservative.
That's obviously a false dichotomy, but I believe that's essentially the perception that's created, partly by characterizing new technologies as dangerous because they're new and "shiny".
Ultimately, I consider mine to be a service profession and an engineering one. As such, if I think a new tech is too undesirable, it's up to me to provide an alternative that actually addresses the original problems (without offloading the burden onto my users/customers).
Well, I'll bite. Do you? What is a container?
I played with stuff like this since the chroot jail days and knew about containers before they were cool (anyone remember lxc?).
It is unpopular for a reason.
Disclaimer: if you can run solely on cloud managed services + serverless, please do that and do not even look at the rest of this message. This is a very nice approach, although there are some things you need to set up before calling victory (deployment pipeline is one). And, as you mentioned, there is vendor lock-in.
Now, containers. Look, no-one WANTS containers. Or VMs. Or anything else. We just want to run our stuff. It just so happens that containers are one of the most useful abstractions there are. Unless someone else comes up with a new abstraction, containers it is.
Because a container is at the end of the day, a process. Are you against processes? Or against process isolation in general?
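If you want to see the "it's just a process" point for yourself, here is a small sketch. It assumes a Linux host, where the kernel exposes each process's namespaces as symlinks under /proc/<pid>/ns; on anything else it simply returns nothing. The helper name `namespaces_of` is made up.

```python
import os


def namespaces_of(pid="self"):
    """Map namespace name -> kernel identifier for one process (Linux only)."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        for name in sorted(os.listdir(ns_dir)):
            # Each entry is a symlink whose target names the namespace instance.
            result[name] = os.readlink(os.path.join(ns_dir, name))
    except OSError:
        # Not Linux, no such process, or /proc is unreadable: report nothing.
        return {}
    return result


for name, ident in namespaces_of().items():
    print(f"{name:20} {ident}")
```

Run it on the host and again inside a container: in my understanding the identifiers differ for whichever namespaces the runtime unshared, and that difference (plus cgroup limits) is most of what separates a "container" from any other process.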
You cannot lift an existing service and run serverless, you need to modify it. In many cases, it is not practical. In other cases, they need to be an actual server-like application and hold a connection. Lambda doesn't help there.
> While what K8s has done for containers is freaking impressive, to me it does not make a lot of sense unless you run your own bare metal servers.
One thing has nothing to do with the other. These are different levels of abstraction, there are challenges when running bare metal servers which are not present in cloud environments. Kubernetes can do so much more in a cloud environment (persistent volume claims, etc). Rolling out network attached storage on bare metal servers is a pain. It is also a pain with Openstack, but at least there is a standard interface there.
> If you’re in the cloud, VMs + autoscaling or fully managed services (eg S3, lambda, etc) make more sense and allow you to focus on your app.
Sorry, I respectfully disagree. I have spent the last two months implementing automation for deploying a cluster on AWS, with auto-scaling, auto-healing, the works, automatically deployed through Jenkins. It is NOT easy, it is not simple, and it is not focusing on my end application, unless you are ignoring all the technical debt you are incurring. And we DO have several k8s clusters, I will be moving that crap to k8s as soon as I can.
Let me make a quick list of what you need:
You can use a barebones (ubuntu|redhat|coreos|etc) VM. In which case, provisioning is not complete once the VM is up by the ASG, you need to install the app.
If you use an AMI, you now need to build automation to construct these AMIs. Note: if at this stage you are building AMIs by hand, this is a technical debt, which you will have to pay.
Alternatively, you can use something like cloud init. If so, see below:
* Create the auto-scaling group
* Create the launch configuration
* If your AMI is not entirely complete, add user data (or equivalent) here
* Set the health checks
And you are done! Right? No.
What about log rotation? Do you have centralized logging? No, tech debt. Go set it up.
What about monitoring? Are you using cloudwatch? Prometheus? Go set that up.
What about alerting? Not everything requires a VM to be destroyed, you need to set it up.
What about upgrades? Are these cattle servers? Then you have to modify your AMI and launch config. Go automate this (tech debt if not)
How are you controlling access? Do you have a team? Are they allowed to SSH? Where and how are you storing the keys? How do you invalidate if a key gets compromised?
I could go on, but let's keep at this level because the point is to draw a comparison.
With K8s, here's what you do:
Create a container image. Dockerfile, fancy Jenkins script, some other mechanism, I don't care. Create an image, put it in a registry somewhere.
Create a YAML file describing your 'deployment'. It can be a few lines of code if you don't care about most of the stuff.
If you need external access, you can create a service, which is another YAML
If you don't have an existing HTTPS load balancer, point to the k8s workers (trivial with something like ingress on GKE)
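For a sense of how small the 'deployment' description can be, here is a minimal sketch. It builds the manifest as a Python dict and prints it as JSON, which kubectl accepts as well as YAML. The app name, the image reference `registry.example.com/hello-web:1.0` and the port are placeholders I made up; only the field layout follows the `apps/v1` Deployment schema.

```python
import json


def deployment_manifest(name, image, replicas=2, port=8080):
    """Build a bare-bones apps/v1 Deployment as a plain dict."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template labels below.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }


# Placeholder name and image, for illustration only.
manifest = deployment_manifest("hello-web", "registry.example.com/hello-web:1.0")
print(json.dumps(manifest, indent=2))
```

Saved as a script, the output can be piped straight into `kubectl apply -f -`. Everything left out here (resource limits, probes, and so on) falls back to defaults, which is the "if you don't care about most of the stuff" case.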
And you are done. This automatically gets you:
• Self healing
• Scheduling among worker nodes. You can control it or let K8s decide
• Logging (centralized logging requires a one-time step, with fluentd or similar, may be handled by cloud providers)
• Similarly, monitoring and alerting require a one-time investment in deploying something like Prometheus. After that is done, getting Prometheus to scrape your pods is very easy, easier than deploying on a VM-by-VM basis
• Upgrades: deployments handle that for you. Even replica sets before it, you just needed to apply a new YAML with an updated version
• There are no SSH keys to mess around. K8s has certificate-based user access control, with an optional RBAC
• The SSH equivalent is kubectl exec
• Service discovery: you have DNS records for all local services created for you. The cluster will direct you to the correct node.
• Scaling is trivial, but most importantly, quick. kubectl scale deployment --replicas=X. It only takes whatever time is required to download the image and run it. You don't have to spin up a whole operating system
• Optional: you can have horizontal pod auto-scaling, so your services can scale up and down automatically.
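On that last bullet: as I understand the documented behaviour, the horizontal pod autoscaler picks a replica count by scaling the current count by the ratio of observed metric to target metric and rounding up. A rough sketch of just that arithmetic follows; the real controller also applies a tolerance band and other safeguards that are left out here, and the function and parameter names are mine.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """ceil(current_replicas * current_metric / target_metric), clamped.

    Simplified: the tolerance band and stabilization behaviour of the
    real autoscaler are deliberately omitted.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# 4 pods averaging 90% CPU against a 60% target.
print(desired_replicas(4, 90, 60))
```

With 4 pods at 90% against a 60% target this asks for 6 pods; at 30% it would shrink to 2, and the min/max bounds stop it running away in either direction.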
It is not perfect, but it can be a game changer. I cannot imagine how we would be running our operation without K8s. Actually I can: version 1.0 of the app was a bunch of VMs, one for each service. It was nightmarish. Now the push, company-wide, is to move everything to K8s. All VMs, all data stores, all of it. And it has absolutely nothing to do with hype, it has everything to do with proven advantages, compared to most of other alternatives.
I guess you could also do Mesos. They have a similar concept, only it's not K8s.
Note that SOMETHING needs to run the K8s cluster itself. That something is precisely your auto-scaling groups and VM images. It is less painful with a container-optimized OS (like CoreOS or whatever Google uses)
I'm getting k8s's goal to become a cross-cloud orchestration framework, and I'm as much of a standardization fan as could be. I just doubt the overreaching goals for k8s are worth it in the majority of cases, and have seen better business value in Mesos (though Marathon isn't where it could be) because you can realistically run it on your own premises.
I guess k8s is Google's vision for a self-service cloud platform that offloads everything to configuration details on a uniform matrix of nodes, and in particular such that Google doesn't need to provide customer support. I just don't see the benefit for the customer, considering we've been running POSIX workloads for almost 50 years now.
> I have spent the last two months implementing automation for deploying a cluster on AWS, with auto-scaling, auto-healing, the works, automatically deployed through Jenkins. It is NOT easy, it is not simple, and it is not focusing on my end application, unless you are ignoring all the technical debt you are incurring.
Yes, I can appreciate that having a system automatically handle even some of this necessary plumbing in a reasonable and standardised way is attractive.
K8s is not going to solve the issues you outline above (logs, proper monitoring, etc). Even worse, you’re gonna have a bad time migrating them to a proper solution.
> If you’re in the cloud, VMs + autoscaling or fully managed services (eg S3, lambda, etc) make more sense
So running with container isolation is a mess, but serverless isolation is admirable?
That is, I default to good design and testing rather than boilerplate orchestration and external control planes.
All containers have done (popularly) in my opinion is add complexity and insecurity to the OS environment and encouraged bad behavior in terms of software development and systems administration.
From my experience consul seems to have a better clustering story but I'd be curious why etcd won out over other technologies as the k8s datastore of choice.
That'd be some interesting history. That choice had a big impact in making etcd relevant, I think. As far as I know, etcd was chosen before kubernetes ever went public, pre-2014? So it must have been really bleeding edge at the time. I don't think consul was even out then - it might have been they were just too late to the game. The only other reasonable option was probably ZooKeeper.
etcd didn't have an embedded DNS server, etc. Of course, these things can be built on top of etcd easily. Upstream has taken advantage of this by swapping the DNS server used in Kubernetes twice, IIRC.
Contrast this with Consul which contains a DNS server and is now moving into service mesh territory. This isn't a fault of Consul at all, just a desire to be a full solution vs a building block.
The design of etcd 3.x was heavily influenced by the Kube usecase, but the original value of etcd was that
A) you could actually do a reasonably cheap HA story (vs Singleton DBs)
B) the clustering fundamentals were sound (zookeeper at the time was not able to do dynamic reconfiguration, although in practice this hasn’t been a big issue)
C) consul came with a lot of baggage that we wanted to do differently - not to knock consul, it just overlapped with alternate design decisions (like a large local agent instead of a set of lightweight agents)
D) etcd was the simplest possible option that also supported efficient watch
While I wasn’t part of the pre open sourcing discussions, I agreed with the initial rationale and I don’t regret the choice.
The etcd2 - 3 migration was more painful than it could have been, but most of the challenges I think were exacerbated by us not pulling the bandaid off early and forcing a 2-3 migration for all users right after 1.6.
Both are still much better to operate than ZooKeeper.
The alternate method, and the method we used before, is to use an existing cluster, as you mention. If cattle self-healing is that important, perhaps you could afford a small cluster only for bootstrapping? Load will be very low unless you are bootstrapping a node somewhere. There are costs involved in keeping those instances 24/7, but they may be acceptable in your environment (and the instances can be small). Then the only thing you need is to store the discovery token and inject it with cloud init or some other mechanism.
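A rough sketch of the "inject it with cloud init" part, assuming you hand the instance a plain shell script as user data. The discovery URL, the binary path and the helper name are all placeholders I made up; `--name` and `--discovery` are the only etcd flags shown, and the other flags a real member needs (peer and client URLs, data directory) are deliberately omitted.

```python
import shlex


def render_user_data(discovery_url, etcd_bin="/usr/local/bin/etcd"):
    """Render a user-data shell script that starts etcd in discovery mode.

    discovery_url and etcd_bin are illustrative placeholders; the other
    flags a real etcd member needs are left out of this sketch.
    """
    lines = [
        "#!/bin/sh",
        "# Generated at provisioning time; the discovery URL carries the token.",
        f'exec {shlex.quote(etcd_bin)} --name "$(hostname)" '
        f"--discovery {shlex.quote(discovery_url)}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical URL pointing at the small bootstrap cluster.
print(render_user_data("https://bootstrap.example.internal:2379/v2/keys/discovery/TOKEN"))
```

The only secret-ish input is the token embedded in the URL, so that is the one value the launch configuration (or whatever mechanism you use) has to carry.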
That said, I just finished a task to automate our ELK clusters. For Elasticsearch I can just point to a load balancer which contains the masters and be done with it. I wish I could do the same for ETCD.
1) Images / files / etc. It all lives in cloud storage ("s3"), outside of K8s
2) RDBMS data. You can just run as hosted sql (say CloudSQL) or a not-in-k8s VM. I have found no compelling reason to move my RDBMS into my k8s cluster.
Category 1: stateless-"ish" workloads. More than 90% of hosts/containers used . . . less than 25% of operations headaches and time. Issues that happen here are solvable with narrow solutions: add caches, scale out, do very targeted, transparent fixes to poorly-performing application code.
Category 2: stateful workloads. Less than 10% of hosts/containers. 75% or more of operations headaches and time. Issues that happen here have less visibility, fewer short-term fixes ("just add an index and turn off the bad queries" only works so many times before you're out of low-hanging fruit), and require more expertise to solve in a way that doesn't require the application/clients to change.
If k8s and other next-gen technologies are only easing the first category, that makes me sad. It's like we have this sedan (off-the-shelf web technologies) that we have to take off-roading and it falls apart all the time. I don't want a better air conditioning system and more cushions in my seats; I want the vehicle to not break.
> Kubernetes is not aware of the deployment details of Postgres. A naive deployment could lead to complete data loss.
That sounds ominous, but is actually a tautology.
You have the exact same challenges anywhere else, but since K8s makes some operations so easy to do, you need to be careful. RDBMSs are especially tricky because most of them expect a single "master" which holds special status. And it so happens to hold all your data too (as do your replicas, provided they are up to date).
Surprise! IT is hard and requires experts. Either hire one, or become one, but either way, the idea that "nobody in my enterprise/company/team needs to understand the details of how stuff works" is crazy. Imagine a car-mechanic shop where nobody knows how an engine works: "we plugged in the computer tool, and it said something is wrong with your engine. I guess you need a new one"
Not everyone complaining about the deficiencies of current-gen tools is doing so out of ignorance or laziness about their stack.
I use k8s for bin packing and easy rolling deployments. Neither of which matters for my DB (i.e. no such thing as an easy rolling postgres deployment (maybe Citus?)), and I am not going to put anything else on my database server... so no bins to pack.
If you are doing one or the other, then sure worry about it from a k8s sense. But don't throw stuff in k8s for no reason. It makes stuff like rolling upgrades of your cluster way more scary.
One of the cooler innovations we've seen, and I think we're going to see more of, is the ability to take a non-k8s VM and expose it to the cluster as if it were just another pod. This would let you schedule and expose RDBMSs and other specialized servers through kubernetes while keeping them on a standard VM.
I think that's the win/win approach to bridge the gap.
But I think that doesn't go far enough: I want to provide K8s with a YAML file and have Kubernetes itself go and create and provision a VM for me. I.e., I want to "kill" Terraform. I don't care about a 'fabric' network (although that could be convenient), just give me an IP that I can reach even if it is external to the cluster. VMs could be just one more resource that can be created or destroyed by k8s, just like network-based storage today.
I haven't found this exact use-case implemented yet. If no one else builds it in the coming months I'll probably start doing it.
The base pattern should be pretty easy to modify, although the use case here is very specific.
Also agree on keeping state outside. Maybe the relevant tech will be mature sometime soon but we’ve seen orchestration bugs do nasty things to stateless containers that would have been a nightmare if state had been involved.
A K8s cluster can survive just about anything. Worker nodes destroyed, meh, scheduler will take care of bringing stuff up. Master nodes destroyed. Meh. It doesn't care.
ETCD issues though? Prepare for a whole lot of pain. They are very uncommon though. Upgrading is the most frequent operation.
There is no longer an etcd anywhere in Cloud Foundry.
Cloud Foundry's current orchestrator, Diego, is of similar vintage to Kubernetes. It now relies entirely on relational databases for tracking cluster state. Ditto other subsystems (eg, Loggregator). It scales just fine. MySQL, while not my personal favourite, has proved more reliable in practice than etcd. Some folks use PostgreSQL. Also more reliable in practice.
Paying customers care more about being more reliable in practice than being more reliable in theory.
I've ha-ha-only-seriously suggested we throw engineering support behind non-etcd cluster state. For example: https://github.com/rancher/k8s-sql
For the purposes of this thread, etcd is the underlying Kubernetes storage mechanism. For many practical purposes, that's all one needs to know, unless you are in charge of maintaining the ETCD cluster.