A story about a Kubernetes migration (medium.com/unbabel)
92 points by marcelo_lebre on Oct 2, 2018 | 59 comments



It sounds like they really wanted to switch to K8s and rationalized it. The cons of their existing solution are minor and easily addressed with correct use of Ansible, and the massive complexity of K8s is understated.

As an example, they suggest that there's a heavy cognitive load associated with having devs run some Ansible playbooks, and then argue that to avoid that, they just have to introduce an entirely new toolchain via workshops and tutorials. Right.


Regardless of your skepticism, the benefits are real.

Scaling applications in k8s, updating, and keeping configs consistent are a great deal easier for me than using Ansible or any other config management tool.
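
To make "easier" concrete, scaling and rolling out an update are each one-liners (a minimal sketch; the deployment name "api" and the image are placeholders):

    # scale a deployment to 5 replicas (hypothetical deployment named "api")
    kubectl scale deployment api --replicas=5

    # roll out a new image and watch the rollout converge
    kubectl set image deployment/api api=registry.example.com/api:v2
    kubectl rollout status deployment/api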

In the end, it's a single platform you can build tooling against, which lets an organization abstract away the infrastructure. My team has done exactly that (on top of k8s). As a result, a developer can spin up a new environment with the click of a button, deploy whatever code they like, scale the environment, etc., with little to no training. Those capabilities were a tremendous accelerator for my organization.

Sure, you can build something similar with Ansible on AWS, but then you're married to AWS, and you have to worry about instance sizing and the cost of idle instances. In my experience, it's just a great deal more overhead.


With ECS running on Fargate, idle instances don't exist. Throw in service autoscaling, and you have a simple scaling solution with no K8s cluster management required.


Or you use EKS, and there's no k8s cluster management required either.

I'm running a production service on EKS, also tried it on GKE. Both take away most of the cluster management pain.


Agreed.


Given the adoption of Kubernetes, would you honestly recommend that someone seriously consider ECS?


Yeah, I would recommend it in certain circumstances. EKS has a roughly $150/month base cost for the control plane, so for small environments it's too expensive. For teams with existing Docker and Docker Compose experience but no Kubernetes experience, it's fairly easy to get things working on ECS. And if you don't have a whole ops team with time to build out all the tooling that makes k8s easy for devs, then again k8s probably isn't the right answer.

Kubernetes is great, but you're not being honest with yourself if you can't acknowledge the difficulty in going from 0 to production-ready. There is a ton of complexity and lots of grief on the path to a fully functional k8s environment.


That last part, I wholly agree with. The learning curve is steep and difficult.


In addition to Zed's comment: ECS is remarkably mature and for many use cases has feature parity with K8s. Compared to EKS, it's also far more mature and better supported right now.

Migrating between the two isn’t even all that difficult if you change your mind later.


You still need Ansible to set up the hosts and the databases.


Not in my environment.

Creating a cluster is a one-liner, no Ansible required. The node pool comes with it, which takes care of setting up the hosts. Databases are all created in-cluster with a Helm chart.
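
For reference, on GKE that one-liner plus an in-cluster database look roughly like this (a sketch with made-up cluster/release names, assuming the Helm 2-era stable repo):

    # create a cluster; the default node pool sets up the hosts
    gcloud container clusters create demo-cluster --num-nodes=3

    # install a database in-cluster from a chart (hypothetical release name)
    helm install --name demo-db stable/postgresql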

Many people will advise against running stateful workloads in k8s. It's tricky, sure, but the benefits are still there.

As a result of the orchestration my team has built on top of k8s, any developer in my organization can clone any production environment at any time, with production data. Once created, those cloned environments can be configured to receive streaming updates from the production environment.

Developers can test bug fixes and features on live streaming production data with absolute certainty that they won't break anything. This capability is immensely valuable.


I am setting up a Swarm deployment of one of my apps as an experiment, and I must say the learning curve is hardly there. I tried Kubernetes, but I found that most resources that try to explain how it works focus too much on GitHub-scale deployments. I just want 2 instances of my app, a database, and traefik with Let's Encrypt. Does anyone know of a proper resource for the 'just a tad more than dokku' size?
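
For scale, the whole thing I'm after is roughly one swarm stack file (a sketch with placeholder image names; the traefik/Let's Encrypt labels are omitted):

    version: "3.3"
    services:
      app:
        image: registry.example.com/myapp:latest   # placeholder image
        deploy:
          replicas: 2       # the two app instances
      db:
        image: postgres:10
        volumes:
          - dbdata:/var/lib/postgresql/data
    volumes:
      dbdata:

    # deployed with: docker stack deploy -c docker-compose.yml mystack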


Setting up a Kubernetes cluster itself is probably the biggest hurdle. Also, bear in mind that if it's just for a single service, the resource overhead of Kubernetes may be significant, possibly even more than 50%.

I'd strongly recommend using a hosted k8s - either GKE, EKS, or I believe DigitalOcean has just released one.

If you want to use an existing VPS just to test it out, see the docs here https://kubernetes.io/docs/setup/independent/create-cluster-...

Once you have the cluster running, kompose [1] might be a nice tool if you're used to docker-compose. However, I'd just use it as a guideline - you'll probably want to rewrite most of what it generates at one point or another.

[1] https://github.com/kubernetes/kompose
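
For what it's worth, the basic kompose workflow is just this (assuming a docker-compose.yml in the current directory):

    # generate k8s manifests from an existing compose file
    kompose convert -f docker-compose.yml
    # ...then kubectl apply the generated *.yaml files and edit from there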


If you take the managed Kubernetes approach, I really recommend GKE: everything is taken care of for you.

EKS, on the other hand, offers what I'd call a "managed Kubernetes master"; everything else is still pretty manual.


Triple-backing this recommendation.

My team has deployed to AWS using KOPS and EKS, to Azure with AKS and ACS-engine, and to GCP with GKE.

GKE is far, far easier to manage, update, etc.


Same here. I think if you have fewer than 100 servers, k8s is real overkill.

Swarm is much easier to reason about and run. It's a godsend for startups without dedicated devops.


> dedicated devops.

I have dreamed a dream, but now that dream is gone from me.


Swarm is working out really nicely so far.


You can try Nomad. It's almost stupidly easy to set up, and as long as you're comfortable on a command line, it's easy to hit the ground running.

The one downside is that it doesn't have a feature-complete UI, and there aren't any good third-party ones that do what the Kubernetes web UI does.
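
To give a sense of "hit the ground running", the getting-started flow is roughly this (a sketch; the job name is whatever the generated skeleton uses):

    # write a skeleton job file (example.nomad) and run it
    nomad init
    nomad run example.nomad

    # check on the job
    nomad status example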


Maybe you can find a similar configuration as a Helm chart (https://helm.sh/), i.e. a k8s package manager of sorts.


Yes, I saw Helm, and then I had to learn about yet another thing. It's all nice enough technology, don't get me wrong, but I think it isn't for 'running your own Heroku' for your side projects. Maybe I'm wrong though.


If what you want to run is already a helm chart, then that makes for a good starting point.


Have you taken a look at Rancher [0]? It's easy to spin up a couple of nodes and manage them in a single place, and it takes care of most of the things I've found tedious, with no fuss.

[0] https://rancher.com/


Thank you, but yes, I have already tried Rancher. I had it running with Let's Encrypt, etc., but it was pretty difficult to juggle all the different concepts: k8s has different concepts than Rancher. What I also didn't like was that when I had set up some things with the UI, I couldn't figure out how to create YAML for them.


I hope that the original title of the story was intended sarcasm: "Unbabel migrated to Kubernetes and you won’t believe what happened next!"

But so they managed to consolidate their infrastructure around Kubernetes and Google Cloud, which made the management of their servers easier and faster? I wonder how much actual money they saved, but I guess it will pay off for them in the long run.

I've been dabbling with Kubernetes for some time now, but my God, it can be a bit complicated. The time required to become well-versed in Kubernetes is a hefty investment, and not one every organization can make. There are lots of small things that can drive up your blood pressure while you figure them out. Were it simpler I would be much more inclined to use it, but for now it's only in the "learning for funsies" category. I feel the people who developed k8s are more of the theoretical sort and not the regular-joe-dummy kind like me.


Saying that Kubernetes is a bit complicated seems like saying that water can be a bit wet.

Even their documentation can't keep up. And with a release cycle of three months and a deprecation cycle of six months, you need a team dedicated to keeping up with the K8s state of the art; much of the knowledge you picked up a year ago is at best stale and at worst wrong.

Sure, it makes standing up a set of containers and keeping them running simple. But that's never really been that hard.

To paraphrase an article from a few weeks ago:

"We made microservices to address the problems with monoliths."

"We made containers to address the problems with microservices."

"We made Kubernetes to address the problems with containers."


"Now we have a distributed monolith that requires 2x less developers to build and 5x more system engineers to deploy"


Well, to be fair, all of those developers probably found themselves filling a systems engineer role "because the product developers are best equipped to handle the running and support of their own applications".


deploying: `kubectl apply -f file_with_changed_docker_image.yaml`
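
For completeness, the kind of file being applied is roughly this (a sketch; names and image are placeholders):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: myapp
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: myapp
      template:
        metadata:
          labels:
            app: myapp
        spec:
          containers:
          - name: myapp
            image: registry.example.com/myapp:v2   # the changed image tag
            ports:
            - containerPort: 8080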


My org made a similar transition. It's hard to articulate exactly how much we saved, as our "production" (revenue-generating) environments are roughly the same.

We transitioned out of AWS, where we had relatively well-managed instances of our stack handled with Chef and Terraform, to GKE on Google Cloud, where we moved to a Helm chart and custom orchestration tooling on top of that.

Prior to the migration, I'd say that 80% of our instances were idle. Currently, all of our 16-core k8s nodes are running with an average load of 5-7. We try to keep enough headroom to prevent any waiting or queuing, which is an entire Medium blog post unto itself.

So, roughly the same or a little more "production" workload, but the number of non-prod instances of our stack quadrupled. Anyone in the org can spin up an instance of the stack at any time for a custom sales demo, to debug an issue, to test a feature, or anything else. There was a great deal of pent-up demand that nobody expected, which caused my team to thrash a bit to catch up after we made the transition.

All in all, our GCP spend is about 20% less than our AWS spend was. We're getting a lot more utility for a little less money.


Idle instances are what autoscaling groups were made for. Frankly, if there are a lot of idle instances, AWS features are not being taken advantage of - that's not on them.


Only if you've got a homogeneous workload that can scale out.

More typical is Team A, working on Project A, spinning up a test DB + test app server, while Team B spins up a test app server + web server + DB for Project B.

Then Project A gets productionized and you have SIT/UAT/Stage copies of all of that sitting mostly idle.

Then Project C comes along and the devs need to test on a 3-node C* cluster with 3 Kafka brokers...

Suddenly you have a whole bunch of dev environments sitting mostly idle. No one would ever fork out for a reserved t2.medium, but they all add up to $$$$$ every month. With k8s you can reclaim all that idle capacity and reserve some beefy instances instead, while also gaining easily deployable artefacts, CI/CD, production scaling, etc.


But those use cases don't change between AWS and GCE. They only change if you start using containers in a container environment - something possible in both AWS and GCE.


The distinction is between traditional config management on instances and Kubernetes. I could've seen similar benefits running k8s in AWS; I simply chose GKE because it's superior to the other offerings.

There are other things to like about GCP vs. AWS, but that's a bit tangential.


Seconded. If the primary driver of the transition was 'getting a lot more utility for a little less money', then simply adopting Auto Scaling Groups within the existing AWS setup could have improved instance utilization with much less migration effort.


For me Kubernetes is also a breeze. There is some learning curve because we adopted Helm, Tiller, Grafana, and Prometheus right from the start. But the kubectl command is easy to work with, and the k8s YAML files are really a breath of fresh air compared to Ansible playbooks.

We're not in production yet, but we're moving soon.


> k8s YAML files are really a breath of fresh air compared to Ansible playbooks

Hah. I'm a huge kubernetes fan but not sure I can agree here.

k8s yaml files are the most verbose and spammy things imaginable.

Granted, Ansible playbooks can be horrific, but I'd say that's more down to the authors of the playbook than to Ansible itself.


Ansible is a glorified templating language for composing, distributing and executing shell scripts.

K8s is designed around a desired state of the world with control loops.

The two are very different conceptually, and lead you in different directions organizationally.

Ansible encourages you to code the derivative and hopefully approach the integral, whereas K8s encourages you to code the integral and infer the derivative in your controller, if that makes sense.
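
A toy illustration of the difference (hypothetical hosts and image): the Ansible play below runs once and exits, so any drift on the host afterwards goes uncorrected, whereas a Deployment's `replicas: 3` is continuously reconciled by a controller.

    # playbook.yml - executed at a point in time; nothing watches the host afterwards
    - hosts: appservers          # hypothetical inventory group
      tasks:
        - name: ensure myapp container is running
          docker_container:
            name: myapp
            image: registry.example.com/myapp:v2   # placeholder image
            state: started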


This is definitely a fair point.

Kubernetes resource definitions are verbose, but you can expect them to always be about that verbose and nothing else.

Ansible playbooks, by contrast, really depend on the author; they can be works of art or abominations.


I imagine as Kubernetes becomes more popular there will be a lot more of these abominations around... A similar thing has happened in popular programming languages: as they are more widely adopted, the early adopters who were focused on quality and correctness become a smaller share, and new devs who do 'all the wrong things' become much more prevalent.

It's more of an issue with your organization's (or, in some cases, personal) process if you allow abysmal code to get checked into your codebase :) Even Ansible has easy-to-integrate linting and testing tools.


Uuuh, seriously?

I've always preferred Ansible's YAML to Kubernetes YAML.

I'm using both daily and can work with either, though.


Can't speak for the OP, but I dislike the direction Ansible seems to be heading, incrementally (and perhaps accidentally): YAML as a Turing-complete programming language.


> yaml as a Turing complete programming language.

If someone is authoring Ansible playbooks this way, this is definitely not a best practice. Code should go into modules, plugins, filters, etc. Playbooks should be YAML, with extremely minimal use of any coding constructs.


I have not worked with Kubernetes yet, but I do have experience with Ansible, and I was under the impression that Kubernetes works at a higher abstraction level than Ansible.

Do Kubernetes files really concern themselves with little details such as how a database or application is configured?

I assumed that Kubernetes is more about having 'images' of pre-installed machines (e.g. built via Ansible) and having Kubernetes just 'clone' them into production and interconnect them.


You are correct, Kubernetes does operate at a higher level of abstraction. By the time you're deploying to Kubernetes, you'll already have images that can be used to run your applications.

However, those images typically will be unconfigured aside from sane defaults. The final configuration (connecting an application to a database, etc) is indeed handled through Kubernetes.
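
As a small sketch of what that final wiring can look like (names and the connection string are placeholders), the database connection typically arrives as a Secret or ConfigMap referenced from the pod spec:

    apiVersion: v1
    kind: Secret
    metadata:
      name: myapp-db
    stringData:
      DATABASE_URL: postgres://myapp:password@db:5432/myapp   # placeholder
    ---
    # ...and in the Deployment's container spec:
    #   envFrom:
    #   - secretRef:
    #       name: myapp-db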


I'm having the same experience, more or less.

The one pain point for me is still the development workflow, which is lacking compared with tools like docker-compose.

However, skaffold seems to be quickly closing that gap and I'm pretty excited about it.


It seems that k8s has won the deployment race by and large. I see a lot of success stories around (I'm hearing nice things from the DevOps teams in my organization as well). Yet I'm curious to hear a few cases where things did not pan out quite right.

Note: The 5-15s DNS problem seems a pretty serious one. Weird that it didn't get more publicity (and a proper fix).


There are a lot of things that can go wrong with K8s, but there is always a way to fix them. For example, a common mistake is to forget to allocate limits on pods, which then brings the worker node to capacity. I think the failure scenario is soft; it's just going to cost more engineering time: figuring out how to upgrade the cluster to the new version, finding out why the network overlay isn't performing as expected, debugging the external resource that isn't being allocated properly, configuring RBAC properly, playing with various resource deployment strategies, tuning how pods are moved during a node auto-scaling event... The nice thing is that in the end it gives you a unified API for all of these things, and it forces some consistency in the infrastructure.
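
For the limits mistake specifically, the fix is a few lines per container in the pod spec (the values below are placeholders, not recommendations):

    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 500m
        memory: 256Mi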

My personal rule of thumb: unless the client specifically needs auto-scaling, has more than 100 services to run, or has a five-person DevOps team, just use Terraform.

For a small number of servers, a better strategy is to have a base image with Docker and monitoring, and use Terraform to deploy the infrastructure. CI can then use docker-compose to deploy the containers onto the hosts directly. This approach is much more stable and doesn't require learning as many things as K8s. It can be run by a one-person DevOps team without breaking a sweat.
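
A sketch of what that CI step can look like (host name and file names are made up; assumes the Docker daemon on the host is reachable over TLS):

    # point the Docker client at the target host, then bring the stack up
    export DOCKER_HOST=tcp://app1.example.com:2376 DOCKER_TLS_VERIFY=1
    docker-compose -f docker-compose.prod.yml pull
    docker-compose -f docker-compose.prod.yml up -d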


Hard to fix something broken by design.

Using TCP as the main message bus and then piling on layers upon layers of NAT needs to be revisited; routing is the solution.


I've been working with Kubernetes recently and the learning curve is quite steep. I hope the team will improve kubectl to make it more user-friendly (error messages are hard for beginners to understand).

A lot of cloud providers now have a way to easily deploy and manage a k8s cluster on their servers, but I cannot find a tool that helps with the deployment of a basic service - something like dokku, but on Kubernetes.

http://dokku.viewdocs.io/dokku/


Have you looked at helm?


If you find Kubernetes a headache at first, consider looking at OpenShift. It's Red Hat's wrapper for Kubernetes, and does make some things easier.


I'm not sure if we've got enough wrappers yet. I think we want several more!


"The amount of instances multiplying was also clogging up our DevOps team."

Let's pause a moment and appreciate that if you have a DevOps team, you're not doing DevOps.


Since they mention it a couple of times in the article, how do other folks handle auth for their k8s dashboards? I'm trying to figure out the best approach to that right now.


You could build an authorization proxy that creates a token with the Kube API server and sets the Authorization header. This probably already exists, but a project I worked on, https://github.com/boxboat/okta-nginx, might be a good starting point.


Me too. I haven't tried any of this but here's a suggestion:

https://akomljen.com/protect-kubernetes-external-endpoints-w...


A story about a migration to Kubernetes.

(As it is, I expected it to be about migrating from one k8s version to another. (Still much better than the original title, though...))


Facepalm. This entire blog post reads like they didn't figure out how to deploy to more than one server with Ansible, which is exactly what Ansible is made for.



