Why We Chose Kubernetes

jakejake · on Feb 26, 2016

I can't seem to get into the game of fully automated deployments to production. It definitely interests me but a few things always hold me back.

The first issue is that I've probably set up 10 or 15 app development and deployment "systems" if you will. I've found that it's very beneficial to automate the simple stuff but it quickly reaches a point of diminishing returns. A super-custom system always works great for a while until some big change or library upgrade or refactor or whatever comes down the pipe. Then we spend a ton of time resetting up everything. We have to keep the build system components up to date so it doesn't turn into an ancient mystery box. Sometimes an upgrade breaks the whole thing and then we're on stack exchange all day debugging a parser library or some other thing that we don't care about. Basically spending hours and days and weeks on the build system so we can have that sweet one-click (or fully automated) deploy.

The other thing is that we release frequently but we tend to double check everything before it goes to production. Our staging server is auto-deployed except DB changes which we do manually. Right now it's about 2-3 clicks for us to deploy to production and it works fine. We still do DB changes manually though. It takes a minute or two to deploy. I feel like the process encourages that final check that everything is cool.

I guess I'm nervous to set up something that deploys to production simply by adding a tag to a slack message or the git commit message. Should I get over myself? If I change my thinking is it possible that deployment to prod could be a non-event?

wcummings · on Feb 26, 2016

>I guess I'm nervous to set up something that deploys to production simply by adding a tag to a slack message or the git commit message. Should I get over myself?

Sometimes adding some friction to the deploy process can be good IMHO. Continual deployment isn't good for every product or every team.

dekz · on Feb 26, 2016

It can still get tested and smoke tested before deploying directly to production. If you cant trust your automated tests and smoke tests then you're setting yourself up to fail.

officialchicken · on Feb 26, 2016

Very true, but what's the ROI for full automation?

If a typical 2-3 click deploy generously takes an hour, and they do 40 per year... then it would take 1 year to break even presuming that a fully-automated system could be built and deployed in 1 man-week. If the deploy takes 10 mins, can it be built in less than 1 day?

Ignoring the development time for a fully automated system, I think the real question is, "how does a rollback and unscheduled downtime impact the ROI due to unforeseen problems?" because it will happen, eventually.

brianwawok · on Feb 26, 2016

Well how many mistakes do they make with those 4 clicks? That's part of the point of automation - remove chances for humans to flub.

reddit_clone · on Feb 26, 2016

Why downvote this? This is a valid point.

Automation is not just for saving time. It is also preventing the system from human errors.

People make errors due to fatigue, inattention etc. performing even simple tasks.

makecheck · on Feb 26, 2016

I think it is OK to add tools to the process as long as the most critical parts are the least mysterious.

For instance, in a networked file system, the on/off switch for “move from version A to version B” ought to be about as simple as swapping a directory symbolic link. That way, everyone can see exactly what it is pointing to now, what it used to point to (as old version targets are probably in the same parent directory), and anyone can figure out how to roll it back instantly.

Given trivial on/off switches, the details of the rest of the system can start to gain complexity.

Also, it helps a lot to have something like a “beta flow” that is essentially a parallel replica of your production environment.

jasonjei · on Feb 26, 2016

Amazon ECS/Elastic Beanstalk was painful to use. Their Docker Image Registry requires you to have rotating keys. You try to setup Docker registry authentication with their container service, only to find out they ignore the authentication settings for Amazon hosted registries. Deployment errors give unhelpful messages (and after much googling you realize it's due to some IAM policy that you were supposed to add with that bit of info hidden in some marketing page FAQ, meanwhile losing 3 hours of sleep). Redeployments take a long time.

Google Cloud, on the other hand, was truly easier to use. Redeployments didn't take forever, and it wouldn't try to fail over and over for minutes before returning an error like AWS.

boulos · on Feb 26, 2016

Glad to hear GCR (Google Container Registry - https://cloud.google.com/container-registry/) is working great for you! It's main purpose today is certainly for Kubernetes users on GKE, but we also have people using it directly and for App Engine Managed VMs. Having a fast, secure, and cheap (just storage and networking) place to push and pull containers is really helpful.

Disclaimer: I work on Compute Engine, but didn't work on GCR.

wstrange · on Feb 26, 2016

I'd love to see GCR expanded to support more Docker Hub like functionality (a nice GUI, being able to search for public images, etc.).

And for public images, it would be nice if Google would foot the bandwidth tab :-).

Galaxeblaffer · on Feb 26, 2016

we've been using gke(managed kubernetes on Google cloud aka container engine) in production for a year now, and I've been super happy ! I've only had one hiccup where the loadbalancer for some unknown reason stopped routing traffic to my cluster.. I must say, Google cloud has become really really good and there are new features and improvements every month.. It's obvious that Google i channeling many resources into its cloud services.. I have full logging from all my pods, I have rolling updates, I have https load balancing, they just released a cdn, there's automated health checks, super nice cli tool that can control every service on the platform(gcloud) makes it really easy to script it all, advanced monitoring the list just goes on.. The only third party ops we use is Opbeat.. And no, I don't work for Google, I just really love their cloud service and i think it deserves more attention instead of everyone just defaulting to Amazon. I'm the only one i know who uses it.

wstrange · on Feb 26, 2016

I agree - Google needs to promote GCE more. Maybe they should buy some adwords :-)

Having used both AWS and GCE, I find GCE is just a better experience.

I love the cloud console and the cloud shell. The CLI tools work well. VMs start quickly.

For development, preemptible VMs are an incredible bargain.

ptrincr · on Feb 26, 2016

I may have misunderstood but it appears kubernetes as a service was chosen (via GCE), but this is compared to alternatives which you have to install by yourself.

This is slightly unfair as the setup and configuration of kubernetes on your own kit is fairly difficult, at least it was the last time I looked, especially the networking side of things.

SEJeff · on Feb 26, 2016

Distributed systems aren't easy, highly available properly built ones even less so.

daurnimator · on Feb 26, 2016

Looks like an un-finished draft article? I'm only seeing headers for each section. Screenshot: https://i.imgur.com/ZjBchfo.png

(chromium, linux)

sfilipov · on Feb 26, 2016

If you are using Arch Linux: pacman -S otf-fira-sans

wyc · on Feb 26, 2016

Thanks! Helped me solve this in Gentoo: emerge -av fira-sans

JoshTriplett · on Feb 26, 2016

Same problem here in Firefox 44. If I disable the CSS rule that sets the font to Fira Sans, the text renders normally; that font seems to not render at all. The "Fonts" view in the console doesn't render the sample text ("Abc") either.

mintplant · on Feb 26, 2016

What OS? Firefox Developer Edition doesn't have this issue on Windows (haven't checked Linux).

smoyer · on Feb 27, 2016

I'm running Firefox Developer Edition 46.0a2 on Linux and I don't see this problem either.

AtomicOrbital · on Feb 29, 2016

works fine on ubuntu 14.04 using Firefox Nightly 47.0a1 (2016-02-28)

alpb · on Feb 26, 2016

I can read just fine. The website uses Fira Sans font, maybe there's an issue with that, check your browser's console.

daurnimator · on Feb 26, 2016

no errors in console

RickHull · on Feb 26, 2016

Same problem on chromium / linux

jacques_chester · on Feb 26, 2016

Since we're calling out OSS alternatives that the author missed, I'll point out Cloud Foundry. It solved the "rolling upgrade" problem years ago.

Currently you can run it on AWS, OpenStack or vSphere; Azure and GCE support are being worked in concert with Microsoft and Google respectively.

Disclaimer: I work for Pivotal, the company which donates the largest chunk of engineering effort to Cloud Foundry.

dinfarfar · on Feb 26, 2016

We use open source CF to host > 1000 applications in my company. I have virtually zero complains from an operational point of view and the developers love it.

I wish more people would spend the time to look into it instead of just defaulting to Docker + tooling, just because.

daddykotex · on Feb 26, 2016

Thanks for the reminder, I completely left out Cloud Foundry when I did my own search for this kind of tool.

I'll definitely have a look at it.

jacques_chester · on Feb 28, 2016

If you'd like any help, email me (see profile).

sbenario · on Feb 26, 2016

Actually, Azure is fully supported! Microsoft wrote the CPI adapter themselves.

FD: I work on Cloud Foundry

ec109685 · on Feb 26, 2016

Why describe your contributions as a donation? There are paid Cloud Foundry hosting options, so it is in your company's best interest to make the system awesome.

helloiamaperson · on Feb 26, 2016

There's a foundation that's a separate entity that owns the IP: https://www.cloudfoundry.org/membership/members/

jacques_chester · on Feb 28, 2016

As helloiamaperson said, the IP is owned by the Cloud Foundry Foundation. Pivotal has an interest in making Cloud Foundry awesome because we sell products and services based on it.

But IBM also has such an interest, so too HP and Fujitsu and NTT and Anynines and Intel and other members of the Foundation I've rudely neglected.

In a given team you can often find engineers from multiple companies. Or you'll find that different companies will provide a team. For example, the `cf` CLI was previously a Pivotal team, then a mixed Pivotal-IBM team, then an IBM team, now I believe Fujitsu are assigning engineers.

Voting rights in the Foundation are proportional to contributions, so it's in the interest of each participant to be generous in providing people, resources and IP. A defined high-level process, modelled after Pivotal's, is used to ensure a common approach to working on the platform -- pair programming, TDD, product management from a Tracker backlog and so forth. Every engineer goes through the same "dojo" onboarding.

I don't think anyone has ever tried anything quite like this before. At the corporate level it's press releases at twenty paces, but in the trenches it's a lot more seamless.

seiji · on Feb 26, 2016

It solved the "rolling upgrade" problem years ago.

Did it?

For simplicity, http://deis.io should be preferred for modern "deploy-by-buildpack" architectures.

jacques_chester · on Feb 28, 2016

> Did it?

Strictly, BOSH solved it. But BOSH was originally written for Cloud Foundry, so through the fuzzy lens of distant history it looks the same.

In Pivotal we upgrade our public-facing Cloud Foundry installation, Pivotal Web Services, within a day or two of a new CF release being blessed. Typically nobody ever notices. Hundreds to thousands of VMs (I don't know how many we are running now) are upgraded to the most recent version of Cloud Foundry in a few hours and nobody notices.

Yeah. It's solved.

reitanqild · on Feb 26, 2016

Anyone has any good resources (blog series, books, videos or courses preferred in roughly that order) to get started with Kubernetes on CoreOS?

Recently installed a three vm cluster in Vagrant (actually super simple, some things actually have improved a LOT in 2016) but I still need to understand a lot it seems to get rolling upgrades etc.

davidopp_ · on Feb 26, 2016

https://coreos.com/kubernetes/docs/latest/getting-started.ht...

Also http://kubernetes.io/v1.1/docs/getting-started-guides/coreos...

jmcqk6 · on Feb 26, 2016

Kelsey Hightower has some really great walkthroughs. I was just browsing his github to find a specific link for you, and the closest I found was this:

https://github.com/kelseyhightower/coreos-kubernetes-talk

That might not be the right repo, though. Somewhere he has complete walkthroughs, including command by command, but I'm not sure if that's the right one.

SEJeff · on Feb 26, 2016

Excellent post, this is really well written.

Regarding the bug in Mesos found by Aphyr, that has been fixed: https://issues.apache.org/jira/browse/MESOS-3280

The internet should thank people like him for finding these issues.

webo · on Feb 26, 2016

I see the benefit of ECS/Kubernetes for managing many services.

Say I already use ECS or Kubernetes for orchestration of some of my services. If I'm working on a new Rails/Node/Python app that doesn't talk to the rest of my services, would it make sense to stick the app into my existing cluster? If not, what would be an easy way to launch, deploy, and manage (non-PaaS) these kind of stand-alone services?

cpitman · on Feb 26, 2016

I would deploy all of your applications to the same Kubernetes instance. If you need to worry about isolating specific applications on different hosts, that can be accomplished by labeling those hosts and setting up affinity rules. In other words, hosts can be split into different zones and pods deployed for specific zones, but still have one larger Kubernetes instance.

boulos · on Feb 26, 2016

It's not a problem / wrong to put unrelated things in a single cluster (and just because they're not talking today, also doesn't mean they won't in the future). But if you really want hard separation between them, you could easily have separate clusters. It's a lot like a directory with files, if you put everything in one directory things like the output of ls have some noise in them, but if you're used to searching (service discovery) then it doesn't matter too much.

Disclaimer: I work on Compute Engine (and I've always just used a single cluster)

eknkc · on Feb 26, 2016

You can create one node kubernetes clusters on micro instances, if you need it just for the tooling or environment consistency.

Not sure if it's better to use larger clusters for unrelated stuff or this though.

idiocratic · on Feb 26, 2016

a relatively newcomer in the space is Nomad by Hashicorp, very promising and with simpler architecture. Of course it makes even more sense if used in conjunction with Consul and Atlas from the same company. It's still young and buggy though.

bacheson · on Feb 26, 2016

Rancher is the perfect blend between tutum and kubernetes.

I highly recommend you check it out before going down the long winding road that is kubernetes.

lukebennett · on Feb 26, 2016

Agreed, we've been trying out Rancher since its early days and we're currently in their beta program, we've really enjoyed using it - hits that sweet spot between power and simplicity. Biggest frustration is the lack of a CLI which is preventing us from being able to proceed to production as we don't want to manually handcrank API calls.

You'll actually be able to use Rancher to manage Kubernetes clusters[0] in the next few weeks, if you want the best of both worlds.

[0] http://rancher.com/introducing-kubernetes-environments-in-ra...

stuff4ben · on Feb 26, 2016

This is awesome, never heard of Rancher before! I like the simplicity, but I'm curious why you'd want to stand up Kubernetes on top of Rancher though? What is the benefits to doing so versus running Rancher by itself?

lukebennett · on Feb 26, 2016

I'm equally curious to be honest :) It's not something we do ourselves, I just mentioned it given the context of this thread.

stuff4ben · on Feb 26, 2016

In watching the video, it looks like it's a replacement for "cattle" which I guess is how nodes are scaled out. Interestingly they've made deployment of K8s extremely simple. I'm looking forward to playing with Rancher in some test environments.

crb · on Feb 26, 2016

The Kubernetes team would love to help you make the long road less winding. My contact details are in my profile!

andreis_ · on Feb 26, 2016

I was going to recommend the same. If you have a small number of servers and services, Rancher is the perfect solution.

zshev · on Feb 26, 2016

One more vouch for Rancher here. Very excited about the impending GA.

anentropic · on Feb 26, 2016

anyone got an opinion on http://armada.sh/

(vs Rancher, Tutum, Deis...)

?

kordless · on Feb 26, 2016

> It was the most robust solution we tried (we only tried it on Google Container Engine)

So they didn't actually "choose" Kubernetes. They just chose to use something that is run by someone else.

wstrange · on Feb 26, 2016

How is that different than AWS ECS? In both cases, someone else runs it.

kordless · on Feb 26, 2016

I'm not sure what your question means, but I'm simply pointing out that this didn't actually decide anything useful in terms of rolling your own solution and running it yourself. Choosing GCE over ECS is not the same as choosing Docker over Kubernetes if you are self hosting.