
Kubernetes at GitHub - darwhy
https://githubengineering.com/kubernetes-at-github/
======
iagooar
Love to see more Kubernetes success stories.

I work for an ISP and we are trying to write another success story ;) As an
ISP, we have tons of constraints in terms of infrastructure. We're not allowed
to use any public cloud services. At the same time, the in-house
infrastructure is either too limited, or managed via spreadsheets by a bunch
of dysfunctional teams.

For my team, Kubernetes has been truly a life saver when it comes to deploying
applications. We're still working on making our cluster production-ready, but
we're getting there very fast. Some people are already queuing up to get to
deploy their applications on Kubernetes :D

What I especially love about Kubernetes is how solid the different concepts
are and how they make you think differently about (distributed) systems.

It sure takes a lot of time to truly grasp it, and even more so to be
confident managing and deploying it as Ops / SRE. But once you get it, it
starts to feel like second nature.

Plus the benefits, in almost any possible way, are huge.

~~~
lima
Red Hat's OpenShift makes it a lot easier by providing all of the
infrastructure around it (docker registry, docker build from Git, Ansible
integration and so on).

The best docs of any open source project I've seen.

~~~
EtienneK
I second this. Have been PoCing OpenShift for a couple of months now and it's
been a joy to use.

~~~
Valien
3rd for sure. We're a RH partner and specialize in OpenShift work. Tons of
excitement with customers on using OpenShift. I love it.

------
shock
> During this migration, we encountered an issue that persists to this day:
> during times of high load and/or high rates of container churn, some of our
> Kubernetes nodes will kernel panic and reboot.

Considering that Kubernetes doesn't modify the kernel, this issue sounds like
it is present in mainline, and kernel devs should be involved.

~~~
cyphar
I would be interested to know what storage driver they're using for their
nodes. High container churn puts a lot of stress on the VFS subsystem of
Linux, and we've seen cases where customers have triggered lots of
mounts/umounts, which results in filesystems causing panics. At SUSE, we do have some kernel
devs debugging the issues, but the workaround is almost always "rate limit all
the things". There are a few other kernel areas that are stressed with high
container churn (like networking), but VFS is the most likely candidate from
my experience.

While on paper containers are very lightweight, spawning a lot of them
exercises kernel codepaths that probably haven't been subjected to that kind of
stress during development.

~~~
AaronBBrown
Hey, this is Aaron from GitHub. We're using devicemapper w/ LVM backed pools.
Would love to hear about your experience there. We definitely see this problem
during periods of high container churn.

~~~
rleigh
My schroot tool used for building Debian packages could panic a kernel in
under five minutes reliably, when it was rapidly creating and destroying LVM
snapshots in parallel (24 parallel jobs, with lifetimes ranging from seconds
to hours, median a minute or so).

This was due to udev races in part (it likes to open and poke around with LVs
in response to a trigger on creation, which races with deletion if it's very
quick). I've seen undeletable LVs and snapshots, oopses and full lockups of
the kernel with no panic. This stuff appears not to have been stress tested.

I switched to Btrfs snapshots which were more reliable but the rapid snapshot
churn would unbalance it to read only state in just 18 hours or so. Overlays
worked but with caveats. We ended up going back to unpacking tarballs for
reliability. Currently writing ZFS snapshot support; should have done it years
ago instead of bothering with Btrfs.

~~~
sweettea
In my work identity, we saw a similar problem in our testing, where blkid
would cause undesired IO on fresh devices. Eventually, we disabled blkid
scanning of our device-mapper devices on state changes with a file
/etc/udev/rules.d/59-no-scanning-our-devices.rules containing:
ENV{DM_NAME}=="_ourdevice_", OPTIONS:="nowatch"

Alternatively, you could call 'udevadm settle' after device creation before
doing anything else, which will let blkid get its desired IO done, I think.
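
A minimal sketch of both workarounds, assuming stock udev/udevadm tooling (the device name is a placeholder):

    # /etc/udev/rules.d/59-no-scanning-our-devices.rules
    # tell udev not to watch (and re-probe) our device-mapper devices
    ENV{DM_NAME}=="_ourdevice_", OPTIONS:="nowatch"

    # ...or, after creating a device, wait for udev/blkid to finish before touching it
    udevadm settle --timeout=30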

~~~
rleigh
Yes, we did something similar to disable the triggers. Unfortunately, while
this resolved some issues such as being unable to delete LVs which were
erroneously in use, it didn't resolve the oopses and kernel freezes which were
presumably locking problems or similar inside the kernel.

------
erulabs
If you're running Kube on AWS, make sure you install the proper drivers! For
Ubuntu, that's the `linux-aws` apt package.

[https://github.com/kubernetes/kops/issues/1558](https://github.com/kubernetes/kops/issues/1558)

Missing ENA and ixgbevf can be a real performance killer!

~~~
joombaga
Is this used by vanilla docker / ECS, or just k8s?

~~~
TheDong
That advice holds regardless of what software you're using, so long as the
software does network traffic.

It holds for just running plain old nginx websites.

It doesn't really matter for small instance sizes, where your networking is
already rate-limited by Amazon so much that ENA drivers won't matter, but on
beefy instances it's always good advice to make sure you're using
ENA-supported drivers.

See
[https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced...](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking-ena.html#test-enhanced-networking-ena)
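
If you want to verify what an instance is actually using, a quick check (assuming a Linux instance with ethtool installed; the interface name may differ) is something like:

    # confirm the ENA module is present and bound to the primary interface
    modinfo ena | head -n1
    ethtool -i eth0 | grep '^driver'   # should report "driver: ena" on ENA-enabled instances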

------
skewart
> Several qualities of Kubernetes stood out from the other platforms we
> evaluated: the vibrant open source community supporting the project, the
> first run experience (which allowed us to deploy a small cluster and an
> application in the first few hours of our initial experiment), and a wealth
> of information available about the experience that motivated its design.

It's interesting that the reasons they cite for choosing Kubernetes over
alternatives are entirely driven by 'developer experience' and not at all
technical. It shows how critical community development, good documentation,
and marketing are to building a successful open source project.

~~~
gu4x
I believe the developer experience of being introduced to a tool is paramount
to its success. It gives a lot of confidence in what you're doing and keeps
things moving forward. To me it suggests the application is built on a solid,
simple concept instead of a convoluted, complex architecture. Some tools sin
in the opposite direction, though: very simple to set up but very complicated
to understand how to scale.

------
philips
Really exciting stuff, happy to see the GitHub team launch this.

Kubernetes is becoming the go-to for folks needing both their own physical
metal presence and a cloud footprint too. And the magic of Kubernetes is that it
has APIs that can actually give teams the confidence to run and reuse
deployment strategies in all environments. Even across clouds.

If you are like GitHub and want to use Kubernetes across clouds (AWS, Azure,
etc.) & bare metal, and to deploy/customize that infra using Terraform, check out
CoreOS Tectonic[1]. It also tackles more of the subtle things that aren't
covered in this article, like cluster management, LDAP/SAML authentication,
user authorization, etc.

[1] [https://coreos.com/tectonic](https://coreos.com/tectonic)

~~~
robotmay
I'm still utterly perplexed as to what Tectonic actually -is-. I kinda get
that it's a kubernetes setup, but is it a GUI over the top of it? The website
is pretty confusing and I think I gave up really quickly when trying to set it
up.

~~~
philips
Tectonic is Enterprise Kubernetes. We start with pure upstream Kubernetes at
the core and install it in a production ready setup with the Tectonic
Installer[1] across clouds or bare metal. On top of those basics Tectonic
provides things most organizations need:

- Authentication backed by LDAP/SAML/etc

- One-click automated updates of the entire cluster

- Pre-configured cluster monitoring/alerting

There is a bunch more in there and in the roadmap too but that gives you a
taste.

The other thing is that we provide professional services, training, and
support to customers on the whole stack from the VM or machine on up to the
Kubernetes API. We have done neat collaborations with customers like the ALB
Ingress Controller[2] too.

[1] [https://github.com/coreos/tectonic-installer](https://github.com/coreos/tectonic-installer)

[2] [https://github.com/coreos/alb-ingress-controller](https://github.com/coreos/alb-ingress-controller)

~~~
tachion
I'm currently deploying Tectonic-flavoured Kubernetes at a large organisation
and I can vouch for how great you guys are at supporting users (who are not
yet customers) at any stage of the process - love that, and can't recommend
you guys for that more. However, as the comment above says, the Tectonic (and
Quay, for that matter) documentation is... just horrible, and if not for the
engineers' support, I'd be pretty much stuck on quite a few things. Why don't
you push your docs to a public repo, so I could do some writing and send some
PRs? ;)

~~~
robszumski
Happy to report that the Tectonic docs are open source and we would love to
review your PRs:
[https://github.com/coreos/tectonic-installer/tree/master/Doc...](https://github.com/coreos/tectonic-installer/tree/master/Documentation)

Any topics that stick out as needing the most attention? Glad you're enjoying
your interaction with our engineers :)

(Product manager for Tectonic)

~~~
zeeZ
The bare-metal scaling documentation needs to be scrapped and rewritten from
scratch. With very limited knowledge of Terraform, it's faster to start over
with nothing than to try to get Terraform (apparently you need the
installer-bundled one) to work with your assets.zip (which is not even
mentioned in the installation documentation).

------
DDub
We're currently looking at moving our applications to k8s, and I was wondering
what deployment tools people are using? This week we are evaluating Spinnaker,
Helm, and bash wrappers for kubectl. There is concern over adding too many
layers of abstraction, and that KISS is the best approach.

~~~
foxylion
We also did some evaluation and then decided to stick to KISS and chose kubectl
commands combined with cat and kexpand. A really simple approach that allows
dynamic Kubernetes deployments.

An example command would be

    
    
       cat service.yml | kexpand expand -v image-tag=git-135afed4 | kubectl apply -f -
    

The service.yml contains the full deployment configuration, service definition,
and ingress rules. So this works without preconfiguring anything in Kubernetes
when deploying a new service.

An engineer only has to create the service.yml, and Jenkins deploys it
automatically on every master build.

*kexpand is a small tool which does something similar to sed, but in a simpler and less powerful way (keep it simple): [https://github.com/kopeio/kexpand](https://github.com/kopeio/kexpand)

~~~
nikon
I do something similar, but envsubst does the job.

~~~
ff_
+1 on envsubst, it's the minimal solution to the problem of templating
Kubernetes manifests (of course YMMV, we are a small team and don't need more
complex stuff)
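
For anyone curious, a minimal sketch of the envsubst variant (the variable and manifest contents here are made up for illustration):

    # service.yml contains e.g.:  image: registry.example.com/myapp:${IMAGE_TAG}
    export IMAGE_TAG=git-135afed4
    envsubst < service.yml | kubectl apply -f -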

------
sandGorgon
> _We enhanced GLB, our internal load balancing service, to support Kubernetes
> NodePort Services._

Everyone does this - because Kubernetes' Achilles heel is its ingress. It is
still built, philosophically, as a post-load-balancing system.

This is the single biggest reason why using Docker Swarm is so pleasant.

~~~
kelseyhightower
Any load balancer can be configured or modified to target routable Pod IP
addresses and skip node ports altogether. You'll have to integrate with the
Kubernetes Endpoints API[1] and support dynamic backends. Another option would
be to leverage Kubernetes' DNS and the SRV records[2] backing each service.

The reason node ports are used in the Cloud today is because most Cloud load
balancing solutions only target VMs, not arbitrary endpoints such as
containers, a limitation that will go away over time.

[1] Envoy with Kubernetes Endpoints integration:
[https://github.com/kelseyhightower/kubernetes-envoy-sds](https://github.com/kelseyhightower/kubernetes-envoy-sds)

[2] [https://kubernetes.io/docs/concepts/services-networking/dns-...](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/)
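
To make option [2] concrete, SRV lookups against the cluster DNS return the endpoints backing a (headless) Service. Something like the following, assuming the default cluster.local domain (the service, port, and namespace names are placeholders):

    # resolve the SRV records for port "http" of service "myservice" in namespace "default"
    dig +short SRV _http._tcp.myservice.default.svc.cluster.local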

~~~
sandGorgon
Hi Kelsey, thanks for replying. You are definitely k8s' secret weapon ;)

I know about this. However, ultimately the issue is that Kubernetes is not a
gradual scale-up solution for most people. I have to be prepared to deal with
building my own load balancer.

Basically, I cannot do an on-metal deployment very easily. Most of the
questions on the k8s Slack for metal deployments were: how do I set this up,
with a few tweaks like SSL pass-through and source IP preservation?

It is not easy.

Either you build your own load balancer or you use a cloud-provided one. Now,
Ingresses are not pleasant. I'm not sure about the state of source IP
preservation, but last I remember the nginx ingress had still not surfaced
ssl_preread_server_name to the ingress configuration.

Now, what would have been nice is if it were ingress-all-the-way-down: ingress
with something like Istio/Linkerd. Maybe it is possible.

Tl;dr - I'm not GitHub. I can't build my own load balancer. Give me something
that works out of the box. Yes, I know it may go down - I'll survive. Docker
Swarm does this.

~~~
shaklee3
If you use a statefulset, there is no load balancing, regardless of bare metal
or cloud. Every pod has a DNS record you can use to address all other pods,
and it's carried over in case of a pod failure. Are you looking for a load
balancer or not? If you are, as the other person mentioned, you can use the
nginx one.
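
To make that concrete: with a headless Service (clusterIP: None) governing a StatefulSet, each pod gets a stable DNS name of the form pod-name.service-name.namespace.svc.cluster.local. A small sketch, assuming the default cluster.local domain and placeholder names:

    # a StatefulSet "web" governed by headless service "web" in namespace "default"
    # gives pods web-0, web-1, ... which are each directly addressable:
    dig +short web-0.web.default.svc.cluster.local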

------
dookahku
Any favorite trainings for learning Kubernetes?

I found this one so far:
[https://classroom.udacity.com/courses/ud615](https://classroom.udacity.com/courses/ud615)

But any extra courses/trainings are always appreciated

~~~
pdelgallego
Udemy: [https://www.udemy.com/learn-devops-the-complete-kubernetes-c...](https://www.udemy.com/learn-devops-the-complete-kubernetes-course/)

Pluralsight: [https://www.pluralsight.com/courses/getting-started-kubernet...](https://www.pluralsight.com/courses/getting-started-kubernetes)

~~~
timrichard
I've got that Pluralsight one earmarked for 'soon'. I thought Nigel Poulton's
Docker courses were excellent, so I'm looking forward to it...

------
drdaeman
How hard (and how realistic) is it to actually get a reasonable understanding
of (and then stay up to date with) Kubernetes internals? Is there any go-to
reading material?

We ran another large-footprint container management system (not K8s, but
also popular), and when its DNS component started to eat all the CPU on all
nodes, the best I could do quickly was scrap the whole thing and replace it
with some quick-and-dirty Compose files and manual networking. At least we
were back to normal in an hour or so. Obvious steps (recreating nodes) failed,
logs looked perfectly normal, a quick strace/ltrace gave no insights, and
debugging the problem in detail would've taken more time.

But that was only possible because all we ran was a small 2.5-node system, not
even proper full HA or anything. And it resembled Compose closely enough.

Since then I'm really wary about using larger black boxes for critical parts.
Just the Linux kernel and Docker can bring enough headaches, and K8s on top of this
looks terrifying. Simplicity has value. GitHub can afford to deal with a lot
of complexity, but a tiny startup probably can't.

Or am I just unnecessarily scaring myself?

~~~
AlexB138
I wouldn't say that you're unnecessarily scaring yourself at all. Kubernetes
is extremely complex. I've been running it for a few months and I'm just
starting to get my hands around it. Things will just stop working for what
seems like no reason, and there are so many places to investigate you can
easily burn most of a day troubleshooting.

It's a great system, but it's also relatively new, and most issues aren't well
documented. You'll spend a lot of time in github issues or asking for help in
the (very active, and often very helpful) community.

If you have a valid use case, I wouldn't steer you away from it, but your
fears are well founded.

------
dmart
> Enhancements to our internal deployment application to support deploying
> Kubernetes resources from a repository into a Kubernetes namespace, as well
> as the creation of Kubernetes secrets from our internal secret store.

Would love to hear more about how this was accomplished. I'm currently exploring a
similar issue (pulling per-namespace Vault secrets into a cluster). From what
I've found, it looks like more robust secrets management is scheduled for the
next few k8s releases, but in the meantime I've been thinking about a custom
solution that would poll Vault and update secrets in k8s when necessary.
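
A rough sketch of that kind of poll-and-sync step, using only the stock vault and kubectl CLIs (the secret path, namespace, and key names are made up):

    # read a value from Vault and mirror it into a Kubernetes Secret in the target namespace
    VALUE=$(vault read -field=password secret/myapp/db)
    kubectl --namespace=myapp create secret generic db-credentials \
      --from-literal=password="$VALUE" \
      --dry-run -o yaml | kubectl apply -f -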

------
Yeroc
One thing I would have liked to see addressed in the article is whether the
new architecture requires additional hardware to operate (presumably it does)
and, if so, how much more.

~~~
ben_jones
I've only dabbled in K8s, and it strikes me that using it in production is a
long-term investment and, as it stands currently, a long-term project to
implement properly. You'll want to do exactly what GitHub did: set up a "review
lab" or similarly comprehensive dev and test environment until you are
absolutely comfortable with it in production. This will lead to the
provisioning (and cost) of quite a bit of hardware - and when it is finally in
production it'll likely be over-provisioned for quite some time, until norms
can be established and excess cut.

So basically it's a traditional devops migration. But you get quite a few
goodies, and arguably much better practices, at the end of it.

~~~
majewsky
I agree very much, and I'd like to add one point: When you build a lab
environment for testing Kubernetes deployments (and verifying Kubernetes
upgrades), make sure it's on the same hardware as your production environment.

When my team did the first Kubernetes deployment, we made the mistake of
building a lab environment that did not match the anticipated production
environment. (Two reasons: The BOM for the production environment was not yet
decided upon at that time, and the lab was frankensteined together by taking
hardware out of existing labs.) We learned the hard way that, just because the
Kubernetes upgrade worked in the lab, it need not work on the production
hardware.

Right now, we're stuck on last year's (i.e., ancient) Kubernetes 1.4 release
because no one dares to upgrade production. (There's light at the end of the
tunnel, though. A new lab is being built up in the datacenter around now.)

------
lobster_johnson
I'd be interested in hearing what kind of autoscaling system they use for
their Ruby pods.

We're running a few (legacy — we're moving to Go) Ruby apps in production on
Kubernetes. We're using Puma, which is very similar to Unicorn, and it's
unclear what the optimal strategy here is. I've not benchmarked this in any
systematic way.

For example, in theory you could make a single deployment run a single Unicorn
worker, then set resources:requests:cpu and resources:limits:cpu both to 1.0,
and then add a horizontal pod autoscaler that's set to scale the deployment up
on, say, 80% CPU.

But that gives you terrible request rates, and it will be choking long before
it reaches 80% CPU. So it's better to give it, say, 4 workers. At the same
time, it's counter-productive to allocate it 4 CPUs, because Ruby will
generally not be able to utilize them fully. And more workers mean a lot more
memory usage, obviously.

I did some quick benchmarking, and found I could give them 4 workers but still
constrain to 1 CPU, and that would still give me a decent qps.
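
For reference, the naive setup described above would look roughly like this (the resource values, target percentage, and names are illustrative only, not what GitHub uses):

    # in the deployment's container spec, pin each pod to one CPU:
    #   resources:
    #     requests: { cpu: "1" }
    #     limits:   { cpu: "1" }
    # then scale on CPU utilization with a horizontal pod autoscaler:
    kubectl autoscale deployment myapp --cpu-percent=80 --min=2 --max=20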

~~~
sytse
At GitLab we recommend using CPU cores + 1 as the number of Unicorn workers:
[https://docs.gitlab.com/ce/install/requirements.html#unicorn...](https://docs.gitlab.com/ce/install/requirements.html#unicorn-workers)

~~~
lobster_johnson
How do you configure that? A pod doesn't know what machine it's running on
ahead of time. You can create nodepools and use node selectors to pin the pod
to that nodepool, but I'm not sure I love the idea.

~~~
twk3
Our entrypoint configures the unicorn workers before starting, using a chef
omnibus call. So we grab the memory and cpus using Ohai:
[https://gitlab.com/gitlab-org/omnibus-gitlab/blob/master/fil...](https://gitlab.com/gitlab-org/omnibus-gitlab/blob/master/files/gitlab-cookbooks/gitlab/attributes/default.rb#L303)
and do cpus+1

^This is pretty much what we do by default for all our regular package installs,
but in some of our Kubernetes Helm charts we instead statically configure the
pod resources and unicorn workers (defaulting to 1 cpu / 2 workers per
front-end pod), e.g.:
[https://gitlab.com/charts/charts.gitlab.io/blob/master/chart...](https://gitlab.com/charts/charts.gitlab.io/blob/master/charts/gitlab/values.yaml#L198)

As someone mentioned in this thread, using the downward API might be a cool
way to configure the workers.
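
A sketch of that downward-API idea: expose the pod's CPU limit to the container so the entrypoint can compute workers from it (the env var and values are placeholders; resourceFieldRef is the relevant downward API field):

    # container spec fragment (YAML):
    #   env:
    #   - name: CPU_LIMIT
    #     valueFrom:
    #       resourceFieldRef:
    #         resource: limits.cpu
    #         divisor: "1"
    # the entrypoint can then do something like:
    WORKERS=$(( CPU_LIMIT + 1 ))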

------
daxfohl
Curious, being a RoR app, did github ever run on Heroku? (Obviously googling
"github heroku" is just a million tutorials on how to integrate.)

~~~
jbarnette
EY, Rackspace, and then our own metal.

------
lukaskroepfl
Hey! We at Bitmovin have been using k8s for quite a while for our
infrastructure and on-premise deployments. In case you're interested in how we
do multi-stage canary deployments, check out:
[http://blog.kubernetes.io/2017/04/multi-stage-canary-deploym...](http://blog.kubernetes.io/2017/04/multi-stage-canary-deployments-with-kubernetes-in-the-cloud-onprem.html)

------
Alan01252
I'm curious as to what this means for the existing Puppet code base. Is it now
irrelevant, or are there still uses for it in the k8s world?

~~~
spmurrayzzz
They could easily still use standalone puppet to handle the config management
for individual container images. I currently do this with salt-minion. It
reduces the burden on the Dockerfile itself, and lets you embrace a
declarative configuration state at build time.

~~~
ninkendo
It definitely seems like the wrong approach to me to have Puppet manage your
base images. They're not VMs; they shouldn't have multiple services, they
shouldn't require any complex configuration management, they should just be
the minimum requirements to support your application's local runtime
dependencies, and that's it.

From previous experience migrating from a Puppet setup to one that used
containers, Puppet's vestigial use case ends up being to get the orchestration
control plane itself set up (i.e. Kubernetes, networking configs, etc.) and
that's about it.

~~~
dkh99
There's nothing inherent about Puppet that means it has to manage
multi-service, "whole OS"-like installations. It can just as easily be put to
the task of a Dockerfile: install dependencies and deployables for a single
application. Its robust ability to manage things like user accounts, packages,
scheduled jobs (e.g. for alerting, though you would have to install at least
_a_ second service for this: crond), and the like makes it vastly superior to
Dockerfile shell scripts for complex tasks.

Think of Puppet more as a way of simplifying your Dockerfiles to have fewer
crazy shell commands _in total_, rather than hiding the craziness in layers
and hoping it all composes properly. If you do use lots of layers, Puppet can
make your life much easier, since it can be better at detecting previous
layers' changes and working around them (think redundant package install
commands: even a no-op "already installed!" command takes time, and if you're
installing hundreds of packages, as many people are for better or worse, that
can eat up build time).

Puppet isn't just a VM provisioner; it can also be used as a replacement for
large parts of your Dockerfile, or as a better autoconf to set up the
environment/deps for your application to run in.

Edit: syntax.

~~~
spmurrayzzz
The point about layer complexity is a great one I hadn't even considered. Your
"config" step is no longer a mishmash of dozens of COPY/RUN/etc. directives
(resulting in N new intermediate image layers); it just results in a single
atomic layer where you run the Puppet bootstrap.

Obviously you could accomplish this with shell scripts as well, constraining
your config step into one Docker RUN directive, but I prefer the declarative
state approach to the imperative one in this case.
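
As a rough illustration of that single-RUN idea (the base image, manifest paths, and packages are placeholders, not anything GitHub or the commenters actually use):

    FROM ubuntu:16.04
    COPY puppet/ /tmp/puppet/
    # one atomic layer: install puppet, apply the app manifest, then strip puppet back out
    RUN apt-get update && apt-get install -y puppet && \
        puppet apply --modulepath=/tmp/puppet/modules /tmp/puppet/manifests/app.pp && \
        apt-get purge -y puppet && apt-get autoremove -y && \
        rm -rf /var/lib/apt/lists/* /tmp/puppet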

