
Kubernetes Failure Stories - hjacobs
https://srcco.de/posts/kubernetes-failure-stories.html
======
m0zg
It's not for everyone and it has significant maintenance overhead if you want
to keep it up to date _and_ can't re-create the cluster with a new version
every time. This is something most people at Google are completely insulated
from in the case of Borg, because SRE's make infrastructure "just work". I
wish there was something drastically simpler. I don't need three dozen
persistent volume providers, or the ability to e.g. replace my network plugin
or DNS provider, or load balancer. I want a sane set of defaults built-in. I
want easy access to persistent data (currently a bit of a nightmare to set up
in your own cluster). I want a configuration setup that can take command line
params without futzing with templating and the like. As horrible and
inconsistent as Borg's BCL is, it's, IMO, an improvement over what K8S uses.

Most importantly: I want a lot fewer moving parts than it currently has. Being
"extensible" is a noble goal, but at some point cognitive overhead begins to
dominate. Learn to say "no" to good ideas.

Unfortunately there are a lot of K8S configs and a lot of specific software
already written, so people are unlikely to switch to something more manageable.
Fortunately, if complexity continues to proliferate, it may collapse under its
own weight, leaving no option but to move somewhere else.

~~~
aprdm
In places I've worked we usually had a VMware cluster, a load balancer, NFS for
shared data when necessary, and DNS set up (e.g. through Consul).

This setup is very, very simple and scalable. There is very little to gain,
IMO, in moving to Kubernetes.

Consul, vSphere and load balancers have APIs, and you can write tools to do
everything that K8s does.

~~~
dilyevsky
Scalable NFS, riiite.

~~~
aprdm
If you have some time to read "How Google Works" you would be surprised by how
long the company ran on NFS. I assume there are lots of workloads running on
Borg to this day on top of NFS. If that isn't enough for you, have a look at
Isilon's client list and see what kind of work they do; if you ever attend
SIGGRAPH, most of what you see is built on top of NFS, so, essentially, all of
the computer graphics you see in movies. At my last job our NFS cluster did
300,000 IOPS with 82 Gb/s of throughput.

~~~
m0zg
82gb/s (assuming you mean gigabit) is _per-node_ throughput at Google (or FB,
or I assume Amazon/Microsoft -- they all use 100GbE networks now). 300K IOPS
is probably per-node, too, at this point. :-)

~~~
kortilla
Having a 100Gbps NIC in a node isn't the same thing as doing storage at that
speed in an HA cluster.

Also, don't confuse that with 100GbE networks where the spine links are 100 but
the node links are only bonded 10s (much more common at $fang).

~~~
m0zg
Nope. It's all 100GbE throughout as far as I know. And people do work really
hard to saturate that bandwidth, as it is by no means trivial through the
usual, naive means without RDMA and Verbs. Years ago when I was there it was
(IIRC) 40Gbps to each node straight up.

It's a necessity really. All storage at Google has been remote and distributed
for at least the past decade. That puts serious demands on network throughput
if you want your CPUs to actually do work and not just sit there and wait for
data.

Here's some detail as of 2012:
https://storage.googleapis.com/pub-tools-public-publication-data/pdf/43837.pdf
Note that host speed is 40Gbps. And here's FB talking about migrating from
40Gbps to 100Gbps in 2016:
https://code.fb.com/data-center-engineering/introducing-backpack-our-second-generation-modular-open-switch/

------
dvnguyen
Having used Docker Compose/Swarm for the last two years, I remember having
problems with them twice. One was an MTU setting I never really understood, but
overall I was relatively happy with them. Since Kubernetes seems to have won, I
decided to learn it, but was left with some disappointments.

The first disappointment was setting up a local development environment. I
failed to get minikube running on a 2013 MacBook Air and an Ubuntu ThinkPad.
Both have VT-x enabled and run Docker and VirtualBox flawlessly. The online
interactive tutorial was good though, enough for learning purposes.

Production setup is a bigger disappointment. The only easy and reliable ways
to get a production-grade Kubernetes cluster are to lock yourself into a big
cloud provider or an enterprise OS (Red Hat/Ubuntu), or to introduce a new
layer on top of Kubernetes [1]. Locking myself into enterprise Ubuntu/Red Hat
is expensive, and I'm not comfortable adding a new, moving, unreliable layer on
top of Kubernetes, which is itself built on top of Docker. One thing I like
about the Docker movement is that it commoditizes infrastructure and reduces
lock-in: I can design my infrastructure so it uses an open-source-based cloud
product first and easily move elsewhere or self-host if needed. With
Kubernetes, things are going the other way. Even if I never moved outside the
big 3 (AWS/Azure/GCloud), migrating between them could be painful since each
provider's Kubernetes may introduce further lock-in for logging, monitoring,
and so on.

[1]: https://kubernetes.io/docs/setup/pick-right-solution/

~~~
tyingq
Digital Ocean's K8S offering is out of beta now:
[https://www.digitalocean.com/products/kubernetes/](https://www.digitalocean.com/products/kubernetes/)

~~~
karakanb
Migrated my very small cluster from GKE to DigitalOcean's K8s a few weeks ago.
I was using 3 nodes on GKE with 1 core & 3.75GB RAM per node, and the cost was
around $100 per month including the load balancer, in the cheapest region,
`us-central1-a`. Now, on DigitalOcean, I have 3 nodes with 1 core & 2GB RAM per
node. The cost is exactly $40 including the load balancer.

I am a pretty basic user; I started using k8s on this project as a learning
exercise, and $100 was too much to pay for learning. Now on DO I get a similar
cluster for less than half the GKE price, and I feel like it is worth it
considering the simplicity and observability of deployments. Also, DO lets me
select regions without any price difference, so I was able to pick Amsterdam
and get 10 times better latency from where I live. My setup is quite basic: my
app with around 8-10 pods, plus additional stuff such as cert-manager and
Prometheus.

YMMV, but so far I am really happy with DO's offering, in terms of both
performance and simplicity. I am not a power user and definitely operate at no
scale, but using DO in general is much simpler than using GCP with GKE.

~~~
apaz037
The problem with that is that I can almost guarantee it would still be cheaper
and easier to manage if you just leveraged whatever managed service your cloud
provider offers to run your stuff.

~~~
karakanb
Probably yes, but that approach has its disadvantages as well.

First, the biggest problem I see is the huge vendor lock-in you accept with
PaaS offerings such as AWS Elastic Beanstalk or GCP App Engine. When you commit
to one of these platforms, it is really hard to get out: moving to another
provider requires engineering effort, and the providers need feature parity for
your application to be supported. Plus, you have to learn platform-specific
stuff that has no standard across providers. Plus, it is usually slow and
bloated; have you ever tried deploying something to Elastic Beanstalk? It takes
at least five minutes without any meaningful information about what is going on
or whether your deployment succeeded.

Second, the tooling you get is usually very limited compared to what the
Kubernetes ecosystem has. Each platform asks you to use its own tools, but
there is a high possibility that those tools don't fit your use case, or that
you need to modify your workflow. With a solution like k8s, you only need to
support the standard, which is roughly k8s itself, and you are free to use
whatever tooling you want.

Third, done right, Kubernetes allows you to move to another provider very
easily without changing a single line in your Kubernetes definitions or your
application. You define the desired state of your cluster and check all of it
into your VCS, and since k8s forces you to do this from the beginning, you
usually end up with a nice, reproducible system that is more or less cloud
agnostic. You get logging, horizontal scalability, isolation, easy deployments,
easy rollbacks and all that stuff. I migrated from GKE to DO's Kubernetes
offering without changing a single line in my Kubernetes definitions or my
application. Of course, my use case is very, very small compared to most people
around here, but that was my experience.
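
To make that concrete, the kind of definition that moved unchanged between GKE and DO looks roughly like the sketch below; the name, image and ports are made-up placeholders, and the only cloud-specific behaviour (which load balancer gets provisioned) hides behind the Service type.

    # web.yaml -- hypothetical app; nothing below is provider-specific
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web
    spec:
      replicas: 3
      selector:
        matchLabels: {app: web}
      template:
        metadata:
          labels: {app: web}
        spec:
          containers:
          - name: web
            image: registry.example.org/web:1.0   # placeholder image
            ports:
            - containerPort: 8080
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: web
    spec:
      type: LoadBalancer   # each provider provisions its own LB behind this
      selector: {app: web}
      ports:
      - port: 80
        targetPort: 8080

The same `kubectl apply -f web.yaml` works on GKE, DO, or a kubeadm cluster.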

FWIW, I think Kubernetes is still a good learning to understand about current
state of infrastructure, deployments and the ideal state we all try to
achieve. Whether or not a business should depend on it is a whole another
topic.

~~~
guiriduro
For PaaS I thought the Heroku model was nice - the benefits of
containerisation built into the stack without having to manage any of it -
ahead of Fargate and way ahead of K8s. On par with serverless, but with better
compatibility with monolithic or partial microservice architectures, albeit at
a higher cost.

There's no strong vendor lock-in either: buildpacks and backend services are
much of a muchness across Dokku, Herokuish, Flynn, Cloud Foundry etc. If your
app is 12-factor with externalised state, you're plain sailing with most PaaS
and simple Docker setups; or at least I don't get what K8s brings to the table
in terms of operational simplicity.

------
cygned
I am a developer and I find k8s frustrating. To me, its documentation is
confusing and scattered among too many places (best example: overlay
networks). I have read multiple books and gazillions of articles and yet I
have the feeling that I am lacking the bigger picture.

I was able to set it up successfully a couple of times, with varying amounts of
effort. Last time, I gave up after four days because I realized that what I
needed was an "I just want to run a simple cluster" solution, and while k8s
might provide that, its flexibility makes it hard for me to use.

~~~
FridgeSeal
Have you used other google products? I find their documentation routinely
incomprehensible and difficult.

~~~
innocentoldguy
Agreed! I am an engineer and have written documentation off and on throughout
my career. I'm continuously dismayed at the incomprehensible documentation
generated by most companies. Google's documentation is particularly bad
though.

------
manigandham
I don't understand all the negative comments here; K8S solves many problems
regardless of scale. You get a single platform that can run namespaced
applications using simple declarative files, with consolidated logging,
monitoring, load-balancing, and failover built in. What company would not want
this?

~~~
pgwhalen
I very much agree that kubernetes is useful in an environment that doesn’t
need to scale, but do tell how it enables consolidated logging and monitoring,
since my medium/small shop is spending quite some time setting up our own
infrastructure for it.

~~~
013a
Installing a managed log ingestor is stupidly easy in Kubernetes. For example,
on GCP here's the guide to getting it done [1]. Two kubectl commands, and you
get centralized logging across hundreds of nodes in your cluster and thousands
of containers within them. Most other platforms (like Datadog) have similar
setups.
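
The reason two commands are enough is that these guides essentially install a log agent as a DaemonSet, so one agent pod lands on every node and tails the container logs from the host. A minimal, hypothetical sketch of the pattern (placeholder image and names, not the actual Stackdriver manifest):

    # log-agent.yaml -- one log-shipping pod per node via a DaemonSet
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: log-agent
      namespace: kube-system
    spec:
      selector:
        matchLabels: {app: log-agent}
      template:
        metadata:
          labels: {app: log-agent}
        spec:
          containers:
          - name: agent
            image: registry.example.org/log-agent:1.0   # placeholder image
            volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true
          volumes:
          - name: varlog
            hostPath:
              path: /var/log   # container stdout/stderr logs live under here

One `kubectl apply -f log-agent.yaml` and every current and future node ships its logs to wherever the agent is configured to send them.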

Infrastructure level monitoring is also very easy. For example, if you're on
Datadog, you flip KUBERNETES=true as an environment variable in the datadog
agent, and you'll instantly get events for stopped containers, with stopped
reason (OOM, evictions, etc), which you can configure granular alerting on.

Let's say you're in a service-oriented environment and you want detailed
network-level metrics between services (request latency, status codes, etc).
No problem, two commands and you have Istio [2]. Istio has Jaeger built-in for
distributed tracing, with an in-cluster dashboard, or you can export the
OpenTracing spans to any service that supports OpenTracing. You can also
export these metrics to Datadog or most other metrics services you use.

[1] https://kubernetes.io/docs/tasks/debug-application-cluster/logging-stackdriver/

[2] https://istio.io/docs/setup/kubernetes/quick-start/

~~~
pgwhalen
I will admit that these things are slightly easier on Kubernetes; my original
point was mostly just that Kubernetes itself doesn't really provide any of
these things in a meaningful way - you just described a bunch of separate,
nontrivial systems that solve many, but not all, logging/monitoring needs.

------
nisa
The k8s hype feels like the Hadoop hype from a few years ago. Both solve
problems that most don't have and there is a lot of complexity - some due to
the nature of the problem, some because everything is new and moving.

Of course it's 2019 and you have to migrate Hadoop to run on k8s now :)

My impression is that if you are a small shop and have the money, use k8s on
google and be happy, but don't attempt to set it up for yourself.

If you only have a few dedicated boxes somewhere just use Docker Swarm and
something like Portainer.

~~~
lugg
Docker swarm is really nice. I wish it had more traction. I fear it's going to
be dropped and leave me holding a bag full of bugs.

~~~
BretFisher
Swarm isn’t going anywhere. It has a growing community and the team is actively
working in the repos. See my updates:
https://www.bretfisher.com/the-future-of-docker-swarm/

------
awinter-py
Beyond strictly runtime failures, 2018 feels like the year that most of my
friends tried kube but not everybody stayed on.

The adoption failures are mostly networking issues specific to their cloud.
Performance and box limits vary widely depending on cloud vendor and I still
don't quite understand the performance penalty of the different overlay
networks / adapters.

~~~
lykr0n
Networking is a performance-sensitive system, and each layer you add adds
latency.

Consider a traditional monolithic application. In comes your HTTP request at
one end, a bunch of cross-thread communication happens, and database queries
come out the other end. With that, you have 2 points of network communication.

Now with micro-services, you might have 4 or 5 applications that are needed to
replace the above monolith. Throw in a service mesh on top of your cloud
provider's SDN, and you've turned 2 points of network communication into 20 or
more: the 5 micro-services talking to each other, plus the service mesh proxies
talking to each other. Add the additional processing overhead of maybe 1 to 2ms
per hop, and you've just added, at best, 10ms of round-trip time on the way to
your databases, plus some more CPU. And to what benefit? TLS? You can do that
in your application, or trust that your private network is private. Tracing?
You can do that with PID matching and watching the kernel's networking stack.

~~~
devereaux
So true. For some low-latency applications, anything above the bare minimum of
virtualization is not acceptable.

For what I do, in theory, many things should not impact results. In practice,
anything that measurably impacts results is stripped away. Think A/B testing,
but for every single component - including the major version of, say, the
Python interpreter.

That's how you end up running many things on bare metal.

I'll say the future is not serverless but cloudless

~~~
hjacobs
I would argue that the long tail of applications does not really care about the
impact of overlay networks. For us, the biggest win for low-latency
applications (where 1ms makes a difference) on Kubernetes came from disabling
CPU throttling in all clusters (you can also remove container CPU limits).
Background: a kernel CFS quota bug leads to throttling even before the quota is
reached, see
https://www.youtube.com/watch?v=eBChCFD9hfs&feature=youtu.be&t=810
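
For anyone who wants to try the same, removing just the CPU limit (while keeping the request) on a single workload looks roughly like this; the deployment and namespace names are placeholders, and doing it cluster-wide is of course a policy decision:

    # drop only the CPU limit of the first container; the CPU request stays
    kubectl -n prod patch deployment my-app --type=json \
      -p='[{"op": "remove", "path": "/spec/template/spec/containers/0/resources/limits/cpu"}]'
    # (or simply omit resources.limits.cpu in the pod spec to begin with)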

~~~
lykr0n
Overlay networks compound unstable networks. If your internal network latency
spikes from 0.5ms to 2ms, normally that's not a huge issue, but if you have
micro-services that need to talk to each other then, using my example above,
that extra 1.5ms per round trip works out to an additional 8ms of latency.

Sure, that's a low number, but if you already have 150ms of processing, adding
another 10ms might cause issues.

Also, if you're disabling the resource controls on Kubernetes, you're kind of
defeating the whole point.

------
stonewhite
I managed multiple Mesos+Marathon clusters in production for a little over 1.5
years, and when I switched over to K8s the only thing that felt like an
improvement was the kubectl CLI.

I really liked (and miss) the beauty and simplicity of Marathon: everything was
a task - the load balancer, the autoscaler, the app servers, everything. I
think it failed because provisioning was not easy, it lacked first-class
integrations with cloud vendors, and the documentation was horrible.

Kind of sad to see it lose the hype battle; since then even Mesosphere has had
to come up with a K8s offering.

------
bdcravens
I've started the planning phase of a Kubernetes course, geared toward
developers more so than the enterprise gatekeepers. As I read stories like
these, I jump between different thoughts and feelings:

1) no matter what I think I know, there are too many dark corners to create an
adequate course

2) K8S is such a dumpster fire that I shouldn't encourage others

3) there's a hell of an opportunity here

Thoughts? Worth pursuing? Anything in particular that should be included that
usually isn't in this kind of training?

~~~
parasubvert
All three. It’s a gold rush, but as with any gold rush, conditions are hard
going - that’s why there’s an opportunity.

The best way to think of Kubernetes is that it was designed to be a successful
open source project, widely adopted as a standard foundation on which to build
products. It wasn't designed to be a usable product on its own.

We are at the equivalent of the pre-1.0 Slackware, SLS, Debian and Red Hat
stage of GNU/Linux distros circa 1994. Red Hat eventually ran away with most of
the money by the late 90s, but in the meantime there is lots of opportunity to
fill an unmet need.

~~~
romeisendcoming
Don't forget SuSE, the sole surviving competitor. Best Buy SuSE Linux gecko
box, 2.2.14 kernel veteran here.

------
stunt
Kubernetes solves a problem that most companies don't have, which is why I
don't understand why the hype around it is so big.

For the majority, it adds only a little value when you weigh it against the
added infrastructure complexity, the cost of the learning curve, and the
ongoing operation and maintenance.

~~~
jordanbeiber
In my experience most companies lack common conventions and automations.

Kubernetes "done right" is almost a part of your application. It becomes this
"machine" that you throw stuff into and good stuff happens.

You'll need a team to integrate it with the pieces you require (auth, secrets,
load balancers, permissions/app identities, monitoring and logging), but many
places lack some of these pieces, and in my opinion k8s gives you a fast track
to creating a uniform application delivery platform.

What I don't like is that it is kind of the opposite of "the Unix philosophy",
and in that regard I prefer the HashiCorp stack.

~~~
romeisendcoming
Those are your startups and web/app-tier shops. Yes, they routinely suck at
sysadmin and they need to be bottle-fed a solution that fits the
scatter/gather shape of their business. They don't want ops discipline. They
want a programmable solution that performs systems magic with a single toolset
to learn.

~~~
jordanbeiber
No no, these are your enterprises I’m talking about mainly.

Places entrenched in manual processes for release and change management.

With true service delivery in a CI/CD fashion (including infrastructure, which
should be codified as well) many of these manual processes become obsolete.

Don't get me wrong, the processes still exist; they are just automated and sped
up by an order of magnitude.

~~~
romeisendcoming
Who said anything about manual processes? That's not what the modern SA
does... it's mostly designing repeatable processes, creating recipes and doing
integration, in my experience.

The real problem with the K8s and devops world is the failure to understand
that there is no magic pill in "codifying" a bad system.

~~~
jordanbeiber
I did. I see it all too often, mainly at the larger places.

Modern SA is about knowing that your job is to help bring business value. Most
of the time this comes down to automation.

~~~
romeisendcoming
Did you consider that when a manual process exists, it is in place to bring
combined attention to what surely is (by 2019) a critical section - one that
needs consensus that no monitoring hook can provide? Sure, everything is about
automation, and it has been since 1999 in my experience.

~~~
jordanbeiber
Of course. I’ve been employed in and consulted for enterprise IT & dev for
almost 20 years.

The number of people heating office space at your non-tech large enterprise is
astounding, in my opinion.

I enjoy discussing the reasons for this, but it’s a lengthy topic!

In super-short: lack of competency, meaning IT support and tools are not used
even remotely optimally. This lack of competency, which starts at the top,
results in laughable lead times for the simplest of tasks and processes. This
in turn has resulted in mass outsourcing and off-shoring of a bunch of tasks
(processes) that really should have been automated years ago.

Usually the incentive for these ”service providers” to improve this is zero,
and things deteriorate even further.

The sad state of affairs is that many believe this is the way ”IT” works —
slow and error prone.

Awesome example: one place, one of the Fortune 500s, built an on-prem ”cloud”
within a business unit. Over 2000 physical Xen hosts. I wanted them to apply a
patch.
They refused. The last patch had taken 6 months to roll out. The process was:
ssh to server, scp patch, run sudo install patch. The entire operation was
bought by a renowned ”service provider”. Ouch.

I could talk about this for weeks! Of course there are those who run an
awesome shop, but in my experience these are usually isolated teams that are
somewhat shielded from the craziness of big-money politics.

~~~
romeisendcoming
Agreed with your experiences. I identified my niche a long time ago: HPC and
scientific development, enterprise and core internet services (DNS, IP
routing), plus security (even though snake oil is big in sec now). The type of
incompetence you describe doesn't flourish in these domains.

------
tnolet
I'd be interested in a related "microservices failure stories". Must be a big
overlap with this.

~~~
dehrmann
I have two. One was caused by data inconsistency between services and regions.
The other is more hypothetical: the microservices had gotten to the point that
no one knew how to start the system if all services were down, and it's
possible the services have circular dependencies to the point that a cold
start would be incredibly hard.

~~~
nicobn
I've actually seen your hypothetical in action, but the bug was even more
subtle. Assume services A, B and C. A and C both need information from each
other, which is usually cached. Normally you'd deploy one service at a time, so
the call chain would go A -> B -> C -> A, or A -> C and then A -> B -> C. But
in this particular instance A and C's caches were cold, causing an explosion of
service calls that took both services down.

~~~
quickthrower2
> A and C both need information from each other

Sounds like a monolith pulled apart :-)

------
hjacobs
Christian already followed the example and created a similar list for
Serverless: https://github.com/cristim/serverless-failure-stories

~~~
gspetr
Is there also a list for Docker failure stories?

~~~
hjacobs
IMHO this would be less interesting; some people already run other container
runtimes such as containerd with Kubernetes (e.g. Datadog:
https://www.youtube.com/watch?v=2dsCwp_j0yQ), so Docker might stay as a user
interface for local development, but I would not know what "Docker failures"
would be in the future.

~~~
pepemon
Docker is using containerd under the hood as its container runtime component.

------
dcomp
I run a single-node cluster at home. To handle updates, I just wipe the cluster
with kubeadm reset, then kubeadm init, followed by running a simple bash script
which loops over the YAML files in nested subdirectories and applies them (see
the sketch after the directory listing). I only have to make sure that I only
ever edit the YAML files and never mess with kubectl edit etc.

for f in */*.yaml ...

with a directory structure of:

    
    
      drwxrwsrwx+ 1 root 1002 176 Jan 20 21:15 .
      drwxrwsrwx+ 1 root 1002 194 Nov 17 20:06 ..
      drwxrwsrwx+ 1 root 1002  68 Jan 20 20:50 0-pod-network
      drwxrwsrwx+ 1 root 1002 104 Nov  1 11:18 1-cert-manager
      drwxrwsrwx+ 1 root 1002  34 Jul 11  2018 2-ingress
      -rwxrwxrwx+ 1 root 1002  93 Jan 20 21:15 apply-config.sh
      drwxrwsrwx+ 1 root 1002  22 Jul 14  2018 cockpit
      drwxrwsrwx+ 1 root 1002  36 Jul  3  2018 samba
      drwxrwsrwx+ 1 root 1002  76 Jul  6  2018 staticfiles
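
A rough sketch of what the apply-config.sh in that listing boils down to (the numbered directory prefixes keep the apply order: pod network first, then cert-manager, then ingress):

    #!/usr/bin/env bash
    # re-apply every manifest after a fresh `kubeadm init`
    set -euo pipefail
    for f in */*.yaml; do
        kubectl apply -f "$f"
    done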

------
AaronFriel
I just went through all of the post-mortems as part of evaluating Kubernetes
for my own company. I've been running Kubernetes clusters for about a year and
a half and have run into a few of these, but here's what I found striking:

* About half of the post-mortems involve issues with AWS load balancers (mostly ELB, one with ALB)

* Two of the post-mortems involve running control plane components dependent on consensus on Amazon's `t2` series nodes

This was pretty surprising to me because I've never run Kubernetes on AWS.
I've run it on Azure using acs-engine and more recently AKS since its release,
and on Google Cloud Platform using GKE. It's a good reminder not to run
critical code on T-series instances, because AWS can and will throttle or
pause them.
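
If you want to check whether any nodes in your own cluster are burstable instances, something like this works on reasonably recent clusters (older ones expose the same information under the `beta.kubernetes.io/instance-type` label instead):

    # show each node with its cloud instance type as an extra column
    kubectl get nodes -L node.kubernetes.io/instance-type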

~~~
hjacobs
Nice observation, I haven't done statistics on the linked postmortems myself
yet. Please note that your observation might also be due to the fact that AWS
has a far larger market share and did not provide managed Kubernetes until
recently (so people roll their own). We can therefore assume that any random
sample of Kubernetes postmortems would be biased towards seeing more incidents
with Kubernetes on AWS (compared to other cloud providers).

~~~
AaronFriel
That's a good point. In 2017 there weren't widely available managed Kubernetes
offerings; now each platform has its own, with much more reliable
integrations.

------
peterwwillis
Dang. I wish I had my SRE Wiki up and running already, or I'd add a "public
postmortems" section.

~~~
alien_
Just put it on GitHub, like this one and the Serverless one I created after I
saw this.

~~~
alien_
Just saw it already exists: https://github.com/danluu/post-mortems

------
hjacobs
There is now a Kubernetes podcast episode with me about the topic:
https://kubernetespodcast.com/episode/038-kubernetes-failure-stories/

