Kubernetes Failure Stories (srcco.de)
513 points by hjacobs 33 days ago | 236 comments



It's not for everyone and it has significant maintenance overhead if you want to keep it up to date _and_ can't re-create the cluster with a new version every time. This is something most people at Google are completely insulated from in the case of Borg, because SRE's make infrastructure "just work". I wish there was something drastically simpler. I don't need three dozen persistent volume providers, or the ability to e.g. replace my network plugin or DNS provider, or load balancer. I want a sane set of defaults built-in. I want easy access to persistent data (currently a bit of a nightmare to set up in your own cluster). I want a configuration setup that can take command line params without futzing with templating and the like. As horrible and inconsistent as Borg's BCL is, it's, IMO, an improvement over what K8S uses.

Most importantly: I want a lot fewer moving parts than it currently has. Being "extensible" is a noble goal, but at some point cognitive overhead begins to dominate. Learn to say "no" to good ideas.

Unfortunately there's a lot of K8S configs and specific software already written, so people are unlikely to switch to something more manageable. Fortunately if complexity continues to proliferate, it may collapse under its own weight, leaving no option but to move somewhere else.


In the places I've worked we usually had a VMware cluster, a load balancer, NFS for shared data when necessary, and DNS set up (e.g. through Consul).

This setup is very, very simple and scalable. There is very little to gain, IMO, in moving to Kubernetes.

Consul, vSphere, and load balancers have APIs, and you can write tools to do everything that K8s does.


How do you load balance? I mean load balance the "public IP".

In some networks DNS failover really isn't that great, so at least a virtual IP needs to be used.


We use haproxy. I wrote code [1] that configures it based on Consul to do the load balancing. It has been running in production for 2 years without issue (tested with Consul 1.0.6).

For people wondering why not use consul-template: this approach has the benefit of actually understanding haproxy, and it minimizes the number of restarts needed to apply changes.

Using haproxy this way also has the benefit that if Winkle or Consul goes down, things continue to work; you just don't get updates.

[1] https://github.com/takeda/winkle


Haproxy only solves a single part of the problem. If you do DNS-based failover you should really check how clients behave when one node goes down. Without a floating IP or a cloud LB, some stuff will be troublesome.


The haproxy method doesn't rely on DNS at all, so I'm a bit confused.


Well, either it uses DNS for failover, or you have IPVS (LVS, keepalived) enabled; otherwise, if the machine with haproxy crashes, you're basically dead. Of course there is also BGP and anycast, but those are not "cheap".


That's not how it works. It is very similar to the sidecar approach that various service discovery solutions use. You have haproxy running locally on localhost and you communicate with it; haproxy then routes the request to the right nodes. No DNS, no LVS, and no keepalived.
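For illustration, a minimal (hypothetical) haproxy.cfg for this local-sidecar pattern might look like the fragment below; the addresses and names are made up, and the backend lines are what a generator like Winkle would rewrite from Consul's service catalog:

    frontend my_service_local
        # the application always talks to localhost; no DNS lookups involved
        bind 127.0.0.1:8080
        default_backend my_service

    backend my_service
        # these server lines are regenerated from Consul's catalog, and
        # haproxy is reloaded only when membership actually changes
        server node1 10.0.0.11:9000 check
        server node2 10.0.0.12:9000 check

If node1 dies, haproxy's health checks route around it immediately; the config regeneration just catches up afterwards.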


Well, I was talking about edge load balancing...


Scalable NFS, riiite.


If you have some time to read "How Google Works" you would be surprised by how long the company ran on NFS. I assume there are lots of workloads running on Borg to this day on top of NFS. If that isn't enough for you, have a look at the client list of Isilon and see what kind of work they do; if you ever attend SIGGRAPH, most of what you see is built on top of NFS: essentially, all of the computer graphics you see in movies. At my last job our NFS cluster did 300,000 IOPS with 82 Gb/s throughput.


82 Gb/s (assuming you mean gigabits) is _per-node_ throughput at Google (or FB, or I assume Amazon/Microsoft -- they all use 100GbE networks now). 300K IOPS is probably per-node, too, at this point. :-)


Having a 100 Gbps NIC in a node isn't the same thing as doing storage at that speed in an HA cluster.

Also, don't confuse them with 100 GbE networks where the spine links are 100 but the node links are only bonded 10s (much more common at $fang).


Nope. It's all 100GbE throughout as far as I know. And people do work really hard to be able to saturate that bandwidth as it is by no means a trivial task to saturate it through the usual, naive means without the use of RDMA and Verbs. Years ago when I was there it was (IIRC) 40Gbps to each node straight up.

It's a necessity really. All storage at Google has been remote and distributed for at least the past decade. That puts serious demands on network throughput if you want your CPUs to actually do work and not just sit there and wait for data.

Here's some detail as of 2012: https://storage.googleapis.com/pub-tools-public-publication-.... Note that host speed is 40Gbps. And here's FB talking about migrating from 40Gbps to 100Gbps in 2016: https://code.fb.com/data-center-engineering/introducing-back...


Sorry, I don't have to read it because I was a Borg SRE for 6 years and I know how (the server part of) it works. You assume wrong.

I know there are a lot of companies that try to put some lipstick on the NFS pig and call it reliable/scalable/etc. As long as their clients don't actually try to run it at scale, or don't complain too publicly when they try and can't, they are able to get away with it.


Your concept of scale looks very different from mine; in my experience NFS does a very good job for in-datacenter workloads. CG rendering, oil/gas, and others usually take this approach for HPC, as far as I've seen. I consider this "scale". Close to 100k procs sharing NFS is the biggest cluster I've worked on.

Of course over longer networks it isn't suitable, as the round trips have too much latency. Other than that, is your experience with NFS much different?


What you consider ‘scale’ is a high watermark used by cloud providers that is irrelevant to 99.999% of the industry.

Supporting all of a Fortune 500’s business operations is very reasonable to call ‘scale’ in the normal world.

Your comment is like a billionaire claiming that somebody who managed to hit 30 million isn't rich.


worked at a company with 4 Petabytes on NFS ... FWIW


> leaving no option but to move somewhere else

Many of the major infrastructure/platform vendors are rolling out their own distribution of Kubernetes either as a cloud service e.g AWS, Azure, GCP or on premise e.g. RedHat.

So I suspect they are going to try and differentiate on features and ease of use and make it as hard as possible to move anywhere else.


k8s is meant to be hard to use. You're supposed to rent space on a k8s cluster from Google. Google has been pumping millions into marketing k8s as a mechanism to improve GCP adoption and establish a foothold in the cloud provider space.


I'm not exactly sure what point you're trying to make here. k8s is not meant to be a paas, but no one is trying to make k8s harder to use.

I work at Google on a large team of engineers dedicated to making it as easy as possible to use.


[Disclaimer: this is pure conjecture and represents my opinion only; I'm a Google outsider.]

For a while, "AWS" and "the cloud" were practical synonyms; it was very rare that anyone meant not-AWS. In my opinion, Kubernetes is a major piece of Google's strategy to turn that tide and improve their marketshare.

Does throwing "a large team of engineers" at a problem typically result in something that's "as easy as possible to use"? "Design-by-committee" is not a term of endearment.

A sibling comment at https://news.ycombinator.com/item?id=18958077 notes that "there are a ton of nicer UIs for Kubernetes", but that they're sold separately as proprietary PaaS platforms. Even if we pretend like GCP/GKE isn't one of them, Kubernetes-The-Platform will be impacted by the interests of its primary vendors and advocates, whether or not certain teams at Google are keen to admit that.


The fact that it takes a team of highly skilled engineers from the top of the talent pool to try to make it easy should tell you that perhaps the design is wrong?


Or that running a secured, highly available platform with the type of features k8s provides is non-trivial beyond the basics.


What would make it easy is a good non-gcp GUI


I agree, some better UI for end users would be awesome: Kubernetes Dashboard kind of works, but is pretty limited and more a "kubectl in the browser". There are a ton of nicer UIs for Kubernetes, but they are all part of the value-add of proprietary platforms AFAIK (think about all the managed K8s offerings out there).


It took me a while to get comfortable with Borg (and in general with the fact that your binary can take hundreds of verbosely written command-line arguments; coming from gamedev, I was in a bit of a state of shock for a while)... But then I got used to it. Still, I felt I could never fully internalize the evaluation rules, but the other tooling (diffing) really helped in that respect.

One thing I really appreciated was how one could enable/disable things based on the binary version rolled out, and if it's rolled back the state goes back.

Basically something like this:

    {
       new_exp_feature = binary_compiled_after_changelist( 123456789 ) ||
                         binary_compiled_with_cherrypicks( { 123456795, 1234567899 } )
    }
Since Piper is changelist-based (like Perforce/SVN), each "CL" number goes up atomically, so you can use this to say: this specific flag should get turned ON only if my binary was compiled at base CL > 123456789, or if it was compiled earlier but had these cherry-picks (i.e. individual changelists) built in. This was heavily integrated with the whole system: each binary would be built at some @base_cl, with additional @{cherry_pick_cl1, cherry_pick_cl2, ...} possibly applied. For example, the team decides to release at @base_cl, but bugs are found during the release, and rather than rolling to a new @base_cl, just individual cherry-picks may be pushed. So you can then control in your configuration how to act (configuration could be pushed independently of your binary, though some systems would bundle them together)... And then if you have to roll back, borgcfg would re-evaluate all this and decide to flip the switch back (that switch would simply emit something like --new_exp_feature=true or --new_exp_feature=false (or --no-new_exp_feature; it was a long time ago so I could be wrong)).

With git/hg you no longer have such a monotonic order, but then that monotonic order worked best with monorepos anyway (or maybe I'm just too narrow-sighted here)...
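A rough sketch of this rollback-safe flag evaluation in Python (the function and constant names here are made up to mirror the config above; they are not the real borgcfg builtins):

```python
# CLs at which the feature becomes safe to enable (illustrative values
# matching the config snippet in the parent comment).
FEATURE_BASE_CL = 123456789
FEATURE_CHERRYPICK_CLS = {123456795, 1234567899}

def feature_enabled(binary_base_cl, binary_cherrypick_cls):
    """True if the running binary contains the change, either because it
    was built after the base CL or because a fix was cherry-picked in."""
    if binary_base_cl > FEATURE_BASE_CL:
        return True
    return bool(FEATURE_CHERRYPICK_CLS & set(binary_cherrypick_cls))

def flag_for(binary_base_cl, binary_cherrypick_cls):
    # The config system re-evaluates this on every push or rollback, so
    # rolling the binary back automatically flips the flag back off.
    enabled = feature_enabled(binary_base_cl, binary_cherrypick_cls)
    return "--new_exp_feature=%s" % ("true" if enabled else "false")
```

The key property is that the flag is a pure function of what the binary was built from, so a rollback cannot leave a new feature enabled on an old binary.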


From this comment thread I’m beginning to think I’m one of the few people on HN that hasn’t used Borg.

All of this seems way more complicated than the tools we use at my company. Is there a specialized need here I’m not seeing?


You seem to be confusing Borg and borgcfg.

The evaluation rules are merely a borgcfg artifact.

Disclaimer: I maintain borgcfg.


I found https://jsonnet.org/ to fix several issues with the Borg configuration language.

https://github.com/ksonnet/kubecfg is an attempt to reboot the borgcfg experience with k8s + jsonnet


There is also https://github.com/dhall-lang/dhall-kubernetes which is built on top of Dhall, another competitor to Jsonnet.


There is no such thing as a "borgcfg experience". It's just a configuration DSL applied to producing Borg specs.


You might take it for granted, and thus not experience it as an "experience", but if you use other tools that are popular in the k8s world (such as helm) you might feel a tinge of nostalgia.

For example, {borg,kube}cfg allow you to import an existing config and override it so you can adapt it to another scenario (different things in different clusters, like prod vs staging, or a cluster has a new feature while another one doesn't etc).

Furthermore, the overrides are described with the same "shape" as the things they override, and can override things that weren't necessarily marked as overridable.

Compare this with the current state of affairs with helm, where the only way for users to inject e.g. the resource requests and limits in a pod spec is for the original template author to have foreseen that need and explicitly added a hook in the template, so that values from another file (the values.yaml) can be injected into it.

        spec:
          imagePullSecrets:
            - name: {{ .Values.image.pullSecret }}
          containers:
            - name: {{ template "mycontainer.name" . }}
              ......
              resources:
    {{ toYaml .Values.resources | indent 12 }}
          volumes:
              ......
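By contrast, the kubecfg/jsonnet style of override might look like the sketch below (file name and field values are made up for illustration): the base config is imported and patched in place, with no hooks required from the original author.

```jsonnet
// Import the base deployment and override it; `+:` merges into the
// existing object instead of replacing it wholesale.
local base = import "base-deployment.jsonnet";

base {
  spec+: {
    template+: {
      spec+: {
        // Override resources on every container, even though the base
        // author never marked them as overridable.
        containers: [
          c { resources+: { limits: { cpu: "500m", memory: "256Mi" } } }
          for c in super.containers
        ],
      },
    },
  },
}
```

The override has the same shape as the thing it overrides, which is the property the parent comment is describing.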


Sure yes - I meant borgcfg, not borg - stupid me...


An open source project with the complexity of an enterprise monster: this is what generates the half-million-plus salaries. An old enterprise software trick. Simplifying it would serve the interests of nobody in a position to do the simplification.


I'm going to voice a contrarian viewpoint here and say that "half million plus" salaries are actually a good thing. A rising tide lifts all boats, and since a lot of techies live in areas with an exorbitant cost of living, that money re-enters the economy at a rapid clip anyway. But such salaries are only good if commensurate value is being delivered for the money. In a large IT shop it might be, but as a small business owner I find K8S a hard slog, hence my suggestion to simplify. I'm pretty sure the 80/20 breakdown still applies, and 80% of K8S complexity could be removed without affecting much. One might suggest that I use GKE and bypass the problem entirely, but I need to run a lot of GPUs 24x7, and the pricing on those in any cloud is insane.


"a lot of techies live in areas with exorbitant cost of living,"

And what do you think is the primary driver of said cost of living?

I believe there is a ton of arbitrary complexity in the system, and though it's not specifically created, it definitely grows if it's not checked and entities with power have no reason to do that.

Google, Oracle, MS, Governments, Banks - have very little incentive to clear the weeds, usually just the opposite.

Wherever there is steady profit, there are layers of cruft.


>> And what do you think is the primary driver of said cost of living?

Mostly NIMBY-driven refusal to build more housing and transportation infrastructure. It's not like the US is lacking for land. There's no reason a dilapidated teardown-ready shack should cost $2M, no matter where it is.


No, there is NIMBY-ism in most places; the reason Valley prices are sky-high is the salaries.

" There's no reason a dilapidated teardown-ready shack should cost $2M, no matter where it is."

The cost is not the shack, it's the land it's sitting on.

The higher the salaries in the valley, the more that dilapidated shack will cost.


You make it sound like people enjoy paying millions for dilapidated shacks, which I can assure you is not the case. The reason housing costs so much is limited supply coupled with high demand. There could be a 10-story 40-apartment building in place of that one shack. Place a few thousand of those strategically through Bay Area, and the price per square foot would come down big time even if the cost of land stays high. But, NIMBY. Can't reduce the "value" of all that (mostly dilapidated) real estate people already own.


" much is limited supply coupled with high demand. "

No, it's just "supply and demand"; neither is necessarily "high" or "low".

SV has quite high wages, that's a huge driver of demand.

The residents of SV do not want to be like NY or Hong Kong, that is their choice. It's the choice many, many places make as well. Zurich, Paris, even London, they don't live in high rises.

The attractiveness of Cali in many ways is that it's not entirely flooded/urban like NYC.

You can't just arbitrarily create more homes; doing so has effects of its own on the situation.


It is both. There is more money buying a relatively fixed number of homes, so prices go up. But if you could buy a piece of land with one home and put 10 units on it, then that could be offset.


Simplified Kubernetes is a thing that exists. OpenShift (and the open source version, OKD) jumps out as the immediate example. There are other non-k8s tools that cover some of the same territory, like Docker Swarm or Cloud Foundry.

There's still a learning curve, but it's much more humane than Kubernetes.


I think you meant to write "(and the upstream community version, OKD)", because OpenShift is also fully open source.


Yes, many thanks for the correction.


Hmm, I think you and many others do not get how complex a general purpose infrastructure can and should be.

Kubernetes is very simple. And it will become much more complex with the growing hardware, network, and applications it's trying to manage.

What's missing is the layer of abstraction on top of k8s that is still left to figure out. I think the operator pattern is the right abstraction for service jobs; some kind of framework is still needed to handle batch/offline workloads, though.


Quite the opposite, I want it to be flexible and pluggable for other use cases other than the most simple. I've gotten a lot of benefit from adding custom features.


I'm not sure but would something like docker swarm qualify?


Re configuration: ksonnet is an option (although I personally find jsonnet a “lipstick on a pig” kind of solution).

There’s some work going on to have something more user-friendly (think Google’s Piccolo) - https://github.com/stripe/skycfg (disclaimer - I contributed to this project)


There's also Kubecfg [1], which uses Jsonnet, but has a much smaller surface area than Ksonnet.

[1] https://github.com/ksonnet/kubecfg


Could you describe Piccolo a bit? Can't find anything on it.


Not going into too much detail: it was a Python-esque DSL equivalent to BCL. You still had to learn the Borg abstractions, but at least you didn't have to fight the language as much if you wanted to keep your configs DRY.


Piccolo is very similar to Pystachio.


> I wish there was something drastically simpler

Have you tried Nomad?

https://www.nomadproject.io


> _and_ can't re-create the cluster with a new version every time

Actually, I used kubeadm, and the higher the version went, the better major upgrades worked.

With the new master upgrade methods I have not had any problems so far, on two clusters.

Sadly I created my cluster with an "external" etcd (even though it runs internally) and also tried to maintain my own certificates, which is now a PITA (at the time, cert handling wasn't as good in kubeadm as it is now).

Also, I have a CloudConfig/Ignition config creator that can bootstrap all the configs necessary for a kubeadm cluster on Container Linux/Flatcar Linux. So if I really have time I can just create a new cluster and move everything over. (The only thing that is problematic to "move" over is the database created with KubeDB.)

Also you can use keepalived as your kubeadm load balancer.


Nomad is drastically simpler than Kubernetes. All you need is Consul and Nomad to get a running cluster.
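For a sense of the surface area, a minimal Nomad job file (HCL) from around that era might look like this; the names and values are purely illustrative:

```hcl
job "web" {
  datacenters = ["dc1"]

  group "web" {
    count = 2

    task "nginx" {
      driver = "docker"

      config {
        image = "nginx:alpine"
        port_map {
          http = 80
        }
      }

      resources {
        network {
          port "http" {}
        }
      }

      # Registers the task in Consul with a health check.
      service {
        name = "web"
        port = "http"
        check {
          type     = "http"
          path     = "/"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}
```

One file covers scheduling, Docker config, and Consul registration, which is most of what a small cluster needs.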


I think Istio (https://istio.io) is a nice effort to create both an abstraction on top of k8s and to package a set of commonly needed functionality out of the box. Unsure of its production status or overhead though.

Also, I'd only go with a managed k8s solution, and I'm not sure I'd consider k8s for older or non-microservice/containerized architectures. In the latter case, though, I don't think there's anything better out there in terms of orchestration.


I have pretty mixed feelings about Istio. It's trying to solve a lot of fundamental problems by introducing yet another layer of stuff. It's basically the middleware box all over again.


Lots of magic for me; I broke a (dev) k8s cluster by installing Istio via the GitLab k8s integration. The overhead appeared to be non-negligible, but I noped out of there pretty quick, so I don't have the data to back that up.


Hi bvm, GitLab PM here. Sorry to hear your dev cluster broke. Would like to offer any help we can provide and if possible learn more about the failure so we can take corrective action to avoid this in the future. Thanks.


Hi drugesso - Thanks for getting back to me. I actually signed up to premium to get support for this issue. Got one email back, replied and then never heard back :(

It would be really great if there is a human I could speak to at GitLab about this. I've put my email in my profile.


Hello Tom, thanks for reaching out about this. I've forwarded your request to the team internally, please let us know when everything is sorted out :-).


We used to maintain our own k8s cluster and it was a pain in the ass given we had no dedicated ops. The cluster crashed every one or two months and we never tried keeping it up to date.

I suggest every startup use a hosted k8s solution, which takes care of most things like authentication, networking, monitoring, updating, etc.

Also, keep away from templating systems such as jsonnet, which are huge overkill; you will end up writing a lot of code you will hate to read later. Instead, write your own YAML builder in CI, together with the parts that do the Docker image building and the code that deploys the microservices.

IMO Google made a really smart move open-sourcing k8s as a latecomer cloud provider. Now the infrastructure becomes insignificant, since everything runs on Docker and pods.


Rancher 1.6 with Cattle was the sweet spot for us. Rancher 2 went full Kubernetes, which probably makes sense for their customers. We're looking for a replacement in that sweet spot.


Very true! There is no greater culprit responsible for "complex systems" than the drive to be "extensible/future-proof" in software design!


I think that the unix philosophy of focused and relatively simple tools that are easy to glue together is a better way to future-proof. Yet to do that you need to have a stable substrata to provide the basis of composition. In k8s case it seems that k8s _is_ the basis where the composition is to happen upon.


In conversations, I often compare the Kubernetes API to the Linux Kernel API (as analogy) - both provide primitives we kind of "agreed on" in the industry. I hope the Kubernetes ecosystem will flourish in the same way as the Linux base.


Having used Docker Compose/Swarm for last two years, I remember having problems with them twice. One of which was an MTU setting which I didn't really understand why, but overall I was relatively happy with them. Since Kubernetes seems to have won, I decided to learn it but got some disappointments.

The first disappointment is setting up a local development environment. I failed to get minikube running on a MacBook Air 2013 and a Ubuntu ThinkPad. Both have VT-x enabled and run Docker and VirtualBox flawlessly. Their online interactive tutorial was good though, enough for learning purposes.

Production setup is a bigger disappointment. The only easy and reliable ways to have a production-grade Kubernetes cluster are to lock yourself into either a big-player cloud provider, or an enterprise OS (Redhat/Ubuntu), or to introduce a new layer on top of Kubernetes [1]. Locking myself into enterprise Ubuntu/Redhat is expensive, and I'm not comfortable with adding a new, moving, unreliable layer on top of Kubernetes, which is itself built on top of Docker. One thing I like about the Docker movement is that it commoditizes infrastructure and reduces lock-in. I can design my infrastructure so it can utilize an open-source-based cloud product first and easily move to others or self-host if needed. With Kubernetes, things are going the other way. Even if I never moved out of the big 3 (AWS/Azure/GCloud), the migration process could be painful, since their Kubernetes may introduce further lock-in for logging, monitoring, and so on.

[1]: https://kubernetes.io/docs/setup/pick-right-solution/


> The only easy and reliable ways to have a production grade Kubernetes cluster are to lock yourself into either a big player cloud provider, or an enterprise OS (Redhat/Ubuntu), or introduce a new layer on top of Kubernetes [1]

I think you might have misunderstood that page. The standard and universal way to deploy Kubernetes on to either your own bare metal or any cloud provider is to use kubeadm. However, if you would like a simpler and more automated solution and/or one backed by a vendor, you are welcome to pick any of the hosted platforms, distributions, or installers. CNCF has certified 70 conformant solutions: https://www.cncf.io/certification/software-conformance/

> Even if I never moved out of the big 3 (AWS/Azure/GCloud), the migration process could be painful since their Kubernetes may introduce further lock-ins for logging, monitoring, and so on.

If you choose open source solutions for logging and monitoring like Fluentd and Prometheus, then you can avoid locking into anyone's value added services and remain completely portable. If you decide to go with a vendor's solution, you may trade convenience for higher switching costs.

[1]: https://kubernetes.io/docs/setup/pick-right-solution/

Disclosure: I'm executive director of CNCF and run the conformance program.


The point is not about the minimum conformance, but rather the lock-in provided by the maximum configuration / extensions of each vendor.

Take AWS EKS as an example. Their feature page[1] does mention conformance. Then it mentions 20 other non-conformance focused features that create an effective lock-in.

k8s is becoming like OpenStack in this regard. You need to embrace a vendor's version of k8s in order to have a functional cluster without a massive team.

[1] - https://aws.amazon.com/eks/features/


This isn't my experience at all. As one person, I taught myself Docker and then Kubernetes over the past couple of years, and am now managing a small 3-node bare-metal cluster on my own.

But using rancher 2.0 has helped a bunch to ease me into it. Now I feel comfortable enough to start up a cluster on my own without it.


I strongly feel that Rancher 2.0 is just not ready for release yet. I run into so many weird edge cases using it, and don’t get me started on persistent storage...

Longhorn is pretty easy to set up, but it seems to have some issues actually working (and it's slow).


I've had issues with longhorn as well, though the problem I was having (volume attachment race condition) is supposedly resolved by upgrading kubernetes.

I'm also trying out rook/ceph at the moment.


Just curious, is your cluster used in production or for any serious purpose?


Not for production yet. I'm in the process of moving our infrastructure from VMs that no one was really managing to Kubernetes. We're about to put together the production cluster and move over a couple of the applications that are fully ready and have been tested on the development cluster for a while.

Currently the cluster I'm talking about is being heavily used for QA testing purposes and Beta testing of applications. Kubernetes and a small operator I've written allows us to dynamically create QA servers for issues as developers finish their changes. Beta testing is easily deployed by a helm chart I wrote up.

Next we'll start up the production cluster and use it for internal applications and finally we'll be moving over our customer facing applications.


I've had small-ish docker swarms in production for a couple of years as well, and I really don't understand why it doesn't seem to be popular at all. I feel like I need to move to K8S just because swarm seems to be going away, but I'm really not seeing the technical advantages at all.

If someone could point me to an article explaining why k8s is so much better than swarm, I'd really appreciate it. Are the big advantages only at 100-node scales?


I'm also constantly surprised at how unpopular docker swarm is given that everyone already uses docker itself. Why do you think swarm is going away though? I love the idea of just using my docker compose file as my deployment config.
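To make the appeal concrete: a single (hypothetical, illustrative) compose file can serve both local development and swarm deployment, with the swarm-only parts ignored by plain `docker-compose up`:

```yaml
# Works for local `docker-compose up` and for
# `docker stack deploy -c docker-compose.yml myapp` on a swarm.
version: "3.7"
services:
  web:
    image: nginx:alpine
    ports:
      - "80:80"
    deploy:               # swarm-only section, ignored by plain compose
      replicas: 3
      update_config:
        parallelism: 1
        delay: 10s
```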


My company tried using Swarm 2-3 years ago and ran into the problem that it didn't actually work. Containers would just go missing from the network. Consequently we switched to Kubernetes. I imagine it does work now but it seems to be too late.

I've recently started using Kompose to autogenerate Helm charts from docker compose files and I've found that pretty satisfactory.


I'm not familiar with Docker Swarm, but the Kubernetes API is the major strong point for me (not saying that it's perfect): it has the right abstractions (CronJob, StatefulSet, ..) and is extensible (Custom Resource Definitions). There are many ways to run containerized workloads (ECS, Mesos, Docker Swarm, ..), but the de-facto agreement on the Kubernetes API is a game changer: now we can start building things on top of it :-)
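As a concrete example of those abstractions, a CronJob is declared directly as a resource; the manifest below is a hedged sketch with made-up names (batch/v1beta1 was the current API version around the time of this thread):

```yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: nightly-report
spec:
  schedule: "0 2 * * *"        # standard cron syntax, 02:00 daily
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: report
              image: example/report:latest
```

The same declarative shape extends to user-defined types via Custom Resource Definitions, which is what makes the API a base to build on.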


The second version of Swarm (the first version built into the stack) was really unstable. It got a bad reputation very quickly in one area that it could not afford to.

That said, I was at a conference talking with the CIO of a K8s-related company, back during the initial days of that startup ecosystem. I was told off the record that Google and Redhat were offering a "marketing budget" to companies that would come on board the Kubernetes ecosystem. The person rightly stated that K8s was going to roll over everyone else because of it.

Can't validate that - it's hearsay, but I definitely think that it was a big factor


"Going away" might be an overstatement, but I don't see much evidence of it being in use. When I search for help on topics, I don't see much beyond the primary docs. There doesn't seem to be much of a community for swarm out there, and now even the desktop versions of docker come with k8s.


Swarm is not going anywhere.


Or you could just learn how to manage infrastructure the old-fashioned way, which was never broken for small business and mid-sized enterprise environments. The only time you need the complexity and overhead of something like Kubernetes is when you are truly large, or when you have caught the in-fashion disease.

It's quite simple for a 20-year SA to stand up a highly integrated environment with modular monitoring, directory service, virtualization, and hybrid cloud options for all services in a week. Why don't you hire one of these for the job instead of recipe/containering yourself into 'doesn't work, I dunno' posts.


>Why don't you hire one of these for the job instead of recipe/containering yourself into 'doesn't work, I dunno' posts.

Because every.single.one of these "integrated environments" I've ever come across was an objective mess: poorly documented and littered with tech debt.

It was clear the "20-year SA" had forced 20-year-old administration abstractions and ideas on top of modern infrastructure and application concerns. It was cheaper/better/easier to throw it out and rebuild on something like k8s than to make any attempt at "scaling" the existing solution.

You're simply trading the "in-fashion" disease for the "I'm a 20-year Linuxbeard, I know best, and no one tells me different" disease.


There is some of that, certainly, but you can misapply any solution and perform poorly using any tool or personality.

For your information, I find the new breed's k8s hype to be more of a reaction than an educated, seamless, and practical systems approach.


Why not hire an SA? Because K8S is free and runs well from small to large and is available on every cloud where the IT infrastructure already is.

Why is it better to spend money to rebuild a fraction of K8S with a patchwork of infrastructure put together by a single person?


Sales talk. This doesn't reflect reality, sadly.


What's sales talk? What's not reality?

K8S can replace quite a lot of sysadmin responsibilities and is currently doing so at thousands of companies.


IMO the commonality at thousands of these companies is a relatively uncomplicated application stack requirement or micro services.

If you have ever had to talk down execs bitten by the k8s hype in a scientific compute or HPC environment you would know that the marketing and sales talk is both prevalent and damaging.


I never used Docker Swarm (so can't compare), but I don't fully understand your point about Kubernetes cloud lock-in. Certainly there are important differences in networking, load balancing, persistent volumes, and other cloud features, but that's not something any platform can just hide/eliminate (e.g. think about AWS ELB/ALB/NLB vs Google Load Balancer). The Kubernetes concepts (Deployment, Ingress, Service) still work mostly the same for the user across clouds. Some other details like non-standardized Ingress annotations are obviously due to not having them agreed in Kubernetes core API (nginx ingress supports other annotations than say Google LB or Skipper).


> I don't fully understand your point about Kubernetes cloud lock-in

The kubernetes folks describe a tentative solution to cloud lock-in here: https://kubernetes.io/docs/concepts/cluster-administration/f... OP isn't the only one with those concerns.

It would be nice if you could switch your cluster load between any of the cloud providers, or your own on-prem setup, as you go. For instance, I could see people wanting to have a default small cluster on their on-prem setup, and be ready to scale on cloud when needed.


Kubernetes federation doesn't really help with lock-in. It's for multi-cluster management. It's also kind of stalled / going to be rethought; it has too many issues to be used at scale. Some may still like it of course.

There are two points of lock-in: beneath Kubernetes or on top of Kubernetes.

You still need to install and manage your cluster, so the low level lock-in of a distro is hard to avoid. What you may be looking for is kubeadm. https://kubernetes.io/docs/reference/setup-tools/kubeadm/kub...

It is the closest primitive to help standardize “here’s how to install and run a K8s cluster in a certified way”. It’s unfortunately incomplete.

As for what's on top of Kubernetes, that will be an area ripe for competition and lock-in.


Federation is a solution for multi-cloud and hybrid cloud, which is a different problem than cloud portability.


Most of the kubernetes toolchain provides nice support for delineating the requirements from separate cloud providers, too. Compared to most alternatives, a little HELM magic to support hybrid cloud installations is a piece of cake.


The kubeadm api for ”phases” going beta in 1.12 -> 1.13 has actually made rolling k8s clusters (almost) a breeze.

It used to be really clunky, but these days all you need is a simple bash script or ansible play (or whatever you’re comfortable with) to get going.

But yeah, no unix philosophy vibes from k8s as a whole...


Kubernetes does not provide functionality like logging & monitoring out of the box. The way this works in practice is through a bunch of open-source solutions like Prometheus & Fluentd.

Actually, I have barely seen any Kubernetes cloud provider offer a meaningful service that could lock me in; they are basically managed Kubernetes clusters with their cloud services as plugins. You can verify this by comparing GKE/AKS/EKS; you'll find they are almost the same thing.


Out of curiosity, why do you need RHEL or subscription from Canonical for the production Kubernetes setup? What's wrong with plain Ubuntu or CentOS?


It’s common to pay for things to make them easier to configure/manage.

Red Hat OpenShift on RHEL, Pivotal Container Service on Ubuntu, Red Hat’s nextgen CoreOS based Kubernetes, Canonical’s Charmed Kubernetes Distribution on Ubuntu, etc. all have different config management , install, upgrade, patching mechanisms that vary from Ansible, to Terraform, to BOSH, to Juju. Some handle PXE bare metal, some don’t. Etc.

There usually are free / no pay versions of the above that you can use self-supported, but then you’ll also need to coordinate your own upgrades and use community forums for q&a rather than being able to contractually have someone looking out for you and answering your questions.

If you'd prefer to avoid lock-in, all of that plumbing would otherwise have to be configured and scripted yourself with your chosen toolchain plus the newer "k8s small tools" like Kubeadm, Kops, Kube-spray, etc.

As the old saying goes, open source is only free (as in beer) if your time has no value.


> It’s common to pay for things to make them easier to configure/manage.

Yes but in the case of RedHat specifically those goals are not achieved.


I’m sure IBM will make it better.


You probably don't want to configure kubernetes manually... Kops is a thing though, but there's still a lot of potential for error if you want to go to production.


This. We use Rancher within our data center, because configuring and managing k8s is not trivial. Rancher eases a lot of that pain for our small team.


Digital Ocean's K8S offering is out of beta now: https://www.digitalocean.com/products/kubernetes/


Migrated my very small cluster from GKE to DigitalOcean's K8s a few weeks ago. I was using 3 nodes on GKE with 1 core & 3.75GB RAM per node, and the cost was around $100 per month including a load balancer for the cheapest region, `us-central1-a`. Now, on DigitalOcean, I have 3 nodes with 1 core & 2GB RAM per node. The cost is exactly $40 including the load balancer.

I am a pretty basic user; I started using k8s on this project as a learning exercise, and $100 was too much to pay for learning. Now on DO I get a similar cluster for less than half of GKE's price, and I feel it is worth it, considering all the simplicity and observability of deployments. Also, DO lets me select regions without any price difference, so I was able to pick Amsterdam and get 10 times better latency from where I live. My setup is quite basic: my app with around 8-10 pods, plus additional stuff such as cert-manager and prometheus.

YMMV, but so far I am really happy with DO's offering, both in terms of performance and simplicity. I am not a power user and definitely operate at no scale, but using DO in general is much simpler than using GCP with GKE.


The problem with that is that I can almost guarantee that it would still be cheaper and easier to manage if you just leveraged whatever cloud provider's managed service was there to run your stuff.


Probably yes, but that approach has its disadvantages as well.

First, the biggest problem I see is the huge vendor lock-in you accept with PaaS offerings such as AWS Elastic Beanstalk or GCP App Engine. When you commit to one of these platforms, it is really hard to get out of it; it requires engineering effort to move to another provider, and feature parity between the providers, for your application to be supported. Plus, you get to learn platform-specific stuff which has no standard across providers. Plus, it is usually slow and bloated; have you ever tried deploying something to Elastic Beanstalk? It takes at least five minutes without any meaningful information about what is going on or whether your deployment succeeded.

Second, the tooling you get is usually very limited compared to what the Kubernetes ecosystem has. Each platform asks you to use its own tools, but there is a high chance that the tools don't fit your use case, or you may need to modify your workflow. With a solution like k8s, you only need to support the standard, which is roughly k8s itself, and you are free to use whatever tooling you want.

Third, done right, Kubernetes allows you to move to another provider very easily without changing a single line of code in your Kubernetes definitions or your application. You define the desired state of your cluster and check all of it into your VCS, and since k8s forces you to do this from the beginning, at the end you usually have a nice, reproducible system that is more or less cloud agnostic. You get logging, horizontal scalability, isolation, easy deployments, easy rollbacks and all that stuff. I migrated from GKE to DO's Kubernetes offering without changing a single line in my Kubernetes definitions or my application. Of course, my use case is very, very small compared to most people around here, but that was my experience.
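For illustration, here's roughly what such a portable definition looks like; the app name, image and ports are made up, but the same manifest applies unchanged whether the cluster runs on GKE, DO or elsewhere (only provider-level details like the load balancer implementation differ):

```yaml
# Hypothetical app; names/image are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: registry.example.com/web:1.0.0
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: LoadBalancer   # each cloud provisions its own LB behind this abstraction
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 8080
```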

FWIW, I think Kubernetes is still worth learning to understand the current state of infrastructure and deployments and the ideal state we all try to achieve. Whether or not a business should depend on it is a whole other topic.


For PaaS I thought the Heroku model was nice - the benefits of containerisation built into the stack and you don't have to manage any of it - ahead of Fargate and way ahead of K8s. On par with serverless, but with better compatibility with monolithic or partial microservice architectures, albeit at higher cost.

There's no strong vendor lock-in either, buildpacks and backend services are much of a muchness across Dokku, Herokuish, Flynn, Cloud Foundry etc. If your app is 12-factor with externalised state, you're plain sailing with most PaaS and simple docker setups, or at least I don't get what K8s brings to the table in terms of operational simplicity.


What kind of managed services would you use?


DO K8s is pretty neat, but last time I checked it did not have the metrics server (CPU/mem metrics, also "kubectl top") yet.


It's good, but the storage layer has some bugs. For example, if you create a PVC, then resize the volume according to their docs, the new size isn't reflected in k8s. Also you can create PVs manually and they won't show up in the dashboard.
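For context, a resize is normally requested in-cluster by bumping the storage request on the claim; a rough sketch (names are illustrative, and `do-block-storage` is, as far as I know, DO's default storage class):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: do-block-storage
  resources:
    requests:
      storage: 10Gi   # editing this field is how a resize is requested in k8s
```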


Hi, I'm the maintainer of the csi-digitalocean driver. Thanks for the feedback! Unfortunately "resizing" is still not supported by csi yet (the resizer sidecar is still in development), hence there is no way to provide the resize functionality to the customers. Once the `external-resizer` sidecar is finalized, it'll be part of the csi-digitalocean driver.

When you say dashboard, which dashboard do you mean? Happy to look at it. Thanks again for the feedback.


I am a developer and I find k8s frustrating. To me, its documentation is confusing and scattered among too many places (best example: overlay networks). I have read multiple books and gazillions of articles and yet I have the feeling that I am lacking the bigger picture.

I was able to set it up successfully a couple of times, with more or less time required. Last time, I gave up after four days because I realized that what I need was a "I just want to run a simple cluster" solution and while k8s might provide that, its flexibility makes it hard for me to use it.


Have you used other google products? I find their documentation routinely incomprehensible and difficult.


Agreed! I am an engineer and have written documentation off and on throughout my career. I'm continuously dismayed at the incomprehensible documentation generated by most companies. Google's documentation is particularly bad though.


I have a theory that the type of people who make it past the google interview are smart people who are bad at teaching. Like they get all the concepts, algos etc.. but when it comes to distilling it into an Explain-Like-Im-5 tutorial, it just goes to hell very quickly.

What they need to do is hire some people who are great teachers, explainers etc.. Avoid people who rely on already attained technical knowledge, design patterns, algos etc.. to pattern match on new tech to instantly grok it. The 'noob' people who question the engineers who designed the tools and ask a ton of dumb questions about how it works so they can then translate it into everyday tutorial paragraphs.


Kubernetes has always had an identity crisis.

Who is it aimed at, app developers or platform operators? Clear, obvious contracts between the two roles are valuable, even if you decide to combine them.

I'm moderately hopeful that Knative will help in that regard, as it is more conclusively oriented towards the developer. But I am wary that since it leaves the implementation details completely visible, it may not achieve that goal.

Disclosure: I work for Pivotal, we have products based on both of these.


> app developers or platform operators

Definitely not the former. The YAML-based configuration is not a pleasant app deployment experience. Companies end up needing to do some sort of auto-generation for it to make it sane for app devs.

App developers want experiences similar to heroku. They want to git push and have applications safely roll out without downtime or configuration.


I’m a little behind on Cloud Native adoption so I gotta ask - what’s preventing a “git push to heroku” from being the norm here? Are we using the wrong abstractions? Or are the abstractions still too low?

...why is Heroku/buildpack not running away with it?


The main alternative was and still is Cloud Foundry (by way of disclosure, I've been in and around CF for years). But it's always been pitched to enterprise customers and approximately zero effort has ever been made to expand awareness outside that world. Kubernetes had Google's massively powerful halo blowing in its sails.

> ...why is Heroku/buildpack not running away with it?

I predict that buildpacks are going to make a big comeback, because they make everything about Day 2 much easier. Pivotal and Heroku have been cooperating on the Cloud Native Buildpacks[0] spec under the CNCF sandbox process. There is some seriously cool stuff coming down the pipe.

[0] https://buildpacks.io/


Personally, I stopped using Heroku because it is more expensive than manually deploying to something like Digital Ocean. I also found dealing with third-party services that were here today and gone tomorrow a little irritating. Maybe things have improved recently. I haven't used Heroku for a few years.


My team has configured gitlab with post-commit hooks so that pushes to dev branch get compiled/packaged/deployed to the dev environment with a simple git push.

Also, I don't find YAMLs bad for deployment.
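As a rough sketch of that kind of setup (the registry, image and branch names are made up; `CI_COMMIT_SHA` is a built-in GitLab CI variable), a `.gitlab-ci.yml` along these lines is all it takes:

```yaml
stages:
  - build
  - deploy

build:
  stage: build
  image: docker:stable
  script:
    - docker build -t registry.example.com/myapp:$CI_COMMIT_SHA .
    - docker push registry.example.com/myapp:$CI_COMMIT_SHA

deploy-dev:
  stage: deploy
  image: bitnami/kubectl:latest
  script:
    # roll the dev deployment to the freshly built image
    - kubectl set image deployment/myapp myapp=registry.example.com/myapp:$CI_COMMIT_SHA
  only:
    - dev
```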


There's no question that you can build higher-level deployment strategies on top of the system (or any other system). It's just not baked in.

Also, YAML in general is not a bad thing, but kubernetes presents you with a ton of boilerplate and a complicated DSL. There's a lot of rough edges (annotations don't get validated, for example).


Achieving that is a difficult task, though. Personally, I'd not like to rely on an abstraction on top of a system of that complexity for production, because I expect to run into situations that can only be solved with deep knowledge of k8s.


I feel like this separation is achievable, if only because it's been achieved several times already.

Google App Engine did it. Heroku did it. Cloud Foundry did it. In none of these situations does an app developer need to know or care how the bits are plumbed, they only need to enjoy hot and cold running code.


Kelsey has the answer for you: "Kubernetes is a platform for building platforms. It's a better place to start; not the endgame." (https://twitter.com/kelseyhightower/status/93525292372179353...)

As an application developer, you probably also don't work with the Kernel and syscalls directly (anymore), so I guess you can expect higher abstractions and a smoother experience for Kubernetes in the future.


Kubernetes doesn't specify anything about overlay networks. That's up to the CNI provider. Are you referring to flannel's documentation?


Perhaps there is a good Pluralsight course? Sounds like a tech you have to invest many hours to learn.


I don't understand all the negative comments here, K8S solves many problems regardless of scale. You get a single platform that can run namespaced applications using simple declarative files with consolidated logging, monitoring, load-balancing, and failover built-in. What company would not want this?


this is too broad. i think that may actually be the problem: in theory it can do a lot of things, but in the real world it’s hard to get all those theoretical benefits.

for me, if you’re in the cloud you don’t need k8s. your favorite cloud provider has already figured out logging and monitoring and the basic things you need to get going. (another story if you run on bare metal)

if you’re not running a legacy app you don’t really need containers either. containers are great for legacy apps, for poorly written software or if you like overengineering. the abstraction you need is called a vm. use it. (again if you are in the cloud).

your app/service/thing is not as complicated as you think it is (or at least it should not be). I see a lot of people feeling like they need to experiment with new technology, on the job, on whatever they are doing now. actually building something that works and is simple as fuck seems to take a backseat and these types of people will create a narrative around using the new flashy thing. this is how you end up with production systems leveraging tools in beta and you end up closing shop when you finally figure out that you don’t have the resources to understand and maintain what you’ve created.

there is a time and place to experiment and learn. on small projects or on your own time. it takes experience to understand the hype cycle and to distinguish good tech from the hype.

as for k8s? yes, it solves some problems but it also creates others. do you like basically spending the time you’ve saved on setup and deployment to maintain/troubleshoot/upgrade your cluster? knock yourself out.


There is a very big gap between IaaS and PaaS. K8S is an abstraction on top of VMs so you can have a customizable PaaS that runs on YAML code. It has nothing to do with how complex your app is because K8S is about running it with less work in a declarative fashion. I'm currently in and have worked with dozens of startups that have saved lots of time by removing all the ops overhead with K8S because it runs the servers and we can just deploy our apps.

It seems like most of the problems are actually about installing and running K8S software itself, but then 95% of companies won't be doing that and using the managed offerings instead. This is no different than companies using the cloud over running their own DCs.


"Every sufficiently large K8s deployment contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of a PaaS"

K8s may well be the best foundation upon which to build a PaaS, but I think building a PaaS should be met with the same eye-rolls as writing your own crypto. Use Heroku or GAE or Elastic Beanstalk or IBM BlueMix or Azure PaaS or Cloud Foundry or Openshift.


It's just YAML files, there's nothing to build.


It's just YAML until you need:

* multi-tenancy

* auto-scaling

* security auditing

* automated os/container patching

* multi-tenant self-service route/ingress management

* multi-tenant self-service logging, monitoring and alerting

* multi-tenant self-service databases

and so on


You can go entirely without YAML in a PaaS, unless you feel you want it.

Just push code. That's it. That's all you need to do.


How is that different though? You have to setup the PaaS the first time, the same as you write the YAML file the first time.

After that you push code regardless of destination.


I think we have different ideas of what a PaaS is.


From a developers' perspective, k8s feels like a holy Grail. Having fully embraced it with my latest rails app, I can say confidently that I've never had a more straightforward and enjoyable experience than K8s. It's absolutely the correct abstraction layer for me; it gives me all the power I could ask for in as concise a definition as I could possibly expect.

I think a lot of the complaints against K8s are from the ops side of things. In my org, I don't actually run or upgrade the K8s cluster myself, so those pain points aren't mine to bear. When you're running your own k8s, the operational complexity of managing the cluster itself is not trivial and the change in mindset for traditional sysadmin types is a substantial hurdle.

My own take: K8s (or something very much like it) is absolutely the future, but the operational challenges of migrating to it at this time should not be ignored if you want to run it yourself and have existing ops experience. This will only get easier over time as tooling improves and sysadmins start seeing that this is the future they have to embrace.


Agree, although I don't think the Ops portion is that hard either, at least certainly not that different from all the other complex software that used to be installed and maintained. I feel it's just the usual pushback against change and general commoditization of IT that's leading to most of the complaints.


I think I agree with this also, but my experience is limited enough to allow for the likelihood of additional complexity and gotchas that will become evident over time.

I'm sympathetic to concerns of ops, having worked in that space before working as a developer. It's tough to dive into a new approach and discard existing, well understood solutions. It's an especially hard sell when the benefits of the new tech are opaque and/or don't solve problems you personally have (until I actually used k8s for my own project, I really didn't understand the hype either).

I'd suggest though that the "usual pushback" is a powerful force on its own that may make k8s a poor fit for an org (right now). If some portion of the staff - especially those responsible for reliably maintaining the infrastructure - is resistant to the approach, rushing adoption is highly likely to lead to failure (if for entirely non-technical reasons). The benefits of k8s will only become more apparent and more easily realized over time, and the opportunity cost of waiting until there are fewer "Kubernetes Failure Stories" on HN might not be all that great.


I very much agree that kubernetes is useful in an environment that doesn’t need to scale, but do tell how it enables consolidated logging and monitoring, since my medium/small shop is spending quite some time setting up our own infrastructure for it.


Installing a managed log ingestor is stupidly easy in Kubernetes. For example, on GCP here's the guide to getting it done [1]. Two kubectl commands, and you get centralized logging across hundreds of nodes in your cluster and thousands of containers within them. Most other platforms (like Datadog) have similar setups.

Infrastructure level monitoring is also very easy. For example, if you're on Datadog, you flip KUBERNETES=true as an environment variable in the datadog agent, and you'll instantly get events for stopped containers, with stopped reason (OOM, evictions, etc), which you can configure granular alerting on.

Let's say you're in a service-oriented environment and you want detailed network-level metrics between services (request latency, status codes, etc). No problem, two commands and you have Istio [2]. Istio has Jaeger built-in for distributed tracing, with an in-cluster dashboard, or you can export the OpenTracing spans to any service that supports OpenTracing. You can also export these metrics to Datadog or most other metrics services you use.

[1] https://kubernetes.io/docs/tasks/debug-application-cluster/l...

[2] https://istio.io/docs/setup/kubernetes/quick-start/


I will admit that these things are slightly easier on Kubernetes; my original point was mostly that Kubernetes itself doesn't really provide any of these things in a meaningful way - you just described a bunch of separate, nontrivial systems that solve many but not all logging/monitoring needs.


I run a Filebeat container with privileges to read stdout/stderr of all other pods, which then forwards to ElasticSearch. (https://www.elastic.co/guide/en/beats/filebeat/master/runnin...). It's fairly straightforward; then Kibana + Watcher can ship alerts to PagerDuty based on log patterns / queries / limits, etc. I think Watcher is open-source/free now?

I also have Prometheus + grafana, which similarly collects lots of stats from around the cluster, but I'm fairly sure I'm the only person who uses that dashboard, since the only things hooked up to Prometheus are databases and such, no internal applications (yet!).

Being able to aggregate stdout/stderr across dozens of machines previously would have cost either a lot of Chef setup time or a contract with some provider. Now I get a fairly straight forward open-source stack that can be refined over time, and the yaml re-used very easily in any cluster. Plus, the metadata collected from Kubernetes about each log line is extremely useful (For example, out of the box you can query by Kubernetes labels for your graphs etc)


Are you running it yourself? The K8S dashboard gives you logs and basic monitoring out of the box, or you can get logs directly from kubectl.


We are. I guess I wouldn’t consider that enough for our purposes. Actually retaining and alerting on logs, or alerting on anything for that matter is not out of the box unless I’m missing something.


The typical approach is to setup Fluentd for logging. You set it up as a daemonset, and have it mount /var/docker from the host. That gives it access to all container logs, which you then stream to your desired store.


Yeah - that’s far from batteries included though, and comes with many limitations, especially for non-12-factor apps. It also doesn’t begin to answer questions about what that log store is, or how to alert on the contents of logs.


Logging and monitoring are not built in to K8S, at least not something you would rely on for operational purposes.

I believe most people use an EFK/ELK stack for centralized logging and Prometheus for Monitoring.


The k8s hype feels like the Hadoop hype from a few years ago. Both solve problems that most don't have and there is a lot of complexity - some due to the nature of the problem, some because everything is new and moving.

Of course it's 2019 and you have to migrate Hadoop to run on k8s now :)

My impression is that if you are a small shop and have the money, use k8s on google and be happy, but don't attempt to set it up for yourself.

If you only have a few dedicated boxes somewhere just use Docker Swarm and something like Portainer.


Docker swarm is really nice. I wish it had more traction. I fear it's going to be dropped and leave me holding a bag full of bugs.


Swarm isn't going anywhere. It has a growing community and the team is actively working in the repos. See my updates: https://www.bretfisher.com/the-future-of-docker-swarm/


Beyond strictly runtime failures, 2018 feels like the year that most of my friends tried kube but not everybody stayed on.

The adoption failures are mostly networking issues specific to their cloud. Performance and box limits vary widely depending on cloud vendor and I still don't quite understand the performance penalty of the different overlay networks / adapters.


The network is a performance-sensitive system, and each layer you add adds latency.

Consider a traditional monolithic application. In comes your HTTP request in one end, a bunch of cross thread communication happens, and database queries come out the other end. With that, you have 2 points of network communication.

Now with micro-services, you might have 4 or 5 applications needed to replace the above monolith. Throw in a service mesh on top of your cloud provider's SDN and you've turned 2 points of network communication into 20 or more: the 5 micro-services talking to each other, plus the service mesh sidecars talking to each other. Add on top the additional processing overhead of maybe 1 to 2ms per hop, and you've just added at best 10ms of round trip time to get to your databases, plus some more CPU. And to what benefit? TLS? You can do this in your application, or trust that your private network is private. Tracing? You can do this with PID matching and watching the kernel's networking stack.


So true. For some low latency applications, anything above the bare minimal virtualization is not acceptable.

For what I do, in theory, many things should not impact results. In practice, anything that upon measurement impact results is stripped away. Think A/B testing but for every single component - including the major version of say the python interpreter.

That's how you end up running many things on bare metal.

I'll say the future is not serverless but cloudless


If you do trading or adtech (which, let's face it, is trading) then yes, the overhead from virtualisation and cloud environments is an unacceptable tradeoff.

For most companies and their operating environments cloud gives flexibility in capacity planning, freedom to experiment wildly, access to practically unlimited storage[ß], plus reliable perimeter load balancing. And of course, for most companies engineering costs far more than compute. Paying the premium for cloud environment makes business sense if it means your development teams don't have to spend time waiting for available resources.

Funnily enough, even a number of high-end trading firms (including HFT shops) are moving to cloud. Number crunching for models, backtesting, CI, analysis and simulation pipelines, ... All of those require resources and rarely need to operate in tandem with the real-time trading systems. The same flexibility, ease of capacity expansion and freedom to bring transient resources online as needed saves on expensive development time.

If you're big enough, or need to operate fast enough then cloudless and baremetal are going to bring better ROI. Most companies aren't at these extreme ends, though.

ß: within limits; if you need to produce and store petabytes of changing data on a daily basis...


I would argue that the longtail of applications does not really care about the impact of overlay networks. For us, the biggest impact on low-latency applications (where 1ms makes a difference) on Kubernetes was disabling CPU throttling in all clusters (you can also remove container limits). Background: a Kernel CFS quota bug leads to throttling even if quota is not yet reached, see https://www.youtube.com/watch?v=eBChCFD9hfs&feature=youtu.be...


Overlay networks compound unstable networks. If your internal network latency spikes to 2ms from 0.5ms, normally that's not a huge issue but if you have micro-services that need to talk to each other- using my example, a 1.5ms round trip time would cause an additional 8ms of latency.

Sure, that's a low number but if you already have 150ms of processing, adding another 10ms might cause issues.

Also, if you're disabling the resource controls on kubernetes- you're kinda defeating the whole point.


I'll be retired by then... and I disagree that it will be cloudless, but for some compute the cloud is a risky option being forced upon them by exec-level marketing and solutions hype (like k8s and docker).

Some of the old hands always keep a colo + baremetal in the back pocket for always on and base testing; only pushing to the cloud after due diligence in comparative testing. That's a more realistic approach than 'cloudless' for scale.


Trusting that your private network is private turns that whole network into a candy store once a beachhead in that network has been established.

Defense in depth exists for a reason.


If you're a smaller organisation, then spending too much time on "what if my private network is not private" will also cause you to struggle.

For most people, in most practical scenarios, you have to hang your hat on something.

Yes, take basic precautions, but if you lacked the chops to keep your private network private, then you have little or no chance of preventing the ensuing attacks.


> adoption failures are mostly networking issues specific to their cloud

Do you have any pointers/write-ups with more information or plans in this direction? I would be interested to learn more.


This article is about the feature completeness of the different managed Kubernetes offerings:

https://kubedex.com/google-gke-vs-microsoft-aks-vs-amazon-ek...

It's pretty easy to dig up speed tests of the overlay networks, but a lot of these are just rating userspace overlay networks. The new hotness is the plugins provided by the cloud vendor which integrate with their SDN, and I haven't seen a good benchmark for those yet.

Most interesting reading will be to look up managed kube networking plugins on github and look for open/closed issues with lots of stars.


A team at my work has spent a stupid amount of time trying to nail down networking issues with hand-rolled k8s in AWS. Had to move away from using node ports to fix it. Total pain in the ass.


I managed multiple Mesos+Marathon clusters in production for a little over 1.5 years, and when I switched over to K8s the only thing that felt like an improvement was the kubectl CLI.

I really liked, and miss, the simple beauty of Marathon, where everything was a task: the load balancer, the autoscaler, the app servers, everything. I think it failed because provisioning was not easy, it lacked first-class integrations with cloud vendors, and the documentation was horrible.

Kind of sad to see it lose the hype battle; since then even Mesosphere has had to come up with a K8s offering.


I've started the planning phase of a Kubernetes course, geared toward developers more so than the enterprise gatekeepers. As I read stories like these, I jump between different thoughts and feelings:

1) no matter what I think I know, there are too many dark corners to create an adequate course

2) K8S is such a dumpster fire that I shouldn't encourage others

3) there's a hell of an opportunity here

Thoughts? Worth pursuing? Anything in particular that should be included that usually isn't in this kind of training?


All three. It’s a gold rush, but as with any gold rush, conditions are hard going - that’s why there’s an opportunity.

The best way to think of Kubernetes is that it was designed to be a successful open-source project, widely adopted as a standard foundation to build products on. It wasn't designed to be a usable product on its own.

We are at the equivalent of the pre-1.0 Slackware, SLS, Debian, and Red Hat stages of GNU/Linux distros circa 1994. Red Hat eventually ran away with most of the money by the late 90s, but in the meantime there was lots of opportunity to fill an unmet need.


Don't forget SuSE, the sole surviving competitor. (Best Buy SuSE Linux gecko box, 2.2.14-kernel veteran here.)


As a person who loves tech writing, who owes his career to free coursera courses and online tutorials, and who is eager to teach people, this is an insanely difficult thing to get right.

Writing an ok tutorial isn't good enough. Writing an amazing tutorial is fine, if it is on a platform people know (such as LinuxAcademy, Pluralsight, or something similar).

I once wrote an article on getting started with a static website generator. I received a ton of praise in the comments, saying how great the step-by-step instructions are, and I felt great... Only to discover that I made a typo in one of the commands, and that if you actually went through the tutorial, there's no way you'd get past that one step, unless you knew what you were doing (in which case you wouldn't go through a getting started guide, most likely).

All I'm saying is, unless you can write amazing content on a platform where people go to learn and advance their careers, no one's gonna use it, I'm afraid.


I think a good course would be setting up a k8s cluster for a simple "hello world" production app, which then includes topics perhaps about monitoring, upgrading, etc all the kind of stuff you want to know for getting an app up and running.


A better example would be a Wordpress installation. Hello World means skipping much of the important stuff, like storage and persistence.

As a bonus, show how to use Gitlab for deploying and managing the app. Gitlab + Kubernetes could be the holy grail for modern, self-hosted development, however a good, complete tutorial/documentation is very hard to come by. One has to pick the pieces from a lot of different places with sometimes conflicting information.

I'd happily pay 100 Euros for such a course.


If it was easy, there would be dozens of courses out there already! Sounds like you've found a pain point you can solve.


I think it's ok not to know something before making a course for it, even if only to learn it better. But given (2), I'd be careful: pursuing this could lead to burnout.

There is an opportunity for anything infrastructure related.


The answer is always 3.


Kubernetes solves a problem that most companies don't have. That's why I don't understand why the hype around it is so big.

For the majority, it adds only a little value when you weigh it against the added infrastructure complexity, the cost of the learning curve, and the ongoing operation and maintenance.


I disagree, almost entirely. Kubernetes solves problems that every single cloud-based software company has.

What's the alternative? We spin up VMs, templated with AMIs, provisioned with an ASG? That works fine. But we want centralized logging. We want graceful restarts. We want automated rollbacks. The list goes on. These are not Google-scale desires, these are "cost of doing business" asks for any cloud company. You can start building all of this on that core architecture of AMIs, or your cloud provider's equivalent, but all you're going to do is re-invent what Kubernetes does, probably worse.

Kubernetes' problem isn't that it solves problems most companies don't have. The problem is that, precisely because most companies have the exact same problems, those problems could be solved in a simpler way than Kubernetes.


> Kubernetes solves a problem that most of the companies don't have

Actually most medium to large companies do have this problem.

There are often a lot of different languages, libraries, versions, deployment methods, etc. The appeal of Docker was that you could treat them all as black boxes. And the appeal of Kubernetes is that you have this rich support infrastructure to run them all hands-off at scale.

It definitely solves a problem. Just not particularly well.


In my experience most companies lack common conventions and automations.

Kubernetes "done right" is almost a part of your application. It becomes this "machine" that you throw stuff into and good stuff happens.

You'll need a team to integrate it into the pieces you require (auth, secrets, loadbalancers, permissions/app identities, monitoring and logging) but many places lack bits and pieces, and in my opinion k8s gives you a fast track to create a uniform application delivery platform.

What I don't like is that it kind of is the opposite of "the unix philosophy" and in that regard I prefer the hashicorp stack.


Those are your startups and web/app-tier shops. Yes, they routinely suck at sysadmin and need to be bottle-fed a solution that fits the scatter/gather shape of their business. They don't want ops discipline. They want a programmable solution that performs systems magic with a single toolset to learn.


No no, these are your enterprises I’m talking about mainly.

Places entrenched in manual processes for release and change management.

With true service delivery in a CI/CD fashion (including infrastructure, which should be codified), many of these manual processes become obsolete.

Don't get me wrong, the processes still exist; they are just sped up by an order of magnitude and automated.


Who said anything about manual processes? That's not what the modern SA does... in my experience it's mostly designing repeatable processes, creating recipes, and doing integration.

The real problem with the K8s and devops world is a lack of understanding that there is no magic pill in 'codifying' a bad system.


I did. I see it all too often, mainly at the larger places.

Modern SA is about knowing that your job is to help bring business value. Most of the time this comes down to automation.


Did you consider that where a manual process exists, it may be in place to bring combined human attention to what is surely (by 2019) a critical section, one that needs a consensus no monitoring hook can provide? Sure, everything is about automation, and it has been since 1999 in my experience.


Of course. I've been employed in and consulted for enterprise IT & dev for almost 20 years.

The amount of people heating office spaces at your non-tech large enterprise is astounding in my opinion.

I enjoy discussing the reasons for this, but it's a lengthy one!

In super-short: a lack of competency means IT support and tools are not used even remotely optimally. This lack of competency, which starts at the top, results in laughable lead times for the simplest of tasks and processes. This in turn has resulted in mass outsourcing and off-shoring of a bunch of tasks (processes) that really should have been automated years ago.

Usually these "service providers" have zero incentive to improve this, and things deteriorate even further.

The sad state of affairs is that many believe this is the way ”IT” works — slow and error prone.

Awesome example: one place, one of the 500s, built an on-prem "cloud" within a business unit: over 2000 physical Xen hosts. I wanted them to apply a patch. They refused. The last patch had taken 6 months to roll out. The process was: ssh to each server, scp the patch, run sudo to install the patch. The entire operation had been bought by a renowned "service provider". Ouch.

I could talk about this for weeks! Of course there are those who manage an awesome shop, but my experience is that these are usually isolated teams somewhat shielded from the craziness of big-money politics.


Agreed with your experiences. I identified my niche a long time ago: HPC and scientific development, enterprise and core internet services (DNS, IP routing), plus security (even though snake oil is big in sec now). The type of incompetence you describe doesn't flourish in these domains.


Also, David Graeber has some points with his "Bullshit Jobs".

I personally believe a society built on and around services in the end need either massive amounts of bullshit jobs or basic income of some sort.


>> I don't understand why the hype around it is so big

Probably because it started at Google, if it was created by IBM then we'd only hear about it on TV ads.


Don't agree. My current client is rather small, but I just had a meeting that would otherwise have concluded with "we'll have to set up 15 VMs before you can start developing, and have 5 alignment meetings before they're correctly set up", versus "I just created a namespace for you guys to do whatever you want in, and mailed you the link to the docs for the CI/CD and deploy guidelines".

They run a self-hosted OpenShift cluster, which is managed internally by a team of 4. Not only does this make it a lot easier to spin up new environments, it also forces devs to include the ops team from the start for stuff they don't know, so corrections can be made early on.


Because you can run your apps using simple YAML files, with monitoring, logging, rolling updates, load balancing, failover, and persistence built in?


I'd be interested in a related "microservices failure stories". Must be a big overlap with this.


I have two. One was caused by data inconsistency between services and regions. The other is more hypothetical: the microservices had gotten to the point where no one knew how to start the system if all services were down, and the services may have had circular dependencies to the point that a cold start would be incredibly hard.


I've actually seen your hypothetical in action, but the bug was even more subtle. Assume services A, B and C. A and C each need information from the other, which is usually cached. Normally you'd deploy one service at a time, so the call chain would go A -> B -> C -> A, or A -> C and then A -> B -> C, but in this particular instance A's and C's caches were cold, causing an explosion of service calls that took both services down.


> A and C both need information from each other

Sounds like a monolith pulled apart :-)


You handle the hypothetical by validating a new environment can be built and bootstrapped. Doing this on a regular basis, either by tearing down and rebuilding dev, or in a separate environment just for this purpose, is not a "nice to have". This same problem exists with monoliths with complicated dependencies, nothing new here.

Just like backups, if infrastructure as code isn't tested, it's worthless.


I think that if there is a genuine circular dependency then the services won't ever start. But I think it is possible to introduce services that assume other services are up and have an apparently circular dependency. The trick is to make every service resilient to its startup requirements not being met: the service backs off and waits if information it needs isn't yet in the environment, then asks again, so that when the other services come up everything syncs and the system comes up.


That's fine, until the environment doesn't come up because a hundred services all queried the source of truth simultaneously and caused load that prevented any service from getting its config


Which is easy to deal with by introducing a random factor (jitter) into the backoff.
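A minimal sketch of the idea: exponential backoff with "full jitter", where each service sleeps a uniformly random fraction of its backoff window instead of hammering the source of truth in lockstep (function name and defaults are my own):

```python
import random

def backoff_with_jitter(attempt, base=0.5, cap=30.0):
    """Return a sleep time (seconds) for the given retry attempt.

    Exponential backoff capped at `cap`, with "full jitter": sleeping a
    uniformly random amount within the window spreads out the thundering
    herd of services all re-querying their config at the same instant.
    """
    window = min(cap, base * (2 ** attempt))
    return random.uniform(0, window)

# Example: attempt 3 sleeps somewhere in [0, 4.0] seconds.
delay = backoff_with_jitter(3)
assert 0 <= delay <= 4.0
```

Since the sleeps are decorrelated, the startup load arrives smeared over the window instead of as one synchronized spike.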


> One is more hypothetical: the microservices had gotten to the point that no one knew how to start the system if all services are down,

This is a good one, and I have been thinking about it myself. Even smallish projects might have tens of container-based applications/services. In my case I end up with what is essentially a 3-tier architecture, with each tier being a group of containers/machines with their own rules for startup/shutdown.


I’d think the best way to handle a cold start is to have each microservice fail fast and be wrapped in a supervisor. They will converge to uptime.

Unless you have a truly circular dependency, at which point they probably should be collapsed into a single service.
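A toy model of that convergence (the services, dependencies, and supervisor loop here are all invented for illustration): each service fails fast unless its dependencies are up, the supervisor keeps restarting everything that crashed, and the system settles in dependency order without anyone encoding a start order.

```python
# Toy simulation of cold-starting fail-fast services under a supervisor.
# Service names and dependency edges are made up for illustration.

deps = {"api": ["auth", "db"], "auth": ["db"], "db": []}
up = set()

def try_start(svc):
    # Fail fast: the service only stays up if every dependency is up.
    if all(d in up for d in deps[svc]):
        up.add(svc)

rounds = 0
while up != set(deps):
    rounds += 1
    for svc in deps:          # the supervisor restarts every crashed service
        if svc not in up:
            try_start(svc)

print(rounds)  # 3 -> db, then auth, then api, with no start order configured
```

A truly circular dependency would make the `while` loop spin forever, which is the parent's point: at that stage the two services are really one.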


Microservices failure stories? “All of them. The End”


I'm not convinced all microservices ventures are failures. Having worked in the space a bit as founder/CTO of a vendor, I've seen many examples of misguided attempts due to fashion / CV-driven development / hype-driven development. The pattern is valid, just not for everyone at every time.


ThoughtWorks even has a term for it: "Microservice envy" https://www.thoughtworks.com/radar/techniques/microservice-e...


The good part of "microservices" is "services". The bad part is "micro".


When the definition turns to some kind of dogma, then it will fail.


I'm consulting on a microservices back end right now, with prior experience mostly in monoliths. What is the selling point that drives companies in this direction? It's insane; my client keeps trying to hire new developers and bring on more consultants to build this thing, but the amount of knowledge required is more than any one person can handle. I have similar issues with their choice of DB (NoSQL) and its inflexibility.


The biggest reason for choosing microservices _should_ be scaling development teams: microservices allow multiple teams to work on different code bases, without stepping on each other's toes.

What actually happens, though, is that (uninformed) people choose it because they think it brings them scalability (wrong), that it's more cloud-compatible (wrong), or, the worst offender, that it's more modern.


How does a properly designed microservice architecture not achieve greater scalability than a monolith? Of course you can scale up a monolith, but with microservices you can independently scale selected services, thus improving your benefit-cost ratio.


There's a great deal of overhead when communicating between services that isn't typically present in a more tightly-integrated monolith. Networks are very, very slow relative to RAM, setting aside the other costs involved in serializing/deserializing every communication.

And "properly designed" is very hard to achieve.

It's often much more productive to start with a monolith and experience the pain of extracting a piece that needs horizontal scaling, than to start by horizontally scaling everything.

Microservices can be a good architecture, but it's far from a no-brainer, even for high scalability. Servers are incredibly fast these days and can typically scale very well vertically.
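To put rough numbers on "networks are very, very slow relative to RAM" (typical published order-of-magnitude figures, not measurements from this thread):

```python
# Order-of-magnitude cost of a call, in nanoseconds (typical ballpark
# figures; serialization cost is an assumption for a small payload).

in_process_call_ns = 5               # plain function call within a monolith
same_dc_rtt_ns = 500_000             # ~0.5ms round trip inside a datacenter
serialization_ns = 20_000            # encode + decode a small message

rpc_ns = same_dc_rtt_ns + serialization_ns
slowdown = rpc_ns // in_process_call_ns
print(slowdown)  # 104000 -> a local call is ~100,000x cheaper than one RPC
```

That factor is why "extract a service once a piece actually needs horizontal scaling" is often cheaper overall than paying the network tax on every boundary from day one.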


I personally feel that microservices don't necessarily help multiple disconnected teams work on different codebases. In my mind the reason is similar to why code reuse is so difficult: requirements differ slightly between two things, so you either design one service to handle both, turning it into mini rather than micro, or you run two services, in which case you're just adding more plumbing to what could have been a separate monolith.

The "services" part makes a lot of sense in certain situations, and I feel like one of the best application architectures is what I call disconnected monoliths. Centralise and standardise core concerns - like authentication, external communications. Build monoliths for everything else.


If you achieve this scale of teams suggested, it means you have a lot of implicit gains here as well, right?

Such as higher quality, higher "velocity", separation of concern and hopefully a clear sense of ownership.

These are some of the things in my opinion that allows you to scale.


> If you achieve this scale of teams suggested, it means you have a lot of implicit gains here as well, right?

> Such as higher quality, higher "velocity", separation of concern and hopefully a clear sense of ownership.

No, it means you pay a huge overhead. Quality and velocity both drop as your day-to-day development requires a lot more setup and faff to do anything, and counterintuitively so does separation of concerns as your interfaces become more rigid. Small organisations should do things that don't scale, turn their size into an advantage.

If you think of your overhead as ax + bx^2 where x is the number of developers, microservices are a way to reduce b, but at the cost of a big increase to a. It makes sense when x is huge but not before. My litmus test would be: do you need to do multiple (unrelated) deployments of different services at once? If your organisation is small enough that you can get away with only deploying one thing at a time, you'll probably have less overhead if you work without microservices.
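The parent's cost model can be made concrete (the coefficients here are invented purely for illustration): with per-organisation overhead a·x + b·x², microservices trade a much bigger linear term for a smaller quadratic one, and only win past a crossover team size.

```python
# Overhead model from the comment above: cost(x) = a*x + b*x**2,
# where x is the number of developers. Coefficients are made up.

def overhead(x, a, b):
    return a * x + b * x ** 2

mono = lambda x: overhead(x, a=1.0, b=0.05)   # low fixed cost, worse scaling
micro = lambda x: overhead(x, a=4.0, b=0.01)  # big fixed tax, better scaling

# Find the crossover: the first team size where microservices are cheaper.
crossover = next(x for x in range(1, 1000) if micro(x) < mono(x))
print(crossover)  # 76 -> below ~76 devs, the monolith wins in this model
```

The exact numbers don't matter; the shape does: for small x the a-term dominates, so the architecture with the smaller fixed tax wins.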


Which is why you’ll need the input and help from experienced sysadmins that like to do development, which actually is ”automation”.

This done right reduces overhead and gives you an edge in repeatability and quality. Both are required to scale.


Automated deployment and the like is worthwhile whether your system is microservice or monolith. But no amount of automation can eliminate the overhead a network boundary brings to local development.


We view this a bit differently, and that's fine (not the automation part; there we agree).

Automation in this context is for me more than just the deploy bit, it’s also about testing and service management which includes for example service relationships and discovery.

If you could do local dev on that app as a monolith, it can most likely be done broken up in smaller services as well.

There are no silver bullets to be had anywhere, right? Just use whatever processes and tools that work, until they don’t I guess.


> If you could do local dev on that app as a monolith, it can most likely be done broken up in smaller services as well.

It's possible, but the overhead is a lot higher, and that weighs down everything you do. Your edit-test cycle gets longer, development gets slower.

> There are no silver bullets to be had anywhere, right? Just use whatever processes and tools that work, until they don’t I guess.

Nothing is perfect but often one choice is better than another. I've seen microservices go badly much more often than monoliths, and most successful microservice systems were built as a monolith first with services separated out only when it became necessary.


Yes, there is bound to be overhead as you describe. For some it might be worth the effort, but no doubt effort is involved.

Regarding monolith -> ms: I'm starting to think this is the way to do it. Once you have the flow of data already established and the (working) system(s) somewhat defined, it becomes easier to decouple bits and pieces.


I think your interlocutor's point is that type checking by a compiler is a lot simpler, more reliable, and more performant than any networked service-discovery scheme so far conceived.

I think microservice architectures can genuinely decouple teams to iterate faster and consolidate efforts. But it's not a free lunch.


I totally agree.

What I was trying to say is that _if_ you go down the many services route, a lot of automation and integrations will be needed, and perhaps in completely new places.

No free lunches ever... just a lot of hard work. :)


Microservices requires proper, structured systems management. Most seem to think "devops" and microservices are ways to ignore this, when in fact the complete opposite applies.

Make sure you have a true service delivery and service management pipeline in place where it is real easy for dev and ops to deploy and decommission services.

Service metadata is key in my experience. Not too much though, just enough (such as service owner, deployment scopes, a release database, etc.). Nothing gets deployed without this metadata present, and make sure to automate the creation and upkeep of these records.

Make sure convention over configuration applies; in most cases this is doable.

This stuff, in my experience, takes somewhere between 6 and 12 months to put into place, including the actual automation/orchestration (be it the HashiCorp stack, Kubernetes, DC/OS, Triton, whatever). Do not even think about moving to prod before everyone has agreed upon the conventions and the operational situation surrounding the stack and services.

Other than this, it's just code and integrations. Business logic. =)

In my last project we moved from 0 to 200 fully managed microservices using the hashicorp stack (plus a bunch of other stuff and homegrown things as well such as service metadata and release database/apis) and the biggest challenge was having everyone agree on the conventions.

Convincing the developers that this type of housekeeping would be necessary, perhaps not today or next month but down the line, took a few months. In the end it was a massive success that really accelerated the way we could deliver value to the business.

When it comes to the data and ETL part you get a lot for free if the above is done decently. From my perspective the data scientists and ML guys need this stuff as well to be able to deploy and modify ETL flows and data pipelines at will. They'll most likely want to deploy some python, R and/or shiny apps as well! =)

They (can) benefit greatly from being integrated into the same automations.

Selling points achieved in above mentioned "transformation"/project:

- 14,000 production deployments with 45 devs and 3 "ops" guys (dev, ops, devops?) in about a year. ("But why!?" someone will ask, and I'll be happy to respond.)

- Things started to happen, such as "couldn't we just provide this [insert awesome thing] to end users?", and what would have taken 6 months before could be pushed to canary days or even hours later.

- One specific feature I can think of increased revenue by more than a million dollars per month, and it took one of the teams exactly two days to build and release.

Granted, this was within a multibillion-turnover enterprise.

Most businesses would benefit from this transformation, but it requires a blend of people and technology that probably is difficult to attain outside the proper "tech" industry.

And my experience from within "small" tech is that focus sometimes is, understandably so, not situated around what I call systems management and service delivery.

Phew, that went on... sorry! =)


> Granted, this was within a multibillion turnaround enterprise.

> Most businesses would benefit from this transformation...

That is a great story from the front lines. Thank you for sharing.

My worry is when I see small companies or startups using microservices. That seems nutty to me.


The money bit was more about rolling out a change that meant an instant revenue increase in the millions, rather than people or product scale, but I agree that you need to put serious effort into service management, which might not make sense for a really small team.

Not sure where the natural threshold is, other than the point when you realize you can't deliver on business needs and requirements rapidly enough.

This is probably not in the beginning with a small team.


But why!?


Good question! :)

My thoughts and experiences:

Because once you are up to speed the business and devs work more in concert, which lets things flow fast, especially if changes are small and many, rather than fewer and larger.

As devs grow secure in the infrastructure (”it kinda just works” from their end) as well as the ability to roll back within seconds, deploying things to production becomes no big deal.

The ”separation of concerns” and containerization enable some of this, but only part is tech — I find a great deal is rooted in the culture and people of the same mindset working together.


Christian has already followed the example and created a similar list for serverless: https://github.com/cristim/serverless-failure-stories


Is there also a list for Docker failure stories?


IMHO this would be less interesting. Some people already run other container runtimes such as containerd with Kubernetes (e.g. Datadog: https://www.youtube.com/watch?v=2dsCwp_j0yQ), so Docker might stay as a user interface for local development, but I would not know what "Docker failures" would mean in the future.


Docker is using containerd under the hood as its container runtime component.


The third example on that list is a nice little short story. Simple mistake, gets right to the point.

I’m setting up a lambda test right now so I find it perfectly timed!


I run a single-node cluster at home. To handle updates, I just wipe the cluster with kubeadm reset, then kubeadm init, followed by a simple bash script which loops over the yaml files in nested subdirectories and applies them. I just have to make sure I only ever edit the yaml files and never mess with kubectl edit etc.

  for f in */*.yaml; do kubectl apply -f "$f"; done

with a directory structure of:

  drwxrwsrwx+ 1 root 1002 176 Jan 20 21:15 .
  drwxrwsrwx+ 1 root 1002 194 Nov 17 20:06 ..
  drwxrwsrwx+ 1 root 1002  68 Jan 20 20:50 0-pod-network
  drwxrwsrwx+ 1 root 1002 104 Nov  1 11:18 1-cert-manager
  drwxrwsrwx+ 1 root 1002  34 Jul 11  2018 2-ingress
  -rwxrwxrwx+ 1 root 1002  93 Jan 20 21:15 apply-config.sh
  drwxrwsrwx+ 1 root 1002  22 Jul 14  2018 cockpit
  drwxrwsrwx+ 1 root 1002  36 Jul  3  2018 samba
  drwxrwsrwx+ 1 root 1002  76 Jul  6  2018 staticfiles


I just went through all of the post-mortems for my own company's purposes of evaluating Kubernetes. I've been running Kubernetes clusters for about a year and a half and have run into a few of these, but here's what I found striking:

* About half of the post-mortems involve issues with AWS load balancers (mostly ELB, one with ALB)

* Two of the post-mortems involve running control-plane components that depend on consensus on Amazon's `t2`-series nodes

This was pretty surprising to me, because I've never run Kubernetes on AWS. I've run it on Azure using acs-engine and more recently AKS since its release, and on Google Cloud Platform using GKE; and it's a good reminder not to run critical code on T-series instances, because AWS can and will throttle or pause these instances.


Nice observation; I haven't done statistics on the linked postmortems myself yet. Please note that your observation might also be due to the fact that AWS has a far larger market share and did not provide managed Kubernetes until recently (so people roll their own). We can therefore expect any random sample of Kubernetes postmortems to be biased towards incidents on AWS (compared to other cloud providers).


That's a good point. In 2017 there weren't widely available managed Kubernetes offerings; now each platform has its own, with much more reliable integrations.


There is now a Kubernetes podcast episode with me about the topic: https://kubernetespodcast.com/episode/038-kubernetes-failure...


Dang. I wish I had my SRE Wiki up and running already, or I'd add a "public postmortems" section.


Looking forward to your public postmortems (either yours or whatever you find in the wild).


Just put it on GitHub, like this list and the serverless one I created after seeing this.


Just saw it already exists: https://github.com/danluu/post-mortems



