Maybe You Don't Need Kubernetes (matthias-endler.de)
500 points by ra7 on March 22, 2019 | 315 comments

The argument that Kubernetes adds complexity is, in my opinion, bogus. Kubernetes is a "define once, forget about it" type of infrastructure. You define the state you want your infrastructure to be in and Kubernetes takes care of maintaining that state. Tools like Ansible and Puppet, as great as they are, do not guarantee your infrastructure will end up in the state you defined and you easily end up with broken services. The only complexity in Kubernetes is the fact that it forces you to think and carefully design your infra in a way people aren't used to, yet. More upfront, careful thinking isn't complexity. It can only benefit you in the long run.
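
For readers who haven't met the declarative model described above, a minimal sketch in Python may help. This is not Kubernetes code: `reconcile`, `apply_actions`, and the service names are hypothetical, and a real controller watches live cluster state rather than comparing two dictionaries. It only illustrates the shape of "compare desired state with observed state, then act on the difference".

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str      # "start" or "stop"
    service: str   # hypothetical service name


def reconcile(desired: dict[str, int], observed: dict[str, int]) -> list[Action]:
    """Return the actions needed to move observed replica counts to the desired ones."""
    actions: list[Action] = []
    for name in sorted(set(desired) | set(observed)):
        want = desired.get(name, 0)
        have = observed.get(name, 0)
        if have < want:
            actions.extend(Action("start", name) for _ in range(want - have))
        elif have > want:
            actions.extend(Action("stop", name) for _ in range(have - want))
    return actions


def apply_actions(observed: dict[str, int], actions: list[Action]) -> dict[str, int]:
    """Simulate carrying out the actions and return the new observed state."""
    state = dict(observed)
    for action in actions:
        delta = 1 if action.kind == "start" else -1
        state[action.service] = state.get(action.service, 0) + delta
    # Services scaled down to zero replicas disappear from the observed state.
    return {name: count for name, count in state.items() if count > 0}


# Assumed example data: what we declared versus what is actually running.
desired = {"web": 3, "worker": 2}
observed = {"web": 1, "cron": 1}

plan = reconcile(desired, observed)
converged = apply_actions(observed, plan)
```

Running `reconcile` again on the converged state yields an empty plan, which is roughly what "maintaining that state" means here: the loop keeps running and only acts when the two states drift apart.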

There is, however, a learning curve to Kubernetes, but it isn't that steep. It does require you to sit down and read the docs for 8 hours, but that's a small price to pay.

A few months back I wrote a blog post[1] that, by walking through the different infrastructures my company experimented with over the years, surfaces many reasons one would want to use [a managed] Kubernetes. (For a shorter read, you can probably start reading at [2])

[1]: https://boxunix.com/post/bare_metal_to_kube

[2]: https://boxunix.com/post/bare_metal_to_kube/#_hardware_infra...

It's pretty common for new technologies to advertise themselves as "adopt, and forget about it", but in my experience it's unheard of that any actually deliver on this promise.

Any technology you adopt today is a technology you're going to have to troubleshoot tomorrow. (I don't think the 15,000 Kubernetes questions on StackOverflow are all from initial setup.) I can't remember the last [application / service / file format / website / language / anything related to computer software] that was so simple and reliable that I wasn't searching the internet for answers (and banging my head against the wall because of) the very next month. It was probably something on my C=64.

As Kernighan said back in the 1970's, "Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?" I've never used Kubernetes, but I've read some articles about it and watched some videos, and despite the nonstop bragging about its simplicity (red flag #1), I'm not sure I can figure out how to deploy with it. I'm fairly certain I wouldn't have any hope of fixing it when it breaks next month.

Hearing testimonials only from people who say "it doesn't break!" is red flag #2. No technology works perfectly for everyone, so I want to hear from the people who had to troubleshoot it, not the people who think it's all sunshine and rainbows. And those people are not kind, and make it sound like the cost is way more than just "8 hours reading the docs" -- in fact, the docs are often called out as part of the problem.

Disclaimer: I work for Red Hat as an OpenShift Consulting Architect.

If you want a gentle, free introduction to OpenShift (our Kubernetes distribution), I recommend trying out the Katacoda portal, Learn OpenShift [0]. Katacoda [1] also has vanilla Kubernetes lessons.

[0] https://learn.openshift.com/

[1] https://www.katacoda.com/

> As Kernighan said back in the 1970's, "Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?"

What a great quote. Thanks

Some more quotes taken from http://typicalprogrammer.com/what-does-code-readability-mean that you might find interesting.

Any fool can write code that a computer can understand. Good programmers write code that humans can understand. – Martin Fowler

Just because people tell you it can’t be done, that doesn’t necessarily mean that it can’t be done. It just means that they can’t do it. – Anders Hejlsberg

The true test of intelligence is not how much we know how to do, but how to behave when we don’t know what to do. – John Holt

Controlling complexity is the essence of computer programming. – Brian W. Kernighan

The most important property of a program is whether it accomplishes the intention of its user. – C.A.R. Hoare

No one in the brief history of computing has ever written a piece of perfect software. It’s unlikely that you’ll be the first. – Andy Hunt

I have yet to see a K8s instance that doesn't somehow manage to break.

In my experience, the clusters that break are clusters that are architected poorly and/or go unmaintained. There are also some of us who believe clusters need to be treated more like cattle than pets.

Yeah, so you only need to be a super genius to design the system properly and then it won't break.

Somehow I like being stupid and using technologies that allow me to be stupid and still work well enough.

Disclaimer: I work as a Red Hat Consulting Architect focused on OpenShift (our Kubernetes distro).

I didn’t say anything about being a genius or being stupid. But it does help to have something to reference [0][1].

In short, there are several things that help.

   1) Start with 3 Master nodes that sit behind an Enterprise Load Balancer (or even HAProxy) and a VIP (console.example.com)
   2) We choose to have 3 Infrastructure nodes that perform Container load balancing (app routers), log aggregation, metrics collection, and registry hosting
   3) We then put another VIP and Load Balancer in front of Application Subdomain (*.apps.example.com) so that apps can be exposed outside the cluster (myjavaapp.apps.example.com)
   4) Stand up 3 or more worker nodes

Yes. This is more complex than plain old containers. Yes. This is harder than putting an Apache server with RoR and MariaDB on the same Linux VM. But there is lots of value there if you put in the effort. Lots of consultants run “all-in-one” VMs or minishift/minikube on their laptops or homelabs.

[0] http://uncontained.io/articles/openshift-ha-installation/

[1] https://docs.openshift.com/container-platform/3.11/install/i...

This makes sense to me. I'm going to give it a whirl.

This is not my experience with k8s at all. Sure, everything will work fine when it works (although I don't think it's as easy as you make it out to be), but when something unexpected happens it's a massive pain to debug.

"8 hours" initial investment is already large for a small/simple scenario; and it's not the full costs either, because fixing problems will be much harder/time-consuming.

I wrote a post about this just the other day[1]:

> What does it mean for a framework, library, or tool to be “easy”? There are many possible definitions one could use, but my definition is usually that it’s easy to debug. I often see people advertise a particular program, framework, library, file format, or something else as easy because “look with how little effort I can do task X, this is so easy!” That’s great, but an incomplete picture.


> Abstractions which make something easier to write often come at the cost of making things harder to understand. Sometimes this is a good trade-off, but often it’s not. In general I will happily spend a little bit more effort writing something now if that makes things easier to understand and debug later on, as it’s often a net time-saver.

[1]: https://arp242.net/weblog/easy.html

I strongly agree with "something unexpected happens it's a massive pain to debug"; though it's somewhat under-appreciated how often this applies to open source SW in general.

Having to debug closed-source software is not fun either. Plus in many cases you're left with crappy documentation and no access to the source code.

We spent weeks and weeks at work working through k8s performance issues; we got nowhere and fell back on a crappy workaround.

Sounds like it wasn't K8S performance issues then, but performance issues with your environment.

I don't think it matters if it was a k8s issue or not, what matters is that k8s made it much harder to get to the bottom of it.

We actually had similar issues when we deployed k8s. In the end it turned out to be a misconfiguration, but took weeks to figure out, and only because our entire dev team looked at it (and not the k8s guru who implemented it all).

The problem is that a k8s cluster requires hundreds/thousands of lines of yaml configuration just for the core components, including a choice of network overlay. Everybody copies the same default configs hoping they will work out of the box, without understanding how each option actually affects the cluster. Add to this the container and yaml upgrades that should be applied per component every month or so, and it's nigh impossible for most companies to handle.

Kubernetes is far too complex to set up from scratch. The only way to reduce the complexity – or rather, to offload it – is to use managed Kubernetes via AWS or GCE. The fact that using AWS or GCE is effectively the only viable method for running a production Kubernetes cluster speaks volumes to how non-simple the stack truly is.

The new kubeadm makes it easier to set up a cluster from scratch. But the complexity truly is there and it's difficult to understand all parts of it. But - clusters are complicated. Running hundreds of workload jobs in a manageable way shouldn't be easy, right? Maybe it is not a problem with k8s; the fact is we now require a complicated stack with many moving parts, and it is not easy to manage it reliably.

Kubernetes used to be hard to set up from scratch. Recently we had to set up a new cluster and it just worked with very little faff.

I can recommend Kubernetes the right way [0] as a way to get started with your own k8s setup from scratch in a controlled manner.

0: https://github.com/amimof/kubernetes-the-right-way/blob/mast...

You need to be able to debug and separate out problems no matter your infrastructure.

You'll get lots of SMEs that can use a product but can't trace or profile, and it isn't the underlying product's fault beyond hype-driven development.

If you want to spin up ephemeral environments to test your code, kube is one of the easier ways to achieve this. Similarly, it helps you solve problems such as app health, dying nodes, etc.

It's a great product, especially when you've scaled past a single app and a single team, but that's largely because it's a framework, similarly to how rails is, and if you do something serious with it, you need to know how it works.

The alternative is you build your own framework, but your new hires will prefer k8s.

Realistically though, use GKE and forget about it until you grow. Otherwise you're using something closed or something bespoke, the latter being fine for a 1-2 man team, but kinda pointless when you can have a managed service.

> The alternative is you build your own framework

These are far from the only two options available.

Reminds me of:

"Any sufficiently complicated C or Fortran program contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp."

-- https://en.m.wikipedia.org/wiki/Greenspun%27s_tenth_rule

I agree, but it's fun and sometimes enlightening to let the debate be framed this way


Chapter 1:

> “Nobody has ever built cloud-native apps without a platform. They either build one themselves or they use Cloud Foundry.”

[1]: https://content.pivotal.io/ebooks/cloud-foundry-the-cloud-na...

You do realize you are quoting advertising material here? "Use our product or reinvent it yourself" :-/

> it's fun and sometimes enlightening to let the debate be framed this way

Not really. It's fairly close-minded.

If you read the book you would not call it advertising material. It's very open-minded and it's honestly not a book that is designed to sell you on Pivotal PAS in any direct way.

The first line in the first chapter is more like an "ok, full disclosure, this is advertising material" but if you read the full text, the Pivotal suite of tools is hardly mentioned. It merely paints a picture you can understand to demonstrate that some kind of platform is definitely needed for basically any enterprise of nontrivial size. (Now if you want to read the real promotional material, get yourself a copy of "Cloud Foundry the Definitive Guide")

This book is only advertising in the sense that if its message is delivered successfully, you will concede that you should use a platform, and we will both agree on the meaning of the word platform. That's it. It's only a bit tongue-in-cheek at first, since Pivotal actually makes such a platform, that it says on page 1, "our platform is the best and only platform" – the rest of the book isn't at all like that.

But if you take it in context for the date of publication, it might make more sense that it was framed this way. Obviously it does not pre-date Heroku. It is a different platform than Heroku. But it is a platform, and a cloud-native one. It's an example of "why you might not need Kubernetes." Also might help to note that this book was published in 2016.

It might be (read: definitely) harder to assert in 2019 that there really isn't any other platform worth considering (than CloudFoundry in 2016), but for a medium-large enterprise in 2016 I'm honestly not so sure. Kubernetes was still new on the scene. People in 2016 in large part still needed to be convinced that such a platform was needed, or even possible, since few existed in the wild/open source commons. (Name your favorite platform that is older than 2016, if you are still in disagreement with the thesis. There are surely several right answers, and I seriously don't doubt this at all.)

To take the contrary position to the extreme a bit: I think nobody serious (I hope), present company and the original poster included, is suggesting "you may not need Kubernetes, and there is also no serious contender which you should consider in the race for a thing like K8S that you may need for your enterprise, either." Even the original post recommends something instead (Nomad). There is a decent chance you are already sold on this idea, if you're reading this article.

The central argument isn't that you should definitely pick Nomad, it's that you really do need a platform that works for you, even if it only gets you 80% of the way there. The author of "Maybe You Don't Need Kubernetes" started out explaining why Kubernetes just wouldn't work for them. The book, just like this article, comes off a bit like "hey, why not this product instead of whatever it is you're doing there."

But just a little. I strongly agree myself – "In 2019, pick something, anything. (Even Nomad) Just don't do nothing."

I agree to some extent. One of the key value propositions of Kubernetes is indeed in its abstractions, which are mostly quite good. And I really like its declarative approach.

However, I've been dealing with its ins and outs for a while now, partly because I'm developing tooling around it, and I've found there are already a lot of idiosyncrasies in there. Some APIs have very frustrating flaws, and most annoying of all is how tightly some core functionality is coupled to kubectl. A remarkable amount of logic is embedded in the CLI, as opposed to the API layer, and if you really get in the weeds you're likely to start pulling your hair out.

Which is to say, even once you get through the learning curve, you may still find yourself struggling with it as you use it. Sometimes when you declare your intent, you find that it just doesn't work, and it can take a while to figure out why. Before you know it, you're using various operators that are meant to mask some of the deficiencies in the core orchestrator, and by extension you're outside of the core abstractions. And don't get me started on Istio... (but that's a tangent).

Anyhow. If you're not trying to do anything unusual, it's a good orchestrator. Not perfect, and for sure heavy, but a good choice for a lot of use cases.

Pretty much my experience as well.

I’ve actually started to wrap calls to kubectl to not have to replicate code and logic taking place in the CLI.

And istio... I can’t even... envoy is awesome though!

The internet is both full of people complaining that Kubernetes is too complicated and full of people arguing that it makes things easier.

The people saying it's simple are looking up from the world of IaaS and non-declarative orchestration software (Ansible, Puppet etc) and saying "hey this is much easier".

The people saying it's complicated are looking down from the world of PaaS and FaaS (Heroku, AWS Lambda, Google App Engine) and saying "why the hell would I want to manage this?"

I agree with this: after you know k8s, it is pretty darn great. Two more points:

1. K8s of 2019 is significantly better than K8s of 2017. With stable Deployments, StatefulSets and CronJobs and binaryData in ConfigMaps etc., I haven't missed a single feature in my latest setup: I just typed away helm'd manifests and things just worked. This is in stark contrast to the dance around missing features with initContainer and crond container kludges you had to do in 2017, when I built my first kubernetes setup.

2. People tend to conflate maintaining a k8s cluster with using it. Setting up your own k8s cluster with HA masters is likely still a royal pain (disclaimer: my attempts at that are from 2017, so I could be wrong, see point 1). But for a small company, whipping up a cluster from GKE is a breeze, and for big corps, company-wide cluster(s) set up by an ops team (OpenShift has been ok) is the way to go. The end result is that as a developer you just push containers and apply manifests.

I am scared of Helm. I would expect most people install a Helm chart like a package in Ubuntu, without knowing what it is going to do, what images are going to run and why. What if something malfunctions? What if a helm upgrade doesn't succeed? Are you going to call a Helm support line?

I was referring to using helm to configure my own charts to work in different test environments. Wrt public helm charts, you want to understand the chart install. Most of them are really dead simple if you know k8s.

> Tools like Ansible and Puppet, as great as they are, do not guarantee your infrastructure will end up in the state you defined and you easily end up with broken services.

False dilemma. Ansible and Puppet are great tools for configuring kubernetes, kubernetes worker nodes, and building container images.

Kubernetes does not solve for host OS maintenance; though there are a number of host OS projects which remove most of what they consider to be unnecessary services, there's still a need to upgrade Kubernetes nodes and move pods out of the way first (which can be done with e.g. Puppet or Ansible).

As well, it may not be appropriate for monitoring to depend upon kubernetes; there again you have nodes to manage with an SCM tool.

> There is, however, a learning curve to Kubernetes, but it isn't that steep. It does require you to sit down and read the docs for 8 hours, but that's a small price to pay.

Last time I sat down with k8s docs it was a broken mess.

It was impossible for me to follow any of the examples I tried through.

And then you start digging through GH issues... that is a proper rabbit hole.

I would not recommend most people to run their own k8s cluster. Even Amazon's EKS is enough of a pain in the ass I don't think it's close to a "define once, forget about it" type of experience.

However, this is pretty much my experience using GKE on Google Cloud. You can get a cluster up and running and start deploying stuff to it in a matter of minutes.

I have never heard any infrastructure explained as “define once, forget about it.” Ever. Everything requires maintenance, monitoring, upgrades, security, etc. And things just randomly break.

I like Kubernetes but it is not without problems. And when your entire business is now leaning on Kubernetes, it can really hurt when it has problems and you can’t figure it out quickly.

And if your expertise is not running infrastructure, a lot of people, as this post explains, are better off on AWS or Google Cloud using managed services.

Or you might have a problem with the underlying storage platform that Kubernetes uses, so you now have to be a storage expert.

I think it’s telling how many job offers I receive from companies that are already running Kubernetes and need help with it.

This has been my experience. It isn't that complex, just a steep learning curve.

If complexity isn't the reason for the steep learning curve, what is?

My personal take is that it's because Kubernetes has a lot of compositional features and abstractions, rather than a bunch of one off ones that don't play well with other features.

A good example is label selectors with Services, Pods, and Deployments.

You create a deployment, which is basically a scaling group for pods, and pods are the unit of deployment running containers.

A service exposes your pods ports to other components through a cluster local virtual IP, or a load balancer.

You cannot simply say "this service is for this app". You must instead say: this service will map its service port to the following pod port, for all pods that match the following label selector.

There are 5 different concepts (pods, deployments, services, labels and selectors) you need to learn before you can actually make your app accessible. But what's nice is that these concepts are used elsewhere in Kubernetes.

Labels and selectors are used to query over any group of objects, and allow grouping different things in arbitrary ways. Pods are a concept that gets reused by anything that deploys a container. There are Deployments, StatefulSets, DaemonSets, Jobs, and CronJobs; these all create pods. So it's nice that pods are their own decoupled concept, but now you gotta understand how they're used, and how they relate to the other resource types.
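
To make the selector idea above concrete, here is a small Python sketch of how a service could pick its pods purely by labels. It is a toy model with assumed data, not the Kubernetes API: the dictionaries, the `matches` and `endpoints` helpers, and the names `web-1`, `web-2`, and `db-1` are all hypothetical, and only equality-based selectors are modelled.

```python
def matches(selector: dict, labels: dict) -> bool:
    """True if every key/value pair in the selector is present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def endpoints(service: dict, pods: list) -> list:
    """List (pod name, pod port) pairs that the service would route traffic to."""
    return [
        (pod["name"], service["target_port"])
        for pod in pods
        if matches(service["selector"], pod["labels"])
    ]


# Assumed example objects; a real cluster stores far richer resources.
pods = [
    {"name": "web-1", "labels": {"app": "web", "tier": "frontend"}},
    {"name": "web-2", "labels": {"app": "web", "tier": "frontend"}},
    {"name": "db-1", "labels": {"app": "db", "tier": "backend"}},
]

# "Map service port 80 to pod port 8080 for all pods labelled app=web."
service = {"name": "web", "selector": {"app": "web"}, "port": 80, "target_port": 8080}

web_endpoints = endpoints(service, pods)

# The same matching function answers unrelated queries, e.g. "all frontend pods".
frontend = [pod["name"] for pod in pods if matches({"tier": "frontend"}, pod["labels"])]
```

The point of the design is that the service never names a pod or a deployment directly; anything carrying the right labels is picked up, which is why the same mechanism can be reused across resource types.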

Part of what's great about this all is also that now that people are building custom operators/controllers, you can understand how to use them fairly easily because they build on all these existing concepts.

I find that in Kubernetes I do a great deal of tedious boilerplate work. Sometimes I just want to run some damn software and let the platform do the boring stuff.

Disclosure: I work for Pivotal, we have a hand in a few such things.

I've tried that, but then been burned by PaaS switching things up in unexpected ways. Best to make the investment and just own it. I absolutely share the sentiment, though.

Good news is, it only took me a week to really pick up k8s.

Kubernetes switches things up a bit too.

But before it existed I could type `cf push` and get a running app with automatic routing, logging and service injection.

Whereas nowadays I have to type `cf push` and get a running app with automatic routing, logging and service injection.

The documentation around it isn't that great. Once you know the key actors -- a dozen or so key types of resources -- and how they play together, you understand how to create the configuration for your environment.

The YAML files are quite intimidating at first, if you look at examples out there.

I'm probably not a great teacher, but I think I could boil it down to a slide deck of no more than 25, and an hour's worth of time, and clearly teach how to interface with Kubernetes for a basic web app w/ horizontal scaling, Let's Encrypt SSL certificates, etc.

I'd be interested

I currently use Ansible to provision and configure about 50 instances with various home grown services deployed on them, and I'm quite happy with that.

I don't use Docker, or Kubernetes. What am I missing here? This is an honest question.

I don't particularly see the point of docker in production. I guess it can help resolve clashes between dependencies, but I'm doing ok with Ansible.

One guy in another team just tells his application developers to throw everything into a docker container and then deploys whatever they give him. I guess that prevents dependency clashes, sure, but to me that seems like it's just inviting different problems though. (Containers with a lot of disorganized junk in them).

And we don't need to grow the cluster for now, so I don't see the point of k8s. We already have log aggregation, service restarting, and service monitoring/metrics.

To me, Kubernetes feels like a brand new alternative OS with crappy documentation.

On the other hand I've been managing UNIX machines for 20+ years, and that I do know how to fix (mostly, since they keep changing linux so much these days, it's almost as hard to keep up with as javascript).

If somebody could tell me what I'm missing by using this approach, I'd be grateful.

You aren't "missing" anything, you are just doing things differently. Containers are a way of storing code and configurations in a single deployable entity.

Let's say I want to change a service's version from 1.2 to 1.3.

If I did this in Ansible, I might do it by commanding all of my services to pull the code of version 1.3 from a git repo and restarting to make the change.

If I did this with containers, I'd tell them to pull the image version 1.3 from docker registry and after it's pulled to shut down v.1.2 and start v.1.3.

Those things just have some smaller differences that are good in terms of A and worse in terms of B.

Ansible helps you by keeping it closer to the traditional way of just pushing code changes to a machine and that's that, and Docker opens up this world where the infrastructure can scale up or down (elasticity) quicker horizontally.

Kubernetes is just an extrapolation of that elasticity.

Ansible is "good enough" if you don't need elasticity or you have programmed another custom solution to handle that.

Docker/Kube and Ansible both have Git at their hearts, so if your app components have all they need as code, and you don't worry about elasticity, you won't gain a lot from wrapping it up in a Dockerfile (as opposed to just git cloning).

I use Ansible as well and it does not allow me to scale my app containers easily, especially because of dependencies. I think I either have to manage _everything_ by Ansible including firewalls, IRB/SVI on switches, Openstack security zones, users, roles etc. or I am asking for problems and they will definitely come. NB I do not like to manage my switches, routers and firewalls with Ansible.

But my infrastructure is fairly complicated, some bare metal services, many VMs, several docker hosts with dozens of various app containers, private Openstack and some public cloud services (VPNs), Ceph cluster, nagios, ELK etc.

Ansible does not watch and maintain the infrastructure and services state - it is a passive tool (I am not sure if Tower is different). You definitely could configure your monitoring to invoke an Ansible script if your app server goes down, reschedule the services that ran on that host to a different one, and reconfigure other parts of the infrastructure as needed (dependencies, VXLANs, security rules, service discovery db update, load balancing update etc.). But you would essentially be copying the Kubernetes controller functionality.

Kubernetes can take care of the service discovery, load balancing, deployment, configuration and networking and other parts of your infrastructure and it does it pretty well in my experience. It maintains the declared state and reacts to its changes.

Somebody posted here that it takes just several hours to learn it. YMMV but deploying a resilient cluster in my environment took me much more time. And I have to agree that documentation is pretty weak. E.g. provisioning Cinder volumes and connecting them to the VMs running Kubernetes nodes was a real horror.

Btw Ansible generates many of my k8s YAMLs and deploys them.

> Ansible does not watch and maintain the infrastructure and services state

Yeah, but what I've done is deploy a tiny Go binary on each cloud instance that is run every few minutes. Instances take turns checking on each other using a round-robin sort of approach (no complicated leader election algorithms etc...)

The script knows how to check the health of the other instances/services and to restart them or alarm if they get stuck.
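
One way the round-robin checking described above could work is sketched below in Python. This is a guess at the general shape, not the poster's actual Go binary: the `checker_for_slot`, `peer_of`, and `tick` helpers, the slot numbering, and the host names are all assumptions made for illustration. Every instance runs the same code on a timer, and because they all sort the host list the same way, they agree on whose turn it is without electing a leader.

```python
def checker_for_slot(hosts, slot):
    """Pick which host is on duty for a given time slot (slot is a running counter)."""
    ring = sorted(hosts)
    return ring[slot % len(ring)]


def peer_of(hosts, me):
    """Each host checks its successor in the sorted ring."""
    ring = sorted(hosts)
    return ring[(ring.index(me) + 1) % len(ring)]


def tick(hosts, me, slot, is_healthy, restart):
    """Run one scheduled invocation on host `me`.

    Returns None if it is not this host's turn, otherwise (peer, outcome).
    `is_healthy` and `restart` are callables supplied by the caller.
    """
    if checker_for_slot(hosts, slot) != me:
        return None
    peer = peer_of(hosts, me)
    if is_healthy(peer):
        return (peer, "ok")
    restart(peer)
    return (peer, "restarted")


# Assumed example: three hosts, one of which ("b") is stuck.
hosts = ["c", "a", "b"]
restarted = []
results = [
    tick(hosts, me, slot, is_healthy=lambda h: h != "b", restart=restarted.append)
    for slot, me in enumerate(sorted(hosts))
]
```

A real version would also need an alarm path for peers that stay unhealthy after a restart, which the comment above mentions but this sketch leaves out.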

For fifty hosts, it works fairly well.

I didn't say in the original post, but we are not a product group, so our stuff is 'semi production'. We don't have customers to worry about.

>Somebody posted here that it takes just several hours to learn it.

Sure I can probably learn it quickly, but what I don't want is to now have complicated and mysterious kubernetes problems to solve on a deadline.

I understand linux pretty well, been using it for 20+years, so I'm not intimidated by OS level troubleshooting. Sure without containers, you have to be more careful to keep your dependencies from overlapping, but it hasn't been a problem we couldn't handle till now.

I don't particularly want to trade problems I'm familiar with solving for a whole new set of unfamiliar problems unless there is a clear benefit.

You get the ability to hire people with a keyword ... "kubernetes SRE"

but I think that's pretty much it if you're a smallish team with an already well implemented IaC stack.

Though I'd definitely encourage anyone to try the GCP Kubernetes before trying to self host it...

The former gets you a taste for why it's getting such good publicity. The latter explains why it's still controversial.

As someone in almost the same situation (about 50 instances) who chose to go with Docker, I just want to say you made the right choice.

Of course a few years ago, being young and a one man ops team, I wanted to use the cool new thing and so did everyone else. I went with Ansible after being a Chef guy for a while due to ease of getting started. That was the easy part. Love Ansible, but it was slow to get new Docker functionality for a while, and not having the built in functionality of having Chef runs happening periodically without paying for Tower means I have to be more mindful and deliberate with keeping my infra in sync and up to date.

Enter Docker and months and months of headaches just to get something usable on the local workstation level. So much wasted time dealing with breaking changes and figuring out exactly which version to use and very rarely upgrading. Debugging and troubleshooting becomes an unintuitive nightmare. Even after all that, our site still ended up running really slow locally in Docker. This is because with apps (such as CMS) that handle many, many files, I/O slows wayyy down. Ended up discovering Dinghy (shout out to codekitchen) and got it to a decent state (but make sure developers don't accidentally install Docker for Mac as it is and always has been a CPU gobbling mess).

Then on top of that there is the container orchestration (consul-template), monitoring (Prometheus is cool, but takes a bit to understand what is needed to get the metrics you want), logging (fluentd is again, cool, but oh man is parsing logs a PITA to understand), debugging tools etc.

I can't even imagine what adding k8s on top of the many quirks of Docker would mean. I luckily took one look at needing a whole ZooKeeper cluster just to get started, and immediately gave up on that.

What do you use for log aggregation and monitoring/metrics? And how do you deploy new versions with zero downtime? And how do you manage kernel/OS upgrades?

> log aggregation

Graylog, with applications using various GELF libraries to send logs to it.

> metrics


> And how do you deploy new versions with zero downtime?

I don't deploy with zero downtime, but within say ~5min. This is acceptable to us. We use jenkins with a lot of tests to ensure components are in good shape, then we mostly manually deploy them, but with a script. Sometimes we deploy directly from jenkins.

> Kernel os upgrades

All our stuff is internally hosted, so we handle those infrequently. This stuff isn't exposed to the open internet, so we upgrade when we get around to it, or when we encounter a bug.

Thanks for following up! I'm not surprised because I noticed on HN and in other online discussions that almost all the people that advocate against Kubernetes and PaaS (Heroku, App Engine, Clever Cloud, Scalingo, etc.) are willing to tolerate downtime during deploys and during OS upgrades.

> are willing to tolerate downtime during deploys and during OS upgrades.

Yeah, that's not a big deal for us. We don't have customers, and this environment is for internal use at the company.

You could say it's a 'semi production' environment.

I like the "semi-production" concept :-)

Deploying with zero down time with Ansible is kinda the same as it's always been:

  - Deploy to one host
  - Wait for health checks to pass
  - Deploy to second host.

Load balancer takes care of the rest.
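
As a rough sketch of those steps (in Python rather than an Ansible playbook; the `deploy` and `is_healthy` callables are hypothetical stand-ins for whatever your tooling provides, not part of Ansible):

```python
from typing import Callable, Iterable, List


def rolling_deploy(
    hosts: Iterable[str],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    max_checks: int = 5,
) -> List[str]:
    """Deploy to hosts one at a time, stopping at the first unhealthy host."""
    done: List[str] = []
    for host in hosts:
        deploy(host)
        # Poll the health check a bounded number of times.
        # A real script would sleep between polls.
        if not any(is_healthy(host) for _ in range(max_checks)):
            raise RuntimeError(f"{host} failed health checks; stopping rollout")
        done.append(host)
    return done


# Usage with stand-in callables: record the deploy order, always report healthy.
order: List[str] = []
rolling_deploy(["web1", "web2"], deploy=order.append, is_healthy=lambda h: True)
```

Stopping at the first failed host is the point: the remaining hosts keep serving the old version behind the load balancer.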

Because it's cool, new and good for your blog/CV. I don't see the need either, if stuff is in the cloud.

Over the years, I've deployed applications using various combinations of custom RPMs, Chef, Heroku, Docker, ECS and Kubernetes.

If you can, you should probably deploy to Heroku. (Or a similar service.) It's far cheaper than spending time on devops. Just run "git push" and you're running.

When I've deployed on ECS (or other "simple" orchestrators), I've found that I ultimately wound up re-inventing lots of Kubernetes features.

So if you can't use Heroku, but you do know basic Docker, then it's worth considering somebody's fully-managed Kubernetes. Google's is nice. Amazon's is considerably more work. I hear Microsoft's is still a bit sketchy. And I'd love to take a look at Digital Ocean's. But do not attempt to host your own Kubernetes if you can possibly avoid it.

If you do try Kubernetes, then read a book like Kubernetes: Up & Running first. Kubernetes is not self-explanatory, but it's pretty straightforward if you're willing to spend a few days reading.

Finally, don't overcomplicate it. Just use the basic stuff for as long as you can before trying to layer all sorts of other tools over it.

What pain is Heroku saving you to justify being 5x more expensive than Lightsail/Digital Ocean/Linode?

All websites I maintain/deploy are either built as a Docker image and published by CI on git check-in or deployed locally with a single rsync/supervisord bash script (or right-click Web Deploy for some older IIS/ASP.NET Apps). I probably have over 50 sites I'm currently hosting so using anything that much more expensive won't enter into consideration.

But I'm not seeing what could justify the extra cost, especially as the cost is recurring. If it's some kind of effortless/magical scalability, I'd rather put that additional cost towards more hardware and buy more headroom.

Heroku turns a strategical liability ("I have only one employee who understands what rsync is; if he leaves I'm screwed") into a fiscal one. Any company I've ever known will always choose the latter, for anything outside of their core competency.

We use Heroku heavily, and we went from 2 full time devops engineers to 0. Everything is now buttons and sliders. There are no "security patches". The CEO could log in and scale if he needed to; it's a slider. We get monitoring for free (labour free, not cost, i.e. the "good" free). Memory usage, CPU usage, HTTP status codes, logging: it's all there. We spend no time thinking about rsync or devops or any of that: we just solve the problems we're good at.

Of course, everything is a matter of scale. Legend has it, Deliveroo UK only moved off Heroku after they grew so large, Heroku wasn't willing to offer them more dynos on their account. That sounds like a reasonable time to go in house. But any <100 people company.. why bother? focus on what you're good at, and let other people do devops.

This may just relate to the circles I move in, but I'd have a hard time finding a group of what I'd call skilled developers, where not a single one can manage rsync or find their way around a Linux server in general.

It's all well and good at the "we need Kubernetes" scale to say you need specialists, but a team that can't manage a VPS is strange to me.

Wait until your VPS is breached because your OS wasn't patched, or because the firewall wasn't correctly configured. Learning is good, however, at a certain point I think it's fair that you can't minimize DevOps expertise to just "rsync" a jar or a docker image.

Nothing about "just use a PAAS" changes this. Sacking ops and telling developers they should "just use EKS" or whatever is precisely why we keep seeing open S3 buckets and ES servers.

IMO the fear is being overstated; everything's written down in either a single deploy script or configured CI, i.e. there's not going to be some loss of know-how. It's the same as if the person in charge of Heroku leaves: someone else needs the login credentials and needs to know how to set up Heroku, as they would with any CI.

I don't know what Heroku is offering, but Lightsail and ECS instances also have metrics and pretty graphs (though admittedly I rarely check them myself). Maybe it will save me some ssh sessions to manually apply security patches, but I was recently able to upgrade my Lightsail instance to the latest Ubuntu 18.04 LTS with just:

  $ do-release-upgrade

> But any <100 people company.. why bother?

Because it's 5x more expensive.

> focus on what you're good at, and let other people do devops.

But it already takes hardly any time/effort to keep doing what I'm already doing.

I guess it's for different companies who see value-added benefits that justify the cost, but it's being proposed here that everyone should use Heroku first. It just boggles my mind why most people would do that as the first option when it's so much more expensive. I already think the cloud is too expensive, so there's little chance I'm going to pay a recurring premium for something that's not going to save me any time over what I'm already doing.

If it replaces one engineer, that's like $150K+/year (when including taxes and overhead, that is not at all a high estimate). So it depends on what the 'x' is in '5x more expensive'. And will probably be more reliable than what you'd get paying one (more) engineer to do it in-house too.

If you're hiring a "devops engineer" whose total responsibility is cloud touching, sure. You're right.

But where is that actually the case and your app can run comfortably on Heroku?

At multiple jobs I've been the only person who could credibly claim to understand the entire stack used at the company, from the web frontend to the OS the backend database runs on and the person to whom teams would come to validate their designs for scaling and reliability. I didn't write product code in those roles. But I multiplied the effectiveness of the people who did.

Heroku is a wonderful tool that doesn't get you the actually hard parts of the job req.

at 5x the cost of 10k/year in infrastructure spend, Heroku is significantly less than a dedicated DevOps team. However, if your product is backups as a service and you'll need 100PB of storage, then Heroku is probably not the best option.

> at 5x the cost of 10k/year in infrastructure spend, Heroku is significantly less than a dedicated DevOps team.

How do these costs scale with 10x or 100x the traffic/load?

At 100x the traffic the business would be at 5 million dollars a year in spend, and would exceed the standard pricing model of Heroku. The business can then:

1) Negotiate with Heroku for an enterprise contract
2) Consider migrating to a more cost effective platform
3) Dedicate time home-growing a solution.
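
For anyone checking the arithmetic behind the 5 million figure, here is a back-of-the-envelope sketch. It assumes the numbers used in this thread (10k/year baseline infrastructure, a 5x Heroku markup) and, as a simplification, that spend scales linearly with traffic, which real pricing will not follow exactly. The function name is made up for illustration.

```python
def yearly_spend(base_infra: float, markup: float, traffic_multiple: float) -> float:
    """Estimated yearly spend, assuming cost grows linearly with traffic."""
    return base_infra * markup * traffic_multiple


print(yearly_spend(10_000, 5, 1))    # 50000   (today, on Heroku)
print(yearly_spend(10_000, 5, 100))  # 5000000 (at 100x the traffic)
```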

Part of the reason Heroku charges so much is because their customers are typically small, but they'd likely rather find a price that keeps you on their platform vs. home-growing a solution.

Upgrading with do-release-upgrade is simple if it works... but what happens when it breaks?

> Upgrading with do-release-upgrade is simple if it works... but what happens when it breaks?

That is what snapshots are for.

So now the non-engineer needs to understand snapshots, ensuring snapshots don't break, ssh, roll back, etc. That's assuming the bad upgrade didn't leave any damage behind (like DB data, etc.). And assuming they didn't lose the post-it they wrote the CLI credentials on, since you can't easily reset those, unlike on Heroku.

Snapshots are typically part of the provider's web management interface and are basically the first thing anybody who makes changes to anything should learn how to use.

Moreover, if you're having someone else manage your systems and they upgrade them, now what do you do when the new version causes problems?

I ran into an issue recently where newer systems default to a newer version of a protocol but the implementation of the newer protocol has a major bug the old one didn't. When that happens on your systems you roll back until you can solve the issue. When it happens on systems managed by someone else, better hope you can identify and solve the issue quickly because in the meantime your system is broken.

> Heroku turns a strategical liability ("I have only one employee who understands what rsync is; if he leaves I'm screwed") into a fiscal one

So, what's your plan in case Heroku shuts down, or gets bought and changed completely? Isn't that also a strategic liability, just a much larger and arguably less likely one?

Heroku got bought by Salesforce several years ago.

If you have 12-factor apps then you have a fighting chance of moving off it anyway.

Apart from Dokku, I'd say Cloud Foundry is the closest next environment that you can install and operate directly, though it's an 800-pound gorilla by design. But there are fully hosted services for it (eg. Pivotal Web Services, IBM BlueMix, SwissCom Application Cloud). There're also semi-hosted options (Rackspace Managed Cloud Foundry) and IaaS-provided installer kits for AWS, Azure and I think GCP as well. You can also buy commercial distributions from Pivotal, IBM, Atos, SUSE and SAP.

Disclosure: I work for Pivotal, we sell Cloud Foundry and Kubernetes distributions (PAS and PKS).

It's a negligible liability compared to the risk of the devops guy leaving the company within a year, without leaving any documentation or any clue how the app was deployed or run. The same thing will happen next year with the replacement guy, if there ever is a replacement.

Use Google App Engine, Clever Cloud or Scalingo.

Any project that still fits Heroku has much cheaper options available with all the same reliability features: Elastic Beanstalk, OpsWorks, ECS, etc (I’m naming AWS options because I’m not knowledgeable enough about other cloud providers to suggest anything else, but I know all the major players have something in this realm.)

It really doesn’t take a devops engineer to run these if your app still fits on Heroku. A little bit of overhead goes into learning the service, much like you’d learn any new API or programming library.

The AWS docs are significantly worse than Heroku's, and in general you'll be likely to get tripped up and waste time figuring something out.

So? Continue to wade through it and you’ll end up with a significantly lower IT bill at the end of the month. The extra work pays off quickly.

> What pain is Heroku saving you to justify being 5x more expensive than Lightsail/Digital Ocean/Linode?

The pain of setting it up, applying security patches, making sure you set it up securely to begin with. The pain of having a mental model more complex than "the server is what I git push to."

> I probably have over 50 sites I'm currently hosting so using anything that much more expensive wont enter into consideration.

No one would argue someone in your position should use heroku. It's for people who are willing to pay to avoid sysadmin work... which is a lot of developers.

How do Heroku and other managed services perform updates that might contain breaking changes? Or do they only perform minor updates or security updates with no breaking changes?

My biggest fear with managed hosting and managed databases is being given too short of a window before they update.


tl;dr: You get a few choices of Ubuntu LTS releases, which they maintain for a long time (currently they still support 14.04, now nearly 5 years old). Or you can push Docker images, at which point the underlying OS is squarely back in your court — technically they must be applying kernel patches, but Linus is fairly religious about not breaking userland.

I agree. You can set up a Heroku-like setup pretty easily. Push to GitLab to trigger your pipeline that builds your container. Now your $5 VPS with docker compose and watchtower is updated.

Sounds like you should start a Heroku competitor :)

Btw watchtower seems abandoned, does anyone know if there's a story behind it?

With your stack, how do you deploy new versions with zero downtime? Do you have a load balancer and start the new version, switch new traffic to the new version, drain connections to the old version, then stop the old version?

And how do you update the OS and the kernel without downtime? Do you set up a new machine, deploy to it, switch the traffic to the new machine, and decommission the old one?

I'm asking because these are the kind of things Heroku and other PaaS do for you.
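
To make the first sequence concrete (start new, switch new traffic, drain old, stop old), here is a toy in-memory sketch. `ToyBalancer` and its methods are invented for illustration and do not correspond to any real load balancer's API; the only point is the ordering constraint that the old version stays up until its connections have drained.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyBalancer:
    """Toy stand-in for a load balancer (hypothetical, illustration only)."""
    active: Optional[str] = None
    connections: Dict[Optional[str], int] = field(default_factory=dict)

    def open_connection(self) -> Optional[str]:
        # New connections always go to the currently active version.
        self.connections[self.active] = self.connections.get(self.active, 0) + 1
        return self.active

    def close_connection(self, version: str) -> None:
        self.connections[version] -= 1

    def switch_to(self, version: str) -> Optional[str]:
        # Route new traffic to `version`; return the previous one, now draining.
        previous, self.active = self.active, version
        return previous

    def is_drained(self, version: Optional[str]) -> bool:
        return self.connections.get(version, 0) == 0


lb = ToyBalancer(active="v1")
lb.open_connection()               # a client is still talking to v1
old = lb.switch_to("v2")           # new traffic now goes to v2
assert lb.open_connection() == "v2"
assert not lb.is_drained(old)      # v1 must keep running for its client
lb.close_connection(old)
assert lb.is_drained(old)          # only now is it safe to stop v1
```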

1 person company: sure, let's do everything myself. As cheap as possible.

Two: here I'll teach you, take over DevOps so I have more time.

10 person team, everyone is new and your application runs in the cloud. The person who took over has, by now, left the company. They extended the infrastructure, didn't inform you, and the application has been reworked so it runs on the cloud.

Welcome to the perils of working with people.

Isn't Heroku really expensive though? I looked at it a few weeks ago and the cheapest plan was $25/month (there's also a $7 "Hobby" plan though, but that seems just for, well, hobby stuff?)

I instead got a Linode VPS at $5/month, which gives me more than the $25 Heroku plan? Setting up a VPS is not very hard either – although this may depend a bit on your environment, my app compiles to a static binary – and a lot more flexible.

My devops stack thus far consists of scp and tmux.

Suggesting that $25/month is expensive illustrates how insanely cost-sensitive this community is.

Don’t come to HN for a representative take of how people in US businesses evaluate vendors and their pricing.

Do you know how much it cost my employer for me to spend an hour reading about Kubernetes?

Well, $25 is 5 times as much as $5, so if the costs scale at a similar level then your $500 hosting costs will end up being $2,500 which, depending on your business, may or may not be a significant cost.

Back when I worked for a hosting company specializing in RoR we had more than a few customers migrate to us because Heroku costs were getting out of hand (and we weren't all that cheap either!)

In my specific case I'm prototyping what could perhaps be a startup, and with a few small cheap Linode VPS's I can get a lot of bang for my buck.

> Do you know how much it cost my employer for me to spend an hour reading about Kubernetes?

A lot, which is why you shouldn't use it. You can run a VPS without k8s.

> Suggesting that $25/month is expensive illustrates how insanely cost-sensitive this community is.

I completely agree but I’m not sure “cost-sensitive” is an adequate term because it’s like a cost on hosting (or an app you’ll use for years) triggers extremely high awareness by people who are extremely blasé about hemorrhaging staff time on support and slipped deadlines.

Sometimes it can be hard to give up sysadmin work because people consider it to be a core competency and to let that go feels like a loss. Kubernetes hits the sweet spot where you get to use a tool to manage sysadmin work that is at least as much work as hiring a sysadmin.

Not everyone is thinking in terms of an employer, or looking to build a profitable business. One of the main sticking points for my side projects is related to deployment environments.

Heroku is painless enough for unprofitable hobby projects, but far too expensive. Using AWS, GCP or Azure directly is relatively affordable, but requires a lot of extra work.

I'm aware of various tools that are supposed to make working with the various cloud platforms much easier, but every time I see one of these used in practice (eg at work), people seem to spend an enormous amount of time getting things working properly. It still feels like we're missing a sweet spot for hobbyists who don't want to invest their spare time learning about and wrangling with devops, just to get a simple project up and running.

Has anyone ever recommended Kubernetes to anybody as a production environment for an "unprofitable hobby project"? In the context of the article we’re discussing, the suggestion that $30 is an unreasonable operations overhead for a team of four engineers is thoroughly preposterous.

Tangential, but if anyone wants something Heroku-like for deployments and monitoring, but with less [ease-of-use GUI] control over scaling, I recommend DigitalOcean's “one click” Dokku deployment.

I’ve used it at work for a subset of our projects and it’s been pretty good once you learn a few of the gotchas. Once you’re set up it’s pretty painless. There are limitations for sure, but it’s been handy for a few situations where we didn’t want to focus on deployments and wanted to keep them as simple as possible.

If you're going for Heroku because of its simplicity, I don't consider the alternative to be "spend an hour reading about Kubernetes" (and presumably many more hours when I have to troubleshoot it next week). I'm not touching Kubernetes or Docker or any of that.

Even on AWS, I can deploy with one short command, and it's much cheaper than Heroku.

Can you run a one-off instance like “heroku run <command>”?

So true - billing rates for consultants run 200+/hr. Even a small $1mm expense budget - $25 is not relevant and in my experience you can quickly be at 75%+ personnel cost (w2/1099 + desks / space etc for them)

Not all businesses pay $200/hour for consultants. There are a lot of small businesses out there.

"A small $1M expense budget" may be "small" for certain companies, but it's more than the yearly revenue for a lot of businesses, never mind people who are trying to start a business and don't have any revenue at all (yet).

Of course you need to be reasonable and not penny-pinch or "spend money to save money", but in general I would say that frugality is a virtue.

Look into Dokku, it is a free Heroku clone that you can run on a cheap VPS. I set it up last week and now all I have to do to deploy is type "git push dokku master".

I missed this comment and made a similar one up-thread. I’ll second.

I’m ignorant as to whether large projects have run using it, but for smaller ones it’s useful.

Thanks! Maybe it would be worth replacing my 100 line nodejs script that does the same thing with this in the future :)

(not sarcasm, if this works the same and adds more value I'd use it)

Second Dokku. It's essentially a 100 line node js script if I remember correctly. Nice ecosystem growing around it. No scaling though.

I'll second this. Have had a great experience with dokku, about as simple as Heroku.

Try caprover.com, minimalistic orchestration similar to Heroku or Dokku (but with a GUI). Can scale too with Docker Swarm. I'm using it and it's a breeze, really happy with it. Previously I was using Rancher, but didn't like the switch from Cattle to Kubernetes.

Just tried CapRover this weekend for a hobby project and it was even easier to set up than Dokku. The NetData monitoring integration is great too, and just as painless to set up.

For use cases where support plans and SLAs are not an issue (hobby), it's a great option.

A third recommendation for CapRover. It's simple, it's beautiful, and it's open source. I've been enjoying it so far :)

CapRover also features one click SSL generation via Let's Encrypt, similar to what is offered on Heroku.

i dont get this thinking of how $25/month is considered expensive when heroku has already figured out and abstracted away all the busy work that's required for deploying code

For $300/yr you could get multiple Squarespace websites, and they abstract even more.

Squarespace is great for solving lots of problems and if it solves your problem you should definitely use it before heroku. But as devs we're usually brought in after someone has figured out their problem can't be solved with Squarespace.

So, I don't really understand this comment. I'm not a programmer and I don't really understand what kubernetes, heroku, etc do. But I thought they were for applications, not websites.

I've done things like go to the Wikipedia page for kubernetes. It hasn't helped me figure out what such things do.

Anyone care to point me to some kind of 101 explanation to help me follow the conversation?

Squarespace abstracts away code so non-developers can make websites. To the best of my knowledge, you can't really do anything complicated with Squarespace but it takes care of a lot of the hassles of building a website such as figuring out security, social media integrations, and design layouts.

Web applications (web apps) are essentially websites but with some additional functionality. Web apps can work the same as websites by having urls for different pages but are distinct in that they can do things that a website hosted on Squarespace can't do. This is because to build a web app requires some coding up front which is both an advantage and disadvantage of webapps in comparison to a site hosted on Squarespace.

Heroku is a platform as a service (PaaS) offering that lets developers utilize version control tools to update their web apps. This means that deploying a new version of a web app is incredibly simple.

Imagine that you collaborate in an office that publishes technical documentation and various technical writers can work on multiple parts of the same document. Each new version of the document gets published to a PDF that users can access.

This is essentially what Heroku provides along with abstracting away some of the difficulties of getting a web app hosted such as security, setting it up so your site uses https, and some basic monitoring and logging.

Sometimes web apps look and feel like a singular thing but are actually multiple pieces working together. In this case, you may want to isolate these different pieces.

Doing so can be really difficult, and to the best of my knowledge, Heroku isn't necessarily designed with this level of orchestration in mind. This is where Kubernetes comes in.

I've never used Kubernetes, so I might get a few things wrong here but from what I understand, Kubernetes gives developers/devops people a lot more fine grained control of how the various pieces of a webapp or webapps get deployed and managed. Kubernetes lets you take advantage of containers which are sort of like micro operating systems but only with the dependencies you need installed for a service to run. This means you can write in a file "I want x replicas of this part of my app and y replicas of this other part of my app to be run across z number of workers".

What this abstracts is how a webapp should be run without manually setting up each piece yourself.
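
A toy way to picture that "I want x replicas" idea: you state the desired counts, and something compares them with what is actually running and works out the difference. The sketch below is only a mental model in Python with made-up names (`reconcile`, `web`, `worker`, `old-cron`); it is not Kubernetes' actual controller code or API.

```python
from typing import Dict


def reconcile(desired: Dict[str, int], running: Dict[str, int]) -> Dict[str, int]:
    """Return replicas to start (positive) or stop (negative) per component."""
    changes: Dict[str, int] = {}
    for name in sorted(set(desired) | set(running)):
        delta = desired.get(name, 0) - running.get(name, 0)
        if delta != 0:
            changes[name] = delta
    return changes


desired = {"web": 3, "worker": 2}                  # what the file asks for
running = {"web": 1, "worker": 2, "old-cron": 1}   # what is actually up
print(reconcile(desired, running))  # {'old-cron': -1, 'web': 2}
```

Running the comparison again after the changes are applied yields an empty dict, which is the "desired state reached" case.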

Here is a video with a bit more info on Kubernetes: https://youtu.be/PH-2FfFD2PU

Thank you.

Sorry to see you're getting downvoted, but it's probably because that's just too long of a story to put in a comment. The parent you're replying to was being sarcastic: he meant to say, "well if we're going to pay other people to do our job, why stop at Heroku? why not pay squarespace even more money to do more work for us?" A reductio ad absurdum.

Ironically, he's right. But, as someone pointed out: Squarespace is probably not relevant in a conversation about Kubernetes, while Heroku is.

What Kubernetes is, is a long story. Suffice to say: if you don't know what it is, count your blessings. :)

So, what I'm hearing is that my confusion is justified because Squarespace doesn't do what kubernetes and heroku do.

Is that accurate enough?

Right. Squarespace is a hosting provider and website builder that anyone should be able to use to build a website, typically a smaller scale website like a blog or a small to medium online store / e-commerce site. It's similar to Wix, Weebly, WordPress.com, and the website builders that some hosting providers offer.

Kubernetes is a tool large companies use to manage a large number of servers. Google invented it and open-sourced it so now it's free for anyone to use. It's not something people would use for a single website unless it was a huge website that required lots of servers, like FoxNews.com or something.

Kubernetes is a tool that lets Squarespace more efficiently host large numbers of websites on a smaller number of machines.


Thanks. That makes me feel a lot less stupid. It's much more in line with what I thought.

yes, but that's on purpose. "This is about Kubernetes, stop changing the subject to Heroku." or something. it's hard to read sarcasm online.


These two online books (made to seem like Children's books) represent a true ELI5 explanation: phippy.io

Thanks. That's perfect.

Heroku is great, but that's 5x the cost of Linode/Digital Ocean + Dokku.

Unless you're a pretty large company, 5x the price in server hosting probably isn't worth your time worrying about.

Or working on a handful of personal projects that aren't revenue generating.

Unless the goal is to get more ops experience, wouldn’t those be an especially tempting place to spend more time on the project and less on support toil?

I think the issue is that for hobby projects, the alternative to Heroku isn't devops toil, it's scp.

You can get surprisingly far on a single box. PlentyOfFish, Mailinator, and Hacker News are all services that got to millions of users with one server. StackOverflow and Google are ones that got to millions of users with a handful of servers, mostly for redundancy.

When you have big teams that are all concurrently making changes and writing code to a live site that's mission critical, then things get complicated. But when there's only one dev and your few hundred thousand users don't mind too much if it goes down? You can just spend some one-time effort installing a stock Postgres install, scp over a single binary or tarball, and run it in the background with nohup or screen. When it's time to re-deploy, upload another version, kill it, and restart it.

I’m not arguing that you can’t do a lot on a single server but think about how many separate skills you mentioned even after cutting corners for reliability, data loss, or security.

I’ve gone to a bunch of hackathons where some of the participants got derailed on that kind of stuff and never even got to the part of the project they cared about. My point was that it can be worth a modest amount of money to not have that overhead on a small project.


* Single server on Dokku, with multiple projects


* Switch DB to DBaaS on your hosting provider's cloud

Grow More:

* Multiple Dokku instances behind load balancer/proxy as a service


You can then scale vertically a LOT before you need K8s style infrastructure.

>>> StackOverflow and Google are ones that got to millions of users with a handful of servers, mostly for redundancy.

Servers that can cost $50k each. Not a good example for affordability.

Setting something up like Dokku, which approximates the deployment of Heroku (git push to deploy), is fairly trivial.

Okay, now you have to support a Docker cluster, backups for persistent data, and deal with things like DNS, proxy, etc. management. It's easier than doing those from scratch but it's still a non-zero amount of work which will at best take time away from that small side project and at worst cause major problems if you never come back to installing security updates, running backups, testing the restore process, etc.

LOL... Dokku on DO is pretty simple... as to backups, it's a checkbox option, as to redundancy, for non-revenue generating projects, it's generally acceptable to have some down time. As to back up again, beyond some potential data loss, it's pretty easy to get CI/CD up and pushing to Dokku.

In any case, it doesn't need to be as completely fleshed out as a company with millions in investment capital could do.

+1 on this, love Dokku myself. Great option for self-hosting and can even do some redundancy with multiple servers behind a load balancer.

Dokku isn't much more support toil, so to speak and can operate starting on a single server... $10-20/month on Linode or DO. With DO, can eventually grow a bit with hosted DBaaS, and multiple dokku instances, with a load balancer in front. Start small, some room to grow, long before the likes of K8s is needed.

What toil? You put a service up, it stays up.

Unless you’re just dropping a .php file in a shared server account, you have more setup and upgrade work setting up a running service. The first time it fails and you have to manually rebuild it, you’ll learn a valuable lesson about the PaaS value-add, too.

I find that dokku is a REALLY good middle ground... can even grow to have 2-3 servers, with a hosted load balancer and DBaaS, just deploying 3X exactly the same if you need redundancy. And starting off with a single server to experiment and host multiple apps is pretty easy to get started with (comparable to Heroku and similar).

only consider proportion when the problem is scale. Unless you're deploying many many projects I don't think $25/month is expensive

$5 compared to $25 doesn't seem like much, but over a year that becomes $60 spent vs $300.

Look into google app engine. It’s very similar and much cheaper. For hobby projects the free tier is usually sufficient.

It gets much more expensive quite quickly

Not really. I run large enterprise apps on it and it’s cheaper than anything else I’ve used.

The $7 hobby plan is surprisingly useful if you’re smart about what you’re deploying.

Heroku's $7 instances and the $25 instances are not very different. There is also the free instance that sleeps for 6 hours.

You have to factor in that you get free tiers of many things, all managed together like docker would help with, such as:

  - 30 MB Redis memory cache compute instance for free
  - 500 MB Mongo database compute instance for free
  - External logging compute instance for free

and a whole marketplace of all these managed services, with the grouping of containers further managed by heroku.

A Linode VPS at $5/month does not give you all that. If you like configuring all the above (and of course, assuming your use case calls for it at all), then the $5 plan with 1 GB RAM would let you put in a bunch of 256MB containers if you really wanted. The $10 plan with 2 GB RAM would let you put in comparable 512MB containers, but then you should have just been paying $7/month for Heroku already.

Hope that helps!

For a long time I struggled with inability to understand my potential costs for server-side projects, and occasionally read horror stories about other devs who got it wrong.

GKE is head and shoulders above the rest.

EKS is a joke. The only people that use it are those either experimenting, or those that are forced to.

AKS is pretty okay. They're definitely way ahead of EKS. They lack a few things, but they've made some pretty good strides in the last year. Kinda suffers from being part of Azure which likely reflects my personal bias.

Managed k8s or Heroku? I don't understand why people don't try to do AppEngine. It's the oldest "Serverless" platform out there and wickedly mature. If you're worried about vendor lock in (and you should be) I have deployed AppEngine apps on the F/OSS AppScale unmodified.

> I have deployed AppEngine apps on the F/OSS AppScale unmodified.

How did you manage secrets on Google App Engine?

In the flex environment you can inject secrets at container build time. In App Engine standard I've used a deploy wrapper around ansible vault to do it.

> In the flex environment you can inject secrets at container build time.

Does it mean your secrets are stored in plain text in the container image?

> In App Engine standard I've used a deploy wrapper around ansible vault to do it.

What does the deploy wrapper do? Does it produce an app.yaml file with the secrets injected in it, after they have been decrypted by Ansible Vault?

Q: Does it mean your secrets are stored in plain text in the container image?

A: No. It means the secrets are stored as environment variables in the container.

Q: What does the deploy wrapper do?

A: It prompts the developer to input the ansible vault password, decrypts the vault and injects the secrets into the environment.
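To make "injects the secrets into the environment" concrete, here is a minimal sketch of the last step of such a wrapper. It is not the commenter's actual tool: the function name and the secret name are hypothetical, and the decryption step is replaced by a hard-coded dict.

```python
import os
import subprocess
import sys


def run_with_secrets(command, secrets):
    """Run `command` with `secrets` merged into a copy of the current environment."""
    env = os.environ.copy()  # copy, so the parent's environment stays untouched
    env.update(secrets)
    return subprocess.run(command, env=env, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # Stand-in for the decrypted vault contents; a real wrapper would obtain
    # these by decrypting the vault after prompting for its password.
    secrets = {"DB_PASSWORD": "example-only"}
    child = [sys.executable, "-c", "import os; print(os.environ['DB_PASSWORD'])"]
    print(run_with_secrets(child, secrets).stdout.strip())
```

The secrets only exist in the child process's environment, so nothing has to be written into the image or into a file on disk.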

Generally speaking, I follow the 12-factor approach:


What do you mean by "injecting secrets into the environment"?

You mean a section in your app.yaml like this one:

      DB_PASSWORD: "this is a secret"
As far as I know, in the Standard and Flex environments, app.yaml is the only way to define environment variables.

Why should you be concerned about vendor lock in?

I haven't heard a convincing argument why vendor lock in is a problem regarding the cloud.

It can be a problem, but all solutions result in you avoiding the things you went there for in the first place.

There are exceptions to this, obviously, but I find most people worried about vendor lock-in are nowhere near big enough to bother running multi-cloud.

Because you have to accept every change the vendor makes. If they double pricing, you pay twice. If they ban the type of service you host, you have to stop. If they make an incompatible change in some of the hosted parts, you have to adapt. If you grow too much and they refuse to allocate you more resources, or it becomes expensive to use them at your scale, it is difficult to migrate away. That is vendor lock-in: loss of freedom and choice.

neop1x has a great response. A more concrete example would be a gaming company that started out with AppEngine for its server support for the game. Once the game has a proven revenue stream, the economics of moving from AppEngine to AppScale are a critical next step to maximizing ROI. You have all the code, now you just need to host it somewhere. Can't do that with any other serverless Platform.

It's important to note that AppScale is an aPaaS, an API-platform-as-a-service, where you're guaranteed a consistent API with the ability to plug in different implementations. Something even beyond your typical openness in F/OSS software, too.

Broadly agree with this. Used K8S at LastCo: it was great so long as I didn't have to manage it, but actively managing our own K8S cluster was a nightmare. Using Heroku at CurrentCo and it's a breeze. I highly recommend it. The only thing of its ilk that I think could unseat it would be Zeit. I'm skeptical of all things serverless right now, but Zeit looks promising and, most impressively, extremely (perhaps as much as is possible) cost effective.

>When I've deployed on ECS (or other "simple" orchestrators), I've found that I ultimately wound up re-inventing lots of Kubernetes features.

Could you elaborate a little on this part? Our team is looking at ECS/Fargate as a possible container solution and I am curious what you felt was missing from it.

[Update: See below, many of these features have been added to ECS since the last time I touched this portion of our infrastructure.]

> Could you elaborate a little on this part? Our team is looking at ECS/Fargate as a possible container solution and I am curious what you felt was missing from it.

Some typical examples:

- Kubernetes allows you to run a monitoring container on every single node using something called a "DaemonSet". On ECS, you'll have to build all your monitoring tools into your base image, or use cloud-init to spawn an ECS task on each machine.

- You're probably going to end up writing a bunch of scripts to generate ECS task definition JSON and to update running services, and you'll need to integrate this into your CI system somehow. With Kubernetes, you can get away with "kubectl apply -f" for a fairly long time.

- Kubernetes makes it relatively easy to allocate and manage persistent disk volumes. I wouldn't necessarily use them for a production database, but they're great for smaller things.

- Kubernetes has autoscaling support, plus the ability to control which containers run on which types of servers.

- Kubernetes has basic secret management built-in. It's nothing as nice as Vault, but it's good enough to get started.

- Kubernetes has support for a whole bunch of useful minor things that would typically wind up as Terraform scripts on AWS.

And so on. None of these is very hard individually, but there's a ton of things like this. So we're slowly migrating pieces of ECS infrastructure over to Kubernetes so that we can stop reinventing so many wheels.
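For a sense of what those "scripts to generate ECS task definition JSON" tend to look like, here is a minimal sketch. The service name, image, and variables are hypothetical; the field names (`family`, `containerDefinitions`, `environment`) are the standard task definition keys.

```python
import json


def task_definition(family, image, memory_mb, env=None):
    """Build a minimal ECS task definition as a plain dict."""
    return {
        "family": family,
        "containerDefinitions": [
            {
                "name": family,
                "image": image,
                "memory": memory_mb,
                "essential": True,
                # ECS expects a list of {"name", "value"} pairs, not a mapping.
                "environment": [
                    {"name": k, "value": v} for k, v in sorted((env or {}).items())
                ],
            }
        ],
    }


if __name__ == "__main__":
    # Service name, image, and variables are made up for illustration.
    td = task_definition("web", "example.com/web:1.2.3", 512, {"LOG_LEVEL": "info"})
    print(json.dumps(td, indent=2))
```

The output would typically be written to a file and registered from CI, for example with `aws ecs register-task-definition --cli-input-json`, followed by a service update. That second half is where most of the glue ends up.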

Again, for those things which can run on Heroku, either choice is overkill. And I have to admit that ECS is very reliable at the things it does. If you do decide to look at Kubernetes, I highly recommend skimming the O'Reilly books, which provide a solid overview of how it all fits together.

All the things you mentioned are available on ECS:

- https://aws.amazon.com/about-aws/whats-new/2018/06/amazon-ec...

- https://docs.aws.amazon.com/AmazonECS/latest/developerguide/...

- ECS runs on ASG / EC2 so there is auto scaling

- https://aws.amazon.com/blogs/compute/managing-secrets-for-am...

I think you don't understand all the glue between AWS services. Of course ECS doesn't have everything, that's why there is EKS, but all the things above exist on ECS.

The secrets example isn't very comparable. With Kubernetes, the secrets are injected into the container as environment variables or files, whereas the ECS example requires assigning an IAM role to the task and doing a Parameter Store lookup from within the container. This usually requires a custom docker image with an entrypoint to handle that part. The same is true for ConfigMaps, which I believe ECS lacks.

No, you don’t need to do any of that. ECS can inject secrets into environment variables in much the same way as Kubernetes. This behaviour is already built-in, there’s no need to build your own solution for it.
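A minimal sketch of what that looks like in a container definition, assuming the secret lives in Parameter Store. The ARN, account ID, and names below are placeholders, not real resources.

```python
import json


def container_with_secrets(name, image, secret_refs):
    """Container definition whose `secrets` entries ECS resolves into env vars."""
    return {
        "name": name,
        "image": image,
        # Each entry maps an environment variable name to the ARN of the
        # stored secret; the container never sees the ARN, only the value.
        "secrets": [
            {"name": env_name, "valueFrom": ref}
            for env_name, ref in sorted(secret_refs.items())
        ],
    }


if __name__ == "__main__":
    refs = {
        # Placeholder ARN for illustration only.
        "DB_PASSWORD": "arn:aws:ssm:us-east-1:123456789012:parameter/db_password",
    }
    print(json.dumps(container_with_secrets("web", "example.com/web:1.2.3", refs), indent=2))
```

The lookup happens when the task starts, so the task's execution role needs permission to read the referenced parameter, but no custom entrypoint is needed inside the image.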


That's good to know. The parent linked to an AWS blog post that did work that way. The documentation you linked is a new feature since I last used ECS.

Exactly. You can deploy a highly available, auto scaled, logging enabled, spot instance or Fargate multi-service ECS reference architecture here by deploying a “one click” cloudformation stack: https://github.com/aws-samples/ecs-refarch-cloudformation/bl...

Nice! I'd been busy with another project and missed a couple of those announcements. The DaemonSet equivalent will allow me to rip up some particularly annoying cruft the next time I need to touch it. Thank you.

I've looked at various guides for setting up cluster autoscaling on ECS, and so far, everything I've found looks far more complicated than Kubernetes cluster autoscaling on Google. Is there a nice guide?

ECS already has all the things you mentioned here. Some, like daemon set deployments, are built in. Others, like secrets management, are provided as integrations with other AWS services. But they are all there.

If you're comparing to Kubernetes, "everything".

But I'm using ECS and Fargate right now (orchestrated with Terraform). https://rivethealth.com

Probably the biggest thing I would like is native secrets management. "Security groups" are limited as it depends on having an AWS interface per container.

And I don't know how much longer our time on Fargate will last. It's expensive, but more significantly there are limitations in not having access to the host. Having to install an SSH server in every docker container to be able to debug (simple things, like top) is annoying.

We nixed ECS for the same reason, no secrets through ConfigSet.

FWIW did the Fargate approach very happily. I know k8s is getting the love, but I found Fargate easy to get going on, and set-and-forget once deployed.

Why don't more people just use the native machine imaging built into cloud providers? (I would actually like to know this)

Building an AMI on Amazon is not difficult and if you use it you don't have to reinvent so much architecture on top of AWS (or cloud provider of your choice)

> Why don't more people just use the native machine imaging built into cloud providers? (I would actually like to know this)

I've done that, and I've done Kubernetes, and Kubernetes is definitely easier once you get past the initial setup. The initial setup is also getting easier over time.

It is also more portable. Kubernetes runs on multiple cloud providers as well as your own hardware and presents the same interface and runs the same containers. Docker containers are more portable than AMIs.

If you plan to write your own system to control deployments, secrets, load balancing, and DNS based on AMIs and other AWS features, you may want to consider that you are reinventing the wheel. You are also locking yourself into AWS to a far greater extent than you would if you used Kubernetes.

Expertise is also a big differentiator. You can hire people who know Kubernetes on day one, but you cannot hire people who already know your custom in-house system.

For us, it's much more economical to binpack multiple apps onto a VM than to run one service per VM. Also it makes it much easier to scale out particular services once they outgrow your chosen VM size.
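To illustrate the binpacking point, here is a small sketch using first-fit decreasing on memory alone. Real schedulers weigh CPU, affinity, and more; the app names and sizes are invented.

```python
def pack_first_fit_decreasing(apps, vm_capacity_mb):
    """Assign apps (name -> memory in MB) to VMs using first-fit decreasing."""
    vms = []  # each VM is {"free": remaining MB, "apps": [names]}
    # Place the largest apps first; this usually wastes less space.
    for name, need in sorted(apps.items(), key=lambda kv: kv[1], reverse=True):
        if need > vm_capacity_mb:
            raise ValueError(f"{name} needs {need} MB, more than one VM holds")
        for vm in vms:
            if vm["free"] >= need:
                vm["free"] -= need
                vm["apps"].append(name)
                break
        else:
            # No existing VM had room, so start a new one.
            vms.append({"free": vm_capacity_mb - need, "apps": [name]})
    return vms


if __name__ == "__main__":
    # Hypothetical services and memory needs (MB) on 4 GB VMs.
    apps = {"search": 2500, "api": 1500, "worker": 1200, "admin": 700, "cron": 300}
    for i, vm in enumerate(pack_first_fit_decreasing(apps, 4096), start=1):
        print(i, vm["apps"], "free:", vm["free"])
```

With these made-up numbers the five services fit on two VMs instead of five, which is the whole economic argument.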

If your application is 50 microservices, how would you manage that on ec2 instances?

One autoscaling group, AMI, instance (or more than one) per service.

A nontrivial portion of those microservices could probably run on Lambda.

Why does the number of microservices have anything to do with where they might live? Wouldn't that be much more related to CPU load?

Because you generally want to deploy them independently. If you have 50 microservices and have to run an AMI build every time that's going to significantly slow you down. With containers you generally only deploy the service that has changes to a VM that doesn't change as often.

You want them to be independent, atomic. How they are deployed and the choice of technology you choose to couple them with? That should be driven by load -- and dynamic.

The reason everybody's so hot on microservices to begin with is that we kept coupling everything all together and it became a huge mess to manage. You don't want to repeat that mistake, only at cloud scale.

Was using Heroku and moved to GCP. It's almost as nice but not as dead simple. App Engine Standard is now updated for the latest languages (Python 3.7, Node, etc.) and really good; it's also pretty cost effective. Overall would recommend.

Though, there is a middle ground between running vanilla Kubernetes yourself on your own hardware and tying yourself into the specifics of a cloud provider just to get managed Kubernetes. There are really good Enterprise distros like OpenShift or PKS that relieve you from the hassle of running Kubernetes and focus on using it but don't force you onto a particular platform or cloud.

What parts of Kube did you find yourself reinventing with ECS? It’s not the most complex microservice setup, but I run around 60 services over 3 ECS clusters and have found it pretty easy and solid at this point. Only real pain point we’ve had was the limit on Awsvpc mode service per ec2 instance—never have figured out why instances are limited to just a few NICs.

For my side projects I just wrote a simple service that listens for Github events and just checks out the repo, builds it, and ssh's to the target server and runs a "stop and run" script. This way I can even put all of the configuration for the different databases in version control too.
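A rough sketch of the two core pieces of a listener like that, for anyone curious. This is not the commenter's code: the build command, directory, and script name are guesses, and the HTTP server around it is left out. GitHub signs webhook payloads with HMAC and sends the result in the `X-Hub-Signature-256` header as `sha256=<hex digest>`.

```python
import hashlib
import hmac


def signature_is_valid(secret: bytes, body: bytes, header: str) -> bool:
    """Check a GitHub-style 'sha256=<hex>' HMAC signature over the raw body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode(), header.encode())


def deploy_commands(repo_url: str, host: str, script: str = "./stop_and_run.sh"):
    """Commands the listener would run in order; nothing is executed here."""
    return [
        ["git", "clone", "--depth", "1", repo_url, "build"],  # check out the repo
        ["make", "-C", "build"],                               # hypothetical build step
        ["ssh", host, script],                                 # "stop and run" on the target
    ]


if __name__ == "__main__":
    for cmd in deploy_commands("git@example.com:me/app.git", "deploy@app.example.com"):
        print(" ".join(cmd))
```

Verifying the signature before doing anything else matters here, since the endpoint ends up running builds and SSH commands on request.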

It's super dirty, but I can deploy dozens of services without issue. Monitoring is another problem, but I don't need Prometheus for this stuff.

(I couldn't use Heroku cost effectively because running graph dbs and some custom stuff, but Heroku is AWESOME)

> do not attempt to host your own Kubernetes if you can possibly avoid it.

We use Google's Kubernetes and are pretty happy with it, but overall this statement bothers me.

It's absolutely true. Don't get me wrong. The problem is what it says about Kubernetes as software. It tells me it's ugly and crufty and in some ways immature. Good software should not require that much babysitting.

Yes. I think Kubernetes has a lot of really useful features but it is needlessly difficult to use and administer. Unfortunately, it seems to have become the de facto standard, so everyone will be forced to deal with it the same way they have to deal with ancient Unixisms (or worse, windowsisms).

The statement is bothersome because it says more about the person saying it than anything else. If your job is to manage infrastructure and you have to deal with thousands of snowflake deployments by other teams then k8s is a godsend.

I just hate unnecessary complexity. I am not sure if the complexity in k8s is necessary or not. Might be.

The thing is if you keep to the basics Nomad/consul will do the job just fine.

As the fine article points out.

Why is this upvoted? It's just a bunch of rules based on no reasoning or evidence.

I’m surprised no one has mentioned AWS ElasticBeanstalk. It supports Docker (ECS behind the scenes) and is very close to Heroku in terms of functionality but with EC2 pricing.

Beanstalk is a joke.

You can launch a DB through EB but the docs basically state it's a bad idea (as it gets taken down when you delete your app, which you might find yourself doing if EB gets in to an unrecoverable state).

So now you have to manage EB + RDS separately, which should be automated so now you need CloudFormation to add the security groups and manage the vars for the DB connection properties in your EB.

Beanstalk is nice for rapid development. The problem is that it doesn't scale for nontrivially architected apps. I've converted failing Beanstalk deployments to real orchestrators a few times.

Heroku is really a different solution. I don't know anyone on a medium-scale project that uses Heroku; Heroku is like Digital Ocean compared to AWS.

You can't really compare DO and Heroku with k8s and AWS; the latter are much, much more powerful.

I work for an occasional competitor. Heroku are legit.

"Finally, don't overcomplicate it. Just use the basic stuff for as long as you can before trying to layer all sorts of other tools over it."

This is I think always the best advice for almost anything.

And we all know it. And yet so often, we ... can ... not ... resist.

This starts with a lot of "you don't need Kubernetes" and then concludes with a pretty compelling argument in favor of using Kubernetes.

From the "The Nomad ecosystem of loosely coupled components" section:

> It integrates very well with other - completely optional - products like Consul (a key-value store) or Vault (for secrets handling).

> At trivago, we tag all services, which expose metrics, with trv-metrics. This way, Prometheus finds the services via Consul and periodically scrapes the /metrics endpoint for new data.

> The same can be done for logs by integrating Loki for example.

> Trigger a Jenkins job using a webhook and Consul watches to redeploy your Nomad job on service config changes.


> Use Ceph to add a distributed file system to Nomad.

> Use fabio for load balancing.

And the icing on the cake:

> All of this allowed us to grow our infrastructure organically without too much up-front commitment.

So if I understand correctly, the author (and his team) preferred to do all the work of integrating/testing/debugging those components, rather than using a tool that provides every single one of those features, and more, out of the box.

Kubernetes isn't a trivial lift but it's a damn sight easier than trying to roll a cheap imitation yourself.
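For reference, the tag-based discovery quoted above boils down to filtering the service catalog by tag. A minimal sketch, using sample data shaped like the response of Consul's `/v1/catalog/services` endpoint (service name mapped to a list of tags); the service names are made up:

```python
def services_with_tag(catalog, tag):
    """Names of services in a Consul-catalog-style mapping that carry `tag`."""
    return sorted(name for name, tags in catalog.items() if tag in tags)


if __name__ == "__main__":
    # Sample data only; a real setup would fetch this from the Consul agent.
    catalog = {
        "consul": [],
        "search-api": ["trv-metrics", "http"],
        "image-resizer": ["http"],
        "pricing": ["trv-metrics"],
    }
    print(services_with_tag(catalog, "trv-metrics"))
```

In practice Prometheus's own Consul service discovery does this filtering for you, so the sketch only shows the idea, not something you would need to write.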

The point is that Nomad allows you to add more components as they are needed while Kubernetes is an all-or-nothing solution. Also, we used tools like Jenkins and Ceph before. They have been tested and we trust them so there's no point in replacing them. Quite the contrary: a lot of internal processes depend on them so it would have been painful to migrate away from them.

Nomad is not a cheap imitation of Kubernetes, it is a simple orchestrator which favors composability over an all-in-one approach.

> Nomad is not a cheap imitation of Kubernetes

Nomad isn't a cheap imitation of Kubernetes but all these components taped together are. Kubernetes is no less composable than a system built on Nomad, it just includes more functionality from day one.

If you're running containers with an orchestrator like Nomad, at some point, you'll need DNS. So you do it yourself and then you're managing Nomad and DNS. Then (we'll assume, because you're using a container orchestrator to manage multiple services) you'll need service discovery, so you write some jobs and event handlers to glue together Nomad and your DNS solution. And then, because you're a team of responsible people, you'll want to store secrets securely, so you graft in Vault. And then, you realize zero-downtime config changes would be great, so you slap Consul in there and write some sidecars or library code to handle config updates. Then metrics. Then logs. Then rolling deployments. And it continues, indefinitely, as you add features.

If you started doing this five years ago, fine. If you start doing this today, you're out of your mind. You're just doing work for the sake of "but it's composable". There's a reason why teams still build applications on frameworks like Rails and Django even though they don't need half the features--it's more important to them to get something functional up and running than it is to satisfy delicate sensibilities about only using what's needed. Kubernetes is the analogue in the world of systems and operations.

your description of "grafting in" consul and vault _is_ exactly what it feels like to do that with kubernetes, and it happens rather often.

Using industry standard components like consul and vault is not really a second thought for k8s in production, leaving you with duplicated hunks of infrastructure to step around, where the idea of tacking on kube-dns and kubernetes ~secrets~ onto something else is rather laughable. This, again, is the point which was being made -- you're forced to bear the brunt of kubernetes' NIH.

I'll assume by the contrived situations of inventing some wacky custom mousetrap to bind nomad to dns rather than using the "slapped in" (https://www.nomadproject.io/docs/configuration/consul.html) consul integration for dns, or writing config update/logs/rolling deployment code rather than using the core nomad scheduler features to do that, that you don't actually know, and this is FUD?

We use nomad because its flexibility in this regard lets us slowly migrate away from our old infrastructure. Moving fully to Kubernetes would not be politically feasible in the organization.

This is a post about why I don't need kubernetes. Not a post about why kubernetes isn't politically feasible in not my organisation.

I've started heavily using Nomad recently and it's a real joy. Adding things in is just so easy because it doesn't do much and (mostly) everything is very well defined. I cannot suggest it highly enough.

An example is I set up a log pipeline recently spanning multiple data centers with full mTLS. It wasn't that hard and thanks to Vault all the certs are refreshed at regular intervals all over the place. Pretty great!

Came here to say this. Nomad is the perfect entry system for getting started with a small cluster, for getting to know how container orchestration works and what things to keep in mind when migrating all legacy applications to fit into this type of deployment flow. Nomad allowed us to spin up a 3-worker-node "cluster" with consul in a matter of hours, and it has not needed any maintenance since. With Kubernetes we couldn't even agree on which of the many bootstrapping approaches and scripts du jour to use, much less any of the other many decisions that have to be made and will overwhelm you.

I'll name drop CircleCI here as a nomad user in case it piques anyone's interest. Need to look into it more myself, but have always been impressed with Hashicorp tools.

+1 for Nomad, I'm extremely impressed with all of the Hashicorp suite.

Single binary deployments and upgrades are helpful.

Please, please stop using the single binary argument against kubernetes. Kubernetes has hyperkube, which is a single binary. It's not magically better by having a file with a giant switch statement trying to figure out what you want, rather than a few small ones with small switch statements.

And the reason nomad's single binary is so small is because it doesn't do nearly as much. I'd rather have a platform that I can do things later that I don't know I need now.

even Terraform?

I manage everything soup to nuts with Terraform. If you know something vastly better please let me know, especially considering the changes to 0.12. It's a very well integrated system.

what kind of infrastructure are you managing? if you’re in the aws cloud, CloudFormation is the gold standard for “infrastructure as code”.

canonical maas, ns1, aws, nomad, consul, vault, and a handful of other providers

says you

Terraform does not automatically rollback in the face of errors. Instead, your Terraform state file has been partially updated with any resources that successfully completed.

do you know who can rollback and leaves your infra in a consistent state? can you guess?

yea that is surprising that you can't roll back when you first start with terraform but as you gain more experience with it, you realize that not rolling back means you can resume instead. And if you need to roll back, you can do so by just running destroy. It's actually a feature, not a bug.
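A toy model of the "resume instead of roll back" behaviour, to make the argument concrete. This is not how Terraform is implemented; the resource names are hypothetical and the "cloud API" is a stub that fails on demand.

```python
def apply(plan, state, create):
    """Create each planned resource not already in `state`; stop on first failure.

    Successes are recorded in `state` as they happen and are never undone,
    so calling apply() again resumes where the last run stopped.
    """
    for resource in plan:
        if resource in state:
            continue  # already created by an earlier run
        create(resource)  # may raise; earlier successes stay in state
        state.add(resource)
    return state


if __name__ == "__main__":
    plan = ["network", "database", "app_server"]  # hypothetical resources
    state = set()
    outage = {"database"}

    def create(resource):
        if resource in outage:
            raise RuntimeError(f"cloud API failed creating {resource}")

    try:
        apply(plan, state, create)
    except RuntimeError as err:
        print("first run:", err, "| state:", sorted(state))

    outage.clear()  # the transient failure goes away
    apply(plan, state, create)
    print("second run state:", sorted(state))
```

After the failed first run the state holds only what actually got created, and the second run picks up from there without recreating it. Whether that partial state counts as "consistent" is exactly the disagreement in this subthread.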

no it’s not. i want to leave my infrastructure in a consistent state. i am in state A and want to move to state B. I want it to work. I don’t want a half-assed attempt to make it work.

what does terraform bring to the table? I have to use HCL to describe my infrastructure in terms that are NOT cloud agnostic (therefore introducing another layer) and in the face of adversity it throws its hands in the air and now you’ve got to figure out what went wrong, manually, by yourself. This is what I call True Devops (TM).

I have seen Terraform crap out and it cannot recover. It cannot move forward, it cannot rollback, it cannot destroy. It’s stuck. At that point you start praying that someone really understands the underlying cloud + knows the shenanigans terraform plays to fix it now and also make terraform happy moving forward.

we’re talking basic stuff here.

i don’t want to go into more advanced issues like: losing network connectivity, terraform process crashing (think oom conditions) or being killed or non-responsive cloud apis.

not to mention that destroying infrastructure you’ve created almost never works (unless it’s trivial infrastructure).

based on what I’ve seen up until now I would not use terraform in a production environment.

If I had experienced what you just described, I would probably have the same opinion - but after the initial learning curve, I haven't really had any of the problems you've listed. The only times I've had to go manually modify cloud resources to fix something was always because I was doing it wrong in the first place.

On the other hand, CloudFormation is not perfect either. The rollback does not work 100% of the time and I've had it roll back a set of templates that took 45 minutes to deploy because there was some inconsequential timeout that could have been ignored. I've also had pre-built templates developed by AWS outright fail, which is strange considering AWS themselves built it.

Use what works best for you and your team.

so I ran into the issues I’m mentioning while test driving it.

i have never experienced unpredictable behavior from CloudFormation, but it’s possible YMMV.

Can you recommend any up to date best practices or guides for the ecosystem? I'm looking at the Hashicorp ecosystem for our new infrastructure.

This should get you started: https://learn.hashicorp.com

It does but leaves critical questions unanswered without already having an in-depth knowledge of the system.

If I want to use Vault and Nomad, should they share the same Consul cluster?

Should I deploy my Vault and Consul servers via Nomad?

The guidance on server/cluster size is hard to use. They only have 'small' and 'large' with no reference for what constitutes those sizes.

I have more questions as I'm going through this right now, but those are just a few off the top of my head.

I’m no expert but I would say Vault and Nomad should share the same consul cluster.

Don’t deploy your Consul servers with Nomad as that will create cyclic dependency if I’m not mistaken. The same should be true for Vault.
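The cyclic dependency is easy to see if you write the start-up dependencies down as a graph. A small sketch using `graphlib` from the standard library (Python 3.9+); the dependency edges are an assumption about a typical setup where Nomad and Vault both rely on Consul, not something taken from HashiCorp's docs.

```python
from graphlib import CycleError, TopologicalSorter


def start_order(depends_on):
    """Return a valid start order, or raise CycleError if none exists."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    # Each key lists what must already be running before it can start.
    sane = {"nomad": {"consul"}, "vault": {"consul"}, "apps": {"nomad", "vault"}}
    print(start_order(sane))

    # Running the Consul servers as Nomad jobs adds the edge consul -> nomad.
    consul_on_nomad = dict(sane, consul={"nomad"})
    try:
        start_order(consul_on_nomad)
    except CycleError as err:
        print("no valid start order:", err.args[1])
```

With Consul outside Nomad there is a clear order (Consul first, apps last). Once Consul depends on Nomad, a cold start of the whole cluster has no place to begin.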

So many comments here are touting the benefits of this, that or the other, but aren't mentioning Nomad at all. Nomad is definitely worth a try even for large deployments. Yea, it doesn't autoscale without 3rd party tools, but you don't always need that.

If you just need to keep your jobs running, load balance to those apps, use externalized config and need a simple UI to view system status, then Nomad/Consul/Fabio really works great.

I've been running SmartOS containers for 6+ years and it's the best of all worlds. Easy service management, works with regular binaries or Dockerized deploys, SDN always built-in, service discovery built-in (CNS), even Windows VMs available. Best kept secret in the cloud even though the entire stack is open-source.

On the orchestration side I use Puppet Pipelines (formerly Distelli) which works with standard VMs, SmartOS containers, Docker, Linux images, from packaging through testing and deployment. And they just lowered pricing (!??!)

There's no reason to be locked into Kubernetes unless you're sure you need it. SmartOS scales in seconds. As CTO I'm always checking out the next platform, but all the newer solutions look much more complicated.

EDIT: Heroku has weird errors when you push CPU or RAM, also the CPUs aren't so fast, Linode can't be trusted, Digital Ocean is OK but is still very manual roll-your-own, AWS is a behemoth and medium fast, Google is fine but specialized and hard to migrate away from, OpenStack isn't for small orgs... I've run production services on them all. It's easy to try a new service with a Pipelines-style orchestration because it doesn't care which service it's talking to. And it makes it obvious when platform-specific instructions or concessions are required.

I was also on SmartOS for a long time; however, with what happened to most of us old timers when Sun went down in the back of my mind, and with the way things are looking in illumos land -- we finally decided it's not a good bet for us to be dependent on a single company for our OS anymore.

While I'm sure Samsung will keep Joyent/SmartOS kicking for now, I'm uneasy being dependent on them. We've since started a migration over to FreeBSD for our infra hosts which do all the same things (mostly running linux and bsd guests, zfs tricks and so on) and the experience has been largely positive. No numbers to back it up, but I've observed performance is a bit better and having more options than ipfilter in the base system is very welcome.

YMMV of course, but you may want to consider an extra basket for all those eggs!

Thanks for your insight.

Why can't Linode be trusted?

They've proven many times that they're not. Read through slashdot and google for "linode bitcoin stolen"

This looks interesting, I have a few questions. How do you deal with Linux dependencies etc.? All your applications must be BSD compatible?

Any advice for getting started with it on DigitalOcean? Or is nested out of the story?

Oh, SmartOS containers run Linux seamlessly too. I guess I take that knowledge for granted now. But many many of the popular cloud tools (hashicorp consul and vault, redis, couchdb, arangodb, postgres) run natively on SmartOS and have better resource and user management from the underlying system.

I haven't used Digital Ocean's kube offering. Otherwise it's much like Linode or other services from that era -- install your own distro and binaries, which needs some kind of orchestration automation to scale.

Do you think you can do a blog post or tutorial for running SmartOS on the popular providers and getting to the point of deploying a basic Rails app with PostgreSQL in Docker etc.? The SmartOS documentation looks ...sparse, and there aren't many recent blog posts on SmartOS, most existing ones are from <2017.

SmartOS runs on Hetzner, OVH, EveryCity, MNX, GigaVPS, and more. The nice thing about Solaris: the system doesn't change too rapidly, so the documentation doesn't need to change either.

On a SmartOS container, Rails and Postgres can be installed from the package manager and automatically configured as a service. Running a Docker image on Joyent Triton cloud is just a one-line CLI command.

SmartOS is built for the cloud, and is usually semi-managed by the provider. If it's worth your effort to run a separate SmartOS cluster, you'd know. But there is also Project FIFO for that:


Common Lisp does not run in LX zones.. at least Clozure common lisp doesn't.

Apparently Clozure does something 'ungodly' to the signal stack which just throws the kernel and ends up crashing out. Chances are, these days anyway, most people aren't running common lisp, so it probably isn't much of a problem - plus Clozure runs perfectly fine under Solaris. As a result it's probably not high on anyone's list to sort out.

It was true that LX didn't initially handle what Clozure does with signals, but I recall adding a mechanism to work around the busted behaviour and I'd expect it to work now. Certainly in some basic testing of CCL things appeared dramatically improved, though this was some time ago now.

I think we tried it just a few weeks ago and it still had the problem. If I recall correctly (and it's very possible I don't, but I can check at work tomorrow), our test is to parse some json with the cl-json library. That library relies heavily on using signals (common lisp signals) and restarts for its control flow.

For most small deployments you are far better off using Ansible playbooks or similar solutions. Declarative orchestration management seems, for most relatively small and medium-sized deployments, a black box with an obscenely high learning curve which isn't justified if your deployment and scaling needs fall within the vast majority of use cases.

If you already know how the declarative management system works (and if there are other people out in the world that also know it, as is the case for Kubernetes), it seems like that is just creating yourself a less-reliable tarpit for no real reason, especially if you have to learn Ansible to do it.

If someone on a small team brings K8S expertise to the table, it is worth considering. Otherwise, yeah, if you're at the stage where your team is focusing on building the product, then K8S is going to be a distraction.

> If someone on a small team brings K8S expertise to the table, it is worth considering.

Perhaps you have some unstated assumptions here like a small team managing a relatively large infrastructure for their team size? Or, a small team in a much larger infrastructure department.

If it was me, I'd put lots of disclaimers around a single person bringing any kind of specialist expertise to the average small team. I think you need to lean on that person to level the whole team up. If there is any chance of that individual leaving you risk leaving the team high and dry.

You can use managed Kubernetes. And that person with expertise would mostly influence design decisions in order to write software that is easily deployed to Kubernetes and orchestrated in containers. They wouldn’t be a devops person taking care of a k8s cluster; obviously that makes no sense for a small team.

It is what I am doing for my team. And yes, if I leave, it would leave the team in a bad state. I have been incrementally cross-training two of the other engineers on this.

We’re not managing our own K8S cluster. I did that once by hand at another place (before kubeadm), enough to know that managed GKE on GCP is a great thing. Also did my own single-node K8S for dev work. Nanokube, and later minikube, made that much easier.

I used to do stuff with Capistrano and Chef. Enough to know that once you know the core primitives, it is easier just to use K8S.

Some fair points are made here, and I think running Kubernetes the hard way, without a managed service, does introduce a lot of complexity. However, it's not that bad when using a managed service such as EKS or GKE.

Is there any significant difference between e.g. `kubeadm` and `gcloud container clusters`?

I believe when the cluster malfunctions, you're still on your own figuring that out. I had a stuck node on GKE just recently, which broke our CI. The GCE machine was there, but the node wasn't visible with kubeadm, and a quick SSH into the VM didn't reveal any obvious issues. Auto-repair was enabled but hadn't worked, and TBH I have no idea how I should have diagnosed why it didn't (if there's even a way).

Thankfully, this was an issue with the node, not the master, and the CI nodes are all ephemeral, so I quickly dismissed the idea of debugging what went wrong and just reinitialized the pool. Could've done the same with bare metal.

I don't know. I haven't had any random failures using EKS. Anything crashing was of my own doing.

I did an eval about 4 years ago of schedulers and service discovery software.

Lots of cool stuff around (dc/os, triton, hashi, k8s, flynn etc)!

After 2 days with k8s and not so much as a bootstrapped cluster, the turn went to HashiCorp.

Within 10 min I had a container running, registered in service discovery (Consul) on my laptop.

We ended up using the HashiCorp stack in production with great success. Sure, some custom services were needed (service management, scaling and such).

Running primarily on-prem, the sheer simplicity and unopinionated approach is an edge.

It allowed us to implement MACVLAN at first, then the Ubuntu Fan network, and to integrate fully with existing infrastructure.

Now having spent the last 8 months implementing k8s I’m torn.

I’ve built from scratch, used kubeadm, kops and now EKS. From 1.13 kubeadm honestly works really well and is my preferred way of deploying k8s.

Still it’s a beast... running large deployments with many teams... there’s just so much... stuff.

One GH issue leading down a rabbit hole of other GH and gists with conflicting or not working configurations. I’ve had co-workers bail on the project after mere hours of digging through code and GH discussions and SIG docs.

Nomad/Consul and its concise docs are a breeze in comparison.

Torn. Cause I see the point of k8s, just not sure about the project. :)

I'm very much in the camp of "Kubernetes is a problem factory".

Personally, I've found very few problems that it solves and more problems that it creates. That said, it's probably a great time to be a Kube consultant!

if your team is only a handful of devs who work on the thing that pays the bills + do k8s, yes. bad time. the only way i could justify running this is if you’re using a managed solution like gke or eks.

otoh, if you’re in the cloud, you already have the primitives to build your stuff w/o the k8s headache (vms, autoscaling, managed svcs). but that’s just my opinion. the koolaid is strong.

Honestly, if you have a $GENERIC_SYSTEM that you deploy your code to, you don't need the wasteful overhead.

If you need to (vs want to) rebuild your OS/Environment regularly, you're doing it wrong. IE: PaaS is for you. That said, I've found replicating OS & Environment extremely challenging on certain operating systems so I understand the appeal. I also don't build products on those operating systems and I've found myself much happier :)

yeah. it depends a lot and k8s has its place in some scenarios, but in most cases you don’t need it and its use is purely driven by koolaid

The thing that all the "you don't need kubernetes" articles seem to ignore or skim over, is the huge network effect of a popular setup like k8s.

Not only is it much easier to find services (like helm) and articles for k8s, but there are so many ready to go configurations. For most major services, there is already a ready to go helm chart or at least some well vetted yml config on GitHub to start from.

And most any issue I have come across, I have been able to search and find solutions.

The thing the 'there's a ready to go helm chart' folks always seem to skim over is that outside of an absolute startup position where you can curlbash to hello world heaven, you will have integration to do, often huge amounts of customization (read: complete rebuild) to do to take a toy helm app configuration to a production configuration, and I've never seen even so much as a nascent k8s (or nomad, mesos, etc.) buildout leveraged in a preexisting environment without quite a bit of pipeline, acl/naming convention and ux glue.

The proliferation of operators, CRDs, things like k3s and the growing ecosystem of vendor distributions makes 'stock k8s' an increasingly nebulous target.

I think CRDs are mostly going to be a quagmire. It's Wordpress plugins all over again, if Wordpress plugins could also affect the OS kernel.

Haven't looked into helm-charts deeply, but the ones that I found often needed a considerable amount of time to configure for production. For example, the Logstash chart uses a YAML config whereas we use Logstash's standard config format with many filters and options. Porting these features over to YAML is not a lot of fun. And in the end we would also have to maintain these configs and keep them in sync with upstream changes.

Speaking of documentation, I sometimes feel a bit overwhelmed by the amount of literature on Kubernetes these days. Many articles are quite outdated already and new ones get written every day. It's hard to know what the best practices are - and the ecosystem is still rapidly evolving.

So I don't think I was skimming over these parts, I just think that inertia can also become a disadvantage.

As well as hiring engineers who will be familiar with it from day 1, as opposed to having to learn whatever custom system you use.

This. In a perfect world, it should be more useful to say to an employer, "I know Kubernetes" than "I know AWS". It's a unified abstraction for cloud resources. They all, more or less, do the same thing. Why learn the proprietary, vendor-specific way to do it when you can learn it once and apply your knowledge to any cloud, e.g., if you switch jobs and your new one uses a different cloud provider?

Unfortunately, Kubernetes isn't a unified abstraction, because vendor-managed Kubernetes setups are not very similar (to each other, or to an unmanaged solution). Somebody who "knows Kubernetes" but has only worked with GKE will not be very well prepared to operate a bare-metal Kubernetes cluster.

In my experience, pods, deployments, and load balancers work similarly enough to not cause any discernible problems (with basic web application processes, like a Dockerized Node.js server) on AWS, GKE, and Digital Ocean. But that's only my experience, and there are more managed offerings than just those few. Even if it's not fully unified, it's unified enough for the use case of basic scaling of a process up/down.
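For anyone wondering what "unified enough" looks like in practice, here is a rough Python sketch that builds the two objects in question, a Deployment and a `LoadBalancer` Service, as plain dictionaries and prints them as JSON. The app name, image, and ports are placeholders I made up; the field names are the standard `apps/v1` Deployment and `v1` Service ones, which is the part that stays the same across providers. What each cloud provisions behind the load balancer is not shown and does differ.

```python
import copy
import json


def deployment(name: str, image: str, replicas: int, port: int) -> dict:
    """Build an apps/v1 Deployment manifest for a single-container app."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": dict(labels)},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {
                    "containers": [
                        {"name": name, "image": image, "ports": [{"containerPort": port}]}
                    ]
                },
            },
        },
    }


def load_balancer(name: str, port: int, target_port: int) -> dict:
    """Build a v1 Service of type LoadBalancer that selects the pods labelled app=<name>."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "LoadBalancer",
            "selector": {"app": name},
            "ports": [{"port": port, "targetPort": target_port}],
        },
    }


def scaled(manifest: dict, replicas: int) -> dict:
    """Return a copy of a Deployment manifest with a different replica count."""
    result = copy.deepcopy(manifest)
    result["spec"]["replicas"] = replicas
    return result


def as_list(*items: dict) -> dict:
    """Wrap several manifests in a single v1 List object."""
    return {"apiVersion": "v1", "kind": "List", "items": list(items)}


if __name__ == "__main__":
    # Placeholder name and image, purely for illustration.
    app = deployment("node-server", "example/node-server:1.0", replicas=2, port=3000)
    svc = load_balancer("node-server", port=80, target_port=3000)
    print(json.dumps(as_list(scaled(app, 4), svc), indent=2))
```

Assuming a working cluster and `kubectl` context, the output can be piped to `kubectl apply -f -`, and scaling up or down is just a change to the `replicas` number.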

And, if you'll indulge me here, I was referring mostly to managed k8s offerings. Because that's what the vast majority of people would use, if they chose to go in on k8s at all and want to get out the door quickly.
