There is a learning curve to Kubernetes, but it isn't this sharp. It does require you to sit down and read the docs for 8 hours, but that's a small price to pay.
A few months back I wrote a blog post that, by walking through the few different infrastructures my company experimented with over the years, surfaces many reasons one would want to use [a managed] Kubernetes. (For a shorter read, you can probably start reading at )
Any technology you adopt today is a technology you're going to have to troubleshoot tomorrow. (I don't think the 15,000 Kubernetes questions on StackOverflow are all from initial setup.) I can't remember the last [application / service / file format / website / language / anything related to computer software] that was so simple and reliable that I wasn't searching the internet for answers about it (and banging my head against the wall because of it) the very next month. It was probably something on my C=64.
As Kernighan said back in the 1970's, "Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?" I've never used Kubernetes, but I've read some articles about it and watched some videos, and despite the nonstop bragging about its simplicity (red flag #1), I'm not sure I can figure out how to deploy with it. I'm fairly certain I wouldn't have any hope of fixing it when it breaks next month.
Hearing testimonials only from people who say "it doesn't break!" is red flag #2. No technology works perfectly for everyone, so I want to hear from the people who had to troubleshoot it, not the people who think it's all sunshine and rainbows. And those people are not kind, and make it sound like the cost is way more than just "8 hours reading the docs" -- in fact, the docs are often called out as part of the problem.
If you want a gentle, free introduction to OpenShift (our Kubernetes distribution), I recommend trying out the Katacoda portal, Learn OpenShift. Katacoda also has vanilla Kubernetes lessons.
What a great quote. Thanks
Any fool can write code that a computer can understand. Good programmers write code that humans can understand. – Martin Fowler
Just because people tell you it can’t be done, that doesn’t necessarily mean that it can’t be done. It just means that they can’t do it. – Anders Hejlsberg
The true test of intelligence is not how much we know how to do, but how to behave when we don’t know what to do. – John Holt
Controlling complexity is the essence of computer programming. – Brian W. Kernighan
The most important property of a program is whether it accomplishes the intention of its user. – C.A.R. Hoare
No one in the brief history of computing has ever written a piece of perfect software. It’s unlikely that you’ll be the first. – Andy Hunt
Somehow I like being stupid and using technologies that allow me to be stupid and still work well enough.
I didn’t say anything about being a genius or being stupid. But it does help to have something to reference.
In short, there are several things that help.
1) Start with 3 Master nodes that sit behind an Enterprise Load Balancer (or even HAProxy) and a VIP (console.example.com)
2) We choose to have 3 Infrastructure nodes that perform Container load balancing (app routers), log aggregation, metrics collection, and registry hosting
3) We then put another VIP and Load Balancer in front of Application Subdomain (*.apps.example.com) so that apps can be exposed outside the cluster (myjavaapp.apps.example.com)
4) Stand up 3 or more worker nodes
An "8 hours" initial investment is already large for a small/simple scenario, and it's not the full cost either, because fixing problems will be much harder and more time-consuming.
I wrote a post about this just the other day:
> What does it mean for a framework, library, or tool to be “easy”? There are many possible definitions one could use, but my definition is usually that it’s easy to debug. I often see people advertise a particular program, framework, library, file format, or something else as easy because “look with how little effort I can do task X, this is so easy!” That’s great, but an incomplete picture.
> Abstractions which make something easier to write often come at the cost of making things harder to understand. Sometimes this is a good trade-off, but often it’s not. In general I will happily spend a little bit more effort writing something now if that makes things easier to understand and debug later on, as it’s often a net time-saver.
We actually had similar issues when we deployed k8s. In the end it turned out to be a misconfiguration, but it took weeks to figure out, and only because our entire dev team looked at it (and not just the k8s guru who implemented it all).
Kubernetes is far too complex to set up from scratch. The only way to reduce the complexity – or rather, to offload it – is to use managed Kubernetes via AWS or GCE. The fact that using AWS or GCE is effectively the only viable method for running a production Kubernetes cluster speaks volumes to how non-simple the stack truly is.
You'll get lots of SMEs that can use a product but can't trace or profile, and it isn't the underlying product's fault beyond hype-driven development.
If you want to spin up ephemeral environments to test your code, kube is one of the easier ways to achieve this. Similarly, it helps you solve problems such as app health, dying nodes, etc.
It's a great product, especially when you've scaled past a single app and a single team, but that's largely because it's a framework, similarly to how rails is, and if you do something serious with it, you need to know how it works.
The alternative is you build your own framework, but your new hires will prefer k8s.
Realistically though, use GKE and forget about it until you grow. Otherwise you're using something closed or something bespoke, the latter being fine for a 1-2 man team, but kinda pointless when you can have a managed service.
These are far from the only two options available.
"Any sufficiently complicated C or Fortran program contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp."
> “Nobody has ever built cloud-native apps without a platform. They either build one themselves or they use Cloud Foundry.”
> it's fun and sometimes enlightening to let the debate be framed this way
Not really. It's fairly close-minded.
The first line in the first chapter is more like an "ok, full disclosure, this is advertising material", but if you read the full text, the Pivotal suite of tools is hardly mentioned. It merely paints a picture you can understand to demonstrate that some kind of platform is definitely needed for basically any enterprise of nontrivial size. (Now if you want to read the real promotional material, get yourself a copy of "Cloud Foundry: The Definitive Guide".)
This book is only advertising in the sense that if its message is delivered successfully, you will concede that you should use a platform, and we will both agree on the meaning of the word platform. That's it. It's only a bit tongue-in-cheek, since Pivotal actually makes such a platform, that it says on page 1 "our platform is the best and only platform". The rest of the book isn't at all like that.
But if you take it in context for the date of publication, it might make more sense that it was framed this way. Obviously it does not pre-date Heroku. It is a different platform than Heroku. But it is a platform, and a cloud-native one. It's an example of "why you might not need Kubernetes." Also might help to note that this book was published in 2016.
It might be (read: definitely) harder to assert in 2019 that there really isn't any other platform worth considering (than CloudFoundry in 2016), but for a medium-large enterprise in 2016 I'm honestly not so sure. Kubernetes was still new on the scene. People in 2016 in large part still needed to be convinced that such a platform was needed, or even possible, since few existed in the wild/open source commons. (Name your favorite platform that is older than 2016, if you are still in disagreement with the thesis. There are surely several right answers, and I seriously don't doubt this at all.)
To take the contrary position to the extreme a bit: present company included (the original poster too), I think nobody serious (I hope) is suggesting "you may not need Kubernetes, and there is also no serious contender which you should consider in the race for a thing like K8S that you may need for your enterprise, either." Even the original post recommends something instead (Nomad). There is a decent chance you are already sold on this idea, if you're reading this article.
The central argument isn't that you should definitely pick Nomad, it's that you really do need a platform that works for you, even if it only gets you 80% of the way there. The author of "Maybe You Don't Need Kubernetes" started out explaining why Kubernetes just wouldn't work for them. The book, just like this article, comes off a bit like "hey, why not this product instead of whatever it is you're doing there."
But just a little. I strongly agree myself – "In 2019, pick something, anything. (Even Nomad) Just don't do nothing."
However, I've been dealing with its ins and outs for a while now, partly because I'm developing tooling around it, and I've found there are already a lot of idiosyncrasies in there. Some APIs have very frustrating flaws, and most annoying of all is how tightly some core functionality is coupled to kubectl. A remarkable amount of logic is embedded in the CLI, as opposed to the API layer, and if you really get in the weeds you're likely to start pulling your hair out.
Which is to say, even once you get through the learning curve, you may still find yourself struggling with it as you use it. Sometimes when you declare your intent, you find that it just doesn't work, and it can take a while to figure out why. Before you know it, you're using various operators that are meant to mask some of the deficiencies in the core orchestrator, and by extension you're outside of the core abstractions. And don't get me started on Istio... (but that's a tangent).
Anyhow. If you're not trying to do anything unusual, it's a good orchestrator. Not perfect, and for sure heavy, but a good choice for a lot of use cases.
I’ve actually started to wrap calls to kubectl to not have to replicate code and logic taking place in the CLI.
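For anyone wondering what wrapping kubectl can look like, here is a minimal Python sketch. It assumes `kubectl` is on the PATH and already pointed at a cluster; the helper names `kubectl_argv` and `kubectl_get` are made up for illustration and are not part of any library.

```python
import json
import subprocess


def kubectl_argv(resource, namespace="default"):
    """Build the argv for `kubectl get <resource> -n <namespace> -o json`."""
    return ["kubectl", "get", resource, "-n", namespace, "-o", "json"]


def kubectl_get(resource, namespace="default", run=subprocess.run):
    """Run kubectl and return its JSON output as a Python dict.

    `run` is injectable so the wrapper can be exercised without a cluster.
    """
    result = run(
        kubectl_argv(resource, namespace),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)


# Only builds the command here; calling kubectl_get needs a real cluster.
print(kubectl_argv("pods", "web"))
```

The point of shelling out rather than calling the API directly is the one made above: whatever logic lives in the CLI gets reused instead of reimplemented.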
And istio... I can’t even... envoy is awesome though!
The people saying it's simple are looking up from the world of IaaS and non-declarative orchestration software (Ansible, Puppet etc) and saying "hey this is much easier".
The people saying it's complicated are looking down from the world of PaaS and FaaS (Heroku, AWS Lambda, Google App Engine) and saying "why the hell would I want to manage this?"
1. K8s of 2019 is significantly better than K8s of 2017. With stable Deployments, StatefulSets and CronJobs and binaryData in ConfigMaps etc., I haven't missed a single feature in my latest setup: I just typed away helm'd manifests and things just worked. This is in stark contrast to the dance around missing features with initContainer and crond container kludges you had to do in 2017, when I built my first kubernetes setup.
2. People tend to conflate maintaining a k8s cluster with using it. Setting up your own k8s cluster with HA masters is likely still a royal pain (disclaimer: my attempts at that are from 2017 so I could be wrong, see point 1). But for a small company, whipping up a cluster from GKE is a breeze, and for big corps, company-wide cluster(s) set up by an ops team (OpenShift has been ok) is the way to go. The end result is that as a developer you just push containers and apply manifests.
False dilemma. Ansible and Puppet are great tools for configuring kubernetes, kubernetes worker nodes, and building container images.
Kubernetes does not solve for host OS maintenance; though there are a number of host OS projects which remove most of what they consider to be unnecessary services, there's still need to upgrade kubernetes nodes and move pods out of the way first (which can be done with e.g. Puppet or Ansible).
As well, it may not be appropriate for monitoring to depend upon kubernetes; there again you have nodes to manage with an SCM tool.
Last time I sat down with k8s docs it was a broken mess.
It was impossible for me to follow any of the examples I tried all the way through.
However, this is pretty much my experience using GKE on Google Cloud. You can get a cluster up and running and start deploying stuff to it in a matter of minutes.
I like Kubernetes but it is not without problems. And when your entire business is now leaning on Kubernetes, it can really hurt when it has problems and you can’t figure it out quickly.
And if your expertise is not running infrastructure, a lot of people, as this post explains, are better off on AWS or Google Cloud using managed services.
Or you might have a problem with the underlying storage platform that Kubernetes uses, so you now have to be a storage expert.
I think it’s telling how many job offers I receive from companies that are already running Kubernetes and need help with it.
A good example is label selectors with Services, Pods, and Deployments.
You create a deployment, which is basically a scaling group for pods, and pods are the unit of deployment running containers.
A service exposes your pods' ports to other components through a cluster-local virtual IP, or a load balancer.
You cannot simply say "this service is for this app". You must instead say: this service will map its service port to the following pod port, for all pods that match the following label selector.
There are 5 different concepts (pods, deployments, services, labels and selectors) you need to learn before you can actually make your app accessible. But what's nice is that these concepts are used elsewhere in Kubernetes.
Labels and selectors are used to query over any group of objects, and allow grouping different things in arbitrary ways.
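To make the selector idea concrete, here is a minimal Python sketch of equality-based label matching. This is not Kubernetes code: the `Pod` and `Service` classes and all of their fields are hypothetical stand-ins, and real selectors support more than exact key/value equality.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    # Hypothetical stand-in for a pod: a name, its labels, one container port.
    name: str
    labels: dict
    port: int


@dataclass
class Service:
    # Hypothetical stand-in for a service. It never names an app directly;
    # it only carries a label selector and a port mapping.
    name: str
    selector: dict
    port: int         # port the service exposes
    target_port: int  # pod port that traffic is forwarded to


def matches(selector, labels):
    """A pod matches when every selector key/value pair appears in its labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def endpoints(service, pods):
    """Return (pod name, target port) for every pod the selector picks up."""
    return [
        (pod.name, service.target_port)
        for pod in pods
        if matches(service.selector, pod.labels)
    ]


pods = [
    Pod("web-1", {"app": "web", "tier": "frontend"}, 8080),
    Pod("web-2", {"app": "web", "tier": "frontend"}, 8080),
    Pod("db-1", {"app": "db"}, 5432),
]
svc = Service("web", selector={"app": "web"}, port=80, target_port=8080)
print(endpoints(svc, pods))  # [('web-1', 8080), ('web-2', 8080)]
```

The same `matches` function would work for any other grouping of objects by label, which is why the concept carries over to the rest of the system.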
Pods are a concept that gets reused by anything that deploys a container. There are Deployments, StatefulSets, DaemonSets, Jobs, CronJobs; these all create pods. So it's nice that pods are their own decoupled concept, but now you gotta understand how they're used and how they relate to the other resource types.
Part of what's great about this all is also that now that people are building custom operators/controllers, you can understand how to use them fairly easily because they build on all these existing concepts.
Disclosure: I work for Pivotal, we have a hand in a few such things.
Good news is, it only took me a week to really pick up k8s.
But before it existed I could type `cf push` and get a running app with automatic routing, logging and service injection.
Whereas nowadays I have to type `cf push` and get a running app with automatic routing, logging and service injection.
The YAML files are quite intimidating at first, if you look at examples out there.
I'm probably not a great teacher, but I think I could boil it down to a slide deck of no more than 25, and an hour's worth of time, and clearly teach how to interface with Kubernetes for a basic web app w/ horizontal scaling, Let's Encrypt SSL certificates, etc.
I don't use Docker, or Kubernetes. What am I missing here? This is an honest question.
I don't particularly see the point of docker in production. I guess it can help resolve clashes between dependencies, but I'm doing ok with Ansible.
One guy in another team just tells his application developers to throw everything into a docker container and then deploys whatever they give him. I guess that prevents dependency clashes, sure, but to me that seems like it's just inviting different problems though. (Containers with a lot of disorganized junk in them).
And we don't need to grow the cluster for now, so I don't see the point of k8. We already have log aggregation, service restarting, and service monitoring/metrics.
To me, Kubernetes feels like a brand new alternative OS with crappy documentation.
If somebody could tell me what I'm missing by using this approach, I'd be grateful.
Let's say I want to change a service's version from 1.2 to 1.3.
If I did this in Ansible, I might do it by commanding all of my services to pull the code of version 1.3 from a git repo and restarting to make the change.
If I did this with containers, I'd tell them to pull the image version 1.3 from a Docker registry and, after it's pulled, to shut down v1.2 and start v1.3.
Those things just have some smaller differences that are good in terms of A and worse in terms of B.
Ansible helps you by keeping it closer to the traditional way of just pushing code changes to a machine and that's that, and Docker opens up this world where the infrastructure can scale up or down (elasticity) quicker horizontally.
Kubernetes is just an extrapolation of that elasticity.
Ansible is "good enough" if you don't need elasticity or you have programmed another custom solution to handle that.
Docker/Kube and Ansible both have Git at their hearts, so if your app components have all they need as code, and you don't worry about elasticity, you won't gain a lot from wrapping it up in a Dockerfile (as opposed to just git cloning).
But my infrastructure is fairly complicated, some bare metal services, many VMs, several docker hosts with dozens of various app containers, private Openstack and some public cloud services (VPNs), Ceph cluster, nagios, ELK etc.
Ansible does not watch and maintain the state of the infrastructure and services; it is a passive tool (I am not sure if Tower is different). You definitely could configure your monitoring to invoke an Ansible script if your app server goes down, reschedule the services that ran on that host to a different one, and reconfigure other parts of the infrastructure as needed (dependencies, VXLANs, security rules, service discovery db update, load balancing update etc.). But you would essentially be copying the Kubernetes controller functionality.
Kubernetes can take care of the service discovery, load balancing, deployment, configuration and networking and other parts of your infrastructure and it does it pretty well in my experience. It maintains the declared state and reacts to its changes.
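"Maintains the declared state and reacts to its changes" boils down to a reconcile loop. Here is a toy Python sketch of that idea, assuming state is just a mapping of service name to replica count; `reconcile` and `apply_actions` are invented names, and this is an illustration of the pattern rather than how the Kubernetes controllers are actually written.

```python
def reconcile(desired, actual):
    """Compare declared replica counts with observed ones; return needed actions."""
    actions = []
    for name, want in sorted(desired.items()):
        have = actual.get(name, 0)
        if have < want:
            actions.append(("start", name, want - have))
        elif have > want:
            actions.append(("stop", name, have - want))
    # Anything running that is no longer declared gets stopped entirely.
    for name in sorted(set(actual) - set(desired)):
        actions.append(("stop", name, actual[name]))
    return actions


def apply_actions(actions, actual):
    """Pretend to carry out the actions and return the new observed state."""
    new_state = dict(actual)
    for verb, name, count in actions:
        delta = count if verb == "start" else -count
        new_state[name] = new_state.get(name, 0) + delta
        if new_state[name] == 0:
            del new_state[name]
    return new_state


desired = {"web": 3, "worker": 2}
actual = {"web": 1, "worker": 2, "legacy": 1}  # e.g. after a host died

actions = reconcile(desired, actual)
print(actions)  # [('start', 'web', 2), ('stop', 'legacy', 1)]

actual = apply_actions(actions, actual)
print(reconcile(desired, actual))  # [] once declared and observed state agree
```

A monitoring hook that triggers an Ansible run is doing the same comparison by hand, once per alert, instead of continuously.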
Somebody posted here that it takes just several hours to learn it. YMMV but deploying a resilient cluster in my environment took me much more time. And I have to agree that documentation is pretty weak. E.g. provisioning Cinder volumes and connecting them to the VMs running Kubernetes nodes was a real horror.
Btw Ansible generates many of my k8s YAMLs and deploys them.
Yeah, but what I've done is deploy a tiny Go binary on each cloud instance that is run every few minutes. Instances take turns checking on each other using a round-robin sort of approach (no complicated leader election algorithms etc.).
The script knows how to check the health of the other instances/services and to restart them or alarm if they get stuck.
For fifty hosts, it works fairly well.
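A rough Python sketch of one way such a round-robin check could be arranged (the original is a Go binary whose details aren't given, so the ring layout and the names `peer_to_check` and `check_round` are assumptions): each instance checks the next one in a sorted ring, so no leader is needed.

```python
def peer_to_check(instances, me):
    """Each instance checks its successor in a sorted ring (wraps at the end)."""
    ordered = sorted(instances)
    index = ordered.index(me)
    return ordered[(index + 1) % len(ordered)]


def check_round(instances, is_healthy):
    """One pass over the ring; returns (checker, stuck peer) pairs.

    `is_healthy` is a caller-supplied probe, e.g. an HTTP health request.
    """
    stuck = []
    for me in sorted(instances):
        if not is_healthy(me):
            continue  # a stuck instance cannot run its own check
        peer = peer_to_check(instances, me)
        if not is_healthy(peer):
            stuck.append((me, peer))  # here the checker would restart or alarm
    return stuck


hosts = ["host-a", "host-b", "host-c"]
down = {"host-b"}
print(check_round(hosts, lambda host: host not in down))  # [('host-a', 'host-b')]
```

One weakness of this particular layout: if two neighbouring instances are stuck at once, the second one goes unnoticed until the first is back.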
I didn't say in the original post, but we are not a product group, so our stuff is 'semi production'. We don't have customers to worry about.
>Somebody posted here that it takes just several hours to learn it.
Sure I can probably learn it quickly, but what I don't want is to now have complicated and mysterious kubernetes problems to solve on a deadline.
I understand Linux pretty well, been using it for 20+ years, so I'm not intimidated by OS-level troubleshooting. Sure, without containers you have to be more careful to keep your dependencies from overlapping, but it hasn't been a problem we couldn't handle till now.
I don't particularly want to trade problems I'm familiar with solving for a whole new set of unfamiliar problems unless there is a clear benefit.
But I think that's pretty much it if you're a smallish team with an already well implemented IaC stack.
Though I'd definitely encourage anyone to try the GCP Kubernetes before trying to self-host it...
The former gives you a taste of why it's getting such good publicity.
The latter explains why it's still controversial.
Of course a few years ago, being young and a one man ops team, I wanted to use the cool new thing and so did everyone else. I went with Ansible after being a Chef guy for a while due to ease of getting started. That was the easy part. Love Ansible, but it was slow to get new Docker functionality for a while, and not having the built in functionality of having Chef runs happening periodically without paying for Tower means I have to be more mindful and deliberate with keeping my infra in sync and up to date.
Enter Docker and months and months of headaches just to get something usable on the local workstation level. So much wasted time dealing with breaking changes and figuring out exactly which version to use and very rarely upgrading. Debugging and troubleshooting becomes an unintuitive nightmare. Even after all that, our site still ended up running really slow locally in Docker. This is because with apps (such as CMS) that handle many, many files, I/O slows wayyy down. Ended up discovering Dinghy (shout out to codekitchen) and got it to a decent state (but make sure developers don't accidentally install Docker for Mac as it is and always has been a CPU gobbling mess).
Then on top of that there is the container orchestration (consul-template), monitoring (Prometheus is cool, but takes a bit to understand what is needed to get the metrics you want), logging (fluentd is again, cool, but oh man is parsing logs a PITA to understand), debugging tools etc.
I can't even imagine what adding k8 on top of the many quirks of Docker would mean. I luckily took one look at needing a whole zookeeper cluster just to get started, and immediately gave up on that.
Graylog, with applications using various GELF libraries to send logs to it.
> And how do you deploy new versions with zero downtime?
I don't deploy with zero downtime, but within say ~5min. This is acceptable to us. We use jenkins with a lot of tests to ensure components are in good shape, then we mostly manually deploy them, but with a script. Sometimes we deploy directly from jenkins.
> Kernel os upgrades
All our stuff is internally hosted, so we handle those infrequently. This stuff isn't exposed to the open internet, so we upgrade when we get around to it, or when we encounter a bug.
Yeah, that's not a big deal for us. We don't have customers, and this environment is for internal use at the company.
You could say it's a 'semi production' environment.
- Deploy to one host
- Wait for health checks to pass
- Deploy to second host.
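The steps above can be sketched in a few lines of Python. The `deploy` and `is_healthy` callables are placeholders you would supply (an scp-and-restart script, an HTTP health probe, whatever you already have); nothing here is tied to a particular tool.

```python
import time


def rolling_deploy(hosts, deploy, is_healthy, attempts=3, delay_seconds=0.0):
    """Deploy host by host, only moving on once the current host is healthy.

    Returns (updated hosts, failed host or None). On failure the rollout
    stops, so the remaining hosts keep serving the old version.
    """
    updated = []
    for host in hosts:
        deploy(host)
        for _ in range(attempts):
            if is_healthy(host):
                break
            time.sleep(delay_seconds)
        else:
            return updated, host  # health checks never passed
        updated.append(host)
    return updated, None


deployed = []
result = rolling_deploy(
    ["host-1", "host-2", "host-3"],
    deploy=deployed.append,                     # stand-in for the real deploy step
    is_healthy=lambda host: host != "host-2",   # pretend host-2 fails its checks
)
print(result)    # (['host-1'], 'host-2')
print(deployed)  # ['host-1', 'host-2']  (host-3 was never touched)
```

With two hosts this is exactly the three-line list above; the loop just generalizes it to however many hosts you have.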
If you can, you should probably deploy to Heroku. (Or a similar service.) It's far cheaper than spending time on devops. Just run "git push" and you're running.
When I've deployed on ECS (or other "simple" orchestrators), I've found that I ultimately wound up re-inventing lots of Kubernetes features.
So if you can't use Heroku, but you do know basic Docker, then it's worth considering somebody's fully-managed Kubernetes. Google's is nice. Amazon's is considerably more work. I hear Microsoft's is still a bit sketchy. And I'd love to take a look at Digital Ocean's. But do not attempt to host your own Kubernetes if you can possibly avoid it.
If you do try Kubernetes, then read a book like Kubernetes: Up & Running first. Kubernetes is not self-explanatory, but it's pretty straightforward if you're willing to spend a few days reading.
Finally, don't overcomplicate it. Just use the basic stuff for as long as you can before trying to layer all sorts of other tools over it.
All websites I maintain/deploy are either built as a Docker image and published by CI on git check-in or deployed locally with a single rsync/supervisord bash script (or right-click Web Deploy for some older IIS/ASP.NET apps). I probably have over 50 sites I'm currently hosting, so using anything that much more expensive won't enter into consideration.
But I'm not seeing what could justify the extra cost, especially as the cost is recurring. If it's some kind of effortless/magical scalability, I'd rather put that additional cost towards more hardware and buy more headroom.
We use Heroku heavily, and we went from 2 full time devops engineers to 0. Everything is now buttons and sliders. There are no "security patches". The CEO could log in and scale if he needed to; it's a slider. We get monitoring for free (labour free, not cost, i.e. the "good" free). Memory usage, CPU usage, HTTP status codes, logging: it's all there. We spend no time thinking about rsync or devops or any of that: we just solve the problems we're good at.
Of course, everything is a matter of scale. Legend has it, Deliveroo UK only moved off Heroku after they grew so large that Heroku wasn't willing to offer them more dynos on their account. That sounds like a reasonable time to go in-house. But for any <100-person company... why bother? Focus on what you're good at, and let other people do devops.
It's all well and good at the "we need Kubernetes" scale to say you need specialists, but a team that can't manage a VPS is strange to me.
I don't know what Heroku is offering, but Lightsail and ECS instances also have metrics and pretty graphs (though admittedly I rarely check them myself). Maybe it will save me some ssh sessions to manually apply security patches, but I was recently able to upgrade my Lightsail instance to the latest Ubuntu 18.04 LTS with just:
Because it's 5x more expensive.
> focus on what you're good at, and let other people do devops.
But it already takes hardly any time/effort to keep doing what I'm already doing.
I guess it's for different companies who see value-added benefits that justify the cost, but it's being proposed here that everyone should be using Heroku first. It just boggles my mind why most people would do that as the first option when it's so much more expensive. I already think the cloud is too expensive, so there's little chance I'm going to be paying a recurring premium for something that's not going to save me any time over what I'm already doing.
But where is that actually the case and your app can run comfortably on Heroku?
At multiple jobs I've been the only person who could credibly claim to understand the entire stack used at the company, from the web frontend to the OS the backend database runs on and the person to whom teams would come to validate their designs for scaling and reliability. I didn't write product code in those roles. But I multiplied the effectiveness of the people who did.
Heroku is a wonderful tool that doesn't get you the actually hard parts of the job req.
How do these costs scale with 10x or 100x the traffic/load?
1) Negotiate with Heroku for an enterprise contract
2) Consider migrating to a more cost effective platform
3) Dedicate time home-growing a solution.
Part of the reason Heroku charges so much is because their customers are typically small, but they'd likely rather find a price that keeps you on their platform vs. home-growing a solution.
That is what snapshots are for.
Moreover, if you're having someone else manage your systems and they upgrade them, now what do you do when the new version causes problems?
I ran into an issue recently where newer systems default to a newer version of a protocol but the implementation of the newer protocol has a major bug the old one didn't. When that happens on your systems you roll back until you can solve the issue. When it happens on systems managed by someone else, better hope you can identify and solve the issue quickly because in the meantime your system is broken.
So, what's your plan in case Heroku shuts down, or gets bought and changed completely? Isn't that also a strategic liability, just a much larger and arguably less likely one?
If you have 12-factor apps then you have a fighting chance of moving off it anyway.
Apart from Dokku, I'd say Cloud Foundry is the closest next environment that you can install and operate directly, though it's an 800-pound gorilla by design. But there are fully hosted services for it (e.g. Pivotal Web Services, IBM Bluemix, Swisscom Application Cloud). There are also semi-hosted options (Rackspace Managed Cloud Foundry) and IaaS-provided installer kits for AWS, Azure and I think GCP as well. You can also buy commercial distributions from Pivotal, IBM, Atos, SUSE and SAP.
Disclosure: I work for Pivotal, we sell Cloud Foundry and Kubernetes distributions (PAS and PKS).
It really doesn’t take a devops engineer to run these if your app still fits on Heroku. A little bit of overhead goes into learning the service, much like you’d learn any new API or programming library.
The pain of setting it up, applying security patches, making sure you set it up securely to begin with. The pain of having a mental model more complex than "the server is what I git push to."
> I probably have over 50 sites I'm currently hosting, so using anything that much more expensive won't enter into consideration.
No one would argue someone in your position should use heroku. It's for people who are willing to pay to avoid sysadmin work... which is a lot of developers.
My biggest fear with managed hosting and managed databases is being given too short of a window before they update.
tl;dr: You get a few choices of Ubuntu LTS releases, which they maintain for a long time (currently they still support 14.04, now nearly 5 years old). Or you can push Docker images, at which point the underlying OS is squarely back in your court — technically they must be applying kernel patches, but Linus is fairly religious about not breaking userland.
Btw watchtower seems abandoned, anyone know if there's a story to it?
Two: here I'll teach you, take over DevOps so I have more time.
10 person team, everyone new and your application runs in the cloud. The person who took over, by now, left the company. Extended the infrastructure, didn't inform you and the application has been reworked so it runs on the cloud.
Welcome to the perils of working with people.
And how do you update the OS and the kernel without downtime? Do you set up a new machine, deploy to it, switch the traffic to the new machine, and decommission the old one?
I'm asking because these are the kind of things Heroku and other PaaS do for you.
I instead got a Linode VPS at $5/month, which gives me more than the $25 Heroku plan. Setting up a VPS is not very hard either (although this may depend a bit on your environment; my app compiles to a static binary), and it's a lot more flexible.
My devops stack thus far consists of scp and tmux.
Don’t come to HN for a representative take of how people in US businesses evaluate vendors and their pricing.
Do you know how much it cost my employer for me to spend an hour reading about Kubernetes?
Back when I worked for a hosting company specializing in RoR we had more than a few customers migrate to us because Heroku costs were getting out of hand (and we weren't all that cheap either!)
In my specific case I'm prototyping what could perhaps be a startup, and with a few small cheap Linode VPS's I can get a lot of bang for my buck.
> Do you know how much it cost my employer for me to spend an hour reading about Kubernetes?
A lot, which is why you shouldn't use it. You can run a VPS without k8s.
I completely agree, but I’m not sure “cost-sensitive” is an adequate term, because it’s more that a cost on hosting (or an app you’ll use for years) triggers extremely high awareness in people who are extremely blasé about hemorrhaging staff time on support and slipped deadlines.
Heroku is painless enough for unprofitable hobby projects, but far too expensive. Using AWS, GCP or Azure directly is relatively affordable, but requires a lot of extra work.
I'm aware of various tools that are supposed to make working with the various cloud platforms much easier, but every time I see one of these used in practice (eg at work), people seem to spend an enormous amount of time getting things working properly. It still feels like we're missing a sweet spot for hobbyists who don't want to invest their spare time learning about and wrangling with devops, just to get a simple project up and running.
I’ve used it at work for a subset of our projects and it’s been pretty good once you learn a few of the gotchas. Once you’re set up it’s pretty painless. There are limitations for sure, but it’s been handy for a few situations where we didn’t want to focus on deployments and wanted to keep them as simple as possible.
Even on AWS, I can deploy with one short command, and it's much cheaper than Heroku.
"A small $1M expense budget" may be "small" for certain companies, but it's more than the yearly revenue for a lot of businesses, never mind people who are trying to start a business and don't have any revenue at all (yet).
Of course you need to be reasonable and not penny-pinch or "spend money to save money", but in general I would say that frugality is a virtue.
I’m ignorant as to whether large projects have run using it, but for smaller ones it’s useful.
(not sarcasm, if this works the same and adds more value I'd use it)
For use cases where support plans and SLAs are not an issue (hobby), it's a great option.
CapRover also features one click SSL generation via Let's Encrypt, similar to what is offered on Heroku.
I've done things like go to the Wikipedia page for kubernetes. It hasn't helped me figure out what such things do.
Anyone care to point me to some kind of 101 explanation to help me follow the conversation?
Web applications (web apps) are essentially websites but with some additional functionality. Web apps can work the same as websites by having urls for different pages but are distinct in that they can do things that a website hosted on Squarespace can't do. This is because to build a web app requires some coding up front which is both an advantage and disadvantage of webapps in comparison to a site hosted on Squarespace.
Heroku is a platform as a service (PaaS) offering that lets developers utilize version control tools to update their webapps. This means the level of complexity for deploying a new version of webapp is incredibly simple.
Imagine that you collaborate in an office that publishes technical documentation and various technical writers can work on multiple parts of the same document. Each new version of the document gets published to a PDF that users can access.
This is essentially what Heroku provides along with abstracting away some of the difficulties of getting a web app hosted such as security, setting it up so your site uses https, and some basic monitoring and logging.
Sometimes web apps look and feel like a singular thing but are actually multiple pieces working together. In this case, you may want to isolate these different pieces.
Doing so can be really difficult, and to the best of my knowledge, Heroku isn't necessarily designed with this level of orchestration in mind. This is where Kubernetes comes in.
I've never used Kubernetes, so I might get a few things wrong here, but from what I understand, Kubernetes gives developers/devops people a lot more fine-grained control over how the various pieces of a webapp or webapps get deployed and managed. Kubernetes lets you take advantage of containers, which are sort of like micro operating systems but with only the dependencies you need installed for a service to run. This means you can write in a file "I want x replicas of this part of my app and y replicas of this other part of my app to be run across z number of workers".
What this abstracts away is how a webapp should be run, so you don't have to set up each piece manually yourself.
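To make the "x replicas of this, y replicas of that" idea concrete, here is a minimal sketch in Python that builds two Deployment manifests and serializes them as JSON, which `kubectl apply -f` accepts alongside YAML. The app names, image names, and replica counts are made up for illustration. Note that the number of workers is not part of the manifest: it is a property of the cluster, and the scheduler decides where the replicas land.

```python
import json


def deployment(name, image, replicas):
    """Build a minimal apps/v1 Deployment manifest as a plain dict."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            # "I want this many copies of this part of my app."
            "replicas": replicas,
            # The selector must match the labels on the pod template.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


# Hypothetical app pieces: 3 copies of the web tier, 2 of the worker tier.
manifests = [
    deployment("web", "example/web:1.0", 3),
    deployment("worker", "example/worker:1.0", 2),
]

# Wrap both in a List so a single file can describe the whole app.
manifest_json = json.dumps(
    {"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2
)
```

Writing `manifest_json` to a file and running `kubectl apply -f` on it is the usual next step; a real manifest would normally also set ports and resource limits.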
Here is a video with a bit more info on Kubernetes: https://youtu.be/PH-2FfFD2PU
Ironically, he's right. But, as someone pointed out: Squarespace is probably not relevant in a conversation about Kubernetes, while Heroku is.
What Kubernetes is, is a long story. Suffice to say: if you don't know what it is, count your blessings. :)
Is that accurate enough?
Kubernetes is a tool large companies use to manage a large number of servers. Google invented it and open-sourced it so now it's free for anyone to use. It's not something people would use for a single website unless it was a huge website that required lots of servers, like FoxNews.com or something.
You can get surprisingly far on a single box. PlentyOfFish, Mailinator, and Hacker News are all services that got to millions of users with one server. StackOverflow and Google are ones that got to millions of users with a handful of servers, mostly for redundancy.
When you have big teams that are all concurrently making changes and writing code to a live site that's mission critical, then things get complicated. But when there's only one dev and your few hundred thousand users don't mind too much if it goes down? You can just spend some one-time effort installing a stock Postgres install, scp over a single binary or tarball, and run it in the background with nohup or screen. When it's time to re-deploy, upload another version, kill it, and restart it.
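For anyone who hasn't done this, the upload/kill/restart loop can be sketched in a few lines of Python. This is only an illustration under assumptions: the host, the `/srv/app` directory, and the binary name `app` are hypothetical, and the function only builds the commands rather than running them.

```python
import shlex


def redeploy_commands(host, local_binary, remote_dir="/srv/app", name="app"):
    """Return argv lists for a copy-kill-restart deploy (nothing is executed)."""
    remote = f"{remote_dir}/{name}"

    # Step 1: upload the new build next to the running one.
    upload = ["scp", local_binary, f"{host}:{remote}.new"]

    # Step 2: stop the old process, swap the binary, start it in the background.
    # stdin/stdout/stderr are redirected so the ssh session can return.
    script = "; ".join([
        f"pkill -x {shlex.quote(name)} || true",
        f"mv {shlex.quote(remote + '.new')} {shlex.quote(remote)}",
        f"nohup {shlex.quote(remote)} > {shlex.quote(remote + '.log')} 2>&1 < /dev/null &",
    ])
    restart = ["ssh", host, script]
    return [upload, restart]


commands = redeploy_commands("deploy@example.com", "./app")
```

Passing each entry to `subprocess.run(cmd, check=True)` would carry out the deploy. There is a short window of downtime between the kill and the restart, which is exactly the trade-off described above.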
I’ve gone to a bunch of hackathons where some of the participants got derailed on that kind of stuff and never even got to the part of the project they cared about. My point was that it can be worth a modest amount of money to not have that overhead on a small project.
* Single server on Dokku, with multiple projects
* Switch DB to DBaaS on your hosting provider's cloud
* Multiple Dokku instances behind load balancer/proxy as a service
You can then scale vertically a LOT before you need K8s style infrastructure.
Servers that can cost $50k each. Not a good example for affordability.
In any case, it doesn't need to be as completely fleshed out as what a company with millions in investment capital could do.
You have to factor in that you get free tiers of many things, all managed together like docker would help with, such as:
30 MB Redis memory cache compute instance for free
500 MB Mongo database compute instance for free
External logging compute instance for free
and a whole marketplace of all these managed services, with the grouping of containers further managed by heroku.
A Linode VPS at $5/month does not give you all that. If you like configuring all the above (and of course, assuming your use case calls for it at all), then the $5 plan with 1 GB RAM would let you put in a bunch of 256MB containers if you really wanted. The $10 plan with 2 GB RAM would let you put in comparable 512MB containers, but then you should have just been paying $7/month for Heroku already.
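The container arithmetic above is easy to check. A rough sketch, assuming memory is the only constraint; the `reserved_mb` parameter is my own addition to account for the OS and Docker itself, which the figures above ignore:

```python
def containers_that_fit(host_ram_mb, container_mb, reserved_mb=0):
    """How many fixed-size containers fit in a host's RAM, by integer division."""
    usable = host_ram_mb - reserved_mb
    return max(0, usable // container_mb)


# $5 plan: 1 GB RAM, 256 MB containers.
small_plan = containers_that_fit(1024, 256)

# $10 plan: 2 GB RAM, 512 MB containers.
large_plan = containers_that_fit(2048, 512)

# Same $5 plan, but holding back 256 MB for the host itself (an assumption).
small_plan_with_overhead = containers_that_fit(1024, 256, reserved_mb=256)
```

With nothing reserved, both plans come out at four containers; reserving 256 MB on the small plan drops it to three.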
Hope that helps!
For a long time I struggled with inability to understand my potential costs for server-side projects, and occasionally read horror stories about other devs who got it wrong.
EKS is a joke. The only people that use it are those either experimenting, or those that are forced to.
AKS is pretty okay. They're definitely way ahead of EKS. They lack a few things, but they've made some pretty good strides in the last year. Kinda suffers from being part of Azure which likely reflects my personal bias.
How did you manage secrets on Google App Engine?
Does it mean your secrets are stored in plain text in the container image?
> In App Engine standard I've used a deploy wrapper around ansible vault to do it.
What does the deploy wrapper do? Does it produce an app.yaml file with the secrets injected in it, after they have been decrypted by Ansible Vault?
A: No. It means the secrets are stored as environment variables in the container.
Q: What does the deploy wrapper do?
A: It prompts the developer to input the Ansible Vault password, decrypts the vault, and injects the secrets into the environment.
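The "injects the secrets into the environment" step might look roughly like this. It is a sketch, not the actual wrapper: the decryption is assumed to have already happened, so `secrets` is just a dict, and `run_with_secrets` is a hypothetical helper name.

```python
import os
import subprocess
import sys


def run_with_secrets(argv, secrets):
    """Run a command with decrypted secrets added to its environment only.

    The secrets never touch disk and are not exported into the parent process.
    """
    env = dict(os.environ)  # copy, so the wrapper's own environment is unchanged
    env.update(secrets)
    return subprocess.run(argv, env=env, check=True, capture_output=True, text=True)


# Stand-in for the real deploy command: a child process that reads the secret.
child = [sys.executable, "-c", "import os; print(os.environ['DB_PASSWORD'])"]
result = run_with_secrets(child, {"DB_PASSWORD": "s3cret"})
```

In a real wrapper `argv` would be the deploy command, and the dict would come from the decrypted vault file.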
Generally speaking, I follow the 12-factor approach:
You mean a section in your app.yaml like this one:
DB_PASSWORD: "this is a secret"
I haven't heard a convincing argument why vendor lock in is a problem regarding the cloud.
It can be a problem, but all solutions result in you avoiding the things you went there for in the first place.
There are exceptions to this, obviously, but I find most people worried about vendor lock-in are nowhere near big enough to bother running multi-cloud.
It's important to note that AppScale is an aPaaS, an API-platform-as-a-service, where you're guaranteed a consistent API with the ability to plug in different implementations. That's something even beyond your typical openness in F/OSS software, too.
Could you elaborate a little on this part? Our team is looking at ECS/Fargate as a possible container solution and I am curious what you felt was missing from it.
> Could you elaborate a little on this part? Our team is looking at ECS/Fargate as a possible container solution and I am curious what you felt was missing from it.
Some typical examples:
- Kubernetes allows you to run a monitoring container on every single node using something called a "DaemonSet". On ECS, you'll have to build all your monitoring tools into your base image, or use cloud-init to spawn an ECS task on each machine.
- You're probably going to end up writing a bunch of scripts to generate ECS task definition JSON and to update running services, and you'll need to integrate this into your CI system somehow. With Kubernetes, you can get away with "kubectl apply -f" for a fairly long time.
- Kubernetes makes it relatively easy to allocate and manage persistent disk volumes. I wouldn't necessarily use them for a production database, but they're great for smaller things.
- Kubernetes has autoscaling support, plus the ability to control which containers run on which types of servers.
- Kubernetes has basic secret management built-in. It's nothing as nice as Vault, but it's good enough to get started.
- Kubernetes has support for a whole bunch of useful minor things that would typically wind up as Terraform scripts on AWS.
And so on. None of these is very hard individually, but there's a ton of things like this. So we're slowly migrating pieces of ECS infrastructure over to Kubernetes so that we can stop reinventing so many wheels.
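As an illustration of the DaemonSet point from the list above, here is a minimal sketch in Python that builds such a manifest and dumps it as JSON for `kubectl apply -f`. The agent name and image are placeholders, not a recommendation of any particular monitoring tool.

```python
import json


def daemonset(name, image):
    """Build a minimal apps/v1 DaemonSet manifest as a plain dict."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name},
        "spec": {
            # No replica count: the controller runs one pod per eligible node.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


# Hypothetical monitoring agent image.
agent = daemonset("node-monitor", "example/node-monitor:1.0")
agent_json = json.dumps(agent, indent=2)
```

New nodes that join the cluster get the pod automatically, which is the part you would otherwise script with cloud-init or bake into a base image.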
Again, for those things which can run on Heroku, either choice is overkill. And I have to admit that ECS is very reliable at the things it does. If you do decide to look at Kubernetes, I highly recommend skimming the O'Reilly books, which provide a solid overview of how it all fits together.
- ECS runs on ASG / EC2, so there is autoscaling
I think you don't understand all the glue between AWS services. Of course ECS doesn't have everything, which is why there is EKS, but all the things above exist on ECS.
I've looked at various guides for setting up cluster autoscaling on ECS, and so far, everything I've found looks far more complicated than Kubernetes cluster autoscaling on Google. Is there a nice guide?
But I'm using ECS and Fargate right now (orchestrated with Terraform). https://rivethealth.com
Probably the biggest thing I would like is native secrets management. "Security groups" are limited as it depends on having an AWS interface per container.
And I don't know how much longer our time on Fargate will last. It's expensive, but more significantly there are limitations in not having access to the host. Having to install an SSH server in every Docker container to be able to debug (simple things, like top) is annoying.
Building an AMI on Amazon is not difficult and if you use it you don't have to reinvent so much architecture on top of AWS (or cloud provider of your choice)
I've done that, and I've done Kubernetes, and Kubernetes is definitely easier once you get past the initial setup. The initial setup is also getting easier over time.
It is also more portable. Kubernetes runs on multiple cloud providers as well as your own hardware and presents the same interface and runs the same containers. Docker containers are more portable than AMIs.
If you plan to write your own system to control deployments, secrets, load balancing, and DNS based on AMIs and other AWS features, you may want to consider that you are reinventing the wheel. You are also locking yourself into AWS to a far greater extent than you would if you used Kubernetes.
Expertise is also a big differentiator. You can hire people who know Kubernetes on day one, but you cannot hire people who already know your custom in-house system.
The reason everybody's so hot on microservices to begin with is that we kept coupling everything all together and it became a huge mess to manage. You don't want to repeat that mistake, only at cloud scale.
It's super dirty, but I can deploy dozens of services without issue. Monitoring is another problem, but I don't need Prometheus for this stuff.
(I couldn't use Heroku cost effectively because running graph dbs and some custom stuff, but Heroku is AWESOME)
We use Google's Kubernetes and are pretty happy with it, but overall this statement bothers me.
It's absolutely true. Don't get me wrong. The problem is what it says about Kubernetes as software. It tells me it's ugly and crufty and in some ways immature. Good software should not require that much babysitting.
As the fine article points out.
You can launch a DB through EB but the docs basically state it's a bad idea (as it gets taken down when you delete your app, which you might find yourself doing if EB gets in to an unrecoverable state).
So now you have to manage EB + RDS separately, which should be automated so now you need CloudFormation to add the security groups and manage the vars for the DB connection properties in your EB.
You can't really compare DO and Heroku with k8s and AWS; the latter are much, much more powerful.
This is I think always the best advice for almost anything.
And we all know it. And yet so often, we ... can ... not ... resist.
From the "The Nomad ecosystem of loosely coupled components" section:
> It integrates very well with other - completely optional - products like Consul (a key-value store) or Vault (for secrets handling).
> At trivago, we tag all services, which expose metrics, with trv-metrics. This way, Prometheus finds the services via Consul and periodically scrapes the /metrics endpoint for new data.
> The same can be done for logs by integrating Loki for example.
> Trigger a Jenkins job using a webhook and Consul watches to redeploy your Nomad job on service config changes.
> Use Ceph to add a distributed file system to Nomad.
> Use fabio for load balancing.
And the icing on the cake:
> All of this allowed us to grow our infrastructure organically without too much up-front commitment.
So if I understand correctly, the author (and his team) preferred to do all the work of integrating/testing/debugging those components, rather than using a tool that provides every single one of those features, and more, out of the box.
Kubernetes isn't a trivial lift but it's a damn sight easier than trying to roll a cheap imitation yourself.
Nomad is not a cheap imitation of Kubernetes, it is a simple orchestrator which favors composability over an all-in-one approach.
Nomad isn't a cheap imitation of Kubernetes but all these components taped together are. Kubernetes is no less composable than a system built on Nomad, it just includes more functionality from day one.
If you're running containers with an orchestrator like Nomad, at some point, you'll need DNS. So you do it yourself and then you're managing Nomad and DNS. Then (we'll assume, because you're using a container orchestrator to manage multiple services) you'll need service discovery, so you write some jobs and event handlers to glue together Nomad and your DNS solution. And then, because you're a team of responsible people, you'll want to store secrets securely, so you graft in Vault. And then, you realize zero-downtime config changes would be great, so you slap Consul in there and write some sidecars or library code to handle config updates. Then metrics. Then logs. Then rolling deployments. And it continues, indefinitely, as you add features.
If you started doing this five years ago, fine. If you start doing this today, you're out of your mind. You're just doing work for the sake of "but it's composable". There's a reason why teams still build applications on frameworks like Rails and Django even though they don't need half the features--it's more important to them to get something functional up and running than it is to satisfy delicate sensibilities about only using what's needed. Kubernetes is the analogue in the world of systems and operations.
Using industry standard components like consul and vault is not really a second thought for k8s in production, leaving you with duplicated hunks of infrastructure to step around, where the idea of tacking on kube-dns and kubernetes ~secrets~ onto something else is rather laughable. This, again, is the point which was being made -- you're forced to bear the brunt of kubernetes' NIH.
I'll assume by the contrived situations of inventing some wacky custom mousetrap to bind nomad to dns rather than using the "slapped in" (https://www.nomadproject.io/docs/configuration/consul.html) consul integration for dns, or writing config update/logs/rolling deployment code rather than using the core nomad scheduler features to do that, that you don't actually know, and this is FUD?
An example is I setup a log pipeline recently spanning multiple data centers with full mTLS. It wasn't that hard and thanks to Vault all the certs are refreshed at regular intervals all over the place. Pretty great!
Single binary deployments and upgrades are helpful.
And the reason nomad's single binary is so small is because it doesn't do nearly as much. I'd rather have a platform that I can do things later that I don't know I need now.
do you know who can rollback and leaves your infra in a consistent state? can you guess?
what does terraform bring to the table? I have to use HCL to describe my infrastructure in terms that are NOT cloud agnostic (therefore introducing another layer) and in the face of adversity it throws its hands in the air and now you’ve got to figure out what went wrong, manually, by yourself. This is what I call True Devops (TM).
I have seen Terraform crap out and it cannot recover. It cannot move forward, it cannot rollback, it cannot destroy. It’s stuck. At that point you start praying that someone really understands the underlying cloud + knows the shenanigans terraform plays to fix it now and also make terraform happy moving forward.
we’re talking basic stuff here.
i don’t want to go into more advanced issues like: losing network connectivity, terraform process crashing (think oom conditions) or being killed or non-responsive cloud apis.
not to mention that destroying infrastructure you’ve created almost never works (unless it’s trivial infrastructure).
based on what I’ve seen up until now I would not use terraform in a production environment.
On the other hand, CloudFormation is not perfect either. The rollback does not work 100% of the time and I've had it roll back a set of templates that took 45 minutes to deploy because there was some inconsequential timeout that could have been ignored. I've also had pre-built templates developed by AWS outright fail, which is strange considering AWS themselves built it.
Use what works best for you and your team.
i have never experienced unpredictable behavior from CloudFormation, but it’s possible YMMV.
If I want to use Vault and Nomad, should they share the same Consul cluster?
Should I deploy my Vault and Consul servers via Nomad?
The guidance on server/cluster size is hard to use. They only have 'small' and 'large' with no reference for what constitutes those sizes.
I have more questions as I'm going through this right now, but those are just a few off the top of my head.
Don’t deploy your Consul servers with Nomad as that will create cyclic dependency if I’m not mistaken. The same should be true for Vault.
If you just need to keep your jobs running, load balance to those apps, use externalized config and need a simple UI view system status, then Nomad/Consul/Fabio really works great.
On the orchestration side I use Puppet Pipelines (formerly Distelli) which works with standard VMs, SmartOS containers, Docker, Linux images, from packaging through testing and deployment. And they just lowered pricing (!??!)
There's no reason to be locked into Kubernetes unless you're sure you need it. SmartOS scales in seconds. As CTO I'm always checking out the next platform, but all the newer solutions look much more complicated.
EDIT: Heroku has weird errors when you push CPU or RAM, also the CPUs aren't so fast, Linode can't be trusted, Digital Ocean is OK but is still very manual roll-your-own, AWS is a behemoth and medium fast, Google is fine but specialized and hard to migrate away from, OpenStack isn't for small orgs... I've run production services on them all. It's easy to try a new service with a Pipelines-style orchestration because it doesn't care which service it's talking to. And it makes it obvious when platform-specific instructions or concessions are required.
While I'm sure Samsung will keep Joyent/SmartOS kicking for now, I'm uneasy being dependent on them. We've since started a migration over to FreeBSD for our infra hosts which do all the same things (mostly running linux and bsd guests, zfs tricks and so on) and the experience has been largely positive. No numbers to back it up, but I've observed performance is a bit better and having more options than ipfilter in the base system is very welcome.
YMMV of course, but you may want to consider an extra basket for all those eggs!
Why can't Linode be trusted?
Any advice for getting started with it on DigitalOcean? Or is nested out of the story?
I haven't used Digital Ocean's kube offering. Otherwise it's much like Linode or other services from that era -- install your own distro and binaries, which needs some kind of orchestration automation to scale.
On a SmartOS container, Rails and Postgres can be installed from the package manager and automatically configured as a service. Running a Docker image on Joyent Triton cloud is just a one-line CLI command.
SmartOS is built for the cloud, and is usually semi-managed by the provider. If it's worth your effort to run a separate SmartOS cluster, you'd know. But there is also Project FIFO for that:
Apparently Clozure does something 'ungodly' to the signal stack which just throws the kernel and ends up crashing out. Chances are, these days anyway, most people aren't running Common Lisp, so it probably isn't much of a problem - plus Clozure runs perfectly fine under Solaris. As a result it's probably not high on anyone's list to sort out.
Perhaps you have some unstated assumptions here like a small team managing a relatively large infrastructure for their team size? Or, a small team in a much larger infrastructure department.
If it was me, I'd put lots of disclaimers around a single person bringing any kind of specialist expertise to the average small team. I think you need to lean on that person to level the whole team up. If there is any chance of that individual leaving you risk leaving the team high and dry.
We’re not managing our own K8S cluster. I did that once by hand at another place (before kubeadm), enough to know that managed GKE on GCP is a great thing. Also did my own single-node K8S for dev work. Nanokube and, later, minikube made that much easier.
I used to do stuff with Capistrano and Chef. Enough to know that once you know the core primitives, it is easier just to use K8S.
I believe when the cluster malfunctions, you're still on your own figuring that out. I had a stuck node on GKE just recently, which broke our CI. The GCE machine was there, but the node wasn't visible with kubeadm, and a quick SSH onto the VM didn't reveal any immediately obvious issues. Auto-repair was enabled but hadn't worked - and TBH I have no idea how I should have diagnosed why it didn't (if there's even a way).
Thankfully, this was an issue with the node, not the master, and the CI nodes are all ephemeral, so I quickly dismissed the idea of debugging what went wrong and just reinitialized the pool. Could've done the same with bare metal.
Lots of cool stuff around (dc/os, triton, hashi, k8s, flynn etc)!
After 2 days with k8s and not so much as a bootstrapped cluster the turn went to hashicorp.
Within 10 min I had a container running, registered in service disc (consul) on my laptop.
We ended up using the hashicorp stack in production with great success.
Sure, some custom services were needed (service management, scaling and such).
Running primarily on prem, the complete simplicity and unopinionated approach is an edge.
It allowed us to implement MACVLAN at first, then the Ubuntu FAN network, and integrate totally with existing infrastructure.
Now having spent the last 8 months implementing k8s I’m torn.
I’ve built from scratch, used kubeadm, kops and now EKS.
From 1.13 kubeadm honestly works really well and is my preferred way of deploying k8s.
Still it’s a beast... running large deployments with many teams... there’s just so much... stuff.
One GH issue leading down a rabbit hole of other GH and gists with conflicting or not working configurations.
I’ve had co-workers bail on the project after mere hours of digging through code and GH discussions and SIG docs.
Nomad/consul and its concise docs are a breeze in comparison.
Torn. Cause I see the point of k8s, just not sure about the project. :)
Personally, I've found very few problems that it solves and more problems that it creates. That said, it's probably a great time to be a Kube consultant!
otoh, if you’re in the cloud, you already have the primitives to build your stuff w/o the k8s headache (vms, autoscaling, managed svcs). but that’s just my opinion. the koolaid is strong.
If you need to (vs want to) rebuild your OS/Environment regularly, you're doing it wrong. IE: PaaS is for you.
That said, I've found replicating OS & Environment extremely challenging on certain operating systems so I understand the appeal. I also don't build products on those operating systems and I've found myself much happier :)
Not only is it much easier to find services (like helm) and articles for k8s, but there are so many ready to go configurations. For most major services, there is already a ready to go helm chart or at least some well vetted yml config on GitHub to start from.
And most any issue I have come across, I have been able to search and find solutions.
The proliferation of operators, CRDs, things like k3s and the growing ecosystem of vendor distributions makes 'stock k8s' an increasingly nebulous target.
Speaking of documentation, I sometimes feel a bit overwhelmed by the amount of literature on Kubernetes these days. Many articles are quite outdated already and new ones get written every day. It's hard to know what the best practices are - and the ecosystem is still rapidly evolving.
So I don't think I was skimming over these parts, I just think that inertia can also become a disadvantage.
And, if you'll indulge me here, I was referring mostly to managed k8s offerings. Because that's what the vast majority of people would use, if they chose to go in on k8s at all and want to get out the door quickly.