Hacker News
Don't use Kubernetes yet (matt-rickard.com)
306 points by rckrd on June 19, 2022 | 288 comments



The question of whether to use K8s or not is like wondering what kind of saw you should use to cut wood. There's different saws for different purposes. But even with the right saw, you still have to know how to use it correctly. Better to use a hand saw correctly than a table saw incorrectly. (you can use a table saw incorrectly, but best case the work ends up crap, worst case you lose a finger)

After building infrastructure for dozens of teams, I'm quite convinced of the following:

- if your people aren't very skilled, they won't build anything well. most software engineers i've seen professionally working in the cloud are handymen trying to build a wood cabinet.

- if your people can't build well, it doesn't matter what technology they use. choosing between building a cabinet out of metal or cherry wood doesn't make much difference if they've never built a cabinet before.

- if the first two hold, then only use the technology which requires the least skill to use well, and where the amount of maintenance is closest to zero. don't build a wood cabinet from scratch when you can buy flat pack. don't buy flat pack when you can buy an assembled cabinet, get it shipped, and carried into your office.

- if using the aforementioned technology requires 'building' or 'assembling', and that is not core to the customer-facing aspect of your product, then you should not be building, you should be buying. if your business doesn't involve assembling flat pack furniture, don't ask your employees to build their own desks and chairs from Home Depot or Ikea parts. buy the premade desk and chairs, use them to make your actual product.

- a software engineer knows as much about cloud architecture as a fine woodworker knows about framing. "it's all just wood" until the house takes 10x as long to frame and is 10x as expensive and still doesn't meet code.

- people will try to build things they don't fully understand and leave the company before anyone realizes the mess they've made. imagine your retail store is accessible by driving a car over a wooden bridge built by a handyman.


The problem I keep seeing in startups is that a lot of startup tech founders and employees don't see a startup as an embryonic business that will grow over the long term into a money-making venture requiring calculated risks and priorities, but rather as a personal playground to do all the fun things they were not allowed to do at their old BigCorp.

Use cool framework or language X? Use Kubernetes? Microservices? Lambdas? No-SQL? Two fingers to BigCorp, I'm CTO/tech employee #1 at a new startup and I get to do whatever I want. If and when it goes down in flames, I now have these skills on my resume I can leverage into a better opportunity.

Ideally your first tech employee/CTO should be a seasoned hand who has been burned before, who keeps an open mind to new approaches but knows when and how to adopt them. But it's quite rare to have seasoned developers in such startups: they're not attracted by the risk and chaos of these environments, since they have better options and startups generally can't afford them.


I think that’s true, but also it’s not clear what tools a startup should use. Even if they stick to a single VM, managing that VM in a way that is reasonably safe and reproducible is somewhat important—the more often you have people manually SSHing onto the VM the more likely you will need to recover functionality from scratch while also making it almost impossible to do so in any timely manner (because the functioning state of the VM is the result of many thousands of manual changes, many of which are undocumented, incompletely documented, or incorrectly documented). So then you can start learning about automating changes on a machine, but now you’re into territory that is about as complex as Kubernetes (or perhaps not “complex” as much as “you have to figure out what combination of tools and practices will work”, while a cloud-provider’s managed Kubernetes makes a lot of these kinds of choices for you out of the box and many others are obvious).

Similarly, you might reasonably try to use a PaaS, and it might work at first, but then you need to do something as common as a background task and find that your PaaS doesn’t support things like that (you can try running it in a process in your PaaS’s container and just accept that the background tasks are likely to be interrupted periodically?).

I don’t think Kubernetes is the answer for a startup, but it’s not obvious what “the answer” is, and much of the discussion ignores the complexity in the alternatives. It’s not helpful to say “Kubernetes is too complex” without articulating a simpler alternative (unless you are hinting at a market opportunity).


I think it's clear to people who specialize in that. I don't know why startups don't hire a DevOps/SRE/Syseng/Sysadmin/Architect contractor right off the bat. Hire a really experienced one for one month and they'll give you several different options, lay out the pros and cons, costs, staffing, time estimates. Take their suggestions and implement the one that fits your business plans.

For example, a single VM you ssh into works fine if you do daily VM snapshots and deploy from snapshot. If somebody breaks it, revert to last working snapshot. Blue/green in multiple environments work with it too. Super cheap, super simple, easy to recover, automated. Very boring.
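A rough sketch of the snapshot-and-revert routine described above, in Python. Everything here is hypothetical: `SnapshotStore`, `take`, and `last_healthy` are illustrative names over an in-memory list, not any cloud provider's API, and the sketch assumes snapshots are recorded in chronological order. A real setup would call the provider's own snapshot tooling instead.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Snapshot:
    taken_on: date
    healthy: bool  # did the VM pass its checks when this snapshot was taken?


@dataclass
class SnapshotStore:
    """Hypothetical in-memory stand-in for a provider's snapshot list."""
    snapshots: List[Snapshot] = field(default_factory=list)

    def take(self, taken_on: date, healthy: bool) -> Snapshot:
        # Record a new daily snapshot (assumed to be appended in date order).
        snap = Snapshot(taken_on, healthy)
        self.snapshots.append(snap)
        return snap

    def last_healthy(self) -> Optional[Snapshot]:
        # Walk backwards from the newest snapshot to the last working one.
        for snap in reversed(self.snapshots):
            if snap.healthy:
                return snap
        return None


store = SnapshotStore()
store.take(date(2022, 6, 17), healthy=True)
store.take(date(2022, 6, 18), healthy=True)
store.take(date(2022, 6, 19), healthy=False)  # somebody broke it

rollback_target = store.last_healthy()
print(rollback_target.taken_on)  # the snapshot a revert would restore
```

The only decision the routine has to make is "which snapshot was the last working one", which is why the approach stays boring.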


> I don't know why startups don't hire a DevOps/SRE/Syseng/Sysadmin/Architect contractor right off the bat.

It's entirely likely that they are doing this, but the DevOps/SREs that they're hiring are prescribing the tools that they know and have seen work in the organizations that they're coming from.

I've been an SRE for nearly a decade, and I honestly couldn't tell you in detail how to set up a pet-VM deployment with automation and guidance about who deploys to it, on what cadence, etc or who operates it, etc. In my first DevOps job, we were doing something like this, but it was a hot mess--that startup hired the wrong sysadmin to bootstrap us, and the next guy they hired moved us to AWS ECS (running on Fargate) and that was a dramatically more effective solution for us. The point isn't that ECS > pet VMs, but only that we were able to find a successful path with ECS where we were unable to with pet VMs despite hiring someone who allegedly had that experience. For that organization, ECS (rather than pet VMs) was the Very Boring Solution (mind you, I'm not a particularly big fan of ECS either).


People often confound complexity with having to learn something new.


k8s IS complex, not because it's new, which it isn't anymore, but it's so complex that even the developers have lost oversight.

And not just that, it's expensive. It requires a lot of hardware just for the base, 9 servers minimum.

The more complex something gets the harder it is to get done right.

I would downvote you if I could because what you wrote is just ignorant and anyone who dealt with k8s before can tell that you never dealt with it.


> I would downvote you if I could because what you wrote is just ignorant and anyone who dealt with k8s before can tell that you never dealt with it.

I don't know why you're snarking so hard when there are dozens of managed Kubernetes offerings out there.

> It requires a lot of hardware just for the base, 9 servers minimum.

An HA deployment requires 3 nodes for the control plane, and as many or few workers as necessary. But yes, this is 3 more nodes than you would run without automated orchestration. Of course, the cost of these 3 nodes is amortized over all of the worker nodes, so if you're only running 3 worker nodes you're paying 100% for orchestration, but if you're running 30 worker nodes you're only paying 10% (this is assuming equally sized nodes, for simplicity).
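To make the amortization concrete, here is a small Python sketch of that arithmetic. The function name `orchestration_overhead` is made up for illustration, and it keeps the same simplifying assumption of equally sized nodes.

```python
def orchestration_overhead(worker_nodes: int, control_plane_nodes: int = 3) -> float:
    """Control-plane nodes as a fraction of worker nodes (equal-sized nodes assumed)."""
    if worker_nodes <= 0:
        raise ValueError("need at least one worker node")
    return control_plane_nodes / worker_nodes


for workers in (3, 30):
    # Prints the overhead as a percentage of the worker fleet.
    print(f"{workers} workers: {orchestration_overhead(workers):.0%} overhead")
```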

> The more complex something gets the harder it is to get done right.

Right, but if someone else packages that complexity up for you in a piece of software (like Kubernetes or Linux) it's often much easier to get it done right than it would be to roll your own from primitives.


If 9 nodes is a lot for you, then I agree that your problems are small enough that manual solutions can be much simpler than using Kubernetes.

You also don't need an angle grinder to open a cardboard box, but that doesn't mean angle grinders are stupid and can all be replaced with a box cutter.


You’ll need to name names and their particular shortcomings if you’re going to say there are no good PaaS alternatives. In particular, both AWS and GCP have cron-as-a-service, which can then be hooked up to AWS Lambda/GCP Functions. Now, I’m not saying this will work for every use case out there (eg there are time limits on both), but without getting into the weeds a little bit on why those PaaS won’t work (for your use case), there are viable alternatives to K8s’ complexity.
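As a hedged illustration of the time-limit caveat, here is a Python sketch of a function that a scheduler might trigger. The `handler(event, context)` shape and `context.get_remaining_time_in_millis()` follow AWS Lambda's Python runtime; the `"jobs"` payload key, the `process` callback, the safety margin, and `FakeContext` are assumptions made up for this example so it can run locally.

```python
from typing import Any, Callable, Dict, List


def handler(event: Dict[str, Any], context: Any,
            process: Callable[[str], None] = print,
            safety_margin_ms: int = 5_000) -> Dict[str, List[str]]:
    """Process jobs from a scheduled event, stopping before the time limit."""
    pending = list(event.get("jobs", []))  # "jobs" is a hypothetical payload key
    done: List[str] = []
    while pending and context.get_remaining_time_in_millis() > safety_margin_ms:
        job = pending.pop(0)
        process(job)
        done.append(job)
    # Anything left over has to be re-queued by the caller or the next run.
    return {"done": done, "leftover": pending}


class FakeContext:
    """Local stand-in for the Lambda context: each check burns 4 seconds."""

    def __init__(self, budget_ms: int) -> None:
        self.budget_ms = budget_ms

    def get_remaining_time_in_millis(self) -> int:
        remaining = self.budget_ms
        self.budget_ms -= 4_000
        return remaining


result = handler({"jobs": ["a", "b", "c"]}, FakeContext(budget_ms=12_000),
                 process=lambda job: None)
print(result)
```

The point of the sketch is the shape of the constraint: work that does not fit inside one invocation has to be carried over somehow, which is exactly the kind of detail that decides whether this route fits a given use case.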


> You’ll need to name names and their particular shortcomings if you’re going to say there are no good PaaS alternatives.

I'm not saying there are no good PaaS alternatives, I'm saying that it's not easy to find the right PaaS alternative. You might think you have the right one, and then some key feature is just missing (in my example, the ability to kick off orchestrator-aware background tasks).

> In particular, both AWS and GCP have cron-as-a-service, which can then be hooked up to AWS Lambda/GCP Functions.

Right, but now you're dealing with cloud-provider primitives, which don't seem significantly more straightforward than dealing with Kubernetes (instead of knowing Kubernetes, you need to know the cloud provider APIs).

> there are viable alternatives to K8s’ complexity.

I'm sure there are, but figuring out which PaaS is appropriate for your use case (or a combination of PaaS + various cloud provider tools + etc) is a hard problem, possibly just as hard as building on a managed Kubernetes offering.

Personally, I'm hoping someone makes an open-source, push-button Kubernetes distro that has the common stuff built-in: external-dns, cert-manager, ingress-controllers, network storage backends, central logging, prometheus/grafana, etc. This would give you even more of the happy path built in without necessarily coupling you to a cloud provider while also giving you plenty of "future-proofing" (if your PaaS doesn't have some feature, you often need to switch to another platform, but if your Kubernetes distro doesn't have some feature, you can just build it atop Kubernetes).


> The problem I keep seeing in startups is that a lot of startup tech founders and employees don't see a startup as an embryonic business that will grow over the long term into a money-making venture requiring calculated risks and priorities, but rather as a personal playground to do all the fun things they were not allowed to do at their old BigCorp.

I don't think this is an accurate or realistic take.

You're actually stating that tech founders, when starting greenfield projects, actually take the time to invest in frameworks that offer advantages over outdated legacy systems adopted way back in the past which are only in place because there is no chance in hell a company will spend resources to pay off their technical debt.

There was a point in time where adopting Java+Spring, or even React was something only some silly risk-taker would do when playing around in their personal playground. But as the investment paid off so handsomely, they have become baseline options.

If you are free to choose the absolute best tool you have at your reach, why wouldn't you?


The GP is referring to a well-known phenomenon that the founder of HN described in http://www.paulgraham.com/before.html.

> We saw this happen so often that we made up a name for it: playing house. Eventually I realized why it was happening. The reason young founders go through the motions of starting a startup is because that's what they've been trained to do for their whole lives up to that point. Think about what you have to do to get into college, for example. Extracurricular activities, check. Even in college classes most of the work is as artificial as running laps.

Many people, and I'd argue most people, start startups for their careers and for fun. Actually focusing on the business and solving business problems is foreign, grunge work. It's no surprise that founders and early employees pursue exciting tech to distract themselves from the challenge of product-market fit.


Not OP, but over the last year, I've worked in 4 startups. (A) reached a $100m Series C. (B) had a tech stack that was far too complicated for what it needed, which opened my eyes to how badly teams can shoot themselves in the foot with K8s. (C) was a Y Combinator graduate that increased their ARR by $1m in the last month. (D) was funded based on a pitch deck and a bunch of former FAANG.

(B) had 200 unique visitors a month and a stack built on Kubernetes with something like 8 microservices. It was running Apollo, Socket.io, and on and on; literally overcomplicating everything...for 200 unique visitors a month. Every single aspect of that system was over-complicated. CTO was part of the problem, but was complicit in letting tech team go wild; many of whom just up and left after playing around with the tech.

(D) on the other hand had the exact problem that OP described: they started going hog wild building multi-layered service-oriented architecture when the system barely has any traffic. Deployment is a pain. Code is copy-pasted everywhere because the alternative is to publish and pull packages and that's a lot of friction. Team can code, but barely understands the tech stack.

(C) built the dumbest architecture I've ever seen. Shit was stapled together. Webhooks were constantly failing due to quota pressure. Their Firestore queries were inefficient and their sloppy migrations led to many broken documents in the store. But guess what? They are actually growing in this economic environment because they solved a real problem without worrying about growth until they actually saw the growth.

(A) built his solution in Angular because that's what he knew and didn't try to jump on React or other hype.

At (C) (the YC startup), I had a conversation with the COO that I'll never forget. We were discussing an architectural decision which would have scalability and stability impact down the line. What he said was to the effect that "We have 10,000 users today and if we lost them all, there are still 7 billion people. We'll find another 10,000". He made it clear that they wouldn't be happy about it, but that it's speed above all else.

> If you are free to choose the absolute best tool you have at your reach, why wouldn't you?

The purpose of a startup isn't to play around with tech: it's to find product-market fit and zero in on the business value. If you think of a startup as a place to play around with tech, you have no place in a startup.

You should almost always choose the "dumbest" possible technology used in the "dumbest" possible way so that you can hire anyone and they can step in and be productive when you need to scale. It minimizes ramp up time. It minimizes deployment and operational complexity. It minimizes the ways the system can fail. Literally build the dumbest thing that solves the core business problem that creates value for the user and then figure it out from there.


> (B) had 200 unique visitors a month and a stack built on Kubernetes with something like 8 microservices. It was running Apollo, Socket.io, and on and on; literally overcomplicating everything...for 200 unique visitors a month. Every single aspect of that system was over-complicated. CTO was part of the problem, but was complicit in letting tech team go wild; many of whom just up and left after playing around with the tech.

I do not understand how anyone can draw any conclusion from this about whether the technology was at fault for the implied failure.


>You should almost always choose the "dumbest" possible technology used in the "dumbest" possible way so that you can hire anyone and they can step in and be productive when you need to scale. It minimizes ramp up time. It minimizes deployment and operational complexity. It minimizes the ways the system can fail. Literally build the dumbest thing that solves the core business problem that creates value for the user and then figure it out from there.

Yes!!! Spot on. I always say “make it so simple an idiot can immediately understand it, or me when I’m hungover”. Choose the most boring, simple…yet reliable, stable, well supported tech you can. Focus on business problems and not if you can use fancy-new-tech to solve them!


I think it is strange that the only example you gave of (A) is that they chose angular over react because that's what they knew.

Everyone else failed because their backend systems were overly complicated.

I don't see how there is a lesson to learn from it unless we make a ton of assumptions about (A), absolutely none of which have anything to do with angular or react.

The last paragraph definitely makes sense, though I'm currently in the unenviable position of un-fucking a startup because the initial developers chose a setup that was far too dumb. It's literally going to take at least as long to fix as it took to initially build out. C'est la vie I guess.


To get to $100m you have to do so many things right and so many things wrong that it's hard to isolate any one or two things. But key is that the co-founder and CTO stuck to what he knew and focused on solving a business problem. Angular or React doesn't matter; could have gone the other way.

You probably misunderstand what I mean by "dumb" and why I put it in quotes. Things that are "dumb" are intentionally so; they are built on stacks using technologies and approaches that have low cognitive load. It almost feels boring because of how easy it is to build on the stack and extend the stack without shooting yourself in the foot. A very practical example is something like Google Cloud Run vs Google Cloud Functions, with Cloud Run being more flexible, easier to deploy, easier to work with, and easier to reason about. It's stupid simple and rather boring.

But that aside, dumb things are generally easier to unfuck than complicated things. Complicated things with many dependencies built on complicated stacks are much harder to untangle and refactor.


because engineering is about tradeoffs and the right tool for the job. The vehicle with the “absolute best” carrying capacity is gonna be a semi-truck, but that’s laughable if you’re going to use one for a supermarket run for groceries.


Newer != better


Because most startups aren’t going to grow in the long term to profitable businesses. Either they are going to flame out completely or they are going to get acquired, the product discontinued and the founder is going to walk away with money and post an article on their blog about “our amazing journey”.

As an employee, it’s in your own self interest to do resume driven development. Your “equity” probably won’t be worth anything so you might as well work toward preparing for your next job.

As a founder, you can’t offer the compensation of a larger company. What you can offer is the chance to work on “cool cutting edge technology”.

At this point in my career, I’m more motivated by being able to describe successful outcomes than cool new technology. But I do understand the motivations. I’m also not an old curmudgeon who considers the cloud just a bunch of VMs, though.


Given the failure rate of startups, "I now have these skills on my resume I can leverage into a better opportunity" may be the actual goal of most employees.


I've seen this frequently in large companies as well. That's why you see legacy systems with random tech.


> Use cool framework or language X? Use Kubernetes? Microservices? Lambdas? No-SQL? Two fingers to BigCorp, I'm CTO/tech employee #1 at a new startup and I get to do whatever I want. If and when it goes down in flames, I now have these skills on my resume I can leverage into a better opportunity.

I was building a SaaS application (as a tech co-founder) and chose rails. It was newish to me (I'd done 1-2 projects) and I admit, I wanted to get better at rails. But it was also, after assessing the options and my (limited) knowledge of the future, the tech with the best chance of helping the company succeed.

I have definitely run into startups where there was overly complicated technology, but tended to be later stage and have raised some money. As a founder/employee #1, I have a hard time believing anyone would pick tech just to burnish their resume; they should be trying to ship so the company succeeds.

What am I missing?


>As a founder/employee #1, I have a hard time believing anyone would pick tech just to burnish their resume; they should be trying to ship so the company succeeds....What am I missing?

That most startups are Underpants Gnomes, the company probably won't succeed even if they ship.

Even if the company is not Underpants Gnomes and their fundamentals are sound, succeeding or not is still largely luck based.

Maybe the reference to Underpants Gnomes is too dated now? - https://youtu.be/a5ih_TQWqCA


Love the underpants gnomes reference and I agree that many startups will fail. But wow, working at a company you feel in your bones will fail seems to me to be a poor way to spend a career.

Not that every company I've worked at succeeded, but I hoped that every startup I joined as an employee would.

Now, as a contractor or consulting company, you still want the client to succeed, but it's much easier to be arms-length, take orders, and collect your check.


You’re missing the fact that employees who are interested in their own career would be less interested in a Rails stack on their resume.


It was over half a decade ago. But the point wasn't about the particular tech, it was that folks who join a startup, especially if early, should prefer the startup to succeed over choosing overcomplicated solutions, and I don't understand why some don't. A few possible reasons:

* they don't think the solution is overcomplicated

* they don't really care about the startup succeeding

* they don't know what the ramifications of picking the wrong technology will be

* they think the risk of learning something new is outweighed by the value to the company (or themselves)

I don't think startup employees are altruistic, but picking a tech you know will have a detrimental effect on your employer strikes me as shortsighted.


I don’t think people want their employer to fail. But given a choice between the most “correct” technology and the most “marketable” one, it’s short-sighted as an employee to choose the most “correct” technology. More than likely the startup is going to fail.

If given a choice between Vue and React, I might prefer Vue. But I would still choose React as a tech lead because I know it is more marketable and you would get more self interested developers and it would be easier to recruit for.


Thanks for giving me clarity into your thought process. I agree the choice between "correct" and "marketable" is a difficult one. Sometimes the most marketable tech is the correct choice. Sometimes the correct choice is the one you and your team know, even if it is not the most marketable. Sometimes it doesn't matter, because when you are building an app, there are many paths to success.

Technical correctness is not the only factor, there are other kinds of correctness, as you point out with React: "you would get more self interested developers and it would be easier to recruit for".


The PROBLEM is that all the job ads require you to have those skills that are impossible to very expensive to acquire if you're not employed at such a company that lets you experiment with that stuff you mentioned.

I write monoliths and I have only experimental experience with cloud, microservices, k8s, lambdas (no-sql is a no brainer). I cannot find a job. All the recruiters offering any Go job demand knowledge of exactly that. But since I'm a single guy with a currently non-existent budget, I chose the cheapest way: a monolith with a single database on a single server.

How am I supposed to acquire those skills? I don't have the time, motivation or the money to throw at expensive cloud or write overly complicated microservices with events. I need to get things done, not play with tech that my clueless employer wants because they watched some Kelsey Hightower video.


They are often not required. It's obviously great if you have worked with a lot of the technology used at the new company, but many companies don't apply these as hard filters. I've never seen a candidate that had a 100% technology fit, but no one cares about that.

I view the technologies listed in the job ad much more as information for the applicant to judge if they'd be comfortable working with such a tech stack. Often it's the only place to get a glimpse of what technology a company is using.


I think there are two types of startups and it depends what you are actually doing:

- Startup that has product market fit (or is really close), and is ready to scale: choose boring tech that works

- Startup that is a playground, exploring or building new tech, and may never make money.

You can join either as an employee/cofounder/leader, and have a great time, but if you mistake one for the other, you will have a bad time.


Yeah, I have seen similar experiences, hence why I am a boring tech kind of person.


The last company I worked for was essentially finance bros who had a no-code investment solution but wanted to sprinkle ML on top to get clients. Suddenly it needed to be able to run air-gapped on prem. Oh also on Ali cloud in china. Oh also on GCP and AWS hybrid. Business promised the clients it was ready before we even started building. 90% of the team was under 25. We tried our damned hardest. Used K8s to make the whole thing platform agnostic. It worked but it cost a lot. Business people are the worst.


Isn't a single binary the best fit in this case? Telling clients they need to build up a kubernetes cluster (especially in finance) won't always be the easiest decision.


Sounds like the exact use case for OpenShift.


i dont think there's any use case for openshift


use case: "I would like to use Kubernetes but I also would like a monkey to occasionally come out of a cage and hit my balls with a hammer"


OpenShift: an unstable trailing-edge version of Kubernetes with too much security to the point of wasting everybody's time. Or so says my team which is still recovering from that unmitigated disaster ...


Sounds like they are looking for Qovery


Well said. Your first 3 points summarize the entirety of the problem.

Kubernetes is basically best practices (with some quirks) of running containers reliably across a cluster of machines with various deployment types, logging, DNS/discovery, networking, storage, load-balancing, high-availability and other features built-in. If you don't need any of that then don't use it, but if you do then you probably won't build a custom infrastructure that does it better.


> Better to use a hand saw correctly than a table saw incorrectly.

To be fair, better to use a handsaw even incorrectly than a tablesaw incorrectly.


I concur.


> you can use a table saw incorrectly, but best case the work ends up crap, worst case you lose a finger

Funny, just yesterday someone mentioned how Rust is compared to a SawStop: https://news.ycombinator.com/item?id=31784253

And here's a test showing how unlikely it is that it'll chop off your finger: https://youtu.be/SYLAi4jwXcs


But that won't stop your workpiece from flying into your eyeball/jaw/neck/chest, or you from inhaling fine particles. Safety features are nice, but aren't a replacement for knowing how to use the tool safely.

And that's just safety. Knowing how to use the tool properly means using a featherboard, or not using both the miter gauge and fence, using a crosscut sled, and other things which ensure your piece won't be marred or jammed or bind or cut incorrectly. Proper use also leads to increased efficiency and less waste.

Same things apply to K8s. Even if you buy some managed product or safety feature, you still need to know how to use it right after that point, or your work will end up shoddy.


"Safety features are nice, but aren't a replacement for knowing how to use the tool safely."

In many cases, safety features in languages do indeed remove the need to understand a dangerous tool. For instance, if your language doesn't have pointer arithmetic, you don't need to know how to use pointer arithmetic safely.


> eyeball

The rest of your points stand (though I suspect a saw particle colliding with your jaw is unlikely to cause damage as severe as cutting off your finger), but any responsible person will be wearing safety goggles when operating a table saw.


I have never used a table saw before, but from my amateur youtube understanding, safety goggles aren't perfect protection against kickback, right?

Like, if I've got a chunk of wood being hurled at my face at high speeds, I'm definitely going to prefer to be wearing safety goggles, but I can still imagine the wood coming in at an angle and knocking them away, or being hit with enough force to break the goggles.


You're unlikely to get hit by larger pieces of wood at high speed. Dust, small chunks, maybe a part of the saw may come flying at you, but i suspect they're way more likely than large wood chunks. But if you believe this photo was not staged, then safety goggles are pretty good https://external-preview.redd.it/qVMDLhaXQN8Vd9UI6d5fDswF396...


In this analogy, raw unfiltered K8s is the table saw and the SawStop is AWS ECS or GCP GKE


I guess you've never met a competent generalist, then, which is funny because that's exactly who you need as your first employee.


I grew up on a farm, so I've got a very generalist attitude to things for the simple reason that if the thing breaks you have to take the covers off and fix it, or you get to walk home in the rain. And then, tomorrow, you get to walk back out to it in the rain and fix it anyway.

I've used a microscope, a JTAG cable and a MIG welder on the same project on the same day.

Here's the thing though - I like simplicity. I like stuff to be simple. I drive a 25-year-old Range Rover because it's simple, and because if it breaks I can fix it quickly with simple tools. I bring this simplicity to the stuff I build - lowest parts count, easiest to get parts, comprehensive manual.

Build your software stack as though you expect to have to fix it in a boggy field on a pissing wet Sunday afternoon with a hammer and a roll of sticky tape, because that's exactly what you're going to have to do.


My style as well. I would add one thing which I call “critical dimension analysis.” Try to identify the main customer driver for the domain you’re in (could be cost/performance/something entirely different), and invest a little more in that technical area (while still keeping things relatively simple). That will give you a simple yet efficient stack for the domain you’re in, which will be a great base for taking your product forward.


Totally agree about keeping things simple. When [it] hits the fan on Sunday afternoon, you can probably root-cause the issue fairly quickly and get things running again. And, you don't have to wait for that *one guy* who went on vacation Friday to come back a week later to show you how things were configured.


Exactly. A "perfect" fix next week is not better than "drive it gently and bring it back on Monday so I can look at it properly".


Funny part is that „competent generalists“ (those Brent-style types) are very rare and most technology needs to be designed for incompetent specialists. Also funny is that these generalists typically build as if everyone has their own skillset which renders many systems unusable once Brent has left the building.


They're not just rare, the bigger problem is that most companies don't know how to deal with them.

They either can't pay for them, don't give them enough freedom, weigh them down in a big team, or any number of other issues. One person in a specific role (even if they're not very good at it) is far easier for most organizations to deal with than the 10x generalist that can get things done.

It can work well in very early startups, but there's limited reward there for it to be worth it.


Nobody cares if what you built is sustainable and scalable if you can't do it fast enough to get your first customers, and there is plenty of time to make what you've built usable by the time you've left.

If you, as a founder, hire an incompetent specialist as your first employee, maybe you don't deserve to be successful.

Your first hire is either a "Brent", or you have to get very lucky in just about every way.


>those Brent-style types

Which Brent are you talking about?


I get the sense the name is being used in a general way like a “Karen”, but I’ve never heard of a “Brent”.


Brent is the engineer building all the real tech that a company in the Phoenix Project is building, and when the MC (a manager, Bill I believe) realizes this he re-directs resources so Brent can do his job without interruptions, ultimately saving the company. Excellent read, it's the first book managers are told to read when they use SCRUM/AGILE incorrectly.



Expecting someone who's capable of handling most roles isn't always a viable strategy.

Personally, I consider myself such a person and I've been called in numerous times when people:

  - need to use Kubernetes because of project requirements, but all they have is a single node with 8 GB of RAM (K3s sufficed nicely)
  - need to deliver OCI images and Helm charts, yet have no idea how to handle all of the infra aspects here (Nexus to the rescue)
  - need to containerize any number of applications from old timey monoliths that ran in loosely defined Tomcat and JDK versions, badly
  - need to figure out why their apps run badly, despite there being almost no logs ("Log output is spammy!" they say), no APM ("What?") or proper monitoring for whatever they've written
  - need to figure out how container networking works and why them attempting to use localhost for inter-service communication isn't always the best idea
  - need to figure out how to manage configuration properly, especially in situations where they add some random configuration parameter 4 months back, forget about it and then attempt to ask me about what it means, because of course they didn't document it
  - need to set up log rotation, because they absolutely refuse to use log shipping and see no benefit in it, yet don't want to delete the old logs whilst wanting to be able to view the old ones, yet don't know how to configure compression for the older ones and thus run out of disk space on the server
  - need to set up a reverse proxy and manage its configuration, because all they know is Apache/httpd (which I think is a nice project), but haven't bothered learning Nginx which they now need to use because of said requirements
  - need help with their supposedly automatic database migrations actually failing because they failed to account for pre-existing data
  - need help with the project moving ahead at a snail's pace because they wanted to pad their CVs and picked Tailwind CSS instead of some pre-made component library and thus have to reinvent the wheel, as well as use TypeScript which they do not know how to use and are slow with as a result
Somehow such generalists feel few and far between, so relying on them to always be there to save you from yourself won't be reliable.
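For the log rotation item in the list above, here is a minimal sketch of size-based rotation with gzip compression of the older files, using Python's standard logging module. It is an in-process illustration, not what was actually set up on those servers (a system-level tool such as logrotate is the more common choice there). The logger name, size limit and backup count are made up for the example.

```python
import gzip
import logging
import logging.handlers
import os
import shutil


def gzip_rotator(source: str, dest: str) -> None:
    """Compress the just-rotated file into `dest` and delete the plain copy."""
    with open(source, "rb") as src, gzip.open(dest, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(source)


def gzip_namer(default_name: str) -> str:
    """Rotated files get a .gz suffix, e.g. app.log.1.gz."""
    return default_name + ".gz"


def build_logger(path: str, max_bytes: int = 1_000_000, backups: int = 5) -> logging.Logger:
    """Return a logger that keeps at most `backups` compressed old files."""
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups
    )
    # The handler calls these hooks on every rollover.
    handler.rotator = gzip_rotator
    handler.namer = gzip_namer
    logger = logging.getLogger("rotation-sketch")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

With this in place the old logs stay readable (for example with zcat) while the disk usage is bounded by roughly the size limit times the backup count.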

Furthermore, there will be lots of people with either limited knowledge or limited interest in learning new things. I've seen someone say that "restarting services is not in my job description" when asked to restart some environment. Many out there will also be perfectly fine with writing apps like they did 20 years ago, without taking advantage of any of the benefits of 12 Factor Apps or similar principles. They will also reach for the local file system instead of something like S3 for object storage and application memory instead of using Redis for session storage. They will seldom be able to write apps that are horizontally scalable and as a consequence you'll have to rely upon a single monolith which should always be up and working, which, given their lack of ability to write proper unit tests, will rarely work out that way.
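To make the 12 Factor point a bit more concrete, here is a rough sketch of reading backing-service locations from the environment instead of hard-coding localhost or local paths. The variable names (REDIS_URL, S3_BUCKET, ORDERS_API_URL) and the Settings class are hypothetical, picked only for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    redis_url: str       # session storage lives outside the app process
    s3_bucket: str       # uploads go to object storage, not the local disk
    orders_api_url: str  # peer service addressed by name, not by localhost


REQUIRED = ("REDIS_URL", "S3_BUCKET", "ORDERS_API_URL")


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from environment variables, failing loudly if any is missing."""
    missing = [key for key in REQUIRED if key not in env]
    if missing:
        raise RuntimeError("missing required configuration: " + ", ".join(missing))
    return Settings(
        redis_url=env["REDIS_URL"],
        s3_bucket=env["S3_BUCKET"],
        orders_api_url=env["ORDERS_API_URL"],
    )
```

Because no instance keeps state on its own disk or in its own memory, you can run several copies side by side, and an undocumented configuration parameter shows up as a startup error rather than a mystery four months later.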

Unless you are very selective in your hiring, plan for people like that also being present.


> Furthermore, there will be lots of people with either limited knowledge or limited interest in learning new things.

cries

> Many out there will also be perfectly fine with writing apps like they had 20 years ago

seen this many times

> Unless you are very selective in your hiring, plan for people like that also being present.

Hiring is everything. All issues you have listed would be solved just by having a senior (with real senior experience, not 10+ years written in their CV) in the team.


> Hiring is everything. All issues you have listed would be solved just by having a senior (with real senior experience, not 10+ years written in their CV) in the team.

But that's the crux of the problem, isn't it? The people doing the hiring often can't reliably pick out the people who would make a given project succeed, given how many facets there are to judging someone's aptitude and how limited their resources might be.

I guess one can also mention the "ten years of experience" versus "one year of experience ten times" conundrum, but at the end of the day, a lot of it is going to be hit or miss.

As an individual contributor, it's probably a good idea to fix what you can, document what you cannot (and why) and always look for environments where you fit in the best, so that in the end you can work with people with whom you are compatible.


I’ve experienced a lot of people recommending and learning a trendy technology to get it on their resume, and then leaving a mess behind when it gets them a new job. Leadership can’t fully abdicate the decision.


Lots of whom are here on HN recommending trendy technologies.


As intoxicating as they are, we need to move away from using analogies to problem domains most know little about to attempt to gain further insight into our own.


I like what you're saying, but your woodworking metaphor falls a bit flat for me because it teaches me more about woodworking than about building IT infrastructure.


Can small companies afford the complexities of k8s? I'm not yet convinced. Its pros are compelling yet the learning curve is steep. Which suggests a smaller talent pool.


I think so, as long as you're using a managed cluster on a cloud that has automatic handling of Ingress and LoadBalancer. My first production experience with it was at a start-up with 3 initial engineering staff. We already knew how to build docker containers. A basic helm chart was easy to create and push to the container registry. All we needed to do was set the kubeconfig and helm install v0.1. Even if you're not sure how it all works under the covers, your tiny team is pushing to the dev cluster several times a day, and your dev site is online (you can hide it behind Cloudflare Access for free). Extras like Cluster Autoscaling, Horizontal Pod Autoscaler, ExternalDNS and cert-manager are easy to add when you need to. Anyone can fumble-install Grafana/Prometheus/Loki and have metrics and logs. At that point you have elastic infrastructure, dashboards, alerts, TLS, LoadBalancers, DNS... How long would a small developer team take to do this without k8s? Imagine trying to figure out ECS, Fargate, Lambda, ACM, CloudWatch logs, alarms and metrics? And how much AWS-specific code is in every single one of your services to get it to work on native AWS?


> Imagine trying to figure out ECS, Fargate, Lambda, ACM, CloudWatch logs, alarms and metrics?

In case anyone is curious, we went that route.

- ECS + Fargate + CDK took one of us two (8h/day) weeks for the initial setup. We've sprinkled a few more days here and there since then.

- Cloudwatch logs are "setup-free" (your containers' logs get sent there by default when using CDK constructs).

- ACM... we don't use directly. CDK will easily setup TLS-enabled (with AWS-emitted certificates) ALB (Application Load Balancer(s)) for you.

- Lambdas we don't use much.

- Metrics & Alarms are easy to set up but they generally suck. The custom language for computed metrics is clumsy and quite limited. The anomaly detection sucks. And it is expensive even by cloud standards (don't create metrics and alarms willy-nilly or you'll feel it in the next invoice).

- Our application code doesn't know much about AWS (we do use libraries for S3 and SES, but these are just easy to swap adapters).

- We ended up with ~3k lines of CDK definitions in typescript. These are very easy to read, and only moderately hard to write (you do need to look up the docs). However, I can say without a doubt that it's been the easiest infrastructure definition/description language that I've ever used.

I don't have enough experience with K8s to know whether that route would have been better or worse, but I can say this route hasn't been a pain point for us.


Good to hear. CDK is a huge plus.


A few years ago, I would have said no. Now, I'm cautiously optimistic about it.

Personally, I think that you can use something like Rancher (https://rancher.com/) or Portainer (https://www.portainer.io/) for easier management and/or dashboard functionality, to make the learning curve a bit more approachable. For example, you can create a deployment through the UI by following a wizard that also offers you configuration that you might want to use (e.g. resource limits) and then later retrieve the YAML manifest, should you wish to do that. They also make interacting with Helm charts (pre-made packages) easier.

Furthermore, there are certified distributions which are not too resource hungry, especially if you need to self-host clusters, for example K3s (https://k3s.io/) and k0s (https://k0sproject.io/) are both production ready up to a certain scale, don't consume a lot of memory, are easy to setup and work with whilst being mostly OS agnostic (DEB distros will always work best, RPM ones have challenges as soon as you look elsewhere instead of at OpenShift, which is probably only good for enterprises).

If you can automate cluster setup with Ansible and treat the clusters as something that you can easily re-deploy when you inevitably screw up (you might not do that, but better to plan for failure), you should be good! Even Helm charts have gotten pretty easy to write and deploy and K8s works nicely with most CI/CD tools out there, given that kubectl lends itself pretty well to scripting.


the devil is in the details, they all look easy on the surface but there are so so many traps you can step into casually that they really aren't solutions for the k8s being too complex problem.


If you use a good managed k8s offering, the complexities are not that great. GKE with Autopilot is a good option. In that case you don't need to know much more than how to write the yaml for a deployment. I've shown developers at all levels how to do that, it's not a barrier.


+1. GKE Autopilot and sticking to the core ~4 Kubernetes object kinds (Deployment, Service, Ingress), was a really easy way for us to get started with K8s (and in many cases can carry you a really long way).


Though easy to run, if there are multiple workloads of varying resource requirements, there will be a lot of wasted CPU and RAM, just because there are minimum CPU and CPU to RAM requirements.
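A back-of-the-envelope sketch of where that waste comes from. Every number in the policy below is hypothetical and chosen only to show the arithmetic; they are not any provider's published limits, so plug in the real ones from your platform's docs.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PodSizingPolicy:
    """Hypothetical per-pod constraints, NOT any provider's actual values."""
    min_cpu_m: int = 250          # smallest allowed CPU request, in millicores
    cpu_step_m: int = 250         # CPU requests are rounded up to this step
    min_mib_per_core: int = 1024  # lowest allowed memory per full core
    max_mib_per_core: int = 6656  # highest allowed memory per full core


def ceil_div(a: int, b: int) -> int:
    """Integer division rounding up."""
    return -(-a // b)


def adjusted_request(cpu_m: int, mem_mib: int,
                     policy: PodSizingPolicy = PodSizingPolicy()) -> Tuple[int, int]:
    """Return the (cpu, memory) the pod ends up with after the policy is applied."""
    cpu = max(cpu_m, policy.min_cpu_m)
    # Too much memory for this CPU: CPU is raised until the ratio fits.
    cpu = max(cpu, ceil_div(mem_mib * 1000, policy.max_mib_per_core))
    cpu = ceil_div(cpu, policy.cpu_step_m) * policy.cpu_step_m
    # Too little memory for this CPU: memory is raised to the minimum ratio.
    mem = max(mem_mib, ceil_div(cpu * policy.min_mib_per_core, 1000))
    return cpu, mem


def waste(cpu_m: int, mem_mib: int,
          policy: PodSizingPolicy = PodSizingPolicy()) -> Tuple[int, int]:
    """Resources paid for beyond what the workload asked for."""
    cpu, mem = adjusted_request(cpu_m, mem_mib, policy)
    return cpu - cpu_m, mem - mem_mib
```

Under these made-up limits, a tiny sidecar asking for 50m CPU and 64 MiB is bumped to 250m and 256 MiB, and a memory-heavy job asking for 100m CPU and 4096 MiB is bumped to 750m CPU. Multiply that over many small, uneven workloads and the overhead adds up.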


Agree. I think they are working towards providing Istio (gateway replaces ingress, TLS internal communication, canary deployments, shadowing etc).

If they can spin up GPU or high memory nodes on demand with Autopilot that would be amazing.


Why would you run it in k8s as opposed to, say, ECS? Honest question here, why not run it on something simpler that requires less new concepts and achieves the same results?


Because ECS sucks. It's slow and unergonomic, like a badly designed k8s with fewer features.

k8s is nicer to use and simpler if you ignore the complex bits. Plus it's the standard.


My experience of ECS is the opposite of yours. It integrates nicely with the AWS ecosystem and was substantially easier to use and educate others on. Would not hesitate to use again on either fargate or BYO EC2. I will acknowledge scheduling is not quite as fast as Nomad but I never found it 'slow'


Personally, I'd rather create a k8s deployment than an ECS task, but I can see your point. If all you want is an AWS-integrated experience, then it makes some sense that ECS is just simpler overall out of the box.

I don't think the delta to make k8s integrated is that much work with EKS, but the ability to mutate the entire infrastructure if and when you do scale wins out for me. I think the complexity, most of which you can ignore, is worth the flexibility.

Either way, since k8s landed, AWS itself has started improving too.


I've used both and I would still prefer ecs/fargate to build a rather independent application and k8s to build a long-term platform.


For a typical deployment, ECS isn't simpler than a fully managed k8s system, and doesn't have fewer new concepts. The wealth of concepts in k8s only comes into play when you're doing more advanced things that ECS doesn't have abstractions for anyway.

In ECS you have abstractions like task definitions, tasks, and services, all of which are specific to ECS, and so are new concepts for someone learning it originally. In Kubernetes a typical web app or service deployment uses a Deployment, a Service, and an Ingress. It isn't any harder to learn to use than ECS, and I find the k8s abstractions better designed anyway.
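For anyone who hasn't seen the Kubernetes side, here is a rough sketch of those three objects for one web app, built as plain Python dicts and printed as JSON (kubectl accepts JSON manifests as well as YAML). The app name, image, hostname and port are placeholders; the field names follow the apps/v1, v1 and networking.k8s.io/v1 APIs.

```python
import json
from typing import Any, Dict, List


def web_app_manifests(name: str, image: str, host: str,
                      replicas: int = 2, port: int = 8080) -> List[Dict[str, Any]]:
    """Return a Deployment, a Service and an Ingress that are tied together by labels."""
    labels = {"app": name}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [
                    {"name": name, "image": image,
                     "ports": [{"containerPort": port}]},
                ]},
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        # The Service finds the Deployment's pods through the same labels.
        "spec": {"selector": labels,
                 "ports": [{"port": 80, "targetPort": port}]},
    }
    ingress = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name},
        "spec": {"rules": [{
            "host": host,
            "http": {"paths": [{
                "path": "/",
                "pathType": "Prefix",
                "backend": {"service": {"name": name, "port": {"number": 80}}},
            }]},
        }]},
    }
    return [deployment, service, ingress]


if __name__ == "__main__":
    # Placeholder values; wrap everything in a v1 List so it can be applied as one file.
    items = web_app_manifests("shop", "registry.example.com/shop:1.0", "shop.example.com")
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

The Deployment says what to run and how many copies, the Service gives those copies one stable address, and the Ingress maps a hostname to the Service. That is roughly the same amount to learn as a task definition, a service and a load balancer listener on the ECS side.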

If you're already using ECS, and are happy with it, then there may be no strong reason to switch to k8s. But for anyone deciding which to use for the first time, I'd strongly advise against ECS, for several reasons.

One is that k8s has become an industry standard, and you can deploy k8s applications on many different clouds as well as non-cloud environments. As such, learning k8s is a much more transferable skill. K8s is an open source project run by the Cloud Native Computing Foundation, with many major and minor contributors. You can easily install a k8s distribution on your own machine using one of the small "edge" distributions like k3s or microk8s.

While in theory, some of the above is true for ECS, in practice it just doesn't have anything like the momentum of k8s, and afaict there aren't many people deploying ECS on other clouds or onprem.

Because of these kinds of differences, all in all I don't think there's much of a contest here. It's not so much that ECS is bad, but rather that k8s is technically excellent, and an industry standard backed by many companies, with significant momentum and an enormous ecosystem.


A compelling reason is the large ecosystem of tooling that runs on k8s. Practically anything you want to do has a well maintained open source project ready to go.


For example? What could you do on k8s that you couldn't do on native aws?


Take a look at Kubeflow.org for an example. There are several reasons that a tool like that targets Kubernetes and not native AWS. One of the benefits of k8s is how portable and non-vendor-specific it is. Basically, it's become a standard platform that you can target complex applications to, without becoming tied to particular vendors, and with the ability to easily deploy in many different environments.


To be clear, I'm not claiming you can't do these things on native AWS, but rather that there is a wide choice of high-quality projects ready to go that target k8s.

  - Countless Helm charts
  - Development tools like Telepresence
  - Many GUIs / terminal UIs
  - CI/CD tools like Argo
  - Logging and monitoring tools
  - Chaos engineering tools
  - Security and compliance tools
  - Service meshes / request tracing
  - Operators for self-healing services
  - Resource provisioning
  - etc...


There is also a new generation of platforms emerging that run on top of Kubernetes, like Qovery.


> Can small companies afford the complexities of k8s?

Where exactly do you see this complexity in Kubernetes?

I have a couple of Hetzner nodes running microk8s and I have a couple of web apps running in them. All it takes to deploy each app is putting together the kustomize script for the app and then a simple call to kubectl apply -k ${kustomize_dir}. I'm talking about specifying an ingress, deployments, services,... The basics. I even threw in a couple of secrets to pull Docker images from private container registries.
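Roughly what that looks like, as a sketch: a kustomization.yaml that lists the manifests for one app, plus the apply command. The directory, namespace and file names (deployment.yaml and so on) are placeholders, and the helper only writes the file and builds the command, it doesn't run kubectl for you.

```python
from pathlib import Path
from typing import List


def write_kustomization(app_dir: Path, resources: List[str], namespace: str) -> Path:
    """Write a minimal kustomization.yaml listing the app's manifest files."""
    lines = [
        "apiVersion: kustomize.config.k8s.io/v1beta1",
        "kind: Kustomization",
        f"namespace: {namespace}",
        "resources:",
    ]
    lines += [f"  - {resource}" for resource in resources]
    path = app_dir / "kustomization.yaml"
    path.write_text("\n".join(lines) + "\n")
    return path


def apply_command(app_dir: Path) -> List[str]:
    """The one call that deploys everything listed in the kustomization."""
    return ["kubectl", "apply", "-k", str(app_dir)]


if __name__ == "__main__":
    app_dir = Path("deploy/blog")  # hypothetical layout
    app_dir.mkdir(parents=True, exist_ok=True)
    write_kustomization(
        app_dir, ["deployment.yaml", "service.yaml", "ingress.yaml"], "blog"
    )
    print(" ".join(apply_command(app_dir)))
```

The manifests themselves still have to exist next to the kustomization file; the point is only that the per-app deployment step is one directory and one command.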

And everything just runs. With blue-green deployments, deployment history, monitoring, etc.

It's far more complicated to setup a CICD pipeline, and don't get me started on the god-awful mess that is CloudFormation or even CDK.

Where exactly did you see all that complexity you're talking about?


A lot of people in this job just really hate learning anything, and would rather spend way more time spread out over months and years than just investing some time and learning how to use something new.

It seems like somehow some people get into this job by only learning tools that they can pick up without trouble over a weekend?

The concepts you need to deploy stuff on kubernetes really aren't that complicated. It's just a bunch of yaml documents in extremely-well-documented schemas. If you want to run a service with N instances, you just write a deployment with `replicas: N`.

There are a lot of details I'd choose slightly differently if I were designing my perfect ideal cluster orchestration system, but the whole point of open source is that everyone who wants to build a comprehensive cluster orchestration system can just get together and collectively build it once, so I don't have to design and own it all internally. It's got all the pieces to build exactly what I need to make good use of a ton of computers, in a simple, reliable, repeatable, consistent, standard way. It gives you trivial primitives to build HA fault-tolerant deployments.

There are very few good excuses left to ever have any reason to page anyone over "One server had a hardware failure".

It just baffles me that people can see this powerful, industrial-grade, comprehensive tool, and decide "Nah, that'll never be worth starting to learn".


I half agree with you here. It is actually quite simple, but everything in Kubernetes is very explicit, which is good, but also intimidating. If you've never worked with Kubernetes before then it's a lot of added complexity without clear benefits.


> If you've never worked with Kubernetes before then it's a lot of added complexity without clear benefits.

Unless you're someone who only had to work on a monolith deployed to a single box somewhere, Kubernetes adds zero complexity to the problem you're already dealing with.

In fact, Kubernetes simplifies the whole problem of running stuff on a cluster. Network, security, deployment, observability... That's all provided out of the box. And you can rollback whole deployments with a single command.

Heck, even ssh-ing into a container, regardless of where it's running, became trivial.
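For reference, the two commands I have in mind, wrapped in small helpers so they are easy to script. The deployment, pod and namespace names are placeholders. Strictly speaking the second one is kubectl exec rather than ssh, but it gets you a shell inside the container whichever node it happens to run on.

```python
import shlex
from typing import List


def rollback_command(deployment: str, namespace: str = "default") -> List[str]:
    """Roll a Deployment back to its previous revision."""
    return ["kubectl", "rollout", "undo", f"deployment/{deployment}", "-n", namespace]


def shell_command(pod: str, namespace: str = "default", shell: str = "sh") -> List[str]:
    """Open an interactive shell inside a running pod's container."""
    return ["kubectl", "exec", "-it", pod, "-n", namespace, "--", shell]


if __name__ == "__main__":
    # Hypothetical names; pass the lists to subprocess.run to actually execute them.
    print(shlex.join(rollback_command("web", "prod")))
    print(shlex.join(shell_command("web-5d9c7b6f4-abcde", "prod")))
```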

How is that harder than deploying stuff to boxes somewhere?


>Unless you're someone who only had to work on a monolith deployed to a single box somewhere,

I think the point is that 90%+ of websites are fine with a few monoliths behind a load balancer. That set up can handle low thousands of requests per second in Rails/Django/etc. Maybe low 10 thousands with a more performant language.

And it's not just k8s. It's the whole microservice/SOA that comes with it. It ramps up the complexity of everything and is a constant time sink in my experience.


You apparently learned all about microk8s and spent time configuring it. I assume you did not get it right at the first time.

It is like Usain Bolt coming over and asking what is so hard for you in running 100m in ~10 seconds when you never left the couch.


> You apparently learned all about microk8s and spent time configuring it. I assume you did not get it right at the first time.

What? With Ubuntu, microk8s works pretty much right out of the box.

The only thing you need to learn is how to install it with Snap.

What are you talking about?

> I assume you did not get it right at the first time.

I did, not because I'm a rocket surgeon but because it is really really that simple.

https://ubuntu.com/tutorials/install-a-local-kubernetes-with...

What exactly leads people like you to complain harshly about how hard something is when you've never even tried it?

You're literally wasting far more time complaining in a random online forum about how hard a technology is than what it takes to not only learn the basics but also get it up and running.


You also had a pentest of your setup so you are perfectly sure you don't expose something to the internet that you are not supposed to?

You also considered updates for Ubuntu and microk8s so you have strategy for updating your nodes with newer versions and security patches.

I can follow a tutorial to set something up - but then there is always whole world of things that is never included in tutorials.

Just like kubelet accepting unauthenticated requests by default: https://medium.com/handy-tech/analysis-of-a-kubernetes-hack-...


> You also had a pentest of your setup so you are perfectly sure you don't expose something to the internet that you are not supposed to?

What are you talking about?

With Kubernetes you need to explicitly expose something. In code. And apply that change. And see it explicitly listed in the descriptions.

Outside of Kubernetes, you're already talking about a requirement that applies to all web services, regardless of deployment solution. Why didn't you mention that?

> You also considered updates for Ubuntu and microk8s so you have strategy for updating your nodes with newer versions and security patches.

What point were you trying to make?

Do you believe Kubernetes is the only software that requires updating?

Even so, with Kubernetes you can launch a freshly created instance in your cloud provider of choice, add it to the cluster, and go on with your day.

If you'd want, you can drain a node, shut it down, and rebuild the node from scratch.

Where exactly do you see a challenge?


If you're concerned that you've somehow accidentally exposed something to the internet that you didn't explicitly intend to expose to the internet, you can just do a trivial port scan. You just run nmap, look at the output, and you're done in like 30 seconds.

What does this have to do with kubernetes? "Don't expose stuff to the internet that you don't intend for everyone across the planet to be able to access" applies exactly the same to literally everything you could run on your servers.

This isn't remotely "Kubernetes is uniquely scary and complicated"; this is basic fundamental network security, and if you're not already handling this, then you need to go brush up on your basic networking fundamentals, not blame it somehow on kubernetes.

Almost every network service I can think of defaults to accepting unauthenticated connections, or connections authenticated with some default credentials. This is the normal, expected, default situation with network services. If you make the decision to expose something to the entire world, in a professional context, it is your responsibility to know the specific reasons it is safe to do so.

Are you really trying to argue that "Some rando decided to bareback the entire global internet with no firewall, on a personal home server, and didn't bother to type 'kubernetes secure configuration' into google, therefore Kubernetes is super hard and complicated and dangerous"?

It's not like this is some obscure cryptic detail; it's explicitly called out in the documentation that any half-decent professional would read before deploying a production service: https://kubernetes.io/docs/tasks/administer-cluster/securing...

  Controlling access to the Kubelet
  Kubelets expose HTTPS endpoints which grant powerful control over
  the node and containers. By default Kubelets allow unauthenticated
  access to this API.
  Production clusters should enable Kubelet authentication and authorization.
  Consult the Kubelet authentication/authorization reference for more information.

Yes, untrained amateurs sometimes do dumb stuff. Sometimes companies leave their S3 buckets open to the world. Sometimes people expose mysql to the internet with credentials they ship to users. Sometimes people expose unauthenticated Redis to the internet. This does not mean that these technologies are somehow fundamentally too complicated for mere mortals, it just means that it's dangerous to ask amateurs to do something in a professional context.


Try setting it up on your own without Ubuntu doing the legwork. Set up a 3 node control plane, the deployment servers and storage.

You come off as very arrogant who believes he knows everything, some humility would suit you well, but I think all pseudo smart Germans are like that.


> Try setting it up on your own without Ubuntu doing the legwork.

Why? Do you also see any purpose in hopping on one foot to work instead of driving there?

I don't understand what leads people like you to try to move goalposts to pretend something is harder than it is or needs to be.


I've set up quite a few kubernetes clusters on my own, and relied on the clusters I've built for production services at both startups and big tech companies. I've done quite a bit with both local storage and network storage via Ceph.

I am not German, and I have never been to Germany. If we're trading wild speculation about personal details, I think you could use some ambition and self-confidence.


It is not just untrained amateurs; it is also people who follow a tutorial and think they know everything.

So my post is not about Kubernetes per se, but about the narrative that "it is super easy, a 6-year-old could do it". Well, no: not everyone can do it, and one has to spend time with any new technology.

Besides, nmap does not help in that scenario either, because I have to expose port 443 to serve my customers and Kubelets expose HTTPS endpoints. If someone runs a simple nmap scan, sees 443 open, and concludes all is correct because he will be serving HTTPS websites, then your "you are done in like 30 seconds" seems like shooting oneself in the foot.


Hmm, interesting, I may have been misreading you.

I agree that 6-year-olds and other people without any production sysadmin or SRE experience are going to have a pretty bad time learning to build and deploy a Kubernetes cluster.

My point is that any professional sysadmin or SRE can learn Kubernetes just fine. Yeah, there's a lot of stuff, but there's just about as many moving parts as I expect for a system that handles what Kubernetes does. You also mostly don't have to pay complexity cost for many optional features you don't care about; you can get a minimal cluster up, and then grow it as you need more features.

I don't follow what you're saying about port 443. The kubelet API is not listening on port 443 by default. I'm as confident as I can be without checking that no kubernetes components listen on port 443 by default.

Speaking more broadly, I agree that someone with no SRE experience and no network security experience won't get much value from 30 seconds of nmap. What I was trying to say is that "accidentally exposed the kubelet API to the global internet" is something that I expect a competent sysadmin to be able to detect and notice with 30 seconds of nmap.
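To make that check concrete: the kubelet API listens on port 10250 by default, not 443, so that is one of the ports to look for from outside your network. Below is a minimal TCP connect check, a stand-in for the nmap run rather than a replacement for it. The host you pass in is up to you, and the port list is only the defaults as I know them, so adjust it if your distribution changes them.

```python
import socket
from typing import Dict, Iterable

# Default ports for a few Kubernetes components (distributions can override these).
KUBERNETES_PORTS = {
    6443: "kube-apiserver",
    10250: "kubelet API",
    10255: "kubelet read-only port (deprecated, often disabled)",
}


def open_ports(host: str, ports: Iterable[int], timeout: float = 1.0) -> Dict[int, bool]:
    """Try a plain TCP connect to each port and report which ones accept it."""
    results = {}
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising.
            results[port] = sock.connect_ex((host, port)) == 0
    return results


def report(host: str) -> None:
    """Print which of the well-known Kubernetes ports answer on `host`."""
    for port, is_open in open_ports(host, KUBERNETES_PORTS).items():
        state = "OPEN" if is_open else "closed/filtered"
        print(f"{host}:{port} ({KUBERNETES_PORTS[port]}): {state}")
```

Run report() against your node's public address from a machine outside your network. An open 10250 reachable from the internet is the thing to go fix, whatever port 443 is doing.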

When I'm saying "deploying kubernetes is fine", I'm saying that anyone who has any business running nontrivial production services in a professional setting will not have any trouble learning to use and deploy Kubernetes. Deploying a cluster does require competence with sysadmin or SRE fundamentals, but not particularly more so than other systems that handle similarly-complex topics.

Also, any junior sysadmin or programmer should be able to learn to use an already-running kubernetes cluster to deploy basic services with no trouble and just a bit of time. I have trained quite a few people on this, and it really does go just fine.


It's just like any other tech; you read the docs, try it out, do some troubleshooting, and then you know how to use the tool.

I bet you could get microk8s running correctly on your first try. Give it a shot! Here's a doc: https://microk8s.io/docs/getting-started

You could probably get some additional nodes in your cluster on your first try too: https://microk8s.io/docs/clustering

This isn't Usain Bolt. This is normal people doing normal work with normal technical tools, and then somehow people keep claiming they must be some kind of world-class genius to have done it. Try it for yourself before you claim that you'd have to be a world-class peak performer to have installed and configured some simple daemons on a few linux servers.


Of course Usain Bolt is an exaggeration.

But you try it out, set stuff up from a tutorial, do some troubleshooting, put production data there, and you get articles like these:

https://medium.com/handy-tech/analysis-of-a-kubernetes-hack-...

https://www.zdnet.com/article/a-hacker-has-wiped-defaced-mor...

https://thenewstack.io/armo-misconfiguration-is-number-1-kub...


I'll try to take a different approach here than I did in my other recent reply to you.

I hear that you're saying that there are possible configurations that are insecure, and that at least some care and attention needs to be invested to avoid problems. I agree. This isn't specific to Kubernetes, as you show in your second link. This can be a problem, and people have in fact suffered harm due to leaving their doors unlocked.

On the other hand, most important security doors in most professional environments are not left unlocked and unmonitored.

If you have already learned basic sysadmin fundamentals, then you have the skills needed to learn to deploy and use Kubernetes just as securely as any other network service. The way that you learn to apply your general sysadmin skills to Kubernetes is by practicing with it. You can supplement your practice with books and training if you really want to, but it's not necessary. If you happen to have other people who have already gone through this process as support, that can help quite a bit. If there are any other better ways that people learn things like this, I have yet to hear of them.

If you don't already have those skills, then the way to build and develop them is exactly the same process. You try stuff out, read some docs, poke at things to see if you can break them. You can supplement this with classes if you like, but they're not necessary. Peers and mentors are great if you have them, but they're not necessary.

What alternative do you have in mind? What about kubernetes specifically is so monstrously complex? People keep asserting this, but I learned it just fine like any other nontrivial software I've ever worked with professionally. My peers at work learned it just fine like any other software we've worked with. My friends and colleagues I keep in touch with from previous jobs have learned it fine.

I don't really understand what kind of complexity bar you're trying to imply is just objectively too high to be reasonable? Yeah, it's got more moving parts than like Redis, because it does way more than Redis does. Sed is simpler than python, but that doesn't mean that you need to be Usain Bolt to learn python.


The complexity of Kubernetes is overstated, so I think they can. However, it doesn't mean they should. Personally I would just start with a docker-compose.yml and run it on a VPS. In the starting phase you probably don't have much traffic anyway and docker-compose can nicely progress to Kubernetes when you need it. Another upside is that any developer can run the entire stack locally on their machine, which is a huge upside as well. This means you can fully focus on producing working code and don't have to bother about infrastructure too much.

Then again, I've never worked in a startup so the above approach is purely theoretical. Curious about what other people think.


I agree that "the complexity of Kubernetes is overstated". Kubernetes itself is actually pretty simple, very reliable, and more mature than it seems. The complexity and challenges of Kubernetes come from all of the add-ons that may not be necessary in most situations.

Vanilla K8s is pretty good. But when you think about admission controllers, policy engines, service meshes, progressive rollout, etc, you are increasing the scope.

Start with k8s, and resist the temptation to solve 10 other problems with 10 other projects from the CNCF sandbox. Once you have a good system running, really weigh the complexity of each new solution against the value it provides, and make a decision. Say no to most.


Can small delivery companies afford the complexities of flat-bed trucks? Well: are they driving them? Repairing them? Assembling them? Loading/unloading?

You can bet that using a passenger car instead can be more "affordable", because more people know how to drive them, repair them, assemble, load/unload, they're cheaper, etc. However, if what they're delivering can't be hauled by a passenger car, or the logistics wouldn't make financial sense, that again changes the equation. There's no one answer.


So is everything not K8s a passenger car in this analogy?

Solution space is vast. We're all trying and weighing alternatives as we have the time and capacity.


Yes, while I worked at a bigger company, our team of 15 developers and 5 SREs managed to build a few self-hosted Kubernetes clusters for our microservices, a pipeline, and also set up cloud deployment for failover. We used Datadog, Consul, Elastic Stack and a few SQL and NoSQL DBs.

I mostly self learned and was helped a bit by guys who started before me. The rest self learned, mostly by doing, solving issues, reading articles and tutorials.


I think the upsides to k8s are relatively minor (if any) for most businesses, and the downsides are significant (tech debt and complexity). It's sometimes sold as revolutionary tech, but it's really just an incremental improvement, and the next incremental improvement will come along soon enough.

DevOps is a never ending yak shave.


> if your people aren't very skilled, they won't build anything well

If your people aren't very skilled, making them manage a ton of unnecessary complexity is not a good idea.

k8s is not a saw. It's a hovercraft. Expensive to maintain and very few people need one.


> a software engineer knows as much about cloud architecture as a fine woodworker knows about framing

A mobile or frontend engineer, perhaps. But the idea that implementing and architecting the same type of object (backend services) are mutually exclusive skillsets strikes me as ridiculous. Since architecture decisions are by definition highly consequential and difficult to back out of, you want your most skilled and experienced software engineers making them, but it is precisely their software engineering experience that makes them qualified to do architecture. I would never trust a so-called "cloud architect" who is not also a first-rate backend engineer.


The thing is, Architect is its own role and discipline, and Cloud Architecture is distinct from Software Architecture.

For example, you have to know why using PrivateLink is a bad idea, what kinds of backups to what locations/accounts you can do with an encrypted RDS database with a CMK, the 50 different ways you can screw up access to an S3 object, the limits of how many IPs EKS can use per node, why somebody should use the M5dn instance type, when rolling+maintaining your own complicated expensive service might be way better than using a cloud-managed version, what services you cannot use in what regions and zones. You have to be a budget wizard, keep up on the latest solutions, and know how all this dovetails with cloud security.

A Software Architect should not have all of that Cloud trivia rattling around in their brain, but it's essential for a great Cloud Architect. Conversely, a Cloud Architect doesn't need to be a great coder to design great Cloud architecture. (Similar to how a construction architect doesn't need to be a great framer to know how the code affects framing, and how a great framer doesn't need to know how to calculate bending moments and shear forces in laminated beams to plan what day they'll need a lift for the beam and how to attach it)


This is so spot on. Developers and “software engineers” seem to have a pathological habit of wanting to bring in gee-whiz technologies that they only have a cursory knowledge of and actually do very little to make the products and projects better.


TL;DR:

- if you want to build something yourself -> you need experienced ppl to do that

- if you don't have the right ppl -> you better pick a SaaS or PaaS, but your business will be at the mercy of your provider

- there is a huge shortage of experienced skilled ppl


Kubernetes is not a saw. It's a crane.


If you don't use k8s and just run bespoke containers you still have to figure out how those containers find and talk to each other. Maybe you run some custom DNS setup, maybe you run a purpose built service discovery thing like consul, etc. And you have to figure out how you'll do networking to support public/internet-facing workloads vs. private internal services (and how each can talk to the other).

But... if you just use k8s you get things like basic service discovery, networking, ingress, etc. with it and don't have to figure out bespoke solutions (that you'll just chuck anyways once you move to k8s).

I do agree, though, that I would be very hesitant to immediately dive into running stateful workloads like databases, etc. on my own k8s cluster. The cloud-hosted database services that every provider has are such a significant time and complexity saver, especially if your databases are just getting started and small-ish.


Surely you recognize that we had services that could talk to each other before k8s existed? And therefore all of the things you point to are solved problems and don't require k8s in any way, shape, or form?

Like, if I'm configuring the foo service and it needs to know how to contact the bar service... that's not a problem? I just put bar-service-lb.mycompany.net in a config file and I'm done. I don't understand why people think they need complex "service discovery". Just give things names!

My cloud provider (or heaven forbid, wires in a data center! gasp!) already gives me perfectly good networking. Why layer anything on top of that?

No one really needs "ingress". Your server program can listen on a port! Like we've been doing for 50 years! This is a completely self-inflicted problem.

It's so depressing seeing the k8s generation completely forget the highly effective ways that we used to do things. All those things are still possible! You can choose to design, build, and run a simple system if you want to!


K8s doesn't reinvent anything. It uses DNS for service discovery, just like you describe. It happily opens ports for ingress if you'd like (or it talks to your cloud provider load balancer to do the same). K8s is literally just a few go-based API endpoints that orchestrate all kinds of existing things like DNS, container scheduling, etc.

What K8s does is provide a simple and idiomatic rest API on top of all this architecture. No longer do you have to read a ton of disparate man pages, figure out what the sysadmin 10 years ago did for DNS, etc.. you just use kubectl or the API. It's the same interface to spin up more replicas of a frontend service as it is to open a port for ingress or even collect and review logs.


> Like, if I'm configuring the foo service and it needs to know how to contact the bar service... that's not a problem? I just put bar-service-lb.mycompany.net in a config file and I'm done. I don't understand why people think they need complex "service discovery". Just give things names!

What about when you're given a few servers for your dev/test environment, but don't have control over the DNS server and adding records to it has an annoying amount of red tape?

Would this be a good option:

  - FOO_URL=http://300.0.0.1:3000
  - BAR_URL=http://300.0.0.1:8080
  - BAZ_URL=http://300.0.0.2:80
  - X_URL=http://300.0.0.2:5601
  - Y_URL=http://300.0.0.3:5602

Or would this be better:

  - FOO_URL=http://foo.dev.svc.cluster.local
  - BAR_URL=http://bar.dev.svc.cluster.local
  - BAZ_URL=http://baz.dev.svc.cluster.local
  - X_URL=http://x.dev.svc.cluster.local
  - Y_URL=http://y.dev.svc.cluster.local

Sure, you can achieve the same with just modifying a hosts file, but what about when you need to propagate some changes to every computer and other server that needs to talk to these apps, while you are still not in control of the DNS server?

I guess you can just run your own DNS server at that point as well, but that might raise some eyebrows in some places, vs just using whatever the orchestrator provides you with out of the box (and Kubernetes also has built in proxy functionality so people can connect to the cluster pretty easily and forward whichever services they want, which can take a bit of work otherwise, though i guess you'd get used to the proper order of arguments in the SSH tunnel commands too).

> No one really needs "ingress". Your server program can listen on a port! Like we've been doing for 50 years! This is a completely self-inflicted problem.

What if you want all of your services to have SSL/TLS certificates provisioned for them? What if you want to ensure that all of the inter-service communication is also encrypted? What about you wanting to add circuit breaking or rate limiting logic in there? Or maybe doing a MitM on yourself so you can debug the network traffic between the services better? Also, do you really want to configure the firewall rules for each of your internal services to be able to talk to others, or wouldn't it also be good to have it piggyback its traffic on some other overlay networking port that's open on all of them for cluster communications?

Oh, also to avoid weirdness with CORS or other technologies, it might be nice to route requests based on domains and paths, instead of ports (e.g. my-app.com/analytics might be better than my-app.com:9000 in some contexts), though of course that is easily achievable with any reverse proxy, just saying that ports aren't always the best option.
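
To make the path-vs-port point a bit more concrete, here is a minimal Python sketch of the longest-prefix matching that a reverse proxy or ingress does when it routes on paths. The route table and backend addresses are made up for illustration, and a real proxy does far more (TLS termination, retries, header rewriting), so treat this as a toy rather than how any particular ingress controller works.

```python
# Hypothetical route table: path prefix -> internal backend (illustrative values only).
ROUTES = {
    "/": "http://127.0.0.1:3000",
    "/analytics": "http://127.0.0.1:9000",
    "/api": "http://127.0.0.1:8080",
}


def pick_backend(path: str, routes: dict[str, str] = ROUTES) -> str:
    """Return the backend whose prefix is the longest match for the request path."""
    matches = [
        prefix
        for prefix in routes
        # "/" is the catch-all; other prefixes must match on a path-segment boundary.
        if prefix == "/" or path == prefix or path.startswith(prefix + "/")
    ]
    if not matches:
        raise LookupError(f"no route for {path!r}")
    return routes[max(matches, key=len)]
```

With this table, a request for /analytics/report goes to the backend on port 9000 without the client ever seeing the port, while /analyticsfoo falls through to the catch-all because it does not match on a segment boundary.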

Personally, I think that there's a lot of benefit to using an ingress and a service mesh of some sort, as long as those don't add too much undue complexity. I mean, using our own Nginx/Apache/Caddy/Traefik instance instead of the Ingress abstraction was also perfectly passable, but there are certainly benefits to exploring new technologies and what they offer.

In my experience, all of these container technologies providing you with their own DNS implementation out of the box is very useful, especially because recognizing that "300.0.0.1:3000" is the wrong URL is harder than looking at "foo.dev.svc.cluster.local" and seeing that your service probably doesn't intend to connect to foo.

> It's so depressing seeing the k8s generation completely forget the highly effective ways that we used to do things. All those things are still possible! You can choose to design, build, and run a simple system if you want to!

I do agree with this, though! There should be more resources out there to show off how to do basic and useful networking without getting too deeply into some of the fancier technologies.


Making something work with an IPv4 octet of 300 (decimal) is a trick I'd like to see... (and I'm not talking about https://ma.ttias.be/theres-more-than-one-way-to-write-an-ip-... )


That's intentional, just so i don't give someone's actual IP address in my silly example. Or, you know, make people misunderstand the example by providing a local address when I intend to give an example where a set of remote boxes are being discussed. :)

Might as well have dug up https://en.wikipedia.org/wiki/Reserved_IP_addresses#IPv4 but the arbitrary nature of it all hurts my head.
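
For what it's worth, both points are easy to check with Python's standard `ipaddress` module. This is only a sketch: the helper names are mine, and the three networks listed are the blocks that RFC 5737 sets aside for documentation and examples.

```python
import ipaddress

# RFC 5737 reserves these three blocks for documentation and examples.
DOCUMENTATION_NETS = [
    ipaddress.ip_network("192.0.2.0/24"),     # TEST-NET-1
    ipaddress.ip_network("198.51.100.0/24"),  # TEST-NET-2
    ipaddress.ip_network("203.0.113.0/24"),   # TEST-NET-3
]


def is_valid_ipv4(text: str) -> bool:
    """True if text parses as a dotted-quad IPv4 address (each octet 0-255)."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:
        return False


def is_documentation_address(text: str) -> bool:
    """True if the address falls inside one of the RFC 5737 example blocks."""
    addr = ipaddress.IPv4Address(text)
    return any(addr in net for net in DOCUMENTATION_NETS)
```

So `is_valid_ipv4("300.0.0.1")` is rejected, while something like 203.0.113.10 is a valid address that is still safe to paste into an example.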


Running your own DNS isn't as hard as you think


I'm not saying that it's too hard at all (even though servers like BIND could probably use some UX improvements), just that not everyone would approve of you running your own server for any given project. And sometimes adding/changing records on the org-wide server must be "thrown over the wall", which will be slow.

Furthermore, if you have 5 different projects that you need to work on and each of those would have their own separate DNS servers, you might find yourself in a slightly uncomfortable situation.

Most container orchestration solutions don't have that problem, because they embed their own server and provide the necessary proxying solutions so that you may connect to the cluster from outside as well.


I'm running a stateful db... it's like $5 to run myself vs $120/mo to have it managed. big difference. haven't had to touch it in years now but im also afraid to try upgrading it.


If you're afraid to upgrade yourself, you may eventually contract out for help. You'd need that contract to cost less than $1380 for each year of upgrades. Otherwise, you should have picked the managed solution.

Doing it yourself is probably only cheaper if you never need maintenance or upgrades, security patching isn't needed, or your time is cheap.
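
The arithmetic behind that $1380, as a small Python sketch. The hourly rate in the usage note is an assumption picked purely for illustration; plug in your own.

```python
def yearly_premium(self_hosted_monthly: float, managed_monthly: float) -> float:
    """Extra cost per year of the managed option over self-hosting."""
    return (managed_monthly - self_hosted_monthly) * 12


def break_even_hours(
    self_hosted_monthly: float, managed_monthly: float, hourly_rate: float
) -> float:
    """Hours of your own (or a contractor's) time per year that the premium would buy."""
    return yearly_premium(self_hosted_monthly, managed_monthly) / hourly_rate
```

`yearly_premium(5, 120)` gives 1380. At an assumed $100/hour, that is under 14 hours of maintenance per year before the managed plan becomes the cheaper choice.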


I would argue that if your spend is $5 a month your data is probably not very valuable to yourself or your customers. Which is totally OK for some projects, but $120 is not very much for that peace of mind and time-saving if your data is important to your company.


It depends more on the volume, the frequency of changes and the complexity of implementation. You can have a robust solution for a few bucks infra cost, if those parameters are small.


OK, but how much time are you spending on managing these yourself? If it's non-zero then the cost between $5 and $120 for a company is trivial. For a personal side project, sure, but if we are talking about paying customers' data then even things like automated backup and recovery need to be taken into account.


For backups I agree, they obviously should be automated, but recovery? Not in my experience, not at the scale we're talking about here. You want something that is simple and semi-automated.

Backup recovery in a small, simple, robust system is very rare. Automated recovery is complex and there are quite a bunch of gotchas and perhaps project specific things to take into account.

But automated backups are typically straightforward. You need to know when and how to do them and where to put them.

I think there are tons of reasons to pay for a managed database if you need it. But it is not a baseline requirement for many projects.


Backup is automated on a nightly cron. Restoring the entire DB is easy, I have a script to do it. But ya if something blows up in prod or partially blows up, ... usually the situation is you wrote some shoddy script and blew up half the database so then you have to import it somewhere else and untangle that mess. So you're quite right, can't easily be automated.


It's required very little maintenance so far, and it may even be a positive ROI because I'm using the same setup for development. Since it's part of the same Kubernetes config, it's easy to spin up the DB under Docker Desktop for home development. I don't know how I'd do that with a managed solution.


Well not exactly $5, it's in the same cluster, so.. hard to put a price on it. Plus a couple bucks for backups on B2. Oh plus the persisted volume.. I think I've got 25 GiB at 10c/GiB so $2.50 for 'hot' storage and maybe another $2.50 for 'cold' storage. And then it's running on like a $20 or maybe $40 node which I share with some other stuff. Anyway, not much!

For managed, I don't think I'd want to go lower than the $50 plan which would be roughly equivalent to what I have now.. would include storage but not backups I think.

But ya the data is valuable.. I'm running nightly backups. I want to do continuous backups someday or have some way to not lose up to 24 hours of data if something happens.


Discovery can be quite simple, we've been doing it for years. On ECS/AWS we do it with just ALB and the cluster. Turns out DNS works really well, surprising for a fundamental internet technology. Tho if it's broken it's likely DNS that is the culprit.

If I need to route to other things I just configure other dns entries with SSM Parameter store... or I just manipulate the local resolvers in the VPC.

Very straightforward.

But I'm generally running just my software, not other people's, and consuming aws datastores.

This is for work.


I am doing the same at my current company. We have a small team of 10 engineers, and DNS for discovery on AWS has been enough for us for the last 5 years. There hasn’t been a single issue for us in that time.


Does k8s not just use dns? That's what Hashicorp Nomad uses - DNS via Consul.


It does, every pod, service, etc. in the cluster has a fully qualified DNS name resolved by a DNS server in the cluster.
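
As a rough illustration of what those names look like for a Service, here is a tiny Python sketch. It assumes the usual `<service>.<namespace>.svc.<cluster-domain>` layout and the common default domain `cluster.local`, which is configurable per cluster; the function names are made up.

```python
def service_fqdn(
    service: str, namespace: str = "default", cluster_domain: str = "cluster.local"
) -> str:
    """Build the in-cluster DNS name for a Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def service_url(service: str, namespace: str = "default", port: int = 80) -> str:
    """Build a plain-HTTP URL for a Service (port is omitted when it is 80)."""
    host = service_fqdn(service, namespace)
    return f"http://{host}" if port == 80 else f"http://{host}:{port}"
```

For example, `service_fqdn("foo", "dev")` yields foo.dev.svc.cluster.local, which any pod in the cluster can resolve through the cluster's DNS server.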


My point is that people come up with very powerful and complex routing systems using kube and conflate them with "discovery". We did this at my work. Consul was the "only solution" that would "work" for us (pre kube). Consul is a type of discovery, but people also lump in service mesh features which are well beyond "what IP address do I need to send packets at".

4 years later consul is gone in aws for us replaced by this simple dns/alb scheme. We haven't really needed all the features of advanced discovery.

The key in general to "discovery" seems to be load balancers and DNS in our approach (service protect thyself vs well-behaved clients). Easy-to-use LBs on AWS are a very powerful feature. We don't have that on-prem, and it changes how we approach things. There, discovery is harder.


Yep, K8s has managed DNS, much like Nomad allows you to have.

But if that's a feature that you want, you can even use Docker Swarm which I mentioned in another comment here, which basically gives you the same thing, so that you can reference other containers by their names.

Of course, it's a bit simpler than Nomad, so if you're already using Nomad then there's no point in moving away from it, though this still illustrates that there are plenty of simpler options than K8s!


If you have a complicated microservices topology, then yes you're going to need heavyweight infrastructure tools to manage the cross-service traffic, isolate and bin-pack heterogeneous workloads, etc.

You probably don't need to have a complicated microservices topology though. A load balancer->stateless app tier->managed RDBMS architecture should be more than enough until you're at a point where you can afford to scale a serious infrastructure team.


We should choose our technology stack like a hermit crab chooses its shell. The shell shouldn't be too heavy to move around in, but it should leave a little room to expand into.

When it's time, we work to find a new shell.


This is a delightful (and wise, IMO) way to think about right-sizing dependencies.


I love it.


My 2c:

Start with basic k8s.

Just getting a couple of apps deployed on a cluster with logging, scaling, pipelines takes ~ 2 weeks.

Trying to rebuild that basic functionality using ad hoc solutions takes way more time and rapidly becomes more cumbersome as the wheel gets reinvented when k8s as a platform has well documented solutions/tools.

Putting it into production leads to a handful of footguns that take a while to sort out related to pull-policies, caching, scaling, security, etc. But it's fairly manageable. And probably easier to preempt than customized solutions since these pitfalls are somewhat well documented.

Past a certain point though, especially as the work veers into things like operators, sketchy helm packages, service meshes, k8s falls apart fast if you don't have people on it full time, and it's much better to write some customized code.


100% I’ve seen this happen in my current organization as they chose ECS, and then reinvented half of kubernetes in house in a fragile way.

Basic kubernetes where you are just deploying a web app is incredibly simple. There is no more complexity than any other container deployment tool, but Kube can grow with you like few other tools can.


> Trying to rebuild that basic functionality using ad hoc solutions takes way more time and rapidly becomes more cumbersome as the wheel gets reinvented when k8s as a platform has well documented solutions/tools.

That is just not true. 1 day if you know what you're doing. If you use a well performing language you won't even need scaling for a very long while, unless you expect like "Eurovision Song Contest" type of concurrent users.


  > On AWS, that would be Fargate on ECS, or on Google Cloud, Google Cloud Run.
  > You won't have to manage servers, network overlays, logging, or other necessary middleware.
I disagree with this take. EKS is a managed service just like Fargate and you have to learn how to manage both equally (VPCs, CIDR ranges, IAM rules, etc). You might as well start on kubernetes if you are going to switch to it eventually.

  > I'd suggest that teams adopting Kubernetes (even the managed versions) have an SRE team, or at minimum, a dedicated SRE engineer.
I'd love to hear what parts of running EKS require an SRE team and how Fargate/ECS solve that issue and make it self-serviceable.


If your developers already have a basic understanding of AWS, the APIs are more similar between ECS and other AWS services. K8s introduces an unrelated API nested inside other Amazon APIs. Imo ECS service setup is a simpler interface/API and the load balancer integration is very good.

K8s can go a lot of different ways depending on the type of LB and whether you opt for ingress and whether your ingress is cloud provided or runs in your cluster.

Same with logging. ECS you just configure a log group and get persistent logging and basic aggregated searching with Cloudwatch Insights. K8s you get ephemeral, unaggregated logs unless you introduce additional tools.

When I worked with ECS, it was originally set up by software engineers, and it was scripted using the AWS SDK and worked almost exactly the same as the rest of the stack that used things like S3.

Starting with k8s would require learning a new SDK/API versus some new endpoints in the one you're already working with.

On the other hand, as we grew, we'd hit weird issues like ECS nodes going dead without the backplane rescheduling containers on working hosts (I think they've improved ECS agent health checks since then, though)


But if we are comparing apples to apples, you would just use AWS provided stuff for EKS right?

  > K8s can go a lot of different ways depending on the type of LB 
k8s flexibility shouldn't be counted against it here. If you are considering k8s against fargate, you should only be considering ALB / NLB ingress and not the many more ways you could. Just use what AWS provides and be happy with it :)

  > ECS you just configure a log group and get persistent logging and basic aggregated searching with Cloudwatch Insights
You can log to cloud watch with EKS as well. Fluentd can log to cloudwatch with very little configuration.

I agree that if you are already "all AWS" and just want to put one more thing in there, Fargate might match your existing patterns better. But saying "Fargate is easier than managed kubernetes" is very wrong.

In general I've seen people have an easier time understanding kubernetes manifests for declaring their services instead of the equivalent terraform to get fargate up and running to do the job.


Fargate is a subset of ECS that uses fully managed VMs. We used our own ASGs with the AWS provided ECS optimized AMI.

Even adding fluent-bit is more software to manage. ECS uses the Cloudwatch Logs driver integrated directly in Docker with full support by AWS. Fluent-bit adds another layer of buffering, permissions, and resource usage you have to account for

Even using AWS provided stuff there's at least 4 ways to put nodes in your cluster (manually managed ASGs, cluster autoscaler managed ASGs, EKS managed node groups, Karpenter)


I've been at several companies. The ones where things went smoothest were using App Engine, Tsuru, and Heroku respectively. Zero problems.

The companies where we suffered the most were using Kubernetes: we had to fight a lot to get anything shipped, we were expected to understand a custom in-house jungle of YAML files and scripts, and half of the features we needed from the platform were half-assed.

That's just my experience.


Comparing App Engine and Kubernetes makes it feel like we took a step backwards. Rather than just deploying code and have everything managed, now I suddenly have to worry about networking and load balancers again.


I agree with the initial advice to use a simple Docker container runner hosted service. I initially ran my Django app as a bare Docker container behind an Nginx reverse proxy, with a “docker pull; docker stop <old>; docker run <new>” script as my deploy job, and that was fine for a year. Took all of an hour to build the plumbing there. A hosted service would have been just as good and probably even less time to wire up.
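
A deploy job of that pull/stop/run shape could look roughly like the Python sketch below. The image name, container name, and port mapping are placeholders, and the extra `docker rm` step is my assumption (so the container name can be reused), not something stated above.

```python
import subprocess


def deploy_commands(image: str, name: str, port_mapping: str) -> list[list[str]]:
    """Return the docker CLI invocations for a pull / stop / run style deploy."""
    return [
        ["docker", "pull", image],
        ["docker", "stop", name],
        ["docker", "rm", name],  # assumption: free the name before starting the new one
        ["docker", "run", "-d", "--name", name, "-p", port_mapping, image],
    ]


def deploy(image: str, name: str, port_mapping: str, dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them in order."""
    for cmd in deploy_commands(image, name, port_mapping):
        if dry_run:
            print(" ".join(cmd))
            continue
        # stop/rm are allowed to fail, e.g. on the very first deploy
        # when no old container exists yet.
        tolerate_failure = cmd[1] in ("stop", "rm")
        subprocess.run(cmd, check=not tolerate_failure)
```

Calling `deploy("registry.example.com/app:v2", "app", "8000:8000")` just prints the four commands. Note that there is a short gap between stop and run during which the reverse proxy has nothing to forward to, which is the main thing a hosted runner or an orchestrator smooths over.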

I disagree with the OOM requirements for operating a k8s cluster. I ran our infra on GKE from 3 engineers through 15 as the primary Infra engineer (while also pushing code and being CTO), and it was hours-per-month of labor, not a dedicated SRE. I trained up some of the other engineers and they were able to help with on call after a few hours of training. For a simple app (a few deployments and services) it is really not hard to work with.

All that said I agree you don’t _need_ it at 5-person scale. I would not recommend you learn it if you don’t already know how to use it. But if you do already know it, you can get good value from using it much earlier than the article recommends. (For example I found Review Apps to be very useful for fostering collaboration between frontend and backend engineers, and that feature is not too hard to wire up on top of k8s.)

If I had to give one-sentence pithy advice I’d probably agree with the OP title.


In theory what the article suggests looks sound and easy. I have to strongly disagree, though, after the unfortunate hands-on experience of trying Azure's managed container runtime, where everything about it was just plain misery: getting logs out, updating the running container, connecting to storage, all kinds of esoteric settings hidden under even stranger abstractions, lack of documentation and experience online, strange edge cases everywhere. You still have to understand all the complex cloud stuff like networks, ingress and storage accounts, etc.

We changed to managed kubernetes instead and even for a team with no prior experience it was much smoother. Fast declarative deployments, logs, attach, everything is just one instant kubectl command away. Documentation, blogs and resources are in excess so you will always find a way out. There are still some things that are difficult but I attribute most of that to the cloud generally. As someone else said, use it if you already know it and know what you are getting into. If you don’t, be prepared for some initial bumps and surprises. It has its warts but it’s not nearly as bad as some people portray it, those cases are likely more a problem with a micro service architecture gone out of control rather than k8s itself.

Sadly there is no middle ground alternative. Closest would be docker-compose, assuming someone runs the VM for you and that still comes with a lot of hassle of deploying files to that VM and you still need to configure all the cloud networking.


AWS ECS isn't much better. It's weird that cloud providers don't see DevX as a priority; they are at serious risk of becoming commodities due to k8s and the like.


Docker compose on VPS sounds much better than going straight to Azure/AWS


My 2c: you don't need K8s unless you are google-scale. Even if you think you are google-scale, you are not. Maintenance-wise and $$-wise, two VPS boxes with Cloudflare (even with an enterprise account) setup is usually cheaper than an ordinary K8s setup.

But again, it all depends on your use case, and people usually overestimate their use cases and the company's growth.


Kubernetes is not for Google scale. At that size you have completely custom infrastructure, which is why Google continues to use Borg (and its evolutions) instead.

K8S is specifically designed for the non-Googles to run with similar capabilities and features. Most anti-K8S sentiment comes from the overhead of installing and running it (vs using a managed service) and from those who just never learned it. By the time you set up the same functionality yourself, you will have spent more work on a less reliable and less agile setup.


"They" never learn because it's too complex to set up and maintain.

A managed service is too expensive: outgoing traffic can cost up to 200 times what it does on a dedicated server with flat-rate traffic (e.g. Hetzner).


K8s is not about scaling, it's about declarative configuration management, which gives you the opportunity to easily scale as an added bonus. It feels extremely liberating to have my code and Kubernetes files in a repository, knowing that I can deploy this with one command to production on any Kubernetes cluster. It's a great infrastructure abstraction layer.

I agree it's expensive though. A docker-compose.yml on a VPS usually works fine as well. Requires some additional configuration outside the Docker ecosystem though.


What's wrong with:

  git clone
  go build
  systemctl restart my-service

Why would I want to proxy through docker?


This question refers to the benefits of containerization and not Kubernetes. But to answer your question: your command is dependent on (1) the availability of Git, (2) Go and (3) systemd on your host OS. Without containers, this list of dependencies is dynamic and can change in size for every project. With containers, the list of dependencies reduces to one dependency only for every project: Docker.
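
One way to see the dependency point: a small Python sketch that checks which of a project's required host tools are actually on the PATH. The tool lists are just the ones named in this exchange, and the helper name is made up.

```python
import shutil


def missing_tools(tools: list[str]) -> list[str]:
    """Return the tools from the list that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Without containers, every project brings its own list of host requirements.
BARE_METAL_DEPS = ["git", "go", "systemctl"]

# With containers, the host-level list shrinks to a single entry.
CONTAINER_DEPS = ["docker"]
```

Running `missing_tools(BARE_METAL_DEPS)` on a fresh host tells you what still has to be installed there; for a containerized project the same check is only ever about `docker`.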


> Maintenance-wise and $$-wise, two VPS boxes with Cloudflare (even with an enterprise account) setup is usually cheaper than an ordinary K8s setup.

How do you feel about the benefits of using containers?

And if you do use containers, how do you feel about the benefits of orchestrating them, instead of running something like Docker Compose? E.g. having overlay networks and being able to deploy a new version of your software on multiple nodes simultaneously?

I've found that even when you don't want K8s, something like Docker Swarm or Hashicorp Nomad may still have benefits for you, depending on what you're trying to do. Swarm is basically feature-complete, boring and just like adding multi-node capabilities on top of Docker Compose anyways. Nothing to install apart from Docker and nothing to configure, apart from executing a cluster init/join command.


My tool of choice is usually Ansible, so, from the Ansible point of view, deploying to a container, VPS, bare server, or a cluster, looks pretty much the same.

I intentionally didn't mention Swarm, which IMHO is "the next iteration" of your containerized setup: Swarm is a boring, just-works tool if you need to spread multiple containers, add some networking between them, and have everything managed by a normal person. And boring is good :)


> My tool of choice is usually Ansible, so, from the Ansible point of view, deploying to a container, VPS, bare server, or a cluster, looks pretty much the same.

I agree that Ansible is an excellent tool! Though personally, I enjoy establishing a clear boundary between the "infrastructure" and "app" parts of any setup - the former ensuring that the OS has whatever runtimes are necessary for any given application to run, user accounts, groups, folders, services etc., whereas the latter is whatever apps are running on the server.

So I'd use Ansible for most of the former, set up a container cluster and then use something like https://docs.ansible.com/ansible/latest/collections/communit... to manage actual application deployments. Why? To make the apps more throwaway and limit the fallout in case of bad configurations/deployments, as well as make sharing the base Ansible playbooks for what constitutes a production ready server setup easier amongst projects.

Of course, you could as well use Ansible to install something like JDK/Tomcat and set up its configuration which worked well for me in the past, but personally that didn't scale quite as well as running Java/Node/Ruby/Python/PHP in containers and approaching them with Ansible almost like black boxes (e.g. copy a bunch of files, like secrets, into these directories, deploy some arbitrary YAML against the Docker Swarm cluster), which was surprisingly easy to do.


> And if you do use containers, how do you feel about the benefits of orchestrating them

Why would I need that?

A container is just a fancy executable. No one ever talked about "executable orchestration". Container orchestration does not need to be a thing.

If you need anything more complex than `docker run` you are either extremely large and this whole discussion is irrelevant, or you need to question your life choices.


> A container is just a fancy executable.

Not just that, but containers also have a lot of knowledge and tooling around them for running services more painlessly. I suggest that you familiarize yourself with this excellent site: https://12factor.net/

> No one ever talked about "executable orchestration".

Actually thousands of collective developer-years have gone into developing Java EE, OSGi and also figuring out how to run "containers" inside of Tomcat, GlassFish and many other web servers, which basically were executables or modules that contained application logic that had to be run.

Both in the Java ecosystem and many others, it was eventually decided that attempts like this (as well as going in the separate direction and shipping VMs or VM images, a la Vagrant) don't really work that nicely and containers were a middle ground that people settled on - unified bundles of almost everything (sans kernel) that any given application might need to run.

Of course, other interesting projects like Flatpak, AppImage, snaps and even functions-as-a-service all concern themselves with how any piece of executable code should be run, especially in the case of the latter. And then there are systems which attempt to distribute work across any number of worker nodes; you can just look at the entire HPC industry.

Thus, I believe that one can definitely say that a plethora of options for running things has been explored and will continue to be something that people research and iterate upon in the future!

> Container orchestration does not need to be a thing.

That's a lot like saying that OpenRC, systemd or other init systems shouldn't be a thing. Of course people are going to want to explore standardized options for figuring out how to organize their software that should be running! Even more so when you're running it across multiple nodes and the risks posed by human error are great!

Just look at what happened to Knight Capital because of human error: https://dougseven.com/2014/04/17/knightmare-a-devops-caution...

> If you need anything more complex than `docker run` you are either extremely large and this whole discussion is irrelevant, or you need to question your life choices.

I believe that this is an unnecessarily dismissive tone and a misrepresentation of what scales of work might benefit from container orchestration.

Not all of it needs to be as complex as Kubernetes. Frankly, in many cases, the likes of Docker Swarm will suffice, much like you might also run Docker Compose locally so you don't have to muck about with 5-10 separate Docker run commands just to launch all of the dependencies of an app that you're developing, or software that someone else has written and that you'd like to use.

But if you're running all of your software in prod on a single node, don't have any horizontally scaled services, or prefer to do everything manually, such as rolling out updates on a per-service basis, then the benefits that you'll get from containers will indeed mostly regard their runtimes, configuration, logging, resource limits and similar qualities, rather than any degree of automation, because you'll miss out on that.


I see your point but you might need Kubernetes before going « Google scale ». Google is probably the biggest scale in the world, that would mean they’re the only one requiring Kubernetes, which they’re not.

A big advantage of Kubernetes is that it can help to harmonize the deployment process of numerous teams in a whole company. As long as your company has multiple teams, including a team dedicated to administrating the Kubernetes cluster(s), then it can be a good choice. Because at this point, the cheapest solution is not necessarily cheap anymore.

That’s NOT a problem in startups, even less in early stage startups though.


My advice would be to try to structure your app to only use the database directly and structure all business logic around asynchronous actions that can be called via serverless functions or regular services you’re hosting. IMHO this is the future of most software architecture.

Set up proper replication for your database and you’re good. Very few companies I’ve seen need more than this in principle. In practice, a lot of real-time stuff that really isn’t necessary ends up increasing architectural complexity.

The number of service types that inherently need real-time processing can probably be counted on a single hand.


> serverless functions

This is a nice idea, but many out there will need to work with on-prem infrastructure and self-hosted software.

So my question is: what would be the best open FaaS solution that you can set up on your own infrastructure and use without caring too much about the underlying hardware/OS?

> Set up proper replication for your database and you’re good.

Another thing that I'd consider is how the DB will grow over time. Should you have a single, monolithic DB you'll end up with hundreds if not thousands of tables which will be cumbersome to work with.

Personally I think that being able to use foreign keys across your entire domain is like a superpower, but some might advocate for separate DBs for separate domains, once you get that far.


> Set up proper replication for your database and you’re good.

Sorry for the perhaps direct question, but I work at a small company and we are exploring cloud DB options now. So you’re saying that I can just set up two or three VPSes in different datacenters, put DB replicas there, and that’s fine? My many safe-to-lose-data VPSes have already run for years without any issues, but managed DB pricing confuses me: it suggests this is a nontrivial task that I shouldn’t do myself.


What you'd do exactly depends on your database. For Postgres you could use something like Citus, or you could set up your own replication, but in general yeah: set up a few VPSes, set up the replicas and some load balancing, and you're good.

Managed DB costs can seem confusing. Look at your numbers for egress, storage, reads and writes. Most providers have a calculator where you can plug in the data to get an estimate.


yup this is what we do at my work. There is data, and then all business logic / asynchronous processing is done using serverless functions. Ends up being really cost effective and it can scale well. All the data exists (and is set up with replication) so you can always migrate your app to some other platform or architecture if need be.

we have CI/CD pipeline that builds and deploys serverless functions using serverless containers.

not a serverless enthusiast btw. microservices have their own set of issues but they can be the right fit for certain apps.


k8s is just one component of the whole architecture, yet people talk about it like it's the singular defining characteristic. It reminds me of people talking about building "react apps" when react is one of probably hundreds of essential library dependencies and says nothing about 80% of the tooling. In my rig that I manage in a startup of 2 people, k8s (managed EKS on AWS) is the part that generally just works with very little effort. I provision and deploy to it with terraform and helm. The cluster itself is cattle. I can spin up a second identical cluster with a config file, cut dns over, then spin the old one down. It took around 3 weeks to get everything set up, but k8s was not the hard part, it was making all the OTHER decisions and integrations for things like container building, monitoring/logging/apm, secrets management, setting up a VPC correctly, writing some custom config scripts to generate the right setup for the 2 separate apps we run in both staging and production, etc. The work was undoubtedly far greater outside the k8s domain. In fact when it came to the k8s parts, e.g. defining services, ingress, etc, I was generally relieved and pleased. And now that I've done all this, I feel comfortable repeating it. Things run quite well and I have zero pressure to migrate.


I think the article is congruent with your thinking: k8s shapes those OTHER decisions in critical ways and may in some cases be the reason why those decisions exist in the first place.

To me, the insight of the article was questioning whether a team of your size should be needing to make those decisions? It may very well be that your case requires k8s. But, I can think of many small shops that would benefit from keeping things simple and paying someone (a cloud provider) in the beginning of their journey and slowly adding complexity as success comes along.


It's funny to read this article and thread while doing exactly what everyone suggests to avoid: building Kubernetes on bare virtual metal for a team with few programmers without any dedicated devops or SRE roles.

The reason I'm doing it is because our business owner thinks that we need scalability and high availability. We have legal obligations to keep our data inside a country. And we don't have any managed Kubernetes offerings inside our country. The best cloud stuff I've found is a hoster with an OpenStack API and that's what I'm building upon. I thought really hard about going with just Docker Swarm, but it seems that this tech is dying and we should rather invest in learning Kubernetes.

Honestly, so far I've spent a few weeks just learning Kubernetes and a few days writing terraform+ansible scripts, and my Kubernetes cluster seems to work well enough. I haven't touched the storage part yet, though, just kubeadm-installed Kubernetes with an OpenStack load balancer, Calico networking and nginx ingress. I guess the hard part will come with the storage stuff.

Worst thing is: everyone talks about how hard it is to run Kubernetes on bare metal, yet nobody talks about what exactly the issues are and how to avoid them.


For some projects there's no way around it, you have to build Kubernetes on bare metal/virtual machines. I've faced the same issues: the project called for Kubernetes. We can debate that requirement, but it was specified in the contract. Some projects simply require you to build Kubernetes on-prem, mostly for legal or political reasons.

I do question the logic of picking Kubernetes for scalability in those projects though. To make that work, you end up with a lot of excess capacity, unless you can scale down one workload while scaling up another, e.g. scaling down a website or API at night and using the capacity for batch jobs.

Honestly, building the cluster isn't my main concern, that's pretty easy: I managed to write Ansible code for deploying a cluster in less than a day. My main concern is debugging and maintenance long term. Reading about companies that spin up a new cluster, because it's easier than figuring out why the old one broke, is an indication that Kubernetes might not be completely ready for production use.


Mixed use is, arguably, where k8s shines the most.

As in, you have a pool of hardware, and you want to optimize use of its capacity. Mix interactive jobs with batch jobs. Maybe even prod and non-prod. Fit as much as possible on the smallest number of nodes, etc.
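To make the "fit as much as possible on as few nodes as possible" idea concrete, here is a minimal sketch of first-fit-decreasing bin packing in Python. It is not how the Kubernetes scheduler actually works; the `Node` class, the `pack` function, the job names and the CPU figures are all made up for illustration, and a real scheduler weighs far more than a single resource.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    capacity: float                               # e.g. CPU cores on the node
    jobs: list = field(default_factory=list)      # names of jobs placed here
    used: float = 0.0                             # capacity already claimed

    def fits(self, demand: float) -> bool:
        return self.used + demand <= self.capacity

    def place(self, name: str, demand: float) -> None:
        self.jobs.append(name)
        self.used += demand


def pack(jobs: dict, node_capacity: float) -> list:
    """First-fit decreasing: largest jobs first, each onto the first node with room."""
    nodes = []
    for name, demand in sorted(jobs.items(), key=lambda kv: kv[1], reverse=True):
        if demand > node_capacity:
            raise ValueError(f"{name} does not fit on any node")
        target = next((n for n in nodes if n.fits(demand)), None)
        if target is None:                        # no existing node has room
            target = Node(capacity=node_capacity)
            nodes.append(target)
        target.place(name, demand)
    return nodes


if __name__ == "__main__":
    # Hypothetical mix of interactive and batch workloads (CPU cores each).
    demo = {"web": 4, "api": 2, "batch-a": 4, "batch-b": 3, "cron": 1}
    for i, node in enumerate(pack(demo, node_capacity=8)):
        print(i, node.jobs, node.used)
```

With separate pools for interactive and batch work the same jobs would need at least one node per pool, each partly idle, which is the waste that mixed use is meant to avoid.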


Sadly what I see is people running separate clusters for prod, non-prod, staging or whatever you call it. I have never seen anyone use on-prem Kubernetes to optimize hardware usage.


K8s allows scaling up and down. Having 2 different environments for prod and staging shouldn’t cause much extra cost if each can scale down on low usage. There’s a benefit in doing so: if your staging services accidentally require a huge amount of resources due to some of your own bugs, e.g. memory usage blowing up, a shared cluster can easily pull production down with it once it hits its resource or monthly cost limits. The little extra money to run 2 clusters may be worth it!


Containerization solves a team problem: the system dependencies for an application need to be controlled by the team that owns it, which lets operations people focus specifically on the infrastructure supporting the application.

From the technical side you can accomplish nearly all of the same goals using machine images and something like packer combined with any config management tool.

I guess what I'm saying is you should use containerization when the complexity of your application and complexity of your infrastructure is too high for an operations (DevOps) person to deal with. Or when it changes so frequently it's impossible to keep up with the specific application needs.

An example is some poor DevOps engineer who has to maintain terraform scripts for the infrastructure but also needs to know the version of python used in application XXX or the postgres header libs are required for YYY. And a team of 30+ application devs are changing this constantly. It's a burden and a risk to require a DevOps engineer to remember and maintain all of this. So you start looking to docker so this responsibility can be handed off to the team that owns the application.

So in short if you're a small startup and have 1 or 2 DevOps guys you'll probably be okay with a very simple system of building machine images. As the complexity grows this handoff of machine requirements can be handed over to the teams by using docker.

And if you do this properly by abstracting the system build code away through makefiles or bash scripts, the transition from machine images to dockerfiles is pretty straightforward and easy. Possibly as easy as creating a packer file that will build the docker image instead of machine images.

Kubernetes is just a tool for the operations people to manage the containers.

I guess what I'm saying is if you can't automate properly with basic machine images, you should really tackle that first. And that containers solve a team logistics problem, not a technical one.


The problem is there are no sane in-between options.

On one end of the spectrum are either neat platforms like Heroku or Vercel or ssh and bare-metal with simple scripts.

On the other end of the spectrum, we have Kubernetes.

Everything in between:

- The learning curve is much steeper than Heroku and Vercel

- The skill is not likely to transfer to the next job

- The ecosystem is not as complete as Kubernetes

Most mid-sized companies went for Kubernetes because the in-betweens are not very optimal and betting on them means taking some risks.


I think docker-compose / docker swarm are sane inbetweens. They only do half of what Kubernetes does but it's the half you need at smaller scale.


I'd have to disagree. I tried docker compose and swarm. nothing but issues. I went back to old fashioned bash


If you’re interested in something that creates a great developer experience on top of container runtimes but which supports more complex workflows and apps than docker swarm, check out withcoherence.com (I’m a cofounder). We orchestrate containers from dev to prod in your own cloud, without abstracting away what’s happening under the hood, but also without forcing you to deal with all the operational complexity of “doing it right”


i find managing aws from a lambda with an aws sdk to be a good in between.

trigger the lambda on events and/or a schedule.

lock around dynamodb to ensure a single lambda at a time is mutating aws.

example:

https://github.com/nathants/libaws/tree/master/examples/comp...

knowledge and intuition about aws primitives is definitely a transferable skill.
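For anyone wondering what "lock around dynamodb" boils down to, here is a rough Python sketch of the lease pattern. It deliberately uses an in-memory stand-in instead of the AWS SDK: `LeaseTable`, `put_if_free`, `run_exclusively` and the `"infra-lock"` key are hypothetical names, and in a real setup the atomicity would have to come from a conditional write on the table rather than from a Python dict.

```python
import time


class LeaseTable:
    """In-memory stand-in for a table that supports conditional writes."""

    def __init__(self):
        self._rows = {}

    def put_if_free(self, key, owner, ttl_seconds, now=None):
        """Write the lease unless another owner holds an unexpired one."""
        now = time.time() if now is None else now
        row = self._rows.get(key)
        if row is not None and row["expires"] > now and row["owner"] != owner:
            return False
        self._rows[key] = {"owner": owner, "expires": now + ttl_seconds}
        return True

    def release(self, key, owner):
        """Delete the lease, but only if the caller still owns it."""
        row = self._rows.get(key)
        if row is not None and row["owner"] == owner:
            del self._rows[key]


def run_exclusively(table, owner, mutate, now=None):
    """Run mutate() only when this invocation holds the lease."""
    if not table.put_if_free("infra-lock", owner, ttl_seconds=60, now=now):
        return False              # someone else is mutating; skip this tick
    try:
        mutate()
    finally:
        table.release("infra-lock", owner)
    return True
```

The TTL is there so that a crashed invocation cannot hold the lock forever; the next scheduled run simply takes over once the lease has expired.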


> On one end of the spectrum are either neat platforms like Heroku or Vercel or ssh and bare-metal with simple scripts.

> On the other end of the spectrum, we have Kubernetes.

Someone else mentioned Docker Swarm, but allow me to offer my own thoughts.

The simplest option (what you allude to) is probably running containers through the "docker run" command, which can sometimes work for a limited set of circumstances, but doesn't really scale.

For single node setups, Docker Compose can also work really nicely, where you have a description of your entire environment in a YAML file and you can "orchestrate" as many containers as you need, as long as you don't need to scale out.

The aforementioned Docker Swarm is a simple step up that allows you to use the Compose syntax (which is way easier than what K8s would have you use) but orchestrate containers across multiple nodes and also has networking built in. It's simple to set up (already comes preinstalled with Docker), simple to use and maintain (a CLI like that of kubectl, but smaller), the performance and resource usage is small and the feature set is stable. It's an excellent option, as long as you don't need lots of integrations out there and whatever Docker offers is enough.

From there, you might look into the likes of Hashicorp Nomad; their HCL is a bit more complicated than the Compose format, but still simpler than Kubernetes. The setup is a bit more complicated (if you need Consul and TLS encrypted traffic) but overall it's still just a single binary that can be set up as a client/server depending on your needs. Also, as an added bonus, you can orchestrate other things than containers, much like how back in the day there was Apache Mesos which supported different types of workloads (e.g. you can launch Java apps or even native processes on nodes that you manage, not just containers).

Of course, even when you get into Kubernetes, there are also projects like K3s and k0s, maybe with tools like Portainer or Rancher, which let you manage it in a simpler manner, either through an easy to use UI or with those K8s distributions being slightly cut down, both in the plugins that they come with, as well as their resulting resource usage and data storage solutions (e.g. use SQLite instead of etcd for smaller deployments).

In my eyes, betting on OCI is a pretty reasonable option and it allows you to run your containers on whatever you need, depending on what your org would be best suited to.

> The skill is not likely to transfer to the next job

I'd argue that if you need to read a book to figure out how services are deployed in any given environment, then you probably should have a DevOps/Ops team and not have to worry about it as a dev. Or, if you're a part of said team, you should still mostly just use OCI containers under the hood and the rest should be much like learning a new programming language for the job (e.g. like going from Java to .NET, which are reasonably similar).

> The ecosystem is not as complete as Kubernetes

Kubernetes largely won the container wars. No other toolchain will ever have as complete of an ecosystem, but at the same time you also dodge the risks of betting on some SaaS solution that will milk your wallet dry with paid tiers and will fold a few years down the line. I'd say that all of the aforementioned technologies support all of the basic concerns (deployments, monitoring, storage, resource limits etc.).


Just wanted to say that I think this is an excellent overview of the landscape!


Thanks!

Of course, things might change somewhat in the next years, we have seen both new tooling be developed and become viable, like Lens (https://k8slens.dev/) and some nice CLI tooling, like k9s (https://k9scli.io/), as well as numerous other options.

Though I guess things won't change as much for Docker Swarm (which is feature complete and doesn't have much new stuff be developed for it) or Hashicorp Nomad (because their "HashiStack" covers most of what you need already).


i would have agreed with OP three years ago when kubernetes was very niche and setting it up was difficult. today, kubes is very easy to get going with.

you can set it up locally with kind or k3s for local dev, and use a cloud vendor's flavor in production. last time i tried to get a local dev env working for lambda, i spent a lot of time hacking on runtime-level stuff. it was not pleasant.

additionally, the market of devs and operators fluent in it has grown by a lot. many people are getting their CKx certs, and there are enough companies using it now to create reliable supply.

i say this because the lift from "my app is working in docker" to "my app is working in kubernetes" is much smaller than it used to be. given that OP is suggesting container runtimes as the alternative (which can get very expensive; much more so than using kubernetes for everything), i think that if a business is at a point where they are containerizing to accelerate releases, then kubernetes is a natural next step. anything else in between is at best a costly dependency and at worst throwaway.


Kubernetes is amazing at what it does, but only relevant to ~0.1% of companies in my opinion. It's way too complex and too much work for the rest of the world, and not worth the time invested.

A lot can be accomplished with simple virtual machines and some sort of auto scaling groups (depending on your cloud provider they have different names).

Kubernetes is amazing at unifying your workloads on any cloud though. If you care about portability, you should either consider using Kubernetes for everything or using a tool that abstracts your configuration in a cloud-agnostic way. Although I'm a bit biased on this one.


Simple is subjective and dependent on where you're coming from. If you're used to Kubernetes then managing virtual machines seems needlessly time consuming and vice versa.


Might never need kubernetes: https://stackexchange.com/performance


Stack Overflow has been increasingly using Kubernetes for the last few years, though.


Oh man, my org at FB was the test bench for moving to containerization and fancy service discovery and all that (Tupperware, just a different Borg reimplementation). It eventually got pretty good, but it never stopped being mad overkill for single-thousands of boxes. When you’re in the 10s or 100s of thousands in a fleet, or when you’ve got workloads that don’t neatly slot into your SKUs, Kube/Borg/TW/Mesos are The Way. No doubt about it.

But it’s always seemed zany to me to stack namespaces/cgroups/etc. on top of Xen or whatever EC2 is using. Yo dawg I heard you like an abstract machine so I put…

There are just separate concerns:

- reproducibility (shared libraries argggghh)

- resource limits to bin-pack SKUs

- service discovery

- failover

- operational affordance

And I’ve seen it get so ugly to conflate these very different imperatives. Running 100Ks that need to web serve on demand but web index when idle? Yeah, now you need the whole enchilada.

But it’s false and harmful to promote the idea that the minute you need Grafana or DNS or Salt/Ansible/Nix/whatever that you need BORG.

There are scenarios where I would enthusiastically break out Kube, but most of the marketing around it falls into my “and you will do nothing, because you can do nothing.” bucket.


It's really hard to break through the fog of metaphors here. Are you saying that below 10000 boxes it's worth doing something else to deploy on to them than k8s? What would that be?


Well, if you're using it for development, it doesn't make sense.. You're probably doing all the microservices, which should mean that those services are the responsibility of another team. It also means they should have a testing / staging / development version online somewhere.

If you're developing MS Paint.. do you really need to have to compile windows and all the dependencies?


>You're probably doing all the microservices, which should mean that those services are the responsibility of another team.

You can't imagine how many times I've had to tell customers what micro-services are.

A micro-service architecture means teams declare a common interface, and can then change their own service without informing other teams while the whole system still works.

Declaring a micro-service architecture needs a lot of time and highly experienced interface planners (in a perfect world those interfaces should never change).
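A tiny Python sketch of that idea, with the network stripped out: the contract is the only thing a consuming team depends on, so the owning team can swap the implementation freely. Every name here (`PriceService`, `TablePriceService`, `ComputedPriceService`, `checkout_total`, the SKUs and prices) is invented for illustration; in practice the contract would be an HTTP or RPC schema rather than a Python `Protocol`.

```python
from typing import Protocol


class PriceService(Protocol):
    """The agreed contract: the only thing other teams may rely on."""

    def price_cents(self, sku: str) -> int: ...


class TablePriceService:
    """First implementation: a hard-coded price table."""

    def __init__(self) -> None:
        self._prices = {"apple": 120, "pear": 150}

    def price_cents(self, sku: str) -> int:
        return self._prices[sku]


class ComputedPriceService:
    """Rewritten implementation: same contract, different internals."""

    def price_cents(self, sku: str) -> int:
        base = {"apple": 100, "pear": 125}[sku]
        return base + base // 5          # base price plus a 20% markup


def checkout_total(prices: PriceService, basket: list[str]) -> int:
    """A consumer owned by another team; it only knows the contract."""
    return sum(prices.price_cents(sku) for sku in basket)
```

`checkout_total` gives the same answer with either implementation, which is the whole point: the hard part is getting `price_cents` right up front, because changing that signature is what forces every other team to move.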


Running CapRover on a VPS has been a very nice alternative to full K8s. It is like your own mini Heroku but without too much magic involved: a lightweight wrapper around Docker containers that comes with a nice GUI and handles networking between your applications. The One Click apps are also very useful for quickly spinning up databases and stuff like that.


Startups should almost never use k8s. They need to iterate fast and ignore the complexities of infra. k8s is far too complex for most small companies.

CapRover Droplet on Digital Ocean + deploy your Rails app with git. Scale your single VPS up as needed. Most don’t need much beyond that for quite a while.


I have never heard of CapRover and I don't know how to use it. I do know how to use Kubernetes across Digital Ocean, AWS, GCP and Azure. Just had to learn it once. I don't even use it for scaling. Not complex at all. I've used it for four years. Literally never had a problem with it. It just runs.


my advice is use k8s if you know it, don't if you don't


How can anyone tell if they know kubernetes?

Most people who complain about k8s being complicated, convinced themselves they know it in the first place, and thought it is somehow going to make their life easier down the road. But it is not a plug and play tool as they hoped and unlike most tools where you can learn as you go, with k8s it becomes troubleshooting as you go.

It is like when I tried to use git branches on my first git based plaintext writing project. I thought it is nice and all but when you throw merge conflicts and rebase and remote branches within the first 2 weeks, it shifted from doing the actual thing, to just troubleshooting git. With k8s that is exponentially complicated.


I think you answered it yourself. If I find a lot of pain in using it, it means that either I just started on it or I made a half-assed attempt at learning it. Most people don't go deep. Why do people with 10+ years of experience have the same depth as juniors during interviews? It's because of things like this.

So my suggestion remains the same. If you reached out and really tried to understand k8s, you'll find it so powerful to work with. Otherwise, please just leave it on the shelf.


I doubt it’s the people using k8s wrongly that complain it’s complicated; it’s the people who are affected by other teams’ choices and have to suffer unreliable messes as a result. Like me. I can see that it’s too complicated for me, and I want nothing to do with it. The people using it often never acknowledge they’re in way over their heads IME.


On the other hand, if you _think_ you know Kubernetes, don't use it, as it means you just have no idea ;)


I figure that Kubernetes will eventually become very mature and stable, the rate of changes will slow down, and it will become a predictable building block like Linux is now. I'm personally choosing to avoid using it for now.


As a person that is deeply involved in Kubernetes and Istio, I'm starting to get the feeling that in the beginning, it should be totally acceptable to run containers from Docker Swarm. If your main need is "restart container on error" (which seems likely for startups), you probably can't beat the fast and easy deploy time of Swarm. Also, when the time comes to upgrade to Kubernetes, you won't be locked in to some incompatible solution.

Of course, if you care about scaling, serverless is probably the way to go.
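For what it's worth, "restart container on error" is conceptually just a small supervision loop. Here is a rough Python sketch of the idea with exponential backoff; it is not how Docker implements its restart policies, and `supervise`, `max_restarts` and `backoff_seconds` are names made up for this example.

```python
import time
from typing import Callable


def supervise(run: Callable[[], int], max_restarts: int = 3,
              backoff_seconds: float = 1.0,
              sleep: Callable[[float], None] = time.sleep) -> int:
    """Run a task, restart it on a non-zero exit code, return restarts used."""
    restarts = 0
    while True:
        exit_code = run()
        if exit_code == 0:
            return restarts                       # clean exit, stop supervising
        if restarts >= max_restarts:
            raise RuntimeError(f"gave up after {restarts} restarts")
        sleep(backoff_seconds * (2 ** restarts))  # wait longer after each failure
        restarts += 1


if __name__ == "__main__":
    # Hypothetical task that fails twice and then succeeds.
    exit_codes = iter([1, 1, 0])
    print(supervise(lambda: next(exit_codes), backoff_seconds=0.01))
```

The `sleep` parameter is injectable only so the loop can be exercised without actually waiting. Everything beyond this loop (placement, networking, rolling updates) is where the orchestrators start to differ.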


Is docker swarm still maintained?


Ah sorry, I meant Docker Compose, not Docker Swarm.


There was a time when conventional wisdom was that we all needed to be using XML data exchange, all the time. Now the simpler format JSON dominates.

I hope Kubernetes ends up being the next XML.


As a startup it is very important to spend as much time and money as possible on the features, not the tech (unless the tech is the feature, that is). Most startups should build one single monolith API running on some default offering in the cloud. Serverless services can be used as a complement for some workloads. Building, running, and maintaining containers is unnecessary for this and it does slow down the dev process.


Curious question: why don’t companies consider an abstraction like EC2? They can run k8s on EC2-like virtualization, right? I had a great experience with EC2: the ability to launch thousands of machines reliably with full control over the machines, and to manipulate all the metadata and configuration without learning any additional shit like HCL, is a huge productivity booster. The simplicity of EC2 seems a great foundation layer for us to build more advanced resource allocation.

Case in point, it drives me nuts that one has to spend hours learning how to use a template system in Nomad to pass in the simplest configurations. I can’t fathom why one would be even slightly interested in learning any shit of Nomad just to deploy a god damn docker container. Don’t we have more interesting problems to solve and more general knowledge to master?


EC2 instance boots in 2 to 3 minutes, a Docker container starts in seconds. You can specify precisely how much CPU/Memory you need for a container, with a VM, you're stuck with what the cloud provider offers. You still need to get your app artifacts on that VM, how do you plan to do this?

EC2 (VM abstraction) is great for some type of software, but don't forget you'll also need a Load Balancer, provisioning TLS certs, security groups, autoscaling groups, VPN, public/private subnets…


ec2 alpine on t3.small boots in 30 seconds. 45 seconds for spot.

default-like vpc with public subnets and internet gateway is fine. zero trust.

route53 health checks and returns up to 8 hosts in random order.

to the gp’s point, using aws sdk to manage ec2 is shockingly easy. a lambda on a 1 minute timer with an aws sdk runs circles around autoscaling groups et al.


Oh, I love container-based deployment. I was just wondering if it’s worth building abstractions like EC2 that require very little learning to use, and using EC2-like systems to manage the machines that host containers.


That’s ECS without the Fargate.


> with a VM, you're stuck with what the cloud provider offers

AWS has a very big menu of machine types.


A lot of choice yes, but they are all either slow or overpriced.


Kubernetes on AWS won't be cheaper. And if you go with another provider, they're likely to also have an API to automatically provision VMs.


I highly recommend the kops[1] tool from Kubernetes if opting to deploy/manage Kubernetes yourself. I’ve had great experiences with it in the past (have been using since before EKS or Fargate existed).

kops lets you define your Kubernetes cluster in YAML, and can then deploy it directly or output Terraform that you can use to deploy.

1. https://github.com/kubernetes/kops


> Curious question: why don’t companies consider an abstraction like EC2?

That suggestion doesn't make any sense.

It's like you want the price gouging of AWS but the problems of managing your own custom low-level infrastructure.

If you really want to throw money at AWS and you want Kubernetes, it's hard to argue in favour of any option other than AWS's managed Kubernetes service: EKS.

Your goal is to get an app running, not to manage a cluster that has an app running.


At withcoherence.com, we agree that leaning on managed runtimes for as long as possible makes a ton of sense. I’ve also seen that hiding the complexity of transforming code into deployed containers by fully abstracting away dockerfiles and CI/deploy scripts can lead teams into a tough spot at a bad time to learn what’s really happening.

But the appeal of k8s is often the ecosystem of tools that solve real problems these runtimes leave on the table: managing multiple environments, load balancing across services, SSL/TLS, SSH, long-running tasks, managing multiple versions, integrating tests into pipelines. Coherence is working to solve these problems and create a great developer experience without hiding what’s really going on under the hood.

(Disclosure, I’m a cofounder)


I was in a startup with a small team and we used Kubernetes for servers that required more than 8 GB of memory. Cloud Run and App Engine don't offer more than that, or at least didn't at the time. The alternative was to deal with virtual machines ourselves with Ansible scripts, and I'm not sure how that would auto scale; it would also have broken the existing flow of Docker containers for everything. It took a while to figure out how to put Kubernetes in a closed VPC, but after that it was fairly straightforward.


As far as I know, Cloud Run now offers workloads with up to 32 GB of memory.


My advice would be exactly the opposite. Don't EVER use serverless containers; use Kubernetes instead, whether you need scale or not.

1. Use GKS/EKS/LKE/DKS. Don't try to set up a Kubernetes cluster on your own server by yourself.

2. It's very simple to set up a deployment, a database, etc. It will take a decent engineer about a week to set up your application in a Kubernetes cluster, end to end.

3. LKE/DKS is super cheap compared to Heroku.

4. Use GitHub Actions (free) and Docker Hub ($5 a month to build your container images). It's very easy; all you need is a weekend.

5. IMPORTANT: It's foolish to architect your application to fit into a serverless container.


I run a quite successful SRE consultancy based on exactly this advice. You don't need all of the features of K8s. You do want enough of them that it's worth going straight to it.


Why the paid [docker] hub recommendation if already using GitHub Actions? Its container registry has a free tier too (I assume there's a limit but so is there for Actions).


Docker Hub is all-you-can-eat with regard to image storage; GitHub's container registry is metered based on storage, and last time I tried it (over a year ago) it was also really hard to clean up old images.


That does seem like one of those "too good to be true" arrangements that can fall victim to pricing updates down the track.

That said, not unreasonable to take advantage of it while it lasts.


Yeah, my immediate reaction was "and that's why Docker Hub will either start charging more or go out of business, both of which will be rather disruptive to your business".


Does it not have paid tiers too though? Like Actions (which you suggested as free) also increases as you need more minutes.

(On the other side, Docker Hub is I think a lot more if not completely free if the underlying repo is public? I know I have a few low use ones and don't pay a penny, anyway.)

I suppose my point really is just: why suggest paying for image storage initially, before CI? They can both be free.


We did hit some magical limit on storage and transfer. And we were a paying customer. $250 or something. No warning, no emails, they turned off our accounts. Hence no deployments. Support was silent.

Now we use quay.io and a self hosted harbor docker registry.


Can you elaborate on 5?

Also why do I need k8s? Beanstalk plus RDS works perfect for my website needs most of the time.


Not OP but web apps are probably an exception to the comment. While running a web app is within Kubernetes capabilities, there are better solutions for that specific use case.

Obviously, the details will vary somewhat based on the web app. A simple static site with occasional HTTP POST/PUT to server will have somewhat different requirements than a streaming media site (like say, Twitch).

In the case of the former, if building in AWS, I’d choose:

* Route53
* CloudFront + S3
* API Gateway + Lambda
* DynamoDB (or RDS, if necessary)
* Terraform + [Terraform Cloud | GitLab | GitHub Actions]
* Cognito (if necessary)

A website probably wants the CDN distribution, routing, and caching that Kubernetes doesn’t provide. I mean I think there are some Kubernetes @ Edge services, maybe?, but in general, Kubernetes for a website is somewhat overkill/like bringing a bazooka to a knife fight.


> In the case of the former, if building in AWS, I’d choose: * Route53 * CloudFront + S3 * API Gateway + Lambda * DynamoDB (or RDS, if necessary) * Terraform + [Terraform Cloud | GitLab | GitHub Actions] * Cognito (if necessary)

Sounds like an overkill for “a simple static site with occasional HTTP POST/PUT to server”. I’d say, get a VPS for $5/month, run your python server there with local SQLite DB. Maybe put it behind free Cloudflare if you expect HN-frontpage-level traffic.


AKS (Azure) is good too. Should take a day to set up the cluster unless it is the first time ever.


Use what your ops/DevOps team knows.

Managed K8s can easily be used by a startup of any size.

I use it in mine because I'm the ops team as well and have been doing k8s for 4 years.


The organization must have a need first.

And then, after you have a legitimate need, use a hosted Kubernetes solution. Don't roll your own.

It's really that simple.


Am I supposed to believe these blanket statements from a person who doesn't realize 1e0 = 1e1 = 1e2 = ... = 1 ?


I thought 1e2 = 1 * 10^2


The title is actually very confusing and so is the article. After reading it a few times I finally understood that the author is actually pro Kubernetes.

He is just saying that if you are in the really early stages of your startup, don't use Kubernetes right away. You will probably want to use it eventually.


My startup just bypassed all this container stuff and went straight for AWS Serverless. If you design it well, it works excellently.

If/when we need long-running workloads we'll go to containers but thus far we're just rocking out with lambda, and sns/sqs and Eventbridge.


I don't get the shit k8s gets. Especially on managed offerings like AKS or Fargate, they're very easy to deploy, having several environments is easy, and maintenance is straightforward.

As a generic rule I'm against using complex tech where there's no need, but I feel like we're way past that on k8s.


I've worked at 3 organizations that adopted k8s. This all happened over quite a long period so I saw what k8s adoption looked like at varying stages of k8s development.

k8s gets shit for one very simple reason: it forces teams to deal with issues that they're not necessarily ready to deal with just yet. I've seen it happen over and over and over.

Small startups are able to get away with not using auto-scaling, having bad secret management, not running minimally sized containers, etc, etc. Are they going to solve these problems eventually? Yes, if they grow and don't die. But until they grow they simply can't afford dedicating a lot of effort setting up Hashicorp Vault with proper secret rotation (just as an example).

One of the companies I worked at was a 10 person startup that was trying to find product market fit at a break-neck pace. We had customers, big ones. It was a non-trivial piece of software too. We ran the whole thing on a very beefy EC2 instance. The DB and the application were on the same server. Provisioning an application server was all done with a single bash script. One time we had to re-provision this server and we had to take everything down for like 3 hours. We did it at 3am so none of our customers got disrupted.

Every single ounce of engineering time was devoted to product development. We ended up getting bought and everyone (all 10 of us; founders were generous and awesome) made a crap ton of money. I can tell you with absolute certainty that every major feature we shipped contributed to that acquisition. Had we dedicated any effort at all to proper infra it just wouldn't have happened. The cost of our crappy duct-taped infra was that one night we needed to reprovision everything and maybe another 20 total hours of wasted time across all engineers in the company.

That's why the first line in the blog post specifically cites early stage startups.


> it forces teams to deal with issues that they're not necessarily ready to deal with just yet.

Exactly this. Polyrepos, microservices, and their ilk all do the same thing. Bring forward problems you may one day need to solve and make solving them necessary for progress.


Yea. We had a monorepo. The whole thing was one big webapp.

Even our background workers ran in the web app. Oh yea, that's right. I can feel the cringing just writing that! So like if we had an expensive background operation, our web app would start up a thread and do the work. Some might be wondering: "What if you needed to restart it? Like, I dunno... when deploying code?". Answer: we didn't, lol. Our deploy script would hold all new jobs and wait until existing ones finished before restarting. Some background processes were extremely time consuming (eg: 10+ hours) and those we wrote in a way that they could just resume if killed.
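A minimal sketch of that "hold new jobs, wait for running ones, then restart" pattern, assuming background jobs run as threads inside the web app. The `JobGate` class and its method names are hypothetical and only meant to illustrate the idea, not to reproduce the original deploy script.

```python
import threading


class JobGate:
    """Tracks in-flight background jobs and lets a deploy script drain them."""

    def __init__(self):
        self._cond = threading.Condition()
        self._active = 0
        self._accepting = True

    def try_start(self):
        """Register a new job; returns False once a deploy has put jobs on hold."""
        with self._cond:
            if not self._accepting:
                return False
            self._active += 1
            return True

    def finish(self):
        """Mark one job as done and wake anyone waiting in drain()."""
        with self._cond:
            self._active -= 1
            self._cond.notify_all()

    def drain(self, timeout=None):
        """Hold new jobs, then wait for running ones. True means safe to restart."""
        with self._cond:
            self._accepting = False
            return self._cond.wait_for(lambda: self._active == 0, timeout=timeout)
```

A deploy script would call `drain()` and restart the app only when it returns `True`. Very long jobs would bypass the gate and rely on being resumable instead.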

It's actually amazing how productive you can be with a single monolithic web app on a beefy cloud instance.

I was also surprised how quickly we were able to scale out to a proper multi-service architecture after being bought. It only took us 6 months. The big company that bought us plans ahead years. It was no problem at all.

Now don't get me wrong. I'm not advocating for sloppy engineering here. Our team was some of the smartest people I've ever worked with. It wasn't uncommon for me (a medium-experienced engineer) to bring a proposal to our technical founder that would require like 1 month of work. That freakin' genius would reframe the problem and architecture such that we'd spend 2 days, still achieve the same goal, and take on minimal technical debt. I learned A LOT.

I know I'm rambling, but honestly I don't know if that kind of hustle is even a thing anymore. Are tech startups still scrappy? Are they focusing on the core problems/solutions they're trying to prove out instead of layers of architecture? Do such problems even exist anymore? Back in the mid-2000s it felt like that's how everyone did it.


There's nothing wrong with the architecture you describe. In fact, I've even migrated/merged microservices into a monolith several times. What you describe about restarting, etc. is accurate, and you need to solve for that even in other architectures.

Conway's law is more important: "I need to deploy my team's change without waiting for your team's job to finish".


Thanks for sharing! Which startup / what was the problem / solution?


I'd rather not say but it wasn't anything mind blowing. It was a B2B thing with a great UI in the early 2010s. You'd never have even heard of us. All of this predated the popularity of k8s. Docker wasn't really a thing. AWS was just growing and blowing everyone's minds.


> We ended up getting bought and everyone (all 10 of us; founders were generous and awesome) made a crap ton of money.

Sounds mind blowing enough to me :)


Nothing about using k8s forces you to use microservices or auto scaling or good password management. These are just strawmen.


Sure. We could also not use a container registry. Or resource management. Or liveness/readiness probes. For the database we could use RDS and take the latency/$$$ overhead. For file storage set up a separate EBS volume or whatever. But... why?


Those managed offerings will break (given time, everything breaks) and you're suddenly left with a giant behemoth whose insides you probably don't know well.

I've been there, when one node lost communication with another but not with the rest of the cluster. I've spent some time working with the Linux network stack (I mean the actual kernel parts), so I had some clues, but nonetheless I ended up just scrapping the node with all its workloads and starting a new one, because I hadn't found anything obvious and that was easier than continuing to debug. But it wasn't exactly a great, fun, or painless solution either (I don't remember exactly what sort of trouble I had with draining the node, but it was enough to make me swear at the monitor).

So I'm not exactly against K8s but I totally advise to realize its internal complexity (no matter if it comes preconfigured by a cloud provider - it is still there) and ask if one's okay to eventually encounter it.


Agreed. You really get what you pay for with the managed k8s offerings, and you really can't get a lot of good software for ~$72 per month. Not having to run the control plane is a drop in the ocean of complexity that comes with k8s (node provisioning/patching, monitoring, hardening, upgrades, intrusion detection, cluster scaling, multi-cluster, ingress, tls).

That's why things like OpenShift exist, which cost an order of magnitude more than just a managed control plane.


I've used k8s in startup and now in a large company, so I've seen both worlds.

In the cloud, running workloads on k8s is much more expensive than something simpler, like containers on instances in an AutoScaling group on Amazon, for example. You have to pay for the control plane in addition to the worker nodes, and if you only have a few workers, your control plane costs are significant for small infra. It's fairly easy to set up on {A,E,G}KS.

This cost, however, buys you flexibility. Say your software needs to run on-prem as well as in the cloud? k8s can be your abstraction, and the additional cost becomes worthwhile.

If you use it as a dumb container scheduler, without all this fancy stuff like Persistent Volumes or crazy scheduling constraints, it's not too bad. It does introduce extra complexity around traffic ingress, and is missing some basic functionality, like de-scheduling a Pod from a load balancer, and terminating it once no more connections exist. There are also difficulties around giving people access to the cluster in a safe way (for example, any admin can see the contents of secrets), and kubelets' decision to run without swap also leads to some annoying node failures that would be mere slowdowns if swap was present.

Like anything else, k8s is a mixed bag. If it doesn't solve specific problems for you, don't use it, it's pointless.


> is missing some basic functionality, like de-scheduling a Pod from a load balancer, and terminating it once no more connections exist. There are also difficulties around giving people access to the cluster in a safe way (for example, any admin can see the contents of secrets), and kubelets' decision to run without swap also leads to some annoying node failures that would be mere slowdowns if swap was present.

As written, none of this is strictly true. I suspect I know why you believe what you wrote is true, but I think you just need to spend some more time understanding the documentation and architecture. As an example, support for swap accounting was added last year, and while swap has not been supported, it's been possible to use swap perhaps forever.


I also think there’s a place for k8s in a lot of orgs, but having administered it for a while I think people expect more than is delivered in the managed offerings. k8s moves fast and you need to stay up to date. If you want to use k8s you have to be an organisation that is prepared to maintain it and update your apps around its maintenance.

The managed offerings especially don’t let you stand still: they kick off old versions of k8s as they are deprecated.


K8s is a real pain if you need to route to specific instances/pods or need to deviate from the ephemeral autobalanced pod setup. If you're just serving a stateless service, it's probably the easiest way to go.


You might be way past that. The teams with great engineers you work with are way past that. But many other teams with middling engineers aren’t.

Here’s my hierarchy of things the average engineer thinks they know well but actually fundamentally never understood and will occasionally screw up badly with escalating consequences as you go down the list:

1. Command line

2. Git

3. Emacs/vim?

4. Docker

5. Kubernetes

I myself personally acknowledge my lack of expertise and try my best to mitigate:

1. This I just try to learn well.

2. I only use GitHub Desktop (I like that its limited feature set actually saves you from yourself).

3. Just use PyCharm? It’s 2022.

4. Docker is necessary, but I try to find someone actually decent with it and get them to help me.

5. Just avoid. Unless you have an actual good devops team with 5+ people backing it. Beanstalk or Lambda do just fine.


Kubernetes tends to bring in networking, storage, a whole host of Linux kernel features. All of those alone are tripping hazards before rolling them into one piece of software.


GKE autopilot deserves a mention. If you’re going to go k8s, then it’s pretty close to noops


I’m much more interested in creating the simplified replacement technology for K8.


Don't use AWS yet. Cloud setup is so complex that you should avoid using it.


Exactly - this discussion sounds exactly like the ones we had when AWS started out


aws is fine. just use the good parts, skip the rest.


I’d just say “don’t learn k8s yet.” If you know it, it’s fine, but it’s complicated to get right, so delay if you haven’t made that journey yet.


I never quite got the K8S "it's too complex" hate, but to be fair I haven't scaled it very high.

Sure, it's verbose and there are N levels of abstraction, but it's a declarative API for running foo across multiple environments of bar. I've always wanted this.

I like raw, versioned infrastructure config with no extra crap. I have a little K8S.yml snippet I copy+paste+tweak into repos when I want to throw an ad-hoc experiment into a cluster, and then a bigger setup for IRL projects that looks something like this:

- k8s

  - base
    - api.yml
    - web.yml
    - worker.yml
    - namespace.yml
    - ingress.yml

  - overlays
    - dev
      ... config to merge ...
    - staging
      ... config to merge ...
    - production
      ... config to merge ...

    - shared
      - ... variable declarations, base config maps, etc ...
     
Everything gets merged into a manifest.yml and version stamped and build-artifacted. Deployment just means applying the config overlays via kustomize based on environment and then pushing out.

If things break, I always have an absolute, pull-the-chute, versioned, formal safe point to go back to: kubectl -n production apply -f manifest.version.yml
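To make the "merge overlays into one manifest" step concrete, here is a simplified sketch of the idea as a recursive dictionary merge. It is not how kustomize itself works (its patching rules, especially for lists, are more involved), and the config keys and values are hypothetical.

```python
import copy


def merge_overlay(base, overlay):
    """Recursively merge overlay into a copy of base; overlay values win."""
    merged = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_overlay(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical base config and production overlay, for illustration only.
base = {"api": {"replicas": 1, "image": "example/api:1.0"}, "namespace": "app"}
production = {"api": {"replicas": 3}, "namespace": "production"}

# The merged result is what would be version-stamped and stored as an artifact.
manifest = merge_overlay(base, production)
```

Because the base is never mutated, the same base can be merged with the dev, staging, and production overlays in turn.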


use K8s if you need it, don't use K8s if you don't, it's that simple ¯\_(ツ)_/¯.

It's not rocket science you don't need to read every week's opinion on K8s and you don't need to write one either.


I’m not sure the question of “will k8s save me time” is particularly easy to answer.


Could someone eli5 what 1e0, 1e1 , 1e2 stand for?


1en = 10 ^ n. So 1e0 = 10 ^ 0 = 1, 1e1 = 10 ^ 1 = 10, 1e2 = 10 ^ 2 = 100, etc.
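For anyone who wants to check this themselves, most programming languages accept the same notation for numeric literals. A quick sketch in Python:

```python
# Python parses the "e" notation both as a literal and from a string.
print(1e0, 1e1, 1e2)  # prints: 1.0 10.0 100.0

# Each value equals 10 raised to the exponent after the "e".
values = [float(f"1e{n}") for n in range(4)]
powers = [10.0 ** n for n in range(4)]
print(values == powers)  # prints: True
```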


Thank you!


Maybe I'm salty this morning, but I don't think people should take advice from someone that used more characters to write something in a less clear way.


I'm with you.


What about if you need rolling deployments?


All you need for rolling deployments is >1 instance and load balancing. You can use DNS load balancing which isn't very nuanced, a local load balancer per machine (like nginx with 2 backend on the same machine) or more complex load balancing systems
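A small simulation of that idea, assuming a pool of named instances behind some load balancer. The instance names, the `is_healthy` callback, and the in-memory "upgrade" are all stand-ins for whatever your real infrastructure does.

```python
def rolling_deploy(pool, new_version, is_healthy):
    """Update instances one at a time, keeping the rest in the load balancer.

    pool maps a hypothetical instance name to its running version.
    Returns the smallest number of instances that were serving at any point.
    """
    if len(pool) < 2:
        raise ValueError("rolling deploys need more than one instance")
    in_rotation = set(pool)
    min_serving = len(in_rotation)
    for name in sorted(pool):
        in_rotation.discard(name)      # stop sending traffic to this one
        min_serving = min(min_serving, len(in_rotation))
        previous = pool[name]
        pool[name] = new_version       # stand-in for the real upgrade step
        if not is_healthy(name):
            pool[name] = previous      # roll this instance back
            in_rotation.add(name)
            raise RuntimeError(f"{name} failed its health check; deploy halted")
        in_rotation.add(name)          # healthy again, resume traffic
    return min_serving
```

With three instances, at least two keep serving throughout, and a failed health check stops the rollout before it reaches the remaining machines.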


(2025)


Interesting article, thank you for the insights.


It's funny that so many startups are led to thinking that they "need" K8s


reminds me of when I started at a .com in 99 that failed... they felt they needed oracle and sun and cisco routers... we did everything on linux for dev work but the founder wanted to apparently be able to sell that hardware and impress investors. He got basically nothing back on any of it. They offered me a sun server for like $100 that cost $20k. I took the windows desktop instead.


reminds me about the need for AI/ML in random things. maybe it's to sell the "powered by AI/ML" tag. just another hypestorm


I think they are led to believe it’s a good bet to make early.



