Doesn't look like the author knows what he is talking about. His point that an early-stage startup should not use K8s is fine. But the next advice about not using a different language for frontend and backend is wrong. I think the most appropriate advice is to choose a stack which the founding team is most familiar with. If that means RoR then RoR is fine. If it means PHP then PHP is fine too. Another option is to use the technology best suited for the product you're trying to build. For example, if you are building a managed cloud service, then building on top of K8s, Firecracker, or Nomad can be a good choice. But then it means you need to learn the tech being used inside out.
Also, he talks about all of this and then gives the example of WhatsApp at the end. WhatsApp chose Erlang for the backend and their frontends were written in Java and Objective-C. They could have chosen Java for the backend to keep the frontend and backend languages the same, but they didn't. They used Erlang because they based their architecture on Ejabberd, which was open source and built with Erlang. Also, WhatsApp managed all their servers themselves and didn't even move to managed cloud services when those became available. They were self-hosting until FB acquired them and moved them to FB data centres later on (Source: http://highscalability.com/blog/2014/2/26/the-whatsapp-archi...).
I don't think their advice about not using it in a startup is correct either. You just need to somewhat know what you're doing.
I know of such a case, where a single engineer could leverage the helm chart open source community, and set up a scalable infrastructure, with prometheus, grafana, worker nodes that can scale independently of web service, a CI/CD pipeline that can spin up complete stacks with TLS automated through nginx and cert-manager, do full integration tests, etc.
I found that to be quite impressive, for one person, one year, and would probably be completely impossible if it wasn't for k8s.
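To give an idea of what the "TLS automated through nginx and cert-manager" part looks like in practice, here is a minimal sketch of an Ingress manifest; the hostname, service name, and issuer name are placeholders, and it assumes the nginx ingress controller and cert-manager are already installed in the cluster:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
  annotations:
    # cert-manager sees this annotation and provisions a certificate automatically
    cert-manager.io/cluster-issuer: letsencrypt-prod   # placeholder issuer name
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - app.example.com                # placeholder hostname
      secretName: web-tls                # cert-manager stores the issued cert here
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web                # placeholder Service fronting the web app
                port:
                  number: 80
```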
The thing is, unless using those technologies was somehow core to what the single engineer was trying to do, it might be technically impressive but might not have actually provided value for users.
Users don't really care if you have a really impressive stack with cool technologies if it doesn't offer anything more than a couple of web servers and a DB server.
Right on. Previous devs at a company I joined wanted to play DevOps cowboys. They used Ansible scripts to spin up various AWS services, costing the company over 100K/yr.
A new lead came in and got rid of that crap by using 3rd-party services to spin up infrastructure. Got a load balancer, a few VMs + a DB. Reduced the cost by 85% and greatly simplified the entire stack.
I learned a really valuable lesson without having to make that mistake myself.
I understand why people get excited about tooling. It's cool to learn new things and automate stuff away. I'm prone to that myself and do this on my own server when I get that itch.
Having said that, it's wrong to foist this stuff onto an unsuspecting company where the owners don't know any better about tech; that's why they hire other people to handle it for them. Seeing that just left a bad taste in my mouth for overcomplicated setups.
I get that SV is different; that's why tools like K8s are made, and I would jump on those tools in a heartbeat as needed.
But for other smaller businesses, the truth is they just need a boring monolithic load-balanced app with a few VMs and a DB, sprinkled with 3rd-party services for logging or searching or other stuff not core to the business.
I know this utterly misses the larger point of your comment, but:
> They used Ansible scripts to spin up various AWS services
This seems less about using the "cool/new" tech... rather it's about using the "right" tech. Config management tools like Ansible/Chef/Puppet are very much previous-generation when it comes to cloud infrastructure.
They... can manage cloud infrastructure, but they were created prior to the ubiquity of cloud deployments, and the features are glued on. Not choosing a more modern IaC framework tells me they (those devs) were going to be making sub-optimal implementation decisions regardless.
Yeah, this project was several years old. Take this with a grain of salt since I'm not familiar with k8s timelines, but I would guess that it had not yet risen to the popularity it has in more recent years.
Rocket science isn't hard if you know it. Should we all build spaceships to deliver groceries? Good luck finding a few local rocket scientists in a pinch.
You can find plenty of auto mechanics though. Cars are cheaper and ubiquitous. Maybe they can't drive to the moon, but they can get most things done.
Unless your business is flying to the moon, stick to cars and trucks over spaceships.
I largely agree with the previous poster. If a single brief yaml file is too complicated, you're asking for too much from your tooling. I've seen hideous configuration management systems that did 1/100th of what kubernetes can do. There is some additional complexity around containers which is not a k8s complexity issue, and there are advanced features that most devs will never need to touch. Regardless, I'll take the complexity saved by k8s over custom built tooling or oversimplified PaaS any day.
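To make the "single brief yaml file" concrete, a stateless web app is roughly this much Deployment (name and image are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 2                  # raise this (or add an HPA) when you need to scale out
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: registry.example.com/web:1.0.0   # placeholder image
          ports:
            - containerPort: 8080
```

`kubectl apply -f web.yaml` and the cluster keeps two copies running, restarting them if they crash.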
His point is to use whatever simplifies workflow / reduces operational overhead. To some people, that indeed would be k8s. To you, that may be "managing a couple of webservers and a db". And that is great for you.
yeah I use k8s for basic webapps and it works wonderfully, and is way way easier than anything else, and yes I started developing in the 90s so I've seen it all. There is a bit of overhead in learning k8s, but once you know it, it's dead simple for every use case I've found and takes you way further than anything else.
> I know of such a case, where a single engineer could leverage the helm chart open source community, and set up a scalable infrastructure, with prometheus, grafana, worker nodes that can scale independently of web service, a CI/CD pipeline that can spin up complete stacks with TLS automated through nginx and cert-manager, do full integration tests, etc. I found that to be quite impressive, for one person, one year, and would probably be completely impossible if it wasn't for k8s.
But that's the thing though: they didn't do it alone. You literally pointed out that this wasn't true almost immediately: "leverage the helm chart open source community". They used the work of others to get to the result.
Also, I highly doubt they could debug something if it went wrong.
I simply cannot believe anyone would advocate, or believe, that because a Helm chart makes it simple to create a complex piece of infrastructure it must also be true that maintaining it is simple. It really isn't.
Once you understand the concepts it's not hard to debug. It's fair to acknowledge that kubernetes is complex, but also we should not ignore the real work that has been done in the past few years to make this stuff more accessible to the developers that want to learn it.
Also, saying it's not "alone" in this example is, I think, not fair. What would you count as "alone"? Setting up the Kubernetes cluster from scratch and writing your own Helm charts? Using that same logic, I couldn't say I've done anything alone, because someone else designed and built the hardware it's running on. I think it's fair to say that if someone, independent of coaching and regardless of the underlying infrastructure, produced some production-grade infrastructure by themselves, they certainly did it alone.
Using the Helm charts from the community is arguably still doing it alone. There isn't anything back and forth. It's just the right tool for the right job. But this starts being about language and semantics. It's like saying that following best practices on how to configure nginx isn't doing it alone, because someone else wrote them. Helm charts just very often expose what needs to be configured, and otherwise follow those best practices.
As for debugging: you do have a point that it becomes more difficult. But this also holds true for any of the alternatives discussed here (lambdas, terraform). I'd argue that when it all comes down to it, because you can spin up the entire infrastructure locally on things like minikube, it is many times easier to debug than other cloud-provider-only solutions.
How long did it take him to do this setup, a year you say, and that is impressive? I am not trying to be cute here; my question comes from a genuine place of curiosity. I'd love to learn to spin up a system like that, but from the tech/sales talks I see, I am made to believe this can be done in a day. Expectation management is important: if people say ops is just a solved problem, then I expect this to take very little time and to be easy to learn. Maybe I am learning the wrong thing here, and should learn Helm or something more high-level.
It took a year, but that was somewhat on the side of also building an OpenAPI-based web service and the gRPC-based workers. So it wasn't just the infrastructure stuff. If I were to estimate the time for just the infrastructure and devops tooling, then two months. It's been up and running with less than 15 minutes of downtime over the course of two years.
I do consider this impressive. And, to be clear, I wouldn't say this is because of a "super-developer". In fact, he had no prior k8s experience. Rather, there are thousands upon thousands of infrastructure hours devoted to the Helm charts, often maintained by the people who develop the services themselves. It is almost mind-boggling how much you get for almost free. Usually with very good and sensible default configurations.
In my previous workplace, we had a team of 5 good engineers purely devoted to infrastructure, and I honestly believe that all five would have been able to spend their time doing much more valuable things if k8s had existed.
As for whether or not such devops solutions could be done in a day: hm, I don't know. These things should be tailored to the problem. If you've done all of this a few times, then maybe you can adjust a bunch of charts that you are already familiar with and do what took a couple of months and impressed me in a couple of weeks. A lot more than just "helm install, done" goes into architecting a scalable solution: implementing monitoring, alerting and logging, load testing, etc.
That seems like a very negative take in my opinion. This 'simpler operational tech' would still need to be able to scale, correct? If you think that a good and easy way of deploying 10-15 services, all of which can scale, and all of it defined in rather neat code, is anything but "simple operational tech", then I believe you are confusing "solving a complex problem" with "simplifying the requirements of a complex problem". The latter has been stripped of many important features. K8s isn't anything magic, but it certainly isn't a bad tool to use. At least not in my experience, though I've heard of horror stories.
That does remind me that when that employee started, the existing "simple operational tech" was in fact to SSH into a VM and kill the process, git pull the latest changes, and start the service.
The only way you can solve the actual problem (not a simplified one) would in my opinion either be k8s or terraform of some kind. The latter would mostly define the resources in the cloud provider system, most of which would map to k8s resources anyways. So, I honestly just consider k8s to better solve what terraform was made for.
I'm sure the "simpler operational tech" meets few requirements for short disaster recovery. Unless you have infrastructure as code, I don't think that is possible.
>That seems like a very negative take in my opinion. This 'simpler operational tech' would still need to be able to scale, correct?
Premature optimization is a top problem in startup engineering. You have no idea what your startup will scale to.
If you have 1,000 users today and a 5-year goal of 2,000,000 users, then spending a year building infrastructure that can scale to 100,000,000 is an atrociously terrible idea. A good principal engineer can set up a working git hook, CircleCI integration, etc., capable of automated integration testing and rather close to CI/CD, in about a weekend. You can go from an empty repo to serving a web app as a startup in a matter of days. A whole year is just wasteful insanity for a startup.
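To be concrete about the weekend-sized version, the CircleCI half is roughly a single file like this (the image and commands are assumptions; swap in whatever your stack actually uses):

```yaml
# .circleci/config.yml -- run the test suite on every push
version: 2.1
jobs:
  test:
    docker:
      - image: cimg/node:18.17      # assumed Node project; use your own runtime image
    steps:
      - checkout
      - run: npm ci
      - run: npm test
workflows:
  test-on-push:
    jobs:
      - test
```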
The reality for start-ups running on investor money with very specific plans full of OKRs and sales targets is very different: you need to be building product as fast as possible and not giving any fuck about scale. Your business may pivot 5 times before you get to a million users. Your product may be completely different and green-fielded two times before you hit a million users.
I can't imagine any investor being ok with wasting a quarter of a million+ and a year+ on a principal engineer derping around with k8s while the product stagnated and sales had nothing to drive business -- about as useful as burning money in a pit.
You hire that person in the scale-up phase during like the third greenfield to take you from the poorly-performing 2,000,000 user 'grew-out-of-it' stack to that 100,000,000+ stack, and at that point, you are probably hiring a talented devops team and they do it MUCH faster than a year
If you have a website with 1000 users today and product is going to be re-designed 5 times, it's probably best just to use sqlite and host on a single smallish machine. Not all problems are like this however.
Yeah, to be honest, I run a k8s cluster now for my SaaS. But it's about 4 times more expensive than my previous company, which I ran on a VPS.
And scaling is the same: that VPS I could scale the same way, by running a resize in my hosting company's panel. (I don't use autoscaling atm.)
Only if I hit about 100x the numbers would I get the advantage of k8s, but even then I could just split customers across different VPSes.
CI/CD can be done well or badly with both.
And in practice K8s is a lot less stable. Maybe because I'm less experienced with K8s, but also because I think it's more complex.
To be honest, k8s is one of those dev tools that has to reinvent every concept again, so it has its own jargon. And then there are these ever-changing tools on top of it. It reminds me of JS a few years ago.
Any startup that knows what their product is and is done with PoCs should be able to deal with the consequences of succeeding without failing. Scaling is one of those things that should be in place before you need it. In our case, scaling was a main concern.
and ... you might be justified in that concern. However... after having been in the web space for 25+ years, it's surprising to me how many people have this as a primary concern ("we gotta scale!") while simultaneously never coming close to having this concern be justified.
I'm not saying it should be an either/or situation, but... I've lost count of how many "can it scale?" discussions I've had where "is it tested?" and "does it work?" almost never cross anyone's lips. One might say "it's assumed it's tested" or "that's a baseline requirement" but there's rarely verification of the tests, nor any effort put in to maintaining the tests as the system evolves.
EDIT: so... when I hear/read "scaling is a main concern" my spidey-sense tingles a bit. It may not be wrong, but it's often not the right questions to be focused on during many of the conversations I have.
> I'm not saying it should be an either/or situation, but... I've lost count of how many "can it scale?" discussions I've had where "is it tested?" and "does it work?" almost never cross anyone's lips.
Also, discussions about rewrites to scale up service capacity, but nobody has actually load tested the current solution to know what it can do.
Just keep it simple, and if you take off scale vertically while you then work on a scalable solution. Since most businesses fail, premature optimisation just means you're wasting time that could have gone on adding more features or performing more tests.
It's a trap many of us fall into - I've done it myself. But next time I'll chuck money at the problem, using whatever services I can buy to get to market as fast as possible to test the idea. Only when it's proven will I go back and rebuild a better product. I'll either run a monolith or 1-2 services on VPSs, or something like Google cloud run or the AWS equivalent.
I should perhaps have clarified, but the 10-15 are not self-maintained services. You need nginx for routing and ingress; set up cert-manager so that ingress endpoints are automatically configured with TLS; deploy Prometheus, which comes with node-exporter and Alertmanager; deploy Grafana.
So far, we're up at 6 services, yet still at almost zero developer overhead cost. Then add the SaaS stack for each environment (api, worker, redis) and you're up at 15.
Sometimes it's faster to implement certain features in another language and deploy them as a microservice instead of fighting your primary language/framework to do it. Deploying microservices in k8s is as easy as writing a single yaml file.
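As a sketch of what that single file can look like for, say, a worker written in another language (all names and the image are made up):

```yaml
# Deployment for the worker plus a Service that exposes it inside the cluster
apiVersion: apps/v1
kind: Deployment
metadata:
  name: image-resizer
spec:
  replicas: 1
  selector:
    matchLabels:
      app: image-resizer
  template:
    metadata:
      labels:
        app: image-resizer
    spec:
      containers:
        - name: image-resizer
          image: registry.example.com/image-resizer:0.1.0   # placeholder image
          ports:
            - containerPort: 9000
---
apiVersion: v1
kind: Service
metadata:
  name: image-resizer
spec:
  selector:
    app: image-resizer
  ports:
    - port: 80
      targetPort: 9000
```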
I am not privy to the details of the case, but a rule-of-thumb I heard once is that if it's far enough from your core, a SaaS can be used (obviating the whole question), and if it's part of the core, start by developing it as a separate functionality before moving it to another service.
In a lot of cases it's pattern abuse. I'm dealing with this all the time. People like to split things that can work perfectly as one whole, just for the sake of splitting it.
for example lambda (not microservices, running mini monoliths per lambda function)
yes by simple I mean covering high availability requirements, continuous deployment, good DORA measures - not simple as in half-baked non-functional operations (such as manually sshing to a server to deploy)
Ah, I see. Well, lambdas are also a nice tool to have, but they certainly do not fit all applications (same as with k8s). I'd also point out that lambdas replace only a rather small subset of k8s's capabilities and of the types of systems you can put together. You would end up needing to set up the rest either through a terrible AWS UI or terraform. Neither of which I find to simplify things all that much, but perhaps this is a matter of taste.
In our case, the workers were both quite heavy in size (around 1 GB), and heavy in number crunching. For this reason alone (and there are plenty more), lambdas would be a poor fit. If you start hacking them to keep them alive because of long cold starts, you would lose me at the simple part.
Having very recently done this (almost, another dev had half time on it) solo, It's not _too_ terrible if you go with a hosted offering. Took about a month/month and a half to really get set up and has been running without much of a blip for about 5 months now. Didn't include things like dynamic/elastic scaling, but did include CD, persistent volumes, and a whole slew of terraform to get the rest of AWS set up (VPCs, RDS, etc). I'd say that it was fairly easy because I tinkered with things in my spare time, so I had a good base to work off of when reading docs and setting things up, so YMMV. My super hot take, if you go hosted and you ignore a ton of the marketing speak on OSS geared towards k8s, you'll probably be a-ok. K8s IME is as complex as you make it. If you layer things in gradually but be very conservative with what you pull in, it'll be fairly straightforward.
My other hot take is to not use Helm but rather something like jsonnet or even cue to generate your yaml. My preference is jsonnet because you can very easily make a nice OO interface for the yaml schemas with it. Helm's approach to templating makes for a bit of a mess to try and read, and the values.yml files _really_ leak the details.
With 1YoE I did most of that in about 3 months. Had a deadline of 6 months to get something functional to demonstrate the proposed new direction of the company, and I did just that. If I were to do it today I could probably rush it to a week, but that would mean no progress on the backend development that I was doing in parallel. A day is probably doable with more on-rails/ batteries included approaches.
Not because I'm amazing, but there's a frankly ridiculous amount of information out there, and good chunks of it are high quality too. I think I started the job early January, and by April I had CI/CD, K8s for backend/frontend/DBs, Nginx (server and k8s cluster), auto-renewing certs, Sentry monitoring, Slack alerts for ops issues, K8s node rollback on failures, etc.
The best way to learn, is to do. Cliche, but that's what it really comes down to. There's a fair few new concepts to grasp, and you probably have picked some of these up almost by osmosis. It sounds more overwhelming than it is, truly.
The problem is never spinning things up, it's in maintenance and ops. K8s brings tons of complexity. I wouldn't use it without thinking very carefully for anything other than a very complex startup while you're finding product-market fit.
You can get a majority of those things "running" in few days. If you don't want it to fall over every other day, then you need to have a ton of ancillaries which will take at least several months to set up, not to mention taking care of securing it.
Use a managed k8s cluster (EKS, AKS or GKE). Creating a production-ready k8s cluster on VMs or bare metal can be time consuming. Yes, you can do lambda, serverless, etc., but k8s gives you the same thing and is generally cheaper.
It's actually pretty easy to do these days, even on bare metal servers. My go to setup for a small bare metal k8s cluster:
- initial nodes setup: networking configuration (private and public network), sshd setup (disallow password login), setting up docker, prepping an NFS share accessible on every node via the private network
- install RKE and deploy the cluster, deploy the nginx ingress controller (a minimal cluster.yml is sketched below)
- (optional) install rancher to get the rest of the goodies (grafana, istio, etc). These eat a lot of resources though, so I usually don't do this for small clusters
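For reference, the cluster.yml that RKE consumes can be as small as this (addresses and SSH user are placeholders; everything omitted falls back to RKE defaults):

```yaml
# cluster.yml -- consumed by `rke up`; a small three-node cluster sharing all roles
nodes:
  - address: 10.0.0.1          # private-network IPs from the node prep step
    user: ubuntu               # SSH user that can run docker
    role: [controlplane, etcd, worker]
  - address: 10.0.0.2
    user: ubuntu
    role: [controlplane, etcd, worker]
  - address: 10.0.0.3
    user: ubuntu
    role: [controlplane, etcd, worker]
ingress:
  provider: nginx              # RKE deploys the nginx ingress controller for you
```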
I agree with that; setting up k8s on bare metal took me 2 days, and we needed it to deploy elastic and some other helm charts as quickly as possible without losing our minds maintaining nodes with some clunky shell scripts.
It also immediately bought us an easy approach to building GitLab CI/CD pipelines + different environments (dev, staging, production) on the same cluster. It took me a week to set everything up completely, and it has saved our team, which is rapidly developing features, a lot of time and headache since then. But the point is, I knew how to do it, focused on the essentials, and delivered quick, reasonable results with large leverage down the road.
> deploy elastic and some other helm charts as quickly as possible
Bad culture alert! No one needs Elastic "as quickly as possible" unless their business, or the business they work for, is being very poorly run.
I would also argue that you might have got it running quickly, but how are you patching it? Maintaining it? Securing it? Backing it up? Have you got a full D/R plan in place? Can you bring it back to life if I delete everything within 3-6 hours? Doubt it.
> maintaining nodes with some clunky shell scripts.
Puppet Bolt, Ansible, Chef, ...
There are so many tools that are easy to understand that solve this issue.
That's all solved for you: helm upgrade in CI/CD and bumping versions has been straightforward, and if not, snapshot rollback via Longhorn, which also covers DR. Accidentally deleted data => get the last snapshot, and in 5 minutes it's back (though of course there is CI/CD in place for new code + no write permissions for devs on the cluster, and "sudden" data deletion is somewhat rare).
The Elastic use case is crawling a crazy amount of data and making it searchable, aggregatable, and historically available; I don't know of any solution other than Elastic that has reasonable response times and easy-to-use access (plus we can add some application logging and APM).
> Puppet Bolt, Ansible, Chef, ...
Helm chart values.yaml and you’re all set, security + easy version bump included.
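To illustrate, the overrides are usually just a short values file passed to helm; the keys below are illustrative only, since every chart defines its own schema (check the chart's default values.yaml for what it actually exposes):

```yaml
# my-values.yaml -- e.g. `helm upgrade --install es elastic/elasticsearch -f my-values.yaml`
# key names are chart-specific; these are common ones, not guaranteed for every chart
replicas: 3
resources:
  requests:
    cpu: "1"
    memory: 2Gi
  limits:
    memory: 2Gi
```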
I believe Elastic is available as a service from AWS and elastic.co; if you need it fast, use that. If you need it long term, it may be worthwhile to deploy your own for cost and flexibility purposes.
> with prometheus, grafana, worker nodes that can scale independently of web service, a CI/CD pipeline that can spin up complete stacks with TLS automated through nginx and cert-manager, do full integration tests, etc.
> I found that to be quite impressive, for one person, one year, and would probably be completely impossible if it wasn't for k8s.
I've always found this interesting about web-based development. I have no idea what Prometheus, Grafana, etc. do. I've never used k8s.
And yet, as a solo dev, I've written auto-scaling architecture using, for example, the AWS EC2 APIs that let you launch, configure, and shut down instances. I don't know what else you need.
Really the only advantage I see to this morass of services is that you get a common language, so other devs can have a slightly easier time picking up where someone left off. As long as they know all the latest bs.
In short: Prometheus is a service that knows about all your other services. Each service of interest can expose an endpoint that Prometheus scrapes periodically. So the services just say what their current state is, and Prometheus, since it keeps asking, knows and stores what happens over time. Grafana is a web service that uses Prometheus as a data source and can visualize it very nicely.
Prometheus also comes with an Alertmanager, where you can set up rules for when to trigger an alert, which can end up as an email or a Slack integration.
They are all very useful, and give much-needed insight into how things are going.
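In config terms, the "keeps asking" part is just a scrape job in prometheus.yml (or, on k8s, a ServiceMonitor generated for you); the job name and target here are made up:

```yaml
# prometheus.yml fragment: poll one service's /metrics endpoint every 30 seconds
scrape_configs:
  - job_name: api                         # placeholder service name
    scrape_interval: 30s
    metrics_path: /metrics
    static_configs:
      - targets: ['api.default.svc:8080']   # placeholder in-cluster address
```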
> And yet, as a solo dev, I've written auto-scaling architecture using, for example, the AWS EC2 APIs that let you launch, configure, and shut down instances. I don't know what else you need.
This is fine, if you’re on AWS and can use AWS APIs. If you’re not (especially if you’re on bare metal), something like K8s can be nice.
I don't know the AWS EC2 APIs and for sure I'm not capable of writing auto-scaling architecture. This is the reason why I default to K8s. I have used it easily and successfully by myself for the last 4 years. It just keeps on running and hasn't given me any problems.
I've seen places hire a dev to write all the ops stuff so they could scale awesomely. I mean, if they had just purchased 100 servers full-time on Amazon, they would have spent a fraction of what that scaling work cost, but hey, they could scale.
In 5 years I think they've never once had to come even near the 100 servers.
At the same time. I can scale heroku to 500 servers, and still be under the cost of one ops person. I can make that change and leave it there. I can do that all in under 30 seconds. Oh. And CICD is built in as a github hook. Even with blue-green deploys.
I think his point was most start-ups don't need to scale more than a site like heroku can offer. If you need more than 500 servers running full time then it's time to start looking to "scale"
> At the same time. I can scale heroku to 500 servers, and still be under the cost of one ops person. I can make that change and leave it there. I can do that all in under 30 seconds. Oh. And CICD is built in as a github hook. Even with blue-green deploys.
And then Heroku shuts down.
If you're building something that needs to scale up rapidly if it succeeds, k8s is worth thinking about. Either you don't succeed, in which case it doesn't matter what your stack was, or you do, in which case you'll be glad that you can scale up easily, you'll be glad you are using a common platform which is easy to hire competent people in, and, if you were smart about how you used k8s, you'll be glad that you can relatively easily move between clouds or move to bare metal.
I think the set of cases where "we need to scale up rapidly if it succeeds" and "Kubernetes solves all of our scaling needs and we aren't going to have problems with other components" is almost empty. On the other hand, there are quite a lot of startups that fail because they put too much focus on the infrastructure and Kubernetes and the future and too little on the actual product for the users. Which is the point of the article, I think. Ultimately what matters is whether you sell your product or not.
> I think the set of cases where "we need to scale up rapidly if it succeeds" and "Kubernetes solves all of our scaling needs and we aren't going to have problems with other components" is almost empty.
I agree, but so what? K8s isn't magic, it won't make all your problems go away, but if you have people who are genuinely skilled with it, it solves a lot of problems and generally makes scaling (especially if you need to move between clouds or move onto bare metal) much smoother. Of course you'll still have other problems to solve.
Given that most startups never need to scale up much, it's not surprising that k8s is mostly used where it's not needed. But people usually prefer not to plan for failure, so it's also not surprising that people keep using it.
I mean, you still have to invest time on putting k8s to work, get people skilled with it, maintain and debug the problems... If Kubernetes didn't cost anything to deploy I'd agree that using it is the better idea, but it costs time and people, and those things might be better invested in features that matter to the users.
It depends. There are many things that carry a cost early but pay for themselves many times over later. Whether that will be the case for your startup depends whether you end up needing to scale quickly or not.
It's also worth considering that appropriate use of k8s can quite likely save you time and money early on as well. It standardises things, making it very easy for new ops people to onboard, and you might otherwise end up spending time reinventing half-baked solutions to orchestration problems anyway.
> It depends. There are many things that carry a cost early but pay for themselves many times over later. Whether that will be the case for your startup depends whether you end up needing to scale quickly or not.
Well, precisely what I said is that 99.9% of startups won't find themselves in a situation where they need to scale quickly and the only scale problems they find can be solved with Kubernetes.
> It's also worth considering that appropriate use of k8s can quite likely save you time and money early on as well. It standardises things, making it very easy for new ops people to onboard, and you might otherwise end up spending time reinventing half-baked solutions to orchestration problems anyway.
The point is that you might not even need orchestration from the start. Instead of thinking how to solve an imagined scenario where you don't even know the constraints, go simple and iterate from that when you need it with the actual requirements in hand. And also, "make it easier for new ops people to onboard" doesn't matter if you don't have a viable product to support new hires.
You seem to be describing very early stage companies, and if so I agree, host it on your laptop if you need to, it makes zero difference. But it's not binary with Netflix on one side and early stage on the other.
There are a lot of companies in the middle, and following dogma like "you don't need k8s" leads them to reinvent the wheel, usually badly, and consequently waste enormous amounts of time and money as they grow.
Knowing when is the right time to think about architecture is a skill; dogmatic "never do it" or "always do it" helps nobody.
What about CD of similar but not identical collections of services to metal? No scaling problem, other than the number of bare metal systems growing, and potentially the variety of service collections. For instance, would you recommend k8s to Tesla for the CD of software to their cars?
Meanwhile, random_pop_non-tech_website exploding in traffic wasn't set up to scale despite years of actively seeking said popularity through virtually any means and spending top dollar on hosting, and it slows down to a crawl.
"Why no k8s?", you ask, only to be met with incredulity: "We don't have those skills", says the profiteering web agency. Sure, k8s is hard… Not. Nevermind that it's pretty much the only important part of your job as of 2022.
Obviously not, I was just pointing out that infra like k8s even under-the-hood for intermediaries (like web agencies) is still not always the norm given the real-world failures. There's this intermediary world between startups and giant corporations, you know. ;-)
>infra like k8s even under-the-hood for intermediaries (like web agencies) is still not always the norm
That's because 'the norm' for web agencies is a site that does basically zero traffic. If a company hires a 'web agency' that's by definition because the company's business model does not revolve around either a web property or app.
Whether that's a gas station company or a charity or whatever, the website is not key to their business success and won't be used by most customers apart from incidentally.
With that in mind most agencies know only how to implement a CMS and simple deployment perhaps using Cloudflare or a similar automated traffic handling system. They don't know anything about actual infrastructure that's capable of handling traffic, and why would they?
A lot of agencies are 100% nontechnical (i.e. purely designers) and use an MSP to configure their desktop environment and backups, and a hosting agency to manage their deployed sites.
I very much agree with you. I must have been unnecessarily critical in my initial comment, I did not mean it as a rant, more like an observation about where-we're-at towards what seems an inevitable conclusion to me. Sorry that came out wrong, clearly I got carried away.
In asking if "Kubernetes is a red flag signalling premature optimisation", you correctly explain why we're yet on the "yes" side for the typical web agency category.
[Although FWIW I was hinting at a non-trivial category who should know better than not to setup a scale-ready infra for some potentially explosive clients; which is what we do in the entertainment industry for instance, by pooling resources (strong hint that k8s fits): we may not know which site will eventually be a massive hit, but we know x% of them will be, because we assess from the global demand side which is very predictable YoY. It's pretty much the same thing for all few-hits-but-big-hits industries (adjust for ad hoc cycles), and yes gov websites are typically part of those (you never know when a big head shares some domain that's going to get 1000x more hits over the next few days/weeks), it's unthinkable they're not designed to scale properly. Anyway, I'm ranting now ^^; ]
My unspoken contention was that eventually, we move to a world where k8s-like infra is the de facto norm for 99% of infrastructure out there, and on that road we move to the "no" side of the initial question for e.g. web agencies (meaning, we've moved one notch comparable to the move from old-school SysAdmin to DevOps maybe, you know those 10 years circa 2007-2018 or so).
[Sorry for a too-terse initial comment, I try not to be needlessly verbose on HN.]
>My unspoken contention was that eventually, we move to a world where k8s-like infra is the de facto norm for 99% of infrastructure out there, and on that road we move to the "no" side of the initial question for e.g. web agencies (meaning, we've moved one notch comparable to the move from old-school SysAdmin to DevOps maybe, you know those 10 years circa 2007-2018 or so).
This is very very hard to parse BTW. I don't want to reply to what you've written because I can't determine for sure what it is that you're saying.
Essentially I mean: scalable infra may be premature optimization today in a lot of cases, but eventually it becomes the norm for pretty much all systems.
You could similarly parse the early signs of a "devops" paradigm in the mid-2000's. I sure did see the inception of the paradigm we eventually reached by 2018 or so. Most of it would have been premature optimization back then, but ten-ish years later the landscape has changed such that a devops culture fits in many (most?) organizations. Devops being just one example of such historical shifts.
I anticipate the general k8s-like paradigm (generic abstractions on the dev side, a full 'DSL' so to speak, scalable on the ops side) will be a fit for many (most?) organizations by 2030 or so.
> Either you don't succeed, in which case it doesn't matter what your stack was, or you do, in which case you'll be glad that you can scale up easily
This take brushes right past the causes of success and failure. Early stage success depends on relentless focus on the right things. There will be 1000 things you could do for every 1 that you should do. Early on this is going to tend to be product-market fit stuff. If things are going very well then scalability could become a concern, but it would be a huge red flag for me as an investor if an early stage company was focusing on multi-cloud.
I certainly wouldn't recommend that anyone "focus on multi-cloud" in an early-stage company (unless of course multi-cloud is a crucial part of their product in some way).
Kubernetes is basically an industry standard at this point. It's easy to hire ops people competent in it, and if you do hire competent people, it will save you time and money even while you are small. As an investor "we use this product for orchestration rather than trying to roll our own solutions to the same problems, so that we can focus on $PRODUCT rather than reinventing half-baked solutions to mundane ops problems" should be music to your ears.
I agree with all of that. That said, I don't think competence is a binary proposition, and if you hire people who have only worked at scale they will be calibrated very differently to the question of what is table stakes. One of the critical components of competence for early stage tech leadership is a keen sense of overhead and what is good enough to ratchet up to the next milestone.
As many problems as containerization solves, it's not without significant overhead. Personally I'm not convinced the value is there unless you have multiple services which might not be the case for a long time. You can get huge mileage out of RDS + ELB/EC2 using a thinner tooling stack like Terraform + Ansible.
The overhead of containerisation is mostly in the learning curve for teams that are not already familiar with it (and the consequent risk of a poor implementation). A well designed build pipeline and deployment is at least as efficient to work with as your Terraform+Ansible.
If you have such a team, it can of course make sense to delay or avoid containerisation if you don't see obvious major technical benefits.
But those teams will get rarer as time goes on, and since we're talking about startups, honestly it would be questionable to build a new ops team from people with no containers knowledge in 2022.
Success is rarely so rapid that you can't throw money at a problem temporarily and build something more robust.
No one is advocating for a single server running in your closet, but a large and well funded PaaS can handle any realistic amount of growth at least temporarily, and something like Heroku is big enough (and more importantly, owned by a company big enough) that shutting down without notice is not a possibility worth considering.
Almost every k8s project I've looked at in the last few years is database bound. k8s is not really going to solve their scaling needs. They needed to plan more up front about what their application needed to look like in order to avoid that.
Yes, if your application looks like a web application that is cache friendly, k8s can really take you a long way.
In case it's not clear, nothing in my comment suggests that k8s will magically solve all your problems. It just provides abstractions that make growth (in size and complexity) of systems easier to manage, and helps to avoid lock-in to a single cloud vendor. The wider point is that thinking about architecture early will make scaling easier, and for most companies, k8s is likely to end up being a part of that.
The "web application" / cache-friendly part of your comment doesn't make much sense to me; k8s is pretty well agnostic to those kinds of details. You can orchestrate a database-bound system just as well as you can anything else, of course.
> if you were smart about how you used k8s, you'll be glad that you can relatively easily move between clouds or move to bare metal.
I'd argue you should indeed consider a multi-cloud strategy from the get-go in 2022. Something like Terraform helps statically set up k8s clusters on most clouds. Especially for startups, it's better to default to vanilla stuff and only complicate on a need-to basis.
Yes, completely agreed. Multi-cloud is really not that difficult nowadays, and it puts you in a better negotiating position (when you end up spending enough to be able to negotiate), as well as giving you more location flexibility and the ability to pick and choose the best services from each cloud.
Oh yes, negotiation is a strong argument in that context. One that makes or breaks a CTO's mission, me thinks, if that company expects a lean path to ROI.
A multi-cloud paradigm is also a great way to teach you about your application and about those clouds themselves. A good reminder that "implementation is where it's at", and "the devil is in the details".
The fact that they purchased 100 nodes has nothing to do with k8s but with their incompetence. You can run it on one machine. Also you can set up auto scaling easily based on whichever parameters.
That isn't the point. If he had a whole year, was there a tangibly better use of his time to get a product to market faster? What might the business implications be for doing or not doing so?
It seems many are focused on the time estimate. That was for creating the overall solution. About two months of it was to set up the infrastructure mentioned.
These often get developed side by side. GitLab, unit tests, api-server, nginx, cert-manager, deployments, integration tests, prometheus, metrics in services, grafana, alert-manager, log consolidation, work services and scaling, etc.
Just spinning up a cluster, nodepool, nginx, cert-manager w/ Let's Encrypt cluster issuer, prometheus, and grafana can easily be done in a day. So time estimates kinda depend entirely on what you mean by them.
Spinning up prometheus and grafana with automatic service discovery: one day.
Making good metrics, visualizations, and alerts: everything from a week to a month or two. So, take the time estimates with a grain of salt.
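For example, the "cert-manager w/ Let's Encrypt cluster issuer" item really is day-one material; the whole issuer is one short manifest like this (the email is a placeholder):

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@example.com                 # placeholder contact for expiry notices
    privateKeySecretRef:
      name: letsencrypt-prod-account-key   # secret holding the ACME account key
    solvers:
      - http01:
          ingress:
            class: nginx                   # answer HTTP-01 challenges via the nginx ingress
```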
As a rule, for anything in a startup, adding that "scalable" adjective is a waste.
Of course, exceptions exist, but it's one of those problems that if you have them, you probably know it, and if you have them and don't know it, you won't succeed anyway. So any useful advice has the form of "don't".
In a typical startup you are going to be strapped for cash and time. You need to optimize and compromise. In general, if you don't know how to do something, then figure out whether you need to learn it, or whether you can get by with a "good enough" solution, because there will be a queue of other things you need to do that might be more business-critical.
So if you already know Kubernetes, then great, use it. Leverage your existing skills. If you don't, then just use Heroku or fly.io or whatever, or go with AWS if that's your competence. Maybe revisit in a year or two and maybe then you'll have funding to hire a devops person or time to spend a week or two learning to do it yourself. Right now you want to get your SAAS MVP in front of customers and focus on the inevitable product churn of building something people want to pay for. The same advice goes for anything else in your stack. Do you know it well enough? Do you need it right now? Or is there a "good enough" alternative you can use instead?
And yet, to me it sounds like NIH since it's a pretty standard stack; couldn't they just get something like google app engine and get all of that from day one? Because did any of those things mentioned result in a more successful company?
I'd argue that using helm charts is the exact opposite of NIH. The things that take time are not the stack themselves, but the software and solutions. K8s just makes the stack defined in code, and written and managed by dedicated people (helm maintainers) as opposed to a bit "all over the place" and otherwise in-house, directly using cloud provider lock-in resources.
I'm sure there are plenty of use cases where that makes sense, and is a better approach. But, I disagree that k8s suggests a NIH-mindset.
Most startups get basic security for networking and compute wrong, K8s just adds even more things to mess up. Odds are even if you use an out of the box solution, unless you have prior experience you will get it wrong.
I will always recommend using whatever container / function-as-a-service offering (e.g. ECS, GCF, Lambda) any day over K8s for a startup. With these services it's back to more familiar models of security, such as networking rules, dependency scanning, authorization and access...
So question then - is it possible to found a tech startup without paying rent to a FAANG? Before I get the answer that anything is possible, I should say is it feasible or advisable to start a company without paying rent to the big guys?
The reality is unless you’re some rich dude who can borrow dad’s datacenter (And that’s cool if so), you’re either going to be renting colo space, virtual servers, etc.
It’s always a challenge in business to avoid the trap of spending dollars to save pennies.
IMO, you’re better off working in AWS/GCP/Azure and engineering around the strengths of those platforms. That’s all about team and engineering discipline. I’m not in the startup world, but I’ve seen people lift and shift on-prem architecture and business process to cloud and set money on fire. Likewise, I’ve seen systems that reduced 5 year TCO by 80% by building to the platform strengths.
I'm aware that no man is an island in some sense, but I'm not comfortable with locking myself into one of 3 companies who need to increase their revenue by double digits year over year. And as you say, a lift and shift is basically setting money on fire. Currently I run sort of a hybrid approach with a small IaaS provider and a colo. It seems to work well for us both technically and financially though that seems to go contrary to what is considered conventional wisdom these days.
That’s awesome. The most important thing is to understand why you’re making the decisions that you do.
Where I work, we can deliver most services cheaper on-prem due to our relative scale and cloud margins. But… we're finding that vendors in the hardware space struggle to meet their SLAs. (Look at HPE — they literally sold off their field services teams and only have 1-2 engineers covering huge geographic regions.) So, increasingly, critical workloads make the most sense in the cloud.
If your priorities are 'which companies do my values align with, among generally very high integrity companies to begin with' - then you might want to reconsider.
Google is not evil. They're just big, and define some practices which we might think should change in the future.
Once you have the thing up and running, you can start to think about hosting your own.
Also, you don't need to use fancy services because most startups can run just fine on a single instance of whatever, meaning, there are a lot of cloud providers out there.
If and only if your business model depends on it. A startup's job is mostly to find product market fit; if being decoupled from AWS isn't part of your market, you are spending money on a non-problem.
There is nothing stopping you from hosting your own OpenStack, managed k8s, and all that, on your own hardware. You would need a good reason to not let someone else deal with all of this though.
For a small enough company you could even just use k3s + offsite backups. Once you grow large enough you can set up machines in 2-4 locations across the land mass where your users exist. If you have enough of them, then a hardware fault in one isn't an emergency, and you'd be able to fly out to fix things if needed.
Realistically, on all flash, you are very unlikely to need to maintain anything on a server for a few years after deployment.
That is probably a good idea for many startups. However, once you get into the world of audits and compliance certifications, things become a lot harder. But then again, at that point, I suppose it is easy enough to transition to some managed hardware.
Agree the author is wrong on that specific point, though thankfully the bulk of the article content deals with the headline, and is mostly fine wrt k8s.
Rather than the author "not knowing" what they're talking about, I suspect they're taking narrow experience and generalising it to the entire industry. Their background is selling k8s as a solution to small/medium enterprises: it strikes me that there may be a strong correlation between startups interested in that offering and those deploying failed overengineered multilang micro-architectures. Suspect the author has seen their fair share of bad multilang stacks and not a lot of counter examples.
The whole advice of using the same language is especially silly - iOS is stuck with Swift, and the web is stuck with JS, and maybe you need an application that scales using actors across multiple machines with Golang or Java, or maybe you need to plug into Windows tightly and need C#.
Kubernetes is not 'harder' if all you need is to host a webapp. Where it falls on the hardness spectrum depends on what you are trying to do, and what is the alternative. I am very fluent with Kubernetes but have no skills in managing traditional virtual machines.
> The whole advice of using the same language is especially silly - iOS is stuck with Swift, and the web is stuck with JS, and maybe you need an application that scales using actors across multiple machines with Golang or Java, or maybe you need to plug into Windows tightly and need C#
And you're also forgetting Android and macOS and Linux.
That's why cross-platform frameworks like Electron and React Native are so popular. The time wasted in going native for every single platform is just infeasible for most non-huge companies.
But you could also have 2 people working on React Native and have 1 person each for getting it to play nice with iOS/Android, and eliminate the need for an extra engineer.
Well, if React Native is anything like the many React websites, then this isn't too far off actually. "Modern" websites can already send your CPU puffing when you hover over some element with your mouse pointer and it triggers some JS-emulated style for :hover.
tss.. some people don't like being reminded that their favourite tech performs worse on an Nvidia 3090 than WinForms did on an 800 MHz CPU running Windows 98
Choosing more than one language as a startup can become really expensive quickly. As long as your tribes are small, chances are high that you one day run out of, e.g., Python developers while you still have a lot of Java guys (or vice versa). This introduces unnecessary pain. (And obviously, you should have used Rust or Haskell from the get-go for everything.)
The sole exception to this rule I would make is javascript which is more or less required for frontend stuff and should be avoided like the plague for any other development. As soon as you can get your frontend done in Rust, though, you should also switch.
Idk, I am someone who has looked at many programming languages, including all of those you mentioned. But a capable developer can be expected to learn a new language over the course of a few weeks if needed. I don't see how you could "run out of devs of language x" if you have capable devs on board, especially when those languages are all in the same programming language family/club.
Even the most capable developer that learns a new language in a few weeks will not be an expert in it. The difference in productivity and quality of the code will be huge. This is because in different languages things can be done very differently, it is not about the syntax as much as the best ways to do things.
I also thought WhatsApp was a bad example. They not only hosted themselves, but they used solely FreeBSD (as far as I know) on their servers (which, don't get me wrong, I find great as a FreeBSD sysadmin myself).
Using WhatsApp as an example of a lean engineering org should almost be banned at this point. WhatsApp had a high performing engineering team that used basically the perfect set of tools to build their application (which also had a narrow feature scope; plaintext messaging). Even with hindsight there is very little you could do to improve on how they executed.
Just because WhatsApp scaled to almost half a billion users with a small engineering team doesn't mean that's the standard, or even achievable, for almost all teams.
>Doesn't look like the author knows what he is talking about.
This was my first thought, and I was going to comment so, but saw you already did. The only reason we see this comment is because HN has an irrational hate of K8s; for those of us that do run things in production at scale, k8s is the best option. The rest is either wrapped in licenses or lacks basic functionality.
I suspect a lot of the gripes and grousing about Kubernetes comes from SMEs trying to run it themselves. That will often result in pain and cost.
Kubernetes is a perfectly good platform for any size of operation, but until you are a large org, just use a managed service from Google/Amazon/DigitalOcean/whoever. Kubernetes, the data plane, is really no more complex than e.g. Docker Compose, and with managed services, the control plane won't bother you.
K8s allows composability of apps/services/authentication/monitoring/logging/etc in a standardised way, much more so than any roll-your-own or 3rd-party alternative IMO, the OSS ecosystem around it is large and getting larger, and the "StackOverflowability" is strong too (ie you can look up the answers to most questions easily).
So, TLDR, just use a managed K8s until you properly need your own cluster.
Exactly! I chose k8s managed by Google years ago. As a solo developer you can have a cluster up and running with a web server, fronted by a Google LB with Google managed certificates in under an hour. Just follow one of the hundred tutorials. Just as quick and easy as setting up a single VM. But that really isn't the point is it. If that is all I needed, yes I'd use a $10 VPS. But for an "application", not a web site, you always need more.
My k8s "cluster" is a single node that doesn't cost me any more than a VM would. I don't need the scalability at the moment. But I do need different environments for dev, qa, and prod. And all three are running identically next to each other using Namespaces. Saved us a ton of maintenance and cost.
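Concretely, the dev/qa/prod separation is just three Namespace objects, with the same application manifests applied into each (names are just examples):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: dev
---
apiVersion: v1
kind: Namespace
metadata:
  name: qa
---
apiVersion: v1
kind: Namespace
metadata:
  name: prod
```

Then `kubectl apply -n dev -f app/` (and likewise for qa and prod) keeps the three environments identical apart from their configuration.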
Any project that grows has its needs change. GKE gives you a ton of integrated tools right from the start including logging, alerting, metrics, easy access to hosted databases, pub/sub, object storage, easy and automatic network setup, easy firewall setup, dns management, and a lot more. k8s is no different than using any other hosted service. It provides a great set of features that you configure using fairly consistent yaml configuration files. And it is all accessible from the web based "Google Console" as well.
Learning the k8s yaml format and some basic kubectl commands is all you need to get going and it saves a TON of time that can go back into developing your application rather than dealing with configuring disparate pieces with their own configuration methods.
I was fairly early to k8s while they were still competing with other similar solutions and other tools like Puppet and Chef. I tested all of them and truthfully, k8s was the easiest to learn, implement, and maintain my app with. Using GKE of course. I would NEVER as a one man or even small team of developers take on managing an installation of k8s myself.
> I think the most appropriate advice is to choose a stack which the founding team is most familiar with. If that means RoR then RoR is fine. If it means PHP then PHP is fine too.
Taking human resource information into consideration sounds very wise. Although learning a new language is generally not that huge a barrier, changing your whole stack once the minimum viable product cap is passed can be very expensive. And if you need to scale the team, the available developer pool is not the same depending on which technology you have to stick with.
It doesn’t invalidate your point, but maybe it brings some relevant nuances.
> But the next advice about not using a different language for frontend and backend is wrong.
Being charitable, what I think they are getting at is maybe more about having fully separated frontend and backend applications (since the front-end examples he gives are not languages but frameworks / libraries). Otherwise it seems really backwards - I'm definitely an advocate of not always needing SPA-type libraries, but using literally zero Javascript unless your backend is also JS seems like it goes to a too-far extreme.
Re: single language, there's a grain of truth to it - see http://boringtechnology.club/ - but that one mainly says there is a cost to adding more and more languages. When it comes to back- and frontend though, I would argue there is a cost to forcing the use of a single language. e.g. NodeJS is suboptimal, and web based mobile apps are always kinda bleh.
"I think the most appropriate advice is to choose a stack which the founding team is most familiar with."
I'd think that's exactly what typically happens most of the time. But the degree of stack-lockin that occurs with startups still surprises me even when it's clear a better choice might have been made. Mostly due to management not being prepared to grant the necessary rewrite time.
sounds like it just boils down to: try to choose the technology your team is familiar with, not what other teams are successful with
Of course there's some balance needed. If your team is familiar with some niche language then long term that might not be a good strategy if you intend to bring more devs on board later.
One side of this which I don't think is discussed often is the fun of choosing new technology. How do you balance having fun and being realistic at the same time?
Fun meaning trying new technology, learning as you go, setting up systems that make you feel proud, etc. It can lead to failure, but I think having fun is important too.