We don’t use Kubernetes (ably.com)
338 points by rocketpastsix on July 20, 2021 | 250 comments




To others wondering what this company does:

> Ably is a Pub/Sub messaging platform that companies can use to develop realtime features in their products.

At this time, https://status.ably.com/ is reporting all green.

Although their entire website is returning 500 errors, including the blog.

It is very hard not to point out the irony of the situation.

In general I would not be so critical, but this is a company claiming to run highly available, mission-critical distributed computing systems. Yet they publish a popular blog article and it brings down their entire web presence?


> Although their entire website is returning 500 errors, including the blog

I've been here in a similar situation, and my guess is they've:

1. Reverse-proxied /blog to a crappy Wordpress instance run without caching.

2. HN traffic killed the blog.

3. Cloudflare, in its infinite wisdom, sees enough 5xx errors from /blog and starts returning 5xx errors for every URI on the site, with some opaque retry interval.

4. Voila, the entire site appears to be dead.


More likely the entire www. server is a CMS that fell over, and the application is on a separate subdomain, and they are only monitoring the application.


CF doesn't do 3) to my knowledge.


Highly doubt they'd do it.


Can you detail (3)? Is this a product feature documented anywhere? As others have pointed out, this does not seem likely


Ably CTO here. Well that went well ...

1) The realtime service and website are different things. The blog post is talking about the service, which has been continuously available.

2) Oops, the website fell over. We'll fix that. Thanks for all the advice :)


If you're running the public website through Cloudflare, have a look at APO; it makes caching WP at the edge easy.


A ton of engineers have transitioned to Kubernetes already, like years ago, so you have just made ramping up new engineers on your custom setup a pain point for scaling. But you probably already knew that. =)


They're not selling a blogging platform, as long as pub/sub works it's fair to say that they're up.


This is the face of your company, if you can’t handle the increased load on your blog why should I trust your other systems.

I run on k8s and it works really well


>This is the face of your company, if you can’t handle the increased load on your blog why should I trust your other systems.

I suspect the sort of individual visiting this website and reading the blog will know that their services and blog do not run on the same machines.


Auto scaling to cover insane unpredicted load like this isn't really representative of anything other than a failure to have cost management in place.


I would have agreed with you if their blog AND their main service had also gone down, not just the website. But their main service is still up.

95% of the comments here are parading around a hiccup: this service's blog went down, when that is not even their main service. Meanwhile everyone continues to use an unreliable service like GitHub, even though it goes down every month, and GitHub doesn't use K8s either.

GitHub (and GitHub Actions) was proven to be very unreliable and the whole service went down multiple times and somehow that gets a pass? Even GitHub's status page returned 500s due to panicking developers DOSing the site. Same with GitHub pages.

But a single blog getting 500s but the whole service didn't go down? Nope.


It’s to be expected. Criticism of Kubernetes is taken personally by people who have dedicated themselves to it since it’s in vogue. It is very much showing the cracks in its perceived perfection and this company is far from the only one bucking the trend. We are seeing the long tail of people who don’t realize (or acknowledge) that Kubernetes is already on the other side of the hype cycle. That long tail is currently driving this conversation because commenting on a 500 is the lowest effort imaginable, but they do not speak for the industry. (Source: 24 years in FAANG-scale ops. I remember this cycle for many, many other fetish tools.)

Seriously, a Wordpress blog goes down (like that’s never happened under HN load) and all of HN is saying “lol what dogshit if only they used k8s for basically static content, the morons” which tells you a lot about the psychology going on here. A LOT. Otherwise smart engineers just can’t help themselves and redefine “irony” even though it contributes absolutely nothing to the discussion and doesn’t even address the fundamental criticism.

We are in a thread that essentially started with “the blog is down, how can I trust the product?” If you can figure out that logic please let me know. That’s basically saying “I don’t understand operations at all,” but here we are, listening to it and its Strong Opinions on a resource allocator and orchestrator.

This thread basically confirmed for me that Kubernetes is losing air. I already knew that, but the signal is getting a little bit stronger every month.


You assume A LOT. Personally, I'm tired of the "don't use kubernetes" edgy blog posts that have very little substance to offer and mostly seem to come from people who haven't studied distributed systems.

Kubernetes isn’t perfect but it’s the best we have. Losing air to what?


The blog post itself explains its existence:

"the following question comes up a lot:

“So… do you use Kubernetes?”

This has been asked by current and potential customers, by developers interested in our platform, and by candidates interviewing for roles at Ably. We have even had interesting candidates walk away from job offers citing the fact that we don’t use Kubernetes as the reason!"

Why do you think this is an "edgy blog post"? This is a piece of marketing, both recruitment and regular. The implied subtext of the post is: "we don't use Kubernetes, but it's not because we don't know what we are doing".

Separately, what does having studied distributed systems or not have to do with Kubernetes?


It’s edgy because k8s is clearly the best system we have and has a lot of notoriety. The article is clickbait at best.

The distributed state algorithms underlying kubernetes are what make it what it is. I see very little added to the discussion. I see a hacked-together AWS system that will be subject to the laws of entropy.


What's hacked together about using NLBs and EC2 instances with statically launched Docker instances on each EC2 instance driven by Terraform and Cloudformation? That's all pretty standard AWS stuff. I'd guess from having run that sort of infrastructure in the past they have less custom tooling than the average K8s shop. It's certainly easier to understand the failure modes.

Plus this doesn't look like a heavy microservice environment. With a monolith or small number of services that aren't changing often with dedicated LBs and autoscale groups I'm guessing their environment is fairly easy to monitor and manage.


> It’s edgy because k8s is clearly the best system we have and has a lot of notoriety.

Best system for what?


Very well said and agreed.


> This is the face of your company, if you can’t handle the increased load on your blog why should I trust your other systems.

https://xkcd.com/932/


They are selling a highly-available product. You don't pay for pub/sub, anybody can run that, you pay for reliability and scalability. Their approach to other parts of their infrastructure definitely can and should inform your decision to buy into their product.

In addition to insight in their infrastructure practices, this gives you a unique opportunity to look into how they deal with outages, whether they update their status page honestly (and automatically), and how fast they can solve technical issues.


Well said. Their main service and offering is still up. They are not hosting blogs or websites as their business.

This reaction is a storm in a tea cup left inside of a sunny cottage.

Downvoters: I don't see anyone screaming at this stock trading app that also stopped using Kubernetes [0].

[0] https://freetrade.io/blog/killing-kubernetes


It is embarrassing.

It also highlights a growing pet peeve of mine: the uselessness of status dashboards, if not done very well.

They're much harder than they look for complex systems. Most companies want to put a human in the loop for verification/sanity check purposes, and then you get this result. Automate updates in too-simplistic a way, and you end up with customers with robots reacting to phantoms.


Status pages are designed to be green so that Sales can tell people you're never down. Everyone does it, so you have to do it too.

My experience with monitoring 100% of outgoing API calls is that many services have well below a 100% success rate. Sometimes there are just unexplained periods of less than 100% success, with no update to the status page, and sometimes there's even a total outage. (I had a sales call with one of these providers, and they tried to sell me a $40,000 a year plan. I just wanted their engineers to have my dashboards so they can see how broken their shit is.)

The one shining light in the software as a service community is Let's Encrypt. At the first sign of problems, their status page has useful information about the outage.


If a marketing website/blog is not the thing you are selling, it very often does not make sense to host it yourself. Your time and energy is better spent focussing on other things, and paying a separate hosting/marketing company to manage that.

I can't say for certain that this is what happened here, and the irony is definitely there, but overall it's not valid to judge a company's reliability on a piece that is not what they are selling, and are likely to not even be managing themselves.


Is your assertion that if they were on Kubernetes, this wouldn't have happened? What's ironic about a blog going down from a company selling pub/sub software?


Besides, though the situation's undeniably funny, there are much more appropriate, simple, and proportionate ways to ensure your WordPress marketing site or blog doesn't fall over than putting it on k8s, even if you are a k8s shop.


I see their status page showing red for the website. https://imgur.com/a/mBbfLtZ

I'm not surprised when a company's status page only reports on (highly available) services, not the website or blog—which are likely run by marketing, not engineering.

Still, it's simple and free to set up status pages so there isn't much excuse.


> In general I would not be so critical, but this is a company claiming to run highly available, mission-critical distributed computing systems. Yet they publish a popular blog article and it brings down their entire web presence?

You may expect a company blog to have a bit lower uptime than the services it offers. As someone on the purchasing side, I don't give a f** if a company website or blog is down for weeks, as long as their services are operational.

As a disclaimer: I've never heard of Ably, but I wholeheartedly support a non-kubernetes environment. As CTO of a large e-commerce company, we do not use, or even plan to use, kubernetes.


.. and Kubernetes does health monitoring!


….and autoscaling


The "Website portal and dashboards" is picking up the disruption now.


It's stated on the status page:

> Our blog and website are experiencing load-related issues, leading to slow loading and 5xx errors.

All backend services (realtime and REST apis) are on entirely separate infrastructure and are unaffected.


99.99% SLA achieved. Jokes aside, I think the article is well written.


The HN post could have brought down their website, but do you know that for a fact? Maybe it was some unrelated incident or attack.


this implies their actual service and marketing website are running on the same infrastructure, which is unlikely


> To move to Kubernetes, an organization needs a full engineering team just to keep the Kubernetes clusters running

Perhaps my team runs a simpler cluster, but we have been running a Kubernetes cluster for 2+ years as a team of 2 and it has been nothing less than worth it

The way the author describes the costs of moving to Kubernetes makes me think that they don't have the experience with Kubernetes to actually realize the major benefits over the initial costs


Yes! Was going to say the same. Kubernetes is far easier to learn than some random bespoke setup. After reading the article, it just sounds like they reinvented the wheel but made it AWS specific for some reason.

Was brought on as a consultant years ago and found their bespoke setup of random ec2 instances and lambda scripts to be far more difficult to understand than just spinning up a managed Kubernetes cluster and having a generic interface to deploy the application, as well as monitoring, logging, metrics, etc.


> Kubernetes is far easier to learn than some random bespoke setup

This, to me, is the biggest advantage of kubernetes. Yes, you can do all the things yourself, with your own custom solution that does everything you need to do just as well as kubernetes.

What kubernetes gives you is a shared way of doing things. By using a common method, you can easily integrate different services together, as well as onboard new hires easily.

Using something ubiquitous like kubernetes helps both your code onboarding and your people onboarding.


Also not having to debug EC2 lifecycle issues is really nice. And you don't have to deal with all of the garbage of setting up ec2 instances (configuring SSH, process management, configuration management, log exfiltration, monitoring, infra as code, nginx, albs, etc).


Well, that and also a "nice" programmatic interface.


In addition to ease, why would I, as a s/w engr, want to invest in learning your weird stack instead of learning or using an equivalent tech stack that is actually marketable? Learning a technology can be a huge investment. Why would I want to invest in a technology with not much future and not much ability to translate into another position should this gig go south?


Note that Kubernetes skills are _hot_ right now. 6 months of Kubernetes experience is easily an extra $20k in salary.


Well, they say they're using CloudFormation and Terraform. Those are also fairly standard, popular tools, aren't they?


Because you then know how to apply technology, rather than just a technology?


So, I should be OK to use my most precious asset -- time -- on an investment in a dead-end technology, because that investment could possibly, maybe translate (in some reduced way) to me being able to use a different technology later on that is not a dead-end technology? How about I just invest directly in technologies that have a payback and not try to cram the entirety of all software engineering systems into my brain. I already have an unending amount of things and systems to learn in s/w, why shouldn't I prioritize the learning process towards investments that have the best paybacks.


Kubernetes is literally more complicated than any other similar system in existence. It is so by definition, because it builds in more complexity than any other system has. But it also lacks the features you often need, so you have to learn even more to be able to get it to do what you want. You could write several books on running K8s in production.


I was doing research on setting up some new system on scalable cloud infrastructure. My first option was K8s (EKS) and the second was plain ECS+Fargate. Ramping up on K8s was so convoluted and painful that I decided to go with ECS instead. That has been quite straightforward.


My experiences with k8s have led me to never propose k8s in a new project. The k8s instances I have seen were bureaucratic and technical nightmares of unnecessary complexity. It doesn't provide a common ground for deployments because everyone will use and configure it differently.


>It doesn't provide a common ground for deployments because everyone will use and configure it differently.

Helm charts are used by 99% of the open source projects I've seen that run on top of Kubernetes. They are all written in a similar style so transferring settings between them is fairly easy. Helm will even create a barebones chart for you automatically in the common style.


A helm chart is not a complete deployment system, it's just a wrapper around kubectl. Neither provides everything you need to completely manage deploys.


But you can't write books on running Linux in production, or Apache, or Windows, or Oracle, or... since the book shelves are (not literally but metaphorically) too crowded for yet another one on the subject


What do you mean?

You used to work with Ably?


Exactly, plus all major cloud providers will happily host a kubernetes cluster for you if you ask them (with your money). In a previous project, we managed more than 20 clusters on Azure (AKS) and AWS (EC2) with 4 people. The only tool that made this possible was kubernetes.

I've been running my own kubernetes cluster on a Raspberry pi. Does my cat count as an engineering team?


I feel like I must be stupid because I've tried several times to set up k8s on in-house hardware and it has never worked. I've tried several different "recipes" and the networking never works right. IDK if it's because things are changing so fast that the popular "recipes" are already outdated, or if they are being published without actually testing them, but it's left a bad taste in my mouth.

I'm sure the big cloud providers make it easy for end users to use, but that doesn't help me.


You can start with K3s: https://k3s.io/

The networking part is always the most challenging. Everything between your router and your kubernetes cluster should still be routed and firewalled manually. However, if you can live with your home router firewall and a simple port mapping to your machines/cluster, then routing the traffic and setting up the cluster should be relatively painless.


This^^. I started out with kubespray because I was familiar with Ansible (I even contributed a very small bug fix), BUT k3s is just so awesome and stays out of your way. It's not only easy to install but easy to remove.


Don't run your own cluster, unless you do it to learn. If you're in the cloud, just use a managed instance and focus on building software.


How do people who do this debug issues in the k8s layer?

Or is this just the Jenga Model of Modern Software Development in action?


There's definitely a lot of Jenga going on here, but on the other hand: Kubernetes, when set up, has some very simple constraints that work. You don't often need to touch things at that layer; They become invisible, and when you're dealing with a cloud provider's hosted K8s, you don't get to touch them directly anyway.

K8s was a lot more simple earlier on. It's actually dying from adoption: There's a ton of dumb ideas bolted on top, that have become "standard" and "supported" because of demands from bad customers. The core is very clean, though, and you rarely need to interact with that.


How does Ably deal with issues in the EC2 layer?


That's one of those great questions about AWS. We actually have had to contact AWS on multiple occasions about EC2 layer issues, and each time I was thankful that a VM construct is very simple, comparatively, to reason about.


Little known, but the latest VMware Fusion and VMware Workstation come with kubernetes OOTB (vctl, kind). It has never been easier to start up a cluster.


I don't know what specific problems you had with networking but I found using Kubespray an easy way to setup a cluster on different clouds.


Sounds like a useless use of cat.


This comes up a lot around here; many people look at “Kubernetes the hard way” and think they need to run their own cluster.

Just use GKE or whatever the equivalent managed offering is for your cloud provider. A couple of clicks in the UI and you have a cluster. It's really easy to run; I rarely have to look at the cluster at all.


I've been on a team of 6-8 running relatively large-scale clusters (2 big ones with lower hundreds of workloads through tens of namespaces/tenants, plus a couple smaller ones). To "keep the Kubernetes clusters running" is an afterthought at most. I mostly agree with your comment.


Do/Did you use some specific method or orchestration product to set it up?


We use GKE, but there's nothing stopping most organisations from doing the same.

It doesn't bear much weight though, and I've had experience with other toolings largely to the same effect.

I agree with the sibling comment about upgrades - this is IMO where GKE really shines as a cluster upgrade is mostly a button-pressing exercise.


Not OP, but similar workloads, team of 5. We use kops to deploy and manage the clusters. The hardest part is updating, due to the workloads running on them. Other than that, little to no problems with kube itself.


We used to run very well-run Kubernetes clusters with a single devops engineer, then two. If you have senior devops engineers who know what they are doing and have the opportunity to greenfield it, it's really not that bad. We now have 4 devops engineers.

At the end of the day it's not necessarily Kubernetes that's difficult/time consuming, it's all the context around it.

How do you handle monitoring/observability? Security? What's your CI/CD to deploy everything? How do you fully automate everything in reproducible ways, both within the k8s clusters and outside them?

I've been doing infrastructure management / devops for well over 23 years. Built & managed datacenters running tens of thousands of servers.

Kubernetes and AWS have made my life easier. What's really important is having the right senior people fully aware of the latest practices to get started. Greenfield now is easier than ever.


Same. We've been running EKS at my last two companies using Terraform without issues and very small teams. There's a learning curve for sure but the ongoing maintenance is pretty minimal. I suspect going with GKE would have a smaller learning curve.


Genuine question because I honestly have no idea: how much additional learning curve is added by the average cloud provider to create a Kubernetes hosting context/setup?

Knowing nothing about K, I’m constantly wondering how it could be simpler than dumping a binary into a PaaS (web) hosting context and letting it do its thing. I’m interested to learn.


I am running 5 k8s clusters on behalf of customers on 3 different cloud providers.

By myself.


I wanted to refrain from commenting because honestly I’m not the biggest fan of relatively opaque complexity and kubernetes tries its hardest to be this. (Cloud providers doing magic things with annotations for example)

But I have to say that kubernetes is not the devil. Lock-in is the devil.

I recently underwent the task of getting us off of AWS, which was not as painful as it could have been (I talk about it here[0])

But the thing is: I like auto healing, auto scaling and staggered rollouts.

I had previously implemented/deployed this all myself using custom C++ code, salt and a lot of python glue. It worked super well but it was also many years of testing and trial and error.

Doing all of that again is an insane effort.

Kubernetes is 80% of the same stuff if your workload fits in it, but you have to learn the edge cases, which of course increases tremendously from the standard: python, Linux, terraform stuff most operators know.

Anyway.

I’m not saying go for it. But don’t replace it with lock-in.

[0]: https://www.gcppodcast.com/post/episode-265-sharkmob-games-w...


I don't think lock-in should be a problem for most startups - in the same sense most startups don't need kubernetes.

For almost everyone, I'd say: just pick a cloud provider, stick to it, and (unless your whole business model is about computing resource management) your time is almost certainly better spent on other things.

I'm working at a company that had moved into AWS before I joined and I don't see it ever moving out of AWS. Of course we have some issues with our infrastructure, but "we're stuck at AWS" is the least of my concern. Any project to move stuff out of AWS is not going to be worth the engineering cost.


I guess the context is incredibly different.

Usually I’m not working in startups. Usually I’m responsible for 10s-100s of millions of Euro projects.

Of course being pragmatic is a large part of what has led me to a successful career; and in that spirit of course whatever works for you is the best and I’m not going to refute it.

I would also argue for a single VPS over kubernetes for a startup, it’s incredibly unnecessary for an MVP or a relatively limited number of users.

But I wouldn’t argue for the kind of lock-in you describe.

I have seen many times how hitching your wagon to another company can hurt your long term goals. Using a couple of VPSs leaves your options open.

As soon as you’re buying the Kool-Aid of something that can’t easily be replicated outside that provider, you’re hoping that none of the scenarios I’ve seen will happen again.

Things I’ve seen:

Locayta: a search system, was so deeply integrated that our SaaS search was permanently crippled. That company went under but we could not possibly move away. It was a multi-year effort.

One of our version control systems changed its pricing model so that we paid 90x more overall. We could do nothing because we’d completely hitched our wagon to this system. Replacing all build scripts and documentation and retraining users was a year-long effort at the least. So we paid the piper.

This happens all the time: Adobe being another example that hasn’t impacted me directly.

It’s important in negotiations to be able to walk away.


Why did you leave AWS?


I made a relatively large list of reasons. Most are going to sound fickle, but I consider some to be very problematic if you’re woken up at 3am and have to orient yourself; others I consider problematic because they cause an order of magnitude increase in complexity.

Mostly it’s an issue of perception too: a cloud saves me time. If it doesn’t save me time it is not worth the premiums, and in our case it would not save time (due to the complexity mentioned before).

But here’s part of list (with project specific items redacted):

3am topics:

* Project name (impossible to see which project you're in, usually it's based on "account" but that gets messed up with SSO)

* instance/object names (`i-987348ff`, `eip-7338971`, `sub-87326`) are hard to understand meaning of.

* Terminated instances fill UI.

* Resources in other regions may as well not exist, they're invisible- sometimes only found after checking the bill for that month.

Time cost topics (stumbling things that make things slower):

* Placements only supported on certain instances

* EBS optimised only supported on certain instances

Other:

* Launch configurations (user_data) only 16KiB, life-cycling is hard also, user-data is a terrible name.

* 58% more objects and relationships (239 -> 378 LoC after terraform graph)

* networking model does not make best practice easy (Zonal based network, not regional)

* Committed use (vs sustained use) discounts means you have to run cost projections _anyway_ (W.R.T. cost planning on-prem vs cloud)

* no such thing as an unmanaged instance group (you need an ASG, which can be provisioned exclusively with user-data, i.e. a launch script in real terms)

* managed to create a VPC where nothing could talk to anything. Even cloud experts couldn't figure it out, not very transparent or easy to debug.

Sticky topics (things that make you buy more AWS services or lock-in):

* Use Managed ES! -> AWS ES Kibana requires usage of separately billed cognito service if you want SAML SSO.

* Number of services brought in for a simple pipeline: https://aws.amazon.com/solutions/implementations/game-analyt...

* Simple things like auto-scaler instances being incrementally named requires lambda: https://stackoverflow.com/questions/53615049/method-to-provi...

* CDK/Cloudformation is the only "simple" way to automatically provision infra.


Well, his post is on GCP (Google Cloud Platform) Podcast, so there might have been a discount from Google.


I can promise very sincerely that no discount was given on the basis of switching.

In fact; cost did not factor at all.


The risk with using Docker in production is that you can end up building your own bad version of Kubernetes over time. K8s is fairly complex but it solves a lot of useful problems such as zero downtime upgrades, service discovery, declarative config etc


I was a big advocate for Docker-as-deployment-packaging for a long time, and cautious about Kubernetes. I imagined building something like Ably describes would be the most practical.

I was wrong. What I didn't understand is how easy kubernetes is to use for the application developers. It's the most natural, seamless way to do Docker-as-deployment-packaging.

If you're going to have infrastructure guys at all, might as well have them maintain k8s.


This 100%

People think that Kubernetes is just for service orchestration and it's true, it's very good for that. But what really sets it apart for me is the ability to extend kubernetes through operators and hooks that enables platform teams to really start to abstract away most of the underlying platform.

A good example is the custom resources we've created that heavily abstract away what's happening underneath. The app teams give us a docker image and describe how they want to deploy, and it's all done.


The two things that have scared me away from k8s:

1) Churn + upgrade horror stories,

2) Apparent heavy reliance on 3rd party projects to use it well and sanely which can (in my experience, in other ecosystems) seriously hamper one's ability to resist the constant-upgrade train to mitigate problem #1.

Basically, I'm afraid it's the Nodejs (or React Native—folks who've been there will know what I mean) of cluster management, not the... I dunno, Debian, reliable and slowly changing and sits-there-and-does-its-job. Does it seem that way to you and is just so good it's worth it anyway, or am I misunderstanding the situation?


The K8s API is very conservative with backwards compatibility and graceful deprecation. In general, cluster upgrades should have minimal impact on your workloads. The horror stories are usually from a cluster admin point of view, and devs are usually shielded from that.


Using Docker Swarm as an alternative to K8s in prod has been incredibly helpful for us as a growing startup. You get 90% of the advertised K8s goodies for almost zero extra complexity.

Using deployment scripts + docker alone would be insane, even at our small scale.


I see this argument made pretty frequently, and it might be correct for a team that's just starting out.

But service discovery and zero downtime upgrades are not that hard to implement and maintain. Our company's implementations of those things from 12 years ago have required ~zero maintenance and work completely fine. Sure, it uses Zookeeper which is out of vogue, but that also has been "just working" with very little attention.

A thought experiment where we had instead been running a Kubernetes cluster for 12 years comes out pretty lopsided in favor of the path we took for which one minimizes the effort spent and complexity of the overall system.
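
For anyone curious what the Zookeeper-based discovery mentioned above can look like, here is a rough sketch using ephemeral nodes and the github.com/go-zookeeper/zk client; the hosts, paths, and payload are made-up examples, not the actual implementation described by the parent.

    package main

    import (
        "log"
        "time"

        "github.com/go-zookeeper/zk"
    )

    func main() {
        // Connect to the Zookeeper ensemble (addresses are placeholders).
        conn, _, err := zk.Connect([]string{"zk1:2181", "zk2:2181", "zk3:2181"}, 5*time.Second)
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()

        // Register this instance. An ephemeral, sequential node disappears
        // automatically when the session dies, which doubles as a liveness signal.
        // The parent path /services/api is assumed to already exist.
        _, err = conn.Create("/services/api/instance-", []byte("10.0.0.5:9000"),
            zk.FlagEphemeral|zk.FlagSequence, zk.WorldACL(zk.PermAll))
        if err != nil {
            log.Fatal(err)
        }

        // Consumers list the children and watch for membership changes.
        instances, _, events, err := conn.ChildrenW("/services/api")
        if err != nil {
            log.Fatal(err)
        }
        log.Println("live instances:", instances)
        <-events // fires on any change; re-list and re-watch in a loop
    }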


You don't even need to implement service discovery. AWS load balancers do that for you.


Came here to say this. It sounds like Ably is slowly moving towards creating their own orchestration tooling. Before you know it they'll need a bunch of additional features such as scaling up an individual service that runs on the same EC2 as 5 other services. Or permissions. Or whatever else. Kubernetes has it all solved. With the current abstractions given by major cloud providers such as EKS and GKE, it makes running it a lot simpler than it used to be. All of this effort they've spent on doing this would have likely taken a lot less effort than their current setup.


You don't build your own Kubernetes, though. You run EC2 instances that are sized for the container they run and let autoscaling take care of it. Load balancers distribute the load. Kubernetes makes use of a lot of the same stuff, but with even more complexity.

Orchestration, in general, isn't needed. The major cloud providers are already doing it with virtual machines, and have been for a long time. It's better isolation than Kubernetes can provide.


You are still orchestrating though, you still need to provision nodes and related resources, ensure containers are running on the node and are healthy, mount volumes, clean up old docker resources. Plus you are now completely locked in to EC2, which may not be a concern for you but it is if your product or service needs to run on different cloud platforms like ours, or the CTO decides to change cloud providers one day because they got a sweet deal. I’m glad it works for you but after working with k8s for 2 years I never want to write another Ansible / Terraform script to provision IaaS resources


> Orchestration, in general, isn't needed. The major cloud providers are already doing it with virtual machines, and have been for a long time. It's better isolation than Kubernetes can provide.

Sure it's not needed! But cloud providers aren't needed in much the same way.


Uh... what? Apples and oranges.

Whatever convenience you think Kubernetes provides over EC2/autoscaling (which Kubernetes uses, by the way) is several orders of magnitude less than the convenience of using a cloud provider.

That you would draw on-prem versus cloud as an equivalency to Kube vs other deployment methods reeks of inexperience, to me.

edit: Oh no, I seem to have triggered a drive-by Kube fanboy or two. Yes, stealth downvote because you disagree without defending your position. You will do so much to show how right you are.


Right, my point is, you can always run your infra much the same way AWS runs their infra. Cloud providers give you certain advantages that on-prem doesn't have, and vice versa. Equally, running container orchestration gives you advantages AWS cannot give you (at least not as well as k8s does), and vice versa.

I'm definitely a fan of K8s, but I'm not defending it here; however, saying orchestration isn't needed is silly. In a way, what AWS provides is orchestration, it's just for VMs instead of containers.

As a devops engineer I've worked with a lot of individuals and with a lot of tooling, and so far I can only say my opinion of container orchestration has only grown stronger. I recall having to explain to certain developers how they have to first figure out more than 5 different services for AWS, then use packer to build an AMI, which is provisioned using Chef, then they have to terraform the relevant services, then use cloud-init to pull configuration values. All in all I had to do most of the work, and the code was scattered in several places. Compare that with a Dockerfile, a pipeline, and some manifest in the same repo. I've seen teams deploy a completely new application from first commit to production in less than a week, with next to zero knowledge of K8s when they started. The same teams, who weren't pros but had a bit of experience with AWS, took several weeks to deploy services of similar complexity on EC2. Saving two weeks of developer time and headaches is a lot, considering developers are one of the largest expenses a company has.


> I recall having to explain to certain developers how they have to first figure out more than 5 different services for AWS, then use packer to build an AMI, which is provisioned using Chef, then they have to terraform the relevant services, then use cloud-init to pull configuration values. All in all I had to do most of the work, and the code was scattered in several places.

That sounds horrible. You can actually do all that from a single repository, using just ansible alone, with one cli invocation:

- use packer to build an AMI => ansible

- provisioned using Chef => ansible

- terraform the relevant services => ansible

- use cloud-init to pull configuration values => baked into the image


Ok, where to start...

> Packing servers has the minor advantage of using spare resources on existing machines instead of additional machines for small-footprint services. It also has the major disadvantage of running heterogeneous services on the same machine, competing for resources. ...

Have a look at your CPU/MEM resource distributions, specifically the tails. That 'spare' resource is often 25-50% of resource used for the last 5% of usage. Cost optimization on the cloud is a matter of raising utilization. Have a look at your pods' use covariance and you can find populations to stochastically 'take turns' on that extra CPU/RAM.

> One possible approach is to attempt to preserve the “one VM, one service” model while using Kubernetes. The Kubernetes minions don’t have to be identical, they can be virtual machines of different sizes, and Kubernetes scheduling constraints can be used to run exactly one logical service on each minion. This raises the question, though: if you are running fixed sets of containers on specific groups of EC2 instances, why do you have a Kubernetes layer in there instead of just doing that?

The real reason is your AWS bill. Remember that splitting up a large .metal into smaller VMs means that you're paying the CPU/RAM bill for a kernel + basic services multiple times for the same motherboard. Static allocation is inefficient when exposed to load variance. Allocating small VMs to reduce the sizes of your static allocations costs a lot more overhead than tuning your pod requests and scheduling prefs.

Think of it like trucks for transporting packages. Yes you can pay AWS to rent you just the right truck, in the right number for each package you want to carry. Or you can just rent big-rigs and carry many, many packages. You'll have to figure out how to pack them in the trailer, and to make sure they survive the vibration of the trip, but you will almost certainly save money.

EDIT: Formatting


It's absolutely insane: they claim several thousand (!) VMs, with each autoscaling group having its own load balancer. The cost saved by having only 10-20% wasted resources and running an ingress controller instead of several hundred ELBs would pay for several years of development (plus the headcount to maintain it).


Great point. By packing heterogeneous workloads on the same underlying VM, you can amortize the spare capacity against non-correlated workloads versus the one workload per VM they are pushing.


This page is returning a 500 for me. Perhaps they should use Kubernetes.


I know you're being facetious and I chuckled, but it's pretty amazing to see sites fall over under even a moderate load in 2021. A simple Go web server running on a pretty low-end machine can handle an HN-sized torrent with ease, even before sticking a CDN in front of it.
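
To make that concrete, here is a minimal sketch of the sort of thing meant by "a simple Go web server"; the directory, port, and timeouts are arbitrary. Serving pre-rendered files like this is usually limited by bandwidth rather than CPU, especially with a CDN in front.

    package main

    import (
        "log"
        "net/http"
        "time"
    )

    func main() {
        // Serve pre-rendered HTML/CSS/JS straight from disk; the OS page
        // cache keeps hot files in memory, so a front-page traffic spike
        // mostly turns into cheap memory reads.
        fs := http.FileServer(http.Dir("./public"))

        srv := &http.Server{
            Addr:         ":8080",
            Handler:      fs,
            ReadTimeout:  5 * time.Second,
            WriteTimeout: 10 * time.Second,
        }
        log.Fatal(srv.ListenAndServe())
    }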


It's usually a WordPress/Joomla/Magento website that can do 1-5 requests/second at most.

EDIT: I'm not saying this software can't do more; it's just that in most default configurations, people don't bother to use a CDN, optimize the database, use a caching plugin, etc. They just install it, then install another 39 plugins, and then ask why everything is slow. It's very common to see WordPress websites failing under load (and I've helped many to optimize their installs, so I know it's possible; it's just not what the average WordPress install looks like).


I used to run a WordPress site on a $5 VPS. It could easily handle 100-500 requests per second; the trick was to have a caching plugin.


For 4 years I had a static site generated with Hugo, git/CI deployed to CloudFront/S3. Then the team wanted to move, and moved to HubSpot; it was simply harder to find and train marketing talent who are ready to learn git or even HTML. Far easier to find HubSpot/WordPress designers/developers/content writers. Also, it becomes easy to blame the unfamiliar thing your company uses when things don't work the way people want.

I have known teams to move back to very simple default WordPress hosting from more advanced stacks.

My point is it's harder to get your marketing dept to upgrade tech, even if we could do it.

I have learnt over the years that solutions have to be for the people who will use them every day, even if that means poorly managed WordPress.

In your example, you or your SRE/devops will do all the basic configuration tuning pretty much out of the box. Your IT dept may not be able to do that at all, or at least not unless someone tells them to.


So one thing I've seen very little of, but have written (to great success), is basically a wordpress/etc scraper. I was lucky in that the site was fully crawlable, but it meant that the non-technical website editors could edit things easily using a CMS, and then when they were ready, someone could deploy to prod. A CI pipeline would then crawl the site, saving all the generated static files, diff them, and upload only the parts that had changed.

It meant that we -never- had naked requests hitting the underlying CMS, so our security footprint there was minuscule (to access it you'd need to already be in the network), there was no chance of failure (i.e., even with a caching layer, a bunch of updates and enough traffic could lead to a spike hitting the CMS, and in fact, if there was a bad edit in the CMS that led to part of the site being malformed, the crawler would die, alerting us to the bad edits before going live), and we could host using any static site/CDN we wanted to, with the only downside being deploys took a while. Even that downside we could have worked around, if we'd wanted to get into the CMS internals enough to track modifications; we just really didn't want to.
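
A stripped-down sketch of that crawl-and-snapshot step (hypothetical CMS host, a hard-coded path list instead of real link discovery, and the diff/upload left to a separate CI step):

    package main

    import (
        "io"
        "log"
        "net/http"
        "os"
        "path/filepath"
        "strings"
    )

    func main() {
        // Internal-only CMS origin; it is never exposed to the public internet.
        origin := "http://cms.internal"
        // A real crawler would discover these by following links or a sitemap.
        paths := []string{"/", "/about/", "/blog/", "/blog/some-post/"}

        for _, p := range paths {
            resp, err := http.Get(origin + p)
            if err != nil || resp.StatusCode != http.StatusOK {
                // A bad edit that breaks a page kills the build before it ships.
                log.Fatalf("crawl failed on %s: %v", p, err)
            }
            body, err := io.ReadAll(resp.Body)
            resp.Body.Close()
            if err != nil {
                log.Fatal(err)
            }

            out := filepath.Join("build", p)
            if strings.HasSuffix(p, "/") {
                out = filepath.Join(out, "index.html")
            }
            if err := os.MkdirAll(filepath.Dir(out), 0o755); err != nil {
                log.Fatal(err)
            }
            if err := os.WriteFile(out, body, 0o644); err != nil {
                log.Fatal(err)
            }
        }
        // A later step (git diff, rsync --checksum, etc.) uploads only what changed.
    }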


It is easier today; Netlify-like apps offer the best of both worlds, integrating SSGs with a graphical UI. You can add CI pipelines for things like W3C HTML validation, translation, WCAG compliance, image/asset optimization (designers sometimes forget), AMP generation, S3/CDN-like hosting, etc.

I was tempted to build something like you did, but came to the same conclusion many do: it is not our core business, and marketing will still have issues as the offering won't be as good as a professional one.

Even a solution to bridge the two setups needs dev time to maintain. At the end of the day, managed solutions like HubSpot come in cheaper in total cost of ownership, even though this kind of architecture is technically superior.


So we were dealing with a Django thing that was a complete cluster of unmaintainable, and it was about 6 years ago, so more limited options than today. It took...maybe 40 man hours to write a script, get it into the build pipeline, and have everything work? A few days for a coupla guys, basically. It would have taken us longer to switch CMSes, just in user training, I suspect.


That's what we did here. Moved from WordPress to Netlify CMS and Eleventy and users managed fine (I'm told).


Netlify is a good solution, glad that it is getting some love.

The move for us was before netlify became popular so never could consider it.

That's why I was tempted to build, but then I realized the bulk of the work would be building the graphical UI editor and templating tools, the parts I hate to begin with.


Agreed. With a very aggressive caching setup, WP can actually handle it pretty well.

I used to have a 5 million pageviews per month site running on a $5 instance from DigitalOcean.


I dislike Wordpress as much as the next guy but a modest WP site with a cache or a free tier CDN in front could eat that kind of traffic for breakfast…


'with a cache or a free tier CDN in front'

Awfully presumptive there aren't you. :P

I've had software solutions delivered by consultants in past jobs that were billed as being 'complete', that were missing that. 10 rps = Django fell over, site is down. That was the first thing we fixed.


In this case, it appears to be their main web page as well.


Cloudflare is in front of the blog, but they seem to have purposefully disabled caching both in the browser via Cache-Control headers[1], and apparently in Cloudflare too[2].

[1] See, for example, https://ably.com/support , which is currently still up.

[2] The cf-cache-status header says "DYNAMIC", which means "This resource is not cached by default and there are no explicit settings configured to cache it."


IIRC CloudFlare doesn't cache HTML by default, so if your HTML is dynamically generated (and not, at the very least, cached on your own server/application/WordPress) and you don't enable HTML caching on CF, you can still easily run into trouble.

I'm pretty sure cf-cache-status is added by CF—that's not the site saying not to cache it, it's CF reporting that it's not cached (which, again, I think is the default, not something the site owner deliberately turned off).


They appear to be not caching it on purpose, because of some non-obvious stuff like the dynamic "Site Status" thing in the footer. If it were me, though, I'd figure out how to cache stuff like the blog.


Ought to move any small, self-contained dynamic sections to something loaded by JS, with failure allowed without (user-facing) complaint. Or kick it old-school and use an iframe.

Then again, most orgs have a ton of stuff they ought to do and haven't gotten around to. You just hope it doesn't bite you quite so publicly and ironically as this.


right so

Need to use a "Page Rule" telling CF to cache everything, and provide the required Cache-Control header values from app side.

It's not an option in the caching settings (or do you need an enterprise plan?)
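
For the "from app side" half, that just means emitting cacheable headers on the responses you want the edge to keep; a minimal Go sketch (paths and max-age values are arbitrary):

    package main

    import (
        "log"
        "net/http"
    )

    // cacheable wraps a handler and marks its responses as safe to cache.
    // s-maxage applies to shared caches such as a CDN edge.
    func cacheable(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            w.Header().Set("Cache-Control", "public, max-age=300, s-maxage=3600")
            next.ServeHTTP(w, r)
        })
    }

    func main() {
        blog := http.FileServer(http.Dir("./blog"))
        http.Handle("/blog/", http.StripPrefix("/blog/", cacheable(blog)))
        log.Fatal(http.ListenAndServe(":8080", nil))
    }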


It's a page rule, available at every plan level (I just checked the docs). Might be a problem if you've already used up your quota of page rules at a given level, but otherwise, no big deal.

Enterprise is largely about very high levels of bandwidth use, serving non-Webpage traffic, and all kinds of advanced networking features. It's also a hell of a jump in price from the $200/m top-tier "self-serve" plan (which is also the only self-serve plan with any SLA whatsoever, so, the only one any business beyond the "just testing the waters" phase should be on)


I mean the Page Rules can be used by anyone (according to their quota usage)

But the Cache Settings page doesn't have the "Cache Everything" option, or it's reserved for the Enterprise plan.


Right, the page rule is the setting, and AFAIK that's the same even on Enterprise. You're right that there's no check-box/toggle-slider to make it happen.

It's covered in what I'd consider to be one of a handful of docs pages that're must-read for most any level of usage of the service:

https://support.cloudflare.com/hc/en-us/articles/200172516-U...

(it also links to a page that describes exactly how to create a "cache everything" page rule)


Got curious and started looking around and found a few blog posts about being on HN front page, and it appears like you might get a traffic spike of 15-50k hits over the course of the day. That is a ridiculously low standard...basic WordPress with a caching plugin on a free tier AWS instance or raspberry pi can handle that sort of volume.

If you're going to make a big deal about being contrarian with your technological choices, you either have to be really good at what you do, or be really honest about what tradeoffs you're making. I'd be extremely embarrassed if I pushed this blog post out.


My rule of thumb for how many hits a site on HN gets is roughly 100x upvotes. So 100 upvotes = 10k clicks over 8-10 hours. This is from my experience with 3 links that hit the front page with between 150-420 upvotes.

I was serving static HTML off a $5 VPS and later off Cloudflare Pages. The response times were stable in both cases for all users.


Totally agree, but hold out. The contrite follow-up blog post about why they switched to K8s will be a great read.


I’ve had a few pieces on my personal site hit the front page here, and there was no effect on site behavior at all. Just Apache on a very modest VPS, with some non-essential dynamic content handled by Django. I don’t notice the traffic spikes until I look at my server logs the next morning.


I guess that a default, basic WordPress install with no optimization cannot handle the HN hug of death, even in 2021


“Optimization” would literally mean installing a cache plugin in 2 minutes.

I’m no WP expert but maybe they have a good reason to leave caching out of the core.


Plenty of organizations in which the marketing or PR department maintains the company blog, not their core IT staff. This is likely their "good reason".

Unpatched WP instances full of vulns are also rampant for this reason.


OP meant the Wordpress devs should at this point include the caching plugin in the core install and enable it by default.


I'm not a WP expert either, but I did implement a CMS for my company. Caching interferes with people editing the site and getting to see the updates immediately, and the easy way out of that is to disable caching. My guess is that default WP takes the easy way out, because of their largely untrained userbase.


That's interesting. Most of the WP cache plugins I've seen handle that by bypassing the cache for logged in users.


Most of the time the server is not the problem; the database is. And databases scale a lot harder if you don't have a proper design for reads, writes, and caching. Business wants to see the typo fix to their blog post online the very second they make it.


A well configured web server easily scales really well.

Now, application servers and database tiers, that's a different story. Epic novel, really.


Yes, just use Netlify or something similar.


It may have been posted elsewhere, ofc, not just HN.


Put your blog on a separate domain blog.company.tld and throw it behind Cloudflare free tier or something. It's pretty amazing how many companies fail to do that, and a moderately viral blog post would bring down their whole damn site when it's the critical moment to convert customers.

Having this happen to a post bragging about infrastructure is just adding insult to injury.

Edit: As another comment mentioned, this site is actually behind Cloudflare, with cf-cache-status: DYNAMIC (i.e. caching is disabled). I don't know what to say...


CF doesn't cache html by default. You have to add a page rule to do it. If your html being dynamically generated is most of your load (it probably is, even if it's not most of your bandwidth—serving plain files with Apache or Nginx or whatever is incredibly cheap, processor-wise) this means CF doesn't help much unless you tell it to.


Yea, or at least something different than what they’re using…

You gotta admit, it’s a pretty bad look to come here with a boldly titled systems architecture blog post and have your website crash…

A junior web developer nowadays can easily build a system to serve static content that avoids all but the worst hugs of death.

If this wasn’t a hug of death then you have service availability issues, regardless…


For a blog? That is not their main service is it?

You don't need Kubernetes for a blog. If you do then something is wrong with your whole backend stack.

Perhaps they are the smart ones who want to save money and not burn millions of VC money.

The one that REALLY needs to use Kubernetes is actually GitHub. [0]

[0] https://news.ycombinator.com/item?id=27366397


Do you pay for Github? just curious.


Interesting, as it's Cloudflare fronted, but returns "Cache-Control: max-age=0, private, must-revalidate" in the headers, killing any browser caching.


Should the browser be caching 500 responses?


No, this is from one of the intermittent successful responses. You can see it on https://ably.com/support also, which seems to be still up.

And cloudflare caching of some previous successful page fetch seems to be turned off as well. A cf-cache-status header of "DYNAMIC" means "This resource is not cached by default and there are no explicit settings configured to cache it.".


WPengine is configured like that, and you can’t change it, even on their highest tiers. Their support doesn’t understand that it defeats the whole purpose of WPengine’s reseller relationship with cloudflare.


It's ironic they're preaching technical excellence from their high horse, yet their website is returning errors and the status page incorrectly shows everything is working as expected.


Which raises the question: why would I listen to anything else they have to say when they've just demonstrated their inability to scale to even a modest amount of traffic?


Or make simpler pages. It’s possible to achieve a modern responsive design for 250kb per page and <25 elements even for someone like me who is far from this part of the stack in interest and ability. If they sold a 256MB LightSail I’d buy it.


It is possible. If they didn't include well over 1MB of scripts that don't seem to do much but are an order of magnitude fatter than the PNG images on the site.

I don't even know what those scripts are supposed to do. This is a static blog post.


Probably a classic Hug of Death caused by being #1 on HN.


Even a $5 VPS can easily handle like 10k page views per minute if you use a performant web server i.e. not PHP or Ruby.

Or even those with caching.


> use a performant web server i.e. not PHP or Ruby.

PHP in an ordinary deployment behind Apache2 or nginx is very fast. If you're writing a Web service and your top two criteria for the language are "fast" and "scripting language" and PHP doesn't at least make your short-list, you've likely screwed up. It's the one to beat—you pretty much have to go with something compiled to have a good chance of beating it at serving dynamic responses to ordinary Web requests, and even then it's easy to screw up in minor ways and end up slower. Persistent connections, sure, maybe you want something else, but basic request-response stuff? Performance is not a reason to avoid it, unless your other candidate languages are C++ or very well-tuned and carefully written Java or Go, something along those lines. Speed-wise, it is certainly in a whole other category from Ruby or Python (in fact, even "speedy" Nodejs struggles to match it).

Now, WordPress is slow because a top design goal is that it's infinitely at-runtime extensible. And lots of PHP sites are slow for the same reason that lots of Ruby, or Node, or Python, or Java, or .Net, or whatever, sites are slow: the developer was very bad at querying databases.


Caching is the key here, it could still be making database requests which are blocked, unless it's using async.


DB queries in blogs (especially with this kind of traffic spike) are super correlated, you probably don't even need a cache if you batch identical reads together when the requests arrive within say 10 or 100ms of each other. Not a common technique though.

A proof of concept demo: https://github.com/MatthewSteel/carpool
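
The linked repo is its own thing; as a rough illustration of the same idea, Go's golang.org/x/sync/singleflight collapses concurrent identical reads into a single backend query (the post slug and fake query below are made up):

    package main

    import (
        "fmt"
        "sync"
        "time"

        "golang.org/x/sync/singleflight"
    )

    var group singleflight.Group

    // fetchPost stands in for an expensive database query.
    func fetchPost(slug string) (string, error) {
        time.Sleep(50 * time.Millisecond) // pretend DB latency
        return "<html>post: " + slug + "</html>", nil
    }

    func getPost(slug string) (string, error) {
        // Concurrent callers asking for the same slug share one fetchPost call;
        // later callers get the same result without hitting the database.
        v, err, _ := group.Do(slug, func() (interface{}, error) {
            return fetchPost(slug)
        })
        if err != nil {
            return "", err
        }
        return v.(string), nil
    }

    func main() {
        var wg sync.WaitGroup
        for i := 0; i < 100; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                getPost("we-dont-use-kubernetes")
            }()
        }
        wg.Wait()
        fmt.Println("100 concurrent requests, only a handful of simulated DB hits")
    }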


I'm actually curious as to how big of an issue this actually is. HN is a relatively niche site. Do we actually have the power to take down a blog of a tech company?


Happens regularly. I am surprised you haven’t seen it happen before, given that you’ve been on HN since at least 2011.


I moved and live in a different TZ now. Not in the American TZ. That's probably why.


I did think the same


Best HN comment ever


I wouldn't follow the path that this company took. I am a solo developer, use Kubernetes on Google cloud and I couldn't be happier. Parts of my application runs on AWS, taking advantage of what AWS does better, such as SES (Simple Email Service).

All I had to learn is Docker and Kubernetes. If Kubernetes didn't exist I would have had to learn myriad tools and services, cloud-specific tools and services, and my application would be permanently wedded to one cloud.

Thanks to Kubernetes my application can be moved to another cloud in whole or in part. Kubernetes is so well designed, it is the kind of thing you learn just because it is well designed. I am glad I invested the time. The knowledge I acquired is both durable and portable.


"No, we don’t use Kubernetes - because we bodged together a homegrown version ourselves for some reason"

Jokes aside, it sounds like they should just use ECS instead.


Agreed. I worked on a team that built out a solution like Ably described and we would run into lots of weird issues with ec2 lifecycle (often init script / configuration management issues) and deploys would take a long time (had to drain ec2 instances and bring new instances up) and there's just a lot more to manage (autoscaling groups, instance profiles, AMIs, configuration management, process management, log exfiltration, SSH keys, infra as code, etc). If you really don't want to use Kubernetes, you can get a lot of the benefits by using ECS/Fargate, but I really don't know why you wouldn't just go all-in on EKS at that point.


I would be tempted to skip the Docker part entirely and use raw EC2 instances plus Packer, or whatever tool of choice, to deploy the code. You still get autoscaling that way, and additional isolation of processes too. In addition, Amazon handles all the packing of VMs onto hosts. With the Docker container route, you're still doing some of that work yourself to minimise waste.


They aren't using containers, though. They're using pure EC2 _but_ are using Docker images as the deployment artifact.


That's.... containers. To quote the article:

> A small custom boot service on each instance that is part of our boot image looks at the instance configuration, pulls the right container images, and starts the containers.

> There are lightweight monitoring services on each instance that will respawn a required container if it dies, and self-terminate the instance if it is running a version of any software that is no longer the preferred version for that cluster.

They've built a poor man's Kubernetes that they won't be able to hire talent for, that scales more slowly, and that costs more.


I didn't think they were using containerd; this phrase made me think that:

> Functionally, we still do the same, as Docker images are just a bunch of tarballs bundled with a metadata JSON blob, but curl and tar have been replaced by docker pull.

but, yes, I agree; it's a hand-made Kubernetes


Yeah the wording is confusing. But they are definitely using “docker run” rather than say “docker pull” then somehow extracting the image and executing the contents. That would be totally bonkers.


Interestingly enough I normally recommend people avoid Kubernetes, if they don't have a real need, which most don't.

This is one of the first cases where I think that maybe Kubernetes would be the right solution, and it's an article about not using it. While there's a lot of information in the article, there might be some underlying reason why this isn't a good fit for Kubernetes.

One thing that is highlighted very well is the fact that Kubernetes is pretty much just viewed as orchestration now. It's no longer about utilizing your hardware better (in fact it uses more hardware in many cases).


It takes quite a bit of work to optimize your deployments and resource requests to get good bin packing/utilization of the hardware. If you do it right there can be pretty good savings for larger deployments.

If your scale isn't high enough though I think you're right, it's often not worth the complexity and Kubernetes' overhead might actually be detrimental.
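
For what it's worth, the main knob the scheduler bins on is the per-container resources stanza. A minimal sketch (the numbers are made up; the right values depend entirely on the workload):

    # fragment of a container spec inside a Deployment/StatefulSet
    resources:
      requests:            # what the scheduler uses to pack pods onto nodes
        cpu: 250m
        memory: 256Mi
      limits:
        memory: 256Mi      # cap memory; CPU is often left without a limit to allow bursting

Set requests too far above real usage and you strand node capacity; set them too far below and nodes end up overcommitted.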


> Interestingly enough I normally recommend people avoid Kubernetes, if they don't have a real need, which most don't.

I would have agreed with this statement 2 years ago but now I think K8s has been commoditized to the point where it makes sense for many organisations. The problem I have seen in the past is that every team rolls their own orchestration using something like Ansible, at least K8s brings some consistency across teams


> K8s has been commoditized to the point where it makes sense for many organisations

Absolutely, but we still need to lose the control plane for it to make sense for small scale systems. Most of my customers run on something like two VMs, and that's only for redundancy. Asking them to pay for an additional three VMs for a control plane isn't feasible.

I think we need something like kubectl, but which works directly on a VM.


A lot of this new cloud stuff doesn't really scale down, it reminds me of Oracle, back in the day.

I had something like a desktop PC with 1 GB of RAM, maybe 20 years ago, I don't remember exactly how long. It was an average machine, not high-end but not low-end either. Once I installed Oracle DB, it was idling at 512 MB. Absolutely no DBs and no data set up, just the DB engine, idling at half the RAM of an average desktop PC.


Funny, I remember doing exactly that and having the same experience with an Oracle database.


Single-node K8s is a thing (though slightly non-standard); you just have to untaint your “control plane” node. I'm not 100% sure, but I think that works with multiple control-plane nodes too.
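
Concretely, on a kubeadm-style single-node cluster you either remove the control-plane taint (something like kubectl taint nodes --all node-role.kubernetes.io/control-plane- ; on older releases the key is node-role.kubernetes.io/master), or you let the workloads tolerate it. A sketch of the latter, as a pod-spec fragment:

    # allow these pods to be scheduled onto control-plane nodes
    tolerations:
    - key: node-role.kubernetes.io/control-plane
      operator: Exists
      effect: NoSchedule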


I find it awfully sad and depressing how people on the internet pile onto a company that presents an opposing view without reading or discussing its choices further.

Instead it is easier to critique the low-hanging fruit than to discuss their actual reasons for not using this 'kubernetes' software.

So is their blog the main product? If not, then the 'their blog gone down lol' quips are irrelevant.

I found their post rather interesting and didn't suffer any issues the rest of the 'commenters' are facing.


Count how many times the words "custom $something" are mentioned in this article and you have a pretty strong case for using Kubernetes.


Three*? That doesn't seem that much of an indictment

* Well, four, but one of the mentions of "custom x" is talking about the k8s way of doing things.


ably ceo: Our website is throwing 500 errors!!!

ably cto: Go talk to the CMO, I can't help.

ably ceo: What!!

ably cto: Remember when you said the website was "Totally under the control of the CMO" and "I should mind my own business"? Well I don't even have a Wordpress login. I literally can't help.


We use AWS Elastic Beanstalk with Docker images for our app infrastructure and it has served us well. I think it ends up being similar in that it pushes Docker images to EC2 instances. It may not be cutting edge, but it affords us all the conveniences of Docker images for dependencies without needing the deep knowledge (and team resources) Kubernetes often requires.


Totally agree, it can scale to a lot of requests before you need to level up to container orchestration, but to be fair it is not competing in the same space as K8s.


Do you use the "Multi-container Docker running on 64bit Amazon Linux/2.26.2" platform?


Our platform is relatively simple so we are able to use Amazon Linux 2 Docker platform.


I'm all for not using Kubernetes (or any other tech) just because, but seeing their website giving 500s... I can't help but hear all the k8s crowd's laughter. :(

Google Cache doesn't work either https://webcache.googleusercontent.com/search?q=cache:YECd_I...

Luckily there's an Internet Archive https://web.archive.org/web/20210720134229/https://ably.com/...


I think the laughter is mostly just fun. I don't think anyone laughing believes K8s would have auto prevented the issue.

In fact, there is probably a growing problem: the number of ways this could have been solved in Kubernetes is slowly approaching the number of ways it could have been solved on good ol' bare metal/EC2s. Kubernetes != non-bespoke system.

I like the fact, however, that there do at least exist standard primitives in Kubernetes for horizontal scaling (if that was the issue, as insinuated by the aforementioned laughing), since the best thing Kubernetes does for me is help me keep my cloud vendor (aws, gcp, azure, etc.) knowledge shallow for typical web hosting solutions.


Using Docker on EC2 but not Kubernetes is like using a car but deciding you will never use the gas or brake pedals. At that point you might as well walk (simpler) or fly (use Kubernetes).

They have essentially built a half-mocked-up Kubernetes without any of the benefits.


As someone working in a place where we did the same (and it was mainly my idea), I beg to differ. You don't have to learn or cope with all the new low-level k8s plumbing, especially pre-EKS. Just plain old EC2, in an immutable fashion, using docker as your package manager instead of baking an RPM or DEB. And even with the advent of EKS in AWS there are still a lot of overlapping functionalities and new failure modes (HPA and its metrics/limited rules vs. EC2 ASG rules, CoreDNS, etc.).


That's not really true. Running a service and orchestrating a fleet of services are two entirely separate problems. The orchestration requirements for many workloads are different than what k8s provides.

If your orchestration needs aren't "use every last % of server capacity" you might not need k8s.


I’m not sure I follow. Kubernetes is “just” an orchestrator for handling containers. This company use Docker Engine directly for handling their containers. They still get the benefit of containers even if they don’t happen to use Kubernetes. What’s wrong with that?


Docker Engine is not a scheduler. It’s a container runtime. It doesn’t have visibility over all your VMs, networking etc.


Indeed. But in the article he mentioned that they have other tools for doing that kind of stuff.


Kubernetes is not for orchestrating container workloads, it's for making orchestrated containers __portable__.


Portable between what?


They are using CloudFormation, which predates Kubernetes and is meant to solve the same problem. CloudFormation still uses Docker; originally it used AMIs. I've worked for a lot of Fortune 500 companies that don't use Kubernetes but do use Docker.


I'll take a walk most of the time (virtual instance provisioned with shell script). Flying yes, but only with pilots and the crew - no way in hell I'll start tinkering with it on my own as a sidetask.


I might use a different analogy -- it might be more like using a bike versus a motorcycle. You can run a delivery business on a bike, but you'd be able to do it with much less tinkering, if you were using a motorcycle.


If you use AWS, just use Fargate. Fargate is the parts of Kubernetes you actually want, without all the unnecessary complexity, and with all the AWS ecosystem as an option. It even autoscales better than Kubernetes. It's cheaper, it's easier, it's better. If you can't use Fargate, use ECS. But for God's sake, don't poorly re-invent your own version of ECS. If you're on AWS, and not using the AWS ecosystem, you're probably wasting time and money.

And if you eventually need to use Kubernetes, you can always spin up EKS. Just don't rush to ship on the Ever Given when an 18 wheeler works fine.


"we would be doing mostly the same things, but in a more complicated way" is a pretty good summary considering their use cases seem to be well covered by autoscaling groups (which, incidentally, are a thing other clouds have)

It's OK to not use k8s. We should normalize that.


This reads as though they don't have any experience with it and decided to roll their own. I had no experience with kube 3 months ago; we had a Rancher 1 cluster with 30-50 services, and I migrated it, just me. We ended up on EKS with the CNI (pod networking), using the LB controller with IP target types and a TargetGroupBinding for ingress, and it's great. Each pod gets its own secondary IP on the EC2 instance automatically. I'm deploying Rancher as a management UI.

I also now have a k3s cluster at home. The learning curve was insane, and I hated it all for about 8 weeks, but then it all just clicked and it's working great. The arrogance of rolling your own without fully assessing the standard speaks volumes. Candidates figured that out and saw the red flag. Writing your own image bootstrapper... what about all the other features, plus the community and things like Helm charts?
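
For anyone wondering what the TargetGroupBinding part looks like: it's a small CRD from the AWS Load Balancer Controller that attaches a Service's pods to an existing target group. Roughly (names and the ARN are placeholders):

    apiVersion: elbv2.k8s.aws/v1beta1
    kind: TargetGroupBinding
    metadata:
      name: api-tgb
    spec:
      serviceRef:
        name: api                # the Service whose endpoints get registered
        port: 80
      targetGroupARN: arn:aws:elasticloadbalancing:...   # existing ALB/NLB target group
      targetType: ip             # register pod IPs directly (works with the VPC CNI)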


Site down. Perhaps they forgot to set autoscaling on the pods.


Read this as 'sit down'.


I don't use K8s either, but we at least use an orchestration tool.

ECS Fargate has been awesome. No reason to add the complexity of K8s/EKS. We're all in on AWS and everything works together.

But this... you guys re-invented the wheel. You're probably going to find it's not round in certain spots too.


Good. Kubernetes is a swiss army knife, and I just need a fork and a spoon. Sure, the swiss army knife comes with a fold-out fork and a spoon, but then I have to build up institutional knowledge around where they are, and how to avoid stabbing myself with the adjacent blades.


"How we made our own container orchestration infrastructure over AWS. It doesn't have all Kubernetes features, but hey! we can then train our own employees on how it works.

And we got to maintain the code all by ourselves too! It might take a bit too long to implement a new feature, but hey! its ours!"

Really, Kubernetes is complex, but the problem it solves is even more complex.

If you are ok solving a part of the problem, nice. You just built a competitor to google. Good luck hiring people who come in already knowing how to operate it.

Good luck trying to keep it modern and useful too.

But I totally understand the appeal.


I think that Kubernetes has advantages for many small services but a few large services are still worth managing directly on bare machines/VMs.

Where I disagree with this article is on Kubernetes stability and manageability. The caveat is that GKE is easy to manage and EKS is straightforward but not quite easy. Terraform with a few flags for the google-gke module can manage dozens of clusters with helm_release resources making the clusters production-ready with very little human management overhead. EKS is still manageable but does require a bit more setup per cluster, but it all lives in the automation and can be standardized across clusters.

Daily autoscaling is one of those things that some people can get away with, but most won't save money. For example, prices for reservations/commitments are ~65% of on-demand. Can a service really scale so low during off-hours that average utilization from peak machine count is under 35%? If so, then autoscale aggressively and it's totally worth it. Most services I've seen can't actually achieve that and instead would be ~60% utilized over a whole day (mostly global customer bases). The exception is if you can scale (or run entirely with loose enough SLOs) into spot or preemptible instances which should be about as cheap as committed instances at the risk of someday not being available.


The hard thing about Kubernetes is that it makes it sound easy to do something/anything. It does, and because it's "easy" people tend to skip understanding how it works: that's where the problems occur.

It _forces_ you to become a "yaml engineer" and to forget the other parts of the system. I was interviewed by a company, and when I replied that the next step I could take would be to write some operators for the ops things, they simply rejected me because I'm too experienced lolz


Exactly, now you have people making an EKS cluster and then deploying a vendor supplied Helm chart to it. When something breaks they literally have no idea how to start fixing it. It’s deceptively “easy”


> This has been asked by current and potential customers, by developers interested in our platform, and by candidates interviewing for roles at Ably. We have even had interesting candidates walk away from job offers citing the fact that we don’t use Kubernetes as the reason!

I celebrate a diversity in opinion on infrastructure but… if I was a CTO/VP of engineering and I read that line, that would be enough to convince me to use kubernetes.


Someone walking away from an offer because the company doesn't use unnecessary tool X is potentially a sign of a flaky engineer. I'd hesitate before working at any company with fewer than 100 engineers that does use k8s, because it's way more complexity and make-work than is appropriate for small services and teams.


Would you use the same reasoning to adopt Java/Cobol/NodeJS/...?


ECS has continued to be great for us. I haven't run Kubernetes in production, but from my perspective, we have everything we would need from K8S with only a fraction of the effort. I've also been able to do some fun things via the AWS API that may have been challenging or even impossible with K8S (again, I may be naive here)


I think the killer feature of Kubernetes is really the infrastructure as code part -- it just makes it very easy to spin services up or down as desired without thinking too hard about it. But as the article alludes to, if you're comfortable with the lock-in, you can get that from your cloud provider with tighter integration.
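
As a minimal sketch of what that looks like (names and image are illustrative): the whole service is one declarative manifest, and "spinning up or down" is editing replicas and re-applying, or running kubectl scale deployment/api --replicas=10.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: api
    spec:
      replicas: 3                  # change this and re-apply to scale
      selector:
        matchLabels:
          app: api
      template:
        metadata:
          labels:
            app: api
        spec:
          containers:
          - name: api
            image: registry.example.com/api:1.0.0   # placeholder image
            ports:
            - containerPort: 8080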


> if you are still deploying to plain EC2 instances, you might as well be feeding punch cards.

This type of language isn't something that should come out of a company, and may be a signal that there are other reasons developers refused to offer their services other than they just don't use K8s.


I think many people new to Kubernetes get intimidated by its perceived complexity. It has so many resources, huge manifests, and a billion tools, with more coming online by the day. I was a huge Kubernetes hater for a while because of this, but I grew to love it and wouldn't recommend anything else now.

I'm saying this because while their architecture seems reasonable, albeit crazy expensive (though I'd say it's small-scale if they use network CIDRs and tags for service discovery), it also seems like they wrote this without even trying to use Kubernetes. If they did, it isn't expressed clearly by this post.

For instance, this:

> Writing YAML files for Kubernetes is not the only way to manage Infrastructure as Code, and in many cases, not even the most appropriate way.

and this:

> There is a controller that will automatically create AWS load balancers and point them directly at the right set of pods when an Ingress or Service section is added to the Kubernetes specification for the service. Overall, this would not be more complicated than the way we expose our traffic routing instances now.

> The hidden downside here, of course, is that this excellent level of integration is completely AWS-specific. For anyone trying to use Kubernetes as a way to go multi-cloud, it is therefore not very helpful.

Sound like theoretical statements rather than ones driven by experience.

Few would ever use raw YAMLs to deploy Kubernetes resources. Most would use tools like Helm or Kustomize for this purpose. These tools came online relatively soon after Kubernetes saw growth and are battle-tested.
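
For example, a bare-bones kustomization.yaml layered over plain manifests might look like this (file names and image are hypothetical); kubectl apply -k renders and applies it, and per-environment overlays work the same way:

    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    resources:
      - deployment.yaml
      - service.yaml
    images:
      - name: registry.example.com/api   # must match the image name used in deployment.yaml
        newTag: "1.0.1"                  # bump the version without editing the manifest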

One would also know that while ingress controllers _can_ create cloud-provider-specific networking appliances, swapping them out for other ingress controllers is not only easy to do but, in many cases, can be done without affecting other Ingresses (unless they are using controller-specific functionality).
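
To make that concrete: swapping controllers largely comes down to the ingress class. In a sketch like the one below (host and service names invented), changing ingressClassName from, say, alb to nginx hands the route to a different controller without touching the rest of the spec:

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: api
    spec:
      ingressClassName: alb          # e.g. AWS Load Balancer Controller; swap for another class
      rules:
      - host: api.example.com
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api
                port:
                  number: 80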

I'd also ask them to reconsider how they are using Docker images as a deployment package. They're using Docker images as a replacement for tarballs; this is evidenced by them running their services directly on EC2 instances. I can see how they arrived at this (Docker images are just filesystem layers compressed as gzipped tarballs), but because images were meant to be used by containers, dealing with where Docker puts those images and moving things around must be a challenge.

I would encourage them to try running their services on Docker containers. The lift is pretty small, but the amount of portability they can gain is massive. If containers legitimately won't work for them, then they should try something like Ansible for provisioning their machines.



> We have even had interesting candidates walk away from job offers citing the fact that we don’t use Kubernetes as the reason!

This is not too surprising. Candidates want to join companies that are perceived to be hip and with it technology-wise in order to further their own resume.


You want your skills to be transferrable (unsolicited career advice). Someone isn't going to work on your bespoke infra management system for below or at market rate when those skills won't transfer to another org (and they'll be behind the curve compared to others who have been "in the trenches" with k8s during the same window of time). Google, Facebook, and similar get a pass because they will shower you in money for their bespoke infra work (and it looks good on your CV to other orgs, typically).

Personally, you don't have to be hip, but I want to be able to have another job in hand based on the tech I was working with and the work I was working on in a matter of days when I decide to bounce (or the decision is made for me). This is just good risk and career management.

(disclosure: infra roles previously)


There is also the quality-of-work factor. Which is easier to use: a bespoke infra or Kubernetes? For the same pay I'd rather work with k8s than deal with whatever gremlins are buried in 'custom' infra.


> Google, Facebook, and similar get a pass because they will shower you in money for their bespoke infra work (and it looks good on your CV to other orgs, typically).

You left out "Their bespoke infra work tends to be the genesis of the commodity infrastructure other people are using."

It's not hard for a Googler familiar with Borg to pick up Kubernetes. It's not hard for a Facebooker with some UI work under their belt to figure out React.


OR....bear with me for a minute...they could have strong opinions about how something should be done, and if a company is not doing it that way it might be a warning sign.

If a company says they don't use any modern JS framework and stick to JQuery, I'd be out before they even explain why. Not because I want to be hip and further my own resume but because I'd hate my job if I worked there.


I wouldn't walk away from a company just because they were not using a given technology.

I would walk away if they were using a technology not fit for the job, or were actively building an in-house version based on out-dated knowledge (the use of the word "minion" in Kubernetes was deprecated in 2014 IIRC).

Think about this in another realm, like plumbing. If you were a plumber, would you accept a plumbing job if the company told you that they didn't buy materials and tools, and were instead designing and building their own in-house version?


> based on out-dated knowledge (the use of the word "minion" in Kubernetes was deprecated in 2014 IIRC).

If they decided in 2014 not to use k8s, how much effort do you expect them to spend on keeping up-to-date with the latest and greatest of a technology that they're not using?


Sure, but if you don't want to keep up to date with something, don't write an article about it. If you choose to write an article in 2021 about it, I would expect some level of updated research first.


Counter-point, these candidates foresee constant frustration and struggles because the company has a culture of NIH. Not saying that's actually the case, but since their website is down, it's kinda hard to tell.


Yeah I could totally see the director of engineering at my old company gleefully writing a smug blog post about how they don’t use technology X because they don’t go for all the hip new trends (translation: we actively avoid anything trendy even if it’s good, and we have an extreme NIH attitude). As a result, everything from search indexing to queuing to RPC to deployment is 100% homegrown and you need a training course in how to use everything, everything has specific experts that you must consult, and everything is in need of debugging sometimes when it should just be standard tools used everywhere.


Yeah I can't read the article, but as an applicant this would immediately trigger a yellow flag. If the CTO/director/whoever decided can justify it as a unique differentiator then maybe it's fine and it's a cost of doing business. If they just don't want to deal with k8s because they like their home grown thing, it's going to be a drag on the company long-term to maintain.


> If they just don't want to deal with k8s because they like their home grown thing

This is another way of saying that they don't want to read too much documentation.


That doesn't seem like the only possible reason, another one could be that it appeared the company was anti-k8s for the sake of being anti-k8s. I would walk away from a company that I felt was proud of not using the right tool for a job.

p.s. Not claiming it was the right tool for the job in this case, it would all depend on context.


I mean sure, that can be a case, but I for instance am bored of companies who waste enormous resources to reinvent public cloud offerings with their own version written in Bash/Python/Terraform etc. that constantly has to be maintained/debugged/documented and worked around.

So maybe when people hear "hey, our infrastructure is running on some 10-year-old Frankenstein cobbled together from Docker and virtual machines and AWS and on-premise servers", they just pass.


So much this.

I’m tired of having to learn a new infrastructure management/deployment tool every time I shift jobs. Tired of having to work with a company’s custom platform. If you’re deployed on Kubernetes, it would take me a day to figure out how and where everything is deployed just by poking around with kubectl. Probably around that much time to deploy new software too.


It will take you 18 months to get kubectl access on just the staging environment though, and then another 6 to track down only 90% of the obscure people and processes that generate and mutate the manifests before deploying. Finally you'll realize that the whole time it was one guy in a corner cubicle in a remote office typing custom magic commands acting as the sole link between the engineering org and the cluster.

K8s is just another giant footgun. Good and bad orgs will still be good or bad, with or without it, but it definitely amplifies the bad.


If we’re looking at bad orgs, they’re probably gonna fuck up their custom orchestration tools to the same extent.

If we’re looking at the median, most orgs will more or less work with k8s manifests, maybe with some abstraction layer on top to make it easier for developers.


I'm accessing this website/blog from Southeast Asia. It's working perfectly.

Ably is using the Ghost blogging platform (https://ghost.ably.com); I can see the requests in the network console.


They are not in a line of business where they might be deplatformed or shut down, or where AWS is a major liability. They also do not require any form of multi-cloud redundancy that is not already served by AWS.


That's great; now live with your good/bad design patterns on your own for the life cycle of your product or service. You get none of the shared knowledge and advancement from the k8s community or platform.


Scaling is a sexy problem to have. Most places create problems to solve just for the appeal. They're spending angel investors' money, not their own, so prolonging the time to production is in their favor.


Off topic but: The chat icon obscures the x to close the cookie banner. Safari on iPhone X.


Sure, you can go homegrown and probably be more efficient and cost-savvy, but you can't beat the amount of tooling, and any new sysops will have to be trained on those custom tools, compared to a pure k8s platform...


"no, we don't use Kubernetes, we are stuck in AWS ecosystem"


Employers: We are looking for an applicant with 3/5/7/10/20 years of experience in the following technologies.

Also Employers: Some of our applicants turned down our job offer, when we revealed that we don't use specific technologies!


I'm honestly curious if this is a prank from ably.com or what the content of the page was.


Well it seems a SEO farm caught their content before going down.

https://hispanicbusinesstv.com/no-we-dont-use-kubernetes/


Um....are we getting trolled by some content marketing SEO hack here?

https://www.sebastianbuza.com/2021/07/20/no-we-dont-use-kube...

Another pub/sub startup publishing almost the identical blog post also today, July 20, 2021.


Founder of Ably here. That’s odd and clearly a rip-off of our post :( Thanks for flagging.


lol, what the hell is this


Given that Ably looks like an actual product, I'm guessing the link I posted is some sketchy content repurposing farm. I don't know what recourse Ably has in this case, but that's unfortunate.


Typical not-invented-here syndrome.

Seems like a "me too, I know more than you" type of project. Looks geared towards cryptos though.


Is it really NIH if they made no attempt to build anything resembling a k8s clone themselves? They decided container orchestration is just not a major problem for them. It's a completely different model: they're embracing immutable-infrastructure principles instead of the k8s model (container versions in their stack are bound to the lifetime of the EC2 instance they're on).

This reads as simply a choice to have less complexity in their stack, which I think is admirable.


The pissing contests of "we use X" or "we don't use Y" in engineering sound hilarious for some reason. Such discussions focus on the process, but few talk about the results ¯\_(ツ)_/¯



