
> Kubernetes is a solution worse than the problem.

I disagree. Maybe it's because we haven't gone "full Kubernetes" where every cronjob is now a Kubernetes cronjob, or what have you, but for all the complexity we've had to overcome, we've seen substantial benefits.

For example, autoscaling instances of my data pipeline apps based on Kafka lag.

Was it complex? Yes. We had to expose Kafka topic lag as a metric to Prometheus, then use k8s-prometheus-adapter to pull that from Prometheus and surface it as a custom metric through the k8s metrics API that HPAs (Horizontal Pod Autoscalers) scale on, then set reasonable scaling limits in the data pipeline apps' HPAs.
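For anyone wondering what the HPA does with that lag metric, here is a minimal sketch of the replica calculation described in the Kubernetes HPA documentation: desired = ceil(current replicas * current metric / target metric), skipped when the ratio is within a tolerance (0.1 by default), then clamped to the configured min/max. The function name, the lag numbers and the replica limits below are made up for illustration; the real controller also deals with unready pods, missing metrics and stabilization windows, none of which is modeled here.

```python
import math


def desired_replicas(current_replicas, current_lag, target_lag,
                     min_replicas, max_replicas, tolerance=0.1):
    """Simplified HPA-style replica calculation (illustrative, not the real controller)."""
    if target_lag <= 0:
        raise ValueError("target_lag must be positive")

    # How far the observed metric is from the target, as a ratio.
    ratio = current_lag / target_lag

    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: leave the replica count alone.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)

    # The scaling limits set on the HPA bound the result.
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical numbers: 4 pods, lag at twice the target, limits of 2..20 pods.
print(desired_replicas(4, 2000, 1000, min_replicas=2, max_replicas=20))
```

The min/max clamp is the "reasonable scaling limits" part: however bad the lag gets, the HPA never asks for more pods than the limit allows.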

Was it worth it? Fuck yes. We no longer have to worry about data arriving late at our data warehouse, because the scaling we've configured (in YAML, god rest its dirty soul) ensures that our data will always arrive at the data warehouse within 5 minutes, which is ideal for a near-real-time reporting product that greatly enables our end users.

Is Kubernetes worth it? For us, definitely. For anyone else? Really depends on your use case, don't cargo cult this shit.

How much data do your "data pipeline apps" process with Kafka, Kubernetes and probably other K-named things?

Several terabytes a day, but it varies, hence why I love the auto-scaling.

Say that's 10 TB = 10,000 GB; 10,000 / 24 / 60 / 60 ≈ 0.12 GB/s. My 2015 desktop could handle that.
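The back-of-the-envelope conversion above, as a quick sketch. It assumes 10 TB/day as a stand-in for "several terabytes" and decimal units (1 TB = 1000 GB), same as the comment; the function name is just for illustration. Note this is the average rate over a full day, not the peak.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def gb_per_second(tb_per_day):
    """Average throughput in GB/s for a given daily volume in TB (decimal units)."""
    return tb_per_day * 1000 / SECONDS_PER_DAY


# 10 TB/day is an assumed figure standing in for "several terabytes a day".
print(f"{gb_per_second(10):.3f} GB/s")  # prints "0.116 GB/s"
```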

Where do you work? I'd like to pitch my radical idea of edge consolidated cloud computing.

"Big data" frameworks are very good at throwing a lot of hardware at problems. See e.g. the classic big data vs laptop treatment: http://www.frankmcsherry.org/graph/scalability/cost/2015/01/...

The inevitable comment lol. I'm sure it could. To clarify, when the app scales, it's pods (container instances) that are scaled, not necessarily machines. Machines are managed separately by k8s, depending on how much of the cluster is currently being used.

Yep, we could handle it easily on one machine if we gave it unfettered access to all the resources, but our throughput varies with the time of day, so dedicating a machine sized for peak throughput wouldn't make financial sense at 3am. So it's far simpler to scale pods with known resource usage as needed.

It also helps prevent issues when throughput suddenly doubles because of a business decision that you're left out of the loop on.

So autoscaling pods is ideal for our use case.

We're talking about $5k of hardware to meet your needs 20 times over with the new Threadrippers.

Is your AWS bill really under $500 a month?

On AWS you would be paying for many other things besides elastic CPU power. CPU and bandwidth are infamously expensive there. (But I don't think AWS was mentioned anywhere in this thread...)

You don't need Kubernetes to do it; any cloud provider has an API that makes it possible. Kube's main point is its declarative model, which comes at a big price (like SQL did). YAML is an error-prone way to achieve anything.

We'll see if history repeats itself and we end up back with imperative systems.
