
> Kubernetes made my latency 10x higher

The title is a bit misleading: Kubernetes didn't cause the 10x latency, and latency was actually lower after they fixed their issues.

TL;DR version - They migrated from EC2 to Kubernetes; due to default settings in Kiam and the AWS Java SDK, application latency increased. After reconfiguration it was fixed, and latency on Kubernetes ended up lower than on EC2.

It is relevant though, as k8s makes everything more complicated, so you have to deal with stuff like this. Also, if it had been a brand-new app they'd maybe not have noticed the problem in the first place.

I'm not saying it's not relevant - it was actually a good technical read; they went into a lot of detail with their troubleshooting.

I'm just saying the title is misleading, and clickbaity.


>> It is relevant though as k8s makes everything more complicated

More complicated than what? Without a baseline for the comparison it's not that useful. In our case we transitioned over the last four years from running hundreds of VMs provisioned with puppet and jenkins to running K8S workloads (on a lot fewer nodes than we had VMs) provisioned with helm/kustomize and using gitlab ci/cd pipelines. In my opinion the current platform is much less complex to understand and manage than the old one was. Yes, there are layers of complexity that didn't exist in the previous platform (the k8s control plane, the whole service/pod indirection, new kinds of code resources to manage), but it's all pretty consistent and works logically, and isn't really any harder to internalize than any other platform-specific complexity we've had to deal with in the past. And in terms of day-to-day development, deployment and operations, k8s has been completely transformative for us.


I don’t think the interactions between the AWS SDK and a third-party AWS IAM plug-in are really an inherent part of k8s’s domain of control.

It’s sort of like cursing Microsoft every time an application crashes... because it’s running on Windows.


> k8s makes everything more complicated

No it doesn't.

This is only true if you don't need the features in the first place, and you haven't gone through the learning curve yet.

So you deploy a single container. Works fine. You can tell Docker to restart it. Works fine. If that's all you need, you don't need k8s.

But you may now want to deploy two containers. And let's say they are servers. Now you need incoming connections on a single port routed to both of them, so maybe you deploy something like haproxy – which you now have to learn.

Also, you now need a way to tell if your service is down. So you add a health check. But then you want to kill the container if it goes bad. So now you add a cronjob or similar.

At some point, one machine is not enough. So do you replicate the exact workloads across machines? If you do, it is a bit easier - but potentially more expensive. But if you need to spread them across different machine types, it gets more difficult. And now you also have distributed storage to keep track of. If you haven't automated your deployment, now is the time.

Now you have this nice setup. It is time to upgrade. How do you do that? Probably some custom script again.
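
To make that concrete, here's a rough sketch of what the hand-rolled version tends to look like with docker-compose (the service names, images and paths are made up for illustration):

    # docker-compose.yml - illustrative only; images and paths are hypothetical
    version: "3.8"
    services:
      lb:
        image: haproxy:2.8                 # you now own haproxy.cfg and have to learn it
        ports:
          - "80:80"
        volumes:
          - ./haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro
        restart: always
      app1:
        image: example/myapp:1.0           # hypothetical application image
        restart: always                    # "tell docker to restart it"
        healthcheck:                       # the hand-rolled health check
          test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
          interval: 30s
          retries: 3
      app2:
        image: example/myapp:1.0
        restart: always
        healthcheck:
          test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
          interval: 30s
          retries: 3

And note that plain compose only marks a container unhealthy; actually killing and replacing it still needs that extra cron job or sidecar, and spreading across machines and doing upgrades are still custom scripts on top.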

Instead, you can forget all of the above. You can spin up a k8s cluster with any of the currently available tools (or use one of the cloud providers). Once you have k8s, a lot is taken care of for you. Multiple copies of the service (with the networking taken care of for you)? Check. Want to increase? kubectl scale deploy --replicas=<n>. Want to upgrade? Apply the new YAML. Health checks and auto-restart? Add a couple of lines to your YAML. Want to have different workloads over different machine types? That's also an easy change. Want to have storage (that follows your workload around)? Easy. You are in a cloud provider like GCP and want a new load balancer (with automated SSL cert provisioning!)? A couple of lines of YAML. Want granular access controls? It's built in. I can go on.
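
For comparison, here's a minimal sketch of roughly the same thing as Kubernetes YAML (the name, image, probe path and node label are all made up for illustration):

    # deployment.yaml - illustrative sketch
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: myapp
    spec:
      replicas: 2                        # kubectl scale deploy myapp --replicas=<n>
      selector:
        matchLabels:
          app: myapp
      template:
        metadata:
          labels:
            app: myapp
        spec:
          nodeSelector:
            workload-type: general       # hypothetical label: pin workloads to machine types
          containers:
            - name: myapp
              image: example/myapp:1.0   # hypothetical image
              ports:
                - containerPort: 8080
              livenessProbe:             # health check + auto-restart, a couple of lines
                httpGet:
                  path: /health
                  port: 8080
                initialDelaySeconds: 10
                periodSeconds: 15
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: myapp
    spec:
      selector:
        app: myapp
      ports:
        - port: 80
          targetPort: 8080

Upgrading is just applying the same file with a new image tag; the rollout machinery comes for free.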

Of course, there's a learning curve. But the learning curve is also there if you're stitching together a bunch of solutions to replicate the same features. Once you get used to it, it's difficult to go back.


hah. So, title should read: 'Using default Kubernetes config settings made my latency 10x higher'

I might even go further - kiam is not a standard deployment supported by the kubernetes project like the api server or the scheduler or the autoscaler. See https://github.com/uswitch/kiam

That said, it is a very common deployment strategy on EC2 to run kiam or kube2iam. I wish the kube core teams took over the development of an AWS IAM role service, since issues like bad defaults would be solved much quicker. Your only other alternative is to use IAM access keys, and nobody likes that (security-wise, and it’s a pain to configure).
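
For context, both kiam and kube2iam work by annotating the pod with the IAM role it should assume, something along these lines (the role name and image are placeholders):

    # illustrative pod spec; role name and image are made up
    apiVersion: v1
    kind: Pod
    metadata:
      name: myapp
      annotations:
        iam.amazonaws.com/role: my-app-role   # kiam/kube2iam intercept metadata requests and serve creds for this role
    spec:
      containers:
        - name: myapp
          image: example/myapp:1.0            # hypothetical image

If I remember right, kiam additionally requires the namespace to whitelist which roles its pods may assume.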


It's built into the AWS SDK since September: https://aws.amazon.com/blogs/opensource/introducing-fine-gra...
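
That's the IAM Roles for Service Accounts (IRSA) mechanism: you annotate a ServiceAccount with a role ARN, and pods using it get web-identity credentials that the SDK picks up automatically. Roughly (the account ID and role name are placeholders):

    # illustrative ServiceAccount for IRSA; the ARN is a placeholder
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: myapp
      annotations:
        eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/my-app-role

So on EKS you no longer need kiam, kube2iam, or long-lived access keys.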

I think that responsibility falls on Amazon actually

Which isn't surprising. Kubernetes' default settings are chosen to work in all environments using overlay networking, not to be optimized for performance (make it work, then make it fast).

In other words, switching to Kubernetes _did_ make their latency higher — otherwise they wouldn't have needed to reconfigure anything, would they? If you want to help k8s, try starting a pull-request to make the defaults better rather than playing spin-doctor telling people that well-documented problems don't exist.

It wasn’t. It was KIAM and the Java SDK, neither of which are part of K8S.

KIAM is not part of the core Kubernetes system but it was a necessary component to avoid introducing a security regression as part of the switch.

Again, my point was that rather than trying to do PR damage-control it would be better to work to improve things so this doesn't happen to others. Someone went to the trouble of posting a detailed examination of a real problem, with a fix and some references to upstream improvements. That's a lot more useful than trying to draw a line between Kubernetes and one of the two tools commonly used to meet security requirements when operating it in one of the most popular cloud environments.


There are many alternatives to kiam within the K8S space.

So I would disagree that it specifically is a necessary component.


It’s not necessary to use kiam - as I mentioned multiple times - but you need something like it if you’re trying to maintain security coming from a good EC2/ECS deployment. Since it’s one of the more popular options, and Java is not uncommon, it seemed reasonable to consider this a likely pain point for many users.

It’s not PR. It’s being accurate that this has nothing to do with Kubernetes.

I would not describe a service which is only ever used on Kubernetes workers and is only necessary for code running on Kubernetes as having nothing to do with Kubernetes. The fact that you and the OP are so emotionally driven to find a way to dismiss it is what makes it sound like PR — why not just acknowledge there's a real problem which is being fixed and be glad someone documented it well enough to save other people time? Nobody is saying that Kubernetes is bad, or that you shouldn't use it, only that there's an interaction which Java users should know about if they're running on AWS.

You're being strangely aggressive about this. Nobody cares about PR here, nor is anyone denying anything. We're just being accurate.

KIAM and the Java SDK have bad timeout overlap. That's the problem; it has nothing to do with Kubernetes, and it looks like it's well documented and now resolved.


This was a well-written post about a problem a lot of people would encounter with a common Kubernetes deployment. Instead of talking about that, most of the comments were people complaining about the title.


