
Very misleading title, was hoping for a more substantive read. Kubernetes itself wasn't causing latency issues, it was some config in their auth service and AWS environment.

In the takeaways section, the author blames the issue on merging together complicated software systems. While absolutely true, this isn't specific to k8s at all. To specifically call out k8s as the reason for latency spiking is misleading.

Ok, we've replaced "Kubernetes" with a more accurate and representative phrase from the article. If someone suggests a better title, we can change it again.

I think the point is that this was a real problem that happened because of combining k8s and AWS, which is a pretty common scenario. And it underscores that the bug was hard to find - I'm not sure how many people on my team would be comfortable looking deeply at both GC and Wireshark. It required asking "Why" a few more levels deep than bugs usually require, and I think a lot of developers would get stumped after the first couple of levels. So it's another piece of data suggesting that a proper k8s integration is not as easy as people might expect.

I also get the sense that that team has a better-than-average allocation of resources. On some teams I've been on, this type of problem would fall to one person, who would be expected to solve it within an afternoon, with impatient product people and managers checking for status after that.

This is exactly what happens when you abstract away anything, in this case, infrastructure. Most of the time people focus on the value added by the abstraction. This time somebody had to face the additional burden that it introduced, which made the bug harder to track down.

I really like the title and I don't think it's link-bait-y.


Because too many engineers I've worked with would bump into this situation and this would be their answer. They wouldn't take the time to debug the situation deeply enough and they'd blame k8s, or blame the network, or blame...

In my experience, the most common issues with complex distributed systems are much more likely to stem from misconfiguration, born of a limited understanding of the systems involved, than from core, underlying bugs. And I believe that's why some engineers shy away from otherwise valuable frameworks and platforms: they have a natural and understandable bias toward solving problems via engineering (writing code) rather than via messing with configuration parameters.

Hi, author here. That was exactly the intent of the title: it reflects the reaction we (almost always) get from developers, "k8s is at fault", while the result of most investigations is "not really". I try to make that evident in the conclusions, but I agree that, without realizing that intent, the title is misleading.

> this isn't specific to k8s at all.

I don't know. k8s is pretty complicated. How many small/medium apps need more than this nginx/Terraform/Docker example? This would be a lot more difficult to set up in k8s (pods, ingress, etc.)


resource "docker_container" "app" {
  count = 2

  name     = "app-${count.index}"
  hostname = "app-${count.index}"
  image    = "app:${var.app_version}"
  restart  = "always"

  env = [
    ...
  ]

  ports {
    internal = parseint("${var.app_port}", 10) + count.index
    external = parseint("${var.app_port}", 10) + count.index
  }

  networks_advanced {
    name    = "${docker_network.private_network.name}"
    aliases = ["app-${count.index}"]
  }

  depends_on = [
    ...
  ]
}


http { ...

  upstream app {
    server app-0:3000;
    server app-1:3001;
  }

  server {
    listen 80;

    root /usr/share/nginx/html;

    location ~* ^/api/ {
      rewrite ^/api/(.*) /$1 break;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header Host $host;
      proxy_http_version 1.1;
      proxy_pass http://app;
    }
  }
}
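
For anyone trying the Terraform snippet above: it refers to var.app_version, var.app_port and docker_network.private_network, none of which are shown. A minimal sketch of those declarations might look like the following. Everything here is an assumption: the kreuzwerker/docker provider, the image tag, the network name, and the port default of 3000 (inferred from the nginx upstream, where app-0 listens on 3000 and app-1 on 3001, i.e. the base port plus count.index).

```hcl
# Assumed declarations; none of these appear in the original comment.

variable "app_version" {
  type    = string
  default = "latest" # hypothetical image tag
}

variable "app_port" {
  type    = string # a string, because the snippet passes it to parseint()
  default = "3000" # assumed from the nginx upstream (app-0:3000, app-1:3001)
}

resource "docker_network" "private_network" {
  name = "private_network" # hypothetical network name
}
```

With count = 2 and this default, the containers would be named app-0 and app-1 and would expose ports 3000 and 3001, which is what the nginx upstream block expects.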

I don't understand the point of this comment.

Is there some epidemic of developers setting up complex, multi-node Kubernetes clusters just to run a small web app?

How many developers need to be running multi-node complex Kubernetes clusters?

Well, it was a service they were running to deal with differences between EC2 and containers in terms of AWS auth, so it is relevant.

Yes it’s a total linkbait title. I guess it worked because it’s on the front page.

A better way to complain about a title is to suggest a better one—i.e. an accurate and neutral alternative—so we can replace it.

I like the analysis but I hate the title. This wasn’t an editor rewriting a boring title for more clicks either. The editor was the writer.

If k8s is a complicated software system, then it's specific to k8s (among other systems).

Specific to k8s would mean it only affects k8s, which is false. It affects a large group of systems, of which k8s is one.

Better (more accurate) title would have been: "Merging complicated software systems made my latency 10x higher"

But with that title the author wouldn't have gotten those juicy clicks

I think that’s exactly OP’s point. We need to stop reinforcing this kind of behavior.

So true hahaha
