
Our Failure Migrating to Kubernetes - ahawkins
https://engineering.saltside.se/our-failure-migrating-to-kubernetes-25c28e6dd604
======
millisecond
First I'd get a tcpdump on one of the kubernetes boxes and see what's really
going on at the network level.

One hunch is that ELBs don't like ephemeral ports and Kubernetes lives on
them. Might be a mismatch here. ALBs are much more flexible with ports if
that's an option.
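
For the tcpdump suggestion, a starting point might look like this (the interface name and NodePort are placeholders, not from the article; adjust for the actual cluster):

```shell
# Capture ELB traffic hitting a service NodePort on one Kubernetes node.
# eth0 and 30080 are assumptions; substitute the real interface and port.
sudo tcpdump -i eth0 -nn -v 'tcp port 30080' -w elb-nodeport.pcap

# Then look for connection resets offline, which would point at a
# port/health-check mismatch between the ELB and the node:
tcpdump -nn -r elb-nodeport.pcap 'tcp[tcpflags] & (tcp-rst) != 0'
```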

~~~
kkirsche
Just for my sake, what do ALB and ELB mean? I assume elastic load balancer and …
load balancer?

~~~
cosmie
Application load balancer

[http://docs.aws.amazon.com/elasticloadbalancing/latest/appli...](http://docs.aws.amazon.com/elasticloadbalancing/latest/application/introduction.html)
Gives a good overview of the two types

------
sandGorgon
What tool are you using to set up k8s? I really recommend kops, because it
takes care of some cloud-specific config clashes.

For example, it uses the carrier-grade subnet 100.64 on AWS because other
subnet choices conflict with existing VPC ranges.

I have moved on from k8s to docker swarm because of the ingress complexity in
k8s. I'm hoping some of the momentum behind Istio will solve it in the future.
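
The appeal of the carrier-grade NAT block mentioned above (100.64.0.0/10) is that it cannot collide with the RFC 1918 private ranges a VPC typically uses. A quick sketch with the standard library confirms this:

```python
import ipaddress

# RFC 6598 carrier-grade NAT block that kops can use for the pod network
cgn = ipaddress.ip_network("100.64.0.0/10")

# Typical RFC 1918 private ranges used for AWS VPCs
private = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

# The CGN block overlaps none of them, so pod IPs can't clash with VPC IPs
print([cgn.overlaps(net) for net in private])  # [False, False, False]
```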

~~~
buahahaha
It's clear the ingress complexity is what's burning them here. It's just a
simple exposed-port mixup causing that error rate.
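
The kind of mixup meant here is easy to produce in a Service manifest: if `targetPort` doesn't match the port the container actually listens on, the Service still exists and health checks against other ports can still pass, but real traffic errors out. A hypothetical illustration (names and ports are invented):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-thrift-service   # hypothetical name
spec:
  selector:
    app: example-thrift-service
  ports:
    - port: 9090        # port the Service exposes to clients
      targetPort: 9091  # BUG if the container listens on 9090, not 9091
```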

~~~
ahawkins
We considered that as well. We have a smoke test for the chart that hits each
service with a generic ping RPC. I thought that was good enough until
reevaluating it. The generic ping RPC comes from an inherited service, which
means _any_ of our Thrift services could respond correctly with a pong. That
means the wrong service could be running on a given port and still pass the
smoke test. I verified this is not the case. I'll update the smoke test to use
a service-specific test RPC as a regression check for this possibility.
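
The weakness described above can be sketched like this. All names are hypothetical (the real services speak Thrift), but the idea carries over: an inherited ping proves only that *some* service answers, while a service-specific RPC proves the *right* service is on the port.

```python
class UserService:
    """Hypothetical service; ping() is inherited by every service."""
    def ping(self):
        return "pong"
    def whoami(self):          # service-specific "test RPC"
        return "user-service"

class SearchService:
    def ping(self):
        return "pong"
    def whoami(self):
        return "search-service"

def weak_smoke_test(service):
    # Passes for ANY service, so it can't detect a port mixup.
    return service.ping() == "pong"

def strong_smoke_test(service, expected_name):
    # Passes only when the expected service is actually answering.
    return service.whoami() == expected_name

# The wrong service on a port still passes the generic ping check...
print(weak_smoke_test(SearchService()))                    # True
# ...but fails the service-specific identity check.
print(strong_smoke_test(SearchService(), "user-service"))  # False
```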

