
Kubernetes Journey – How to setup the HAProxy Cluster with high availability - mvallim
https://itnext.io/kubernetes-journey-up-and-running-out-of-the-cloud-how-to-setup-the-haproxy-cluster-with-high-ee5eb9a7f2e1
======
q3k
I don't understand why you would use HAProxy here. Generally on k8s, you'd use
a LoadBalancer (backed by, e.g., metallb on bare metal) to route external
traffic into your cluster. This goes for both 'normal' payloads/services and
the Kubernetes API endpoint itself.
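Roughly what that looks like on bare metal (a sketch - metallb's older
ConfigMap-style config, newer versions use CRDs instead; the address range and
app name are made up):

    # metallb in layer2 mode, handing out addresses from a spare range
    # on the node network:
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: config
      namespace: metallb-system
    data:
      config: |
        address-pools:
        - name: default
          protocol: layer2
          addresses:
          - 192.168.1.240-192.168.1.250
    ---
    # Any Service of type LoadBalancer then gets a VIP from that pool:
    apiVersion: v1
    kind: Service
    metadata:
      name: my-app
    spec:
      type: LoadBalancer
      selector:
        app: my-app
      ports:
      - port: 80
        targetPort: 8080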

In addition, the quorum configuration shown here seems pretty poor - it only
uses two nodes, which is a recipe for split-brain.

~~~
pepemon
> Generally on k8s, you'd use a LoadBalancer

Are you sure? All the k8s installations I've encountered (including ones
rolled from scratch by me) use Ingress controllers as the way to get traffic
into the cluster. The community NGINX Ingress controller is the de facto
standard. The LoadBalancer service type is really only there because of k8s's
managed-cloud origins (metallb is a different beast and I wouldn't recommend
it; if you need to load balance traffic to your ingresses on on-premises
infra, it's better to do it outside of k8s). Anyway, you lose all the
flexibility and observability of Ingress solutions if you use LoadBalancer
service types directly as traffic routers to your backends.

~~~
q3k
They're complementary - LoadBalancer Services work at L3/L4, Ingresses at L7.
More often than not, in-cluster Ingress providers (e.g.
nginx-ingress-controller) will in fact use a LoadBalancer Service to actually
get traffic into the cluster in the first place.

Without being able to create LoadBalancer Services there's no easy way to get
any traffic into your cluster other than using NodePorts, and those have
plenty of shortcomings.
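To make the layering concrete, roughly (a sketch - the names are the usual
ingress-nginx ones, treat them as placeholders): the controller's pods sit
behind a LoadBalancer Service, and Ingress objects do the L7 routing behind
it:

    # The ingress controller itself is exposed through a LoadBalancer
    # Service (metallb or a cloud LB provides the external IP)...
    apiVersion: v1
    kind: Service
    metadata:
      name: ingress-nginx-controller
      namespace: ingress-nginx
    spec:
      type: LoadBalancer
      selector:
        app.kubernetes.io/name: ingress-nginx
      ports:
      - name: http
        port: 80
        targetPort: http
      - name: https
        port: 443
        targetPort: https
    ---
    # ...and Ingress objects describe the HTTP routing behind it.
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: my-app
    spec:
      ingressClassName: nginx
      rules:
      - host: app.example.com
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80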

~~~
pepemon
NodePort is bad practice for such a case, you are right. You get additional,
unnecessary routing at the CNI level between cluster nodes just to get traffic
into the Ingress controller pod, for example. The solution here is to use
hostNetwork mode with a pool of workers dedicated solely to running Ingress
controllers. Traffic hits NGINX/Envoy/HAProxy/Traefik/whatever directly and
gets into the cluster without extra intermediaries. You do need a load
balancing solution in front of this pool of Ingress controllers, that's right,
but as I said before, this setup gives you the flexibility to set up load
balancing however you like.
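Roughly like this (a sketch - the node label/taint and image tag are made up,
and the real ingress-nginx manifests carry a lot more: args, RBAC, probes):

    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: ingress-nginx
      namespace: ingress-nginx
    spec:
      selector:
        matchLabels:
          app: ingress-nginx
      template:
        metadata:
          labels:
            app: ingress-nginx
        spec:
          hostNetwork: true                  # bind straight to the node's :80/:443
          dnsPolicy: ClusterFirstWithHostNet
          nodeSelector:
            node-role.example.com/ingress: ""    # dedicated ingress node pool
          tolerations:
          - key: node-role.example.com/ingress
            operator: Exists
            effect: NoSchedule
          containers:
          - name: controller
            image: registry.k8s.io/ingress-nginx/controller:v1.9.4  # pin whatever you actually run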

BTW, the community NGINX Ingress controller is also able to ingress raw L4
(TCP) traffic into the cluster.
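The TCP part is driven by a single ConfigMap that maps external ports to
namespace/service:port entries, along these lines (names made up):

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: tcp-services
      namespace: ingress-nginx
    data:
      "5432": "databases/postgres:5432"
      "6379": "cache/redis:6379"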

~~~
q3k
> You need a load balancing solution for this pool of Ingress controllers,
> that's right, but as I said before, this setup gives you the flexibility to
> cook load balancing as you desire.

Sure, but this means that you cannot use LoadBalancers, which is painful. It
means every payload has to be configured both at the k8s level and then again
externally. That somewhat defeats the use of k8s as a self-service platform
within an organization (other dev/ops teams need to go through a centralized
ops channel to get traffic ingressing into the cluster if for some reason they
can't use an Ingress).

> BTW, the community NGINX Ingress controller is also able to ingress raw L4
> (TCP) traffic into the cluster.

Yes, but it's configured via a single ConfigMap (which limits self-service if
you're running a multi-tenant, org-wide cluster, unless you bring your own
automation), and you still only have one namespace of ports, i.e. a single
external address for all ports - unless you complicate your 'external' LB
system even further.

With all these caveats, I really don't understand why not just run metallb.

------
blyry
When we migrated to k8s we stuck with haproxy instead of using an Ingress, for
some of the reasons others have outlined already -- we've been running haproxy
for a decade. Our configurations are tuned for our applications, we know the
CPU usage and failure modes, and haproxy 1.9/2.x support for SRV records made
it really easy. Being able to trust k8s and our health checks and drop our
previous 3-VM + VRRP setup was a no-brainer!
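For the curious, the SRV bit is roughly this kind of config (a sketch -
resolver address and service name are made up): haproxy keeps refreshing the
backend list from DNS, so pods coming and going don't need a config reload:

    resolvers kube-dns
        nameserver dns1 10.96.0.10:53
        accepted_payload_size 8192
        hold valid 10s

    backend my-app
        balance roundrobin
        server-template app 10 _http._tcp.my-app.default.svc.cluster.local resolvers kube-dns check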

~~~
blyry
That got us onto k8s faster and gave us time to evaluate whether we /really/
needed a service mesh (we don't, but the tracing is nice so we might still add
istio yet). We may move to Ingress-based solutions eventually, but our
ecosystem is big and there are bigger fish to fry for now!

------
redwood
Anyone wonder if k8s is about to enter its trough of disillusionment phase?

~~~
Nux
Indeed. I'm still holding off, in the hope it will go away (like OpenStack),
but it does seem to have more stamina as a project.

~~~
arvinsim
Sticking to Docker for local projects for now

~~~
marshmellowtest
No thanks. It's a huge security dumpster fire as well.

~~~
pepemon
Well, if you're uncomfortable with its privileged daemon, you can always
switch to CRI-O and Red Hat's tooling for it. But in all my years with Docker
as the container runtime, every security-related problem has occurred in the
backend code, not in Docker, not in Linux cgroups, not in Linux itself.

~~~
freedomben
I've worked with some big customers in the financial industry, and this is
exactly what we do. Podman implements the same CLI as docker, so you can
basically just `s/docker/podman/g` (as long as you don't use docker-compose).

It's also a lot easier to debug and see what's happening without that daemon
sitting in the middle of all the traditional Linux tools.

------
mvallim
Please, before making your criticisms, understand the purpose of what was
written.

The official documentation itself addresses this.

https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/#create-load-balancer-for-kube-apiserver
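(The load balancer those docs ask for is just something that balances TCP 6443
across the control plane nodes - in haproxy terms, roughly the sketch below,
with made-up addresses. The corosync/pacemaker part of the article is about
keeping that balancer itself from being a single point of failure.)

    frontend kube-apiserver
        bind *:6443
        mode tcp
        option tcplog
        default_backend control-plane

    backend control-plane
        mode tcp
        option tcp-check
        balance roundrobin
        server master-1 10.0.0.11:6443 check
        server master-2 10.0.0.12:6443 check
        server master-3 10.0.0.13:6443 check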

------
vinay_ys
If you have an L3 IP network within your datacenter, doing BGP anycast of the
VIP to a bunch of HAProxy servers is the best way to go. With ECMP you can
have your cake and eat it too – all of your HAProxy nodes handle traffic, and
when one of the nodes dies, the other nodes pick up the load nicely.
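(Incidentally, this is essentially what metallb's BGP mode gives you from
inside the cluster - a sketch, with made-up peer address and ASNs:)

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: config
      namespace: metallb-system
    data:
      config: |
        peers:
        - peer-address: 10.0.0.1
          peer-asn: 64512
          my-asn: 64513
        address-pools:
        - name: default
          protocol: bgp
          addresses:
          - 192.0.2.0/24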

------
therealmarv
Everything old is new again? If there is a good, easy method in Kubernetes, as
q3k describes, then why should somebody go for this over-engineered approach?

~~~
hijinks
As someone who works in operations... this is pretty common thinking: let's
keep doing things inside Kubernetes the way we did them on bare metal.

They are stuck in the same mindset and just think Kubernetes is an automation
tool.

I see it all day long as I interview people.

~~~
notyourday
> As someone who works in operations... this is pretty common thinking: let's
> keep doing things inside Kubernetes the way we did them on bare metal.

Because haproxy and nginx are proven technologies with a limited set of very
well known failure modes - there are maybe 5-6 well documented, well
understood ways haproxy and nginx can fail.

Experienced ops people understand that you don't optimize for blue skies --
while "everything works wonderfully", all technologies that aren't completely
broken perform at approximately the same level. Instead, they optimize for
quick recovery from the "it is not working" state.

~~~
tyingq
The article, though, suggests also adding corosync and pacemaker - so that's 4
things on top of an already complex k8s setup. I bet someone later throws in a
service mesh as well. Imagine troubleshooting all that.

~~~
wtarreau
It's exactly in order to see through this fog that you want to install a layer
7 load balancer designed with observability in mind. Seeing what's happening
is critical when everything changes by itself under your feet. With a
component like haproxy you get accurate and detailed information about what's
happening and about the cause of occasional failures, allowing you to address
them before they become the new normal.

~~~
tyingq
That explains nginx. But running a corosync cluster inside a k8s cluster, with
a separate pacemaker cluster resource plane, etc.? No thanks.

~~~
darkwater
1000 times this. Corosync and pacemaker alone are more or less as complex as
K8S itself. Well, I'm exaggerating a bit, but really: every HA cluster built
with corosync that I've seen in the past 10 years ended up failing anyway (and
with fireworks!) one way or another. Add this on top of Kubernetes? No,
thanks. Life is stressful enough.

~~~
jskrablin
Yup. Corosync + Pacemaker can and will implode in spectacular fashion exactly
when you don't want them to, and a 2-node cluster will split-brain sooner
rather than later. I'd rather use keepalived if required, since it's a lot
easier to understand and manage.
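The whole job in keepalived terms is roughly this much per node (a sketch -
interface and VIP are made up; the backup node gets state BACKUP and a lower
priority):

    vrrp_instance haproxy_vip {
        state MASTER
        interface eth0
        virtual_router_id 51
        priority 101
        advert_int 1
        virtual_ipaddress {
            10.0.0.100/24
        }
    }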

