Naming. Hard problem in computer science.
With the plethora of options, what's missing for me is which ones perform well under heavy load. It's painful to find that out after the fact, since each ingress controller deploys in differing (and sometimes incompatible) ways.
For example, with nginx-ingress there are gotchas under heavy load. nginx-ingress doesn't support SSL session caching on the upstream side (nginx<->your pod); this is a deficiency in the Lua balancer implementation. You can tune keep-alive requests on the upstream, but that isn't always enough. Losing the ~50% CPU savings from SSL session resumption is costly at times.
This has bitten me when a client-side connection burst requires a connection burst in nginx<->service. Upstream services then burn a lot of CPU negotiating SSL, to the detriment of request processing. Slower request processing then causes even more nginx connections to open, and can make health checks fail. There just aren't enough tuning parameters to control how hard nginx hits your upstream pods.
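For what it's worth, the knobs I mean are the ingress-nginx ConfigMap options for upstream keep-alive; a rough sketch below (the values, ConfigMap name and namespace are illustrative, not recommendations):

```yaml
# Sketch: ingress-nginx ConfigMap tuning for upstream keep-alive
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  upstream-keepalive-connections: "512"   # idle keep-alive connections to upstreams cached per worker
  upstream-keepalive-requests: "10000"    # requests served over one upstream connection before it is closed
  upstream-keepalive-timeout: "60"        # seconds an idle upstream connection stays open
```

Even with these turned up, there's no way to recover what SSL session resumption would have saved you.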
That issue also says that an easy way is coming in 1.9.
Proxy Protocol is the only real, standards-compliant way to preserve client information (IP address, etc.) while traffic is moving inside a Kubernetes cluster. Not all ingresses support injecting it. Most ingresses can read it (assuming a cloud load balancer has inserted it already).
We had moved to haproxy for this reason.
Is Proxy Protocol different to making sure to add client information as an extra header?
At least that's what we normally do: we make sure that client information like the client IP is added as header fields, which can easily be used in monitoring and so on.
It wraps a TCP connection (not just HTTP) and forwards it, and preserves the originating IP information as a small header on the proxied connection.
It's supported beyond HAProxy, too; e.g., AWS ELBs support it. I've used it when forwarding TLS connections, but not wanting to lose the remote IP information. (And not wanting to decrypt on the ELB, which is required to add, e.g., an HTTP Forwarded-For header.)
If you add your own header and the next hop is Apache or some other software, for example, it won't know what you did.
Proxy Protocol is a standard. Every piece of software that supports it reads and writes it the same way.
It's supported beyond self-hosted software, too. All the major clouds (AWS, GCP, Azure) support it, as do third-party services like Cloudflare - https://developers.cloudflare.com/spectrum/proxy-protocol
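To make that concrete in a Kubernetes context, here's a sketch of one common pairing: ingress-nginx reading the PROXY header (via its ConfigMap) behind an AWS classic ELB that writes it (via a Service annotation). Names and namespaces are illustrative; other clouds and controllers spell this differently:

```yaml
# Sketch: ingress-nginx expects the PROXY header on incoming connections
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  use-proxy-protocol: "true"
---
# Sketch: ask the classic ELB in front to prepend the PROXY protocol header
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*"
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: ingress-nginx
  ports:
  - name: https
    port: 443
    targetPort: 443
```

If the two sides disagree (one writes the header, the other doesn't expect it, or vice versa), connections simply break, which is why both ends have to be configured together.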
We are setting up our cluster and ended up going with Traefik; I just published an interview with our architect where he explains why he chose Traefik. Excuse the plug, but it's here:
The short version is that he finds it easier to set up than Nginx. I think the learning curve is an important metric that should be considered as well.
This allowed our front-end devs to run it locally when they were running local API servers.
This is why things like K3s bundle Traefik to save you the pain, but really this should be the standard. It should be swappable (as it is in K3s), but something should come already available.
Updated - Thanks for the reminder - it's ingress controller, not ingress.
Anyway - the reason K3s can come with an Ingress Controller OOTB is purely that K8s has made the architecture pluggable. Nearly every K8s-as-a-Service offering comes with its own built-in flavour of Ingress Controller (and other controllers, too); K3s is just such an implementation. K8s could ship its own standard Ingress Controller implementation, but considering how much the implementation varies between deployments, it makes sense that they've opted out of that.
Comparing K8s to K8s distributions (like K3s) is somewhat missing the point of what K8s is - a framework to build your own implementation upon, with whatever fits your deployment.
Setting up K8s clusters from scratch on bare metal (or even worse, VMs) is generally not something you should be doing, unless you really know what you're doing and have a good reason to. Think: going 'Linux From Scratch' instead of installing Ubuntu. K3s is not an alternative to K8s; it really is just a distribution for bare-metal deployments. So are Rancher, OpenShift, etc.
If you're deploying to almost any cloud provider, they have an implementation for you. If you're on bare metal, you have to make that choice yourself.
Even if you're hosting yourself you don't technically need a load balancer, you could use host ports, or node ports and it would work for some definition of work. I have in the past set up an ingress to listen on host ports of every node and it works well, if a little janky.
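That host-port setup is roughly this shape: an ingress controller run as a DaemonSet with the node's ports bound directly. A minimal sketch (names and image are placeholders):

```yaml
# Sketch: ingress controller on every node, reachable at nodeIP:80/443
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ingress-controller
spec:
  selector:
    matchLabels:
      app: ingress-controller
  template:
    metadata:
      labels:
        app: ingress-controller
    spec:
      containers:
      - name: controller
        image: example/ingress-controller:latest   # placeholder image
        ports:
        - containerPort: 80
          hostPort: 80     # traffic hitting the node on :80 reaches this pod
        - containerPort: 443
          hostPort: 443
```

You then point DNS (or an external load balancer) at the node IPs yourself, which is where the jankiness comes in.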
I'm surprised you're complaining about this, which is an easy thing to replace, and not about your choice of network plugin, which can mean reprovisioning the cluster (or similar) if you want to swap it for something different.
And to me, even if it was a full rewrite: if it quacks like a K8s, walks like a K8s, and passes K8s conformance tests, it's a K8s :). I don't require a distribution to take upstream k8s binaries as is and just orchestrate them with a configuration management system, I think a lot of value comes from slightly moving things around and changing them to fit your vision. Better that than a murder of shell scripts.
I get it, but it's often helpful to distinguish between interfaces and implementations. If you want to say that the term "kubernetes" refers to the API/interface, such that it can be implemented by any of a number of implementations, that's fine, but we should have a separate term for the default implementation.
> I don't require a distribution to take upstream k8s binaries as is and just orchestrate them with a configuration management system, I think a lot of value comes from slightly moving things around and changing them to fit your vision. Better that than a murder of shell scripts.
Distributions are nice because you often don't want to have to build out your own logging, monitoring, cert-management, secret-encryption, ingress controller, load balancer controller, volume controllers, object storage, etc, etc, etc every time you stand up a cluster. Basically for the same reasons we have Linux distributions (most people don't want to have to roll their own Linux from scratch every time), Kubernetes distributions would be nice. None of this requires "a murder of shell scripts" (but I like that collective noun) nor does it prevent you from swapping the distro-standard components for your own.
Thanks for the effort! A very nice overview, which makes choosing between load balancer implementations when looking for a specific feature a lot easier. Somehow tables like these are hard to find when you actually need them, good to know this one exists.
Typically we want to create a service mesh overlay across our applications and their services - to secure and observe the underlying service traffic - and still expose a subset of those via an API GW (and via an Ingress Controller) at the edge, to either mobile applications or an ecosystem of partners (where a sidecar pattern model is not feasible).
With Kuma and its "gateway" data plane proxy mode, this can be easily achieved via the Kong Ingress Controller, which is mentioned in this spreadsheet.
Disclaimer: I am a maintainer of both Kong and Kuma.
 - https://github.com/Kong/kong
 - https://github.com/kumahq/kuma
I couldn't find a good comparison like the one in OP about ingress controllers.
If you have questions, you can always reach out at https://kuma.io/community
The large underlying problem is that the Ingress controller is the place where people need to do a lot of very important things, and the API doesn't specify a compatible way to do those things. Even something as simple as routing ingress:/api/v1/(.*)$ to a backend api-server-v1:/($1) isn't specified. Nginx has its own proprietary way to do it. Traefik has its own proprietary way to do it. Every reverse proxy has a way to do this, because it's a common demand. But to do this in Kubernetes, you will have to hope that there is some magical annotation that does it for you (different between every Ingress controller, so you can never switch), or come up with some workaround.
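To make the /api/v1 example concrete, here's roughly what it looks like with ingress-nginx's proprietary annotations (hostname and service name are made up; Traefik and others need a completely different incantation, which is the point):

```yaml
# Sketch: ingress-nginx-specific regex rewrite - not portable across controllers
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api
  annotations:
    nginx.ingress.kubernetes.io/use-regex: "true"
    nginx.ingress.kubernetes.io/rewrite-target: /$1   # forward the captured group as the upstream path
spec:
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /api/v1/(.*)
        pathType: ImplementationSpecific
        backend:
          service:
            name: api-server-v1
            port:
              number: 80
```

Nothing about the regex, the capture group, or the rewrite is expressed in the Ingress API itself; it all lives in controller-specific annotations.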
Composing route tables is another problem (which order do the routing rules get evaluated in), and Ingress again punts. Some controllers pick date-of-admission on the Ingress resource, meaning that you'll never be able to recreate your configuration again. (Do you store resource application date in your gitops repo? Didn't think so.) Some controllers don't even define an order! The API truly fails at even medium-complexity operations. (It's good, I guess, for deploying hello-app in minikube. But everything is good at running hello-app on your workstation.)
Then there are deeper features that are simply not implemented, and seriously hurt the ecosystem in general. One big feature that apps need is authentication and authorization handled at the ingress controller level. If that was reliable, then apps wouldn't have to bundle Dex or roll their own non-single-sign-on. Cluster administrators are forced to configure that every time, and users are forced to sign in 100 times a day. But the promise of containerization was that you'd never have to worry about that again -- the environment would provide crucial services like authentication and the developer just had to worry about writing their app to that API. The result, of course, is a lot of horrifying workarounds (pay a billion dollars a month to Auth0 and use oauth-proxy, etc.). (I wrote my own at my last job and boy was it wonderful. I'm verrrrry slowly writing an open-source successor, but I'm not even going to link it because it's in such an early stage. Meanwhile, I continue to suffer from not having this every single day.)
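For the record, the per-controller workaround I mean looks something like this with ingress-nginx's external-auth annotations pointing at an oauth2-proxy style service (hostnames and service names are hypothetical; other controllers spell this entirely differently):

```yaml
# Sketch: delegating auth to an external oauth2-proxy via ingress-nginx annotations
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  annotations:
    nginx.ingress.kubernetes.io/auth-url: "https://auth.example.com/oauth2/auth"
    nginx.ingress.kubernetes.io/auth-signin: "https://auth.example.com/oauth2/start?rd=$escaped_request_uri"
spec:
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-app
            port:
              number: 80
```

It works, but it has to be wired up per app, per cluster, per controller, instead of being a guarantee the platform gives you.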
It's not just auth; it's really all cross-cutting concerns. Every ingress controller handles metrics differently. ingress-nginx has some basic Prometheus metrics and can start Zipkin traces. Ambassador and Istio can do more (statsd, OpenCensus, OpenTracing plugins), but only with their own configuration layer on top of raw Envoy configuration (and you often have to build your own container to get the drivers). The result is that something that should be easy is nearly impossible for all but the most dedicated users. The promise of containerization basically failed: if you look hard enough, you'll see you're no better off than with nginx sitting in front of your PHP app. At least you can edit nginx.conf in that situation.
My personal opinion is to not use it. I just use an Envoy front proxy and an xDS server that listens to the Kubernetes API server to set up backends (github.com/jrockway/ekglue). Adding the backends to the configuration automatically saves a lot of configuration toil, but I still write the route table manually so it does exactly what I want (rough sketch below). It doesn't have to be this way, but it is. So many people are turned off of Kubernetes because the first thing they have to do is find an Ingress controller. In the best case, they decide they don't need one. In the worst case, they end up locked into a proprietary hell. It makes me very sad.
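As a sketch of what the hand-written part looks like: the clusters are populated from Services by the xDS server, and the route table is plain Envoy config. The cluster names here are assumptions about what your Services might be called:

```yaml
# Sketch: hand-written Envoy route table; clusters come from the xDS server
route_config:
  name: local_routes
  virtual_hosts:
  - name: app
    domains: ["app.example.com"]
    routes:
    - match: { prefix: "/api/v1/" }
      route: { cluster: api-server-v1, prefix_rewrite: "/" }
    - match: { prefix: "/" }
      route: { cluster: frontend }
```

Rule order is exactly what you wrote, and the full Envoy feature set (retries, timeouts, shadowing, etc.) is available without hunting for an annotation.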
I recently tried every single K8s ingress controller, one by one, painfully. The biggest challenge was documentation; even something as simple as an example was missing for many of them. They would have an example for one use case, but not for each use case. It was incredibly frustrating.
Proxies seem to add a lot of complexity and indirection (not to mention inefficiency).
We don't use NAT. When people say RFC 1918 addresses aren't routable, they mean you can't advertise them on the public Internet; you can totally have 10/8 switched and routed internally. Even if you don't want to mess with that, it's entirely possible to set up a proxy to proxy from its public IP to another public IP, but why waste the IP space, especially in v4? I can serve 80 applications, each with half a dozen or more backend systems, from about six proxy servers. The proxy isn't running the application and connecting to the RDBMS, so each one can handle way, way more traffic than even an efficient application.
Using a proxy that's layer 7 aware also lets you do things like scale different parts of a URI tree individually. If you're not proxying, every copy of app.example.com serves everything under https://app.example.com/ even if it's one of 23 (about the most A records that will fit in UDP DNS) servers doing it. With a proxy, I can decide that https://app.example.com/freestuff/ is too busy and spikes the backends, and that the load of that and the rest of the stuff under / is too hard to scale for properly when taken together. So I just tell my proxies that everything for https://app.example.com/freestuff goes to server set A and the rest of stuff in / goes to server set B. Then the different performance and different demand for the two can be studied, improved, monitored, and reported separately at the VM or container (or even bare metal backend) level.
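(In Kubernetes terms, that split is one of the few things the Ingress API does express directly. A sketch, with freestuff-backend and main-backend as made-up Service names:

```yaml
# Sketch: /freestuff goes to one set of pods, everything else to another
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app
spec:
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /freestuff
        pathType: Prefix
        backend:
          service:
            name: freestuff-backend   # scaled and monitored independently
            port:
              number: 80
      - path: /
        pathType: Prefix
        backend:
          service:
            name: main-backend
            port:
              number: 80
```

Each backend set can then be scaled, studied, and reported on separately.)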
I can also throw memcached or Redis into the mix and do site-wide limiting on an IP that's scanning or attacking my entire laundry list of applications. Even without those I can rate-limit what each proxy will accept from a single visitor globally or per backend type.
I appreciate that proxying gives you slightly better reaction time, but this has to be weighed against the costs and outages caused by the complexity the proxying adds.
Re your app.example.com scenario: I was not thinking of serving the static assets and frontend code from the individual services, but rather having, e.g., your SPA know about the different endpoints, with resources served from their own stable services.
Re the memcached angle - the static serving can have caching by whatever mechanism and the API endpoints can also use memcached all they want, with at least as good control granularity as in the proxy case. But if you were thinking about a caching reverse proxy, I think that would be more prone to poisoning the cache with error replies than having the endpoints do their caching individually.
Again, think of all the failure prone machinery you can elide in this scenario.
Varnish rules are also not that difficult, but I said nothing of caching reverse proxy. I mentioned Memcached specifically in the context of federating traffic counts across multiple non-caching proxies.
Whether an application uses Memcached or not is irrelevant to the proxies.
End-to-end addressing with globally unique addresses is just a good idea; it's a big reason the Internet model won over competing networking technologies.
I'm not sure what you mean by fallback proxies without proxying being involved. Could you elaborate and clarify?
If you're concerned that this would still be an issue, you could easily do some measurements. There's no incentive for ISPs to break this, given the "my internet is broken" UX and how tiny DNS traffic is.
With an internal ingress controller, one can combine all ingresses into e.g. a single nginx service. That service can then be fronted by one single network LB for the whole cluster.
The implementing side for Services then has to do the appropriate operations to either get you that IP or fail creation of the Service. On GCP, if you set loadBalancerIP to a persistent, unattached public IP you own, it will automatically attach to it. I expect similar behaviour on AWS and others.
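As a sketch, that looks roughly like this (the address is a placeholder; some clouds want an annotation in addition to, or instead of, loadBalancerIP):

```yaml
# Sketch: pinning the cluster's single network LB to a pre-allocated static IP
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  type: LoadBalancer
  loadBalancerIP: 203.0.113.10   # reserved, unattached public IP you already own
  selector:
    app.kubernetes.io/name: ingress-nginx
  ports:
  - name: http
    port: 80
    targetPort: 80
  - name: https
    port: 443
    targetPort: 443
```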
How you route that traffic to your cluster is another problem.