What Is a Service Mesh? (koyeb.com)
27 points by riadsila on Oct 29, 2022 | 19 comments



"Data planes are proxy sidecars"

Wow, kubernetes has completely polluted the architecture nomenclature.

This is like saying "abstract your data access layer". Proxy sidecars? Seriously? Doesn't that imply some pointless IPC communication before invoking the ... wait for it ... SERVICE URL that actually has the data? So rather than your ... ?motorcycle? directly calling the database ("data plane"? seriously?) via the SERVICE URL, you call some stupid sidecar to do it. I mean, isn't that the EXACT OPPOSITE of what you'd want a "service mesh" to be? Some prescribed, fixed (virtual) hardware config rather than a pure service invocation?

Weird.


I have a theory that people who coin new terms that catch on stand better chances at being promoted or hired.

Maybe it's marketing running the show and I'm just an old grouch. Both are equally possible.


Having read that, I still don't get the point. Especially when running on Kubernetes, where coredns and kube-proxy provide service discovery, load balancing, and dynamic configuration with no overhead (despite the name, kube-proxy is not a proxy; it uses NAT). Especially with your own microservices, where you can put all the observability and TLS you want in the services themselves.
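For illustration, a vanilla Service object is all that machinery needs (names here are placeholders):

  apiVersion: v1
  kind: Service
  metadata:
    name: my-app               # resolvable as my-app.<namespace>.svc.cluster.local via coredns
  spec:
    selector:
      app: my-app              # kube-proxy NATs the ClusterIP to pods matching this label
    ports:
      - port: 80
        targetPort: 8080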

Is this another of those things that has valid use cases for very specific applications but everyone is mistaking for a best practice somehow?


Uniform error handling, smart retries, application-independent tracing, load shedding, layer-7 traffic splitting, etc. are benefits not included in the stock L4-based machinery.
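As a sketch of what that looks like in practice, here is roughly how retries and weighted traffic splitting are declared in Istio's VirtualService API (names and numbers are placeholders, and the v1/v2 subsets would need a matching DestinationRule):

  apiVersion: networking.istio.io/v1beta1
  kind: VirtualService
  metadata:
    name: my-service
  spec:
    hosts:
      - my-service
    http:
      - retries:
          attempts: 3
          perTryTimeout: 2s
          retryOn: 5xx           # retry only on server errors
        route:
          - destination:
              host: my-service
              subset: v1
            weight: 90           # layer-7 traffic splitting: 90/10
          - destination:
              host: my-service
              subset: v2
            weight: 10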

If all your micro services use the same base libraries, then it is true you can implement these things there.

Also, the overhead of service meshes is rapidly shrinking, with more of the work happening in the netfilter layer.


What kind of error handling are you talking about? What kind of tracing can you really do without either endpoint's cooperation? I am familiar with OpenTelemetry, which is all about carrying identifiers through multiple layers; you can't do that with "just" a smart transport.

I just don't get it. Maybe I haven't dealt with a messy enough mix of microservices.

In any case, if you want to add goodies to layer 4, you have to add overhead, by definition. Netfilter can't do any of that "layer 7 traffic splitting" or "smart retries" or even "error handling" since it operates (mostly) at layer 3.


Example error handling: retries, but only up to a certain budget, so you don't 2x traffic when there's a downstream outage. Traffic-aware routing within an AZ, so you don't pay for cross-AZ traffic unless an error causes you to retry.
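For instance, Envoy (the data plane under most of these meshes) models this as a retry budget on the cluster; a rough sketch, with placeholder numbers:

  circuit_breakers:
    thresholds:
      - retry_budget:
          budget_percent:
            value: 20.0            # retries may add at most 20% on top of active requests
          min_retry_concurrency: 3 # still allow a few retries at low traffic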


> coredns and kube-proxy provide service discovery

Is service discovery an anti-pattern? If someone is already in your house, do you want to give the burglar directions to where the valuables are kept?


Service discovery isn't about exposing a directory of services, but about mapping a name to an endpoint.


If you have pods going up and down, you need some way of discovering which ones are currently up and healthy and can be used vs which ones are not. That information has to exist somewhere.


Envoy isn't a Hashi product.


That table isn't super clear. "Creators" refers to the creators of the mesh in the first column (Consul).

They all use Envoy except Linkerd.


True, the table isn't super clear. It'd be better to list the creators before listing the proxy technology each mesh uses.


And Consul has some enterprise features, hence it's not entirely open source.


A pretentious word for a proxy server. Can't charge enterprise licensing fees for the latter.


It pains me to see that the concept of a service mesh often becomes Kubernetes-centric, because I believe service meshes can be useful even with other, simpler orchestrators (Docker Swarm or HashiCorp Nomad, or just any container runtime where you have overlay networks and such across possibly multiple nodes), or even when you don't use containers at all.

For example, it feels to me that even with something simpler like Docker Compose/Swarm and a bit of configuration for the web server of your choice (which might work as an ingress bound to ports 80 and 443, without you even needing to learn the Kubernetes Ingress Controller concept), you can get many of the benefits. I'm actually using the decidedly old-school Apache httpd web server (previously Caddy) for something like that in my own personal deployments, in front of every single one of my sites; here's a quick look at some common configurations: https://blog.kronis.dev/tutorials/how-and-why-to-use-apache-...

Let me give you a few examples (if you think these are silly, my summary is at the bottom):

> Service discovery

Docker actually provides you with DNS out of the box, so your web server can easily refer to services, like so:

  ProxyPass "/" "http://my_application:80/"
More information: https://docs.docker.com/network/
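For instance, with a compose file like this (the image name is a placeholder), the my_application name above resolves through Docker's embedded DNS, because both services share the same network:

  services:
    reverse_proxy:
      image: httpd:2.4
      ports:
        - "80:80"
    my_application:
      image: my-app:latest     # placeholder image
      # no ports exposed to the host; the proxy reaches it by service name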

> Load balancing

The above will also distribute the traffic based on how many instances you have running, from as many web servers as you have running. Throw in health checks (such as the container running curl against itself, to check that the API/web interface is available at startup as well as periodically during operation) so no traffic gets routed to your application before it can receive it, and you're good for the most part: https://docs.docker.com/engine/swarm/services/#publish-ports
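Such a health check, sketched in compose syntax (the endpoint and timings are placeholders):

  services:
    my_application:
      image: my-app:latest               # placeholder image
      healthcheck:
        test: ["CMD", "curl", "-f", "http://localhost:80/"]  # the container curls itself
        interval: 30s
        timeout: 5s
        retries: 3
        start_period: 15s                # grace period while the app boots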

> TLS encryption

Let's Encrypt as well as your own custom certificates are supported by most web servers out there rather easily; even Apache now has mod_md for automating this: https://httpd.apache.org/docs/trunk/mod/mod_md.html
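A minimal mod_md sketch (the domain is a placeholder); the module then obtains and renews the certificate on its own:

  MDomain example.com www.example.com
  MDCertificateAgreement accepted

  <VirtualHost *:443>
      ServerName example.com
      SSLEngine on
      # no SSLCertificateFile needed; mod_md manages the certificate
  </VirtualHost>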

Also, if you want, you can encrypt the network traffic between the nodes as well and not worry about having to manage the internal certificates manually either: https://docs.docker.com/engine/swarm/networking/#customize-a...
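Turning that on is a single flag when creating the overlay network (the network name is a placeholder):

  docker network create --driver overlay --opt encrypted my_network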

> Authentication and authorization

Once again, web servers are pretty good at this: you can configure most forms of auth easily, and even the aforementioned Apache now has mod_auth_openidc, which supports OpenID Connect, so you can configure it to be a Relying Party and not worry as much about letting your applications manage that themselves (given that if you have 5 different tech stacks running, you'd need 5 separate bits of configuration and libraries for that): https://github.com/zmartzone/mod_auth_openidc
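A rough mod_auth_openidc sketch (all URLs and credentials are placeholders):

  OIDCProviderMetadataURL https://idp.example.com/.well-known/openid-configuration
  OIDCClientID my-client-id
  OIDCClientSecret my-client-secret
  OIDCRedirectURI https://app.example.com/redirect_uri
  OIDCCryptoPassphrase some-random-passphrase

  <Location "/">
      AuthType openid-connect
      Require valid-user
  </Location>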

> Metrics aggregation, such as request throughput and response time, Distributed tracing

This might be a little bit more tricky! The old Apache outputs its server status with a handler that you can configure (see a live example here: https://issues.apache.org/server-status ), thanks to mod_status: https://httpd.apache.org/docs/2.4/mod/mod_status.html There's similar configurable output for the ACME certificate status as well. The logs also contain metrics about the requests, which are once again configurable.
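Enabling that handler is short (the IP restriction is a placeholder; you'd want to keep it internal):

  <Location "/server-status">
      SetHandler server-status
      Require ip 10.0.0.0/8    # placeholder: restrict to internal addresses
  </Location>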

Other web servers might give you more functionality in that regard (or you might shop around for Apache modules): Traefik and Caddy, as well as Nginx Proxy Manager, might all be good choices, both when you're looking to hook up something external to aggregate the metrics with minimal work, and when you want a dashboard of some sort, for example: https://doc.traefik.io/traefik/operations/dashboard/

> Rate limiting

In Apache, it's a bit more troublesome (other servers do this better most of the time), depending on which approach you use, but something basic isn't too hard to set up: https://httpd.apache.org/docs/2.4/mod/mod_ratelimit.html
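For example, mod_ratelimit throttles bandwidth per connection (note that it limits throughput, not request counts; the values are placeholders):

  <Location "/downloads">
      SetOutputFilter RATE_LIMIT
      SetEnv rate-limit 400    # KiB/s per connection
  </Location>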

> Routing and traffic management, Traffic splitting, Request retries

I'm grouping these together, because what people expect from this sort of functionality might vary a lot. You can get most of the basic stuff out of most web servers, which will be enough for the majority of setups out there.

Something like blue/green deployments, A/B testing or circuit breaking logic is possible with a bit more work, but here I'll concede that for the more advanced setups out there something like Istio and Kiali would be better solutions. Then again, those projects won't be the majority of the ones out there.
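For a basic weighted split in Apache, mod_proxy_balancer's load factors get you quite far (hosts and weights are placeholders), e.g. sending 10% of traffic to a new version:

  <Proxy "balancer://my_cluster">
      BalancerMember "http://app_blue:80" loadfactor=90
      BalancerMember "http://app_green:80" loadfactor=10
  </Proxy>
  ProxyPass "/" "balancer://my_cluster/"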

> Error handling

Depends on what you want to do here: custom error pages (or handlers), or something in regard to routing or checking for the presence of resources, isn't too hard and has been done for years.
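E.g. custom error pages are one directive each (the paths are placeholders):

  ErrorDocument 404 /errors/not-found.html
  ErrorDocument 502 /errors/upstream-down.html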

Summary:

But what's my point here? Should everyone abandon Linkerd or Istio? Not at all! I'm just saying that even with lightweight technologies and simpler tech stacks, having an ingress as well as something that covers most of what a service mesh would (e.g. the aforementioned Docker overlay networking, or similar solutions) can be immensely useful.

After putting Nginx in front of many of the services for projects at work, path rewriting as well as handling special rules for certain apps has become way easier, and certificate management is a breeze, since it can be done with Ansible against a single type of service, in addition to something like client certs or OIDC (though admittedly, that's mostly on my homelab, with Apache).

Once you actually grow past that, or have your entire business built on Kubernetes, then feel free to adopt whatever solutions you deem necessary! But don't shy away from things like this even when you have <10 applications running in about as many containers (or when you have some horizontal scalability across your nodes).

Note: you can still have HAProxy or some fancy WAF solution for your org, or anything of that sort as well - I was talking more about the scale of individual projects/applications here. It's really nice that you can layer whatever you need.


The point is to do all this between services, not between the user-facing apps and the users.


In the above example, that's where Docker networking would come in (load balancing, health checks, transparent encryption of traffic) for many of those concerns.

Not all of them, unless you'd want to run a web server sidecar for your apps, which would be very much like a slightly simplified implementation of a service mesh anyways!

Then again, in my eyes the "sweet spot" of simple architecture is something like: [users] <==> [ingress] <==> [service] <==> [database/message queue/key-value store] with maybe some middleware. If you have a lot of traffic between the internal services, then maybe your architecture mandates a more fully fledged solution.


You don't have to use a service mesh. What you describe is not a service mesh.

It's just a little weird to start your comment with "It pains me to see that the concept of a service mesh often become Kubernetes centric" and then describe an architecture that doesn't have a service mesh.


Fair enough, thanks for your input.



