> Why introduce a new concept to solve the problems that have been solved before?
Well, let me try and answer that -- I use Istio in production. The reason I use it instead of, for instance, managing 30+ .p12s for each individual Java service that needs to talk to an external API via mTLS (not to mention all of those plus the 20 or so others which all automatically get mTLS 'in-mesh') is that it's an absolute shitload easier to manage.
Is it complicated? Yes. Is it more complicated than orchestrating the config management to achieve the same thing via more traditional methods? No.
Don't get me wrong, I have my problems with istio:
* Until about 1.5 the docs were at best sporadic.
* Finding out how to handle things like talking to an endpoint that does some SSL renegotiation without crapping out is still not easy
* The configuration / yaml spec for the various options is not simple to write
However, those are really offset by what we get by using it. I have logs of every external API call my entire estate (a big, complicated one) makes, I get automatic tracing, I can do some really cool shit like mirroring traffic to a standby pod for A/B testing, and none of my services need credentials for mTLS calls baked into them (they just call http:// and 'magic' happens on the egress).
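To make the mirroring bit concrete, here's a minimal sketch in Istio's VirtualService API. The service and subset names are made up, and a real setup also needs a DestinationRule defining the subsets:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-service            # hypothetical service name
spec:
  hosts:
    - my-service
  http:
    - route:
        - destination:
            host: my-service
            subset: primary    # live traffic goes here
          weight: 100
      mirror:
        host: my-service
        subset: standby        # a copy of each request is sent here
      mirrorPercentage:
        value: 100.0           # mirror everything; responses from the mirror are discarded
```

The mirrored responses are dropped by the proxy, so the standby pod sees real traffic without affecting callers.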
End of the day, it's a tool. If you don't need it, don't use it. This article comes across like it was written in the same vein as the old man screaming "get off my lawn".
Eh. I think it's fine for there to be an equal number of "You might not need ____" articles as there are "Definitely use ____". Trendy technologies can be a trap, and I think it's fair to warn others.
This article makes some really great points, and comes to a great conclusion that I 100% agree with:
> before committing to a piece of technology, it’s crucial to understand the problems it solves, and the context in which the solution was made.
On the other hand, I think the title is doing it a huge disservice. Because, unfortunately, a lot of people don't read beyond the title, and that only helps to perpetuate the anti-k8s/Istio/devops sentiment that has become so popular and pervasive lately.
Are there a lot of situations where a service mesh complicates things and causes more problems than it's worth? Absolutely. Are there a lot of situations where a service mesh will really help? Without a doubt. Should you use it when you build your personal niche side-project? Probably not. But maybe a goal of your side project is to learn Istio - in which case, you totally should use it.
Having well-set expectations & standards for building out your microservices seems like a fairly obvious way to claw back many of the advantages / normalizing behaviors that the pro-Monolith camp advocates.
This article seems to argue that making everyone use an RPC library is enough. But rarely is there good monitoring & observability built into that. Other tasks like discovery &c may seem superfluous, but they bring a well-defined set of expectations that incoming engineers stand a good chance of knowing. You're not just bringing in Monolith-style genetic material & practice, you're bringing in common genetic material that others will know & be familiar with.
The overhead seems debatable. Performance can be quite good in some cases, depending on choices of underlying tech. The author talks about the cost of maintaining the system, but in the two modest-sized environments I've seen (under 100 nodes), the transition was extremely painless & fast & required little effort, & nearly no maintenance. It also drastically reduced the effort we had spent & intended to spend instrumenting our existing services, so it felt to us like the maintenance cost was hugely negative.
The arguments about the proxies limiting your available options are somewhat interesting. For 99.999% of companies, I'd say that using RSocket or Aeron is way more fancy tech than you should be targeting. I do hope HTTP3 support emerges relatively quickly. My hope is that gRPC has some plans in the works somewhere, and that once gRPC starts to progress, the proxies will rapidly catch up.
> This article seems to argue that making everyone use an RPC library is enough. But rarely is there good monitoring & observability built in to that.
On the contrary, a proper RPC library tends to have those things built in. If your services are homogeneous enough that you can have everyone use the same RPC library, you can hook up that library's monitoring, discovery etc. functionality, and achieve the benefits of service mesh with much less overhead.
IME it's much the same for a lot of microservice scenarios - if your environment is heterogeneous then it makes sense, but in a homogeneous environment making a library instead often gets you most of the benefits at a much smaller cost.
Consistent metrics and retry/circuit-breaker behavior are my number one reasons for advocating a service mesh (see the sketch after this comment). There are other advantages, such as offering a consistent interface to some shared resource across clients in multiple languages - Redis being a prime example, with various client implementations that behave very differently.
But, like everything, there is a cost - in this case another level of complexity.
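For the retry/circuit-breaker point, a hedged Istio sketch (host names are made up) of a per-service retry policy plus outlier detection, which is roughly what the mesh applies consistently regardless of the client language:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: backend-api            # hypothetical service
spec:
  hosts:
    - backend-api
  http:
    - route:
        - destination:
            host: backend-api
      retries:
        attempts: 3            # retry failed calls up to 3 times
        perTryTimeout: 2s
        retryOn: 5xx,connect-failure
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: backend-api
spec:
  host: backend-api
  trafficPolicy:
    outlierDetection:          # circuit-breaker-style ejection of failing endpoints
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 60s
```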
I had a discussion on the complexity of service mesh with a friend two days ago. Both of us work in cluster management (Google etc.).
I feel a service mesh provides good value in application-layer policy control: traffic management (routing, rate limiting, etc.), security, and deep observability. But it nonetheless still only provides the mechanisms at this point. I.e., one can implement, with a high degree of flexibility, whatever policy they want for their applications running on k8s clusters, but the complexity is exposed naked to the user.
A solution is to provide more abstraction, but it's not clear what that abstraction would be.
My friend said a similar thing: unless there are thousands of applications to manage, a service mesh is not worth the investment. In other words, the break-even point for a service mesh today is very high - it probably only makes sense for orgs at or beyond Uber's size.
Would love to see how others view the value prop of service mesh, and what additional investment could make it easier to get value from service mesh.
I’m giving a talk [1] at SRECon about a tool [2] I’ve written that presents an alternate take on one of the major problems that a service mesh solves - encrypting internal traffic. My solution is very simple: enable your applications to use mutual TLS to authenticate with each other, by making Kubernetes put the certificates in a standard place and providing drop-in middleware so your application can be modified to use them with minimal effort. My point is the same - service mesh is great, but your middleware already does most of it. The problem is policy and automation, so we’ve solved those for you.
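Roughly, the "certificates in a standard place" idea maps onto plain Kubernetes like this - a per-service TLS secret mounted at a well-known path (all names and paths here are illustrative, not necessarily the tool's actual convention):

```yaml
# Illustrative only: mount a per-service TLS secret at a fixed path for the app to pick up
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                       # hypothetical
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: app
          image: example/my-app:1.0  # hypothetical image
          volumeMounts:
            - name: service-certs
              mountPath: /var/run/secrets/service-certs   # the "standard place"
              readOnly: true
      volumes:
        - name: service-certs
          secret:
            secretName: my-app-mtls  # kubernetes.io/tls secret holding tls.crt / tls.key
```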
With Istio, though, I don't need any modifications. I literally inject the sidecar and everything is mTLS'd automatically; the apps just call http:// and the proxy does the rest.
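For anyone curious, the whole setup is roughly a namespace label for sidecar injection plus a mesh policy that refuses plaintext (namespace name is made up):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-namespace             # hypothetical namespace
  labels:
    istio-injection: enabled     # sidecar gets injected into every pod created here
---
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: my-namespace
spec:
  mtls:
    mode: STRICT                 # sidecars only accept mTLS traffic
```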
More seriously, if your istio injector stops working, your app will keep working over plaintext. For some users, that’s a feature. For others, that’s a breach of contract with fines. My app targets the latter.
I think this misses the point of a service mesh. Yes, a library may fit your needs, but if you have an old Java application deployed on premise that only has a VPN connection to datacenter A, which in turn has a private connection to datacenter B, a service mesh will help you make everybody talk to each other nicely without tricky configuration of NATs, route tables, BGP, etc.
The logging, authentication, circuit breaker, etc. is nice to have on top of that but I don't think it's the main advantage, or at least not the only one.
This is not my read of the article at all. It says right at the top it's about standardizing a heterogeneous environment of many services implemented in many languages. The features themselves are not the point; it's the standardization and isolation from service failure at the application process layer which is the real value add.
Kind of hijacking this conversation, but is there a service-mesh-like tool that allows reverse TCP tunneling via a central gateway server, kind of like ngrok / localtunnel just with all the bells and whistles of a modern service mesh? My use-case is that I want to deploy an HTTP service across a heterogeneous set of distributed hosts, many of which don't allow any incoming connections / are behind NAT, and I am looking for a good solution to have these boxes connect out to a central gateway server which doesn't involve OpenVPN or SSH reverse tunnels.
I think Netifi is building a somewhat similar solution [1]. As far as I understand, theirs connects all services via a centralised broker. However, I'm wondering whether Cloudflare Argo can fit your use-case [2]. It's a daemon that runs next to your software and exposes it to Cloudflare, which means that you can open your software to the world even though it's behind NAT.
Many paid SSO/IDP solutions offer this, e.g. IDaptive has an App Gateway that has worked well for me, and Azure SSO has a more limited one as well. You run some agent behind the firewall, it talks outgoing to the cloud sso provider, and your end users get proxied through with the benefit of authenticating to the IDP before they even hit your service at all. Great way to slap 2FA on a lot of things without having to worry about VPNs.
The difference between this and ESB is that the service bus wasn't just a communications channel like rabbitmq. The ESB would process the messages and make decisions and kick off other actions based on the messages. It was a central coordinator of message handling. Very enterprisey.
I like this. I am a firm believer in keeping things simple until you need them. Service meshes have their place, and they are useful when you're running at scale, or you really need all the features, but most people don't need them. You often hear people asking for enhanced telemetry, or saying that they need it to load-balance gRPC. The simple solution is to fix the telemetry in your app, and get your gRPC client to use a headless k8s service and set it up correctly. This is a bit more involved upfront, but it's better than maintaining another tool and adding the extra complexity to your infra.
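For reference, the headless-service piece is just a normal Service with clusterIP set to None (names here are made up); the gRPC client then needs a dns:/// target and a round_robin load-balancing policy so it spreads calls across the pod IPs the DNS lookup returns:

```yaml
# Hypothetical headless Service: DNS for my-grpc-service resolves to the pod IPs directly,
# so a gRPC client using dns:///my-grpc-service:50051 with round_robin balances per pod.
apiVersion: v1
kind: Service
metadata:
  name: my-grpc-service
spec:
  clusterIP: None              # headless: no virtual IP, no kube-proxy load balancing
  selector:
    app: my-grpc-server
  ports:
    - name: grpc
      port: 50051
      targetPort: 50051
```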
I do agree with your first point about keeping things simple. The problem comes when you grow big enough to actually need these complicated things; it becomes hard to "turn the ship" and re-engineer everything. I could easily argue that laying down the "main" parts of your infra up front, especially on new projects, might be worth it.
It's true that it's more expensive to do the work later, but you also have way more resources later; in my experience the second term increases faster than the first one. Every day you spend wiring up infra is a day you're not iterating on your product, and from the accounts I've seen, most startups fail because they run out of money before they find PMF, not because they later got bogged down refactoring their V1 architecture.
I advise keeping architecture as simple as possible at first by building a monolith (or maybe FaaS if you're really doing throwaway experimentation), and keep refactoring it until your domain knowledge has solidified and it's clear where your service boundaries actually are; it's very unlikely that you'll get the service split correct the first time if you do it too early.
That does require predicting or assuming what the future will look like. If it becomes an unsuitable choice, then you end up with the same problem of turning the ship and the pain that comes with it. Or if you can be obstinate enough, pass the overhead on to others around you which brings its own pain.
The only other reason I can think of to use a service mesh is if you want to restrict which services can talk to each other. That's easier to do when centrally deployed via proxy configuration, though the building blocks are the same as what Kubernetes provides out of the box, so you could do that on your own. If you investigate that far, though, you're not installing service meshes due to hype, you're analyzing service meshes as a form of application architecture. Instead of deploying dependencies using language-specific tooling, you can deploy dependencies using Kubernetes and the network, with some overhead of course, and a lot of configuration-as-code. Parallels to "You don't need no service mesh" could also be drawn for things like the OpenTelemetry Collector or maybe projects like OpenPolicyAgent, or "Identity-Aware Proxies" and other cloud services -- including Kubernetes itself. In the end, all of these tools are generic enough that you can use them in many ways to save time -- but you don't have to, as this article points out, and it might not actually save you any time at all if your environment is small enough.
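By "out of the box" I mean something like a plain NetworkPolicy - a hedged sketch with made-up names, assuming your CNI enforces network policies:

```yaml
# Illustrative NetworkPolicy: only pods labeled app=checkout may reach the payments pods
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: payments-allow-checkout
  namespace: shop                # hypothetical namespace
spec:
  podSelector:
    matchLabels:
      app: payments              # applies to the payments pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: checkout      # only checkout pods are allowed in
```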
If you think a proxy can constrain communications between two services, I suggest that is not correct. How does it prevent the two services from speaking out of band? It is a fundamental tenet in computer security that access controls be enforced at the endpoints. It's not any harder to distribute ACLs to your endpoints than it is to distribute the same ACLs to a fleet of proxy processes.
I think the article is well balanced for when you can use a mesh and what the actual problem (heterogeneous tools) is.
What I think is missing is that there is an underlying assumption that you can tweak every piece of software in your environment, which might not be the case when you use paid software + self-written software + SaaS. The burden of configuring each piece of software for tracing or mTLS is higher than finding out how each one handles HTTP_PROXY (sketch at the end of this comment).
So yeah, I am a big fan of gRPC with contexts, deadlines, and client certificates, but in an integrated heterogeneous environment you can only do so much.
At least this is my current understanding. If somebody has better/other experiences please help :)
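To sketch the HTTP_PROXY route: for software you can't modify, you often just point the standard env vars at a shared egress proxy, assuming the software honors them (all names and ports here are made up):

```yaml
# Illustrative only: steer an unmodifiable workload through a shared egress proxy
apiVersion: v1
kind: Pod
metadata:
  name: legacy-app                    # hypothetical
spec:
  containers:
    - name: app
      image: example/legacy-app:1.0   # hypothetical image
      env:
        - name: HTTP_PROXY
          value: "http://egress-proxy.infra.svc.cluster.local:3128"
        - name: HTTPS_PROXY
          value: "http://egress-proxy.infra.svc.cluster.local:3128"
        - name: NO_PROXY              # keep in-cluster traffic off the proxy
          value: "localhost,127.0.0.1,.svc,.svc.cluster.local"
```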