I have to say, though, that I am always perplexed by these toy microservice architectures, because they are not solving a problem but creating one. For example, we would not need sophisticated distributed tracing if we used fewer moving parts; we would not need highly optimized (but hard to debug) protocols like gRPC and Protobuf if we did not have to rely on a massive graph of calls for each service; and so on.
Of course, there are a bunch of lovely use cases for microservices, mostly around "how do we make 1k developers collaborate over our codebase?" (answer: enlarge the codebase!), which always sounded like a self-fulfilling prophecy to me, but then again, I loathe working for huge companies.
If that is your case, then these tools will make your developer life a little more bearable. If you are playing with microservices because they are fresh and new, please consider a more conservative architecture first.
gRPC isn’t necessarily hard to debug, and it brings a whole host of improvements over whatever ad-hoc mess you used before.
I certainly did not say these architectures appeared out of thin air. They were invented to solve the problems of internet-scale companies.
> Micro services solve tangible, real world problems.
Incorrect. Microservices solve tangible, real-world problems when they are applied to the correct problems. Microservices might also create tangible, real-world problems whether they are applied correctly or not. In fact, that's one of the points of the article, if you read between the lines.
> gRPC isn’t necessarily hard to debug, and it brings a whole host of improvements over whatever ad-hoc mess you used before.
I don't know what you are referring to. If your microservices are well done, good for you. You still need to optimize the transport, which is a cost you pay only because of your choice of architecture. If you used thicker services or a monolith you would have different problems, but not that one.
As an elixir/erlang enthusiast, haven't we just reinvented BEAM and observer?
I mean, that's probably easy to do for microservices; you fire off a function, it does something useful, and it returns quickly. Having K8s do home-grown lambda hosting for you is useful there.
But the moment you introduce any state, the whole idea of ephemeral containers is much more of a hindrance than something actually useful. A single BEAM VM might realistically run tens of thousands of processes that do a lot of useful work. It's just not practical to kill off such a container and hand-wave away the huge costs of spinning it back up.
IMO K8s and Erlang/Elixir are mutually exclusive at this point, sadly.
There are projects in progress that attempt to solve the distributed-supervisor problem -- Swarm, Firenest, Horde -- but even without them, any Erlang/Elixir app can go a long way before needing distributed coordination.
What problem are we solving that we were not solving 15 years ago, again? How much time and effort are being saved (by organizations smaller than google)?
The industry is in a big transition. First we ran VMs in datacenters, then we ran them in cloud providers. Then we started running containers and realized that their ephemeral nature made it possible to treat deployment like code, and Kubernetes is now the standard for doing that.
So, Kubernetes gives us easy abstractions for deployment. But having lots of little ephemeral containers that are constantly changing creates problems of security and visibility and routing. Service mesh is an attempt to solve that problem.
I would point out that Kubernetes has become an industry standard at this point. A few years ago it was reasonable to think Mesos would become a standard, or that multiple solutions would coexist. In fact, Kubernetes has destroyed its competition and Mesos is basically dead. It would currently be insane to adopt one of Kubernetes's competitors or try to roll your own.
If your organization standardized on Mesos a few years ago, you probably regret that decision today, and you are probably forced to plan a transition to Kubernetes.
The service mesh market is very immature at the moment, and the barrier to entry is still low. Witness the recent arrival of AWS App Mesh, which is probably going to do quite well. I would be very wary of committing to a particular mesh until the dust settles a bit and we see a clear winner. Otherwise you run the risk of choosing the Mesos of the mesh world.
With time I believe this will get better, but the lack of mature Java client libraries is probably why this is currently the case. Most of these stacks are Java based. People who write software that uses Mesos, or who work on Mesos itself (also Java based), don't have Kubernetes client libraries with the same maturity as the Go ones. So it's either wait for (or fix) the Java libraries, or write (and maintain?) an operator in a language that isn't what you regularly use.
Mesos’s “batteries not included” approach means it does not now and will not ever have feature parity with K8s, but we’ve managed to cobble enough batteries together to make it operationally sufficient for our needs. We certainly aren’t planning a transition away as far as I can tell. Anyways, just my 2¢! :)
Even Docker, which competes directly with Kubernetes via Swarm and Compose, has felt the need to include a Kubernetes cluster in Docker for Desktop. That's how comprehensively Kubernetes has taken over the container orchestration market.
In any case I was not specifically criticizing Mesos, but rather using the Mesos vs Kubernetes example to point out that committing to a particular service mesh this early in the evolution of the concept is probably unwise.
While we as developers have taken on a lot more complexity today, soon we'll only care about our apps, as most of these components will standardize and be available on all clouds. At least I hope so.
No deployment automation, no service discovery/load balancing, no tracing, no time series aggregation or visualization.
The ones they sell to the public aren’t needed at all internally?
I don't care how good your hiring practices are, there is zero chance you get 5000 devs using this stack correctly.
Less is more at this scale.
Much of this stuff was developed by such infrastructure teams to solve their own problems; people realized they were common across the industry and started collaborating on open source. The alternative is typically half-baked, homegrown deployment automation, service discovery, etc. not a fundamentally simpler architecture.
The idea was that they could manage the infrastructure their services need - ingresses, secrets, certificates, pods, load balancing, service connections, etc as code. And it’s been a wild success.
I wouldn't draw many conclusions about the technology itself from that though.
Google is able to pull it off. Obviously people make mistakes but there are systems and processes in place to take care of that. It actually does work pretty well.
If everyone makes their platform out of those small, generic blocks, it will be very easy for the cloud providers to offer standard services around the blocks. Whereas if you had an efficient and sane application with two of the basic concepts those blocks provide coded in, the cloud providers would have to provide custom bindings for all the things they wrap around the small blocks, making their lives much more difficult.
That's why you mostly see larger companies advertising the sexiness of containers: they either are the cloud provider, or are already bound to a cloud provider, or they work internally in departments that in the end act much like cloud providers anyway.
15 years ago doing this stuff well took at least 10x the engineers it does today and you would have had to cut yourself on every sharp corner that these tools make smooth.
At my last company (a startup) I (1 person) bootstrapped infra for a continuously deployed web app that distributed scientific workloads across 10k+ cores of compute in 6 months with no prior experience running large clusters. 15 years ago that would have been impossible. Frankly it wasn't really difficult enough to be fun with these tools.
How do you justify these technologies versus what we had 15 years ago?
Kubernetes: 15 years ago, you waited 3 months for Dell to ship you a new server, then you went to your datacenter on Saturday to install it in your rack. Hmm, the air conditioner seems broken. File a support ticket with the datacenter, pay $10,000, then spend next weekend migrating your application to the new server. Now? Edit the line that says "replicas: 1" to say "replicas: 2" and kubectl apply it. Enjoy the rest of your weekend with your family. Now your customers can purchase your products on Black Friday, meaning extra money for your company and extra salary for you. Tell customers to "come back later" and they never do.
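The change described above is a one-line diff in a Deployment manifest. A minimal sketch (the deployment name and image are made up):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: shop-frontend
spec:
  replicas: 2        # was 1; `kubectl apply -f deployment.yaml` rolls it out
  selector:
    matchLabels:
      app: shop-frontend
  template:
    metadata:
      labels:
        app: shop-frontend
    spec:
      containers:
        - name: web
          image: example/shop-frontend:1.0
```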
Istio: Istio exists because of flaws in the design of Kubernetes's load balancing. Kubernetes assumes that "1 TCP connection = 1 transaction", but in the world of persistent connections (gRPC, HTTP/2) that is untrue, so Istio exists to bring that sort of abstraction back. 15 years ago, you're right, you didn't need it. Your website just said "MySQL connection limit exceeded" and you hoped that your customers would come back when you got around to fixing it.
Docker: Instead of manually installing Linux on a bunch of computers, you have a scripting language to set up your production environment. The result is the ability to run your complicated application on hundreds of cloud providers or your own infrastructure, with no manual work. You reduce the attack surface, protecting your users' data, and you ensure that bugs don't get out of control by limiting each application's resource usage. 15 years ago, you spent hours configuring each machine your software ran on, crossing your fingers and praying that your machine never died and that the new version of Red Hat didn't break your app.
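For illustration, a minimal Dockerfile sketch of that "scripting language" idea (the base image, file names, and user are assumptions, not a recommended production setup):

```dockerfile
# Reproducible environment: same image on a laptop and on any cloud.
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Run as an unprivileged user to reduce the attack surface.
USER nobody
CMD ["python", "app.py"]
```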
Envoy: 15 years ago you used Apache. Now you use Envoy. From transcoding protocols (HTTP/1.1 to HTTP/2, gRPC to gRPC-Web or HTTP+JSON) to centralizing the access and error logs across thousands of applications to providing observability for anything using the network, it's the Swiss Army knife of HTTP. It's light, it's fast, it's configurable, and it does what it claims to do extremely well. Maybe you don't need it, but SOMETHING has to terminate your SSL connections and provide your backend application servers with vhosts. Might as well be Envoy. It's the best.
Prometheus: 15 years ago, you waited for your users to report bugs in your software. Now you can monitor them 24/7, and get an alert in Slack before your users notice that things are going south. I am not sure how you argue against monitoring and metrics. Maybe you like reading the log files or waiting for your coworkers to swarm your desk saying they can't work because your software blew up. I hate that. Prometheus lets me see inside my applications so I never have to wonder whether or not it's working.
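As a sketch of what "alerted before your users notice" looks like in practice, here is a hypothetical Prometheus alerting rule (the metric name, threshold, and labels are all assumptions):

```yaml
groups:
  - name: example
    rules:
      - alert: HighErrorRate
        # Fire if more than 5% of requests are 5xx, sustained for 10 minutes.
        expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.05
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "5xx error rate above 5% for 10 minutes"
```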
Grafana: A nice UI for looking at Prometheus metrics and annoying me when they are not good. Clean code, nice UI, great featureset... could not live without.
Jaeger: Jaeger exists in a world that's moved past "our app" to "our cluster". Maybe you hate microservices; it's a pretty popular thing to hate. But if you are using them, you need to know how they are communicating, and Jaeger shows you that. Another service I couldn't live without. (At Google, we had a shitty version of Jaeger called "Dapper". It was indispensable. Jaeger is just a version of that that works better and that you can use outside of Google.)
Kiali: Never used it. I imagine it's good when you have a production environment shared by multiple teams, and you want to keep an eye on unexpected dependencies.
Helm: Pretty awful; use kustomize instead. 15 years ago, though, you just had 100 random files in /etc/ and /var/lib/cgi-bin, which is what Helm attempts to replace. Now you get backups, source control, code reviews, and guaranteed consistency between machines. You never had an outage 15 years ago because someone edited some random file in production? Lucky you, because I sure did. Helm attempts to make configuration less "interesting" and "fun". I think it's a bad design, but it's way better than what we did 15 years ago.
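For anyone unfamiliar with kustomize, the core idea is declarative overlays over a shared base instead of text templating. A minimal hypothetical layout (paths and file names are illustrative):

```yaml
# overlays/production/kustomization.yaml
# kustomize layers small patches over shared base manifests.
resources:
  - ../../base            # the manifests every environment shares
patches:
  - path: replicas.yaml   # production-only tweak, e.g. a higher replica count
```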
Hope this helps.
We're deploying an application that's in containers but not running on k8s, and has:
* an Angular front-end, using grpc-web
* a gRPC-web proxy (improbable-eng; golang)
* an nginx proxy (for Angular, and routing grpc traffic)
* an authentication manager (grpc+rust)
* an audit logging service (grpc+rust)
* a database service (grpc+kotlin)
* a 'logic' engine thing (grpc+kotlin)
* jaeger for distributed tracing
It's been a joy to develop, but I wouldn't mind reducing the 2 proxies to 1 with maybe Envoy. I tried rewriting grpc-web proxy in Rust, but gave up after struggling and not having enough time to complete it.
I have looked for people who have successfully run Istio in production outside Kubernetes and I cannot find any. Most of the documentation and examples you will find online are for Kubernetes.
1. Why are they choosing gRPC over REST if the application is entirely web based? With REST you have a standardized system of verbs rather than application-specific functions (GET /user/id, 200 OK <data> vs. MyGetUserFoo(ID) returning data unique to the application). With RPC, both sides need very specific knowledge of each other's functions and their arguments at all times, service discovery seems harder, and an update to the application seems to nearly always imply an update to the web client.
2. What is the model supposed to be for typical protobuf schema sharing? I like PBs, but they seem a little harder than JSON or CBOR or MessagePack or other schemaless serializers, in that the proto file has the same sync issue among all of your endpoints.
3. Wouldn’t it be really swell if serializers were supported by cloud providers and microservices a little more? I have a project right now where every time a message goes from our services to an endpoint it needs encoding and decoding. This is overhead now for every microservice we have; it would be nice to not have the option of making so many mistakes.
Some might say that is a good thing. It kind of makes your API "type-safe". You can also auto-generate client libraries for the grpc services so your JS code would literally import a function and call it with args instead of dealing with XHR stuff.
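To make that concrete, here is a hypothetical .proto contract for the GET /user/id example above (service, message, and field names are illustrative); both the server stubs and a typed client are generated from this one file:

```protobuf
syntax = "proto3";

service UserService {
  // Roughly equivalent in spirit to GET /user/{id}:
  // both sides share this contract, and the compiler enforces it.
  rpc GetUser(GetUserRequest) returns (User);
}

message GetUserRequest {
  int64 id = 1;
}

message User {
  int64 id = 1;
  string name = 2;
}
```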
> 2. What is the model supposed to be for typical protobuf schema sharing? I like PBs, but they seem a little harder than JSON or CBOR or MessagePack or other schemaless serializers, in that the proto file has the same sync issue among all of your endpoints.
I don't think the "sync" issue is really a problem. You already have sync issues even if you just use REST: you need to make sure your client code is up to date to handle any changes in your API. This is more or less something engineers need to sync manually; maybe write additional tests, maybe annotate APIs with versions, etc. Something like gRPC makes this explicit, and since it can generate code, it allows us to build tooling that automates most of the "sync issues".
Protobufs can be and in normal practice are made backwards-compatible. For serious breaking changes, you want to version your endpoints anyway.
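The reason adding fields is backward compatible comes down to the wire format: a reader simply skips field numbers it does not recognize. A hand-rolled sketch of the varint case, for illustration only (real code should use the generated protobuf classes, which also handle the other wire types):

```python
def encode_varint(n: int) -> bytes:
    """Protobuf base-128 varint: 7 bits per byte, MSB = continuation."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(data: bytes, i: int) -> tuple[int, int]:
    """Decode one varint starting at index i; return (value, next index)."""
    shift = value = 0
    while True:
        b = data[i]
        i += 1
        value |= (b & 0x7F) << shift
        if not b & 0x80:
            return value, i
        shift += 7

def encode_field(field_number: int, value: int) -> bytes:
    # Field key = (field_number << 3) | wire_type; wire type 0 = varint.
    return encode_varint(field_number << 3) + encode_varint(value)

def decode_message(data: bytes, known_fields: set[int]) -> dict:
    """Decode varint fields, silently skipping unknown field numbers."""
    result, i = {}, 0
    while i < len(data):
        key, i = decode_varint(data, i)
        value, i = decode_varint(data, i)
        if key >> 3 in known_fields:
            result[key >> 3] = value
    return result

# A "new" writer sends fields 1 and 2; an "old" reader only knows field 1.
msg = encode_field(1, 42) + encode_field(2, 300)
print(decode_message(msg, known_fields={1}))  # {1: 42} -- field 2 ignored, no error
```

The old reader decodes the message without ever having seen the new schema, which is exactly why additive changes don't force a lockstep upgrade of every endpoint.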
Systems like this work far better when you just work to not break backwards compatibility full stop.
Similarly, they work better when you don't tangle up dependencies in a bunch of shared code, generated or otherwise.
You're always blocked from using later features until an update because, well, you won't have application code that can take advantage of those features until you update.