gRPC: The Ugly Parts (kmcd.dev)
111 points by ingve 64 days ago | 84 comments



> There’s always a lingering question about Google’s long-term commitment to gRPC and protobuf. Will they continue to invest in these open-source projects, or could they pull the plug if priorities shift?

Google could not function as a company without protobuf. It is ingrained deeply into every inch of their stack.

Likewise, gRPC is the main public-facing interface for GCP. It's not going anywhere.


Their commitment to open source, however, might go.

Quite recently Google quietly unshipped an effort to make their protobuf build rules more useful to OSS users of Bazel (see the rules_proto repository). This wasted a huge amount of planning and work that'd gone into the migration.

And the fact that these tools are designed first and foremost for Google use shows up everywhere. Stuff that Google fundamentally doesn't care about but is widely used (eg Ruby) is stagnant.

In this state, it's totally reasonable to reconsider whether these tools are worth building on top of. I personally still believe! But I don't blame people who are skeptical.


> Their commitment to open source, however, might go

Google's OSS contributions are largely correlated with the fact that they can _afford_ to do OSS. When you have the best business model in the world, you can afford to spend X% of your engineering hours on OSS. Of course, it's not purely altruistic; they also get a lot back in return.

However, if Google's business model takes a hit due to AI or other shifts, I wouldn't be surprised if their OSS contributions were the first thing to go, like we saw with Google Code Jam being discontinued in the 2022 layoffs.

Though if your business outlives Google, gRPC going away might be the least of your problems.


There was an influential internal white paper about not becoming a "tech island" that drove the open-sourcing. The point was that by having its own tech stack, Google would eventually be left behind and have a harder time recruiting.

Not sure if the message is still believed as strongly.


The message is pretty well understood - the only difference is that the monorepo (think of it as a service in and of itself) and its associated tooling do get seen as "Google-specific."

Bazel in general has really awful support.


Google continuing to use gRPC and protobuf internally and Google continuing to invest in the open-source projects are not the same thing. It being so central to Google isn't necessarily even a good thing for people outside Google; it means there's a lot of potential for changes that are very good for Google but very painful for everyone else.


Protobuf will likely never disappear, as it is so central to Google. gRPC, however, is hardly used internally compared to Stubby, which is the actually essential one.


Depends whether you consider Google Cloud internal to Google.


Even Google Cloud is not very dependent on gRPC.

As far as I remember, most of the API is built on REST.


"Not very dependent" is subjective. The objective relevant take is it is a required dependency of parts of officially supported APIs of major GCP services that have large paid customers with SLAs. It can't go away anytime soon.

Google may have Stubby as the primary internal RPC, but several other large companies rely primarily on gRPC, have contributed to it, and have incentives to keep it alive, e.g. Dropbox, Netflix, even parts of Apple and Microsoft. In fact, Microsoft's gRPC support in C#/.NET is arguably more first-class and polished than in any other language.


Fair enough

Although this might have some external implications, most of GCP does not rely on gRPC, and even external customers are not usually dependent on gRPC when using Google services.

Correct me if I'm wrong, but gcloud uses REST, and so do the libraries Google publishes for using GCP APIs in different languages.

The question is whether Google could stop supporting gRPC, protobuf, or Stubby tomorrow, and I still think gRPC is relatively at risk.


> so do the libraries Google publishes for using GCP APIs in different languages

Not true. Many Google Cloud client libraries do support gRPC.

> still think gRPC is relatively at risk

I would agree with you that, relative to protobuf and Stubby, gRPC is less critical to Google infra. Yet in absolute terms I would put the probability of that happening to any of them in the next couple of decades at zero.


You are right, I checked just now and these are based on gRPC.

For some reason I remembered them using REST.


That info is a bit outdated. All but the oldest APIs (most notably GCE) support gRPC out of the box.

For newer services, there is an automatic 1:1 mapping between the REST and gRPC APIs, except for features like streaming RPCs which have no REST equivalent.


It supports gRPC, but that's not the commonly used flow, even by the Google Cloud UI itself.

The REST APIs are converted to Stubby internally, not gRPC, which is what makes gRPC relatively disposable.


The argument that gRPC is disposable because everything is stubby internally applies equally to REST. And I don't think anyone is arguing that REST is disposable.

I'm not sure what part of GCP you work in, but in my experience, the big customers are using gRPC, not REST.


It isn't.


(disclosure: ex-grpc-team here)

Indeed. I'm quite confident there's never been an RPC library with so many man-years invested in it. Last month was gRPConf, the team appeared to be as well staffed as ever, if not more so, and Rust is being adopted as a new first-class language too.


Any videos from the event? I saw a bunch of slides (PDF, PPT), but that's about it...


Pretty sure videos are recorded for YouTube. I know my own talk was. I expect them to be posted this week or next.


Thanks! Can't wait to watch them!


Originally, I was going to complain that this is more of a critique of the gRPC ecosystem than of the protocol.

IMO, readability of generated code is largely a non-concern for the vast majority of use cases. And if anything, it's more a criticism of the codegen tool. Same with the complaints about the HTTP server used with Go.

However, I totally agree with the criticism of the enum naming conventions. They're an abomination and super leaky, made worse by the fact that they're part of the official(?) style guide: https://protobuf.dev/programming-guides/style/#enums
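To make the leakiness concrete: the style guide wants every value prefixed with the enum's name plus a *_UNSPECIFIED zero value, and the Go codegen then prepends the enum name again. A rough sketch with a made-up PaymentStatus enum (names are hypothetical; the doubled prefix is the point):

    // For a style-guide-compliant enum like:
    //
    //   enum PaymentStatus {
    //     PAYMENT_STATUS_UNSPECIFIED = 0;
    //     PAYMENT_STATUS_PENDING = 1;
    //     PAYMENT_STATUS_SETTLED = 2;
    //   }
    //
    // protoc-gen-go emits constants that repeat the enum name twice:
    package example

    type PaymentStatus int32

    const (
        PaymentStatus_PAYMENT_STATUS_UNSPECIFIED PaymentStatus = 0
        PaymentStatus_PAYMENT_STATUS_PENDING     PaymentStatus = 1
        PaymentStatus_PAYMENT_STATUS_SETTLED     PaymentStatus = 2
    )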


To be fair, the ecosystem is kind of inextricably tied to the protocol. I’m not aware of any other production grade Go gRPC implementations besides the official one.


But gRPC isn't limited to Go. Criticizing gRPC as a whole for the HTTP library used with Go isn't valid. However, it's fair to take issue with the choice of HTTP library used by the most popular Go codegen tool.


Connect [1] is one and it's fantastic. The Go implementation in particular is much nicer than grpc-go.

[1] https://connectrpc.com/


Wow that’s awesome! I wasn’t aware of this.


> IMO, readability of generated code is largely a non-concern for the vast majority of use cases

Completely disagree. More often than I'd like, I've had to read generated code from various codegen tools in order to figure out what it was doing (in order to fix my usage of that code where I was making bad assumptions about the generated interface) or figure out why the generated code wasn't doing what it was supposed to (because it was buggy). All code eventually needs to be read by someone, even code that's generated on the fly during the build process.


I read the generated code quite often, and each time it boggles my mind who in the world came up with that crap. The readability and code quality are seriously awful, and it's a valid criticism. When the generated code is indeed buggy, it's a double whammy.

However, it is also true that a lot of devs don't read it or simply don't care, so I would argue it is mostly a non-issue in practice, contrary to what the author of the article suggests. My life is certainly not affected by ugly generated code.

Also worth mentioning: when I wrote code generators in the past, albeit less complex ones, it was rarely the common case that made the generated code ugly, but rather the coverage of a gazillion corner cases.

Can the generated code be 2-4% faster? Sure. Is anyone updating the code generator for that? Well, if you feel the small gain is worth the pain of touching a fairly complicated generator that already produces monstrous code, patch it, test it, and file a PR. Point is, none of the proto maintainers are going to lift a finger for 2% better.


In that case, I would imagine you would struggle with any clients generated via an IDL. The same "issue" occurs with openapi/swagger generated clients.

If you're not working on whatever is responsible for generating the code, you're not supposed to have to look under the hood. The whole purpose is to abstract away the implementation details, it's contract driven development. If you find yourself frequently needing to read the underlying code to figure out what's going on, the problem isn't with the tool, it's elsewhere.


>In that case, I would imagine you would struggle with any clients generated via an IDL. The same "issue" occurs with openapi/swagger generated clients.

Sometimes. Only sometimes. And that doesn't mean it's not a problem there either.

Abstractions that completely abstract what they wrap can claim "no need to look under the hood", but generated RPC code fails miserably there when something fails or changes, particularly during code generation (where you can't usually even get partial data / intermediate state, just "it crashed in this core loop that is executed tens of thousands of times, good luck").

And on this front, protobuf and gRPC are rather bad. Breaking changes are somewhat frequent, almost never come with a breaking semver bump, so they surprise everyone, and for some languages (e.g. Go) they cannot coexist with the previous versions.

Figuring out what broke, why, and how to work around or fix or just live with it is made vastly more difficult by unreadable generated code.


> Even though it’s not meant to be hand-edited, this can impact code readability and maintainability

This kind of implies that the generated code is being checked into a repo.

While that works, it's not the way things are done at Google, where protobuf codegen happens at build time, and generated code is placed in a temporary build/dist directory.

Either way, you shouldn't need to do any maintenance on protobuf-generated code, whether you add it to a repo or use it as a temporary build artifact.


The Go ecosystem (at least publicly) heavily encourages committing all generated code, since Go code is meant to be functional via a simple `go get`. Even a popular project like Kubernetes is full of generated protobufs committed to the codebase.
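For anyone unfamiliar with that workflow, a minimal sketch of the usual pattern (file and package names are hypothetical; assumes protoc plus the protoc-gen-go and protoc-gen-go-grpc plugins are installed):

    // gen.go - run `go generate ./...` to regenerate, then commit the
    // resulting *.pb.go files so downstream users only need `go get`.
    package api

    //go:generate protoc --go_out=. --go_opt=paths=source_relative --go-grpc_out=. --go-grpc_opt=paths=source_relative api.proto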


Yet another ecosystem that totally forgot what a proper Makefile is... and they argue it's innovative and pragmatic. Sad.


If you'd ever seen a Makefile longer than 1000 lines, you'd get it too.


I agree, and that's how I do things, but I still think the readability of generated code is important. I want to know what generated code is doing, and more often than I'd like, I run into problems with it, and need to trace through it to find out what's going on. Sometimes this just makes it easier to submit a thorough bug report (or even a patch) to the maintainer of the codegen tool, but often by tracing through what's going wrong I can find a way to work around the problem. Readability is definitely something I look for in generated code.


> This kind of implies that the generated code is being checked into a repo.

I wouldn’t say that. I observe uncommitted generated code all the time. Sometimes I want to read the code just to understand how the heck something works. Or I step into it in a debugger.

I definitely believe that generated code should be pleasant to read.


Wow, what a nice article! Every point of it matches my experience (mostly positive) and buffrs [1] is a tool I wasn't aware of. Thanks for sharing this article!

[1]: https://github.com/helsing-ai/buffrs?tab=readme-ov-file


I 100% agree about the enum rules and the frustrating lack of `required`, but I do disagree with the "oh no, the FE couldn't use it out of the box" complaint.

It’s actually ok that not everything need accomodate every single environment and situation. I’d personally like to see some _more_ RPC mechanisms for service-to-service that don’t need to accommodate the lowest-common-denominator-browser-http-requirements, there’s plenty of situations where there’s no browser involved.


Some more bad parts related to protobuf:

- While nearly all fields are forced to be optional/nullable, lists and maps can only be empty, not null.

- No generics (leads to more repetition in some cases).

- Custom existing types are not supported. WKTs thus require hardcoded codegen (a mistake, IMO). It limits flexibility or requires much more manual code. For example, if I have a codebase that uses an Instant type (instead of the standard library's DateTime) to represent UTC time, there is no built-in way to automate this mapping, even though it could map to the same over-the-wire format as DateTime (which has hardcoded support); see the sketch below. If that kind of extension were supported, even specific cases like mapping a collection of timestamps to a double-delta-encoded byte array over the wire could be handled. This wouldn't require any changes to the underlying wire format, just a more flexible codegen.
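A minimal Go sketch of the manual mapping described above, using the standard timestamppb well-known type; the Instant type is hypothetical:

    package timemap

    import (
        "time"

        "google.golang.org/protobuf/types/known/timestamppb"
    )

    // Instant is a stand-in for a codebase-specific UTC time type that
    // protoc cannot be taught about.
    type Instant struct {
        Seconds int64
        Nanos   int32
    }

    // Because only the well-known Timestamp has codegen support, every
    // message boundary needs hand-written conversions like these.
    func toProto(i Instant) *timestamppb.Timestamp {
        return timestamppb.New(time.Unix(i.Seconds, int64(i.Nanos)).UTC())
    }

    func fromProto(ts *timestamppb.Timestamp) Instant {
        t := ts.AsTime()
        return Instant{Seconds: t.Unix(), Nanos: int32(t.Nanosecond())}
    }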


The criticisms the author levies against protobuf are unfair. Inside Google, all source code is in a monorepo, and depending on other protobuf files is a matter of sharing them as a Bazel library; it is trivial. There is no need for a package management system; it would be irrelevant there.


> Inside Google, all source code is in a monorepo

And outside Google, it isn't. It's fair to judge something that's been released to the public based on what the public can reasonably do.


Yea, but given the complexities of managing external-facing libraries, they do an extremely good job. As a former Googler, I can jump into Bazel or gRPC projects, or start my own, relatively easily.

I tried making a few guides on a personal blog explaining how to use these tools, but to be honest, without seeing how they get used within Google it's relatively difficult to understand why they made some of the design decisions, which can seem clunky at first.


As a not-former-Googler, the last time I looked into Bazel I was confused and had no idea what I was looking at.

The world is much much much bigger than Google's internal tooling. Even Google's internal tooling that they've made public.


That is true. I use Bazel daily with all deps vendored and protos defined as targets, so there is really no need for the tools the author mentioned, because with Bazel the underlying problems simply don't exist in the first place.

It's worth pointing out that it takes a bit of time and pain to grok the underlying rationale for some of the less obvious design decisions.

For example, protos aren't versioned because you already version them with git. Releases are usually pinned to some kind of hash, so you already have reliable dependencies with checksums. No point in versioning protos again. It's a monorepo, so why bother with distribution? Composition? Use a proto library target...

Without Bazel, though, you're basically totally lost, and then these tools kinda make sense as a way out of the pain caused by the lacking tool support you will face.

That said, a lot of orgs have less than ideal IT for historical or whatever reasons, so these problems are 100% real and these solutions exist for a reason.


> For example, protos aren't versioned because you already version them with git. Releases are usually pinned to some kind of hash, so you already have reliable dependencies with checksums.

Sorry, but this is such a stupid statement. Your external service or client doesn't have access to your internal git hash.

> That said, a lot of orgs have less than ideal IT for historical or whatever reasons

No. A lot of orgs don't require setting up Bazel to make simple things like generating code from protobufs work


> Your external service or client doesn't have access to your internal git hash

Including the build commit is very straightforward - but more to the point, Google projects frequently treat the monorepo like a service to do things like fetch the latest config. Even deployments tend to be automated through the monorepo - lots of tools make automated change requests.
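For the external-client case, one common (and not Google-specific) way to carry the hash along is to stamp it into the binary at build time; a minimal sketch:

    package main

    import "fmt"

    // commit is injected at build time, e.g.
    //   go build -ldflags "-X main.commit=$(git rev-parse HEAD)"
    // so a deployed binary can report exactly which revision (and hence
    // which .proto files) it was built from.
    var commit = "unknown"

    func main() {
        fmt.Println("built from commit:", commit)
    }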


It's a stupid statement internally as well. Unless you can freeze time itself, deploying new versions of stuff isn't instantaneous. And what happens if you need to do a rollback of a component?


> And what happens if you need to do a rollback of a component?

You revert the offending commit. That triggers an automatic deploy (or, even more likely, your buggy change gets caught in the canary stage and automatically reverted for you).

The Google philosophy is called "live at head" and it has a bunch of advantages, provided your engineers are disciplined enough to follow through.


Until you run into things like "your partners deploy once every two months" or "the team's deploy is delayed by X due to unforeseen changes downstream" or ...

Protobuf is built specifically for Google and Google's way of doing things. Not everyone is built like Google.


Well, the core problem is that you shouldn't be deploying as infrequently as every two months... you should spend engineering energy on fixing that rather than on working around it.


Must be nice not to have to live in the real world.


Deploying every two months is not required to live in the real world.


Customers who expect interface stability for 12+ months are the real world.


You can deploy a service with a stable interface frequently...


Yep, well said.


If you think that answer makes even the remotest sense then you really don't know what you are talking about.


I worked at Google; I'm describing to you exactly how the infrastructure worked.

Yes, Google's internal infrastructure is that good. It's easy to get caught up in the "haha cancel products" meme and forget that this company does some serious engineering.


What Google does with its internal infrastructure is of limited applicability to the majority of people, where interface stability is the prime directive.


Interface stability and "living at head" are not mutually exclusive?


They are not. A good interface rarely ever needs to be changed; the point of live at head is that it's impossible to commit a breaking change as long as there's a test depending on the thing you broke.


If Google wants to push gRPC as a general, open, standardized solution -- which they most certainly want, and have done -- then they need to cater to how everyone does things, not to how Google does things.


In a sense, everyone does things because of how google does things. How does one decide whether the chicken or the egg should follow?


In what sense does "everyone do things because of how google does things"?


How is it unfair? Is protobuf an externally available tool or not?


Nonsense. The criticisms are perfectly fair and realistic.

Bazel is virtually unusable outside of Google. If protobuf is only usable inside Google infra then the response would be “don’t use it if you aren’t Google”. And yet I somehow doubt that’s what you’d argue!


I've tried to use gRPC several times over the years, but the lack of front-end support just always kills it for me. It'd be such a killer feature to have all that gRPC offers plus support for JS (or an easy way to deploy grpc-web that doesn't have loads of gotchas), but every time I look I realise it's not going to work. I've been surprised how little that situation changed over the 5 years or so I was tracking the project. I don't even consider it any more.

Who wants to use one tech stack for microservices and an entirely different one for the frontend? Better to just use the same one everywhere.


You should really check out ConnectRPC. Out of the box it supports gRPC, gRPC-Web, and a much more reasonable protocol called Connect, without extra middleware to translate between gRPC and gRPC-Web. Plus, their TypeScript support is first-rate, and there is a library exposing a TanStack Query wrapper.
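A rough sketch of what serving with connect-go looks like; the generated greetv1connect constructor is hypothetical and commented out so the skeleton stands on its own (the real one returns a path plus a plain http.Handler that you mount on a normal mux):

    package main

    import (
        "log"
        "net/http"

        "golang.org/x/net/http2"
        "golang.org/x/net/http2/h2c"
    )

    func main() {
        mux := http.NewServeMux()

        // With generated code you would mount the service like:
        //   path, handler := greetv1connect.NewGreetServiceHandler(&greetServer{})
        //   mux.Handle(path, handler)

        // h2c serves HTTP/2 without TLS, so gRPC, gRPC-Web and Connect
        // clients can all hit the same port with no translating proxy.
        log.Fatal(http.ListenAndServe(":8080", h2c.NewHandler(mux, &http2.Server{})))
    }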


There's a definite UX problem imo with having to manage protobuf synchronization between repos.

But the majority of these criticisms seem really superficial to me. Who cares that the API is inconsistent between languages? Some languages may be better suited for certain implementation styles to begin with.

Also, regarding the reflection API, I was under the impression that the codegenned protobuf code serializes directly and doesn't reflect? Thrift worked that way, so maybe I'm confused.


You need a separate package to actually serialize protobuf from a codegen'd struct, so it uses reflection.
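Concretely, in Go (APIv2): the generated struct carries no Marshal method of its own; you call into google.golang.org/protobuf/proto, which works through the message's protoreflect view. A small sketch using a well-known type so it compiles without custom codegen:

    package main

    import (
        "fmt"

        "google.golang.org/protobuf/proto"
        "google.golang.org/protobuf/types/known/wrapperspb"
    )

    func main() {
        msg := wrapperspb.String("hello") // any generated message works here

        // Serialization lives in the proto package, not on the struct.
        b, err := proto.Marshal(msg)
        if err != nil {
            panic(err)
        }
        fmt.Printf("%d bytes on the wire\n", len(b))

        out := &wrapperspb.StringValue{}
        if err := proto.Unmarshal(b, out); err != nil {
            panic(err)
        }
        fmt.Println(out.GetValue())
    }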


Haha the good old circular dependency problem. I am still surprised there are people in the industry that can’t recognise and solve these problems. It is such a classic pattern and its use arises in the wild all the time.


Whilst gRPC is nearly always used together with protobuf, I think it’s important to note they are different projects. gRPC is a CNCF project with open governance. gRPC people show up at industry conferences like KubeCon and run their own conference.

Protobuf is a Google-internal project, with opaque governance and no conference.

I find it striking that the one has such tight dependencies on the other. Indeed the article is mostly about protobuf.


gRPC actually doesn't care that much about the serialization method. It's straightforward to use JSON, MsgPack, BSON, CapnProto, and so on with gRPC instead of protobuf.
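In grpc-go, for instance, that's done by registering a codec. A minimal JSON sketch, assuming the message types are JSON-marshalable (plain structs, or proto structs via their exported fields); clients opt in per call with grpc.CallContentSubtype("json"):

    package jsoncodec

    import (
        "encoding/json"

        "google.golang.org/grpc/encoding"
    )

    // codec satisfies grpc-go's encoding.Codec interface. The framing and
    // HTTP/2 transport are unchanged; only the payload encoding differs.
    type codec struct{}

    func (codec) Marshal(v any) ([]byte, error)      { return json.Marshal(v) }
    func (codec) Unmarshal(data []byte, v any) error { return json.Unmarshal(data, v) }
    func (codec) Name() string                       { return "json" }

    func init() {
        encoding.RegisterCodec(codec{})
    }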


Right. Any examples of popular programs doing this?


IIRC FlatBuffers works with gRPC. It's really not that different under the hood from any other HTTP-based RPC system.


The remark about reflection for the Go implementation surprised me. I always treated the generated code as a black box, but occasionally saw their diffs, and assumed the byte blobs to be for clever generated marshalling. If not, what are they used for?


They are the protobuf descriptors, which are basically the protobuf-encoded version of the .proto files themselves. They're optional, and many plugins have options to exclude that part from the generated code. Generally, they're useful for gRPC's server reflection feature and for runtime reflection.

Here's a more complete description of what they are and how they're used: https://buf.build/docs/reference/descriptors.
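A minimal grpc-go sketch of what those embedded descriptors buy you: registering the reflection service lets tools like grpcurl list and call methods without having the .proto files locally.

    package main

    import (
        "log"
        "net"

        "google.golang.org/grpc"
        "google.golang.org/grpc/reflection"
    )

    func main() {
        lis, err := net.Listen("tcp", ":50051")
        if err != nil {
            log.Fatal(err)
        }
        s := grpc.NewServer()
        // Register generated services here, then expose their descriptors:
        reflection.Register(s)
        log.Fatal(s.Serve(lis))
    }

After that, something like `grpcurl -plaintext localhost:50051 list` works against the running server.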


One of the reasons I'd disregard gRPC for front-end development is my belief that data exchange between front and back-ends should be driven by the front-end.

The UX drives the shape of data it needs from back-ends.

Any UI work I do is interactive and fast. Anything that hinders that, like code generation, is a drag. Adding a suitable new endpoint or evolving data shapes coming from existing endpoints happens instantaneously in my dev environment.

I value that flexibility over tightly controlled specs. I might eventually add types when I'm reasonably sure I got the data shapes right.


> my belief that data exchange between front and back-ends should be driven by the front-end.

Why, then, are you trying to design the 'data shapes' before the UX is old enough to drive? It seems you don't live by what you preach.

For what it is worth, I happen to share in your belief. gRPC couldn't possibly get in the way as you're not going anywhere near that layer of abstraction while the UX is being developed. By the time you're at the point of making network calls the data model is fully solidified. There is no need for continued iteration once you've gone that far.

The biggest reason to disregard gRPC on the frontend, though, is that frontends are often found on slow, high-latency networks where call overhead starts to bite you. Moving beyond simple applications, you really need some kind of batching system rather than the RPC model in order to diminish the overhead cost. Of course, you can bend gRPC into some kind of batching system if you try hard enough - it is all just 1s and 0s at the end of the day - but there are other solutions more in tune with that particular problem.


> They eventually added a ServeHTTP() interface to grpc-go as an experimental way to use the HTTP server from the Go standard library but using that method results in a significant loss of performance.

I don't quite see how that is possible. ServeHTTP() seems like the most general HTTP interface. How could implementing that interface for a protocol built on top of HTTP result in a performance degradation!? If that is indeed the case, that seems like it would imply a flaw in Go's standard HTTP library, not gRPC.


> I don't quite see how that is possible.

1. gRPC brings its own HTTP implementation. There was no other implementation to lean on when the project started. It may be optimized in ways that the standard library's implementation is not.

2. gRPC's implementation has features not available in the standard library's implementation. These features can improve performance under certain conditions.

Keep in mind that "significant" often gets overstated in computing. Many will tell you that cgo call overhead is significant, and relatively speaking it is, but we're only talking mere nanoseconds. With extremely precise measuring tools you can, indeed, observe it, but it's so small that it is impossible for a human to notice and thus doesn't actually matter in practice. It is not like your gRPC endpoint is suddenly going to start taking minutes to respond if you switch to using ServeHTTP.

> that seems like it would imply a flaw in Go's standard HTTP library, not gRPC

If a pickup truck not being able to drive as fast as a race car implies a flaw with pickup trucks, sure. Most would simply consider them different tools for different jobs, though. The standard library's HTTP implementation tries to be reasonably suitable for a wide range of tasks, while gRPC's implementation only has to serve gRPC. As such gRPC can carefully tune its package to the needs of gRPC without concern for how it might affect someone serving, say, HTML. The standard library doesn't have the same luxury.
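For anyone curious what the two paths look like in grpc-go, a rough sketch (not a benchmark; service registration omitted):

    package main

    import (
        "log"
        "net"
        "net/http"

        "golang.org/x/net/http2"
        "golang.org/x/net/http2/h2c"
        "google.golang.org/grpc"
    )

    func main() {
        gs := grpc.NewServer() // register generated services here

        // Path 1: gRPC's own HTTP/2 transport (the usual, faster route).
        go func() {
            lis, err := net.Listen("tcp", ":50051")
            if err != nil {
                log.Fatal(err)
            }
            log.Fatal(gs.Serve(lis))
        }()

        // Path 2: the experimental route through net/http. grpc.Server
        // implements http.Handler, so it can share a mux with ordinary
        // handlers, at the cost of going through the standard library's
        // HTTP/2 machinery (h2c here for cleartext).
        mux := http.NewServeMux()
        mux.Handle("/", gs)
        log.Fatal(http.ListenAndServe(":8080", h2c.NewHandler(mux, &http2.Server{})))
    }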


> The generated code isn’t even that fast

This is the most important part: the codegen output for many languages is pretty slow.


The one useful purpose protobuf serves: if you run across someone who is enthusiastic about it, then you know to never trust anything that person says.


Experienced people use complex solutions because they know it's the only way they can get stuff to work at the scale they need.

Inexperienced people use complex solutions because they think if the experienced people use it, it must be the right thing to do, no matter what the scale is.

Always use the simplest thing that will work.


I think protobuf solves some real problems but introduces new ones (especially when coupled with gRPC). It's not a silver bullet, but sometimes it really brings benefits (e.g. not having to reinvent the wheel between the backend and frontend in terms of types).



