
Protobuffers Are Wrong - based2
http://reasonablypolymorphic.com/blog/protos-are-wrong/index.html
======
alecbenzer
This was like 2 or 3 sentences' worth of good points sprinkled in between a
bunch of stuff that I'm pretty sure isn't an actual problem for anyone. If
anything, the things pointed out are issues for people trying to write code
that manipulates protos generically, which is not what most people spend their
time writing and is probably exactly the wrong thing to optimize.

The main good point: Google's problems are probably not your problems, don't
just blindly adopt Google tech for no reason.

Also: calling people amateurs without really substantiating is a huge smell
IMO. The average Google engineer isn't a genius or particularly amazing by any
stretch, but especially for something as core/foundational as protobuf, the
answer is much more likely something like "these decisions made more sense for
Google internally, especially when weighing against the cost of significantly
re-architecting how proto works". The ad-hominem at the beginning reeks of
someone who had an email chain that went like:

"You guys are doing proto wrong, don't you realize protos should obviously be
like XYZ?"

"Well actually we'd like to do X but it would've been too hard, I'm not
actually sure Y is a net-win, ..."

(omg what amateurs...)

~~~
kentonv
> calling people amateurs without really substantiating is a huge smell IMO.
> The average Google engineer isn't a genius or particularly amazing by any
> stretch

Note that Jeff Dean and Sanjay Ghemawat -- the original creators of Protobuf
-- aren't average Google engineers. They are literally the highest-ranked
engineers at Google (Level 11, "Senior Fellow", a title assigned only to the
two of them last I heard), and they basically invented MapReduce, BigTable,
Spanner, and a variety of other foundational distributed systems technologies.
Jeff now leads the AI division while Sanjay continues to focus on systems
infrastructure.

So yeah, "amateurs".

(Disclosure: I wrote Protobuf v2, but it was just a fresh implementation of
the same design.)

~~~
coldtea
Sure, but you didn't address any of the points in TFA.

Whether Dean and Ghemawat are high ranking or not, do the points stand? Is the
design of protobuf solid?

~~~
kentonv
See:
[https://news.ycombinator.com/item?id=21873418](https://news.ycombinator.com/item?id=21873418)

------
mattnewton
The author lost me at “make all fields required” for some nice type system
properties.

required was a mistake, in my opinion and in the proto 3 spec’s opinion. Cap’n
Proto has a nice write-up here too, with essentially the same points I would
make but written better: [https://capnproto.org/faq.html#how-do-i-make-a-
field-require...](https://capnproto.org/faq.html#how-do-i-make-a-field-
required-like-in-protocol-buffers)

I think protos might just be being used for the wrong thing in the author’s
example. You shouldn’t replace your application’s data structures with protos
everywhere; in my experience, protobufs are for when you want to serialize and
would otherwise write a bunch of backwards-compatible serialization code by
hand. This code is hard to generate because it encapsulates all the changing
requirements needed to work across different versions, so the lack of general
type system tools doesn’t really give up much opportunity to cut down on the
schlep. If you don’t have these problems, and don’t think you will have these
problems, evaluate whether the tech is right for you. I’ve worked on projects
at Google that made this mistake and threw away the nice data model
expressible in the language to use proto interfaces where there was no need
for serialization. I don’t think the solution is to expand protos to be
comparable to that in every language.

Disclaimer: Googler who is forced to use a lot of protos; my opinions are my
own, and I didn’t design or ever work on them directly. Probably also just an
amateur :D

~~~
seriesf
Yes but proto 3 was also a mistake. Throwing away presence entirely was wrong
and not preserving unknowns was also wrong. Proto 2 forever, imho.

~~~
rcfox
It can be approximated with a single-item `oneof` field. It's ugly and
boilerplatey, but at least it's binary-compatible with proto2 and gives the
original behaviour.
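
For illustration, a rough proto3 sketch of the trick (message and field names
invented):

    syntax = "proto3";

    message User {
      // A one-field oneof: same tag and wire encoding as a bare int32
      // field, but the generated code now tracks whether it was explicitly
      // set, restoring proto2-style presence.
      oneof age_oneof {
        int32 age = 1;
      }
    }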

My main problem with proto2 these days is that I needed to interface with some
C# code, and there is no proto2 library for C#!

~~~
kentonv
> there is no proto2 library for C#!

Looks like that's getting fixed!

[https://github.com/protocolbuffers/protobuf/blob/master/docs...](https://github.com/protocolbuffers/protobuf/blob/master/docs/csharp/proto2.md)

[https://github.com/protocolbuffers/protobuf/pulls?q=is%3Apr+...](https://github.com/protocolbuffers/protobuf/pulls?q=is%3Apr+author%3AObsidianMinor)

------
tlarkworthy
This is the wrong argument. Who cares about the type system of a binary
packaging format? The joy is how these messages can be used as rows in storage
systems as well as RPC. Complicating the type system limits the domain
applicability and increases the porting cost. No.

Protobuffers are shit coz they don't support zero-copy and you have to
deserialize the whole thing even if you are interested in only one field or an
outer envelope, causing memory churn in your JVMs. Cap'n Proto and FlatBuffers
attack this real problem. The expressivity of the type system is a minor
issue, hence no credible competition.

Note gRPC abandoned required fields! Nothing stays required over a decade;
backward compatibility is important! Required should be enforced at the
application layer, not the binary packing layer. It is a property of the
version of the code processing the blob, not of the blob representation
itself.

Ex-Googler and equally happy OpenAPI spec user.

~~~
doublement
This is going to sound sarcastic but it's not: Can we get back to just putting
the members of C structures into network byte order and sending that over the
wire in binary, à la 1995?

~~~
BubRoss
Why even put them in network byte order? Every modern system is little-endian;
if you standardize on that, only exotic systems would have to deserialize
anything.

~~~
doublement
Because when someone builds a hugely popular exotic system in the future,
because it is one (1) cent cheaper, you'd end up with code that has to check
to see if it's running on such a system.

~~~
BubRoss
This doesn't make any sense for multiple reasons, but especially because you
wouldn't be checking anything in the first place. A big-endian system would
reorder bytes, and a little-endian system would just use it directly from
memory without another copy or any reordering.

------
ves
I agree with a lot of this post, although the tone isn’t great. The problems
we ran into with protobufs at my job include:

1. The schema evolution claims don’t really hold water for our systems.

2. The type system isn’t very expressive (e.g. no generics means you have to
write the same error wrapper for all your endpoints) and lots of our devs
found it unintuitive, especially oneofs.

3. The “default value”/nullable field feature turns out to be a recipe for
postmortems and data quality degradation. Making everything nullable isn’t
good.

4. The python library doesn’t have mypy typing and the generated objects
aren’t... super pythonic.

I (along with some colleagues) built a library to paper over protobuf and
address these issues. Notably, it includes a very well-specified algorithm to
automatically assign version numbers to schemas during development, as well as
operational instructions for bumping a version without causing downtime where
possible. And all the codegenned models have mypy types!

You can read more about it here:

[https://tech.affirm.com/defining-data-models-with-idol-a3109...](https://tech.affirm.com/defining-data-models-with-idol-a31093cd0707)

It’s so far turned out really really well for us.

In particular, “schema evolution” is a property of a particular distributed
system and there aren’t universally safe rules; schemas for historical machine
learning datasets and RPC services, say, have to evolve differently because
the data flow is different. Also, there’s no version bumping algorithm built
in, and nullable/optional fields are a pain to program against for data
scientists and client devs alike.

~~~
GeneralMayhem
re: (3) - nullability is more or less required for backwards compatibility. If
you have existing data and add a new field going forward, your options are to
make the old data invalid until you backfill, or give your code a way to
detect "this field doesn't exist" and deal with it accordingly.

~~~
ves
I opted to go for “pinning” based on the version number, so if you make a
breaking change, like adding a required field, IDOL copies your schema into a
v2 (say) namespace and then applies the change, leaving v1 untouched.

At this point we just have separate types for separate versions and tools in
the host language can help you deal with that.

This turns out to be much better for data quality and client code than adding
lots of nullable fields, at the cost of making breaking changes to APIs a bit
more work. It seems to have been worth it so far.

Going forward, the service author has to support the “old” versions until we
can determine that there’s no old data sitting around (so all clients are on
the new version, all serialized data has been backfilled or dropped, or
whatever’s appropriate), at which point they can delete the old schema. And we
have some simple tools to verify this, since we stick the version number onto
the models / serialized data.

------
skybrian
A good research paper would first explain what the protobuffer design goals
are before explaining why they are misguided, inapplicable, or aren't
achieved. But I guess this is just a blog post.

As it is, it's unclear whether the author of the blog post even understood the
reasons behind protobuffer design decisions.

~~~
Ozzie_osman
100%. But it's easier to just say "it was designed by amateurs" than actually
explore why design decisions were made in a particular way, so _shrug_.

------
hardwaresofton
I finally feel safe to suggest that I think the cargo-culting of gRPC onto
projects these days is also wrong. One of the best (and to be fair, worst)
parts about HTTP is its flexibility, and it's like people just completely
skipped over `Content-Type` and other simple options.

Throwing out standards-compliant HTTP (whether 1,2 or 3) with the bathwater
that is JSON decoding was a mistake. JSON + jsonschema + swagger/hyperschema
should be good enough for most projects, and for those where it isn't good
enough, swap out the content type (but keep the right annotations) and call it
a day! Use Avro, use capnproto, use whatever without tying yourself into the
grpc ecosystem.

Maybe gRPC's biggest contribution is the more polished and coherent tooling --
in combining three solutions (schema enforcement, binary representation and
convenient client/server code generation), they've created something that's
just _easier_ to use. I personally would have preferred this effort to go
towards tools that work on standards-compliant HTTP1/2/3.

~~~
lkramer
I'm not necessarily saying gRPC is the solution to everything, but I don't see
why HTTP is so great? It's a protocol for transferring primarily text over
networks. Most backend systems operate in binary, so serializing binary data
into a text format seems to be unnecessary overhead.

~~~
rapsey
It is great because it has quality implementations in every language. Much
like protobuffs.

~~~
hinkley
HTTP also has a vast range of proxies, transport encodings, cryptographic
layers, solutions for client/server clock skew, tracing and a whole bunch of
other things like rerouting and aliasing baked in.

------
izacus
I'm a bit confused about the type system rant - and someone correct me if I'm
wrong: the whole point of protobuffs is that they're easily usable in multiple
programming languages, so it seems to me that they kinda have to end up being
the smallest common subset of typing features. If you try to make the typing
stronger, they'll be hard to use in some languages (e.g. Java, the favorite
punching bag of the OP and other language purists), or they'd have to restrict
the number of programming language targets.

Where am I wrong?

~~~
ves
You could have the type model mentioned in this post in nearly all languages
as well.

~~~
izacus
Yes, you could, and that would certainly improve things (I'm not a fan of
these restrictions either).

But you'd still have a rather Java-ish type system right?

------
colanderman
Don't forget also the Protobuf C++ compiler's failure to properly namespace
user-level identifiers vs. library-level identifiers.

For example, if your Protobuf has both a "foo" and "has_foo" field (which is
perfectly legal by the Protobuf language definition! and works fine with e.g.
the Python binding!), you will get a C++ compiler error due to a "has_foo()"
method being generated on behalf of both "foo" and "has_foo".
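
For concreteness, a minimal made-up schema that triggers the clash:

    // Legal per the Protobuf language definition, but fails C++ codegen:
    message Example {
      optional bool foo = 1;      // generates foo() and has_foo() accessors
      optional bool has_foo = 2;  // also wants to generate has_foo()
    }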

This naming clash could have been avoided simply by prepending _all_ generated
method names with a defined prefix, but the implementors either didn't
recognize this issue, or chose not to do anything about it.

(Everything else in the article rings true for me. I've been hoping years for
someone to write this article.)

~~~
kentonv
Yes, we were very much aware of this, and chose not to do anything about it.

This problem almost never manifests in practice. The issue is raised all the
time, but it's basically always observed only as a _theoretical_ problem (by
someone who invariably thinks they are sooooo smart for discovering it), not
as a real problem preventing compilation of a real schema.

Prepending all generated method names with a prefix would be a rather extreme
solution that no one would like. Have you ever tried to read libstdc++'s STL
implementation, where absolutely everything is prefixed with __? It's really
quite awful. I wouldn't want to use a serialization framework that did that.

The right solution, in my opinion, is to provide annotations that allow the
developer to rename a particular field for the purpose of a particular target
language, so that e.g. you can say that "has_foo" should be renamed to
"has_foo_" (or whatever) in C++ generated code. Yeah, it's an ugly hack, but
it gets the job done.

I can't remember if this ever got implemented in Protobuf, because, again,
it's almost never actually needed. Cap'n Proto does have such annotations,
though.

(Disclosure: I'm the author of Protobuf v2 and Cap'n Proto.)

~~~
colanderman
> (by someone who invariably thinks they are sooooo smart for discovering it),
> not as a real problem preventing compilation of a real schema

That's a pretty dismissive view of your users. This has actually bitten me in
practice, so consider their foresight vindicated.

(Notably, it was actually the inability to easily distinguish between a
missing and empty array, which caused us to resort to using "has_foo" fields,
only later to hit the issue with the C++ compiler.)

If you dismiss this as a valid concern, how can I be confident that there are
not other similar issues you simply dismissed as unimportant?

Say what you will about STL, but the level of attention to detail there
assures me that I'm not likely to get bitten by some weird issue the
developers chose to turn a blind eye to.

> you can say that "has_foo" should be renamed to "has_foo_" (or whatever) in
> C++ generated code.

This is fine, even if it's a transformation predetermined by the language.

~~~
kentonv
Sorry for the snark. This issue is a sore spot for me because so many people
have reported it without having actually been affected by it, and because they
tend to assume the designers were stupidly unaware of the issue, rather than
that the issue is actually rather hard to solve in a satisfying way.

However, if you actually were affected by it, then you are right to be annoyed
by it.

The particular case where someone developed a protocol mostly in one language
and then later on started targeting a new language is indeed a case that I do
worry about. The idea of language-specific annotations defining language-
specific renames was designed for that use case.

I haven't worked on Protobuf in almost a decade, but Cap'n Proto does address
this issue as I said -- without making everyone's code horribly ugly.

------
jonbronson
"The solution is as follows:

Make all fields in a message required. This makes messages product types."

Except it also breaks backwards compatibility, one of the most powerful and
sought-after features of protobufs.

~~~
naasking
> Except it also breaks backwards compatibility, one of the most powerful and
> sought-after features of protobufs.

It doesn't have to. Just add row types to handle unknown content, i.e. if an
intermediary knows only of fields foo and bar, then it can process any data
with such fields if given a type like "type SomeRecord = { foo : int, bar :
string | r }", where 'r' represents the remainder of the record.

The article's criticisms are valid and there are typed solutions to most of
the objections that have been raised against it.

~~~
SpicyLemonZest
I'm not sure that's simple enough to be a "just", but in any case the primary
problem is the other direction. If I add `required baz: int` to my service's
definition of a protobuf, all protobufs that have ever been generated before
become invalid because they don't contain a value for baz.

~~~
naasking
That fact doesn't change if you eschew types. Backward-compatible schema
evolution has rules.

~~~
SpicyLemonZest
Right, that's the point. The article's suggestion to "make all fields in a
message required" fundamentally misunderstands the issues at hand, because no
matter how appealing it is from a type theory perspective, following that
suggestion would make it impossible to ever add a field in a backwards
compatible manner.

~~~
naasking
> The article's suggestion to "make all fields in a message required"
> fundamentally misunderstands the issues at hand, because no matter how
> appealing it is from a type theory perspective, following that suggestion
> would make it impossible to ever add a field in a backwards compatible
> manner.

You absolutely could in multiple ways:

1. You make every accepted product type have a row type at your service
interface if you expect schema evolution.

2. If you have to add a field unexpectedly, i.e. where you did not have a row
type, then you must deprecate the old API. If this seems onerous to you, then
your service infrastructure is probably insufficiently flexible.

~~~
SpicyLemonZest
Option 1 seems like it defeats the point. If you're going to declare a field
with a more permissive type than currently allowed, aren't you just hacking
weak types back into your strong type system?

Option 2... look. I've seen a lot of API deprecations, across multiple teams
in multiple companies, and every one of them was very onerous in ways that had
little to do with the service infrastructure. If you've done easy API
deprecations, more power to you, but I don't think your experience is
representative.

------
Ozzie_osman
Unfortunately I was turned off by the angry and obnoxious tone. It seems to be
getting more common as a way to get traction on the HN homepage. But yeah,
even though the author makes some good points, the argument loses
effectiveness in my book because of things like calling people amateurs.

~~~
mikestew
The angry, pissed-off coder rant is occasionally pulled off well, but in
general it grew tiresome for me fifteen years ago. Not everyone is Hunter S.
Thompson (well, no one is, now), and not every technical annoyance is the
Kentucky Derby and thus worthy of such treatment.

To this day, I’ll still forgive a well-crafted MongoDB rant, though.

------
evmar
Previous discussion, leading with a comment from one of the protobuf authors:
[https://news.ycombinator.com/item?id=18188519](https://news.ycombinator.com/item?id=18188519)

~~~
finnthehuman
The way dweis never responded to batmansmk was disappointing to say the least.

kentonv was willing to engage on the points and came out much more reasonable
in the whole thing.

~~~
joshuamorton
dweis isn't a designer, so he'd be the wrong person to answer those things.

Sanjay, Jeff, and Kenton are probably the three best to answer such questions.

Presumably the top few concerns for protos are wire performance (decode/encode
speed and cost, wire size), compatibility for changes (what this suggestion
just totally breaks), and cross language usability.

Some other tradeoffs might be in non-wire perf (I believe protos beat
flatbuffers here, at the cost of worse on-wire perf), but it's not clear that
that was intentional.

------
kentonv
I guess I'll copy/paste the comment I made last time this was posted:
[https://news.ycombinator.com/item?id=18190005](https://news.ycombinator.com/item?id=18190005)

--------

Hello. I didn't invent Protocol Buffers, but I did write version 2 and was
responsible for open sourcing it. I believe I am the author of the "manifesto"
entitled "required considered harmful" mentioned in the footnote. Note that I
mostly haven't touched Protobufs since I left Google in early 2013, but I have
created Cap'n Proto since then, which I imagine this guy would criticize in
similar ways.

This article appears to be written by a programming language design theorist
who, unfortunately, does not understand (or, perhaps, does not value)
practical software engineering. Type theory is a lot of fun to think about,
but being simple and elegant from a type theory perspective does not
necessarily translate to real value in real systems. Protobuf has undoubtedly,
empirically proven its real value in real systems, despite its admittedly
large number of warts.

The main thing that the author of this article does not seem to understand --
and, indeed, many PL theorists seem to miss -- is that the main challenge in
real-world software engineering is not writing code but changing code once it
is written and deployed. In general, type systems can be both helpful and
harmful when it comes to changing code -- type systems are invaluable for
detecting problems introduced by a change, but an overly-rigid type system can
be a hindrance if it means common types of changes are difficult to make.

This is especially true when it comes to protocols, because in a distributed
system, you cannot update both sides of a protocol simultaneously. I have
found that type theorists tend to promote "version negotiation" schemes where
the two sides agree on one rigid protocol to follow, but this is extremely
painful in practice: you end up needing to maintain parallel code paths,
leading to ugly and hard-to-test code. Inevitably, developers are pushed
towards hacks in order to avoid protocol changes, which makes things worse.

I don't have time to address all the author's points, so let me choose a few
that I think are representative of the misunderstanding.

> _Make all fields in a message required. This makes messages product types._

> _Promote oneof fields to instead be standalone data types. These are
> coproduct types._

This seems to miss the point of optional fields. Optional fields are not
primarily about nullability but about compatibility. Protobuf's single most
important feature is the ability to add new fields over time while maintaining
compatibility. This has proven -- in real practice, not in theory -- to be an
extremely powerful way to allow protocol evolution. It allows developers to
build new features with minimal work.
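
To make that concrete, a hypothetical schema change of that kind, in proto2
syntax (names invented):

    // Deployed version:
    message GetUserRequest {
      optional string user_id = 1;
    }

    // A later version adds a field. Old binaries ignore tag 2 (or carry it
    // along as an unknown field); new binaries simply see the field as
    // absent when talking to old senders. Neither side has to upgrade first.
    message GetUserRequest {
      optional string user_id = 1;
      optional string locale = 2;
    }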

Real-world practice has also shown that quite often, fields that originally
seemed to be "required" turn out to be optional over time, hence the "required
considered harmful" manifesto. In practice, you want to declare all fields
optional to give yourself maximum flexibility for change.

The author dismisses this later on:

> _What protobuffers are is permissive. They manage to not shit the bed when
> receiving messages from the past or from the future because they make
> absolutely no promises about what your data will look like. Everything is
> optional! But if you need it anyway, protobuffers will happily cook up and
> serve you something that typechecks, regardless of whether or not it's
> meaningful._

In real world practice, the permissiveness of Protocol Buffers has proven to
be a powerful way to allow for protocols to change over time.

Maybe there's an amazing type system idea out there that would be even better,
but I don't know what it is. Certainly the usual proposals I see seem like
steps backwards. I'd love to be proven wrong, but not on the basis of
perceived elegance and simplicity, but rather in real-world use.

> _oneof fields can't be repeated._

(background: A "oneof" is essentially a tagged union -- a "sum type" for type
theorists. A "repeated field" is an array.)

Two things:

1. It's that way because the "oneof" pattern long-predates the "oneof"
language construct. A "oneof" is actually syntax sugar for a bunch of
"optional" fields where exactly one is expected to be filled in. Lots of
protocols used this pattern before I added "oneof" to the language, and I
wanted those protocols to be able to upgrade to the new construct without
breaking compatibility.

You might argue that this is a side-effect of a system evolving over time
rather than being designed, and you'd be right. However, there is no such
thing as a successful system which was designed perfectly upfront. All
successful systems become successful by evolving, and thus you will always see
this kind of wart in anything that works well. You should want a system that
thinks about its existing users when creating new features, because once you
adopt it, you'll be an existing user.

2. You actually _do not want_ a oneof field to be repeated!

Here's the problem: Say you have your repeated "oneof" representing an array
of values where each value can be one of 10 different types. For a concrete
example, let's say you're writing a parser and they represent tokens (number,
identifier, string, operator, etc.).

Now, at some point later on, you realize there's some additional piece of data
you want to attach to every element. In our example, it could be that you now
want to record the original source location (line and column number) where the
token appeared.

How do you make this change without breaking compatibility? Now you wish that
you had defined your array as an array of messages, each containing a oneof,
so that you could add a new field to that message. But because you didn't,
you're probably stuck creating a parallel array to store your new field. That
sucks.

In every single case where you might want a repeated oneof, you always want to
wrap it in a message (product type), and then repeat that. That's exactly what
you can do with the existing design.
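
That workaround, sketched in proto syntax for the token example (names
invented):

    message Token {
      oneof kind {
        double number     = 1;
        string identifier = 2;
        string text       = 3;
        string op         = 4;
      }
      // Added later without breaking compatibility, because the oneof
      // lives inside a message:
      int32 line   = 5;
      int32 column = 6;
    }

    message TokenStream {
      repeated Token tokens = 1;  // repeat the message, not the oneof
    }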

The author's complaints about several other features have similar stories.

> _One possible argument here is that protobuffers will hold onto any
> information present in a message that they don't understand. In principle
> this means that it's nondestructive to route a message through an
> intermediary that doesn't understand this version of its schema. Surely
> that's a win, isn't it?_

> _Granted, on paper it's a cool feature. But I've never once seen an
> application that will actually preserve that property._

OK, well, I've worked on _lots_ of systems -- across three different companies
-- where this feature is essential.

~~~
xyzzyz
> But I've never once seen an application that will actually preserve that
> property.

I wonder if author uses Chrome, which depends heavily on this in its Sync
feature.

~~~
akalin
When I worked on Chrome Sync, we spent some time making sure that unknown
fields were preserved properly. Glad to see that someone noticed, cheers!

~~~
xyzzyz
I did notice that when I was an owner of protobuf in Chromium :) Custom
patches to support unknown field preservation in lite mode sure brought me
some hassle when updating to version 3 of the library.

------
tgsovlerkhgsel
"Make all fields in a message required" would defeat one of the main benefits
of protobufs: The ability to retroactively add/remove fields while still
keeping the message compatible with implementations using the previous version
of the proto definition.

The other issues (e.g. that you cannot make a repeated oneof) are annoying,
but many of them are consequences of upgrading the "language" (if you want to
call it that) without introducing incompatibilities and/or changing the wire
format. Having a new, incompatible version would likely be a lot more
annoying. Simply not having these features at all and having to write your own
ugly hack as a workaround would definitely be a lot more annoying.

------
traverseda
I'd be interested to hear their thoughts on capnproto.

~~~
kentonv
I would expect he has the same issues with Cap'n Proto. Aside from some
aesthetic cleanups, Cap'n Proto's type system is extremely similar to Protobuf
-- because, frankly, Protobuf got that part right. Cap'n Proto's main
difference from Protobuf is the encoding, which it doesn't seem like this guy
cares too much about.

(I'm the author of Cap'n Proto, and Protobuf v2, though I did not design
Protobuf's type system.)

------
daxorid
The author isn't wrong about protobuf's shortcomings, but to say:

 _and solve a problem that nobody but Google really has_

is pretty absurd. There are plenty of projects that serialize a LOT of data
between different runtimes/platforms (e.g. Go and Java) such that built-in
serialization is not possible and JSON/XML is 3-10 times slower.

------
wellpast
> The dynamic typing guys complain about it being too stifling, while the
> static typing guys like me complain about it being too stifling without
> giving you any of the things you actually want in a type-system. Lose lose.

Type system purists are blinded by their commitment to purity. All context is
thrown out the window — it’s purism or bust.

The absurdity here is profound; it’s “Lose Lose” unless you go all typing or
none.

And yet I completely understand the lament here. I think what the (smarter)
type purists realize is that if they lose the purism position, static types do
become much less of a tyrant tool and more like any other tool in our toolkit:
a nominally useful one to be applied _judiciously_.

Then they’d have to turn their attention to the unforgivingly dynamic outside
world and market.

------
based2
[https://www.reddit.com/r/programming/comments/eezvhp/protobu...](https://www.reddit.com/r/programming/comments/eezvhp/protobuffers_are_wrong/)

------
ronnier
> Fields with scalar types are always present. Even if you don’t set them. Did
> I mention that (at least in proto3) all protobuffers can be zero-initialized
> with absolutely no data in them? Scalar fields get false-y values—uint32 is
> initialized to 0 for example, and string is initialized as "".

> It’s impossible to differentiate a field that was missing in a protobuffer
> from one that was assigned to the default value. Presumably this decision is
> in place in order to allow for an optimization of not needing to send
> default scalar values over the wire.

I believe there’s a trick you can do if you mark it as a “oneof” with only one
field.

~~~
humbledrone
> It’s impossible to differentiate a field that was missing in a protobuffer
> from one that was assigned to the default value. Presumably this decision is
> in place in order to allow for an optimization of not needing to send
> default scalar values over the wire.

Isn't this just flat incorrect? You can tell the difference between set-to-
default and not-set with buffer.has_some_field().

~~~
ronnier
Not for things like ints and strings.

~~~
kentonv
It depends on which version you're using.

In proto1 and proto2, every field had a "has" method and "explicitly set to
default value" was different from "absent".

In proto3, they tried to remove this feature, and instead said that for basic
types, "set to default" and "absent" are the same thing.

(I wrote proto2. I left Google before proto3 came about.)

~~~
humbledrone
Thanks. I must be only used to proto2.

------
altmind
It's 2019 and the protobuf JS compiler still only supports CommonJS modules
and Google-developed Closure imports. No AMD/UMD and no ES6 modules.

How are we supposed to use it in a browser environment if we are not using
browserify or webpack?

------
an_d_rew
The sad thing is that, rather than forward this to the small "decision team"
at work, where we can ponder the merits of the author's points...

... I'm going to just close my browser tab due to the puerile ranting at the
beginning (and sprinkled throughout). A few good points, and perhaps a great
basis for "proto4" or whatnot, but that my "OMG they're so dumb" ranting?

If that was a peer-reviewed paper, I'd have rejected it after reading the
first paragraph, if I even made it that far. That's just not how you make a
technical argument or win people over.

------
choppaface
One important thing missing from the current criticism is Protobuf’s lack of a
facility for serializing a sequence of messages to a file. There’s RecordIO
internally at Google, yet they notably declined to open-source the C++ lib for
it. There are hints of it in Protobuf Java, and Amazon has open-sourced their
own implementation of it with the same name.

Lack of public RecordIO is partially to blame for the creation of TFRecords,
which are in many ways inferior to (for example) tar archives of
string-serialized protobuf messages (tar supports indexing, streaming,
compression, etc.).

~~~
dekhn
I requested that the RecordIO format (bytes on the disk and code
implementation) be open-sourced (for ease of interoperability between Google
datasets and open source/scientific work). It wasn't, because there were some
'flaws' in the design, but it was pointed out that leveldb open-sourced a very
similar format (which never got used outside of leveldb).

------
stabbles
I only have experience with flatbuffers in C++ (it seemed easier to integrate
into a project back then). Can anyone comment on the pros and cons of
flatbuffers vs protobuffers?

~~~
seriesf
I worked on trying to make flatbuffers work at google and it just never was as
fast as proto2/c++. I guess the author of this piece would describe me as an
amateur because like the authors of protocol buffers I only have about thirty
years of industry experience. AMA.

~~~
kentonv
I'd be really interested in hearing why it wasn't faster! I expect the answer
is along the lines of: "Well theoretically the zero-copy design should be
faster, but in practice factors X and Y dominate performance and Protobuf wins
on those." I'd love to know exactly what X and Y are...

(I'm the original author of proto2/c++, but I'm mostly interested for any
lessons that might apply to my work on Cap'n Proto...)

~~~
seriesf
The C++ proto implementation is just already tuned to an absurd degree and it
is hard to beat. Any place where copying was an important problem has already
been eradicated with aliasing (ctype declarations) so flatbuffers' supposed
advantage isn't there to begin with. It's much more important to eliminate
branches, stores, and other costs in generated code.

~~~
kentonv
I'm guessing you were trying to use it with Stubby?

Admittedly the networked-RPC use case is not a particularly compelling one for
zero-copy (the mmaped-file case is much more so, and maybe even shared-memory
RPC).

Still, I'd expect that not having to parse varints nor allocate memory would
count for something. Wish I could see the test setup.

------
acvny
"but unfortunately, literally nobody considers Java to have a well-designed
type-system". What? That is mildly put a lie.

~~~
toolslive
Indeed. I imagine some people do think Java has a well-designed type system.
However, you probably don't consider those people to be authorities on the
subject.

------
isopede
What would be an appropriate replacement for embedded systems? I've looked at
the "tiny" versions of protobuf (nanopb, etc.), but haven't tried them yet.

Are protobuf competitors (flatbuffers, capnproto) appropriate for small
embedded systems (microcontrollers, mostly <64K RAM)?

~~~
kentonv
I think an implementation of Cap'n Proto that's actually optimized for
embedded systems would likely be smaller than any implementation of Protobuf
could be. However, I'd have to admit that the current Cap'n Proto C++ library
is not so optimized.

Here's a GitHub comment where I outlined what we'd need to do to fix that,
FWIW:
[https://github.com/capnproto/capnproto/issues/844#issuecomme...](https://github.com/capnproto/capnproto/issues/844#issuecomment-504641785)

Caveat: I don't have any personal experience with embedded systems.

------
rhacker
My main problem with protobuf isn't the actual serialization or the proto
files. It's the use case. They actually pitted this against REST. REST is
slowly going out of favor, so of course it makes sense to start gap-filling.
But when we look at the two major competing technologies for that "I don't
want to use REST anymore" feeling, GraphQL and Protobuf, GraphQL actually
solved something useful and pushed things forward. Protobuf really just said,
hmmm, let's put TONS of constraints down on top of REST to make it more
reliable and faster. Basically Swagger 4.0, maybe?

I keep seeing people saying things like, well, protobuf can be used to make
your GraphQL faster, etc... So you're actually trying to argue for "some"
usefulness for protobuf for someone that made it to the next level. That might
last, what, 1 month? The only thing that should be responsible for adding a
binary encapsulation format would be something built into the HTTP specs, not
some kind of custom REST->GraphQL->protobuf stack.

/rant over

------
edem
It is either Protobuf or Protocol Buffers, not Protobuffers. This kinda upsets
my OCD.

------
shadowgovt
It's an interesting article. I was hoping for some alternative suggestions,
because proto is "just good enough" at structure and wire format to become the
one tool a project will reach for so it doesn't need two tools.

------
jamesu
I've been working on a project which requires writing ~40 different packet
types in a custom protocol, but I always thought something like protobuf would
be a great fit for standardizing the packet serialization routines.

~~~
atombender
You could look at ASN.1, which was pretty much created for that purpose.

------
bborud
No, protobuffers are pretty good exactly because they don't try to solve
everything. Their lack of expressiveness is actually a very good thing when
designing communication between processes. Narrow is good.

------
jayd16
Why doesn't he fork the project and crank out a few patches?

When you make a rant like this and don't actually solve it or offer an
alternative, you just come off as a jerk.

The solution offered is to write every field and also an isSet bit... Wouldn't
this balloon message sizes, throwing away the major reason to use protobuf?

------
jchw
If I opened my comment with personal attacks on the author’s competence, I
hope people would downvote me. This has 137 points right now and I don’t even
think it makes much sense; it sounds as though they stopped short of
understanding the reasoning behind many of the limitations and just assumed
they are mistakes, when I’d argue it makes a ton of sense from a PoV of how
protos work.

Like why can’t you repeat a oneof? Imo because it stops acting like a oneof. A
oneof is treated like a union in the generated code, and you can expect that
only one of the oneof message tags will appear in binary for that message. If
you want a repeated oneof, it’s actually no different than if you had all of
the fields be repeated and outside of a oneof. It gives a different interface
in generated code but it’s the exact same thing you’d want in the underlying
proto binaries: multiple of whichever message tag. The distinction of oneof is
not useful here.

I think the proto design is quite smart, OTOH. Like the format is designed to
allow backwards and forwards compatibility provided you follow some rules that
you can easily enforce via linting.

Yes, there are some slightly odd side effects. Many things in proto are
special cased. Like Map can’t be repeated because Map is already repeated;
maps are sugar for repeated pairs, and you can’t have a repeated repeated
field. You can of course just make a quick submessage with a map and repeat
that. It doesn’t seem like that big of a deal.
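
A quick sketch of that workaround (names invented):

    message Attributes {
      map<string, string> values = 1;  // already sugar for a repeated pair
    }

    message Thing {
      repeated Attributes attribute_sets = 1;  // repeat the wrapper instead
    }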

OTOH, for how simple protobufs wire format is, the sugar features like Map
help make it feel a bit richer from the PoV of the generated code, whereas the
simple wire format makes it more predictable, easier to understand under the
hood, and helps to future proof for new features.

Seriously, binary protobuf is so simple _anyone_ can parse it trivially. It’s
just a flat sequence of pairs of a message tag and the corresponding data,
with six wire types that do not specify any typing but instead only how to
interpret the wire data: varint, 4-byte integer, 8-byte integer, variable
length with a length prefix, and group start and group end, where ‘group’
refers to nested messages. The wire type is encoded in the lowest 3 bits of
the message tag. The message tag is written as a base-128 variable-length
quantity, which is just an integer where each byte has a high bit specifying
if there are more bytes, and low bits carrying the data in
least-significant-first order. The remainder of the bits are just the tag of
the field from the proto file. The length-prefixed variant uses a second
base-128 VLQ to specify the length.
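
As a rough illustration, a toy Python decoder for the two most commonly used
wire types (varint and length-delimited); a sketch, not a full parser:

    def read_varint(buf, pos):
        """Decode a base-128 varint at pos; return (value, next_pos)."""
        result = shift = 0
        while True:
            b = buf[pos]
            pos += 1
            result |= (b & 0x7F) << shift  # low 7 bits carry the data
            if not b & 0x80:               # high bit clear: last byte
                return result, pos
            shift += 7

    def read_fields(buf):
        """Yield (field_number, wire_type, raw_value) from a message."""
        pos = 0
        while pos < len(buf):
            key, pos = read_varint(buf, pos)
            field, wire_type = key >> 3, key & 0x07  # tag holds both
            if wire_type == 0:    # varint
                value, pos = read_varint(buf, pos)
            elif wire_type == 2:  # length-delimited: string, bytes,
                length, pos = read_varint(buf, pos)  # nested message...
                value, pos = buf[pos:pos + length], pos + length
            else:
                raise NotImplementedError(f"wire type {wire_type}")
            yield field, wire_type, value

    # The canonical docs example: field 1, varint 150.
    print(list(read_fields(b"\x08\x96\x01")))  # [(1, 0, 150)]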

Protos are clever: the protobuf compiler itself compiles protobuf definitions
into protobuf messages called descriptors that can be passed to a languages
own protobuf compiler through standard in. These descriptors also get encoded
into the resulting output because they can then be used to perform reflection.

Speaking of reflection, you can also have a bunch of metadata in the form of
extensions and message/field/enum/etc. extension options. You can use these at
runtime, or you can write custom protobuf plugins. I am doing both of these
things simultaneously for different purposes in some projects; it helps me
organize schema information and couple it with metadata.
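
A minimal sketch of such a plugin in Python (the output file naming here is
invented; `plugin_pb2` ships with the Python protobuf package):

    import sys

    # The request/response types are themselves protobuf messages.
    from google.protobuf.compiler import plugin_pb2

    def main():
        # protoc hands the plugin a serialized CodeGeneratorRequest on stdin.
        request = plugin_pb2.CodeGeneratorRequest.FromString(
            sys.stdin.buffer.read())

        response = plugin_pb2.CodeGeneratorResponse()
        for proto_file in request.proto_file:
            # Each entry is a FileDescriptorProto describing one .proto
            # file; here we just emit a listing of its message names.
            out = response.file.add()
            out.name = proto_file.name + ".msgs.txt"
            out.content = "\n".join(m.name for m in proto_file.message_type)

        # protoc expects a serialized CodeGeneratorResponse on stdout.
        sys.stdout.buffer.write(response.SerializeToString())

    if __name__ == "__main__":
        main()

Wired up via the usual naming convention, something like: `protoc
--plugin=protoc-gen-demo=./demo_plugin.py --demo_out=. foo.proto`.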

I don’t think protos are perfect, but I do think they are clever and useful
for what they are used for at Google. I actually personally suspect they are a
bit underrated because outside of Google it’s not always completely obvious
how to use protobufs to their fullest. That said, nothing’s perfect and protos
are certainly full of weird quirks. But if you embrace them, I think there’s a
lot of elegance to be found lurking beneath.

That is all. Disclaimer: I do work for Google, and admittedly I did not like
protobuf until I started working here. But, now I sincerely like protobuf.

------
choeger
The author's tone may be rude but they are absolutely right. The design of
data description languages is a well researched field and deviating from
standard technique without explaining the motivation behind that deviation is
a huge smell.

------
heavenlyblue
Has the author heard about capnproto?

------
kortex
(2018) Original thread here:
[https://news.ycombinator.com/item?id=18188519](https://news.ycombinator.com/item?id=18188519)

Protobufs are the worst (de)serialization format, except for all the others.

My chief complaints are:

- Protoc is very obtuse and tricky to use for anything where you want
packaging, especially with python

- The gRPC compiler plugin is even more frustrating in this regard

- It's very optimized for compactness on the wire, at the expense of serving
as a useful structure within programs (I can't find the source, I think it's
somewhere in the protobuf dev docs, but I've had multiple coworkers tell me
this)

- The gRPC server python implementation does weird things with
multiprocessing under the hood that I do not understand, which interferes with
other modules trying to use multiprocessing.

- I still have not found an ideal way to organize files to work well with
importing and still compile correctly with protoc/grpc plugin, _and_ generate
python files with correct import syntax. If anyone knows the "correct" way to
do this that doesn't require too much setup.py hackery, please let me know.

External schema '.proto' files are a feature, not a bug.

The complaints in the article about the type system are pretty silly to me. I
mean, they are great features, but they are not really in the sphere of the
engineering goals when Google set out to make pb/gRPC.

Here's what I love about it though:

- Support, in particular gRPC's support across so many languages.

- Language-agnostic data structure contracts

- Shallow learning curve to get a smoke-test hello world put together - I
found it a lot easier than Thrift to get up and start playing

To quote hardwaresofton:

> in combining three solutions (schema enforcement, binary representation and
> convenient client/server code generation), they've created something that's
> just easier to use

Specific comparisons:

Cap'n Proto looks great on paper, but at the time (about a year ago) it had
some issues with python2.7 and 3.6, which made it a nonstarter for the
application.

MsgPack-RPC might work well but I'm a bit dissuaded by the unhealthy-looking
repos of the python/go/cpp implementations.

Anything over HTTP - you have the binary-to-text issue. Which, if there are
better solutions for this nowadays, let me know.

I believe that XML is dead/dying as a ser/de format (outside of the markup
domains it has already demonstrated to be very proficient at). Similar lack of
binary support.

That leaves Thrift and Avro, which have juuust enough of a barrier to entry
that, with my lack of time to dig into alternatives, I have not been able to
research them thoroughly yet.

------
nif2ee
Can't Rust's Serde simply solve the forward and backward compatibility issue
independently of the serialization format, by ignoring non-existing members in
serialized types and setting defaults for the other way around?

~~~
namibj
There's also
[https://github.com/TimelyDataflow/abomonation](https://github.com/TimelyDataflow/abomonation)
, which is very fast, but ugly.

------
rolltiide
Ey well so is that url!

“This is a good website name, only I am smart enough to think of this, I am an
SEO genius with the number one search result for responsible polymorphism”

------
NoZZz
Protobuffers are shitty ASN.1.

