Hacker News new | past | comments | ask | show | jobs | submit login

I guess I'll copy/paste the comment I made last time this was posted: https://news.ycombinator.com/item?id=18190005


Hello. I didn't invent Protocol Buffers, but I did write version 2 and was responsible for open sourcing it. I believe I am the author of the "manifesto" entitled "required considered harmful" mentioned in the footnote. Note that I mostly haven't touched Protobufs since I left Google in early 2013, but I have created Cap'n Proto since then, which I imagine this guy would criticize in similar ways.

This article appears to be written by a programming language design theorist who, unfortunately, does not understand (or, perhaps, does not value) practical software engineering. Type theory is a lot of fun to think about, but being simple and elegant from a type theory perspective does not necessarily translate to real value in real systems. Protobuf has undoubtedly, empirically proven its real value in real systems, despite its admittedly large number of warts.

The main thing that the author of this article does not seem to understand -- and, indeed, many PL theorists seem to miss -- is that the main challenge in real-world software engineering is not writing code but changing code once it is written and deployed. In general, type systems can be both helpful and harmful when it comes to changing code -- type systems are invaluable for detecting problems introduced by a change, but an overly-rigid type system can be a hindrance if it means common types of changes are difficult to make.

This is especially true when it comes to protocols, because in a distributed system, you cannot update both sides of a protocol simultaneously. I have found that type theorists tend to promote "version negotiation" schemes where the two sides agree on one rigid protocol to follow, but this is extremely painful in practice: you end up needing to maintain parallel code paths, leading to ugly and hard-to-test code. Inevitably, developers are pushed towards hacks in order to avoid protocol changes, which makes things worse.

I don't have time to address all the author's points, so let me choose a few that I think are representative of the misunderstanding.

> Make all fields in a message required. This makes messages product types.

> Promote oneof fields to instead be standalone data types. These are coproduct types.

This seems to miss the point of optional fields. Optional fields are not primarily about nullability but about compatibility. Protobuf's single most important feature is the ability to add new fields over time while maintaining compatibility. This has proven -- in real practice, not in theory -- to be an extremely powerful way to allow protocol evolution. It allows developers to build new features with minimal work.

Real-world practice has also shown that quite often, fields that originally seemed to be "required" turn out to be optional over time, hence the "required considered harmful" manifesto. In practice, you want to declare all fields optional to give yourself maximum flexibility for change.

The author dismisses this later on:

> What protobuffers are is permissive. They manage to not shit the bed when receiving messages from the past or from the future because they make absolutely no promises about what your data will look like. Everything is optional! But if you need it anyway, protobuffers will happily cook up and serve you something that typechecks, regardless of whether or not it's meaningful.

In real world practice, the permissiveness of Protocol Buffers has proven to be a powerful way to allow for protocols to change over time.

Maybe there's an amazing type system idea out there that would be even better, but I don't know what it is. Certainly the usual proposals I see seem like steps backwards. I'd love to be proven wrong, but not on the basis of perceived elegance and simplicity, but rather in real-world use.

> oneof fields can't be repeated.

(background: A "oneof" is essentially a tagged union -- a "sum type" for type theorists. A "repeated field" is an array.)

Two things:

1. It's that way because the "oneof" pattern long-predates the "oneof" language construct. A "oneof" is actually syntax sugar for a bunch of "optional" fields where exactly one is expected to be filled in. Lots of protocols used this pattern before I added "oneof" to the language, and I wanted those protocols to be able to upgrade to the new construct without breaking compatibility.

You might argue that this is a side-effect of a system evolving over time rather than being designed, and you'd be right. However, there is no such thing as a successful system which was designed perfectly upfront. All successful systems become successful by evolving, and thus you will always see this kind of wart in anything that works well. You should want a system that thinks about its existing users when creating new features, because once you adopt it, you'll be an existing user.

2. You actually do not want a oneof field to be repeated!

Here's the problem: Say you have your repeated "oneof" representing an array of values where each value can be one of 10 different types. For a concrete example, let's say you're writing a parser and they represent tokens (number, identifier, string, operator, etc.).

Now, at some point later on, you realize there's some additional piece of data you want to attach to every element. In our example, it could be that you now want to record the original source location (line and column number) where the token appeared.

How do you make this change without breaking compatibility? Now you wish that you had defined your array as an array of messages, each containing a oneof, so that you could add a new field to that message. But because you didn't, you're probably stuck creating a parallel array to store your new field. That sucks.

In every single case where you might want a repeated oneof, you always want to wrap it in a message (product type), and then repeat that. That's exactly what you can do with the existing design.

The author's complaints about several other features have similar stories.

> One possible argument here is that protobuffers will hold onto any information present in a message that they don't understand. In principle this means that it's nondestructive to route a message through an intermediary that doesn't understand this version of its schema. Surely that's a win, isn't it?

> Granted, on paper it's a cool feature. But I've never once seen an application that will actually preserve that property.

OK, well, I've worked on lots of systems -- across three different companies -- where this feature is essential.

> But I've never once seen an application that will actually preserve that property.

I wonder if author uses Chrome, which depends heavily on this in its Sync feature.

Yeah, most big Google services -- including Search -- rely pretty heavily on unknown field retention. Google has been building large services out of microservices since a decade before anyone ever said the word "microservice". When one service is updated to emit a new field, and another service is updated to consume it, it's important that the feature can then work, without updating all the middlemen.

When I worked on Chrome Sync, we spent some time making sure that unknown fields were preserved properly. Glad to see that someone noticed, cheers!

I did notice that when I was an owner of protobuf in Chromium :) Custom patches to support unknown field preservation in lite mode sure brought me some hassle when updating to version 3 of the library.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact