You have misunderstood the "required considered harmful" argument. It's not fundamentally about the abstract concept of required vs. optional but about the specific implementation in Protocol Buffers, which turns out to have had unintended consequences.
Specifically: As implemented, required field checking occurred every time a message was serialized, not just when it was produced or consumed. Many systems involve middlemen who receive a message and then send it on to another system without looking at the content -- except that serialization would check that all required fields were set, because that's baked into the protobuf implementation.
What happened over and over again is that some obscure project would redefine a "required" field to "optional", update both the producer and the consumer of the message, and then push it to production. But, once in production, the middlemen would start rejecting any message where these fields were missing. Often, the middleman servers were operating on "omnibus" messages containing bits of data originating from many different servers and projects -- for example, a set of search results might contain annotations from dozens of Search Quality algorithms. Normally, those annotations are considered non-essential, and Google's systems are carefully architected so that the failure of one back-end doesn't lead to overall failure of the system. However, when an optional backend sent a message missing required fields, the entire omnibus message would be rejected, leading to a total production outage. This problem repeatedly affected the search engine, Gmail, and many other projects.
The fundamental lesson here is: A piece of data should be validated by the consumer, but should not be validated by pass-through middlemen. However, because required fields are baked into the protobuf implementation, it was unclear how they could follow this principle. Hence, the solution: Don't use required fields. Validate your data in application code, at consumption time, where you can handle errors gracefully.
Could you design some other version of "required" that doesn't have this particular problem? Probably. But would it actually be valuable? People who don't have a lot of experience here -- including Jeff and Sanjay when they first designed protobufs -- think that the idea of declaring a field "required" is obvious. But the surprising result that could only come from real-world experience is that this kind of validation is an application concern which does not belong in the serialization layer.
> There is no metadata anywhere.
Specifically, you mean there is no header / container around a protobuf. This is one of the best properties of protobufs, because it makes them compose nicely with other systems that already have their own metadata, or where metadata is irrelevant. Adding required metadata wastes bytes and creates ugly redundancy. For example, if you're sending a protobuf in an HTTP response, the obvious place to put metadata is in the headers -- encoding metadata in the protobuf body would be redundant and wasteful.
From what you wrote it sounds like you think that if Protobufs had metadata, it would have been somehow easier to migrate to a new encoding later, and Google would have done it. This is naive. If we wanted to add any kind of metadata, we could have done so at any time by using reserved field numbers. For example, the field number zero has always been reserved. So at any time, we could have said: Protobuf serialization version B is indicated by prefixing the message with 0x00 0x01 -- an otherwise invalid byte sequence to start a protobuf.
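To make the mechanism concrete, here is a toy sketch (not any real protobuf API) of how such a version prefix could be detected. In the protobuf wire format, every message begins with a tag varint encoding `(field_number << 3) | wire_type`; since field number zero is reserved, a leading byte in the range 0x00-0x07 can never start a valid classic message, so a `0x00 0x01` prefix is unambiguous:

```python
# Hypothetical "version B" detection, as described above. The prefix and
# the function are illustrative, not part of any real protobuf library.
VERSION_B_PREFIX = b"\x00\x01"

def detect_encoding(buf: bytes) -> str:
    # A classic protobuf message starts with a tag byte >= 0x08
    # (field number >= 1), so this prefix cannot collide with one.
    if buf.startswith(VERSION_B_PREFIX):
        return "version-b"
    return "classic"

print(detect_encoding(b"\x00\x01\xde\xad"))  # version-b
print(detect_encoding(b"\x08\x2a"))          # classic (field 1, varint, value 42)
```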
The reason no one ever did this is not because the format was impossible to change, but because the benefits of introducing a whole new encoding were never shown to be worth the inevitable cost involved: implementation in many languages and tools, code bloat of supporting two encodings at once (code bloat is a HUGE problem for protobuf!), etc.
> Validate your data in application code, at consumption time, where you can handle errors gracefully.
Honest question: how can I validate data in application code when optional fields decode to a necessarily-valid value by design?
Suppose I'm an application author and I have an integer field called "quantity" which decoded to a 0. How can I tell whether that 0 meant "the quantity was 0 in the database" or "the quantity field was missing" instead?
(One answer is that I should opt into a different default value, like -1, which the application can know indicates failure. If that's what I should always do, then why not help me gracefully recover by requiring that I always specify my fallback value explicitly, rather than silently defaulting to a potentially misinterpretable valid value like `0`?)
I understand that required fields break message buses that only need to decode the envelope, but if I am working on a client/server application where message buses are not involved (as almost all client/server programmers in the world are), I don't follow how "everything is optional, and optional means always succeed with a valid default value" facilitates graceful recovery in the application layer. In order to gracefully recover, the application has to be informed that something went wrong!
It seems to me that this design more directly facilitates bugs in the application layer that are difficult to detect because the information that something unexpected happened during decoding is intentionally discarded by default. It makes the resulting bugs "not the protocol layer's fault" by definition, but that is not a compelling pitch to me as an application author.
> Suppose I'm an application author and I have an integer field called "quantity" which decoded to a 0. How can I tell whether that 0 meant "the quantity was 0 in the database" or "the quantity field was missing" instead?
First, this is unambiguous at the level of the wire encoding: either the field carries an encoded value of 0, or it is simply absent from the encoding.
Second, in proto2, you actually have a `has_quantity()` method on a proto message, which will tell you whether quantity is missing or set to 0.
In proto3, the design decision was that the `has_foo()` methods are available only on embedded message fields, not on primitive fields, so you'd have to wrap your `int64` in a message wrapper, e.g. one of those available in google/protobuf/wrappers.proto.
The point here (and a common pattern inside google3) is that your handling code simply checks the presence of all required fields manually: `if (!foo.has_quantity()) { return FailedPreconditionError("missing quantity"); }`. It is a bit of a hassle, but the benefit is that you control where the bug originates and how it is handled in your application layer, as opposed to silently dropping the whole proto message on the floor.
In proto2, you could use `has_foo()` to check if `foo` is present, even for integer types. You could also specify what the default value should be, so you could specify e.g. a default of -1 or some other invalid value, if zero is valid for your app.
Unfortunately, proto3 removed both of these features (`has_` and non-zero defaults). I personally think that was a mistake. I'm not sure what proto3 considers idiomatic here. Proto3 is after my time.
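For reference, both proto2 features described above could be combined on a single field (message and field names here are just for illustration):

```proto
// proto2 syntax: presence is tracked (has_quantity() is generated), and an
// explicit default of -1 makes an omitted field distinguishable even if you
// only look at the value.
message Order {
  optional int64 quantity = 1 [default = -1];
}
```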
Cap'n Proto also doesn't support `has_` due to the nature of the encoding, but it does support defaults. So you can set a default of -1 or whatever. Alternatively, you can declare a union like:
# (Cap'n Proto syntax)
foo :union {
  unset @0 :Void;
  value @1 :Int32;
}
This will take an extra 16 bits on the wire to store the tag, but gets the job done. `unset` will be the default state of the union because it has the lowest ordinal number.
I suppose in proto3, you ought to be able to use a `oneof` in a similar way.
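A sketch of that `oneof` workaround (hypothetical names; in proto3, a scalar inside a `oneof` gets explicit presence tracking, so generated code can distinguish "unset" from 0):

```proto
// proto3 syntax: the oneof makes quantity's presence observable again.
message Order {
  oneof quantity_oneof {
    int64 quantity = 1;
  }
}
```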
My point about metadata is that if protobufs had even a small amount of self-description in them, the middlemen that weren't being updated could all have been found automatically, and the version skew would have been much less of a problem. Like how Dapper can follow RPCs around, but for data and running binaries.
Google doesn't do that because, for its specific domain, it needs a very tight encoding, among other reasons (and the legacy issues). It could have fixed the validating-but-not-updated-middleman issue in other ways, but instead it made the schema type system less rigorous rather than more. That seems like the wrong direction.