
Protocol Buffers v3.0.0 released - Rican7
https://github.com/google/protobuf/releases/tag/v3.0.0
======
amluto
They added a feature that impressively fails to interoperate with the rest of
the world.

> Added well-known type protos (any.proto, empty.proto, timestamp.proto,
> duration.proto, etc.). Users can import and use these protos just like
> regular proto files. Additional runtime support are available for each
> language.

From timestamp.proto:

    
    
      // A Timestamp represents a point in time independent of any time zone
      // or calendar, represented as seconds and fractions of seconds at
      // nanosecond resolution in UTC Epoch time. It is encoded using the
      // Proleptic Gregorian Calendar which extends the Gregorian calendar
      // backwards to year one. It is encoded assuming all minutes are 60
      // seconds long, i.e. leap seconds are "smeared" so that no leap second
      // table is needed for interpretation.
    

Nice, sort of -- all UTC times are representable. But you can't _display_ the
time in normal human-readable form without a leap-second table, and even their
sample code is wrong in almost all cases:

    
    
      //     struct timeval tv;
      //     gettimeofday(&tv, NULL);
      //
      //     Timestamp timestamp;
      //     timestamp.set_seconds(tv.tv_sec);
      //     timestamp.set_nanos(tv.tv_usec * 1000);
    

That's only right if you run your computer in Google time. And, damn it,
Google time leaked out into public NTP the last time there was a leap second,
breaking all kinds of things.

Sticking one's head in the sand and pretending there are no leap seconds is
one thing, but designing a protocol that breaks interoperability with people
who _don't_ bury their heads in the sand is another thing entirely.
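
A toy model makes the discrepancy concrete. Google's published smear spreads
the extra second gradually over a window around the leap; the 24-hour linear
window below is an assumption for illustration (Google has used different
smear parameters over the years):

```python
# Toy model of a linear "leap smear": the extra second is absorbed
# gradually over a fixed window instead of being inserted all at once.
# The 24-hour window is an illustrative assumption.
def smear_offset(seconds_into_window, window=86400.0):
    """Seconds by which a smeared clock lags a clock that will insert
    the whole leap second at the end of the window (0.0 to 1.0)."""
    return min(max(seconds_into_window / window, 0.0), 1.0)

# Halfway through the window, a smeared clock is half a second behind
# an unsmeared one -- which is why tv_sec from a smeared machine is
# not the same thing as tv_sec from a machine tracking plain UTC.
print(smear_offset(43200.0))  # 0.5
```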

Edit: fixed formatting

~~~
justinsaccount
It's interesting that you refer to a huge amount of planning and engineering
as "sticking your head in the sand".

[https://googleblog.blogspot.com/2011/09/time-technology-
and-...](https://googleblog.blogspot.com/2011/09/time-technology-and-leaping-
seconds.html)

I think that the approach everything else uses is the "sticking your head in
the sand approach". You basically pretend that there is no problem and that
time is perfectly accurate, up until you have a minute with 59 or 61 seconds.

Just because suddenly trying to handle "Oh shit, everything is off by an
entire second!" is the approach everything else uses doesn't mean it is the
right approach.

~~~
amluto
No, I agree they did a bunch of good engineering for internal use.

But they didn't keep it internal properly -- the real world has leap seconds
for better or for worse, and this library really does stick its head in the
sand and pretend they don't exist. Google specifically says that this library
is designed to be "the foundation of Google's new API platform". Yet they give
a data type (as a headline feature) and a sample usage that is simply
incorrect if you don't set your system to work using Google's "leap smear". It
also seems quite likely that it'll result in blatantly wrong human-readable
strings. I'll even quote a string from timestamp.proto [1]:

9999-12-31T23:59:59Z

That looks like an RFC 3339 string, and it even has the 'Z' suffix, which
means it's UTC, which has an agreed-upon international definition. But this is
not a valid UTC time. It's a time in a different time zone that Google made
up.

Google easily could have done better: publish a spec for a different kind of
time like:

9999-12-31T23:59:59s

where the little 's' means 'smeared'. Supply a serializer and deserializer for
that. Now there's no ambiguity.
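
A deserializer for that hypothetical format only needs to look at the suffix.
A sketch (the format itself is this comment's proposal, not anything protobuf
actually ships):

```python
import re

# Parser for the *proposed* smeared-timestamp format, where a trailing
# 's' marks smeared time and 'Z' keeps its RFC 3339 meaning of UTC.
# The 's' suffix is hypothetical -- it exists only in this suggestion.
STAMP = re.compile(r'^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})([Zs])$')

def parse_stamp(text):
    m = STAMP.match(text)
    if not m:
        raise ValueError('unrecognized timestamp: %r' % text)
    body, suffix = m.groups()
    return body, ('smeared' if suffix == 's' else 'utc')

print(parse_stamp('9999-12-31T23:59:59s'))  # ('9999-12-31T23:59:59', 'smeared')
```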

[1]
[https://github.com/google/protobuf/blob/master/src/google/pr...](https://github.com/google/protobuf/blob/master/src/google/protobuf/timestamp.proto#L103)

------
zellyn
\- removing optional values is actually quite nice. In practice, I end up
checking for "missing or empty string" anyway.

\- the "well-known types" boxed primitive types essentially add optional
values back in. And depending on your language bindings, may look the same.

\- extensions are still allowed in proto3 syntax files, but only for options -
since the descriptor is still proto2. It seems odd to build a proto3 that
couldn't represent descriptors.

\- I still don't understand the removal of unknown fields. Reserialization of
unknown fields was always the _first_ defining characteristic of protobufs I
described to people. I actually read many of the design/discussion docs
internally when I worked at Google, and I still couldn't figure this one out.
Although it's certainly simpler…

\- Protobufs are the "lifeblood" (Rob Pike's words) of Google: the protobuf
team is working to get rid of significant Lovecraftian internal cruft, after
which their ability to incorporate open source contributions should improve
dramatically.

~~~
teacup50
> _\- removing optional values is actually quite nice. In practice, I end up
> checking for "missing or empty string" anyway._

I feel the opposite; this greatly reduces the utility of protobuf.

Previously, I could trust that if parsing succeeded, then I had a guarantee of
a populated data structure.

Now, I have to check each field individually, in manually written code, to
verify that no required fields are missing.

That's _really_ lame, and a _huge_ step backwards.

~~~
smallnamespace
> I could trust that if parsing succeeded, then I had a guarantee of a
> populated data structure

Using required fields has actually bitten Google more than once, and they
were increasingly being considered harmful.

A canonical example is that you add a required field, and then update binaryA
in production (which receives messages from binaryB), which immediately
crashes or errors out because the new field is missing.

So practically speaking, you can never add required fields to any message
where you can't guarantee binary version syncing amongst all instances of the
message-dependent services. At scale, this is essentially operationally
impossible.

And if you're _not_ running an RPC-based service architecture, then why are
you using protos anyway?

~~~
teacup50
> _A canonical example is that you add a required field ..._

Yeah. Don't do that without versioning your protocol. It's even less difficult
to handle than maintaining API/ABI compatibility in a library.

> _So practically speaking, you can never add required fields to any message
> where you can't guarantee binary version syncing amongst all instances of
> the message-dependent services._

Sure you can. If you version things at the protocol or per-request level, you
can negotiate protocol conformance just fine.

Having a message type defined as "Message_V1" **OR** "Message_V2" is _still_
simpler than having "any or none of the fields from any iteration of the
message definition, where consistency is solely defined in terms of the
field/message validation code you write in every protocol consumer".

> _And if you're not running an RPC-based service architecture, then why are
> you using protos anyway?_

It's a very serviceable compact serialization mechanism for at-rest data.

~~~
cbsmith
> Yeah. Don't do that without versioning your protocol. It's even less
> difficult to handle than maintaining API/ABI compatibility in a library.

Actually, the whole point of that was so you don't have to version your
protocol. Protocol versioning actually tends to make code maintenance a pain
in the posterior, and working through old data really annoying. Instead, you
do optional fields.

If you don't want that, go ahead and just write raw bytes and don't bother
with the serialization layer.

> Having a message type defined as "Message_V1" OR "Message_V2" is still
> simpler than having "any or none of the fields from any iteration of the
> message definition, where consistency is solely defined in terms of the
> field/message validation code you write in every protocol consumer".

But you don't have to do either. It seems like you aren't familiar with the
use of protocol buffers. You just define optional fields with a reasonable
default, and magically all the old protobufs get that default value.
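
As a sketch of what that looks like in a proto2 schema (message and field
names here are made up for illustration):

```protobuf
syntax = "proto2";

// Adding a field to an existing message without a protocol version bump:
// old serialized data simply yields the default wherever the new field
// was never set, and old binaries carry it along as an unknown field.
message UserRecord {
  optional string name = 1;
  // Added later, with a default readers can rely on.
  optional string tier = 2 [default = "standard"];
}
```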

~~~
teacup50
> _If you don't want that, go ahead and just write raw bytes and don't bother
> with the serialization layer._

Or just keep using protobuf2, 'cause it's been working great for us for ~6
years.

> _But you don't have to do either. It seems like you aren't familiar with
> the use of protocol buffers._

I've written my own protobuf compiler. I'm familiar.

> _You just define optional fields with a reasonable default, and magically
> all the old protobufs get that default value._

That only works up until there's no "reasonable default".

~~~
cbsmith
> Or just keep using protobuf2, 'cause it's been working great for us for ~6
> years.

Oh sure, I wouldn't change practices, but I'd certainly question why that
practice had been put in place.

> That only works up until there's no "reasonable default".

If there is no reasonable default, I'd be even more wary about making it a
required field.

------
rdtsc
How does this compare or in general why would you pick this vs newer formats
like Cap'n'proto or FlatBuffers?

From FlatBuffers overview I see this comparison:

\---

Protocol Buffers is indeed relatively similar to FlatBuffers, with the primary
difference being that FlatBuffers does not need a parsing/unpacking step to a
secondary representation before you can access data, often coupled with
per-object memory allocation. The code is an order of magnitude bigger, too.
Protocol Buffers has neither optional text import/export nor schema language
features like unions.

\---

So are the newer ones useful mostly when serialization vs deserialization
speed matters
([https://google.github.io/flatbuffers/](https://google.github.io/flatbuffers/))
?

~~~
jackmott
Cap'n'proto is more or less abandoned, I believe. But it and the FlatBuffers
approach give very fast serialization and deserialization (serialization
essentially takes zero time); the tradeoff is that you pay a cost when you
later access the data, because the values you need are extracted on demand
from the raw bytes.

I'm not sure it would often make much sense overall.

~~~
dwrensha
> Cap'n proto is more or less abandoned I believe

As maintainer of capnproto-rust, I beg to differ. :)

Cap'n Proto is indeed actively maintained, and here at Sandstorm we depend on
it every day as a core piece of our infrastructure.

------
JoachimSchipper
This looks like a nice evolution.

It's a pity that the "deterministic serialization" gives so few guarantees; I
have worked on at least one project that really needed this.

(Basically, we wanted to parse a signed blob, do some work, and pass the
original data on without breaking the signature; unfortunately, this requires
keeping the serialized form around, since the serialized form cannot be re-
generated from its parsed format.)

~~~
pherl
The main reason the deterministic serialization isn't canonical is unknown
fields. Since string and message types share the same wire type, when parsing
an unknown string/message field the parser has no idea whether to recursively
canonicalize it.

The cross-language inconsistency is mainly due to string field comparison
performance: Java/ObjC use UTF-16 encoding, which orders strings differently
than UTF-8 does because of surrogate pairs.

Feel free to open an issue on the GitHub site asking for canonical
serialization, with your use case. We may strengthen the deterministic
serialization guarantee (e.g. cross-language consistency) or add another API
for canonical serialization.

~~~
JoachimSchipper
This was years ago; I'd feel bad asking you to do a lot of work to support one
niche use case in a research project that never quite made it to market. And
protobufs ended up saving us quite a bit of development work, even if keeping
the blob around is Wrong in a moral sense.

(You can find the niche use case in a response to your sibling comment, BTW.)

------
jalfresi
"The main intent of introducing proto3 is to clean up protobuf before pushing
the language as the foundation of Google's new API platform"

Does anyone know if this means Google's public APIs will be proto3 based? I
quite like protobufs.

~~~
agency
They've been experimenting[1] with exposing Google Cloud Platform APIs over
gRPC (which is powered by proto3), so it seems quite likely.

[1] [https://cloud.google.com/blog/big-data/2016/03/announcing-
gr...](https://cloud.google.com/blog/big-data/2016/03/announcing-grpc-alpha-
for-google-cloud-pubsub)

------
manish_gill
If someone better informed than me can please explain - where and why would
something like Protocol Buffers be useful?

~~~
zellyn
We use them internally at Square for our RPC mechanism ("Sake", similar to
"Stubby", Google's internal RPC mechanism), for our Kafka-based
logging/metrics/queue infrastructure, and for defining external JSON APIs.
We're in the process of switching from Sake to GRPC, which also uses Protobufs
as its payload format (although you can sub in different transports).

~~~
zellyn
I should mention that we use Ruby, Java, and Go. So protobufs are also the
"lingua franca" for cross-language communication.

------
gonyea
Shocking! Google's started supporting more languages than just the ones they
care about. I really hope this signals the death of their disdain culture.

Being a worthwhile Cloud provider means hiring experts in all sorts of
languages and supporting their efforts.

Imagine a world where Google didn't just "support node" (YEARS late), but
actually turned their v8 expertise into a Cloud product.

But that'd involve convincing Java-devs-turned-VPs to care about JavaScript,
<2004>and EVERYONE knows that JavaScript is a terrible language.</2004>

------
skybrian
Sadly the JSON format they chose isn't actually suitable for high-performance
web apps. Web developers who use protobufs will continue to get by with
various nonstandard JSON encodings.

~~~
positr0n
Why isn't it suitable? (I've never used protobufs)

~~~
skybrian
The fields are indexed by field names (converted to lower camel case) instead
of tag numbers. It's great for readability, but it's a lot more verbose,
particularly for repeated fields.

~~~
ambrice
> Added a new field option "json_name". By default proto field names are
> converted to "lowerCamelCase" in proto3 JSON format. This option can be used
> to override this behavior and specify a different JSON name for the field.

~~~
skybrian
Right, but nobody's going to set that for every single protobuf field.

~~~
ambrice
You're right. The only people that would use it are people that a) care enough
about optimization to switch out shorter tag names and b) don't care enough
about optimization to switch to binary format. Probably not many..

~~~
skybrian
Except everyone who is doing RPC in a browser. Not sure why that is, but
binary formats aren't popular. We still care about performance.

------
mattiemass
Wow, this seems to address a bunch of problems I've experienced with protobuf
in the past. Looks awesome!

~~~
grosbisou
Could you expand on the problems you encountered?

~~~
colanderman
I've never looked at proto3, but proto2 has at least the following issues:

* No clue about namespacing. If you pick the wrong name for something, you can have name clashes within a protobuf, across uninterpreted option classes, with protobuf source code, with your own source code; and it's different if you're in Python or C. Nowhere are naming restrictions defined.

* The API is maddening and inconsistent, especially in Python. (It's totally different between Python and C.) Some things look like lists but really aren't (e.g. you can't assign a list to a repeated field in Python). Even basic reflection (e.g. to get at uninterpreted options) is a Lovecraftian nightmare, and the docs are wholly unhelpful.

* Good luck serializing a list. There's not really such a thing, even though the API pretends there is; there are only repeated fields. So you need a separate flag to distinguish an empty list from an absent one.

* Abstruse implementation. There are so many layers of indirection in the generated source and the core library that I wouldn't know where to start debugging.

Not sure if they fixed any of these issues with proto3.
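
For the list point, the usual proto2 workaround is to wrap the repeated field
in its own message, so presence of the wrapper distinguishes "empty list" from
"no list at all"; a sketch with made-up names:

```protobuf
syntax = "proto2";

message PointList {
  repeated double values = 1;
}

message Scan {
  // Absent wrapper => no list at all; present-but-empty => empty list.
  optional PointList points = 1;
}
```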

~~~
cbsmith
The short answer is the Python implementation wasn't exactly great.

~~~
colanderman
Reflection in the C++ version is as bad or worse given that you can't mess
around with it in a REPL to figure out how it really works. And the C++
version has most of the namespacing issues (e.g. any field starting with
"set_" has potential to clash with another field).

Both implementations are equally bad, even though they seem to have been
written by two separate teams that didn't communicate with each other.

~~~
cbsmith
Honestly, I never had much trouble with the C++ one.

------
forrestthewoods
Google also has flatbuffers. I wonder if flatbuffers is being used by enough
developers to justify significant development?

[https://github.com/google/flatbuffers](https://github.com/google/flatbuffers)

~~~
IshKebab
I think it's more that GRPC (Google's RPC-over-HTTP2 protocol) directly
supports Protobuf, and not Flatbuffers. All of Google's Cloud APIs use
Protobuf (for example the [Speech
API](https://cloud.google.com/speech/reference/rpc/)).

I have to say, GRPC is pretty great. It's statically typed, supports loads of
languages, the interfaces are simple to define (basically Protobuf), and it
supports streaming requests! Most RPC systems omit that, or _only_ have
message streams (e.g. MQTT). Good RPC systems need both.

The only downside I find is that it is rather complicated (in design; not
use).

~~~
forrestthewoods
As an FYI, GRPC support was added to flatbuffers a month ago.
[https://github.com/google/flatbuffers/tree/master/grpc](https://github.com/google/flatbuffers/tree/master/grpc)

------
zbjornson
> primitive fields set to default values (0 for numeric fields, empty for
> string/bytes fields) will be skipped during serialization.

I don't totally understand this. Presumably during deserialization they will
be set to defaults and not missing? Otherwise, coupled with the removal of
required fields, it seems impossible to actually send a 0-value number or
empty string, or to send a proto without a field and not have it set to 0 or
"" (have to explicitly null the field?).

~~~
prattmic
Within the API, proto3 does not have the concept of field presence. All fields
are "present" and default to their type's zero value.

Since the client can handle this, there is no need to explicitly serialize
default values.

~~~
merb
And how do you send an explicit zero so that the client knows the field was
really set by the server and isn't just the default? Or an explicit empty
string?

~~~
tantalor
One case where this question is important is when you are updating a record
stored by the server. You only want to send fields you are changing because
the record might be huge. But then how does the server distinguish between
fields you didn't set and fields you want to set back to the default? The
solution is to also tell the server which fields you are changing in a
separate message.

Example:

    
    
        {
          'update_record': {
            # Set foo=bar
            'foo': 'bar'
          },
          'fields_to_update': {
            'foo': true,
            # Set some_int_flag=0 (default)
            'some_int_flag': true
          }
        }
    

See also "Field Masks in Update Operations"
[https://developers.google.com/protocol-
buffers/docs/referenc...](https://developers.google.com/protocol-
buffers/docs/reference/csharp/class/google/protobuf/well-known-types/field-
mask)
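
The same update expressed with the well-known FieldMask type looks roughly
like this (field_mask.proto is the real well-known proto; the surrounding
message names are illustrative):

```protobuf
syntax = "proto3";

import "google/protobuf/field_mask.proto";

message Record {
  string foo = 1;
  int32 some_int_flag = 2;
}

message UpdateRecordRequest {
  // Carries the new values, e.g. foo = "bar".
  Record record = 1;
  // Names the fields being changed, e.g. paths: ["foo", "some_int_flag"],
  // so the server can tell "unset" apart from "reset to default".
  google.protobuf.FieldMask update_mask = 2;
}
```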

~~~
zbjornson
Thanks for explaining that and for the reference. It seems like a lot of
overhead for a protocol that is designed to be cheap...

~~~
pherl
There are also wrapper well known types that you can fallback to (in
wrappers.proto), when you need to distinguish between, say empty string and
null.
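
A sketch of that (the wrapper type is real; the surrounding message is made
up):

```protobuf
syntax = "proto3";

import "google/protobuf/wrappers.proto";

message Profile {
  // "" and "never set" are indistinguishable for a plain proto3 string.
  string nickname = 1;
  // A missing StringValue submessage is distinguishable from one holding "".
  google.protobuf.StringValue middle_name = 2;
}
```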

~~~
dyoo1979
[https://github.com/google/protobuf/blob/master/src/google/pr...](https://github.com/google/protobuf/blob/master/src/google/protobuf/wrappers.proto#L31)

------
blt
I was hoping for packed serialization of non-primitive types. I once used
Protobuf to serialize small point clouds, and ended up needing to serialize
them as a packed double array and reconstruct the (x, y, z) structure at read
time to avoid Protobuf malloc'ing each point individually. Not a huge deal,
but it would be a real pain for more complex types.

------
andrewmcwatters
Could someone explain to me why you would use Protocol Buffers, Cap'n Proto,
etc versus rolling your own type-length-value protocol besides API interop?

What if your team could write a smaller TLV protocol, and it was necessary to
keep your codebase small? Would this not be wise? Are Protobufs and company not
comparable to TLV protocols?
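
For reference, a minimal TLV codec of the kind being described fits in a few
lines (the 1-byte-type/4-byte-length framing and tag values are arbitrary
choices for illustration):

```python
import struct

# Each record on the wire: 1-byte type tag, 4-byte big-endian length,
# then the payload bytes.
def encode_tlv(records):
    out = bytearray()
    for tag, payload in records:
        out += struct.pack('>BI', tag, len(payload)) + payload
    return bytes(out)

def decode_tlv(data):
    records, i = [], 0
    while i < len(data):
        tag, length = struct.unpack_from('>BI', data, i)
        i += 5  # header size: 1 tag byte + 4 length bytes
        records.append((tag, data[i:i + length]))
        i += length
    return records

msgs = [(1, b'hello'), (2, b'\x00\x01')]
assert decode_tlv(encode_tlv(msgs)) == msgs
```

What it doesn't give you -- schemas, cross-language codegen, defaults,
unknown-field handling -- is exactly the part of protobuf that takes most of
the code.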

~~~
euyyn
In the vast majority of cases, you want your team to spend their time doing
something other than reinventing protos, debugging the in-house
implementation, maintaining the library, etc.

It's not clear to me anyway how doing it yourself would help keeping your
codebase small vs using protos. In terms of code to maintain, doing it
yourself is a net loss. In terms of binary size and method count, the proto
libraries for Objective-C and Android are optimized like crazy.

~~~
andrewmcwatters
Those are all reasons why I wanted to use protobufs to begin with. It sounded
like it solved many issues for us.

But I'm thinking about scripting environments, where the data types used in
protobufs don't exist in the host language. Simple things like this. I think
in the implementations I've seen, they're just coerced or ignored. That's
fine, imo.

But in terms of small codebases: a simple TLV protocol, where only limited
data types are implemented, can be 1/10th of the size of any protobufs
implementation.

My team has built out a high performance type-length-value system that doesn't
require compiled schemas for game development, and we have a very small
serialization lib that's smaller than any protobufs implementation for our
target language.

I'd like to use protobufs to decrease the amount of modules we have to
personally maintain, but I don't see the value in doing so for our particular
situation.

~~~
euyyn
I'm a bit confused: When you talk of size, are you talking of the compiled
binary size of the runtime + generated code, or are you talking of lines of
code?

If you're talking of binary size, I'm surprised that it'd be a problem given
that you're using a scripting environment. Maybe you'd be willing to share
more details?

If you're talking of lines of code, using someone else's library seems to me
to always be better.

------
wehadfun
In C# why use Protocol Buffer over the XML or binary serializes?

~~~
klodolph
The C# binary serializer is not really comparable in terms of what it does.
It's more like Python's Pickle library.

[http://stackoverflow.com/questions/703073/what-are-the-
defic...](http://stackoverflow.com/questions/703073/what-are-the-deficiencies-
of-the-built-in-binaryformatter-based-net-serializati)

C# binary serialization is only useful in certain circumstances. It doesn't
work outside the .NET world and it even has compatibility problems within the
.NET world—you can break deserialization by making certain changes to your
code. From the Microsoft documentation:

> The state of a UTF-8 or UTF-7 encoded object is not preserved if the object
> is serialized and deserialized using different .NET Framework versions.

(From [https://msdn.microsoft.com/en-
us/library/72hyey7b(v=vs.110)....](https://msdn.microsoft.com/en-
us/library/72hyey7b\(v=vs.110\).aspx))

Also see [https://msdn.microsoft.com/en-
us/library/ms229752(v=vs.110)....](https://msdn.microsoft.com/en-
us/library/ms229752\(v=vs.110\).aspx)

