

Ask HN: Experience with Protocol Buffers - bcater

Does anyone here have experience with protocol buffers?

http://code.google.com/p/protobuf/

I need to move some data faster and with less parsing on either side of the
transmission, and these seem like a good choice.
======
jokull
This has just been posted on HN

<http://msgpack.sourceforge.net/>

Make sure you benchmark simple JSON. It might be enough.
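
Roughly the kind of baseline worth measuring first; a minimal Python sketch
(the payload shape and iteration count here are made up):

    import json
    import timeit

    # A toy payload; substitute something shaped like your real data.
    payload = {"id": 12345, "name": "example", "values": list(range(100))}

    encoded = json.dumps(payload)
    print("encode:", timeit.timeit(lambda: json.dumps(payload), number=10000))
    print("decode:", timeit.timeit(lambda: json.loads(encoded), number=10000))
    print("size (bytes):", len(encoded))

If those numbers are already well inside your budget, you may be done.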

~~~
bravura
Here are some benchmarks of PB vs JSON vs Thrift:
<http://bouncybouncy.net/ramblings/posts/json_vs_thrift_and_protocol_buffers_round_2/>

I also have some benchmarks of Python deserialization of JSON, and talk about
alternatives here:
<http://blog.metaoptimize.com/2009/03/22/fast-deserialization-in-python/>

I encourage you to listen to jokull, and find out if JSON really is too slow
for your needs. My advisor taught me that if you keep your data in an easy-to-
read format, you're more likely to catch bugs in the output, merely because
you have first-class tools for inspecting the file and are more likely to do
so.

~~~
joeld42
I find it easier to catch bugs with protobuf, because it tells you if you're
missing data, putting in the wrong type, or leaving out a required field.
There is an ASCII text format that is more readable than JSON, too.

That said, I agree with your overall sentiment: certainly do look at JSON as
well. Protobuf is overkill for a lot of things, and JSON keeps things simpler.
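
A minimal sketch of that error checking in Python, assuming a hypothetical
Point message compiled from a proto2 schema with required fields:

    # Hypothetical schema, compiled with protoc --python_out:
    #
    #   message Point {
    #     required int32 x = 1;
    #     required int32 y = 2;
    #   }
    #
    from google.protobuf import text_format
    from point_pb2 import Point

    p = Point()
    p.x = 3
    # p.z = 4       # would raise AttributeError: Point has no field "z"
    # p.y = "oops"  # would raise TypeError: y expects an integer
    try:
        p.SerializeToString()  # y is still unset
    except Exception as err:   # message.EncodeError in practice
        print("missing required field:", err)

    p.y = 4
    print(text_format.MessageToString(p))  # readable ASCII form: "x: 3\ny: 4"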

------
joeld42
I've used protocol buffers for some projects, with much success. In one case I
used them as a data interchange format between a C++ app and some Python
utilities. They worked very well for that, and the generated APIs were easy to
work with, albeit a bit bulky.

In another case, more of an experiment, I used them to serialize game data and
send it over ENet. This proved very flexible, it was easy to change and add
things, and the packets were extremely compact.

Pros:

* Read/write access to data from C++ or Python

* Generated APIs were easy to work with

* Very compact representation

* ASCII-dump version is very useful for debugging

* More error checking than something like JSON (e.g., it tells you if you leave out a required field)

Cons:

* Adds some build steps; can be more of a headache to maintain (compared to JSON)

* The API can't parse the ASCII version, which is bad for config data or other stuff that might want to be human-readable (vs. XML or JSON)

* Generally requires copying your data into the protobuf struct and then packing, rather than going straight from your "native" format into a packed buffer (see the sketch after this list)

* Adds a bit more complexity

* Not as lightweight as JSON
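
The copying con looks roughly like this in practice; a sketch with a
hypothetical Player message (fields invented for illustration):

    from player_pb2 import Player  # hypothetical generated module

    class NativePlayer(object):
        """The app's own in-memory representation."""
        def __init__(self, name, hp):
            self.name = name
            self.hp = hp

    def pack(native):
        msg = Player()       # step 1: copy field by field into the protobuf...
        msg.name = native.name
        msg.hp = native.hp
        return msg.SerializeToString()  # step 2: ...then serialize to bytes

    wire = pack(NativePlayer("zork", 100))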

For what you're doing, I would recommend them.

They're great for "structure"-style data, a little weird for array-style data.
For example, one of the things I was storing was a 4x4 matrix, and I resorted
to making a struct with 16 members such as m_00, m_01, etc., which worked fine
and stored the data compactly, but was a little weird. I don't think there's a
way to have a float[16] or something like that. I could be wrong; maybe
there's a better way to do this.
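
A repeated field might be that better way; a sketch with a hypothetical Matrix
message, assuming a protobuf version that supports the packed option:

    # Hypothetical schema:
    #
    #   message Matrix {
    #     repeated float m = 1 [packed = true];  // packed keeps the wire compact
    #   }
    #
    from matrix_pb2 import Matrix

    mat = Matrix()
    mat.m.extend([1.0, 0.0, 0.0, 0.0,
                  0.0, 1.0, 0.0, 0.0,
                  0.0, 0.0, 1.0, 0.0,
                  0.0, 0.0, 0.0, 1.0])  # 16 floats, row-major
    wire = mat.SerializeToString()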

Generally, these days I use one of three formats. I am very happy to have
outgrown XML.

protobuf -- for hierarchical, nested data, if it needs to be compact and
accessed from different languages

JSON -- for quick and dirty stuff, when the format needs to be flexible (or
when I need to use JavaScript)

GTO -- for large sets of structured data (www.opengto.org)

------
MichaelGG
We were using .NET's WCF messaging system, but wanted a faster/smaller format.
<http://code.google.com/p/protobuf-net/> let us keep our code the same, while
using protobuf for the wire format. Worked quite well.

Another approach to consider is using a text format (XML, JSON) and then
running it through fast compression like QuickLZ. This has the benefit of not
requiring much more change to the program than a call to compress/decompress.
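
A sketch of the shape of that change in Python, using zlib from the standard
library as a stand-in for QuickLZ (whose bindings vary by platform):

    import json
    import zlib

    payload = {"id": 12345, "values": list(range(1000))}

    # Only two new calls at the boundary; the rest of the program is untouched.
    wire = zlib.compress(json.dumps(payload).encode("utf-8"), 1)  # level 1: fast
    restored = json.loads(zlib.decompress(wire).decode("utf-8"))
    assert restored == payload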

------
pkc
Protocol Buffers will work fine, and their documentation is very clear. These
are my findings, based on experience:

* Serializing data is OK, but parsing takes quite a bit of time, especially for large requests (I am talking milliseconds).

* PBs always require a copy from your internal app data into their structures; I couldn't find a way to avoid that.

* They use variable-length encoding (see the sketch after this list), which might be a good option if a large percentage of your data is integers. In our experience, don't use it if you are sending within your corporate network, as packing and parsing take more time than you save on data transfer. It might be a good option if you are sending data across slow networks.
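
A pure-Python sketch of that variable-length encoding (protobuf uses base-128
varints for its integer fields):

    def encode_varint(n):
        """Base-128 varint: 7 data bits per byte; high bit set = more follows."""
        out = bytearray()
        while True:
            byte = n & 0x7F
            n >>= 7
            if n:
                out.append(byte | 0x80)
            else:
                out.append(byte)
                return bytes(out)

    assert len(encode_varint(7)) == 1        # small ints cost one byte...
    assert len(encode_varint(300)) == 2      # ...vs. 4 or 8 bytes fixed-width
    assert len(encode_varint(2 ** 60)) == 9  # but very large ints cost more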

Some metrics show that Thrift performs better than PBs. Thrift also provides a
choice of protocols. If performance is the prime criterion, JSON + zipping
should be a good option; it also avoids the intermediate step of generating
marshaling code.

------
jbooth
Check out Avro, too. With both Protocol Buffers and Thrift, it's really hard
to evolve schemas because you won't be able to read data written with an
earlier version of the schema. Avro has the speed of binary while being
flexible enough to read older data with later versions of the schema.

~~~
kleinsch
That's a misconception. If you design your schema properly, evolving it isn't
a problem. We use protocol buffers (forced to because we're integrating with
Google) and in the schema, everything is marked as optional so they can add or
remove fields in future versions of the schema. This puts the onus on your
code to properly handle missing fields, but that's the same problem you'll
face with any schema that can be changed. Google has changed the schema
multiple times and we've had periods where our code hasn't been updated yet.
It works just fine.
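
A sketch of what handling those missing fields looks like in Python, assuming
a hypothetical User message where everything is optional (proto2):

    # Hypothetical schema:
    #
    #   message User {
    #     optional string name = 1;
    #     optional int32 age = 2;   // added in a later schema revision
    #   }
    #
    from user_pb2 import User

    old_writer = User(name="ada")          # simulates a peer on the old schema
    wire = old_writer.SerializeToString()  # no age field on the wire

    u = User()
    u.ParseFromString(wire)
    age = u.age if u.HasField("age") else -1  # your code supplies the fallback
    print(u.name, age)                        # -> ada -1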

------
JoelPM
Protos will work fine. Thrift will also work fine. ________ (insert other
binary format) will also work fine.

As long as there are libraries for the languages you're using, it's not a big
deal. I'd recommend solving the problem and moving on - in the serialization
format wars, the real victim is productivity.

------
pwpwp
If your data layout is fairly static, PBs are good.

What I did for an app was encode a kind of JSON in PBs:

<http://pwpwp.blogspot.com/2009/08/storing-json-as-protocol-buffers.html>
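
The core of the idea is a recursive message that can hold any JSON value; a
sketch (field layout invented here; see the post for the real one):

    # Hypothetical proto2 schema:
    #
    #   message Value {
    #     optional string str = 1;
    #     optional double num = 2;
    #     optional bool flag = 3;
    #     repeated Value items = 4;   // a JSON array
    #     repeated Pair fields = 5;   // a JSON object
    #   }
    #   message Pair {
    #     required string key = 1;
    #     required Value value = 2;
    #   }
    #
    from value_pb2 import Value

    v = Value()                 # encodes {"answer": 42}
    pair = v.fields.add()
    pair.key = "answer"
    pair.value.num = 42
    wire = v.SerializeToString()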

