

Serializing Data - JSON vs. Protocol Buffers - metachris
http://4feets.com/2009/08/serializing-data-json-vs-protocol-buffers

======
andres
Google's documentation claims that protocol buffers are designed to be fast so
I tested them out and I also found that their Python implementation was too
slow to use. I posted a question to the Discussion Forum and found out that
it's a known (but undocumented!!!) problem:

[http://groups.google.com/group/protobuf/browse_thread/thread...](http://groups.google.com/group/protobuf/browse_thread/thread/2f57569563a6a476/53ed0cf5cf5ee1ff?lnk=gst&q=andres#53ed0cf5cf5ee1ff)

Recently I tried out keyczar, Google's crypto toolkit and found that their
Python implementation was too slow by a factor of 100X because they are using
a slow random string generator:

[http://groups.google.com/group/keyczar-discuss/browse_thread...](http://groups.google.com/group/keyczar-discuss/browse_thread/thread/8ed3a503dad4b6b7)

It's hard to see how both of these problems could have passed internal
benchmarking and made it into live code. Maybe Google isn't using its own
Python implementations internally.

~~~
jasonwatkinspdx
Large systems tend to push you towards tradeoffs other people wouldn't make.
Scalability over efficiency. Bisection bandwidth is the most scarce resource
in a large cluster so I imagine protocol buffers have had much more attention
paid to saving bytes than to encoding/decoding speed. Latency critical stuff
probably doesn't use python anyhow.

~~~
haberman
Protobufs are actually pretty fast to encode and decode (in the neighborhood
of 200-300MB/s on my core2 desktop, when using the C++ bindings).

It's just the Python implementation that is slow. I'm working on a Python
implementation that will be much faster. It's really unfortunate that
Protocol Buffers are getting a bad rap due to the current Python
implementation.

------
mbrubeck
My friend Josh (a Google engineer) is working on a Protocol Buffer
implementation that's explicitly designed for (a) efficiency and (b) clean
integration with dynamic languages like Python:
<http://wiki.github.com/haberman/upb>

It's still a work in progress, but the code is there (including tests and some
documentation) and you can start working on bindings for your favorite
language...

------
jacquesm
Conclusion: if you're developing for the Android platform you may want to use
Protocol Buffers; otherwise use JSON.

I'm quite surprised at how large that library is and how badly it performs,
especially since it does _less_ translation than a comparable JSON serializer
would.

For a complete comparison it would have been nice to include XML as well.

~~~
stingraycharles
I wonder whether the author used the 'optimize_for=SPEED' parameter in
Protocol Buffers. Seems relevant, and since the author didn't mention it, I
suppose he probably didn't.
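
For anyone who hasn't seen it, the option is a one-line addition near the top of the .proto file (the message definition below is just a made-up example to give it context):

```proto
// optimize_for controls what the generated code is tuned for.
// Real alternatives: SPEED, CODE_SIZE, LITE_RUNTIME.
option optimize_for = SPEED;

message Event {
  required int32 id = 1;
  optional string name = 2;
}
```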

I personally have a hard time accepting that PB is actually orders of
magnitude slower than JSON, especially given the fact that PB prides itself on
its efficiency.

~~~
metachris
I didn't use the optimize_for=SPEED option, but I'll do that right now and
update the post in half an hour.

EDIT: I just added the option optimize_for=SPEED to the .proto file, and it
increases the speed of the Protocol Buffers by around 5% (still about 10x
slower than JSON with Python).

> I personally have a hard time accepting that PB is actually orders of
> magnitude slower than JSON

It's not always slower, but in certain situations, yes. It seems that the
Python implementation is especially slow -- would be nice to see the results
with C++. Anyone care to give it a shot?
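
For anyone wanting to reproduce the JSON side of the numbers, here's a minimal timing sketch using only the stdlib (the record is a made-up sample; the protobuf half needs protoc-generated classes, so it's omitted):

```python
import json
import timeit

# Hypothetical sample record, roughly the shape of a small message.
record = {"id": 123, "name": "example", "tags": ["a", "b", "c"], "score": 4.5}

# Sanity check: the round-trip is lossless for this data.
assert json.loads(json.dumps(record)) == record

# Time 10,000 encode/decode round-trips with the stdlib json module.
n = 10_000
seconds = timeit.timeit(lambda: json.loads(json.dumps(record)), number=n)
print(f"{n} round-trips: {seconds:.3f}s ({n / seconds:.0f}/s)")
```

Swapping the lambda for the equivalent `SerializeToString()` / `ParseFromString()` calls on a generated protobuf message would give a like-for-like comparison.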

------
jrockway
Interesting. I am writing an Android app that uses JSON. Using Protocol
Buffers never even occurred to me. (JSON works fine, and I am not going to
change for this app, but I will definitely consider it in the future.)

