

Thrift, Protocol Buffers and JSON comparison - ropiku
http://bouncybouncy.net/ramblings/posts/thrift_and_protocol_buffers/

======
andres
I abandoned Protocol Buffers because the Python implementation was too slow.
The problem is that Google hasn't written a C extension yet because it
wouldn't be compatible with AppEngine. It's a known problem that has gone
undocumented.

[http://groups.google.com/group/protobuf/browse_thread/thread...](http://groups.google.com/group/protobuf/browse_thread/thread/2f57569563a6a476/53ed0cf5cf5ee1ff?lnk=gst&q=andres#53ed0cf5cf5ee1ff)

------
Gonsalu
The tests use simplejson. Someone added (impressive) cjson results to the
comparison over at reddit:
[http://www.reddit.com/r/programming/comments/811gl/comparing...](http://www.reddit.com/r/programming/comments/811gl/comparing_thrift_protocol_buffers_and_compressed/c07ynfo)

~~~
inklesspen
The tests apparently were not run with simplejson's C extension speedups
compiled in. I did so: <http://gist.github.com/72412>

With the speedups, simplejson was slightly faster than cjson on two out of
three tests, consistently so when I re-ran them.

Test environment: py2.6 on Mac OS X, with simplejson 2.0.9 and python-cjson
1.0.5

Test script: <http://gist.github.com/72413>

Also, I changed the test script from time.time() to time.clock(), which,
according to the Python docs, should be used for performance testing on Unix.
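A minimal modern take on this benchmark, assuming Python 3, where the stdlib json module is derived from simplejson. Note that time.clock() was deprecated in Python 3.3 and removed in 3.8; timeit's default timer is time.perf_counter (time.process_time is the CPU-time counterpart). The sketch also checks whether json's C accelerator (_json) is loaded, the same question raised above about simplejson's speedups. `json.decoder.c_scanstring` is a CPython implementation detail rather than documented API, and the RECORD shape is a hypothetical stand-in, not the original test data.

```python
import json
import timeit

# True when CPython's C accelerator module (_json) was importable; the stdlib
# json module falls back to pure Python otherwise (implementation detail).
HAS_C_SPEEDUPS = json.decoder.c_scanstring is not None

# Hypothetical record shape, loosely modelled on the DNS data discussed in
# the thread; the field names are illustrative only.
RECORD = {
    "qname": "www.example.com",
    "qtype": "A",
    "answers": ["192.0.2.1", "192.0.2.2"],
    "ttl": 300,
}
RECORDS = [RECORD] * 1000


def bench(n: int = 20) -> dict:
    """Return the best-of-3 time in seconds for n dumps and n loads calls."""
    payload = json.dumps(RECORDS)
    # timeit uses time.perf_counter by default; taking the minimum of several
    # repeats reduces noise from other processes.
    dump_t = min(timeit.repeat(lambda: json.dumps(RECORDS), number=n, repeat=3))
    load_t = min(timeit.repeat(lambda: json.loads(payload), number=n, repeat=3))
    return {"dumps": dump_t, "loads": load_t}


if __name__ == "__main__":
    print("C speedups:", HAS_C_SPEEDUPS)
    for name, secs in bench().items():
        print(f"{name}: {secs:.4f}s")
```

On a stock CPython build the accelerator is normally present; a False value would indicate the pure-Python fallback, comparable to the un-sped-up simplejson run described above.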

------
lacker
For handling protocol buffers in Python, it is much faster to generate the C++
protocol buffer classes and then wrap them with SWIG. It is bothersome to
regenerate the wrappers every time you change the proto definition, though.

------
mattj
I think the schema is causing this. Lists of variable-length strings aren't
going to yield good results, as any serialization framework now has to perform
work to find the delimiters of each string. In this case you could store IP
addresses and all the other DNS fields as ints, and you should see a massive
speedup. This would probably be closer to the actual workload Google or
Facebook sees: why would they be serializing huge records of data that's
already been encoded into a human-readable string format?
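To make the size difference concrete, here is a minimal sketch, assuming IPv4 answers only, that compares a comma-separated text encoding with packing each address as a fixed 4-byte unsigned integer using the stdlib ipaddress and struct modules. The function names and sample addresses are hypothetical, chosen for illustration.

```python
import ipaddress
import struct

# Hypothetical answer list, as a DNS record might carry it (documentation ranges).
ADDRS = ["192.0.2.1", "198.51.100.23", "203.0.113.255"]


def as_text(addrs: list[str]) -> bytes:
    """Comma-separated text: a reader must scan for delimiters to split it."""
    return ",".join(addrs).encode("ascii")


def as_ints(addrs: list[str]) -> bytes:
    """Each IPv4 address as a fixed-width 4-byte big-endian unsigned int."""
    values = [int(ipaddress.IPv4Address(a)) for a in addrs]
    return struct.pack(f">{len(values)}I", *values)


def from_ints(blob: bytes) -> list[str]:
    """Decode fixed-width ints back to dotted-quad strings; no scanning needed."""
    count = len(blob) // 4
    return [str(ipaddress.IPv4Address(n)) for n in struct.unpack(f">{count}I", blob)]


print(len(as_text(ADDRS)), len(as_ints(ADDRS)))  # 37 12
```

Fixed-width fields also let a decoder compute offsets directly (the i-th address starts at byte 4*i), which is the kind of work the comment suggests avoiding.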

~~~
joshu
The string could be prefixed with the length of the string.
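A minimal sketch of the length-prefix idea, assuming a fixed 4-byte big-endian signed length before each UTF-8 string; to my understanding this is roughly how Thrift's binary protocol writes strings, but treat that as an assumption. The helper names are hypothetical. The reader never scans for a delimiter: each length tells it exactly how far to jump.

```python
import struct


def write_strings(items: list[str]) -> bytes:
    """Prefix each UTF-8 string with its byte length as a 4-byte big-endian int."""
    out = bytearray()
    for s in items:
        data = s.encode("utf-8")
        out += struct.pack(">i", len(data))  # length counts bytes, not characters
        out += data
    return bytes(out)


def read_strings(blob: bytes) -> list[str]:
    """Read length-prefixed strings by jumping over each payload."""
    items, pos = [], 0
    while pos < len(blob):
        (n,) = struct.unpack_from(">i", blob, pos)
        pos += 4
        items.append(blob[pos:pos + n].decode("utf-8"))
        pos += n
    return items


blob = write_strings(["a", "bc", ""])
print(len(blob), read_strings(blob))  # 15 ['a', 'bc', '']
```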

~~~
mbreese
I thought this was exactly how protocol buffers handle non-fixed-length
fields. Doesn't each such field start with the length of the string? I'm not
sure how Thrift works, but probably the same way.

(Not speaking from experience, just from what I remember of the format when I
read the specs).

------
mokeefe
I suspect that he forgot to add the speed optimization option for PB:

option optimize_for = SPEED;

~~~
mitchellh
Although I'm not a PB user, from what I've read, this flag doesn't work for
the Python bindings to PB.

------
oconnor0
Any ideas as to why the YAML code is so slow?

