
My thoughts on MessagePack - frsyuki
https://gist.github.com/2908191
======
ynniv
cheald has an excellent comment regarding MessagePack, JSON, and Protocol
Buffers in the post from 15 hours ago:
<http://news.ycombinator.com/item?id=4091051>

    
    
       MessagePack is not smaller than gzip'd JSON
       MessagePack is not faster than JSON when a browser is involved
    

Other comments from that post include:

    
    
       MessagePack is not human readable
       MessagePack has issues with UTF-8
       Small packets also have significant TCP/IP overhead
    

Really, anyone who hasn't read the other comments should:
<http://news.ycombinator.com/item?id=4090831>

~~~
frsyuki
Although the original blog post focuses on JavaScript and browsers,
MessagePack itself doesn't mainly focus on them.

A major use case of MessagePack is to store serialized objects in memcached. A
blog post written by Pinterest describes this use case
(<http://engineering.pinterest.com/posts/2012/memcache-games/>). They use
MessagePack with Python, which is faster than the JavaScript implementation. They could
store more objects in a server without performance declination (e.g. gzip).

It's true that MessagePack is not always faster than JSON (e.g. within
browsers), and it's not always smaller than other serialization methods (e.g.
with gzip compression). So each of us should consider which serialization
method fits "my" case.

There are also general tendencies that help when choosing between MessagePack
and JSON:

    
    
        MessagePack is faster at serializing binary data such as thumbnail images.
        MessagePack is better at reducing overhead when exchanging small objects between servers.
        JSON is the better fit for browsers.
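
A rough sketch of the size side of these tendencies, using only the Python standard library. The `pack` function is a hand-written encoder for a tiny subset of MessagePack (tag values taken from the current MessagePack spec, which may differ from the 2012 format), not the real msgpack library. The sample objects, the `img` key, and the choice of base64 for binary data in JSON are assumptions made for illustration.

```python
import base64
import gzip
import json
import struct


def pack(obj):
    """Encode a tiny subset of MessagePack; illustrative only, not a full codec."""
    if obj is None:
        return b"\xc0"                               # nil
    if obj is True:
        return b"\xc3"                               # true
    if obj is False:
        return b"\xc2"                               # false
    if isinstance(obj, int) and 0 <= obj <= 0x7F:
        return bytes([obj])                          # positive fixint: value is the tag
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        if len(data) <= 31:
            return bytes([0xA0 | len(data)]) + data  # fixstr: length lives in the tag
    if isinstance(obj, bytes) and len(obj) <= 0xFFFF:
        return b"\xc5" + struct.pack(">H", len(obj)) + obj  # bin 16
    if isinstance(obj, dict) and len(obj) <= 15:
        body = b"".join(pack(k) + pack(v) for k, v in obj.items())
        return bytes([0x80 | len(obj)]) + body       # fixmap
    raise TypeError(f"not covered by this sketch: {obj!r}")


def to_json(obj):
    """Compact JSON bytes, with no spaces after separators."""
    return json.dumps(obj, separators=(",", ":")).encode("utf-8")


# Small object: per-message overhead dominates.
small = {"compact": True, "schema": 0}
small_json = to_json(small)
small_gzip = gzip.compress(small_json)
small_msgpack = pack(small)

# Binary payload: JSON has no byte-string type, so base64 is assumed here.
thumbnail = bytes(range(256)) * 4                    # 1024 bytes of stand-in image data
binary_json = to_json({"img": base64.b64encode(thumbnail).decode("ascii")})
binary_msgpack = pack({"img": thumbnail})

print(len(small_json), len(small_gzip), len(small_msgpack))
print(len(binary_json), len(binary_msgpack))
```

For an object this small, the gzip header and trailer alone outweigh any savings, which is one reason compression and compact binary formats tend to be compared on larger payloads. This says nothing about speed, which depends on the implementation.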

~~~
alexgartrell
> They could store more objects in a server without performance declination
> (e.g. gzip).

The performance declination argument is bullshit. Network's a million [0]
times slower than gzip.

Truth be told, once you're on the network, you're already screwed w.r.t. most
serialization. The only thing efficient compression/decompression is going to
buy you is lower CPU (memcached servers run at like 2% CPU util, even under
heavy load [1]).

Memcache at Facebook actually uses the ascii protocol, and the memcached
implementation is a braindead strtok parser (some of our other stuff uses
ragel -- you'll have a hell of a time out-optimizing ragel with the right
compiler flags -- I've tried and failed).

Just use whichever serialization format has the best API, because I can say
with near certainty that it's not going to be a perf problem for you if you're
touching disk, network, etc.

[0] Obviously a made up number, but it's way slower. Especially if you're
unlucky and lose a packet or something.

[1] With the exception of weird kernel spin lock contention issues, which can
happen if you're not sharding your UDP packets well and trying to reply from
8+ cores on 1 UDP socket. You probably aren't.

------
shykes
Zerorpc (<http://github.com/dotcloud/zerorpc-python>) uses msgpack as its
primary serialization format. Among other things it is significantly more
efficient for floating point data. At dotCloud we use it to move many millions
of system metrics per day, so it adds up. Also worth noting: msgpack maps
perfectly to json, so there's nothing keeping you from translating it to json
on its way to the browser, where json is indeed a better fit. In practice this
doesn't affect performance since you typically need to parse and alter the
message at the browser boundary anyway, for some sort of access control.
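
One way to see the floating point claim, as a minimal sketch with the Python standard library: MessagePack's `float 64` type is a one-byte tag (`0xcb`) followed by the 8-byte big-endian IEEE 754 value, so every double costs 9 bytes, while JSON writes the decimal digits. The helper names and sample values are made up for illustration; this is not zerorpc's or msgpack's actual code.

```python
import json
import math
import struct


def pack_float64(x):
    """MessagePack float 64: tag 0xcb plus a big-endian IEEE 754 double."""
    return b"\xcb" + struct.pack(">d", x)


def unpack_float64(buf):
    """Inverse of pack_float64 for a buffer that starts with a float 64."""
    if buf[0] != 0xCB:
        raise ValueError("not a float 64")
    return struct.unpack(">d", buf[1:9])[0]


samples = [math.pi, 0.5]
for x in samples:
    # Print the value, its JSON text length, and its binary length.
    print(x, len(json.dumps(x)), len(pack_float64(x)))
```

The binary form wins for values that need many digits, such as typical measured metrics, and loses for short ones like `0.5`, so the gain depends on the data.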

------
frsyuki
We're using MessagePack in a Rails application to log user behaviors and
analyze them.

Compared with other serialization libraries such as Protocol Buffers, Avro or
BSON, one of the advantages of MessagePack is compatibility with JSON. (In
spite of its name, BSON has special types which cause incompatibility with
JSON)

It means we can exchange objects sent from browsers (in JSON format) between
servers written in different languages without losing information.

I will not use MessagePack with browsers but it's still useful to use it with
web applications.

~~~
rkalla
If JSON compatibility is an issue, have you looked at UBJSON?
<http://ubjson.org/>

May be a bit bigger than msgpack but is damn-near human readable even in its
binary format and really easy to encode/decode. Also 1:1 compatibility with
JSON.

Compatibility and simplicity were the core design tenets. It may _not_ be
the right choice, just throwing it out there in case it helps.

Disclaimer: I am the author of the spec.

~~~
chmike
I looked at it. Its design process is not complete. One strong negative
point is that it enforces big endian integer encoding.

Another one is that it doesn't use the value space of tags as efficiently as
MessagePack. I would use the unused space to encode small string sizes in the
tag since objects (associative arrays) have generally many short identifier
strings as keys.

I sent these as comments and requests for change but haven't received any
response yet. I don't know how open its design process is.
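
The idea of folding a small string size into the tag can be sketched as follows. MessagePack does this for strings of up to 31 bytes: the top three bits of the tag are `101` and the low five bits hold the length. The second function is a hypothetical marker-plus-length layout added only for comparison; it is not taken from the UBJSON spec.

```python
def pack_short_str(s):
    """MessagePack-style short string: 101xxxxx tag, where xxxxx is the byte length."""
    data = s.encode("utf-8")
    if len(data) > 31:
        raise ValueError("length does not fit in five bits")
    return bytes([0b10100000 | len(data)]) + data


def unpack_short_str(buf):
    """Read one short string from the start of buf."""
    tag = buf[0]
    if (tag & 0b11100000) != 0b10100000:
        raise ValueError("not a short string tag")
    length = tag & 0b00011111
    return buf[1:1 + length].decode("utf-8")


def pack_marker_and_length(s):
    """Hypothetical layout for comparison: 's' marker, one length byte, then data."""
    data = s.encode("utf-8")
    if len(data) > 255:
        raise ValueError("length does not fit in one byte")
    return b"s" + bytes([len(data)]) + data


keys = ["id", "name", "x", "y"]
folded = sum(len(pack_short_str(k)) for k in keys)
separate = sum(len(pack_marker_and_length(k)) for k in keys)
print(folded, separate)
```

The saving is one byte per string, which matters mostly when objects carry many short keys.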

------
dkhenry
As much as we would like to jump back on the "MessagePack is no JSON
alternative" bandwagon here, I would like to commend the author on taking the
criticism posted earlier like a mature adult and explaining his point of view,
even admitting that some of the benchmarks might have been misleading.

------
SeoxyS
I've personally been using BSON[1] as a binary alternative to JSON, and it's
worked out great. I've written an Objective-C wrapper around the C lib, in
case anybody's interested. Every other language has a solid implementation
from 10gen (the MongoDB guys). It's a solid format with a clear spec that's
extensible and fully supports UTF-8.

[1]: <http://bsonspec.org/>

~~~
stock_toaster
I am kind of partial to tnetstring passed through lzf right now.

------
samsoffes
I think it's hilarious that 2 of the 3 comments on the gist are about how he
formatted his Markdown. _sigh_

I think MessagePack is great. Good for him for trying to make something
better. He openly admits it's not better all the time. Why not help?

------
TheRevoltingX
Very interesting discussion. I work on a 2D MMORPG for Android. This is
extremely relevant to me. I have a few questions though.

What if you take compression and deserialization out of the picture? For
example, in my server I have a hash-like data structure that gets turned into
JSON for browsers and a byte array for mobile clients.

The data has to be transferred at fast rates and will be going over mobile
networks, so the size of the packet matters: every millisecond counts.

Then to read the data, I simply read the stream of bytes and build the objects
I need on the client. On Android, for example, this has to happen mostly
without allocations to avoid the GC.

So a few questions: Does deserializing JSON cause any memory allocations? If
you're not tokenizing the data and don't need to parse it, will it be a
significant gain over a serialized byte protocol or JSON?

In any case, I'll experiment on my end and perhaps blog about my own findings.
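
A minimal sketch of the fixed-layout byte array approach described above, written in Python for brevity. The packet layout (entity id, position, hit points) and all names are invented for illustration; the actual game protocol is not described in this thread. Python still allocates a tuple on every read, so this shows the wire layout and buffer reuse only, not allocation-free decoding on Android.

```python
import json
import struct

# Hypothetical layout: entity id (uint16), x and y (float32), hp (uint8), big-endian.
PACKET = struct.Struct(">HffB")

# One buffer allocated up front and reused for every update.
send_buf = bytearray(PACKET.size)


def write_update(buf, entity_id, x, y, hp):
    """Overwrite buf in place with one update."""
    PACKET.pack_into(buf, 0, entity_id, x, y, hp)


def read_update(buf):
    """Decode one update; field order is fixed, so no tokenizing is needed."""
    return PACKET.unpack_from(buf, 0)


write_update(send_buf, 7, 12.5, -3.25, 90)
as_json = json.dumps({"id": 7, "x": 12.5, "y": -3.25, "hp": 90},
                     separators=(",", ":")).encode("utf-8")
print(PACKET.size, len(as_json), read_update(send_buf))
```

The trade-off is that both ends must agree on the layout in advance, whereas JSON and MessagePack carry their own structure.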

~~~
frsyuki
Disclaimer: I'm the author of MessagePack for C++/Ruby and a committer on the
Java implementation.

As for strings, JSON has to allocate memory and copy to deserialize strings
because strings are escaped.

MessagePack doesn't have to allocate/copy because the serialized format of
strings is the same as the format in memory. But whether an implementation
actually avoids allocating/copying depends on that implementation.

The C++ and Ruby implementations try to suppress allocation and copying
(zero-copy). But the Java implementation doesn't support zero-copy so far (we
plan to add it. Here is the "TODO" comment:
<https://github.com/msgpack/msgpack-java/blob/master/src/main/java/org/msgpack/unpacker/ByteArrayAccept.java>).

As for the other types, the C++ implementation (and the new Ruby implementation,
which is under development) has a memory pool that optimizes those memory
allocation patterns. But it's hard to implement such optimizations for Java
because the JVM (and the Dalvik VM) doesn't allow hooking object allocation.
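
The string point above can be illustrated with a small Python sketch. In JSON the bytes on the wire differ from the string in memory whenever escaping is involved, so a decoder has to build a new string. A length-prefixed format stores the string bytes unchanged, which lets a decoder hand out a view into the receive buffer. The sample text and the use of `memoryview` are illustrative choices, not how any particular MessagePack implementation works.

```python
import json

text = 'say "hi"\n'
raw = text.encode("utf-8")

# JSON: quotes and the newline are escaped, so the wire bytes are not the string bytes.
json_wire = json.dumps(text).encode("utf-8")

# Length-prefixed (MessagePack short string): tag with length, then the bytes unchanged.
msgpack_wire = bytes([0xA0 | len(raw)]) + raw

# A view over the payload refers to the original buffer instead of copying it.
length = msgpack_wire[0] & 0x1F
view = memoryview(msgpack_wire)[1:1 + length]

print(json_wire)
print(bytes(view) == raw, view.obj is msgpack_wire)
```

Turning the view into a language-level string object may still copy, which is the implementation-dependent part mentioned above.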

~~~
TheRevoltingX
Interesting, thanks so much for the response. I'll keep this in mind as I
continue to develop my app. It's still between custom byte stream, json,
thrift, etc.

But MsgPack looks interesting as well and, if anything, these blog posts have
brought it into the light for me.

I looked at the Java class and what _might_ help is if you could set a buffer
size, store the data in that buffer, and expand it if
necessary. But that seems like a lot of work. But yeah, not sure if you can
optimize based on usage patterns due to the constraint you said. In any case,
great stuff and thanks for the info.

------
keithseahus
The best way of serialization at present. I like the thought, performance and
various implementations.

~~~
twelvechairs
As nice as MessagePack might be, I'm not sure 'The best way of serialization'
is a very helpful statement. It can't be 'the best' because 'the best' depends
on the specifics of what you are doing. As noted in the OP: "...its pros and
cons should be carefully considered, and there are many situations where it
simply does not offer enough advantage...".

I, for instance, am still using the much-less-cool yaml, because I need to
reference the same object at multiple points within the same serialization.
JSON and (AFAIK) msgpack just don't do that, so in this case there is simply no
argument. It took me far too much playing around to figure this out, because
the internet is full of "JSON > yaml" and similar broad statements, and very
few plain descriptions of what the actual different use cases for each type of
serialization might be.
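
The shared-reference limitation is easy to reproduce with Python's standard `json` module. The object and key names are made up for illustration. YAML can express the same structure with an anchor and an alias, shown here only as a comment because parsing it would need a third-party library.

```python
import json

# One object referenced from two places.
shared = {"name": "sword", "damage": 7}
doc = {"left_hand": shared, "right_hand": shared}

encoded = json.dumps(doc)
restored = json.loads(encoded)

# JSON writes the object out twice, and decoding yields two independent copies.
same_before = doc["left_hand"] is doc["right_hand"]
same_after = restored["left_hand"] is restored["right_hand"]
print(same_before, same_after, encoded.count('"sword"'))

# The YAML equivalent keeps a single definition:
#
#   left_hand: &item
#     name: sword
#     damage: 7
#   right_hand: *item
```

After the round trip, changing `restored["left_hand"]` no longer affects `restored["right_hand"]`, which is the behaviour that rules JSON out for this use case.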

~~~
StavrosK
Isn't JSON a subset of YAML? So isn't it, quite literally, that YAML > JSON?

~~~
TylerE
Only if you don't consider having totally compliant implementations for
pretty much every platform a feature. YAML is a big hairy beast.

