
It's like JSON. but fast and small. - dsr12
http://msgpack.org/
======
pooriaazimi
It looks cool, but be very careful about using it (in a browser, at least).
Browsers have super-fast JSON parsers and parsing msgpack is much, much slower
than parsing good old JSON.

This great comment (from 201 days ago) is exactly about this issue:

<http://news.ycombinator.com/item?id=4091051>

I quote a bit of it here:

    
        JSON size: 386kb
        MsgPack size: 332kb
        Difference: -14%
    
        Encoding with JSON.stringify 4 ms
        Encoding with msgpack.pack 73 ms
        Difference: 18.25x slower
    
        Decoding with JSON.parse 4 ms
        Decoding with msgpack.unpack 13 ms
        Difference: 3.25x slower
    

and also this comment: <http://news.ycombinator.com/item?id=4093077>

    
        MessagePack is not smaller than gzip'd JSON
        MessagePack is not faster than JSON when a browser is involved
        MessagePack is not human readable
        MessagePack has issues with UTF-8
        Small packets also have significant TCP/IP overhead

~~~
judofyr
_sigh_. Why do we always want to declare a winner? Anyone who has tried to
store binary data in JSON knows that MessagePack does indeed have a use case.
And anyone who is working with browsers knows that JSON/JS is the king there.

> MessagePack is not smaller than gzip'd JSON

But gzip'd MessagePack is smaller than gzip'd JSON, so?
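
That claim is easy to check with a toy packer. The sketch below hand-rolls a
minimal MessagePack encoder from the public spec (fix types only; it is not
one of the real libraries) and compares raw and gzipped sizes on a made-up
payload:

```python
import gzip
import json
import struct

def pack(obj):
    """Minimal MessagePack encoder: bool, fixint, uint32, fixstr, fixarray, fixmap."""
    if isinstance(obj, bool):
        return b"\xc3" if obj else b"\xc2"
    if isinstance(obj, int):
        if 0 <= obj <= 0x7F:
            return bytes([obj])                      # positive fixint
        if 0 <= obj <= 0xFFFFFFFF:
            return b"\xce" + struct.pack(">I", obj)  # uint 32, big-endian
        raise ValueError("int out of this sketch's range")
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        if len(raw) > 31:
            raise ValueError("string too long for fixstr")
        return bytes([0xA0 | len(raw)]) + raw        # fixstr ("fixraw")
    if isinstance(obj, list):
        if len(obj) > 15:
            raise ValueError("list too long for fixarray")
        return bytes([0x90 | len(obj)]) + b"".join(pack(v) for v in obj)
    if isinstance(obj, dict):
        if len(obj) > 15:
            raise ValueError("dict too big for fixmap")
        return bytes([0x80 | len(obj)]) + b"".join(
            pack(k) + pack(v) for k, v in obj.items())
    raise TypeError(type(obj))

records = [{"id": i, "name": "user%d" % i, "score": i * 3} for i in range(10)]
mp = pack(records)
js = json.dumps(records, separators=(",", ":")).encode()
print(len(mp), len(js))                                # msgpack is smaller raw
print(len(gzip.compress(mp)), len(gzip.compress(js)))  # both shrink under gzip
```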

> MessagePack is not faster than JSON when a browser is involved

Using it in the browser was never the use case for MessagePack. JSON these
days is used for far more than browser-server communication.

> MessagePack is not human readable

Fair enough. Although I don't consider JSON without newlines to be "human
readable" either. I'll have to pipe it through a beautifier to read it.

> MessagePack has issues with UTF-8

MessagePack only stores bytes. It's up to you to decide how to interpret those
bytes.

> Small packets also have significant TCP/IP overhead.

And?

~~~
adventured
I think developers like to declare winners for very, very good reasons,
typically. Winners are often very useful: for example, in settling on a
standard once a good or great solution is found, so that a community can
thrive around it, lots of open tools and open code can be built for it,
developers can build upon it with a roadmap in mind, sites can publish with it
knowing they can reach a maximum audience, and so on.

Who really wants to have even a dozen different 'standards' for data
interchange? The gradual move from XML to JSON demonstrated perfectly well how
messy it can all get, just trying to support two quasi standards.

~~~
oscargrouch
Choice is a good thing. Do you use a fork to open doors, eat, scratch your
foot, kill flies, and start a fire, just because you have to use the one-and-
ultimate tool?

JSON is not the best for every scenario, just as msgpack is not the best fit
for everything.

Nature is pretty wise on that matter: it "launches" several possibilities,
because the environment is always changing, so in a particular scenario one of
the possibilities (biological agents) can succeed.

But the one that succeeded in a particular scenario will fail in another,
where another that had failed before is now the better option. Adaptation.

So, choice is a good thing. Why are people so lazy? Why don't they like to
think? Must everything come ready-made?

It's our duty to choose the best tool to solve a given problem. It's cool that
I can even use something "obsolete" from the 1800s to solve a 2012 problem.

So if we keep thinking "oh, that old unwanted thing from the past," we will
never see it.

Winners are only winners in a given scenario. It's good to have choice; even
against all boards and standards, I prefer to decide for myself rather than
have others make that decision for me.

This is technology, not fashion.

------
gilgoomesh
This is just an implementation of a binary encoded Property Tree.

And it's not the best implementation out there, since it lacks the redundant
key/value elimination that more compact serialization formats use. Consider an
array of maps where each map has the same keys. By eliminating the redundant
keys (having each subsequent occurrence simply reference an earlier
occurrence) you can halve the encoded data size relative to implementations
without redundancy elimination.
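
The effect is easy to see with sizes alone. Here is a rough, format-agnostic
sketch using JSON lengths (the principle applies to any serializer without key
uniquing); the field names are made up:

```python
import json

# 1,000 records that all repeat the same three keys
rows = [{"timestamp": i, "latitude": 1.0, "longitude": 2.0} for i in range(1000)]
naive = json.dumps(rows, separators=(",", ":"))

# key-eliminated form: state the keys once, then ship bare value rows
compact = json.dumps(
    {"keys": ["timestamp", "latitude", "longitude"],
     "rows": [[r["timestamp"], r["latitude"], r["longitude"]] for r in rows]},
    separators=(",", ":"))

print(len(naive), len(compact))  # the key-eliminated form is far smaller
```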

It also lacks a canonical string encoding. All strings are saved as raw binary
data, making it terrible for data interchange between environments with
different string encodings.

C/C++/Objective-C users have many better options than this. In
Mac/iOS/Cocoa/CoreFoundation, you can use Apple's Binary PList which is open
sourced here:

    
    
        http://www.opensource.apple.com/source/CF/CF-476.10/CFBinaryPList.c
    

In non-Apple C/C++ environments, you can use the open source implementation of
Apple's PList here:

    
    
        https://github.com/JonathanBeck/libplist

~~~
andrewvc
As far as redundant k/v encoding goes, isn't gzip good enough? Sure, it's more
generalized, but tailored compression schemes often aren't the win they're
touted as. Additionally, they can't exploit scenarios where entire pairs are
repeated (for instance, gzip would compress [{x: 1}, {x: 1}, {x: 1}, ...] far
more efficiently).

~~~
gilgoomesh
Actually, Apple's implementation does handle multiple pair redundancy since
all CFArrays and CFDictionarys added to the PList are uniqued.

Obviously, further GZIPing the result will always make it more compact, but
GZIP on the uniqued result is likely to produce a more compact representation
than GZIP on the non-uniqued result.

------
creationix
Let me weigh in with my experience with msgpack. I'm a long-time nodejs core
contributor and have been designing browser libraries for many years. I
discovered msgpack almost three years ago and was sad at the lack of
javascript support.

Since then I've written and maintain a javascript codec with two optimized
versions. One is for [node.js][] and uses node's Buffer methods to do the fast
byte to number conversions. The other is for the [browser][] using typed
arrays.

I use these libraries in production at various places (the Cloud9 IDE backend
used them to great effect).

While the native JSON.parse and JSON.stringify are slightly faster than my
codec for string-heavy payloads (and msgpack is only marginally smaller in
that case anyway), when the data is array- and number-heavy, my codec is much
faster than JSON and the data on the wire is a lot smaller.

Also, don't underestimate the value of having a binary data type. In my
libraries, I extended the format slightly to also encode undefined (as well as
null). I also have a string type and a buffer type. In node.js the buffer type
is an instance of a node Buffer; in the browser, the buffer is an ArrayBuffer
instance (typed array). So the practical effect is that if you put a JS string
in, it's encoded as UTF-8 on the wire and comes out the other end as a JS
string. If you put a binary buffer in, it comes out the other end as a buffer.
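
A later revision of the msgpack spec standardized exactly this string/binary
split. Here is a hand-encoded sketch of those two tags (str8 = 0xd9, bin8 =
0xc4, taken from the current public spec, not from his libraries):

```python
# str8 (0xd9) carries UTF-8 text; bin8 (0xc4) carries opaque bytes.
def pack_str(s):
    raw = s.encode("utf-8")
    assert len(raw) <= 0xFF  # str8 holds up to 255 bytes
    return b"\xd9" + bytes([len(raw)]) + raw

def pack_bin(buf):
    assert len(buf) <= 0xFF  # bin8 holds up to 255 bytes
    return b"\xc4" + bytes([len(buf)]) + buf

text = pack_str("héllo")         # decoders can hand this back as a string
blob = pack_bin(b"\x89PNG\r\n")  # ...and this back as a raw buffer
print(text.hex(), blob.hex())
```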

I've contacted the authors of some of the other codecs and when they extend
the format, we agree to extend in compatible ways.

My biggest production use of msgpack was as the transport format of my
[smith][] rpc system. In smith, an rpc call is done as an array. The first
value is the function to call, and the rest are the args. If you're calling an
anonymous function, then the identifier is a number. In this usage, the
payload tends to be array and number heavy and thus very fast and efficient.

(Edited to add in missing links)

[node.js]: <https://github.com/creationix/msgpack-js>
[browser]: <https://github.com/creationix/msgpack-js-browser>
[smith]: <https://github.com/c9/smith>

~~~
creationix
Just an update, it turns out that Typed Arrays aren't as fast as node's Buffer
implementation. When comparing msgpack-js-browser to the native JSON library,
JSON is way faster in chrome. <http://jsperf.com/msgpack-js-vs-json>

However, in the number- and array-heavy case, the msgpack is 2.5x smaller when
serialized. So even if it's a bit slower in browsers, the bandwidth savings
and the ability to store binary data may still be worth it. (Remember that
performance in the browser scales very differently, since it's distributed
across all your clients' browsers.)

~~~
creationix
Interestingly, in Firefox the gap is smaller; their typed array implementation
is a bit faster. In the number-heavy case, msgpack is only 47% slower, which
is close enough in performance for a great many use cases.

------
newishuser
MessagePack has nothing to do with JSON.

This page is a good example of how marketing copy can undermine your product.

MessagePack is a data serialization format, it has about as much to do with
JSON as honey has to do with milk, but they went and started a flame war with
no more than 7 tiny words, "It's like JSON. but fast and small."

Bam, ruined their own message. Now instead of reading that page thinking "What
can message pack do for me?" you read the page thinking "Better than JSON huh?
I'll be the judge of that."

Marketers could do us all an enormous favor, themselves included, if they
stuck to promoting the positive aspects of their products instead of the
negatives of the competition. It puts the consumer in the wrong mindset. Why
would you want someone wandering around what is essentially your store looking
for negative aspects of your product? Seriously, just stop being negative.

------
gmac
Equally, "it's like JSON, but binary and human-unfriendly".

~~~
creationix
How is it binary unfriendly? It's designed to be super easy to parse using C.
The lengths and types are in the first byte(s), and most conversions can be
done using typecasts. And as human-readable binary formats go, it's not that
bad. I can usually read a hex dump of msgpack if I've been working with it all
day.

~~~
collinvandyck76
But binary, and also, it is human-unfriendly.

~~~
gmac
Exactly (or I'd have written "... binary- and ...").

------
avar
A comparison between msgpack and Sereal and other formats. Sereal is a format
we released at work a few months ago: [http://blog.booking.com/sereal-a-
binary-data-serialization-f...](http://blog.booking.com/sereal-a-binary-data-
serialization-format.html)

~~~
aufreak3
Looks like good work with some thought having gone into it, although with most
such schemes the major gains are usually reaped with the first few simple
optimizations.

Maybe you need a catchy marketing line like "It's like msgpack, but smaller
and faster" :)

------
Joeri
Someone at eBay already compared all the protocols and ended up advising JSON
instead:

<http://news.ycombinator.com/item?id=3967157>

~~~
scubaguy
Except when performance is important.

------
oelmekki
EDIT: Comment based on a wrong assumption: strings are actually smaller than
in JSON, see judofyr's comment.

I have to disagree with that news title, for the "It's like JSON" part.

As pointed out in a link[1] by Someone in another comment here:

> A major use case of MessagePack is to store serialized objects in memcached.

This also seems very relevant (in an article linked from here):

> typical short strings only require an extra byte in addition to the strings
> themselves.

So, stored strings are actually bigger (even if only by one bit) than in
plain-text JSON.

Of course, I'm biased because I'm a web developer, but I see JSON mainly as a
message exchange format, meant to communicate across languages and, most of
the time, between server side and client side. Typically, I'll use server-side
helpers to format strings like values with currency, i18n messages, etc.,
which puts a lot of strings in any JSON document transmitted.

The original author made it clear it was not intended for server/client
communication: it's a very specific format intended to offer better
performance in very specific use cases (I would rather know how it compares
to BSON than to JSON).

JSON, on the other hand, is a generalist exchange format. Messagepack is
definitely not like JSON.

[1] <http://news.ycombinator.com/item?id=4092969>

~~~
judofyr
> > typical short strings only require an extra byte in addition to the
> strings themselves.

> So, stored strings are actually bigger (even only by one bit) than plain
> text json.

Strings in JSON require _two_ bytes (the quotes) in addition to the string
itself.
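
That overhead is easy to count with the stdlib alone (the one-byte msgpack
fixraw header below is hand-built from the public spec):

```python
import json

s = "hello"
as_json = json.dumps(s).encode()                 # '"hello"' -- quotes cost 2 bytes
as_fixraw = bytes([0xA0 | len(s)]) + s.encode()  # 1-byte fixraw header + the string
print(len(as_json), len(as_fixraw))  # 7 vs 6
```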

~~~
oelmekki
Oh, I see. I thought it meant "one bit more than JSON". Thanks for fixing it.

------
jarin
Seems like this could be useful as an alternate API format. I'd like to see
sites offering plist (bigger than JSON but dead simple parsing in Cocoa) and
serialized formats as well as JSON and XML and all that.

~~~
justincormack
That's a large amount of testing and potential bugs, for what purpose?

~~~
masklinn
Ease of consumption from Cocoa (mostly Touch) applications?

------
laurent123456
So, big news: binary-encoded data is more compact than human-readable strings.
I'm sure this new format is useful, but bringing in the JSON comparison is
completely irrelevant (and probably done only to get some page hits). AMF is
also a binary format, has been around for years, and is probably more compact
than JSON too.

I think binary formats are great for online games where performance is
critical, however for most applications human-readable strings are a lot
easier to manage. In any case, both formats have different use cases.

~~~
creationix
It's not a new format; I have no idea why it's popular news today. Having
worked extensively with both msgpack and JSON in a javascript environment, I
can tell you it's the closest to JSON of all the binary formats. The
difference from JSON is that msgpack's strings are binary-safe (you can have a
png as your value) and the format is a bit more compact, especially around
integers.

JSON has: numbers, strings, booleans, null, arrays, objects. Msgpack has:
numbers, raws, booleans, nil, arrays, maps.

So I guess msgpack is a superset of JSON. The raws can contain utf8 encoded
strings like JSON mandates, or they can contain other things. There is no
technical reason that the keys of the maps have to be strings. You could take
a lua table that has another table as key and encode that in msgpack just
fine.
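
For instance, a map with an integer key is perfectly legal msgpack even though
JSON cannot express it. A toy packer hand-built from the spec (not a real
library) shows the idea:

```python
def pack(obj):
    """Toy packer: positive fixint, fixstr, fixmap only."""
    if isinstance(obj, int) and 0 <= obj <= 0x7F:
        return bytes([obj])                       # positive fixint
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        return bytes([0xA0 | len(raw)]) + raw     # fixstr
    if isinstance(obj, dict):
        return bytes([0x80 | len(obj)]) + b"".join(
            pack(k) + pack(v) for k, v in obj.items())  # fixmap
    raise TypeError(type(obj))

encoded = pack({42: "answer"})  # an int key -- JSON would force "42" as a string
print(encoded.hex())
```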

In practice, I wanted more out of msgpack, so I extended the format using some
of the reserved byte ranges to add in an "undefined" type and a distinction
between utf8 encoded string and raw binary buffer.

For me this new format has been extremely useful as a general data
serialization between processes (node to node, server to browser, etc..) I
usually use it over binary websockets or raw tcp sockets.

~~~
_rs
Thanks for the explanation. How do you think msgpack compares to something
like this: <https://github.com/unixpickle/keyedbits> (specification found in
wiki)

~~~
creationix
Interesting format. It appears easier to implement than msgpack in a scripting
language. My gut feeling is that msgpack will be slightly more compact and
faster to decode (especially if decoded using C).

------
WinnyDaPoo
O_o didn't this make news a long time ago?

~~~
Someone
<http://news.ycombinator.com/item?id=4090831>

<http://news.ycombinator.com/item?id=4092969>

------
stock_toaster
json is kind of a lingua franca these days. For some kinds of things though, I
do like tnetstring (has some nice qualities).

msgpack (and bson too) always seemed a bit odd to me though. If you need
binary packing, why not use protobufs or thrift?

~~~
ArchD
Within the space of {schema-ful, schemaless} x {binary, text}, protobuf,
{BSON, MessagePack}, and JSON each occupy a distinct position. The
(schema-ful, text) position is not a very meaningful combination in practice
and is not covered by these, whereas (schemaless, binary) is a valid practical
use case served by {BSON, MessagePack}: for example, when you don't know the
schema beforehand but still want to minimize data size.
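
The quadrant can be written down directly (the example formats listed are
illustrative, not exhaustive):

```python
# the 2x2 design space described above
quadrant = {
    ("schema-ful", "binary"): "Protocol Buffers, Thrift",
    ("schema-ful", "text"):   "(rare in practice)",
    ("schemaless", "binary"): "MessagePack, BSON",
    ("schemaless", "text"):   "JSON, YAML",
}
for (schema, encoding), examples in sorted(quadrant.items()):
    print(f"{schema:10} / {encoding:6}: {examples}")
```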

~~~
ville
This looks like a useful way to compare. Waiting for someone who has compared
them to draw this plot :)

------
ianb
Well, with Python it doesn't seem particularly fast. I put together a small
test:
[http://svn.colorstudy.com/home/ianb/msgpack_json_comparison....](http://svn.colorstudy.com/home/ianb/msgpack_json_comparison.py)

It uses some random JSON data I found. msgpack is more compact (87% of the
JSON representation), though not dramatically so.

    
        json encode: 5.54 msec
        simplejson encode: 8.27 msec
        msgpack encode: 11.4 msec
        json decode: 16.4 msec
        simplejson decode: 4.06 msec
        msgpack decode: 2.84 msec
    

I'm confused about why json is faster at encoding and simplejson is faster at
decoding. simplejson is fastest when you combine encoding and decoding.

~~~
masklinn
> It uses some random JSON data I found. msgpack is more compact (87% of the
> JSON representation), though not dramatically so.

And you can probably end up with a draw if you gzip both

------
plq
From an API perspective, msgpack is indeed quite similar to json (at least for
python) -- it was quite easy to reuse existing json machinery for integrating
into spyne. I'm not aware of anybody using Spyne with msgpack though.

------
lazyjones
The best choice always depends on your data and the quality of the
implementations for the language you use. We tested many serializers recently
for our backend (Perl, but we wanted something more open to other languages
than Storable) and found msgpack slightly faster and 20-25% smaller than JSON
at encoding, but only half as fast at decoding, so we chose JSON.

The most surprising result was that JSON was 2x (decoding) to 3x (encoding)
faster than Storable. The downside is that it sets the utf-8 flag ...

------
willvarfar
In my own tests, the JSON parser you use has a _massive_ impact on runtime,
and msgpack is much faster than the other alternatives.

My own measurements: [http://stackoverflow.com/questions/9884080/fastest-
packing-o...](http://stackoverflow.com/questions/9884080/fastest-packing-of-
data-in-python-and-java)

------
felipesabino
and why not just use protocol buffers?

~~~
tkahn6
MessagePack is schema-less, whereas protobuf is not.

~~~
pjscott
It's also almost API-compatible with JSON encoders, so if you wrote a program
that uses JSON for serialization, switching to msgpack is often as easy as
doing a global find-and-replace.
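
A thin facade makes that concrete. This stdlib-only sketch isolates the
serializer behind two functions, so swapping means editing two lines (e.g. to
`msgpack.packb`/`msgpack.unpackb`, assuming the msgpack-python package):

```python
import json

# serializer facade: the rest of the program only ever calls dumps()/loads()
def dumps(obj):
    return json.dumps(obj).encode()  # swap for msgpack.packb(obj)

def loads(buf):
    return json.loads(buf)           # swap for msgpack.unpackb(buf)

payload = dumps({"user": "alice", "id": 7})
print(loads(payload))
```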

------
rwmj
XDR? ASN.1? Everything old is new.

~~~
dfox
XDR is not directly comparable to this, as it requires pre-agreed structures.

ASN.1 is slightly more complex in this regard: BER can be mostly understood
without knowledge of its intended structure, but only mostly (actually,
MessagePack seems to me like BER done right(-er)), while PER requires a schema
in any case. But most significantly, ASN.1 is mostly about how to encode the
schema itself, which is completely outside the scope of MessagePack.

~~~
aliguori
There seems to be a pretty fatal flaw in MessagePack that doesn't exist in
[BCD]ER. MessagePack doesn't explicitly represent strings and only provides a
binary data type.

Besides not being able to distinguish between a true binary blob and a string,
the two parties need to agree on what string encoding should be used. Is it
UTF-8, UCS-2, EBCDIC? I suspect this would create tons of incompatibilities
between implementations as various parties make their own naive assumptions
about what strings are encoded with.

That seems like a pretty major flaw to me. X.690 is a bit overly complicated
(there are 10+ string types...) but there is such a thing as too simple.
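
The ambiguity described here is easy to reproduce without msgpack at all: the
same raw bytes read back under two different encoding assumptions silently
produce different strings (stdlib sketch):

```python
# one party writes UTF-8 bytes into what msgpack treats as an opaque raw field...
raw = "héllo".encode("utf-8")

# ...and two readers make different assumptions about what those bytes mean
as_utf8 = raw.decode("utf-8")      # 'héllo'
as_latin1 = raw.decode("latin-1")  # mangled to 'hÃ©llo', with no error raised
print(as_utf8, as_latin1)
```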

------
kzk_mover
For those who are interested in the Pinterest use case (Memcached +
MessagePack), see this URL too.

> <http://engineering.pinterest.com/posts/2012/memcache-games/>

~~~
johnbellone
This actually seems like a great use case (going to read it later), but I
immediately thought of MongoDB's BSON format.

------
frsyuki
Here is a benchmark result of msgpack, json, bson and protobuf:
<https://gist.github.com/4348013>

An important point we should consider here is that actual performance depends
on the implementation.

------
SimHacker
Well at least they didn't name it a cute cargo cult acronym like YAML and then
only later realize it actually wasn't a markup language, then have to come up
with a backronym to cover up their mistake.

------
cjensen
It used to be that those who do not learn from history are doomed to write
yet-another string class.

Nowadays it seems that those who do not learn from history are doomed to write
yet another serialization scheme.

------
dbbolton
How would this compare to YAML in terms of speed?

~~~
TillE
YAML is generally quite slow, except perhaps when compared to XML. Even JSON
parsers are much faster, and any binary format will be faster than that.

------
dschiptsov
Lisp-ish way of type-tagging the data for efficient binary encoding? Heresy!
Not a J* way.)

------
drudru11
I don't see any evidence presented that it is smaller or faster on this page.

------
tkahn6
This is an interesting coincidence. I happened upon this a few days ago for
the first time and it worked out really wonderfully.

I needed to transfer data from a python script to a ruby script. Ended up
looking like this:

Python script:

    
    
        import sys
        import msgpack
    
        msg = msgpack.packb(some_data)
        sys.stdout.write(msg)
    

Ruby script:

    
    
        require 'msgpack'
        msg = `python my_script.py`
        some_data = MessagePack.unpack(msg)

------
lucian303
... and not a standard.

