This great comment (from 201 days ago) is exactly about this issue:
I quote a bit of it here:
JSON size: 386kb
MsgPack size: 332kb
Encoding with JSON.stringify 4 ms
Encoding with msgpack.pack 73 ms
Difference: 18.25x slower
Decoding with JSON.parse 4 ms
Decoding with msgpack.unpack 13 ms
Difference: 3.25x slower
MessagePack is not smaller than gzip'd JSON
MessagePack is not faster than JSON when a browser is involved
MessagePack is not human readable
MessagePack has issues with UTF-8
Small packets also have significant TCP/IP overhead
> MessagePack is not smaller than gzip'd JSON
But gzip'd MessagePack is smaller than gzip'd JSON, so?
> MessagePack is not faster than JSON when a browser is involved
Using it in the browser was never the use case for MessagePack. JSON these days is used for far more than browser-to-server communication.
> MessagePack is not human readable
Fair enough. Although I don't consider JSON without newlines to be "human readable" either. I'll have to pipe it through a beautifier to read it.
> MessagePack has issues with UTF-8
MessagePack only stores bytes. It's up to you to decide how to interpret those bytes.
> Small packets also have significant TCP/IP overhead.
Who really wants to have even a dozen different 'standards' for data interchange? The gradual move from XML to JSON demonstrated perfectly well how messy it can all get, just trying to support two quasi standards.
JSON is not the best for every scenario, just as msgpack is not the best fit for everything.
Nature is pretty wise on that matter: it "launches" several possibilities, because the environment is always changing. So in a particular given scenario, one of the possibilities (biological agents) can succeed.
But the one that succeeded in a particular scenario will fail in another, where a subject that previously failed is now the better option: adaptation.
So choice is a good thing. Why are people so lazy that they don't like to think? Must everything come ready-made?
It's our duty to choose the best tool to solve a given problem. It's cool that I can even use something "obsolete" from the 1800s to solve a 2012 problem.
So if we keep thinking "oh, that old unwanted thing from the past," we will never see it.
Winners are only winners in a given scenario. It's good to have choice, even against all boards and standards.
I prefer to make that decision myself rather than have others make it for me.
This is technology, not fashion.
Although I'm not sure why you say "quasi" standard. XML and JSON seem like pretty robust standards to me, and they should both be around for a long time.
To be fair, MessagePack seems to be throwing down the gauntlet with their tagline. The parent comment also makes some important points that anyone considering the two should be aware of.
The GP was not declaring a winner. He was responding to the overreaching title of the link. Yes, there are similarities to JSON, and yes, it may be faster and smaller, but only in certain cases. It is very relevant for the GP to point out that in the most common JSON cases (browser communication), those claims are basically false or only marginal.
This is reasoned argumentation, not arbitrary judgement to declare The One.
> MessagePack only stores bytes. It's up to you to decide how to interpret those bytes.
Then it's not really "like JSON", is it? I'm not saying that it is a bad thing, or that it isn't a better option for many current uses of JSON, but "like JSON but fast and small" is a misleading pitch.
In theory this may sound like no big deal; in practice I've observed in similar cases it's a disaster. Average developers routinely muck this up (and that's just me being conservative, above-average ones can choke on this problem too). msgpack really ought to have a dedicated string type, and either declare that this string is always a particular encoding, or give a way to declare what encoding the string is in. (The second is more flexible and arguably more correct, but in something like this where there's going to be dozens of libraries trying to implement it, it is virtually guaranteed that a number of them will muck the variable encoding support up badly, so in practice I'd go with mandatory UTF-8 too.)
If MsgPack doesn't let me throw in strings, numbers, arrays, and dictionaries containing arbitrarily nested further structures without having to worry about defining the encoding, it isn't on par.
Obviously any sane data format can contain any data but not equally easily or efficiently.
(I'm not disagreeing with you, or anything. I just like talking about binary encodings.)
It's pretty interesting.
That's not true. Most languages have 1-byte integers, 2-byte integers and 4-byte integers.
Ok, I'll show myself out.
You are right. But I came to that page after reading about a faster and smaller JSON at HN, thus I was expecting to find a faster and smaller JSON.
It's just another case of bad marketing hurting a brand.
Using Python's json and simplejson modules, which are quite zippy, the encoding time was about 11.4 us, and the decoding time was about 8 us. With msgpack, the encoding and decoding times were 2.7 us and 1.7 us, respectively. The encoded size was 187 bytes with JSON, and 140 bytes with msgpack. Zlib compression brought those down to 127 and 125 bytes, while bringing the total encode-and-compress time up to 25 us for JSON and 16 us for msgpack.
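That kind of measurement is easy to sketch with nothing but the stdlib. The payload below is a made-up stand-in (the commenter's actual data is unknown), and the msgpack side is omitted since it needs the third-party msgpack package:

```python
import json
import timeit
import zlib

# Hypothetical stand-in payload with the repetitive structure that
# makes compression worthwhile.
data = {"users": [{"id": i, "name": f"user{i}", "active": True}
                  for i in range(20)]}

encoded = json.dumps(data).encode("utf-8")
enc_us = timeit.timeit(lambda: json.dumps(data), number=10_000) / 10_000 * 1e6
dec_us = timeit.timeit(lambda: json.loads(encoded), number=10_000) / 10_000 * 1e6
compressed = zlib.compress(encoded)

print(f"encode: {enc_us:.1f} us, decode: {dec_us:.1f} us")
print(f"raw: {len(encoded)} bytes, zlib: {len(compressed)} bytes")
```

The absolute numbers will differ from the comment's, of course; the point is the shape of the comparison, raw size vs. compressed size and encode time vs. encode-and-compress time.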
I do agree the tagline is bound to annoy.
You may want to ask whoever copywrote messagepack's website, they kind-of are the one declaring:
> It's like JSON. but fast and small.
Hopefully it will be soon, with degradation for old browsers. I'd like to be able to load an avatar as binary png data with the rest of the data.
Biggest issue for me with JSON is needing to base64 any binary data, so msgpack could actually be quite useful.
The next thing would be to have practices and/or libraries that switch techniques based on browser support.
Bits on some kind of modern electronic storage device require a program (software) to interpret them.
When people say something is not "human readable" they mean not human readable by their favorite text editor, etc. Nothing would prevent a different program from making the binary blob "human readable".
So I claim that, unless we use e.g. Williams-Kilburn tube memory, which stored bits on the screen of a CRT http://www.computerhistory.org/revolution/memory-storage/8/3... and which were, in fact, "human readable", it does us all a disservice to say that just because your program does not interpret the binary blob the way you like, the blob itself is somehow deficient!
That said, personally, I find tagged-length binary protocols very easy to read with nothing more than a hex editor, and often feel that their simplicity of implementation (no state machine required for their parser) puts them much farther into the camp of "human usable" than most file formats people like to claim are "human readable".
And not the best implementation out there, since it lacks the redundant key/value elimination that more compact serialization formats use. Consider an array of maps where each of the maps has the same keys. By eliminating the redundant keys (have each subsequent occurrence simply reference the first occurrence) you can halve the encoded data size relative to implementations without redundant-key elimination.
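The idea can be sketched in a few lines. This is only an illustration of the technique, with stdlib json standing in for any binary format, not a real serializer's wire layout:

```python
import json

# An array of maps that all share the same keys.
records = [{"name": f"u{i}", "email": f"u{i}@example.com", "active": True}
           for i in range(100)]

plain = json.dumps(records)

# Redundant-key elimination: emit the shared key list once, then rows of
# values that reference it implicitly by position.
keys = list(records[0])
deduped = json.dumps({"keys": keys,
                      "rows": [[r[k] for k in keys] for r in records]})

print(len(plain), len(deduped))  # the deduped form is markedly smaller
```

Reconstructing the original is just `dict(zip(keys, row))` per row, so the transform is lossless as long as every map really does carry the same keys.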
It also lacks a canonical string encoding. All strings are saved as raw binary data making it terrible for data interchange between environments with different string encodings.
C/C++/Objective-C users have many better options than this. In Mac/iOS/Cocoa/CoreFoundation, you can use Apple's Binary PList which is open sourced here:
Obviously, further gzipping the result will always make it more compact, but gzip on the uniqued result is likely to be more compact than gzip on the non-uniqued result.
I use these libraries in production at various places (the cloud9 IDE backend used them to great effect)
While the native JSON.parse and JSON.stringify are slightly faster than mine with string-heavy payloads (and the msgpack is only marginally smaller in that case anyway), when the data is array- and number-heavy, my codec is much faster than JSON and the data on the wire is a lot smaller.
Also don't underestimate the value of having a binary data type. In my libraries, I extended the format slightly to also encode undefined (as well as null). I also have a string type and a buffer type. In node.js the buffer type is an instance of a node Buffer. In the browser, the buffer is an ArrayBuffer instance (typed array type). So the practical effect is that if you put a JS string in, it's encoded as UTF-8 on the wire and comes out the other end as a JS string. If you put a binary buffer in, it comes out the other end as a buffer.
I've contacted the authors of some of the other codecs and when they extend the format, we agree to extend in compatible ways.
My biggest production use of msgpack was as the transport format of my [smith] rpc system. In smith, an rpc call is done as an array. The first value is the function to call, and the rest are the args. If you're calling an anonymous function, then the identifier is a number. In this usage, the payload tends to be array and number heavy and thus very fast and efficient.
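The call framing described above is simple to sketch. Here json stands in for msgpack so the example stays dependency-free, and the function names and encoding helpers are made up for illustration:

```python
import json

def encode_call(fn, *args):
    # First element identifies the function (a string for named functions,
    # a number for anonymous ones); the rest of the array is the arguments.
    return json.dumps([fn, *args]).encode("utf-8")

def decode_call(payload):
    fn, *args = json.loads(payload)
    return fn, args

named = encode_call("add", 2, 3)   # call a named function
anon = encode_call(7, "x")         # call anonymous function #7

assert decode_call(named) == ("add", [2, 3])
assert decode_call(anon) == (7, ["x"])
```

With msgpack in place of json, a payload like `[7, "x"]` is exactly the array-and-number-heavy case the comment says the codec handles fastest.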
(Edited to add in missing links)
However, in the number- and array-heavy case, the msgpack is 2.5x smaller when serialized. So even if it's a bit slower in browsers, the bandwidth savings and the ability to store binary data may still be worth it. (Remember that performance in the browser scales very differently, since it's distributed across all your clients' browsers.)
This page is a good example of how marketing copy can undermine your product.
MessagePack is a data serialization format; it has about as much to do with JSON as honey has to do with milk. But they went and started a flame war with no more than 7 tiny words: "It's like JSON. but fast and small."
Bam, ruined their own message. Now instead of reading that page thinking "What can message pack do for me?" you read the page thinking "Better than JSON huh? I'll be the judge of that."
Marketers could do us all an enormous favor, themselves included, if they stuck to promoting the positive aspects of their products instead of the negatives of the competition. It puts the consumer in the wrong mindset. Why would you want someone wandering around what is essentially your store looking for negative aspects of your product? Seriously, just stop being negative.
Maybe you need a catchy marketing line like "It's like msgpack, but smaller and faster" :)
I have to disagree on that news title, for the "It's like JSON" part.
As pointed out in a link Someone posted in another comment here:
> A major use case of MessagePack is to store serialized objects in memcached.
This also seems very relevant (in article linked from here):
> typical short strings only require an extra byte in addition to the strings themselves.
So, stored strings are actually bigger (even if only by one byte) than plain-text JSON.
Of course, I'm biased because I'm a web developer, but I see JSON mainly as a message exchange format, meant to communicate across languages and, most of the time, between server side and client side. Typically, I'll use server-side helpers to format strings like currency values, i18n messages, etc., which puts a lot of strings in any JSON document transmitted.
The original author made it clear it was not intended for server/client communication: it's a very specific format intended to offer better performance in very specific use cases (I would rather know how it compares to BSON than to JSON).
JSON, on the other hand, is a generalist exchange format. Messagepack is definitely not like JSON.
> So, stored strings are actually bigger (even if only by one byte) than plain-text JSON.
Strings in JSON require two bytes (the quotes) in addition to the string itself.
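A quick stdlib check of that two-byte overhead:

```python
import json

s = "hello"
encoded = json.dumps(s)
# The encoded form is the string plus its two surrounding quotes.
assert encoded == '"hello"'
assert len(encoded) == len(s) + 2
```

Strings containing quotes, backslashes, or control characters cost more still, since each such character gets escaped.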
I think binary formats are great for online games where performance is critical, however for most applications human-readable strings are a lot easier to manage. In any case, both formats have different use cases.
JSON has: numbers, strings, booleans, null, arrays, objects.
Msgpack has: numbers, raws, booleans, nil, arrays, maps.
So I guess msgpack is a superset of JSON. The raws can contain utf8 encoded strings like JSON mandates, or they can contain other things. There is no technical reason that the keys of the maps have to be strings. You could take a lua table that has another table as key and encode that in msgpack just fine.
In practice, I wanted more out of msgpack, so I extended the format using some of the reserved byte ranges to add in an "undefined" type and a distinction between utf8 encoded string and raw binary buffer.
For me this new format has been extremely useful as a general data serialization between processes (node to node, server to browser, etc..) I usually use it over binary websockets or raw tcp sockets.
msgpack (and bson too) always seemed a bit odd to me though. If you need binary packing, why not use protobufs or thrift?
It uses some random JSON data I found. msgpack is more compact (87% of the JSON representation), though not dramatically so.
json encode: 5.54msec
simplejson encode: 8.27msec
msgpack encode: 11.4msec
json decode: 16.4msec
simplejson decode: 4.06msec
msgpack decode: 2.84msec
I'm confused about why json is faster at encoding and simplejson is faster at decoding. simplejson is fastest when you combine encoding and decoding.
And you can probably end up with a draw if you gzip both
The most surprising result was that JSON was 2x (decoding) to 3x (encoding) faster than Storable. The downside is that it sets the utf-8 flag ...
My own measurements: http://stackoverflow.com/questions/9884080/fastest-packing-o...
ASN.1 is slightly more complex in this regard: BER can be mostly understood without knowledge of its intended structure, but only mostly (actually, MessagePack seems to me like BER done right(-er)), while PER requires a schema in any case. But most significantly, ASN.1 is mostly about how to encode the schema itself, which is completely outside the scope of MessagePack.
Besides not being able to distinguish between a true binary blob and a string, the two parties need to agree on what string encoding should be used. Is it UTF-8, UCS-2, EBCDIC? I suspect this would create tons of incompatibilities between implementations as various parties make their own naive assumptions about what strings are encoded with.
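That ambiguity is easy to reproduce: the same raw bytes read back differently depending on which encoding the receiving side assumes.

```python
raw = "café".encode("utf-8")  # b'caf\xc3\xa9' on the wire

print(raw.decode("utf-8"))    # café — the sender's intent
print(raw.decode("latin-1"))  # cafÃ© — a receiver guessing wrong
try:
    "日本語".encode("utf-8").decode("ascii")
except UnicodeDecodeError:
    print("and a strict decoder simply rejects the bytes")
```

A format-level rule (e.g. "all strings are UTF-8") removes the guesswork entirely, which is why a dedicated string type matters.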
That seems like a pretty major flaw to me. X.690 is a bit overly complicated (there are 10+ string types...) but there is such a thing as too simple.
An important point we should consider here is that actual performance depends on implementation.
Nowadays it seems that those who do not learn from history are doomed to write yet another serialization scheme.
I needed to transfer data from a Python script to a Ruby script. Ended up looking like this:
# my_script.py (Python): pack and write the bytes to stdout
msg = msgpack.packb(some_data)
sys.stdout.buffer.write(msg)
# Ruby: capture those bytes and unpack
msg = `python my_script.py`
some_data = MessagePack.unpack(msg)