MsgPack vs. JSON: Cut your client-server exchange traffic by 50% (indiegamr.com)
78 points by muellerwolfram on June 10, 2012 | 51 comments



JSON's appeal is that it is both compact and human readable/editable. If you're willing to sacrifice all semblance of readability/editability, then sure, let's all do the binary marshaling format dance like it's 1983.

Additionally, if you're sending data to a browser, then you're cutting the knees out of native JSON.parse implementations (Internet Explorer 8+, Firefox 3.1+, Safari 4+, Chrome 3+, and Opera 10.5+). The copy claims "half the parsing time" (just because of smaller data size), but I'm exceptionally skeptical of those claims since this is just going to move the parser back into Javascript.

I whipped up a quick example to demonstrate my point: https://gist.github.com/2905882

My results (using Chrome's console):

    JSON size: 386kb
    MsgPack size: 332kb
    Difference: -14%

    Encoding with JSON.stringify 4 ms
    Encoding with msgpack.pack 73 ms
    Difference: 18.25x slower

    Decoding with JSON.parse 4 ms
    Decoding with msgpack.unpack 13 ms 
    Difference: 3.25x slower
So MsgPack wins the pre-gzip size comparison by 14%, but then is nearly twenty times slower at encoding the data, and is over three times slower at decoding it.

Furthermore, once you add in gzip:

    out.json.gz: 4291 bytes
    out.mpak.gz: 4073 bytes
So, our grand savings is 218 bytes in exchange for 78ms slower parsing time. It'd still be faster to use JSON's native encoding/decoding facilities even if your client were on a 28.8k baud modem.
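For anyone who wants to check the arithmetic, here is a rough sketch that recomputes the ratios from the figures quoted above. The modem comparison rests on an assumption of mine: a 28.8k modem moving exactly 28,800 bits per second with no protocol overhead or line compression. The 78 ms is the extra encode time plus the extra decode time.

```python
# Figures copied from the benchmark output quoted above.
json_kb, msgpack_kb = 386, 332
encode_ms = {"json": 4, "msgpack": 73}
decode_ms = {"json": 4, "msgpack": 13}
gz_bytes = {"json": 4291, "msgpack": 4073}

size_change = (msgpack_kb - json_kb) / json_kb             # about -0.14
encode_ratio = encode_ms["msgpack"] / encode_ms["json"]    # 18.25
decode_ratio = decode_ms["msgpack"] / decode_ms["json"]    # 3.25

saved_bytes = gz_bytes["json"] - gz_bytes["msgpack"]       # 218
extra_cpu_ms = (encode_ms["msgpack"] - encode_ms["json"]) + (
    decode_ms["msgpack"] - decode_ms["json"]
)                                                          # 78

# Assumption: an idealised 28.8 kbit/s link with no overhead.
MODEM_BITS_PER_SECOND = 28_800
transfer_saved_ms = saved_bytes * 8 / MODEM_BITS_PER_SECOND * 1000

print(f"size change:        {size_change:.0%}")
print(f"encode / decode:    {encode_ratio}x / {decode_ratio}x slower")
print(f"bytes saved (gzip): {saved_bytes}")
print(f"extra CPU time:     {extra_cpu_ms} ms")
print(f"transfer time saved on the modem: {transfer_saved_ms:.1f} ms")
```

Under that assumption the 218 bytes cost about 61 ms to transfer, which is still less than the 78 ms of extra encode/decode time.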


> I'm exceptionally skeptical of those claims

I can attest that these guys are known to post bogus benchmark numbers. The claim on their home page ("4x faster than Protocol Buffers in this test!") is incredibly misleading; their benchmark is set up in such a way that Protocol Buffers are copying the strings but Message Pack is just referencing them. It's true that the open-source release of Protocol Buffers doesn't have an easy way of referencing vs. copying, but it is still highly misleading not to mention that you're basically benchmarking memcpy().

One of these days I'll re-run their benchmark with my protobuf parser which can likewise just reference strings. I am pretty sure I will trounce their numbers.


I pointed out the Protocol Buffers comparison to a friend on IRC. He came back with:

"They rolled their own "high-level" serializer and deserializer for PB, built on top of the lower-level stuff the documentation advises you not to use. Using the recommended interfaces, PB is faster than in their test. It is still slower than msgpack. Not sure why they'd make their test look cooked when they win in a fair test anyway. Further examination shows that the test is _mainly_ a test of std::string, for protobuf. std::string is required to use contiguous storage, whilst msgpack's test uses a msgpack-specific rope type."


Wow, awesome seeing the benchmarks to back up the text of your reply. Thanks for doing that.


I started reading their page and saw stuff like "take tiny, tiny data sample to have the most difference" and I started doubting. Then I saw the "faster" processing claim and, given how straightforward it is to parse JSON in C, I doubted even more.

So I clicked comments, expecting to see this, so that I wouldn't have to do it myself. Ah, makes me happy, thanks!


These results mirror the experiences I've had with large JSON payloads of time series data and experimenting with different wire formats.

I obtained the best message sizes with custom compressors that use complex predictive models and bit-level encoding.

However, with JavaScript on the critical path (say, browser side), even a simple bit/byte-bashing format like MessagePack was a performance loss, and in my case the payloads were bigger.

YMMV, but if you really need aggressive compression or performance, none of the standard solutions (protobuf et al.) come close to hand-rolled solutions.

If you have JavaScript on your path, I've had the most success by using JSON-to-JSON transformations utilising a predictive model (think say rows to columns and delta encoding) with gz over the top. This has got my payloads down in size without blowing out time - but it does spray the GC with lots of short lived object traffic.
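A minimal sketch of the rows-to-columns plus delta-encoding idea, in Python rather than JavaScript so it can be run anywhere. Everything here is my own illustration, not the code described above: the function names, the `t`/`v` field names, and the made-up sample data are all assumptions. It expects every row to have the same keys and the delta-coded columns to hold integers.

```python
import gzip
import json
from itertools import accumulate


def delta_encode(values):
    """Keep the first value, then store each difference from its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    return list(accumulate(deltas))


def pack(rows, delta_keys=("t",)):
    """Rows of same-shaped dicts -> one dict of columns, delta-coding some of them."""
    if not rows:
        return {}
    columns = {key: [row[key] for row in rows] for key in rows[0]}
    for key in delta_keys:
        if key in columns:
            columns[key] = delta_encode(columns[key])
    return columns


def unpack(columns, delta_keys=("t",)):
    """Invert pack(): undo the deltas, then zip the columns back into rows."""
    restored = dict(columns)
    for key in delta_keys:
        if key in restored:
            restored[key] = delta_decode(restored[key])
    keys = list(restored)
    return [dict(zip(keys, values)) for values in zip(*restored.values())]


def wire_size(obj):
    """Byte counts as a tuple: (compact JSON, the same JSON gzipped)."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    return len(raw), len(gzip.compress(raw))


# Made-up time series: one sample per minute with a small, varying value.
rows = [{"t": 1339286400 + 60 * i, "v": 1000 + (i * 7) % 13} for i in range(1000)]
packed = pack(rows)
print("rows    (json, gzip):", wire_size(rows))
print("columns (json, gzip):", wire_size(packed))
```

The output is still plain JSON, so the browser can decode it with its native parser and only the cheap `unpack` step runs in script. How much this helps after gzip depends heavily on the data, so measure with your own payloads.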


Here's a conveniently executable fiddle if people want to exec cheald's benchmark themselves:

http://jsfiddle.net/javajosh/SjPub/


For the native JavaScript implementation you are correct: there's no doubt that JSON is faster there. On the server side, however, it looks the other way around: for example, parsing MsgPack in Ruby is about 5x faster than parsing JSON. You can port your benchmark to Ruby and try it for yourself.


But you can install a c extension implementation for ruby (or pretty much any other language that doesn't already have native JSON built in) on your server fixing that performance gap. That's exactly what you can't do on the browser JS side for MsgPack.


Then you could get a C extension for MsgPack, which would have to parse fewer bytes than the JSON C extension and would therefore be faster.

JSON is very good, no doubt about that - but i think that even JSON can be optimized, does not have to be by MsgPack, but probably by a binary format.


My point was that you can't install a c extension in a browser. I assumed that all the MsgPack libraries would be C or equivalent performance-wise.

As to text vs binary, I'll just leave this link here: http://catb.org/~esr/writings/taoup/html/textualitychapter.h...


i understood your point, however the speed in the user's browser is not important since it's just 1 operation, whereas the server has to do most of the operations decoding/encoding all that

i guess there are pros and cons on both sides, certainly nice would be something like a native BSON implementation with extensions in the browser/debugger to make it readable


Your blog post suggests using MsgPack as a drop in replacement for JSON for client-server communications. In that context every encode/decode operation on the server would be matched by one in the client browser. The <5% gain in performance on the server (and I think that is generous) cannot possibly make up for the performance hit of a javascript implementation of MsgPack.

When MsgPack was written a lot of JSON in the browser was still not native so an argument could be made if you desperately needed that bandwidth and it worked better than average for your specific workload. That is simply not true anymore.


i'm using it in Flash, so there is absolutely no JavaScript involved in my case...


>i understood your point, however the speed in the user's browser is not important since it's just 1 operation, whereas the server has to do most of the operations decoding/encoding all that

Speed in the user's browser is VERY important.


it is, but can you tell the difference between 0.5ms or 1ms? ;)


When you're doing it 100 times? Sure.


Having actually used MsgPack, it's nice that it's compact, but bad that it doesn't handle UTF-8 properly. The reason the data is smaller is that they're basically using fewer bits depending on the value, e.g. if the numerical value is "5" then you can use 3 bits to represent the value whereas JSON will always use floats to represent integer values. If you know exactly what the data in your JSON will be, MsgPack is nice; otherwise it can be a pain in the butt if you're sending arbitrary data from users.

Here's a nice review of different "competitive" formats to JSON: http://qconsf.com/dl/qcon-sanfran-2011/slides/SastryMalladi_...

According to their review, MsgPack is good for small packets of data and bad for big ones. Binary JSON formats like MsgPack are only good if you know your exact usage pattern; otherwise they bring along too many restrictions to be competitive. The best transport mechanism is still JSON.


"whereas JSON will always use floats to represent integer values."

It might use floats to represent integers in memory, but when json is transferred over the wire it's a textual encoding so each digit will consume 8 bits.


That's true, but I guess one should then mention a 5 digit number would consume less space in memory than in textual format. MsgPack recognizes this and stores that 5 digit number using the minimum number of bits required, but this wouldn't be possible in the usual json string formats.
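To put numbers on the digits-versus-bytes point, here is a small sketch that hand-encodes non-negative integers using the unsigned integer formats from the MessagePack spec (positive fixint, then `0xcc`/`0xcd`/`0xce`/`0xcf` followed by 1, 2, 4 or 8 big-endian bytes). `pack_uint` is my own helper for illustration, not a function from any MsgPack library, and it ignores negative numbers entirely. Note that MessagePack works in whole bytes, not individual bits.

```python
import json
import struct


def pack_uint(n):
    """Encode a non-negative int in the smallest MessagePack unsigned format."""
    if n < 0:
        raise ValueError("this sketch only covers non-negative integers")
    if n <= 0x7F:
        return struct.pack("B", n)              # positive fixint: value is the byte
    if n <= 0xFF:
        return struct.pack(">BB", 0xCC, n)      # uint 8
    if n <= 0xFFFF:
        return struct.pack(">BH", 0xCD, n)      # uint 16
    if n <= 0xFFFFFFFF:
        return struct.pack(">BI", 0xCE, n)      # uint 32
    if n <= 0xFFFFFFFFFFFFFFFF:
        return struct.pack(">BQ", 0xCF, n)      # uint 64
    raise ValueError("value does not fit in 64 bits")


for n in (5, 200, 12345, 99999, 1339286400):
    text_bytes = len(json.dumps(n))  # one byte per decimal digit
    print(f"{n:>12}: JSON {text_bytes} bytes, MessagePack {len(pack_uint(n))} bytes")
```

The saving depends on the value: 12345 takes 3 bytes instead of 5, but 99999 needs the 32-bit format and comes out at 5 bytes, the same as its JSON text.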


What problems does it have with UTF-8?


It uses invalid UTF-8 chars all over the place, I guess it assumes text to be Latin-1.

‚¤¨£ - all those chars would be two bytes in utf-8, and even worse, you'll have to escape high utf-8 characters thus making your MsgPack message bigger than JSON if you're unlucky enough.


According to https://github.com/msgpack/msgpack/issues/26 it doesn't handle strings at all:

> This is by design. MessagePack doesn't have string type but only bytes type. So decoding bytes to string is application layer business.

So it looks like all of the claims of "just change one line of code" (and also most of these benchmarks) aren't quite true: unless you can guarantee your input text is ASCII, technically you have to first traverse your objects and replace all strings with decoded byte arrays. I would think that would be a dealbreaker for any user input.
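A rough sketch of what that traversal might look like on the receiving side, assuming the unpacker hands back `bytes` for every string and that every byte string in the message really is UTF-8 text. `decode_strings` is a hypothetical helper of mine, not part of any MsgPack library. If a message also carries genuine binary blobs, this blanket approach would mangle them or raise, which is exactly the ambiguity being complained about.

```python
def decode_strings(value, encoding="utf-8"):
    """Recursively replace bytes with str inside lists and dicts.

    Assumption: every bytes object is text in `encoding`.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, list):
        return [decode_strings(item, encoding) for item in value]
    if isinstance(value, dict):
        return {
            decode_strings(key, encoding): decode_strings(item, encoding)
            for key, item in value.items()
        }
    return value  # numbers, booleans, None pass through untouched


# Shape of what a bytes-only unpacker might return for {"name": "ąę", "tags": ["a", 1]}.
unpacked = {b"name": b"\xc4\x85\xc4\x99", b"tags": [b"a", 1]}
print(decode_strings(unpacked))
```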


Saying that cutting a 30-byte message by 50% cuts client-server exchange traffic by 50% is misleading (and wrong).

The average HTTP header is around 700 bytes, plus 40 bytes of TCP/IP header. Since the gains diminish with more data (MsgPack seems to focus on the structure of the message), I imagine the real gain will be 1-2% of traffic.

The extra complexity is simply not worth it.
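As a back-of-the-envelope check, using the rough figures above (about 700 bytes of HTTP header, 40 bytes of TCP/IP header, a 30-byte body cut in half) and counting only one direction of a single request:

```python
HTTP_HEADER_BYTES = 700    # rough average quoted above
TCP_IP_HEADER_BYTES = 40   # quoted above


def traffic_saving(body_bytes, body_reduction,
                   overhead=HTTP_HEADER_BYTES + TCP_IP_HEADER_BYTES):
    """Fraction of total bytes saved when only the body shrinks."""
    before = overhead + body_bytes
    after = overhead + body_bytes * (1 - body_reduction)
    return (before - after) / before


saving = traffic_saving(body_bytes=30, body_reduction=0.5)
print(f"30-byte body halved: {saving:.1%} less traffic")
```

That works out to 15 bytes saved out of 770, just under 2%. The function assumes the same 50% reduction at every body size, which the article's own numbers suggest does not hold for larger payloads.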


Also, if you're using it in the browser, JSON is already there while you need to load a library in order to use msgpack.


The recent trend of new projects trying to create a new JSON binary/more compressed protocol is troublesome ... and I strongly feel like these projects overlook gzip and "what's there" in favor of "look at my new shiny library". I had the same grievance w/RJSON previously (http://news.ycombinator.org/item?id=4068555).

These projects are failing to realize the broader picture: gzip and JSON parsing are builtin and the subject of many optimizations by browser vendors. The network/request/response cost of putting an RJSON/MsgPack like library into the mix negates any payload savings I might have had by creating a dependency to the library.


We have now come full circle in terms of how data is marshaled between client and server.

I've now seen it go from XDR (the marshaling used by ONC-RPC) to XML to JSON and now to a binary form of JSON, which is basically like XDR.


I'm convinced this is a very elaborate troll.


Msgpack is not like xdr. It's binary but more compact.


This sort of comparison is fairly meaningless in many real-world situations, as others have already pointed out. You'll find that with most payloads, compressed JSON will actually be significantly smaller than non-compressed MsgPack (or Thrift / Protocol Buffer / Other Binary Serialisation Format).

And compressed MsgPack (or Thrift / Protocol Buffer / Other Binary Serialisation Format) will often be roughly the same size as compressed JSON.

Of course, there's also the performance benefit of faster serialisation / deserialization, but again, it won't make much of a difference in many real-world situations.

That said, our API supports both JSON and Google Protocol Buffer (using standard content negotiation via HTTP headers) and we actually use it with Google Protocol Buffer from our mobile app. It's not so much for the message size benefit (which is non-existent in practice) but more for development convenience (and the potentially slightly better performance is an added benefit we get for free).

We've got one .proto file containing the definition of all the messages our API returns / accepts. Whenever we make a change to it and click Save, the Visual Studio Protobuf plugin we use kicks in and automatically generates / updates the strongly-typed C# classes for all the API messages.

On the Xcode side, a simple Cmd-B causes a 2-liner build script we wrote to copy across the latest .proto file for the API and automatically generate strongly-typed Objective-C classes for all our API messages in a couple of seconds. Beautifully simple.

It then lets us code against strongly typed classes and take advantage of code completion and compile-time type checks instead of working against arrays or dictionaries. It also means that there's no magic or automatic type guessing going on that inevitably ends up breaking in all sorts of weird ways - our API contract is well-defined and strongly-typed.

And while the API design is still in flux, working against strongly-typed classes means that a simple compile will tell us whenever a change we made to an API message causes existing code to break.

Last but not least, if we ever need to check what the API returns in a human-readable form, all we have to do is call the API with Accept: application/json and we can troubleshoot things from there.
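For readers who haven't set this up before, a hedged sketch of how a server might pick the response format from the Accept header. This is not the API described above: the `negotiate` function, the `application/x-protobuf` media type (there is no single standard one for Protocol Buffers) and the fall-back-to-JSON behaviour are all assumptions for illustration. It handles q-values and `*/*` but not wildcards like `application/*`.

```python
# Assumed media types; the Protocol Buffers one in particular is a guess.
SUPPORTED = ("application/x-protobuf", "application/json")


def negotiate(accept_header, supported=SUPPORTED, default="application/json"):
    """Return the best supported media type for an Accept header value."""
    candidates = []
    for part in accept_header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        quality = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q-value as "not acceptable"
        candidates.append((quality, media_type))

    # Highest quality first; sorted() is stable, so header order breaks ties.
    for quality, media_type in sorted(candidates, key=lambda c: -c[0]):
        if quality <= 0:
            continue
        if media_type in supported:
            return media_type
        if media_type == "*/*":
            return default
    return default  # a stricter server might answer 406 Not Acceptable instead


print(negotiate("application/json"))
print(negotiate("application/x-protobuf, application/json;q=0.5"))
```

The mobile client would send the protobuf type, and a developer debugging by hand would send `application/json` to get the same message in readable form.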

It all makes working with the API from both sides really quick and easy, and makes changes to the messages trivial. It's certainly possible to have a nice workflow with JSON as well, but I have to say that I quite like what we ended up with using Protocol Buffers.


This exactly mirrors our use of (and experience with) protobuf. I think that people over-value human readability when using non-checked encoders that require debugging the serialized data.


"for development convenience"

I wonder what the development convenience is when using protobuf instead of JSON. Could you elaborate?


That's what my whole comment was all about. Automatic generation of strongly-typed classes. Code completion. Compile-time type checking. Well-defined and strongly-typed API contract. Anything you want me to clarify?


Word of warning, the node.js implementation has many bugs and pull requests are ignored. Find someone's fixed fork to point to.


I looked at it lately working in python, but was a bit disappointed that they didn't include more types (not necessary, but some sugar for sets, lists, uuids would be nice - they have lots of codes to spare at the moment) or distinction between utf8 and bytes. Lack of that leads to:

    >>> msgpack.loads(msgpack.dumps(u"ąę"))
    '\xc4\x85\xc4\x99'


Yep, the moment it couldn't encode a datetime I knew it wasn't going to work for me.


What's the difference between this and BSON, another binary based JSON implementation?


The main differences are:

1. MessagePack includes fewer data types. Several of the BSON-only data types are specific to MongoDB, but others - such as separate types for binary data and UTF-8 text - are IMO quite useful.

2. MessagePack includes several optimizations in the "type byte" to efficiently represent small values.

A while back I designed my own serialization format that combined features from both, but I never finished an implementation.


do you have a specification or even some prototype on github? would be interesting to see


I dug the spec I wrote off my backup hard disk and posted it here: https://gist.github.com/2907123

I also planned to define a standard set of tags for the "tagged value" type, but never quite got around to that.


BSON is really only used by MongoDB.


CSV/JSON/YAML/XML/similar + zip/gzip = best of both worlds in many cases. But don't add the compression step until you're sure you really need it. It may not matter. If/when it does matter you may have a kind of Maserati problem on your hands anyway.

I'm really hoping all the "Every Byte Is Sacred" kinds of wire formats will die off. Moore's Law-like effects are making computing resources beefier and more abundant over time. No similar effect for our human cognitive capacity or work environments. Meatware is more expensive than hardware, in the general case.


>Additionally, the whole thing look less readable when you look at it – so as a small bonus this will probably repell some script-kiddies, who are trying to intercept your JSON- or XML-calls.

This of course is not a problem because we're all using SSL here.


You'll get far better gains by removing HTTP headers and other fun things.


Surely this depends on how long your actual content is. If your JSON values are short, I can see the advantages of this. If you've got a lot of data, then I can't see this reducing traffic by 50%.


has anyone tried using msgpack with backbone, or any other js framework?


you won't be able to use msgpack with backbone, unless you're using backbone on the server side and you have something to decode the binary json blobs. msgpack is not a json replacement on the client side.


Using msgpack on the server with Rails, for example, shouldn't be a problem. With this gem, https://github.com/nzifnab/msgpack-rails , it seems like it's not much different from letting Rails return XML, JSON or HTML.

I just thought that it might be possible to transform the received msgpack-data into json, so backbone can use it.


That's what I mean. MsgPack is strictly a wire format for server side applications. You won't gain much using that as the wire format for browsers, especially if you use compression which would probably negate the benefits of MsgPack anyway. You'd also need something to read/write byte arrays inside the browser; I think firefox has some non-standard mechanisms for doing this, but I doubt there's cross browser compatibility here.


Override Backbone.Sync, and you should be sweet.


This has actually been covered before last year:

http://news.ycombinator.com/item?id=2571729

One interesting comment:

> Yup. We tested messagepack a while back in our environment. Speed difference was 2% versus gzipped JSON. Not enough to invest time in changing a working system and losing human readability.

Also looks like MsgPack has very poor serialize/deserialize speeds on the JS end:

https://gist.github.com/1101623



