
You're Using JSON, Why not MessagePack? - andrewvc
http://blog.andrewvc.com/why-arent-you-using-messagepack
======
randrews
For the same reasons we are in the process of switching from Protocol Buffers
back to JSON:

* It does not support our languages. Is there a Ruby module with no C extension? A C# library? Lua? What about that really cool language coming out next week? What about C?

* Even if it does, why have to deal with someone else's poor API design? JSON has a million parsers for everything, and if by chance you don't like any of them, you can write another one in about two hours.

* It's not human-readable. Enough said.

* It's smaller than JSON, sure, but is it smaller than gzipped JSON? For our purposes, PBs aren't; neither are TNetstrings. This might be, I haven't checked, but I doubt it.

~~~
haberman
> Is there a Ruby module with no C extension?

Why is the "with no C extension" part important to you?

~~~
arockwell
This is important if you want to use JRuby.

~~~
nupark2
There's a protobuf java library.

~~~
andrewvc
And a jruby messagepack library

------
haberman
The "4x faster than Protocol Buffers" claim is misleading, as I have explained
before: <http://news.ycombinator.com/item?id=2146147>

I'm working on a Protocol Buffer library that can serialize/deserialize to
_either_ JSON _or_ Protocol Buffers. That way you can do all your development
with JSON, but if you ever find you need the efficiency improvements of a
binary format, you can just change your Serialize() call.

Having a .proto file gives you the benefits of something like JSON Schema: a
place to document all your fields and what they mean, and a few very simple
validation constraints like the expected types.
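The "change one call" idea can be sketched without the actual protobuf library. This is just an illustration of the shape of the API haberman describes, using stdlib json and pickle as stand-ins for a text and a binary wire format (the names `serialize`/`deserialize` are made up):

```python
import json
import pickle

# Hypothetical sketch: one serialize() call whose output format can be
# flipped without touching the rest of the code. Here json/pickle stand
# in for a library that emits JSON or Protocol Buffers from the same
# message definition.

def serialize(msg, binary=False):
    return pickle.dumps(msg) if binary else json.dumps(msg).encode("utf-8")

def deserialize(data, binary=False):
    return pickle.loads(data) if binary else json.loads(data.decode("utf-8"))

msg = {"id": 42, "name": "example"}
assert deserialize(serialize(msg)) == msg                      # develop with JSON
assert deserialize(serialize(msg, binary=True), True) == msg   # flip to binary later
```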

~~~
sorbits
You already said it, but just wanted to back you up: The MessagePack benchmark
is completely useless!

It consists of serializing 3 integers and a string 200,000 times.

MessagePack defines 27 different types¹ (excluding reserved ones) with
variable bit length for the type identifier, length somewhat correlated with
frequency of use.

A benchmark should therefore test real-life data, and a lot of it.

Their inability to produce such a benchmark makes me question the sanity of
splitting up e.g. the type marker for “array” into 3 different types depending
on the size of the array. This adds complexity, so it would be good to know
exactly what the authors based this design choice on; hopefully not that it
made it faster to serialize a 3-element array 200,000 times.

¹ <http://wiki.msgpack.org/display/MSGPACK/Format+specification#Formatspecification-TypeChart>
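For reference, the size-dependent array markers sorbits is questioning look like this per the linked spec (fixarray for up to 15 elements, "array 16" with a 16-bit length, "array 32" with a 32-bit length). A sketch of just the header encoding:

```python
import struct

def encode_array_header(n):
    # MessagePack splits the "array" marker into three encodings by length:
    #   fixarray  (0x90 | n)         for n <= 15     -- 1 byte
    #   array 16  (0xdc + u16 len)   for n <= 65535  -- 3 bytes
    #   array 32  (0xdd + u32 len)   otherwise       -- 5 bytes
    if n <= 15:
        return bytes([0x90 | n])
    elif n <= 0xFFFF:
        return b"\xdc" + struct.pack(">H", n)
    else:
        return b"\xdd" + struct.pack(">I", n)

assert encode_array_header(3) == b"\x93"   # the benchmark's 3-element array: 1 byte
assert len(encode_array_header(1000)) == 3
```

The complexity cost sorbits is weighing is exactly this branch, paid on every array, in exchange for one byte of header on small arrays.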

------
simonw
Because it isn't human readable, which will make debugging it far less
convenient than glancing at some JSON in a browser (or in Firebug).

~~~
terrcin
This. Never liked nor used XML for that exact reason.

~~~
nostromo
Rather than down vote you for a simple mistake, I thought I'd let you know
that xml is just as human readable as json.

~~~
Confusion
Most people find formats like JSON and YAML much easier to read than XML.

~~~
jwdunne
It doesn't really make XML any less human-readable. Hell, if you read and
write XHTML a lot, that says plenty about its readability.

For large docs, I'd prefer XML too. I mean, I'd rather hunt down a missing
closing tag than a missing closing bracket when the thing is pages upon pages
long.

------
cullenking
Zip both resulting strings and I bet you it's a wash. Since JSON is mostly
used (at least by me) for sending data to client side javascript, it doesn't
make a difference. JSON is just easier.
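The "zip both and compare" test is easy to run yourself. A rough sketch with stdlib gzip and made-up repetitive data (real payloads will compress differently):

```python
import gzip
import json

# Repetitive records, the common case for API responses.
records = [{"id": i, "name": "user%d" % i, "active": True} for i in range(500)]

raw = json.dumps(records).encode("utf-8")
zipped = gzip.compress(raw)

print("raw:", len(raw), "gzipped:", len(zipped))
assert len(zipped) < len(raw)  # repeated keys compress away almost entirely
```

Gzip deduplicates the repeated key names, which is most of JSON's size overhead versus a binary format in the first place.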

~~~
freebird_uk
Yup. We tested messagepack a while back in our environment. Speed difference
was 2% versus gzipped JSON. Not enough to invest time in changing a working
system and losing human readability.

------
samdalton
Changing data formats all the time is usually not worth any of the effort
involved. XML to JSON seemed to take long enough, and now that we've got a
lightweight, internationally accepted, easy to read by humans and computers
alike, data interchange format, I think we'd like to keep it for a while.

------
makmanalp
<http://bsonspec.org/> is another such alternative.

~~~
jsherer
Yes. The first thing I said when I saw this post was "How is this different
from BSON?" Given that BSON is the basis of MongoDB's storage engine, I'm
pretty confident in it and probably won't be moving over to MessagePack
anytime soon.

~~~
latch
bson messages are often larger than their json counterparts. The benefit of
bson isn't size, but rather ease (on the cpu) of serializing and
deserializing.

~~~
jsherer
I agree in part. Most of the time JSON and BSON are very similar in size, and
sometimes larger (due to length prefixes). But, size benefits do emerge when
you need to store binary data inside the object. Instead of a base64 encoded
string, the binary data can be stored directly.
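The base64 overhead jsherer mentions is a fixed ratio: 4 output bytes per 3 input bytes, roughly a third more. A quick stdlib check:

```python
import base64
import os

# Base64 encodes every 3 bytes of binary as 4 ASCII characters, so
# embedding a blob in JSON costs ~33% extra before any framing overhead.
# A binary format can carry the bytes directly with a small length prefix.
blob = os.urandom(30000)
b64 = base64.b64encode(blob)

assert len(b64) == 40000  # 4 bytes out per 3 bytes in
```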

~~~
benatkin
Yeah, GridFS would be pretty silly in JSON.

------
X-Istence
I'm not using JSON or MessagePack because I need to send arbitrary binary
data. I understand why people use JSON, and it has its domain (sending data to
and from the browser), but in other situations I don't think it is a good
protocol.

Another poster mentioned tnetstrings, those look interesting, however I am not
sure how well those would handle binary data.

~~~
simonw
I wouldn't say JSON's main domain is sending data to the browser - I'd say
it's exchanging common data structures (lists, hashes, strings, numbers)
between environments, including between different languages. Sure, it doesn't
make sense if you're shipping binary blobs around, but most of the time you're
probably dealing with lists, hashes, strings and numbers.

~~~
barrkel
JSON is good for edge labeled trees. XML is good for node labeled trees. But
the data structures we want to transport are most often edge labeled graphs.
You usually end up having to invent a mechanism for canonical node references
to turn the tree into a possibly cyclic graph and vice versa. ISTM to improve
on JSON, focus should be here.
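The "canonical node references" workaround barrkel describes usually looks something like this: flatten the graph into a node table keyed by invented ids, and replace edges with id strings so a cycle survives the trip through JSON. (This is an ad-hoc sketch; the key names are made up.)

```python
import json

def encode_graph(nodes):
    # nodes: {id: {"label": ..., "edges": [ids...]}} -- already flattened
    return json.dumps(nodes)

def decode_graph(s):
    table = json.loads(s)
    # Resolve id references back into direct object links, which can
    # recreate cycles JSON itself cannot express.
    for node in table.values():
        node["edges"] = [table[i] for i in node["edges"]]
    return table

wire = encode_graph({"a": {"label": 1, "edges": ["b"]},
                     "b": {"label": 2, "edges": ["a"]}})  # a <-> b cycle

g = decode_graph(wire)
assert g["a"]["edges"][0] is g["b"]  # the cycle is restored
```

Every application reinvents some variant of this, which is barrkel's point: the format gives you trees, so graphs are always a convention layered on top.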

------
chubs
I'd love to see some benchmarks comparing the size of gzipped messagepack vs
gzipped json for a wide bunch of example messages. If this is better than
json, and has objective-c libraries, it could be magic for our purposes! (file
size is very important when making mobile apps).

------
vmind
There's also TNetStrings which mongrel2 is using now.
(<http://tnetstrings.org/>)

~~~
duskwuff
Which is itself really just a flawed reimplementation of bencode. (Bencode can
be parsed without lookahead; tagged netstrings cannot.)

<http://wiki.theory.org/BitTorrentSpecification#bencoding>
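The no-lookahead property is visible in how small a bencode decoder can be: the first byte alone tells you what follows (`i` = integer, `l` = list, a digit = string length). A minimal sketch covering those three types (the real format also has dicts):

```python
def bdecode(data, i=0):
    """Decode one bencoded value starting at offset i; return (value, next_i)."""
    c = data[i:i+1]
    if c == b"i":                        # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i+1:end]), end + 1
    if c == b"l":                        # list: l<items>e
        items, i = [], i + 1
        while data[i:i+1] != b"e":
            item, i = bdecode(data, i)
            items.append(item)
        return items, i + 1
    colon = data.index(b":", i)          # string: <len>:<bytes>
    n = int(data[i:colon])
    return data[colon+1:colon+1+n], colon + 1 + n

value, _ = bdecode(b"li42e3:fooe")
assert value == [42, b"foo"]
```

Each branch knows exactly how many bytes it consumes from what it has already read, so the parser never needs to peek ahead.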

~~~
inklesspen
It's actually an extension of DJB's netstrings. Hence the name.

------
premchai21
I'm certainly glad if MessagePack allows non-string keys in maps, as it
apparently does (I've only read the spec so far, but not had a chance to try
the code). This is a real pain in JSON.

My stock “basic data type” set comes mainly from Ruby these days. I wince a
little at the lack of interned-symbol type, but it's possible to live without
that. But what of string encodings? In a recent piece of code which I wouldn't
mind replacing with MessagePack, I prefix strings with fixnum IANA encoding
numbers. I suppose that's too much to ask in this context, though… ? How do
other people deal with this—just force everything to UTF-8?

------
simon_kun
because when JSON is gzipped there's only around 5% in it:
<http://goo.gl/b2Uur> (the bit on geojson).

~~~
Groxx
I think you mean _25%_. And it comes out ~50% the size of MessagePack after
gzipping.

------
VMG
Because many other programs understand JSON.

------
kalelias
* fucking use JSON

* if needed: use snappy for blob storage (ie for key value stores)

* if you have a need for very high performance in an application that is mature to some degree, evaluate some binary protocols or compression and see how they perform.

the bad thing is that I already see full stack frameworks popping up "now with
<binary-protocol-xy>" and everybody will scream "YEAH!". Most YEAH!-sayers
will seriously be butt-hurt by the plain fact that it isn't human readable.
Switching between JSON/binary won't matter; there will be situations that are
not debuggable.

Trivial performance optimizations won't fix a broken design.

~~~
andrewvc
Your tone is really just not constructive, and also not appreciated.

As far as JSON being human readable, most JSON sent over the wire has no
whitespace and generally needs to be run through a parser if you're going to
read any significant amount of it.
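(The usual fix for wire-compact JSON is to run it back through a pretty-printer before reading, e.g. stdlib json, `python -m json.tool`, or piping curl output to a formatter:)

```python
import json

# Whitespace-stripped JSON as it typically arrives over the wire.
wire = '{"user":{"id":7,"tags":["a","b"]}}'

# Round-trip through the parser to get a readable form back.
pretty = json.dumps(json.loads(wire), indent=2)
print(pretty)
assert "\n" in pretty
```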

You even say in your third point that if you need high performance you'd look
at a binary protocol; this is one such protocol. Also, you completely ignore
the space savings mentioned.

Your name is green, you're new to HN, and frankly comments like this drive
away the kind of users we want here, so learn to talk to your peers
respectfully, or keep your mouth shut.

~~~
kalelias
What I am saying is that as long as you try to "improve" your project with
marginal "improvements", you will fail at everything else. For the majority,
things like MessagePack are micromanagement: a lot of work and noise for minor
improvements. The tradeoff is too high for the majority. The minority are
people who already have shit done, people who are actually saving a lot of
money by squeezing the last bits of performance out of something. My viewpoint
is probably out of scope because it's holistic, but that's where to start.

------
slashclee
Because MessagePack doesn't work in MacRuby.

~~~
3LegsJim
because no support for brainfuck.

------
sandGorgon
I have a naive question - why would I not use something like ZeroMQ for
messaging ?

I completely understand the need for JSON when you are communicating with the
frontend/Javascript. But if you are doing backend messaging, would you not
rather use ZeroMQ (which comes with its own protocol).

From what I understand (from previous HN posts), it is super fast and handles
binary data very efficiently. I also understand that you can tune stuff at the
OS level to wring the last bit of performance from ZeroMQ.

P.S: yup, it is written in C, but all bindings apparently work very well.

~~~
simonw
They're different types of thing. ZeroMQ doesn't replace JSON, it replaces
HTTP. If you're using ZeroMQ it will help you get some bytes from one place to
another, but you still need a serialization format. You might well send JSON
over ZeroMQ.

~~~
sandGorgon
Here's my specific doubt - my application uses arrays, maps, lists, etc. Can I
not use zeromq to send these data-structures as binary data rather than
serializing them to JSON ?

~~~
pjscott
You're going to have to send them as a sequence of bytes _somehow._
MessagePack converts your data structures into a straightforward binary
format, or lets you roll your own custom serializers easily (I've used it, and
it's good stuff), and JSON does something similar in a less compact but more
human-readable form. If your data structures are very simple, you could roll
your own, e.g. send an array of ints by just converting them to network byte
order and sending them.

Once you've got your data expressed as a sequence of bytes, you can send that
with ZeroMQ. Or with HTTP, or raw sockets, or whatever. The serialization
format and the transport protocol are pretty much completely independent.
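The roll-your-own case pjscott mentions (ints in network byte order) is a few lines with stdlib struct; the framing here (a u32 count prefix) is just one possible convention:

```python
import struct

def pack_ints(values):
    # Length prefix (unsigned 32-bit) followed by big-endian
    # ("network byte order", the '!' flag) signed 32-bit ints.
    return struct.pack("!I%di" % len(values), len(values), *values)

def unpack_ints(data):
    (n,) = struct.unpack_from("!I", data)
    return list(struct.unpack_from("!%di" % n, data, 4))

nums = [1, -2, 300000]
assert unpack_ints(pack_ints(nums)) == nums
```

The resulting bytes can go over ZeroMQ, HTTP, or a raw socket unchanged, which is the independence of serialization and transport the comment describes.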

------
kainosnoema
Because JSON is around twice as fast to encode/decode in Node, which is where
I really need the performance. MessagePack is just slow, complicated, and
unnecessary.

~~~
broofa
Citation needed, especially since the node-msgpack library docs say it's
anywhere from 1.2-3X faster. <https://github.com/pgriess/node-msgpack>

~~~
kainosnoema
Good point... a Node MQ project formerly based on msgpack recently switched to
JSON and increased its performance by almost 200%:
<https://github.com/aikar/wormhole/issues/3>

I've done several of my own benchmarks as well and can confirm that the
current Node JSON implementation is much faster than MessagePack. Their
benchmarks are most likely quite old.

~~~
andrewvc
That mostly speaks to the quality of node's MessagePack implementation,
nothing more. Benchmarking data format serialization is hard, because one poor
implementation throws things off.

~~~
kainosnoema
MessagePack in Node isn't much more than bindings to the C++ library
(<https://github.com/pgriess/node-msgpack/blob/master/src/msgpack.cc>).
For me the bottom line was "What's faster in Node, for my project, right
now?", and the answer was JSON.

There are benefits to MessagePack that have already been mentioned here,
namely not having to base64 binary data first (smaller size), but that's true
for any binary message format. I'd love to see some other binary formats
thrown into the ring and see how they compare to MessagePack in both size
efficiency and encode/decode performance. BSON seems like an interesting
option, but I don't know enough about it to comment...

~~~
andrewvc
True, but optimization is tricky, and you can say the same thing about ruby's
messagepack and JSON libraries, where the speed difference is reversed. I
unfortunately don't have the time to go digging as to why, but there's a
discrepancy somewhere.

------
dangrover
I love MessagePack. Using it a lot in Etude (etudeapp.com).

------
longlistener
If you are representing this on an HTML page you'd need to Base64 encode the
bugger, adding a lot more overhead.

Honestly I'm not sure why you wouldn't just simply use gzipped JSON. My tests
show nearly no difference (1-2%) in performance with MessagePack, yet you get
to leverage all kinds of things that already understand JSON.

------
podperson
Premature optimization?

------
pepijndevos
How does it stack up against Pickle? Because that's what I'd be using in
Python in places where JSON doesn't make much sense.
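The practical difference (setting benchmarks aside) is that pickle round-trips Python-specific types JSON can't express, but the bytes are Python-only and unpickling untrusted data is unsafe. A quick stdlib illustration:

```python
import json
import pickle

data = {"when": (1, 2, 3), "ids": {4, 5}}   # a tuple and a set

# Pickle preserves the exact Python types.
restored = pickle.loads(pickle.dumps(data))
assert restored == data

# JSON has no set type, so the same structure fails outright.
try:
    json.dumps(data)
except TypeError:
    pass
```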

------
joubert
with orthogonal GZIPing, I'm happy using JSON (which allows me to read the
data in my browser debugger, curl, etc.), while getting good compression.

------
SergeyHack
There is also Thrift.

------
leon_
What about bencoding?

<http://wiki.theory.org/BitTorrentSpecification#bencoding>

Seriously, this is not an either-or question. Just use the right tool for the
right job. I wouldn't serialize my data to XML on a microcontroller, but nor
would I drive my JS frontend with binary-serialized data.

~~~
ZoFreX
This was one of my thoughts, too. Bencoding is hardly ever used outside of
Bittorrent, but it's a useful serialisation tool.

