
MessagePack: like JSON, but fast and small - signa11
https://msgpack.org/
======
dathinab
> MessagePack: It's like JSON. but fast and small

 _and complete_.

Both are minimal self-describing data serialization formats, but JSON is
incomplete. _It lacks a type for one of the fundamental kinds of data: byte
blobs_. It also misses parts of the float domain.

Which means there are a lot of inconsistent ways to hack that in, e.g. base64-
encoded strings. But this means it partially loses its property of being
self-describing (same if you allow numbers as strings, e.g. "-12").

Just to be clear, I'm not saying JSON should have bytes directly in it. But a
"native" base64 string type, in addition to string, number, null, list and
map, would help. E.g. `{ "bytes": b"YWJjZGU=" }`
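To make the contrast concrete, a minimal sketch using Python's standard library plus the third-party `msgpack` package (the names here are illustrative, not from the comment above):

    import base64, json
    import msgpack  # third-party: pip install msgpack

    blob = b"abcde"

    # JSON has no byte type, so the usual hack is a base64 string -- and
    # nothing in the document itself says that this string *is* base64:
    as_json = json.dumps({"bytes": base64.b64encode(blob).decode()})

    # MessagePack has a native bin type, so the blob stays self-describing:
    as_msgpack = msgpack.packb({"bytes": blob}, use_bin_type=True)
    assert msgpack.unpackb(as_msgpack, raw=False)["bytes"] == blob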

Wrt. floats, it's missing (+/-) Infinity. NaN is more like an error variant,
so it's ok-ish for it to be missing, but then why not have it?

Also, for completeness it would be better to differentiate between int and
float, as float is imprecise due to rounding errors.

\----

PS: I don't like `null`, but having it in this kind of data serialization
format is still required. Still, the difference between a field being absent
and it being `null` can be a mess, and not all tools handle it well.

PPS: Both JSON and MsgPack can be used for non-self-describing serialization,
e.g. by serializing a record (fixed number of fields in known order) as a
list of values instead of a mapping. But both are focused on enabling self-
describing serialization.

EDIT: PPPS: Yes, it's a very weak form of self-description in use here; there
are systems which are much more self-describing, e.g. XML plus an XML Schema
linked from the XML document, but they also tend to be far more complex.

~~~
nprateem
And comments. Not being able to put comments in JSON files is beyond stupid.

~~~
jrockway
JSON is a binary computer-to-computer interchange format. No computer-to-
computer interchange format has comments.

The problem with JSON is that it's just human-readable enough that people
think it's a config file format. It's not.

~~~
knodi123
What are the material differences between a config file format, and a machine-
to-machine only format?

~~~
haggy
Config files are typically a "load once, read many" pattern. Machine-to-
machine communication protocols can be VERY high rate and over the network, so
they need to be highly efficient in both memory footprint and serialization
overhead.

~~~
iainmerrick
So... JSON isn’t great as a machine to machine protocol, then? So I guess it
must be a config format after all!

------
magicalhippo
I was looking at MessagePack for communicating to and from my STM32F1-based
microcontroller project from the PC controller software I'd be writing. At
least the official C library was not optimized for memory usage and code size.
I also considered BSON, but it also lacked suitable libraries.

So I ended up using JSON. Yes, the messages are larger in byte size with
JSON, but using the jsmn[1] parser I could avoid dynamic memory usage, and the
code size was small. The jsmn parser outputs an array of tokens that point
into the buffer holding the message (i.e. start and end of key names, etc.),
so overhead is quite limited.

For JSON output I modified json-maker[2]. It already allowed for static memory
usage and rather small code size, but I changed it to support a write-callback
so I could send output directly over the data link, so I didn't have to buffer
the whole message. This is nice when sending larger arrays of data for
example.

Combined it took about 10 kB of program (flash) memory, of which float-to-
string support is about 50%. Memory usage is determined by how large the
incoming messages I need to handle are; for now 1 kB is plenty.

A nice advantage of using JSON is that it's very easy to debug over UART.

Though having compact messages would be nice for wireless stuff and similar,
so does anyone know of a MessagePack C/C++ library that is microcontroller
friendly?

[1]: [https://github.com/zserge/jsmn](https://github.com/zserge/jsmn)

[2]: [https://github.com/rafagafe/json-
maker](https://github.com/rafagafe/json-maker)

~~~
liamdiprose
Protobufs/nanopb would be my go-to for minimal message size.

If you want small code size, CBOR seems like a good bet:

> The Concise Binary Object Representation (CBOR) is a data format whose
> design goals include the possibility of extremely small code size, fairly
> small message size, and extensibility without the need for version
> negotiation. [1]

This [2] C-implementation fits in under 1KiB of ARM code.

[1]: [https://cbor.io/](https://cbor.io/)

[2]: [https://github.com/cabo/cn-cbor](https://github.com/cabo/cn-cbor)

~~~
jopsen
CBOR is also used in WebAuthn; usage in a web spec means to me that someone
smart considered it a sane choice -- and, more importantly, that the format is
here to stay.

~~~
kbumsik
It's great that CBOR is being accepted in a wider area, but I'm personally
curious why WebAuthn chose CBOR instead of JSON. WebAuthn is a web browser
feature, so why would the W3C introduce a new data exchange format into their
specs? Maybe WebAuthn needed a binary data type?

~~~
jopsen
I'm guessing a binary format is nice when interacting with a device..

Anybody know if (and why) U2F uses CBOR?

------
supermatt
I took the first example from
[http://www.json.org/example.html](http://www.json.org/example.html), and
msgpack makes it 304 bytes. A simple gzip on the JSON is 289 bytes. The larger
examples are even more in gzip's "favour".

I am not 100% sure why I would choose to use this - maybe for super tiny
documents?
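If anyone wants to reproduce this kind of comparison, here's a minimal sketch with Python's standard library and the third-party `msgpack` package (the file name is hypothetical -- substitute any sample document):

    import gzip, json
    import msgpack  # third-party: pip install msgpack

    doc = json.load(open("example.json"))  # hypothetical sample document

    j = json.dumps(doc, separators=(",", ":")).encode()
    m = msgpack.packb(doc)
    print("json:", len(j), "msgpack:", len(m))
    print("json+gzip:", len(gzip.compress(j)),
          "msgpack+gzip:", len(gzip.compress(m)))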

~~~
blattimwind
gzip is very slow. I don't know why anyone would still use it today.

~~~
jgalt212
Really? What do you suggest that is faster? Our shop has tested a number of
compression formats (xz, bz2, gzip, etc.) and gzip is good enough and faster
than the others we tested.

~~~
blattimwind
The algorithms you name are all rather outdated.

Typical gzip decompression speed is somewhere in the 200-250 MB/s region,
compression is much slower. LZ4 for example tends to compress at ~600-700
MB/s, and decompress at several GB/s. zstd is tweakable over a very wide range
of ratio-speed trade-offs.

LZMA(2) (xz) is a rather troubled format and should not be used any more.
bzip2 has always been slower than gzip with usually marginally better
compression. It has been irrelevant for a long time.
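A rough way to check these trade-offs on your own data: a micro-benchmark sketch assuming the third-party `zstandard` and `lz4` packages (absolute numbers vary a lot with the data and settings, so treat this as a starting point, not a verdict):

    import gzip, time
    import zstandard as zstd  # pip install zstandard
    import lz4.frame          # pip install lz4

    data = open("sample.json", "rb").read()  # hypothetical sample file

    for name, compress in [
        ("gzip", lambda d: gzip.compress(d, 6)),
        ("zstd", zstd.ZstdCompressor(level=3).compress),
        ("lz4", lz4.frame.compress),
    ]:
        t0 = time.perf_counter()
        out = compress(data)
        dt = time.perf_counter() - t0
        print(f"{name}: {len(out)} bytes in {dt * 1000:.1f} ms")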

~~~
gnulinux
> It has been irrelevant for a long time.

Funny you say that, my company uses bz2 for compressing pretty much
everything.

~~~
blattimwind
And many companies use fixed-width record formats for data exchange... what's
your point exactly?

~~~
gnulinux
I guess that it's not irrelevant. Deprecated, old, etc sure, but irrelevant?

------
tracker1
Depends on what you're doing... for real-time data streaming, sure... for
request-response, I'm less convinced. JSON being human readable, relatively
lightweight, and highly compressible is pretty convincing.

I'm not against messagepack or protobuf, however, much like all things, I'd
rather start with simple http+gz+json (maybe websockets) and optimize as
needed. Not everyone is at the scale of FAANG, and most don't really need this
level of optimization.

~~~
SlowRobotAhead
>[JSON] relatively lightweight

Um... no? An epoch timestamp as binary: 4 bytes. As JSON: 10 bytes. A 64-bit
bignum in binary: 8 bytes; in JSON, 19 bytes.

For a small example object, take

{ "x": "y" }

In binary, 5 bytes. In JSON, 9 bytes.

Now... light ENOUGH because you’re using a PC and gigs of ram and a 100mbit
internet, sure. Light in terms of a microcontroller? No.

I flat out could not use JSON in any of my projects and I’m not at FAANG
level.

~~~
tracker1
seems like what you're talking about would fall under the real-time streaming
category... HTTP overhead alone would outweigh the JSON differences you're
talking about.

~~~
SlowRobotAhead
Double. That's the inefficiency you can count on with JSON over CBOR or
MsgPack if you're dealing with number data or short strings.

Sure, the HTTP overhead and TCP overhead under that are significant if you're
transferring a single bignum. How about 10,000 of them?

Even in the best case of all strings, where binary is limited by ASCII's
inefficiency, the `"":"",` boilerplate is "only" 5 bytes of wasted data per
item -- but what about a million-item JSON? 5 million wasted bytes does seem
like it outpaces the NIC, then TCP, then HTTP layer overhead. But yeah, IDK.

The claim was that JSON is lightweight, it is not. It's not as bad as XML,
I'll give it that.

------
paulrpotts
I implemented a streaming deserializer for MessagePack data in C for small
microcontrollers. It is quite small and not complete.

Then I tested it with test data streams generated by the reference C++
implementation and Python implementation.

It's actually kind of a pain, because the C++ serializer generates a variety
of different data types depending on the values you are packing, not the type
of the values you are packing. Let's say I encode a uint32_t field. The stream
might get a uint8, a uint16, a uint32, or even an int32 (for reasons that
completely elude me).

Also, C++ strings come out as the 'ext' type while Python strings come out as
the 'string' type, so I have to accommodate both, even though they are both
basically byte strings.

So, I want to tell the deserializer what kind of output field I'm expecting,
for each struct member or array, and then look in the data stream to see if
there is a data object there that came from the same type. But this is
impossible, so the per-type decode functions have to be quite complicated to
handle a variety of types.

So - it works but I can't do much on the deserialization side to verify that
the data I'm unpacking really matches what was encoded. I can only detect very
broken cases, like when I'm unpacking a uint32_t and in the stream there is an
int32_t with a negative value.

I guess this is mostly done for optimization, but two things would make it a
lot better:

\- if the spec actually specified how data types in different languages were
allowed to be encoded

\- if the encoded data contained _two_ type fields, one indicating the
original source data type and another indicating the type it was encoded into.

Basically the spec is just way too "loose" to make it usable for the use case
I'm trying to use it for, which is to easily generate data that is sent to a
micro and stored in EEPROM, then deserialized out of EEPROM later.

That's probably not very close to a use case the original designer had in
mind. But I haven't found anything that works better (less decode logic).

~~~
ludocode
The spec doesn't really make this clear, but reifying the packed encoding
types is not, I think, how MessagePack was intended to be used. At least it's
not how it's implemented in MessagePack libraries. As far as I know they
pretty much all use the most efficient representation for all values, so the
original data type is always lost.

You generally shouldn't worry about the low-level type that a value was
encoded into in the MessagePack stream. It's dynamically typed, so you should
just care about values. When encoding you should allow the encoder to use the
most efficient representation, and when decoding you should be able to tell
your MessagePack parser the integer width you want instead of caring about the
original type or how it was encoded. It should then accept any packed integer
type as long as the value is in range.

This is how my MessagePack implementation works as well. If you expect to
receive an integer that fits in, say, `uint16_t`, you can call
`mpack_expect_u16()` or `mpack_node_u16()`, and it will allow any integer
representation as long as the value is in range.
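The same behavior in the Python `msgpack` package, as a small sketch: the encoder picks the smallest wire type for the *value*, not the source type, and decoding just hands the value back.

    import msgpack

    assert msgpack.packb(5) == b"\x05"            # positive fixint
    assert msgpack.packb(300) == b"\xcd\x01\x2c"  # uint16 marker + big-endian 300
    assert msgpack.packb(70000)[:1] == b"\xce"    # uint32 marker

    # The width the value travelled in is gone after decoding:
    assert msgpack.unpackb(msgpack.packb(300)) == 300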

It sounds like this is where you were going with your implementation as well,
so this may not be comforting because it's not what you want, but it is at
least the correct way to understand the format. I've talked about this pretty
extensively and wrote up a protocol clarifications document that explains a
bit more about how and why MessagePack libraries discard integer width and
signedness:

[https://github.com/ludocode/mpack/issues/35](https://github.com/ludocode/mpack/issues/35)

[https://github.com/ludocode/mpack/blob/develop/docs/protocol...](https://github.com/ludocode/mpack/blob/develop/docs/protocol.md)

If you really want things like original integer width represented in the
format, ultimately you're going to want to use a different format, probably
one that is non-dynamic and uses schemas.

As far as the string vs ext, you may have meant string vs bin; there was a
format change a while back that separated string and bin types and not all
MessagePack libraries have adapted to that. Many libraries (including mine)
support a compatibility mode so they will use only compatible string
representations.

------
dang
Related from 2012:
[https://news.ycombinator.com/item?id=4092969](https://news.ycombinator.com/item?id=4092969)

2011:
[https://news.ycombinator.com/item?id=2886106](https://news.ycombinator.com/item?id=2886106)

[https://news.ycombinator.com/item?id=2571729](https://news.ycombinator.com/item?id=2571729)

2010:
[https://news.ycombinator.com/item?id=1453332](https://news.ycombinator.com/item?id=1453332)

[https://news.ycombinator.com/item?id=1203444](https://news.ycombinator.com/item?id=1203444)

Others?

~~~
andymoe
Yeah, this same conversation about Message Pack happens once every two years
here but that’s ok I guess...

~~~
dang
Have there been threads in the past 8 years?

~~~
andymoe
I guess you’re right that these are the top level threads before 2012. What
I’m remembering (and found via google search) is this kind of discussion
around message pack comes up in most other threads about serialization formats
or adjacent topics.

~~~
dang
There definitely have been some comments:
[https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...](https://hn.algolia.com/?dateRange=all&page=0&prefix=false&query=%22message%20pack%22&sort=byDate&type=comment)

------
CiPHPerCoder
Under PHP...

> Msgpack is an PECL extension, thus you can simply install it by:

Having something available in PECL is a good first step, but nobody will use
it unless you either:

1\. Get it into the standard library (which requires an RFC for PHP
Internals), OR

2\. Write a pure-PHP polyfill installable from Composer, OR

3\. Do #2 then #1 (using the polyfill's popularity to argue for the importance
of the RFC acceptance to make #1 a reality).

Reason: A lot of the places PHP is deployed, you can't compile C code or
install binary dependencies (.so, .dll files). You can't access the OS package
manager, either.

But Composer is a pure-PHP package manager that still operates in these
environments.

So if anyone on HN ever wants your thing to be used by PHP developers, don't
just stop at "PHP extension, written in C, available in PECL".

~~~
snuxoll
> Reason: A lot of the places PHP is deployed, you can't compile C code or
> install binary dependencies (.so, .dll files). You can't access the OS
> package manager, either.

If you’re in the target audience for msgpack you probably aren’t relying on
shared hosting and can build a pecl extension.

~~~
somehnguy
That's some strange logic.

I have multiple racks of company-owned bare metal that I deploy to. I still
don't want to build a PECL extension. It's much simpler, and way less likely
to hit random build issues, if I can just install via Composer.

~~~
snuxoll
> Still don't want to build a pecl extension.

Docker? Build it as a system (RPM/DEB) package? Man, my life would be
difficult if I just flat out refused to use packages for PHP/Python/Ruby
requiring native extensions because it required some minimal effort on my part
to deploy it.

------
Felk
I found myself using msgpack as a drop-in alternative for acceptable mimetypes
for HTTP responses in a flask app. Browsers would get the response data with
the pretty-printed json embedded (or even a custom template with the data
fitted in, if there was one for that specific endpoint), api clients asking
for nothing in particular would get pure json, and clients asking for msgpack
would get that.

It seemed like a free way to offer slightly better performance on the API
(both serialize/parse times and bandwidth), since msgpack can serialize
arbitrary data without protocols or schemas being specified, just like JSON
serializers can. I didn't know of any alternative that would likewise require
no further configuration or infrastructure, so msgpack seems to fill this
'free performance boost' niche quite nicely.

------
brianolson
Or you could use IETF standard CBOR (Concise Binary Object Representation)
[https://cbor.io/](https://cbor.io/) RFC 7049

------
enitihas
Is there any advantage of msgpack over JSON or gzipped JSON on one side, and
something like Protobuf or FlatBuffers on the other?

Msgpack, unlike JSON, is not human readable on the wire and not a purely
text-based format, and I doubt it is smaller or faster than Protobuf or
FlatBuffers.

~~~
neuland
Protobuf requires schemas, which is good practice anyway, but maybe you don't
have them or don't want to write them for some reason.

FlatBuffers doesn't have as many client libraries. There's a MessagePack
library for a _ton_ of languages.

One thing I really like about MessagePack is that the Python client (and
others too) supports reading from a stream. So you can write a bunch of
msgpack messages to a file or TCP socket and it just works.
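A minimal sketch of that streaming pattern with the Python `msgpack` package: packed messages are simply concatenated, and each one knows its own extent.

    import msgpack

    # Writing: append packed messages one after another.
    with open("messages.bin", "wb") as f:
        for i in range(3):
            f.write(msgpack.packb({"seq": i, "payload": b"data"}))

    # Reading: the Unpacker yields one message at a time from the stream.
    with open("messages.bin", "rb") as f:
        for msg in msgpack.Unpacker(f, raw=False):
            print(msg["seq"])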

Protobuf can't do this out of the box because it doesn't include how long the
message is. You can write a wrapper that specifies the message length, which
isn't that hard and I've done before, but it is another thing to maintain. And
other formats do that out of the box (ex. Cap'n Proto)

And as someone else mentioned, Protobuf doesn't have NULL which is useful in
some cases. (I understand that Go has strong opinions about there being a
useful default value, but that doesn't map well to a lot of languages)

One other thing with Protobuf is that the Python client is not very pythonic.
I've been keeping my eye on this project [0] which makes Protobuf messages
work just like dataclasses. They don't support OneOf types currently, which I
happen to need for some of my use cases. But they're working on it [1]

[0]
[https://github.com/eigenein/protobuf](https://github.com/eigenein/protobuf)

[1]
[https://github.com/eigenein/protobuf/issues/85](https://github.com/eigenein/protobuf/issues/85)

~~~
oh_sigh
You can use, e.g., writeDelimitedTo in the Java API to do streaming protobufs.

~~~
haberman
It exists in C++ too:
[https://github.com/protocolbuffers/protobuf/blob/master/src/...](https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/util/delimited_message_util.h)

------
mgamache
I used MessagePack in a real-time streaming app. It was smaller than Protobuf
(and about as fast in .NET). In fact, if you are a .NET dev and use the
popular SignalR, it can use MessagePack for real-time binary messaging to the
browser. I highly recommend it. But it is not a direct replacement for JSON,
as it's not human readable or accessible without libraries.

~~~
GordonS
I've also used MessagePack on .NET, but as a serialisation format for use with
a message queue. I don't recall the numbers off hand, but it was something
like an order of magnitude faster to serialise and deserialise than JSON, and
resulted in a _lot_ fewer allocations too.

It supports compression too. Even if the resulting serialised size isn't
always smaller than compressed JSON, from a performance standpoint it's a lot
better.

~~~
pkolaczk
It might be better than text formats, but it still encodes most things as
variable-size and this is not particularly machine-friendly. A format with
separate metadata and fixed length encodings for most of the things (eg
numbers) would be much more efficient to serialize / deserialize.

------
PopeDotNinja
I've worked extensively with MessagePack and JSON. I much prefer JSON because
it's human readable. It's just a pain to debug something that looks like raw
binary data, and I usually have to debug it at times I'm just not in the mood
to deal with shenanigans.

------
pan69
Reminds me of Adobe AMF (Action Message Format) that was used in their Flash
Player:

[https://www.adobe.com/content/dam/acom/en/devnet/pdf/amf-
fil...](https://www.adobe.com/content/dam/acom/en/devnet/pdf/amf-file-format-
spec.pdf)

------
Aardappel
If you want a binary JSON,
[https://google.github.io/flatbuffers/flexbuffers.html](https://google.github.io/flatbuffers/flexbuffers.html)
is worth looking at, since it carries over the advantages of FlatBuffers (in-
place access without unpacking), but without the need for a schema like
FlatBuffers.

------
Dansvidania
I thought the appeal of JSON was that it is, among other things, somewhat
human readable. The gain in bytes does not seem to justify the loss of human
readability. Am I missing something?

~~~
GordonS
It depends on your use case; having human readable serialised messages isn't
always a big deal. For example, I recently used it for a high-throughput
messaging system, where the vastly improved performance was a huge selling
point.

~~~
Dansvidania
Are you referring to the reduced payloads or does this also provide a faster
way to serialize/de-serialize?

~~~
GordonS
Much faster serialisation/deserialisation (and a lot less allocations on
.NET).

It does support compression, but messages sizes are generally comparable to
compressed JSON (YMMV, it will depend on your messages).

------
tsegratis
We probably need a human-readable text view for MessagePack.

That would let it fully take on the role of JSON.

But what's the text format? -- now there is endless happy bikeshedding.

Maybe a library with JSON (or a compatible superset, to get all MessagePack
features). Then the standard just serializes to MessagePack, before gzip or
whatever.

~~~
ludocode
I wrote a MessagePack to JSON conversion tool which is ideal for viewing
MessagePack. It supports a pseudo-JSON debug output so you can view messages
even when they use features outside of JSON, like binary blobs or arbitrary
key types:

[https://github.com/ludocode/msgpack-
tools](https://github.com/ludocode/msgpack-tools)

It works best if you use MessagePack like JSON where all map (object) keys are
strings, so you can easily understand a message without context. If you want
to optimize your MessagePack more, you would tend to use integers for map
keys, but this makes the JSON-equivalent view not super clear because you just
see a bunch of numbers in a tree structure.
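If you just need a quick approximation of that debug view and don't want a dedicated tool, a few lines of Python do something similar -- a sketch, with binary values escaped by hand (non-string map keys, e.g. bytes keys, would still need extra handling before `json.dumps` accepts them, which is part of what a dedicated tool buys you):

    import base64, json
    import msgpack

    def to_debug_json(packed: bytes) -> str:
        def default(o):
            # json.dumps calls this for values it can't serialize,
            # which covers binary blobs; repr() catches the rest.
            if isinstance(o, bytes):
                return "base64:" + base64.b64encode(o).decode()
            return repr(o)
        obj = msgpack.unpackb(packed, raw=False)
        return json.dumps(obj, indent=2, default=default)

    print(to_debug_json(msgpack.packb({"key": b"\x00\x01\x02"})))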

~~~
elcritch
Thanks! Your `msgpack-tools` are really handy. When I use MessagePack usually
I'll add it as an option to an endpoint (`/api/request/123?type=mpack`) in
addition to json but there's still cases where having a mpack-to-json tool
comes in handy.

------
naikrovek
Ok, so without taking away from this library, what's wrong with using a fully
binary format of your own? No string parsing, no nonsense, just binary data.
It seems like people are becoming afraid of plain binary, and I don't
understand why. It's so easy: plain binary is very small, and it is extremely
easy to parse compared to _anything_ with text in it, especially if you
have to support multiple encodings.

1.) Decide upon a binary format.

2.) Use it.

No 3rd party libraries required; you could write writer or reader code for a
Commodore 64, if you had to.

Keep it simple. I suspect any simple "type-length-value" kind of binary
format would be written or read at least as quickly as this, without third-
party code.
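For the sake of argument, a complete type-length-value codec really is only a few lines -- a sketch in Python (the 1-byte tag and 4-byte big-endian length are arbitrary choices):

    import struct

    def tlv_encode(type_id: int, payload: bytes) -> bytes:
        # 1-byte type tag, 4-byte big-endian length, then the raw value bytes.
        return struct.pack(">BI", type_id, len(payload)) + payload

    def tlv_decode(buf: bytes, offset: int = 0):
        type_id, length = struct.unpack_from(">BI", buf, offset)
        start = offset + 5
        return type_id, buf[start:start + length], start + length

    record = tlv_encode(0x01, b"hello")
    tag, value, next_offset = tlv_decode(record)
    assert (tag, value) == (0x01, b"hello")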

~~~
com2kid
Been there, done that, switched to proto bufs because maintaining 4 different
parsers in 3 different languages was a pain.

Also proto bufs gave us those nice schemas, the code gen, and easy
forwards/backwards compat.

Was it slower? Yeah. Was it crap ton less work? Yup.

~~~
naikrovek
I haven't had the same experience I guess. Keeping binary readers and writers
maintained was a very small portion of the time I spent on the applications
which used those writers and readers.

~~~
com2kid
In our case we were adding new APIs and expanding our format rapidly for years
on end. After the 20th or so miscommunication that led to a week+ delay,
because someone got a field order wrong or, in one case, because we hit a bug
in the C# compiler with regard to struct layouts (!!), we switched away from
rolling our own.

------
emergie
Unfortunately MessagePack also forces the IEEE 754 standard for floating point
numbers. That means it is useless for things like money or large/arbitrary-
precision numbers.

In JSON, a number-type value has to be parsed like in JavaScript, i.e. as an
IEEE 754 double.

for example:

    
    
      {"number": 0.1}
    

in reality it is parsed as:

    
    
      {"number": 0.100000000000000005551115123126}
    

Deserialization followed by serialization alone can thus introduce a change in
the serialized value.

Because of the problems with IEEE 754, in all of our APIs we send floats only
as strings, like {"number": "0.1"}. For us, the enforced IEEE 754 double
format for floating point numbers renders the number type near useless.
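The 0.1 example is easy to verify in Python, which can expose the exact binary value behind the parsed double -- along with the strings workaround described above:

    import json
    from decimal import Decimal

    # The double the parser hands back is not exactly 0.1:
    parsed = json.loads('{"number": 0.1}')["number"]
    print(Decimal(parsed))
    # -> 0.1000000000000000055511151231257827021181583404541015625

    # The workaround: ship decimals as strings and parse them yourself.
    exact = Decimal(json.loads('{"number": "0.1"}')["number"])
    print(exact)  # -> 0.1, exactly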

~~~
ludocode
This is technically correct, except that almost all JSON libraries parse JSON
numbers into either 64-bit integer or 64-bit double, so you get those lossy
conversions anyway.

I've worked on a production app where we had to swap out the JSON parsing
libraries in the server (Rails) and all clients (Android, iOS, Rails again)
for ones that preserved numbers as BigDecimals. This was a huge pain, it made
everything slower, and even then it wasn't ideal because different BigDecimal
libraries aren't even necessarily compatible. Foundation's NSDecimalNumber has
different limits than Android's BigDecimal for example, and these are the
types used by the parsers so you can't just get the raw data to do it
yourself.

If I had to do it over again I would never rely on JSON's decimal support. I'd
rather stuff my decimals in strings and parse them myself.

------
Waterluvian
Small lesson learned with MessagePack:

We adopted it prematurely and paid the price of "nothing's human readable
without first unpacking it" without actually benefiting much, given we weren't
shipping much data that often.

------
atombender
Anyone know of an efficient binary format similar to MessagePack that supports
deserializing only whitelisted keys, and not paying the penalty of parsing
data that isn't needed?

MessagePack is great, but libraries generally only support deserializing the
whole thing. I have an application where these structured documents can be
very large, and scanning code sometimes only needs a very small subset of
keys.

I believe Cap'n Proto has this feature, but unlike MessagePack it's not
schemaless.

For example, given a struct like this:

    
    
      {
        "id": "123",
        "name": "Developers",
        "members": [{
          "id": "567",
          "permissions": [
            {"type": "read"},
            {"type": "write"}
          ]
        }]
      }
    

Let's say I only want the name, the ID of each member, and whether a
permission of type "read" is present. I may want to do something like (Go):

    
    
      StreamingUnmarshal(b,
        func(keypath string, parse func() interface{}) {
          switch keypath {
            case "id", "members.id":
              value := parse()
              // ... use value ...
            case "members.permissions.type":
              if parse().(string) == "read" {
                // ...
              }
          }
        })
    

Random access could also work, as long as it didn't need to sequentially
parse from the beginning of the data to get to the right value each time.
Something like:

    
    
      id := GetKey(b, "id")
      memberIDs := GetKey(b, "members.id")
      permissionTypes := GetKey(b, "members.permissions.type")
    

The trickiest bit is treating arrays of nested structs (as in
"members.permissions.type") correctly, although efficient scanning of keys
becomes an important optimization point, too.

Probably the best method would be to store the keys pre-sorted at the
beginning of the data, so that they'd better fit in the CPU cache, and have
pointers to the offsets of the values:

    
    
      KEY1,KEY2,KEY3,VALUE1,VALUE2,VALUE3
    

Arrays of structs are tricky here, again, but this is solvable.
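To make the idea concrete, here's a hypothetical Python sketch of that layout for a flat map -- sorted keys first, then a fixed-width offset table, then the packed values -- ignoring nested keypaths and arrays of structs entirely:

    import struct
    import msgpack

    def encode_indexed(doc: dict) -> bytes:
        keys = sorted(doc)
        values = [msgpack.packb(doc[k]) for k in keys]
        key_block = msgpack.packb(keys)
        offsets, pos = [], 0
        for v in values:
            offsets.append(pos)
            pos += len(v)
        return (struct.pack(">I", len(key_block)) + key_block
                + struct.pack(">%dI" % len(offsets), *offsets)
                + b"".join(values))

    def get_key(buf: bytes, key: str):
        (key_len,) = struct.unpack_from(">I", buf, 0)
        keys = msgpack.unpackb(buf[4:4 + key_len], raw=False)
        i = keys.index(key)  # keys are sorted, so bisect would also do
        (off,) = struct.unpack_from(">I", buf, 4 + key_len + 4 * i)
        value_area = 4 + key_len + 4 * len(keys)
        unpacker = msgpack.Unpacker(raw=False)
        unpacker.feed(buf[value_area + off:])  # trailing values are ignored
        return unpacker.unpack()

    buf = encode_indexed({"id": "123", "name": "Developers"})
    assert get_key(buf, "name") == "Developers"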

~~~
kentonv
Have you considered using a sqlite file?

No, seriously: It doesn't require an external schema, supports every kind of
indexing and fast random access you could want, is supported in like every
language, OS, and architecture, has copious tooling, documentation, and
community support, and is battle-tested across literally billions of
installations worldwide.

~~~
atombender
No, that would not make any sense. SQLite cannot deal with structured,
hierarchical data like the stuff I described in my comment.

Also, I am talking about individual documents that already live in a database
such as PostgreSQL. I can't store an entire SQLite database in a single
column.

~~~
kentonv
> I can't store an entire SQLite database in a single column.

Sure you can, it's just a file. :) sqlite scales down nicely to data sets of
just a few kilobytes -- if you're worried about parse time of your documents
then I assume they are larger than that.

That said, if you're already loading the whole blob from a single row in
Postgres anyway, then is random access such a big win? Or is the idea that you
would selectively read byte ranges out of Postgres? If you're already pulling
the bytes into RAM then avoiding the parse isn't that huge of a win.

(I say this as the author of Cap'n Proto which is all about zero-copy random
access... it's only a big win in certain use cases, like mmap() or shared
memory IPC.)

~~~
atombender
Well, many documents are several megabytes, but many are in the order of 100
bytes. I need a serialization scheme that scales up to large documents and
down to tiny ones.

I can't imagine that the overhead of initializing an SQLite database from a
small byte array in memory is that small, not to mention the overhead of
maintaining the table schema.

Out of pure curiosity, I glanced at the Go bindings for SQLite, and there's no
provision for initializing a database from a byte array, or accessing the raw
underlying byte data of a live database. The C API supports implementing your
own VFS for custom storage, but that's not supported by the Go bindings, and
seems like a lot of work.

You're right about loading whole blobs; I was misremembering a little bit. The
application in question already pares down the document keys in its queries to
avoid sending everything. I'm in the middle of a research project into an
alternative backend where the documents are stored as binary data, not JSON,
and given a set of keys/keypaths, I want to do a little better than
deserializing the whole blob.

------
IshKebab
I was looking into binary JSON formats recently (there are a ton), and there
are some important differences to note, especially for my application: you
can't do what Amazon calls a "sparse read". Say I have a 10 GB JSON file and I
only want to read one key -- no dice. You have to parse the entire file.

"But, maybe some of these binary JSON formats are smarter!" you think. Well,
almost all of them aren't. Only two are: BSON, and Amazon Ion. Unfortunately
BSON limits the message size to 2 GB.

Amazon Ion is also the only format that actually deduplicates object keys.
Definitely the most capable and well-designed of these formats. Unfortunately
it is also the most complicated.

Sadly a lot of these formats make questionable choices, like storing numbers
in big endian format (why?), using explicit `uint8`/`uint16`/`uint32` sizes
rather than something like Protobuf's varint, etc. And there are also a load
of them that are nearly identical. You really have to dig deep to find the
critical flaws.

~~~
ludocode
Binn also does binary length prefixing of all values, allowing you to skip
through it. It's also a lot simpler than Amazon Ion and doesn't have the awful
mistakes of BSON so it might do the trick for you. It's not at all popular
though.

[https://github.com/liteserver/binn](https://github.com/liteserver/binn)

The reason most formats don't length-prefix everything is because it makes it
costly to encode in both time and space. You have to basically encode a
message inside-out to calculate the nested sizes of everything. This is going
to be hugely slow and memory-intensive if you're encoding a 10 GB file, and
it's useless for messages on the scale of kilobytes so there isn't any point.
MessagePack on the other hand can be encoded in one pass from start to finish
as long as you know the element counts of your maps and arrays beforehand.

> storing numbers in big endian format (why?)

Embedded processors tended to be big-endian, like older PowerPC and older ARM.
These formats are designed for embedded so it (probably?) improved performance
on those processors. This is less true now since virtually all modern ARM
processors and probably most other embedded processors now run in little-
endian mode.

Ultimately what it comes down to is that these formats are designed for the
opposite of your use case. I don't know what you're using a 10 GB JSON file
for but there must be a better storage solution for you than a schemaless
serialization format.

~~~
IshKebab
> The reason most formats don't length-prefix everything is because it makes
> it costly to encode in both time and space.

Yeah this is true, except for BSON because it uses fixed-size length prefixes,
so you can just go back and fill them in later. Presumably that's why they
used fixed-size lengths. The downsides are it is less space efficient and
limited to 2GB.

In any case Amazon make the very good point that formats are read more often
than they are written. It makes sense to optimise for the read case.

> Embedded processors tended to be big-endian, like older PowerPC and older
> ARM.

Nobody uses PowerPC anymore, and ARM hasn't been big endian for ages. Also
MessagePack isn't designed for embedded systems and it still uses big endian.
I don't think that's the reason. I suspect it's from a misguided belief that
"network byte order" still matters.

And I totally agree, a schema-based format makes way more sense for my use
case - changing is difficult though.

------
banana_giraffe
I've skimmed over it and didn't see if there's a compelling reason to use this
over CBOR, or vice versa.

Anyone have any insight?

~~~
brianolson
CBOR went to the trouble of being an IETF standard, RFC 7049. So, you know,
the lovely thing about using standards!

~~~
ludocode
CBOR also made a lot of changes to MessagePack making it far more complicated,
both to use and to implement. I've talked about this on HN before so I'm
repeating myself a bit but here's a short list:

\- CBOR has two ways of encoding maps and arrays: fixed-length and variable-
length. This complicates decoders, especially those that would pre-allocate
arrays and maps to the proper sizes; losing that ability significantly reduces
decoding performance. The CBOR spec has nothing useful to say about this; it
just requires you to be able to allocate indefinitely.

\- CBOR defines a canonical representation, including a key sorting order
based on binary representation which is just awful. It requires multi-pass
encoding which is slow, complex, error prone, and completely non-intuitive:
[1,2,3] comes before 100000 which comes before [1,2,3,4].

\- CBOR has more types in the core spec, ones that are extremely specific to
certain applications or programming languages. It has a 16-bit float, and it
has both null and undefined as separate types.

\- CBOR defined a system of "tags" with a huge number of extension types.
These are supposed to be optional, but of course they only work if both ends
support them. Some features like BigNum are well-supported in some programming
languages but not others, so CBOR implementations tend to diverge in supported
message types.

CBOR as a standard is far worse than the "non-standard" MessagePack it
purports to replace. Here's a great HN comment on it from another user (and
another MessagePack library implementer) a few years back:
[https://news.ycombinator.com/item?id=14072598](https://news.ycombinator.com/item?id=14072598)

~~~
SlowRobotAhead
All those gripes are optional features.

Your parser or encoder does not need to support indefinite arrays; that
feature is clearly designed to be used with some practical limitations like "I
don't know how many, but let's assume fewer than X, and I'll send a STOP when
I'm done". Canonical ordering is optional. Yes, it's a typed system that has
more types; IDK what to say about that other than you don't have to use them.
And yes, tags need to be supported on both ends, just like ANY data that is
being transferred; compare to a strictly schema'ed system and I see no
difference, except that you're only partially required to adhere to the plan.

Maybe msgpack is just objectively better because it has fewer features, IDK.
It doesn't matter, because CBOR got an RFC and is actually popping up in
places. If there was a competition, CBOR won, right or wrong.

------
rglover
I tried to work with MessagePack last year while teaching someone who was
building it into their product. Absolute nightmare.

------
rudolph9
JSON is as pervasive these days as XML, and despite its shortcomings it is not
going away for the foreseeable future.

------
hesdeadjim
These formats are great for private communication, but they become a nice
attack surface if publicly exposed. I was using Thrift on an old project and
had to add a sanity-check layer to make sure an attacker couldn't just send a
valid request with a list declared to have int.max elements.

~~~
dathinab
But any parser needs validity checks of this kind, including a JSON one.

A common way to mess with JSON parsers is, e.g., to nest a lot of arrays and
objects. So you need a max nesting depth. There are a bunch of other ways to
mess with JSON parsers, too.

(In the case of MsgPack lists and maps: a parser should only pre-allocate
memory for a given size if it has done proper sanity checks, e.g. compared the
given length with the remaining byte length of the message. Alternatively, you
can simply not preallocate and instead, like in JSON parsers, grow your list
on demand and just use the length to know when the list ends.)

But yes you have to make sure the parser works for your use case.
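As parser pseudocode, the pre-allocation sanity check described above amounts to something like this (hypothetical parser internals, Python for illustration):

    def check_array_header(declared_len: int, pos: int, buf: bytes) -> None:
        # Every msgpack element takes at least one byte, so a well-formed
        # array of N elements needs at least N more bytes of input. Reject
        # the header before allocating anything based on declared_len.
        if declared_len > len(buf) - pos:
            raise ValueError("declared length exceeds remaining message size")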

------
StavrosK
This is great, but, to complement this, does anyone know a better alternative
to TCP? I've recently needed to make two processes communicate over the
network and, while MessagePack handles the serialization, I found myself
needing something higher level than TCP but lower than HTTP.

An annoyance of TCP was that I never knew whether I had read all the data. I'd
either read too little and leave data unread, or read too much and end up
blocking for a long time (or implement timeouts and get the worst of both
worlds).

What's a good alternative? Maybe 0mq? All I need is to send some MsgPack bytes
to another client over the wire, hopefully without having to guess whether
there's more to read in the socket or not.

~~~
kragen
ØMQ is probably a good fit (I think nanomsg is dead?) but there are also a
number of protocols at the TCP level that provide a message-sequence interface
rather than a byte-sequence interface: SCTP is perhaps the best-known, and
Plan9 IL is another (deprecated) solution.

For some applications, though, the simplest solution is to shut down one half
of the TCP connection once you've finished sending your data. That's how rsh,
finger, and HTTP/0.9 responses work, and it's a supported option in HTTP/1.0
and HTTP/1.1. Failing that, preceding each message with a byte count, a la
netstrings, is fairly simple; or you can use SLIP-like or COBS framing.
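A sketch of the byte-count approach in Python -- a 4-byte big-endian length prefix rather than literal netstrings, with the third-party `msgpack` package doing the serialization:

    import socket
    import struct
    import msgpack

    def send_msg(sock: socket.socket, obj) -> None:
        payload = msgpack.packb(obj, use_bin_type=True)
        sock.sendall(struct.pack(">I", len(payload)) + payload)

    def recv_exact(sock: socket.socket, n: int) -> bytes:
        buf = b""
        while len(buf) < n:
            chunk = sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("peer closed mid-message")
            buf += chunk
        return buf

    def recv_msg(sock: socket.socket):
        (length,) = struct.unpack(">I", recv_exact(sock, 4))
        return msgpack.unpackb(recv_exact(sock, length), raw=False)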

~~~
StavrosK
Hmm, interesting, but 0mq is a bit heavy. Unfortunately I can't shut down the
connection, as the server needs to provide real-time updates to the clients
(it's pub/sub), but I'll look into SCTP, thanks.

~~~
kragen
There are a couple of ghetto ways to do pubsub. Webhooks is one, and it's
often by far the easiest to implement, but in other cases it's impossible.
"Long polling" is another: you open a connection and tell the server what you
think the current state of a variable is, and the server just sits there with
the connection open until that isn't the current state of the variable any
more, at which point it sends you the new current state, or the delta from the
state you had to the current state, and closes the connection. If you were
wrong about the current state, this happens immediately. Again, though, there
are pubsub cases where this works, and pubsub cases where the extra latency
and kernel CPU of opening a new TCP connection for every message are
intolerable.

So, to take the canonical concrete example, a chat channel might number the
messages on it in a monotonically increasing order, and you might tell the
server the channel name and the number of the last message you saw, at which
point it sends you the messages since that point, if any, then closes the
connection. As I understand it, this is how Kafka works, except for the
connection-closing part.

In all probability, your life will be easier and your performance will be
better with ØMQ, but these hacks are things that work reasonably well and are
extremely easy to implement with off-the-shelf tech.

SCTP in many cases suffers from the fact that it doesn't run on top of TCP, so
NATs don't know what to do with it. If you have enough control over your
network that that isn't a concern for you, UDP with IP multicast is another
plausible solution, the one TIBCO used originally IIRC; you can allocate a
multicast IP address per pubsub channel or multiplex them. With IP multicast,
recovery from lost messages is a concern, especially if 802.11 is part of your
network (since 802.11 uses hop-by-hop ACKs for unicast packets) but there are
a variety of reliable multicast protocols like SRM to handle that.

Feel free to hit me up for more info, I've been hacking around with different
ways of doing pubsub since the previous millennium.

~~~
StavrosK
That's very informative, thank you for taking the time. Just so you have more
context, this is what I'm using this in:

[https://gitlab.com/stavros/itsalive](https://gitlab.com/stavros/itsalive)

Clients can connect to the server and get updates for the commands that are
currently running, which is not high throughput or complex from a networking
perspective. I was wondering if there was something lightweight that will do
the same, and 0mq seems like the best choice, but a simple loop over the
connections seems to work well as well.

I played around with 0mq for this and it works great, but in this instance I
might not want to add the extra dependency (especially since I've already
implemented it, minus a bug where it'll block if a packet is exactly 4k).

I think adding an "end of message" character (eg a newline) would be the
simplest thing to do in this instance.

~~~
kragen
Yeah, that sounds like the best choice. (That's the SLIP framing approach.)
SCTP probably isn't viable if you want random people to be able to watch the
presentation without rebuilding their kernels, and multicast IP isn't viable
on the global internet. An IRC server would work fine, and you might even be
able to just use a secret channel on Freenode, but some places block IRC
because of other pubsub software that uses it.

~~~
StavrosK
Yeah, I wouldn't want to burden Freenode with that, but IRC is an interesting
choice. I'll use the terminating character, thanks for your time!

~~~
kragen
Right, the benefit of using IRC is that you don't have to write the server;
there are dozens of well-known, actively-maintained free-software servers,
they're well-documented, and they already support epoll and kqueue and have
reasonable ways of handling all kinds of pathological network conditions. But
maybe a simple asyncio-based event loop, or even threads, would be fine for
itsalive.

------
Cymen
The "Try!" demo fails with large JSON submissions with an exception in jQuery:

Uncaught RangeError: Maximum call stack size exceeded

I'm curious to see the savings difference and hoped to with "Try!" but it'll
have to wait.

------
gbrits
Slightly related: I recently wrote a binary encoder/decoder that used bit-
packing, delta-encoding, and other well-known 'tricks' to efficiently pack
batches of 10k-100k events of the same uniform type (essentially encoding
column by column and using similarities to my advantage). Nothing too fancy,
but it was A) a huge success in terms of compression ratio and B) a hassle to
write. Do any, more or less, turn-key solutions exist for this? I'm
specifically targeting Node, but a command-line util might work as well.

~~~
jiggawatts
I had a play with something similar and I noticed that -- at least for my use
case -- you can get ~90% of the gains through just a couple of simple tricks:

1) Identify medium-scale similarity boundaries in the data structures. E.g.: a
sequence of messages in a protocol, such as a C "struct" with a bunch of
fields.

2) Compute the binary difference between these structures so that _most_ of
the subsequent bytes after the first message are either zeroes or small
numbers. Both the sender and receiver have to keep the previous message in a
buffer to allow this.

3) Use a high-performance compression algorithm that supports "user provided
dictionaries", such as Zstandard. Train it with sample data.

This above is surprisingly straightforward because it doesn't require complex
changes to the underlying data structures. You don't even necessarily need to
be able to parse it at all, as long as it has large-scale repeating structures
that you can identify.
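Step 3 on its own is easy to try -- a sketch with the python-zstandard bindings, using synthetic sample messages (real training data should be representative traffic, and dictionary training can fail if the samples are too few or too uniform):

    import zstandard as zstd  # pip install zstandard

    # Synthetic stand-ins for recorded protocol messages.
    samples = [b'{"event": "tick", "seq": %d, "value": %d}' % (i, i * 7 % 100)
               for i in range(2000)]

    dict_data = zstd.train_dictionary(16 * 1024, samples)
    cctx = zstd.ZstdCompressor(dict_data=dict_data)
    dctx = zstd.ZstdDecompressor(dict_data=dict_data)

    msg = b'{"event": "tick", "seq": 42, "value": 94}'
    frame = cctx.compress(msg)
    assert dctx.decompress(frame) == msg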

~~~
gbrits
Agreed. It really blows generic compression algos out of the water. Didn't
know about Zstandard dictionary encoding. This might just be what I'm after.
Thanks

------
dilipray
[https://medium.com/unbabel/the-need-for-speed-
experimenting-...](https://medium.com/unbabel/the-need-for-speed-
experimenting-with-message-serialization-93d7562b16e4)

> It's that it is not enough to just know some new cool technology, nod along
> and go about your day with your assumptions unchallenged. You need to find
> out more, test it out, have a grasp before committing to it, and, if you're
> lucky, learn a thing or two in the process.

------
squaresmile
This reminds me of a quite popular mobile game that sends master game data
using messagepack which is then gzipped and then encrypted and then base64
encoded in a json response. Fun.

~~~
dwild
That remind me of one of my first contracts, 10 years ago, for an Android
mobile app. It was an app made to quickly record something using video, audio
or picture, that would then be uploaded through their API. The API worked
essentially like you said, it was converting the binary file to base64, adding
it to a JSON, which was in a GET variable (thus urlencoded) and sending that
over an HTTP connection (I don't remember if it was HTTPS or not though, I
hope it was, but at the time I could have ignored that part). Android didn't
allowed more than 16 MB of memory for an app, thus I had to build streams to
support each things individually, that was an interesting challenge. I was
amazed to find out that there was officially supported Base64 streams, but
their API strangely didn't accept the default Base64, it was only accepting
the one for URL (which replace a few characters with others one), so I had to
add another stream on top to do this.

------
PaulHoule
You can't beat JSON by very much with a format that has the same free-form
structure. (Particularly when you use gzip, zstd, ...)

Parsing code has to be branchy to handle many different possible structures.

If you want extreme performance, variable-length strings are a problem -- the
old mainframes that had fixed-length "HOLLERITH" strings had a good idea. I
like just about everything about Apache Arrow except that it ignores the
problem of fast/portable string handling.

~~~
GordonS
Well, you can't always beat it much from a _size_ perspective, but MessagePack
can be an order of magnitude faster to serialise/deserialise than JSON.

~~~
PaulHoule
It is somewhat faster, but other formats are even faster than that.

~~~
GordonS
In all the cases I've used it, it was a _lot_ faster than JSON.

Other formats might have been faster, but MessagePack was very easy to use.

------
nojvek
Why would one use msgpack over FlatBuffers? AFAIK FlatBuffers offers an
amazing zero-copy feature, meaning you can get to the data of interest without
having to parse the whole object into memory. Since every type is length-
prefixed, you can jump around indexes quickly to access what's needed and only
read that.

It seems msgpack still needs a deserialize step to partially read data, right?

Netflix uses FlatBuffers and it works wonders on low-powered devices.

------
otoburb
Obligatory Cap'n Proto[1] reference whenever serialization format discussions
crop up.

[1] [https://capnproto.org/index.html](https://capnproto.org/index.html)

~~~
mikepurvis
Capnproto is schema based rather than self-describing. So it's related, and it
does compete with formats like json and msgpack, but there isn't total overlap
in the applications.

~~~
twic
It would be nice to have a taxonomy, or at least a bestiary, of serialisation
formats. Tentatively:

Schema-driven no-compromise fast compact binary formats with no cross-version
compatibility: Cap'n Proto, FlatBuffers, SBE, ASN.1 PER, XDR, OMG CDR.

Schema-driven binary formats which allow some cross-version compatibility:
Protocol Buffers, Thrift.

Self-describing binary formats: MessagePack, CBOR, BJSON, Bencode, ASN.1 BER,
Avro (?), Fast Infoset, AMF3.

Self-describing textual formats: JSON, XML, YAML, TOML.

I'm using "self-describing" here to mean simply that you can recover the
structure of the encoded data without a separate schema, rather than that you
can attach any semantic meaning to it.

~~~
kentonv
> no cross-version compatibility: Cap'n Proto, FlatBuffers

This is incorrect: Cap'n Proto absolutely allows cross-version compatibility,
using roughly the same semantics as Protobuf. I believe FlatBuffers does too.
(I'm unsure about the rest, haven't studied them in a while.)

> I'm using "self-describing" here to mean simply that you can recover the
> structure of the encoded data without a separate schema, rather than that
> you can attach any semantic meaning to it.

Protobuf, Cap'n Proto, and probably several of the other binary formats can
parse data into a message tree without the help of a schema, but all the
fields will be labeled numerically. MessagePack is only considered "self-
describing" in comparison because in encodes human-readable field names on the
wire.

------
DmitryOlshansky
Needs honorable mentions to:

\- Avro

\- CBOR

\- SMILE

And BSON, anyone? I don't think many people besides MongoDB are using it,
though.

Yes, compressing JSON with a gzip-style compressor usually yields 0.5-1%
better results than an equally compressed binary format (in my limited
testing). Still, the serialization speed and the savings on compression are
great to have.

------
wittedhaddock
I implemented this in Swift w/o using the Foundation library a few years back
if anyone wants it:
[https://github.com/wittedhaddock/bytepress](https://github.com/wittedhaddock/bytepress)

------
tzs
OK, I give up...how is that two column list of languages organized? Why is the
Perl implementation [1] not listed?

[1]
[https://metacpan.org/pod/Data::MessagePack](https://metacpan.org/pod/Data::MessagePack)

------
rsynnott
So not at all like JSON, then?

------
kortilla
But is it readable as plaintext? That's one of the main appeals of JSON.

~~~
mrlala
Sure doesn't look like it..

I agree- that's why json is honestly great. I avoided it for the longest time,
but now I totally see the appeal.

~~~
jaywalk
What did you avoid it in favor of? XML?

------
ChuckMcM
I like it! Sort of the unholy love child of XDR and JSON :-).

Now we need an IDL that will let you define a structure and have it produce
<language> marshalling and unmarshalling routines.

------
sonicxxg
Missing: links to benchmarks. Also, JSON is not really a good baseline for
comparison. If you support binary data natively in the format, you should
compare it to BSON.

~~~
ants_a
Bson is just terrible. Just as a taste, arrays are encoded as key-value, where
key is the array index converted to a decimal string and then stored as zero
terminated string with 4 byte length prefixed to it. It boggles the mind why
somebody would design a format like this.

------
gravypod
Is there any comparison of the tooling/performance/etc of MessagePack, BSON,
Protobuffers, Flatbuffers, and plain JSON?

~~~
ludocode
I did some pretty extensive benchmarking of various schemaless serialization
libraries a few years back. All of these libraries have advanced quite a bit
since then so it's a bit out of date, but the relative speeds of MessagePack
vs JSON and BSON are probably still relevant:

[https://github.com/ludocode/schemaless-
benchmarks](https://github.com/ludocode/schemaless-benchmarks)

I haven't compared them to schema formats like Protobuf or FlatBuffers yet
because the use cases are pretty different. I like MessagePack for small
projects or rapid prototyping because you don't need to integrate any big
libraries or set up code generation as part of your buildsystem. (Mostly I got
sick of integrating the C++ Protobuf library into embedded projects.)

The MessagePack format is a lot simpler than Protobuf and the best
implementations are nowhere near as allocation-prone as the reference
implementation so I expect they would beat it flat out on performance, though
the messages may be slightly larger. They would probably beat FlatBuffers for
encoding speed as well, but I don't expect any schemaless format could beat
FlatBuffers for decoding speed.

------
etaioinshrdlu
It also supports binary data natively, unlike JSON. This means no more slow
base64 encoding/decoding.

~~~
dguaraglia
Not sure why you are being down voted, sending binary data is a well known
issue with JSON, and base64 can be pretty costly if using a naive
implementation.

In a previous life I inherited a service that shipped tons of data (billions
of requests a week) as base64 encoded protobuf strings over HTTP. It was a bad
solution in so many ways, but there were historical reasons why it had gotten
there. The system required a number of servers and I decided to do some
profiling to see if there were some quick gains that could be made. As it
turns out, about 60% of CPU time was spent decoding base64 using Python's
standard library. I was shocked.

------
erling
Apologies if I missed it, but what are the advantages of this over bEncode?

------
nabla9
Size and speed comparison between JSON.zst and MessagePack please.

~~~
kentonv
Why only compress the JSON? MessagePack will compress about as well as JSON
does.

~~~
nabla9
The sales pitch for MessagePack is: like JSON, but fast and small.

~~~
kentonv
Correct: Uncompressed MessagePack is faster and smaller than uncompressed
JSON, and compressed MessagePack is faster and smaller than compressed JSON.

I still don't see why you'd compare uncompressed MessagePack to compressed
JSON.

~~~
nabla9
Because I suspect that MessagePack, compressed or not, is not worth the
effort, and I won't believe otherwise until I see a comparison.

In other words, compressed MessagePack is probably only a tiny amount smaller
than compressed JSON.

~~~
kentonv
Sure, a comparison between compressed JSON vs. compressed MessagePack is
interesting.

I interpreted your original message as requesting a comparison between
compressed JSON vs. uncompressed MessagePack, which didn't make sense (but
which I see people ask a lot, including elsewhere in this thread). Sorry if I
misunderstood.

~~~
nabla9
> I interpreted your original message as requesting a comparison between
> compressed JSON vs. uncompressed MessagePack

You interpreted it correctly. And it makes sense: MessagePack is pitched as
the alternative to JSON, so compressed JSON (JSON.zstd) is the thing to beat
if I need compactness.

------
scandum
Bit of a shameless plug, but yet another alternative is VTON, though it is
typeless.

[https://github.com/scandum/vton](https://github.com/scandum/vton)

------
kstenerud
I'll throw my hat into the ring as well.

I've been building a new ad-hoc data format to replace JSON for a couple of
years [1], and am nearing completion of the reference implementation in go. It
natively supports the following types:

* Nil : No data (NULL)

* Boolean : True or false

* Integer : Positive or negative, arbitrary size

* Float : Binary or decimal floating point, arbitrary size

* Time : Date, time, or timestamp, arbitrary size

* URI : RFC-3986 URI

* String : UTF-8 string, arbitrary length

* Bytes : Array of octets, arbitrary length

* List : List of objects

* Map : Mapping keyable objects to other objects

* Markup : Presentation data, similar to XML

* Reference : Points to previously defined objects or other documents

* Metadata : Data about data

* Comment : Arbitrary comments about anything, nesting supported

But the most important feature is that it is a paired format: a binary format
[2] and a text format [3], which are 1:1 compatible. This allows you to
transmit in the binary format, and only convert to text when a human is
involved.

I've put together a quick comparison here:
[https://github.com/kstenerud/concise-encoding#comparison-
to-...](https://github.com/kstenerud/concise-encoding#comparison-to-other-
formats)

Currently, I'm finishing off the go implementation [4], which so far I've
managed to get running 30% faster than the json codec, using less than half
the memory. I'll be pushing the binary codec to master soon, and the text
codec shouldn't take much longer since the code is pretty modular.

[1] [https://github.com/kstenerud/concise-encoding#concise-
encodi...](https://github.com/kstenerud/concise-encoding#concise-encoding)

[2] [https://github.com/kstenerud/concise-
encoding/blob/master/cb...](https://github.com/kstenerud/concise-
encoding/blob/master/cbe-specification.md#concise-binary-encoding)

[3] [https://github.com/kstenerud/concise-
encoding/blob/master/ct...](https://github.com/kstenerud/concise-
encoding/blob/master/cte-specification.md#concise-text-encoding)

[4] [https://github.com/kstenerud/go-cbe/tree/new-
implementation](https://github.com/kstenerud/go-cbe/tree/new-implementation)

~~~
brokensegue
Why add special support for URIs and not any other types that can be easily
represented as strings e.g. UUID or ISBN

~~~
kstenerud
It's mostly a question of how common they are. Both uuid and ISBN can be
represented as URI. However, I'm still on the fence regarding uuids, because
they tend to get a lot of use in general... I may add it after all.

The idea is to make a format for the 80% case, so things like ISBN are
definitely out.

~~~
brokensegue
Explicitly supporting Uri has a performance cost with no clear size win.
Supporting UUID at least could win back some bytes.

------
dilann
Oh, it's turd polishing anyway. Numbers in JSON are a pain.

~~~
pjscott
The data types that msgpack supports aren't _quite_ the same as JSON --
msgpack has encodings for signed and unsigned 8, 16, 32, and 64-bit integers,
as well as single- and double-precision floats. (This runs into trouble if
you're dealing with JavaScript, which doesn't support the full 64-bit range of
integers, but everybody else should be okay.)

