
RFC 7049 - Concise Binary Object Representation (CBOR) - yorhel
http://tools.ietf.org/html/rfc7049
======
jameshart
RFCs don't amount to much without adoption. The RFC database is full of
protocols with grand designs and seemingly broad applicability. Look at the
"Extensible Provisioning Protocol", EPP -
[http://tools.ietf.org/html/rfc5730](http://tools.ietf.org/html/rfc5730) \- a
protocol "for the provisioning and management of objects stored in a shared
central repository." \- it reads as a marvelously generic protocol for client-
managed key-value data storage - maybe it's suited for caching systems, or
cloud BLOB storage, or as an abstraction of dropbox... but in reality it's
just the protocol used by internet domain registrars to manage domain name
registrations on a registry server - the nichiest of niche applications, yet
the subject of a dozen RFCs. It's not going to be picked up and supported by
Hadoop or Dropbox or anybody else who needs client managed obect storage,
they're going to stick to HTTP REST.

This CBOR format is being proposed by the VPN Consortium - presumably there's
some specific VPN interoperability application they have in mind for this. In
the meantime, everybody else will continue to use compressed JSON, or protocol
buffers, or whatever other standards have good library support and
interoperability and - crucially - _adoption_ in their domain.

~~~
na85
I agree with all the points you've made, and I haven't read this RFC beyond a
quick skim, but consider:

- a lot of the time, a dearth of implementations of a new Thing is not because
the new Thing is bad, but simply because people are change-averse and lazy,
even in the face of an objectively better Thing, and

- I still consider this a quality submission; even if CBOR doesn't get adopted
it's still neat to read. It's like watching one's government draft new
legislation, except more relevant.

------
ghoul2
There is one significant problem I see:

The length field for compound types (arrays and maps) specifies the length in
"number of items", not in bytes. This means that while processing, if I need
to skip a compound type, I actually need to process it in its entirety. Not
very "small device" friendly.

In practice, I have found far more utility in knowing the byte length of a
compound field in advance than the number of items it contains. If I am
interested in the field, I am going to find out the number of items anyway,
because I am going to process it. If I am not interested in the field, the
number of items is useless to me, but the byte length would have come in
handy.
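To make the cost concrete, here is a minimal sketch in Python over a tiny subset of CBOR (small unsigned ints, definite-length strings, arrays, and maps); `item_length` is a hypothetical helper, not part of any library. For strings the header's argument counts bytes, so a skip is a single arithmetic step, but for arrays and maps it counts items, so a skip has to walk every nested element:

```python
def item_length(buf, pos=0):
    """Return the total byte length of the CBOR item starting at pos.
    Sketch only: handles small unsigned ints, definite-length byte/text
    strings, arrays, and maps, with arguments of at most two bytes."""
    major, info = buf[pos] >> 5, buf[pos] & 0x1F
    # Decode the argument and how many header bytes it occupies.
    if info < 24:
        arg, head = info, 1
    elif info == 24:
        arg, head = buf[pos + 1], 2
    elif info == 25:
        arg, head = int.from_bytes(buf[pos + 1:pos + 3], "big"), 3
    else:
        raise NotImplementedError("longer arguments omitted from this sketch")
    if major in (0, 1):        # integers: the header is the whole item
        return head
    if major in (2, 3):        # byte/text string: argument counts BYTES
        return head + arg      # ... so skipping is O(1)
    if major in (4, 5):        # array/map: argument counts ITEMS (or pairs)
        count = arg if major == 4 else 2 * arg
        n = head
        for _ in range(count):  # ... so skipping must walk every element
            n += item_length(buf, pos + n)
        return n
    raise NotImplementedError("tags and simple values omitted")
```

Encoding [1, "ab", [2, 3]] as 0x83 01 62 61 62 82 02 03 makes the point: the skipper has to recurse into the inner array, whereas a byte-length prefix would have let it jump straight past all 8 bytes.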

~~~
mwcremer
I think the reasoning here is that the sender may not be able to compute the
byte size of the object a priori. Think HTTP chunked encoding.

~~~
ghoul2
I understand that is a concern in many situations. The problem here, though,
is that you don't get the "streaming" benefits in any case: you still have to
include the length-in-number-of-items of the compound type, and the lengths
of each individual member item, either way.

------
ctz
Avoiding the need for protocol version negotiation might be a useful feature
in some systems, but it seems to me that the things you lose make it really
not worth it. Particularly, a protocol without atoms invariably ends up like
most JSON APIs -- very 'stringly typed', somewhat poorly defined, and verbose
on the wire.

Which is strange for a thing calling itself 'concise'.

~~~
craftit
It does seem an odd trade-off. Having key-value pairs is great for
prototyping, and the keys make it easier for people to interpret the messages
and to write code to use them. On the other hand, repeatedly sending readable
keys seems a huge waste. I guess when streaming you could send a header with
a map in it, but that makes things complicated....

~~~
eli
Could something like GZIP mask a lot of that repetition?

~~~
craftit
Definitely, but at the cost of requiring more memory and processing power. If
the goal is to be compact and lightweight, I think it would be helpful to have
something in the protocol itself.
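The gzip suggestion is easy to check with nothing but the standard library; this is only an illustration of how well repeated keys compress, not anything CBOR-specific:

```python
import json
import zlib

# 100 records that all repeat the same two string keys.
rows = [{"temperature": i, "humidity": 2 * i} for i in range(100)]
raw = json.dumps(rows).encode()
packed = zlib.compress(raw)
# The repeated keys all but vanish in the compressed stream, at the cost
# of the extra memory and processing mentioned above.
```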

Looking at the spec, though, it does allow numeric keys in maps, so you could
use ids and provide the definitions elsewhere.
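A back-of-the-envelope comparison with a hand-rolled encoder for a tiny CBOR subset (small unsigned ints, short text strings, small maps); `enc` is illustrative, not a library function:

```python
def enc(v):
    """Hand-rolled encoder for a tiny CBOR subset: unsigned ints < 24,
    text strings shorter than 24 bytes, maps with fewer than 24 pairs."""
    if isinstance(v, int) and 0 <= v < 24:
        return bytes([v])                    # major type 0, value in header
    if isinstance(v, str) and len(v.encode()) < 24:
        b = v.encode()
        return bytes([0x60 | len(b)]) + b    # major type 3, length in header
    if isinstance(v, dict) and len(v) < 24:
        out = bytearray([0xA0 | len(v)])     # major type 5, pair count in header
        for k, val in v.items():
            out += enc(k) + enc(val)
        return bytes(out)
    raise NotImplementedError("outside the sketch's subset")

named = enc({"temp": 21, "mode": 2})  # readable string keys
coded = enc({1: 21, 2: 2})            # numeric ids agreed out of band
```

With an agreed schema the numeric-key map is 5 bytes against 13, and the saving repeats with every message.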

------
huhtenberg
Serialization formats are like indentation styles. Dead easy to pick or invent
one, nearly impossible to convince others to switch to it.

~~~
albertzeyer
Yes! For that reason, I also have my own:
[https://github.com/albertz/binstruct](https://github.com/albertz/binstruct)

------
stevecooperorg
Anyone care to comment on where we might use such a thing? Is it already in
use? And does it compare favourably with BSON?

~~~
lucian1900
It looks closer to msgpack if anything, but with actual strings and bytes.

~~~
memracom
It is inspired by msgpack. If you read the RFC then you will see that the
authors like msgpack but have some different requirements.

------
kabdib
This gets a surprising number of things right. I've worked on a couple of
these. In particular I'm delighted to see both the definite and indefinite
streams of things.

I'm a little bit tired (well, more than a little tired) of standards that
aren't couched in terms that are directly executable. English descriptions and
pseudo-code are fine, but in the end I want to have some working code that
implements an API for the stuff. It doesn't have to be an official API, but
something usable shows me that (a) it is indeed usable, and (b) it will go a
long way towards heading off other people's mistakes.

We don't do crypto without test vectors. I don't know why we think we can do
other complex standards without test vectors, either. (I worked on NBS / NIST
in the 70s on some verification suites. Have we lost that practice?)

I think that much of what is busted on the modern web can be traced back to
loose English and a lack of reference code (even stuff with placeholders).
CSS, HTML, etc., I'm looking at you... :-/

~~~
tveita
> In particular I'm delighted to see both the definite and indefinite streams
> of things.

Why? I can see the advantages of either one, but I don't see what having both
gets you.

In my experience the implementation advantages of having length-prefixed lists
disappear if you have to support indefinite lengths anyway.

~~~
kabdib
I want to use the same data structures for

- Passing small messages around

- Doing streaming of large content (occasionally)

I'm probably doing these over different pipes, but the data shares a lot of
the same characteristics and I don't want to use two totally different APIs to
get the job done.

"Large" can be "I need to transfer something on the order of megabytes using a
4K intermediate buffer."
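A sketch of what that looks like on the wire in Python (`stream_bytes` is hypothetical): an indefinite-length byte string is a bare 0x5F header, then any number of definite-length chunks sized to the sender's buffer, then a 0xFF "break", so the same major type serves both the small-message and the streaming case:

```python
def stream_bytes(chunks):
    """Emit a CBOR indefinite-length byte string piece by piece.
    Sketch only: assumes each chunk is shorter than 2**16 bytes."""
    yield b"\x5f"                                        # major 2, indefinite
    for c in chunks:
        if len(c) < 24:
            yield bytes([0x40 | len(c)]) + c             # length in the header
        else:
            yield b"\x59" + len(c).to_bytes(2, "big") + c  # 2-byte length
    yield b"\xff"                                        # the "break" code

# A large payload goes out through fixed-size (here 4K) buffers:
wire = b"".join(stream_bytes(b"\x00" * 4096 for _ in range(4)))
```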

------
roncohen
They should do the world a favor and include a datetime type.

~~~
Someone
What's terribly wrong with
[http://tools.ietf.org/html/rfc7049#section-2.4.1](http://tools.ietf.org/html/rfc7049#section-2.4.1)?

~~~
drdaeman
The minor issues are missing timezone and precision information.

But, most importantly, using integers for datetime values hides type-level
semantics. It's just integers, and you, the end user, and not the
deserializer, are responsible for handling the types.

I think it's quite inconvenient to do tons of `data["since"] =
parse_datetime(data["since"])` all the time, for every model out there.

~~~
Someone
But they also allow strings with time zone information:

 _" Tag value 0 is for date/time strings that follow the standard format
described in [RFC3339], as refined by Section 3.3 of [RFC4287]."_

RFC4287: _" A Date construct is an element whose content MUST conform to the
"date-time" production in [RFC3339]. In addition, an uppercase "T" character
MUST be used to separate date and time, and an uppercase "Z" character MUST be
present in the absence of a numeric time zone offset."_
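Concretely, a Python sketch of tag 1 (`enc_epoch` is hypothetical, and assumes whole seconds that fit in four bytes); tag 0 would instead be 0xC0 followed by the RFC 3339 text string. The bytes below match the RFC's own example for 2013-03-21T20:04:00Z:

```python
import datetime

def enc_epoch(dt):
    """Encode a CBOR tag 1 (epoch-based date/time).
    Sketch only: assumes 0 <= seconds < 2**32 and whole seconds."""
    secs = int(dt.timestamp())
    return b"\xc1\x1a" + secs.to_bytes(4, "big")  # tag 1, then a 4-byte uint

stamp = enc_epoch(datetime.datetime(2013, 3, 21, 20, 4, 0,
                                    tzinfo=datetime.timezone.utc))
```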

------
craftit
Looks like a good spec - great as a way of sending data to 'Internet of
Things' style devices where processing power and possibly bandwidth are issues.

------
MichaelGG
Can anyone enlighten me on why number equivalency is a good idea? The spec
says that even if you're expecting an integer like 0, encoders can decide to
use floating point, and things should just work. One of the first statements
is that "7" should be able to be represented in multiple ways. That doesn't
seem concise.

~~~
arh68
Hmm, that's not the impression I got. I don't think they're arguing you
_should_ use multiple encodings willy-nilly. Rather, they're avoiding the
limit of exactly one encoding for every input (maximum flexibility in the
spec).

 _Of course, in real-world implementations, the encoder and the decoder will
have a shared view of what should be in a CBOR data item. For example, an
agreed-to format might be "the item is an array whose first value is a UTF-8
string, second value is an integer, and subsequent values are zero or more
floating-point numbers" or "the item is a map that has byte strings for keys
and contains at least one pair whose key is 0xab01"._

7 is 7 whether it's uint_8 or uint_32, right?
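The flexibility in question, as a Python sketch (`dec_number` is illustrative and covers only a slice of the format): a generic decoder accepts any of these forms, while the spec's canonical-encoding guidance tells encoders to prefer the shortest one:

```python
import struct

def dec_number(data):
    """Decode a few of the forms an encoder could legally pick for a number.
    Sketch only: unsigned ints up to two bytes, plus 32-bit floats."""
    major, info = data[0] >> 5, data[0] & 0x1F
    if major == 0:                                   # unsigned integer
        if info < 24:
            return info
        if info == 24:
            return data[1]
        if info == 25:
            return int.from_bytes(data[1:3], "big")
    if major == 7 and info == 26:                    # single-precision float
        return struct.unpack(">f", data[1:5])[0]
    raise NotImplementedError("outside the sketch's subset")

# Four different byte strings, one value:
forms = [b"\x07", b"\x18\x07", b"\x19\x00\x07", b"\xfa\x40\xe0\x00\x00"]
```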

~~~
arh68
Also, there's actually a much more relevant section later on in the spec (just
got to p18):

       For constrained
       applications, where there is a choice between representing a specific
       number as an integer and as a decimal fraction or bigfloat (such as
       when the exponent is small and non-negative), there is a quality-of-
       implementation expectation that the integer representation is used
       directly.

------
thomseddon
I would really love to see a convergence of such binary formats, I hate that
choosing between Google's Protocol Buffers, Apache (Facebook) Thrift etc.
forces you down a very specific path of non-interoperable libraries.

I would like to see how this compares to other formats with respect to
serialised size...

------
slavio
Any JSON-style object encoding format would greatly benefit from compression.
It does not have to be complicated: even something as simple as a dictionary
array of "symbols", whose indices can be used instead of repeating string
values, would go a long way.
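A sketch of the dictionary idea in plain Python (nothing CBOR-specific; `pack` is hypothetical): intern every string once in a shared table and ship small indices instead:

```python
def pack(records):
    """Replace repeated strings (keys and string values) with indices
    into a shared symbol table. Illustrative only."""
    symbols, index, out = [], {}, []

    def sym(s):
        # First sighting appends to the table; later sightings reuse it.
        if s not in index:
            index[s] = len(symbols)
            symbols.append(s)
        return index[s]

    for rec in records:
        out.append({sym(k): sym(v) if isinstance(v, str) else v
                    for k, v in rec.items()})
    return symbols, out

table, rows = pack([{"status": "ok"}, {"status": "ok"}, {"status": "err"}])
```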

------
LeafStorm
This looks like a fairly well-designed format. My main concern is that this
seems to have suddenly appeared out of nowhere and gone directly to RFC.
(Presumably there was an Internet-Draft, but I have never seen anything about
this before.)

~~~
ape4
These kinds of binary formats always have vulnerabilities, e.g.
[http://technet.microsoft.com/en-
us/security/bulletin/ms04-00...](http://technet.microsoft.com/en-
us/security/bulletin/ms04-007)

~~~
AsymetricCom
It would be up to the parser to implement the standard without a
vulnerability, but a protocol is a language, and a language can be designed
to be self-referential, hypocritical, inconstant, etc., making a conforming
parser impossible. A lot of these so-called "living standards" are probably
not "evolving" so much as partially classified.

------
430gj9j
In the examples, 0x3bffffffffffffffff decodes to -18446744073709551616, which
doesn't fit into int64_t. Why didn't they switch to bignums after INT64_MIN
(-9223372036854775808) instead? Seems a bit asymmetric.
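The asymmetry is baked into the encoding rule: major type 1 stores an unsigned n and means -1 - n, so an 8-byte argument reaches down to -2^64 while int64_t's positive side stops at 2^63 - 1. A quick check in Python:

```python
raw = bytes.fromhex("3bffffffffffffffff")
major = raw[0] >> 5                     # 1: negative integer
n = int.from_bytes(raw[1:], "big")      # the 8-byte argument, 2**64 - 1
value = -1 - n                          # CBOR's rule for major type 1
```

A decoder limited to int64_t therefore has to reject the tail of this range or promote it to a bignum, which is exactly the corner being pointed out.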

------
YesThatTom2
I was really starting to like capnproto.com

------
mbq
At least it doesn't copy the JSON's braindead idea to rule out NaNs and
Infs...

~~~
angersock
Wait, what? I hadn't heard about that. Whatfuck?

~~~
mbq
See: [http://stackoverflow.com/questions/1423081/json-left-out-
inf...](http://stackoverflow.com/questions/1423081/json-left-out-infinity-and-
nan-json-status-in-ecmascript)

In short, this is because JS does not treat NaN and Infinity as numeric
constants but as pre-defined, mutable variables; backward-compatible parsing
of hypothetical sane JSON with eval would therefore be vulnerable to
injection. Nevertheless, many JSON codecs have their own idea of what to do
with it, so this stuff can get really nasty.
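The mess is visible even in Python's standard library: `json.dumps` emits the non-standard JavaScript tokens by default, and only `allow_nan=False` makes it honest about the limitation:

```python
import json

# Default behaviour: output that no strict JSON parser should accept.
loose = json.dumps([float("nan"), float("inf")])

# Strict behaviour: refuse to encode at all.
try:
    json.dumps(float("nan"), allow_nan=False)
    strict_ok = True
except ValueError:
    strict_ok = False
```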

------
otikik
I like it. Fighting the urge to write a parser for it in my language of
choice.

~~~
michaelmior
I couldn't fight it off
[https://github.com/michaelmior/pycobr](https://github.com/michaelmior/pycobr)

I've only got the encoder so far (without major type 6, i.e. tagging), and
the code is pretty messy and possibly not 100% correct, but it's true that
the amount of code required is pretty minimal.

~~~
michaelmior
Update: Fixed a bunch of bugs in the encoder and have a working decoder as
well. Still no tagging, but you can encode/decode pretty much anything you
could with a naive JSON implementation.

