
CBOR – Concise Binary Object Representation - tosh
https://cbor.io/
======
camgunz
CBOR is MessagePack. At least cbor-ruby started with the MessagePack sources.
The story is that Carsten took MessagePack, wrote a standard and added some
things he wanted, and called it something else.

I wrote [1] a pretty comprehensive (and admittedly biased) critique of the
CBOR standard years ago.

[1]
[https://news.ycombinator.com/item?id=14072598](https://news.ycombinator.com/item?id=14072598)

Disclaimer: I wrote and maintain a MessagePack implementation.

~~~
PopeDotNinja
In your opinion, what's the value of MessagePack? I worked with it when
writing some Fluentd tools, and it was neat, but I didn't love it enough to
switch from JSON on other projects. Maybe there's a killer feature I didn't
know I need?

~~~
Mordak
At my work we recently went through a large exercise to decide on a common
data storage format. The contenders were JSON, MessagePack, and Avro.
MessagePack won because:

\- Msgpack serialization and deserialization are very fast in many languages -
often 100x faster than JSON

\- Msgpack natively supports encoding binary data

\- Msgpack has type extensions, making it trivial to represent common types in
an efficient way (eg. IPv4 address, timestamps)

\- Msgpack has good libraries available in many languages

If you do not care about those things (no binary data, no need for extended
types, not performance critical) then JSON is just fine.
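
To make the binary-data and extension-type points concrete, here is a minimal hand-rolled sketch (not a real MessagePack library) that emits two MessagePack formats, "bin 8" and "fixext 4", and compares the sizes with what JSON needs. The extension type number `IPV4_EXT_TYPE` is a hypothetical application-defined value, and the helper names are made up for illustration.

```python
import base64
import ipaddress
import json


def msgpack_bin8(data: bytes) -> bytes:
    """Encode raw bytes in MessagePack 'bin 8': 0xc4, 1-byte length, payload."""
    if len(data) > 0xFF:
        raise ValueError("bin 8 holds at most 255 bytes")
    return bytes([0xC4, len(data)]) + data


IPV4_EXT_TYPE = 1  # hypothetical application-defined extension type


def msgpack_ipv4(addr: str) -> bytes:
    """Encode an IPv4 address as 'fixext 4': 0xd6, type byte, 4 data bytes."""
    return bytes([0xD6, IPV4_EXT_TYPE]) + ipaddress.IPv4Address(addr).packed


blob = bytes(range(32))

# JSON has no binary type, so the usual workaround is a base64 string.
as_json = json.dumps(base64.b64encode(blob).decode("ascii")).encode("ascii")
as_msgpack = msgpack_bin8(blob)

print("32 raw bytes:", len(as_json), "bytes in JSON,", len(as_msgpack), "in MessagePack")
print("IPv4 address:", len(json.dumps("192.168.0.1")), "bytes in JSON,",
      len(msgpack_ipv4("192.168.0.1")), "in MessagePack")
```

In practice you would use an existing MessagePack library rather than building bytes by hand; the sketch only shows why binary payloads and small fixed-size types come out compact.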

~~~
pdimitar
I'm curious why you didn't consider FlatBuffers as well.

~~~
Mordak
FlatBuffers are not self-describing.

FlatBuffers, Protobuf, Cap'n Proto, etc., all require an external schema
configuration that you compile into a code chunk that you include into your
program. Without this it is impossible to make sense of the data. In our case,
the data is semi-structured and changes frequently. The prospect of
maintaining a schema registry for all the data users and keeping everyone up
to date and backwards compatible is enough of a burden that it was excluded.

Avro also uses schemas, but since the schema is embedded in the file it is
self-describing so the reader does not need to do anything special to
interpret the data. But Avro's C library is buggy and the python
deserialization performance was terrible, so Avro was not selected.
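
A rough illustration of the "self-describing" distinction, using only the Python standard library as a stand-in: `struct` plays the role of a schema-based format (the layout lives outside the data) and `json` plays the role of a self-describing one. This is not how FlatBuffers or Avro actually lay out bytes; `RECORD_FORMAT` is a hypothetical schema invented for the sketch.

```python
import json
import struct

record = {"sensor": 7, "reading": 513}

# Schema-based stand-in: field order and widths exist only in RECORD_FORMAT.
RECORD_FORMAT = "<HH"  # hypothetical external schema: two little-endian uint16 fields
packed = struct.pack(RECORD_FORMAT, record["sensor"], record["reading"])

# Self-describing stand-in: field names and types travel with the data.
described = json.dumps(record).encode("utf-8")

# A reader holding the wrong schema does not fail; it silently gets other values.
wrong = struct.unpack("<I", packed)
right = struct.unpack(RECORD_FORMAT, packed)

# A reader of the self-describing form needs no out-of-band information.
decoded = json.loads(described)

print(len(packed), "bytes with external schema;", len(described), "bytes self-describing")
print("wrong schema:", wrong, "right schema:", right, "self-describing:", decoded)
```

The trade-off is visible in the sizes: the schema-based bytes are smaller, but every reader has to be kept in sync with the schema.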

------
userbinator
It is odd to read in the RFC section 1.1 this sentence,

 _The format should use contemporary machine representations of data_

...and then see in section 1.2,

 _All multi-byte values are encoded in network byte order (that is, most
significant byte first, also known as "big-endian")_

I know it's largely a choice of tradition, but it seems almost anachronistic
to specify any new protocols as BE when LE is the overwhelming majority of
machines today, and probably has been for at least the past two decades.
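
For reference, a small sketch of what the big-endian rule means for an encoder. It covers only the case of a CBOR unsigned integer with a 16-bit argument; `cbor_uint16` is a hypothetical helper name, not part of any library.

```python
import struct
import sys


def cbor_uint16(n: int) -> bytes:
    """Encode n as a CBOR unsigned integer with a 2-byte argument.

    0x19 is major type 0 (unsigned integer) with additional information 25,
    which means "a 16-bit value follows, most significant byte first".
    """
    return b"\x19" + struct.pack(">H", n)


encoded = cbor_uint16(1000)
print("on the wire (big-endian):", encoded.hex())
print("native little-endian layout would be:", struct.pack("<H", 1000).hex())
print("this machine is:", sys.byteorder)

# Decoding on a little-endian machine implies a byte swap, hidden here
# behind int.from_bytes.
value = int.from_bytes(encoded[1:], "big")
print("decoded:", value)
```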

~~~
kbumsik
It is also weird that "network byte order" is considered to be BE. Is there any
advantage to BE when data is transferred over the network?

~~~
rhn_mk1
If I remember correctly, BE helps in switching networks for routing packets. A
packet would start with the big-endian destination address. If that address
was hierarchical, i.e. most significant bytes signify the network (interface)
it was in (like IP), then only a few of the first bits need to be processed in
order to find which interface to direct the packet to. The transmission out
therefore begins just after the few first bits are received.

E.g. for a switch connected to networks A: 0xa, B: 0x1, C: 0x3, only the
first hex digit 0x1 of a packet destined to 0x1234 needs to be processed before
forwarding, saving some time compared to LE, where the entire address 0x4321
would have to be received to find out that it belongs to the 0x1 network.
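
A toy model of that argument, assuming a 16-bit destination address that arrives one byte at a time and a routing table keyed on the top hex digit. The `ROUTES` table and the `route` function are invented for illustration and say nothing about how real switches are built.

```python
# Hypothetical routing table: top hex digit of the address -> interface.
ROUTES = {0xA: "A", 0x1: "B", 0x3: "C"}


def route(wire_bytes: bytes, byteorder: str):
    """Return (interface, bytes_received) for a 16-bit address read serially."""
    received = bytearray()
    for byte in wire_bytes:
        received.append(byte)
        if byteorder == "big":
            # The most significant byte arrives first, so decide immediately.
            return ROUTES[received[0] >> 4], len(received)
        if len(received) == 2:
            # Little-endian: the deciding byte is the last one to arrive.
            return ROUTES[received[1] >> 4], len(received)
    raise ValueError("address truncated")


destination = 0x1234
print(route(destination.to_bytes(2, "big"), "big"))
print(route(destination.to_bytes(2, "little"), "little"))
```

With big-endian order the decision is available after one byte; with little-endian order both bytes have to arrive first.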

~~~
baybal2
Registers on anything today will be way wider than a few bytes, even more so
on network hardware.

The gate savings from endianness-specific circuits are close to zero in
comparison to the many other things a typical logic block comes with today.

~~~
bestouff
It's not intended to save gates but time. Bits are (were) serially encoded on
the wire, so you could start switching sooner, before even having received the
whole header.

~~~
baybal2
I understand the rationale, but all that hardware today will probably get more
bits on input in a single clock cycle than at the time this was a concern.

Today, you have to process way way more bytes in a single clock cycle anyways.

Internally, for a chip designer, almost all modern high speed serial busses
look way wider than a single byte. And all of its serialness is kept inside
the transmitter/serdes/interface blocks without any external exposure.

------
dang
Thread from 2016:
[https://news.ycombinator.com/item?id=10995726](https://news.ycombinator.com/item?id=10995726)

2015:
[https://news.ycombinator.com/item?id=9597198](https://news.ycombinator.com/item?id=9597198)

2013:
[https://news.ycombinator.com/item?id=6932089](https://news.ycombinator.com/item?id=6932089)

2013:
[https://news.ycombinator.com/item?id=6632576](https://news.ycombinator.com/item?id=6632576)
(the largest)

------
eadan
A recent discussion here on Latacora's "How (not) to sign a JSON object" [0]
had me thinking of CBOR. Unlike JSON, MsgPack, protobufs, BSON, or any other
commonly used data interchange format that I'm aware of, CBOR has a canonical
representation (although with some apparent ambiguity in float representation) [1].

Anyone have any thoughts on using canonical CBOR for object signing?
Currently, I'm building a system with a content-addressable data store, and
I'm particularly interested in data formats with a canonical form for this
use-case.

[0]
[https://news.ycombinator.com/item?id=20516489](https://news.ycombinator.com/item?id=20516489)

[1]
[https://tools.ietf.org/html/rfc7049#section-3.9](https://tools.ietf.org/html/rfc7049#section-3.9)
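
A minimal sketch of what the canonical rules in RFC 7049 section 3.9 look like for a content-addressable store, assuming only integers, byte strings, text strings, lists and maps are needed. Floats are deliberately left out, since that is where the ambiguity mentioned above lives. The function names are hypothetical, and this is not a complete or audited encoder.

```python
import hashlib


def _head(major: int, n: int) -> bytes:
    """Initial byte plus the shortest possible big-endian argument."""
    if n < 24:
        return bytes([(major << 5) | n])
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * size):
            return bytes([(major << 5) | additional_info]) + n.to_bytes(size, "big")
    raise ValueError("integer does not fit in 64 bits")


def encode(obj) -> bytes:
    """Canonical CBOR for a small subset of types (sketch only)."""
    if isinstance(obj, bool):
        raise TypeError("booleans are outside this sketch")
    if isinstance(obj, int):
        return _head(0, obj) if obj >= 0 else _head(1, -1 - obj)
    if isinstance(obj, bytes):
        return _head(2, len(obj)) + obj
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        return _head(3, len(data)) + data
    if isinstance(obj, list):
        return _head(4, len(obj)) + b"".join(encode(item) for item in obj)
    if isinstance(obj, dict):
        pairs = [(encode(k), encode(v)) for k, v in obj.items()]
        # Section 3.9: shorter encoded keys first, then bytewise order.
        pairs.sort(key=lambda kv: (len(kv[0]), kv[0]))
        return _head(5, len(pairs)) + b"".join(k + v for k, v in pairs)
    raise TypeError(f"unsupported type: {type(obj).__name__}")


def content_address(obj) -> str:
    """SHA-256 over the canonical bytes; the hash choice is an assumption."""
    return hashlib.sha256(encode(obj)).hexdigest()


print(encode({"b": 2, "a": 1}).hex())
print(content_address({"b": 2, "a": 1}) == content_address({"a": 1, "b": 2}))
```

Because key order and integer width are fixed by the rules, two logically equal maps hash to the same address regardless of insertion order.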

~~~
dwaite
Generally, there isn't an efficient object model for CBOR (three really
troublesome features are the use of arbitrary CBOR structures as map keys,
negative integers with a full 64-bit magnitude, and semantic tagging resulting
in data being represented in an alternative form, e.g. a BigDecimal type rather
than a binary array).

As a result, round-tripping through a CBOR implementation still may result in
data structure changes. Depending on the type of change and any exploits in
say the hashing algorithm, this could be a security issue.

On the flip side, you can just tag a byte array as CBOR data, and sign it.
Unlike JSON, you don't need to perform an encoding/escaping to make one
document safe to embed into another document.
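
A short sketch of that last point, assuming tag 24 ("encoded CBOR data item") is used for the wrapper. HMAC-SHA256 stands in for a real signature scheme, `KEY` is a hypothetical demo key, and the helpers only handle payloads up to 255 bytes.

```python
import hashlib
import hmac
import json


def cbor_bstr(data: bytes) -> bytes:
    """CBOR byte string (major type 2); sketch limited to 255 bytes."""
    if len(data) < 24:
        return bytes([0x40 | len(data)]) + data
    if len(data) <= 0xFF:
        return b"\x58" + bytes([len(data)]) + data
    raise ValueError("sketch only handles payloads up to 255 bytes")


def embed_cbor(inner: bytes) -> bytes:
    """Wrap already-encoded CBOR in tag 24 (0xd8 0x18) without re-encoding it."""
    return b"\xd8\x18" + cbor_bstr(inner)


KEY = b"demo-key"  # hypothetical key, for illustration only
inner = bytes.fromhex("a2616101616202")  # CBOR for {"a": 1, "b": 2}

# The MAC is computed over the exact bytes that get embedded.
signature = hmac.new(KEY, inner, hashlib.sha256).digest()
envelope = embed_cbor(inner)
print("inner bytes survive verbatim:", inner in envelope)

# JSON, by contrast, has to escape a document to embed it as a string.
inner_json = json.dumps({"a": 'x"y'})
outer_json = json.dumps({"payload": inner_json})
print("JSON text survives verbatim:", inner_json in outer_json)
```

A verifier can slice the inner bytes out of the envelope and check them directly, without decoding and re-encoding the structure.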

------
nlohmann
Shameless self-plug: JSON for Modern C++
([https://github.com/nlohmann/json](https://github.com/nlohmann/json))
supports CBOR alongside MessagePack, UBJSON, and BSON; see
[https://github.com/nlohmann/json#binary-formats-bson-cbor-
me...](https://github.com/nlohmann/json#binary-formats-bson-cbor-messagepack-
and-ubjson).

------
schoen
I'm curious how this compares to

* Cap'n Proto

* ASN.1

* gzip-compressed JSON

in various ways. (I don't know much about progress in serialization methods.)

~~~
thayne
Cap'n'proto and ASN.1 require a schema. Gzip compression means the content has
to be decrypted into json and then parsed into a native representation, which
probably requires more memory and cpu than cbor deserialization.

~~~
edoceo
Not decrypted. Decoded or decompressed.

~~~
thayne
Oops, I meant decompressed.
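
The two-step cost being described looks roughly like this with the Python standard library; the sample document is made up.

```python
import gzip
import json

doc = {"readings": list(range(50))}
compressed = gzip.compress(json.dumps(doc).encode("utf-8"))

# Step 1: decompress back to JSON text, which is a full intermediate copy.
text = gzip.decompress(compressed).decode("utf-8")

# Step 2: parse that text into native objects.
restored = json.loads(text)

print(len(compressed), "bytes compressed,", len(text), "bytes of intermediate JSON")
```

A binary format such as CBOR skips step 1 and decodes straight from the received bytes, at the price of not getting general-purpose compression for free.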

------
nneonneo
Neat. One little thing I gotta wonder about: was this name perchance a
backronym for the author’s name Carsten Bormann (CBor)? Not that I’m
complaining, but that does seem like an odd coincidence.

Also, how does this compare to JSON binary serialization, such as BSON?

~~~
starbugs
Regarding your first question, yes. Source: I was a student at the
University of Bremen and attended some of his lectures (I remember at least
one titled "Physikalisch-technische Grundlagen Digitaler Medien").

------
kbumsik
For anyone who hasn't seen CBOR: it is kind of a binary version of JSON, like
MsgPack. CBOR is already mature, as it is used internally by web browsers via a
web standard [1] (and yes, Signed HTTP is controversial though...)

Because it is pretty much optimized for machine readability and small size, one
of its uses is on microcontrollers for IoT, with CBOR + CoAP (kind of HTTP for
IoT), although I wouldn't say it is common yet, since CoAP needs IPv6.

[1]: [https://wicg.github.io/webpackage/draft-yasskin-http-
origin-...](https://wicg.github.io/webpackage/draft-yasskin-http-origin-
signed-responses.html#cbor-representation)

~~~
magicalhippo
I recently wanted to use something JSON-RPC'ish for communication between host
PC and my microcontroller. I looked into CBOR/MessagePack as well as UBJSON.

I didn't find a CBOR, MessagePack, or UBJSON implementation that was
microcontroller-friendly and easy to use.

In the end I ended up just using plain JSON. Easy to debug as you can easily
see what's on the wire, easy to implement, relatively small code size.

~~~
kbumsik
Most CBOR implementations for MCUs tend to be a part of RTOSs, integrated and
optimized for their own memory allocation libraries. But many of them are
based on Intel's TinyCBOR:
[https://github.com/intel/tinycbor](https://github.com/intel/tinycbor)

------
kstenerud
I've been working on a general purpose binary and text encoding format for
some time now as a personal project. It supports time, decimal floats, binary
blobs as first-class citizens, supports comments, and focuses on encoding the
most commonly used values more compactly. The binary format [1] is almost done
(I've come up with a more compact date representation that I'll be adding this
weekend to replace smalltime in the C and Go implementations), and the text
format isn't far behind.

I almost had a better floating point compression format, but it turned out to
be too complicated, and only works well for decimal floating point, so I'll
probably not use it in CBE.

[1] [https://github.com/kstenerud/concise-
encoding/blob/master/cb...](https://github.com/kstenerud/concise-
encoding/blob/master/cbe-specification.md)

~~~
tjalfi
Have you considered reserving some opcode space for future extensions?

This would make it easier for CBEv2 to add types without changing how a byte
is interpreted.

~~~
kstenerud
Yes, I will be reserving space once I finish with the new date format.

------
mantap
I tried CBOR, but honestly the JS libraries available are not yet of production
quality, so I had to abandon it and go back to MessagePack.

------
ktpsns
Fun fact: CBOR is both the acronym for the format and the initials of its
author, Professor Carsten Bormann
([https://www.informatik.uni-bremen.de/~cabo/](https://www.informatik.uni-bremen.de/~cabo/))

------
Zamicol
COSE, which uses CBOR, is the JOSE (commonly known by its subset JWT) of small
binary messaging.
[https://tools.ietf.org/html/rfc8152](https://tools.ietf.org/html/rfc8152)

~~~
nmadden
It is, although COSE is significantly different to JOSE in many ways.

------
larodi
This has been around for a while... including the website. Question: is anyone
actually/effectively using it in a commercial project?

------
throwaway3627
BINC, MsgPack, ...

