
CBOR – Concise Binary Object Representation - robin_reala
http://cbor.io/
======
Animats
It's not as bad as ASN.1. But when you need a "strict mode" for a binary
format, and otherwise allow ambiguous decoding, there's something wrong.
Something that probably can be exploited. The RFC admits this.

How does the density compare with JSON run through GZIP compression?
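For the gzip side of that comparison, here's a quick stdlib-only sketch (the sample document is made up; the CBOR side would need a third-party library such as cbor2, not shown here):

```python
import gzip
import json

# a hypothetical sample document, just for a size comparison
doc = {"name": "example", "values": list(range(100)), "ok": True}

as_json = json.dumps(doc, separators=(",", ":")).encode("utf-8")
gzipped = gzip.compress(as_json)

print(len(as_json), len(gzipped))  # raw JSON vs gzipped JSON sizes
```

On small messages the ~18 bytes of gzip header/trailer overhead can dominate, which is where a compact binary encoding tends to win without needing a decompression pass.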

~~~
galonk
It seems rather that the spec allows, but doesn't require, lenience in what
data a decoder accepts. If you want to write a decoder that always errors
when given invalid data (that is, one that is always "strict"), that's fine. But
if you're going to be liberal in what you accept, you are required to also
implement a strict mode.
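One concrete ambiguity the RFC calls out is duplicate map keys: a lenient decoder silently keeps one of the values, while a strict decoder must reject the map. The idea is format-agnostic, so here's a minimal sketch using Python's stdlib JSON decoder (the hook name is mine):

```python
import json

def reject_duplicates(pairs):
    # strict mode: refuse maps whose keys collide instead of
    # silently letting one value shadow the other
    keys = [k for k, _ in pairs]
    if len(keys) != len(set(keys)):
        raise ValueError("duplicate map key")
    return dict(pairs)

lenient = json.loads('{"a": 1, "a": 2}')   # {'a': 2}: first value silently dropped
try:
    json.loads('{"a": 1, "a": 2}', object_pairs_hook=reject_duplicates)
except ValueError as e:
    print("strict decoder rejected it:", e)
```

Which value the lenient decoder keeps is exactly the kind of unspecified behavior that can be exploited when two systems disagree.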

------
CJefferson
This format is weird -- it represents (effectively) a 65-bit signed integer,
which doesn't fit on any current platform sensibly, but also puts a strict
upper limit on integer sizes.

EDIT: Found bigints. Still weird having a signed 65-bit integer.
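To make the "65-bit" point concrete: CBOR's head byte packs a 3-bit major type with a length code, so unsigned integers (major type 0) cover 0 through 2⁶⁴ - 1, and negative integers (major type 1, stored as -1 - n) cover -2⁶⁴ through -1. A minimal sketch of the integer encoding (the function name is mine, not from any library):

```python
def encode_cbor_int(n: int) -> bytes:
    """Encode an integer per RFC 7049 major types 0 and 1."""
    if n >= 0:
        major, val = 0, n            # major type 0: unsigned
    else:
        major, val = 1, -1 - n       # major type 1: negative, offset-encoded
    if val < 24:
        return bytes([(major << 5) | val])     # value fits in the head byte
    for code, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if val < 1 << (8 * size):
            return bytes([(major << 5) | code]) + val.to_bytes(size, "big")
    raise OverflowError("out of basic range; use a bignum tag")

encode_cbor_int(2**64 - 1)  # b'\x1b\xff\xff\xff\xff\xff\xff\xff\xff'
encode_cbor_int(-2**64)     # b';\xff\xff\xff\xff\xff\xff\xff\xff'
```

The two extremes above are the ends of that asymmetric range: both use the same 8-byte payload, distinguished only by the major type in the head byte.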

~~~
advisedwang
This sounds like they were trying to make one format that would fit both
"signed int64" and "unsigned int64".

~~~
ioquatix
A signed 64-bit integer with 2s complement representation still only has 2
__64 unique values. 2 __65 for a signed representation, while not stupid, is a
bit out there.

~~~
deathanatos
readers: The parent poster is saying that it has 2⁶⁴ and 2⁶⁵ unique values,
respectively; the formatting is wonky.

ioquatix: You might want to edit your post to indicate exponentiation. It took
me quite a moment to realize you meant that a signed 64-bit int represented in
2s complement has two-to-the-sixty-fourth values, not two hundred and sixty
four. I'm guessing you used two asterisks which HN mistook as a zero-length
italics?

~~~
moron4hire
Whoa, how did you do it?

~~~
deathanatos
Unicode has superscript versions of all of the Arabic numerals. "⁶" here is
U+2076, "SUPERSCRIPT SIX"[1]; there's not any real "formatting" happening in
my post. (Because HN doesn't support superscript, two asterisks are a pain
as I noted, and ^ gets confused by programmers for xor…)

If you happen to be on OS X, you can type Control+Command+Space in most
contexts, and get a popup that will let you select special characters, like
the above, emoji, etc. It even has a decent search box.

If you're on Linux, you can set a key as a Compose key (I use my right Alt
key); the default set of compose sequences includes superscript six as
Compose, ^, 6. If the app is using GTK, you can also input arbitrary Unicode
in most contexts with Ctrl+Shift+U, followed by the hexadecimal code point,
followed by space.

On Windows, I have no idea.

[1]:
[http://www.fileformat.info/info/unicode/char/2076/index.htm](http://www.fileformat.info/info/unicode/char/2076/index.htm)

------
stock_toaster
Wasn't CBOR originally submitted to IETF as msgpack, against[1] the desires of
the msgpack dev team? Or am I thinking of something else?

[1]:
[https://github.com/msgpack/msgpack/issues/129](https://github.com/msgpack/msgpack/issues/129)

------
nornagon
How does this compare to other similar specifications, e.g. MsgPack and BSON?

~~~
rspeer
From what I've read of the spec so far: msgpack has a confusion between text
and binary data baked into it that will probably never be resolved, and CBOR
deliberately fixes that.

It also seems to have a better implementation of streaming data than msgpack.
In msgpack, streaming is something you have to implement outside of the format
itself, possibly by concatenating together many msgpack representations. CBOR
has a way to say "here comes a streaming list, I'll tell you when it's done".
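That "I'll tell you when it's done" mechanism is an indefinite-length array: head byte 0x9f, then ordinary items, then the 0xff "break" code. A toy generator, restricted to tiny unsigned ints so each item is a single byte (names are mine):

```python
def stream_cbor_array(items):
    """Yield chunks of an indefinite-length CBOR array as items arrive."""
    yield b"\x9f"                  # array, indefinite length
    for item in items:
        assert 0 <= item < 24      # tiny uints encode as one byte each
        yield bytes([item])
    yield b"\xff"                  # "break": the stream is done

b"".join(stream_cbor_array([1, 2, 3]))  # b'\x9f\x01\x02\x03\xff'
```

The producer never has to know the length up front, which is exactly what concatenating standalone msgpack messages was working around.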

BSON is a representation of MongoDB's data model and doesn't make that much
sense to use without MongoDB.

I currently use msgpack for a lot of things, but if CBOR's Python library is
good enough, I might switch.

~~~
Matthias247
Afaik newer versions of MessagePack added an extra type so that string and
binary are now separated.

I read somewhere that CBOR was better designed for extensibility, but don't
know anything further about it.

One difference (on the non-technical side) is that CBOR is standardized
through IETF.

~~~
rspeer
> Afaik newer versions of MessagePack added an extra type so that string and
> binary are now separated.

The problem is that the 'str' type contains arbitrary binary data in an
unspecified encoding, and always will, because of backward compatibility. This
isn't changed by adding a 'bin' type.

Msgpack decoders in Python, for example, have to give you bytestrings unless
you pass an option that promises that 'str's are all encoded in UTF-8.

~~~
prutschman
From
[https://github.com/msgpack/msgpack/blob/master/spec.md](https://github.com/msgpack/msgpack/blob/master/spec.md)

    
    
      Raw
        String extending Raw type represents a UTF-8 string
        Binary extending Raw type represents a byte array

~~~
rspeer
Ah okay, I didn't know there was now a specific String type (and that the one
I was calling 'str' is called 'raw'). Does the Python library use it?

~~~
prutschman
It can: see [https://github.com/msgpack/msgpack-python#string-and-
binary-...](https://github.com/msgpack/msgpack-python#string-and-binary-type)

~~~
rspeer
I don't even know what to believe anymore. That documentation is referring to
two types, with "raw" renamed to "str" plus a new "bin", which is what I
thought it was.

But the link you posted referred to three types, where "str" and "bin"
subclass "raw", which sounded like it provided a non-backward-compatible "str"
that's guaranteed to be text.

------
vbit
See also ubjson ([http://ubjson.org/](http://ubjson.org/)), which is quite
nice: a simple format that manages to be concise and extensible.

An interesting feature is support for strongly typed arrays and objects. This
lets you embed binary data as-is by specifying it as an array of uint8.
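Concretely, a strongly typed UBJSON array is marked `[`, then `$` plus the element type, then `#` plus the count; the payload follows with no per-element markers and no closing `]`. A sketch for the uint8 case (the function name is mine; counts over 255 would need a wider count type):

```python
def ubjson_uint8_array(data: bytes) -> bytes:
    """Embed raw bytes as a UBJSON strongly typed array of uint8."""
    if len(data) > 0xFF:
        raise ValueError("use a wider count type for longer payloads")
    #         '['  '$' elem type 'U'  '#' count (itself typed as uint8)
    return b"[$U#U" + bytes([len(data)]) + data

ubjson_uint8_array(b"\x01\x02\x03")  # b'[$U#U\x03\x01\x02\x03'
```

The payload bytes land in the output verbatim, so binary data costs only the fixed header overhead.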

------
floatboth
I wrote a Swift encoder/decoder some time ago:
[https://github.com/myfreeweb/SwiftCBOR](https://github.com/myfreeweb/SwiftCBOR)

------
greydius
> ...and a few values such as false, true, and _null_.

When are we going to learn from our past mistakes?

~~~
deathanatos
I don't think null is necessarily bad. Sometimes you either have a value or
nothing at all. It's nullable-by-default that undoes us, I think. It should
be an explicit choice that your type is <thing|null>, not an implicit one.
That's the problem with Java references, or C pointers: nullability is on by
default and you can't opt out. Since null is always a valid value for those
types, I can't inform the type checker that null isn't a valid input (or a
possible output) of a function, so if someone mistakenly passes one, you
won't find out until runtime.

Contrast to Rust's Option<Foo> type; you know when null (None in Rust) is a
possibility, because it's in the type. (And in Rust, you're forced to deal
with it, too.)

~~~
paulddraper
You are right that null can be a problem when encoded wrongly in a statically
typed language.

But they cause problems even without static types.

[https://www.lucidchart.com/techblog/2015/08/31/the-worst-
mis...](https://www.lucidchart.com/techblog/2015/08/31/the-worst-mistake-of-
computer-science/)

For example, Rust can have Option<Option<T>>, which can be None, Some(None),
or Some(Some(...)). But you can't represent that with null, because nulls
don't "stack".
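The "nulls don't stack" point shows up in Python too, where None has the same flattening problem: to keep "absent" distinct from "present but null" you need an extra layer, here a hand-rolled sentinel plus a one-element tuple standing in for Some (all names are mine):

```python
_MISSING = object()   # sentinel object, distinct from None

def lookup(d, key):
    """Return None if key is absent, else a 1-tuple wrapping the value."""
    v = d.get(key, _MISSING)
    if v is _MISSING:
        return None      # like Rust's None: nothing there at all
    return (v,)          # like Some(value): an inner None survives intact

print(lookup({"a": None}, "a"))  # (None,) -- present, value is null
print(lookup({}, "a"))           # None    -- key absent
```

Without the wrapper, both cases would collapse to the same None, which is exactly the information Option<Option<T>> preserves.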

------
bascule
I can't speak to the people who designed JOSE/COSE or their motivations, but
to me every design decision they made looks either ignorant of, or actively
opposed to, the lessons of past problems with similar standards.

ASN.1 would perhaps be the main motivation here. Despite ASN.1 being an
abstract syntax, signatures computed over BER/DER/PER encodings were all
distinct. With a little bit of work, signature algorithms as expressed in
standards like CMS could have been made abstract across the encoded
representations.

But They Didn't Do That.

Fast-forward almost 30 years, and we're literally dealing with the same
problems:

[https://www.ietf.org/proceedings/94/cose.html](https://www.ietf.org/proceedings/94/cose.html)

"The resulting formats will not be cryptographically convertible from or to
JOSE formats."

WHAT REALLY? A longstanding problem with these formats for 30 years, and they
literally did the exact same thing? Yes, yes they did.

Hey, know what format actually gets this right, at least for bearer
credentials (relevant if you think JWT/CWT are cool; it's hard to argue
against any one specific JOSE/COSE piece, since this cancer has its fingers
in every single honeypot it can get its tentacles into)?

Macaroons:

[http://macaroons.io](http://macaroons.io)

In addition to not just being JSON-for-CMS/SAML, Macaroons are actually
_designed to be bearer credentials_, rather than "we took an old idea and
added JSON" slapped onto old concepts. Beyond that...

Macaroons are provably secure in their own dialect of Abadi's authorization
logic. Check the last page of the paper:

[http://research.google.com/pubs/pub41892.html](http://research.google.com/pubs/pub41892.html)

CWT (and, by extension, JWT) try to slap JSON/JOSE syntax/standards onto a
bunch of existing concepts, but fail to actually fix the fundamental problems, like
provable security. Yes, provable security: Macaroons are predicated on proofs.
JWTs are predicated on broken promises and ad hoc design.

In short, the authors of the JOSE/COSE standards couldn't have tried harder
to repeat _every single one of_ the mistakes of the past. These
standards are nonsense. Unless you're switching from CMS they offer no
practical benefits, and can potentially introduce new vulnerabilities due to
their ludicrous complexity.

Unless you can name the exact CBOR-encoded standard you want to use and why
you should use it, and the alternative you're considering is practically
anything but CMS, AVOID AVOID. Stick with anything that's more standard.
Slapping JSON on things doesn't help; it just introduces new problems in a
space where older standards are at least better understood.

There are problems in this space: ASN.1 is old, overcomplicated, hard to use,
and the source of many vulnerabilities in the way it's described. So we have a
serialization format with bad semantics used in security critical contexts.
Should we replace it? Yes. But is JOSE/COSE the solution?

JSON has many odd/ambiguous semantics, a shitty type system, and there is no
direct mapping between ASN.1's type system and JSON/JOSE's, because it is
intentionally restricted by design.

JOSE/COSE's solution is to improve a security critical format by getting rid
of types.

Wait what? Improving security by _getting rid of types_? Yes that's exactly
what JOSE/COSE are doing, and it's the wrong direction IMO.

I would much prefer some sort of modern typed serialization format which has
packed and unpacked representations. Protobufs or capnproto come to mind.

