
Saltpack – A modern crypto messaging format - remx
https://saltpack.org/
======
jnwatson
This is classic young-person-fails-to-understand-venerable-standard-so-he-
reimplements-half-of-it.

Messagepack is schemaless and noncanonical. What that means is that a lot of
the bounds/field checking is pushed up to the application layer. I wouldn't
encode crypto with that (and I love Messagepack).
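
To make "noncanonical" concrete: MessagePack has several integer widths, and decoders generally accept any of them for the same value, so different byte strings can decode to the same thing. A minimal sketch, using a hand-rolled decoder that covers only a few unsigned integer formats (`decode_uint` is a made-up name for illustration, not a real library call):

```python
import struct

def decode_uint(buf: bytes) -> int:
    """Decode one MessagePack unsigned integer (illustrative subset only)."""
    tag = buf[0]
    if tag <= 0x7F:                       # positive fixint: the tag byte is the value
        return tag
    if tag == 0xCC:                       # uint 8
        return buf[1]
    if tag == 0xCD:                       # uint 16, big-endian
        return struct.unpack(">H", buf[1:3])[0]
    if tag == 0xCE:                       # uint 32, big-endian
        return struct.unpack(">I", buf[1:5])[0]
    raise ValueError("format not handled by this sketch")

# Four different byte strings, all of which decode to the integer 1.
encodings_of_one = [b"\x01", b"\xcc\x01", b"\xcd\x00\x01", b"\xce\x00\x00\x00\x01"]
decoded = [decode_uint(e) for e in encodings_of_one]
```

Anything that hashes or signs the encoded bytes has to pin down one of these forms itself, which is the kind of checking that ends up in the application layer.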

All the hate for ASN.1, yet it is among the most battle-tested specifications
out there. Blaming ASN.1 for the shitty ASN.1 parsers written in the 80's and
90's is like blaming libsocket for all the network attacks.

------
gcr
This is the serialization protocol that Keybase uses. You can try this right
now with `echo hello world | keybase encrypt username`

------
floatboth
MessagePack should be replaced with [http://cbor.io](http://cbor.io)
everywhere, as CBOR is an actual IETF RFC. Even if that kills the naming pun
opportunity.

~~~
tptacek
Nobody should really care whether something is or isn't an IETF standard. The
cart has been dragging the horses with internet standardization for well over
a decade now.

~~~
dchest
Indeed. MessagePack should be replaced with CBOR in new protocols, as CBOR is
a very nicely designed format.

~~~
tptacek
Isn't CBOR just IETF's bikeshedded MessagePack? I'm not trying to be snarky: I
don't have strong opinions about encoding formats.

"CBOR is better than MessagePack because it's better designed" is a good
argument, I think --- but: you'd want it to be so much better that the
difference is material.

"CBOR is better than MessagePack because it's standardized" is not, I think, a
persuasive argument.

~~~
dchest
Yeah, I skipped all the drama, read the spec and implemented an
encoder/decoder. CBOR is just how a MessagePack-like format should have been
done from the beginning: it's technically superior in the sense that it's neat
and simple, replacing many specialized rules with one generalization.

I agree that IETF standardization isn't a good argument (well, for some it
is), that's why I replied with a better argument :) But, seriously, I won't
say that everyone should replace MessagePack everywhere with CBOR, both work
fine (as long as you use the latest version of MessagePack, with binary/string
distinction).

~~~
theamk
Looking at the CBOR spec, I see that it is just more complex. Two ways to encode
lists/strings -- indefinite-length and definite-length. 16-bit floats in the main
spec. Separation of "null" and "undefined" values. And tags, which define
things like decimal fractions, bigints, and regexps.

The tags are the worst, actually. Sure, spec says "decoders do not need to
understand tags", but this is not really the case. For example, if someone has
floating point numbers and worries about precision loss, they can store the
value as _decimal fraction_ (per section 2.4.3). This means that your decoders
have to support both tagging and your favorite bignum library just to make
sense of the data. In comparison, in msgpack (or json or xml or anything else)
you would just have to store a string representation -- trivial to convert to
either regular floats if you do not care, or to pass to your favorite bignum
library (and this will be simple, as all of them support constructor based on
ascii strings).
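
A rough sketch of the two approaches being compared. The byte string below is the tagged form of 273.15: CBOR tag 4 over the array [-2, 27315]. The decoder is a hand-written subset for illustration only (both function names are hypothetical), and the string alternative is simply what you would pull out of a msgpack or JSON field:

```python
from decimal import Decimal

def read_int(buf: bytes, pos: int):
    """Read a CBOR integer (major type 0 or 1); only short arguments are handled."""
    head = buf[pos]
    major, info = head >> 5, head & 0x1F
    if major not in (0, 1):
        raise ValueError("expected an integer")
    if info < 24:                         # value stored inline in the head byte
        arg, pos = info, pos + 1
    elif info == 24:                      # 1-byte argument follows
        arg, pos = buf[pos + 1], pos + 2
    elif info == 25:                      # 2-byte big-endian argument follows
        arg, pos = int.from_bytes(buf[pos + 1:pos + 3], "big"), pos + 3
    else:
        raise ValueError("wider integers omitted from this sketch")
    return (arg if major == 0 else -1 - arg), pos

def decode_decimal_fraction(buf: bytes) -> Decimal:
    """Decode tag 4 wrapping a 2-element array [exponent, mantissa]."""
    if buf[0] != 0xC4:                    # major type 6 (tag), tag number 4
        raise ValueError("not a decimal fraction tag")
    if buf[1] != 0x82:                    # major type 4 (array), length 2
        raise ValueError("expected a 2-element array")
    exponent, pos = read_int(buf, 2)
    mantissa, pos = read_int(buf, pos)
    return Decimal(mantissa).scaleb(exponent)   # mantissa * 10**exponent

tagged = decode_decimal_fraction(bytes.fromhex("c48221196ab3"))
from_string = Decimal("273.15")           # the plain-string alternative
```

Both paths end at the same `Decimal`; the difference is how much of the decoder has to know about tags to get there.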

In general, I think optional tags in data-interchange protocols are a very bad
idea. For example, there is a tag for "Standard date/time string" and for
"Epoch-based date/time". Which means that either:

- Your schema says "date/time", and your decoder now must support both of them
(and probably untagged strings, and integers too). So this is extra
complexity in your decoder.

- Your schema says "date/time in 'Standard date/time string' format", and now
every encoder user must make sure the emitted value is encoded and tagged
appropriately. This means you cannot do `x = cbor_encode({"now": date})`, you
have to read your encoder documentation to make sure the CBOR encoder you are
using will generate the required encoding.

So extra complexity in either case, and no real benefits. Better stick to
msgpack, at least it has no extensions defined currently.
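
The first option above might look roughly like this on the decoding side. This is a sketch under assumptions: the `(tag, value)` pair is a hypothetical shape for what a CBOR decoder hands back (with `None` meaning untagged), and `normalize_datetime` is an invented helper, not part of any CBOR library:

```python
from datetime import datetime, timezone

TAG_DATETIME_STRING = 0   # CBOR tag 0: standard date/time string
TAG_EPOCH = 1             # CBOR tag 1: epoch-based date/time

def normalize_datetime(item) -> datetime:
    """Accept every representation a lenient "date/time" schema would allow."""
    tag, value = item
    if tag == TAG_DATETIME_STRING or (tag is None and isinstance(value, str)):
        # Older Python versions reject a trailing "Z", so rewrite it as an offset.
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    if tag == TAG_EPOCH or (tag is None and isinstance(value, (int, float))):
        return datetime.fromtimestamp(value, tz=timezone.utc)
    raise ValueError(f"unsupported date/time representation: {item!r}")

# Four inputs that all denote the same instant.
inputs = [
    (0, "2013-03-21T20:04:00Z"),
    (1, 1363896240),
    (None, "2013-03-21T20:04:00Z"),
    (None, 1363896240),
]
results = [normalize_datetime(i) for i in inputs]
```

The second option removes these branches from the decoder but moves the burden to every encoder, which has to be checked for the exact tag and format it emits.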

~~~
dchest
Seriously, I just don't understand how you can describe the benefits of CBOR
and claim that they are its drawbacks.

Yes, you have to read the documentation of your encoder/decoder to understand
what tag values it maps to your programming language's objects, but if you
need to encode or decode those same values with MessagePack you'll have to
define your own format for them and document it. You just moved this problem
up the stack, but with an ad-hoc format.

Separation of "null" and "undefined" is for full JavaScript support. Before
C99, C didn't have a boolean type, but you wouldn't complain if serialization
formats had them, would you? Same thing with "undefined": while it's useless in
most other languages, it's useful to have it for JavaScript.

I don't like float16 either: it took 19 lines of decoder code to support (no
need to support it in the encoder), but it's the same situation as with
"undefined": some people need it.

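For a sense of what float16 decoder support involves, here is a minimal sketch of half-precision decoding in Python. It is not the code referred to above, just the usual sign/exponent/mantissa unpacking, cross-checked against the `e` format of the standard `struct` module:

```python
import math
import struct

def decode_half(two_bytes: bytes) -> float:
    """Decode a big-endian IEEE 754 half-precision float by hand."""
    half = int.from_bytes(two_bytes, "big")
    exp = (half >> 10) & 0x1F             # 5-bit exponent
    mant = half & 0x3FF                   # 10-bit mantissa
    if exp == 0:                          # zero or subnormal
        val = math.ldexp(mant, -24)
    elif exp != 31:                       # normal number, implicit leading 1 bit
        val = math.ldexp(mant + 1024, exp - 25)
    else:                                 # infinity or NaN
        val = math.inf if mant == 0 else math.nan
    return -val if half & 0x8000 else val

samples = [b"\x00\x00", b"\x3c\x00", b"\xc0\x00", b"\x7b\xff", b"\x00\x01"]
by_hand = [decode_half(s) for s in samples]
by_struct = [struct.unpack(">e", s)[0] for s in samples]
```

An encoder can skip this entirely by always emitting 32- or 64-bit floats, which matches the remark that only the decoder needs it.
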
------
hoistbypetard
Does anyone have a link to a run-down on reasons to prefer this format over
CMS? The community's been iterating on that for ~20 years now (RSA PKCS#7, RFC
3852, RFC 5652, and a pile of other formats/protocols built on top of these)
and so far nothing is jumping out to make me think this is an improvement.

~~~
throwaway2048
Not using ASN.1 is a massive improvement. ASN.1 is turing complete. That means
you cannot computationally decide if two certificates are equivalent, assuming
advanced enough fuckery.

This is very very bad.

ASN.1 (which is also used in x509) is a catastrophically awful format for
encoding certificates or other crypto operations, and is the direct cause of a
huge amount of SSL/TLS security issues.

~~~
nullc
> ASN.1 is turing complete.

ASN.1 is a horror show, and the actual implementations of it that are
available are even worse-- I've still yet to find an open source
implementation of BER that strictly matches the spec, correctly accepting and
interpreting all values that should be accepted and rejecting all values that
should not... but I believe that you're joking about "turing complete".

... But ASN.1 is so bad that I'm not completely sure. Can you confirm or cite?

------
exabrial
Why don't we have an emoji-like character yet for pgp begin message?

~~~
tlrobinson
I assume it's ASCII rather than UTF-8 to maximize compatibility with systems
that may or may not handle UTF-8 correctly.

Actually, yeah, that's pretty much what they said:

_The changes here are small: we've reduced our characters to base62 plus
some period markers, and only at the ends of words. PGP messages often get
mangled by different apps, websites, and smart text processors._

~~~
pacaro
I understand their reasoning, but base62 is obnoxious to deal with. I'd prefer
base32. 1) case insensitive, 2) works better with OCR (assuming a sensible
variant is chosen), and 3) can be processed with simple bit-shifts
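
Point 3 can be sketched as follows. Base32 is encoded with nothing but shifts and masks, because 32 is a power of two, while a base62 encoder has to do repeated division on the whole input treated as one large integer. The base62 alphabet order here is an assumption for illustration, and that function drops leading zero bytes, so it is only a contrast, not a usable codec:

```python
import base64
import string

B32 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"      # RFC 4648 base32 alphabet
B62 = string.digits + string.ascii_uppercase + string.ascii_lowercase  # assumed order

def b32encode_shifts(data: bytes) -> str:
    """Unpadded base32 using only shifts and masks."""
    out = []
    acc = 0                                # bit accumulator
    nbits = 0                              # number of unconsumed bits in acc
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= 5:
            nbits -= 5
            out.append(B32[(acc >> nbits) & 0x1F])
        acc &= (1 << nbits) - 1            # keep only the unconsumed bits
    if nbits:                              # final partial group, zero-padded on the right
        out.append(B32[(acc << (5 - nbits)) & 0x1F])
    return "".join(out)

def b62encode_bigint(data: bytes) -> str:
    """Base62 via big-integer division (loses leading zero bytes)."""
    n = int.from_bytes(data, "big")
    out = []
    while n:
        n, r = divmod(n, 62)
        out.append(B62[r])
    return "".join(reversed(out)) or B62[0]

message = b"hello world"
shifted = b32encode_shifts(message)
reference = base64.b32encode(message).decode().rstrip("=")
```

The shift-based version also works on a stream, one byte at a time, whereas the division-based one needs a whole block in memory before it can emit anything.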

~~~
eximius
Why on earth would OCR be a concern here? (genuine curiosity)

~~~
xoa
Simply printing stuff out onto alkaline/archival paper and sticking it in a
vault still remains in much of the world an important part of
retention/preservation schemes and contracts (and in some jurisdictions/cases
may even be a regulatory requirement). Properly stored quality paper can have
a lifespan of 500-1000 years, can have some favorable tradeoffs in terms of
reader future-proofing and upkeep vs purely digital storage, etc. No reason
not to want to take advantage of signing/encryption just because it's paper,
or be able to easily retain important messages in their native format.

You could argue that this doesn't need to be part of a message format itself
because it'd be perfectly possible to write an intermediation layer for print
that simply translates arbitrary input to an OCR favorable output and back,
and maybe that'd make sense anyway if there are other desirable choices for
properties like parity unique to print/archival/OCR. Still, to the extent that
a format base choice is arbitrary and makes no particular difference to the
humans or computers involved since they'll be intermediating through software
anyway, better OCR properties doesn't seem like an entirely unreasonable
metric to consider as part of the design considerations if there isn't a
compelling reason otherwise.

~~~
snailmailman
Isn't that situation a good place to use QR codes? It might be a large code
(I'm not sure of the limitations of QR codes) but it would eliminate the need for
OCR. Nobody is going to be decrypting it by hand, so you don't really need
letters that are human readable. The base62 form works well for posting online,
but for printing, QR seems like a good format to me.

~~~
pacaro
So QR codes are great as part of a designed workflow where the reading app is
well understood. The challenge comes in more ad hoc scenarios.

Also, QR codes hit a certain practicality limit with size (2953 bytes to stay
in spec).

Perhaps a meta-point is that when you are trying to design a general purpose
interchange format, there will always be scenarios that you didn't imagine. In
this case I have raised OCR and (legitimately) many people's responses have
been a rather polite WTF (although I did garner one downvote). Experience
teaches us that formats will be used in unexpected ways.

------
rektide
Not totally sure what BaseX is, how it compares versus Base64, especially post
HTTP deflate compressions, but I'm not sure I like it. I'm pretty sure this
kind of exercise is better left out, and that everyone should just use zstd on
whatever encoding so as to decouple problem domains.

Round two of skepticism: msgpack is a niche player with no clear big corporate
sponsor. Protobufs, flatbufs, and thrift are all actively making faster better
quicker implementations, but I can not off the top of my head think of any
major msgpack lovers. Avro also seems to just generally have some fast impls
already, especially on platforms I care about[1], so credit there too. I ought
review, but out of hand I can't think of anything distinguishing about
msgpack.

Definitely nice having _some_ alternative to Salmon protocol[2] (as in Buzz,
OStatus) on hand. Alas I believe it's again fully encapsulating, versus say
http signatures[3], where the signature is decoupled from the payload. It
takes both types!! Neither is right.

[1] [https://github.com/mtth/avsc/wiki/Benchmarks](https://github.com/mtth/avsc/wiki/Benchmarks)
[2] [http://www.salmon-protocol.org/](http://www.salmon-protocol.org/)
[3] [https://tools.ietf.org/html/draft-cavage-http-signatures-06](https://tools.ietf.org/html/draft-cavage-http-signatures-06)

~~~
theamk
The authors explicitly compare saltpack and PGP, and all of your questions get
answered when you think about it as "better PGP", an encryption format designed to be
used with email, generic online messengers, forums and so on:

> Not totally sure what BaseX is, how it compares versus Base64, especially
> post HTTP deflate compressions, but I'm not sure I like it.

BaseX is "armor" -- a way to insert binary data into text-only media such
as email messages and forum posts. It seems to be way more convenient, as it
uses no punctuation and is whitespace-insensitive. For example, I once
tried to send a PGP message via the gmail web interface and it took me two or three
tries to figure out how to properly paste the text without gmail inserting
extra whitespace and making the message undecodable. BaseX should not have
such problems. It is slightly less efficient than base64, including under
compression, but I think it is worth it.
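
To put a number on "slightly less efficient" (before any compression), here is a back-of-the-envelope sketch. It assumes the base62 armor encodes 32-byte blocks, which is an assumption made for illustration and is not stated anywhere in this thread; `chars_needed` is a made-up helper:

```python
def chars_needed(n_bytes: int, alphabet_size: int) -> int:
    """Smallest k such that alphabet_size**k covers every n_bytes-long input."""
    k = 0
    capacity = 1
    while capacity < 256 ** n_bytes:       # exact integer arithmetic, no float rounding
        capacity *= alphabet_size
        k += 1
    return k

BLOCK_BYTES = 32                           # assumed base62 block size

base62_chars = chars_needed(BLOCK_BYTES, 62)
base64_chars = chars_needed(3, 64)         # base64 maps 3 bytes to 4 characters

chars_per_byte_62 = base62_chars / BLOCK_BYTES
chars_per_byte_64 = base64_chars / 3
overhead = chars_per_byte_62 / chars_per_byte_64 - 1
```

Under that assumption each block takes 43 characters, and the armored text comes out less than 1% longer than base64 would be for the same payload.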

> I'm pretty sure this kind of exercise is better left out, and that everyone
> should just use zstd on whatever encoding so as to decouple problem domains.

Are you talking about zstd as in "Zstandard - Fast real-time compression
algorithm"? This has nothing to do with this proposal, there is no compression
anywhere in there.

> Round two of skepticism: msgpack is a niche player with no clear big
> corporate sponsor. Protobufs, flatbufs, and thrift are all actively making
> faster better quicker implementations, but I can not off the top of my head
> think of any major msgpack lovers. Avro also seems to just generally have
> some fast impls already, especially on platforms I care about[1], so credit
> there too.

The deserialization speed does not really matter. All saltpack messages are
encrypted, and your decryption time will dominate your deserialization time.

> I ought review, but out of hand I can't think of anything distinguishing
> about msgpack.

Well, msgpack is in a completely different group from Protobufs, flatbufs and
thrift. The former is a protocol, which is implemented by a number of
libraries, while each of the latter is a specific library, available from a
single vendor only.

As a result, msgpack has fewer features (no schema support at all), but is not
bound to a single large corporation. IMHO, the right choice for a global
communication protocol.

> Definitely nice having some alternative to Salmon protocol[2] (as in Buzz,
> OStatus) on hand.

Wait, what? Salmon is about blogs on the web, HTTP posts, XML schemas embedded
in HTML. The encryption is only a small part of it.

Saltpack is about encrypting email/chat/messaging, has no specification of
payload format, and is designed to work without HTTP using efficient formats. I
see very few common things between the two protocols.

> Alas I believe it's again fully encapsulating, versus say http
> signatures[3], where the signature is decoupled from the payload. It takes
> both types!! Neither is right.

Well, the big difference is that http signatures have no encryption while
saltpack has it. So just two very different goals that they want to achieve.

~~~
rektide
I'm basically obligated to reply to a long long list of mudslinging and
disfavorment on your part here- section by section:

1. You randomly talk about BaseX ad nauseam but barely mention what I was
actually critical of: its advantages vs base64. I still have no clear picture
of why not base64, like everyone else, which would serve the exact same needs.

2. You _just_ finish talking about compression then slam me for mentioning
compression as a relevant factor.

3. You criticize deser speed as not important, say decryption will dominate.
But while the message format may be competing with PGP, its inner payload is
msgpack, and I expect inside-the-firewall systems to have canonical text as
msgpack, and performance is relevant. But I didn't initially grok that msgpack
is the inner payload, that the BaseX text is the normal messaging format.

4. You try to pull some technical distinction nonsense about msgpack being
just a messaging format by talking about how competitors also have other stuff
too, while also being a message format. I find this distinction in bad faith
and believe most techs could reasonably see there is way more overlap than
differences. Your advantage ends up rather accurately being "msgpack has fewer
features", and you chalk up the advantage as some nebulous political one, while
ignoring the fact that Thrift is owned by the most reliable open source org on
the planet, Apache, whereas Msgpack is just some rando project.

5. You obviously don't know what Salmon is for. Your decision to focus on
some apparently unrelated technical things that it corresponds with ignores
that it is an (XML) signing format for arbitrary content. HTTP and HTML have no
bearing on what Salmon Digital Signing Protocol is, yet you harp on them to
draw a false contrast, and as usual you refuse to acknowledge that there could
be some bearing or relationship that I had validly called out.

I'd like to better understand some of your valid points, but some of your
arguments seem done in very bad faith and there to argue rather than explain
or demonstrate. I have a hard time understanding how I can start to reconcile
our two points of view with what you have written.

------
based2
[https://www.reddit.com/r/crypto/comments/43ur46/saltpack_an_...](https://www.reddit.com/r/crypto/comments/43ur46/saltpack_an_aead_crypto_encoding_format_competing)

------
ape4
Every message starts with "saltpack". That's handy for TLAs.

------
btmiller
Great. We can now salt food, salt passwords, configure things using
Salt(Stack), and now also use saltpack as a messaging format. Can we stop
overloading the term "salt"? Makes it ridiculous to search for things on
Google.

~~~
oconnor663
The name here refers to the NaCl library, which is what saltpack uses for all
its crypto. DJB says that "NaCl" is pronounced "salt", so officially
speaking, I say "saltpack" is pronounced "nacklepack" :)

~~~
hossbeast
Saltpack -> NaCl pack -> nacklepack -> Nickelback

