
Miscreant: a multi-language misuse resistant encryption library - waffle_ss
https://tonyarcieri.com/introducing-miscreant-a-multi-language-misuse-resistant-encryption-library
======
tptacek
This library implements a crypto primitive that sacrifices a marginal but
measurable amount of performance to avoid a very common user error with crypto
primitives --- repeating a nonce (a value that must never repeat under the same key, typically a counter). For perspective,
this week's KRACK 802.11 bug is an instance of nonce reuse.

The primitive being provided here is an instance of SIV, which is widely
considered the most conservative mainstream cipher mode that addresses nonce
reuse. SIV is a moral cousin to Deterministic DSA and EdDSA, in that the
"nonce" is based on a hash of the message. You can add additional nonce
material, and that will improve the security of the system, but even with a
constant all-ε stream of additional nonces, for most applications you're fine.

The downsides to AES-SIV are that the mode is "offline" and two-pass. You have
to have the whole message available to encrypt with AES-SIV (the state needed
for CTR mode comes from processing the whole message). This makes some kinds
of streaming interfaces hard to implement. On the other hand, you can almost
always delegate that kind of interface up one layer in your application stack
and pass AES-SIV chunks of messages.
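
One way to hand an offline mode "chunks of messages" is to split the stream one layer up and bind each chunk to its position, in the spirit of the STREAM construction of Hoang et al. The sketch below is an assumption that only does the framing. It yields (associated data, chunk) pairs, where the AD is a per-stream identifier, a chunk counter and a last-chunk flag; each pair would then go to an AEAD seal call (not shown). A reader that recomputes the expected AD for each position would then detect reordered, dropped, spliced or truncated chunks. `frame_stream` and `stream_id` are hypothetical names, not part of any library.

```python
import io
from typing import BinaryIO, Iterator, Tuple


def frame_stream(stream: BinaryIO, stream_id: bytes, chunk_size: int) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (associated_data, chunk) pairs for chunk-at-a-time AEAD sealing.

    AD = stream_id || 8-byte big-endian chunk index || 1-byte last-chunk flag.
    Only two chunks are held in memory at a time; an empty read means end of stream.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    index = 0
    current = stream.read(chunk_size)
    while True:
        # Read one chunk ahead so we know whether `current` is the last one.
        upcoming = stream.read(chunk_size)
        is_last = not upcoming
        ad = stream_id + index.to_bytes(8, "big") + (b"\x01" if is_last else b"\x00")
        yield ad, current
        if is_last:
            return
        current, index = upcoming, index + 1
```

`stream_id` should be unique per stream (random, say). Without it, identical chunks at the same position in two streams sealed under one SIV key would produce identical ciphertexts and could be swapped between streams.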

This library or something like it will eventually hit some kind of "1.0", and,
at that point, if you can get away with the performance hit --- and you
_virtually always can_ , because bulk encryption isn't a bottleneck in most
systems, and on the systems where SIV's performance hit matters you tend not
to get much benefit from the "faster" stuff --- you should use this for bulk
encryption. (Unfortunately, KRACK is a very good example of a setting that
probably couldn't get away with using AES-SIV). As a crypto interface, it's
better than NaCl.

~~~
notheguyouthink
Hello, a very crypto-ignorant user's question, if you don't mind:

In this system, do distributed encryptions of the same data create any attack
vector? E.g., if this library helps prevent nonce misuse, does distributing
the writes of data across multiple computers cause any problems?

This may seem completely off the mark, so feel free to respond with a simple
"No."; I'm very ignorant of this stuff. All I know is to be very wary, heh.

My use case, for those curious, is that I've got a distributed, content-
addressed filesystem built on top of a sort of ledger. This filesystem allows
for offline writing, which of course means that when nodes reconnect, any
data conflicts must be reconciled. This is done on the ledger in a very
bitcoin-y way. Simple, stupid.

In a nonce system (as described in the blog, at least), I could simply use the
length of the chain as the nonce and I'd be 100% safe from nonce reuse _on the
ledger_. However, writes from different nodes might duplicate nonces before
reconciliation takes place. The ledger never has duplicates, but old-hash
addresses would _(temporarily)_ contain a duplicated nonce.

Does this scenario sound bad for Miscreant?

As an aside, thank you to all involved for this project! Friendly crypto will
be such a boon to developers!

~~~
loup-vaillant
An easy way to prevent nonce reuse is to use a nonce big enough to be selected
at random. 192 bits is big enough. XChaCha20 provides that. I believe there
are more general nonce extension constructions out there, but I don't know
enough to recommend any.
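
The arithmetic behind "192 bits is big enough" is the birthday bound: with n nonces drawn uniformly from b bits, the chance that any two collide is roughly 1 - exp(-n(n-1)/2^(b+1)). The helper below is a minimal sketch of that formula; its name is hypothetical.

```python
import math


def nonce_collision_probability(num_messages: int, nonce_bits: int) -> float:
    """Approximate chance that any two of num_messages random nonces collide.

    Birthday bound: P ≈ 1 - exp(-n(n-1) / 2^(b+1)); expm1 keeps precision
    when the exponent is tiny.
    """
    exponent = num_messages * (num_messages - 1) / 2 ** (nonce_bits + 1)
    return -math.expm1(-exponent)
```

For 2^32 messages, a 96-bit random nonce (the size used by AES-GCM and IETF ChaCha20-Poly1305) gives about 2^-33, roughly 1.2e-10. A 192-bit XChaCha20 nonce gives about 2^-129.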

If for some reason you cannot (or don't want to) rely on your system's random
number generator, consider using a _hash_ of the chain as the nonce. Just make
sure duplicated chains never happen, _or_ revealing the existence and location
of duplicates is not a problem.
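
For the ledger scenario above, "a hash of the chain" might look like the sketch below. `chain_nonce`, the length-prefixing and the domain label are hypothetical choices, not from any library. BLAKE2b is used because Python's `hashlib` can emit a 24-byte digest directly, matching XChaCha20's nonce size.

```python
import hashlib
from typing import Sequence


def chain_nonce(chain: Sequence[bytes], nonce_len: int = 24) -> bytes:
    # Hash every entry, not just the chain length, so nodes that diverged
    # offline at the same height still derive different nonces.
    h = hashlib.blake2b(digest_size=nonce_len)
    h.update(b"chain-nonce-v1")  # hypothetical domain-separation label
    for entry in chain:
        # Length-prefix each entry so entry boundaries are unambiguous.
        h.update(len(entry).to_bytes(8, "big"))
        h.update(entry)
    return h.digest()
```

Two chains of equal length with different contents get different nonces. Byte-identical chains still share one, which is exactly the "make sure duplicated chains never happen" caveat.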

If you need to mitigate replay attacks… Err… I don't know.

------
dochtman
Have you thought about ChaCha20/Poly1305-based analogues to the algorithms you
implemented, per
[https://github.com/briansmith/ring/issues/413](https://github.com/briansmith/ring/issues/413)?
Is AES-SIV mainly better because it is specified, and thus has seen more
analysis, or are there other reasons that ChaCha20/Poly1305 isn't taking off
in this space?

~~~
bascule
As you guessed, my main reason is there aren't standard constructions around
SIV modes for ChaCha20Poly1305, although as Brian Smith notes in that thread
SIV is a general construction which should be easy to adapt.

See my comments in that same thread for some early hints at Miscreant.

AES is also more ubiquitously available in hardware, which is nice when
targeting slower devices like embedded/IoT.

------
CJefferson
Would it be reasonable for the library to make a "good attempt" at making a
nonce, perhaps by combining a global counter, the date and time, uptime and
bytes read from /dev/random?

Then nonce collision would require a library submitting identical nonces,
identical messages, and basically every way of getting a unique number from
the OS to all collide.

~~~
lvh
Sorta: if your CSPRNG is weird/broken then you have way worse problems, so
just use your CSPRNG. If your CSPRNG is fine, adding global counters,
timestamps, uptime... makes no sense.

(Also, you should use /dev/urandom, not /dev/random. Or
getentropy/getrandom/CryptGenRandom, depending on your OS.)
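
In Python, following that advice is a single call. This is a minimal sketch: `make_nonce` is a hypothetical helper name, and the 24-byte default assumes an XChaCha20-style 192-bit nonce.

```python
import os


def make_nonce(size: int = 24) -> bytes:
    # os.urandom() reads the OS CSPRNG; on Linux (Python 3.6+) it calls
    # getrandom() when available, blocking only until the pool is initialized.
    # No counters, timestamps or uptime are mixed in: if the CSPRNG is sound,
    # they add nothing.
    return os.urandom(size)
```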

~~~
CJefferson
CSPRNGs can be badly behaved at startup on virtual machines, and there have
been problems with bad random numbers at boot. Adding wall-clock time to the
mix wouldn't hurt.

~~~
lvh
Which operating system do you run where time isn't already in the CSPRNG pool?

~~~
CJefferson
I believe Linux only uses physical devices (which are very limited in some
VMs), but this might just be folk-legend (I haven't read the source).

~~~
lvh
No, Linux's random.c has add_device_randomness, which mixes a bunch of
machine-specific things into the RNG, including MAC addresses, serial numbers,
and, crucially, the RTC.

~~~
CJefferson
I'm obviously misinformed. Thanks for that.

------
Ninn
I'm not entirely convinced by the arguments against libsodium. Wouldn't it be
possible to extend or fork libsodium to create a simpler API that forces (or
at least defaults to) correct nonce handling?

~~~
lvh
Yes, I believe that's possible, though not necessarily with the same
performance. I built magicnonce [1] explicitly with two properties in mind:

* the simplest possible synthetic nonce design that's "obviously" correct for some useful value of correct.

* uses only explicitly exported libsodium APIs.

It's an off-line scheme like SIV, because the users I'm targeting don't care
about on-line schemes. If you really care about on-line schemes, I think the
simplest, most practicable solution right now is to just use a large nonce
scheme (like secretbox) filled with random bits.

To give you an idea about performance and to continue with the author's
gracious use of unflattering benchmarks, my Clojure+FFI code is about 20%
slower than his AES-SIV-PMAC code. I should probably publish it on IACR
instead of just in a random module of my libsodium binding.

[1]:
[https://github.com/lvh/caesium/blob/master/src/caesium/magic...](https://github.com/lvh/caesium/blob/master/src/caesium/magicnonce/secretbox.clj)

One could argue that it is still desirable to have fast, AES-based algorithms,
and I think that both this article and related work (e.g. the AES-GCM-SIV
paper) do a reasonable job of making that point. For example, my cost function
assumes libsodium is free but implementing any nontrivial crypto is very
expensive. If that's not true, say, in resource-constrained embedded
environments (libsodium not free, developing crypto par for the course), I can
totally see why you'd end up with SIV or SIV-PMAC: you just need one really
fast implementation of an established primitive (AES). Really anywhere where
the platform hasn't tilted the table in favor of GCM by having CLMUL
instructions, AES-SIV-PMAC should dominate both from a performance and safety
perspective.

