The primitive being provided here is an instance of SIV, which is widely considered the most conservative mainstream cipher mode that addresses nonce reuse. SIV is a moral cousin to Deterministic DSA and EdDSA, in that the "nonce" is based on a hash of the message. You can add additional nonce material, and that will improve the security of the system, but even with a constant all-zeroes stream of additional nonces, for most applications you're fine.
The downsides to AES-SIV are that the mode is "offline" and two-pass. You have to have the whole message available to encrypt with AES-SIV (the state needed for CTR mode comes from processing the whole message). This makes some kinds of streaming interfaces hard to implement. On the other hand, you can almost always delegate that kind of interface up one layer in your application stack and pass AES-SIV chunks of messages.
This library or something like it will eventually hit some kind of "1.0", and, at that point, if you can get away with the performance hit --- and you virtually always can, because bulk encryption isn't a bottleneck in most systems, and on the systems where SIV's performance hit matters you tend not to get much benefit from the "faster" stuff --- you should use this for bulk encryption. (Unfortunately, KRACK is a very good example of a setting that probably couldn't get away with using AES-SIV). As a crypto interface, it's better than NaCl.
Without additional precautions this approach is vulnerable to a fairly basic chunk-reordering attack, since any re-ordering of the "chunks" is a valid ciphertext. I strongly recommend against this approach.
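To make the reordering problem concrete, here is a toy sketch (NOT real crypto: HMAC-SHA-256 stands in for the synthetic-IV PRF and SHAKE-256 for the keystream, loosely mimicking SIV's structure). The point is structural: when each chunk is sealed independently, every per-chunk tag still verifies after an attacker reorders the ciphertext chunks.

```python
import hashlib
import hmac

KEY = b"\x01" * 32

def seal(key: bytes, chunk: bytes) -> bytes:
    tag = hmac.new(key, chunk, hashlib.sha256).digest()[:16]  # synthetic IV
    ks = hashlib.shake_256(key + tag).digest(len(chunk))      # keystream from the IV
    ct = bytes(a ^ b for a, b in zip(chunk, ks))
    return tag + ct

def open_(key: bytes, sealed: bytes) -> bytes:
    tag, ct = sealed[:16], sealed[16:]
    ks = hashlib.shake_256(key + tag).digest(len(ct))
    pt = bytes(a ^ b for a, b in zip(ct, ks))
    want = hmac.new(key, pt, hashlib.sha256).digest()[:16]
    if not hmac.compare_digest(tag, want):
        raise ValueError("bad tag")
    return pt

chunks = [b"pay $100", b"to alice", b"not bob!"]
sealed = [seal(KEY, c) for c in chunks]
shuffled = [sealed[0], sealed[2], sealed[1]]  # attacker swaps two chunks
print([open_(KEY, s) for s in shuffled])      # every chunk still authenticates
```

Nothing ties a chunk to its position, so the shuffled stream decrypts cleanly; fixing this requires binding a chunk index (and a final-chunk marker) into each chunk's associated data.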
EDIT: Unfortunately there is not really a better way to implement a streaming interface on top of a nonce-misuse-resistant encryption scheme: it's fairly easy to prove that any nonce-misuse-resistant construction must necessarily be "offline" in the sense tptacek describes.
These schemes achieve a security definition called OAE2 (STREAM specifically achieves nOAE, which Rogaway proves equivalent to OAE2) and are robust against reordering and truncation attacks. For more information, please see the paper:
In this system, do distributed encryptions of the same data open any vector for attack? E.g., if this library helps prevent nonce misuse, does distributing the writes of data across multiple computers cause possible problems?
This may seem completely off the mark, so feel free to respond with a simple "No." I'm very ignorant of this stuff; all I know is to be very wary heh.
My use case, for those curious, is that I've got a distributed, content-addressed filesystem built on top of a sort of ledger. This filesystem allows for offline writing, which of course means that when nodes reconnect, reconciliation of any data conflicts must take place. This is done on the ledger in a very bitcoiny way. Simple, stupid.
In a nonce system (as described in the blog at least), I could simply use the length of the chain as the nonce and I'd be 100% safe from nonce reuse on the ledger. However, some writes by distributed nodes might duplicate nonces before reconciliation takes place. The ledger never has duplicates, but old-hash addresses would (temporarily) contain a duplicated nonce.
Does this scenario sound bad for Miscreant?
As an aside, thank you to all involved for this project! Friendly crypto will be such a boon to developers!
If for some reason you cannot (or don't want to) rely on your system's random number generator, consider using a hash of the chain as the nonce. Just make sure duplicated chains never happen, or revealing the existence and location of duplicates is not a problem.
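A minimal sketch of the hash-of-the-chain idea (`chain_nonce` is a hypothetical helper, not part of Miscreant): hash every block in order, length-prefixed so block boundaries are unambiguous, and truncate to nonce size. Distinct chains give distinct nonces (up to hash collisions); a duplicated chain duplicates the nonce, which is exactly the caveat above.

```python
import hashlib

def chain_nonce(chain_blocks, size: int = 16) -> bytes:
    """Derive a deterministic nonce from the full chain contents."""
    h = hashlib.sha256()
    for block in chain_blocks:
        h.update(len(block).to_bytes(8, "big"))  # length-prefix each block so
        h.update(block)                          # boundaries are unambiguous
    return h.digest()[:size]
```

The length prefix matters: without it, the chains `[b"ab", b"c"]` and `[b"a", b"bc"]` would hash to the same nonce.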
If you need to mitigate replay attacks… Err… I don't know.
Reusing the same nonce and key to encrypt two different messages is very, very bad: in most schemes it leaks a lot of information about the plaintext and/or the key immediately. One easy conceptual example is CTR mode (or any of the authenticated-encryption modes based on CTR), which uses the key and nonce to generate a stream of bits that are then XOR'd with the plaintext. If you encrypt two plaintexts with the same stream, an attacker can XOR them together and the encryption cancels out, which basically reduces the attacker's problem to just the sort of cryptanalysis that was popular in the pre-computer era. (This is the exact same problem with using a one-time pad twice.)
Reusing the same nonce and key to encrypt the same message is only bad in the sense that you'll get the same ciphertext. Most good cryptographic systems attempt to avoid leaking even the bit of information that two ciphertexts exhibit the same plaintext, and using unique nonces (or random nonces from a large-enough space of possible nonces) gets you that. SIV and therefore Miscreant leaks this information, but remains secure if you're encrypting two different messages. For some protocols, leaking information about a message being repeated is bad - imagine an online poker game, where the message is what card your opponent has. If you see the same ciphertext twice, that probably gives you a significant advantage. For some protocols, possibly including the distributed reconciliation you're talking about, maybe that's totally fine.
Also, SIV (and therefore Miscreant) is quite a bit slower than other schemes, because it processes the message to compute the nonce and then again to encrypt the message now that it knows how to do that. For some protocols this is a big tradeoff. Since you mention "distributed filesystem" and "bitcoiny," I'll guess that's not a problem for you, but also given those constraints, probably just using a random IV selected from a large space is easy to do, so you might as well do that. Maybe do that and also use SIV, if you don't trust your users' offline RNGs.
Since you mention content-addressed filesystems, take a look at the message-locked encryption construction, which is at a high level just SIV taken to an extreme: instead of just deriving the IV, you also derive the key from the hash of the message. This is a good primitive for building cross-user deduplication for something like an end-to-end encrypted version of Dropbox, because two unrelated users who happen to have the same file will produce the same ciphertext, while (at least in theory) if you were never in possession of the cleartext, you have no way to compute the key that was used to encrypt the ciphertext. However, it's got some inherent tradeoffs just by the nature of the problem, so if you don't need dedup across mutually untrusted users, don't use it. If you do, and you want your files to be content-addressed across users, MLE will make sure different users get the same ciphertexts and therefore the same hashes of ciphertexts for the same input files.
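A toy sketch of the message-locked idea (`mle_seal` is a hypothetical name; the keystream here is a SHAKE-256 stand-in, nothing like a production MLE scheme): both the key and the ciphertext are deterministic functions of the message, so identical files from unrelated users collapse to one stored object.

```python
import hashlib

def mle_seal(plaintext: bytes):
    """Toy convergent encryption: key and keystream derive from the message."""
    key = hashlib.sha256(plaintext).digest()              # message-locked key
    ks = hashlib.shake_256(key).digest(len(plaintext))    # toy keystream
    ct = bytes(a ^ b for a, b in zip(plaintext, ks))
    address = hashlib.sha256(ct).hexdigest()              # content address
    return key, ct, address

# Two unrelated users with the same file get the same ciphertext and address,
# so the storage layer can dedup without ever seeing the plaintext.
k1, c1, a1 = mle_seal(b"same file contents")
k2, c2, a2 = mle_seal(b"same file contents")
assert (c1, a1) == (c2, a2)
```

The inherent tradeoff shows up immediately: anyone who can guess a candidate plaintext can recompute its key and confirm the guess against the stored ciphertext, which is why MLE only makes sense when you actually need cross-user dedup.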
Alas, there are no standardized robust schemes; AEZ making it into the final CAESAR portfolio is the most likely route to that changing.
Point being, its inclusion in the final CAESAR portfolio is far from clear at this point.
Disclaimer: I'm not the GP.
To the best of my knowledge, that's why he designed AEZ under the encode-then-encipher paradigm - encode the plaintext with τ trailing zero bits, then encipher it with (what is intended to be) a length-preserving SPRP.
Being an SPRP, the inverse is also a PRP, and thus any adulteration of the ciphertext results in a pseudorandom decryption, preventing release of unverified plaintext (RUP) and causing the trailing zeroes to fail to validate.
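The encode-then-encipher shape can be sketched in a few lines. This is emphatically not AEZ: a four-round Feistel network (Luby-Rackoff) with SHAKE-256 round functions plays the toy SPRP, and τ is taken as 16 zero bytes rather than bits. It just shows the control flow: pad with zeroes, encipher; decipher, reject unless the zeroes survive.

```python
import hashlib

TAU = 16  # redundancy: tau trailing zero bytes (bytes, not bits, for simplicity)

def F(key: bytes, rnd: int, data: bytes, n: int) -> bytes:
    # Round function: SHAKE-256 as a PRF (stand-in for AEZ's AES-based rounds).
    return hashlib.shake_256(key + bytes([rnd]) + data).digest(n)

def _feistel(key: bytes, block: bytes, rounds) -> bytes:
    # Four-round (Luby-Rackoff) Feistel network: a toy SPRP over the block.
    # Each round XORs one half with a PRF of the other, so running the
    # rounds in reverse order inverts the permutation.
    mid = len(block) // 2
    L, R = block[:mid], block[mid:]
    for r in rounds:
        if r % 2 == 0:
            L = bytes(a ^ b for a, b in zip(L, F(key, r, R, len(L))))
        else:
            R = bytes(a ^ b for a, b in zip(R, F(key, r, L, len(R))))
    return L + R

def seal(key: bytes, msg: bytes) -> bytes:
    return _feistel(key, msg + b"\x00" * TAU, range(4))    # encode, then encipher

def open_(key: bytes, ct: bytes) -> bytes:
    block = _feistel(key, ct, reversed(range(4)))          # decipher...
    if block[-TAU:] != b"\x00" * TAU:                      # ...then check zeroes
        raise ValueError("invalid ciphertext")
    return block[:-TAU]
```

Flipping any ciphertext bit garbles the whole deciphered block, so the trailing zeroes fail to validate except with negligible probability - the RUP resistance described above.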
Constructing the SPRP then becomes the interesting part - a fascinating construction can be found in https://arxiv.org/abs/1310.4050, which neither uses the block cipher whole (paying the cost of multiple invocations per block) _nor_ gives up on proofs; instead, it extends Cook's Elastic Cipher construction to arbitrary lengths, using the _round function_ directly, with security reductions to the original cipher the round function was taken from.
See my comments in that same thread for some early hints at Miscreant.
AES is also more ubiquitously available in hardware, which is nice when targeting slower devices like embedded/IoT.
Apparently the most popular cipher mode used in TLS is AES-GCM, which has seen a lot of attention, including its inclusion in the Linux kernel. OTOH, ChaCha20-Poly1305 is used for mobile, for example by Google.
Then nonce collision would require a library submitting identical nonces, identical messages, and basically every way of getting a unique number from the OS to all collide.
(Also, you should use /dev/urandom, not /dev/random. Or getentropy/getrandom/CryptGenRandom, depending on your OS.)
With shorter nonces (64 bits), it is strongly recommended to use a counter. You can start at zero, or if you prefer, start at a random number or use an LFSR whose cycle is 2^64. That way you know for sure that the first 2^64 nonces are all unique. If random generation were used, you would hit collisions due to the birthday paradox: by 2^32 nonces the chance of a collision would be about 40%.
With long nonces (192 bits), you can use random generation entirely, given that you have a good CSPRNG.
With intermediate nonces, it might be reasonable to use a 64-bit counter and either randomize the rest or use clock data (if you have a system with good clocks). This gives a little protection against any sort of nonce-reset bug in your code, while still giving you a full 2^64 nonces before the counter cycles. But this is just my own opinion, and I'm not a professional cryptographer (although I did study under Phillip Rogaway).
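The birthday figure quoted above is easy to check, and a counter's guarantee is easy to state next to it: drawing n random nonces from a space of N values collides with probability roughly 1 - exp(-n(n-1)/2N).

```python
import itertools
import math

# Birthday bound for random 64-bit nonces after 2^32 draws:
# P(collision) ~= 1 - exp(-n*(n-1) / (2*N)).
N = 2.0 ** 64
n = 2.0 ** 32
p = 1.0 - math.exp(-n * (n - 1.0) / (2.0 * N))
print(f"{p:.1%}")  # prints 39.3% - the "about 40%" quoted above

# A plain counter, by contrast, is collision-free for the first 2^64 nonces:
def counter_nonces(start: int = 0):
    for i in itertools.count(start):
        yield (i % 2 ** 64).to_bytes(8, "big")
```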
It seems eminently more likely for a counter to get reset by a code bug (e.g. KRACK) than for a largish random number to repeat, so I understand why coders instinctively lean towards random numbers rather than counters.
https://eprint.iacr.org/2016/475.pdf (see the table on page 6)
* the simplest possible synthetic nonce design that's "obviously" correct for some useful value of correct.
* uses only explicitly exported libsodium APIs.
It's an off-line scheme like SIV, because the users I'm targeting don't care about on-line schemes. If you really care about on-line schemes, I think the simplest, most practicable solution right now is to just use a large-nonce scheme (like secretbox) filled with random bits.
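A sketch of one way to combine the two ideas (hypothetical names throughout; this is not the parent's actual Clojure construction): fill a secretbox-sized 24-byte nonce with random bytes, and fold a keyed hash of the message in as well, so the nonce stays unique even if the RNG silently repeats.

```python
import hashlib
import secrets

NONCE_LEN = 24  # XSalsa20/secretbox-sized nonce

def synthetic_nonce(nonce_key: bytes, message: bytes) -> bytes:
    # Random bytes alone are already safe at this nonce size; keyed-hashing
    # the message in on top gives SIV-style misuse resistance as a backstop.
    r = secrets.token_bytes(NONCE_LEN)
    return hashlib.blake2b(r + message, key=nonce_key,
                           digest_size=NONCE_LEN).digest()
```

If the RNG is healthy, this behaves like a random nonce; if the RNG repeats, two encryptions only collide when the messages are also identical, which is the deterministic-AE failure mode rather than the catastrophic one.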
To give you an idea about performance and to continue with the author's gracious use of unflattering benchmarks, my Clojure+FFI code is about 20% slower than his AES-SIV-PMAC code. I should probably publish it on IACR instead of just in a random module of my libsodium binding.
One could argue that it is still desirable to have fast, AES-based algorithms, and I think that both this article and related work (e.g. the AES-GCM-SIV paper) do a reasonable job of making that point. For example, my cost function assumes libsodium is free but implementing any nontrivial crypto is very expensive. If that's not true, say, in resource-constrained embedded environments (libsodium not free, developing crypto par for the course), I can totally see why you'd end up with SIV or SIV-PMAC: you just need one really fast implementation of an established primitive (AES). Really anywhere where the platform hasn't tilted the table in favor of GCM by having CLMUL instructions, AES-SIV-PMAC should dominate both from a performance and safety perspective.