
AES-GCM-SIV: Specification and Analysis [pdf] - remx
https://eprint.iacr.org/2017/168.pdf
======
tyingq
Reading this first was helpful for me: [https://www.lvh.io/posts/nonce-misuse-resistance-101.html](https://www.lvh.io/posts/nonce-misuse-resistance-101.html)

------
tveita
Going directly to CFRG with this kind of feels like stealing the thunder from
the ongoing CAESAR competition.

Unfortunately the only misuse-resistant CAESAR candidate left in the running
is AEZ, according to
[https://aezoo.compute.dtu.dk/doku.php](https://aezoo.compute.dtu.dk/doku.php).

~~~
azet
yea. I don't get why HS1-SIV isn't in the next round.

~~~
aicez
Hardware performance might be the reason.

------
p1mrx
Do I understand correctly that the change from GHASH to POLYVAL is basically
saying "screw tradition, let's put Little Endian on the wire"?

In general, when you see Little Endian on the wire, that means someone forgot
to call htonl() in their code.

~~~
nbadg
I read it the same way, but in this case it's a performance consideration.
Most systems expected to use AES-GCM-SIV use little-endian byte order
internally, and AES is often implemented at an extremely low level (including
in hardware). So for most real-world systems (in particular, ones where crypto
is extremely performance-sensitive), it doesn't make sense to impose an extra
byte reordering on a value that is only used internally by the encryption
process itself.
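
Here is a tiny Python illustration of the byte-order cost being discussed (it is not taken from the paper). A little-endian machine that loads a 16-byte block gets the little-endian integer directly. Treating the same block as big-endian means reversing its bytes first, and that happens for every block. `load_le` and `load_be` are hypothetical helper names. The sketch models byte order only: GHASH's actual convention also reflects bits within bytes, which is not shown here.

```python
import sys


def load_le(block: bytes) -> int:
    """Read a 16-byte block as a little-endian integer.

    On a little-endian CPU this matches what a plain memory load gives you.
    """
    if len(block) != 16:
        raise ValueError("expected a 16-byte block")
    return int.from_bytes(block, "little")


def load_be(block: bytes) -> int:
    """Read the same block as a big-endian integer.

    On a little-endian CPU this means reversing the bytes first (in practice,
    a byte-shuffle instruction on every block): the extra step in question.
    """
    if len(block) != 16:
        raise ValueError("expected a 16-byte block")
    return int.from_bytes(block[::-1], "little")


block = bytes(range(16))  # 00 01 02 ... 0f, as it would appear on the wire
print(sys.byteorder)      # byte order of the machine running this
print(hex(load_le(block)))
print(hex(load_be(block)))
# The big-endian reading is just the little-endian load of the reversed bytes.
assert load_be(block) == int.from_bytes(block, "big")
```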

------
bascule
For context, this is describing an updated AES-GCM-SIV construction following
a number of attacks reported by NSA earlier this year:

[https://mailarchive.ietf.org/arch/attach/cfrg/pdfL0pM_N.pdf](https://mailarchive.ietf.org/arch/attach/cfrg/pdfL0pM_N.pdf)

Several cryptographers have been wary of this construction, both because of
the history of attacks and also because it generally hasn't lived up to the
goals of (nonce) "misuse resistant authenticated encryption" as described in
the seminal Rogaway paper on the matter:

[https://eprint.iacr.org/2006/221.pdf](https://eprint.iacr.org/2006/221.pdf)

It will be interesting to see more analysis of the latest version. For the
intended use case (QUIC ticket encryption), that analysis would be helpful.

~~~
azet
For the record: I'm not a cryptographer.

:)

------
memming
Sadly, hard to read... I've been away from the crypto world for too long!

------
runeks
What is this used for? Stuff like file encryption?

~~~
nbadg
AES-GCM-SIV is useful for applications that require some combination of:

1\. A distinct message-based format (i.e., not streaming)

2\. Protection against modified plaintext/malleable ciphertext without a
separate MAC/signature construct (this protection can extend to some
"attached" plaintext called "associated data")

3\. Multiple entities making encrypted messages that cannot necessarily
communicate with each other to negotiate safe nonces (for example, a server
farm, or in distributed computing)

As such, it would not be the best choice for file encryption.

\-----

In more detail: AES-GCM-SIV has three parts: AES, GCM, and SIV.

AES is a (some might say _the_) choice for symmetric encryption. In general,
symmetric encryption is ultimately what gets used for any and all _content_
that you encrypt. Asymmetric encryption is used almost exclusively on other
cryptographic material (such as symmetric keys and hashes), because it is very
slow in comparison (among other issues). Cryptosystems like PGP make it seem
like you're using asymmetric encryption on content by doing something like
this:

    
    
    asymmetric_encrypt(symmetric_key) + symmetric_encrypt(content, symmetric_key)
    

AES, like all other "block ciphers", _by definition_ operates on small,
fixed-size chunks of data at a time -- in the case of AES, 128 bits. The
process by which you turn the AES primitive, which can only deal with 128
bits, into something that can encrypt arbitrary-length data is called a "block
mode of operation". AES-GCM-SIV is one such block mode.
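
To show what a block mode of operation does, here is a minimal counter-mode (CTR) sketch in Python. GCM encrypts using CTR internally, but this is not GCM. The standard library has no AES, so `prf_block` (a hypothetical helper) uses HMAC-SHA256 truncated to 16 bytes as a stand-in for the AES block function. CTR mode only ever runs the block function forward, so a keyed PRF is enough for the illustration. Do not use this for real encryption.

```python
import hashlib
import hmac
import os

BLOCK = 16  # AES block size in bytes (128 bits)


def prf_block(key: bytes, counter_block: bytes) -> bytes:
    """Stand-in for AES_k(counter_block): one 16-byte keystream block.

    Assumption: HMAC-SHA256 truncated to 16 bytes replaces AES here,
    purely because Python's standard library has no AES.
    """
    return hmac.new(key, counter_block, hashlib.sha256).digest()[:BLOCK]


def ctr_xcrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt (the same operation) arbitrary-length data.

    Each 16-byte chunk is XORed with prf_block(key, iv || counter); the last
    chunk may be short, in which case the keystream is simply truncated.
    """
    out = bytearray()
    for i in range(0, len(data), BLOCK):
        # 12-byte IV followed by a 32-bit big-endian block counter.
        counter_block = iv + (i // BLOCK).to_bytes(4, "big")
        keystream = prf_block(key, counter_block)
        chunk = data[i:i + BLOCK]
        out += bytes(a ^ b for a, b in zip(chunk, keystream))
    return bytes(out)


key = os.urandom(16)
iv = os.urandom(12)  # 96-bit IV, the size GCM normally uses
msg = b"a message much longer than one 128-bit block"
ct = ctr_xcrypt(key, iv, msg)
assert len(ct) == len(msg)          # no padding needed
assert ctr_xcrypt(key, iv, ct) == msg
```

Because counter mode only uses the block function in the forward direction and handles a short final chunk by truncating the keystream, it needs no padding.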

GCM is one (again, some might say _the_) choice for "authenticated encryption
with associated data", or AEAD. It is a more sophisticated construct than
"simple" encryption, in that it also asserts that the ciphertext (and,
optionally, some attached in-the-clear plaintext, for example, metadata) was
encrypted by someone who had the symmetric key to the data. This is important,
because otherwise, the ciphertext is "malleable" \-- that is to say, an active
attacker could modify the ciphertext without the recipient detecting it. This
can be a big problem; an easy example of why is this scenario:

1\. Imagine an encrypted "on/off" boolean value for a lightbulb. Alice
(client) sends True for "on" to Bob (lightbulb), and Mallory (the active
attacker) intercepts the message.

2\. Mallory knows the format of the plaintext because she has the same kind of
lightbulb. She flips the bit in the ciphertext that corresponds to the on/off
value, and then transmits the modified message to Bob.

3\. Bob successfully decrypts the message and sees Mallory's "off" command,
despite Mallory never knowing the encryption key for the message, and turns
himself off.
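
Mallory's attack can be sketched in a few lines of Python. The sketch assumes that the lightbulb message is a single byte (`0x01` for on, `0x00` for off). It also assumes an unauthenticated stream cipher, with a keystream derived from HMAC-SHA256 as a stand-in for AES in counter mode. Mallory never touches the key; she only XORs one bit of the ciphertext.

```python
import hashlib
import hmac
import os


def keystream(key: bytes, iv: bytes, n: int) -> bytes:
    """Hypothetical stand-in keystream (HMAC-SHA256 instead of AES-CTR), n <= 32."""
    return hmac.new(key, iv, hashlib.sha256).digest()[:n]


def xcrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Unauthenticated stream encryption: ciphertext = plaintext XOR keystream."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, iv, len(data))))


ON, OFF = b"\x01", b"\x00"

# Alice encrypts "on"; only Alice and Bob know the key.
key, iv = os.urandom(16), os.urandom(12)
ciphertext = xcrypt(key, iv, ON)

# Mallory knows the format, so she flips the low bit of the ciphertext.
tampered = bytes([ciphertext[0] ^ 0x01])

# Bob decrypts without any error and reads "off".
assert xcrypt(key, iv, tampered) == OFF
```

Flipping a ciphertext bit flips the same plaintext bit, because decryption is just another XOR with the same keystream. Nothing in the ciphertext lets Bob notice.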

GCM solves this problem in a way that is analogous to combining a MAC
construct with the AES encryption, in one integrated primitive. You could do
these separately, and for a long time that's what the industry standard was,
but it's easy (and catastrophic) to screw up, so there's been a substantial
movement towards AEAD primitives instead.
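
For contrast, the "do these separately" approach can be sketched as encrypt-then-MAC in Python. This is a generic composition, not GCM itself (GCM's tag comes from GHASH, not HMAC). HMAC-SHA256 stands in for both AES-CTR and the MAC because the standard library has no AES. The names `seal` and `open_sealed` and the `b"bulb-7"` associated data are hypothetical. The composition has to get several details right: separate keys, an unambiguous encoding of what is authenticated, a constant-time tag comparison, and verifying before decrypting. Those are exactly the kind of thing that's easy to screw up by hand.

```python
import hashlib
import hmac
import os


def keystream(key: bytes, iv: bytes, n: int) -> bytes:
    """Stand-in for AES-CTR (HMAC-SHA256 based); fine for n <= 32 bytes."""
    return hmac.new(key, iv, hashlib.sha256).digest()[:n]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def mac_input(iv: bytes, ad: bytes, ct: bytes) -> bytes:
    """Unambiguous encoding of what the MAC covers.

    The IV is assumed to be a fixed 12 bytes. The AD and ciphertext are
    length-prefixed so bytes can't be shifted between them; leaving this out
    is one classic way to get the composition wrong.
    """
    return iv + len(ad).to_bytes(8, "big") + ad + len(ct).to_bytes(8, "big") + ct


def seal(enc_key, mac_key, iv, ad, plaintext):
    """Encrypt-then-MAC: encrypt first, then authenticate IV, AD and ciphertext."""
    ct = xor(plaintext, keystream(enc_key, iv, len(plaintext)))
    tag = hmac.new(mac_key, mac_input(iv, ad, ct), hashlib.sha256).digest()
    return ct, tag


def open_sealed(enc_key, mac_key, iv, ad, ct, tag):
    """Verify the tag before decrypting; reject anything that doesn't match."""
    expected = hmac.new(mac_key, mac_input(iv, ad, ct), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("authentication failed")
    return xor(ct, keystream(enc_key, iv, len(ct)))


# Independent keys for encryption and authentication.
enc_key, mac_key, iv = os.urandom(16), os.urandom(32), os.urandom(12)
ct, tag = seal(enc_key, mac_key, iv, b"bulb-7", b"\x01")  # "on"
assert open_sealed(enc_key, mac_key, iv, b"bulb-7", ct, tag) == b"\x01"

# Mallory's bit flip from the lightbulb example is now caught.
tampered = bytes([ct[0] ^ 0x01])
try:
    open_sealed(enc_key, mac_key, iv, b"bulb-7", tampered, tag)
    rejected = False
except ValueError:
    rejected = True
assert rejected
```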

Finally, SIV ("synthetic initialization vector"). GCM (and a number of other
block modes) requires an initialization vector, or IV. No key + IV combination
can ever be used twice. If it is, an attacker who sees both ciphertexts learns
the XOR of the two plaintexts (often enough to recover both), and with GCM can
also recover the authentication key and forge messages. AKA, total system
failure, equivalent to no encryption, etc. It is difficult to overstate the
severity of this; it's a really, really big deal to reuse the same key and IV
combination. The problem that SIV seeks to solve is one where you have a bunch
of different sources for the IV, and you can't necessarily guarantee their
uniqueness. Traditionally, you would choose a random IV from a very large set
of values to make birthday-paradox collisions unlikely. You're basically
relying on random chance to avoid reusing the IV+key, but because the
selection space is so large, you're extremely confident it won't be a problem.
But this doesn't always work; the linked paper gives an example of a server
cluster running Google's new QUIC protocol, which would be subject to IV
birthday collisions during a DDoS attack.
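
This paragraph makes two claims, and this hedged Python sketch illustrates both. First, GCM encrypts with AES in counter mode. Reusing key + IV therefore XORs both messages with the same keystream, and XORing the two ciphertexts cancels it, leaving the XOR of the plaintexts. A stand-in HMAC-SHA256 keystream replaces AES here. Second, the standard birthday approximation p ≈ n²/2^(b+1) for n random b-bit IVs shows why random 96-bit IVs, the usual GCM size, stop being comfortable at very high message volumes.

```python
import hashlib
import hmac
import os


def keystream(key: bytes, iv: bytes, n: int) -> bytes:
    """Stand-in keystream (HMAC-SHA256 instead of AES-CTR), n <= 32 bytes."""
    return hmac.new(key, iv, hashlib.sha256).digest()[:n]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


key, iv = os.urandom(16), os.urandom(12)
p1 = b"attack at dawn!!"
p2 = b"retreat at nine!"
c1 = xor(p1, keystream(key, iv, len(p1)))
c2 = xor(p2, keystream(key, iv, len(p2)))  # same key + IV: the mistake

# The keystream cancels out; the attacker never needed the key.
leak = xor(c1, c2)
assert leak == xor(p1, p2)
# Knowing (or guessing) one plaintext reveals the other.
assert xor(leak, p1) == p2


def collision_probability(n_messages: int, iv_bits: int) -> float:
    """Birthday approximation n^2 / 2^(b+1); only accurate while it is small."""
    return n_messages ** 2 / 2 ** (iv_bits + 1)


print(collision_probability(2 ** 32, 96))  # 2**-33, roughly 1 in 8.6 billion
print(collision_probability(2 ** 48, 96))  # 0.5 by this formula; the exact value is about 0.39
```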

SIV is part of a relatively new area termed "nonce reuse resistance" or "nonce
misuse resistance". These constructs seek to protect against accidental IV
reuse, as in the birthday collision above, by constructing the IV from the
plaintext of the message. The idea is that, if you accidentally reuse the same
key + IV, by definition, you must also have the same plaintext -- resulting in
an identical ciphertext, which protects the _contents_ of the message. As
such, you can safely use AES in an AEAD block mode, in a stateless distributed
system, without worrying about two different protocol instances accidentally
reusing the same key + IV combination (which would, again, break everything).

 _Disclaimer: this is significantly simplified for the purposes of
explanation._

~~~
MichaelGG
An upvote wasn't enough: thanks for writing this!

Does this mode entirely derive the IV from the plaintext or do you provide a
nonce that's also mixed in? (So the chances of nonce reuse + same message =
reused IV are even lower?) Could you not do the same kinda thing just with GCM
by using an IV plus hash of the plaintext?

~~~
nbadg
You're welcome!

From my understanding, IV plus hash of the plaintext is _sort of_ what's going
on in SIV (note that SIV is subtly different from GCM-SIV which is subtly
different from AES-GCM-SIV), plus an added step to also mutate that hash using
a single block of AES.

The last bit is important not just to protect against weaknesses in the hash
function, but also to prevent leaking information about the plaintext. Because
the IV is required for decryption, it must be transmitted in the clear, so a
plain, unkeyed hash would be trivially vulnerable to attack. Using the same
True/False example I gave above, you could simply hash both values, see which
one matches the IV, and now you've exposed the plaintext. Obviously real-world
scenarios are typically more complex than this, but the attack vector is
nonetheless available.
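
The leak described here can be shown in a few lines of Python, as a sketch under the assumptions of the on/off example. If the IV sent in the clear were an unkeyed hash (SHA-256 here) of the plaintext, anyone could hash each candidate and compare. If it is a keyed function instead (HMAC-SHA256 here, standing in for the extra AES step), that comparison fails without the key. The function names are hypothetical.

```python
import hashlib
import hmac
import os

CANDIDATES = [b"on", b"off"]


def unkeyed_iv(plaintext: bytes) -> bytes:
    """IV = hash(plaintext): deterministic and computable by anyone."""
    return hashlib.sha256(plaintext).digest()[:16]


def keyed_iv(key: bytes, plaintext: bytes) -> bytes:
    """IV = PRF_key(plaintext): deterministic, but only for key holders."""
    return hmac.new(key, plaintext, hashlib.sha256).digest()[:16]


def guess(observed_iv: bytes, iv_fn) -> list:
    """Attacker: return every candidate whose computed IV matches the observed one."""
    return [c for c in CANDIDATES if iv_fn(c) == observed_iv]


secret = b"off"

# Unkeyed: the attacker recovers the plaintext from the IV alone.
assert guess(unkeyed_iv(secret), unkeyed_iv) == [b"off"]

# Keyed: the same hash-and-compare trick no longer matches anything.
key = os.urandom(32)
assert guess(keyed_iv(key, secret), unkeyed_iv) == []
```

One leak remains even with the key. Because the IV is deterministic, identical messages still produce identical IVs, so an observer can tell when the same message was sent twice.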

It takes me a while (read: I don't have time right now) to properly absorb
specs like this (my formal background is in, of all things, mechanical
engineering), so I can't speak completely confidently about whether or not
there's a nonce mixed into AES-GCM-SIV specifically. That disclaimer aside, I
believe there is; the key lengths _are_ longer, and the AES-GCM-SIV construct
does still involve a nonce. If I had to speculate, I would say that the nonce
and key are distributed together, as if they were a single unit, and this is
responsible for the longer key. But that's really as far as I'm willing to
speculate, and should be taken with a huge grain of salt!

