
NIST Draft: The KMAC, TupleHash, and FPH Functions [pdf] - niftich
https://www.ietf.org/mail-archive/web/saag/current/pdfxrzR9rycq3.pdf
======
tptacek
In case you're wondering, these are new NIST standards for things you can do
with SHA-3 beyond simply using it as a hash.

KMAC is SHA-3's alternative to HMAC. You can use HMAC with SHA-3, but it's
wasteful: HMAC applies the hash function twice to achieve separation between
the key and the data being authenticated (which defeats length-extension
attacks on Merkle-Damgard hashes like SHA-2), and part of the point of SHA-3
is to make that extra work unnecessary.
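To make the "applies the hash function twice" point concrete, here's a textbook HMAC (the RFC 2104 two-pass construction) next to the simple prefix MAC that a sponge like SHA-3 permits. This is just a sketch of the idea; KMAC itself adds its own padding and domain separation on top:

```python
import hashlib
import hmac

def hmac_manual(key: bytes, msg: bytes, block_size: int = 64) -> bytes:
    """Textbook HMAC over SHA-256: two hash passes, keyed with
    inner/outer pads (RFC 2104)."""
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()
    key = key.ljust(block_size, b"\x00")
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha256(ipad + msg).digest()   # first pass
    return hashlib.sha256(opad + inner).digest()  # second pass

def prefix_mac_sha3(key: bytes, msg: bytes) -> bytes:
    """With a sponge like SHA-3, H(key || msg) is already a secure
    MAC (no length-extension property to exploit); KMAC is roughly a
    standardized, domain-separated version of this idea."""
    return hashlib.sha3_256(key + msg).digest()

# sanity check the manual HMAC against the stdlib
key, msg = b"secret key", b"attack at dawn"
assert hmac_manual(key, msg) == hmac.new(key, msg, hashlib.sha256).digest()
```

The double invocation in `hmac_manual` is exactly the overhead KMAC avoids.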

TupleHash is a construction that reliably hashes an array of byte strings,
some entries of which might be the empty string.

FPH is a construction that exploits parallelism to hash very large strings
more quickly.

------
niftich
These slides from 2015 [1] detail some of the rationale that went into
designing these functions.

Specifically, slide 13 says these 'Domain Separated' functions are "completely
unrelated", i.e. they can't be computed from each other. According to them,
"this should make it a little harder to shoot yourself in the foot with these
functions".

It appears the designers are realizing that assembling crypto building blocks
into something that remains secure and makes sense is difficult, so they're
providing ready-made constructions for specific uses.

[1]
[http://csrc.nist.gov/news_events/cif_2015/research/day1_rese...](http://csrc.nist.gov/news_events/cif_2015/research/day1_research_200-250pt1.pdf)

~~~
tptacek
Two interesting bits of context:

* Being usable in a straightforward way as a MAC --- something SHA-2, like MD5 and SHA-1 before it, is not --- was one of the design criteria for the SHA-3 contest.

* This is also part of a heartening trend in crypto research towards "misuse-resistant" cryptography. The "misusers" in MR crypto are... wait for it... application developers. :)

A good example of a crypto primitive designed specifically for misuse
resistance is Rogaway's SIV mode, which is making a comeback in GCM-SIV. The
most popular example of misuse-resistant crypto is of course the NaCl/Sodium
library.

------
zokier
The others I understand, but TupleHash seems a bit odd. It essentially just
defines a naive serialization format for tuples, whereas I would have thought
it would generally be left to applications to figure out how to get a
hashable blob from their data structures.

Is there some large application for tuple hashes that explains its inclusion
in a standard? Because I don't know of any.

~~~
zAy0LfpBZLC8mAC
Because people often don't realize that if you want to hash a bunch of pieces
of data in a cryptographically secure fashion, you have to serialize them
(with an injective function). People tend to think that just somehow feeding
it all into the hash function will take care of it, so simple concatenation is
common, with occasionally disastrous consequences.
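A tiny demonstration of how simple concatenation goes wrong: two different tuples can concatenate to the same byte string, so a hash over the concatenation can't tell them apart.

```python
import hashlib

def naive_tuple_hash(parts):
    """Hashing the raw concatenation of the parts -- NOT injective."""
    return hashlib.sha256(b"".join(parts)).hexdigest()

# Two distinct tuples, same concatenation, same hash:
assert naive_tuple_hash([b"ab", b"c"]) == naive_tuple_hash([b"a", b"bc"])
```

TupleHash exists so that applications get an unambiguous encoding for free instead of rediscovering this pitfall.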

~~~
lotharrr
In particular, there's all sorts of mischief an attacker can do when
serialized(A,B)==serialized(C,D) but (A,B)!=(C,D). I've seen OAuth
implementations do this, allowing someone to submit the signature from a
"good" message along with "bad" arguments (of their own choosing), with a very
different meaning.

An impatient application developer might just use hash(concat(args)). Raw
concatenation is a serialization, but it's ambiguous: s("manage", "able") ==
s("man", "ageable"). The safest approach is to ensure that the serialization
function is reversible (even though you're just going to feed it into a hash,
which obviously is not). When your arguments are arbitrary bytes, you can
either use a delimiter character (and escape any places where it appears
naturally) or prefix each string with (a fixed-length representation of) its
length. Either way, if you could conceivably parse the serialized form back
into the original arguments, then you're safe from this kind of
format-confusion attack.
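The length-prefix variant is a few lines of code. A sketch (the 8-byte big-endian length here is an arbitrary choice, not what TupleHash specifies):

```python
import hashlib
import struct

def serialize(parts):
    """Injective serialization: prefix each part with its 8-byte
    big-endian length, so the tuple can be unambiguously parsed back."""
    out = bytearray()
    for p in parts:
        out += struct.pack(">Q", len(p)) + p
    return bytes(out)

def tuple_hash(parts):
    return hashlib.sha256(serialize(parts)).hexdigest()

# The "manage"/"ageable" ambiguity from above is gone:
assert tuple_hash([b"manage", b"able"]) != tuple_hash([b"man", b"ageable"])
assert tuple_hash([b"manage", b"able"]) != tuple_hash([b"manageable"])
```

Because `serialize` is reversible, distinct tuples always produce distinct byte strings, so a collision between tuples would imply a collision in the underlying hash.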

~~~
zAy0LfpBZLC8mAC
That's what "injective" means :-)

Also, I think that serialization is actually injective by definition. If you
can't deserialize it, it's not a serialization, it's just an arbitrary
sequence of symbols that you maybe happen to have generated based on some non-
serial structure.

And actually, it's not just the safest way, it is the only way. Though it's
true that the hash is not injective, it is collision resistant (based on
current understanding); that is what makes it a cryptographic hash function.
The only theoretical alternative to (injective) serialization would be to feed
the data into a collision-resistant function that takes tuple inputs directly,
which by definition would itself be a cryptographic hash function, making it
pointless to feed the result into another one.

The solutions you suggest are good advice, though. Also, don't forget to tag
message types if you use a key for multiple purposes, so one message cannot be
replayed as another type of message.
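Type tagging composes with the same length-prefix trick; without it, the tag and the body can bleed into each other. A sketch (the tag names here are made up for illustration):

```python
import hashlib
import hmac

def tagged_mac(key: bytes, msg_type: bytes, msg: bytes) -> bytes:
    """MAC over (type, body), with the type length-prefixed so that
    e.g. type "order" + body "1..." can't collide with type "order1"
    + a shorter body."""
    payload = len(msg_type).to_bytes(8, "big") + msg_type + msg
    return hmac.new(key, payload, hashlib.sha256).digest()

key = b"shared key"
# Same body, different message types -> different MACs, so a "quote"
# message can't be replayed as an "order":
assert tagged_mac(key, b"quote", b"100 units") != tagged_mac(key, b"order", b"100 units")
```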

------
jonstewart
I have been so desperate for an expert-crafted standardized tree-hash.
Hallelujah!

~~~
niftich
Though it does not meet all your criteria -- it's not standardized by a
standards body -- BLAKE2 [1] also supports tree hashing, is expert-crafted
from a SHA-3 finalist, and has seen adoption over the last two years.

Notably, on recent architectures, it's faster than SHA-3, SHA-256, or even
SHA-1.

[1] [https://blake2.net/](https://blake2.net/)

~~~
jonstewart
It really is the standardization that's the thing, as my application would be
hashing disk images for digital forensics / evidence-authentication purposes.
BLAKE2 seems like a fine algorithm otherwise.

------
amluto
Without actually comparing the spec to the Sakura paper [1]: are these
instantiations of the Sakura scheme?

[1]
[http://keccak.noekeon.org/Sakura.pdf](http://keccak.noekeon.org/Sakura.pdf)

~~~
niftich
The FPH (Fast Parallel Hash) schemes are: their core is cSHAKE, a
user-customizable (parameterized) SHAKE, which in turn uses the Sakura
construction.
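The basic shape of such a parallel hash is a two-level tree: hash fixed-size blocks independently (that part parallelizes), then hash the concatenation of the block digests together with the framing parameters. A toy sketch of that shape -- this is illustrative only and is NOT the actual FPH/Sakura encoding:

```python
import hashlib

def parallel_hash_sketch(data: bytes, block_size: int = 8192) -> str:
    """Toy two-level tree hash: each block's digest could be computed
    on a separate core; the root hash binds the block size and count
    so different framings of the same data don't collide."""
    chunks = [data[i:i + block_size]
              for i in range(0, len(data), block_size)]
    leaves = b"".join(hashlib.sha3_256(c).digest() for c in chunks)
    framing = block_size.to_bytes(8, "big") + len(chunks).to_bytes(8, "big")
    return hashlib.sha3_256(framing + leaves).hexdigest()

# Deterministic, but (by design) not the same as a plain sequential hash:
data = b"x" * 20000
assert parallel_hash_sketch(data) == parallel_hash_sketch(data)
assert parallel_hash_sketch(data) != hashlib.sha3_256(data).hexdigest()
```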

In the Sakura paper's conclusion, they state:

"In the current FIPS 202 draft, the SHA-3 extendable-output functions chosen
by NIST adopt SAKURA for sequential hashing as a special case of tree
hashing." Similarly, the FIPS 202 document refers to SAKURA when it talks
about the XOFs SHAKE128 and SHAKE256:

"The two SHA-3 XOFs are also specified in a manner that allows for the
development of dedicated variants. Moreover, the SHA-3 XOFs are compatible
with the Sakura coding scheme for tree hashing in order to support the
development of parallelizable variants of the XOFs, to be specified in a
separate document." [1]

[1]
[http://www.nist.gov/customcf/get_pdf.cfm?pub_id=919061](http://www.nist.gov/customcf/get_pdf.cfm?pub_id=919061)

------
zaroth
But is it _trustworthy_?

