
Maybe Skip SHA-3 - kungfudoi
https://www.imperialviolet.org/2017/05/31/skipsha3.html
======
remcob
> SHA-3 did introduce something useful: extendable output functions

Much more than that!

Keccak (SHA-3) introduced the sponge construction. This goes _way_ further
than extendable output functions: it allows you to build all the symmetric
cryptographic primitives from one basic element! That means hashes, PRNGs,
MACs, encryption, AEAD, etc. The only thing you need that looks like crypto
code is a single permutation function, which significantly reduces the
amount of security-critical code to write. In hardware implementations, the
permutation function takes less die space than SHA-2, and it accelerates
all of these primitives at once.
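
To make the "one permutation" point concrete, here is a minimal sketch of
the absorb/squeeze pattern in Python. The permutation below is a throwaway
stand-in (Keccak uses the Keccak-f[1600] permutation), and the
rate/capacity values are illustrative only:

    RATE = 8         # bytes absorbed/squeezed per permutation call
    CAPACITY = 24    # hidden state bytes; sets the security level

    def permute(state: bytearray) -> bytearray:
        # Placeholder mixing step; real sponges put ALL their crypto here.
        out = bytearray(len(state))
        for i in range(len(state)):
            out[i] = (state[i] + 31 * state[(i + 1) % len(state)] + i) & 0xFF
        return out

    def sponge_hash(msg: bytes, out_len: int) -> bytes:
        state = bytearray(RATE + CAPACITY)
        # Simplified padding (Keccak proper uses pad10*1).
        padded = msg + b"\x01" + b"\x00" * (-(len(msg) + 1) % RATE)
        # Absorb: XOR each rate-sized block into the state, then permute.
        for i in range(0, len(padded), RATE):
            for j in range(RATE):
                state[j] ^= padded[i + j]
            state = permute(state)
        # Squeeze: keep reading rate-sized chunks; looping longer gives an
        # extendable-output function (XOF) for free.
        out = bytearray()
        while len(out) < out_len:
            out += state[:RATE]
            state = permute(state)
        return bytes(out[:out_len])

Absorb a key first and the same loop becomes a MAC or a PRNG; that is the
whole trick.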

The sponge construction and all its derivatives are proven secure assuming
the permutation function is. And assuming SHA-3 is secure, the permutation
function is too. The alternative, the Merkle–Damgård construction, has a
bunch of known problems. It's also used by pretty much everything else, so
it made sense to diversify.

Honestly, I'm a bit saddened by the reputation SHA-3 is getting. It is an
eye-openingly elegant design that makes all previous crypto look overly
complex.

~~~
snakeanus
> It allows you to build all the symmetric cryptographic primitives from one
> basic element! This means: hashes, PRNGs, MACs, encryption, AEAD, etc

Isn't this true for most cryptographic hash functions? MAC with HMAC or H(key
|| m), encryption with a CTR-like mode on top of the MAC, etc.
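
Something like this, with SHA-256 standing in as the hash (the bare
H(key || m) prefix MAC is exactly what length extension breaks for SHA-2,
which is why HMAC exists):

    import hashlib, hmac

    def mac(key: bytes, msg: bytes) -> bytes:
        # HMAC is the safe generic MAC for Merkle-Damgard hashes; a bare
        # sha256(key + msg) prefix MAC would be length-extendable.
        return hmac.new(key, msg, hashlib.sha256).digest()

    def ctr_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
        # CTR-like stream cipher from a hash: keystream block i is
        # H(key || nonce || i), XORed with the plaintext.
        out = bytearray()
        for i in range(0, len(plaintext), 32):
            ks = hashlib.sha256(key + nonce + i.to_bytes(8, "big")).digest()
            out += bytes(p ^ k for p, k in zip(plaintext[i:i + 32], ks))
        return bytes(out)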

~~~
tptacek
Observe also that Chapoly, which is built on a stream cipher generated from
a hash run in counter mode, designed by one of the pioneers of
ciphers-from-hash constructions, does not use its hash core as its MAC;
it's simply faster to use a polynomial MAC.
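
The polynomial-MAC idea is tiny: treat the message blocks as coefficients
of a polynomial and evaluate it at a secret point. A schematic over
Poly1305's prime, omitting the key clamping and encoding details that real
Poly1305 needs:

    P = (1 << 130) - 5    # the prime 2^130 - 5 used by Poly1305

    def poly_mac(r: int, s: int, msg: bytes) -> int:
        # r is the secret evaluation point, s a secret final mask; both
        # come from the one-time key.
        acc = 0
        for i in range(0, len(msg), 16):
            # A trailing 0x01 byte keeps different-length blocks distinct.
            block = int.from_bytes(msg[i:i + 16] + b"\x01", "little")
            acc = (acc + block) * r % P
        return (acc + s) % (1 << 128)

One multiply and one add per 16-byte block is why this beats running a hash
core over the data.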

~~~
remcob
With a sponge construction the AEAD is a one-pass operation. You feed
plaintext in, out comes ciphertext. In the end you crank it one more time and
out comes the MAC/tag.

This is unlike other constructions, where you need to go over the data
twice, with two different crypto algorithms (encryption + MAC) that may or
may not share a core component.
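
Schematically the duplex flow looks like this (toy permutation, no padding
or domain separation, so illustrative only; real designs in this family
include SpongeWrap and Keyak):

    RATE = 8

    def permute(state: bytearray) -> bytearray:
        # Throwaway stand-in for a real permutation such as Keccak-f.
        return bytearray(((b * 167 + i * 13 + 7) & 0xFF)
                         for i, b in enumerate(state))

    def duplex_seal(key: bytes, nonce: bytes, plaintext: bytes):
        state = bytearray(32)
        for i, b in enumerate(key + nonce):      # absorb key and nonce
            state[i % 32] ^= b
        state = permute(state)
        ct = bytearray()
        for i in range(0, len(plaintext), RATE):
            chunk = plaintext[i:i + RATE]
            # Encrypt with the rate part of the state, then fold the
            # ciphertext back in so the final tag depends on every byte.
            c = bytes(p ^ k for p, k in zip(chunk, state[:len(chunk)]))
            for j, b in enumerate(c):
                state[j] ^= b
            state = permute(state)
            ct += c
        tag = bytes(permute(state)[:16])         # one last crank -> tag
        return bytes(ct), tag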

~~~
tptacek
OCB does the same thing, without a sponge construction.

------
JoshTriplett
SHA-3 does seem to have relatively little to offer by way of incentives to
switch. "Just as good" isn't motivation, and any notions of higher
cryptographic strength haven't been extensively discussed. "Easier to
implement in hardware" will be more compelling when such hardware exists.

I'm curious about the statement that SHA-3 is slow; it links to
[https://www.imperialviolet.org/2016/05/16/agility.html](https://www.imperialviolet.org/2016/05/16/agility.html)
, which doesn't seem related, and matches the previous link. I wonder if that
was supposed to link to somewhere else, like [http://bench.cr.yp.to/results-
sha3.html](http://bench.cr.yp.to/results-sha3.html) (as linked from
[https://www.imperialviolet.org/2012/10/21/nist.html](https://www.imperialviolet.org/2012/10/21/nist.html)
)?

From that, SHA-3 certainly doesn't run significantly faster than alternatives
(variants of BLAKE do indeed outperform it), but it seems roughly on par with
SHA-256/SHA-512. But "on par" doesn't give any incentive to switch.

I wonder how much relative attention the SHA-3 winner (Keccak) gets compared
to other alternatives, like BLAKE?

~~~
jasode
_> I'm curious about the statement that SHA-3 is slow; [...] I wonder how much
relative attention the SHA-3 winner (Keccak) gets compared to other
alternatives, like BLAKE?_

Coincidentally, I ran a bunch of hash performance benchmarks last week. These
were my findings:

    
    
      test:  hash a 500MB block of memory.
      hardware:  Intel Core i7-5820K Haswell-E 6-Core 3.3GHz
      compiler:  MSVC2017 (19.10.25019), 32-bit exe:
        blake2sp - official reference code[1]     153MB/sec
        SHA3 - Keccak official reference code[2]   12MB/sec
        SHA3 - rhash sha3[3]                       45MB/sec
        SHA3 - Crypto++ library v5.6.5[4]          57MB/sec
        SHA256 - Crypto++                         181MB/sec
        SHA256 - MS Crypto API[5]                 113MB/sec
        SHA1 - MS Crypto API                      338MB/sec
        MD5 - Crypto++                            345MB/sec
        CRC32 - Crypto++                          323MB/sec
    

The conclusion is that even the fastest SHA3 implementation (the Crypto++
lib with its assembly language optimizations) is roughly 3x slower than
SHA256. I can't speak for SHA3 implemented in FPGA/ASIC, but as far as C++
compilation targeting x86 goes, it's slow. I've been meaning to try the
Intel Compiler to see if it yields different results but haven't gotten
around to it yet.

Blake2sp is fast. The official reference code is not quite as fast as the
Crypto++ implementation of SHA256, but it's faster than Microsoft's Crypto
API SHA256. (There are several variants of BLAKE, and I chose blake2sp
because that's the algorithm WinRAR uses. I think the specific variant of
BLAKE that directly competed with Keccak for NIST standardization is
slower.)

[1] [https://github.com/BLAKE2/BLAKE2](https://github.com/BLAKE2/BLAKE2)

[2]
[http://keccak.noekeon.org/files.html](http://keccak.noekeon.org/files.html)

[3]
[https://github.com/rhash/RHash/blob/master/librhash/sha3.c](https://github.com/rhash/RHash/blob/master/librhash/sha3.c)

[4] [https://www.cryptopp.com/](https://www.cryptopp.com/)

[5] [https://msdn.microsoft.com/en-
us/library/ms867086.aspx](https://msdn.microsoft.com/en-
us/library/ms867086.aspx)

~~~
briansmith
I guess 32-bit x86 performance is maybe not the best benchmark. I think people
aren't optimizing for that ISA to the same extent as they are optimizing for
x86-64, 64-bit ARMv8, or 32-bit ARM.

If you care about performance and you don't have dedicated SHA-256
instructions, then on a 64-bit platform you should evaluate SHA-512, as it
is much faster. If you only have 256 bits of storage available, then
truncate its output to 256 bits. IIRC, it's about 1GB/sec on my Haswell
laptop.
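
e.g. (note that plain truncation is not byte-identical to the standardized
SHA-512/256, which uses different initial values, but both hide internal
state):

    import hashlib

    # On 64-bit CPUs without SHA instructions, SHA-512 often outruns
    # SHA-256: 64-bit words, 128-byte blocks.
    digest = hashlib.sha512(b"some data").digest()[:32]  # truncate to 256 bits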

~~~
jasode
_> I guess 32-bit x86 performance is maybe not the best benchmark._

I compiled for 32-bit instead of 64-bit because I wanted the same
executable to also run on a 32-bit MacBook. When Thomas Pornin ran
benchmarks[1] in 2010 for both 32-bit & 64-bit, the SHA256 performance
didn't change as much as SHA512's did. I'll recompile for 64-bit and report
back if there's a massive difference.

[1] [https://stackoverflow.com/questions/2722943/is-
calculating-a...](https://stackoverflow.com/questions/2722943/is-calculating-
an-md5-hash-less-cpu-intensive-than-sha-family-functions/2723941#2723941)

~~~
baby
SHA-3's lanes are 64 bits wide. On a 64-bit arch they can use full
registers for operations. I'd bet it would be way faster compiled for
64-bit.

~~~
npscalar
Mismatches in word size can show an interesting property: it is likely that
SHA-256 is slower than Keccak on 64-bit platforms, and that SHA-512 is
slower than Keccak on 32-bit platforms.

------
snakeanus
> SHA-3 did introduce something useful

To me the most useful part of SHA-3 is that you don't need to use HMAC,
since it is not vulnerable to length-extension attacks. That makes it much
faster than SHA-2 when used as a MAC.
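
Concretely (sketch; hashlib grew sha3_256 in Python 3.6, and NIST's
standardized keyed mode for SHA-3 is KMAC):

    import hashlib, hmac

    key, msg = b"\x42" * 32, b"message to authenticate"

    # SHA-2 needs the nested HMAC construction to block length extension:
    tag_sha2 = hmac.new(key, msg, hashlib.sha256).digest()

    # SHA-3 can use a bare prefix MAC; there is no extension to block:
    tag_sha3 = hashlib.sha3_256(key + msg).digest()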

~~~
gcp
But aren't we trying to get away from MACs and toward authenticated
ciphers?

~~~
tptacek
An authenticated cipher is usually just a composition of a cipher and a MAC.
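
For instance, generic encrypt-then-MAC, with a hash-based stream cipher
standing in for the real cipher (sketch only):

    import hashlib, hmac

    def keystream(key: bytes, nonce: bytes, n: int) -> bytes:
        # Hash-in-counter-mode keystream, standing in for AES-CTR/ChaCha20.
        out, ctr = b"", 0
        while len(out) < n:
            out += hashlib.sha256(key + nonce + ctr.to_bytes(8, "big")).digest()
            ctr += 1
        return out[:n]

    def seal(enc_key: bytes, mac_key: bytes, nonce: bytes, pt: bytes):
        # Encrypt-then-MAC: the generic cipher + MAC composition.
        ct = bytes(p ^ k for p, k in zip(pt, keystream(enc_key, nonce, len(pt))))
        tag = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
        return ct, tag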

~~~
gcp
Presumably, if you're using the composition, it's hardened against the length-
extension attack in the first place, right?

I mean, the advantage only seems to be there when you're cobbling together
your own constructions, i.e. doing what you shouldn't be doing.

~~~
tptacek
Length extension attacks are idiosyncratic to hash functions with a particular
underlying design (Merkle-Damgard compression). They are the reason we have
HMAC. AEADs built on MACs that are not built on general-purpose hash functions
don't have that problem.
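
A toy version of the problem, using a stripped-down Merkle-Damgard chain
(real SHA-256 also mixes in length padding, which the attacker has to
account for but which doesn't stop the attack):

    import hashlib

    def md_hash(msg: bytes, state: bytes = b"\x00" * 32) -> bytes:
        # Toy Merkle-Damgard chain, length padding omitted for brevity.
        # The output is the final chaining state: that is exactly the flaw.
        for i in range(0, len(msg), 32):
            state = hashlib.sha256(state + msg[i:i + 32]).digest()
        return state

    key = b"secret-key-the-attacker-doesnt-know!"[:32]
    tag = md_hash(key + b"user=alice" + b"\x00" * 22)    # naive prefix MAC

    # The attacker resumes hashing from the public tag, no key needed:
    forged = md_hash(b";admin=true" + b"\x00" * 21, state=tag)
    honest = md_hash(key + b"user=alice" + b"\x00" * 22
                     + b";admin=true" + b"\x00" * 21)
    assert forged == honest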

------
ktta
My suggestion after looking around for the best hash function is SHA-256.
Not SHA-512, not SHA-512/256. Not even SHA-224, because it offers no
performance benefit.

BLAKE2 is a good candidate right now considering speed. But SHA-256
instructions will only get more prevalent, which will take care of the
speed issue.

There have been speed comparisons between SHA-256 and BLAKE2 on hardware
without SHA instructions[1]. BLAKE2 seems like the better option there,
with its optimized code.

But once SHA-256 uses the hardware SHA extensions, it gets a 5.7x boost,
which blows BLAKE2 out of the water.[2] There are SHA-256 instructions in
NEON too, although I couldn't find any benchmarks; I'd say it'll definitely
be faster than BLAKE2 with HW instructions.

Combine that with SHA256 being uber-popular everywhere, and that's one less
headache.

I also doubt SHA-256 will be overthrown in the near future, at least until
SHA-3 instructions become commonplace, which I anticipate will take about a
decade.

[1]:
[https://news.ycombinator.com/item?id=14357197](https://news.ycombinator.com/item?id=14357197)

[2]:
[https://github.com/randombit/botan/issues/807#issuecomment-2...](https://github.com/randombit/botan/issues/807#issuecomment-270652403)

EDIT: Looks like SHA-224 might be the best option, considering it shares
the same core but truncates the output, which makes it immune to
length-extension attacks.

~~~
tptacek
The advantage to 224 or 512/256 is that they don't expose the full register
state of the hash at output and thus, like Blake2 and SHA-3, aren't vulnerable
to length-extension attacks: you can (though it would be idiosyncratic to do
so) use a simple prefix MAC with them, rather than HMAC.
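
Concretely, with Python's hashlib as a sketch:

    import hashlib, hmac

    key, msg = b"\x42" * 32, b"attack at dawn"

    # Idiosyncratic but sound: a bare prefix MAC over a truncated-output
    # hash. SHA-224 withholds 32 bits of its 256-bit state; SHA-512/256
    # withholds 256 of its 512 bits.
    tag_prefix = hashlib.sha224(key + msg).digest()

    # With a full-width hash like SHA-256, you'd use HMAC instead:
    tag_hmac = hmac.new(key, msg, hashlib.sha256).digest()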

~~~
RcouF1uZ4gsC
With SHA-224, only 32 bits' worth of internal state are hidden. Isn't an
LEA "feasible" by generating the ~4 billion candidate states and finding
the one that validates?

With SHA-512/256 this is infeasible.

Is this a real concern with SHA-224 in practice?

~~~
tptacek
224 is 512/224.

~~~
RcouF1uZ4gsC
According to
[https://en.m.wikipedia.org/wiki/SHA-2](https://en.m.wikipedia.org/wiki/SHA-2)

There is SHA-224, which has a 256-bit internal state, and SHA-512/224,
which has a 512-bit internal state. In my previous comment I was referring
to the former.

~~~
ReidZB
This is how it's defined in FIPS 180, more or less (SHA-224 is effectively
SHA-256/224). This confusion is why I like explicitly stating SHA-256/x or
SHA-512/x.

------
aicez
The assessment that SHA-3's performance is bad completely disregards the
results from hardware benchmarking (FPGAs and ASICs), which have
consistently shown Keccak to be multiple times faster than SHA-2. While I
can't say the algorithm will be supported in HW, with enough usage it might
be.

Edit: Removed my comment regarding BLAKE as it was incorrect.

~~~
agl
By selecting primitives designed for hardware (AES and GHASH) rather than
primitives built from commonly applicable operations (ChaCha20 and
Poly1305), we've ended up with extra hardware to support AES and GHASH. But
it's not clear that was actually a good idea rather than just path
dependence.

ChaCha20, Poly1305, and BLAKE2 benefit from improvements that help a wide
range of applications, while SHA-3, AES and GHASH do not. Thus the "cost"
of high-performance support for the former can be amortised over a much
wider base.

~~~
aicez
I suspect that the extra hardware support is not a major concern, because
any company with a true security requirement will eventually need to
designate an area of its silicon for cryptographic purposes. This area will
be security-hardened to protect against side-channel attacks.

Also, sharing HW resources for cryptographic purposes is not possible for
any device that needs to pass certain security certifications.

Edit: Typo and additional comment

~~~
tptacek
It's already a problem. The most popular authenticated encryption construction
is AES-GCM, which is/was difficult to implement safely on popular mobile
platforms because their ARM ISAs required side-channel-prone table-based
implementations to be performant. We had to select, in protocols, between
Chapoly and GCM to get safety and performance on all those platforms.

Most vulnerabilities in cryptosystems happen in the joinery. Anything we can
do to eliminate joinery is going to make our cryptosystems more resilient.
Selecting _new_ primitives that will require hardware support to be performant
seems like an own-goal.

As someone who has done a number of audits for certified devices, I don't
think your statement about shared hardware is accurate. Are you talking about
FIPS 140?

~~~
aicez
That's the reason why I said "suspect" =). I do not claim to know the exact
reason for the selection. In any case, the standard is finalized. If you're
concerned about this being a problem for the next standard, which will
likely affect the use of AES-GCM, I suggest you participate in the current
cryptographic competition targeting authenticated encryption: CAESAR
([https://competitions.cr.yp.to/caesar-submissions.html](https://competitions.cr.yp.to/caesar-submissions.html)).
I'm not sure how this will affect the overall usage of authenticated
encryption in the industry, but it is currently one of the main topics of
interest for cryptographic researchers.

"As someone who has done a number of audits for certified devices, I don't
think your statement about shared hardware is accurate. Are you talking about
FIPS 140?"

Yes. Is my understanding incorrect? I'd like to be informed if this is the
case. Thanks.

~~~
tptacek
There are FIPS certification levels where shared hardware footprint is an
issue, but most commercial devices don't need to ship with that
certification.

I really don't care about what the standards say; thankfully, the important
standards, like TLS, aren't bound by what NIST standardizes.

------
simias
A rather minor point but:

>[The diversity of cryptographic primitives] contributes to code-size,
which is a worry again in the mobile age

In the "mobile age" we use embedded processors with gigabytes of RAM and
several times that in NAND storage; I don't really think a SHA-3
implementation is going to tip the boat over.

The author's other points seem very fair though.

~~~
agl
I can confirm that the code size of BoringSSL is something that we worry about
for mobile apps. A 50KB increase (which wouldn't be much for an optimised
SHA-3) would raise questions.

I admit that when I see the 100MB+ size of apps that I have to download, I do
wonder why. But I assume that it would be even worse were people not worrying
about this stuff.

~~~
Dylan16807
> A 50KB increase (which wouldn't be much for an optimised SHA-3) would raise
> questions.

Can you elaborate on that? That's orders of magnitude bigger than a naive
implementation, and wouldn't fit in L1 cache either.

------
richdougherty
> As I've mentioned before, diversity of cryptographic primitives is
> expensive. ... exponential number of combinations ... limited developer
> resources ...

It's a good point that diversity is expensive. There's a counter-argument
too: if we all decide to use one thing and it goes wrong, then we're in
serious trouble. If we have multiple approaches, the damage is lessened
when one of them turns out to have problems.

In other words, I think it's better to have a variety of approaches to manage
risk, even if it does impose costs.

~~~
tptacek
The track record on that design strategy is not good. We've been pursuing
designs with cryptographic diversity built in for several decades, and the
result has been decades of terrible vulnerabilities --- almost none of which
have been mitigated by diversity!

~~~
baby
Have we had some big crypto failures besides "everyone was using MD5" and
"everyone was using DES"?

~~~
tptacek
Yes: for instance, the TLS BEAST vulnerability, which affected all the block
ciphers in that version of TLS, because the protocol standardized
interchangeable ciphers and a message encryption construction based on CBC as
separate components.

Ironically, when the RC4 attack refinements were published, for a while
sites were forced to choose between a vulnerable CBC implementation and the
vulnerable RC4 cipher.

With the exception of MD5 in X.509 certificates and RC4 in TLS, has there been
another cryptographic vulnerability that required the retirement of a cipher?
SWEET32 is about as close as you come to that, right?

And, obviously, there have been many more problems, even just in TLS, than
X.509 MD5 and RC4.

~~~
baby
Wasn't that more because we hadn't really thought of AEAD constructions? I
mean, we didn't even know we had to MAC after encrypting.

If we'd had more stream ciphers at the time, we wouldn't have been stuck
with BEAST or RC4.

IMO SWEET32 was just the weak but well-marketed cherry on top to get rid of
the previous generation. We'd just had the SHA-1 collision.

~~~
tptacek
We had plenty of good ciphers available in TLS when this happened, but because
the BEAST bug was in joinery and not in the cipher itself, it didn't help.
Cipher agility didn't help, and in fact probably hurt us.

~~~
niftich
Which "good" ciphersuites [1] were usable in TLS 1.0 when BEAST was made
public in June 2011? (BEAST affected only TLS 1.0.)

TLS 1.0 per RFC 2246 defined DES40_CBC, DES_CBC, 3DES_EDE_CBC, RC4_40,
RC4_128, and IDEA_CBC.

RFC 4132 added CAMELLIA_128_CBC and CAMELLIA_256_CBC in 2005, RFC 4162
proposed SEED_CBC in 2005, and RFC 6209 in April 2011 informationally
specified ARIA_128_CBC, ARIA_256_CBC, ARIA_128_GCM, and ARIA_256_GCM, but the
GCM ones only work in TLS 1.2 or later. Same story with CAMELLIA_*_GCM.

Informational RFC 5469 in 2009 deprecated the use of DES and IDEA
ciphersuites, so that left RC4, 3DES_EDE, the Korean ciphers SEED and ARIA,
and CAMELLIA, with the block ciphers in CBC. Pretty much no one supported the
latter three, and when Mozilla rejiggled the ciphersuites in 2013, support for
these was removed entirely [2].

Your point still stands about BEAST being a flaw in the glue rather than
the ciphers themselves, but I believe the parent poster makes a stronger
point about the complete lack of AEAD constructions before TLS 1.2, and the
lack of other, more self-contained stream ciphers in TLS 1.0 that would
have been unaffected.

[1] [https://www.iana.org/assignments/tls-parameters/tls-parameters.xhtml#tls-parameters-4](https://www.iana.org/assignments/tls-parameters/tls-parameters.xhtml#tls-parameters-4)

[2] [https://news.ycombinator.com/item?id=12393683](https://news.ycombinator.com/item?id=12393683)

~~~
dfox
I feel that tptacek's point is that excessive cipher agility in TLS
precluded the adoption of AEAD (and "unconventional" modes in general),
because it was built on the assumption that the interchangeable primitive
is the block cipher itself (i.e. a keyed primitive that transforms an n-bit
input into an n-bit output) and not the combination of cipher+mode (which
transforms arbitrary-length input into some, preferably longer, output).
You cannot plug an AEAD into a framework that expects a block cipher and a
MAC.

(Given how all of this is actually implemented, it was not really a
technical problem, but a problem of "design culture" or something like
that.)
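
In interface terms (hypothetical type sketches, just to show the mismatch):

    # The shape TLS 1.0-era code assumed: separate, swappable slots.
    class BlockCipher:
        def encrypt_block(self, block: bytes) -> bytes: ...  # n bits -> n bits

    class Mac:
        def tag(self, msg: bytes) -> bytes: ...

    # An AEAD doesn't decompose into those slots: it is one keyed object
    # mapping (nonce, aad, plaintext) to ciphertext-with-tag, so it can't
    # be wired into a framework that composes cipher and MAC itself.
    class Aead:
        def seal(self, nonce: bytes, aad: bytes, pt: bytes) -> bytes: ...
        def open(self, nonce: bytes, aad: bytes, ct: bytes) -> bytes: ...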

------
hackcasual
The SHAKE family of functions (and cryptographic sponges in general) are very
cool, but like the article says, SHA-3 is weakened by failing to focus on
performance. If SHA-2 is sufficiently secure (which looks more and more the
case) then a faster SHA-2 would have been a better way of looking at what was
wanted out of SHA-3.

Faster primitives means more PBKDF rounds, more usage of hashes to secure
underlying data.

~~~
tptacek
If you're concerned about the primitives making up your password hash, you
shouldn't be using PBKDF2 to begin with; better to use a purpose-built KDF
like scrypt, bcrypt, or argon2.
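
e.g., with the scrypt binding in Python's hashlib (parameters illustrative,
not a recommendation):

    import hashlib, os

    # Purpose-built password KDFs expose memory and parallelism knobs;
    # iterated fast hashes like PBKDF2-over-SHA-2 only have an iteration
    # count, which cheap ASICs parallelize away.
    salt = os.urandom(16)
    key = hashlib.scrypt(b"correct horse battery staple", salt=salt,
                         n=2**14, r=8, p=1, maxmem=64 * 1024 * 1024,
                         dklen=32)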

------
dreamthtwasrome
Speed isn't a black-and-white panacea property; the cost of brute-forcing
(RAM, CPU, time and money) and resistance to other attacks should also be
prioritized for a given use-case (e.g. argon2, scrypt). Fixating on a
single, fast algorithm and making it a near-universal monoculture makes it
easier for well-funded state and corporate entities to reduce their costs
to generate a hash collision or do something else nasty, because they only
have to invest in one kind of possibly cheaper/simpler ASIC building block.

Having several different but thoroughly sensible algorithms whose
strengths, weaknesses and limitations are well known is a better approach
than saying "everyone use X" a la NESSIE. Users with the Paradox of Choice
going on can pick the lunch special, AES.

------
rihegher
In short: "SHA-3 should probably not be used. It offers no compelling
advantage over SHA-2 and brings many costs."

------
matt_wulfeck
Why don't we use hash functions that combine the outputs of two distinct
hashing functions?

It seems to me it's easy to break SHA or MD5, but what about breaking two
fundamentally different algorithms _at the same time_?

Implementation seems as easy as concatenating the two hex outputs into a
single string and comparing the results.

I'm curious what others think of this idea.
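
The construction really is that simple (sketch; the pairing of hashes is
arbitrary):

    import hashlib

    def combined(data: bytes) -> str:
        # Concatenation combiner: collide both hashes at once... in
        # theory. Joux's multicollisions (see the reply) show iterated
        # hashes make this far weaker than the sum of the two.
        return (hashlib.sha256(data).hexdigest()
                + hashlib.blake2b(data).hexdigest())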

~~~
agl
It turned out that breaking the combination of SHA-1 and MD5 is much easier
than people expected:
[https://www.iacr.org/archive/crypto2004/31520306/multicollis...](https://www.iacr.org/archive/crypto2004/31520306/multicollisions.pdf)

~~~
tptacek
Shameless plug (this is one of Sean's, though):

[http://cryptopals.com/sets/7/challenges/52](http://cryptopals.com/sets/7/challenges/52)

------
baby
I believe this is a good complement to agl's post, with a focus on K12
instead:
[https://cryptologie.net/article/393/kangarootwelve/](https://cryptologie.net/article/393/kangarootwelve/)

------
yuhong
The preimage strength debacle (NIST's mid-process proposal to reduce
Keccak's capacity, and with it its preimage resistance) is probably worth
mentioning.

