Hacker News new | past | comments | ask | show | jobs | submit login
Maybe Skip SHA-3 (imperialviolet.org)
241 points by kungfudoi on May 31, 2017 | hide | past | favorite | 182 comments

> SHA-3 did introduce something useful: extendable output functions

Much more than that!

Keccak (SHA-3) introduced the Sponge construction. This goes _way_ further than extendable output functions: It allows you to build all the symmetric cryptographic primitives from one basic element! This means: hashes, PRNGs, MACs, encryption, AEAD, etc. The only thing you need that looks like crypto code is a single permutation function. This significantly reduces the amount of security critical code to write. If you talk about hardware implementations, the permutation function takes less die space than SHA2, and optimizes all these primitives.

The sponge construction and all it's derivatives are proven secure assuming the permutation function is. And assuming SHA3 is secure, the permutation function is too. The alternative, the Merkle–Damgård construction, has a bunch of known problems. It's also used by pretty much every thing else, so it made sense to diversify.

Honestly, I'm a bit sad by the reputation SHA-3 is getting. It is an eye-opening elegant design that makes all previous crypto look overly complex.

If I may (hopefully accurately) summarise your argument: the sponge construction means that we could just implement a single permutation in hardware / optimised software and build everything from it, thus saving die area / code size, and complexity.

But deciding to skip SHA-3 doesn't preclude any of that. SHA-3 is the hash standard itself, and even the Keccak team appear to be pushing K12 rather than SHA-3. It seems unlikely that a full set of primitives built around the Keccak permutation would choose to use the SHA-3 parameters at this point.

Indeed, SHA-3 adoption might inhibit that ecosystem by pushing towards those bad parameters. (I think there's wide agreement that SHA-3 is too conservative.)

On the other hand, you might want SHA-3 because you believe that it'll result in hardware implementations of the permutation and you hope that they'll be flexible enough to support what you /really/ want to do with it. I'm not sure that would be the best approach even if your goal was to move to a permutation-based world. Instead I would nail down the whole family as you would like to see them and use small chips to try and push it. K12 is probably a step in the right direction.

Yes, I'm much more rooting for the Sponge construction than Keccak/SHA3 specifically. I'd like to see any permutation function get broad hardware support.

> I think there's wide agreement that SHA-3 is too conservative.

As you probably know, SHA3 originally had 18 rounds, but this was raised to 24 during the competition. This of course hurt performance. In some early proposals for a stream cipher, the authors reduced the rounds to 12 to get better performance.

They split the rounds in half (down to 12 rounds) for Kangaroo Twelve: https://cryptologie.net/article/393/kangarootwelve/

I agree with you. But at the same time I wouldn't toss SHA-3 which is the answer for many applications. If you're agile and you know what you're doing sure. But for people who don't know what they are doing, following the NIST standard is a safe decision.

In 20 years maybe we'll get hardware support for keccak-f. In the mean time we can get excited at the idea of most of your crypto primitives (kdf, prng, aead, hash, ...) being built on top of a single, simple, short and well audited implementation of keccak-f.

SHA-3 is currently a reasonable choice for effectively zero real world use-cases. SHA-2 has many advantages over it, and I can't think of a single disadvantage.

SHA-3 has practical value as a moderately well-analyzed cipher in the event of rapid catastrophic cryptanalysis of SHA-2, but barring that kind of event its real value is in the general theoretical advancement of sponge constructions.

I think your mis using "world use case". It's a hash function and so it is usable as a hash function.

I'll repeat what was said elsewhere. SHA-2 of vulnerable to length extension attacks.

I said it's not a reasonable choice in the real world. Virtually anywhere you could use it, you would be better off be using a SHA-2 variant instead.

Length extension attacks only apply in cases where you've chosen to use the wrong cryptographic primitive (a hash) for your purpose (a MAC).

Being resistant to common misuses is indeed a feature in real world cryptography. We saw that with DSA where you leak the private key if you happen to reuse a nonce; yes, you shouldn't do that, but it's better to choose an algorithm that doesn't suffer from this in the first place.

I'm not rooting for SHA-3; I'm just saying that it's wrong to dismiss length extension attacks only because they're commonly effective agains misuses of hash functions.

As tptacek points out elsewhere, if this is a concern for you, just use SHA-512/256. It's just as misuse-resistant, practically guaranteed to be in any library that would bother to implement SHA-3, faster by a mile, and has been the target of significantly more cryptanalysis.

There are zero practical reasons to choose SHA-3.

> Virtually anywhere you could use it, you would be better off be using a SHA-2 variant instead

I don't see why, unless you're google and you care about small performance improvement in hash implementations. My point of view is that SHA-3 (or Blake2/K12) are THE hash functions you should use virtually anywhere.

> Length extension attacks only apply in cases where you've chosen to use the wrong cryptographic primitive (a hash) for your purpose (a MAC).

Two things:

1) so what? People actually use it this way. Are you just going to say "oh they're using it wrong there is nothing we can do about it"?

2) why is it wrong to use a hash to create a MAC? It is the most logic thing to do imo. The H(key | data) makes total sense to me. It is only when you know SHA-2's internals and their weaknesses that you realize you can't use it like that.

A cryptographic operation making sense to you doesn't make it reasonable. H(password) makes sense to a lot of people, but we're not going to sub in Argon2 for every use of a hash function as a result.

I'm with you on BLAKE2 as a solid choice, but at least it has an advantage in some areas over SHA-2 (speed, margin of security). SHA-3 has no advantages over SHA-2 (particularly SHA-512/256, which is going to be available in any library that would bother to implement SHA-3).

SHA-3 is slower, has been the result of less cryptanalysis, has less battle-tested implementations, and is less widely supported. Practically the only thing it has going for it is a higher number.

You will be hard-pressed to find a respected cryptographer advocating general adoption of SHA-3 — being in the field, I'd wager you'll find 100 who encourage developers to choose SHA-2 for every one who backs SHA-3. Case in point, in this thread alone you have Colin Percival and Thomas Ptacek coming to the defense of SHA-2, not to mention Adam Langley authoring the original post.

If your hash function behaves like a random oracle it should allow you to do h(key|data) hashing password requires more than a random oracle because of the input small entropy so your argument is moot.

The fact that sha-3 has received less analyzis has already been debunked Here:https://twitter.com/aarontoponce/status/869973259969675264 .

K12 has the advantage of the speed. See https://cryptologie.net/article/393/kangarootwelve/

Appeal to authority does nothing. Plenty of people are actually supporting or making use of keccak/sha-3 already. To begin with NIST's cryptographers chose it. David Leon Gil was into it. People talked about heavy usages of SHA-3 in Ethereum or Chain here. I can tell you that one of the originator of SHA-3 created AES (Daemen). Jp aumasson (creator or blake2), samuel neves and philipp jovanovic are using sponges woth NORX. Mike Hamburg is building Strobe based on keccak. I can name drop as well you see.

Plus I'll say that I support it as a wannabe cryptographer. That must count for something :D

> hashing password requires more than a random oracle because of the input small entropy so your argument is moot.

As I pointed out, SHA-512/256 allows you to safely do H(key|data).

But your original argument was along the lines of "H(key|data) makes sense intuitively. You have to understand the details in order to realize it's unsafe. Therefore a hash that allows H(key|data) is better".

This same line of reasoning, without modification, works to argue in favor of deploying Argon2 any place a hash is used. It's safe to use in that way, it additionally is safe to use for H(password), and you accept the exact same list of downsides.

Misuse-resistance constructions are important. But the problem with H(key|data), as both I and cperciva have pointed out, is that you've chosen an entirely inappropriate cryptographic primitive for the task at hand. A glaring indicator of this is stuffing multiple concepts into one function argument. A block cipher analogue would be something like AES-ECB(key, iv|plaintext). If you're cramming things into the wrong arguments of cryptographic primitives, you're generally going to have a very, very bad time.

That said, even by this criteria, SHA-512/256 is on the same footing as SHA-3. And has none of the downsides. For the third time, SHA-3 has no advantages here whatsoever.

> The fact that sha-3 has received less analyzis has already been debunked Here:https://twitter.com/aarontoponce/status/869973259969675264 .

There is nothing in that thread that even remotely suggests that the 9 years of analysis Keccak has received as a candidate for SHA-3, and then as a virtually-unused choice for SHA-3 carries the same weight as the 16 years of analysis we have of SHA-2 having been widely deployed. Not to mention the comparatively short amount of time we've been cryptanalyzing Sponge functions compared to the virtually 40 years we've had to understand Merkle-Damgård.

> Appeal to authority does nothing.

I am not arguing that they are right because they are authorities. I am pointing out that well-respected authorities are also in this thread and also making compelling arguments against generalized use of SHA-3.

Ahem: "It is well known as a fallacy, though it is most often used in a valid form."[1]

> NIST's cryptographers chose it.

As a backup, in the event of catastrophic cryptanalysis against SHA-2. From a NIST employee on NIST's own press release[2] announcing the SHA-3 winner:

    "SHA-3 is very different from SHA-2 in design," says
    NIST's Shu-jen Chang. "It doesn't replace SHA-2, which
    has not shown any problem, but offers a backup. It
    takes years to develop a new standard, and we wanted to
    be prepared in case problems do occur."
As the SHA-3 competition went on and SHA-2 continued to resist the analysis that sunk SHA-1, NIST changed the criteria they were using to judge entrants. Instead of looking for a hash function that was all-around an improvement, they heavily prioritized looking for a hash function that was structurally different from SHA-2 (this is, in my opinion, essentially the only reason BLAKE2 wisn't chosen). SHA-3 won, not because it is better than SHA-2. It won because it was good enough while being different from SHA-2.

Being different from SHA-2 is not a sufficient reason to use it for general purposes today.

> People talked about heavy usages of SHA-3 in Ethereum or Chain here.

Having been chosen for Ethereum doesn't mean it was the best choice. It probably wasn't. Note that this is an actual argument from authority.

> I can tell you that one of the originator of SHA-3 created AES.

Irrelevant to whether or not you should actually be deploying SHA-3. Nobody's arguing it's insecure. We're arguing that it's always an inferior choice.

> Jp aumasson (creator or blake2), samuel neves and philipp jovanovic are using sponges woth NORX. Mike Hamburg is building Strobe based on keccak.

Again, irrelevant to the choice of SHA-3. Sponges are cool. I'm looking forward to more widespread application of sponges in the future. People working on new ciphers built on top of sponges does not advocate for the use of SHA-3, a specific sponge.

> I can name drop as well you see.

You've named NIST, who is explicitly on record saying that SHA-3 is there as a backup. You've named a guy who was "into it", you sort-of-but-not-really named "people/" who've said that Ethereum uses it (though it's not actually used in Ethereum's proof of work function, so I'm not sure you could call that "heavy" use). You've named one of the authors of SHA-3, and you've named people who are building totally different things based on the same underpinnings, but I can find no evidence of any of these named people are actually encouraging anyone to deploy SHA-3 over SHA-2.

I would love for you to link me to a single location where one of these named people advocates that. I don't think you can.

[1]: https://en.wikipedia.org/wiki/Argument_from_authority [2]: https://www.nist.gov/news-events/news/2015/08/nist-releases-...

Even after 16 yrs, SHA-2 has actually not received so much cryptanalysis effort, the majorty of which comes from the same group, and the effort is further diluted into two different 32/64-bit designs [1]. Add to that the absence of openness from the SHA-2 designers themselves…

[1] http://eprint.iacr.org/2016/374.pdf

> SHA-512/256

Sure, if you know what you're doing :) In the mean time I will keep recommending SHA-3.

> the virtually 40 years we've had to understand Merkle-Damgård

I think at this point we've understand that M-D sucks.

> blabla

I don't see why I would go on on a name dropping battle :)

I'll sum up our battle here: some people are advocating SHA-3, some people are not.

I'm in the first category. You're in the second. Your only arguments are that SHA-3 is "an inferior choice" and "big name is against SHA-3". Let's see how this pans out.

As someone neutral about the subject, my summary of the discussion looks a bit different.

stouset claimed that there is no practical advantage of using SHA-3 over SHA-512/256. He gave three main arguments: 1. SHA-3 is slower. 2. SHA-3 was subject to less cryptanalysis. 3. Virtually all respected cryptographers do not recommend SHA-3 over SHA-2.

On the other hand, baby claimed that there are practical advantages of using SHA-3 over SHA-2. However, even after reading the thread twice, I couldn't find out what they are supposed to be. (I'm leaving out the LEA here, since it does not affect SHA-512/256 either.)

Baby claimed that argument 2 was debunked, but the supporting link pointed to a thread that stated nothing to that effect. He also claimed that argument 3 carries no weight since it's an appeal to authority. While an appeal to authority is not a valid scientific argument, it's about the best practical argument you can make as a non-cryptographer when choosing cryptographic algorithms.

This prompted me to post my thoughts here: https://cryptologie.net/article/400/maybe-you-shouldnt-skip-...

SHA-2 of vulnerable to length extension attacks.

SHA2 is vulnerable to length extension attacks the same way as bread is vulnerable to being buttered. If you're using bread and you're worried about buttering, you fundamentally misunderstand what bread is.

I think you lost everyone here :D

If length extension attacks are a problem, you're using a hash function for something which a hash function is not intended to be used for; you probably wanted a MAC instead.

1. Yes. And people do that. I'm nkt talking about me. I'm talking about things I see in my work (I'm a consultant).

2. People should be able to do that. If you're expecting your hash function to behave somehow like a random oracle there is no reason that h(key|data) doesn't work. I see people using sha-2 to derive keys all the time. Even this is unsafe depending on how you do it.

Why must MAC mean "HMAC RFC2104?"

It doesn't. I said MAC, not HMAC.

Hash functions have a pigeonhole principle problem, but that is also the point.

... if you use full-width SHA-2 256 or SHA-2 512. If you have enough design freedom to choose SHA-3, just use SHA-2 512/256.

Replying here because you said most of the things I was going to say. This article just makes me sad -- Keccak was chosen for SHA-3 specifically because it turned out SHA-2 wasn't as broken as we feared, and although slow Keccak brings to the table cryptographic diversity and the slew of cool things that can be done with a sponge construction that are now possible using standard cryptographic primitives. NIST made clear that by the time of finalist selection, the point of SHA-3 was NOT to replace SHA-2. OP should know better.

But NIST didn't make that clear. The thing they made clear is in fact the exact opposite and the OP mentions it. They called it SHA-3, saying to the world that this was a better SHA-2. If they wanted to make clear that SHA-3 wasn't to replace SHA-2, they should really have named it something else. Now everybody not knowledgeable to the details will sadly assume the 3 is better than the 2.

3 doesn't imply better, just n+1.

People who aren't sweating the details aren't the intended audience for NIST.

They said it explicitly ehen they announced the winner.

It is better.

NIST made clear that by the time of finalist selection, the point of SHA-3 was NOT to replace SHA-2. OP should know better.

That's exactly what this post is about. That we shouldn't replace SHA-2 with SHA-3, just what NIST ordered :)

None of these are good reasons to build SHA-3 into a design today, which is the point of the post. He's not criticizing SHA-3's design; pretty much everyone thinks it's interesting and valuable. The point is that the resulting standard doesn't have a lot of immediate practical benefit for real systems.

I don't think that's true. TuppleHash is THE solution for fingerprints. SHAKE is useful for really a lot of different purposes. Imo self made protocols could really benefit from STROBE today as well.

And all the SHA-3-derived functions in SP 800-185. For instance, ParallelHash is super fast and a NIST standard.

A good and simple example of a Keccak benefit is the KMAC, which is a lot simpler than the construct HMAC needed by the Merkle-Damgård family of hashes.

But the article is right that newer crypto, or more crypto, is not necessarily a good thing. It is also true that there is no need for people to rush to Keccak - SHA-256 and SHA-512 are not broken. It is mostly an interesting function for development of new cryptographic constructs.

It is simpler, but how meaningful is that in practice? HMAC is universally available in crypto libraries, far more so than SHA-3 is, and since HMAC needs to interoperate (even more so than ciphers do), it's practically never botched in code.

I sort of object to the idea that HMAC is complicated, also. It's more complicated than KMAC which barely deserves a name, but any intern can implement HMAC reliably from the diagram on the Wikipedia page. Hash twice, domain-separating each hash with a simple constant. Done.

The only implementation flaw stemming from HMAC in real systems is shared by KMAC (variable-time comparison functions).

People still occasionally implement HMAC as `H(K||msg)` or `H(msg||K)`, use defective AES modes, use IV's and nonces inappropriately, use ARC4, use MD5, and use SHA-1. Pretty sure a quick Wikipedia read would have stopped Adobe from DES encrypting passwords as well.

Simpler means less likely to misunderstand or fuck up, which is a good property to have.

I did also point out that it is mostly interesting for development of new constructs. There's very little point in replacing a working, properly implemented HMAC with a KMAC at this point.

... In the future, however, Keccak may be used for more constructs (aead, for example), at which point a shared "core" for all the crypto you use would be an interesting thing to have, reducing the amount of trusted constructs, sharing optimizations (handwritten asm or hw), and so forth. At that point, migrating would be a sensible thing to do.

I'll add that Keccak brought the duplex construction as well! Root to keyak, ketje and Strobe!

Another advantage is that SHA-3 just uses AND, XOR, ROT, NOT, but not SHR (logical shift right) which allows for much easier formal analysis of the algorithm!

Could you comment(or provide a link) on why SHR is more difficult for formal analysis than ROT?

Which is ROT(ate), one of the listed options, even though you have to express it with shifts when writing C.

I'm not sure I understand what you two mean.

1) How does not having SHR makes it easier to audit?

2) how are sha-3 implementations actually using SHR not making the parent's comment moot?

A rotate operation is invertible, a logic shift by itself is not.

Another fun one is x ^= x >> n with n some strictly positive constant, this is also invertible. It's used a lot in the murmurhash family of non-cryptograhic hash functions.

Why does invertibility matter? AND and OR are not invertible, but they are used?

Here's an answer: https://crypto.stackexchange.com/questions/47872/why-is-kecc...

1) it forces you to allocate more memory (to do the Davies-Mayer construction)

2) it complicates the security analysis because you get collisions (if you do not use a permutation, it means you have an injective construction)

And more from http://sponge.noekeon.org/CSF-0.1.pdf section 8.1.3

screenshot here: http://i.imgur.com/4xncceB.png

Indeed, the whole permutation is invertible and I'm not sure why either.

Permutations are always invertible. Otherwise several different inputs will be mapped to the same output, which needlessly reduces your state space.

The inverse is also required for decryption: If you encrypt as ciphertext = permutation(plaintext + key), decryption is plaintext = inverse(ciphertext) - key.

My question is why does Keccak not use a one-way function instead of a permutation.

> The inverse is also required for decryption: If you encrypt as ciphertext = permutation(plaintext + key), decryption is plaintext = inverse(ciphertext) - key.

This is not how encryption/decryption works with Keccak. Keccak is used to create a stream (it is then XORed with the plaintext or ciphertext).

GP disputed SHR is used in the snippet you highlighted.

Fwiw, Ethereum uses Keccak for pretty much all hashes other than the proof of work (which uses an ASIC-resistant function).

> It allows you to build all the symmetric cryptographic primitives from one basic element! This means: hashes, PRNGs, MACs, encryption, AEAD, etc

Isn't this true for most cryptographic hash functions? MAC with HMAC or H(key || m), encryption with a CTR-like mode on top of the MAC, etc.

Observe also that Chapoly, which is built on a stream cipher generated from a hash running in counter mode, written by basically one of the pioneers of ciphers-from-hash-constructions, does not use its hash core as its MAC; it's simply faster to use a polynomial MAC.

With a sponge construction the AEAD is a one-pass operation. You feed plaintext in, out comes ciphertext. In the end you crank it one more time and out comes the MAC/tag.

This is unlike other constructions, where you need to go over the data twice, with two different crypto algorithms (encryption + mac) that may or may not share a core component.

OCB does the same thing, without a sponge construction.

This is interesting, I've always seen it as bad because you had to decrypt first, and then you could verify the tag. But now I understand that this also allows you to have a one-pass operation.

True, the symmetric primitives are sort of interchangeable, but I'd like to see how to build an AEAD using hash functions that doesn't require you process the data twice. I don't think it can be easily done, if at all. It certainly won't do to combine a HMAC with a CTR-mode encryption.

Keccak allows for very simple and efficient implementations of all these primitives.

> H(key ‖ m)

This is insecure for SHA2 (length extension). However, it is secure when done with SHA3. In fact, it's pretty much how you construct a MAC from it, that's how natural it is.

HMAC was invented specifically to deal with the shortcomings of SHA2 and predecessors. For reference, it is H(key ⊕ opad, H(key ⊕ ipad, m)), where opad = 0x5C… and ipad = 0x36…. Compare that with the above and you'll see what I mean with simple and efficient.

Why can't you just use a prefix MAC with SHA-2 512/224?

You can, and you just invented the Sponge construction in the process! :)

The key idea is to only expose part of the state to inputs/outputs.

Sure, but SHA-2 512/224 predates SHA-3 and the Sponge construction by, what, a decade and a half?

And prefix MACs predate all these things by multiple decades (they just happen to be insecure in full-width-output MD hashes).

I don't think the ability to use Keccak as a simple prefix MAC is all that gamechanging. You can hash data once --- and much, much faster --- by using SHA-2.

This is an example of a wide-pipe hash function, not a sponge.

I think a block cipher allows you to build a lot. Cmac for mac, a block cipher is at the center of sha1/2, aes-ctr for prngs, ... But it's rather that it's used as the center piece of some algorithm whereas doing anything with keccak is just natural.

More than that, a block cipher is a keyed permutation — that is, many permutations selected by the key, soooo by fixing the key we can have an unkeyed permutation and use it for sponges!

Or... BLAKE2 uses ChaCha permutation, but includes additions of key words to turn it into a block cipher, and then uses this block cipher in a MD-like construction. Metamorphosis...

Of course, if you take some block cipher, fix the key, and make a sponge from it, nobody guarantees that it will be secure. Just like trying to build a block cipher from some permutation will not result in anything good. So it's not on the same level as building higher-level secure things (such as PRNG) from a primitive.

I think what I'm trying to say is: in crypto you can indeed build many things from other things.

SHA-3 does seem to have relatively little to offer by way of incentives to switch. "Just as good" isn't motivation, and any notions of higher cryptographic strength haven't been extensively discussed. "Easier to implement in hardware" will be more compelling when such hardware exists.

I'm curious about the statement that SHA-3 is slow; it links to https://www.imperialviolet.org/2016/05/16/agility.html , which doesn't seem related, and matches the previous link. I wonder if that was supposed to link to somewhere else, like http://bench.cr.yp.to/results-sha3.html (as linked from https://www.imperialviolet.org/2012/10/21/nist.html )?

From that, SHA-3 certainly doesn't run significantly faster than alternatives (variants of BLAKE do indeed outperform it), but it seems roughly on par with SHA-256/SHA-512. But "on par" doesn't give any incentive to switch.

I wonder how much relative attention the SHA-3 winner (Keccak) gets compared to other alternatives, like BLAKE?

Gah, thank you. I did, indeed, mean to link to bench.cr.yp.to. Fixed.

(The BLAKE2 page (https://blake2.net/) has a graph too.)

> "Easier to implement in hardware" will be more compelling when such hardware exists.

Most things are software; from previous experience we know that it basically takes 10 to 15 years between introduction of a primitive (AES and SHA-2 both standardised around 2000; ISA extensions for mainstream CPUs introduced in 2010-2017). From the software PoV SHA-3 would be a rather large regression in performance with a "maybe it's fast by 2030 if Intel is really nice" attached. BLAKE2 on the other hand is an improvement in performance and offers a function that is more modern overall. (E.g. not length-extensible thus low-overhead keyed hash usage, flexible output sizes, tree hashing built-in)

Yes, perhaps before they choose the next "faster in hardware" crypto algorithm, they also get a commitment from the major chip makers (Intel, AMD, Qualcomm, MediaTek, etc) that they will implement them within let's say 2 years after the contest is over.

Otherwise it's just hope that the chip makers will implement it when the algorithms are chosen, and I'm not sure that's good enough after a carefully orchestrated 5-year process of choosing a next-generation algorithm.

If they can't get that commitment, then they should choose whatever is faster in software (granted all the other factors are more or less equal).

>I'm curious about the statement that SHA-3 is slow; [...] I wonder how much relative attention the SHA-3 winner (Keccak) gets compared to other alternatives, like BLAKE?

Coincidentally, I ran a bunch of hash performance benchmarks last week. These were my findings:

  test:  hash a 500MB block of memory.
  hardware:  Intel Core i7-5820K Haswell-E 6-Core 3.3GHz
  compiler:  MSVC2017 (19.10.25019), 32-bit exe:
    blake2sp - official reference code[1]     153MB/sec
    SHA3 - Keccak official reference code[2]   12MB/sec
    SHA3 - rhash sha3[3]                       45MB/sec
    SHA3 - Crypto++ library v5.6.5[4]          57MB/sec
    SHA256 - Crypto++                         181MB/sec
    SHA256 - MS Crypto API[5]                 113MB/sec
    SHA1 - MS Crypto API                      338MB/sec
    MD5 - Crypto++                            345MB/sec
    CRC32 - Crypto++                          323MB/sec
The conclusion is that the fastest SHA3 implementation (Crypto++ lib with its assembly language optimizations) is more than 2x-3x slower than SHA256. I can't speak for SHA3 implemented in FPGA/ASIC but as far as C++ compilation targeting x86, it's slow. I've been meaning to try the Intel Compiler to see if it yields different results but haven't gotten around to it yet.

Blake2sp is fast. The official reference code is not quite as fast as Crypto++ implementation of SHA256 but it's faster than Microsoft's Crypto API of SHA256. (There are several variants of BLAKE and I chose blake2sp because that's the algorithm WinRAR uses. I think the specific variant of BLAKE that directly competed with Keccack for NIST standardization is slower.)

[1] https://github.com/BLAKE2/BLAKE2

[2] http://keccak.noekeon.org/files.html

[3] https://github.com/rhash/RHash/blob/master/librhash/sha3.c

[4] https://www.cryptopp.com/

[5] https://msdn.microsoft.com/en-us/library/ms867086.aspx

I'll just say this - there are private implementations of Keccak that blow the pants off of those numbers.

Keccak is one of the primary functions in Ethash (Ethereum mining). It is heavily researched and completely destroys SHA2 performance-wise if you have the right implementation.

Also, Keccak doesn't require the construction of a key-schedule, and can be implemented much more elegantly in parallel (SIMD software) and in hardware than SHA2.

I guess 32-bit x86 performance is maybe not the best benchmark. I think people aren't optimizing for that ISA to the same extent as they are optimizing for x86-64, 64-bit ARMv8, or 32-bit ARM.

If you care about performance and you don't have dedicated SHA-256 instructions then on a 64-bit platform you should evaluate SHA-512 as it is much faster. If you only have 256 bits of storage available then truncate its output to 256 bits. IIRC, it's about 1GB/sec on my Haswell laptop.

>I guess 32-bit x86 performance is maybe not the best benchmark.

I compiled for 32bit instead of 64bit because I wanted the same executable to also run on a 32bit Macbook. When Thomas Pornin ran benchmarks[1] in 2010 for both 32bit & 64bit, the SHA256 hash performance didn't change as much as the SHA512. I'll recompile for 64bit and report back if there was a massive difference.

[1] https://stackoverflow.com/questions/2722943/is-calculating-a...

SHA-3 lanes are 64 bits. In 64 bits arch they can use full registers for operations. I'd bet it would be way faster on 64bit

Mismatches can show an interesting property. It is likely that SHA-256 is slower than Keccak on 64-bit platforms, and that SHA-512 is slower than Keccak on 32-bit platform.

What about cShake or better KangarooTwelve? (K12 is to SHA-3 what blake2 is to blake).

Hmm, something is wrong with your benchmark: the absolute values in MB/s are very low for i7, and relative too — blake2sp should be much faster than SHA256.

>blake2sp should be much faster than SHA256.

blake2sp is indeed faster than Microsoft's builtin Crypto API for SHA256. However, it is not as fast as Wei Dai's Crypto++ library implementation of SHA256 that has lots of hand tuned assembly language code.

The official C source code for blake2sp does not have assembly language primitives in it. It's very possible that if an assembly language expert wrote optimizations for blake2sp, it would beat Crypto++ SHA256 performance.

The code I used is really simple. Used files "blake2sp-ref.c" and "blake2s-ref.c" from the BLAKE website. The hash code (no loops) is:

  blake2sp_state S[1];    // BLAKE 1 element array of the struct for state
  blake2sp_init(S, BLAKE2S_OUTBYTES);
  blake2sp_update(S, BufInput, iBuffersize);
  blake2sp_final(S, hashval_blake2sp_bytes, BLAKE2S_OUTBYTES); 
... where iBuffersize is 500MB but I got the same results at larger buffer sizes 1GB+.

With that code, I'm guessing anyone can whip up a C++ project to benchmark blake2sp in 10 minutes. It would be interseting to see what MB/sec that others achieve.

My pure JavaScript implementation of blake2s (which is approximately half the speed of parallelized sp variant) on 2.6 GHz Core i5 hashes at 170 MiB/s. JavaScript! Whatever you do with your benchmark, you're doing it wrong. Also, there is no reason to use such large buffer sizes, I suspect this only makes benchmarks more unreliable.

For real numbers on many platforms, see https://bench.cr.yp.to/results-hash.html (warning: very large page)

For long messages on 2015 Intel Core i5-6600; 4 x 3310MHz:

    blake2b: 3.33 cycles/byte
    blake2s (remember, half speed of sp): 4.87 cycles/byte
    sha512: 5.06 cycles/byte
    sha256: 7.63 cycles/byte
This means BLAKE2s is close to 700 MB/s, and BLAKE2sp will hash data at more than 1 GB/s.

Here are the SHA-256 implementations measured: https://bench.cr.yp.to/impl-hash/sha256.html

As you can see, the winner varies by platform, but in most cases OpenSSL wins, and Wei Dai's implementation is close.

>My pure JavaScript implementation of blake2s (which is approximately half the speed of parallelized sp variant) on 2.6 GHz Core i5 hashes at 170 MiB/s.

If I recompile blake2sp with "/O2" optimization, it improves to 171MB/sec. I ran tests with "/Od" optimizations disabled because the default Crypto++ library project has optimizations disabled when it makes the lib file. Therefore, every hash had no optimizations to keep the comparisons apples to apples.

I see your Javascript code of BLAKE2 on github so I'll try it and compare. (https://github.com/dchest/blake2s-js)

>Also, there is no reason to use such large buffer sizes, I suspect this only makes benchmarks more unreliable.

I chose 500MB because I wanted the fastest hashes (CRC32, MD5) to take at least 1 second and many data sizes I want to hash will be 10GB+.

If I recompile blake2sp with "/O2" optimization, it improves to 171MB/sec.

Too slow! You're doing something wrong or measuring some slow implementation :) It should be more than 500 MB/s.

Therefore, every hash had no optimizations to keep the comparisons apples to apples.

That's not apples to apples at all. Most performant code is written specifically to be optimized by compiler. Use /O3 for benchmarking. Also, you just wrote that you were measuring a hand-optimized assembly version and then compared it to a C version compiled with optimization disabled? O_o

I chose 500MB because I wanted the fastest hashes (CRC32, MD5) to take at least 1 second and many data sizes I want to hash will be 10GB+.

Then call the update function with a reasonable 8K buffer many times. Using such large buffers will generate a lot of noise in benchmarks.

* * *

Speaking of JS, you can try my newer implementations/ports by cloning https://github.com/StableLib/stablelib. run ./scripts/build, cd into packages/blake2s (/sha3, /sha256, etc.) and run: node lib/.bench.js Note that SHA-3 (and SHA-512, BLAKE2b [not implemented yet]) is slow in JS compared to SHA-256, BLAKE2s, etc. because it uses 64-bit numbers, so in JS I have to emulate them by using two 32-bit ones for low and high bits.

* *

Benchmarks on Intel Skylake:


>Too slow! You're doing something wrong

I do now notice that my ASUS motherboard monitor software is reporting that my CPU is at 1.2GHz instead of 3.3GHz. There's probably something wrong there. However, even if I get it up to 3.3GHz, the relative speeds between different benchmarks won't change. I got the same relative numbers on the Macbook.

>measuring a hand-optimized assembly version and then compared it to a C version compiled with optimization disabled? O_o

Because there's lots of Wei Dai code that's C++ code instead of asm. He delivered his MSVC project with optimizations disabled instead of "/O3" so it made the most sense to start with optimizations disabled everywhere as a preliminary benchmark. If I recompile Wei Dai's code with optimization, it will make Crypto++ perform faster and make blake2sp look slower. In the end, it's a moot point because blake2sp with "/O2" is still slower than Crypto++ SHA256.

>Using such large buffers will generate a lot of noise in benchmarks.

Why? If you study the blake2sp source code, you'll see a loop inside the hash update() function to handle arbitrary sizees of buffers. Why does a loop outside that update() mean "less noise"? Why does adding more function calls of BUFSIZETOTAL divided by 8192 equal less noise?

If I recompile Wei Dai's code with optimization, it will make Crypto++ perform better and make blake2sp look slower.

Just do it. If it's slower, you're doing something wrong. Which implementation of blake2sp are you measuring? It should be at least 1.5x as fast.

Why? If you study the source code, you'll see a loop inside the hash update() function. Why does a loop outside that update() mean "less noise"? Why does adding more function calls of BUFSIZETOTAL divided by BLOCK PROCESSING SIZE equal less noise?

Because you'll be measuring a lot of memory copying time apart from hashing time. Dealing with memory outside of CPU cache introduces a lot of variance.

I suggest that instead of our discussion you should try to reproduce the results of https://bench.cr.yp.to (which is a highly trusted source, e.g. it was used during SHA-3 competition by NIST and participants): if something doesn't approximately match it means you did something wrong. In the process, you'll learn how to properly benchmark hash functions and make some sciency science by reproducing results! :-)

I think I found the issue. In terms of absolute (not relative) MB/sec performance, the main culprit is the slow cpu speed of 1.2GHz single threaded performance instead of 3.3GHz. The other investigations into "/O2 /O3" optimizations or 8k outer loop were red herrings.

However, for relative MB/sec performance comparison to SHA256, it seems to point back to the blake2 official reference code (non SSE) being very slow. Wei Dai Crypto++ also happens to have BLAKE2 algorithm and when I executed that, it ran at 525MB/sec which was faster than SHA256 and also faster than SHA1. No outer 8k chunk loop necessary for Crypto++ benchmark.

>Because you'll be measuring a lot of memory copying time apart from hashing time.

Yes, I notice the numerous memcpy() functions in the blake2s?-ref.c. For additional tests, I rewrote the loop to call update() on chunks and tried various sizes (8k, 16k, 32k, ... 256k, 512k, 1MB). At 256k chunks and below, I got 235MB/sec which was an improvement but still slower than SHA256. As stated above, the real key was to use an optimized BLAKE2 algorithm instead of the official reference code.

>you should try to reproduce the results of https://bench.cr.yp.to

I can't tell if the blake2 entries in https://bench.cr.yp.to are using official reference or optimized code so trying to replicate those results with official reference files may be a wild goose chase.

If your machine is throttling the CPU during benchmarks, all bets are off. You need to disable all power management and energy savings or else figure out why your machine is causing a throttle to kick in (likely thermal management). You can't compare any results (even relative to one another) until that issue is resolved.

The implementations measured are listed here:


Code mirror: https://github.com/floodyberry/supercop

Yeah, of course reference implementation is a lot slower — it's for algorithm reference, so has readable code, which is slow. It can even be greatly improved by just unrolling loops and inlining message indexing by sigma constants, even without SSE (like my JavaScript implementation).

As for memory copying — I didn't mean memcpy(), what I meant is that your CPU would have to get chunks of your huge buffer from RAM, which is slower and has more unpredictable performance.

>Yeah, of course reference implementation is a lot slower — it's for algorithm reference, so has readable code, which is slow.

Exactly! That's why my original post had the footnote that I using the slower blake2sp reference code. Same situation as the SHA-3 reference code being the slowest implementation.

>As for memory copying — I didn't mean memcpy(), what I meant is that your CPU would have to get chunks of your huge buffer from RAM, which is slower and has more unpredictable performance.

But this observation also applies to all the other hash performance tests. If the blake2sp hash is handicapped by computing large RAM buffers beyond the L1/L2/L3 caches, the MD5/SHA1/SHA256/etc are also handicapped the same way. Whatever "noise" exists in the tests can be evened out by multiple executions.

Restricting the tests to tiny memory sizes that fit in L1/L2/L3 is not realistic for my purposes.

There is AVX/AVX2 assembly for BLAKE2 at https://github.com/minio/blake2b-simd , benchmarked at a 1.8x-3.9x speedup.

It'd be interesting to see what SHA2-512 would do since I think modern intel CPUs prefer that size to 256. (I could be wrong though as i have no source to cite just something I remember reading.)

You're not wrong. Type this into Terminal:

     openssl speed sha512 sha256

tl;dr sha512 outperforms sha256 as message size increases.

pastebin output for lack of decent formatting:


on a linode system with 2 xeon E5-2680v3 cores @ 2.5ghz and 4GB of ram on this version and config of openssl on ubuntu server 17.04:

OpenSSL 1.0.2g 1 Mar 2016 built on: reproducible build, date unspecified options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx) compiler: cc -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -g -O2 -fdebug-prefix-map=/build/openssl-p_sOry/openssl-1.0.2g=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DMD32_REG_T=int -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM

You might also want to test out BLAKE2bp, which is optimized for 64-bit platforms, while BLAKE2sp is for 32-bit platforms. So the numbers for BLAKE2 will be even higher

I am quite surprised that CRC32 is slower than MD5.

> I think the specific variant of BLAKE that directly competed with Keccack is slower

BLAKE is slower than BLAKE2 indeed. BLAKE also has more rounds than BLAKE2.

CRC32 is slower because implementing it in software a way that exploits modern CPU's instruction level paralelism is non-trivial, also the obvious way how to implement CRC32 on 32bit byte oriented CPU isn't exactly cache friendly.

Are you referring to Intel's Slicing by 8 algorithm[0].

[0] https://static.aminer.org/pdf/PDF/000/432/446/a_systematic_a...

Daemen talked about some figures for KangarooTwelve during RWC 2017:

* Intel Core i5-4570 (Haswell) 4.15 c/b for short input and 1.44 c/b for long input

* Intel Core i5-6500 (Skylake) 3.72 c/b for short input and 1.22 c/b for long input

* Intel Xeon Phi 7250 (Knights Landing) 4.56 c/b for short input and 0.74 c/b for long input


I recently came to believe (correct me if I'm wrong) that the recommended XOFs SHAKE128 and SHAKE256 are actually nice and fast, but often when people talk about SHA-3 they focus on the slower "drop-in replacements" like SHA3-256 which are quite conservative.

Yes SHAKE is faster than SHA-3 because of "better" parameter choices.

This page has an interesting speed graph: http://kangarootwelve.org/

> SHA-3 did introduce something useful

To me the most useful part of SHA-3 is that you don't need to use HMAC as it is not vulnerable to length extension attacks. Meaning that it's much faster than SHA-2 when used as a MAC construction.

That's a good point, but the truncated SHA-2 variants (512/224 and 512/256) are also not susceptible to length extension, and are faster than SHA-3. BLAKE2 was a SHA-3 finalist, and thus by definition isn't vulnerable to length extension either.

> BLAKE2 was a SHA-3 finalist, and thus by definition isn't vulnerable to length extension either.

BLAKE2 wasn't, but BLAKE was.

Aside: Most people should continue to use HMAC-SHA256 instead of worrying about LEAs and justifying a switch to non-HMAC H(secret || message) constructions.

> BLAKE2 was a SHA-3 finalist

I think you mean BLAKE. BLAKE2 was not part of the SHA3 competition.

> the truncated SHA-2 variants (512/224 and 512/256) are also not susceptible to length extension

Any link to some reference document on this topic?

It's pretty straight forward.

SHA-2 works by, after padding the message and appending the message's length, breaking the message into chunks. It sets up an initial state, and then sequentially iterates over the chunks updating the state as it goes. At the end, your state is your hash.

In other words:

   state = INITIAL_STATE
   for chunk in chunks:
      state = SHA2 (state, chunk)
   hash = state
So it's easy to see that an attacker can take that hash and keep running iterations after the fact.

Truncated SHA-2 variants are immune because attackers aren't given state, they're given only a piece of state. They have no way of knowing the rest of state, so they can't continue iterating.

But aren't we trying to get away from MACs into authenticated ciphers?

Not all authenticated messages need to be encrypted, or encrypted using the same key, etc.

For AWS, you authenticate calls to the API by authenticating each message with an HMAC. The call is encrypted over TLS, but TLS is only used to authenticate Amazon to the client, it's not used to authenticate the client to Amazon. There are some use cases where you wouldn't be able to just use client-side certificates anyway--for example, I can create a signature for a single S3 upload to a specific URI with an expiration date, and then send the signature to you, and then you can upload a file directly to my S3 account before the signature expires. (I suspect a big reason client-side certificates aren't used for AWS is because nobody knows how to use them, but they're also less flexible.)

Technically you could also do this using an authentication-only variant of authenticated ciphers, like GMAC, but at that point it would still be classified as a MAC.

There are many applications for MACs and keyed hashes, not all involve encryption (e.g. hash-based deduplication). All mainstream AEAD constructions (AES-GCM, Chapoly, AES-OCB) also generate "only" 128 bit authentication tags, while EtM easily allows larger MACs; whether this is an actual advantage or simply satisfies "crypto paranoia" ... ;)

Not "crypto paranoia"; current mainstream AEAD tends to break both confidentiality and integrity at the same time when misused (e.g. nonce reuse). EtM doesn't do that.

An authenticated cipher is usually just a composition of a cipher and a MAC.

Presumably, if you're using the composition, it's hardened against the length-extension attack in the first place, right?

I mean the advantage seems to only be there when you're cobbling together your own constructions, i.e. doing what you shouldn't be doing.

Length extension attacks are idiosyncratic to hash functions with a particular underlying design (Merkle-Damgard compression). They are the reason we have HMAC. AEADs built on MACs that are not built on general-purpose hash functions don't have that problem.

That's historically true, but these days I think GCM is the most common. GCM also has hardware support from the CLMUL instruction set available on Intel processors since 2010 and AMD processors since 2011.

GCM is just a composition of CTR mode and GMAC.

Well, technically you'd get a different result if you applied GMAC after using CTR mode, because the final stage of the GMAC would get a different length value, but the process would be very similar.

From a different perspective, GMAC is just GCM without encryption. It's not like GMAC existed and then it was combined with AES to make AES-GCM, it was designed as authenticated encryption and has the ability to encrypt zero bytes while still authenticating other bytes.

So no, GCM is not just GMAC + AES.

I'm not sure what you're trying to argue here. Yes: you can in fact use GMAC without encrypting (technically, that's what "GMAC mode" is.)

That's my point. We're not moving away from MACs, and most authenticated schemes are an encryption mode (usually CTR) composed with a MAC.

I think the term "block cipher mode" is responsible for the confusion. It has no coherent meaning.

I think you're making a rather fine point here. You're saying that we haven't stopped using MACs because AES-GCM is a composition of a MAC with an encryption mode.

But that's not true. There's no such MAC. A MAC was derived from GCM but that MAC is not used to authenticate encrypted GCM messages.

If you can tell me which MAC is used for AES-GCM, go ahead, I'd like to know what it's called.

This MAC did not magically appear from GCM, it is a variant of Carter-Wegman MAC. It's in the same class as Poly1305 and UMAC, VMAC. It was combined with CTR mode to create GCM.

Poly1305 as used in NaCl (XSalsa20-Poly1305) or in TLS (ChaCha20-Poly1305) also will give different results, especially compared to the original proposal Poly1305-AES MAC, but this doesn't mean that those two are not compositions of a cipher and MAC.

It's in the Wikipedia entry, even. https://en.wikipedia.org/wiki/Galois/Counter_Mode


If you're, elsewhere, using GMAC for some reason and getting a different result when you combine it with CTR mode: Well, that's weird. I'd have to have two implementations to compare to analyze further. But otherwise, this conversation is moot.

If you look at the block diagram in the GCM article you linked, you'll see that the second-to-last block of data added to the authentication tag is len(A)||len(C). These are 64-bit values, concatenated to form a single 128-bit value.

So if you use GMAC, you'll have len(A) >= 0, and len(C) == 0.

If you use GCM with encrypted data, you'll end up with len(C) > 0, which will give you different results from GMAC.

GMAC is a special case of GCM, and GCM is not simply GMAC + CTR. This is supported by the NIST publication which defines them:

> If the GCM input is restricted to data that is not to be encrypted, the resulting specialization of GCM, called GMAC, is simply an authentication mode on the input data.

Again, GCM != CTR + GMAC, if you make this assumption you will fail to interoperate with correct implementations of GCM.

This is silly. You can call CTR + HMAC-SHA2 an AEAD (if you make allowances for extra data), and it will be one. But if you give two people the independent task to generate a CTR+HMAC-SHA2 AEAD construction, their results will not interoperate (which you might or might not care about). You're fixated on the fact that GCM is at pains to make sure it interoperates, and so specifies specific details about how to couple AES-CTR and GHASH. Every AEAD construction has to do that. That doesn't mean they're not compositions of ciphers and MACs.

It is a composition. Some examples of non-composition schemes are sponge-based Keyak and NORX.

Which MAC does AES-GCM use? It's not GMAC, because GMAC would give different results...

It uses a Carter-Wegman MAC variant.

A variant which only exists as part of the GCM standard, and has no name..

GHASH only exists in the GCM standard too, but does have a name. This, too, is a silly point.

My suggestion after looking around for the best hash function is SHA-256. Not SHA-512, not SHA-512/256. Not even SHA-224 because it offers no performance benefits.

BLAKE2 is a good candidate right now considering speed. But SHA-256 instructions will only get more prevalent which will take care of the speed issue.

There have been speed comparisons between SHA-256 and BLAKE2 without SHA instructions[1]. BLAKE2 seems like good option then, with the optimized code.

But once SHA extensions use hardware instructions, it gets a 5.7x boost which blows BLAKE2 out of the water.[2] There are instructions for SHA-256 in NEON too although I couldn't find any benchmarks. I'd say it'll definitely be faster than BLAKE2 with HW instructions.

Combine that with SHA256 being uber popular everywhere, that'll be one less headache.

I also doubt SHA-256 will be over thrown in the near future, atleast until SHA-3 instructions become commonplace, which I anticipate will take about a decade from now.

[1]: https://news.ycombinator.com/item?id=14357197

[2]: https://github.com/randombit/botan/issues/807#issuecomment-2...

EDIT: Looks like SHA-224 might be the best option, considering it has the same core with a truncated output which will make it immune to length-extension attacks.

The advantage to 224 or 512/256 is that they don't expose the full register state of the hash at output and thus, like Blake2 and SHA-3, aren't vulnerable to length-extension attacks: you can (though it would be idiosyncratic to do so) use a simple prefix MAC with them, rather than HMAC.

You are right. I guess I'll rethink SHA-224. SHA-512/256 is faster in 64 bit computers but it can't really take advantage of the hardware instructions unlike SHA-224.

Prefixing a MAC sounds like it'll work but I'm just wondering why complicate things.

Prefix MAC doesn't mean prefix the MAC. It means make a MAC by prefixing a key and concatenating that onto the data to be tagged. Basically what KMAC does: H(K+pad||message). The truncated hash functions aren't vulnerable to length extension attacks, so using SHA2-512/256 is faster than HMAC with no loss in security AFAIK.

Of course that "no loss in security" bit depends on a proper implementation. There are plenty of good library implementations of HMAC out there, and if you want a high speed MAC there's always Poly1305. But if all you have is SHA2 and don't need to interoperate with systems using HMAC, then this is a reasonably good way to go about it. It's certainly simpler than implementing HMAC on your own.

With SHA 224, there is only 32 bits worth of information that are lost. Isn't a LEA "feasible" by generating the ~ 4 billion messages and finding the 1 that validates.

With 512/256 this is infeasible.

Is this a real concern with 224 in practice?

I'm not a cryptographer, but looking at the attack a bit more closely, I don't think this will work. Someone let me know if I'm wrong.

For this attack to work, one needs to know the full 256 bits, so that they can continue with appending their own data to the hash's internal state to get a valid hash. But in this case, they don't have the complete internal state.

So they can't really continue because the missing 32 bits are required to continue. Even though it is just 32 bits, they can't just validate the hash in anyway because they don't even know the complete message, just the length of the message. So the validation process isn't really possible.

Why this works for non-truncated hashes is because the attacker knows that the hash is valid for secret+data and also the hash internal state (which is the hash itself), but if the complete internal state isn't known, there's no way to attack it.

His idea is correct -- you're right, you don't know the entire internal state. But you can generate 2^32 possible internal states (because there are 32 bits of missing information), then perform the length extension process using each possible state.

If the system is using the hash function in such a way that length extension creates a vulnerability, you can then try the 2^32 different possible valid length extended hashes, and one of them will be correct (and that will be evident because the exploit would work). But you are also correct that, in isolation, there is no way to determine which of the 2^32 resultant digests is the correct.

Got it, thanks. So looks like it is possible, but infeasible in many cases.

This obviously wouldn't work over internet because of the number of tries required, so what is a possibility that this would actually be a possible vulnerability?

I agree that if your oracle is remote, the attack is probably going to take weeks.

Now I'm not sure you can conclude that this kind of scenario never allow for more efficient oracles.

It sounds like you're right.

224 is 512/224.

According to https://en.m.wikipedia.org/wiki/SHA-2

There is SHA 224 which has 256 bit internal state and SHA 512/224 which has 512 bit internal state. In my previous comment, I was referring to the former.

This is how it's defined in FIPS-180, more or less (SHA-224 is SHA-256/224). This confusion is why I like explicitly stating SHA-256/x or SHA-512/x.

Wait, so I was talking about SHA-224 which is SHA-256 truncated to 224 bits, not SHA-512 truncated to 224 bits.

Nevertheless, does this mean SHA-224 is still susceptible to length extension attacks?

SHA-256 truncated to 224 bits is vulnerable to length extension attacks if the attacker can brute force the remaining 32 bits of the state. Some use cases will make this possible or practical, some will not. It depends on the details of the problem being solved.

But people don't know that. That's why it is best to just push people towards sha-3 and have the in-house cryptographer decide when sha-2/blake2/kangarootwelve might be relevant for speed purposes.

I agree that using prefix MACs with SHA-2 is a bad idea, for that reason, but since everyone uses HMAC already and since there are much faster MACs than KMAC, I don't think there's a clear argument for SHA-3 here.

I'm just saying, if you're going to recommend one of the SHA-2's, it should probably be one of the truncated ones.

The best (ie. fastest and safest) SHA2-family function is SHA-384. http://securitydriven.net/inferno/#Implementation%20Details

*fastest in software on 64 bit platforms without hardware instructions.

Note that it doesn't have the hardware instructions that are going to be common place for SHA-256 and SHA-224. Also if you have a 32-bit ARM phone (which is most phones in existence), it will take a hit on performance. That's because SHA-512 has 64-bit word size while SHA-256 has 32-bit word size.

I thought I had the winner with SHA-256, but looks like I blanked on the fact that SHA-256 is susceptible to length extension attacks while the truncated versions of SHA2 are not.

So I'll say SHA-224 is the optimal choice because even if you are running on 64 bit platforms, the hardware instructions will make it faster than SHA-512. Also insisting on higher security than SHA-256 is not a very smart move. SHA-256 offers more than enough security margin.

The assessment that SHA-3's performance is bad completely disregard the results from hardware benchmarking (FPGAs and ASICs). All the hardware results have shown conclusively that Keccak is multiple times faster than SHA-2. While I can't say that the algorithm is going to be supported in HW, with enough usage it might be the case.

Edit: Removed my comment regarding BLAKE as it was incorrect.

By selecting primitives designed for hardware (AES and GHASH) rather than primitives that use operations that are commonly applicable (ChaCha20 and Poly1305), we've ended up with extra hardware to support AES and GHASH. But it's not clear that was actually a good idea, or just path dependence.

ChaCha20, Poly1305, BLAKE2 benefit from improvements that benefit a wide-range of applications, while SHA-3, AES and GHASH do not. Thus the "cost" of high performance support for the former can be amortised over a much wider base.

I suspect that the extra hardware support is not a major concern. This might be because any company that has a true security concern will eventually need to designated an area on its silicon for cryptographic purposes. This area will be security hardened to protect against any side-channel attacks.

Also, sharing HW resource for cryptographic purposes is not possible for any device that needs to pass certain security certification.

Edit: Typo and additional comment

It's already a problem. The most popular authenticated encryption construction is AES-GCM, which is/was difficult to implement safely on popular mobile platforms because their ARM ISAs required side-channel-prone table-based implementations to be performant. We had to select, in protocols, between Chapoly and GCM to get safety and performance on all those platforms.

Most vulnerabilities in cryptosystems happen in the joinery. Anything we can do to eliminate joinery is going to make our cryptosystems more resilient. Selecting new primitives that will require hardware support to be performant seems like an own-goal.

As someone who has done a number of audits for certified devices, I don't think your statement about shared hardware is accurate. Are you talking about FIPS 140?

That's the reason why I said "suspect" =). I do not claim to know the exact reason for the selection. In any case, the standard is finalized. If you're concern about this being the problem for the next standard which will likely to affect the use of AES-GCM, I suggest you participate in the current cryptographic contest that would target authenticated encryption: CAESAR (https://competitions.cr.yp.to/caesar-submissions.html). I'm not sure how this will affect the overall usage of authenticated encryption in the industry, but this is currently one of the main topics of interest for cryptographic researchers.

"As someone who has done a number of audits for certified devices, I don't think your statement about shared hardware is accurate. Are you talking about FIPS 140?"

Yes. Is my understanding incorrect? I'd like to be informed if this is the case. Thanks.

There are FIPS certification levels where shared hardware footprint is an issue, but most commercial devices don't need to ship devices with that certification.

I really don't care about what the standards say; thankfully, the important standards, like TLS, aren't bound by what NIST standardizes.

Well if it's only strong in hardware then that means it's a poor choice for KD in software, as a potential attacker would have a significant advantage.

Isn't this something you don't want? I'd rather use an algo which is painful to do in bulk.

That's helpful for passwords, but most hash applications aren't for passwords. If you're hashing a large file you don't want something super slow, and you don't need to pad out a tiny password into something hard to guess.

A rather minor point but:

>[The di­ver­sity of cryp­to­graphic prim­i­tives] con­tributes to code-size, which is a worry again in the mo­bile age

In the "mobile age" we use embedded processors with gigabytes worth of RAM and several times the amount of NAND storage, I don't really think a SHA-3 implementation is going to tip the boat over.

The author's other points seem very fair though.

I can confirm that the code size of BoringSSL is something that we worry about for mobile apps. A 50KB increase (which wouldn't be much for an optimised SHA-3) would raise questions.

I admit that when I see the 100MB+ size of apps that I have to download, I do wonder why. But I assume that it would be even worse were people not worrying about this stuff.

> A 50KB increase (which wouldn't be much for an optimised SHA-3) would raise questions.

Can you elaborate on that? That's orders of magnitude bigger than a naive implementation, and wouldn't fit in L1 cache either.

I think it's more relevant for the IoT devices, which run or ARM M0/M4 (see http://csrc.nist.gov/groups/ST/lwc-workshop2015/presentation... and https://realtimelogic.com/products/sharkssl/Cortex-M0/).

That massive storage is often still using a heavily constrained data connection. Apple, for example, won't let you download an app over 100MB over cellular data. You want to really try to keep your app under that limit, otherwise a lot of potential users will abandon the app if they discover it while they're not on WiFi.

I wonder why is that, I had a shared plan (3 numbers) that gave about 99 GB each month and Apple wouldn't let me download a 100 MB game through LTE?

Part of the rule might be an incentive for developers to keep size under control.

I wonder if that rather incentivises them to download assets as a separate HTTP request after the app is started. If not how do AAA games work on the iPhone?

Big games just ignore the size limit entirely and require you to download over WiFi.

> As I've mentioned before, diversity of cryptographic primitives is expensive. ... exponential number of combinations ... limited developer resources ...

It's a good point that diversity is expensive. There's a counter-argument too: if we all decide to use one thing, but it goes wrong - then we're in serious trouble. If we have multiple approaches, then the damage is lessened if one of them turns out to have problems.

In other words, I think it's better to have a variety of approaches to manage risk, even if it does impose costs.

The track record on that design strategy is not good. We've been pursuing designs with cryptographic diversity built in for several decades, and the result has been decades of terrible vulnerabilities --- almost none of which have been mitigated by diversity!

Have we had some big crypto failures besides "everyone was using md5" and "everyone was using DES"?

Yes: for instance, the TLS BEAST vulnerability, which affected all the block ciphers in that version of TLS, because the protocol standardized interchangeable ciphers and a message encryption construction based on CBC as separate components.

Ironically, when the RC4 attack refinements were published, for awhile sites were forced to choose between a vulnerable CBC implementation and using the vulnerable RC4 cipher.

With the exception of MD5 in X.509 certificates and RC4 in TLS, has there been another cryptographic vulnerability that required the retirement of a cipher? SWEET32 is about as close as you come to that, right?

And, obviously, there have been many more problems, even just in TLS, than X.509 MD5 and RC4.

Wasn't that more because we had not really thought of aead constructions? I mean we didn't even know we had to mac after encrypting.

If we had more stream ciphers at the time we would not have been stuck with beast or rc4.

Imo sweet32 was just the weak but well-marketed cherry on the top to get rid of the previous generation. We just had the sha-1 collision.

We had plenty of good ciphers available in TLS when this happened, but because the BEAST bug was in joinery and not in the cipher itself, it didn't help. Cipher agility didn't help, and in fact probably hurt us.

Which "good" ciphersuites [1] were usable in TLS 1.0 when BEAST was made public in June 2011; with BEAST affecting only TLS 1.0?

TLS 1.0 per RFC 2246 defined DES40_CBC, DES_CBC, 3DES_EDE_CBC, RC4_40, RC4_128, and IDEA_CBC.

RFC 4132 added CAMELLIA_128_CBC and CAMELLIA_256_CBC in 2005, RFC 4162 proposed SEED_CBC in 2005, and RFC 6209 in April 2011 informationally specified ARIA_128_CBC, ARIA_256_CBC, ARIA_128_GCM, and ARIA_256_GCM, but the GCM ones only work in TLS 1.2 or later. Same story with CAMELLIA_*_GCM.

Informational RFC 5469 in 2009 deprecated the use of DES and IDEA ciphersuites, so that left RC4, 3DES_EDE, the Korean ciphers SEED and ARIA, and CAMELLIA, with the block ciphers in CBC. Pretty much no one supported the latter three, and when Mozilla rejiggled the ciphersuites in 2013, support for these was removed entirely [2].

Your point still stands about BEAST being a flaw in the glue rather than the ciphers themselves, but I believe the parent poster makes a stronger point about the complete lack of AEAD constructions before TLS 1.2, and the lack of other stream ciphers which were more self-contained in TLS 1.0 and thus unaffected.

[1] https://www.iana.org/assignments/tls-parameters/tls-paramete... [2] https://news.ycombinator.com/item?id=12393683

I feel that tptacek's point is that excessive cipher agility in TLS precluded adoption of AEAD (and "unconventional" modes in general), because it was built on assumption that the interchangeable primitive is the block cipher itself (ie. keyed primitive that transforms n bit input to n bit output) and not the combination of cipher+mode (which transform arbitrary length input into some, preferrably longer, output). You cannot plug AEAD into framework that expects block cipher and MAC.

(Due to how all this is really implemented, this in fact was not technical problem, but problem of "design culture" or something like that)

I don't mean that something like TLS should support every type of crypto under the sun. As you say, that means TLS is only as secure as it's weakest component.

What I mean is that if I make a new protocol, it could use a different type of crypto from what is in mainstream use, so at least this new protocol will fail independently from other protocols.

The SHAKE family of functions (and cryptographic sponges in general) are very cool, but like the article says, SHA-3 is weakened by failing to focus on performance. If SHA-2 is sufficiently secure (which looks more and more the case) then a faster SHA-2 would have been a better way of looking at what was wanted out of SHA-3.

Faster primitives means more PBKDF rounds, more usage of hashes to secure underlying data.

If you're concerned about the primitives making up your password hash, you shouldn't be using PBKDF2 to begin with; better to use a purpose-built KDF like scrypt, bcrypt, or argon2.

Speed isn't a black-and-white panacea property, cost of brute-forcing (ram, cpu, time and money) and other attacks should also be prioritized for a given use-case (ie argon2, scrypt). Fixating on a single, fast algorithm and making it a near universal monoculture makes it easier for well-funded state and corporate entities to reduce their costs to generate a hash collison or do something else nasty because they only have to invest in one kind of possibly cheaper/simpler ASIC building-block.

Having several different, but throughly sensible algorithms who's strengths, weaknesses and limitations are well known is a better approach than saying "everyone use X" a-la NESSIE. Users with Paradox of Choice going on can pick the lunch special, AES.

in short: "SHA-3 should probably not be used. It offers no compelling advantage over SHA-2 and brings many costs."

Why don't we use hash functions that combine the output of two discrete hashing functions and combine the results?

It seems to me it's easy to break sha or md5, but what about breaking two fundemantally different algorithms at the same time?

Implementation seems as easy as concatenating the two hex outputs into a single string and comparing the results.

I'm curious to what others think of this idea.

A couple of criticisms I've seen for this idea, as best I understand them as an amateur:

(1) It's hard to use a multihash construct correctly. For example, do you want "hash1(message) + hash2(message)" or "hash1(message + hash2(message))" or "hash1(hash2(message))"? The answer is that it depends which features of a hash function matter for your application, and if you pick the wrong one you will have the worst of all worlds instead of the best of all worlds. (Simple example: "md5(password)+scrypt(password)" is a broken way to store passwords, while "sha3(md5(document))" is a broken way to authenticate documents; vice versa would probably be OK.)

(2) If you're paranoid, bytes and CPU cycles are (arguably) better spent on extending your best option instead of tacking on less good ones. Breaks tend to come in the form of "it is now feasible to crack a key with length X" or "find collisions with a hash with Y rounds" or whatever, where X and Y were originally picked to strike a balance of security and performance. So if you were going to spend performance on something nonstandard to guard against mathematical breakthroughs, experts might recommend doubling X and Y instead of throwing in a second less-secure cypher. (Although really they'd tell you just to use the standard thing.)

It turned out that breaking the combination is SHA-1 and MD-5 is much easier than people expected: https://www.iacr.org/archive/crypto2004/31520306/multicollis...

Shameless plug (this is one of Sean's, though):


I believe this is a good complememt to agl's post with a focus on K12 instead: https://cryptologie.net/article/393/kangarootwelve/

The preimage strength debacle is probably worth mentioning

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact