Much more than that!
Keccak (SHA-3) introduced the sponge construction. This goes _way_ further than extendable-output functions: it allows you to build all the symmetric cryptographic primitives from one basic element! That means hashes, PRNGs, MACs, encryption, AEAD, etc. The only thing you need that looks like crypto code is a single permutation function, which significantly reduces the amount of security-critical code to write. For hardware implementations, the permutation function takes less die space than SHA2, and accelerating it speeds up all of these primitives at once.
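The sponge idea fits in a few lines of Python. This is a toy: the 256-bit state (128-bit rate, 128-bit capacity) and the ARX permutation below are hypothetical stand-ins chosen for readability, not Keccak-f[1600], and the result is not secure. What it does show is the shape: input is XORed into the "rate" part of the state, the "capacity" part is never read or written directly, and one absorb/squeeze routine produces output of any length.

```python
# Toy sponge construction: the *shape* of Keccak's design,
# NOT Keccak-f[1600] and NOT secure. State = 4 x 64-bit words;
# rate = first 2 words (16 bytes), capacity = last 2 words (never output).
MASK = (1 << 64) - 1
RATE = 16  # bytes absorbed/squeezed per permutation call

def rotl(x: int, n: int) -> int:
    return ((x << n) | (x >> (64 - n))) & MASK

def permute(s: list) -> list:
    """Hypothetical ARX permutation; every step is invertible, so it is a bijection."""
    a, b, c, d = s
    for i in range(12):
        a = (a + b) & MASK; d = rotl(d ^ a, 32)
        c = (c + d) & MASK; b = rotl(b ^ c, 24)
        a = (a + b) & MASK; d = rotl(d ^ a, 16)
        c = (c + d) & MASK; b = rotl(b ^ c, 63)
        a ^= i + 1  # round constant so rounds differ from each other
    return [a, b, c, d]

def sponge(msg: bytes, out_len: int) -> bytes:
    # pad10*1: append 0x01, zero-fill to a multiple of RATE, set the top bit of the last byte
    padded = bytearray(msg + b"\x01")
    while len(padded) % RATE:
        padded.append(0)
    padded[-1] |= 0x80
    s = [0, 0, 0, 0]
    # Absorb: XOR each block into the rate words only, then permute
    for i in range(0, len(padded), RATE):
        s[0] ^= int.from_bytes(padded[i:i + 8], "little")
        s[1] ^= int.from_bytes(padded[i + 8:i + 16], "little")
        s = permute(s)
    # Squeeze: read the rate words, permuting between output blocks
    out = b""
    while len(out) < out_len:
        out += s[0].to_bytes(8, "little") + s[1].to_bytes(8, "little")
        if len(out) < out_len:
            s = permute(s)
    return out[:out_len]
```

With that single function, a hash is sponge(m, 32), a MAC for a fixed-length key is sponge(key + m, 32) (the capacity half of the state is never output, so the tag is not a resumable state), and a keystream is sponge(key + nonce, n). Asking for more output just extends the shorter result.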
The sponge construction and all its derivatives are proven secure assuming the permutation function is. And assuming SHA3 is secure, the permutation function is too. The alternative, the Merkle–Damgård construction, has a bunch of known problems. It's also used by pretty much everything else, so it made sense to diversify.
Honestly, I'm a bit saddened by the reputation SHA-3 is getting. It is an eye-opening, elegant design that makes all previous crypto look overly complex.
But deciding to skip SHA-3 doesn't preclude any of that. SHA-3 is the hash standard itself, and even the Keccak team appears to be pushing K12 rather than SHA-3. It seems unlikely that a full set of primitives built around the Keccak permutation would choose to use the SHA-3 parameters at this point.
Indeed, SHA-3 adoption might inhibit that ecosystem by pushing towards those bad parameters. (I think there's wide agreement that SHA-3 is too conservative.)
On the other hand, you might want SHA-3 because you believe that it'll result in hardware implementations of the permutation and you hope that they'll be flexible enough to support what you /really/ want to do with it. I'm not sure that would be the best approach even if your goal was to move to a permutation-based world. Instead I would nail down the whole family as you would like to see them and use small chips to try and push it. K12 is probably a step in the right direction.
> I think there's wide agreement that SHA-3 is too conservative.
As you probably know, SHA3 originally had 18 rounds, but this was raised to 24 during the competition. This of course hurt performance. In some early proposals for a stream cipher, the authors reduced the rounds to 12 to get better performance.
In 20 years maybe we'll get hardware support for Keccak-f. In the meantime, we can get excited at the idea of most of your crypto primitives (KDF, PRNG, AEAD, hash, ...) being built on top of a single, simple, short and well-audited implementation of Keccak-f.
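Python's hashlib already exposes SHAKE-256, which is the real Keccak sponge, so a rough sketch of "one primitive, several jobs" needs no extra crypto code. The length-prefixed labels below are a hypothetical encoding chosen for illustration; NIST's standardized way to do this kind of domain separation is cSHAKE/KMAC (SP 800-185), which hashlib does not provide.

```python
import hashlib

def _frame(label: str, data: bytes) -> bytes:
    # Hypothetical domain separation: 2-byte label length, label, then data
    tag = label.encode()
    return len(tag).to_bytes(2, "big") + tag + data

def hash32(data: bytes) -> bytes:
    """Plain hash: 32 bytes of SHAKE-256 output."""
    return hashlib.shake_256(_frame("hash", data)).digest(32)

def kdf(master: bytes, context: str, n: int = 32) -> bytes:
    """Key derivation: a different label per context yields unrelated keys."""
    return hashlib.shake_256(_frame("kdf/" + context, master)).digest(n)

master = b"example master secret"  # placeholder input
enc_key = kdf(master, "encrypt")
mac_key = kdf(master, "mac")
```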
SHA-3 has practical value as a moderately well-analyzed hash function in the event of rapid, catastrophic cryptanalysis of SHA-2, but barring that kind of event, its real value is in the general theoretical advancement of sponge constructions.
I'll repeat what was said elsewhere. SHA-2 is vulnerable to length extension attacks.
Length extension attacks only apply in cases where you've chosen to use the wrong cryptographic primitive (a hash) for your purpose (a MAC).
I'm not rooting for SHA-3; I'm just saying that it's wrong to dismiss length extension attacks only because they're commonly effective against misuses of hash functions.
There are zero practical reasons to choose SHA-3.
I don't see why, unless you're Google and you care about small performance improvements in hash implementations. My point of view is that SHA-3 (or BLAKE2/K12) is THE hash function you should use virtually everywhere.
> Length extension attacks only apply in cases where you've chosen to use the wrong cryptographic primitive (a hash) for your purpose (a MAC).
1) so what? People actually use it this way. Are you just going to say "oh they're using it wrong there is nothing we can do about it"?
2) Why is it wrong to use a hash to create a MAC? It is the most logical thing to do, imo. H(key | data) makes total sense to me. It is only when you know SHA-2's internals and their weaknesses that you realize you can't use it like that.
I'm with you on BLAKE2 as a solid choice, but at least it has an advantage in some areas over SHA-2 (speed, margin of security). SHA-3 has no advantages over SHA-2 (particularly SHA-512/256, which is going to be available in any library that would bother to implement SHA-3).
SHA-3 is slower, has been subject to less cryptanalysis, has fewer battle-tested implementations, and is less widely supported. Practically the only thing it has going for it is a higher number.
You will be hard-pressed to find a respected cryptographer advocating general adoption of SHA-3 — being in the field, I'd wager you'll find 100 who encourage developers to choose SHA-2 for every one who backs SHA-3. Case in point, in this thread alone you have Colin Percival and Thomas Ptacek coming to the defense of SHA-2, not to mention Adam Langley authoring the original post.
The fact that sha-3 has received less analysis has already been debunked here: https://twitter.com/aarontoponce/status/869973259969675264
K12 has the advantage of the speed. See https://cryptologie.net/article/393/kangarootwelve/
Appeal to authority does nothing. Plenty of people are actually supporting or making use of Keccak/SHA-3 already. To begin with, NIST's cryptographers chose it. David Leon Gil was into it. People talked about heavy usage of SHA-3 in Ethereum or Chain here. I can tell you that one of the originators of SHA-3 (Daemen) co-created AES. JP Aumasson (co-creator of BLAKE2), Samuel Neves and Philipp Jovanovic are using sponges with NORX. Mike Hamburg is building Strobe based on Keccak. I can name-drop as well, you see.
Plus I'll say that I support it as a wannabe cryptographer. That must count for something :D
As I pointed out, SHA-512/256 allows you to safely do H(key|data).
But your original argument was along the lines of "H(key|data) makes sense intuitively. You have to understand the details in order to realize it's unsafe. Therefore a hash that allows H(key|data) is better".
This same line of reasoning, without modification, works to argue in favor of deploying Argon2 any place a hash is used. It's safe to use in that way, it additionally is safe to use for H(password), and you accept the exact same list of downsides.
Misuse-resistant constructions are important. But the problem with H(key|data), as both I and cperciva have pointed out, is that you've chosen an entirely inappropriate cryptographic primitive for the task at hand. A glaring indicator of this is stuffing multiple concepts into one function argument. A block cipher analogue would be something like AES-ECB(key, iv|plaintext). If you're cramming things into the wrong arguments of cryptographic primitives, you're generally going to have a very, very bad time.
That said, even by this criterion, SHA-512/256 is on the same footing as SHA-3, and it has none of the downsides. For the third time, SHA-3 has no advantages here whatsoever.
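One concrete way "stuffing multiple concepts into one function argument" goes wrong is that the boundary between them disappears. The sketch below uses SHA3-256 (so length extension is not the issue) and shows two different key/data splits of the same bytes producing the same naive tag; this bites whenever the key or another field has variable length. The 8-byte length prefix in framed_tag is an illustrative fix, not a standard encoding.

```python
import hashlib

def naive_tag(key: bytes, data: bytes) -> bytes:
    # H(key | data): only the concatenated bytes reach the hash
    return hashlib.sha3_256(key + data).digest()

def framed_tag(key: bytes, data: bytes) -> bytes:
    # Illustrative framing: an 8-byte big-endian key length removes the ambiguity
    return hashlib.sha3_256(len(key).to_bytes(8, "big") + key + data).digest()

# Same bytes, different split: ("ke", "ydata") vs ("key", "data")
collide = naive_tag(b"ke", b"ydata") == naive_tag(b"key", b"data")
framed_collide = framed_tag(b"ke", b"ydata") == framed_tag(b"key", b"data")
print(collide, framed_collide)  # True False
```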
> The fact that sha-3 has received less analysis has already been debunked here: https://twitter.com/aarontoponce/status/869973259969675264
There is nothing in that thread that even remotely suggests that the 9 years of analysis Keccak has received, first as a SHA-3 candidate and then as the virtually unused SHA-3 standard, carries the same weight as the 16 years of analysis SHA-2 has received while widely deployed. Not to mention the comparatively short time we've been cryptanalyzing sponge functions, compared to the nearly 40 years we've had to understand Merkle-Damgård.
> Appeal to authority does nothing.
I am not arguing that they are right because they are authorities. I am pointing out that well-respected authorities are also in this thread and also making compelling arguments against generalized use of SHA-3.
Ahem: "It is well known as a fallacy, though it is most often used in a valid form."
> NIST's cryptographers chose it.
As a backup, in the event of catastrophic cryptanalysis against SHA-2. From a NIST employee on NIST's own press release announcing the SHA-3 winner:
"SHA-3 is very different from SHA-2 in design," says
NIST's Shu-jen Chang. "It doesn't replace SHA-2, which
has not shown any problem, but offers a backup. It
takes years to develop a new standard, and we wanted to
be prepared in case problems do occur."
Being different from SHA-2 is not a sufficient reason to use it for general purposes today.
> People talked about heavy usages of SHA-3 in Ethereum or Chain here.
Having been chosen for Ethereum doesn't mean it was the best choice. It probably wasn't. Note that this is an actual argument from authority.
> I can tell you that one of the originators of SHA-3 co-created AES.
Irrelevant to whether or not you should actually be deploying SHA-3. Nobody's arguing it's insecure. We're arguing that it's always an inferior choice.
> JP Aumasson (co-creator of BLAKE2), Samuel Neves and Philipp Jovanovic are using sponges with NORX. Mike Hamburg is building Strobe based on Keccak.
Again, irrelevant to the choice of SHA-3. Sponges are cool. I'm looking forward to more widespread application of sponges in the future. People working on new ciphers built on top of sponges does not advocate for the use of SHA-3, a specific sponge.
> I can name-drop as well, you see.
You've named NIST, who is explicitly on record saying that SHA-3 is there as a backup. You've named a guy who was "into it", and you sort-of-but-not-really named "people" who've said that Ethereum uses it (though it's not actually used in Ethereum's proof-of-work function, so I'm not sure you could call that "heavy" use). You've named one of the authors of SHA-3, and you've named people who are building totally different things on the same underpinnings, but I can find no evidence that any of these named people are actually encouraging anyone to deploy SHA-3 over SHA-2.
I would love for you to link me to a single location where one of these named people advocates that. I don't think you can.
Sure, if you know what you're doing :) In the mean time I will keep recommending SHA-3.
> the virtually 40 years we've had to understand Merkle-Damgård
I think at this point we've understood that M-D sucks.
I don't see why I would get into a name-dropping battle :)
I'll sum up our battle here: some people are advocating SHA-3, some people are not.
I'm in the first category. You're in the second. Your only arguments are that SHA-3 is "an inferior choice" and "big name is against SHA-3". Let's see how this pans out.
stouset claimed that there is no practical advantage of using SHA-3 over SHA-512/256. He gave three main arguments:
1. SHA-3 is slower.
2. SHA-3 was subject to less cryptanalysis.
3. Virtually all respected cryptographers do not recommend SHA-3 over SHA-2.
On the other hand, baby claimed that there are practical advantages of using SHA-3 over SHA-2. However, even after reading the thread twice, I couldn't find out what they are supposed to be. (I'm leaving out the LEA here, since it does not affect SHA-512/256 either.)
Baby claimed that argument 2 was debunked, but the supporting link pointed to a thread that stated nothing to that effect. He also claimed that argument 3 carries no weight since it's an appeal to authority. While an appeal to authority is not a valid scientific argument, it's about the best practical argument you can make as a non-cryptographer when choosing cryptographic algorithms.
SHA2 is vulnerable to length extension attacks the same way as bread is vulnerable to being buttered. If you're using bread and you're worried about buttering, you fundamentally misunderstand what bread is.
2. People should be able to do that. If you're expecting your hash function to behave like a random oracle, there is no reason h(key|data) shouldn't work. I see people using SHA-2 to derive keys all the time, and even that is unsafe depending on how you do it.
People who aren't sweating the details aren't the intended audience for NIST.
That's exactly what this post is about. That we shouldn't replace SHA-2 with SHA-3, just what NIST ordered :)
But the article is right that newer crypto, or more crypto, is not necessarily a good thing. It is also true that there is no need for people to rush to Keccak - SHA-256 and SHA-512 are not broken. It is mostly an interesting function for development of new cryptographic constructs.
I sort of object to the idea that HMAC is complicated, also. It's more complicated than KMAC which barely deserves a name, but any intern can implement HMAC reliably from the diagram on the Wikipedia page. Hash twice, domain-separating each hash with a simple constant. Done.
The only implementation flaw stemming from HMAC in real systems is shared by KMAC (variable-time comparison functions).
Simpler means less likely to misunderstand or fuck up, which is a good property to have.
I did also point out that it is mostly interesting for development of new constructs. There's very little point in replacing a working, properly implemented HMAC with a KMAC at this point.
... In the future, however, Keccak may be used for more constructs (AEAD, for example), at which point a shared "core" for all the crypto you use would be an interesting thing to have, reducing the number of trusted constructs, sharing optimizations (handwritten asm or hw), and so forth. At that point, migrating would be a sensible thing to do.
1) How does not having SHR make it easier to audit?
2) How do SHA-3 implementations that actually use SHR not make the parent's comment moot?
Another fun one is x ^= x >> n with n some strictly positive constant; this is also invertible. It's used a lot in the MurmurHash family of non-cryptographic hash functions.
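To see why x ^= x >> n is invertible: the top n bits pass through unchanged, they determine the next n bits, and so on. The sketch below inverts it with the closed form x = y ^ (y >> n) ^ (y >> 2n) ^ ..., and uses that to invert MurmurHash3's 64-bit finalizer (fmix64), whose odd multipliers are invertible mod 2**64. It assumes 64-bit inputs and Python 3.8+ for pow(c, -1, m).

```python
MASK64 = (1 << 64) - 1

def xorshift_right(x: int, n: int) -> int:
    return x ^ (x >> n)

def unxorshift_right(y: int, n: int) -> int:
    # y = x ^ (x >> n)  implies  x = y ^ (y >> n) ^ (y >> 2n) ^ ... (for 64-bit x)
    x, shift = 0, 0
    while shift < 64:
        x ^= y >> shift
        shift += n
    return x

C1, C2 = 0xFF51AFD7ED558CCD, 0xC4CEB9FE1A85EC53  # MurmurHash3 fmix64 constants (both odd)

def fmix64(k: int) -> int:
    k = xorshift_right(k, 33)
    k = (k * C1) & MASK64
    k = xorshift_right(k, 33)
    k = (k * C2) & MASK64
    return xorshift_right(k, 33)

def unfmix64(k: int) -> int:
    # Undo each step in reverse order; odd numbers have inverses mod 2**64
    k = unxorshift_right(k, 33)
    k = (k * pow(C2, -1, 1 << 64)) & MASK64
    k = unxorshift_right(k, 33)
    k = (k * pow(C1, -1, 1 << 64)) & MASK64
    return unxorshift_right(k, 33)
```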
1) it forces you to allocate more memory (to do the Davies-Meyer construction)
2) it complicates the security analysis because you get collisions (if you do not use a permutation, you have a non-injective construction)
screenshot here: http://i.imgur.com/4xncceB.png
The inverse is also required for decryption: If you encrypt as ciphertext = permutation(plaintext + key), decryption is plaintext = inverse(ciphertext) - key.
> The inverse is also required for decryption: If you encrypt as ciphertext = permutation(plaintext + key), decryption is plaintext = inverse(ciphertext) - key.
This is not how encryption/decryption works with Keccak. Keccak is used to create a stream (it is then XORed with the plaintext or ciphertext).
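A minimal sketch of the keystream approach, using Python's hashlib.shake_256 as the sponge: absorb key and nonce, squeeze as many bytes as the message, XOR. Decryption is the same call, so no inverse permutation is involved. This is illustrative only: it assumes a fixed-length key and nonce and a nonce that is never reused under one key, it provides no authentication, and it is not a standardized cipher.

```python
import hashlib

def shake_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt by XORing data with a SHAKE-256 keystream.
    Assumes fixed-length key and nonce; no integrity protection."""
    keystream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(d ^ k for d, k in zip(data, keystream))

key, nonce = bytes(32), bytes(16)  # placeholder values
ct = shake_xor(key, nonce, b"attack at dawn")
pt = shake_xor(key, nonce, ct)  # same operation recovers the plaintext
```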
Isn't this true for most cryptographic hash functions? MAC with HMAC or H(key || m), encryption with a CTR-like mode on top of the MAC, etc.
This is unlike other constructions, where you need to go over the data twice, with two different crypto algorithms (encryption + mac) that may or may not share a core component.
Keccak allows for very simple and efficient implementations of all these primitives.
> H(key ‖ m)
This is insecure for SHA2 (length extension). However, it is secure when done with SHA3. In fact, it's pretty much how you construct a MAC from it, that's how natural it is.
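HMAC was invented specifically to deal with the shortcomings of SHA2 and its predecessors. For reference, it is H((key ⊕ opad) ‖ H((key ⊕ ipad) ‖ m)), where key is zero-padded to the hash's block size (or hashed first, if longer than a block), opad is the byte 0x5C repeated, and ipad is the byte 0x36 repeated. Compare that with the above and you'll see what I mean by simple and efficient.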
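The two constructions side by side, as a Python sketch: sha3_prefix_mac is the plain H(key ‖ m) (a simplification; NIST's standardized version of this idea is KMAC, which additionally encodes the key and the output length), and hmac_sha256 follows the HMAC formula step by step and can be checked against the standard hmac module. Either way, tags should be compared with hmac.compare_digest rather than ==.

```python
import hashlib
import hmac

def sha3_prefix_mac(key: bytes, msg: bytes) -> bytes:
    # H(key || msg): no length extension with a sponge (assumes a fixed-length key)
    return hashlib.sha3_256(key + msg).digest()

def hmac_sha256(key: bytes, msg: bytes) -> bytes:
    """HMAC from the formula: H((K' ^ opad) || H((K' ^ ipad) || msg))."""
    block = 64  # SHA-256 block size in bytes
    if len(key) > block:
        key = hashlib.sha256(key).digest()  # long keys are hashed first
    key = key.ljust(block, b"\x00")  # K': key zero-padded to the block size
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha256(ipad + msg).digest()  # first pass over the message
    return hashlib.sha256(opad + inner).digest()  # second pass over 32 bytes only
```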
The key idea is to only expose part of the state to inputs/outputs.
And prefix MACs predate all these things by multiple decades (they just happen to be insecure in full-width-output MD hashes).
I don't think the ability to use Keccak as a simple prefix MAC is all that gamechanging. You can hash data once --- and much, much faster --- by using SHA-2.
Or... BLAKE2 uses ChaCha permutation, but includes additions of key words to turn it into a block cipher, and then uses this block cipher in a MD-like construction. Metamorphosis...
Of course, if you take some block cipher, fix the key, and make a sponge from it, nobody guarantees that it will be secure. Just like trying to build a block cipher from some permutation will not result in anything good. So it's not on the same level as building higher-level secure things (such as PRNG) from a primitive.
I think what I'm trying to say is: in crypto you can indeed build many things from other things.
I'm curious about the statement that SHA-3 is slow; it links to https://www.imperialviolet.org/2016/05/16/agility.html , which doesn't seem related, and matches the previous link. I wonder if that was supposed to link to somewhere else, like http://bench.cr.yp.to/results-sha3.html (as linked from https://www.imperialviolet.org/2012/10/21/nist.html )?
From that, SHA-3 certainly doesn't run significantly faster than alternatives (variants of BLAKE do indeed outperform it), but it seems roughly on par with SHA-256/SHA-512. But "on par" doesn't give any incentive to switch.
I wonder how much relative attention the SHA-3 winner (Keccak) gets compared to other alternatives, like BLAKE?
(The BLAKE2 page (https://blake2.net/) has a graph too.)
Most things are software;
from previous experience we know that it basically takes 10 to 15 years between introduction of a primitive (AES and SHA-2 both standardised around 2000; ISA extensions for mainstream CPUs introduced in 2010-2017). From the software PoV SHA-3 would be a rather large regression in performance with a "maybe it's fast by 2030 if Intel is really nice" attached. BLAKE2 on the other hand is an improvement in performance and offers a function that is more modern overall. (E.g. not length-extensible thus low-overhead keyed hash usage, flexible output sizes, tree hashing built-in)
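The BLAKE2 features listed here (keyed hashing without an HMAC wrapper, flexible output sizes) are exposed directly by Python's hashlib.blake2b as the key= and digest_size= parameters. A short sketch; the key and message are placeholders:

```python
import hashlib

key = b"sixteen byte key"  # placeholder; BLAKE2b accepts keys up to 64 bytes
msg = b"message"

# Keyed hashing is built in: this is a MAC, no HMAC construction needed
tag = hashlib.blake2b(msg, key=key, digest_size=32).digest()

# Output length is a parameter (1..64 bytes for BLAKE2b); it is mixed into
# the parameter block, so a 16-byte digest is not a prefix of a 32-byte one
short = hashlib.blake2b(msg, key=key, digest_size=16).digest()
```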
Otherwise it's just hope that the chip makers will implement it when the algorithms are chosen, and I'm not sure that's good enough after a carefully orchestrated 5-year process of choosing a next-generation algorithm.
If they can't get that commitment, then they should choose whatever is faster in software (granted all the other factors are more or less equal).
Coincidentally, I ran a bunch of hash performance benchmarks last week. These were my findings:
test: hash a 500MB block of memory.
hardware: Intel Core i7-5820K Haswell-E 6-Core 3.3GHz
compiler: MSVC2017 (19.10.25019), 32-bit exe:
blake2sp - official reference code 153MB/sec
SHA3 - Keccak official reference code 12MB/sec
SHA3 - rhash sha3 45MB/sec
SHA3 - Crypto++ library v5.6.5 57MB/sec
SHA256 - Crypto++ 181MB/sec
SHA256 - MS Crypto API 113MB/sec
SHA1 - MS Crypto API 338MB/sec
MD5 - Crypto++ 345MB/sec
CRC32 - Crypto++ 323MB/sec
Blake2sp is fast. The official reference code is not quite as fast as the Crypto++ implementation of SHA256, but it's faster than Microsoft's Crypto API SHA256. (There are several variants of BLAKE, and I chose blake2sp because that's the algorithm WinRAR uses. I think the specific variant of BLAKE that directly competed with Keccak for NIST standardization is slower.)
Keccak is one of the primary functions in Ethash (Ethereum mining). It is heavily researched and completely destroys SHA2 performance-wise if you have the right implementation.
Also, Keccak doesn't require the construction of a key-schedule, and can be implemented much more elegantly in parallel (SIMD software) and in hardware than SHA2.
> I think the specific variant of BLAKE that directly competed with Keccak is slower
BLAKE is slower than BLAKE2 indeed. BLAKE also has more rounds than BLAKE2.
If you care about performance and you don't have dedicated SHA-256 instructions then on a 64-bit platform you should evaluate SHA-512 as it is much faster. If you only have 256 bits of storage available then truncate its output to 256 bits. IIRC, it's about 1GB/sec on my Haswell laptop.
I compiled for 32bit instead of 64bit because I wanted the same executable to also run on a 32bit Macbook. When Thomas Pornin ran benchmarks in 2010 for both 32bit & 64bit, the SHA256 hash performance didn't change as much as the SHA512. I'll recompile for 64bit and report back if there was a massive difference.
blake2sp is indeed faster than Microsoft's builtin Crypto API for SHA256. However, it is not as fast as Wei Dai's Crypto++ library implementation of SHA256 that has lots of hand tuned assembly language code.
The official C source code for blake2sp does not have assembly language primitives in it. It's very possible that if an assembly language expert wrote optimizations for blake2sp, it would beat Crypto++ SHA256 performance.
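The code I used is really simple. I used the files "blake2sp-ref.c" and "blake2s-ref.c" from the BLAKE2 website. The hash code (no loops) is:
#include "blake2.h" // header shipped with the BLAKE2 reference code
blake2sp_state S[1]; // 1-element array, so S decays to a blake2sp_state* in the calls below
blake2sp_init(S, BLAKE2S_OUTBYTES); // initialize the state before any update() call
blake2sp_update(S, BufInput, iBuffersize); // hash the whole buffer in one call
blake2sp_final(S, hashval_blake2sp_bytes, BLAKE2S_OUTBYTES); // write the 32-byte digest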
With that code, I'm guessing anyone can whip up a C++ project to benchmark blake2sp in 10 minutes. It would be interesting to see what MB/sec others achieve.
For real numbers on many platforms, see https://bench.cr.yp.to/results-hash.html (warning: very large page)
For long messages on 2015 Intel Core i5-6600; 4 x 3310MHz:
blake2b: 3.33 cycles/byte
blake2s (remember, half speed of sp): 4.87 cycles/byte
sha512: 5.06 cycles/byte
sha256: 7.63 cycles/byte
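Converting these cycles/byte figures to throughput makes them comparable with the MB/sec numbers earlier in the thread: bytes per second is the clock rate divided by cycles per byte. A small sketch, assuming one core held at the listed 3310 MHz and MB meaning 10**6 bytes:

```python
def mb_per_sec(cycles_per_byte: float, clock_mhz: float) -> float:
    # (clock_mhz * 1e6 cycles/s) / (cycles/byte) / 1e6 = clock_mhz / cycles_per_byte
    return clock_mhz / cycles_per_byte

for name, cpb in [("blake2b", 3.33), ("blake2s", 4.87), ("sha512", 5.06), ("sha256", 7.63)]:
    print(f"{name}: {mb_per_sec(cpb, 3310):.0f} MB/s")
# blake2b: 994 MB/s, blake2s: 680 MB/s, sha512: 654 MB/s, sha256: 434 MB/s
```

On that machine, even single-core blake2s works out to several hundred MB/s.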
Here are the SHA-256 implementations measured: https://bench.cr.yp.to/impl-hash/sha256.html
As you can see, the winner varies by platform, but in most cases OpenSSL wins, and Wei Dai's implementation is close.
If I recompile blake2sp with "/O2" optimization, it improves to 171MB/sec. I ran tests with "/Od" optimizations disabled because the default Crypto++ library project has optimizations disabled when it makes the lib file. Therefore, every hash had no optimizations to keep the comparisons apples to apples.
>Also, there is no reason to use such large buffer sizes, I suspect this only makes benchmarks more unreliable.
I chose 500MB because I wanted the fastest hashes (CRC32, MD5) to take at least 1 second and many data sizes I want to hash will be 10GB+.
Too slow! You're doing something wrong or measuring some slow implementation :) It should be more than 500 MB/s.
> Therefore, every hash had no optimizations to keep the comparisons apples to apples.
That's not apples to apples at all. Most performant code is written specifically to be optimized by the compiler. Use full optimization (/O2 in MSVC) for benchmarking. Also, you just wrote that you were measuring a hand-optimized assembly version and then compared it to a C version compiled with optimization disabled? O_o
> I chose 500MB because I wanted the fastest hashes (CRC32, MD5) to take at least 1 second and many data sizes I want to hash will be 10GB+.
Then call the update function with a reasonable 8K buffer many times. Using such large buffers will generate a lot of noise in benchmarks.
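A rough sketch of the chunked approach suggested here, in Python: feed update() 8 KiB slices of a buffer, time the loop, and keep the best of a few runs. It measures whatever hashlib implementations your Python build provides (not Crypto++ or the BLAKE2 reference code), and Python adds a small per-call overhead, so treat the numbers as relative.

```python
import hashlib
import time

def throughput(name: str, data: bytes, chunk: int = 8192) -> tuple:
    """Hash `data` by feeding update() fixed-size chunks; return (MB/s, digest)."""
    h = hashlib.new(name)
    view = memoryview(data)  # slicing a memoryview avoids copying each chunk
    start = time.perf_counter()
    for i in range(0, len(data), chunk):
        h.update(view[i:i + chunk])
    elapsed = time.perf_counter() - start
    return len(data) / elapsed / 1e6, h.digest()

if __name__ == "__main__":
    data = bytes(16 * 1024 * 1024)  # 16 MiB of zeros; the size is an arbitrary choice
    for name in ("sha256", "sha512", "sha3_256", "blake2b", "blake2s"):
        best = max(throughput(name, data)[0] for _ in range(3))  # best of 3 runs
        print(f"{name:10s} {best:8.1f} MB/s")
```

Chunked and one-shot hashing produce the same digest, so the chunk size only changes how the work is fed, not the result.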
* * *
Speaking of JS, you can try my newer implementations/ports by cloning https://github.com/StableLib/stablelib. run ./scripts/build, cd into packages/blake2s (/sha3, /sha256, etc.) and run: node lib/.bench.js Note that SHA-3 (and SHA-512, BLAKE2b [not implemented yet]) is slow in JS compared to SHA-256, BLAKE2s, etc. because it uses 64-bit numbers, so in JS I have to emulate them by using two 32-bit ones for low and high bits.
Benchmarks on Intel Skylake:
I do now notice that my ASUS motherboard monitor software is reporting that my CPU is at 1.2GHz instead of 3.3GHz. There's probably something wrong there. However, even if I get it up to 3.3GHz, the relative speeds between different benchmarks won't change. I got the same relative numbers on the Macbook.
>measuring a hand-optimized assembly version and then compared it to a C version compiled with optimization disabled? O_o
Because there's lots of Wei Dai code that's C++ code instead of asm. He delivered his MSVC project with optimizations disabled instead of "/O2", so it made the most sense to start with optimizations disabled everywhere as a preliminary benchmark. If I recompile Wei Dai's code with optimization, it will make Crypto++ perform faster and make blake2sp look slower. In the end, it's a moot point because blake2sp with "/O2" is still slower than Crypto++ SHA256.
>Using such large buffers will generate a lot of noise in benchmarks.
Why? If you study the blake2sp source code, you'll see a loop inside the hash update() function to handle arbitrary buffer sizes. Why does a loop outside that update() mean "less noise"? Why would splitting the work into BUFSIZETOTAL / 8192 separate calls produce less noise?
Just do it. If it's slower, you're doing something wrong. Which implementation of blake2sp are you measuring? It should be at least 1.5x as fast.
> Why? If you study the source code, you'll see a loop inside the hash update() function. Why does a loop outside that update() mean "less noise"? Why does adding more function calls of BUFSIZETOTAL divided by BLOCK PROCESSING SIZE equal less noise?
Because you'll be measuring a lot of memory copying time apart from hashing time. Dealing with memory outside of CPU cache introduces a lot of variance.
I suggest that instead of our discussion you should try to reproduce the results of https://bench.cr.yp.to (which is a highly trusted source, e.g. it was used during SHA-3 competition by NIST and participants): if something doesn't approximately match it means you did something wrong. In the process, you'll learn how to properly benchmark hash functions and make some sciency science by reproducing results! :-)
However, for relative MB/sec performance comparison to SHA256, it seems to point back to the BLAKE2 official reference code (non-SSE) being very slow. Wei Dai's Crypto++ also happens to have the BLAKE2 algorithm, and when I executed that, it ran at 525MB/sec, which was faster than SHA256 and also faster than SHA1. No outer 8k chunk loop was necessary for the Crypto++ benchmark.
>Because you'll be measuring a lot of memory copying time apart from hashing time.
Yes, I notice the numerous memcpy() functions in the blake2s?-ref.c. For additional tests, I rewrote the loop to call update() on chunks and tried various sizes (8k, 16k, 32k, ... 256k, 512k, 1MB). At 256k chunks and below, I got 235MB/sec which was an improvement but still slower than SHA256. As stated above, the real key was to use an optimized BLAKE2 algorithm instead of the official reference code.
>you should try to reproduce the results of https://bench.cr.yp.to
I can't tell if the blake2 entries in https://bench.cr.yp.to are using official reference or optimized code so trying to replicate those results with official reference files may be a wild goose chase.
Code mirror: https://github.com/floodyberry/supercop
As for memory copying — I didn't mean memcpy(), what I meant is that your CPU would have to get chunks of your huge buffer from RAM, which is slower and has more unpredictable performance.
Exactly! That's why my original post had the footnote that I was using the slower blake2sp reference code. Same situation as the SHA-3 reference code being the slowest implementation.
>As for memory copying — I didn't mean memcpy(), what I meant is that your CPU would have to get chunks of your huge buffer from RAM, which is slower and has more unpredictable performance.
But this observation also applies to all the other hash performance tests. If the blake2sp hash is handicapped by computing large RAM buffers beyond the L1/L2/L3 caches, the MD5/SHA1/SHA256/etc are also handicapped the same way. Whatever "noise" exists in the tests can be evened out by multiple executions.
Restricting the tests to tiny memory sizes that fit in L1/L2/L3 is not realistic for my purposes.
openssl speed sha512 sha256
pastebin output for lack of decent formatting:
on a linode system with 2 xeon E5-2680v3 cores @ 2.5ghz and 4GB of ram on this version and config of openssl on ubuntu server 17.04:
OpenSSL 1.0.2g 1 Mar 2016
built on: reproducible build, date unspecified
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
compiler: cc -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -g -O2 -fdebug-prefix-map=/build/openssl-p_sOry/openssl-1.0.2g=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DMD32_REG_T=int -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM
* Intel Core i5-4570 (Haswell) 4.15 c/b for short input and 1.44 c/b for long input
* Intel Core i5-6500 (Skylake) 3.72 c/b for short input and 1.22 c/b for long input
* Intel Xeon Phi 7250 (Knights Landing) 4.56 c/b for short input and 0.74 c/b for long input
To me the most useful part of SHA-3 is that you don't need to use HMAC as it is not vulnerable to length extension attacks. Meaning that it's much faster than SHA-2 when used as a MAC construction.
BLAKE2 wasn't, but BLAKE was.
Aside: Most people should continue to use HMAC-SHA256 instead of worrying about LEAs and justifying a switch to non-HMAC H(secret || message) constructions.
I think you mean BLAKE. BLAKE2 was not part of the SHA3 competition.
Any link to some reference document on this topic?
SHA-2 works by, after padding the message and appending the message's length, breaking the message into chunks. It sets up an initial state, and then sequentially iterates over the chunks updating the state as it goes. At the end, your state is your hash.
In other words:
state = INITIAL_STATE
for chunk in chunks:
    state = SHA2_compress(state, chunk)
hash = state
Truncated SHA-2 variants are immune because attackers aren't given state, they're given only a piece of state. They have no way of knowing the rest of state, so they can't continue iterating.
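To make the length-extension mechanics concrete, here is a toy Merkle-Damgård hash in Python. The compression function is a hypothetical stand-in (SHA-256 applied to state || block), not SHA-256's real compression function, but the padding follows SHA-256's scheme (0x80, zero bytes, 64-bit bit length). The attacker knows the tag for key || msg and the key's length, but not the key, and still produces a valid tag for an extended message.

```python
import hashlib
import struct

BLOCK = 64       # block size in bytes, as in SHA-256
IV = bytes(32)   # hypothetical all-zero initial state

def compress(state: bytes, block: bytes) -> bytes:
    # Toy compression function: NOT SHA-256's, just SHA-256(state || block)
    return hashlib.sha256(state + block).digest()

def md_pad(msg_len: int) -> bytes:
    # SHA-256-style padding: 0x80, zero bytes, then the bit length as 64-bit big-endian
    zeros = (BLOCK - 9 - msg_len) % BLOCK
    return b"\x80" + bytes(zeros) + struct.pack(">Q", msg_len * 8)

def md_hash(msg: bytes, state: bytes = IV, prior_len: int = 0) -> bytes:
    # prior_len lets us resume from a state that already absorbed prior_len bytes
    data = msg + md_pad(prior_len + len(msg))
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state  # the output IS the full internal state

def forge(tag: bytes, msg: bytes, key_len: int, suffix: bytes):
    # The server really hashed key || msg || glue, where glue is its own padding
    glue = md_pad(key_len + len(msg))
    new_msg = msg + glue + suffix
    new_tag = md_hash(suffix, state=tag, prior_len=key_len + len(msg) + len(glue))
    return new_msg, new_tag

key = b"server-secret-16"          # unknown to the attacker (only its length is known)
tag = md_hash(key + b"amount=100")
new_msg, new_tag = forge(tag, b"amount=100", len(key), b"&amount=999")
print(md_hash(key + new_msg) == new_tag)  # True
```

If the tag were truncated (say to 16 of its 32 bytes), forge would have no full state to resume from.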
For AWS, you authenticate calls to the API by authenticating each message with an HMAC. The call is encrypted over TLS, but TLS is only used to authenticate Amazon to the client, it's not used to authenticate the client to Amazon. There are some use cases where you wouldn't be able to just use client-side certificates anyway--for example, I can create a signature for a single S3 upload to a specific URI with an expiration date, and then send the signature to you, and then you can upload a file directly to my S3 account before the signature expires. (I suspect a big reason client-side certificates aren't used for AWS is because nobody knows how to use them, but they're also less flexible.)
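A minimal sketch of that kind of expiring, pre-signed request, in Python. The string-to-sign layout (method, path and expiry joined by newlines) is a hypothetical simplification, not AWS's actual Signature Version 4 format; the point is that the server recomputes the HMAC-SHA256 and compares it in constant time.

```python
import hashlib
import hmac
import time

def sign(key: bytes, method: str, path: str, expires: int) -> str:
    # Hypothetical string-to-sign: method, path, and expiry time (Unix seconds)
    msg = f"{method}\n{path}\n{expires}".encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()

def verify(key: bytes, method: str, path: str, expires: int, sig: str, now=None) -> bool:
    now = time.time() if now is None else now
    if now > expires:
        return False  # signature has expired
    expected = sign(key, method, path, expires)
    return hmac.compare_digest(expected, sig)  # constant-time comparison

# The account owner signs an upload URL valid for one hour and hands it out
expires = int(time.time()) + 3600
sig = sign(b"owner-secret", "PUT", "/my-bucket/upload.bin", expires)
```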
Technically you could also do this using an authentication-only variant of authenticated ciphers, like GMAC, but at that point it would still be classified as a MAC.
Not "crypto paranoia"; current mainstream AEAD tends to break both confidentiality and integrity at the same time when misused (e.g. nonce reuse). EtM doesn't do that.
I mean the advantage seems to only be there when you're cobbling together your own constructions, i.e. doing what you shouldn't be doing.
From a different perspective, GMAC is just GCM without encryption. It's not like GMAC existed and then it was combined with AES to make AES-GCM, it was designed as authenticated encryption and has the ability to encrypt zero bytes while still authenticating other bytes.
So no, GCM is not just GMAC + AES.
That's my point. We're not moving away from MACs, and most authenticated schemes are an encryption mode (usually CTR) composed with a MAC.
I think the term "block cipher mode" is responsible for the confusion. It has no coherent meaning.
But that's not true. There's no such MAC. A MAC was derived from GCM but that MAC is not used to authenticate encrypted GCM messages.
If you can tell me which MAC is used for AES-GCM, go ahead, I'd like to know what it's called.
Poly1305 as used in NaCl (XSalsa20-Poly1305) or in TLS (ChaCha20-Poly1305) also will give different results, especially compared to the original proposal Poly1305-AES MAC, but this doesn't mean that those two are not compositions of a cipher and MAC.
GCM = CTR + GMAC
If you're, elsewhere, using GMAC for some reason and getting a different result when you combine it with CTR mode: Well, that's weird. I'd have to have two implementations to compare to analyze further. But otherwise, this conversation is moot.
So if you use GMAC, you'll have len(A) >= 0, and len(C) == 0.
If you use GCM with encrypted data, you'll end up with len(C) > 0, which will give you different results from GMAC.
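To make the len(C) point concrete, here is a sketch of GHASH following NIST SP 800-38D. The hash input is A, zero padding, C, zero padding, and then one block holding the bit lengths of A and C. The hash subkey H is normally AES_K(0^128). This sketch doesn't implement AES, so H is passed in as an integer, which is an assumption of the sketch and not how real libraries expose it. The GCM tag is then E_K(J0) XOR GHASH. GMAC is the case `ciphertext=b""`, so the final length block (and therefore the tag) differs whenever C is non-empty.

```python
# Reduction constant from SP 800-38D: 11100001 || 0^120.
R = 0xE1 << 120

def gf_mult(x: int, y: int) -> int:
    """Multiply two 128-bit blocks in GCM's GF(2^128); bit 0 is the leftmost bit."""
    z, v = 0, y
    for i in range(127, -1, -1):  # walk x from its leftmost bit to its rightmost
        if (x >> i) & 1:
            z ^= v
        # GCM's "right shift"; reduce by R when a set bit falls off the end.
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z

def _pad16(b: bytes) -> bytes:
    return b + b"\x00" * (-len(b) % 16)

def ghash(h: int, aad: bytes, ciphertext: bytes) -> int:
    """GHASH_H(A || pad || C || pad || [len(A)]_64 || [len(C)]_64), lengths in bits."""
    data = _pad16(aad) + _pad16(ciphertext)
    data += (8 * len(aad)).to_bytes(8, "big") + (8 * len(ciphertext)).to_bytes(8, "big")
    y = 0
    for i in range(0, len(data), 16):
        y = gf_mult(y ^ int.from_bytes(data[i:i + 16], "big"), h)
    return y

if __name__ == "__main__":
    # Zero key, one zero plaintext block, no AAD (GCM specification test case 2).
    H = 0x66E94BD4EF8A2C3B884CFA59CA342B2E
    C = bytes.fromhex("0388dace60b6a392f328c2b971b2fe78")
    print(hex(ghash(H, b"", C)))  # 0xf38cbb1ad69223dcc3457ae5b6b0f885
    # Same AAD, GMAC-style (no ciphertext) vs. with ciphertext: different GHASH values.
    print(ghash(H, b"header", b"") != ghash(H, b"header", C))  # True
```

Because the ciphertext and its length are folded into the same polynomial evaluation as the AAD, you can't compute a standalone GMAC over A and then bolt CTR output onto it to get the GCM tag.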
GMAC is a special case of GCM, and GCM is not simply GMAC + CTR. This is supported by the NIST publication which defines them:
> If the GCM input is restricted to data that is not to be encrypted, the resulting specialization of GCM, called GMAC, is simply an authentication mode on the input data.
Again, GCM != CTR + GMAC, if you make this assumption you will fail to interoperate with correct implementations of GCM.
BLAKE2 is a good candidate right now considering speed. But SHA-256 instructions will only get more prevalent which will take care of the speed issue.
There have been speed comparisons between SHA-256 and BLAKE2 without SHA instructions. BLAKE2 seems like good option then, with the optimized code.
But once SHA-256 runs on the SHA extensions (dedicated hardware instructions), it gets a 5.7x boost, which blows BLAKE2 out of the water. ARM has SHA-256 instructions too (the ARMv8 crypto extensions, alongside NEON), although I couldn't find any benchmarks. I'd say it'll definitely be faster than BLAKE2 with HW instructions.
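Whether SHA-256 actually beats BLAKE2 on a given machine depends on whether the hash library uses the SHA extensions. Here is a rough micro-benchmark sketch using Python's `hashlib` to check that locally. `hashlib` often wraps OpenSSL, so hardware-instruction use depends on the build. Payload size and repeat count are arbitrary choices, and no particular ranking is implied.

```python
import hashlib
import time

def throughput_mib_s(name: str, data: bytes, repeats: int = 3) -> float:
    """Best-of-N throughput of hashlib algorithm `name` over `data`, in MiB/s."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hashlib.new(name, data).digest()
        best = min(best, time.perf_counter() - start)
    return len(data) / (1024 * 1024) / max(best, 1e-9)

if __name__ == "__main__":
    payload = b"\x00" * (4 * 1024 * 1024)  # 4 MiB of zeros; content doesn't affect speed
    for algo in ("sha256", "sha224", "sha512", "blake2b", "blake2s", "sha3_256"):
        print(f"{algo:>9}: {throughput_mib_s(algo, payload):8.1f} MiB/s")
```

SHA-224 and SHA-256 share the same compression function, so they should land close together. On a CPU with the SHA extensions and a library that uses them, both should pull well ahead of a software-only run.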
Combined with SHA-256 being uber popular everywhere, that's one less headache.
I also doubt SHA-256 will be overthrown in the near future, at least until SHA-3 instructions become commonplace, which I anticipate will take about a decade.
EDIT: Looks like SHA-224 might be the best option, since it has the same core with a truncated output, which will make it immune to length-extension attacks.
Prefixing a MAC sounds like it'll work but I'm just wondering why complicate things.
Of course that "no loss in security" bit depends on a proper implementation. There are plenty of good library implementations of HMAC out there, and if you want a high speed MAC there's always Poly1305. But if all you have is SHA2 and don't need to interoperate with systems using HMAC, then this is a reasonably good way to go about it. It's certainly simpler than implementing HMAC on your own.
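For comparison, here is a sketch of both constructions with Python's `hashlib`. It shows HMAC written out per RFC 2104 and checked against the stdlib `hmac` module, next to a secret-prefix MAC H(key || message). The prefix MAC truncates SHA-512 to 256 bits as a stand-in. That is not the same function as SHA-512/256, which uses different initial values, but both hide half of the 512-bit state. The sketch also assumes a fixed-length key; otherwise the key/message boundary becomes ambiguous.

```python
import hashlib
import hmac

def hmac_sha256(key: bytes, msg: bytes) -> bytes:
    """HMAC-SHA256 per RFC 2104: H((K ^ opad) || H((K ^ ipad) || msg))."""
    block_size = 64  # SHA-256 block size in bytes
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()  # over-long keys are hashed first
    key = key.ljust(block_size, b"\x00")   # then zero-padded to one block
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha256(ipad + msg).digest()
    return hashlib.sha256(opad + inner).digest()

def prefix_mac(key: bytes, msg: bytes) -> bytes:
    """Secret-prefix MAC over SHA-512 truncated to 256 bits (hypothetical stand-in).

    Assumes `key` always has the same length. Only half of the 512-bit
    chaining state appears in the output, so a plain length extension
    would require guessing the other 256 bits.
    """
    return hashlib.sha512(key + msg).digest()[:32]

if __name__ == "__main__":
    k, m = b"k" * 32, b"amount=100"
    print(hmac_sha256(k, m) == hmac.new(k, m, hashlib.sha256).digest())  # True
    print(prefix_mac(k, m).hex())
```

Neither is much code. The real risk with a hand-rolled HMAC is getting a detail like key handling wrong, which is why a library implementation is still the safer default.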
With 512/256 this is infeasible.
Is this a real concern with 224 in practice?
For this attack to work, one needs to know the full 256 bits, so that they can continue appending their own data to the hash's internal state and get a valid hash. But in this case, they don't have the complete internal state.
So they can't really continue, because the missing 32 bits are required to continue. Even though it is just 32 bits, they can't validate a guess in any way, because they don't even know the complete message, just its length. So the validation process isn't really possible.
This works for non-truncated hashes because the attacker knows that the hash is valid for secret+data and also knows the hash's internal state (which is the hash itself). If the complete internal state isn't known, there's no way to attack it.
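Here is a pure-Python sketch of the attack against untruncated SHA-256, showing what knowing the complete internal state buys the attacker. It implements only SHA-256's compression function, with round constants derived as FIPS 180-4 specifies (the cube roots of the first 64 primes), and resumes hashing from a published digest. The secret, message and suffix are made-up values, and the attacker is assumed to know only the secret's length. With SHA-224 the published digest would contain only seven of the eight 32-bit state words, which is the 2^32 guessing problem discussed here.

```python
import hashlib
import struct

def _primes(n):
    ps, c = [], 2
    while len(ps) < n:
        if all(c % p for p in ps if p * p <= c):
            ps.append(c)
        c += 1
    return ps

def _icbrt(n):
    """Largest integer r with r**3 <= n (exact binary search, no floats)."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

# First 32 bits of the fractional parts of the cube roots of the first 64 primes.
K = [_icbrt(p << 96) & 0xFFFFFFFF for p in _primes(64)]
M32 = 0xFFFFFFFF

def _rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & M32

def compress(state, block):
    """One SHA-256 compression of a 64-byte block into an 8-word state."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = _rotr(w[i - 15], 7) ^ _rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = _rotr(w[i - 2], 17) ^ _rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & M32)
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        t1 = (h + (_rotr(e, 6) ^ _rotr(e, 11) ^ _rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[i] + w[i]) & M32
        t2 = ((_rotr(a, 2) ^ _rotr(a, 13) ^ _rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & M32
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & M32, c, b, a, (t1 + t2) & M32
    return [(x + y) & M32 for x, y in zip(state, (a, b, c, d, e, f, g, h))]

def pad(msg_len):
    """SHA-256 padding for msg_len bytes: 0x80, zeros, 64-bit big-endian bit length."""
    return b"\x80" + b"\x00" * ((55 - msg_len) % 64) + struct.pack(">Q", 8 * msg_len)

def extend(digest, secret_len, msg, suffix):
    """Forge SHA-256(secret || msg || glue || suffix) knowing only the digest and len(secret)."""
    glue = pad(secret_len + len(msg))
    state = list(struct.unpack(">8I", digest))  # the digest IS the full internal state
    processed = secret_len + len(msg) + len(glue)  # a multiple of 64
    tail = suffix + pad(processed + len(suffix))
    for i in range(0, len(tail), 64):
        state = compress(state, tail[i:i + 64])
    return msg + glue + suffix, struct.pack(">8I", *state)

if __name__ == "__main__":
    secret = b"hypothetical-secret"
    msg = b"user=alice"
    mac = hashlib.sha256(secret + msg).digest()
    forged_msg, forged_mac = extend(mac, len(secret), msg, b"&admin=true")
    print(forged_mac == hashlib.sha256(secret + forged_msg).digest())  # True
```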
If the system is using the hash function in such a way that length extension creates a vulnerability, you can try all 2^32 possible length-extended hashes, and one of them will be correct (which will be evident because the exploit will work). But you are also correct that, in isolation, there is no way to determine which of the 2^32 resulting digests is the correct one.
This obviously wouldn't work over the internet because of the number of tries required, so how likely is it that this would be an actual vulnerability?
Now I'm not sure you can conclude that this kind of scenario never allows for more efficient oracles.
There is SHA-224, which has a 256-bit internal state, and SHA-512/224, which has a 512-bit internal state. In my previous comment, I was referring to the former.
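The difference comes down to simple arithmetic: the state bits that never appear in the digest are what an extender has to guess. Below is a small sketch tabulating this for the SHA-2 variants. State and digest sizes are from FIPS 180-4; the dictionary and function names are just for illustration.

```python
import hashlib

# (internal chaining-state bits, digest bits) for SHA-2 variants, per FIPS 180-4.
SHA2_VARIANTS = {
    "SHA-256": (256, 256),
    "SHA-224": (256, 224),
    "SHA-512": (512, 512),
    "SHA-384": (512, 384),
    "SHA-512/256": (512, 256),
    "SHA-512/224": (512, 224),
}

def hidden_state_bits(variant: str) -> int:
    """State bits missing from the digest, i.e. 2**n candidates to resume hashing."""
    state_bits, digest_bits = SHA2_VARIANTS[variant]
    return state_bits - digest_bits

if __name__ == "__main__":
    for name in SHA2_VARIANTS:
        print(f"{name:>12}: {hidden_state_bits(name):3d} hidden bits")
```

SHA-224 leaves 32 bits (about 4.3 billion candidates), while SHA-512/224 leaves 288, far out of reach.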
Nevertheless, does this mean SHA-224 is still susceptible to length extension attacks?
I'm just saying, if you're going to recommend one of the SHA-2's, it should probably be one of the truncated ones.
Note that it doesn't have the hardware instructions that are going to be commonplace for SHA-256 and SHA-224. Also, if you have a 32-bit ARM phone (which is most phones in existence), it will take a performance hit. That's because SHA-512 uses 64-bit words while SHA-256 uses 32-bit words.
I thought I had the winner with SHA-256, but looks like I blanked on the fact that SHA-256 is susceptible to length extension attacks while the truncated versions of SHA2 are not.
So I'll say SHA-224 is the optimal choice because even if you are running on 64-bit platforms, the hardware instructions will make it faster than SHA-512. Also, insisting on higher security than SHA-256 is not a very smart move; SHA-256 offers more than enough security margin.
Edit: Removed my comment regarding BLAKE as it was incorrect.
ChaCha20, Poly1305, BLAKE2 benefit from improvements that benefit a wide-range of applications, while SHA-3, AES and GHASH do not. Thus the "cost" of high performance support for the former can be amortised over a much wider base.
Also, sharing HW resources for cryptographic purposes is not possible for any device that needs to pass certain security certifications.
Edit: Typo and additional comment
Most vulnerabilities in cryptosystems happen in the joinery. Anything we can do to eliminate joinery is going to make our cryptosystems more resilient. Selecting new primitives that will require hardware support to be performant seems like an own-goal.
As someone who has done a number of audits for certified devices, I don't think your statement about shared hardware is accurate. Are you talking about FIPS 140?
> As someone who has done a number of audits for certified devices, I don't think your statement about shared hardware is accurate. Are you talking about FIPS 140?
Yes. Is my understanding incorrect? I'd like to be informed if this is the case. Thanks.
I really don't care about what the standards say; thankfully, the important standards, like TLS, aren't bound by what NIST standardizes.
> [The diversity of cryptographic primitives] contributes to code-size, which is a worry again in the mobile age
In the "mobile age" we use embedded processors with gigabytes worth of RAM and several times the amount of NAND storage, I don't really think a SHA-3 implementation is going to tip the boat over.
The author's other points seem very fair though.
I admit that when I see the 100MB+ size of apps that I have to download, I do wonder why. But I assume that it would be even worse were people not worrying about this stuff.
Can you elaborate on that? That's orders of magnitude bigger than a naive implementation, and wouldn't fit in L1 cache either.
It's a good point that diversity is expensive. There's a counter-argument too: if we all decide to use one thing and it goes wrong, then we're in serious trouble. If we have multiple approaches, then the damage is lessened if one of them turns out to have problems.
In other words, I think it's better to have a variety of approaches to manage risk, even if it does impose costs.
Ironically, when the RC4 attack refinements were published, for a while sites were forced to choose between a vulnerable CBC implementation and the vulnerable RC4 cipher.
With the exception of MD5 in X.509 certificates and RC4 in TLS, has there been another cryptographic vulnerability that required the retirement of a cipher? SWEET32 is about as close as you come to that, right?
And, obviously, there have been many more problems, even just in TLS, than X.509 MD5 and RC4.
If we had had more stream ciphers at the time, we would not have been stuck with BEAST or RC4.
IMO, SWEET32 was just the weak but well-marketed cherry on top to get rid of the previous generation. We just had the SHA-1 collision.
TLS 1.0 per RFC 2246 defined DES40_CBC, DES_CBC, 3DES_EDE_CBC, RC4_40, RC4_128, and IDEA_CBC.
RFC 4132 added CAMELLIA_128_CBC and CAMELLIA_256_CBC in 2005, RFC 4162 proposed SEED_CBC in 2005, and RFC 6209 in April 2011 informationally specified ARIA_128_CBC, ARIA_256_CBC, ARIA_128_GCM, and ARIA_256_GCM, but the GCM ones only work in TLS 1.2 or later. Same story with CAMELLIA_*_GCM.
Informational RFC 5469 in 2009 deprecated the use of DES and IDEA ciphersuites, so that left RC4, 3DES_EDE, the Korean ciphers SEED and ARIA, and CAMELLIA, with the block ciphers in CBC. Pretty much no one supported the latter three, and when Mozilla rejiggled the ciphersuites in 2013, support for these was removed entirely.
Your point still stands about BEAST being a flaw in the glue rather than the ciphers themselves, but I believe the parent poster makes a stronger point about the complete lack of AEAD constructions before TLS 1.2, and the lack of other stream ciphers which were more self-contained in TLS 1.0 and thus unaffected.
(Due to how all this is really implemented, this in fact was not a technical problem but a problem of "design culture" or something like that.)
What I mean is that if I make a new protocol, it could use a different type of crypto from what is in mainstream use, so at least this new protocol will fail independently from other protocols.
Faster primitives mean more PBKDF rounds and more use of hashes to secure underlying data.
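Here is a sketch of how primitive speed turns into PBKDF rounds. It calibrates an iteration count to a fixed time budget with the stdlib `hashlib.pbkdf2_hmac`. The 50 ms budget and the probe size are arbitrary example values. Since cost is roughly linear in iterations, the same wall-clock budget buys proportionally more iterations when the underlying hash is faster.

```python
import hashlib
import os
import time

def calibrate_iterations(budget_s: float = 0.05, probe: int = 10_000) -> int:
    """Estimate how many PBKDF2-HMAC-SHA256 iterations fit in budget_s on this machine."""
    salt = os.urandom(16)
    start = time.perf_counter()
    hashlib.pbkdf2_hmac("sha256", b"probe-password", salt, probe)
    elapsed = time.perf_counter() - start
    # Scale the probe linearly to the budget; never go below the probe count.
    return max(probe, int(probe * budget_s / max(elapsed, 1e-9)))

def derive_key(password: bytes, salt: bytes, iterations: int) -> bytes:
    """32-byte key from PBKDF2-HMAC-SHA256 with the chosen iteration count."""
    return hashlib.pbkdf2_hmac("sha256", password, salt, iterations)

if __name__ == "__main__":
    n = calibrate_iterations()
    salt = os.urandom(16)
    print(n, derive_key(b"correct horse", salt, n).hex())
```

Store the iteration count next to the salt, so it can be raised later as hardware (or the hash) gets faster.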
Having several different but thoroughly sensible algorithms whose strengths, weaknesses and limitations are well known is a better approach than saying "everyone use X" à la NESSIE. Users with Paradox of Choice going on can pick the lunch special, AES.
It seems to me it's easy to break SHA or MD5, but what about breaking two fundamentally different algorithms at the same time?
Implementation seems as easy as concatenating the two hex outputs into a single string and comparing the results.
I'm curious to what others think of this idea.
(1) It's hard to use a multihash construct correctly. For example, do you want "hash1(message) + hash2(message)" or "hash1(message + hash2(message))" or "hash1(hash2(message))"? The answer is that it depends which features of a hash function matter for your application, and if you pick the wrong one you will have the worst of all worlds instead of the best of all worlds. (Simple example: "md5(password)+scrypt(password)" is a broken way to store passwords, while "sha3(md5(document))" is a broken way to authenticate documents; vice versa would probably be OK.)
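The password half of the simple example in (1) is easy to demonstrate. In this sketch, SHA-256 stands in for md5 and PBKDF2 stands in for scrypt. These are substitutions, since the only thing that matters is that one half is fast and the other is slow. The password, salt, wordlist, and iteration count are made up. The attacker matches guesses against the fast half alone and never pays for the slow half.

```python
import hashlib

SLOW_ITERATIONS = 20_000  # stand-in work factor for the "slow" half

def store(password: bytes, salt: bytes) -> bytes:
    """Broken scheme: fast_hash(password) + slow_hash(password), concatenated."""
    fast = hashlib.sha256(password).digest()  # fast, unsalted half (stand-in for md5)
    slow = hashlib.pbkdf2_hmac("sha256", password, salt, SLOW_ITERATIONS)
    return fast + slow

def crack_fast_half(record: bytes, wordlist):
    """Attacker tests guesses against the first 32 bytes only; PBKDF2 is never run."""
    target = record[:32]
    for guess in wordlist:
        if hashlib.sha256(guess).digest() == target:
            return guess
    return None

if __name__ == "__main__":
    record = store(b"hunter2", b"hypothetical-salt")
    print(crack_fast_half(record, [b"password", b"123456", b"hunter2"]))  # b'hunter2'
```

The combined record is only as strong as its weakest, fastest component, which is the "worst of all worlds" outcome.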
(2) If you're paranoid, bytes and CPU cycles are (arguably) better spent on extending your best option instead of tacking on less good ones. Breaks tend to come in the form of "it is now feasible to crack a key with length X" or "find collisions with a hash with Y rounds" or whatever, where X and Y were originally picked to strike a balance of security and performance. So if you were going to spend performance on something nonstandard to guard against mathematical breakthroughs, experts might recommend doubling X and Y instead of throwing in a second, less secure cipher. (Although really they'd tell you just to use the standard thing.)