
Blake3 is 10 times faster than SHA-2 - tosh
https://nextjournal.com/mk/BLAKE3
======
nneonneo
A lot of people in the comments are wondering about using SHA-2 vs Blake3 for
password hashing. The answer is that neither of these is suitable by itself
for password hashing. Sadly, there's still a lot of advice on the internet
(and, worse, real systems) that does things like store md5(password) in a
database and call it a day.

Blake2/3, SHA-2/3, and other cryptographic hash functions are meant to be
_fast_, while also effectively guaranteeing no collisions or any way of
reversing the hash (pre-image attack). This makes them ideal for things like
checking file integrity, message authentication, inputs for cryptographic
signatures, and the like. But since they're meant to be fast, they're very
poor choices for hashing low-entropy data such as passwords: attackers can
test vast numbers of passwords (even if you salt!!) on GPU hardware.

For low-entropy data, you want a hash function that is slow and hard to
parallelize. Use a specialized password hashing algorithm like bcrypt, scrypt
or Argon2, and tune it so it’s slow enough without impacting overall usability
(e.g. takes 50ms on your server). These algorithms are resistant to
parallelization, so attackers will only be able to test a small number of
passwords per second. This slows attackers down and can prevent them from
recovering passwords that might be recoverable if stored using SHA-2, etc.
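
For illustration, a minimal sketch (not from the comment above) of what this looks like with the scrypt binding in Python's standard library (requires OpenSSL 1.1+). The cost parameters are only a starting point; raise them until one hash takes roughly your latency budget (e.g. ~50ms on your server):

```python
# Hedged sketch: password hashing with stdlib scrypt; parameters are
# illustrative, not a vetted recommendation.
import hashlib
import hmac
import os

def hash_password(password: str) -> tuple[bytes, bytes]:
    salt = os.urandom(16)  # unique random salt per password
    digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return hmac.compare_digest(digest, expected)  # constant-time compare
```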

~~~
littlestymaar
This is a good time to ask a question I have always wondered about regarding
these slow hash functions: are they slow because they are computationally
intensive, or do they have other tricks that make them slow without putting
too much load on your server (like needing RAM access, syscalls, or anything
that adds latency but not much CPU load)? If it's the former, isn't it a good
DoS target, letting attackers keep your server busy hashing random crap and
unable to process other requests?

~~~
eyegor
> are they slow because they are computationally intensive...

In general, no. Most "slow" hashing schemes are designed to be memory-hard and
highly serial (more threads = more contention, not throughput). Some of the
earlier approaches just try to maximize computational requirements, but modern
research acknowledges the march of computing power makes that an inherently
poor choice for long term sensitive data. Argon2 is just one of the more
advanced versions of a memory-hard scheme, building on prior research. I could
try to give a poor explanation, but instead I'll link you the spec which is
very readable:
[https://github.com/P-H-C/phc-winner-argon2/blob/master/argon2-specs.pdf](https://github.com/P-H-C/phc-winner-argon2/blob/master/argon2-specs.pdf)
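
As a hedged sketch of the memory-hardness point, using the third-party argon2-cffi package (not part of the linked spec): the memory_cost knob is what makes Argon2 memory-hard, so an attacker pays for RAM per guess, not just ALU time.

```python
# Sketch with argon2-cffi; parameter values are illustrative.
from argon2 import PasswordHasher

ph = PasswordHasher(
    time_cost=3,        # passes over the memory
    memory_cost=65536,  # KiB of RAM per hash (64 MiB here)
    parallelism=4,      # lanes; bounds the useful parallelism per guess
)
hashed = ph.hash("correct horse battery staple")
ph.verify(hashed, "correct horse battery staple")  # raises on mismatch
```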

~~~
littlestymaar
Thanks!

------
baby
JP Aumasson just announced Blake3 at Real World Crypto a few days ago during
the lightning talks. This was also right after he presented on "Too Much
Crypto"[1] which argues that we use too many rounds in symmetric
constructions: our security margins are too high and don't match any of the
best "practical" attacks. We're too paranoid for our own good. He suggests
reducing the number of rounds for a number of constructions like AES, SHA-3,
Blake2, etc.

[1]:
[https://eprint.iacr.org/2019/1492.pdf](https://eprint.iacr.org/2019/1492.pdf)

Blake3 is in part a reduction in the number of rounds (so it's faster); it's
also a bit of a design change (the round function and the parallelization),
and it's now just one multi-purpose function (instead of several instances)
that takes an arbitrary-length input and gives an arbitrary-length output.

My questions are:

* what are standards using Blake2 (like Noise) going to do?

* what do cryptanalysts think?

* is it really that fast? (I'll have to benchmark this for my own use-case.)

~~~
dathinab
> which argues that we use too many rounds in symmetric constructions: our
> security margins are too high and don't match any of the best "practical"
> attacks. We're too paranoid for our own good. He suggests reducing the
> number of rounds for a number of constructions like AES, SHA-3, Blake2, etc.

This seems like quite a dangerous way of thinking. New attacks and methods to
attack crypto are discovered regularly. A few years ago people also said
SHA-1 would never get broken to the degree it just got broken recently.

__So the only thing we can do is to be slightly paranoid; everything else
would be strongly negligent.__

Especially if we consider how long it often takes in practice to update
software and phase out previous encryption and hash methods without having to
worry about e.g. downgrade attacks or similar.

Sure, software should be updatable ASAP, but honestly, how long will it take
until SHA-1 is phased out?

EDIT: Though that also means that if you believe you will always be able to
update the _deployed_ relevant parts of your software ASAP (which can't work
for libraries), you are sure that you will follow the crypto community
closely enough, and you believe someone else will do so too if you are
unavailable because of sickness, then yes, go ahead and use less
paranoid/secure crypto.

~~~
pvg
_This seems a quite dangerous way of thinking. New attacks, methods to attack
crypto are discovered regularly._

It's not, really, because there is obviously _some_ security margin that is
too high. It makes a lot of sense to consider and try to quantify what that
is. If all we have is some vague 'well, there are always new attacks' then how
do we know the current margins are enough? Why not double the rounds, crank up
the hash sizes, etc? It's not really a meaningful response to Aumasson's work.

~~~
im3w1l
1. As a community: because you have to balance cost and security. Maybe we
could increase them a bit without it stinging too much, but we can't do it
boundlessly.

2. As an individual: because I'd probably mess it up and pick a number of
rounds that is e.g. divisible by 23, which weakens the system for some
number-theoretical reason that's far beyond my understanding.

------
zimmerfrei
This is a bit pointless, since it does not say on which platform the
comparison is made.

In any case, KangarooTwelve [1][2] (on AVX-512) is also 10x faster than
SHA-256.

KangarooTwelve is a fast hash function in the Keccak family (from which SHA-3
derives), and it is clearly what BLAKE3 tries to catch up with.

However, KangarooTwelve probably benefits from the more extensive scrutiny
that SHA-3 has received and will receive, and (from a performance standpoint)
from future HW extensions that will speed it up even more (the first one
appeared on ARMv8.2).

[1]
[https://keccak.team/files/KangarooTwelve.pdf](https://keccak.team/files/KangarooTwelve.pdf)

[2]
[https://www.cryptologie.net/article/393/kangarootwelve/](https://www.cryptologie.net/article/393/kangarootwelve/)

~~~
throwawaymath
There is a comparison of Blake3 to various hash functions (including SHA-2 and
SHA-3 families) on AWS c5.metal; see the chart here:
[https://github.com/BLAKE3-team/BLAKE3](https://github.com/BLAKE3-team/BLAKE3)

~~~
zimmerfrei
Yes, and it didn't go unnoticed that KangarooTwelve (which - mind - does not
belong to the SHA-3 family right now) is not included on that prominent
comparison diagram, while it is considered in the BLAKE3 paper.

~~~
oconnor663
I felt it would've been unfair to include KangarooTwelve in that particular
bar chart, because at 16 KiB of input it hasn't reached its peak throughput.
At the same time, the goal was to focus on more widely used algorithms.

------
lucb1e
Speed comparison from the BLAKE3 authors:
[https://github.com/BLAKE3-team/BLAKE3/](https://github.com/BLAKE3-team/BLAKE3/)

I hadn't heard of BLAKE3 before (I thought BLAKE2b was the latest and
greatest); judging by the commits in that user's repositories, it's about
half a year old now.

Paper's abstract starts with:

> BLAKE3 [is] an evolution of the BLAKE2 cryptographic hash that is both
> faster and also more consistently fast across different platforms and input
> sizes. BLAKE3 supports an unbounded degree of parallelism, using a tree
> structure that scales up to any number of SIMD lanes and CPU cores. On Intel
> Cascade Lake-SP, peak single-threaded throughput is 4× that of BLAKE2b, 8×
> that of SHA-512, and 12× that of SHA-256, and it can scale further using
> multiple threads. BLAKE3 is also efficient on smaller architectures [like]
> 32-bit ARM1176

In the introduction, the reason for BLAKE3 is described:

> A drawback of BLAKE2 has been its large number of incompatible variants. The
> performance tradeoffs between different variants are subtle, and library
> support is uneven. BLAKE2b is the most widely supported, but it is not the
> fastest on most platforms. BLAKE2bp and BLAKE2sp are more than twice as fast
> on modern x86 processors, but they are sparsely supported and rarely
> adopted. BLAKE3 eliminates this drawback. It is a single algorithm with no
> variants

Paper:
[https://github.com/BLAKE3-team/BLAKE3-specs/](https://github.com/BLAKE3-team/BLAKE3-specs/)
(By the way, the download doesn't work for me in the latest Firefox on Linux
and GitHub's rendering is plain awful. Not sure if anyone else can reproduce
that or if my Firefox is broken.)
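
For the curious, a short sketch of the "one multi-purpose function" point, assuming the third-party blake3 Python bindings (pip install blake3); the keyword names below come from those bindings, and the context string is made up:

```python
import blake3

# Plain hashing with extendable output: ask for as many bytes as you want.
digest64 = blake3.blake3(b"hello world").digest(length=64)

# Keyed mode (MAC-like); the key must be exactly 32 bytes.
mac = blake3.blake3(b"hello world", key=b"\x00" * 32).digest()

# Key derivation mode, parameterized by a context string.
subkey = blake3.blake3(b"input key material",
                       derive_key_context="example.com 2020 session keys").digest()
```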

~~~
zahllos
The reason you haven't heard of it before now is that it was released
2020-01-09, last Thursday at the time of writing, at Real World Crypto.

~~~
lucb1e
The only author of BLAKE3 that spoke there was JP (@veorq). The day matches
but the title of his talk is "Too Much Crypto". Is that where it was
announced? (Quick link to program:
[https://rwc.iacr.org/2020/program.html#day-2020-01-09](https://rwc.iacr.org/2020/program.html#day-2020-01-09))

~~~
doomrobo
It was announced in a lightning talk following that session

~~~
lucb1e
That doesn't sound very official. Most text in the paper seems to date from
August, so it looks like this has been out for a while and wasn't only just
announced, even if it was mentioned or officially "announced" (while already
being released) in a surprise talk at that conference.

~~~
oconnor663
JPA and I announced BLAKE3 at the RWC lightning talks, after JPA's Too Much
Crypto talk, and we published all the repos and crates.io packages about 30
minutes prior to that. The commit history that you see in the repos was
private until January 9.

------
mintyc
Do the runtime tests decouple from I/O time, i.e. are file and memory caching
not affecting the results?

I'm surprised because I'd expect a big file like that to be I/O bound in
practice.

~~~
thephyber
There is a tech talk about Bao (I think it’s the same author) where he
prefaced the talk by describing a few caveats like “benchmarks are lies” and
he mentioned that you should always run a timed process a second time to tease
out bias from cached/pages instructions and data.

The core reason this method is faster than traditional SHAs is because Blake3
is a binary tree where the leaf blocks can be hashed in parallel and SHA is
entirely sequential
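
To make that concrete, a toy illustration (NOT BLAKE3's actual construction) of why tree hashing parallelizes: the leaf chunks are independent, so separate cores can hash them concurrently, and only the short parent combines run sequentially. CPython's hashlib releases the GIL for large buffers, so plain threads suffice here.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

CHUNK = 64 * 1024  # illustrative chunk size, not BLAKE3's

def leaf(chunk: bytes) -> bytes:
    # Domain-separate leaves from internal nodes.
    return hashlib.sha256(b"leaf" + chunk).digest()

def node(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"node" + left + right).digest()

def tree_hash(data: bytes) -> bytes:
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)] or [b""]
    with ThreadPoolExecutor() as pool:   # leaves hashed in parallel
        level = list(pool.map(leaf, chunks))
    while len(level) > 1:                # pairwise combine up the tree
        pairs = zip(level[0::2], level[1::2])
        odd = [level[-1]] if len(level) % 2 else []
        level = [node(l, r) for l, r in pairs] + odd
    return level[0]
```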

~~~
lucb1e
> you should always run a timed process a second time

I should hope that anyone doing benchmarks runs them multiple times to avoid
crap like popcon or an Adobe Reader update randomly keeping the system busy
during one of the two algorithms' benchmarks. I don't expect that the author
ran this only once and decided to make a blog post based on that, even if the
post doesn't show multiple runs.

~~~
kvlr
Author here. I did run it a few times, but you don't have to take my word for
it: you can rerun the notebook yourself if you sign up for Nextjournal and
remix my notebook. Full disclosure: I'm also a cofounder of Nextjournal.

But I wouldn't suggest using Nextjournal for serious benchmarking (yet).
We're running on Google Cloud, and it's not suited for benchmarking unless
you pay for a (big) sole-tenant instance. In the future we plan to offer
dedicated instances for benchmarking.

------
diego_moita
Is "faster" always a good thing in hashing algorithms?

Suppose a hashed password database falls into wrong hands. If they're hashed
with a faster algorithm wouldn't it be easier to try a dictionary attack to
discover the real passwords?

~~~
geofft
You can always make a fast hashing algorithm into a slow-enough password
hashing algorithm by repeating it multiple times (as with PBKDF2) or better
yet using a construction that also uses a minimum amount of memory, not just
computational time. For most cryptographic applications (signing data in
transit, signing data at rest, computing things like Merkle trees for
applications like git and dm-verity), you want a hash that is fast to compute.
It's basically only password hashing and cryptocurrency mining that need to
be slow.
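
A minimal sketch of the "repeat a fast hash" idea, using PBKDF2 from Python's standard library; the iteration count is illustrative and should be raised until hashing is as slow as you can tolerate:

```python
import hashlib
import os

salt = os.urandom(16)
# 600k iterations of HMAC-SHA-256 turn one fast hash into a slow KDF.
key = hashlib.pbkdf2_hmac("sha256", b"hunter2", salt, 600_000)
```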

BLAKE3's readme says:
([https://github.com/BLAKE3-team/BLAKE3](https://github.com/BLAKE3-team/BLAKE3))

 _NOTE: BLAKE3 is not a password hashing algorithm, because it's designed to
be fast, whereas password hashing should not be fast. If you hash passwords
to store the hashes or if you derive keys from passwords, we recommend
Argon2._

Argon2
([https://github.com/P-H-C/phc-winner-argon2/blob/master/argon2-specs.pdf](https://github.com/P-H-C/phc-winner-argon2/blob/master/argon2-specs.pdf))
is based on BLAKE2b.

------
diegocg
I don't think it's a coincidence that BLAKE3 has been released shortly after
the latest SHA-1 news. Many SHA-1 users would rather avoid the performance
impact of SHA-2/3.

------
PabloGranolabar
Reducing the 10 rounds in Blake2 to 7 in Blake3... huh?

The entire industry is going in exactly the opposite direction of this,
increasing the complexity and depth of cryptographic methods and algorithms
due to advances in quantum computing. Cloudflare, Microsoft Research, etc.
are all investing heavily in post-quantum cryptography, almost exactly the
opposite of what this guy is promoting. Makes no sense. ¯\\_(ツ)_/¯

------
blakesterz
This is completely pointless, but there are so many random things I share a
name with, and somehow I've only met a few _people_ named Blake in real life.

------
rhn_mk1
What are the criteria for choosing an integrity checking hash? Speed is always
desired, and longer output makes collisions due to random bit swaps less
likely.

Even CRC32 is still used in embedded systems for short pieces of data.

Is there anything else important for detecting transmission errors if the data
was sent encrypted, or is something like MD5 still sufficient for such uses?

Is Blake3 even worth considering for such a use case?

~~~
dagenix
Never use MD5 except for compatibility.

~~~
lucb1e
This. There are hash functions that are faster than MD5, so it's not the
fastest, and everyone knows (since 1996, actually[1]) that MD5 is insecure.

If you want transmission error checking, use CRC32 or Siphash or whatever. If
you want an actual, cryptographic hash function, then BLAKE3 is one of the
options, though it's a very young one so unless you really know what you're
doing, you should probably not use this for another few years. Heck, if
anything in this comment is new to you, you'll want to steer clear of such
low-level decisions and just use a library that makes the decision for you.

[1] [https://en.wikipedia.org/wiki/MD5](https://en.wikipedia.org/wiki/MD5) "In
1996, Dobbertin announced a collision of the compression function of MD5
(Dobbertin, 1996). While this was not an attack on the full MD5 hash function,
it was close enough for cryptographers to recommend switching to a
replacement, such as SHA-1 or RIPEMD-160."
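
A small sketch of that distinction, using only Python's standard library: CRC32 catches accidental corruption but is trivially forgeable, while a cryptographic hash also resists a deliberate attacker.

```python
import hashlib
import zlib

payload = b"some bytes on the wire"
checksum = zlib.crc32(payload)             # transmission-error detection only
digest = hashlib.sha256(payload).digest()  # collision/preimage resistance
```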

~~~
rhn_mk1
> Heck, if anything in this comment is new to you, you'll want to steer clear
> of such low-level decisions and just use a library that makes the decision
> for you.

This reads as very condescending: I posed the question to learn something,
not to be told off.

~~~
lucb1e
I'm sorry. I hate it when others do that, and apparently I don't notice when
I do it myself.

It was really just meant to be read in the literal sense, not to imply that
such a person is of less value or stupid. Wrong assumptions here can easily
lead (and have led) to security breaches. I'm not saying _you_ don't know the
difference between md5 and crc32, but if anyone doesn't yet know that, then
it's risky for them to make decisions about these things until they learn
more. Cryptographers commonly advise non-experts (even those who know a bit
more) to use a library that does these things for you, since cryptography can
be very subtle, but clearly I should have phrased that better.

~~~
rhn_mk1
> Cryptographers commonly advise non-experts to use a library to do things for
> you

On the high level, the problem is that libraries usually expose "CRC32",
"MD5", "SHA256", etc. There's no guiding resource that I have found that
matches use cases to libraries, which is where the initial question comes
from.

That is, unless you meant a library that covers the entire use case, but
there isn't always one.

~~~
SAI_Peregrinus
That's part of why NaCl, libsodium, and libhydrogen have the APIs they do:
they cover secure ways to use the primitives, make those ways easy, and make
it harder to misuse things. It's also why so many security professionals like
the Noise protocol: it doesn't leave any of the difficult choices open to be
misused.

------
jupp0r
This benchmark is not a comparison between BLAKE3 and SHA-2; it's a
comparison between b3sum and sha2sum. Things like file access patterns,
hiding I/O latency, and distributing work across cores can make huge
differences that have nothing to do with the hash functions themselves.
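
A hedged sketch of measuring the hash function rather than the *sum tool around it: hash a buffer that is already in memory and repeat the run, so file I/O, caching, and warm-up effects drop out of the number.

```python
import hashlib
import time

data = b"\x00" * (64 * 1024 * 1024)  # 64 MiB already in RAM
for _ in range(3):                   # repeat to expose warm-up/caching bias
    start = time.perf_counter()
    hashlib.sha256(data).digest()
    elapsed = time.perf_counter() - start
    print(f"{len(data) / elapsed / 1e6:.0f} MB/s")
```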

------
nullc
Unless you have SHA-NI in your processor... in which case Blake3 is slower
than SHA-2.

------
andybak
I look forward to a future point in time where Blake7 is released.

~~~
oconnor663
[https://twitter.com/dchest/status/1215751501567709185](https://twitter.com/dchest/status/1215751501567709185)

~~~
JdeBP
1978 isn't on the graph. (-:

Hint: andybak was making a joke.

~~~
oconnor663
Straight talk, should I watch this show? :)

~~~
fanf2
A friend recently wrote a re-watch review:
[https://nwhyte.livejournal.com/tag/tv:%20blakes%207](https://nwhyte.livejournal.com/tag/tv:%20blakes%207)

------
wolf550e
I would like to see the performance of KangarooTwelve in the same tree
structure as BLAKE3, with AVX-512, on the same hardware (AWS c5.metal).

~~~
oconnor663
Take a look at Figure 3 in the Performance section of the BLAKE3 spec:
[https://i.imgur.com/smGHAKA.png](https://i.imgur.com/smGHAKA.png). That graph
doesn't account for differences in the tree structure, but it compares the
performance of both with AVX-512.

------
nloladze
Uhhh, I have an idea of how SHA-256 works but not Blake3. Is Blake3 better
for storing passwords? If not, what is the go-to recommended hash for
password storage of, say, a simple consumer-facing app?

~~~
klodolph
SHA-256 is absolutely not recommended for storing passwords, and it was never
designed for that purpose. If you are storing passwords with SHA-256 you
should immediately migrate them to a password hashing algorithm. See
“Upgrading Legacy Hashes” in the link below.

[https://cheatsheetseries.owasp.org/cheatsheets/Password_Stor...](https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html)

Recommended algorithms:

- Argon2

- PBKDF2

- Bcrypt

------
ww520
Is it better in preventing hash collisions?

~~~
lucb1e
Compared to?

MD2, MD4, MD5, SHA-1, etc.: There are collisions known, so yes.

SHA-2, SHA-3, BLAKE2b, Skein, Whirlpool, etc.: We don't have any collisions
for these, so it doesn't get any better than that.

~~~
espadrine
> _SHA-2, SHA-3, BLAKE2b, Skein, Whirlpool, etc.: We don't have any
> collisions for these, so it doesn't get any better than that._

There are important differences; some of those primitives are definitely more
trustworthy than others.

SHA-2 and Whirlpool use the old Merkle–Damgård construction, which has fallen
out of favor because of unfortunate properties.

They are vulnerable to length extension attacks, so they can be misused. For
instance, if you want to use SHA-2 as a keyed hash, you cannot just do
sha2(key + message) (where + is concatenation). You have to use something
like HMAC for that.
Skein is much better, but has a bit of an aged design now, and an insane 72
rounds. It doesn’t have as much love as BLAKE2 for instance, which makes it
unlikely that people will build things competitive with BLAKE3 with it.

SHA-3, BLAKE2b and BLAKE3 are the best. Like Skein, they are not vulnerable to
length extension attacks, so they have a trivial keyed hash function
construction.

SHA-3 in particular has an elegant new design, the sponge construction, which
yields precise cryptographic protections in terms of the number of bits of
security. It has displaced Merkle–Damgård as The Way We Do Things™ now; many
modern hashes use it, e.g. Gimli relied on it to make comparisons.

One of the great things about this form is that it makes it an XOF
(extendable-output function): you can make your hash as long as you desire.
(BLAKE also shares this property, but with a different construction.)
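
A quick illustration of the XOF property with the SHA-3-family SHAKE functions from Python's standard library: one primitive, any output length you ask for.

```python
import hashlib

xof = hashlib.shake_256(b"hello world")
print(xof.hexdigest(16))  # 16-byte output
print(xof.hexdigest(64))  # 64-byte output; the 16-byte one is its prefix
```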

However, SHA-3 is slow, so if you want speed, use BLAKE2b (or KangarooTwelve)
over SHA-3, and BLAKE3 over them both.

If you want to be conservative about security and don't care about speed, a
safe bet is still SHA-512/256, which is safe against length extension attacks.

~~~
lucb1e
That was informative, thanks! It matched my gut feeling about the algorithms
and I knew some of these things, but not everything :)

> a safe bet is still SHA-512/256, which is safe against length extension
> attacks.

I would like to add here that this does not mean "SHA-512 or SHA-256" (I
remembered that SHA-256 was not length-extension resistant and looked up the
details). Instead, it means the version of SHA-512 that gets truncated to 256
bits. This is apparently relatively new (or at least added after I last read
up on SHA-2):

> In March 2012, the standard was updated in FIPS PUB 180-4, adding the hash
> functions SHA-512/224 and SHA-512/256, and describing a method for
> generating initial values for truncated versions of SHA-512.

--
[https://en.wikipedia.org/wiki/SHA-2#Hash_standard](https://en.wikipedia.org/wiki/SHA-2#Hash_standard)

I feel like that notation could have been chosen better by NIST.

~~~
nayuki
The hash function "SHA-512/256" has different initial constants than
"SHA-512", so it is not equivalent to truncating a "SHA-512" hash value.

SHA-224 and SHA-384 are somewhat resistant to length-extension attacks because
they output a truncated state.

Yeah, the slash in the names is an unfortunate choice that hurts clarity.
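
A small check of that point (assuming your OpenSSL build exposes sha512_256 to Python's hashlib): the named variant uses different initial constants, so it does not match a manually truncated SHA-512.

```python
import hashlib

truncated = hashlib.sha512(b"abc").hexdigest()[:64]
named = hashlib.new("sha512_256", b"abc").hexdigest()
print(truncated == named)  # False
```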

------
icf80
If you hash the password, and then use that hash as the key to encrypt
(AES/ChaCha20) the original password itself, isn't that good?

------
jagged-chisel
I can write a fast digest algorithm. That doesn't mean it's safe for anything
more than trivial uses.

------
SlowRobotAhead
I have a project I couldn't use SHA-2 on because of speed and because it
produces a 224-bit or 256-bit result.

Is there a way to use Blake3 in a reduced-hash-result way? Like 12-16 byte
results. (IIRC, you cannot truncate hashes for smaller results.)

~~~
garmaine
SHA-224 is literally truncated SHA-256. There's no security proof this is
safe, but only in the sense there is no proof SHA does what it claims to do at
all. Hash truncation is in the design of SHA-2. There's even an appendix of
one of the FIPS documents explaining how to do the truncation for arbitrary
sizes.

But be very careful to make sure that you only need preimage resistance...
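
As a side note, a sketch of a built-in way to get short digests without ad-hoc truncation: BLAKE2 encodes the requested length in its parameter block, so the 16-byte variant is a distinct function, not a prefix of the 64-byte one. The same caveat applies: a 16-byte output has only ~64 bits of collision resistance.

```python
import hashlib

short = hashlib.blake2b(b"message", digest_size=16).hexdigest()
full = hashlib.blake2b(b"message").hexdigest()
print(full.startswith(short))  # False: not a truncation
```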

~~~
bscphil
A 224 bit output is going to be collision resistant too, though, for the
foreseeable future. And when it comes to preimage resistance, even md5 is safe
for the time being. (With the usual qualifier that there's absolutely no
justification for using it with better hashes available.)

~~~
garmaine
He's talking about 12 to 16 byte outputs though, which is 96-128 bits of
preimage resistance but only 48-64 bits of collision resistance (the birthday
bound halves the output size)... which would be very broken today.

~~~
SlowRobotAhead
Mostly.

There are 12-16 byte hash or hash-like results that are plenty secure.
They're keyed, though.

The output from HMAC and AEAD ciphers with no encryption (think the
additional-data portion of ChaCha20-Poly1305 in a nonce-MAC mode)... or maybe
I'm wrong because these require nonces and keys.

~~~
garmaine
Whether you are using it in a keyed authentication mode is orthogonal to
collision resistance.

------
rurban
Not really. It's even 2x slower than SHA-256 with the builtin SHA-NI
instructions.

b3sum is much faster than sha256sum, and blake3 is about 2x faster than
blake2.

[http://rurban.github.io/smhasher/doc/table.html](http://rurban.github.io/smhasher/doc/table.html)

~~~
antpls
Not sure why you are downvoted. Your table looks more factual than BLAKE3's
GitHub.

~~~
espadrine
RE: the downvotes; while I didn't partake, I know there is history between the
BLAKE/BLAKE2/BLAKE3 author, JP Aumasson, and Rurban, in particular related to
differing opinions over the publication of SipHash.

Differing opinions here means that most of the cryptographic community agree
that SipHash clears its claims as a PRF, while Rurban indicated he has found
undisclosed vulnerabilities that he wishes not to publish.

RE: the table; a point to note is that BLAKE3's main implementation, in Rust,
benefits a lot from multithreading on large files, which SHA2 cannot. The C
version, which Rurban copied, does not leverage multithreading:
[https://github.com/BLAKE3-team/BLAKE3/blob/master/c/README.md](https://github.com/BLAKE3-team/BLAKE3/blob/master/c/README.md)

~~~
temac
Comparing single-threaded speeds is also very valid for tons of contexts. Of
course, if one algorithm can't be multithreaded, maybe it will be better to
use one that can, if that's interesting for a particular application, but it
won't _necessarily_ be interesting for _all_ applications. So there is no
single ordering of which is faster in this case. If you sometimes need to
compute hashes on a computer that has multiple unused cores at that moment,
the multithreaded hash will have an advantage. If you are always loaded, you
had better consider the total amount of work; and maybe likewise if you want
to prioritize energy efficiency (although it will not be completely linear).

------
throwaway838475
I always shake my head when performance is touted for new
functions/constructions. It's more important to minimize the costs of
legitimate use while maximizing the costs of software and hardware attacks,
and minimizing the former inevitably also minimizes the latter for
brute-force (cloud cluster, ASIC, FPGA, etc.) attacks. The fundamental
impetus of scrypt (a PBKDF rather than an unauthenticated hash function) was
a righteous and necessary advance that must never be forgotten with each
fashionable iteration of newness that loses sight of how faster or newer
isn't always automatically better.

