
A SHA-1 chosen-prefix collision attack - hansoolo
https://www.zdnet.com/article/sha-1-collision-attacks-are-now-actually-practical-and-a-looming-danger/
======
arkadiyt
Their attacks are based on previous chosen-prefix work from Marc Stevens, who
tweeted this about the attack [1]:

"Their $100K figure is based on as-of-yet undisclosed improvements. History
shows many claims of low-cost SHA-1 attacks that have not stood up to peer
review. I am very sceptical that their attack costs in total less than the
$110K building block (SHAttered) that they use."

[1]:
[https://twitter.com/realhashbreaker/status/11282604227868549...](https://twitter.com/realhashbreaker/status/1128260422786854913)

~~~
pslam
Marc Stevens quotes $500K, which is very much still a threat (even an order of
magnitude more would be). Plenty of organizations would be willing to spend
that much pocket change on a single attack.

The game-changer is that it's chosen-prefix. A vendor can produce a pair of
entirely different binaries with the same hash, and, most importantly, they
look and behave sanely except for the last few blocks of the file. This is
easily hidden, especially if the binary is encrypted.
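
The hiding trick works because of how Merkle-Damgård hashes like SHA-1 chain
state: these attacks produce a collision in the internal chaining value, so
appending the same trailing data to both files preserves the collision. A rough
sketch in Python (it needs a real colliding pair to run, e.g. the public
SHAttered PDFs):

    import hashlib
    import sys

    def sha1(data: bytes) -> str:
        return hashlib.sha1(data).hexdigest()

    # Load any real SHA-1 colliding pair, e.g. shattered-1.pdf / shattered-2.pdf.
    a = open(sys.argv[1], "rb").read()
    b = open(sys.argv[2], "rb").read()
    assert a != b and len(a) == len(b)

    print(sha1(a) == sha1(b))  # True for a colliding pair

    # Because the collision is in the internal chaining state, any common
    # suffix keeps the hashes equal -- which is what lets the "interesting"
    # differences hide in the last few blocks of an otherwise sane file.
    trailer = b"arbitrary data appended to both files"
    print(sha1(a + trailer) == sha1(b + trailer))  # still True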

It's not a stretch of the imagination to see how, for example, an IP camera
vendor could do exactly this. Yes, it requires a nefarious/complicit vendor,
or an insider who can pull this off undetected (not everyone has a fully
automated build/release pipeline).

So it changes the threat model. SHAttered was waved off by many because the
threat model didn't convincingly apply to them. Example: git. That analysis
needs to be repeated.

(All this assumes the attack described in the paper is correct and practical
in a real-world implementation.)

~~~
bin0
Git isn't really designed for cryptographic security, is it? I have heard that
Linus wants it mostly secure so people can verify the integrity of the Linux
source code, but it's not its core competency, so to speak. Though I suppose a
project the size of the Linux kernel could be a serious target for a collision
attack.

Regardless, it's switching to SHA-256:
[https://stackoverflow.com/questions/28159071/why-doesnt-git-...](https://stackoverflow.com/questions/28159071/why-doesnt-git-use-more-modern-sha/47838703#47838703)

~~~
gnomewascool
Git signatures _are_ designed for cryptographic security, and they (currently)
rely on the SHA-1 hashes.

When you're signing a commit (or a tag), you're just signing the commit (or
tag) message, which includes the SHA-1 hash of the relevant "root" tree object
(or commit object). The tree object, in turn, is (in effect) a list of the
SHA-1 hashes of the files directly in that directory, along with their names
and permissions (each file's size is folded into its blob hash), plus the
hashes of the tree objects corresponding to any subdirectories.

Consequently, if you replaced a file with another having the same SHA-1 hash
(and the same file size — a considerable complication), all the hashes would
remain the same and the signature would still be valid.
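
To make that concrete, here's a rough sketch of git's object hashing in Python
(the helper below is illustrative, not git's own code):

    import hashlib

    def git_object_id(obj_type: str, body: bytes) -> str:
        # Every git object is hashed as "<type> <size>\0" followed by its body,
        # so a blob's id covers both its size and its contents.
        header = f"{obj_type} {len(body)}".encode() + b"\x00"
        return hashlib.sha1(header + body).hexdigest()

    print(git_object_id("blob", b"hello world\n"))

    # A tree's body lists (mode, name, id) entries, a commit's body names the
    # root tree id, and a signature signs the commit/tag text containing that
    # id. Swap in a same-size colliding blob and every id above it -- and the
    # signature -- stays valid.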

Obviously, once git transitions to SHA-256, the problem will disappear.

~~~
pslam
An easy example with git would be to create a pair of read-only repositories:
one public-facing which is cloned by the general public, and one with
(potentially entirely) different contents which can be selectively pulled
depending on the client.

There's a complication with the few appended trailing blocks being invalid
data, but the format might allow it, and git doesn't verify its integrity
recursively.

~~~
avar
> git doesn't verify its integrity recursively

Yes it does. On clone the entire history is recursively hashed, and
incrementally on fetch.

There isn't even any place in the protocol to transfer pre-hashed content, it
must be hashed to make it addressable.

------
anderskaseorg
“Everyone should switch to (in order of preference): • BLAKE2b / BLAKE2s •
SHA-512/256 • …”

You know, SHA-512/256 was a terrible name. For someone who’s not a
cryptographer, it’s way too easy to confuse the single algorithm SHA-512/256,
which resists length extension attacks, with the pair of algorithms SHA-512 /
SHA-256, which do not.

~~~
foobiekr
"truncated sha512" would be a much better way to talk about it.

~~~
wiml
It isn't exactly truncated SHA-512 — that is, you can't compute a SHA-512/256
hash by computing a SHA-512 hash and then truncating it. The algorithm is
otherwise the same as doing that, but the initial state of the hash context is
different.

(The same is true for SHA-224, which could be called SHA-256/224; it's a
truncated SHA-256, but with a different internal state.)
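
You can see the difference directly in Python, assuming your OpenSSL build
exposes sha512_256 (it isn't in hashlib.algorithms_guaranteed):

    import hashlib

    msg = b"abc"

    # Plain SHA-512, truncated to the first 256 bits.
    truncated = hashlib.sha512(msg).digest()[:32]

    # Actual SHA-512/256, which starts from different initialization vectors.
    real = hashlib.new("sha512_256", msg).digest()

    print(truncated.hex())
    print(real.hex())
    print(truncated == real)  # False: same compression function, different IVs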

~~~
foobiekr
True, that’s a good point.

------
evv
Can collision attacks produce data of the same size? I suspect it would be
dramatically more difficult to produce a collision of equal data-size to the
original.

So perhaps, the easiest way to defend against a collision attack is to
transmit the size of the data alongside the checksum. Like a checksum, it is
extremely lightweight and easy to check.

In my data framework project (where I use sha1 to identify blocks), I've been
looking for an excuse to do this, because knowing the size of a block is
incredibly useful for applying performance heuristics. I suspect it is also a
significant defense against collision attacks. Am I wrong?

~~~
chias
I included this in a response to a child comment, but I'm reproducing it here
to make sure you see it:

If your output includes the size of the input, then what you've got is no
longer a hash function. A hash function is defined as taking an arbitrary
input and returning an n-bit output for some fixed value of n. By including
the size of the input in your output, the size of your output grows
logarithmically with the size of your input. This may seem pedantic, but
fixed-size-output and arbitrary-size-input is extremely important for general
usage of a hash function.

~~~
justincormack
SHA-1 and SHA-256 already hash the message length into the digest (it's
appended in the padding), and input is limited to 2^64 bits, so they do not
support truly arbitrary input sizes.

------
zdw
From the paper, this doesn't seem to be able to create a collision while
retaining the same length of input data.

It seems that checking both the hash and input length would be a very cheap
way of identifying attempts at hash collisions.

~~~
chrisseaton
If you're going to modify your code to add an extra check you might as well
just switch to another algorithm entirely.

~~~
Someone1234
A lot of existing schemes already check both.

The above comment didn't mention modification explicitly and could be read as
talking about the level of threat this poses.

------
mzs
paper:
[https://eprint.iacr.org/2019/459.pdf](https://eprint.iacr.org/2019/459.pdf)

------
wolf550e
previous discussion:
[https://news.ycombinator.com/item?id=19878917](https://news.ycombinator.com/item?id=19878917)

------
macawfish
Last night I was pondering future "compression" schemes that rely on
hyper-powerful quantum computers that can resolve hash collisions very, very
quickly: rather than literally compressing the data, you'd just share a hash,
and this hyper-powerful computer would enumerate possible strings of data that
give that hash, then check them against some secondary condition (another
hash?) to find the right one.

I don't know how feasible this actually is, but it made for an interesting
sci-fi thought.

 _edit:_ Thanks for all of the enlightening replies!

~~~
paulddraper
The problem is that a _huge_ number of strings map to each hash.

Each SHA-256 hash has, on average, 2^(8 * 1024 - 256) = 2^7936 ≈ 10^2389 possible 1KB preimages.

So... perhaps it could be done, but the quantum computer would have to be
programmed with very good selection criteria.
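
For reference, the arithmetic behind that figure:

    from math import log10

    input_bits = 8 * 1024   # one 1 KB input
    digest_bits = 256       # SHA-256 output

    # On average each digest has 2^(8192 - 256) = 2^7936 distinct 1 KB
    # preimages, which is roughly 10^2389 in decimal.
    exponent = input_bits - digest_bits
    print(exponent)                    # 7936
    print(round(exponent * log10(2)))  # 2389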

~~~
stouset
No, there is no way this could be done, because there is no way to know which
of multiple colliding inputs was the right one. Imagine a one-bit hash
function. You start your “decompression” process and read in a 0. What input
produced that bit? Literally fifty percent of _all possible strings_ would
produce that same output. Without more information, you cannot choose between
them. And the information needed to choose correctly is exactly the same
amount of information in the original input.

[https://en.m.wikipedia.org/wiki/Pigeonhole_principle](https://en.m.wikipedia.org/wiki/Pigeonhole_principle)
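
A toy version of that one-bit hash makes the counting concrete (the parity
function here is an arbitrary stand-in):

    from itertools import product

    # A toy one-bit "hash": the parity of the input bits. Over all 3-bit
    # inputs, exactly half map to 0, so the output alone can never tell you
    # which input produced it.
    def one_bit_hash(bits) -> int:
        return sum(bits) & 1

    inputs = list(product([0, 1], repeat=3))
    zeros = [i for i in inputs if one_bit_hash(i) == 0]
    print(len(zeros), "of", len(inputs), "inputs hash to 0")  # 4 of 8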

~~~
greiskul
Yup, the pigeonhole principle really defines a hard limit on compression, in
that it is impossible to compress (make smaller) _ALL_ files.

The way compression algorithms get around that is by exploiting the fact that
we rarely want to send around arbitrary data: real-world data has lots of
redundancy in it. So we can design algorithms where real-world data gets
mapped to compressed files smaller than the originals (by finding the
redundancies in the data), while purely random data gets mapped to files that
are actually slightly longer than the input.
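
You can see both halves of that with any general-purpose compressor:

    import os
    import zlib

    redundant = b"the quick brown fox jumps over the lazy dog " * 200
    random_data = os.urandom(len(redundant))  # stand-in for incompressible input

    print(len(redundant), len(zlib.compress(redundant)))      # shrinks a lot
    print(len(random_data), len(zlib.compress(random_data)))  # slightly larger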

~~~
caf
Often rendered as the pithy phrase: _"A compression algorithm is actually an
expansion algorithm with some interesting failure cases."_

------
ChrisSD
It must be about time for SHA-4.

~~~
CiPHPerCoder
Nope. SHA-2 (known to developers as SHA224, SHA256, SHA384, and SHA512) was
the replacement for SHA-1. SHA-3 was created as an insurance policy in case
the SHA-2 family was broken too. So far, it hasn't been.

We won't need a SHA-4 any time soon. SHA-2 is fine, BLAKE2 is fine (and
faster), SHA-3 is fine.

~~~
optimiz3
Also, SHA-2 underpins Bitcoin; it's the ultimate billion dollar pot of gold.
SHA-2 is perhaps the most exhaustively researched (both publicly and
privately) cryptographic hash because of this.

~~~
mehrdadn
Do you know why it's said to be last-resort in the article?

~~~
CiPHPerCoder
Catalin was quoting me in the article, so it's only fair that I elaborate
here.

There are four _common_ flavors of the SHA2 family you're likely to run into:

    
    
      - SHA-224
      - SHA-256
      - SHA-384
      - SHA-512
    

And then there are two more variants of "truncated SHA-512" (except they also
use different initialization vectors than SHA-512, which is kind of an
important detail)

    
    
      - SHA-512/256
      - SHA-512/224
    

These latter two don't have _nearly_ the cross-platform support of the first
four.

For example, in PHP, hash('sha256', 'some text') worked since PHP 5.1.2
(without PECL), but hash('sha512/256', 'some text') didn't work until PHP
7.1.0 (which is only a few years old).

See for yourself: [https://3v4l.org/A1dZc](https://3v4l.org/A1dZc)

As a typical software developer, you might see SHA-{magic/numbers} and
probably discover from Google/StackOverflow/etc. that they're in the SHA2
family and that the SHA2 family is secure, and then reason that whatever
you're doing must also be secure.

But there's a problem.

SHA-256 and SHA-512 (not the truncated varieties) are known to be vulnerable
to length extension attacks. This is only a problem if you're using these hash
functions _in a vulnerable way_. (Which isn't as uncommon as you'd think in
homebrew crypto.)

If you're using HMAC, length extension attacks are a moot point. There's a
reason 'tptacek always recommends HMAC for symmetric authentication.
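
Sketched in Python, the vulnerable pattern versus the safe one (the key and
message below are placeholders):

    import hashlib
    import hmac

    key = b"server-side secret"          # placeholder
    msg = b"user=alice&admin=false"      # placeholder

    # Vulnerable pattern: tag = SHA-256(key || message). Anyone who knows the
    # tag and the length of key || message can append data and compute a valid
    # tag for the extended message without knowing the key (length extension).
    naive_tag = hashlib.sha256(key + msg).hexdigest()

    # Safe pattern: HMAC's nested construction hides the inner hash state, so
    # length extension doesn't apply.
    hmac_tag = hmac.new(key, msg, hashlib.sha256).hexdigest()

    print(naive_tag)
    print(hmac_tag)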

SHA-224, SHA-384, SHA-512/224, and SHA-512/256 are not vulnerable to length
extension attacks.

BLAKE2 is not vulnerable to length-extension attacks. It's also at least as
secure as SHA2, but faster than MD5. SHA3 is at least as secure as BLAKE2, but
is significantly slower in software.

Thus, the recommendations are:

    
    
      - BLAKE2
        for speed and security
      
      - SHA-512/256
        if you want speed, length-extension attack resistance, and arbitrary standards compliance
      
      - SHA3-256
        if you care more about security and what the FIPS authors think than speed
      
      - SHA-384 
        if you're concerned about backwards compatibility but don't want to accidentally cause
        junior developers that poorly mimic your designs to introduce LEAs into their code
      
      - Any other SHA-2 family hash function 
        if you're not interested in all of this nuance and want something
        guaranteed to be secure for the next few years that is widely implemented
    

Of course, there should be a huge asterisk with this list: If you're not a
crypto expert, _you shouldn't be making this decision_. Just stop using SHA1.
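
For what it's worth, in Python's hashlib those options look like this
(SHA-512/256 again assumes your OpenSSL build exposes it):

    import hashlib

    msg = b"attack at dawn"

    print(hashlib.blake2b(msg, digest_size=32).hexdigest())  # BLAKE2b-256
    print(hashlib.new("sha512_256", msg).hexdigest())        # SHA-512/256
    print(hashlib.sha3_256(msg).hexdigest())                 # SHA3-256
    print(hashlib.sha384(msg).hexdigest())                    # SHA-384
    print(hashlib.sha256(msg).hexdigest())                    # plain SHA-256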

It's also worth noting that Marc Stevens-- hash function breaker
extraordinaire-- disagrees with my recommendation because BLAKE2 isn't a
{NIST,FIPS,ISO,whateverStandardsBodyYouTrust}-approved hash function, and for
better or worse, believes strongly in reinforcing public trust in standards
organizations.
[https://twitter.com/realhashbreaker/status/11283816001468948...](https://twitter.com/realhashbreaker/status/1128381600146894848)

He has a point _in general_ , but in this specific case, I think BLAKE2 is
going to become the de jure SHA2 successor at least until SHA3 hardware
acceleration becomes ubiquitous.

Also, standards bodies have a nasty habit of digging in their heels on their
mistakes, instead of issuing new guidance in response to research. See also:
WPA3 and Dragonfly vs SPAKE2-EE or OPAQUE.

For a higher-level example, look at the failures baked into the JOSE standards
(JWT, etc.) versus PASETO.

Fun fact: If you take JOSE and replace JSON with CBOR, without fixing any of
the protocol security problems, you get COSE... which reared its ugly head in
W3C's WebAuthn standard. Bad standards never die.

Until we get a standards committee that isn't garbage, I don't entirely agree
with Marc's appeal to faith here.

While NIST et al. certainly do a better job at deciding on primitives than
self-styled post-2010 cypherpunks (y'know, the ones that try to cascade a
bunch of ciphers together in case one is broken but then use CRC32 for mixing
files into the encryption key?), their failure to correct (let alone learn
from) their mistakes and update recommendations in a timely manner is a
problem that can't be dealt with through blind adherence to whatever they
publish.

~~~
avar
> I think BLAKE2 is going to become the de jure SHA2 successor at least until
> SHA3 hardware acceleration becomes ubiquitous.

I think you mean "de facto", unless you think NIST is going to amend SHA-3 at
this point to designate BLAKE over Keccak.

As someone on the sidelines, having read the linked Twitter discussion you had
with Marc Stevens, your summary of it honestly seems a bit disingenuous.

You mention speed prominently, but fail to mention his counterargument that
raw hash speed isn't relevant for most applications.

In practice hashing speed is drowned out by other costs: you're not going to
have "Android/iOS devices" (as you bring up) hashing GBs of data as a common
use-case, and even if you did, the cycles/byte spent on the hash are nothing
compared to everything else.

For applications where hashing speed _does_ matter (e.g. some server needing
to batch-hash things) you have the option of buying hardware-accelerated
SHA-256 to get on the order of 30% faster than BLAKE:
[https://bench.cr.yp.to/results-hash.html](https://bench.cr.yp.to/results-hash.html)

Then as you note downthread your criteria of "at least as secure" only takes
into account "known attacks". The absurd logical conclusion of that criteria
taken at face value is that we'd all be better off if we each used our own
bespoke hash function, since cryptanalysis would never be able to keep up.

Or, in other words, if the algorithm that became SHA-1 hadn't been picked by
NIST in 1995 it would be a viable 160-bit hash function today, since there
would likely be no known attacks against it, as it would have been obscure
enough that nobody would have bothered with it.

So the criteria for a "secure" hash must consider some balance of its
algorithm, as well as (or more importantly) the cumulative amount of attention
cryptographers have spent on it.

~~~
CiPHPerCoder
> The absurd logical conclusion of that criteria taken at face value is that
> we'd all be better off if we each used our own bespoke hash function, since
> cryptanalysis would never be able to keep up.

The reason this "logical conclusion" sounds so absurd is that your reasoning
here is absurd.

What matters isn't "cryptanalysts haven't published an attack", what matters
is "cryptanalysts _tried and failed_ to attack the design".

If you conflate the two, you will end up confused.

~~~
avar
A reductio ad absurdum is always going to be absurd if taken at face value,
but that doesn't mean the underlying point it's making isn't valid.

Obviously I'm not saying that BLAKE hasn't gotten any analysis. It was a SHA-3
finalist, it's been looked at by a lot of smart people.

I am saying it's a continuum, and it's safe to assume SHA-2 has gotten a lot
more eyeballs & man hours by now. Both due to its age, and due to its
widespread production use as a NIST standard.

Is that a trite point? Yes, but one you managed to omit in your long summary
of a Twitter exchange with a SHA-1 researcher, which is why I'm pointing it
out. You're seemingly treating "analyzed" as a boolean property.

~~~
CiPHPerCoder
> Is that a trite point? Yes

The next question I would ask myself in this context is, "Should I have even
raised the trite point in the first place?"

~~~
avar
It's trite because it should go without saying, but it's also one you managed
to selectively omit when summarizing a Twitter discussion. That makes you seem
disingenuous.

It's clear from anyone who reads that Twitter discussion that the thrust of
Marc's point is not that we should be "reinforcing public trust in standards".
To summarize the discussion in those terms amounts to attacking a strawman.

~~~
CiPHPerCoder
> the thrust of Marc's point is not that we should be "reinforcing public
> trust in standards"

It isn't?

[https://twitter.com/realhashbreaker/status/11283305376455639...](https://twitter.com/realhashbreaker/status/1128330537645563906)

[https://twitter.com/realhashbreaker/status/11283322027758632...](https://twitter.com/realhashbreaker/status/1128332202775863296)

[https://twitter.com/realhashbreaker/status/11283784887324385...](https://twitter.com/realhashbreaker/status/1128378488732438528)

"Marc Stevens wants to reinforce the public trust in standards, especially
among non-experts" isn't an uncharitable summary of this conversation.

> That makes you seem disingenuous.

If I'm mistaken about the point he's defending, it's not an act of dishonesty.

Given that he at multiple points agreed with my arguments that _standards
bodies make dumb mistakes_ and yet continued to double down on "we should just
follow standards anyway" without caveat, there aren't many alternative
interpretations that I'm aware of.

If I still seem disingenuous, it might be that you _want_ me to seem that way.
In which case, there's no point in either of us continuing to participate
because it has ceased to be about security and instead has become a discussion
of ego, which I'm frankly uninterested in.

~~~
realhashbreaker
Actually, I did add caveats and do recognize that not all standards are
equally important. In the tweets you link I'm talking about secure standards,
as in standards that are actually considered secure.

But my point was not "just follow standards anyway". This tweet clearly shows
that:
[https://twitter.com/realhashbreaker/status/11283830292552581...](https://twitter.com/realhashbreaker/status/1128383029255258112?s=09)

Rather, my point was that it is better to work through standards as a
community, as opposed to your way, where individuals promote their pet
primitives. Standards serve a role as a focal point: they have the benefit of
being big targets, getting a lot of scrutiny, and making it easier for
non-experts to find out if somebody has found security issues.

~~~
CiPHPerCoder
Okay, then let's work together to get better standards than the ones currently
available.

