
The first chosen-prefix collision for SHA-1 - ynezz
https://sha-mbles.github.io/
======
nneonneo
So to be clear about what this is (because the website doesn’t quite spell it
out): this attack lets you pick two different prefixes P1 and P2, then
calculates some pseudorandom data C1 and C2 such that SHA1(P1+C1) =
SHA1(P2+C2). The length-extension property of SHA1 (and MD5) means that now
SHA1(P1+C1+X) = SHA1(P2+C2+X) for any X.
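To see why the collision survives a common suffix, here's a toy sketch with a
made-up 8-bit Merkle-Damgard-style hash (nothing like real SHA-1; the
prefixes, the mixing step, and the brute-force search are all invented for
illustration). Once two inputs reach the same internal state, any shared
suffix keeps them colliding:

```python
def toy_hash(data: bytes, state: int = 0xA5) -> int:
    # Toy Merkle-Damgard-style hash: the running state IS the output,
    # which is exactly why a state collision extends under any suffix.
    for b in data:
        state = (((state ^ b) * 167) + 13) & 0xFF
    return state

p1, p2 = b"GOOD", b"EVIL"  # two chosen prefixes of equal length

# Brute-force the "collision blocks" C1, C2: index every 1-byte C1 by
# the state it produces after P1, then find a C2 that reaches one of
# those states after P2. (Trivial at 8 bits; ~2^63 work for SHA-1.)
table = {toy_hash(p1 + bytes([c])): bytes([c]) for c in range(256)}
c2 = next(bytes([c]) for c in range(256)
          if toy_hash(p2 + bytes([c])) in table)
c1 = table[toy_hash(p2 + c2)]

assert toy_hash(p1 + c1) == toy_hash(p2 + c2)          # the collision
x = b" any common suffix"
assert toy_hash(p1 + c1 + x) == toy_hash(p2 + c2 + x)  # survives extension
```

(Real SHA-1 additionally pads with the message length, so the two colliding
messages must have equal total length, as P1+C1 and P2+C2 do here.)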

A similar attack (which requires only a few hours on modest hardware nowadays)
has been known for a long time for MD5, but this is the first time it’s been
demonstrated for SHA-1.

The previous attack, called Shattered
([https://shattered.io](https://shattered.io)), was a regular
(identical-prefix) collision: they chose a single prefix P and found
different C1, C2 such that SHA1(P+C1) = SHA1(P+C2). This can also be
length-extended, so that SHA1(P+C1+X) = SHA1(P+C2+X). However, that attack
is more limited because there is little to no control over the pseudorandom
C1 and C2 (the only differing parts of the messages).

With a chosen prefix collision, though, things are way worse. Now you can
create two documents that are arbitrarily different, pad them to the same
length, and tack on some extra blocks to make them collide.

Luckily, the first collision should have already warned people to get off of
SHA1. It’s no longer safe to use for many applications. (Note, generally for
basic integrity operations it might be OK since there’s no preimage attack,
but I’d still be a bit wary myself).

~~~
dickjocke
Can you give a specific example of the danger here? I understand the principle
behind the attack (kinda).

I just don't understand what danger being able to pad two documents to make
them collide poses?

edit: My guess is that it can be abused to make something that I believe to be
library X actually be library Y when I download it from the internet. Lets say
I want to download something, and I check the signature provided. Assuming the
attacker is able to send me the wrong library via a MITM attack, how can this
prefix collision work? It seems that the original library AND the original
signature on the library's website have not been altered, so their efforts to
use this and make them match are impossible. And it seems like if they can
alter the signature on the website and stuff, then all bets are off--why not
just send the malicious library at that point?

~~~
nneonneo
Suppose your system uses SHA-1 hashes for codesigning verification (e.g. to
load a system driver). I create an innocent-looking device driver and convince
a signing authority to sign it. However, secretly I've created a malicious
driver (e.g. a rootkit) which collides with my innocent one. Now, I can load
the malicious one on your machine - which the signing authority has never seen
- using the signing certificate of the legitimate one.

This might sound far-fetched; after all, you'd need to convince a signing
authority to sign the code. But this is pretty much exactly how Apple's
Gatekeeper verification works: your software is submitted to them, and they do
some security checks and notarize your bundle
([https://developer.apple.com/developer-
id/](https://developer.apple.com/developer-id/)), and I'm sure there's many
more such examples out there.

~~~
dickjocke
Hey OP,

I'm not much of a math guy; I'm getting hung up on this part:

SHA1(P1+C1+X) = SHA1(P2+C2+X) for any X.

The example above seems like SHA1(GOOD_DRIVER) == SHA1(BAD_DRIVER+C2+X)
somehow.

How do the C1 and X get appended to the good driver?

~~~
nneonneo
Most executable file formats (including drivers) put the code first followed
by the data. So you could construct your drivers thusly:

GOOD_DRIVER = P1 (good code and some data) + C1 (data) + X (more data)

BAD_DRIVER = P2 (bad code and some data) + C2 (data) + X (more data)

You'd disguise the random-looking block of C1 data in the middle of the good
driver as e.g. a cryptographic key to avoid suspicion. The "more data" part
couldn't be modified in the bad driver, but since you can arbitrarily modify
P2 this wouldn't be a severe restriction.

~~~
dickjocke
Thank you OP and everyone. I have that shaky initial understanding, but it's
making much more sense now.

------
bjornsing
> We note that classical collisions and chosen-prefix collisions do not
> threaten all usages of SHA-1. In particular, HMAC-SHA-1 seems relatively
> safe, and preimage resistance (aka ability to invert the hash function) of
> SHA-1 remains unbroken as of today.

Nice to see this bit of intellectual honesty. Would be even nicer if they had
explained what that means in terms of PGP keys.

~~~
_notreallyme_
It means if someone you want to impersonate uses the Web Of Trust, i.e. their
key is signed by other people whose keys have been signed the same way, you
can generate a GPG key for which all of these signatures are still valid.

For example, if an attacker gains access to a victim email account, they could
send to their contacts a "trusted" key (as explained above) and then use it to
send signed documents to the victim's contacts.

This would defeat an adversary "paranoid" enough to check a key signature,
but not paranoid enough to obtain a clear explanation/confirmation of why the
key changed...

~~~
bjornsing
> It means if someone you want to impersonate uses the Web Of Trust, i.e.
> their key is signed by other people whose keys have been signed the same
> way, you can generate a GPG key for which all of these signatures are still
> valid.

No...

> For example, if an attacker gains access to a victim email account, they
> could send to their contacts a "trusted" key (as explained above) and then
> use it to send signed documents to the victim's contacts.

Ok... But in this scenario the attacker has the victim’s new private key, so
they don’t need to create a collision (using OP). They can just use the new
private key to sign the documents. Right?

~~~
_notreallyme_
> No...

Why ?

> in this scenario the attacker has the victim’s new private key

You don't want to keep your private key in cleartext on your email provider
servers, do you ?

~~~
makomk
This allows you to take two messages and append some data to both of them
which causes the modified versions to have the same SHA-1 hash - but you need
to modify both messages, and in order to use this in an attack you need to set
up a scenario where the SHA-1 hash of one of your modified messages is trusted
for some purpose. Creating a message with the same hash as another, existing
message requires a second-preimage attack which is much harder and not
feasible for any cryptographic hash that's currently in use.

------
tambourine_man
This kind of thing always brings me down a bit. It's not rational, but it
does anyway.

I mean, I truly admire these folks' skills; the math involved is obviously
remarkable.

But I think the feeling is related to not being able to rely on anything in
our field. Hard to justify going to the trouble of encrypting your backup. 10
years from now, it might be as good as plain text.

It's not security only, nothing seems to work in the long term. Imagine an
engineer receiving a call at midnight about his bridge because gravity changed
during daylight saving in a leap year. That's our field.

~~~
WorldMaker
There's an MC Frontalot song called "Secrets from the Future" and the refrain
is "You can't hide secrets from the future." It's something of a useful mantra
to remind oneself that if "the future" is a part of your threat model, yes
your encryption likely isn't enough because on a long enough timescale it is
likely "the future" will crack it.

As with any other security issue, the question is "what is your threat model?"
You can still justify encrypting your backup today if your threat model
includes today's actors, however much you worry about "the future".

> 10 years from now, it might be as good as plain text.

Or 10 years from now it might be the next Linear A tablets to confuse
cryptoarcheologists, unreadable and untranslatable and entirely foreign. If
"the future" is in your threat model, don't forget the other fun forms of
entropy beyond encryption being cracked such as encodings changing, file
formats falling out of service/compatibility, "common knowledge" about slang
or memes lost to the ages leaving things indecipherable, and so on and so
forth. Most of those things are probably unlikely on only a 10 year time
horizon, but you never can tell with "the future".

~~~
CobrastanJorji
The point about archeologists is a good one because it speaks to motive. In
general, we should be very supportive of the efforts of distant historians who
want to understand what humanity used to be like. We should not WANT to hide
secrets from a sufficiently far future. I can't think of any secret that
deserves to be hidden from them for any reason besides perhaps modesty.

~~~
WorldMaker
Relatedly, that is a part of why I love the term "cryptoarcheology" in
general, as a reminder that digital spaces will need archeologists too.

There's its somewhat shortened form "cryptarch", generally used more as a
title ("cryptoarcheologist"), which was used in places in the Quantum Thief
trilogy of books and is most probably already burned into your brain if you
have played any of the Destiny series of videogames (and which I presume was
heavily influenced by Quantum Thief).

~~~
moyix
Hmm, I thought cryptarch was just crypt + arch with the arch part meaning
“leader” (i.e. this sense
[https://www.etymonline.com/word/arch-](https://www.etymonline.com/word/arch-)),
not archaeologist. Is there something about this in the Quantum Thief trilogy
I’ve forgotten?

~~~
WorldMaker
It's been a bit since I read it, but I recall the meaning overlap/double
meaning between leader -arch (such as plutarch) and arch- from archaeo- was
directly intentional word play in the book trilogy, and yes that leader -arch
meaning does add spice to the neologism.

(I don't think Destiny does much with the playful dual meaning, though.
Certainly the cryptarchs in Destiny have never yet been meaningful leaders.)

------
newscracker
General questions:

(edit: these are indeed general questions, not just about SHA1)

Has anyone else been worried about data deduplication done by storage and/or
backup systems, considering that they usually use hashes to detect data blocks
that are "the same" (without additional metadata) and avoid storing those
"duplicate data blocks" again? Doesn't this seem far worse when you also
consider that systems like Dropbox deduplicate data across all their users
(expanding the footprint for collisions)? Are there any research
papers/articles/investigations about this?

~~~
MichaelMoser123
are there any systems that do sha-1 for dedup? I am only aware of sha-256.

~~~
munificent
Git?

~~~
MichaelMoser123
Git computes the SHA-1 checksum over a header that includes the length, plus
the file data; it also checks the length in the header as well as the hash.
Linus says that it would be less practical to find a collision that has the
same length as the original data; I guess at some stage even that might
become possible.
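The scheme above can be checked in a couple of lines; this is git's blob
object format ("blob", the length, a NUL byte, then the content), and the
empty-blob hash below is the well-known one:

```python
import hashlib

def git_blob_hash(data: bytes) -> str:
    # Git hashes "blob <length>\0" + content, so the length is baked
    # into the hash: a colliding replacement must also match the length.
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()

# The well-known hash of git's empty blob:
print(git_blob_hash(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```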

------
jlokier
Just a curiosity, since people are talking about Git still using SHA-1
(despite work on SHA-256 since 2017).

I see that Git doesn't actually use plain SHA-1 any more; it uses "hardened SHA-1":
[https://stackoverflow.com/questions/10434326/hash-
collision-...](https://stackoverflow.com/questions/10434326/hash-collision-in-
git/43355918#43355918)

~~~
FactolSarin
How would an attack on a git repo work? You create a repo with identical
hashes but different content and next time the user clones from scratch they
get your modified version?

~~~
flatiron
yeah my thoughts about git are similar. Look at the two messages they have
as an example:

"Key is part of a collision! It's a trap!" followed by a couple hundred bytes
of binary garbage

and

"Practical SHA-1 chosen-prefix collision!" followed by a different couple
hundred bytes of binary garbage

They have the same sha1sum, but in all practicality it's nonsense, since both
messages are pure trash past the readable prefix. You couldn't have malicious
C code that would have the same hash as non-malicious C code in this example.

~~~
saalweachter
Isn't that like incredibly simple?

Dump your garbage string behind a // or inside an #if 0, restrict the garbage
string character set to characters which will not disturb that, and your
compiler will whistle while it works.
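A sketch of that idea in Python rather than C (the function and the filler
line are invented for illustration): as long as the collision block is
restricted to printable, newline-free characters, it can sit in a comment
without affecting what the code does, while still changing the hash:

```python
import hashlib

benign = b"def add(a, b):\n    return a + b\n"
# Stand-in for the random-looking collision block, restricted to
# printable ASCII so it survives inside a comment.
filler = b"# " + bytes(range(33, 127)) + b"\n"

namespace = {}
exec(compile(benign + filler, "<demo>", "exec"), namespace)
print(namespace["add"](2, 3))                     # 5: behavior unchanged
print(hashlib.sha1(benign + filler).hexdigest())  # but the hash moved
print(hashlib.sha1(benign).hexdigest())
```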

~~~
flatiron
Anyone checking diffs, or working on the file, etc., would notice that. It
wouldn't survive long.

~~~
munk-a
I think active projects would detect this fine - but what if that commit was
pushed to left-pad and everyone ended up pulling it to local because it's a
dependency of a dependency of a dependency in NPM?

Or what if it's a really obscure library for parsing like... pyramidal
jpeg2000s, are the library consumers going to be checking the source? Heck,
most people already don't check download checksums unless their downloader
does it automatically.

------
mehrdadn
> SHA-1 has been broken for 15 years, so there is no good reason to use this
> hash function in modern security software.

Why are cryptographers always exaggerating things and so out of touch with
reality? The first actual collision was like 3 years ago. It's not like the
world has been on fire in the meantime, and it's not like SHA-1 is broken for
every single possible usage even now. And why the nonsense with "no good
reason"? Obviously performance is one significant consideration for the
unbroken use cases. Do they think painting a different reality than the one we
live in somehow makes their case more compelling?

~~~
shilch
Like computer scientists, they think in binary: either it's secure, or it's
not. In reality there's a spectrum where you also have "good enough".

~~~
mbowcutt
"good enough" relies on a threat model. Cryptography researchers work in the
abstract - without a threat model you must consider cases where your attacker
has unlimited resources.

It's good enough for you and me, but research isn't meant to be practical, imo

~~~
Ar-Curunir
What. The first thing any security paper defines is the assumed threat model.
People design all kinds of schemes for different threat models.

The point with assuming conservative threat models for key primitives like
hash functions is that the threat model can change rapidly even within the
same application, and attackers only get stronger. So you err on the side of
caution, and don't rely on luck to keep safe.

------
whatshisface
> _A countermeasure has been implemented in commit edc36f5, included in GnuPG
> version 2.2.18 (released on the 25th of November 2019): SHA-1-based identity
> signatures created after 2019-01-19 are now considered invalid._

Since SHA-1 was always possible to break, and since NSA probably gets access
to big computers and sophisticated techniques before researchers, why doesn't
this invalidate every SHA-1 signature ever made and not just ones from last
year?

~~~
Boulth
Actually it's even worse than that: signature creation time is added by the
signer so it's totally under control of the attacker. IMHO all SHA-1 based
signatures should be ignored.

~~~
NieDzejkob
In this case, signature creation time isn't under control of the attacker. The
attack scenario being considered is that Mallory can convince Alice to sign a
key K1, provided by Mallory, such that it looks like Alice signed K2. The
party creating the signature is honest here.

~~~
tialaramex
The problem is that the dates are part of the document being signed.

Rather than keys, Alice will typically sign a document (e.g. an X.509
to-be-signed certificate).

Mallory creates two documents: the legitimate-seeming document A (a
to-be-signed certificate Alice willingly signs) and document B (the content
of which
is controlled by Mallory to an extent depending on the details of the
collision). In document A I'm sure Alice will insist on the date being roughly
correct so you'd detect that. But Alice never sees document B, she isn't aware
it exists, so it can specify any date, including one chosen not to set off
alarms.

For the Web PKI we were triply safe because:

1\. We told Alice (the public CAs) never to sign anything at all with the
dangerous algorithm after a set date. So long as Mallory wasn't able to
develop and use a collision before that date and Alice did as she was told‡
this would be safe in perpetuity.

2\. We already had a countermeasure in the documents: very early in each Web
PKI X.509 certificate is the Serial Number. If you look at yours you'll
notice it's a crazy huge number, and seemingly not "serial" in any sense.
It's random. Can't do a chosen-prefix collision attack if you can't choose
the prefix.

3\. Since no more new documents were being signed, clients in the Web PKI
were able to stop recognising these signatures within about 18 months, thus
permanently ensuring the attack was impossible.

‡ A very small number of exceptions were explicitly granted, and a similarly
small number of exceptional cases occurred for which no permission was asked.
All investigated to everybody's satisfaction. As you may see if you poke
around in the demo documents from this article, a collision document may not
jump out as problematic from a crowd but it certainly isn't so innocuous as to
survive careful scrutiny, and with such a small number of exceptions to look
at this scrutiny was possible in a way it never would be for the wider Web
PKI.
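Point 2 can be sketched in a few lines (the function name is made up; the
64-bit minimum comes from the CA/Browser Forum baseline requirements, as I
understand them):

```python
import secrets

def issue_serial() -> int:
    # At least 64 bits of CSPRNG output, placed near the start of the
    # to-be-signed certificate. A chosen-prefix attacker must commit to
    # both prefixes before running the collision search, but cannot
    # predict this value, so the "chosen prefix" can never match.
    return secrets.randbits(64)

print(hex(issue_serial()))
```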

------
ebg13
Quick question about the "What should I do" section. It says " _use instead
SHA-256_ ". Isn't SHA-512 both better and faster on modern hardware?

~~~
CiPHPerCoder
SHA-256 and SHA-512 are both in the same family (SHA-2).

Latacora says to use SHA-2. If you can get away with it, SHA-512/256 instead
of SHA-256. But they're all SHA-2 family hash functions.

[https://latacora.micro.blog/2018/04/03/cryptographic-
right-a...](https://latacora.micro.blog/2018/04/03/cryptographic-right-
answers.html#hashing-algorithm)

No need to bikeshed this. But if you must: SHA-512/256 > SHA-384 > SHA-512 =
SHA-256

If you're wondering, "Why is SHA-384 better than SHA-512 and SHA-256?" the
answer is the same reason why SHA-512/256 is the most preferred option:
[https://blog.skullsecurity.org/2012/everything-you-need-
to-k...](https://blog.skullsecurity.org/2012/everything-you-need-to-know-
about-hash-length-extension-attacks)

Additionally, the Intel SHA extensions target SHA1 and SHA-256 (but not
SHA-512), which makes SHA-256 faster than SHA-512 on newer processors.

Isn't crypto fun?

~~~
colanderman
I'm super confused. Are SHA-256 and SHA256 _different_ , and if so, why in the
world would this be considered a sane naming scheme?

If not, I completely do not understand the inequation you wrote, which
seemingly lists SHA-256 (and -512) multiple times.

~~~
timdumol
You're probably confused by "SHA-512/256", which does not mean SHA-512 or 256,
but rather a truncated version of SHA-512:
[https://en.wikipedia.org/wiki/SHA-2](https://en.wikipedia.org/wiki/SHA-2) in
the third paragraph.
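You can check the distinction in Python (guarded, because SHA-512/256 support
depends on the OpenSSL build behind hashlib): SHA-512/256 uses different
initial values than SHA-512, so it is not simply the first half of a SHA-512
digest:

```python
import hashlib

msg = b"abc"
full = hashlib.sha512(msg).hexdigest()

if "sha512_256" in hashlib.algorithms_available:
    trunc = hashlib.new("sha512_256", msg).hexdigest()
    # Different IVs: SHA-512/256 is not just SHA-512 cut in half.
    assert len(trunc) == 64
    assert trunc != full[:64]
    print(trunc)
```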

~~~
Ajedi32
So why would a truncated version of SHA-512 be better than SHA-512? And why is
SHA-512 = SHA-256?

~~~
CiPHPerCoder
Truncated hash functions are not vulnerable to length-extension attacks.

Length-extension attacks are relevant when you design a MAC by passing a
secret and then a message to a hash function, where only the message is known.

Truncating the hash (which is what SHA-512/256 and SHA-384 do to SHA-512)
removes the ability to grab an existing hash H(k || m) (where k is unknown and
m might be known) and append junk because a truncated hash does not contain
sufficient information to recover the full state of the hash function in order
to append new blocks.
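A minimal contrast between the vulnerable and safe constructions (the key and
message here are made up):

```python
import hashlib
import hmac

key, msg = b"secret-key", b"amount=100&to=alice"

# Vulnerable pattern: the full output of a Merkle-Damgard hash over
# key || message reveals the hash's internal state, so an attacker can
# keep absorbing blocks and forge a tag for key || message || junk.
naive_mac = hashlib.sha256(key + msg).hexdigest()

# Safe patterns: HMAC, or a truncated hash (SHA-384, SHA-512/256) whose
# output doesn't expose the full internal state.
safe_mac = hmac.new(key, msg, hashlib.sha256).hexdigest()
truncated_mac = hashlib.sha384(key + msg).hexdigest()
```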

~~~
p1mrx
Why do SHA-512/160 and SHA-512/128 not exist? They could be useful as drop-in
replacements for SHA1 and MD5.

~~~
SAI_Peregrinus
Because 224 bits is considered the minimum safe output length for a
general-purpose hash function. So they'd be drop-in replacements but still
wouldn't be safe. Safer than MD5/SHA-1, but not actually safe.

So rather than provide a footgun that would let people put off making things
actually safe, NIST just didn't do that.

~~~
p1mrx
> 224 bits is considered the minimum safe output length for a general purpose
> hash function.

Considered by whom?

~~~
CiPHPerCoder
Truncating a hash function to 224 bits puts it at the 112-bit security
level, which is roughly equivalent to 2048-bit RSA under today's
understanding of the costs of distributed cracking attacks.

There are a lot of standards organizations all over the world with various
recommendations. [https://www.keylength.com](https://www.keylength.com)
collates quite a few of them. Pick the one most closely relevant for your
jurisdiction.

Most of them recommend 2048-bit RSA as their minimum for asymmetric security,
and AES-128 / SHA-256 as their minimum for symmetric security. This is a [112,
128]-bit security lower bound.

Truncating a hash to 160 bits yields 80-bit security, which is insufficient.
128 bits (64-bit security) is out of the question.
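The arithmetic behind those numbers, as a sketch (the generic rule of thumb
for an ideal hash, ignoring structure-specific attacks):

```python
def security_levels(output_bits: int) -> dict:
    # Generic bounds for an ideal n-bit hash: birthday collisions in
    # ~2^(n/2) work, preimages in ~2^n work.
    return {"collision": output_bits // 2, "preimage": output_bits}

for n in (128, 160, 224, 256):
    print(n, security_levels(n))
# 224-bit output -> 112-bit collision resistance, the common lower bound.
```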

~~~
p1mrx
"Cryptographic hash functions with output size of n bits usually have a
collision resistance security level n/2 and preimage resistance level n."

Depending on what you're doing, "SHA-512/128" could have a 128-bit security
level. But I guess it's safer to assume n/2 when making a general
recommendation.

------
kibwen
Out of curiosity, can anyone explain in layman's terms the differences in
design that make SHA-1's successors immune to the known attacks against SHA-1?
Ultimately was this the result of an apparent flaw in SHA-1 that only became
obvious in retrospect, or was it something totally unforeseeable?

~~~
Zaak
SHA-2 is based on similar techniques to those in SHA-1, which prompted the
SHA-3 competition when weaknesses in SHA-1 were first discovered (as they
could conceivably have been present in SHA-2 as well). As it turns out, SHA-2
appears to be resistant to the attacks found thus far.

SHA-3 (originally named Keccak) is built on an entirely different foundation
(called a sponge function), so it is unlikely that any attack against SHA-1
will be relevant to SHA-3. However, sponge functions are a relatively new
idea, and weaknesses in the basic principles could conceivably be found in the
future, as could weaknesses in the Keccak algorithm specifically.

------
0x0
Q: Does this make it even more urgent for git to move to a different hash?

~~~
EGreg
It may, because now an attacker can replace code with arbitrary other valid
code as long as developers are willing to ignore the long weird random comment
at the end ;-)

I'm gonna say many developers will not care, and many compilers will not
care either.

So yeah, Linus’ main deterrent reason (code won’t compile) doesn’t apply
anymore.

 _HOWEVER!_

1\. A chosen-prefix attack still needs to compute TWO suffixes m1 and m2 so
that _h(a1+m1) = h(a2+m2)_. This does NOT mean that given _a1_ and _a2_ you
can find a single _m2_ so that _h(a1) = h(a2+m2)_. So ONLY THE ORIGINAL
AUTHOR OF THE COMMIT could spoof their own commit, by preparing in advance
and attaching a long and weird comment at the end. And you could build tools
to watch out for such commits in the first place.

2\. If git had used HMAC based on SHA1 then it would have been fine, even
after this attack has become feasible.

3\. Furthermore, it is likely still kinda fine because Merkle Trees have nodes
referencing previous nodes. You’d have to spoof every historical node as well,
to push malicious code. BitTorrent also requires computers to supply an entire
merkle branch when serving file chunks.

Maybe someone can elaborate on this.
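Point 3 can be sketched like this (a simplified chain, not git's actual
object format): because each commit hash covers its parent's hash, silently
swapping an ancestor changes every descendant's hash unless you can collide
each link in the chain.

```python
import hashlib

def commit_id(parent_hex: str, payload: bytes) -> str:
    # Simplified: a real git commit object hashes the tree id, parent
    # ids, and metadata, but the chaining principle is the same.
    return hashlib.sha1(parent_hex.encode() + payload).hexdigest()

root = commit_id("", b"initial import")
head = commit_id(root, b"add feature")

evil_root = commit_id("", b"initial import (backdoored)")
assert commit_id(evil_root, b"add feature") != head  # the swap is visible
```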

~~~
rkangel
If you look at this 2017 email from Linus
([https://marc.info/?l=git&m=148787047422954](https://marc.info/?l=git&m=148787047422954)),
he discusses how git also encodes length. That would mean that you need a
collision of the same length _and_ the right functionality, so you can't just
append data.

~~~
toyg
Now that you can arbitrarily produce collisions, the second step is easy
enough for a skilled and well-funded attacker.

------
edwintorok
> security level 2 (defined as 112-bit security) in the latest release (Debian
> Buster); this already prevents dangerous usage of SHA-1

FWIW this doesn't apply to Fedora currently, because it has a patch that re-
enables SHA-1 in security level 2 in non-FIPS mode:
[https://src.fedoraproject.org/rpms/openssl/blob/master/f/ope...](https://src.fedoraproject.org/rpms/openssl/blob/master/f/openssl-1.1.1-seclevel.patch)

------
jVinc
So how would someone go about gaining more than 45k USD in profit from a
single case of using the chosen-prefix collision? Not being facetious here; I
am honestly curious. I'd guess that even in situations where you somehow get
a signed e-mail sent off spoofing a CEO saying "Please pay these guys $50k",
the actual payout seems unlikely, and that puts the attacker 45k in the red.
But maybe there are some obvious avenues of abuse that I'm missing, or is
this more a case of "in a decade it will become economical to abuse this for
profit"?

~~~
racingmars
I'm familiar with an organization that lost about $750,000 because someone
spoofed an email from the CEO to the CFO asking to wire money to an account.
The CFO fell for it. AFAIK, the money was never recovered (nor was the CFO
fired... it was all just chalked up to 'the cost of doing business').

That was with NO crypto/signature spoofing involved... if the CFO has now been
trained to not act on large dollar amount requests from the CEO without at
least checking a digital signature... perhaps the CFO would be _more_ likely
to fall for it now since he has been "trained" that cryptographic signatures
are a sign of authenticity?

~~~
jVinc
$750,000 is a lot of money, but seeing as some people will act on emails
without any signature, and you would effectively have to invest that amount
up front to attempt this attack on ~15 individuals and then hope that at
least one falls for it just to break even, I can't really see this being a
viable attack vector. Maybe if the cost goes down significantly, to the
$100-$1000 range, it might be something you would see in the wild.

------
eerrt
The full paper is
[https://eprint.iacr.org/2020/014.pdf](https://eprint.iacr.org/2020/014.pdf)
if anyone is interested

------
perl4ever
Let's say that you know that someone stores documents by SHA, and silently
overwrites collisions. Is there any way this would help to deceive them after
being forced to give them your data? It seems like once the data is out of
your control, you can't match an existing SHA, and if you created a pair of
documents that match SHAs, you can't predict which one will be overwritten.

------
jwilk
Link to the GnuPG commit:

[https://dev.gnupg.org/rGedc36f59fcfc](https://dev.gnupg.org/rGedc36f59fcfc)

------
LennyWhiteJr
The root certificate authority for my company's Active Directory is signed
using a sha1 hash. What are the practical implications of this chosen
collision?

How do I convince my IT department to update our CA to sha256?

~~~
tatersolid
The signatures on trusted root certs do not matter and are ignored; it’s the
public key you’re trusting.

Many publicly trusted CA certs are self-signed with SHA-1. The keys
themselves then sign using SHA-256.

------
notlukesky
> Responsible Disclosure
>
> We have tried to contact the authors of affected software before announcing
> this attack, but due to limited resources, we could not notify everyone.

Is there a list of affected software out there?

~~~
adminss
"> <script>alert()</script>

------
RcouF1uZ4gsC
Does this affect Git? I believe it uses SHA-1 for commits. Is it possible to
use this attack to add malicious code to a git repository without changing the
hashes for the commits?

~~~
LennyWhiteJr
Potentially. GitHub at least already started collision detection after
Shattered was published.

[https://github.blog/2017-03-20-sha-1-collision-detection-
on-...](https://github.blog/2017-03-20-sha-1-collision-detection-on-github-
com/)

------
femto113
While a meaningful accomplishment, suggesting the algorithm is in a "shambles"
seems hyperbolic to me. For one thing there's a non-trivial practical leap
between formulating two colliding identities and forging an existing one, and
for another this was only modestly better than a pure brute force attack. If
anything I'm somewhat reassured by the idea that it still costs $40,000+ of
GPU time to pull something like this off while doing the same with MD5 is
feasible on a mobile phone.

------
nvartolomei
I assume there was a lot of work (read: money) put into those collision
attacks, rather than them being discovered by accident. I'm wondering who is
sponsoring this work, and for what purpose? The argument about proving that
an algorithm is broken and working on better cryptography wouldn't suffice in
this case, as issues were shown before that. Was the purpose here to make the
attack cheaper?

~~~
tptacek
That is not how cryptographic research works, at all.

------
noja
Is a collision impossible with two hashes, each using a different algorithm?

~~~
femto113
Not impossible, but assuming there's not a mathematical flaw that affects
both algorithms, the difficulty is roughly the product of the difficulty of
finding a collision in each. AFAIK no one has come up with a joint collision
for MD5+SHA1 despite collisions in each being practical for several years.

~~~
noja
So should we do that, as well as keep finding new algorithms?

~~~
femto113
Concatenating two hashes is an algorithm. Mathematically concatenating two 128
bit hashes is not any stronger than a single 256 bit hash (and is likely
weaker), but if it’s all you have (or all you can afford to compute) two weak
hashes is definitely much better than one.
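Concatenation as an algorithm, concretely (a sketch; the point about its
limited strength is Joux's 2004 multicollision result, which shows a
concatenation of Merkle-Damgard hashes is barely stronger against collisions
than its strongest component):

```python
import hashlib

def md5_sha1(data: bytes) -> str:
    # 128 + 160 = 288 output bits, but nowhere near 144-bit collision
    # resistance: multicollisions in one hash can be generated
    # relatively cheaply and then filtered for a simultaneous collision
    # in the other.
    return hashlib.md5(data).hexdigest() + hashlib.sha1(data).hexdigest()

print(md5_sha1(b"hello"))  # 72 hex chars
```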

~~~
noja
Why is concatenating two 128 bit hashes (each with a different algorithm) not
stronger than a single-algorithm 256 bit hash?

~~~
femto113
One reason is that it is theoretically possible to use memory instead of
computation to attack the combined hashes by pre-generating a large number of
collisions under one algorithm and then simply checking those using the other
one, which means you don't need to do both algorithms for every check. Can't
say for sure if that works out cheaper in terms of money but if the memory is
available it could definitely save a lot of time.

Another reason is that for any given hash its theoretical maximum strength
against any attack will be less than or equal to its bit length, but the
practical strength always trends lower over time as attacks are found, and
having two algorithms to attack increases the chances of finding flaws.

------
tinus_hn
Are there any crypto currencies that use SHA-1 for their proof of work?

------
silasdavis
Who paid for this?

~~~
jessant
Judging by the paper[1], I would say any or all of Inria, Nanyang
Technological University, and Temasek Laboratories.

[1]
[https://eprint.iacr.org/2020/014.pdf](https://eprint.iacr.org/2020/014.pdf)

------
umvi
Is "a Shambles" British or something? I've always heard it as "in Shambles"

~~~
cpach
AFAICT ”a shambles” is correct usage. See
[https://brians.wsu.edu/2016/05/24/in-shambles-a-
shambles/](https://brians.wsu.edu/2016/05/24/in-shambles-a-shambles/)

~~~
OGWhales
Neat, but don't know if I can switch because of how it rolls off the tongue.

~~~
umvi
It got easier for me once I learned that "shambles" is a noun that is a
synonym of "slaughterhouse". Once I learned that, I emphasized _shambles_
slightly differently as I rolled the phrase off my tongue.

"SHA-1 is a _slaughterhouse_ "

"SHA-1 is a _shambles_ "

~~~
OGWhales
Yeah I got that from the link, I had no idea before. However, I feel like if
it was "is a shamble" it would make more sense to me than "is a shambles".

------
emilfihlman
> Can I try it out for myself? Since our attack on SHA-1 has practical
> implications, in order to make sure proper countermeasures have been pushed
> we will wait for some time before releasing source code that allows to
> generate SHA-1 chosen-prefix collisions.

Sigh. Again with this idiocy. All instances where the adversary is capable of
launching this attack financially mean they also have the capability to write
the exploit themselves.

~~~
lm28469
> All instances where the adversary is capable of launching this attack
> financially mean they also have the capability to write the exploit
> themselves.

Iran will eventually create a nuclear bomb, so why don't we give them one
now? It's the same thing, isn't it?

------
rustybolt
> By renting a GPU cluster online, the entire chosen-prefix collision attack
> on SHA-1 costed us about 75k USD.

So they just decided to try their attack and spend two years' worth of
salary on it?? That's crazy.

~~~
martpie
You can see their emails at the bottom: those are university and
research-institute domains. You can be sure _they_ did not actually spend
that themselves.

~~~
rustybolt
I'm just amazed that they were willing to take the risk that there was a bug
in the code and they wouldn't find a collision.

~~~
basilgohar
It was a sound premise to investigate. EFF spent much more to demonstrate the
weakness of DES decades ago, and it highlighted the need for stronger crypto
and the fact that nation states would be more than able to break commonly-used
crypto at the time.

