[dupe] SHA-1 collisions now cost $45k [pdf] (iacr.org)
128 points by AndrewDucker 41 days ago | hide | past | favorite | 62 comments

Big discussion of the same material from 4 months ago: https://news.ycombinator.com/item?id=21979333

Please don't editorialize titles. This one broke the site guidelines badly.

Cherry-picking the detail you think is most important from an article is editorializing. Threads are sensitive to initial conditions and the title is the dominant initial condition, so this is a big deal. Being the first (or luckiest) user to submit an article doesn't confer any special right to frame it for everyone else. HN readers should make up their own minds about what parts of an article are important.

If you want to say what you think is important about an article, that's fine, but do so in the comments. Then your view will be on a level playing field with everyone else's: https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...

HTML summary of the same content, by the same authors: https://sha-mbles.github.io/

Nice URL

As a reaction, Openssh will deprecate SHA-1 support in a future release: http://www.openssh.com/txt/release-8.3

The test fails on my OpenWRT 19.07.03 router's Dropbear installation.

It also took 2 months (FTA)... So the ever-returning question is: how long does it take when NSA/military/government-level funding is applied?

It would scale linearly, right? Double the amount of computing thrown at it and you'd halve the time on average. Or am I mistaken?

And of course double the compute for half the time = same cost (more or less). I would imagine the NSA has sunk enough into hardware to do this fairly cheaply per-pass, and very quickly (hours not weeks).
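The "double the compute, same cost" point can be sketched with toy numbers (the figures below are illustrative, not the paper's actual GPU counts or prices): for an embarrassingly parallel search, node-hours and therefore dollars stay constant as machines are added; only wall-clock time shrinks.

```python
# Toy rental-cost model for an embarrassingly parallel search.
def rental_cost(total_node_hours_needed: float, nodes: int,
                price_per_node_hour: float) -> float:
    hours = total_node_hours_needed / nodes      # wall-clock time
    return nodes * hours * price_per_node_hour   # total dollars

work = 900 * 24  # say ~900 GPU-days of work, expressed in GPU-hours

print(rental_cost(work, 100, 2.0))  # 100 GPUs: 9 days of wall clock
print(rental_cost(work, 200, 2.0))  # 200 GPUs: 4.5 days, same dollars
```

Both calls print the same total, which is the whole point: parallelism buys speed, not savings.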

Cloud computing makes this true for everybody else too now, at least within a practical range. Now I can easily afford a multi-million dollar distributed compute facility... for a few hours, rented from Amazon.

I'd be surprised if the NSA has a clear purpose for bulk colliding SHA-1. It's a pretty niche thing to want to do even compared to, say, "cracking" DES. For MD5 we know such government agencies made some collisions to exploit various technologies that didn't stop trusting MD5 in a timely fashion, but it wasn't something they did a lot: just one collision here or there as necessary, e.g. https://en.wikipedia.org/wiki/Flame_(malware)

This isn't actually true in practice though. Most of the cloud providers have quotas on accounts and actually won't let you provision that many resources without getting the quotas increased, which you are unlikely to be able to do unless you're actually regularly spending that much money.

I do think it's amazing that we can now script out a million dollar data center, build up, tear down for just an hour of use (and just a fraction of the cost). And that's available to ordinary people (with a little budget obviously). Compare what we had 20 years ago. Mind blown.

In theory [0], this problem is fully parallelizable, so you would be correct. Each mutation and hash calculation is fully independent so it can scale forever. On a national lab sized cluster, you could probably generate a collision in minutes.

[0] Have not read the paper thoroughly enough to determine if this is true for this technique

Depending on the algorithm, you might be able to skip from GPGPU straight to ASIC.

At this scale - I’m not convinced that $5,000 worth of ASICs would have the same performance as $5,000 worth of GeForce cards - especially after factoring in lead-time to design the ASIC.

As for FPGAs - forgive my ignorance - but can they even handle that kind of load? In my head I just see them breaking-down under the sheer thermal load.

> As for FPGAs - forgive my ignorance - but can they even handle that kind of load? In my head I just see them breaking-down under the sheer thermal load.

They worked for Bitcoin before miners switched to ASICs; I imagine they'd be fine in this application as well.

Build an ASIC; at a few megabucks it will beat an HPC cluster many times the cost, even at very old process nodes.

One of the best things to come out of protocol labs is https://multiformats.io/

Really simple mechanisms for things like identifying the hash algorithm and gives you a programmatic way of supporting new hash algorithms without breaking or changing anything that depends on the old.
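The multihash layout is simple enough to sketch: a varint function code, a varint digest length, then the raw digest. A minimal version for sha2-256 (whose multihash code is 0x12; this sketch hard-codes single-byte varints, which only holds for codes and lengths under 128):

```python
import hashlib

# Minimal multihash-style encoding: <code><length><digest>.
# 0x12 is the registered code for sha2-256; larger codes would
# need real varint encoding, omitted here.
def multihash_sha256(data: bytes) -> bytes:
    digest = hashlib.sha256(data).digest()
    return bytes([0x12, len(digest)]) + digest

mh = multihash_sha256(b"hello")
print(mh.hex())  # starts with "1220": code 0x12, length 0x20
```

A reader can then dispatch on the first byte to pick the right hash function, which is what makes upgrading algorithms non-breaking.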

The PHC string format https://github.com/P-H-C/phc-string-format/blob/master/phc-s... (used by stuff like Rust's LibreAuth/BoringAuth) is a less fancy but probably better-suited-to-passwords solution, because it can indicate salt values and additional parameters.
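For the curious, a PHC string looks like `$<id>$v=<version>$<params>$<salt>$<hash>`, and a rough parser is only a few lines (this is a sketch, not a full implementation of the spec: the version and parameter fields are optional in real PHC strings, and salt/hash are unpadded base64):

```python
# Rough parser for a PHC-formatted hash string (sketch only).
def parse_phc(s: str) -> dict:
    parts = s.split("$")[1:]          # leading "$" yields an empty field
    out = {"id": parts[0]}
    for field in parts[1:]:
        if "=" in field:              # "v=19" or "m=65536,t=3,p=4"
            for kv in field.split(","):
                k, v = kv.split("=")
                out[k] = v
        elif "salt" not in out:       # first bare field is the salt
            out["salt"] = field
        else:                         # second bare field is the hash
            out["hash"] = field
    return out

print(parse_phc("$argon2id$v=19$m=65536,t=3,p=4$c29tZXNhbHQ$aGFzaGhhc2g"))
```

The nice property is exactly the one described above: the algorithm id and its cost parameters travel with the hash, so you can migrate users to stronger settings at next login.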

The first example [1] doesn't have the hash-func-type prefix

[1] https://multiformats.io/multihash/#the-multihash-format

Edit: someone already detected this: https://github.com/multiformats/website/pull/62

I do not think that this would work for the hashes that are used in signatures, sadly. In addition, I do not see the point of the digest-length parameter.

man 3 crypt | fgrep '$'

It's been done before. ;)

Not knowing much about binary executable formats, what does this mean for binary executables or libraries? How easy is it to insert e.g. a remote shell or keystroke logger into an executable or library using only a prefix? Or would that require arbitrary in-place edits to a file?

If you control the executable then it's easy. If you can only append then it's harder.

Is git fixed yet?

If you wonder whether Git supports non-sha1 hashes yet, then it's not ready yet. But it is "fixed" in that it includes code to detect sha1 collisions:


Not a crypto expert: how easy/hard is it with this (or other techniques) at the moment to generate a random file which matches a given SHA-1 hash? It can have totally random bits, let's say.

What's been done here is a chosen-prefix collision attack [1] where the attacker can produce two files that have the same hash. What you're asking about is a preimage attack [2] where one of the files is already created and the attacker can't influence it.

The practical attack enabled here is mostly around digital signatures. An attacker could produce documents A and B that both have the same SHA-1. They can then get someone to sign document A (which really is signing the SHA-1 of document A), then use the signature with document B and make it look like they have a document B signed with a valid signature.

As an example, if document A is a regular SSL certificate request, and document B is a "CA certificate", the attacker can trick a real CA into signing a rogue CA into existence, which can then sign its own certificates that will be trusted by every browser. This has already happened with MD5 in 2008 [3].

[1] https://en.wikipedia.org/wiki/Collision_attack

[2] https://en.wikipedia.org/wiki/Preimage_attack

[3] http://www.phreedom.org/research/rogue-ca/

It should be noted that certain modern signature schemes (such as ed25519) are fine with just preimage resistance.

Even MD5 is still strong against that sort of attack

This is not a preimage attack.

Remember, collisions can be reused[1] to create lots of pairs of files with the matching hashes.

[1] https://alf.nu/SHA1
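The reason collisions are reusable is that SHA-1 is a Merkle-Damgard construction: the digest depends only on the internal state reached so far, so two equal-length colliding prefixes leave the hash in the same state, and any common suffix appended to both keeps the digests equal. The state-determines-output mechanism can be seen with incremental hashing (sketch; a real demo would use an actual colliding prefix pair):

```python
import hashlib

prefix = b"A" * 64   # stands in for one of the colliding prefixes
suffix = b"payload that both files would share"

h = hashlib.sha1(prefix)  # internal state after absorbing the prefix
h.update(suffix)          # continue hashing from that state

# Same result as hashing the concatenation in one go: the final digest
# is a pure function of (state after prefix, suffix).
assert h.hexdigest() == hashlib.sha1(prefix + suffix).hexdigest()
print(h.hexdigest())
```

So once you have one colliding prefix pair, every choice of common suffix gives you another pair of colliding files for free.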

How safe is SHA-256 now? Or Is SHA-512 needed in the near future?

Well, there's already a SHA-512.

But none of the SHA family of hashes have ever been recommended for passwords, not because they are weak, but because they are too fast.

For other purposes, the logical successor to SHA-256/512 is SHA-3:


But this is far from the only choice. Hashing algorithms are trendy right now, and there's plenty to choose from.

I am not sure why you mentioned passwords. Collision attacks do not affect the use of hash functions for password hashing.

For collision avoidance, not for cryptographic hashing.

SHA-256 is still secure for plenty of applications, but for a while it's been regarded as a suboptimal choice for password hashing. SHA-512 is probably overkill.

"for a while" suggests that this was ever a recommended choice, which is not the case. As others have pointed out, for password hashing you want a specialized algorithm, not a general purpose cryptographic hash function. This is true regardless of whether the hash function is "compromised;" it just wasn't designed for that application in the first place.

Algorithms designed for password hashing are intentionally both compute and memory intensive, to make guessing slower, whereas general cryptographic hash functions are by contrast intended to be fast, as most applications will want that. The idea is that password hashing algorithms should be fast enough to keep up with a human and no faster; if you already know the password the performance is not a hindrance, but if you have to guess, it makes doing so impractical.
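The deliberately-slow approach can be sketched with the standard library's PBKDF2 (argon2/scrypt are the preferred choices nowadays, but they need OpenSSL support or a third-party package; the iteration count here is the tunable cost knob):

```python
import hashlib
import hmac
import os

def hash_password(password: str, iterations: int = 600_000):
    """Return (salt, digest); iterations is the work-factor knob."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes,
                    iterations: int = 600_000) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return hmac.compare_digest(candidate, digest)  # constant-time compare

salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))  # True
print(verify_password("wrong guess", salt, digest))                   # False
```

Each guess costs the attacker the same 600,000 HMAC invocations it costs you, which is exactly the asymmetry a plain SHA-256 lacks.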

That's a fair point, I didn't want to come off like it was ever a great choice. I definitely would argue that people are more aware now than they used to be, so thanks for clarifying that.

Not only sub-optimal, but unsuitable. Use algorithms specifically designed for password hashing: argon2 / scrypt etc.

SHA256, SHA512 and Blake* algorithms are suitable for secure checksums and HMACs, but not password hashing

I disagree, I would argue that they are optimal for passwords with sufficient entropy (which includes passwords generated by a password manager)

> argon2

It uses BLAKE2 internally

If you have two solutions and the first solution requires humans behave in a certain manner (getting them to use high entropy passwords) and the second does not. The second is more secure.

I do not see how this is relevant to what I said.

The method you propose is less secure, as humans often use low entropy passwords even when you ask them not to. If you are building a system only for humans who use high entropy passwords (are you really willing to bet the farm on that just to save a couple clock cycles?) or only for other machines, it might work, but I also see no benefit to that approach, so you might as well just bcrypt it and call it a day anyway.

I proposed no method. I simply made the statement that typical cryptographic hash functions are optimal (and better than the alternatives) for high entropy passwords. I said nothing regarding low-entropy passwords.

> I also see no benefit to that approach

- fewer primitives

- faster

- lower memory usage

- no concern regarding cycles

Secure checksums should have high performance. Password hashing should have low performance (ie high cost).

Please justify why hashing a high entropy password should have a high cost. I can't see any benefit arising from this. If anything, you lose entropy if you use something like a PBKDF due to cycles.

Because it slows down anyone looking to crack/reverse the passwords? The only thing protecting your high entropy password is the cost of the hash. If you could run infinite attempts in 2 seconds then even your high entropy password would fail.

Anyways, most people don't use high entropy passwords, so there's little point in arguing against this IMHO.

> Because it slows down anyone looking to crack/reverse the passwords?

Good luck brute-forcing through 2^256 passwords. The speed of the hash function should not matter.

If you still want a slow hash function though then just use more rounds.

> The only thing protecting your high entropy password is the cost of the hash

No, not really. It is the fact that the password is high entropy, combined with the preimage resistance of the hash.

> If you could run infinite attempts in 2 seconds then even your high entropy password would fail.

So would your PBKDF.

Safe enough, but why not use a modern hash function instead?

What are some of the modern hashing alternative for uniqueness? Mainly for speed, bit distribution, and collision risk while having small hash size. Not for cryptographic purpose.

Depends on what you want to do. For hash tables and macs there is siphash. There is also Chaskey (although this is optimised for embedded systems I think).

Also in the super fast, but not designed to be cryptographically secure category:

xxhash (https://github.com/Cyan4973/xxHash) with 32/64 bit output. The latest version, xxh3, supports up to 128 bit output.

meow hash (https://github.com/cmuratori/meow_hash)

The recently released Blake3 which is designed to be cryptographically secure is very fast also (https://github.com/BLAKE3-team/BLAKE3)
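Of the options listed above, only BLAKE2 ships in Python's standard library (xxhash, meow hash, and BLAKE3 all need third-party packages); its `digest_size` parameter makes it convenient for short checksums:

```python
import hashlib

# BLAKE2b with a caller-chosen output size: here a 128-bit checksum.
h = hashlib.blake2b(b"some file contents", digest_size=16)
print(h.hexdigest())  # 32 hex characters
```

It also takes an optional `key=` argument, which turns it into a keyed MAC without needing a separate HMAC construction.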

I understand not wanting to use SHA-1 now for security reasons, but is it still an OK practice to use it as a general hashing function for a uuid/data checksum?

> is it still an OK practice to use it as a general hashing function for a uuid/data checksum?

No. If you don't care about collision resistance, use MD5. It's faster, it's smaller, and it makes it obvious to everyone that your software isn't supposed to rely on collision resistance.

No. MD5 is a cryptographic hash function. For the purposes stated one uses a non-cryptographic hash function, such as seahash. The difference is the latter is much faster but does not provide protection against an intentional collision.

1: MD5 still provides preimage resistance (both first and second), which is sometimes useful.

2, and my real objection:

  $ md5sum /dev/null
  d41d8cd98f00b204e9800998ecf8427e  /dev/null
  $ seahashsum /dev/null
  <stdin>:2:0: seahashsum: command not found
That said, my main point was don't use SHA-1, because if you actually need a half-broken hash function for something, MD5 has all the same properties (good and bad) for cheaper.

The real question is why you would want to. You have a lot of other options! For example, on 576 byte messages on a core i7, eBASH reports the following performance characteristics:

- Blake2b (~20% faster)

- SHA-384/192 and SHA-512/256 (~50% slower)

- SHA-256 (~100% slower)

- SHA3-224 and SHA3-256 (~150% slower)

So if speed is absolutely important to you, like you are hashing millions of messages a minute and you have profiled and the speed of the hash function is absolutely the most important thing, then link the Blake2 or more recently Blake3 libraries and get the extra speed AND you don't have to deal with all of the security vulnerabilities.

Or, if speed is modestly important to you but you need to use primitives that are available on every single computer you will ever encounter, use SHA-512 and then truncate it to 32 bytes. Or if you really need that 160-bit level truncate to 20 bytes. (Truncation is usually a good thing with hash functions, defending against something called length-extension attacks.)
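The truncation suggestion is a one-liner with the standard library (note this is plain truncation, not the distinct SHA-512/256 standard, which uses different initial values, but it gets the same length-extension resistance benefit):

```python
import hashlib

def sha512_trunc(data: bytes, n: int = 32) -> bytes:
    """SHA-512 truncated to n bytes; drops length-extension exposure."""
    return hashlib.sha512(data).digest()[:n]

print(len(sha512_trunc(b"message")))      # 32 bytes
print(len(sha512_trunc(b"message", 20)))  # 20 bytes, SHA-1-sized output
```

On 64-bit hardware SHA-512 is also typically faster per byte than SHA-256, so this is rarely a performance sacrifice.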

Or if you want to be substantially safer than all of these options and you are not doing a lot of hashing, use SHA-3. Also, the performance of the others (that are not BLAKE) is generally somewhat artificially enhanced by dedicated processor instructions, which will almost surely also happen for BLAKE (as it is based on the ChaCha cipher, which is reasonably well used) and SHA-3 (as it is the new US government standard). I can't off the top of my head speak to the Coffee Lake Core i7 architecture without digging up some research about what instructions it implements, but its SHA-1 is 25% faster than its MD5, which suggests some dedicated SHA-1 instructions, at least.

Are you protecting against random accidental data corruption? Or malicious attackers?

Everyone is protecting against accidental data corruption, until someone figures out how to turn it into an attack.

Even when you ask people "is there any universe in which this could be used as an attack?" most people will reflexively say "no, of course not."

Most of us are wrong, which is why you have professional pen testers.

What could get the cost down even lower?

Cheaper/faster hardware probably

Increasing supply by making them easier to compute?

Faster/cheaper hardware probably
