When Will We See Collisions for SHA-1? (schneier.com)
168 points by mikegerwitz on Oct 5, 2012 | 51 comments

AFAIK GPUs are far more cost effective. I've just produced this estimate.

  2xAMD 7970 : 8000Mhash/s ~= 2^33
  Seconds per year ~= 2^24
  Total Hashes ~= 2^57

  1kW * $0.15/kWh * 1 year ~= $1315
  Rig cost (guess)  ~= $2000
  Total cost ~= $3315
Price for 2^60 hashes in a year: 2^60 / 2^57 * $3315 ~= $26,500
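The arithmetic above can be checked in a few lines, using the comment's own rounded figures (these are rough guesses, not measured values):

```python
# Back-of-the-envelope reproduction of the estimate above,
# using the comment's rounded powers of two.
total_hashes_per_year = 2**57          # 2^33 hash/s * ~2^24 s/year
yearly_cost = 1315 + 2000              # power + rig, in dollars
scale = 2**60 / total_hashes_per_year  # how many rig-years for 2^60 hashes
cost_for_2_60 = scale * yearly_cost
print(cost_for_2_60)                   # 26520.0
```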

If you assume a 6X discount in 2021, then it will be <5k. Still, it takes a year, or more money for machines. Corrections welcome. Trying to run sixteen 7970s for a year straight is probably a total crap shoot.

The hash rate is from http://www.codinghorror.com/blog/2012/04/speed-hashing.html .

[EDIT: I wouldn't be shocked if FPGAs could do it with 1/10th the power and heat, with the same variable cost (hardware & power) but with an additional fixed cost to build the devices; I'm basing this estimate on bitcoin mining boards like these: https://en.bitcoin.it/wiki/Mining_hardware_comparison ]

I think this is the correct way to look at it. A CPU is vastly slower at performing rainbow attacks than a GPU. Many of the largest bitcoin rigs could probably find a SHA-1 collision in a matter of months.

Yeah, I haven't made any provision for the other calculations required by the attack, so maybe those take a lot of resources, maybe they don't.

The combined network hashrate of Bitcoin is currently right around 20,000Ghash/s: http://bitcoin.sipa.be/

Which is about 2,500X the processing power of the 2xAMD 7970 rig above. Bitcoin is hashing SHA256, which is about as computationally expensive as SHA-1 (or maybe even cheaper, according to comments on the linked blog post).

Much of the Bitcoin computational power comes from large mining operations controlled by individuals. Their processing power is easily diverted elsewhere, if the profit motive is there...

At current BTC price (~$12/BTC), mining generates 50 BTC every 10 minutes which translates into $1 value per second, or $1 for 20,000 GHash.
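The $1-per-second figure checks out from the quoted numbers:

```python
btc_price = 12.0        # ~$12/BTC at the time of the comment
block_reward = 50       # BTC minted per block
block_interval = 600    # seconds between blocks (10 minutes)
dollars_per_second = btc_price * block_reward / block_interval
print(dollars_per_second)   # 1.0
```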

Worth mentioning here that practical collision attacks do not automatically jeopardize all security protocols that depend on the hash. For instance, MD5 is totally broken, but there are no known practical attacks against HMAC-MD5. I hope it goes without saying that this stuff generally does not apply to password hash constructs either.

I need a "digital public notary" service, where I send some hash to the notary, they sign it and give me back a signed copy (with the notary's system or public key of choice, e.g. 4096 RSA or 256 bit ECDSA or whatever)

If I want it to be considered reasonably valid proof that I did, indeed, have the data leading to the hash when the notary signed it, and I want stuff I sign today to be acceptable 10 years from now, would you consider HMAC-MD5 viable? How about HMAC-SHA1? I guess plain MD5 and SHA1 are out of the question according to this article.

Speed is not a concern here, so I would be happy with HMAC-SHA3 or anything else... Also, I keep reading that multiple signatures (MD5 + SHA1) are only as strong as the strongest one, but that does not make any sense to me - If you have two differently seeded (initial internal state) MD5 hashes, it should already be much harder to exploit (perhaps not double the number of bits, but surely a large load factor)?

You can hedge your bets with hash combiners: http://homepages.cwi.nl/~pietrzak/publications/FLP08.pdf

C_{4P}(SHA-2-512, SHA-3-512) should remain solid for a long long time. Your point of failure is, now, the signature scheme. To last for decades, do not use RSA or DSA. Use elliptic curve DSA or similar (EdDSA comes to mind).
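For anyone wanting to experiment, the simplest combiner is plain concatenation of two digests. That is a weaker construction than the C_{4P} combiner from the linked paper, but a collision still requires colliding both underlying functions simultaneously. A minimal Python sketch:

```python
import hashlib

def concat_combiner(data: bytes) -> bytes:
    # Concatenation combiner: NOT the paper's C_4P construction,
    # but a collision here requires a simultaneous collision in
    # both SHA-512 and SHA3-512.
    return hashlib.sha512(data).digest() + hashlib.sha3_512(data).digest()

digest = concat_combiner(b"hello world")
print(len(digest))  # 128 bytes: 64 (SHA-512) + 64 (SHA3-512)
```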

What's the reasoning for why RSA/DSA with a sufficient key length would be vulnerable sooner than elliptic curve?

A 512-bit hash provides roughly 256 bits of resistance against collision attacks. We (presumably) want to match that level of strength with our signature scheme.

RSA and DSA are based on the integer factorization and finite field discrete logarithm problems, respectively. The number field sieve has a complexity (very roughly) proportional to 2^(b^(1/3)), for b-bit keys. This means that key sizes have to grow roughly cubically (proportional to b^3) to achieve b-bit security. For a concrete example, 256-bit security requires ~16384-bit RSA/DSA keys (this is using a less rough approximation of the NFS complexity).

On the other hand, well-chosen elliptic curve groups have (or appear to have) much less structure than the above. In generic groups, the best we can do to solve the discrete logarithm is a generic attack, like Pollard's rho or BSGS. Those run in time proportional to 2^(b/2). Therefore, we only need keys of size 2b to achieve b-bit security (512-bit keys for 256-bit security). Bit for bit, elliptic curves are much more efficient at delivering the same level of security. This brings other advantages, like tolerable speed.
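The two scaling laws can be compared numerically. The sketch below drops the o(1) term from the heuristic GNFS cost L_n[1/3, (64/9)^(1/3)], so the RSA figure is only approximate:

```python
import math

def gnfs_security_bits(key_bits: int) -> float:
    # log2 of L_n[1/3, (64/9)^(1/3)], the heuristic GNFS running time;
    # the o(1) term is dropped, so treat the result as approximate.
    ln_n = key_bits * math.log(2)
    c = (64 / 9) ** (1 / 3)
    return c * ln_n ** (1 / 3) * math.log(ln_n) ** (2 / 3) / math.log(2)

def generic_group_security_bits(key_bits: int) -> float:
    # Pollard rho / BSGS cost ~ 2^(b/2) for a b-bit group.
    return key_bits / 2

# 16384-bit RSA vs 512-bit ECC, both targeting ~256-bit security:
print(round(gnfs_security_bits(16384)))   # in the neighborhood of 256
print(generic_group_security_bits(512))   # 256.0
```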

Mathematicians are making progress with the multiplicative groups used by RSA and DSA, which indicates that the security margins for RSA and DSA are lower than they are for elliptic curve groups. I can't pretend to be literate here but Daniel Bernstein would point out the number field sieve as an example of something that threatens RSA and DSA but not ECC.

As I understand it, ECC is also significantly faster.

Does anyone know of a smart card that implements ECC? Accessible from Linux is a plus.

(And in a related issue - does anyone know why smart card deployment in the US is so far behind Europe? Why is it that I need to get a PGPcard from Germany and can't find anything comparable in the US?)

The startup GuardTime offers exactly this service: http://www.guardtime.com/signatures/

You hash your data, and send the hash to them. They embed the hash in a long hash-chain, and publish the current head of the hash chain in Financial Times every month. Once the hash chain head has been published, it is impossible to forge the signed content without destroying all copies of the newspaper in existence.
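The core idea is simple to sketch (names and flow below are illustrative, not GuardTime's actual protocol):

```python
import hashlib

def extend_chain(head: bytes, doc_hash: bytes) -> bytes:
    # The new head commits to the old head plus the newly submitted hash.
    return hashlib.sha256(head + doc_hash).digest()

head = b"\x00" * 32  # illustrative genesis value
for doc in (b"contract.pdf contents", b"photo.jpg contents"):
    head = extend_chain(head, hashlib.sha256(doc).digest())

# Publishing `head` in a newspaper commits to every hash submitted
# so far; altering any earlier document changes the published head.
print(head.hex())
```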

It seems that the service is free for basic use; only availability SLAs and customer support are paid.

GuardTime (www.guardtime.com) does keyless signatures. You send the hash, they give you a signature. The beauty of keyless is that there are no secrets involved, and even if they are hacked or go out of business, signatures will still be valid.

GuardTime seems like what I was looking for.


Have you seen Citizen Media Notary? https://github.com/Miserlou/CitizenMediaNotary

Is it as important to be moving away from SHA-1 as Schneier says?

For systems using HMAC? No. For systems using RSA signatures of SHA-1 hashes? Yes.

The fallacy of "A 32-core server is fast, and I can rent servers for 4 cents an hour, so I can rent a 32-core server for 4 cents" is unfortunately more common than I'd wish. If that simple error is in this analysis (and it affects the cost analysis by at least 2^6), what other errors are in the math that I'm not as qualified to catch?

GPUs are obviously more cost efficient than general purpose x86 CPUs. But as we are learning, specialty hardware using very little power can be an extremely effective hashing device, as Butterfly Labs is doing with SHA-2[1]. If a similar device were developed for SHA-1 (I'm not crypto-proficient enough to know the difference), I imagine it would be extremely efficient as well.

[1]: http://news.yahoo.com/butterfly-labs-announces-next-generati...

I wonder how the upcoming generations of Bitcoin mining hardware will change this calculation. They are all FPGAs/ASICs that implement SHA1. It would also be interesting to see how GPUs perform.

Really? They are going to be bummed when they learn Bitcoin is SHA256.

Bitcoin is SHA-2-256 I believe.

oops, close enough :)

Crypto-newb question: is there any reason it takes the "number of computations" difficulty as a given, rather than estimating the rate at which attacks will get smarter and require fewer computations?

The analysis that's given includes estimates of decreasing computation cost from Moore's law, but no corresponding estimate of decreasing cost from patterns found in the SHA-1 hash function.

Also, why does the cost estimate focus on commodity CPUs, rather than GPUs, which are much better at the (parallel) computations that hash attacking involves?

He's only providing a lower bound for these numbers; he specifically spells out your concerns. He mentions that any improvement in:

* number of cores per CPU

* number of CPUs per server

* any improvements in cryptanalysis

* instruction set improvements (ARM SHA1 instructions, GPUs)

will only make his numbers even scarier.

It is impossible to estimate decreases from "patterns found in the SHA-1 hash function". There is a good likelihood no patterns will be found in 10 years. It is also possible (although probably unlikely) an attack could be found that utterly breaks SHA-1.

I can't answer why GPUs weren't considered, though I'd guess because they weren't as familiar with the numbers and this was a "back of the envelope" calculation.

The reason why I wouldn't take into account better attacks is because that's much harder to predict than Moore's law. Also, this is supposed to be more of a worst case scenario, so even if no advances are found it will cost at least that much.

The bitcoin network does >20 terahashes per second now in total (but SHA-2, not SHA-1). Those 2^60 hashes would take only 16 hours at that speed.
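Checking that figure:

```python
network_rate = 20_000e9        # ~20,000 Ghash/s, the rate quoted above
hours = 2**60 / network_rate / 3600
print(round(hours, 1))         # 16.0
```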

> Since this argument only takes into account commodity hardware and not instruction set improvements (e.g., ARM 8 specifies a SHA-1 instruction), [...] the need to transition from SHA-1 for collision resistance functions is probably more urgent than this back-of-the-envelope analysis suggests.

So the ARM 8 SHA-1 instruction contributes to making itself obsolete? :-)

It is sad to see an Intel engineer misstate/misuse Moore's Observation. The actual observation was that the number of transistors you could put on a chip doubled about every two years, not performance. For a long time more transistors meant more performance, but now that isn't the case. So people put more 'cores' on a chip to use up those transistors.

That said, the ARM threat is under estimated since the power budget is so much lower and the cost per CPU is also lower. So 10,000 ARM8 cores with SHA hardware are cheaper than 10,000 x86 cores doing the calculation in software.

When talking about embarrassingly parallel problems, it is okay to equate more transistors with more performance.

Especially when we're talking about something like SHA-1 that was designed to be efficiently implemented in hardware.

Disagree, more transistors do not help when they are spent on, for example, another FPU (not useful for SHA) or additional L2/L3 caches (the working set already fits in cache).

From the Varnish documentation, addressing an assumption of the Varnish source code that unique inputs will produce a unique SHA256 hash: "The risk of collision is very small however, and I can all but promise you, that you will be fully offset in fame and money for any inconvenience a collision might cause, because you will be the first person to find a SHA256 collision."


Not everybody agrees that Moore's law will continue, see for example "Dark silicone and end of multicore scaling"



They use existing data to model CPU performance increases and conclude: "Through 2024, only 7.9× average speedup is possible across commonly used parallel workloads, leaving a nearly 24-fold gap from a target of doubled performance per generation."

I imagine "dark silicone" would be quite different than "dark silicon".

What is the use case for this? At best, I could see replacing data with something with the same hash but is gibberish. Why would a "crime syndicate" want this?

Also wouldn't an n-tuple of multiple hash methods and the file size be even more impossible to replicate than a single hash, but not be that much harder to generate initially?

There are humorous examples from the past, where 20 PDF files were prepared in advance, all having the same MD5 hash, where each one of them says "We hereby predict X is going to be president".

You publish the hash in advance, and after elections you only publish the right PDF file, thereby "proving" your clairvoyance.

More seriously: You can get a notary (e.g. Microsoft Authenticode) to sign one file, and then provide another with the same hash.

For now, it only helps you with planned fraud, not after-the-fact forgery.

Ah, got it. Didn't think about the "controlling both copies" subterfuge, but that makes a lot of sense.

md5? I think you mean md4. There are a few examples of md5 collisions: http://stackoverflow.com/a/1224282/92493

From the link at the bottom of that page to a collision finder program:

> It takes about half an hour on average on a standard PC to find a collision.

That's not a whole lot of time.

Edit: it really does work, too:

    $ md5 msg[12].txt
    MD5 (msg1.txt) = bd1ba334fdd5bf3836b9b8c4bfebae16
    MD5 (msg2.txt) = bd1ba334fdd5bf3836b9b8c4bfebae16
    $ openssl sha1 msg[12].txt
    SHA1(msg1.txt)= fd5bb8044dc51ab2deef608fabf3b7dcb7a84c17
    SHA1(msg2.txt)= 7d490719b22f9a0e365e4fab4e3ab551b4a8898e
Took about half an hour on my MacBook Air, just as promised.

The primary use case is identity fraud in financial transactions, software distribution, or in private communications.

For example, PGP v4 Key IDs are a subset of PGP v4 Key Fingerprints. PGP v4 Key Fingerprints are SHA1 hashes of some protocol information and the key material (for more detailed information see RFC4880 section 12.2). So, a proper collision, with other properly constructed packets, could yield two seemingly "identical" keys. Therefore, they may be able to produce a fake PGP key that you think belongs to a person you trust and want to communicate privately with, but actually does not.

Hash function collisions can be similarly exploited with digital certificate protocols (like SSL web certs). So, a crime syndicate may be able (depending upon how badly certain hash functions are broken) to produce a fake SSL certificate for your bank's website that your browser thinks is valid. Or even worse, they may be able to produce a fake root CA signing certificate that your browser thinks is valid.

It's more complex than just "can compute hash collisions = fake certs" because there's a very limited amount of information hashed in both certs and PGP keys. The more attacker-controllable data that is hashed, the easier it is to find a collision. With PGP much of the data has to be very specific, like version and algorithm designation flags, and valid cryptographic values for the designated algorithm. It's safe to say "very little flexible data = very difficult to find a collision".

Therefore, these calculations are interesting and do suggest that everyone should move away from using SHA1. But the calculations to determine when a criminal or an anti-sec use case would be possible are much more complex and involve many more variables.

I don't think I understand this right:

Does the collision attacker get a hash of something used for security, and then look for another input that returns the same hash? Isn't it possible that they found the original input? Why is it important that it's a collision?

Yes. But, they look for another input that has the same hash and for which they also have other secret information. For example, a PGP public key with the same hash (Fingerprint) for which they have a valid private key thereby affording them a working "collision" key pair. Obviously, very difficult.

There are a wide array of attacks. Using the method just described and the existence of a hash collision to fake a signature, rather than a key pair, in some cases can be much easier. Depending upon the protocol and procedures used, it may also be possible to use a different method such as providing the true authorized signer with any content that shares a collision with something you'd like the authorized signer to sign (but that they would not normally sign). This can be especially true when the signing procedure is fully automated (For example, some CA's SSL certificate acquisition process is fully automated).

The reason that the collision is important is because many cryptosystem implementations (and humans of course) use hashes as unique identifiers of key pair material (PGP Keys & PKI certs).

Best practice when storing passwords is to hash them, then compare the hash of the user input to the stored hash. If a user can compute all the possible input->hash mappings for a given set of parameters (salt, etc.) they can figure out the password from the hash.
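A minimal sketch of that flow in Python. PBKDF2 stands in for whatever slow, salted hash a real system would use, and the iteration count is illustrative, not a recommendation:

```python
import hashlib, hmac, os

ITERATIONS = 100_000  # illustrative; tune for your own hardware

def hash_password(password: str):
    # Store the random salt alongside the derived key.
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def check_password(password: str, salt: bytes, stored: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)  # constant-time compare

salt, stored = hash_password("hunter2")
print(check_password("hunter2", salt, stored))  # True
print(check_password("wrong", salt, stored))    # False
```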

What the article is about is hash collisions - two different pieces of data with the same hash value, not using a hash to obtain the original data.

That's what I'm wondering about the value of.

Otherwise you're totally accurate.

I think we can assume that governments are capable of cracking SHA-1 today, and all common internet encryption protocols like AES-128 last year. The costs of building such a facility would be trivial and could be hidden within almost any TLA agency's budget.

You obviously know something about AES that I do not.

Moreover, they know things that aren't available to university cryptographers. For example, the Flame virus used a previously unreported collision technique.

Imagine if somebody at the NSA/CIA figures out that all pn were really p!
