This is very interesting; thanks! I am not very familiar with hashing algorithms...

JoshTriplett · on Dec 13, 2014

A good digest algorithm should already handle length extension attacks with just the digest. HMAC algorithms handle that as far as I know, and I think modern hashes don't suffer from the length extension problem in the first place.
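For reference, here is a minimal sketch of the HMAC approach in Python, using only the standard library's `hmac` and `hashlib` modules. The key and messages are made up for illustration. The tag is checked with `hmac.compare_digest`, and a tampered message fails verification.

```python
import hashlib
import hmac

def make_tag(key: bytes, message: bytes) -> bytes:
    # HMAC-SHA256 nests two keyed hash calls, so a known tag cannot be
    # "extended" into a valid tag for a longer message.
    return hmac.new(key, message, hashlib.sha256).digest()

def check_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    # compare_digest avoids leaking, through timing, how many bytes matched.
    return hmac.compare_digest(make_tag(key, message), tag)

key = b"server-secret"  # hypothetical key
msg = b"amount=100&to=alice"
tag = make_tag(key, msg)
print(check_tag(key, msg, tag))                   # True
print(check_tag(key, msg + b"&to=mallory", tag))  # False
```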

comex · on Dec 13, 2014

In any case, length extension attacks apply to (hashes misused as) keyed message authentication codes: a scenario where you give a message and its authenticator to a server that knows the key, and it checks whether someone who knows the key generated them. The attacks let you, given a message and its authenticator, calculate the authenticator for a different (extended) message without knowing the key. However, in this case, the hash cannot be tampered with; breaking integrity would require finding a different message with the same hash, which is a straight-up collision, considered to totally break a hash algorithm. At the moment, the only commonly used hash algorithm with known collisions is MD5; SHA-1 will probably follow in the next several years.

hrjet · on Dec 13, 2014

I perhaps used the wrong term. I meant a straight-up collision with a modified length.

The known MD5 collision attack needs to modify the length of the message; finding a collision with the same length is extremely difficult. It would be reasonable to assume that attacks on other hashing algorithms would suffer the same constraint.

Wouldn't having the length specified in the integrity attribute help reduce the chances of a future attack? The cost of specifying the length is negligible: about 12 bytes for a base64-encoded 64-bit unsigned int.
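The 12-byte figure checks out: 8 bytes of big-endian integer encode to ceil(8/3) * 4 = 12 base64 characters, one of which is `=` padding. Below is a small Python sketch of the idea. It assumes the `integrity` attribute here is Subresource Integrity's `sha384-<base64 digest>` form. The `;len=` suffix and the helper names are hypothetical and not part of any spec.

```python
import base64
import hashlib
import struct

def encode_length(n: int) -> str:
    # 64-bit unsigned big-endian length, base64-encoded:
    # 8 bytes -> 12 characters (including one "=" of padding).
    return base64.b64encode(struct.pack(">Q", n)).decode("ascii")

def integrity_value(data: bytes) -> str:
    # "sha384-<base64 digest>" follows the SRI format; the ";len=" part is a
    # hypothetical extension, shown only to illustrate the size overhead.
    digest = base64.b64encode(hashlib.sha384(data).digest()).decode("ascii")
    return f"sha384-{digest};len={encode_length(len(data))}"

print(len(encode_length(2**64 - 1)))  # 12
print(integrity_value(b"console.log('hi');"))
```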

sjy · on Dec 13, 2014

You're essentially using the length as a(n extremely weak) secondary hash in that case. That might be sufficient to disable one known MD5 exploit, but if you're worried about vulnerabilities in your primary hash algorithm it would make more sense to use another real hash algorithm for your backup.
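Here is one way to follow the "second real hash" suggestion, sketched in Python. It assumes SHA3-256 serves as the backup, since it comes from a different design family than SHA-256; any well-studied independent hash would do. The function name is hypothetical. The resource is read once, and each chunk is fed to both hashers.

```python
import hashlib
import io

def verify_two_hashes(stream, expected_sha256: bytes, expected_sha3: bytes,
                      chunk_size: int = 65536) -> bool:
    # Read the resource once and feed every chunk to both hashers, so the
    # backup hash costs extra CPU work but no extra download or disk I/O.
    primary = hashlib.sha256()
    backup = hashlib.sha3_256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        primary.update(chunk)
        backup.update(chunk)
    # A forged resource must now match both digests at once.
    return primary.digest() == expected_sha256 and backup.digest() == expected_sha3

data = b"console.log('hello');"
ok = verify_two_hashes(io.BytesIO(data),
                       hashlib.sha256(data).digest(),
                       hashlib.sha3_256(data).digest())
print(ok)  # True
```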

hrjet · on Dec 13, 2014

Thanks for the answer. I am really not familiar with the mathematics of hash algorithms, and I'm asking just to learn.

Why is length essentially a weak hash? Isn't it an additional constraint that works orthogonal to the hash? It serves to restrict the space of collisions and hence directly reduces the exploit surface. Moreover, the length can be independently verified of the hash function, and its space & time complexity is negligible.

Dylan16807 · on Dec 13, 2014

> Why is length essentially a weak hash?

It's utterly trivial to match by itself. Adding length to a real hash is a mild difficulty increase. Adding a second hash is a massive difficulty increase.

> Isn't it an additional constraint that works orthogonal to the hash?

Yes. But so is a second hash.

> It serves to restrict the space of collisions and hence directly reduces the exploit surface.

Very inefficiently.

> Moreover, the length can be independently verified of the hash function, and its space & time complexity is negligible.

A second hash is independent of the first hash too. Compared to downloading and hashing the data the first time, hashing it a second time is pretty close to negligible.