
Do you mean that there could be two 3 KB files with the same sha256 hash, and that the probability of hitting a collision is greater than the probability of finding the correct hash for the file? Let's divide the 3 MB file into 1000 parts of roughly 3 KB each, take the sha256 hash of each part, and also the sha256 hash of the whole file. Together these hashes are smaller than the original file and leave ample room for a decompressor. Now start brute-forcing, assuming we have unlimited processing power and time. Would the probability of a collision occurring in all 1000 parts then be very low? Given a good hash function, could it be so low that we can disregard it? If 1000 parts is not enough, can we increase the split to 2000 or 3000?
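
For concreteness, here is a minimal Python sketch of the scheme being proposed; the file name and part count are placeholders, not anything specified in the thread:

  # Hash 1000 parts of the file plus the file itself (sketch only).
  import hashlib

  def part_hashes(path="input.bin", parts=1000):
      data = open(path, "rb").read()
      size = max(1, -(-len(data) // parts))      # ceiling division: bytes per part
      hashes = [hashlib.sha256(data[i:i + size]).hexdigest()
                for i in range(0, len(data), size)]
      whole = hashlib.sha256(data).hexdigest()
      return hashes, whole                       # 1000 part hashes + 1 full-file hash

The stored output would be 1001 sha256 digests, about 32 KB, versus the 3 MB input, which is the size argument being made above.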



It's not that there might be a collision, but that collisions are guaranteed. How many 3 MB files are there? 2^(3M·8) = 2^25165824. How many distinct sets of 1001 hashes are there? 2^(256·1001) = 2^256256.

By the pigeonhole principle, we can't fit 2^25165824 objects into 2^256256 holes; indeed each file will have on average 2^24909568 other files that share the same set of 1001 sha256 hashes.
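
A quick check of the arithmetic behind those exponents, in plain Python integers (this just restates the numbers above):

  file_bits = 3 * 2**20 * 8        # 25165824 bits in a 3 MB file
  hash_bits = 256 * 1001           # 256256 bits in 1000 part hashes + 1 full-file hash
  print(file_bits, hash_bits)      # 25165824 256256
  print(file_bits - hash_bits)     # 24909568 -> 2^24909568 other files per hash set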

The reason that we are able to work with hashes, and that compression works in practice, is that we don't have to deal with every possible 3 MB file. Most of these files are gibberish and will never be seen, and finding two files that match even one hash is incredibly difficult. But once we start talking about brute-forcing, we start encountering problems -- and having to dedicate an awful lot of processing power to the problem isn't the biggest one...


There is just one 3 MB file, and we divide that file into 1000 parts. I agree there will be collisions for each part's hash. But I'm skeptical that all 1000 hashes will produce bits such that the reassembled file also collides with the hash of the original file (remember we do have the hash of the original file). If the final hash does not match the original file's hash, we would have to recompute all the hashes again, randomly regenerating the bits for each of the 1000 file parts. Do you mean to say that collisions are guaranteed, and that a collision inside any file part will also cause a collision in the original file's hash when the parts are combined?
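
Purely for illustration, the brute-force step being described looks something like the sketch below; it only covers recovering one part from its hash, and it is of course astronomically infeasible (a 3 KB part has 2^24576 possible bit patterns):

  # Illustrative only: randomly generate a part's bits until its sha256 matches.
  import hashlib, os

  def brute_force_part(target_hash, part_size=3 * 1024):
      while True:                              # would never terminate in practice
          candidate = os.urandom(part_size)    # random bits for this part
          if hashlib.sha256(candidate).hexdigest() == target_hash:
              return candidate                 # some file part matching this hash

The reassembled file built from 1000 such parts would then still have to match the full-file hash, which is the question being asked here.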


Not every collection of 1000 correctly hashed parts will make a correctly hashed whole, but there are an awful lot of different collections of parts whose individual hashes all match (2^24909824 of them), and of those, one in 2^256 will also match the full-file hash.
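
The arithmetic behind that figure, treating an ideal hash as fixing 256 bits' worth of possibilities per digest (the same idealisation used above):

  file_bits      = 3 * 2**20 * 8            # 25165824
  part_hash_bits = 256 * 1000               # 256000 bits pinned down by the part hashes
  print(file_bits - part_hash_bits)         # 24909824 candidates match all part hashes
  print(file_bits - part_hash_bits - 256)   # 24909568 of those also match the full-file hash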



