
It's not just widely accepted... hashing algorithms, due to the pigeonhole principle you just explained, are by definition full of collisions.
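To make the pigeonhole point concrete, here is a minimal sketch. It assumes a toy hash, `tiny_hash` (a hypothetical helper, not anything from a real system), that keeps only the first byte of SHA-256, so there are just 256 possible outputs. Feeding it 257 distinct inputs has to produce at least one collision.

```python
import hashlib

def tiny_hash(data: bytes) -> int:
    """Toy hash (assumption for illustration): first byte of SHA-256, 256 possible outputs."""
    return hashlib.sha256(data).digest()[0]

def first_collision(limit: int = 257):
    """Hash distinct inputs until two of them share a tiny_hash value.

    With 257 distinct inputs and only 256 possible outputs, the pigeonhole
    principle guarantees that a collision is found before the loop ends.
    """
    seen = {}
    for i in range(limit):
        msg = str(i).encode()
        h = tiny_hash(msg)
        if h in seen:
            return seen[h], msg, h
        seen[h] = msg
    return None

collision = first_collision()
if collision is not None:
    a, b, h = collision
    print(f"{a!r} and {b!r} both hash to {h}")
```

The same argument applies to full SHA-256. The output space is just 2^256 instead of 256, so collisions exist but nobody can find one by brute force.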

The only security-relevant part is how hard it is to find them for use in various scenarios...

(This is why I really don't get why ZFS ever did deduplication relying on the hash only... sure, verify is an option, but it would be insane to use the hash only... unless I am overlooking something statistical that makes it make sense (maybe the odds of a collision are far less than the odds of total hardware failure? Still...))

Your last sentence hits the nail on the head. Even when storing petabytes of data, the odds on a freak hash collision are still many orders of magnitude longer than the odds on a hardware failure.
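A rough back-of-the-envelope sketch of the collision side of that comparison, using the standard birthday-bound approximation p ≈ n(n-1) / 2^(b+1) for n blocks and a b-bit hash. The 1 PiB pool size and 128 KiB block size are assumptions picked for illustration, not figures from this thread, and the function name is hypothetical.

```python
def collision_probability(num_blocks: int, hash_bits: int = 256) -> float:
    """Birthday-bound approximation: p ~= n*(n-1) / 2^(bits+1).

    Only meaningful while the result is much smaller than 1.
    """
    return num_blocks * (num_blocks - 1) / 2 ** (hash_bits + 1)

# Assumed workload for illustration: 1 PiB of unique data in 128 KiB blocks.
POOL_BYTES = 2 ** 50
BLOCK_BYTES = 128 * 1024

num_blocks = POOL_BYTES // BLOCK_BYTES   # 2**33 blocks
p = collision_probability(num_blocks)    # about 2**-191

print(f"{num_blocks} blocks, collision probability ~ {p:.1e}")
```

Under these assumptions the chance of any two blocks colliding anywhere in the pool is on the order of 2^-191, which is the kind of number being weighed against hardware failure rates here.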

There's a statement that has been "pretty much permanently" on a whiteboard-covered wall of the computer lab at my college telling a joke about "the difference between a mathematician and an engineer", that goes through the math behind a specific type of prime number generator, calculates the likelihood that it might fail, and then claims the mathematician cares about that while the engineer knows that is orders of magnitude less likely than a guaranteed algorithm failing due to a cosmic ray hitting it in RAM and flipping one of its bits. ;P

Indeed... a quick search turns up someone from the ZFS team indicating that a collision in ZFS dedup (SHA-256) is 50 orders of magnitude less likely than an uncorrected hardware error. I should have looked before posting.

I meant, of course, that we'll find one.

We know they exist. We just don't have any examples to present.
