

Proposing a hashing standard for URLs? - Tichy
http://groups.google.com/group/twitter-development-talk/browse_thread/thread/84e6140bb882f9c4?hl=en

======
DrJokepu
What about collisions? Even if the hashing algorithm is the same everywhere,
two different URLs can yield the very same hash, and the birthday effect (that
is, the surprisingly high chance, roughly 70%, that two students share the
same birthday in a class of 30) makes that an important problem. URL
shorteners need some mechanism to deal with hash collisions, but this means
two identical hashes can only be assumed to point to the same URL if we know
that the very same database is backing both.

Sure, you can increase the length of the hash to the point where the birthday
effect is no longer significant, but that means longer hashes. For hashes only
five or six characters long, the chances of collision are just too high.
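The birthday bound makes this concrete. A rough sketch (the one-million-URL
figure is just an illustrative workload, not from the thread): with a
six-character base-62 hash there are about 5.7e10 possible values, yet a
collision among a million URLs is almost certain.

```python
import math

def collision_probability(n_urls: int, hash_len: int, alphabet: int = 62) -> float:
    """Birthday-bound approximation: P(at least one collision) when
    n_urls are hashed uniformly into alphabet**hash_len buckets."""
    buckets = alphabet ** hash_len
    return 1.0 - math.exp(-n_urls * (n_urls - 1) / (2.0 * buckets))

# Six base-62 characters give 62**6 ~ 5.7e10 buckets, but with only a
# million URLs a collision is near-certain:
print(collision_probability(1_000_000, 6))  # ~ 0.9999

# With a hundred URLs, the same hash length is effectively collision-free:
print(collision_probability(100, 6))        # ~ 8.7e-08
```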

~~~
Tichy
Of course, if you find URL references via that hash, you would still have to
check that they point to the correct URL; as you said, a collision could
otherwise send you to the wrong one.

I don't think it could replace URL shorteners, but it could supplement them.
You could tweet "check out <http://bit.ly/> whatever #globalhashxyz", and by
searching for #globalhashxyz one could at least find references to the target
URL (even if those search results would still have to be filtered).

------
Tichy
I just posted that comment in the Twitter development group because of a
problem I have with Twitter URL shorteners, but on second thought it might be
a good idea in general.

The idea is to have a standard for generating a short hash of a URL - in
effect, a web standard for URL shortening. I think this could be useful in a
number of situations where URL shortening is usually provided.
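A minimal sketch of what such a standard could specify. The concrete choices
here (SHA-256 truncated to 64 bits, base-62 encoding, the #globalhash tag
format) are illustrative assumptions, not part of any actual proposal; the
point is only that any two independent implementations following the same
recipe would emit the same tag for the same URL string.

```python
import hashlib

# Base-62 alphabet, an assumed convention for this sketch.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def global_url_hash(url: str, length: int = 8) -> str:
    """Deterministic short tag for a URL: truncate SHA-256 to 64 bits
    and re-encode in base 62. No database needed; anyone can recompute
    the tag from the URL alone."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    n = int.from_bytes(digest[:8], "big")  # first 64 bits of the digest
    chars = []
    for _ in range(length):
        n, r = divmod(n, 62)
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))

tag = global_url_hash("http://example.com/some/long/path")
print("#globalhash" + tag)
```

Note that deciding when two URL strings count as "the same URL" (trailing
slashes, query parameter order, http vs. https) would itself have to be part
of such a standard, since any normalization difference changes the hash.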

