
Radix sort (similar to bucket sort) groups items based on individual digits of the unhashed values. If you hash the values first, you end up with the data sorted according to their hashes, and it will appear almost random in its unhashed form.
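To make the contrast concrete, here is a minimal sketch of an LSD radix sort over non-negative integers in base 10, next to a sort keyed on a scrambling hash. The helper names (`digit`, `radix_sort`, `sha_key`) are illustrative, and SHA-256 is just one example of a hash that scrambles its input.

```python
import hashlib


def digit(value: int, position: int, base: int = 10) -> int:
    """Return the digit of `value` at `position` (0 = least significant)."""
    return (value // base ** position) % base


def radix_sort(values: list[int], base: int = 10) -> list[int]:
    """LSD radix sort for non-negative integers using stable bucket passes."""
    if not values:
        return []
    result = list(values)
    largest = max(result)
    position = 0
    # One stable bucketing pass per digit, least significant digit first.
    while base ** position <= largest:
        buckets = [[] for _ in range(base)]
        for v in result:
            buckets[digit(v, position, base)].append(v)
        result = [v for bucket in buckets for v in bucket]
        position += 1
    return result


def sha_key(value: int) -> bytes:
    # A scrambling hash: SHA-256 of the decimal representation.
    return hashlib.sha256(str(value).encode()).digest()


data = [170, 45, 75, 90, 802, 24, 2, 66]
print(radix_sort(data))           # [2, 24, 45, 66, 75, 90, 170, 802]
print(sorted(data, key=sha_key))  # order follows the hash, not the values
```

Sorting by `digit` keeps the numeric order; sorting by `sha_key` gives an order that only makes sense in terms of the hash.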

A hash function is any function that maps arbitrary data to fixed-size values. A radix is a type of hash. Hashes are not defined as random, nor are they required to sort differently than the unhashed values. If you define a hash function that returns the first 32 bits of its input, then you have a hash that sorts almost the same as the unhashed values (as long as the first 32 bits change frequently), and you also have a hash function that you can call a radix.
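A quick sketch of that "first 32 bits" hash, assuming byte-string input and zero-padding for inputs shorter than 4 bytes (both choices are mine, for illustration). The function name `first32` is hypothetical.

```python
def first32(data: bytes) -> int:
    """Hypothetical 'hash' that returns the first 32 bits of its input as a
    big-endian integer. Inputs shorter than 4 bytes are zero-padded on the
    right, so every output fits in 32 bits."""
    return int.from_bytes(data[:4].ljust(4, b"\x00"), "big")


words = [b"zebra", b"apple", b"apricot", b"banana", b"app"]
print(sorted(words, key=first32) == sorted(words))  # True: same order as the raw bytes
print(first32(b"applesauce") == first32(b"apple"))  # True: a tie once the first 4 bytes match
```

Big-endian packing preserves lexicographic byte order, so the hash never reorders keys; it can only tie them. That is the "almost the same" caveat: the order is exact only when the first 32 bits differ.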

I have never heard of hash functions in the context of radix sort, or anything similar to what you describe. Wikipedia says that hash functions "[s]cramble the bits of the key so that the resulting values are uniformly distributed over the key space". I would say that isn't the case for the function radix sort uses to 'pick' a digit.
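One way to see the difference is to compare the digit picker with a scrambling hash on neighboring keys. This is a sketch: `digit` and `mult_hash` are illustrative names, and the multiplicative hash with constant 2654435761 (a commonly used multiplier close to 2**32 divided by the golden ratio) is just one example of a scrambling hash.

```python
def digit(value: int, position: int, base: int = 10) -> int:
    """Radix 'hash': pick one digit of a non-negative integer
    (0 = least significant). Output is always in the fixed range [0, base)."""
    return (value // base ** position) % base


def mult_hash(value: int) -> int:
    """A scrambling multiplicative hash into 32 bits."""
    return (value * 2654435761) % 2**32


keys = range(1000, 1005)
print([digit(k, 0) for k in keys])   # [0, 1, 2, 3, 4]: neighbors stay neighbors
print([mult_hash(k) for k in keys])  # neighboring keys land far apart
```

Both map their input onto a fixed-size range, but only the multiplicative hash scrambles the bits in the sense of the quote.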

Well, good thing you were here; now you have heard about radixes as hashes. ;) It's good to see and understand the connections and relationships between these things.

You're right that a radix doesn't scramble the key, but the quote you've picked describes a qualified subset of hash functions. That paragraph is attempting to define a practical/good hash function that is used in specific ways. Not all hash functions scramble the bits, and the Wikipedia article is very clear about this if you read the whole thing.

You skipped over two important sentences that came before it, and a whole sub-section on radix hashes after it:

"A hash function is any function that can be used to map data of arbitrary size to fixed-size values." (very first sentence, emphasis mine.)

"In some cases, the key is the datum itself." (Right before the 'scramble' quote)


String hashing is sometimes similar to a radix as well: "Simplistic hash functions may add the first and last n characters of a string along with the length", and I've seen string hashes in production that use only the first n characters and stop. That kind of hashing is frequently useful in small embedded systems, video games, etc., where you have a limited set of strings and a good idea of how well distributed the keys are.
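Both simplistic string hashes can be sketched in a few lines. The details are assumptions for illustration: `first_last_hash` sums code points, `first_n_hash` uses a 31-multiplier polynomial recurrence, and both mask the result to 32 bits so the output has a fixed size.

```python
def first_last_hash(s: str, n: int = 3) -> int:
    """Sum the code points of the first n and last n characters, plus the length.
    Assumes n > 0. For strings shorter than 2*n the two slices overlap, and this
    sketch simply counts those characters twice."""
    total = sum(map(ord, s[:n])) + sum(map(ord, s[-n:])) + len(s)
    return total & 0xFFFFFFFF


def first_n_hash(s: str, n: int = 4) -> int:
    """Hash only the first n characters and ignore the rest."""
    h = 0
    for ch in s[:n]:
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF
    return h


# Keys sharing a 4-character prefix collide under first_n_hash...
print(first_n_hash("player_idle") == first_n_hash("player_jump"))        # True
# ...but the last characters still separate them under first_last_hash.
print(first_last_hash("player_idle") == first_last_hash("player_jump"))  # False
```

This is why these hashes only make sense when you know your key set: a naming scheme with long shared prefixes would defeat `first_n_hash` completely.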

