Hacker News

Small nitpick: radix sort has nothing to do with hashing.





A radix sort is a type of hashing, no? You're bucketing the items based on a reduced-form projection of them onto some smaller subspace.
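To make the "bucketing by a projection" idea concrete, here is a minimal Python sketch of a single radix-sort pass. The names `digit` and `bucket_by_digit` are hypothetical and chosen for illustration, and the sketch assumes non-negative integers. The key function keeps only one digit of each number, a many-to-one map onto `base` buckets.

```python
def digit(x: int, position: int, base: int = 10) -> int:
    """Project a non-negative integer onto one of its digits (position 0 = least significant)."""
    return (x // base**position) % base


def bucket_by_digit(values: list[int], position: int, base: int = 10) -> list[list[int]]:
    """One bucketing pass: group values by the digit at `position`,
    keeping the input order within each bucket."""
    buckets: list[list[int]] = [[] for _ in range(base)]
    for v in values:
        buckets[digit(v, position, base)].append(v)
    return buckets


print(bucket_by_digit([170, 45, 75, 90, 802, 24, 2, 66], position=0))
# [[170, 90], [], [802, 2], [], [24], [45, 75], [66], [], [], []]
```

Whether you call that key function a "hash" is exactly what the rest of the thread argues about. Unlike a typical hash, the bucket index is directly readable from the value.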

I think it's different in that you don't care what the output of a hash is, as long as it's sufficiently unique or whatever. Here, by contrast, the buckets are intimately tied to the input. I'd say it's more of an arithmetic hack, as it only works on decimal numbers.

Radix sort works in any base
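As a sketch of the "any base" point, here is a least-significant-digit (LSD) radix sort in Python where the base is just a parameter. It assumes non-negative integers, and `radix_sort` is an illustrative name, not a library function. Nothing in it is specific to base 10.

```python
def radix_sort(values: list[int], base: int = 10) -> list[int]:
    """LSD radix sort for non-negative integers in an arbitrary base (base >= 2)."""
    if base < 2:
        raise ValueError("base must be at least 2")
    if not values:
        return []
    result = list(values)
    largest = max(result)
    place = 1  # base**position of the digit currently used as the bucket key
    while place <= largest:
        buckets: list[list[int]] = [[] for _ in range(base)]
        for v in result:
            buckets[(v // place) % base].append(v)
        # Concatenating the buckets in order keeps each pass stable,
        # which is what makes the LSD approach correct.
        result = [v for bucket in buckets for v in bucket]
        place *= base
    return result


data = [170, 45, 75, 90, 802, 24, 2, 66]
for b in (2, 10, 16, 256):
    print(b, radix_sort(data, base=b))
```

A larger base means fewer passes but more buckets per pass. Base 256 (one byte per digit) is a common practical choice.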

Radix sort also works on strings, etc. (anything that can be lexicographically ordered)

I guess it could work well for sets of strings that you know will not exceed a certain length? Beyond that it might become painful, i.e. the algorithm's complexity will start to depend on the maximum string length.

It's the same problem when working with numbers, since you have to assume some maximum number of digits as well. Radix sort essentially treats numbers as zero-padded strings.

When sorting single words consisting only of the letters A-Z, for example, you can think of it as the same thing but with 26 buckets (27 if you pad with spaces instead of As) instead of 10. Or you can think of it as a specific subset of numbers in base 36, if that makes more sense to you.
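Here is a minimal Python sketch of the A-Z case described above, assuming the input contains only uppercase words. Words are right-padded with spaces so that they all have the same number of "digits", and the space sorts before A, giving the 27 buckets mentioned. `radix_sort_words` is a hypothetical name. Note that no two strings are ever compared. The cost is one pass per character position, roughly O(width × (n + 27)), which is the dependence on maximum length raised earlier.

```python
import string

# Space pads short words and sorts before "A": 27 buckets in total.
ALPHABET = " " + string.ascii_uppercase
RANK = {ch: i for i, ch in enumerate(ALPHABET)}


def radix_sort_words(words: list[str]) -> list[str]:
    """LSD radix sort for words made only of the uppercase letters A-Z."""
    if not words:
        return []
    width = max(len(w) for w in words)
    # Right-pad with spaces so every word has the same number of "digits".
    padded = [w.ljust(width) for w in words]
    # Least significant (rightmost) character position first.
    for pos in range(width - 1, -1, -1):
        buckets: list[list[str]] = [[] for _ in ALPHABET]
        for w in padded:
            buckets[RANK[w[pos]]].append(w)
        padded = [w for bucket in buckets for w in bucket]
    # Strip the padding again (A-Z words never had trailing spaces of their own).
    return [w.rstrip() for w in padded]


print(radix_sort_words(["BANANA", "APPLE", "APP", "CHERRY", "BAN"]))
# ['APP', 'APPLE', 'BAN', 'BANANA', 'CHERRY']
```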


Well, it just means you actually see the complexity. Normally it is hidden in strcmp, which is actually O(L), where L is the length of the shorter string.

No, the point of radix sort is that you don't have to do comparisons. Radix sort on strings is the same as radix sort on numbers, just with more buckets.

What they mean is that a standard comparison sort can also become very slow if the strings are long, because a single string compare can take up to L character comparisons before it returns a result.
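To illustrate the hidden cost, here is a Python sketch of a strcmp-style three-way comparison that also counts how many character positions it examined. It is an illustration only (`compare_counting` is a hypothetical name, and C's actual strcmp works on NUL-terminated bytes). The work is bounded by the shorter string's length and grows with the length of the common prefix. A comparison sort performs O(n log n) such comparisons, so long shared prefixes cost it roughly O(L · n log n) in the worst case.

```python
def compare_counting(a: str, b: str) -> tuple[int, int]:
    """Lexicographic three-way compare (-1, 0, 1), similar in spirit to strcmp,
    that also returns how many character positions were examined."""
    steps = 0
    for x, y in zip(a, b):  # zip stops at the end of the shorter string
        steps += 1
        if x != y:
            return (-1 if x < y else 1), steps
    # One string is a prefix of the other (or they are equal): shorter sorts first.
    return (len(a) > len(b)) - (len(a) < len(b)), steps


print(compare_counting("apple", "apricot"))  # (-1, 3)
prefix = "x" * 1000
print(compare_counting(prefix + "a", prefix + "b"))  # (-1, 1001)
```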

Sorry, I meant decimal as in excluding some rationals, e.g. 1/3.

Radix sort (similar to bucket sort) groups items based on individual digits of the non-hashed values. If you hash the values first, you will end up with the data sorted according to their hashes, and they will appear almost randomly ordered in their unhashed form.
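A quick way to see this point is to sort the same keys by a scrambling hash instead of by value. The sketch below assumes the 32-bit FNV-1a hash (offset basis 0x811C9DC5, prime 0x01000193) as the example of a "scrambling" hash. Any well-mixed hash would behave similarly, and `fnv1a_32` is just an illustrative helper name.

```python
def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: XOR in each byte, then multiply by the FNV prime mod 2**32."""
    h = 0x811C9DC5  # FNV-1a 32-bit offset basis
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF  # FNV 32-bit prime
    return h


words = ["apple", "banana", "cherry", "date", "elderberry"]
by_value = sorted(words)
by_hash = sorted(words, key=lambda w: fnv1a_32(w.encode()))
print(by_value)
print(by_hash)  # same items, but ordered by hash value, not alphabetically
```

Sorting by the hash still gives a deterministic order, which is fine for grouping or deduplication, but that order carries no information about the order of the original keys.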

A hash function is any function that maps arbitrary data to fixed-size values. A radix is a type of hash. Hashes are not defined as random, nor are they required to sort differently from the unhashed values. If you define a hash function that returns the first 32 bits of its input, then you have a hash that sorts almost the same as the unhashed values, as long as the first 32 bits change frequently, and you also have a hash function that you can call a radix.
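Here is a sketch of the "first 32 bits" hash described above, assuming byte-string keys that are zero-padded when shorter than 4 bytes (`prefix32` is a hypothetical name). It is order-preserving in the non-strict sense: if `a <= b` lexicographically, then `prefix32(a) <= prefix32(b)`. Keys that share their first 4 bytes collide, which is the "as long as the first 32 bits change frequently" caveat. Used as the top digit of a base-2**32 radix sort, it acts as exactly such a radix.

```python
def prefix32(data: bytes) -> int:
    """Hash = the first 32 bits of the input as a big-endian integer,
    zero-padded if the input is shorter than 4 bytes."""
    return int.from_bytes(data[:4].ljust(4, b"\x00"), "big")


keys = [b"bandana", b"apricot", b"cherry", b"band", b"apple", b"banana"]
for k in sorted(keys):
    print(k, hex(prefix32(k)))
# The hashes come out in non-decreasing order; b"band" and b"bandana" collide.
```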

I have never heard of hash functions in the context of radix sort, or anything similar to what you describe. Wikipedia says of hash functions that they "[S]cramble the bits of the key so that the resulting values are uniformly distributed over the key space". I would say that isn't the case for the function in radix sort that is used to 'pick' a digit.

Well, good you were here, now you have heard about radixes as hashes. ;) It's good to see and understand the connections and relationships between these things.

You're right that a radix doesn't scramble the key, but the quote you've picked is a qualified subset of hash functions. That paragraph is attempting to define a practical/good hash function that is used in specific ways. Not all hash functions scramble the bits, and the Wikipedia article is very clear about this if you read the whole thing.

You skipped over two important sentences that came before it, and a whole sub-section on radix hashes after it:

"A hash function is any function that can be used to map data of arbitrary size to fixed-size values." (very first sentence, emphasis mine.)

"In some cases, the key is the datum itself." (Right before the 'scramble' quote)

https://en.wikipedia.org/wiki/Hash_function#Radix_conversion...

String hashing is sometimes similar to radix as well: "Simplistic hash functions may add the first and last n characters of a string along with the length" and I've seen string hashes in production that do only the first n characters and stop. That kind of hashing is frequently useful in small, embedded systems, video games, etc. where you have a limited set of strings and a good idea of how well distributed the keys are.
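Here are sketches of the two cheap string hashes mentioned above. Both names are hypothetical, and the parameters (`n`, `table_size`, the multiplier 31 in the prefix variant) are illustrative assumptions, not taken from any particular production system. The first sums the first and last n characters plus the length; when the string is shorter than 2n, some characters are counted twice. The second looks only at the first n characters, so any keys sharing that prefix always collide. That is acceptable only when, as the comment says, you know how the keys are distributed.

```python
def first_last_hash(s: str, n: int = 2, table_size: int = 1024) -> int:
    """Sum of the code points of the first n and last n characters, plus the length."""
    if n < 1 or table_size < 1:
        raise ValueError("n and table_size must be positive")
    total = len(s)
    total += sum(ord(c) for c in s[:n])   # first n characters
    total += sum(ord(c) for c in s[-n:])  # last n characters (n >= 1, so s[-n:] is safe)
    return total % table_size


def prefix_hash(s: str, n: int = 4, table_size: int = 1024) -> int:
    """Polynomial hash over only the first n characters; the rest of the string is ignored."""
    h = 0
    for c in s[:n]:
        h = (h * 31 + ord(c)) % table_size
    return h


print(first_last_hash("abc", n=1))  # 199: 3 (length) + 97 ('a') + 99 ('c')
print(prefix_hash("configA") == prefix_hash("configB"))  # True: only "conf" is hashed
```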


Nope, if you hash the inputs then you won't be able to order them properly.


