Hacker News new | comments | show | ask | jobs | submit login

Radixsort isn't linear (https://en.wikipedia.org/wiki/Radix_sort). I have actually published research that said Radixsort was linear, only to have it later explained to me that it is not linear, for a subtle an hard to remember reason involving number theory.

It's O(k * n) where k is the number of digits. It takes log-b(N) digits to represent N distinct integers in base-b, so O(k * N) reduces to O(N log N) when there's no bound on the range of keys.

It's still pretty useful for sorting data where you know the keys are small integers (say, less than a machine word).

Thank you!

So if you put a bound on the size of the keys, let's say 32bit, it becomes linear? Obviously it would be cheating to put a giant number here :)

Right. Basically, if the key size is bounded, then k becomes a constant and it reduces to O(N).

Actually, it's useful when 'N' (the number of elements) is much larger than 'k', the number of significant digits in the largest number in the set.

If k ∈ Θ(log N), then O(Nk) becomes O(N log N), which is asymptotically no better than any optimal comparison based sort.

However, if k ∈ o(log N), then we get an asymptotically better algorithm.

Quicksort is O(k*n log n) though since it does O(n log n) comparisons and a comparison is going to be O(k), so radix sort is still asymptotically faster.

See response to Retric:


A comparison isn't always O(k). It's O(k) for strings. It's O(1) for machine-word integers. It can be much more than O(k) for user-defined types, eg. it can be O(k^3) if you have to multiply a matrix out and take the determinant of the product.

For word sized integers, k=O(1) so radix sort is O(n) and quicksort is O(n log n). If you have a comparison function that takes O(k^3) then you most likely can't even implement a radix sort for it, so the comparison can't be made.

So the fact is that quicksort is a factor O(log n) slower than radix sort. The flip side is that quicksort is more generally applicable to any comparison function whereas radix sort only works for lexicographic sorts. In almost all cases that is exactly what you want, but for example I'm not aware of any way to efficiently sort rational numbers with radix sort.

It all depends on what you're counting. Sort algorithm analysis typically counts key comparisons or record movement. In radixsort (and trie algorithms in general), you are relying on operations on digits of the key, not comparisons of entire keys. So linear isn't necessarily wrong, but you do have to say what you're counting.

Number theory? Not seeing the connection.

It wasn't number theory, but I couldn't remember and was hand-waving.

Is it nonlinear even if you have a maximum number of digits in the numbers you're sorting (like say, 64-bit integers)?

No, if you assume that the size of each object is constant, O notation will swallow it and report that, in terms of the number of items to sort, Radix sort is linear in time.

The usual bound of Omega(n log n), proved using decision trees, is only applicable when your only operation is to compare two elements. Radix sort asks for more than this, and so it can assume a specific structure of its input, so it can violate the lower bound.

Depending on which operations you assume, sorting can become more or less easy. In the extreme case, if you can ask the array "Please sort yourself." as a basic operation, sorting is O(1). Radix sort assumes bitmasking as a basic operation, which falls into the "make things easier" spectrum, leading to an O(n) algorithm under the stated assumption of constant bit length (or any encoding, really, it doesn't really need to be bits).

In that case, most other algorithms (e.g. insertion sort) technically take linear time too (though with a rather impractical constant factor).

Can you elaborate on that. It is proven that sorting based only on comparisons has a lower bound of nlog(n) comparison operations. Given that algorithms such as insertion sort use only comparison operations to probe the data, I do not see how they can break the bound.

The O(n log n) bound is a worst-case upper bound, which is a tight bound if all elements are distinct. It doesn't apply if there are guaranteed to be a sufficient number of duplicates.

For example, one might implement a variant of InsertionSort that stores a duplicate count for distinct elements, and then an insertion sort on e.g. 32-bit integers would require at most 4294967296 comparisons per insertion -- a constant factor that can be technically ignored in the complexity analysis. (I did warn you that the constant factor would become unwieldy!)

Note that this doesn't require values to be integers -- it suffices for them to be comparable with a lot of duplicates. The variant of InsertionSort described above would require O(k×n) comparisons where `n` is length of the input list and `k` is the number of distinct values, or O(n) whenever `k` is bounded by a constant (as required for radixsort to work in linear time).

That's not to say that radixsort doesn't outperform other sorting algorithms in practice -- it usually does. However, that isn't obvious from a strictly complexity theoretic point of view.

What's nlogn for a known n? A constant - therefore O(1).

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact