
Radix Sort Revisited - ColinWright
http://codercorner.com/RadixSortRevisited.htm
======
jholman
Lots of nit-picky problems with this article, though maybe it's still good
reading overall. Still, I think you might be better off just linking to
[http://en.wikipedia.org/wiki/Category:Sorting_algorithms](http://en.wikipedia.org/wiki/Category:Sorting_algorithms)

I haven't even gotten to his bits about floats and negative floats, which I
will assume for now are very clever. But his basic grasp of basic
undergraduate algorithmics is a little... idiosyncratic.

The author defines "radix" incorrectly. "Radix" is a synonym for "base", in
the sense used in "base 16" or "base 10" or "base twenty six".

The author's first pass at explaining radix sort (up to "Radix sort is stable
by the way") is not radix sort at all, it's counting sort. Admittedly, I
suspect that any good explanation of radix sort will start by teaching some
stable linear-time sort, probably counting sort (but alternately bucket
sort)... but that doesn't mean they're the same algorithm.
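To make the distinction concrete, here is what a stable counting sort looks like (my own Python sketch of the textbook algorithm, not the article's C code):

```python
def counting_sort(values, R):
    """Stable counting sort for integers in range(R)."""
    # Count occurrences of each key.
    counts = [0] * R
    for v in values:
        counts[v] += 1
    # Prefix-sum the counts so counts[k] becomes the first
    # output index for key k.
    total = 0
    for k in range(R):
        counts[k], total = total, total + counts[k]
    # Place each value at its slot; equal keys are emitted in
    # their original order, which is what makes the sort stable.
    out = [None] * len(values)
    for v in values:
        out[counts[v]] = v
        counts[v] += 1
    return out
```

Stability is the property radix sort depends on, so any explanation that starts here is on the right track; it just shouldn't call this "radix sort".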

Then, up to the section marked Sorting Floating Point Values, he actually
explains radix sort. This part seems okay to me.
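For comparison, a byte-at-a-time LSD radix sort over 32-bit unsigned integers, which is roughly what that section describes, can be sketched like this (my Python, not the article's code; four stable counting passes with R = 256):

```python
def radix_sort_u32(values):
    """LSD radix sort for 32-bit unsigned ints: four stable
    counting passes, one per byte, least significant byte first."""
    for shift in (0, 8, 16, 24):
        counts = [0] * 256
        for v in values:
            counts[(v >> shift) & 0xFF] += 1
        # Prefix-sum so counts[b] is the first output index for byte b.
        total = 0
        for k in range(256):
            counts[k], total = total, total + counts[k]
        # Stable placement: ties on this byte keep the order
        # established by the previous (less significant) passes.
        out = [None] * len(values)
        for v in values:
            b = (v >> shift) & 0xFF
            out[counts[b]] = v
            counts[b] += 1
        values = out
    return values
```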

His analysis of the running time is atrocious... for one thing, he doesn't
distinguish between his counting-sort phase and his radix-sort phase, which
makes it a bit hard to illustrate exactly where he's being silly, but...
First, counting sort takes O(n * R), where R is the size of the range of input
values (R is known as the "radix", remember). In his case this is implicitly
256, which he appears to equate to 1 (yes, they're asymptotically equivalent
if it's always 256, but in my opinion 256 is a large enough constant factor to
be worthy of note). In a more general algorithm, this radix should be seen as
a parameter of the analysis. Of course, he indirectly addresses this in his
section about "extending to words or dwords" (i.e. his section on actual radix
sort), but it would be better to be explicit about it.

Further, radix sort does d passes, each taking O(n * R), where d is the number
of digits and R is still the Radix. What's the largest value? It's around R^d.
So if you want to sort values with a maximum of M, you need d=log_R(M). If
you assume that your range of values is at least as large as your set of
values (you have n values, remember), then you need log(n) passes of time
O(n); in other words it takes O(n log n) overall. The only way to beat this is
when you're sorting values which live in a smaller domain than the value-set,
like say this list:

    [3, 4, 5, 5, 2, 3, 3, 2, 6, 6, 2, 1, 4, 1, 1, 6, 6, 4, 3, 2]

(that's 20 values, each in the range 1..6)
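A single counting pass over that tiny domain sorts the list outright, which is where the win comes from (my sketch):

```python
values = [3, 4, 5, 5, 2, 3, 3, 2, 6, 6, 2, 1, 4, 1, 1, 6, 6, 4, 3, 2]

# Tally each key; the keys live in 1..6, so seven slots suffice.
counts = [0] * 7
for v in values:
    counts[v] += 1

# Emit each key as many times as it was seen.
out = [k for k in range(1, 7) for _ in range(counts[k])]
```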

Oh, and while I'm kvetching, he writes O(4 * n), which is just a weird thing
to write. It's not technically wrong (since after all O(n) and O(4 * n) each
refer to sets of functions, and the sets they refer to are co-extensional),
but it's a "code smell".

So. In summary, he uses a lot of sloppy terminology, is sloppy about
separation of concerns, and fails to actually analyze the actual runtime
complexity of his algorithm, except that in a special case he turns out to be
coincidentally correct.

Also note, btw, that this article was last edited in 2000 (er, plus a footnote
in 2007). But no big deal: the article isn't "news", so it doesn't really
matter that it's old; the content of this article, and my criticisms thereof,
are neither more nor less relevant now than they were in 2000.

EDITS: lots of (minor) edits in the first 10 minutes of being posted, mostly
formatting

~~~
klipt
> first, counting sort takes O(n * R)

O(n + R), actually:
[http://en.wikipedia.org/wiki/Counting_sort#Analysis](http://en.wikipedia.org/wiki/Counting_sort#Analysis)

I agree that he overstates radix sort though. Asymptotically it just changes
the runtime from O(n log n) (à la quicksort) to O(n log k) where k is the size
of the set of possible numbers, and usually n is smaller than k. Dividing into
256 parts instead of 2 at each step changes the constant, but not the
asymptotic big O. You could do the same with quicksort (at each step choose
255 random pivots and use them to define 256 buckets of sublists). That
changes the average runtime from n log_2 n to n log_256 n, but the only
difference is a constant factor.
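Sketched in Python (a hypothetical illustration; the pivot sampling and the all-one-bucket fallback are my own choices):

```python
import bisect
import random

def multiway_quicksort(values, n_buckets=256):
    """Quicksort generalized to n_buckets-way partitioning: pick up
    to n_buckets - 1 pivots at random, route every value to a bucket
    by binary search against the sorted pivots, recurse per bucket."""
    if len(values) <= 1:
        return values
    pivots = sorted(random.sample(values, min(n_buckets - 1, len(values))))
    buckets = [[] for _ in range(len(pivots) + 1)]
    for v in values:
        # Bucket i holds values v with pivots[i-1] < v <= pivots[i],
        # so concatenating sorted buckets yields a sorted list.
        buckets[bisect.bisect_left(pivots, v)].append(v)
    out = []
    for b in buckets:
        if len(b) == len(values):
            # Every value landed in one bucket (e.g. all equal):
            # fall back to the builtin sort to guarantee progress.
            out.extend(sorted(b))
        else:
            out.extend(multiway_quicksort(b, n_buckets))
    return out
```

Same asymptotics as plain quicksort, just a smaller log base, which is the point.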

~~~
jholman
Whups, yes, of course you are right. Thank you!

Note that I made this dumb error in three places (two obvious, one more
subtle, I'll leave it to the interested reader to see where).

But my overall critique of his analysis of radix sort is, I believe,
unaffected.

------
valtron
Also works for sorting strings in alphabetical order.

~~~
gamegoblin
Strings are merely numbers of a higher base (YMMV with Unicode craziness)
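Concretely (my sketch; real Unicode collation is much messier): pad byte strings to a fixed width and read them as base-256 integers, and lexicographic order falls out, so any integer radix sort also sorts the strings:

```python
def as_number(s, width):
    """Pad a byte string to `width` with NUL (the smallest byte)
    and read it as a big-endian base-256 integer."""
    return int.from_bytes(s.ljust(width, b"\x00"), "big")

words = [b"cab", b"abc", b"bca"]
# Sorting by the base-256 value matches plain lexicographic order.
by_number = sorted(words, key=lambda s: as_number(s, 3))
```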

------
taybin
"In every decent programmer’s toolbox lies a strange weapon called a Radix
Sort"

Oh no, I've never used it. I must not be a decent programmer. Downvote
downvote downvote. Don't let anyone know.

