
Faster than radix sort: Kirkpatrick-Reisch sorting - milo_im
https://sortingsearching.com/2020/06/06/kirkpatrick-reisch.html
======
vanderZwan
So since this sorting algorithm involves a trie, would there be another
optimization possibility in using a data structure inspired by the MergedTrie[0]?

My first thought would be to split the list of numbers into a prefix and a
suffix part and build two tries connected at the leaves[1][2], replacing the
trie used in the article. Then we sort both tries using the Kirkpatrick-
Reisch method (but in reverse order for the suffix trie so that the final
result is sorted correctly), and finally we would have to reconnect the two
while walking the tries.

[0]
[https://journals.plos.org/plosone/article?id=10.1371/journal...](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0215288)

[1] More or less; the MT in the linked paper works a bit differently and also
has a different use-case in mind.

[2] Also I have no idea if it makes sense to have two depth-2 tries, or if
there is _another_ algorithm out there with two depth-1 tries that _kind_ of
looks like this algorithm.

~~~
vanderZwan
So I tried working this out on paper. The simplest variation I could think of:

- split the numbers into a top and a bottom half (from now on: prefix and
suffix) (linear time)

- make an unordered suffix trie (linear time). First level has suffixes,
second level has prefixes

- make a (recursively sorted) ordered prefix set, and a (recursively sorted)
ordered suffix set

- initiate an ordered prefix trie, but only the first level for now - that
is, don't insert suffixes yet (linear time over the ordered prefix set)

- in order of the ordered suffix set, walk over the suffix trie and for each
prefix leaf insert the parent suffix into the appropriate prefix bucket in
the prefix trie (linear time)

- now we can walk the prefix trie in order and combine prefix and suffix
again (like in the article)
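For what it's worth, the steps above can be sketched in a few lines of Python. This is just my reading of the comment, not the article's algorithm: `sorted()` stands in for the recursive Kirkpatrick-Reisch calls on the half-width keys, dicts stand in for tries (Python dicts preserve insertion order), and all names are made up:

```python
def two_trie_sort(nums, bits=32):
    # Sketch of the two-trie idea described above; sorted() is a
    # placeholder for the recursive sort of the half-width key sets.
    half = bits // 2
    mask = (1 << half) - 1

    # Split each number into a prefix (high half) and suffix (low half).
    pairs = [(x >> half, x & mask) for x in nums]

    # Unordered suffix trie: first level suffixes, second level prefixes.
    suffix_trie = {}
    for p, s in pairs:
        suffix_trie.setdefault(s, []).append(p)

    # Recursively sorted prefix set and suffix set (stand-in: sorted()).
    ordered_prefixes = sorted({p for p, _ in pairs})
    ordered_suffixes = sorted(suffix_trie)

    # First level of the ordered prefix trie: empty buckets, in order.
    prefix_trie = {p: [] for p in ordered_prefixes}

    # Walk suffixes in order; drop each into its parent prefix's bucket.
    for s in ordered_suffixes:
        for p in suffix_trie[s]:
            prefix_trie[p].append(s)

    # Walk the prefix trie in order and recombine prefix and suffix.
    return [(p << half) | s
            for p, bucket in prefix_trie.items()
            for s in bucket]
```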

This _feels_ like it should have comparable computational complexity - as far
as I can see the only real difference is that it recursively sorts twice as
often (once for the prefix set and once for the suffix set). Either way it
still seems to have horrible memory overhead, requiring a trie for each level
of recursion and all that.

Then I realized that if we are at the base case where prefix/suffix can be
sorted with a counting sort, then the above can actually be simplified to LSB
radix sort where we sort the suffixes into a temporary secondary array, and
the prefixes from the secondary array into the original array (I think we can
safely say that using a plain array of _n_ elements has both lower memory
overhead and better computational performance than a trie with _n_ leaves).
But... couldn't I then optimize _the entire recursion_ into an LSB radix sort?
Which would imply it must have... worse time complexity than Kirkpatrick-
Reisch sorting? Wait what? Where did I go wrong then?
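The base case described above - one stable counting-sort pass on the suffix into a scratch array, then one on the prefix back into place - might look like this (my sketch, assuming an even key width small enough for counting sort on each half):

```python
def lsb_radix_two_pass(nums, bits=16):
    # Two-pass LSB radix sort: stable counting sort by suffix, then by
    # prefix. 'bits' is the full key width, assumed even.
    half = bits // 2
    mask = (1 << half) - 1

    def counting_pass(arr, key):
        # Stable counting sort on key(x) in range [0, 2**half).
        count = [0] * (1 << half)
        for x in arr:
            count[key(x)] += 1
        pos, total = [0] * (1 << half), 0
        for d in range(1 << half):
            pos[d], total = total, total + count[d]
        out = [0] * len(arr)
        for x in arr:
            out[pos[key(x)]] = x
            pos[key(x)] += 1
        return out

    scratch = counting_pass(nums, lambda x: x & mask)   # sort by suffix
    return counting_pass(scratch, lambda x: x >> half)  # then by prefix
```

Because each pass is stable, the second (prefix) pass preserves the suffix order within equal prefixes, which is what makes the combined result fully sorted.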

------
olliej
I suspect memory indirection would clobber the theoretical perf, but I'd be
happy to be proved wrong.

My inclination is that this would be slower than "standard" high-perf radix
sorting, but I'm not sure whether the high-level overview of this algorithm
reflects an equivalently tuned implementation.

------
oxxoxoxooo
If you are into integer sorting, this might be of interest as well:

[https://yourbasic.org/algorithms/fastest-sorting-algorithm/](https://yourbasic.org/algorithms/fastest-sorting-algorithm/)

[https://sorting.cr.yp.to/](https://sorting.cr.yp.to/)

------
nathell
Written by Tomek Czajka, a 3x TopCoder winner and algorithmic mastermind.
Worth following!

~~~
mirekrusin
I remember him from high-school programming olympiads - top place year after
year (also in math olympiads, and likely other competitions I'd have to
recall); everybody admired him.

------
1wd
O(n + n * log(w / log(n)))

Wouldn't this decrease again for large enough n, and even go negative after
n=2^(w * 2)?

~~~
karpierz
The recursion assumes that w > log(n); once w <= log(n), you're in the base
case and it's O(n).
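To make that concrete, here's a toy evaluation of the bound (my sketch; the switch to the base case is what keeps the log term from ever going negative):

```python
import math

def kr_cost_estimate(n, w):
    # Toy evaluation of the O(n + n*log(w/log n)) bound. Once
    # w <= log2(n), counting sort on full w-bit words is O(n + 2**w)
    # = O(n), so the recursive bound never applies in that regime.
    if w <= math.log2(n):
        return n  # base case: counting sort on full words
    return n + n * math.log2(w / math.log2(n))
```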

------
cwzwarich
> Faster

Benchmarks?

~~~
vvanders
Yeah, would be curious as well. There's two really _awesome_ things about
radix sort:

1. It scans in linear order, so if you tune your radix size to L1/L2 cache it
will happily beat other "faster" algorithms thanks to the prefetcher.

2. It preserves ordering for keys with the same value.

#2 makes it a really good depth-sorting algorithm for alpha rendering, and #1
just makes it darn fast. There's a nice floating-point implementation of it
out there as well.
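A minimal illustration of why #2 matters for alpha rendering - a single counting-sort pass over integer depths, back-to-front. The sprite dicts and field names here are made up; the point is that equal-depth sprites keep their submission order, which stable radix/counting sort guarantees:

```python
def stable_depth_sort(sprites, max_depth=256):
    # Bucket sprites by integer depth, then emit largest depth first
    # (Painter's Algorithm order). Appending preserves submission order
    # within each bucket, so equal-depth sprites stay stable.
    buckets = [[] for _ in range(max_depth)]
    for sprite in sprites:
        buckets[sprite["depth"]].append(sprite)
    return [s for bucket in reversed(buckets) for s in bucket]
```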

~~~
corysama
vvanders, I believe you’ve worked in games, so you might already know how the
PlayStation 1 kind of had radix sort baked into the hardware. The hardware
had no Z buffer, so all polygons had to be ordered back-to-front using the
Painter’s Algorithm for visibility. The hardware understood a linked list of
polygons, as odd as that sounds. And the standard practice presented by the
API was to have a pre-allocated linear array of NOP list nodes forming a radix
as the starting point for inserting sorted polys.

------
xiaodai
I might be missing something, but with radix sort I can sort a vector of
64-bit keys 11 bits at a time.
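That works out to ceil(64/11) = 6 stable passes with 2^11 = 2048 buckets each. A minimal sketch of that scheme (bucket lists rather than a tuned counting-sort inner loop, so it's illustrative, not high-perf):

```python
def radix_sort_64(nums, digit_bits=11):
    # LSB radix sort of 64-bit keys, digit_bits at a time:
    # 6 passes over the data, 2**11 = 2048 buckets per pass.
    mask = (1 << digit_bits) - 1
    for shift in range(0, 64, digit_bits):
        buckets = [[] for _ in range(1 << digit_bits)]
        for x in nums:
            buckets[(x >> shift) & mask].append(x)
        nums = [x for bucket in buckets for x in bucket]
    return nums
```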

