

Burstsort: Fastest known algorithm to sort large set of strings - Xichekolas
http://goanna.cs.rmit.edu.au/~jz/fulltext/alenex03.pdf

======
ryancox
This is the algorithm used in Hadoop's record-setting TeraSort benchmark:

[http://perspectives.mvdirona.com/2008/07/08/HadoopWinsTeraSo...](http://perspectives.mvdirona.com/2008/07/08/HadoopWinsTeraSort.aspx)

[http://svn.apache.org/viewvc/hadoop/core/trunk/src/examples/...](http://svn.apache.org/viewvc/hadoop/core/trunk/src/examples/org/apache/hadoop/examples/terasort/TeraSort.java?revision=673517&view=markup)

------
jws
Burstsort's design considers cache effects. (A cache miss can easily cost you
200 instructions.) In the grand old days of sorting algorithms it was
instruction count that mattered.

Judging from their graphs, Burstsort is fastest by a wide margin for
sufficiently large datasets.

~~~
Xichekolas
It also has very consistent asymptotic performance, i.e., it doesn't degrade
as badly as Quicksort when the data is strange in some way. (When a list is
already in reverse order, naive Quicksort degrades to O(n^2).)

Also oddly interesting, according to some page I was reading on Quicksort
before this, there exist adaptive algorithms that generate worst case data
sets for Quicksort no matter what the partitioning scheme is.
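
To make the reverse-order case concrete, here is a minimal sketch that counts
comparisons. It assumes "naive" means always taking the last element as the
pivot (Lomuto partitioning); the function name quicksort_comparisons is made
up for illustration and is not from the paper.

```python
import random


def quicksort_comparisons(items):
    """Sort a copy of items with a naive quicksort (last element as pivot).

    Returns (sorted_list, number_of_element_comparisons).
    """
    a = list(items)
    comparisons = 0
    stack = [(0, len(a) - 1)]  # explicit stack, so deep recursion is not an issue
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        pivot = a[hi]  # naive choice: always the last element
        i = lo
        for j in range(lo, hi):
            comparisons += 1
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]  # move the pivot to its final position
        stack.append((lo, i - 1))
        stack.append((i + 1, hi))
    return a, comparisons


n = 200
reverse_sorted = list(range(n, 0, -1))
shuffled = reverse_sorted[:]
random.Random(0).shuffle(shuffled)

_, worst = quicksort_comparisons(reverse_sorted)
_, typical = quicksort_comparisons(shuffled)
print("reverse-sorted:", worst, "comparisons")
print("shuffled:      ", typical, "comparisons")
```

With this pivot rule, every partition of reverse-sorted input peels off only
one element, so the count comes out to n(n-1)/2, while shuffled input stays
close to n log n.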

~~~
eru
How general can the partitioning scheme be? Can it generate worst cases for
random selection of the pivot?

Also, one can always select the median in (deterministic) linear time. This
way quicksort never degenerates to O(n^2).
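
A rough sketch of what I mean, assuming the classic median-of-medians
selection with groups of five. The names select and median_quicksort are
illustrative only, and this builds new lists instead of partitioning in place,
so it shows the idea rather than a tuned implementation.

```python
def select(items, k):
    """Return the k-th smallest element (0-based) via median of medians."""
    a = list(items)
    while True:
        if len(a) <= 5:
            return sorted(a)[k]
        # Median of each group of (at most) five elements.
        medians = []
        for i in range(0, len(a), 5):
            group = sorted(a[i:i + 5])
            medians.append(group[len(group) // 2])
        # The median of those medians is a provably "good enough" pivot.
        pivot = select(medians, len(medians) // 2)
        lows = [x for x in a if x < pivot]
        highs = [x for x in a if x > pivot]
        n_equal = len(a) - len(lows) - len(highs)
        if k < len(lows):
            a = lows
        elif k < len(lows) + n_equal:
            return pivot
        else:
            k -= len(lows) + n_equal
            a = highs


def median_quicksort(items):
    """Quicksort that always pivots on the exact median."""
    a = list(items)
    if len(a) <= 1:
        return a
    pivot = select(a, len(a) // 2)
    lows = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    highs = [x for x in a if x > pivot]
    # Both sides hold at most half the elements, so the depth is O(log n).
    return median_quicksort(lows) + equal + median_quicksort(highs)


print(median_quicksort(list(range(20, 0, -1))))
```

Since each level does linear work and the pivot splits the input in half, the
total stays at O(n log n) whatever the input order, though the constant factor
is much larger than with a cheap pivot rule.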

~~~
arebop
I think Xichekolas had in mind adversarial techniques such as the one
described in <http://www.cs.dartmouth.edu/~doug/mdmspe.pdf>. That paper
assumes O(1) pivot selection.

~~~
eru
I read the paper. It seems they cannot beat a median pivot (which takes O(k)
to find at each step for a subarray of k elements, but keeps quicksort's
runtime at O(n log n)).

Still, a cool program.

