
Flashsort - misja111
https://en.wikipedia.org/wiki/Flashsort
======
twic
This requires that you know the distribution before sorting; i'm not sure i
ever do.

I suppose you could pick the bounds by sampling: pick m random elements, sort
them, and use those as the upper bounds. Better yet, pick km random elements,
sort them, and use every k'th one as an upper bound. But the author's
intention is that the number of classes is a (large!) fraction of the number
of elements, so this would give you O(n log n) overall complexity, missing the
point entirely.
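
A minimal sketch of that sampling idea, assuming Python and hypothetical helper names (`sample_bounds`, `classify`, `sample_sort`) that are not from the article. Each element is classified by binary search over the bounds, which is where the extra log factor comes from once the number of classes grows with n:

```python
import bisect
import random


def sample_bounds(data, m, k, rng=random):
    """Pick k*m random elements, sort them, keep every k'th as an upper bound."""
    sample = sorted(rng.sample(data, min(len(data), k * m)))
    return sample[k - 1::k]


def classify(data, bounds):
    """Put each element in the first class whose upper bound is >= the element.

    Elements larger than every bound land in one extra class at the end.
    """
    buckets = [[] for _ in range(len(bounds) + 1)]
    for x in data:
        buckets[bisect.bisect_left(bounds, x)].append(x)
    return buckets


def sample_sort(data, m=8, k=4, rng=random):
    """Sort by sampled bounds; m and k are illustrative defaults, not tuned."""
    if len(data) <= k * m:
        return sorted(data)
    out = []
    for bucket in classify(data, sample_bounds(data, m, k, rng)):
        out.extend(sorted(bucket))
    return out
```

With `m` fixed, the binary search costs O(log m) per element; if `m` is a fraction of n, that becomes the O(n log n) mentioned above.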

I note that the classification algorithm, if run on already-sorted input, puts
the classes in reverse order. If the classes are small, this doesn't matter,
but if they were bigger, it would be worth using the TimSort trick of checking
for runs of increasing elements and just flipping them.

I also note that this algorithm involves making three passes over the data:
one to count the number of elements in each class, one to classify elements
(this pass involves random access), and another to sort the classes. No worse
than quicksort, but not ideal for an external sort, and maybe not too cache-
friendly.
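
For reference, a rough Python sketch of those three passes. It is not a port of the reference implementation: the class function assumes roughly uniformly distributed numeric keys, the default number of classes (`n // 2`) is an arbitrary assumption, and the in-place permutation fills each class front to back, so the reverse-order behaviour noted above does not show up here. The name `flashsort_sketch` is made up for illustration.

```python
def flashsort_sketch(a, m=None):
    """Sort the list a in place and return it."""
    n = len(a)
    if n < 2:
        return a
    lo, hi = min(a), max(a)
    if lo == hi:
        return a
    if m is None:
        m = max(2, n // 2)  # assumed default; the class count is a tuning knob

    def cls(x):
        # Linear interpolation between min and max, clamped to the last class.
        return min(m - 1, int((m - 1) * (x - lo) / (hi - lo)))

    # Pass 1: count the elements in each class, then derive class boundaries.
    counts = [0] * m
    for x in a:
        counts[cls(x)] += 1
    start = [0] * m
    for k in range(1, m):
        start[k] = start[k - 1] + counts[k - 1]
    end = [start[k] + counts[k] for k in range(m)]
    nxt = start[:]  # next unfilled slot of each class

    # Pass 2: move every element into its class region (random access).
    for k in range(m):
        while nxt[k] < end[k]:
            x = a[nxt[k]]
            c = cls(x)
            if c == k:
                nxt[k] += 1
            else:
                # Swap x into the next free slot of its own class.
                a[nxt[k]], a[nxt[c]] = a[nxt[c]], x
                nxt[c] += 1

    # Pass 3: insertion sort; elements are already close to their final place.
    for i in range(1, n):
        x = a[i]
        j = i - 1
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x
    return a
```

Pass 2 is the one that jumps around the array, which is the part that hurts for external sorting and caches.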

~~~
thomasmg
Related is samplesort
[https://de.wikipedia.org/wiki/Samplesort](https://de.wikipedia.org/wiki/Samplesort)
\- it is a comparison sort, and yes, the complexity is also O(n log n), same as
quicksort and merge sort. But it may have fewer branches, and possibly better
cache locality. It is stable btw. Disadvantage: it uses O(n) memory.

I made some experiments with samplesort, and found with Java it can be about
40% faster than Arrays.sort, and for C++ maybe 5-10% faster:
[https://github.com/thomasmueller/fastSort_java](https://github.com/thomasmueller/fastSort_java)
and
[https://github.com/thomasmueller/fastSort_cpp](https://github.com/thomasmueller/fastSort_cpp)

~~~
twic
Oh yes, i've basically reinvented samplesort! Fifty years too late, sadly.

~~~
thomasmg
Well, the same thing happened to me. I implemented it, wrote a benchmark, and
later was told this is samplesort...

Historic footnote: samplesort was the Python sort algorithm before it was
replaced with Timsort:
[https://bugs.python.org/issue587076](https://bugs.python.org/issue587076)
(see especially the attached timsort.txt)

------
sophiebits
Closer to how humans often sort things.

~~~
logicallee
For sure. If you had to sort a list of words, most would put ones near the top
of the alphabet (like c, d, f) at the top, the middle like o, n, p near the
middle, and ones from the end like t, s, v near the end, before putting them
in the correct order.

My latter two examples weren't even in order, I was just recalling my
impression.

~~~
firethief
That sounds more like a bucket sort, which kind of becomes flash sort as the
number of buckets approaches the number of items in the list

~~~
phamilton
I think about this when I play hearts.

Bucket by suit, sort each suit.
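
In code that is roughly the following (a toy sketch; the two-character card encoding and the suit order are assumptions for illustration):

```python
from collections import defaultdict

RANKS = "23456789TJQKA"
SUITS = "CDSH"  # clubs, diamonds, spades, hearts: an arbitrary display order


def sort_hand(hand):
    """hand: list of two-character strings like 'QS' (rank, then suit)."""
    by_suit = defaultdict(list)
    for card in hand:
        by_suit[card[1]].append(card)  # bucket by suit
    out = []
    for suit in SUITS:
        # sort each suit by rank, then lay the suits side by side
        out.extend(sorted(by_suit[suit], key=lambda c: RANKS.index(c[0])))
    return out
```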

------
Paperweight
Good for sorting hashes.

~~~
martin_a
Why would you want to sort hashes? Not sure I see the application for hashes
here, but that's totally my fault.

~~~
kyrieeschaton
To have a duplicable traversal order.

------
nift
Interesting sorting algorithm I haven't encountered before (even though it
seems to be from 1998), but one that actually makes logical sense.

However, my first thought is that it does not seem (from its concept) so easy
to implement(?). Additionally, I would have concerns about how much overhead
this calculation adds compared to just a “simple” comparison.

Maybe the calculation is worth it if the comparison is costly enough? My guess
would at least be that we would need fewer comparisons in Flashsort as we
should have a higher chance of “knowing” where things should go.

The Wikipedia article shares no plots/data (guess I should dig deeper for
that), but it would be interesting to see how well it fares against more
modern and/or optimized versions of Quicksort, as it is unclear if the claim
that it becomes faster than Quicksort is correct :)

~~~
maweki
I think assignment into the buckets can be done in parallel.

------
alanbernstein
Neat, almost like a probabilistic extension of radix sort.

~~~
DougBTX
Similar to pigeonhole sort too:

[https://en.wikipedia.org/wiki/Pigeonhole_sort](https://en.wikipedia.org/wiki/Pigeonhole_sort)

------
naich
This is basically just guessing with style.

------
shakna
The reference implementation [0] might not be the easiest to translate to a
new language, because it makes use of Fortran arrays really well, but it
shouldn't be too hard.

[0] [https://www.drdobbs.com/database/the-flashsort1-algorithm/18...](https://www.drdobbs.com/database/the-flashsort1-algorithm/184410496#l1)

~~~
Jaxan
FORTRAN got a new version in 2018! Is that new enough for you :-)?

~~~
shakna
I have no issues with Fortran. But Dimension doesn't translate easily to C
arrays, or other languages that don't have matrix support.

------
crazypython
How does this perform against timsort?

