
WikiSort – Fast, stable, O(1) space merge sort algorithm - beagle3
https://github.com/BonzaiThePenguin/WikiSort
======
_delirium
To be pedantic, in-place sorting algorithms (typically?) take O(log n) space,
not O(1). They don't copy the data to be sorted, but they do need some
temporary space that isn't fixed and scales (slowly) with the size of the
array being sorted. The usual sources of the log-n space growth are a stack
recursing on things (explicitly or implicitly) and/or the array indices
themselves.

This particular implementation (looking at the C version) uses fixed-size
'long ints' for its temporary data storage, which means it only works on
arrays up to LONG_MAX elements. If you had larger arrays, your need for
temporary data would grow, e.g. you could upgrade all those long ints to long
long ints and accommodate arrays up to LLONG_MAX. Of course, logarithmic
growth is very slow.

~~~
jmpeax
So, is there such a thing as a useful O(1)-space algorithm? I can't think of a
single one.

~~~
repsilat
(Particular, individual) regular expressions can be implemented with O(1)
space. Regular expressions can do lots of useful things.

~~~
jmpeax
This is incorrect, under _delirium's pedantic interpretation of space
complexity. Some regular expressions require accepting a certain number of
repetitions of a character. This means that storage space for the count is
required, and that storage scales logarithmically in the count.

~~~
repsilat
This is why my comment began with the words "particular, individual". A regex
like (1{5}0{3})* can be implemented in constant space, but a language like

        matches(n, m, string):
            return matches((1{n}0{m})*, string)

cannot.

Just think about it -- the former is just a DFA, and so _of course_ you can do
it in constant space (provided your input stream is abstracted away, or you
use a TM).
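To illustrate the DFA point, here is a hypothetical hand-rolled matcher for (1{5}0{3})* (my own sketch, not from the thread) whose only storage is a single integer in 0..7, independent of input length:

```python
def matches(stream):
    """DFA for (1{5}0{3})*. The only storage is `state`, an int in 0..7,
    so space stays constant no matter how long the stream is."""
    state = 0  # 0..4: number of 1s seen in the current block;
               # 5..7: five 1s plus (state - 5) zeros seen
    for ch in stream:
        if ch == '1' and state < 5:
            state += 1
        elif ch == '0' and 5 <= state <= 7:
            state = (state + 1) % 8  # the third zero (state 7) wraps back to 0
        else:
            return False             # out-of-place symbol: reject
    return state == 0  # accept only on a whole number of complete blocks
```

Feed it the input one character at a time; memory use never grows with the stream.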

------
beagle3
Also, interesting discussion on reddit:
[http://www.reddit.com/r/programming/comments/20fxv8/making_a...](http://www.reddit.com/r/programming/comments/20fxv8/making_a_fast_and_stable_sorting_algorithm_with/)

------
tbingmann
I made a video of how the algorithm works:
[http://youtu.be/NjcSyD7p660](http://youtu.be/NjcSyD7p660)

To make it I needed a C++ version with iterators, which I thought would be
faster. But it is still about 20% slower than stable_sort for the default
random-input test, and the gap probably stays about the same for other inputs.

------
beagle3
The previous innovation I remember in sorting was Python's TimSort - it's
just MergeSort with a few tweaks, but it's better than any other sort I've
encountered when applied to real-world data.

~~~
masklinn
> it's just MergeSort with a few tweaks

"a few tweaks" is a bit of an understatement; at a high level it's a hybrid of
insertion and merge sort (it's an insertion sort below 64 elements, and it
uses insertion sorts to create sorted sub-sections of 32~64 elements before
applying the main merge sort).

~~~
beagle3
Yes, it is a bit of an understatement. Other than the insertion sort at
smaller sizes, it adds:

\- it scans the array to find merge-able runs (rather than using a "standard"
size like most merge sorts); this makes it closer to O(n) for mostly-sorted
arrays, a feature mostly associated with bubble sort - but without giving up
any of the good things about MergeSort

\- it identifies "reverse runs" and simply reverses them - making mostly-
reverse-sorted arrays closer to O(n), which few other general-purpose sorts
achieve.

It's still O(n log n) in the worst case - but it works exceptionally well
on real-life datasets, which often have sorted or reversed sections.
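A rough sketch of the run-finding idea (my simplification; real TimSort also enforces a minimum run length and uses galloping during merges):

```python
def find_runs(a):
    """Split `a` into maximal runs, reversing strictly descending runs
    in place so that every returned run is ascending. Returns a list of
    (start, end) half-open index pairs."""
    runs, i, n = [], 0, len(a)
    while i < n:
        j = i + 1
        if j < n and a[j] < a[i]:            # strictly descending run
            while j < n and a[j] < a[j - 1]:
                j += 1
            a[i:j] = reversed(a[i:j])        # flip it to ascending
        else:                                # non-descending run
            while j < n and a[j] >= a[j - 1]:
                j += 1
        runs.append((i, j))
        i = j
    return runs
```

Merging adjacent runs then gives near-O(n) behaviour on already-sorted or already-reversed input, since the whole array collapses into a single run.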

~~~
TwoBit
Our benchmarks consistently show Timsort as the fastest _stable_ sort, but
introsort consistently beats it.

~~~
beagle3
Which benchmarks would that be?

TimSort as implemented in Python goes through the Python machinery of object
comparison and object management in general. Make sure you do an
apples-to-apples comparison when benchmarking.

~~~
stuhood
Timsort is in JDK7, btw.

[http://grepcode.com/file/repository.grepcode.com/java/root/j...](http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7-b147/java/util/TimSort.java)

------
taylorbuley
I'm sure people will loathe me for saying this, but I'd really like to see
this implemented in JavaScript.

We've got Crossfilter
([https://github.com/square/crossfilter/wiki/API-Reference](https://github.com/square/crossfilter/wiki/API-Reference));
however, as more data moves client-side with storage APIs like IndexedDB, I
see a need for "as efficient as possible" sorting.

~~~
phillmv
1\. Skimming the paper, it only matters if you need an efficient _stable_
sort. If you just need O(1) memory you can stay with heapsort, which at least
has more reference implementations.

This led me to look up browser sort implementations:
[http://stackoverflow.com/questions/234683/javascript-array-s...](http://stackoverflow.com/questions/234683/javascript-array-sort-implementation)
\- it seems Moz uses mergesort and Webkit may or may not do something silly
for non-contiguous arrays.

So, there _could_ be a use for it. For most applications you're fine as it
is.
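For reference, the reason heapsort gets away with O(1) extra memory is that everything happens via swaps inside the array itself; a textbook sketch (my own, not tied to any browser's implementation):

```python
def heapsort(a):
    """In-place heapsort: O(n log n) time, O(1) extra space, not stable."""
    def sift_down(root, end):
        # Push a[root] down until the max-heap property holds in a[:end].
        while 2 * root + 1 < end:
            child = 2 * root + 1
            if child + 1 < end and a[child] < a[child + 1]:
                child += 1                       # pick the larger child
            if a[root] < a[child]:
                a[root], a[child] = a[child], a[root]
                root = child
            else:
                return

    n = len(a)
    for start in range(n // 2 - 1, -1, -1):      # build the max-heap
        sift_down(start, n)
    for end in range(n - 1, 0, -1):              # repeatedly extract the max
        a[0], a[end] = a[end], a[0]
        sift_down(0, end)
```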

2\. I'm interested in hearing about applications where you're loading millions
of array elements in people's _browsers_.

I was going to be cranky and make rude comments but I can _envision_ people
wanting to play with their data without loading it in specialized
toolsets/learn R/build a DSL in $lang_of_choice.

~~~
iandanforth
I do some ML in browser for ease of visualization/portability. Don't often
need to sort all the connection weights, but hey you asked :)

~~~
taylorbuley
Here's my work-in-progress visualization with a dataset of 1 million+ IMDB
entries to be stored in IDB [http://dashdb.com/#/](http://dashdb.com/#/)

It purposefully pushes IDB way further than it should be taken in most cases.

------
dignati
This has really nice documentation, which should make it easy to implement.

~~~
danbruc
Despite the good documentation, this algorithm is complex and I bet most
implementations will be incorrect.

UPDATE: I could not find the article I had initially in mind, but I found this
one [1], showing that even prominent implementations of simple algorithms like
binary search or quicksort contain bugs more often than one expects, and the
bugs may remain unnoticed for decades.

So take this as a warning - if you implement this algorithm you will almost
surely fail no matter how smart you are or how many people look at your
implementation.

[1] [http://googleresearch.blogspot.de/2006/06/extra-extra-read-a...](http://googleresearch.blogspot.de/2006/06/extra-extra-read-all-about-it-nearly.html)

~~~
datawander
There is also a complete lack of tests. It'd be nice if there were a library
of tests for all sorting algorithms. I'm sure something exists, but nothing
widely accepted and well known.

------
TheLoneWolfling
Is there a way that better minds than mine can see to parallelize this
algorithm?

~~~
stokedmartin
The merge step [0] can be parallelized:

\- take the two sorted arrays A & B (assume both are of size n) and partition
one of them, say A, into blocks of size log n

\- now, given n/log n processors, assign each block to a processor. For each
block, take its last element (l) and do a binary search to find a cut point
in the other array B such that all elements of B before the cut are <= l.
The cut points of two consecutive blocks in A delimit a slice of B, which the
processor can then merge with its block sequentially.

Span is _O(log n)_ ; work is _O(n)_ ; so parallelism is _O(n/log n)_.
Detailed information here [1].

[0] [https://github.com/BonzaiThePenguin/WikiSort/blob/master/Cha...](https://github.com/BonzaiThePenguin/WikiSort/blob/master/Chapter%203:%20In-Place.md)

[1] [http://electures.informatik.uni-freiburg.de/portal/download/...](http://electures.informatik.uni-freiburg.de/portal/download/3/6950/)
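The scheme above can be sketched as follows (my sequential simulation of the per-processor work, using `bisect` for the cut-point search; a real implementation would hand each block/slice pair to its own processor):

```python
import heapq
from bisect import bisect_right
from math import ceil, log2

def parallel_merge(A, B):
    """Merge sorted lists A and B by cutting A into ~log(n)-sized blocks
    and binary-searching the matching cut point in B for each block.
    Each (A-block, B-slice) pair is independent of the others, so every
    pair could be merged by a different processor."""
    block = max(1, ceil(log2(len(A) + 1)))   # partition size ~ log n
    out, b_lo = [], 0
    for a_lo in range(0, len(A), block):
        a_hi = min(a_lo + block, len(A))
        last = A[a_hi - 1]                   # the element l from the scheme
        b_hi = bisect_right(B, last)         # all of B[:b_hi] are <= l
        out.extend(heapq.merge(A[a_lo:a_hi], B[b_lo:b_hi]))
        b_lo = b_hi
    out.extend(B[b_lo:])                     # whatever remains of B
    return out
```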

------
sesqu
I've been meaning to compare various in-place mergesorts, so this'll
definitely make it into my bookmarks.

------
jokoon
Why is a sort function so important?

I mean, what sorts of big data sets are people sorting that often?

~~~
rguldener
Believe it or not, sorting is one of the most common operations software
performs. As a simple example, think of how many times somebody runs a query
like `SELECT (...) FROM huge_table WHERE (...) ORDER BY (...)`.
Obviously the ORDER BY means the data needs to be (at least partially) sorted
before it can be returned. To be fair, that is a different case
algorithmically, since DBs are almost never able to sort entirely in memory.
But there are plenty of other examples where in-memory sorting is necessary
or provides advantages for later computation steps (e.g. the ability to cut
off elements larger than a certain threshold).
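That threshold/limit case is a good example of where a full sort is avoidable; a hypothetical sketch using Python's `heapq` (data is made up):

```python
import heapq

rows = [42, 7, 99, 3, 58, 14, 91, 27]

# Full sort, O(n log n) -- overkill if only a few elements are needed:
all_sorted = sorted(rows)

# Equivalent of `ORDER BY x LIMIT 3`: O(n log k) with k = 3
top3 = heapq.nsmallest(3, rows)
```

For small k this touches far less data than sorting everything, which is exactly the kind of shortcut a database's ORDER BY ... LIMIT planner exploits.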

~~~
jokoon
Yeah, but it's already implemented in db software - why would devs reinvent
the wheel?

~~~
thaumasiotes
Because "do it once, never improve again" is a bizarre philosophy?

~~~
jokoon
I think most db software is already quite well optimized.

I mean, unless you're a db software dev, and unless you're profiling it for
each use case, I wonder if you can really find something to optimize.

I just meant that it's a niche. I honestly have no idea how db software is
programmed, but I doubt any dev can claim to do better.

I guess that algorithm would interest people who recompile their db software,
or who don't use db software at all.

So here comes the question: what are the pros and cons of using db software?
Why would some devs still use plain files to store data?

------
jsonified
nice to see even in sort() we have innovation!

~~~
chrismonsanto
If you're interested in alternative sort algorithms, you might enjoy the self-
improving sort [1]. A simplified tl;dr: given inputs drawn from a particular
distribution + a training phase, the result is a sort that is optimal for that
particular distribution. The complexity is in terms of the entropy of the
distribution, and can beat the typical worst case O(n log n) for comparison
sorts.

[1]:
[http://www.cs.princeton.edu/~chazelle/pubs/selfimprove.pdf](http://www.cs.princeton.edu/~chazelle/pubs/selfimprove.pdf)

------
bmvakili
I got widely different results there: C - 105.868545%, C++ - 80.0518%,
Java - 61.664313124608775%. Probably due to optimizations that can't easily
be done in the Java version; interesting nonetheless, thanks for the post.

~~~
pjscott
Those benchmark ratios are not comparable across languages.

The C code compares running time with a very standard mergesort.

The C++ code compares with std::stable_sort.

The Java code compares with a standard mergesort -- very similar to the code
in the C version -- but has hard-to-predict JIT warmup effects.

------
apples2apples
How does it compare to an O(1) space version of quicksort?

~~~
beagle3
Quicksort is worst-case O(n^2) time, unless you incorporate something like
median-of-medians selection for your pivot (which no one ever does, because
it makes it relatively complicated - have you ever seen a guaranteed
O(n log n) quicksort implemented? I haven't; the best I've seen is
median-of-3 or median-of-5 pivots, or randomized pivots). Furthermore, I've
never seen an O(1) space version of quicksort and I'm not sure one can
exist - see, e.g.
[http://stackoverflow.com/questions/11455242/is-it-possible-t...](http://stackoverflow.com/questions/11455242/is-it-possible-to-implement-quicksort-with-o1-space-complexity)

The meaningful comparison would actually be to heapsort, which is in-place,
O(1) space, and NOT stable - though much, much simpler.
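For readers unfamiliar with the distinction: stability matters whenever the sort key is only part of the record. A quick illustration with made-up data, using Python's stable built-in sort:

```python
# Records are (name, insertion_order); we sort by name only.
records = [("b", 1), ("a", 2), ("b", 3), ("a", 4)]

# A stable sort keeps records with equal keys in their original order:
stable = sorted(records, key=lambda r: r[0])
# An unstable sort (e.g. heapsort) would be free to emit ("a", 4)
# before ("a", 2), scrambling the secondary order.
```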

ADDED:

Anyone who uses quicksort should read this gem from Doug McIlroy, which
elicits an O(n^2) behaviour from most quicksort implementations:
[http://www.cs.dartmouth.edu/~doug/mdmspe.pdf](http://www.cs.dartmouth.edu/~doug/mdmspe.pdf)

~~~
klmr
Many/most widely used “quicksort” implementations are actually introsort (in
particular, `std::sort` is), and thus O(n log n) worst case.

------
zenciadam
It's the quarterly post from the guy who just reinvented radix sort.

