WikiSort – Fast, stable, O(1) space merge sort algorithm (github.com/bonzaithepenguin)
165 points by beagle3 on March 15, 2014 | 54 comments



To be pedantic, in-place sorting algorithms (typically?) take O(log n) space, not O(1). They don't copy the data to be sorted, but they do need some temporary space that isn't fixed but scales (slowly) with the size of the array being sorted. The usual sources of the log-n space growth are a stack recursing on things (explicitly or implicitly) and/or the array indices.

This particular implementation (looking at the C version) uses fixed-size 'long ints' for its temporary data storage, which means it only works on arrays up to LONG_MAX elements. If you had larger arrays, your need for temporary data would grow, e.g. you could upgrade all those long ints to long long ints and accommodate arrays up to LLONG_MAX. Of course, logarithmic growth is very slow.


Here is what CLRS says about this:

The data types in the RAM model are integer and floating point (for storing real numbers). Although we typically do not concern ourselves with precision in this book, in some applications precision is crucial. We also assume a limit on the size of each word of data. For example, when working with inputs of size n, we typically assume that integers are represented by c lg n bits for some constant c >= 1. We require c >= 1 so that each word can hold the value of n, enabling us to index the individual input elements, and we restrict c to be a constant so that the word size does not grow arbitrarily. (If the word size could grow arbitrarily, we could store huge amounts of data in one word and operate on it all in constant time—clearly an unrealistic scenario.)

I like to think of the memory requirement as the number of integers required, not that the integers have to get larger. If we considered the size of the integers, then traditionally O(log n)-memory algorithms (like the average case of quicksort) would be deemed to take O((log n)^2) memory.


In a nutshell: keeping an index into an array of size n requires log_2(n) bits. Thus any algorithm that keeps even just one index into the input takes Omega(log n) space. Yay for pedantic asymptotics.


Heap sort is an example of an in-place sorting algorithm that is constant space. Just because this implementation uses some kind of fixed-size buffer doesn't necessarily mean there's an upper limit on the amount of data it can sort. It's possible there is one, but it probably requires a deeper understanding of the algorithm to know for sure.


Maybe I'm missing some way of implementing heapsort without array indices, but all the implementations I've seen use O(log n) space. For example, the pseudocode on Wikipedia (https://en.wikipedia.org/wiki/Heapsort#Pseudocode) at a quick glance uses five temporary variables: 'count', 'end', 'start', 'child', and 'root'. How much temporary space is this? Well, proportional to the log of the size of the array, at a minimum. If you're sorting 256-element arrays, these can each be 8-bit integers, and you need 5 bytes of temporary space. If you're sorting a 2^32 element array, these need to each be at least 32-bit integers, and you need 20 bytes of temporary space. If you have a 2^64 element array, they need to be 64-bit integers, and you need 40 bytes of temporary space. If you have a 2^128 element array, they need to be 128-bit integers, and you need 80 bytes of temporary space. Therefore, temporary space needed grows with the log of the array size. Which is very slow growth, but still growth!

The two things you can do are: 1) pick a fixed-size int and cap the max size of array you can handle; or 2) use a bigint datatype, and the size of your bigints will (asymptotically) grow with log(n).
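To make the point concrete, here is a minimal heapsort sketch (my own illustration, not WikiSort's code) following option 1: the only temporary storage is a handful of fixed-width size_t index variables, which is exactly what the log(n)-bits argument counts, and which caps the largest array it can handle at SIZE_MAX elements.

    /* heapsort with fixed-width indices: a few size_t locals are the only
       extra storage; their width is what limits the maximum array size */
    #include <stddef.h>

    static void sift_down(int *a, size_t root, size_t end) {  /* end inclusive */
        while (2 * root + 1 <= end) {
            size_t child = 2 * root + 1;                  /* left child */
            if (child + 1 <= end && a[child] < a[child + 1])
                child++;                                  /* right child is larger */
            if (a[root] >= a[child])
                return;                                   /* heap property holds */
            int tmp = a[root]; a[root] = a[child]; a[child] = tmp;
            root = child;
        }
    }

    void heapsort_fixed_index(int *a, size_t n) {
        if (n < 2) return;
        for (size_t start = (n - 2) / 2 + 1; start-- > 0; )   /* build max-heap */
            sift_down(a, start, n - 1);
        for (size_t end = n - 1; end > 0; end--) {            /* extract maxima */
            int tmp = a[0]; a[0] = a[end]; a[end] = tmp;
            sift_down(a, 0, end - 1);
        }
    }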


But this is a really irrelevant point, as you can't even represent an array of size n without log n space (for the start pointer and the size/end pointer). So, every algorithm involving arrays needs at least log n space, if nothing else for the input arguments.


It's irrelevant in practice, mostly because log(n) is very small for essentially all values of n, so one could consider it "close enough to constant". Close enough in the sense that a reasonably small constant (like "64") bounds any value of it you could possibly encounter in a sorting algorithm. But that's true of many things in big-O analysis.

If your temporary space was used by some indices represented as unary-encoded integers (which take O(n) space), it'd suddenly be harder to ignore the indices, because O(n) is not "very small" for many commonly encountered sizes of n. So I don't think taking the temporary space used by indexing into account is conceptually wrong; it just happens not to matter here, because log(n) factors often don't matter.


In practice, though, it makes sense in this case to assume integers representing array indices are constant-space and arithmetic on them takes constant time. The computers we're running these algorithms on use base 2 and the size of integer they can operate on in a single operation has scaled nicely with the amount of storage they have access to.


I understand your point now, I thought you were mainly referring to the stack space quick sort uses.

Edit: A comment above claims that without this bound, quicksort isn't even log n space.


Granted, about twenty years passed before anyone had an array of more than ~2^30 elements in Java and noticed this bug in binary search:

http://googleresearch.blogspot.ca/2006/06/extra-extra-read-a...

If people are using abstract data types and/or built-ins, in practice, presumably they can use a fixed int type for O(1) runtime and just update the data type every 20 years or so.

On that note, has anyone actually created a 2^128 element array yet? I suspect such an array would be too large to represent in the memory of all the world's computers at the moment.
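Going back to the linked bug: it is the classic midpoint overflow. A hedged C rendering (the original was Java's Arrays.binarySearch; the function name here is mine): with int indices, low + high can exceed INT_MAX once the array grows past roughly 2^30 elements, so the midpoint must be computed differently.

    /* classic binary search with the overflow-safe midpoint */
    int binary_search_fixed(const int *a, int n, int key) {
        int low = 0, high = n - 1;
        while (low <= high) {
            /* int mid = (low + high) / 2;     overflows once low+high > INT_MAX */
            int mid = low + (high - low) / 2;  /* safe midpoint computation */
            if (a[mid] < key)      low = mid + 1;
            else if (a[mid] > key) high = mid - 1;
            else                   return mid; /* found */
        }
        return -1;  /* not found */
    }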


Big-Oh notation deals with theory, and in theory the cost of a variable is fixed.

Furthermore, an array of size N normally stores N variables (or pointers), not N bits, so calculating the storage required in bits relative to the input size (without a unit) is disingenuous.


So, is there such a thing as a useful O(1)-space algorithm? I can't think of a single one.


(Particular, individual) regular expressions can be implemented with O(1) space. Regular expressions can do lots of useful things.


This is incorrect, under delirium's pedantic interpretation of space complexity. Some regular expressions require accepting a certain number of repetitions of a character. This means that storage space for the count is required, which scales logarithmically in the count.


This is why my comment began with the words "particular, individual". A regex like (1{5}0{3})* can be implemented in constant space, but a language like

    matches(n, m, string):
        return matches((1{n}0{m})*, string)
cannot.

Just think about it -- the former is just a DFA, and so of course you can do it in constant space (provided your input stream is abstracted away, or you use a TM).
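As a hedged illustration of the constant-space claim (the function name and the counter-based encoding are mine, not from the comment): the DFA for (1{5}0{3})* can be written as a pair of small bounded counters, which is just a compact spelling of a finite set of states, regardless of how long the input stream is.

    /* constant-space matcher for the fixed regex (1{5}0{3})* */
    #include <stdbool.h>
    #include <stdio.h>

    bool matches_11111000_star(FILE *in) {
        int ones = 0, zeros = 0;   /* position inside the current block */
        int c;
        while ((c = fgetc(in)) != EOF) {
            if (c == '1') {
                if (zeros != 0 || ones == 5) return false; /* '1' not allowed here */
                ones++;
            } else if (c == '0') {
                if (ones != 5) return false;               /* need five '1's first */
                if (++zeros == 3) { ones = zeros = 0; }    /* block complete */
            } else {
                return false;                              /* not in the alphabet */
            }
        }
        return ones == 0 && zeros == 0;  /* accept only at a block boundary */
    }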


    int x = 0;
    x++;

would be two examples.


How many times would you need to execute line 2? The more times you need to do this, the more values x needs to be able to represent. So running x++ n times means the space needed to store the accumulated value grows as O(log_2 n)... under delirium's "pedantic" interpretation of space complexity.

Unknowingly, you have allowed me to stumble onto an actual algorithm with O(1) storage space: one where there are n numbers and you want to calculate their modulo-k sum. In this case, the space scales logarithmically in the constant parameter k, but is O(1) in n.
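A minimal sketch of that modulo-k sum (the function name is mine; assumes k > 0): the accumulator always stays in [0, k), so the extra storage depends only on the constant parameter k and not on n (although, pedantically, the loop index still needs log n bits).

    /* O(1)-in-n space: the accumulator never exceeds k */
    #include <stddef.h>

    unsigned sum_mod_k(const unsigned *values, size_t n, unsigned k) {
        unsigned acc = 0;                        /* needs only log2(k) bits */
        for (size_t i = 0; i < n; i++)
            acc = (acc + values[i] % k) % k;     /* stays in [0, k) */
        return acc;
    }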



I made a video of how the algorithm works: http://youtu.be/NjcSyD7p660

To make it I needed a C++ version with iterators, which I thought would be faster. But it is still about 20% slower than stable_sort for the default random input test. The ratio probably stays about the same for other inputs.


The previous innovation I remember in sorting was Python's TimSort - it's just MergeSort with a few tweaks, but it's better than any other sort I've met when applied to real-world data.


> it's just MergeSort with a few tweaks

"a few tweaks" is a bit of an understatement, at a high-level it's a hybrid of insertion and merge sort (it's an insertion sort below 64 elements, and it uses insertion sorts to create sorted sub-sections of 32~64 elements before applying the main merge sort)


Yes, it is a bit of an understatement. Other than the insertion sort at smaller sizes, it adds:

- scans the array to find merge-able runs (rather than using a "standard" size like most merge sorts); this makes it closer to O(n) for mostly-sorted arrays, a feature that is mostly associated with Bubble Sorts - but without giving up any of the good things about MergeSort

- identifies "reverse runs", and just reverses them - making mostly-reverse-sorted arrays closer to O(n), which no other general sort achieves.

It's still O(n log n) in the worst case - but it just works exceptionally well on real life datasets, which often have sorted or reversed sections.
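For the curious, here is a rough sketch of the run-detection idea described above (my own simplification, not Tim Peters' actual code): scan forward to find a maximal ascending or strictly descending run, and reverse the descending ones in place so everything downstream sees ascending runs.

    /* find the run starting at a[lo], reversing it if it is descending;
       descending runs are taken strictly (<) so reversing preserves stability */
    #include <stddef.h>

    size_t count_run_and_make_ascending(int *a, size_t lo, size_t n) {
        size_t i = lo + 1;
        if (i >= n) return 1;                       /* trivial run of length 1 */
        if (a[i] < a[lo]) {                         /* strictly descending run */
            while (i + 1 < n && a[i + 1] < a[i]) i++;
            for (size_t l = lo, r = i; l < r; l++, r--) {   /* reverse in place */
                int tmp = a[l]; a[l] = a[r]; a[r] = tmp;
            }
        } else {                                    /* non-decreasing run */
            while (i + 1 < n && a[i + 1] >= a[i]) i++;
        }
        return i - lo + 1;
    }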


Our benchmarks consistently show Timsort as the fastest -stable- sort, but introsort consistently beats it.


Which benchmarks would that be?

TimSort as implemented in Python goes through the Python machinery of object comparison and object management in general. Make sure you do an apples<->apples comparison when benchmarking.



I'm sure people will loathe me for saying this, but I'd really like to see this implemented in JavaScript.

We've got Crossfilter (https://github.com/square/crossfilter/wiki/API-Reference); however, as more data moves client-side with storage APIs like IndexedDB, I see a need for "as efficient as possible" sorting.


1. Skimming the paper, it only matters if you need an efficient stable sort. If you just need O(1) memory you can stay with heapsort, which at least will have more reference implementations.

This led me to look up browser sort implementations: http://stackoverflow.com/questions/234683/javascript-array-s... - it seems Moz uses mergesort and WebKit may or may not do something silly for non-contiguous arrays.

So, there could be a use for it. For most applications, though, you're fine with the built-in sort as it is.

2. I'm interested in hearing about applications where you're loading millions of array elements in people's browsers.

I was going to be cranky and make rude comments, but I can envision people wanting to play with their data without loading it into specialized toolsets, learning R, or building a DSL in $lang_of_choice.


I do some ML in browser for ease of visualization/portability. Don't often need to sort all the connection weights, but hey you asked :)


Here's my work-in-progress visualization with a dataset of 1 million+ IMDB entries to be stored in IDB http://dashdb.com/#/

It purposefully pushes IDB way further than it should be taken in most cases.


Crossfilter's link is broken (parenthesis and semicolon were included).


This has really nice documentation, which should make it easy to implement.


Despite the good documentation this algorithm is complex and I bet most implementations will be incorrect.

UPDATE: I could not find the article I had initially in mind but I found this one [1] showing that even prominent implementations of simple algorithms like binary search or quicksort contain bugs more often than one expects and they may even remain unnoticed for decades.

So take this as a warning - if you implement this algorithm you will almost surely fail no matter how smart you are or how many people look at your implementation.

[1] http://googleresearch.blogspot.de/2006/06/extra-extra-read-a...


There is also a complete lack of tests. It'd be nice if there were a library of tests for all sorting algorithms. I'm sure something like that exists, but something more widely accepted and well-known would help.
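In the absence of such a library, a basic harness is easy to sketch (names and choices below are mine): generate arrays with plenty of duplicate keys, sort them with the candidate, and compare against a trusted reference sort; matching the reference output checks both sortedness and that the result is a permutation of the input. A real suite would loop this over many sizes and seeds and add the known nasty inputs (already sorted, reverse sorted, all equal, McIlroy's antiqsort pattern), plus keyed records to check stability.

    /* compare a candidate sort against qsort on random data with duplicates */
    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>

    typedef void (*sort_fn)(int *a, size_t n);

    static int cmp_int(const void *x, const void *y) {
        int a = *(const int *)x, b = *(const int *)y;
        return (a > b) - (a < b);
    }

    void check_sort(sort_fn sort, size_t n, unsigned seed) {
        int *a = malloc(n * sizeof *a);
        int *ref = malloc(n * sizeof *ref);
        srand(seed);
        for (size_t i = 0; i < n; i++) a[i] = rand() % 1000;  /* many duplicates */
        memcpy(ref, a, n * sizeof *a);

        sort(a, n);                           /* sort under test */
        qsort(ref, n, sizeof *ref, cmp_int);  /* trusted reference */

        /* equal output <=> sorted and a permutation of the input */
        assert(memcmp(a, ref, n * sizeof *a) == 0);
        free(a); free(ref);
    }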


Is there a way that better minds than mine can see to parallelize this algorithm?


The merge step[0] can be parallelized -

- take the two sorted arrays A & B (assume both are of size n) and make partitions of size log n in one of them, let's say A

- Now, considering there are n/log n processors, assign each partition to a processor. For each partition, take its last element (l) and do a binary search to find a cut point in the other array B such that all elements of B before the cut are <= l. The cut points of two consecutive partitions in A delimit a partition in B, which can then be sequentially merged by the processor.

Span is O(log n) and work is O(n), so parallelism is O(n / log n). Detailed information here [1]. (A rough sketch of the partitioning appears after the references below.)

[0] https://github.com/BonzaiThePenguin/WikiSort/blob/master/Cha...

[1] http://electures.informatik.uni-freiburg.de/portal/download/...
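Here is a rough sketch of that partitioning scheme (the names upper_bound, merge_seq and parallel_merge are mine, not from the lecture notes): the loop is written sequentially, but each iteration only touches its own block of A, its own slice of B and its own slice of the output, so each iteration is exactly the unit of work a processor would take in parallel. Stability when equal keys straddle a block boundary is glossed over.

    #include <stddef.h>
    #include <string.h>

    /* first index of b[0..bn) whose element is greater than key */
    static size_t upper_bound(const int *b, size_t bn, int key) {
        size_t lo = 0, hi = bn;
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (b[mid] <= key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /* ordinary sequential merge of a[0..an) and b[0..bn) into out */
    static void merge_seq(const int *a, size_t an, const int *b, size_t bn, int *out) {
        size_t i = 0, j = 0, k = 0;
        while (i < an && j < bn) out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
        while (i < an) out[k++] = a[i++];
        while (j < bn) out[k++] = b[j++];
    }

    /* partition A into blocks (caller picks block ~ log2(an)); for each block,
       binary-search its last element in B to find the matching cut point, then
       merge the (A-block, B-slice) pair -- each loop iteration is independent */
    void parallel_merge(const int *a, size_t an, const int *b, size_t bn,
                        int *out, size_t block) {
        size_t a_start = 0, b_start = 0, out_start = 0;
        while (a_start < an) {
            size_t a_end = (a_start + block < an) ? a_start + block : an;
            size_t b_end = upper_bound(b, bn, a[a_end - 1]);  /* cut point in B */
            merge_seq(a + a_start, a_end - a_start,
                      b + b_start, b_end - b_start,
                      out + out_start);
            out_start += (a_end - a_start) + (b_end - b_start);
            a_start = a_end;
            b_start = b_end;
        }
        memcpy(out + out_start, b + b_start, (bn - b_start) * sizeof(int));  /* tail of B */
    }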


I've been meaning to compare various in-place mergesorts, so this'll definitely make it into my bookmarks.


Why is a sort function so important?

I mean, what sorts of big data sets are you sorting that often?


Believe it or not, sorting is one of the most common operations software performs. As a simple example, think of how many times somebody queries something like `SELECT (...) FROM huge_table WHERE (...) ORDER BY (...)`. Obviously the ORDER BY means the data needs to be (at least partially) sorted before it can be returned. To be fair, that is a different case algorithmically, since DBs are almost never able to sort entirely in memory. But there are plenty of other examples where in-memory sorting is necessary or provides advantages for later computation steps (e.g. the ability to cut off elements larger than a certain threshold).


Yeah, but it's already implemented in DB software; why would devs reinvent the wheel?


Because "do it once, never improve again" is a bizarre philosophy?


I think most DB software is already quite well optimized.

I mean, unless you're a DB software dev, and unless you're profiling it for each use case, I wonder if you can really find something to optimize.

I just meant that it's a niche. I honestly have no idea how DB software is programmed, but I doubt any dev can claim to do better.

I guess that algorithm would interest people who recompile their DB software, or who don't use DB software at all.

So here comes the question: what are the pros and cons of using DB software? Why would some devs still use plain files to store data?


I think it's valid to ask questions like this. We get advice all the time warning us not to try to invent our own algorithms for certain things. Obviously if we all did that, though, we would make no progress as programmers.

I'm not really an expert but it looks like this algorithm does sorting in a way that doesn't require as much extra memory as others..? I could be wrong about that, but the point is that this algorithm likely has some certain situations where it performs better than others.


For 3D graphics, you need to sort objects (meshes) by Z so that you end up with semi-accurate blending (the Z-buffer does not help there).
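A toy illustration of that back-to-front pass (the Mesh struct and the function names are hypothetical): sort the translucent meshes by view-space depth, farthest first, before blending them.

    /* sort translucent meshes back-to-front before alpha blending */
    #include <stdlib.h>

    typedef struct { float view_depth; /* ... vertex data ... */ } Mesh;

    static int farther_first(const void *pa, const void *pb) {
        float da = ((const Mesh *)pa)->view_depth;
        float db = ((const Mesh *)pb)->view_depth;
        return (da < db) - (da > db);    /* larger depth (farther) sorts first */
    }

    void sort_for_blending(Mesh *meshes, size_t count) {
        /* note: qsort is not stable; a stable sort avoids popping when depths tie */
        qsort(meshes, count, sizeof(Mesh), farther_first);
    }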


Nice to see that even in sort() we have innovation!


If you're interested in alternative sort algorithms, you might enjoy the self-improving sort [1]. A simplified tl;dr: given inputs drawn from a particular distribution + a training phase, the result is a sort that is optimal for that particular distribution. The complexity is in terms of the entropy of the distribution, and can beat the typical worst case O(n log n) for comparison sorts.

[1]: http://www.cs.princeton.edu/~chazelle/pubs/selfimprove.pdf


I got widely different results there: C - 105.868545%, C++ - 80.0518%, Java - 61.664313124608775%. Probably due to optimizations that can't easily be done in the Java version; interesting nonetheless, thanks for the post.


Those benchmark ratios are not comparable across languages.

The C code compares running time with a very standard mergesort.

The C++ code compares with std::stable_sort.

The Java code compares with a standard mergesort -- very similar to the code in the C version -- but has hard-to-predict JIT warmup effects.


What are your result numbers supposed to mean?


How does it compare to an O(1) space version of quicksort?



Quicksort is worst-case O(n^2) time, unless you incorporate something like median-of-medians pivot selection (which no one ever does, because it makes the code relatively complicated. Have you ever seen a guaranteed O(n log n) quicksort implemented? I haven't - the best I've seen is median-of-3 or median-of-5 pivots, or randomized ones). Furthermore, I've never seen an O(1) space version of quicksort and I'm not sure one can exist -- see, e.g. http://stackoverflow.com/questions/11455242/is-it-possible-t...

The meaningful comparison would actually be to Heapsort, which is in-place, O(1) space, and NOT stable - though much, much simpler.

ADDED:

Anyone who uses quicksort should read this gem from Doug McIlroy, which elicits O(n^2) behaviour from most quicksort implementations: http://www.cs.dartmouth.edu/~doug/mdmspe.pdf


Many/most widely used “quicksort” implementations are actually introsort (in particular, `std::sort` is), and thus O(n log n) worst case.
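For anyone curious what that looks like, here is a rough sketch of the introsort idea (illustrative only, not std::sort's actual code): median-of-3 quicksort with a depth budget of roughly 2*log2(n), falling back to heapsort on any range where the budget runs out, so the worst case stays O(n log n). Real implementations also switch to insertion sort for tiny ranges, which is omitted here.

    #include <stddef.h>

    static void swap_int(int *x, int *y) { int t = *x; *x = *y; *y = t; }

    /* plain in-place heapsort used as the fallback */
    static void heapsort_range(int *a, size_t n) {
        for (size_t i = n / 2; i-- > 0; )                 /* build max-heap */
            for (size_t r = i, c; (c = 2 * r + 1) < n; r = c) {
                if (c + 1 < n && a[c] < a[c + 1]) c++;
                if (a[r] >= a[c]) break;
                swap_int(&a[r], &a[c]);
            }
        for (size_t end = n; end-- > 1; ) {               /* extract maxima */
            swap_int(&a[0], &a[end]);
            for (size_t r = 0, c; (c = 2 * r + 1) < end; r = c) {
                if (c + 1 < end && a[c] < a[c + 1]) c++;
                if (a[r] >= a[c]) break;
                swap_int(&a[r], &a[c]);
            }
        }
    }

    static void introsort_rec(int *a, size_t n, int depth) {
        if (n < 2) return;
        if (depth == 0) { heapsort_range(a, n); return; } /* depth budget spent */
        size_t mid = n / 2;                               /* median-of-3 pivot */
        if (a[mid] < a[0]) swap_int(&a[mid], &a[0]);
        if (a[n - 1] < a[0]) swap_int(&a[n - 1], &a[0]);
        if (a[n - 1] < a[mid]) swap_int(&a[n - 1], &a[mid]);
        swap_int(&a[mid], &a[n - 1]);                     /* pivot to the end */
        int pivot = a[n - 1];
        size_t i = 0;                                     /* Lomuto partition */
        for (size_t j = 0; j + 1 < n; j++)
            if (a[j] < pivot) swap_int(&a[i++], &a[j]);
        swap_int(&a[i], &a[n - 1]);
        introsort_rec(a, i, depth - 1);
        introsort_rec(a + i + 1, n - i - 1, depth - 1);
    }

    void introsort(int *a, size_t n) {
        int depth = 0;
        for (size_t m = n; m > 1; m >>= 1) depth++;       /* ~floor(log2(n)) */
        introsort_rec(a, n, 2 * depth);
    }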


There is no O(1) space version of quicksort.


It's the quarterly post from the guy who just reinvented radix sort.



