WikiSort – Fast, stable, O(1) space merge sort algorithm (github.com/bonzaithepenguin)
165 points by beagle3 on March 15, 2014 | 54 comments



To be pedantic, in-place sorting algorithms (typically?) take O(log n) space, not O(1). They don't copy the data to be sorted, but they do need some temporary space that isn't fixed but scales (slowly) with the size of the array being sorted. The usual sources of the log-n space growth are a stack recursing on things (explicitly or implicitly) and/or the array indices.

This particular implementation (looking at the C version) uses fixed-size 'long ints' for its temporary data storage, which means it only works on arrays up to LONG_MAX elements. If you had larger arrays, your need for temporary data would grow, e.g. you could upgrade all those long ints to long long ints and accommodate arrays up to LLONG_MAX. Of course, logarithmic growth is very slow.


Here is what CLRS says about this:

The data types in the RAM model are integer and floating point (for storing real numbers). Although we typically do not concern ourselves with precision in this book, in some applications precision is crucial. We also assume a limit on the size of each word of data. For example, when working with inputs of size n, we typically assume that integers are represented by c lg n bits for some constant c >= 1. We require c >= 1 so that each word can hold the value of n, enabling us to index the individual input elements, and we restrict c to be a constant so that the word size does not grow arbitrarily. (If the word size could grow arbitrarily, we could store huge amounts of data in one word and operate on it all in constant time—clearly an unrealistic scenario.)

I like to think of the memory requirement as the number of integers required, not that the integers have to get larger. If we considered the size of the integers, then traditionally O(log n)-memory algorithms (like the average case of quicksort) would be deemed to take O((log n)^2) memory.


In a nutshell: keeping an index into an array of size n requires log_2(n) bits. Thus any algorithm that keeps even just one index into the input takes Omega(log n) space. Yay for pedantic asymptotics.


Heap sort is an example of an in-place sorting algorithm that is constant space. Just because this implementation uses some kind of fixed-size buffer doesn't necessarily mean there's an upper limit on the amount of data it can sort. It's possible there is one, but it probably requires a deeper understanding of the algorithm to know for sure.


Maybe I'm missing some way of implementing heapsort without array indices, but all the implementations I've seen use O(log n) space. For example, the pseudocode on Wikipedia (https://en.wikipedia.org/wiki/Heapsort#Pseudocode) at a quick glance uses five temporary variables: 'count', 'end', 'start', 'child', and 'root'. How much temporary space is this? Well, proportional to the log of the size of the array, at a minimum. If you're sorting 256-element arrays, these can each be 8-bit integers, and you need 5 bytes of temporary space. If you're sorting a 2^32 element array, these need to each be at least 32-bit integers, and you need 20 bytes of temporary space. If you have a 2^64 element array, they need to be 64-bit integers, and you need 40 bytes of temporary space. If you have a 2^128 element array, they need to be 128-bit integers, and you need 80 bytes of temporary space. Therefore, temporary space needed grows with the log of the array size. Which is very slow growth, but still growth!

The two things you can do are: 1) pick a fixed-size int and cap the max size of array you can handle; or 2) use a bigint datatype, and the size of your bigints will (asymptotically) grow with log(n).
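To make the point concrete, here is a minimal heapsort sketch (my own illustration, not WikiSort's code) following option 1: the only temporary storage is a handful of fixed-width size_t index variables, which is exactly what the log(n)-bits argument counts, and which caps the largest array it can handle at SIZE_MAX elements.

    /* heapsort with fixed-width indices: a few size_t locals are the only
       extra storage; their width is what limits the maximum array size */
    #include <stddef.h>

    static void sift_down(int *a, size_t root, size_t end) {  /* end inclusive */
        while (2 * root + 1 <= end) {
            size_t child = 2 * root + 1;                  /* left child */
            if (child + 1 <= end && a[child] < a[child + 1])
                child++;                                  /* right child is larger */
            if (a[root] >= a[child])
                return;                                   /* heap property holds */
            int tmp = a[root]; a[root] = a[child]; a[child] = tmp;
            root = child;
        }
    }

    void heapsort_fixed_index(int *a, size_t n) {
        if (n < 2) return;
        for (size_t start = (n - 2) / 2 + 1; start-- > 0; )   /* build max-heap */
            sift_down(a, start, n - 1);
        for (size_t end = n - 1; end > 0; end--) {            /* extract maxima */
            int tmp = a[0]; a[0] = a[end]; a[end] = tmp;
            sift_down(a, 0, end - 1);
        }
    }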


But this is a really irrelevant point, as you can't even represent an array of size n without log n space (for the start pointer and the size/end pointer). So, every algorithm involving arrays needs at least log n space, if nothing else for the input arguments.


It's irrelevant in practice, mostly because log(n) is very small for essentially all values of n, so one could consider it "close enough to constant". Close enough in the sense that a reasonably small constant (like "64") bounds any value of it you could possibly encounter in a sorting algorithm. But that's true of many things in big-O analysis.

If your temporary space was used by some indices represented as unary-encoded integers (which take O(n) space), it'd suddenly be harder to ignore the indices, because O(n) is not "very small" for many commonly encountered sizes of n. So I don't think taking the temporary space used by indexing into account is conceptually wrong; it just happens not to matter here, because log(n) factors often don't matter.


In practice, though, it makes sense in this case to assume integers representing array indices are constant-space and arithmetic on them takes constant time. The computers we're running these algorithms on use base 2 and the size of integer they can operate on in a single operation has scaled nicely with the amount of storage they have access to.


I understand your point now, I thought you were mainly referring to the stack space quick sort uses.

Edit: A comment above claims that without this bound, quicksort isn't even log n space.


Granted, about twenty years passed before anyone had an array of more than ~2^30 elements in Java and noticed this bug in binary search:

http://googleresearch.blogspot.ca/2006/06/extra-extra-read-a...

If people are using abstract data types and/or built-ins, in practice, presumably they can use a fixed int type for O(1) runtime and just update the data type every 20 years or so.

On that note, has anyone actually created a 2^128 element array yet? I suspect such an array would be too large to represent in the memory of all the world's computers at the moment.
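Going back to the linked bug: it is the classic midpoint overflow. A hedged C rendering (the original was Java's Arrays.binarySearch; the function name here is mine): with int indices, low + high can exceed INT_MAX once the array grows past roughly 2^30 elements, so the midpoint must be computed differently.

    /* classic binary search with the overflow-safe midpoint */
    int binary_search_fixed(const int *a, int n, int key) {
        int low = 0, high = n - 1;
        while (low <= high) {
            /* int mid = (low + high) / 2;     overflows once low+high > INT_MAX */
            int mid = low + (high - low) / 2;  /* safe midpoint computation */
            if (a[mid] < key)      low = mid + 1;
            else if (a[mid] > key) high = mid - 1;
            else                   return mid; /* found */
        }
        return -1;  /* not found */
    }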


Big-Oh notation deals with theory, and in theory the cost of a variable is fixed.

Furthermore, an array of size N normally stores N variables (or pointers), not N bits, so calculating the storage required in bits relative to the input size (without a unit) is disingenuous.


So, is there such a thing as a useful O(1)-space algorithm? I can't think of a single one.


(Particular, individual) regular expressions can be implemented with O(1) space. Regular expressions can do lots of useful things.


This is incorrect, under delirium's pedantic interpretation of space complexity. Some regular expressions require accepting a certain number of repetitions of a character. This means that storage space for the count is required, which scales logarithmically in the count.


This is why my comment began with the words "particular, individual". A regex like (1{5}0{3})* can be implemented in constant space, but a language like

    matches(n, m, string):
        return matches((1{n}0{m})*, string)
cannot.

Just think about it -- the former is just a DFA, and so of course you can do it in constant space (provided your input stream is abstracted away, or you use a TM).
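As a hedged illustration of the constant-space claim (the function name and the counter-based encoding are mine, not from the comment): the DFA for (1{5}0{3})* can be written as a pair of small bounded counters, which is just a compact spelling of a finite set of states, regardless of how long the input stream is.

    /* constant-space matcher for the fixed regex (1{5}0{3})* */
    #include <stdbool.h>
    #include <stdio.h>

    bool matches_11111000_star(FILE *in) {
        int ones = 0, zeros = 0;   /* position inside the current block */
        int c;
        while ((c = fgetc(in)) != EOF) {
            if (c == '1') {
                if (zeros != 0 || ones == 5) return false; /* '1' not allowed here */
                ones++;
            } else if (c == '0') {
                if (ones != 5) return false;               /* need five '1's first */
                if (++zeros == 3) { ones = zeros = 0; }    /* block complete */
            } else {
                return false;                              /* not in the alphabet */
            }
        }
        return ones == 0 && zeros == 0;  /* accept only at a block boundary */
    }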


    int x = 0;
    x++;

would be two examples.


How many times would you need to execute line 2? The more times you need to do this, the more values x needs to be able to represent. So running x++ n times means the space needed to store the accumulated value grows as O(log_2 n)... under delirium's "pedantic" interpretation of space complexity.

Unknowingly, you have allowed me to stumble onto an actual algorithm with O(1) storage space: one where there are n numbers and you want to calculate their modulo-k sum. In this case, the space scales logarithmically in the constant parameter k, but is O(1) in n.
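A minimal sketch of that modulo-k sum (the function name is mine; assumes k > 0): the accumulator always stays in [0, k), so the extra storage depends only on the constant parameter k and not on n (although, pedantically, the loop index still needs log n bits).

    /* O(1)-in-n space: the accumulator never exceeds k */
    #include <stddef.h>

    unsigned sum_mod_k(const unsigned *values, size_t n, unsigned k) {
        unsigned acc = 0;                        /* needs only log2(k) bits */
        for (size_t i = 0; i < n; i++)
            acc = (acc + values[i] % k) % k;     /* stays in [0, k) */
        return acc;
    }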



I made a video of how the algorithm works: http://youtu.be/NjcSyD7p660

To make it I needed a C++ version with iterators, which I thought would be faster. But it is still about 20% slower than stable_sort for the default random input test. The ratio probably stays about the same for other inputs.


The previous innovation I remember in sorting was Python's TimSort - it's just MergeSort with a few tweaks, but it's better than any other sort I've met when applied to real-world data.


> it's just MergeSort with a few tweaks

"a few tweaks" is a bit of an understatement, at a high-level it's a hybrid of insertion and merge sort (it's an insertion sort below 64 elements, and it uses insertion sorts to create sorted sub-sections of 32~64 elements before applying the main merge sort)


Yes, it is a bit of an understatement. Other than the insertion sort at smaller sizes, it adds:

- scans the array to find merge-able runs (rather than using a "standard" size like most merge sorts); this makes it closer to O(n) for mostly-sorted arrays, a feature that is mostly associated with Bubble Sorts - but without giving up any of the good things about MergeSort

- identifies "reverse runs", and just reverses them - making mostly-reverse-sorted arrays closer to O(n), which no other general sort achieves.

It's still O(n log n) in the worst case - but it just works exceptionally well on real life datasets, which often have sorted or reversed sections.
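For the curious, here is a rough sketch of the run-detection idea described above (my own simplification, not Tim Peters' actual code): scan forward to find a maximal ascending or strictly descending run, and reverse the descending ones in place so everything downstream sees ascending runs.

    /* find the run starting at a[lo], reversing it if it is descending;
       descending runs are taken strictly (<) so reversing preserves stability */
    #include <stddef.h>

    size_t count_run_and_make_ascending(int *a, size_t lo, size_t n) {
        size_t i = lo + 1;
        if (i >= n) return 1;                       /* trivial run of length 1 */
        if (a[i] < a[lo]) {                         /* strictly descending run */
            while (i + 1 < n && a[i + 1] < a[i]) i++;
            for (size_t l = lo, r = i; l < r; l++, r--) {   /* reverse in place */
                int tmp = a[l]; a[l] = a[r]; a[r] = tmp;
            }
        } else {                                    /* non-decreasing run */
            while (i + 1 < n && a[i + 1] >= a[i]) i++;
        }
        return i - lo + 1;
    }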


Our benchmarks consistently show Timsort as the fastest -stable- sort, but introsort consistently beats it.


Which benchmarks would that be?

TimSort as implemented in Python goes through the Python machinery of object comparison and object management in general. Make sure you do an apples<->apples comparison when benchmarking.



I'm sure people will loathe me for saying this, but I'd really like to see this implemented in JavaScript.

We've got Crossfilter (https://github.com/square/crossfilter/wiki/API-Reference); however, as more data moves client-side with storage APIs like IndexedDB, I see a need for "as efficient as possible" sorting.


1. Skimming the paper, it only matters if you need an efficient stable sort. If you just need O(1) memory you can stay with heapsort, which at least will have more reference implementations.

This led me to look up browser sort implementations: http://stackoverflow.com/questions/234683/javascript-array-s... - it seems Moz uses mergesort and WebKit may or may not do something silly for non-contiguous arrays.

So, there could be a use for it. For most applications, though, you're fine with the built-in sort as it is.

2. I'm interested in hearing about applications where you're loading millions of array elements in people's browsers.

I was going to be cranky and make rude comments, but I can envision people wanting to play with their data without loading it into specialized toolsets, learning R, or building a DSL in $lang_of_choice.


I do some ML in browser for ease of visualization/portability. Don't often need to sort all the connection weights, but hey you asked :)


Here's my work-in-progress visualization with a dataset of 1 million+ IMDB entries to be stored in IDB http://dashdb.com/#/

It purposefully pushes IDB way further than it should be taken in most cases.


Crossfilter's link is broken (parenthesis and semicolon were included).


This has really nice documentation, which should make it easy to implement.


Despite the good documentation this algorithm is complex and I bet most implementations will be incorrect.

UPDATE: I could not find the article I had initially in mind but I found this one [1] showing that even prominent implementations of simple algorithms like binary search or quicksort contain bugs more often than one expects and they may even remain unnoticed for decades.

So take this as a warning - if you implement this algorithm you will almost surely fail no matter how smart you are or how many people look at your implementation.

[1] http://googleresearch.blogspot.de/2006/06/extra-extra-read-a...


There is also a complete lack of tests. It'd be nice if there were a library of tests for all sorting algorithms. I'm sure something like that exists, but something more widely accepted and well-known would help.
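In the absence of such a library, a basic harness is easy to sketch (names and choices below are mine): generate arrays with plenty of duplicate keys, sort them with the candidate, and compare against a trusted reference sort; matching the reference output checks both sortedness and that the result is a permutation of the input. A real suite would loop this over many sizes and seeds and add the known nasty inputs (already sorted, reverse sorted, all equal, McIlroy's antiqsort pattern), plus keyed records to check stability.

    /* compare a candidate sort against qsort on random data with duplicates */
    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>

    typedef void (*sort_fn)(int *a, size_t n);

    static int cmp_int(const void *x, const void *y) {
        int a = *(const int *)x, b = *(const int *)y;
        return (a > b) - (a < b);
    }

    void check_sort(sort_fn sort, size_t n, unsigned seed) {
        int *a = malloc(n * sizeof *a);
        int *ref = malloc(n * sizeof *ref);
        srand(seed);
        for (size_t i = 0; i < n; i++) a[i] = rand() % 1000;  /* many duplicates */
        memcpy(ref, a, n * sizeof *a);

        sort(a, n);                           /* sort under test */
        qsort(ref, n, sizeof *ref, cmp_int);  /* trusted reference */

        /* equal output <=> sorted and a permutation of the input */
        assert(memcmp(a, ref, n * sizeof *a) == 0);
        free(a); free(ref);
    }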


Is there a way that better minds than mine can see to parallelize this algorithm?


The merge step[0] can be parallelized -

- take the two sorted arrays A & B (assume both are of size n) and make partitions of size log n in one of them, let's say A

- Now, considering there are n/log n processors, assign each partition to a processor. For each partition, take its last element (l) and do a binary search to find a cut point in the other array B such that all elements of B before the cut are <= l. The cut points of two consecutive partitions in A delimit a partition in B, which can then be sequentially merged by the processor.

Span is O(log n) and work is O(n), so parallelism is O(n / log n). Detailed information here [1]. (A rough sketch of the partitioning appears after the references below.)

[0] https://github.com/BonzaiThePenguin/WikiSort/blob/master/Cha...

[1] http://electures.informatik.uni-freiburg.de/portal/download/...
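Here is a rough sketch of that partitioning scheme (the names upper_bound, merge_seq and parallel_merge are mine, not from the lecture notes): the loop is written sequentially, but each iteration only touches its own block of A, its own slice of B and its own slice of the output, so each iteration is exactly the unit of work a processor would take in parallel. Stability when equal keys straddle a block boundary is glossed over.

    #include <stddef.h>
    #include <string.h>

    /* first index of b[0..bn) whose element is greater than key */
    static size_t upper_bound(const int *b, size_t bn, int key) {
        size_t lo = 0, hi = bn;
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (b[mid] <= key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /* ordinary sequential merge of a[0..an) and b[0..bn) into out */
    static void merge_seq(const int *a, size_t an, const int *b, size_t bn, int *out) {
        size_t i = 0, j = 0, k = 0;
        while (i < an && j < bn) out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
        while (i < an) out[k++] = a[i++];
        while (j < bn) out[k++] = b[j++];
    }

    /* partition A into blocks (caller picks block ~ log2(an)); for each block,
       binary-search its last element in B to find the matching cut point, then
       merge the (A-block, B-slice) pair -- each loop iteration is independent */
    void parallel_merge(const int *a, size_t an, const int *b, size_t bn,
                        int *out, size_t block) {
        size_t a_start = 0, b_start = 0, out_start = 0;
        while (a_start < an) {
            size_t a_end = (a_start + block < an) ? a_start + block : an;
            size_t b_end = upper_bound(b, bn, a[a_end - 1]);  /* cut point in B */
            merge_seq(a + a_start, a_end - a_start,
                      b + b_start, b_end - b_start,
                      out + out_start);
            out_start += (a_end - a_start) + (b_end - b_start);
            a_start = a_end;
            b_start = b_end;
        }
        memcpy(out + out_start, b + b_start, (bn - b_start) * sizeof(int));  /* tail of B */
    }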


I've been meaning to compare various in-place mergesorts, so this'll definitely make it into my bookmarks.


Why is a sort function so important?

I mean, what sorts of big data sets are you sorting that often?


Believe it or not, sorting is one of the most common operations software performs. As a simple example, think of how many times somebody queries something like `SELECT (...) FROM huge_table WHERE (...) ORDER BY (...)`. Obviously the ORDER BY means the data needs to be (at least partially) sorted before it can be returned. To be fair, that is a different case algorithmically, since DBs are almost never able to sort entirely in memory. But there are plenty of other examples where in-memory sorting is necessary or provides advantages for later computation steps (e.g. the ability to cut off elements larger than a certain threshold).


Yeah, but it's already implemented in DB software; why would devs reinvent the wheel?


Because "do it once, never improve again" is a bizarre philosophy?


I think most DB software is already quite well optimized.

I mean, unless you're a DB software dev, and unless you're profiling it for each use case, I wonder if you can really find something to optimize.

I just meant that it's a niche. I honestly have no idea how DB software is programmed, but I doubt any dev can claim to do better.

I guess that algorithm would interest people who recompile their DB software, or who don't use DB software at all.

So here comes the question: what are the pros and cons of using DB software? Why would some devs still use plain files to store data?


I think it's valid to ask questions like this. We get advice all the time warning us not to try to invent our own algorithms for certain things. Obviously if we all did that, though, we would make no progress as programmers.

I'm not really an expert but it looks like this algorithm does sorting in a way that doesn't require as much extra memory as others..? I could be wrong about that, but the point is that this algorithm likely has some certain situations where it performs better than others.


For 3D graphics, you need to sort objects (meshes) by Z so that you end up with semi-accurate blending (the Z-buffer does not help there).
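A toy illustration of that back-to-front pass (the Mesh struct and the function names are hypothetical): sort the translucent meshes by view-space depth, farthest first, before blending them.

    /* sort translucent meshes back-to-front before alpha blending */
    #include <stdlib.h>

    typedef struct { float view_depth; /* ... vertex data ... */ } Mesh;

    static int farther_first(const void *pa, const void *pb) {
        float da = ((const Mesh *)pa)->view_depth;
        float db = ((const Mesh *)pb)->view_depth;
        return (da < db) - (da > db);    /* larger depth (farther) sorts first */
    }

    void sort_for_blending(Mesh *meshes, size_t count) {
        /* note: qsort is not stable; a stable sort avoids popping when depths tie */
        qsort(meshes, count, sizeof(Mesh), farther_first);
    }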


Nice to see that even in sort() we have innovation!


If you're interested in alternative sort algorithms, you might enjoy the self-improving sort [1]. A simplified tl;dr: given inputs drawn from a particular distribution + a training phase, the result is a sort that is optimal for that particular distribution. The complexity is in terms of the entropy of the distribution, and can beat the typical worst case O(n log n) for comparison sorts.

[1]: http://www.cs.princeton.edu/~chazelle/pubs/selfimprove.pdf


I got widely different results there: C - 105.868545%, C++ - 80.0518%, Java - 61.664313124608775%. Probably due to optimizations that can't easily be done in the Java version; interesting nonetheless, thanks for the post.


Those benchmark ratios are not comparable across languages.

The C code compares running time with a very standard mergesort.

The C++ code compares with std::stable_sort.

The Java code compares with a standard mergesort -- very similar to the code in the C version -- but has hard-to-predict JIT warmup effects.


What are your result numbers supposed to mean?


How does it compare to an O(1) space version of quicksort?



Quicksort is worst-case O(n^2) time, unless you incorporate something like median-of-medians pivot selection (which no one ever does, because it makes the code relatively complicated. Have you ever seen a guaranteed O(n log n) quicksort implemented? I haven't - the best I've seen is median-of-3 or median-of-5 pivots, or randomized ones). Furthermore, I've never seen an O(1) space version of quicksort and I'm not sure one can exist -- see, e.g. http://stackoverflow.com/questions/11455242/is-it-possible-t...

The meaningful comparison would actually be to Heapsort, which is in-place, O(1) space, and NOT stable - though much, much simpler.

ADDED:

Anyone who uses quicksort should read this gem from Doug McIlroy, which elicits O(n^2) behaviour from most quicksort implementations: http://www.cs.dartmouth.edu/~doug/mdmspe.pdf


Many/most widely used “quicksort” implementations are actually introsort (in particular, `std::sort` is), and thus O(n log n) worst case.
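For anyone curious what that looks like, here is a rough sketch of the introsort idea (illustrative only, not std::sort's actual code): median-of-3 quicksort with a depth budget of roughly 2*log2(n), falling back to heapsort on any range where the budget runs out, so the worst case stays O(n log n). Real implementations also switch to insertion sort for tiny ranges, which is omitted here.

    #include <stddef.h>

    static void swap_int(int *x, int *y) { int t = *x; *x = *y; *y = t; }

    /* plain in-place heapsort used as the fallback */
    static void heapsort_range(int *a, size_t n) {
        for (size_t i = n / 2; i-- > 0; )                 /* build max-heap */
            for (size_t r = i, c; (c = 2 * r + 1) < n; r = c) {
                if (c + 1 < n && a[c] < a[c + 1]) c++;
                if (a[r] >= a[c]) break;
                swap_int(&a[r], &a[c]);
            }
        for (size_t end = n; end-- > 1; ) {               /* extract maxima */
            swap_int(&a[0], &a[end]);
            for (size_t r = 0, c; (c = 2 * r + 1) < end; r = c) {
                if (c + 1 < end && a[c] < a[c + 1]) c++;
                if (a[r] >= a[c]) break;
                swap_int(&a[r], &a[c]);
            }
        }
    }

    static void introsort_rec(int *a, size_t n, int depth) {
        if (n < 2) return;
        if (depth == 0) { heapsort_range(a, n); return; } /* depth budget spent */
        size_t mid = n / 2;                               /* median-of-3 pivot */
        if (a[mid] < a[0]) swap_int(&a[mid], &a[0]);
        if (a[n - 1] < a[0]) swap_int(&a[n - 1], &a[0]);
        if (a[n - 1] < a[mid]) swap_int(&a[n - 1], &a[mid]);
        swap_int(&a[mid], &a[n - 1]);                     /* pivot to the end */
        int pivot = a[n - 1];
        size_t i = 0;                                     /* Lomuto partition */
        for (size_t j = 0; j + 1 < n; j++)
            if (a[j] < pivot) swap_int(&a[i++], &a[j]);
        swap_int(&a[i], &a[n - 1]);
        introsort_rec(a, i, depth - 1);
        introsort_rec(a + i + 1, n - i - 1, depth - 1);
    }

    void introsort(int *a, size_t n) {
        int depth = 0;
        for (size_t m = n; m > 1; m >>= 1) depth++;       /* ~floor(log2(n)) */
        introsort_rec(a, n, 2 * depth);
    }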


There is no O(1) space version of quicksort.


It's the quarterly post from the guy who just reinvented radix sort.



