To be pedantic, in-place sorting algorithms (typically?) take O(log n) space, not O(1). They don't copy the data to be sorted, but they do need some temporary space that isn't fixed and instead scales (slowly) with the size of the array being sorted. The usual sources of the log-n space growth are a recursion stack (explicit or implicit) and/or the array indices.
This particular implementation (looking at the C version) uses fixed-size 'long ints' for its temporary data storage, which means it only works on arrays up to LONG_MAX elements. If you had larger arrays, your need for temporary data would grow, e.g. you could upgrade all those long ints to long long ints and accommodate arrays up to LLONG_MAX. Of course, logarithmic growth is very slow.
The data types in the RAM model are integer and floating point (for storing real numbers). Although we typically do not concern ourselves with precision in this book, in some applications precision is crucial. We also assume a limit on the size of each word of data. For example, when working with inputs of size n, we typically assume that integers are represented by c lg n bits for some constant c >= 1. We require c >= 1 so that each word can hold the value of n, enabling us to index the individual input elements, and we restrict c to be a constant so that the word size does not grow arbitrarily. (If the word size could grow arbitrarily, we could store huge amounts of data in one word and operate on it all in constant time—clearly an unrealistic scenario.)
I like to think of the memory requirement as the number of integers required, not that the integers have to get larger. If we counted the size of the integers, then traditional O(log n) memory algorithms (like the average case of quicksort) would be deemed to take O((log n)^2) memory.
In a nutshell: keeping an index into an array of size n requires log_2(n) bits. Thus any algorithm that keeps even just one index into the input takes Omega(log n) space. Yay for pedantic asymptotics.
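To make the bookkeeping concrete, here's a small C sketch (purely illustrative, names are mine) that prints how many bits an index needs for a few array sizes - the growth is exactly the slow logarithmic growth being discussed:

    #include <stdio.h>

    /* Number of bits needed to represent x, i.e. floor(log2(x)) + 1 for x > 0. */
    static unsigned bits_needed(unsigned long long x)
    {
        unsigned bits = 0;
        while (x > 0) { bits++; x >>= 1; }
        return bits;
    }

    int main(void)
    {
        unsigned long long sizes[] = { 256ULL, 1ULL << 32, 18446744073709551615ULL };
        for (int i = 0; i < 3; i++)
            printf("array of %llu elements -> %u-bit index\n",
                   sizes[i], bits_needed(sizes[i] - 1));   /* indices run 0..n-1 */
        return 0;
    }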
Heap sort is an example of an in-place sorting algorithm that is constant space. Just because this implementation uses some kind of fixed-size buffer doesn't mean it has an upper limit on the amount of data it can sort. It is possible, but it probably requires a deeper understanding of the algorithm to know for sure.
Maybe I'm missing some way of implementing heapsort without array indices, but all the implementations I've seen use O(log n) space. For example, the pseudocode on Wikipedia (https://en.wikipedia.org/wiki/Heapsort#Pseudocode) at a quick glance uses five temporary variables: 'count', 'end', 'start', 'child', and 'root'. How much temporary space is this? Well, proportional to the log of the size of the array, at a minimum. If you're sorting 256-element arrays, these can each be 8-bit integers, and you need 5 bytes of temporary space. If you're sorting a 2^32 element array, these need to each be at least 32-bit integers, and you need 20 bytes of temporary space. If you have a 2^64 element array, they need to be 64-bit integers, and you need 40 bytes of temporary space. If you have a 2^128 element array, they need to be 128-bit integers, and you need 80 bytes of temporary space. Therefore, temporary space needed grows with the log of the array size. Which is very slow growth, but still growth!
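For concreteness, here is a rough C sketch of textbook heapsort (loosely following the Wikipedia pseudocode mentioned above; this is my own illustrative code, not the implementation under discussion). The only temporary storage is a handful of index variables plus a swap temporary, and each index has to be wide enough to address the array - hence the O(log n) bits:

    #include <stddef.h>

    static void swap_int(int *a, size_t i, size_t j)
    {
        int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
    }

    /* Restore the max-heap property for the heap stored in a[start..end]. */
    static void sift_down(int *a, size_t start, size_t end)
    {
        size_t root = start;
        while (2 * root + 1 <= end) {
            size_t child = 2 * root + 1;
            if (child + 1 <= end && a[child] < a[child + 1])
                child++;                      /* pick the larger child */
            if (a[root] >= a[child])
                return;
            swap_int(a, root, child);
            root = child;
        }
    }

    static void heapsort_sketch(int *a, size_t count)
    {
        if (count < 2) return;
        for (size_t start = count / 2; start-- > 0; )    /* heapify */
            sift_down(a, start, count - 1);
        for (size_t end = count - 1; end > 0; end--) {   /* repeatedly extract the max */
            swap_int(a, 0, end);
            sift_down(a, 0, end - 1);
        }
    }

Every one of 'count', 'start', 'end', 'root', and 'child' is an index whose width must scale with the array size, which is exactly the point above.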
The two things you can do are: 1) pick a fixed-size int and cap the max size of array you can handle; or 2) use a bigint datatype, and the size of your bigints will (asymptotically) grow with log(n).
But this is a really irrelevant point, as you can't even represent an array of size n without log n space (for the start pointer and the size/end pointer). So, every algorithm involving arrays needs at least log n space, if nothing else for the input arguments.
It's irrelevant in practice, mostly because log(n) is very small for essentially all values of n, so one could consider it "close enough to constant". Close enough in the sense that a reasonably small constant (like "64") bounds any value of it you could possibly encounter in a sorting algorithm. But that's true of many things in big-O analysis.
If your temporary space was used by some indices represented as unary-encoded integers (which take O(n) space), it'd suddenly be harder to ignore the indices, because O(n) is not "very small" for many commonly encountered sizes of n. So I don't think taking the temporary space used by indexing into account is conceptually wrong, it just happens not to matter here because log(n) factors often don't matter.
In practice, though, it makes sense in this case to assume integers representing array indices are constant-space and arithmetic on them takes constant time. The computers we're running these algorithms on use base 2 and the size of integer they can operate on in a single operation has scaled nicely with the amount of storage they have access to.
If people are using abstract data types and/or built-ins, in practice, presumably they can use a fixed int type for O(1) runtime and just update the data type every 20 years or so.
On that note, has anyone actually created a 2^128 element array yet? I suspect such an array would be too large to represent in the memory of all the world's computers at the moment.
Big-Oh notation deals with theory, and in theory the cost of a variable is fixed.
Furthermore, an array of size N is normally storing N variables (pointers) not N bits, so calculating storage required in bits relative to the input size (without a unit) is disingenuous.
This is incorrect, under delirium's pedantic interpretation of space complexity. Some regular expressions require one to accept a certain number of occurrences of a character. This means that storage space for the count is required, which scales logarithmically in the count.
This is why my comment began with the words "particular, individual". A regex like (1{5}0{3})* can be implemented in constant space, but a language like

    matches(n, m, string):
        return matches((1{n}0{m})*, string)

cannot.
Just think about it -- the former is just a DFA, and so of course you can do it in constant space (provided your input stream is abstracted away, or you use a TM).
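As an illustration, here's a hand-rolled DFA in C for (1{5}0{3})* (my own sketch, assuming the input arrives as a NUL-terminated string). The only state it keeps is the position within the current "11111000" block, so the working space is constant regardless of the input length:

    #include <stdbool.h>
    #include <stdio.h>

    /* Accepts exactly the strings matched by (1{5}0{3})*. */
    static bool matches_block_lang(const char *s)
    {
        int state = 0;                      /* 0..7: offset within the current block */
        for (; *s; s++) {
            char expected = (state < 5) ? '1' : '0';
            if (*s != expected)
                return false;               /* dead state: reject */
            state = (state + 1) % 8;        /* wrap after a complete "11111000" */
        }
        return state == 0;                  /* accept only on a block boundary */
    }

    int main(void)
    {
        printf("%d %d %d\n",
               matches_block_lang(""),              /* 1 */
               matches_block_lang("11111000"),      /* 1 */
               matches_block_lang("1111000"));      /* 0 */
        return 0;
    }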
How many times would you need to execute line 2? The more times you need to do this, the more values x needs to be able to represent. So running x++ O(n) times means the space needed to store the accumulator grows as O(log_2 n)... under delirium's "pedantic" interpretation of space complexity.
Unknowingly, you have allowed me to stumble onto an actual algorithm with O(1) storage space: there are n numbers, and you want to calculate their modulo-k sum. In this case, the algorithm scales logarithmically in the constant parameter k, but is O(1) in n.
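A quick C sketch of that (names and signature are mine): the accumulator never exceeds k - 1, so its width depends only on k, not on n:

    #include <stddef.h>

    /* Sum of n numbers modulo a fixed k (0 < k <= UINT_MAX / 2 assumed, to avoid overflow). */
    static unsigned mod_k_sum(const unsigned *a, size_t n, unsigned k)
    {
        unsigned acc = 0;
        for (size_t i = 0; i < n; i++)   /* the loop index i is the usual O(log n) culprit */
            acc = (acc + a[i] % k) % k;
        return acc;
    }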
To make it I needed a C++ version with iterators, which I thought would be faster. But it is still about 20% slower than stable_sort for the default random-input test. It probably also stays about the same for other inputs.
The previous innovation in sorting that I remember was Python's TimSort - it's just MergeSort with a few tweaks, but it's better than any other sort I've met when applied to real-world data.
"a few tweaks" is a bit of an understatement, at a high-level it's a hybrid of insertion and merge sort (it's an insertion sort below 64 elements, and it uses insertion sorts to create sorted sub-sections of 32~64 elements before applying the main merge sort)
Yes, it is a bit of an understatement. Other than the insertion sort at smaller sizes, it adds:
- scans the array to find merge-able runs (rather than using a "standard" size like most merge sorts); this makes it closer to O(n) for mostly-sorted arrays, a feature mostly associated with Bubble Sort - but without giving up any of the good things about MergeSort
- identifies "reverse runs" and just reverses them - making mostly-reverse-sorted arrays closer to O(n), which no other general sort achieves.
It's still O(n log n) in the worst case - but it just works exceptionally well on real life datasets, which often have sorted or reversed sections.
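For a feel of the run handling, here is a minimal illustrative sketch in C (not CPython's actual code; names are mine) of the "find a run, reverse it if it's descending" step:

    #include <stddef.h>

    /* Reverse a[lo..hi) in place. */
    static void reverse_range(int *a, size_t lo, size_t hi)
    {
        while (lo + 1 < hi) {
            int tmp = a[lo]; a[lo] = a[hi - 1]; a[hi - 1] = tmp;
            lo++; hi--;
        }
    }

    /* Return the length of the run starting at lo; if that run is strictly
       descending, reverse it so the caller always sees an ascending run. */
    static size_t count_run(int *a, size_t lo, size_t n)
    {
        size_t hi = lo + 1;
        if (hi == n) return 1;
        if (a[hi] < a[lo]) {                                  /* strictly descending */
            while (hi + 1 < n && a[hi + 1] < a[hi]) hi++;
            reverse_range(a, lo, hi + 1);
        } else {                                              /* non-descending */
            while (hi + 1 < n && a[hi + 1] >= a[hi]) hi++;
        }
        return hi - lo + 1;
    }

The real thing then extends short runs to a minimum length with insertion sort and merges runs off a small stack, which is where most of the cleverness (and the galloping mode) lives.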
TimSort as implemented in Python goes through the Python machinery of object comparison and object management in general. Make sure you do an apples<->apples comparison when benchmarking.
1. Skimming the paper, it only matters if you need an efficient stable sort. If you just need O(1) memory you can stay with heapsort, which at least will have more reference implementations.
So, there could be a use for it. For most applications you're about fine as it is.
2. I'm interested in hearing about applications where you're loading millions of array elements in people's browsers.
I was going to be cranky and make rude comments, but I can envision people wanting to play with their data without loading it into specialized toolsets, learning R, or building a DSL in $lang_of_choice.
Despite the good documentation this algorithm is complex and I bet most implementations will be incorrect.
UPDATE: I could not find the article I had initially in mind but I found this one [1] showing that even prominent implementations of simple algorithms like binary search or quicksort contain bugs more often than one expects and they may even remain unnoticed for decades.
So take this as a warning - if you implement this algorithm you will almost surely fail no matter how smart you are or how many people look at your implementation.
There is also a complete lack of tests. It'd be nice if there were a library of tests for all sorting algorithms - I'm sure one exists, but I mean something more widely accepted and well known.
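As a sketch of what I mean, something like this randomized cross-check against qsort (entirely hypothetical - 'my_sort' is just an assumed name and signature for the implementation under test):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Assumed name and signature for the sort implementation under test. */
    extern void my_sort(int *a, size_t n);

    static int cmp_int(const void *x, const void *y)
    {
        int a = *(const int *)x, b = *(const int *)y;
        return (a > b) - (a < b);
    }

    /* Sort a random array both ways and compare; small value ranges force
       lots of duplicates, which is where many sorting bugs hide. */
    static int check_once(size_t n, int range)
    {
        int *a = malloc(n * sizeof *a), *b = malloc(n * sizeof *b);
        for (size_t i = 0; i < n; i++) a[i] = rand() % (range + 1);
        memcpy(b, a, n * sizeof *a);
        my_sort(a, n);
        qsort(b, n, sizeof *b, cmp_int);
        int ok = memcmp(a, b, n * sizeof *a) == 0;
        free(a); free(b);
        return ok;
    }

    int main(void)
    {
        srand(12345);                              /* fixed seed: reproducible failures */
        for (int trial = 0; trial < 1000; trial++)
            if (!check_once((size_t)(rand() % 2000) + 1, 8)) {
                printf("MISMATCH on trial %d\n", trial);
                return 1;
            }
        puts("all trials passed");
        return 0;
    }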
- take the two sorted arrays A & B (assume both are of size n) and make partitions of size log n in one of them, let's say A
- Now, assuming there are n/log n processors, assign each partition to a processor. For each partition, take its last element (l) and do a binary search in the other array B to find a cut point such that all elements of B before it are <= l. The cut points of two consecutive partitions in A delimit a partition in B, which can then be merged sequentially by the processor (a sketch of the cut-point search is below).
Span is O(log n); work is O(n); so parallelism is O(n / log n).
Detailed information here [1]
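For reference, the cut-point search from the second step might look like this in C (my own sketch; B is the other sorted array and l is the last element of the current partition of A):

    #include <stddef.h>

    /* Return the first index in B[0..nB) whose element is greater than l,
       so everything before it is <= l and belongs to this merge job. */
    static size_t cut_point(const int *B, size_t nB, int l)
    {
        size_t lo = 0, hi = nB;                 /* invariant: answer lies in [lo, hi] */
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (B[mid] <= l) lo = mid + 1;
            else             hi = mid;
        }
        return lo;
    }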
Believe it or not, sorting is one of the most common operations software performs. As a simple example, think of how many times somebody queries something like `SELECT (...) FROM huge_table WHERE (...) ORDER BY (...)`. Obviously the ORDER BY means the data needs to be (at least partially) sorted before it can be returned. To be fair, that is a different case algorithmically, since DBs are almost never able to sort entirely in memory. But there are plenty of other examples where in-memory sorting is necessary or provides advantages for later computation steps (e.g. the ability to cut off elements larger than a certain threshold).
I think it's valid to ask questions like this. We get advice all the time warning us not to try to invent our own algorithms for certain things. Obviously if we all did that, though, we would make no progress as programmers.
I'm not really an expert, but it looks like this algorithm does sorting in a way that doesn't require as much extra memory as others? I could be wrong about that, but the point is that this algorithm likely has certain situations where it performs better than others.
If you're interested in alternative sort algorithms, you might enjoy the self-improving sort [1]. A simplified tl;dr: given inputs drawn from a particular distribution + a training phase, the result is a sort that is optimal for that particular distribution. The complexity is in terms of the entropy of the distribution, and can beat the typical worst case O(n log n) for comparison sorts.
I got widely different results there:
C - 105.868545%
C++ - 80.0518%
Java - 61.664313124608775%
Probably due to the optimizations there, which can't easily be done in the Java one; interesting nonetheless, thanks for the post.
Quicksort is worst-case O(n^2) time, unless you incorporate something like a worst-case linear-time median selection (median-of-medians) for your pivot (which no one ever does, because it makes it relatively complicated. Have you ever seen an O(n log n) guaranteed quicksort implemented? I haven't - the best I've seen is median-of-3 or median-of-5 pivots, or randomized). Furthermore, I've never seen an O(1) space version of quicksort and I'm not sure one can exist -- see, e.g. http://stackoverflow.com/questions/11455242/is-it-possible-t...
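For what it's worth, the median-of-3 pivot choice mentioned above is about this much code (illustrative sketch only - it dodges the already-sorted case but offers no O(n log n) guarantee):

    #include <stddef.h>

    /* Return whichever of indices i, j, k points at the median of the three values. */
    static size_t median_of_3(const int *a, size_t i, size_t j, size_t k)
    {
        if (a[i] < a[j]) {
            if (a[j] < a[k]) return j;          /* a[i] < a[j] < a[k] */
            return (a[i] < a[k]) ? k : i;
        } else {
            if (a[i] < a[k]) return i;          /* a[j] <= a[i] < a[k] */
            return (a[j] < a[k]) ? k : j;
        }
    }

A quicksort would typically call this with lo, (lo + hi) / 2, and hi before partitioning.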
The meaningful comparison would actually be to Heapsort, which is in-place, O(1) space, and NOT stable - though much, much simpler.
ADDED:
Anyone who uses quicksort should read this gem from Doug McIlroy, which elicits an O(n^2) behaviour from most quicksort implementations: http://www.cs.dartmouth.edu/~doug/mdmspe.pdf -