Writing a Faster Sorting Algorithm
201 points by zefei on Jan 3, 2017 | 33 comments

Neat algorithm, but it is largely comparing apples to oranges. std::sort is universal, ska_sort is not:

> Another problem is that I'm not sure what to do for data that I can't sort. For example this algorithm can not sort a vector of std::sets.

> Another problem is that right now there can only be one sorting behavior per type. You have to provide me with a sort key, and if you provide me with an integer, I will sort your data in increasing order. If you wanted it in decreasing order, there is currently no easy interface to do that.

Useful algorithm, but we already knew there are faster algorithms if we introduce additional requirements!

With some generic programming, even complex data can be radix sorted in linear time. Google the Discriminator papers of Fritz Henglein.

Do you know of any resource which details these faster algorithms when given extra requirements?

Radix sort (and its variant, trie sort) is the biggest one: https://en.wikipedia.org/wiki/Radix_sort

The O(n log n) lower bound applies only to comparison-based sorts. If you don't take a comparison function, but instead look at the internal structure of the keys, you can do better; of course that depends on that internal structure and on what order you want the elements in.

This is a classic. In fact it's close to the first algorithm ever patented: SyncSort, patented in 1971. Still being developed and sold, SyncSort remains a leading technology for big sorts; when you need to sort a few terabytes, they have a product for that. It's a radix sort with self-adjusting bucket boundaries, so it can deal with keys that are not well distributed.

Can't test the code: there are numerous compilation errors with my version of gcc and libc++.

Besides what others have written, the author seems to have taken only one measurement for each case, and more importantly there is no mention of randomization at all.
If he sorted all input sizes using algorithm A and then all input sizes using algorithm B (instead of interleaving the runs in random order), there may be bias in the measurements (due to caches, branch prediction, etc.), especially if he used the same input and only varied the size. I'm also worried about what he used to measure time: he mentions that for a small number of elements a sort takes a few microseconds. gettimeofday(3) has a resolution of 1 microsecond, so the measured time is dangerously close to the clock resolution; clock_gettime(3), with its 1 ns resolution, should be used instead if available, or the C++ equivalent for nanosecond precision.

Making the algorithm code available is not enough; it is just as important (perhaps more so) to make the benchmark code available. The "tests and benchmarks" code available on GitHub is a single .hpp, unfit for reproduction. I want to see the raw data used in the plots and the code that generated it, and I want to be able to run it myself. Otherwise there's no reason to trust it.

It might just be faster because the author is comparing their hybrid radix sort implementation to std::sort (which is comparison-based). It would be more valid to compare against Boost's "spreadsort", which is similarly a hybrid radix sort algorithm (and also outperforms std::sort).

I had a feeling that O(N) is pretty much impossible for the general use case.

It is impossible for comparison-based sorts. Sketch of proof: there are N! different arrangements, and sorting is equivalent to figuring out which of those N! arrangements we are seeing. In the worst case, each binary comparison can eliminate at most 50% of the search space. Hence the worst case may need log_2(N!) comparisons, which is O(N log N). However, it might be possible if comparisons are not essential, e.g.
radix sort is, with some assumptions, O(N).

I think there's a provable mathematical impossibility to having a general O(N) sorting algorithm. Basically: how many elements do you have to move in a list of N elements to end up with one of the possible orderings?

It's not impossible to have an O(N) sorting algorithm if you add more constraints, e.g. use operations other than comparisons. Most people would be fine with a hybrid radix sort for day-to-day use.

You only ever have to move every element once to reach a (pre-known) permutation representing the sorted list. The problem is the information in a permutation: there are n! possible orderings, and each comparison can only ever "throw away" half of them, leading to log_2(n!), i.e. n log(n).

Why doesn't the STL switch to the hybrid sorting approach?

The STL sort uses operator<, so they can't implement it as a radix sort without changing the interface. Maybe they could manage to special-case it for some fundamental types, but most likely they'd need to add a new function, as was done for stable_sort.

AFAIK the STL is free to use any algorithm it wants as long as it's O(n log n).

This is correct. The C++ standard specifies that sort must have a big-O complexity of n log(n). This can be seen on PDF page 925 of this working draft from 2014.

So providing a faster sort would be a violation? The standard doesn't seem to say "or better" here, but I know that in other places it does (or at least used to say something similar).

No: f = O(n log n) means that f doesn't grow significantly faster than n log n. That is true for f = n log n, but it's also true for f = n. The "or better" is implicit in the use of big-O. Note that this wouldn't be true if the standard said it has to be Θ(n log n).

Big-O complexity is an upper bound. If something is O(1), it's also O(n) and O(n!), since those are all upper bounds.

The STL does use hybrid sorting.
It is usually implemented as introsort, which begins with quicksort and switches to heapsort when the recursion gets too deep for the number of elements being sorted (typically finishing small sub-ranges with insertion sort).

Sorry if that wasn't clear. I'm asking why they don't switch to a hybrid sort similar to Boost's, since it's known to be faster for most cases?

They probably should. After all, Boost is often a staging area for the C++ standard library.

"the" STL?

STL means "standard template library." It's perfectly reasonable to say "the standard template library."

I think the author just wanted to point out that "the" STL is not valid/accurate here: while the interface and behavior are defined, the implementation is AFAIK still vendor-specific, e.g. the implementation in GCC may be different from the one provided by Microsoft. There are several compatible and competing implementations of the C++ standard library.

Just to share, since it is related: you can sort faster than a naive O(n log n) algorithm whenever the data comes as partitions that are sorted relative to each other but whose contents are not, i.e. the largest element in each partition is smaller than the smallest in the next partition. Say the total number of elements is n and the average size of a partition is m. Then you need only apply an O(m log m) sort to each of the n/m partitions, for O(n log m) total. Substituting n = m^c gives log m = (1/c) log n, so the sort takes O((n/c) log n) time: a factor-of-c speedup over an O(n log n) sort of the whole array. This trick is usable when you want to sort a B-tree-like structure where the data in each node is unsorted but the nodes themselves are ordered. The file system hierarchy on a machine is like that; the default output of zfs list is too.

Every sort algorithm should take locality of data access into account. Without this, a comparison to other algorithms may not be fair.
Why pick this one detail to account for, and not one of the infinitely many other details that may be relevant in certain circumstances?

> This is a trivial function which is less complicated than the comparison operator that you would have to write for std::sort.

Unless it uses the same interface (i.e. calls a comparator for each evaluation), it's apples to oranges. I also didn't see comparisons vs. swaps enumerated, which matters for complex data structures or complex comparators.

I'd be happy to see how well it behaves compared to the famous Timsort (https://en.wikipedia.org/wiki/Timsort).

Watching std::sort in a sorting-sound visualization, I can only wonder: why this second, fine-grained sorting pass, long after the first sorted part has been pushed out of cache?