
Quadsort: a stable non-recursive merge sort - geophertz
https://github.com/scandum/quadsort
======
nightcracker
C's qsort() is notoriously slow because the comparison function, passed as a
function pointer, isn't inlined, meaning the majority of the time is spent on
call overhead.
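
A minimal sketch of the difference:

    #include <algorithm>
    #include <cstdlib>
    
    // qsort's comparator is a function pointer: every comparison goes
    // through an indirect call the compiler usually cannot inline.
    static int cmp_int(const void *a, const void *b) {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }
    
    int main() {
        int arr[] = {3, 1, 2};
        std::qsort(arr, 3, sizeof(int), cmp_int);
    
        // std::sort takes the comparator as a template argument, so the
        // lambda below is typically inlined into the sorting loop.
        std::sort(arr, arr + 3, [](int a, int b) { return a < b; });
    }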

Zhangxp1998 ported quadsort to C++, compared it with std::sort, and found it to
be slower:
[https://gist.github.com/zhangxp1998/0e2fa30656c894017d183e0d...](https://gist.github.com/zhangxp1998/0e2fa30656c894017d183e0dbf2d862a)

Shameless self promotion, if you want a faster sorting implementation, check
out pdqsort
([https://github.com/orlp/pdqsort](https://github.com/orlp/pdqsort)) which is
almost twice as fast as libstdc++'s std::sort for randomly shuffled integers,
and has all sorts of tricks for input distributions with patterns (like pre-
sorted, reverse sorted, single out-of-order element appended, etc).

~~~
vhvjkyhkogvv
Comparing a stable sort to an unstable sort is unfair. Compare it with
std::stable_sort.

~~~
zhangxp1998
I agree it's unfair. The point of my benchmark was that the OP's claim "quad
sort is faster than quicksort" is false.

~~~
abjKT26nO8
libstdc++'s std::sort doesn't implement quicksort per se. It implements
introsort[1]. I'm curious how a pure C implementation of introsort would fare
against std::sort.

[1]:
[https://en.wikipedia.org/wiki/Introsort](https://en.wikipedia.org/wiki/Introsort)

~~~
zhangxp1998
It's just quicksort with insertion sort for small base cases.

~~~
abjKT26nO8
You forgot about heapsort. It's a combination of three sorting algorithms, not
two. However trivial the difference may seem, I'd still prefer to look at a "C
introsort vs C++ introsort" benchmark rather than a "C quicksort vs C++ often
quicksort, but not really" benchmark.
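
For reference, roughly the shape of introsort (an illustrative sketch, not
libstdc++'s actual code):

    #include <algorithm>
    
    // Hoare partition around the middle element; returns the cut index.
    static int hoare_partition(int* a, int n) {
        int pivot = a[n / 2];
        int lo = -1, hi = n;
        for (;;) {
            do { ++lo; } while (a[lo] < pivot);
            do { --hi; } while (pivot < a[hi]);
            if (lo >= hi) return hi + 1;
            std::swap(a[lo], a[hi]);
        }
    }
    
    // Quicksort until the depth budget runs out, then heapsort; ranges
    // of <= 16 elements are left for one final insertion-sort pass.
    static void intro_loop(int* a, int n, int depth) {
        while (n > 16) {
            if (depth-- == 0) {
                std::make_heap(a, a + n);   // heapsort fallback gives the
                std::sort_heap(a, a + n);   // O(n log n) worst-case bound
                return;
            }
            int cut = hoare_partition(a, n);
            intro_loop(a + cut, n - cut, depth);  // right part recursively
            n = cut;                              // left part iteratively
        }
    }
    
    void introsort(int* a, int n) {
        int depth = 0;
        for (int m = n; m > 1; m >>= 1) depth += 2;  // ~2 * log2(n)
        intro_loop(a, n, depth);
        for (int i = 1; i < n; ++i) {     // insertion sort: fast because
            int v = a[i];                 // the array is now nearly sorted
            int j = i;
            while (j > 0 && v < a[j - 1]) { a[j] = a[j - 1]; --j; }
            a[j] = v;
        }
    }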

------
nneonneo
How does this fare against Python's famous Timsort (used by several languages
and systems)? How about the dual-pivot quicksort used by Java for primitive
arrays?

Someone must have put together a nice benchmark comparing many sorting
algorithms. I wish the author had done some benchmarking first, so that the
proposed algorithm could properly be positioned w.r.t. state-of-the-art
techniques.

~~~
monadic2
I’d like to see this too, but benchmarking sort algorithms is a pain in the ass
due to the wide array of input shapes and sizes, so I wouldn’t expect a well-
maintained benchmark suite for language X.

------
jph
Summary: this is a non-recursive merge sort with improvements.

Benchmark of quadsort() versus C qsort():

* ~10x faster on forward-order items

* ~2x faster on reverse-order items

* ~equivalent on random-order items

Improvements:

* Ordering: when blocks of items are in order, or in reverse order, apply special-case handling, which gives quadsort O(n + log n) instead of qsort's O(n * log n).

* Boundaries: compare the data itself rather than performing a traditional merge sort's wasteful boundary checks (see the sketch below).
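
A sketch of that boundary idea in general form (not quadsort's actual code):
if the last element of the left run is known to be <= the last element of the
right run, the right run can never be exhausted first, so the hot loop only
bounds-checks the left run.

    // Merge sorted runs A and B into out. The caller arranges that
    // A[na - 1] <= B[nb - 1] (swapping the roles of A and B otherwise),
    // so A must run out first and B needs no bounds check in the loop.
    void merge_runs(const int* A, int na, const int* B, int nb, int* out) {
        int i = 0, j = 0;
        while (i < na)                       // only A is bounds-checked
            *out++ = (B[j] < A[i]) ? B[j++] : A[i++];
        while (j < nb)                       // copy B's remaining tail
            *out++ = B[j++];
    }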

~~~
chalst
Good summary. Quibble: O(n + log n) = O(n).

~~~
kannmig
I think OP made the distinction intentional to “show their working”, so to
speak.

------
xucheng
Quicksort is not the fastest sorting algorithm. It would be nice if there were
a benchmark comparing it with other state-of-the-art algorithms like Timsort.

~~~
nwellnhof
It depends on the implementation but Quicksort typically beats Timsort with
random data. Quicksort is unstable though, so that's not a fair comparison.

------
proc0
Interesting, but I'd be surprised if this is the first sorting algorithm that
swaps more than two elements at a time. I would have guessed every possible
variation of sorting algorithms had already been explored, proven, and tested.

~~~
thaumaturgy
It isn't. Sorting networks
([https://en.wikipedia.org/wiki/Sorting_network](https://en.wikipedia.org/wiki/Sorting_network))
provide the best possible in-place sorting combinations for N elements, where
N is currently < about 11. If somebody could find a way to generalize the
sorting network algorithm for any number of input elements, they'd be settling
the issue of sorting things pretty much forever.

At a lazy glance, it looks kind of like Quadsort has re-derived a variation of
the 4-element sorting network.
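
For reference, the optimal 4-input network needs only five compare-exchanges,
and which pairs get compared never depends on the data:

    #include <algorithm>
    
    // Optimal 4-element sorting network: five compare-exchange steps.
    void sort4(int v[4]) {
        auto cx = [](int& a, int& b) { if (b < a) std::swap(a, b); };
        cx(v[0], v[1]);  cx(v[2], v[3]);  // layer 1: sort both pairs
        cx(v[0], v[2]);  cx(v[1], v[3]);  // layer 2: min of mins, max of maxes
        cx(v[1], v[2]);                   // layer 3: fix the middle pair
    }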

~~~
csense
> If somebody could find a way to generalize the sorting network algorithm for
> any number of input elements

Section 2 of that Wikipedia article links to many constructions that do
exactly that.

A couple are O(n log(n)), but complicated and impractical. The algorithms
people use are O(n log^2(n)) ones.

FWIW, sorting networks are _data-independent_. I.e. for any input, you always
compare-and-swap the same elements.

~~~
thaumaturgy
I was imprecise. I was talking about the construction of optimal sorting
networks. You're correct to point out that there are several approaches now
for constructing sorting networks for arbitrary numbers of inputs, but as
noted in that section, they all have some very important tradeoffs. I said,
"...provide the _best_ possible in-place sorting combinations for N elements,
where N is currently < about 11"; the 11 there was a reference to this part
from the second section of that article:

"For one to eleven inputs, minimal (i.e. size-optimal) sorting networks are
known, and for higher values, lower bounds on their sizes S(n) can be derived
inductively using a lemma due to Van Voorhis: S(n + 1) ≥ S(n) + ⌈log2(n)⌉. The
first ten optimal networks have been known since 1969, with the first eight
again being known as optimal since the work of Floyd and Knuth, but optimality
of the cases n = 9 and n = 10 took until 2014 to be resolved.[11] An optimal
network for size 11 was found in December of 2019 by Jannis Harder..."

Clearly I also should've said < ~12.

~~~
csense
The point I was trying to make is that sorting networks restrict your
algorithm to a data-independent memory access pattern.

For some particular N, you might be able to find the smallest sorting network
that sorts N elements. And then you might be able to find an _even faster_
algorithm that sorts N elements using some strategy that's not equivalent to a
sorting network (e.g., if your strategy does data-dependent memory access).

------
kccqzy
This reminds me of a programming exercise I was asked to write when I first
learned programming: write a sorting program generator that, given N, generates
a program that sorts an array of N elements optimally: the generated code has
N! branches, one for each possible permutation. With some CSE help from the
compiler, it can be really quite fast at the expense of code size.

The author's explanation isn't entirely clear, but it seems similar to the
above construction with a fixed N and then a merge sort afterwards.
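
Roughly what the generated code might look like for N = 3 (a hypothetical
sketch; 3! = 6 leaf branches, at most ceil(log2(6)) = 3 comparisons per path):

    // One leaf per permutation, each ending in straight stores of the
    // sorted order; untouched slots are already in place.
    void sort3(int* a) {
        int x = a[0], y = a[1], z = a[2];
        if (x <= y) {
            if (y <= z)      { /* already x y z */ }
            else if (x <= z) { a[1] = z; a[2] = y; }            // x z y
            else             { a[0] = z; a[1] = x; a[2] = y; }  // z x y
        } else {
            if (x <= z)      { a[0] = y; a[1] = x; }            // y x z
            else if (y <= z) { a[0] = y; a[1] = z; a[2] = x; }  // y z x
            else             { a[0] = z; a[2] = x; }            // z y x
        }
    }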

~~~
praptak
A truly optimal sort for a given N is a nontrivial problem. By truly optimal I
mean the actual absolute minimum number of comparisons, not an O(...)
approximation.

For five elements, the lower bound from counting the permutations is
ceil(log2(5!)) = 7, which says you cannot sort 5 elements in fewer than 7
comparisons.

An actual 7-comparison algorithm exists, but it is not easy to write. For
larger numbers it gets much, much trickier; in general the log(#permutations)
lower bound cannot be met.
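
A throwaway sketch to reproduce that bound for small N:

    #include <cmath>
    #include <cstdio>
    
    // Prints the information-theoretic lower bound ceil(log2(n!)) on
    // worst-case comparisons for small n; n = 5 gives 7.
    int main() {
        double log2_factorial = 0.0;
        for (int n = 2; n <= 10; ++n) {
            log2_factorial += std::log2(n);  // log2(n!) = sum of log2(k)
            std::printf("n = %2d: >= %2.0f comparisons\n",
                        n, std::ceil(log2_factorial));
        }
    }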

~~~
kccqzy
Agreed. My use of the word "optimal" in the original comment was a bit
careless.

~~~
praptak
Hope my comment did not come across as a disagreement. Just wanted to add some
relevant detail.

------
xeeeeeeeeeeenu
In his benchmark, the author assumes that qsort() is implemented using
quicksort, but that's not necessarily true. For example, glibc uses mergesort
(although it falls back to quicksort if the system is short on memory).

~~~
hinkley
I can’t imagine the sort of bugs you get when your code relies on stable sort
but calls qsort and everything works great until the machine is under heavy
load.

~~~
Sean1708
The most annoying part of development: your users can and _will_ rely on any
observable behaviour of your software.

~~~
hinkley
Someone recently did something very stupid/clever with an API that I wrote.

In the code review I initially complained, but then I couldn’t think of any
other way to interpret the design. So I guess that’s a feature now. ‘Course
later we found a performance problem, but, you know…

------
zhangxp1998
qsort has to invoke your comparison function through a pointer repeatedly,
which incurs a lot of overhead. Try C++'s std::sort.

~~~
zhangxp1998
See
[https://gist.github.com/zhangxp1998/0e2fa30656c894017d183e0d...](https://gist.github.com/zhangxp1998/0e2fa30656c894017d183e0dbf2d862a)
for a comparison of quadsort with C++'s std::sort. The compare functions are
inlined.

~~~
thedance
Thanks for saving us the time. The punch line:

    Summarize: Slower than std::sort except on random tail.

Also a very important tidbit: std::sort is 10x faster than C's qsort for
ordered inputs.

------
Naac
Since we're already using O(N) space, it would be interesting to see how this
compares to radix sort[0], which is O(N) space but O(N) time (due to just
hashing everything).

Like others have said, it would be cool to see quadsort stacked up against
other current state-of-the-art sorting algorithms.

[0]
[https://en.wikipedia.org/wiki/Radix_sort](https://en.wikipedia.org/wiki/Radix_sort)
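
For reference, a minimal LSD radix sort sketch for unsigned 32-bit keys, one
byte per pass:

    #include <cstdint>
    #include <vector>
    
    // Least-significant-digit radix sort: four stable counting-sort
    // passes, one per byte of the key.
    void radix_sort(std::vector<uint32_t>& a) {
        std::vector<uint32_t> buf(a.size());
        for (int shift = 0; shift < 32; shift += 8) {
            size_t count[257] = {0};
            for (uint32_t x : a)                  // histogram of digits
                count[((x >> shift) & 0xFF) + 1]++;
            for (int d = 0; d < 256; ++d)         // prefix sums give the
                count[d + 1] += count[d];         // start of each bucket
            for (uint32_t x : a)                  // stable scatter pass
                buf[count[(x >> shift) & 0xFF]++] = x;
            a.swap(buf);
        }
    }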

~~~
stkdump
Small nitpick: radix sort has nothing to do with hashing.

~~~
darawk
A radix sort is a type of hashing, no? You're bucketing the items based on a
reduced form projection of them onto some smaller subspace.

~~~
stkdump
Radix sort (similar to bucket sort) groups items based on individual digits of
the non-hashed values. If you hash the values before, you will end up with the
data being sorted according to their hash, but they will appear almost random
in their unhashed form.

~~~
dahart
A hash function is any function that maps arbitrary data to fixed-size values.
A radix is a type of hash. Hashes are not defined as random, nor required to
sort differently than the unhashed values. If you define a hash function that
returns the first 32 bits of its input, then you have a hash that sorts almost
the same as the unhashed values (as long as the first 32 bits change
frequently), and you also have a hash function that you can call a radix.
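
Concretely, such a function might look like this (an illustrative sketch):

    #include <cstdint>
    
    // A legitimate, if simplistic, hash function: map arbitrary 64-bit
    // data to a fixed-size value by keeping the leading 32 bits. It
    // sorts (almost) like the raw keys, and it acts as a radix.
    uint32_t prefix_hash(uint64_t key) { return (uint32_t)(key >> 32); }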

~~~
stkdump
I have never heard of hash functions in the context of radix sort or anything
similar to what you describe. Wikipedia says about hash functions that they
"[s]cramble the bits of the key so that the resulting values are uniformly
distributed over the key space". I would say that isn't the case for the
function in radix sort that is used to 'pick' a digit.

~~~
dahart
Well, good thing you were here; now you have heard about radixes as hashes. ;)
It's good to see and understand the connections and relationships between these
things.

You're right that a radix doesn't scramble the key, but the quote you picked
describes a qualified subset of hash functions. That paragraph is attempting to
define a practical/good hash function that is used in specific ways. Not all
hash functions scramble the bits, and the Wikipedia article is very clear about
this if you read the whole thing.

You skipped over two important sentences that came before it, and a whole sub-
section on radix hashes after it:

"A hash function is _any_ function that can be used to map data of arbitrary
size to fixed-size values." (very first sentence, emphasis mine.)

"In some cases, the key is the datum itself." (Right before the 'scramble'
quote)

[https://en.wikipedia.org/wiki/Hash_function#Radix_conversion...](https://en.wikipedia.org/wiki/Hash_function#Radix_conversion_hashing)

String hashing is sometimes similar to radix as well: "Simplistic hash
functions may add the first and last n characters of a string along with the
length" and I've seen string hashes in production that do only the first n
characters and stop. That kind of hashing is frequently useful in small,
embedded systems, video games, etc. where you have a limited set of strings
and a good idea of how well distributed the keys are.

------
QuinnWilton
I've never seen a sorting algorithm that uses a non-binary comparison function
to order values. Is that a novel technique?

It seems really obvious in hindsight, so I'm sure there's just prior art I
don't know about.

~~~
klodolph
This still uses binary comparisons.

There are, in general, a number of different sorting algorithms which are
optimized for a specific number of elements. In this case, four. It uses five
binary comparisons to sort four elements.

You can find other algorithms like this, such as an algorithm that uses seven
comparisons to sort five items, or one that uses ten comparisons to sort six
items.

~~~
QuinnWilton
Ah, I completely misunderstood what was going on with the quad swap at the
start. Rereading it makes more sense. Thanks!

------
loeg
Seems it's mergesort but with a slightly more complicated comparison
primitive.

~~~
proc0
Or it's like mergesort without the wasted steps, which are needed
proportionally less as the data becomes less random.

------
chkas
This quicksort is almost twice as fast:

[https://rextester.com/XHCGA23293](https://rextester.com/XHCGA23293)

~~~
pvidler
Only in the random case. On input already sorted in either direction, it's
~10x slower.

~~~
chkas
Who wants to sort already-sorted data? If the input data is often sorted, you
can test for that before sorting.

~~~
simias
Sorting sorted or mostly-sorted arrays is not uncommon in many use cases.

~~~
chkas
"Mostly-sorted" is a very vague definition.

~~~
seanhunter
Usually what people mean by "mostly sorted" in CS is that there is some small K
such that each element of the input is no more than K places from the position
it would occupy if the input were sorted.
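
That definition is also what makes the property algorithmically useful; for
instance, a K-sorted array can be fully sorted in O(n log K) with a heap that
never grows past K + 1 elements. A sketch, assuming ints:

    #include <functional>
    #include <queue>
    #include <vector>
    
    // Sorts an array in which every element is at most k places from
    // its sorted position, using a min-heap of at most k + 1 elements.
    void sort_k_sorted(std::vector<int>& a, size_t k) {
        std::priority_queue<int, std::vector<int>, std::greater<int>> heap;
        size_t out = 0;
        for (size_t i = 0; i < a.size(); ++i) {
            heap.push(a[i]);
            if (heap.size() > k + 1) {   // the smallest of the next k + 1
                a[out++] = heap.top();   // inputs is the next output
                heap.pop();
            }
        }
        while (!heap.empty()) { a[out++] = heap.top(); heap.pop(); }
    }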

~~~
chkas
According to this definition, the "random tail" test data is not "mostly
sorted".

~~~
seanhunter
Well, you could extend the definition to allow a small number of items that
are entirely out of place. The point is that the right sort algorithm depends
a lot on the distribution of the input data and how much you care about worst-
case vs. average-case trade-offs.

------
thdespou
I'm a bit sceptical because I don't see any mathematical proof, only
benchmarks, which do not prove a lot. It may be faster only by a constant
factor on a particular machine, but for sufficiently large n it would only be
as fast as mergesort.

1000000 is peanuts. We need to see convergence for billions of numbers or more.

~~~
mcherm
No one is claiming that this sort algorithm (or any other) is asymptotically
faster than O(n log(n)). It can be mathematically proven that no comparison-
based sort algorithm can improve on this. The objective with sort algorithms is
to find one with good constant-factor performance on typical inputs.
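
The lower bound is a short counting argument: a comparison sort is a binary
decision tree whose leaves are the n! possible input orderings, and a binary
tree of height h has at most 2^h leaves, so

    2^h >= n!   =>   h >= log2(n!) = Theta(n log n)

with the last step following from Stirling's approximation.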

Frankly, I am more persuaded by the arguments in favor of an algorithm like
Timsort, which doesn't claim to micro-optimize for the hardware (that's
basically what "better swapping" is claiming), but instead claims to be
particularly efficient on some commonly seen patterns (like "partially sorted"
or "pieces of the list are reverse-sorted"). Of course, any kind of argument
about why one sort is better than another is inferior to actual benchmarks.

~~~
amcsi
I'm quite a noob, and this is a bit off-topic, but if it's mathematically
proven that no sorting algorithm can be faster than O(n*log(n)), yet we know
for sure that checking whether an array is sorted is O(n), then doesn't that
prove that P ≠ NP?

~~~
C4stor
No, it doesn't, since O(n*log(n)) is in P too. P = NP doesn't imply that
finding a solution has exactly the same complexity as verifying it, just that
both are polynomial.

------
aliabd
Man, that's a good gif

~~~
jerf
You may enjoy this:
[https://www.youtube.com/watch?v=kPRA0W1kECg](https://www.youtube.com/watch?v=kPRA0W1kECg)

There are several similar videos on YouTube demonstrating sorts. If that's a
bit sterile for you, you can get a more human touch via the playlist
[https://www.youtube.com/watch?v=EdIKIf9mHk0&list=PLOmdoKois7...](https://www.youtube.com/watch?v=EdIKIf9mHk0&list=PLOmdoKois7_FK-ySGwHBkltzB11snW7KQ)

~~~
rthille
In the 1990's, NeXT had a demo of their threading capabilities, and you could
select a bunch of different sorts to run in parallel. Doesn't seem to be much
still "live" on the net about it though:
[https://www.google.com/search?q=%22SortingInAction%22](https://www.google.com/search?q=%22SortingInAction%22)

------
WillyNourson
Oh lord, it's almost 1k lines long.

~~~
msafadieh
quad_swap64 and quad_sort64 are only about 350 lines combined.

------
ummonk
I’ll read the article when I get the chance, but would this be in-place?

~~~
N3XT
From the article:

"These operations do require doubling the memory overhead for the swap space."

It seems it is not in-place.

~~~
metalliqaz
I'm pretty sure it's in-place, thus the need for "swap space" to hold the
values that are being moved.

~~~
cwzwarich
The term "in-place" almost always means O(1) additional space, whereas this
uses O(n) additional space.

~~~
gpderetta
Technically, quicksort needs O(log N) additional space and it is still
considered in-place. I guess the threshold for in-place-ness would be less-
than-linear additional space?

~~~
utopcell
Quicksort is not stable. The disadvantage of classic mergesort is that it
needs to copy elements out of the input array, although there exist (slower)
in-place variations.

