
Why is quicksort better than other sorting algorithms in practice? (2013) - tambourine_man
https://cs.stackexchange.com/questions/3/why-is-quicksort-better-than-other-sorting-algorithms-in-practice
======
amelius
There are annual contests for sorting large amounts of data, e.g.:

[http://sortbenchmark.org/](http://sortbenchmark.org/)

(I could be wrong but I don't think the winners ever used Quicksort.)

~~~
SpicyLemonZest
Big data sorting is a very different problem. At the top level, a merge sort
is mandatory, because that's the only algorithm that can be effectively
parallelized across N nodes.

~~~
jrimbault
Weirdly, merge sort might be the only sorting algorithm I learned in school
that I can actually write without thinking. It's very intuitive and simple, at
least to me?

~~~
klyrs
Weird flex: I can implement bogosort in my sleep...

~~~
xvector
I implement cosmic-ray-bit-flip-bogo-sort in every program I write!

------
Aardwolf
There are too many different tradeoffs and dimensions to the problem to say
that one sorting algorithm is the "best". E.g. quicksort without a fallback
has a denial-of-service attack against it: an adversary can craft input that
forces quadratic behavior.
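
For a rough illustration (a toy sketch, not any real library's quicksort): a
fixed-pivot quicksort with no fallback does quadratic work on input crafted to
defeat its pivot choice (here, already-sorted input against a last-element
pivot).

    # Naive in-place quicksort (last-element pivot, no fallback); counting
    # comparisons shows how crafted input forces quadratic work.
    import random

    def quicksort(xs, lo=0, hi=None, stats=None):
        if hi is None:
            hi = len(xs) - 1
        if lo >= hi:
            return
        pivot, i = xs[hi], lo
        for j in range(lo, hi):                  # Lomuto partition
            stats["cmp"] += 1
            if xs[j] < pivot:
                xs[i], xs[j] = xs[j], xs[i]
                i += 1
        xs[i], xs[hi] = xs[hi], xs[i]
        quicksort(xs, lo, i - 1, stats)
        quicksort(xs, i + 1, hi, stats)

    for name, data in [("random", random.sample(range(500), 500)),
                       ("already sorted", list(range(500)))]:
        stats = {"cmp": 0}
        quicksort(data, stats=stats)
        print(name, stats["cmp"])   # sorted input costs ~n^2/2 comparisons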

~~~
matheusmoreira
Insertion sort is very interesting: as an online algorithm, it is capable of
producing partial results.

~~~
kadoban
I think you might mean selection sort, which picks out elements from the
original in sorted order one at a time.

Insertion sort has partial results early as well, but they probably wouldn't
be as useful; the only thing you could report early would be "here's the
sorted order of the first k elements in the list" (as opposed to the first k
elements in sorted order from the whole list, which selection sort gives you).

If you really did want online output like that, I think you'd want to look at
heapsort instead.

Comparing worst cases: with a heap you spend O(n) time to prepare it (once, at
the start) and then O(lg n) time to produce each element of the answer.

With selection sort you'd need O(n) time to produce each element.

Selection sort doesn't win out in the average or best case either. The best you
could really hope for is for it to win on constant factors, and even that would
be hard on realistic inputs.
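
As a rough sketch of that heap-based approach (Python's standard heapq, nothing
fancy): heapify is the O(n) one-time preparation, and each pop hands back the
next element in O(lg n), so a consumer can start using results immediately.

    import heapq

    def incremental_sorted(items):
        """Yield items in sorted order one at a time."""
        heap = list(items)
        heapq.heapify(heap)            # O(n) one-time preparation
        while heap:
            yield heapq.heappop(heap)  # O(lg n) to produce each element

    for x in incremental_sorted([5, 3, 8, 1, 9, 2]):
        print(x)   # results stream out before the whole sort is finished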

~~~
nwallin
I think the poster means you aren't given the completed set. Elements come
dribbling in a few at a time. With insertion sort, you're able to say, "this
is the sorted list at the time we know it right now." Obviously when you get
the next element, you don't have a sorted list anymore; you have a sorted list
and one extraneous element. But then you apply an insertion-sort step and
have a sorted list again.
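
A tiny sketch of that online flavor (the input stream here is made up;
bisect.insort plays the role of one insertion-sort step):

    import bisect

    sorted_so_far = []
    for x in [7, 2, 9, 4, 1]:            # elements dribbling in one at a time
        bisect.insort(sorted_so_far, x)  # one insertion step: put x in its place
        print(sorted_so_far)             # always sorted for everything seen so far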

~~~
kadoban
Ah, yes that's true, good point.

If that is the goal, BST sort might be a good idea as well (put elements into
a balanced binary search tree as you receive them). Not _quite_ the same
semantics (you don't have an array exactly at each step), but much faster
asymptotics and should be able to answer any queries you'd be doing anyway.
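
A bare-bones sketch of that idea, with an unbalanced BST for brevity (a real
implementation would want a self-balancing tree, or a library like
sortedcontainers, to keep inserts at O(lg n)):

    class Node:
        def __init__(self, key):
            self.key, self.left, self.right = key, None, None

    def insert(root, key):        # O(lg n) per insert if the tree stays balanced
        if root is None:
            return Node(key)
        if key < root.key:
            root.left = insert(root.left, key)
        else:
            root.right = insert(root.right, key)
        return root

    def in_order(root):           # read out sorted order whenever you need it
        if root:
            yield from in_order(root.left)
            yield root.key
            yield from in_order(root.right)

    root = None
    for x in [7, 2, 9, 4, 1]:     # elements arriving online
        root = insert(root, x)
    print(list(in_order(root)))   # [1, 2, 4, 7, 9]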

~~~
nwallin
Yes, exactly.

One thing to keep in mind is that it's surprisingly common for a sorted array
to outperform a binary search tree because of cache locality and pointer
indirection being slow, even though random inserts into arrays are slow.
Especially for small sets or when reads greatly outnumber writes. Boost has
flat_map and flat_set for this.

------
AstralStorm
Oh my. Comparison sorts are still slower than counting-based sorts on typical
architectures, especially radix sort (coupled with insertion sort for small sizes).

Or burstsort if the data size is truly huge.
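
For reference, a minimal LSD radix sort sketch (byte at a time, non-negative
integers only; the implementations that actually win benchmarks are tuned
C/SIMD, not this):

    def radix_sort(xs, key_bytes=4):
        """LSD radix sort: no comparisons, one bucketing pass per byte."""
        for shift in range(0, key_bytes * 8, 8):
            buckets = [[] for _ in range(256)]
            for x in xs:
                buckets[(x >> shift) & 0xFF].append(x)  # stable scatter by byte
            xs = [x for bucket in buckets for x in bucket]
        return xs

    print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))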

As usual, StackOverflow is missing the forest for the trees.

~~~
RavlaAlvar
Can you elaborate on why bubble sort is fast on huge datasets?

Also, if counting-based sorting algorithms are faster in practice, why are we
not seeing more of them in database systems?

~~~
taneq
That was burst sort, not bubble sort.

(That said, bubblesort can be fast if the data are usually very close to
sorted - one example here is depth-sorting polygons for a rendering engine on
a highly constrained platform.)
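
A quick sketch of why that works: bubble sort with an early exit is adaptive,
so nearly-sorted input (like polygon depths that barely change between frames)
finishes in a pass or two.

    def bubble_sort(xs):
        """Each pass is O(n); nearly-sorted input needs only a couple of passes."""
        xs = list(xs)
        for end in range(len(xs) - 1, 0, -1):
            swapped = False
            for i in range(end):
                if xs[i] > xs[i + 1]:
                    xs[i], xs[i + 1] = xs[i + 1], xs[i]
                    swapped = True
            if not swapped:      # no swaps: already sorted, stop early
                return xs
        return xs

    print(bubble_sort([1, 2, 3, 5, 4, 6, 7, 8, 9]))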

~~~
aratakareigen
See also:
[https://en.m.wikipedia.org/wiki/Adaptive_sort](https://en.m.wikipedia.org/wiki/Adaptive_sort)

Another example application of adaptive sorts is the sweep-and-prune
broadphase collision detection algorithm that's somewhat commonly used in
physics engines.

------
burakcosk
If the input doesn't fit into memory, then merge sort is usually better,
because its access pattern is sequential and we don't need to jump around.
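
Roughly the shape of an external merge sort, sketched with in-memory lists
standing in for files on disk: sort chunks that fit in memory, then stream one
sequential k-way merge over the sorted runs.

    import heapq

    def external_sort(stream, chunk_size=3):
        """Sort runs that fit in "memory", then merge them sequentially."""
        runs, chunk = [], []
        for x in stream:
            chunk.append(x)
            if len(chunk) == chunk_size:   # memory full: flush a sorted run
                runs.append(sorted(chunk))
                chunk = []
        if chunk:
            runs.append(sorted(chunk))
        return list(heapq.merge(*runs))    # reads each run in order, no jumping around

    print(external_sort([9, 1, 7, 3, 8, 2, 6, 4, 5]))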

~~~
circlingthesun
I had an interesting experience sorting exam papers when I was a TA. I found
that quicksort used up too much desk/floor space. I settled on splitting the
papers into piles of 10 or so, applying insertion sort on each pile and then
pairwise merge sorting them until I had a sorted pile.

~~~
kadoban
Bucket sort can be pretty effective physically. An example would be separating
the papers by first initial if you're sorting by name (pick whatever matches
your sorting key), then sorting each bucket individually (pick whatever seems
appropriate, another layer of buckets or just insertion sort for instance).
When you've sorted each bucket, you just put the buckets in order, no real
merge work needed.

You can adapt what you're doing based on how the buckets look (huge bucket? do
another layer of bucketing inside it; small bucket? just sort it). It's easy to
see progress, and you can "discard" buckets as you go (put them onto one output
pile as they're done).
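
A sketch of that first-initial bucketing (one level of buckets, names made up
here; a huge bucket would just get the same treatment again on a later
character):

    from collections import defaultdict
    import string

    def bucket_by_initial(names):
        buckets = defaultdict(list)
        for name in names:
            buckets[name[0].upper()].append(name)  # scatter by first initial
        out = []
        for letter in string.ascii_uppercase:      # buckets are already in order,
            out.extend(sorted(buckets[letter]))    # so sort within and concatenate
        return out

    print(bucket_by_initial(["Santos", "Adams", "Zhou", "Abe", "Singh"]))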

~~~
klipt
In a way, quick sort, radix sort and merge sort are all just recursive
versions of bucket sort :-)

* Merge sort: sort within buckets, then between buckets

* Quick sort: sort between buckets, then within buckets.

* Radix sort: can be either depending on whether you sort by most or least significant radix first.

Humans can handle dividing into more than 2 at each stage though, so sorting
100 things with 10 piles of 10 tends to be easier than 7 binary divisions.

------
c3534l
I checked the Haskell implementation of sort a while ago. Apparently it used
to be Quicksort, which has a really elegant recursive implementation in
Haskell, but they found that mergesort was actually faster. I wonder if it has
anything to do with how a programming language implements lists, or if people
just assume QuickSort is the fastest because of threads like these and don't
bother trying out alternatives.

~~~
masklinn
> they found that mergesort was actually faster. I wonder if it has anything
> to do with how a programming language implements lists

It probably also has to do with using a non-in-place quicksort; working in
place on a mutable array is where quicksort really shines. Even more so as
you'd implement the low arities (small subarrays) using an insertion sort
rather than a quicksort, which wouldn't work for immutable collections either.
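
For comparison, the non-in-place shape being described (essentially the famous
Haskell two-liner transliterated into Python; every level allocates fresh
lists, which is exactly where the in-place advantage goes away):

    def quicksort(xs):
        if not xs:
            return []
        pivot, rest = xs[0], xs[1:]
        # builds brand-new lists at every level instead of swapping within one array
        return (quicksort([x for x in rest if x < pivot])
                + [pivot]
                + quicksort([x for x in rest if x >= pivot]))

    print(quicksort([3, 1, 4, 1, 5, 9, 2, 6]))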

------
alecco
*for scalar execution.

SIMD and GPGPU record-breaking parallel implementations use radix, bitonic,
etc.

~~~
hinkley
The thing that always bothers me about these sort tests is that once I got out
of college I've hardly ever sorted numbers, either because the data isn't
numeric or the API I'm calling does it for me.

I'm sorting text, or I'm sorting dates. Or I'm doing a compound sort, where
it's dates, then text, then numbers. It's rare for me to see a benchmark where
the compare() operation is accounted for. It is not difficult at all for the
complexity of the compare() operation to trend toward log(n), causing an
algorithm with a tighter upper bound on number of comparisons but high
constant overhead to perform better.
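
In concrete terms, the kind of compound sort being described (record fields
invented here), where a single compare may have to look at a date, then a
string, then a number:

    from datetime import date

    # hypothetical records: (due date, customer name, amount)
    rows = [
        (date(2021, 3, 1), "Zhou", 42.0),
        (date(2021, 3, 1), "Adams", 17.5),
        (date(2020, 12, 9), "Singh", 99.9),
    ]

    # compound sort: dates, then text, then numbers; each comparison can touch
    # all three fields, so the cost of compare() is part of the real running time
    rows.sort(key=lambda r: (r[0], r[1], r[2]))
    print(rows)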

I presume we can vectorize string comparisons at this point? How would we
handle compound sorts with SIMD operations?

~~~
nwallin
Or if I am actually sorting stuff, I'm using the sort provided by the standard
library. I don't think I've ever implemented a sort outside of an academic
context. Used them plenty of times, but I couldn't imagine not using the
standard library.

Schools talk about sorting algorithms because it's instructive, not because
they're important problems. You can explain the problem in thirty seconds, and
even if you explain it badly, everyone will have an intuitive understanding of
the problem.

Then you can show students intuitive algorithms of insertion sort and
selection sort and they'll understand the algorithm.

Then you time it. Then you show them weird, non-intuitive algorithms like heap
sort or quicksort and show that they're a ton faster. Not a lot faster, but
like way way way faster.

Then your students understand they'll need to carefully consider their
algorithms and all the ramifications moving forward.

If you start a CS101 student with a hard algorithms problem they'll switch
majors. And that will be your fault for being a shitty teacher, not their
fault for being lazy or dumb.

------
lmilcin
Because it is available in almost every standard library?

In most cases developers don't care for the worst case because they either
have no idea what you are talking about or, if they actually understand the
problem, they try to ensure they don't need to sort arbitrary amounts of user-
controllable data.

~~~
tyingq
Libc qsort() isn't quicksort in several of the popular libc implementations.
For example, it's (usually) a merge sort in glibc.
[http://calmerthanyouare.org/2013/05/31/qsort-shootout.html](http://calmerthanyouare.org/2013/05/31/qsort-shootout.html)

~~~
ape4
Interesting. The man page just says "sort an array".
[https://linux.die.net/man/3/qsort](https://linux.die.net/man/3/qsort)

~~~
mark-r
Exactly. As long as you get the proper results within reasonable time and
memory constraints, why should you care exactly which algorithm is used?

~~~
marmada
Maybe it's interesting to know what qsort does under the hood? Maybe you don't
know what qsort's time and memory constraints are unless you know the
underlying algorithm.

Obviously the sorting algorithm you use matters, so this post has some
theoretical value.

~~~
mark-r
Certainly it matters, but I'd expect a library implementation to have more
research applied to it than I can do on my own.

