
What’s really so bad about bubble sort? - nicknash
http://nicknash.me/2012/10/12/knuths-wisdom/
======
thaumaturgy
Since this is the third recent item on HN featuring the impact of branch
prediction on code performance, let's make sure we keep our heads on straight:
a good algorithm will outperform a bad algorithm that's tuned for branch
prediction.

Quicksort infamously abuses branch prediction, yet is still the standard
against which all other sorting algorithms are compared. Some work is being
done to improve Quicksort's branch prediction performance [1], but that's
mostly focusing on choosing better pivots.

A _perfect_ sorting algorithm, something like a very large sorting network,
will necessarily make any processor's branch predictor give up and go home --
and that'll still be faster to execute than a naive algorithm tuned for branch
prediction.

[1]: [http://www.cs.auckland.ac.nz/~mcw/Teaching/refs/sorting/quic...](http://www.cs.auckland.ac.nz/~mcw/Teaching/refs/sorting/quicksort-branch-prediction.pdf)
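For a toy illustration of why a sorting network is so different from a naive comparison sort: the sequence of compare-exchanges is fixed regardless of the data, so no comparison result steers control flow. A rough sketch of a 4-input network (my own illustration, with min/max standing in for branch-free conditional moves; not from the linked paper):

```python
import itertools

def compare_exchange(a, i, j):
    # Always writes both slots using min/max, so the comparison
    # result never decides which code path runs next.
    a[i], a[j] = min(a[i], a[j]), max(a[i], a[j])

def sort4(a):
    # The standard optimal 5-comparator network for 4 inputs:
    # the same comparators run in the same order for every input.
    for i, j in [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]:
        compare_exchange(a, i, j)
    return a

# Sanity check: the fixed network sorts every permutation.
for perm in itertools.permutations([1, 2, 3, 4]):
    assert sort4(list(perm)) == [1, 2, 3, 4]
```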

~~~
martincmartin
On the other hand, quicksort scans memory in-order. Random access to memory is
much, much worse than bad branch prediction. Using a heap in merge sort, for
example, screws with both branch prediction and memory access.

For sorting, if you're minimizing the number of comparisons, the result of
each compare should be random and independent of all others. So the only way
to have "good" branch prediction when sorting an array of random numbers is to
do lots of extra, superfluous compares, e.g. using an O(n^2) algorithm instead
of an O(n log n) one.

The base case for std::sort in glibc is insertion sort. It's used for less
than 6 elements. Nice and linear memory access patterns.

Martin

------
TheEskimo
Honestly, bubble sort isn't _that_ bad.

Couple years ago I had a fairly simple program which collected some data into
a linked list and displayed it at the end. It took about 2 seconds to load and
process the data from disk. I realized it might be neat to optionally sort the
output. Since I didn't feel like doing serious work, I just added a bubble
sort to my linked list which took a comparison function and then bubble-sorted
by pointer swapping.

Guess what time it added? Less than a 10th of a second. Completely negligible.
It saved me a few hours (as writing bubble sort for a linked list takes maybe
5 minutes and has no chance of complicated bugs) and I doubt anyone will ever
notice any slowdown.
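The shape of it was roughly this (a from-memory sketch in Python, not the original code; node relinking stands in for the pointer swapping):

```python
class Node:
    def __init__(self, value, nxt=None):
        self.value = value
        self.next = nxt

def bubble_sort_list(head, cmp):
    # Repeatedly sweep the list, relinking adjacent out-of-order
    # nodes, until a full sweep makes no swaps. Short, and hard to
    # get wrong.
    if head is None:
        return None
    swapped = True
    while swapped:
        swapped = False
        prev, cur = None, head
        while cur.next is not None:
            nxt = cur.next
            if cmp(cur.value, nxt.value) > 0:
                # Relink: prev -> nxt -> cur -> (old nxt.next)
                cur.next = nxt.next
                nxt.next = cur
                if prev is None:
                    head = nxt
                else:
                    prev.next = nxt
                prev = nxt
                swapped = True
            else:
                prev, cur = cur, nxt
    return head
```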

So what's the lesson? Little performance tweaks hardly matter in the average
program nowadays because everything's disk or network bound (IO bound). For
the average program it's not worth writing more complex code that will be
quicker (once you fix the hard-to-see bugs) until you actually know the
slowness is an issue.

~~~
enjo
Well sure, but who writes their own sorting algorithms these days anyways?
It's important to understand the characteristics of various sorting algorithms
so you can choose the best one, but that "best one" is almost always going to
be implemented by some standard library.

It makes sense to use the optimal version out of the gate in these cases.

~~~
dazbradbury

        It's important to understand the characteristics of various sorting algorithms so you can choose the best one
    

I disagree - most languages offer a sort function which is nearly optimal in
the average case. You very rarely choose unless you are doing something where
the sort performance is extremely important.

However, understanding various sort algorithms and their intricacies is a
great way to learn fundamental algorithm design as well as some data
structures. This is why it's important - so when you do write code, you come
to it with a huge amount of insight into how to go about it.

------
dazbradbury
Excellent post, complementing:

[http://stackoverflow.com/questions/11227809/why-is-processin...](http://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-an-unsorted-array)

Which has been posted many times before - however, applying it to bubble sort,
which we all know well, is really interesting.

Thanks for sharing!

------
calibwam
My theory about bubble sort is that a lot of people remember it just because
the name is so damn cute. You can't say bubble without smiling. So I say we
should rename quicksort to something a bit nicer, something first-year comp
sci students can remember instead of bubble sort. And, for good measure,
bubble sort should be renamed "terrible sort" or something like that.

~~~
jere
Perhaps not cute, but I find the name "Timsort" pretty amusing:
<http://en.wikipedia.org/wiki/Timsort>

~~~
koenigdavidmj
The name `Tim' got much funnier after Monty Python and the Holy Grail. Thanks,
Tim the Enchanter!

Of course the sort is named after Tim Peters, its author, rather than said
character. Sorry, Mr Peters!

------
enjo
I've asked this as an interview question in the past. It's a great question as
it has so many layers. Very few folks ever grokked the role the branch
predictor plays, but those who did have always gone on to do well.

------
tsotha
Who actually _uses_ bubble sort except as a class assignment? The whole
advantage to the algorithm is it's easy to understand and implement. You learn
that one first and then you move on to more practical stuff.

In commercial programming it's pretty rare to run across a situation where
writing your own sorting routine makes sense.

~~~
snprbob86
Bubble sort is actually extremely useful when temporal locality matters more
than absolute sort order.

Example: Games often bubble sort world objects by depth. There is a small
performance gain to be had by drawing closer objects first, so that you can
depth-cull expensive pixels behind them. Game engines can run several
iterations of bubble sort, but stop before sorting the set completely, since
the z-buffer will ensure correctness. Over the span of several frames, the
incremental bubble sort will achieve a total sort, but the incremental
approach bounds cost more tightly. Since the depth of objects only changes
relatively when you look around, it's usually a pretty good approximation of
"perfect" behavior.
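The bounded-work idea can be sketched like this (illustrative only, not from any particular engine; `max_passes` is a made-up budget knob):

```python
def bubble_pass(objs, key):
    # One sweep of bubble sort over the draw list; returns True if
    # anything moved. Each pass is O(n) and improves the ordering.
    moved = False
    for i in range(len(objs) - 1):
        if key(objs[i]) > key(objs[i + 1]):
            objs[i], objs[i + 1] = objs[i + 1], objs[i]
            moved = True
    return moved

def frame_sort(objs, key, max_passes=2):
    # Per frame: spend a bounded amount of sorting work rather than a
    # full sort. The z-buffer keeps rendering correct even when the
    # list is only approximately ordered; over several frames the
    # list converges to fully sorted.
    for _ in range(max_passes):
        if not bubble_pass(objs, key):
            break
```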

~~~
eridius
Is this very common? Because that would certainly explain the behavior I see
in games like WoW where turning my camera drops the framerate, but once I stop
turning the framerate jumps back up to normal. I always assumed this had
something to do with loading textures or models or geometry or something, but
I was never really satisfied with that.

~~~
sirclueless
Your assumption was right. The choice of sort is purely an optimization to
allow for fast depth-culling, and shouldn't result in noticeable graphical
glitches. However, pipelining textures and models based on viewing angles is
common. This is certainly what you are seeing.

------
pbiggar
Awesome, this was the first bit of real research I was involved in. Thanks for
putting it in blog form Nick, a great addition to HN.

~~~
nicknash
Ah no bother -- always a pleasure Paul!

------
dllthomas
1) Any comparison-based sorting algorithm that isn't branchless will
mispredict often.

2) The article completely ignores the possibility of conditional-move
instructions, which would cut the mis-predicted branches in this case to zero
(or one if we add the "are we done?" check and always predict "no").
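Roughly what I mean, with Python's min/max standing in for the conditional moves a compiler would emit at the instruction level (an illustrative sketch, not real cmov codegen):

```python
def branchless_bubble_sort(a):
    # Each compare-exchange unconditionally writes both slots via
    # min/max, so the data comparison never steers a branch. The only
    # branches left are the loop-bound checks, which are trivially
    # predictable.
    n = len(a)
    for end in range(n - 1, 0, -1):
        for i in range(end):
            a[i], a[i + 1] = min(a[i], a[i + 1]), max(a[i], a[i + 1])
    return a
```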

~~~
nicknash
I'm not sure you're quite right about 1. Depending on what you mean exactly.
For example, insertion sort is O(n^2) but mispredicts O(n) times. That doesn't
seem often to me.

There are (artificial) mergesorts that execute O(n log n) branches but also
mispredict O(n) times. See for example "branch mispredictions don't affect
mergesort:" <http://www.cphstl.dk/Paper/Quicksort/sea12.pdf>

In the case of 2. I agree, although compiler support still isn't great -- I
think. For almost all cases (integers, floats, strings) radix sorting would be
even better -- fewer instructions, no comparison branches.

~~~
dllthomas
Hmm, you're right. My thinking was, "if we know enough about the structure of
the problem that we can make sure we're guessing right much more than half the
time, we can probably turn that into a better O()", which insertion sort as an
example doesn't really violate per se - it is possible to trade off that
knowledge for better asymptotic complexity, but clearly the claim wasn't quite
right as broadly as I stated it.

------
kyrra
Is there an analysis of some other sorting algorithms and how they behave with
branch prediction? I did some Google searching and found one paper on merge
sort and branch prediction. I'm specifically thinking about mergesort,
quicksort, heapsort, and introsort.

This stackoverflow post covers the "when to use" parts of various algorithms,
I'm just wondering more about the practical vs theoretical runtimes of these
sorts. [http://stackoverflow.com/questions/1933759/when-is-each-
sort...](http://stackoverflow.com/questions/1933759/when-is-each-sorting-
algorithm-used)

~~~
pbiggar
The paper linked to, and the tech report it's based on, provide exactly that -
an experimental analysis of sorting and branch predictors. Both are available
at <http://paulbiggar.com/research/#sorting>.

------
mikhael
I'm confused about one part of your analysis. You write that Q^k_l =
Q^(k-1)_(l+1) * (1 - 1/(l+1)), but it's not obvious to me why the multiplier,
(1 - 1/(l+1)), is correct in general (past the second sweep). I went so far as
to write a little bit of code to calculate various values of Q^3_l, exactly,
from values of Q^2_(l+1), for an 8-element array, and found that it did not
hold. (I assume that you were assuming that the initial array permutations
were picked uniformly from the set of all permutations). Have I done something
wrong or missed something totally obvious?

Thanks!

~~~
nicknash
I think it's right. One way to convince yourself is just to add counters to
the implementation of bubble sort, run it for a larger input, and then verify
the probabilities (including the conditional probabilities) match what I
claim.

You're correct that every permutation of {1, ..., n} is assumed equally likely
as an input.
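The counter idea is simple to set up (my own sketch; indices here start from 0, so you'd map the counts onto however the post indexes Q^k_l). For uniform random permutations, the first-sweep swap frequency at position l should come out near 1 - 1/(l+2), since a swap happens there unless a[l+1] is the maximum of the first l+2 original elements:

```python
import random
from collections import defaultdict

def instrumented_bubble_sort(a, swaps):
    # swaps[(sweep, pos)] counts how often the compare at `pos`
    # during sweep `sweep` triggered a swap.
    n = len(a)
    for sweep in range(n - 1):
        for pos in range(n - 1 - sweep):
            if a[pos] > a[pos + 1]:
                a[pos], a[pos + 1] = a[pos + 1], a[pos]
                swaps[(sweep, pos)] += 1

def estimate_swap_probs(n=8, trials=20000, seed=1):
    # Run bubble sort over many uniform random permutations and
    # turn the swap counts into empirical probabilities.
    random.seed(seed)
    swaps = defaultdict(int)
    for _ in range(trials):
        perm = random.sample(range(n), n)
        instrumented_bubble_sort(perm, swaps)
    return {k: v / trials for k, v in sorted(swaps.items())}
```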

This way of thinking about it might help:

All that "bubble_sweep" does to the input array is "right-shift" the
left-to-right maxima and "left-shift" the elements between them. The
left-shifted elements aren't re-ordered, so they still have the usual
probability of being a left-to-right maximum.

In a bit more detail: let L be the sequence of indices, from left to right, of
left-to-right maxima in the array. Now, in the s^{th} sweep, let M be the
sequence L concatenated with n - s, and assume M has r elements in total.

Then the operation of "bubble_sweep" is just:

        a[M_r] <- a[M_{r-1}]
        a[M_{r-1}, ..., M_r - 2] <- a[M_{r-1} + 1, ..., M_r - 1]
        ...
        a[M_2] <- a[M_1]
        a[M_1, ..., M_2 - 2] <- a[M_1 + 1, ..., M_2 - 1]

I think you could actually implement a (very, very) inefficient "bubble_sweep"
that worked along these lines. I think that makes it a bit clearer that, apart
from at left-to-right maxima from a previous iteration, we just have
subsequences of the original (uniform random) permutation, so the 1 - 1/(l+1)
probability holds.

I hope that's clear(er) and I didn't make too many mistakes. The sun is
shining and I must go ride my bike now!

------
pbiggar
The code from the paper:

<https://github.com/pbiggar/sorting-branches-caching>

------
bproctor
I was really hoping this article was going to end with an amazing discovery
that showed bubble sort to have some great hidden potential.

------
zwieback
Would be interesting to see how a bubble-sort algorithm on an ARM CPU would
perform with and without conditional execution.

