
A Killer Adversary for Quicksort (1999) [pdf] - beagle3
http://www.cs.dartmouth.edu/~doug/mdmspe.pdf
======
nightcracker
I used this technique to show that libc++'s implementation of std::sort has an
O(n^2) worst case. There's still an open bug report for this in libc++:
[https://llvm.org/bugs/show_bug.cgi?id=20837](https://llvm.org/bugs/show_bug.cgi?id=20837).
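For those who haven't read the paper: the core of McIlroy's adversary is a
comparator that decides the key values lazily, always pinning the element the
sort appears to be pivoting on to an extreme value. A minimal sketch of that
idea in C++ (adapted from the paper's C code; this is not the exact harness I
ran against libc++):

    // Build a killer input for the sort under attack by answering its own
    // comparisons adversarially. Indices 0..n-1 are sorted with a comparator
    // that freezes values lazily; val[] ends up holding the killer input.
    #include <algorithm>
    #include <cstdio>
    #include <vector>

    int main() {
        const int n = 10000;
        const int gas = n;              // sentinel: "value not decided yet"
        std::vector<int> val(n, gas);   // the killer input being constructed
        std::vector<int> idx(n);        // indices handed to the sort
        for (int i = 0; i < n; ++i) idx[i] = i;

        long ncmp = 0;                  // comparisons seen so far
        int nsolid = 0;                 // how many values have been frozen
        int candidate = 0;              // index the sort seems to pivot on

        auto freeze = [&](int x) { val[x] = nsolid++; };

        std::sort(idx.begin(), idx.end(), [&](int x, int y) {
            ++ncmp;
            if (val[x] == gas && val[y] == gas) {
                // Both undecided: pin the likely pivot to the next small
                // value, making it an extreme (and thus terrible) pivot.
                if (x == candidate) freeze(x); else freeze(y);
            }
            if (val[x] == gas) candidate = x;
            else if (val[y] == gas) candidate = y;
            return val[x] < val[y];
        });

        std::printf("n = %d, comparisons = %ld\n", n, ncmp);
        // val[0..n-1] is now a concrete array that should drive the same
        // sort implementation quadratic with an ordinary comparator.
    }

Against a vulnerable quicksort this run itself already shows a roughly
quadratic comparison count, and the val[] it leaves behind is a reusable
killer input for that implementation.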

On a related note, I'm talking to the libc++ and libstdc++ developers about
replacing introsort with pdqsort
([https://github.com/orlp/pdqsort](https://github.com/orlp/pdqsort)) as the
implementation for std::sort. It handles the worst case in a subtly different
(and potentially more efficient) way than introsort, although that's not the
real selling point.

EDIT: I forgot I still use my old username on hacker news. To prevent
confusion, I'm also the author of pdqsort.

------
lshevtsov
I made an implementation of this algorithm as a university assignment.

Here's a vanilla implementation of Quicksort: [https://github.com/leonid-shevtsov/uni-programming/blob/mast...](https://github.com/leonid-shevtsov/uni-programming/blob/master/kr_may_2012/custom_qsort.cpp#L27)

And here's the function that generates data which drives that implementation to O(n^2): [https://github.com/leonid-shevtsov/uni-programming/blob/mast...](https://github.com/leonid-shevtsov/uni-programming/blob/master/kr_may_2012/generators.cpp#L78)

------
jmount
I've had bad commercial Quicksort implementations go N^2 on constant data:
[http://www.win-vector.com/blog/2008/04/sorting-in-anger/](http://www.win-vector.com/blog/2008/04/sorting-in-anger/).
Kind of sours one on the "Quicksort is essentially elegant" idea.

------
ilzmastr
I don't get how you could create an adversarial input in advance for an
implementation that uses randomized partitioning (index of the pivot drawn
from a uniform distribution).

Can anyone fill me in?

What this article describes for that case (as far as I can tell) is an
adversary that modifies the input while quicksort is working. How realistic is
that?

~~~
asdfasdfsd
That algorithm is only used to create an input pattern. When it is fed back to
the same sorting function, it will cause it to go quadratic. If your target is
using the same sorting function it will do the same on their side.

This attack is realistic. I have tried it on the glibc qsort, generic
quicksort, and Visual Studio qsort, and they were vulnerable.
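An easy way to check this on your own library is to sort the generated pattern
with a counting comparator and compare the count against n*log2(n). A sketch
(the killer array is assumed to come from an adversary run as described in the
paper):

    // Sort a copy of a generated killer input and count how many comparisons
    // the library sort makes; on a vulnerable quicksort this grows roughly
    // as n^2 rather than n*log2(n).
    #include <algorithm>
    #include <vector>

    long count_comparisons(std::vector<int> v) {   // sorts a copy
        long ncmp = 0;
        std::sort(v.begin(), v.end(),
                  [&](int a, int b) { ++ncmp; return a < b; });
        return ncmp;
    }

    // e.g. compare count_comparisons(killer) against
    //      count_comparisons(randomly_shuffled_data) for the same n.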

~~~
ilzmastr
I'm interested in the way you use the word "attack."

I was looking at it from the perspective that, with randomized quicksort, you
could never run into an input that would consistently sort in O(n^2).

But your perspective (maybe this article's perspective) is that some malicious
adversary could force the sorter to do extra work by knowing in advance the
steps it will take. Am I getting that right? Like from a flops security
standpoint?

If so, why care about this situation? This only seems like a realistic
scenario in very weird domains, like Amazon Web Services messing with
everyone's calls to some standard sorting code to charge them a few more cents
for compute time or something...

~~~
alejohausner
If I'm not mistaken, the precomputed killer input is aimed at quicksort
implementations where the pivot is chosen deterministically, e.g. as the
median of the first, middle, and last values. A fixed, precomputed input would
not work against randomized quicksort, though the paper's adversary, driven
interactively through the comparator, still does.

------
graycat
What's the problem? From D. Knuth's _The Art of Computer Programming: Sorting
and Searching_ ,

(1) if sorting n records by comparing pairs of keys, then the Gleason bound
shows that on average one can't sort faster than O( n ln(n) ).

(2) Heap sort sorts n records in both worst case and average case in time O( n
ln(n) ).

Net, when sorting by comparing pairs of keys, in main memory, on a simple
machine architecture, with a single processor, ..., one can't improve on Heap
sort.

So, just use Heap sort. That's what I do.
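In C++, for instance, heap sort is just two standard library calls (a minimal
sketch, nothing tuned):

    // Heap sort via the standard heap primitives: O(n log n) worst case,
    // in place, and no adversarial input can make it quadratic.
    #include <algorithm>
    #include <vector>

    template <typename T>
    void heap_sort(std::vector<T>& a) {
        std::make_heap(a.begin(), a.end());   // build a max-heap, O(n)
        std::sort_heap(a.begin(), a.end());   // pop the max n times, O(n log n)
    }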

~~~
nightcracker
Big O hides all constants. A hybrid algorithm using Quicksort for the heavy
lifting is O(n log n) in the worst case, but is significantly faster than
heapsort in practice, mainly due to cache effects.

See pdqsort for examples (benchmarks included):
[https://github.com/orlp/pdqsort](https://github.com/orlp/pdqsort).

The benchmarks show that on the random input distribution it's roughly twice
as fast.

~~~
graycat
Now, and for a long time, the usual criterion of performance has been big-O
notation, and that is what I considered: with the Gleason bound and heap sort
I showed that, under various common assumptions, one can't do better than heap
sort. As far as I know, I was correct.

So, to claim to beat heap sort, one has to set aside some of the usual
assumptions and the big-O criterion.

Quicksort is great with a cache since it has great locality of reference when
working with the _partitions_ that fit into the cache. But the worst case of
Quicksort is still O( n^2 ), so what happens if Quicksort encounters such a
partition?

Just intuitively, it appears that heap sort has a tough time exploiting usual
cache designs. We know that. There is a modification of heap sort that helps
it work better when it has to do virtual memory paging.

I used to get torqued at the big-O criterion just because it ignores
constants; as a programmer, I worked hard to do well with such constants. But
in the end I gave in and accepted big-O notation as the most appropriate,
simple, single criterion. And I also gave up on judging algorithms by how well
they exploit the cache: broadly, it's the responsibility of such speedup
techniques to work on ordinary code written for a simple processor
architecture, not the responsibility of such code to do tricky things to
exploit tricky architectures.

In the land of big-O, a factor of only 2 is considered not very exciting.

If we relax some of the assumptions, then we can beat heap sort and O( n ln(n)
): use radix sort. That was the sorting algorithm of the old punch card
sorting machines. E.g., if we have 100 billion records to sort where each key
is an alpha-numeric string 10 characters long, then radix sort does all the
work in just 10 passes over the 100 billion records. Or, for keys m characters
long, it can sort n records in O( nm ). So that's faster when m < ln(n) or, to
make the arithmetic easier, m < log(n). But log(100 billion) = 11, so radix
sort should be faster with m = 10 and n = 100 billion.
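For fixed-length character keys the whole thing is m stable counting-sort
passes, least significant character first. A rough in-memory sketch (the
100-billion-record case would of course be done out of core):

    // LSD radix sort for fixed-length string keys: m counting-sort passes,
    // O(n*m) total work, and no key comparisons at all. Assumes every key
    // has at least m characters.
    #include <array>
    #include <cstddef>
    #include <string>
    #include <vector>

    void radix_sort(std::vector<std::string>& keys, std::size_t m) {
        std::vector<std::string> buf(keys.size());
        for (std::size_t pos = m; pos-- > 0; ) {      // last character first
            std::array<std::size_t, 257> count{};     // histogram, then offsets
            for (const auto& k : keys) ++count[(unsigned char)k[pos] + 1];
            for (std::size_t i = 1; i < count.size(); ++i)
                count[i] += count[i - 1];
            for (const auto& k : keys)                 // stable scatter pass
                buf[count[(unsigned char)k[pos]]++] = k;
            keys.swap(buf);
        }
    }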

So, for a _hybrid_ sort routine, consider also radix sort.

Am I excited about radix sort? Nope!

~~~
nightcracker
> But, worst case of Quicksort is still O( n^2 ) so what happens if Quicksort
> encounters such a partition?

I'd strongly suggest you read the readme I wrote for
[https://github.com/orlp/pdqsort](https://github.com/orlp/pdqsort), which
explains in detail how I beat heapsort (and other sorting algorithms). It also
shows how I prevent the O(n^2) worst case.

> I used to get torqued at considering big-O, and did this just because it
> ignores constants; as a programmer, I worked hard to do well with such
> constants. But in the end I gave in and gave up and just accepted big-O
> notation as the most appropriate, simple, single criterion.

We have already established that O(n log n) is the lower bound, and we have
reached this lower bound. This is where big O stops being useful. A metric
that compares equal for every relevant algorithm (mergesort, introsort,
pdqsort, timsort, smoothsort, heapsort) is not a useful metric.

> In the land of big-O, a factor of only 2 is considered not very exciting.

I live in the land of real-world performance, powered by actual benchmarks. In
this land a factor of 2 is very exciting.

> So, for a hybrid sort routine, consider also radix sort.

I worked in the scope of std::sort, which is strictly a comparison sort.

~~~
graycat
I assumed that we were talking about _in place_ sorting.

"I feel your pain." As I tried to explain, the first time I saw big-O notation
taken seriously as the main metric for evaluating the actual running time of
actual code on actual data on an actual computer, I had your reaction, that
the constants are important and the big-O criterion, not good.

There's another point: The silent assumption of the big-O criterion is that we
are interested in (A) something _fundamental_ about algorithms, just the
algorithms, and not (B) actual running time of actual code on actual data on
actual computers.

And big progress in algorithms is also important and, as we know, in some
cases can give much better performance gains in real computing than anything a
programmer can do without the big progress.

And we have a great example: there were days before heap sort, shell sort, and
quicksort when the in-place sorting algorithms people coded were bubble sort
or something closely related, also O(n^2). In practice, the difference was
just huge.

Early in my career, I ran into that at Georgetown University: a prof had coded
up some software for teaching statistics and had used the IBM Scientific
Subroutine Package (SSP) for most of the actual calculations. One of the SSP
routines was to find _ranks_, and it had two loops, essentially bubble sort.
In testing, the thing ran all through lunch.

Well, the data to be ranked was 32 bit integers, and an array index was a 16
bit integer. So, I was able to do a tricky overlay of two 16 bit integers on
the 32 bit data, use heap sort, and get O( n ln(n) ). In actual running time
on nearly any possible real data, my code was so much faster than IBM's SSP
that I _blew the doors off_ the SSP.

Lesson 1: O( n ln(n) ) instead of O( n^2 ) can be much better in practice.

Lesson 2: Progress in big-O in algorithms can be terrific stuff.

Back to practice, early in writing the software for my startup, I tried to
write a polymorphic version of heap sort. I wrote in Microsoft's Visual Basic
.NET. For the data type to be sorted, I passed just the type _object_. The
software ran correctly right away. Then I did some careful timings -- the
software commonly ran ballpark 10 times longer than I would have guessed. What
the heck? Well, the run time was smart enough to figure out the actual data
type and do the comparison, but the overhead here was enormous. Then I learned
that for _polymorphic_ code I was supposed to write some _interfaces_ , one
for each data type to be sorted, and for sorting a particular data type pass
the corresponding interface. Okay. My confusion was that that use of an
_interface_ didn't sound _polymorphic_ to me and sounded no different than
what I'd long done passing _entry variables_ in PL/I and Fortran. Okay, for
polymorphism, use _interfaces_. And in the code of the interface, have to work
essentially without _strong typing_ , essentially with untyped pointers. Okay
-- been there, done that, in both PL/I and Fortran; I was surprised that a
serious _object oriented_ language with _polymorphism_ would have essentially
just syntactic sugar over what I'd been doing in PL/I and Fortran. Okay.
Lesson learned: There are cases of hype in parts of programming languages.

Heck, in PL/I my approach to sort an array of PL/I structures would be to pass
a PL/I structure that had an element variable that was an entry variable to a
procedure that would do the comparisons. That is, one could have some
conventions that would let PL/I structures be _self-describing_. Then one
could write code that was more general. Maybe under the covers, that's just
what's going on in the Microsoft _common language run time_ (CLR) that Visual
Basic .NET used for me without letting me know -- more _syntactic sugar_?

So, when I rewrote my polymorphic heap sort to do the comparisons with a
passed _interface_, the performance got okay. Still, I have non-polymorphic
versions for the more important element data types, and they are faster in
just the way you emphasized. Yup, been there, done that.

So, I, too, was concerned about actual running time of actual data. But for
the _algorithm_ itself, I was still using just heap sort and didn't try to
write a _hybrid_ routine that sometimes might exploit, say, radix sort.

On the pros and cons of using big-O for the only criterion, there's a big
example in linear programming: In practice, the simplex algorithm is
fantastically fast. Intuitively, in practice, the early iterations in effect
_focus_ what to do to get to the optimal solution (assuming feasibility, no
unboundedness, etc.). But due to some work of Klee and Minty, the worst case
of simplex is exponential and just awful. Intuitively the Klee and Minty
examples trick simplex into making really bad choices!

So, there was research to find a polynomial algorithm for linear programming.
Eventually there was an _ellipsoid_ algorithm that was polynomial but the
constant was so high that whenever in practice ellipsoid beat simplex both ran
too long to be practical.

So, really, the concern about simplex and the effort on the ellipsoid method
were about progress just in algorithms and in the big-O criterion. And in
practice, right away, the constant was so large that ellipsoid was a flop.

Still, in algorithms, people are straining to do better in the sense of the
big-O criterion. Of course the biggest effort here is to find an algorithm
that shows that P = NP, maybe one of the most important problems in both math
and computer science, based on big-O for some polynomial.

With all of that, I decided in my own work, for something as simple as
sorting, just to go with heap sort, maybe even a polymorphic heap sort.
Indeed, with all the emphasis on polymorphism, people can't be very concerned
about constants! :-)!

~~~
nightcracker
I'm impressed you can write such a long story yet address none of my points.
You should consider becoming a politician.

The only relevant piece of information I could find in your anecdote is this:

> I assumed that we were talking about in place sorting.

If we limit ourselves to in-place sorting algorithms then we still are left
with block sort, pdqsort, introsort and heap sort, all of which run in O(n log
n). My argument still holds.

~~~
graycat
I wrote that long post trying to explain my attitude on sorting software from
50,000 or so feet up and, in particular, why I have been happy just to use
heap sort and forget about looking for anything faster on average.

I was reluctant to read the documentation in your GitHub submission if only
because in my software development I have never needed or used GitHub.

Also, I was reluctant to believe that there could be any version of quicksort
that would have worst case running time of O( n ln(n) ).

Why reluctant to believe?

First, sure, the usual version of quicksort has us select a _pivot_ value for
a partition from a "median of three" keys from that partition.

Second, sure: for a _good_ pivot value, we want essentially the median of the
keys in the partition we are trying to split into two partitions. And, sure,
the median of three approach to selecting a pivot value is an approximation to
the median we really want and, thus, a relatively good pivot value. So this
approach to selecting a pivot value promises faster average running times.
Fine. Faster on average is good. Terrific.

But this pivot selection rule says next to nothing about worst case running
time, and, for claiming worst case running time of O( n ln(n) ), that is a big
issue.

Third, with this median of three approach to selecting pivot values, we can
think of _bad_ arrays of input keys that will cause quicksort to run in time
no faster than O( n^2 ). Then maybe we can think of other ways to select pivot
values that result in a new version of quicksort, a version that runs in O( n
ln(n) ) again on the bad arrays. But with this change, here's the problem: we
are lacking a proof that the new version of quicksort has worst case running
time of O( n ln(n) ).

That is, our new version of quicksort is good for some old cases of _bad_
arrays but may have created for itself some new cases of _bad_ arrays. So,
with no proof, the new version of quicksort may also have worst case running
time of O( n^2 ) like the old version did. The main difference might be that
the _bad_ arrays for the new version are more complicated to construct than
the bad arrays for the old case.

But I just looked at the file README.MD you suggested. I see two issues:

First Issue:

> When a new pivot is chosen it's compared to the greatest element in the
> partition before it. If they compare equal we can derive that there are no
> elements smaller than the chosen pivot. When this happens we switch strategy
> for this partition, and filter out all elements equal to the pivot.

If I understand this statement correctly, I suspect that for some arrays of
input keys, the resulting algorithm won't actually sort the keys. That is, we
won't have a sorting algorithm.

Second Issue:

> While this technically still allows for a quadratic worst case, the chances
> of it happening are astronomically small.

For proving worst case running time of O( n ln(n) ), "astronomically small" is
irrelevant. Maybe your remark "astronomically small" is just a side comment
and not relevant to PDQSORT having worst case running time of O( n ln(n) ),
but from your documentation I can't easily tell.

> A bad partition occurs when the position of the pivot after partitioning is
> under 12.5% (1/8th) percentile or over 87,5% percentile - the partition is
> highly unbalanced. When this happens we will shuffle four elements at fixed
> locations for both partitions. This effectively breaks up many patterns. If
> we encounter more than log(n) bad partitions we will switch to heap sort.

For this and the associated documentation, I don't see a solid argument that
PDQSORT has worst case running time of O( n ln(n) ).

For some work, a big-O guarantee on worst case performance is a biggie, as in
crucial to human life. Such a guarantee is essentially a theorem in applied
math called _computer science_. A good example of how to prove such theorems
is, of course, in D. Knuth, _The Art of Computer Programming, Volume 3,
Sorting and Searching_.

Again, I have been happy just to use heap sort and forget about looking for
anything faster on average.

~~~
nightcracker
> If I understand this statement correctly, I suspect that for some arrays of
> input keys, the resulting algorithm won't actually sort the keys. That is,
> we won't have a sorting algorithm.

You don't understand the statement correctly. However, this is not a trivial
realization.

Because pdqsort always recurses on the left partition first it means that
every element to the left of the current recursive call is equal to or less
than every element within the current recursive call. So if we find that the
selected pivot is equal to an element before the current recursive call we can
conclude that there can only be elements equal to the pivot, and not smaller,
in the current recursive call.

> Maybe your remark "astronomically small" is just a side comment and not
> relevant to PDQSORT having worst case running time of O( n ln(n) ).

Correct.

> For this and the associated documentation, I don't see a solid argument that
> PDQSORT has worst case running time of O( n ln(n) ).

There is a very simple argument. The 'bad partition counter' is initially
log(n). If we encounter a bad partition, we can do up to O(n) work. This can
happen log(n) times. So we can do up to O(n log n) work on bad partitions. If
we then encounter another bad partition, we switch to heapsort, which is O(n
log n).

If we encounter fewer than log(n) bad partitions, then along any path of the
recursion at most log(n) splits were worse than a 1/8 : 7/8 split, so the
recursion depth is still O(log n) with O(n) work per level, and Quicksort
itself stays within O(n log n).

For a similar approach, read up on introsort.
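For reference, the introsort idea fits in a few lines. A simplified sketch
(this is neither libc++'s std::sort nor pdqsort's actual code):

    // Introsort-style sketch: plain quicksort, but once the recursion depth
    // budget (~2*log2(n)) is spent, heap sort the remaining range, which
    // caps the total work at O(n log n) no matter how bad the pivots were.
    // pdqsort instead charges the budget only for *bad* partitions and
    // first tries to break patterns by shuffling a few elements.
    #include <algorithm>
    #include <vector>

    template <typename It>
    void intro_sort_impl(It first, It last, int depth_budget) {
        while (last - first > 16) {
            if (depth_budget-- == 0) {                   // budget exhausted
                std::make_heap(first, last);
                std::sort_heap(first, last);
                return;
            }
            auto pivot = *(first + (last - first) / 2);
            It mid1 = std::partition(first, last,
                                     [&](const auto& x) { return x < pivot; });
            It mid2 = std::partition(mid1, last,
                                     [&](const auto& x) { return !(pivot < x); });
            intro_sort_impl(first, mid1, depth_budget);  // recurse on "< pivot"
            first = mid2;                                // loop on "> pivot"
        }
        for (It i = first; i != last; ++i)               // insertion sort tail
            std::rotate(std::upper_bound(first, i, *i), i, i + 1);
    }

    template <typename It>
    void intro_sort(It first, It last) {
        int budget = 0;
        for (auto n = last - first; n > 1; n /= 2) budget += 2;  // ~2*log2(n)
        intro_sort_impl(first, last, budget);
    }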

~~~
graycat
Okay.

