I read the documentation he linked to. It seems odd to me that there was no mention of an N-pivot quicksort. Unless I missed it. It seems like the burden would be on proving that 2 is the optimal value of N.
With two pivots, you can start the partitioning at either end and have a well-defined location for each of the three partitions (at the top, at the bottom, and in between). With more than three partitions, you would not know where to put the second through second-to-last borders without first sorting the array. You could just guess, but that would probably result in quite a lot of additional array shifting (lucky if you have a dedicated blitter chip for that :-)).
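For anyone who hasn't seen how that works out in code, here's a rough sketch of a dual-pivot sort I put together for illustration (simplified, not the actual JDK implementation; all identifiers are mine). The point is that each of the three regions has a well-defined home while the scan is still in progress:

// Simplified illustration of dual-pivot partitioning, not the JDK's code.
static void dualPivotSort(int[] a, int left, int right) {
    if (left >= right) return;
    if (a[left] > a[right]) swap(a, left, right);     // make sure p1 <= p2
    int p1 = a[left], p2 = a[right];
    // invariant: a[left+1..lt-1] < p1, a[lt..k-1] between the pivots,
    // a[gt+1..right-1] > p2, a[k..gt] not yet examined
    int lt = left + 1, gt = right - 1, k = left + 1;
    while (k <= gt) {
        if (a[k] < p1)      swap(a, k++, lt++);       // grows the bottom region
        else if (a[k] > p2) swap(a, k, gt--);         // grows the top region; re-check a[k]
        else                k++;                      // stays in the middle region
    }
    swap(a, left, --lt);                              // drop the pivots into their final slots
    swap(a, right, ++gt);
    dualPivotSort(a, left, lt - 1);
    dualPivotSort(a, lt + 1, gt - 1);
    dualPivotSort(a, gt + 1, right);
}

static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }

With a third pivot there would have to be a fourth boundary somewhere in the middle of the array, which is exactly the part you can't place without guessing.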
You can make a quicksort-ish algorithm with an N log N worst case by letting the number of pivots increase with N. (Details and proof, of course, in Knuth. The details are nontrivial but not all that painful.) I don't think anyone has ever proposed that it's worth doing in practice.
Using more pivots is going to make the code more complicated. The benefits of moving from one pivot to two seem to be real but not enormous; I'd be pretty surprised if moving to three pivots produced benefits worth the cost. But of course that's just handwaving, and indeed it would be interesting to measure.
My copy of Knuth only mentions that algorithm, with a reference to a 1970 paper.
But what I would find more interesting is how the parallelizable Batcher and Pratt algorithms described by Knuth fare on modern multicore machines with their weird cache hierarchies.
If you fix the number of pivots when you write the algorithm, they can be sorted in constant time. It just requires a lot of if-else branching...
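For example, with a fixed set of three pivot candidates you could order them with a hard-coded comparison network rather than a loop. A hypothetical sketch (names are mine), just to illustrate the constant-time idea:

// Illustration only: with exactly three candidates,
// three compare-and-swaps put them in order.
static void orderThreePivots(int[] p) {
    if (p[0] > p[1]) { int t = p[0]; p[0] = p[1]; p[1] = t; }
    if (p[1] > p[2]) { int t = p[1]; p[1] = p[2]; p[2] = t; }
    if (p[0] > p[1]) { int t = p[0]; p[0] = p[1]; p[1] = t; }
}

The branch count grows quickly with the number of pivots, but for any fixed count it is still O(1).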
int third = len / div;
third = third > 0 ? third : 1;
// "medians"
int m1 = left + third;
int m2 = right - third;
Instead of the worse:
int third = len / div;
// "medians"
int m1 = left + third;
int m2 = right - third;
if (m1 <= left) {
m1 = left + 1;
}
if (m2 >= right) {
m2 = right - 1;
}
I especially appreciate the proof; that's above and beyond the call of duty. However, I wouldn't be surprised if this were still slower in most real-world cases than timsort.
Java can use multiple sorting algorithms with different trade-offs. Merge sort algorithms have a good worst-case scenario, but are somewhat slower in many average cases. They also need more memory. By contrast, quick sort algorithms have a very bad (quadratic) worst case, but tend to be faster in the average case.
The bug you pointed at was a replacement of Java's merge sort with a merge sort that does a better job of speeding up when it runs across runs of already-sorted data. This one is a replacement of quick sort with a variation that has a better average running time.
So what is the difference between a merge sort and a quick sort? Well, both are divide-and-conquer algorithms. Merge sorts are based on splitting a list into two halves, sorting each half, then merging the two lists. Quick sorts are based on taking an element from the list, called the pivot, then dividing the list into elements that are larger than, smaller than, or equal to the pivot. The larger and smaller lists are then sorted in turn. If you're clever about how you set things up you can do a quick sort in place, and this is faster than having to allocate and then free up memory.
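As a concrete (if simplified) illustration of the in-place idea, here's a minimal single-pivot partition sketch; the identifiers are mine and this is not the code under discussion:

// Simplified single-pivot, in-place partition (Lomuto-style), for illustration.
static int partition(int[] a, int lo, int hi) {
    int pivot = a[hi];                        // last element as the pivot
    int i = lo;                               // a[lo..i-1] holds elements smaller than the pivot
    for (int j = lo; j < hi; j++) {
        if (a[j] < pivot) {
            int t = a[i]; a[i] = a[j]; a[j] = t;
            i++;
        }
    }
    int t = a[i]; a[i] = a[hi]; a[hi] = t;    // put the pivot between the two halves
    return i;                                 // the pivot's final position
}

Quick sort then just recurses on a[lo..i-1] and a[i+1..hi]; nothing is ever copied to a second array, which is the memory advantage over merge sort.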
This algorithm is a variation on quick sort that uses 2 pivots rather than 1. The win over a traditional quick sort is that you can sort in place with the same number of comparisons and less moving of data around in memory. This is a win in two ways. First of all, you simply move less data around. Possibly more important is that if you're doing the same number of comparisons but swapping less often, those comparisons have more predictable results. If you understand CPU architecture you'll note that this means fewer pipeline stalls, which really helps performance.
I thought "fat pivot" meant the approach to quicksort where you partition into { smaller, equal, larger } and then sort the three subarrays instead of two. I'm not sure what the "fat pivot problem" would be. Anyway, this new approach is a generalization of that -- fat pivoting is the special case where the two pivots are equal.
I think you're right. I'm not sure if it's significant (because potentially there are fewer passes), but you end up having to swap more within a single pass. You can also get a stable in-place sort by treating the equal elements specially, which is why I'd do it.