
Potential Quicksort replacement in java.util.Arrays with new Dual-Pivot - fogus
http://permalink.gmane.org/gmane.comp.java.openjdk.core-libs.devel/2628
======
alexh
I read the documentation he linked to. It seems odd to me that there was no
mention of an N-pivot quicksort, unless I missed it. It seems like the burden
would be on proving that 2 is the optimal value of N.

~~~
GrandMasterBirt
Having N > 2 will require you to do a sort on the pivots :P Also, the array
size must be at least N. N = 2 is a perfect size:

size = 0 -> no sort

size = 1 -> no sort

size = 2 -> pivots can be made, thus can be sorted

etc.

~~~
iclelland
Having N > 1 requires you to do a sort on the pivots. From the algorithm:

    
    
      3. P1 must be less than P2, otherwise they are swapped.

------
jongraehl
A small simplification/optimization:

    
    
            int third = len / div;
            third = third > 0 ? third : 1;
    
            // "medians"
            int m1 = left  + third;
            int m2 = right - third;
    

Instead of the more verbose original:

    
    
            int third = len / div;
    
            // "medians"
            int m1 = left  + third;
            int m2 = right - third;
    
            if (m1 <= left) {
                m1 = left + 1;
            }
            if (m2 >= right) {
                m2 = right - 1;
            }
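
A quick way to convince yourself the two variants pick the same indices is to compare them exhaustively over small ranges. The sketch below assumes `len = right - left` and `div >= 1`, which the snippets above do not show; the class and method names are hypothetical and not from the patch.

```java
// Hypothetical helper, not part of the proposed patch.
final class MedianIndexCheck {
    // Simplified variant: clamp third to at least 1 before computing the indices.
    static int[] simplified(int left, int right, int div) {
        int len = right - left;            // assumption: len is the range width
        int third = len / div;
        third = third > 0 ? third : 1;
        return new int[] { left + third, right - third };
    }

    // Original variant: compute the indices first, then fix them up.
    static int[] original(int left, int right, int div) {
        int len = right - left;            // same assumption as above
        int third = len / div;
        int m1 = left + third;
        int m2 = right - third;
        if (m1 <= left) {
            m1 = left + 1;
        }
        if (m2 >= right) {
            m2 = right - 1;
        }
        return new int[] { m1, m2 };
    }

    // True when both variants agree for every width in [0, maxLen]
    // and every divisor in [1, maxDiv].
    static boolean agree(int maxLen, int maxDiv) {
        for (int len = 0; len <= maxLen; len++) {
            for (int div = 1; div <= maxDiv; div++) {
                int[] a = simplified(0, len, div);
                int[] b = original(0, len, div);
                if (a[0] != b[0] || a[1] != b[1]) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agree(1000, 16));
    }
}
```

The two only differ in where the clamping happens: when `third` is 0 both end up at `left + 1` and `right - 1`, and when it is positive the fix-up branches never fire.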

------
jwecker
I especially appreciate the proof; it goes above and beyond the call of duty.
However, I wouldn't be surprised if this were still slower than timsort in
most real-world cases.

~~~
gjm11
On the other hand, it doesn't need O(N) extra memory.

------
adzp
I would be interested in seeing the theoretical proof for the average case.

------
dminor
I thought Java's sorting algorithm was just recently replaced:
<http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6804124>

Is this the same, or something new?

~~~
btilly
This is new.

Java can use multiple sorting algorithms with different trade-offs. Merge sort
algorithms have a good worst case scenario, but are somewhat slower in many
average cases. They also need more memory. By contrast quick sort algorithms
have very bad worst cases, but tend to be faster in the average case.

The bug you pointed at replaced Java's merge sort with a merge sort that does
a better job of exploiting runs of already sorted data. This proposal replaces
quick sort with a variation that has better average running time.

So what is the difference between a merge sort and a quick sort? Well, both
are divide and conquer algorithms. Merge sorts are based on splitting a list
into two halves, sorting each half, then merging the two lists. Quick sorts
are based on picking an element of the list, called the pivot, then dividing
the list into elements that are larger than, smaller than, or equal to the
pivot. The larger and smaller lists are then sorted in turn. If you're clever
about how you set things up you can do a quick sort in place, and this is
faster than having to allocate and then free memory.
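
To make the memory trade-off concrete, here is a minimal sketch of both approaches on `int[]`. It is a textbook illustration, not the code in `java.util.Arrays`; the class name is hypothetical, and the quick sort uses a naive last-element pivot, so it degrades badly on already sorted input.

```java
import java.util.Arrays;

// Hypothetical illustration class; not the JDK implementation.
final class SortSketches {
    // Top-down merge sort: needs a scratch buffer as large as the input.
    static void mergeSort(int[] a) {
        int[] buf = new int[a.length];
        mergeSort(a, buf, 0, a.length);
    }

    // Sorts the half-open range a[lo, hi) using buf as scratch space.
    private static void mergeSort(int[] a, int[] buf, int lo, int hi) {
        if (hi - lo < 2) {
            return;
        }
        int mid = lo + (hi - lo) / 2;
        mergeSort(a, buf, lo, mid);
        mergeSort(a, buf, mid, hi);
        int i = lo, j = mid, k = lo;
        while (i < mid && j < hi) {
            // Take from the right half only when strictly smaller (keeps ties in order).
            buf[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
        }
        while (i < mid) {
            buf[k++] = a[i++];
        }
        while (j < hi) {
            buf[k++] = a[j++];
        }
        System.arraycopy(buf, lo, a, lo, hi - lo);
    }

    // In-place quick sort with a single pivot; no extra array is allocated.
    static void quickSort(int[] a) {
        quickSort(a, 0, a.length - 1);
    }

    // Sorts the inclusive range a[lo..hi].
    private static void quickSort(int[] a, int lo, int hi) {
        if (lo >= hi) {
            return;
        }
        int pivot = a[hi];          // naive pivot choice, for brevity only
        int store = lo;             // a[lo..store-1] holds elements < pivot
        for (int i = lo; i < hi; i++) {
            if (a[i] < pivot) {
                swap(a, i, store++);
            }
        }
        swap(a, store, hi);         // pivot lands in its final position
        quickSort(a, lo, store - 1);
        quickSort(a, store + 1, hi);
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        int[] x = {5, 1, 4, 1, 3, 9, 2};
        int[] y = x.clone();
        mergeSort(x);
        quickSort(y);
        System.out.println(Arrays.toString(x) + " " + Arrays.toString(y));
    }
}
```

The only allocation in the quick sort is the recursion stack, while the merge sort allocates `buf` up front.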

This algorithm is a variation on quick sort that uses 2 pivots rather than 1.
The win over a traditional quick sort is that you can sort in place with the
same number of comparisons and less moving of data around in memory. This is a
win in two ways. First, you simply move less data. Possibly more important, if
you're doing the same number of comparisons but swapping less often, then
those comparisons have more predictable results. If you understand CPU
architecture you'll note that this means fewer pipeline stalls, which really
helps performance.
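
For readers who want to see the shape of the idea, here is a minimal sketch of a two-pivot partition. It is not the code from the linked proposal: it takes the first and last elements as pivots and has no small-array cutoff, both simplifications made here for brevity. The class name is hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration of two-pivot partitioning; not the proposed patch.
final class DualPivotSketch {
    static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    // Sorts the inclusive range a[left..right].
    private static void sort(int[] a, int left, int right) {
        if (left >= right) {
            return;
        }
        // The pivots must be ordered: p1 <= p2.
        if (a[left] > a[right]) {
            swap(a, left, right);
        }
        int p1 = a[left];
        int p2 = a[right];

        int lt = left + 1;   // a[left+1 .. lt-1]  <  p1
        int gt = right - 1;  // a[gt+1 .. right-1] >  p2
        int i = left + 1;    // a[lt .. i-1] is between p1 and p2 inclusive
        while (i <= gt) {
            if (a[i] < p1) {
                swap(a, i++, lt++);
            } else if (a[i] > p2) {
                swap(a, i, gt--);   // do not advance i: the swapped-in value is unexamined
            } else {
                i++;
            }
        }
        // Move the pivots into their final positions.
        lt--;
        gt++;
        swap(a, left, lt);
        swap(a, right, gt);

        // Three parts remain: below p1, between the pivots, above p2.
        sort(a, left, lt - 1);
        sort(a, lt + 1, gt - 1);
        sort(a, gt + 1, right);
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        int[] data = {9, 3, 7, 3, 1, 8, 2, 5};
        sort(data);
        System.out.println(Arrays.toString(data));
    }
}
```

Each pass splits the range into three parts instead of two, which is where the reduced data movement would come from; measuring it would need a proper benchmark rather than this sketch.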

------
sanj
Any insights into the "fat pivot" problem?

~~~
gjm11
I thought "fat pivot" meant the approach to quicksort where you partition into
{ smaller, equal, larger } and then sort the three subarrays instead of two.
I'm not sure what the "fat pivot _problem_" would be. Anyway, this new
approach is a generalization of that -- fat pivoting is the special case where
the two pivots are equal.
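
A minimal sketch of that three-way partition, assuming the usual textbook formulation (often called the Dutch national flag partition) with a middle-element pivot. The class name is hypothetical and nothing here is taken from the proposal.

```java
import java.util.Arrays;

// Hypothetical illustration of "fat pivot" (three-way) quicksort.
final class FatPivotSketch {
    static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    // Sorts the inclusive range a[lo..hi].
    private static void sort(int[] a, int lo, int hi) {
        if (lo >= hi) {
            return;
        }
        int pivot = a[lo + (hi - lo) / 2];
        int lt = lo;  // a[lo .. lt-1]   <  pivot
        int gt = hi;  // a[gt+1 .. hi]   >  pivot
        int i = lo;   // a[lt .. i-1]    == pivot
        while (i <= gt) {
            if (a[i] < pivot) {
                swap(a, lt++, i++);
            } else if (a[i] > pivot) {
                swap(a, i, gt--);   // the swapped-in value still needs examining
            } else {
                i++;
            }
        }
        // Elements equal to the pivot are already in place; skip them.
        sort(a, lo, lt - 1);
        sort(a, gt + 1, hi);
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        int[] data = {2, 7, 2, 2, 9, 1, 7, 2};
        sort(data);
        System.out.println(Arrays.toString(data));
    }
}
```

The block of equal elements in the middle is never recursed into, which is what makes this variant attractive for inputs with many duplicates.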

~~~
jongraehl
I think you're right. I'm not sure if it's significant (because potentially
there are fewer passes), but you end up having to swap more for a single pass.
You also can get a stable in-place sort by treating the equal elements
specially, which is why I'd do it.

