

Java has switched from Mergesort to TimSort - fogus
http://paste.pocoo.org/show/133352/

======
jrockway
I always get scared when I see a factorial term in big-O notation, but it's
worth noting that this is better than n log n.

    
    
      n log n = log ( n ^ n ) = log ( n * n * n * ... )
    

while

    
    
      log ( n! ) = log ( n * (n-1) * (n-2) * ... )
    

n log n's repeating n does not get smaller, but log n!'s does. So it's
slightly better.

(I had to think this through, so I thought posting it might help someone
else.)

~~~
jbn92
<http://en.wikipedia.org/wiki/Stirling%27s_approximation> says that log(n!) is
O(n log n) anyway...

~~~
philwelch
Big O means an upper bound. n is also O(n log n),

In other words, while log(n!) is no worse than n log n, it is still better
than n log n.

~~~
swolchok
log(n!) = Theta(n log n)

Proof: let's take it as granted log(n!) = O(n log n). It remains to show that
log(n!) = Omega(n log n).

log(n!) = sum of log i for i from 1 to n

>= sum of log (n/2) for i from n/2 to n (dropping half the terms (all
nonnegative) and _under_ -approximating the rest; technically, we need to
floor or ceil n/2, but it doesn't matter)

= (n/2) log (n/2)

= (n/2) (log n - log 2)

= (1/2) n log n - n log 2 / 2

Hopefully, I don't need to bother proving that c(n log n) + kn = Theta(n log
n) for constants c and k. We've shown that log n! is Omega(n log n) and O(n
log n), so we have log n! = Theta(n log n) by definition.

So, the root of this thread is wrong (and so is the parent). Furthermore, the
fact that log(n!) is smaller than n log n by a constant is irrelevant when
comparing log(n!) to Theta(n log n). What would not be meaningless is
comparing formulae for the _exact_ number of comparisons performed by
different algorithms.

------
callahad
And here's the bug in Sun's tracker that suggested switching in the first
place: <http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6804124>

------
kingkilr
... this same tripe was submitted to reddit, they haven't actually switched
yet, this is just a random paste of the timsort.txt file that exists within
the Python SVN repository.

~~~
callahad
Agreed. The link is solely a description of timsort and has nothing to do with
Java. There is, however, a bug in Sun's tracker suggesting that timsort be
used for java.util.Arrays.sort. That bug has a state of "Accepted," which to
my naive understanding indicates that Sun has consented to implementing the
change. But then again, I don't have any experience with Java bug triage
stages.

A better title for the submission would be "Timsort - Python's pragmatic merge
sort" or something along those lines.

~~~
sketerpot
Also, Android switched to timsort quite a while ago -- I think that's where
the Java timsort code is coming from.

------
amichail
How does TimSort improve (if at all) on what is published on the topic in
academia?

~~~
mbrubeck
Here's another link to the documentation (since svn.python.org is not
responding for me): <http://bugs.python.org/file4451/timsort.txt>

TimSort is essentially merge sort plus optimizations. The major optimization
is that it identifies already-sorted subsequences within the input, and sorts
these "runs" plus some surrounding elements using a fast binary insertion
sort. This means that for input that isn't completely random, it doesn't need
to do as many comparisons as a pure mergesort. TimSort has also been tuned
based on empirical testing on real-world data/hardware.

~~~
amichail
Adaptive sorting has been studied in academia:

[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.45.8...](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.45.8017)

