
Timsort is a sorting algorithm that is efficient for real-world data - signa11
https://hackernoon.com/timsort-the-fastest-sorting-algorithm-youve-never-heard-of-36b28417f399
======
CogitoCogito
Recently I "made" [0] one of those visualized sorting videos for Timsort for
real-world data. Point was to give an example of data that was already mostly
sorted and show how fast it sorts it. It may be of interest here:

[https://www.youtube.com/watch?v=ZxLxf5xqqyE](https://www.youtube.com/watch?v=ZxLxf5xqqyE)

[0] My use of "made" gives me way more credit than I'm due. Really I just
hacked others' work to use another data source. See the description in the
video for more info on how it was produced.

~~~
WalterGR
I feel like this is important to note:

 _My example of real world data is data in which the first 90% is already
sorted and the next 10% is random. Whether this is real enough is
questionable, but it does demonstrate how Timsort automatically takes
advantage of runs of sorted data._

It would be nice to see a comparison working on truly real-world data. Perhaps
from an open dataset.
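
In the meantime, here is a rough way to see the effect without a video. This is only a sketch: it assumes CPython's built-in `sorted()` (which is Timsort-based), and the `Counted` wrapper and `count_comparisons` helper are names I made up to count `<` calls. The "90% sorted, 10% random" input mirrors the description quoted above and is not real-world data.

```python
import random


class Counted:
    """Wraps a value and counts every `<` comparison the sort performs."""
    calls = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.calls += 1
        return self.value < other.value


def count_comparisons(values):
    """Sort a copy of `values` with the built-in sort and return the number of `<` calls."""
    Counted.calls = 0
    sorted(Counted(v) for v in values)
    return Counted.calls


random.seed(0)
n = 10_000
# First 90% already sorted, last 10% random (the synthetic case from the video).
mostly_sorted = list(range(n * 9 // 10)) + [random.randrange(n) for _ in range(n // 10)]
# Fully shuffled input for contrast.
shuffled = random.sample(range(n), n)

print("mostly sorted:", count_comparisons(mostly_sorted))
print("shuffled:     ", count_comparisons(shuffled))
```

The mostly sorted input should need far fewer comparisons, because the long presorted prefix is detected as a single run. Swapping in a column from an open dataset for `mostly_sorted` would give the more honest comparison.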

~~~
cntlzw
I am experimenting with Lidar datasets lately. In lidar you work with huge
point clouds. Millions to billions. For some algorithms - convex hull - you
need to sort the points, for example by x-coordinate. I did some quick tests
with quick sort, merge sort, and timsort. In my experiments quick sort was the
slowest with 5.5 seconds on the chosen dataset. Mergesort was 5 seconds and
timsort was around 4 seconds.

~~~
beagle3
When sorting coordinates, radix sort with an appropriately chosen radix (often
but not always 256) is usually fastest.
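
A minimal sketch of what I mean, assuming the coordinates have already been quantized to unsigned 32-bit integers (that quantization step, and the name `radix_sort_u32`, are my assumptions here). It is an LSD radix sort with radix 256, so four passes of one byte each:

```python
def radix_sort_u32(values):
    """LSD radix sort for integers in [0, 2**32), one byte (radix 256) per pass."""
    for shift in (0, 8, 16, 24):
        # One bucket per possible byte value.
        buckets = [[] for _ in range(256)]
        for v in values:
            buckets[(v >> shift) & 0xFF].append(v)
        # Concatenating in bucket order keeps each pass stable,
        # which is what makes the least-significant-first order correct.
        values = [v for bucket in buckets for v in bucket]
    return values


print(radix_sort_u32([70000, 3, 256, 255, 3, 0]))
```

In pure Python this will not beat the built-in sort; the point is the structure. In a compiled language the buckets would normally be replaced by a counting pass and a prefix sum over a preallocated output array.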

~~~
cntlzw
Hi there, thanks. Need to try this

------
raymondh
In Python 3.7, Timsort was sped up by another 40 to 75% for homogeneous lists
(just some of the common cases).

It works by replacing Python's slow generic PyObject_RichCompareBool() with a
type-specialized comparison function.

* [https://docs.python.org/3/whatsnew/3.7.html#optimizations](https://docs.python.org/3/whatsnew/3.7.html#optimizations)

* [https://bugs.python.org/issue28685](https://bugs.python.org/issue28685)

------
dean177
For those with a similar train of thought, this PR describes the standard
sorting algorithm used in Rust:
[https://github.com/rust-lang/rust/pull/38192](https://github.com/rust-lang/rust/pull/38192)

~~~
karmakaze
This is great. A simpler TimSort without galloping but keeping merging of
runs. It needs a good name (or any name for that matter) so that it can get
picked up and used elsewhere.

I'll suggest Pacesort (because the pace gait has smaller steps than a gallop
and doesn't have many search results).
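
To make the idea concrete, here is a small sketch of "find the natural runs, then merge them, no galloping". It is not the code from the Rust PR: the function names are mine, and the pairwise merge order is a simplification I chose instead of a Timsort-style merge stack.

```python
def natural_runs(a):
    """Split `a` into maximal runs; strictly descending runs are reversed in place of sorting."""
    runs, start, n = [], 0, len(a)
    while start < n:
        end = start + 1
        if end < n and a[end] < a[start]:
            # Strictly descending run: extend it, then reverse.
            # Strictness means no equal elements get swapped, so stability is kept.
            while end < n and a[end] < a[end - 1]:
                end += 1
            runs.append(a[start:end][::-1])
        else:
            # Non-decreasing run.
            while end < n and not (a[end] < a[end - 1]):
                end += 1
            runs.append(a[start:end])
        start = end
    return runs


def merge(left, right):
    """Plain two-way merge, one element at a time (no galloping)."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if right[j] < left[i]:  # take from the right only when strictly smaller: stable
            out.append(right[j])
            j += 1
        else:
            out.append(left[i])
            i += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def run_merge_sort(a):
    """Sort by merging adjacent natural runs until one run remains."""
    runs = natural_runs(list(a))
    if not runs:
        return []
    while len(runs) > 1:
        merged = [merge(runs[k], runs[k + 1]) for k in range(0, len(runs) - 1, 2)]
        if len(runs) % 2:
            merged.append(runs[-1])  # odd run out is carried to the next round
        runs = merged
    return runs[0]


print(run_merge_sort([1, 2, 3, 9, 8, 7, 4, 5, 6]))
```

On already sorted input `natural_runs` returns a single run and no merging happens at all, which is the property worth keeping from Timsort.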

~~~
gameswithgo
TimmaySort?

~~~
karmakaze
Timmay!

------
nixpulvis
I'm not a fan of the anti-academic vibes in this post. Good technology can be
born anywhere, but it's foolish to ignore the advantages of research.

~~~
archgoon
Well, to be fair, I'm always concerned about those vat grown algorithms from a
lab. I mean, are we sure they're _safe_?

~~~
sametmax
I only trust real algos that are grown in a free range PHP forum.

~~~
sangnoir
Aye, but are they better than the local-sourced samples in the _comments_ on
the PHP manual website? 17 upvotes can't be wrong.

------
bbno4
Hey! Author here! Thank you so much for posting this here! If you have any
questions feel free to ask me

~~~
kbd
> Since the algorithm has been invented it has been used as the default
> sorting algorithm in Python, Java, the Android Platform, and in versions of
> GNU.

I'm glad you mentioned how far the algorithm has spread to other languages
since originating in Python. But, "versions of GNU" what?

~~~
seba_dos1
Versions of GNU. Although it's rare to see this term being used in this
context these days, "GNU" was invented as a name of an operating system.

~~~
eesmith
Where is TimSort used in a GNU operating system?

I suspect this line originates from the Wikipedia entry for TimSort with the
line:

"Timsort has been Python's standard sorting algorithm since version 2.3. It is
also used to sort arrays of non-primitive type in Java SE 7,[3] on the Android
platform,[4] and in GNU Octave."

Notice how the order of Python, Java, Android, and GNU is the same? My
tentative hypothesis is that "Octave" was dropped, and that "GNU" here means
something more like "in the GNU project."

Also, the author uses a time complexity table with Ω(), Θ(), and O()
notations, but suggests "To learn about Big O notation", go to another
hackernoon article which only talks about O() and not big omega or big theta
notations.

------
bluecalm
Are there benchmarks available for this kind of algorithm? If you come up with
a fast sorting algorithm how would you go about making a case it's faster than
currently used ones on real world data?

~~~
hyperpape
You can go back and read the mailing list archives to see the benchmarks that
were used by Python devs to decide on using Timsort:
[https://mail.python.org/pipermail/python-dev/2002-July/threa...](https://mail.python.org/pipermail/python-dev/2002-July/thread.html#26837)

You can also read Tim Peters's notes about the sort:
[https://bugs.python.org/file4451/timsort.txt](https://bugs.python.org/file4451/timsort.txt)

That was back in 2002, so there's certainly been a lot of scrutiny since then.
In particular, I've never actually read about Java sorting benchmarks, but
there are tremendously sophisticated benchmarking tools/benchmarking experts
in the Java world, so I imagine that it got a lot of analysis there.

~~~
BeeOnRope
It's worth noting that Java doesn't exclusively use Timsort. Depending on the
data type and heuristics based mostly on the size of the input array, a
simpler sort such as counting sort may be used.

You can find a more detailed summary at:

[https://stackoverflow.com/a/41129231/149138](https://stackoverflow.com/a/41129231/149138)
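
For anyone who hasn't seen counting sort, here is a minimal sketch for byte values. It is written in Python for illustration, not taken from the JDK, and it does not reproduce the JDK's size thresholds; `counting_sort_bytes` is a name I made up.

```python
def counting_sort_bytes(data):
    """Counting sort for byte values (0..255): tally each value, then emit them in order."""
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    out = bytearray()
    for value, count in enumerate(counts):
        out.extend(bytes([value]) * count)
    return bytes(out)


print(counting_sort_bytes(b"timsort"))
```

It never compares two elements, so it runs in time linear in the input length plus the number of possible values. That is why it only makes sense for small value ranges such as bytes.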

~~~
hyperpape
Cool find. I'd never actually realized that.

------
ylmm
Don't forget:
[https://news.ycombinator.com/item?id=9100107](https://news.ycombinator.com/item?id=9100107)

------
linsomniac
In the film _Heist_, Gene Hackman's character is asked how he pulled something
off. "I tried to imagine someone smarter than myself and then I thought:
What would he do?"

When I need to pull something off, I ask myself: What would Tim do?

------
redcalx
I ported timsort to C#/dotnet recently. Available in this nuget:

[https://www.nuget.org/packages/Redzen/](https://www.nuget.org/packages/Redzen/)

Source code here (there are three variants):

[https://github.com/colgreen/Redzen/tree/master/Redzen/Sortin...](https://github.com/colgreen/Redzen/tree/master/Redzen/Sorting)

None of the standard framework sort methods perform a stable sort so it's
handy for that, as well as being a lot faster for some data, i.e. with pre-
sorted runs. Otherwise the framework now uses introsort which is pretty good,
so you should definitely performance test with both. I use both sorts
depending on the context and I usually end up using the built in introsort,
with timsort used for a few special cases.
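
In case "stable" is unclear: elements that compare equal keep their original relative order. A quick illustration in Python rather than C#, since Python's built-in sort is documented as stable; the records are made-up sample data.

```python
# (name, group) pairs; several records share the same group.
records = [("carol", 2), ("alice", 1), ("bob", 2), ("dave", 1)]

# Sort by group only. With a stable sort, ties keep their input order:
# alice stays before dave, carol stays before bob.
by_group = sorted(records, key=lambda r: r[1])

print(by_group)
# [('alice', 1), ('dave', 1), ('carol', 2), ('bob', 2)]
```

With an unstable sort the order within each group is unspecified, which matters when you sort by one column after having sorted by another.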

------
bane
It's a great piece of engineering and because of the tongue-in-cheek name
might be confused with a joke sorting algorithm...my favorite of which is
sleep sort.

[https://www.geeksforgeeks.org/sleep-sort-king-laziness-sorti...](https://www.geeksforgeeks.org/sleep-sort-king-laziness-sorting-sleeping/)

------
Cynddl
> Timsort is a sorting algorithm that is efficient for real-world data and not
> created in an academic laboratory.

It's always a bit sad to see academia as an ivory tower where researchers
generate unusable knowledge. Academia is not only about finding theoretical
pure solutions but often also about finding solutions to real-world problems.
For instance, machine learning, logic design, networking algorithms,
cryptography, all deeply depend on work done in academia (and not just
“theoretical” work).

~~~
tomnipotent
> machine learning, logic design, networking algorithms, cryptography, all
> deeply depend on work done in academia (and not just “theoretical” work)

All _highly_ theoretical fields long before we found concrete applications. It
often took one or more people an entire lifetime to convince others that these
were valuable subjects to pursue.

> Academia is not only about finding theoretical

No, but it has to start there. Theoretical science is how we find new things
to turn into applied science. It required 45+ years between the time Heinrich
Hertz introduced contact mechanics and RADAR was put into use, 33 years from
when Einstein published mass–energy equivalence (E=mc²) until Otto Hahn cracked
nuclear fission, but only a few more years until Robert Oppenheimer and the
Manhattan project turned that into the first atom bomb.

~~~
barrkel
_No, but it has to start there._

That's simply not true. There's a loop: sometimes, industry comes up with a
specific innovation, and academia generalizes it; other times, academia comes
up with an idea, and it's implemented in industry (often a commercialisation
effort by the inventors). And sometimes academics are employed by industry and
there's a hybrid. There's no one true way.

------
anqurvanillapy
I ran the code and it gave me [2, 3, 5, 6, 7], missing the 1. Appending
the_array[i] after clearing the new_run by `new_run = []` seems to fix this
issue.
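
Roughly what I mean, as a standalone sketch rather than the article's exact code. `the_array` and `new_run` are the names from the article; `split_into_runs` and the sample input are mine. The element that breaks a run has to seed the next run, otherwise it is silently dropped:

```python
def split_into_runs(the_array):
    """Split the input into consecutive non-decreasing runs without losing any element."""
    runs = []
    new_run = []
    for item in the_array:
        if new_run and item < new_run[-1]:
            runs.append(new_run)
            new_run = []       # start a fresh run...
        new_run.append(item)   # ...and put the current element into it
    if new_run:
        runs.append(new_run)   # do not forget the final run
    return runs


print(split_into_runs([2, 3, 1, 5, 6, 7]))
# [[2, 3], [1, 5, 6, 7]]
```

A cheap sanity check for any run splitter is that concatenating the runs gives back the original list.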

------
senatorobama
Any sorting algorithms that use deep learning?

~~~
KMag
You could of course write one. Presumably you're talking about using deep
learning to decide the order in which to compare elements, but this would be
difficult to ensure both always terminated and always gave correct results, to
say nothing of likely being very inefficient.

The difficulties in sorting algorithms aren't difficulties in recognizing
patterns, so it's difficult to see what deep learning brings to the table.

That being said, knock yourself out; surprise me!

~~~
neolefty
For creating custom comparators?

"Sort these web pages in order of quality."

~~~
KMag
I've never seen a reference text that considers the comparator part of the
sorting algorithm.

Timsort, quicksort, mergesort, introsort, smoothsort, etc. don't include the
comparator as part of the algorithm definition.
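
A small illustration of that separation, using Python's built-in sort with `functools.cmp_to_key`. The comparator `by_length_then_alpha` and the word list are made up for the example; a learned "page quality" scorer would plug in at the same place without touching the sort itself.

```python
from functools import cmp_to_key


def by_length_then_alpha(a, b):
    """Comparator: shorter strings first, ties broken alphabetically."""
    if len(a) != len(b):
        return len(a) - len(b)
    return (a > b) - (a < b)


words = ["merge", "tim", "quick", "intro", "smooth"]

# Same sorting algorithm, two different orderings: only the comparator changes.
print(sorted(words))
# ['intro', 'merge', 'quick', 'smooth', 'tim']
print(sorted(words, key=cmp_to_key(by_length_then_alpha)))
# ['tim', 'intro', 'merge', 'quick', 'smooth']
```

The sort only needs the comparator to be a consistent ordering. A learned comparator that is not transitive would give results that depend on the order of comparisons, whichever algorithm is used.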

