Show HN: QuadSort, Esoteric Fast Sort (github.com/scandum)
78 points by Exorust on Jan 29, 2023 | 23 comments



Uses the wolfsort benchmark, which seeds the pseudo-random number generator "rand" from the current time. The results for the randomized inputs are therefore not reproducible.

The results are very interesting and generally follow the rule that bigger hand-optimized code can be faster code.


Maybe not precisely reproducible, but they ran each test a hundred times, so it should be possible to reproduce the results well enough in statistical terms.


In the code it looks like the seed to the benchmark can be provided as the 4th command line argument if necessary: https://github.com/scandum/quadsort/blob/master/src/bench.c#...
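
For the curious, the seeding typically looks something like this (a minimal sketch, not bench.c's actual code; the argument position matches the link above, but the names are made up):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(int argc, char **argv)
    {
        /* Fall back to a time-based seed when no explicit seed is given,
           which is what makes the randomized results unreproducible. */
        unsigned seed = (argc > 4) ? (unsigned) strtoul(argv[4], NULL, 10)
                                   : (unsigned) time(NULL);
        srand(seed);
        printf("seed: %u\n", seed); /* echo it so a run can be repeated */
        return 0;
    }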


TLDR: A visualisation of how QuadSort adaptively combines various novel strategies to achieve the best performance on a wide variety of inputs: https://github.com/scandum/quadsort#visualization

Benchmarks of how it beats qsort, Timsort, pdqsort etc.: https://github.com/scandum/quadsort#benchmark-quadsort-vs-st...

Is it called esoteric just because it's complex?


Quadsort's author here.

This is the first time I've heard quadsort called esoteric. It isn't much more complex than Timsort. It is, however, challenging to port: ~1000 lines of code that can be tedious to debug. One simple error or typo can leave you stuck in debugging hell for hours.

Hence I recently published piposort, which is ~150 lines and pretty basic, while still offering excellent performance. Not as good as quadsort, but it could make for a stepping stone in a porting effort.

https://github.com/scandum/piposort


Good point - so it's no more esoteric than Timsort, which is used a lot in production.

With quadsort, fluxsort, piposort etc., you have an impressive coverage of the design space of sorting algorithms!


It has a lot of moving parts. It is considered esoteric because it would be difficult to put this kind of thing into successful production.


Looks awesome. Making this the default sort would surely save lots of CO2.


Not if it does lots more memory accesses, which is not clear from the page.

For example, mergesort has a nicer lower bound than quicksort, but quicksort has a significantly better constant hidden in the O(n log n), so quicksort has a much lower energy cost (in general - both can be fiddled with, but if I recall correctly, quicksort is still better after all the shenanigans you can apply).

From a quick look at the code, line 248 of quadsort.c looks like an 8-way switch per 4 elements - this will likely destroy branch prediction, making pipeline refills cost significantly more CPU cycles than quicksort or mergesort. The code throughout is full of this type of large branching construct.

If you want to save CO2, disregard standard complexity and actually measure energy usage - many things can have nice complexities and hide terrible behavior on modern hardware.
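
On Linux with an Intel CPU, one relatively easy way is the RAPL energy counters exposed through powercap. A rough sketch (the sysfs path varies per machine, the counter wraps, and reading it may require root):

    #include <stdio.h>

    /* Read the package energy counter in microjoules. */
    static long long read_energy_uj(void)
    {
        long long uj = -1;
        FILE *f = fopen("/sys/class/powercap/intel-rapl:0/energy_uj", "r");
        if (f && fscanf(f, "%lld", &uj) != 1)
            uj = -1;
        if (f)
            fclose(f);
        return uj;
    }

    int main(void)
    {
        long long before = read_energy_uj();
        /* ... run the sorting workload here ... */
        long long after = read_energy_uj();
        printf("package energy: %lld uJ\n", after - before);
        return 0;
    }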


> Not if it does lots more memory accesses, which is not clear from the page.

They say it reduces comparisons, moves and temporary allocation. To me this sounds like fewer memory accesses, although they note that quicksort can use L1 cache more optimally.

> quicksort is still better after all the shenanigans you can apply

According to the benchmarks, this beats qsort and Timsort, as well as pdqsort, an optimized quicksort.

> this will likely destroy branch prediction

But this is branchless where it matters, which is confirmed by the benchmarks.
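
For anyone unfamiliar: "branchless" here means the comparison result feeds arithmetic rather than a conditional jump, so random data can't cause mispredictions in the hot loop. A minimal sketch of the general idea (illustrative, not quadsort's actual code):

    /* Merge two sorted runs; the comparison selects the source
       arithmetically, so the loop body has no data-dependent branch.
       Ties take the left element, preserving stability. */
    static void merge_branchless(const int *l, const int *l_end,
                                 const int *r, const int *r_end, int *out)
    {
        while (l < l_end && r < r_end)
        {
            int take_r = *r < *l;        /* 0 or 1 */
            *out++ = take_r ? *r : *l;   /* typically compiled to a cmov */
            l += !take_r;
            r += take_r;
        }
        while (l < l_end) *out++ = *l++;
        while (r < r_end) *out++ = *r++;
    }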

I do agree it would be interesting to measure the actual energy efficiency to see how well it matches the benchmark results.


Quadsort makes more comparisons than timsort, but it has far better branch prediction for random data and far less overhead under the hood.

While a switch might seem bad, in the case of random data quadsort turns 1.5 branch mispredictions into 0.84 branch mispredictions. My upcoming release of quadsort 1.1.5.4 will improve that further.

This is a difficult topic to comment on for someone who isn't 100% in the know, and I often make incorrect assumptions myself.

It would be interesting indeed to get the actual energy efficiency, but quadsort being 2-3x faster than timsort on random data pretty much guarantees that it is more efficient. I'm not aware of an easy method to measure energy consumption, but it would be a better metric in this day and age.

As for mergesort vs quicksort, this is pretty much a draw and it heavily depends on data type and comparison type. Fluxsort so far appears to confirm that a hybrid mergesort/quicksort is the best overall by taking advantage of the strengths of both algorithms.

Adoption is coming along slowly but steadily. Several people have started incorporating my branchless bidirectional merge and branchless stable partition concepts, among other things, as fluxsort contains quite a few novelties. My work on binary searching and array rotations could be even more important to reducing the energy footprint.


I'd be surprised if this is so good - it's a small instance of a sorting network [1], which has been analyzed to death for around 70 years. I first read about them in TAOCP decades ago, and have used pieces of them in algorithm design before. There's still progress in the field, but I don't see much in quadsort that isn't a direction beaten to death decades ago.

But if it is that good, it would be cool. It's always interesting to see stuff pushed to the edge. Maybe if I get time I'll build instrumentation to measure all the pieces and see what I find :)

>so in any instance where data is more likely to be orderly than disorderly this shift in probability will give an advantage

This is pretty hand-wavy - for example, the standard sorting network sorts 4 values in 5 compare-and-swaps, always (AB, CD, AC, BD, BC). Quadsort averages 5.17 over the 24 possible input orders, but with much worse locality and branch patterns (the 5 CAS can be done with zero branches for many data types, for example). I've not found good empirical evidence to tell whether, for the distribution of inputs that occurs in practice, quadsort is better even in expected number of compares.
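
For reference, that 5-comparator network, with each compare-and-swap written branch-free (a sketch for ints):

    /* The classic optimal network for 4 values: AB, CD, AC, BD, BC. */
    static void cas(int *a, int *b)
    {
        int lo = *a < *b ? *a : *b;  /* usually lowered to min/cmov */
        int hi = *a < *b ? *b : *a;
        *a = lo;
        *b = hi;
    }

    static void sort4(int v[4])
    {
        cas(&v[0], &v[1]);  /* AB */
        cas(&v[2], &v[3]);  /* CD */
        cas(&v[0], &v[2]);  /* AC */
        cas(&v[1], &v[3]);  /* BD */
        cas(&v[1], &v[2]);  /* BC */
    }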

And if quadsort is better, you can easily generate any number of n-sorters, using something like the Z3 theorem prover to find optimal code for just about any set of conditions you want to model. But all this stuff has been done forever, and in practice such results end up enough worse that the algorithms don't see widespread use or get published at all.

[1] https://en.wikipedia.org/wiki/Sorting_network

[2] https://ieeexplore.ieee.org/document/53587


The latest quadsort version sorts 8 elements at a time. It's not really a sorting network, and it wouldn't have worked very well prior to branch prediction, or having a multitude of registers.

So basically all the work that had been done for decades has become useless because it was carried out on different hardware. It's possible that 2 or 3 people wrote an algorithm similar to quadsort, and discarded it for being slow.

Anyways, quadsort is in fact exceptionally fast. I understand people doubt the benchmark and think it is rigged or something. I assure you it is not; plenty of people have checked and run the benchmark themselves.


>The latest quadsort version sorts 8 elements at a time. It's not really a sorting network,

That is precisely the definition of a sorting network. A sorting network is a fixed size sorter - that is all.

>it wouldn't have worked very well prior to branch prediction, or having a multitude of registers.

Both have been around for over 60 years. In fact, both were in production hardware before any of these sorts were invented. Do you just make stuff up?

>So basically all the work that had been done for decades has become useless

Your baffling dismissiveness of things you apparently have little or no understanding of makes me doubt you're doing decent analysis on your sort. This is not the sign of a careful, thoughtful algorithm designer.

> plenty of people have checked and run the benchmark themselves.

I don't doubt you have benchmarks where it's better - that's trivial to do for any sorting network idea; it's well-trodden theory. I posted above how to make data that falls on either side of the argument. Making a sort that performs well over a significant variety of real-world data without suddenly hitting bad downsides is the hard part. Making some datasets faster makes others slower when you're fiddling with the number of comparisons dynamically.

That you are unaware of the past and dismiss it out of hand is not a good way to ensure you're not fooling yourself.


Here you can see an older version of quadsort doing well in benchmarks that include comparison counts: https://www.raygard.net/2022/03/09/Re-engineering-a-qsort-pa...
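
Comparison counts are also easy to reproduce yourself with a counting comparator passed to any qsort-compatible sort (a generic sketch, not the linked benchmark's code):

    #include <stddef.h>
    #include <stdlib.h>

    static size_t cmp_count;

    /* Wraps an int comparison and counts how often it is called, e.g.
       cmp_count = 0; qsort(arr, n, sizeof *arr, counting_cmp); */
    static int counting_cmp(const void *a, const void *b)
    {
        int x = *(const int *)a, y = *(const int *)b;
        cmp_count++;
        return (x > y) - (x < y);
    }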


I agree one can make data for which quadsort has fewer compares. But one can also make data where it's worse. It's provably worse on average over the 24 possible orderings when they are uniformly weighted. When your benchmarks are weighted toward cases where it's better, you'll conclude it's better.

But that doesn't really cover what will happen in the real world. That distribution is what will matter.

As I pointed out, this line you're chasing was explored decades ago. I posted one paper along this line; the things it cites and the things citing it likely cover the method you're working on. There's significant literature on your method and lots of generalizations. I'd be surprised if somehow all those researchers missed this.


Thank you for these insights!

To measure the energy efficiency as simply as possible, I wonder if it could be sufficient to run a big-enough benchmark on a laptop and check the battery charge level before and after.


Are there any other examples of such hand-crafted algorithms that outperform the theoretically best algorithms in practice?


> WSL 2 gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)

Why not run the benchmarks in an environment that matters? The latest stable gcc is 12, and I wouldn't think anybody runs production code on WSL.


> I wouldn't think anybody runs production code on WSL.

I wouldn't be so sure - just a couple of days ago there was a question on the WSL subreddit about setting it up on Windows Server 2019/2022, and it's quite worrying to me that someone actually _DOES_ it.


> someone actually _DOES_ it.

Sure, but my point stands: It still doesn't matter as a production platform :)

    hncat 34568832 | sed "s/runs production code on WSL/cares about WSL as a production platform/g" 
if you will.

edit: https://github.com/plq/hncat :)


I gave it a quick shot on an Intel(R) Core(TM) i7-8665U, Ubuntu 22.04, gcc 11.3.0: https://gist.github.com/twirrim/f25fc35939c521a096d64c20eb34...


> ... I wouldn't think anybody runs production code on ...

Famous last words. There ought to be something like rule 34 for software: "If it can conceivably be used in that way, somebody's doing just that in production, somewhere safety-critical."



