
Wolfsort: An ultra-fast hybrid radix sort algorithm - scandum
https://github.com/scandum/wolfsort
======
HippoBaro
> Historically software was always low on memory, and subsequently sorting
> algorithms were developed that bent over backwards in order to use as little
> additional memory as possible. Nowadays most modern systems have several
> gigabytes of memory that are not used and are just sitting idle.

True, but on those systems, most of the time, users don't care about
performance. And when they do, the dataset usually isn't large enough for
algorithms of this kind to shine anyway.

~~~
nine_k
Also, memory is way slower than the CPU cache(s). So operating within a small
subset of the available memory is still beneficial if you need the top
performance.

~~~
gridlockd
There is often only a narrow range where (data + overhead) is larger than the
cache but (data + 0) still fits within it.

Cached reads/writes aren't free either, so if the overhead drastically reduces
them, you may come out on top 100% of the time.

------
CyberDildonics
> Wolfsort requires 8 times the array size in swap memory for the O(n)
> partitioning process.

> Wolfsort is primarily a proof of concept, currently it only properly
> supports unsigned 32 bit integers.

A lot of sorts can beat a generic sort on an array of numbers by gathering
statistics, paying better attention to locality, using implementation details
like SIMD, or some combination of the three.

~~~
cheez
Indeed, I wonder whether someone could write an ensemble algorithm of sorts,
where the algorithm is chosen by some AI (not to hand-wave).

~~~
pdq
See Hybrid Sorting algorithms. Here's Introsort as an example:

[https://en.wikipedia.org/wiki/Introsort](https://en.wikipedia.org/wiki/Introsort)

~~~
cheez
All of these algos have the characteristic that a human is statically deciding
on the algo/thresholds. I think there is an opportunity for some AI here,
that's all I'm saying.

~~~
zelly
Even bubble sort would return in the time it takes to send the array to the
GPU for sorting.

~~~
rainbow-road
I assume the idea is to do some profiling and throw in some machine learning
to select the perfect mix at compile time.

~~~
xyproto
This is doable, but the risk is that looking at the numbers might take longer
than just sorting them, even with a suboptimal algorithm.

I guess someone will just have to try this.

~~~
cheez
Here is what a human did:

It begins with quicksort, switches to heapsort when the recursion depth
exceeds a level based on (the logarithm of) the number of elements being
sorted, and switches to insertion sort when the number of elements is below
some threshold.

I think machines can do this too.

------
dthul
A bit OT: I very much appreciate well written C code for its simplicity and
"rawness", but how come many C codebases seem to place no importance on
code comments?

This repo contains a few thousand lines of code without so much as a single
comment. In my experience that tends to happen much less in other languages.
And it's not like this C code is obvious.

~~~
bane
C has an interesting culture of terse variable and function names, and limited
comments. It comes from an entirely different mindset and time period in
software. It predates most "modern" concepts of software re-usability that
eventually turned into enterprise OOP practices and were crystallized in Java
and C#.

It's not great anymore. The software world has moved on to understanding how
important these things are (and even often takes things too far), but C seems
a bit stuck in its ways.

~~~
vii
The value of comments and references depends on the software context. If
inexperienced people are expected to be editing or reading the code, they're
more valuable. However, they are expensive to keep in sync with changes to the
implementation, and the value has to be balanced against this cost.

Java and C# did not replace C. They're typically used for different tasks.

Objectively speaking, do we have evidence these enterprise OOP practices lead
to better software?

------
shin_lao
It seems to me that it's using a strategy similar to SpreadSort, or am I
wrong?

[https://en.wikipedia.org/wiki/Spreadsort](https://en.wikipedia.org/wiki/Spreadsort)

~~~
nedoka
Similar strategy. Wolfsort is stable, however, while SpreadSort does not
appear to be.

I wonder if wolfsort is stable and faster because it utilizes quadsort?

[https://github.com/scandum/quadsort](https://github.com/scandum/quadsort)

~~~
aciswhat
It does, according to Wolfsort's readme.

------
usr1106
> Wolfsort is primarily a proof of concept, currently it only properly
> supports unsigned 32 bit integers.

Which makes it pretty useless outside of theoretical exercises (which is not
to say that theoretical exercises are useless). What would be a practical
use case for sorting integers? Well, the integers can be keys to some kind of
records, but if your records remain unsorted, accessing them is very expensive
even if you have the sorted list of keys available.

Edit: Calculating the median is a practical case. Not sure how often that is
needed in performance-critical contexts.

~~~
zelly
All radix sorts only work for integers. This isn't unique to OP's algorithm.

~~~
kd5bjo
This is true in only the strictest sense, as computers only deal with
integers:

Any set of tuples/sequences can be lexicographically radix-sorted if its
components can be; strings are a special case, where the components are the
individual character codes.

Floating point numbers store the sign and exponent in the high-order bits; for
IEEE 754 values, the bit patterns of positive floats already order like
integers, and a small order-preserving bit transformation extends this to
negative values as well.

All radix sort requires is a recursively partitionable domain, and it performs
optimally when the partitioning scheme matches the data distribution.

