
Skylinesort - daniel-cussen
http://www.skylinesort.com
======
daniel-cussen
Skylinesort is a novel sorting algorithm that sorts integer keys. It
outperforms current sorting algorithms when keys are clustered (not uniformly
randomly distributed). I showed it to some profs at Stanford, and might
publish it formally.

It sorts in time O(ku - u log(u) + n), where k is the number of bits in each
key, u is the number of unique elements, and n is the total number of
elements.

~~~
lonelappde
Why make a website and a domain name before getting peer review to check for
mistakes?

~~~
daniel-cussen
I did a fair amount of that, but there was a lack of interest because it's
just a sorting algorithm. Sorting is considered dead as a field, it's
basically solved.

------
wjmao88
So I don't really understand this after reading the explanations.

1. You claim to have explained the height of the sticks "later". Maybe you
had it explained in one of the paragraphs, but I just don't see it.

2. In the diagram that is supposed to illustrate the left sweeping part, it
has fancy windows and doors but doesn't have anything to explain why the lines
are drawn that way. The home page animation also didn't give me any insight on
how that part is supposed to work.

~~~
cormacrelf
Stack height just visualises the repeated transition function for "next spot
to jump to". You don't have to compute the stick heights and do searches, just
execute p <- (p | (p + 1)) & MASK for a "right horizontal line". This
transition is more simply explained as "set the least significant 0 bit". The
important thing is that each successive jump is at least double the size of
the previous one, which intuitively gives you the log2 in the worst case
complexity. There is a paragraph on right skipping, but not left skipping.

For left skipping AKA "gather", what you're trying to do is log-skip across
contiguous zeroes in the auxiliary array. So you skip bigger and bigger
regions until you hit a nonzero spot. I'd have to think a bit more to prove
that you do actually hit the next number, but the visualisation of the
transitions should get you in the ballpark: the tall sticks tend to get
numbers written under them by the lesser sticks to their left, and when you
are skipping from the right, you tend to hit the tall sticks.

The explanation doesn't mention it, but the left-skips are defined as p <-
(((p+1) & p) - 1) & MASK. This is also simply explained: "clear any contiguous
1s pinned to the least significant end, then subtract 1 to create more 1s"
e.g.:

    1100 1100 (      =204)
    1100 1011 (-1,   =203)
    1100 0111 (-4,   =199)
    1011 1111 (-8,   =191)
    0111 1111 (-64,  =127)
    1111 1111 (-128, =255, wrapped)

So the "buildings" get wider as you go, until you hit another nonzero spot,
and you start drawing thin buildings again because you start from the element
to its left, whose binary representation ends with 11*0.

The bitwise ops are the key to understanding why you can draw the sticks at
those heights: for each index i, simply draw to height = 1 + the # of trailing
1s in i. Many trailing ones makes for bigger jumps in either direction. So the
wide buildings are also tall.

~~~
daniel-cussen
Thanks so much for this great, novel explanation.

~~~
cormacrelf
You have my permission to use it if you like!

------
bitexploder
Thanks for sharing this. Often sorting algorithms in practice end up being
glommed into a Swiss Army knife style sort function to avoid pathological
cases. E.g. for smaller inputs just use insertion sort, but swap to a
different sort if you know the size, etc. It's pretty common in standard
libraries (I think). So I am curious: are there any pathological cases for
skyline sort?

~~~
cormacrelf
Well, it requires a ton of (virtual) memory for [0, 1e9], so much you can't do
it on 32-bit machines. If you can figure out that a particular quicksort/etc
subproblem has a pretty small abs(max-min), then cool, go for it, but try to
reuse the allocations for other subproblems.

With this memory issue, it's not a general purpose sort, nor does it have
subproblems that you can defer to other sorts, so you wouldn't be able to use
it as a default in those "glomsorts" anyway. So plugging pathological gaps is
a bit irrelevant.

~~~
daniel-cussen
It definitely requires a ton of virtual memory, and yes, you can't do 32-bit
word sorting on 32-bit machines. There's actually a use case involving 24-bit
words where the number of occurrences of each word doesn't matter, and where
they necessarily start to cluster after each iteration. So there is a use case
(I might add it's tailor-made for this use-case). But you can still e.g. sort
database keys which might only have a 26-bit maximum, and it'll take advantage
of any clustering that exists in the data, like if in a log you get some
customers appearing more often than others, or more frequently in streaks. It
also works great when your data is in ranges and you don't want to manually
account for this: this will skip that entire empty range in logarithmic time.
(For example, all your keys are between 1000-1100 and 5000-6000. The empty
range 1101-4999 will be traversed in about 12 steps.)

------
AlEinstein
Is this cache friendly on a modern cpu? On the face of it, it seems to be
quite cache unfriendly.

~~~
daniel-cussen
I was surprised to find that it is, especially if the data is clustered. What
ends up happening is the elements near the top of the bracket get cached so
they can usually be accessed quickly.

Note that if your data is uniformly randomly distributed the caching will hurt
and it will be slower than quicksort, as this is quicksort's optimal use case.

------
alexandercrohde
How does this differ from a pigeonhole sort?

~~~
daniel-cussen
It is a type of pigeonhole sort. The benefit is you can use a much greater
number of pigeonholes because you don't have to check them all when collecting
your entries.

That's the benefit of skylinesort: you can skip huge ranges of empty
pigeonholes.

