The case for a learned sorting algorithm
72 points by cyb_ 9 months ago | 21 comments

 I remember learning as part of my CS degree that knowing the distribution of your data allows you to sort it in O(N log(log(N))) instead of O(N log(N)) (which assumes a comparison-based sort), although back then it wasn't given such a fashionable name as "learned sort", and determining the distribution wasn't called "training".
 If you know the distribution of your data (in the sense that you are guaranteed that each datapoint is iid with a distribution described by some known PDF f : R -> R), you can sort in O(N) with probability 1.
 That's almost never the case in practice. What distribution the data is drawn from on the other hand can often be known or approximated.
 What do you mean "is drawn from"? Isn't the previous condition a reasonable characterization of "you know what distribution it's drawn from"?
 Let's say you draw 1 million samples from a uniform distribution between 0 and 1. I'm saying that if you know the distribution you can sort in O(N log(log(N))). Are you claiming you can sort them in O(N)?
 Assuming "1 million" is a proxy for taking the limit to infinity, i.e. the usual asymptotic setting, and if you are willing to accept that the algorithm will "only" work with probability tending to 1, then yes, I can.

 Let N be the size of the array. Create N new empty lists, numbered from 0 to N-1. Divide [0, 1) into N equal brackets B0 = [0, 1/N), B1 = [1/N, 2/N), ..., BN-1 = [(N-1)/N, 1). For each element, determine the bracket it falls in and append it to the corresponding list. Sort each list and concatenate. Because of the uniformity hypothesis, each bracket contains O(1) elements in expectation, and as N -> infinity the total work across brackets concentrates around its mean, so the running time is O(N) with probability tending to 1.
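The scheme described above can be sketched in a few lines (a minimal illustration assuming the input is uniform on [0, 1); the function name is mine, not from the thread):

```python
import random

def bucket_sort_uniform(xs):
    """Bucket sort for data assumed uniform on [0, 1).

    With N buckets, each bucket receives one element in expectation,
    so the total expected work is O(N)."""
    n = len(xs)
    buckets = [[] for _ in range(n)]
    for x in xs:
        # Bracket B_i = [i/n, (i+1)/n); min() guards the edge case x == 1.0.
        buckets[min(int(x * n), n - 1)].append(x)
    out = []
    for b in buckets:
        b.sort()  # tiny lists: O(1) expected work each
        out.extend(b)
    return out

data = [random.random() for _ in range(100_000)]
assert bucket_sort_uniform(data) == sorted(data)
```

Note the result is correct for any input in [0, 1), not just uniform data, because the bucket index is monotone in x; only the O(N) running time depends on the uniformity hypothesis.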
 I'd really like to read up on this. Do you have a source?
 It's really dead simple, look at my other comment in the same thread. The only caveat is that all those fancy words before are actually necessary minimal hypotheses, not just fancy words.
 >The results show that our approach yields an average 3.38x performance improvement over C++ STL sort, which is an optimized Quicksort hybrid, 1.49x improvement over sequential Radix Sort, and 5.54x improvement over a C++ implementation of Timsort, which is the default sorting function for Java and Python.

 This is a lot
 Would be curious: does this performance result include training time? I guess I should read the paper...
 The article says it does.
 Can this approach be made more robust so it could be used in critical programs? Is there a way to formally verify performance of an approach like this?
 Is the paper "NN-sort: Neural Network based Data Distribution-aware Sorting" [1] not a year prior to the cited paper? It seems to discuss the same approach.[1] NN-sort: Neural Network based Data Distribution-aware Sorting: https://arxiv.org/pdf/1907.08817.pdf
 The paper in this post doesn't use neural networks at all. It also includes the time to train the model in the sorting time, which the paper you linked doesn't. NN-sort trains on historic data to sort future data, which requires the distribution to stay roughly the same, whereas this algorithm learns online.
 This paper does not look like a strong paper. In particular, it's very terse when describing how the neural network is trained, which fails to convince that they did not train the model on the already-sorted input. They are also not clear on their timings; it's not clear whether they include the training time in their benchmarks. For these reasons, it does not look very reproducible, and there's no obligation to cite non-reproducible work.
 This one doesn’t use neural nets.
 That paper has not been published. You can't claim a paper is prior just because a draft was available earlier. Incidentally, it was sent to the same conference as the OP paper, but it seems it wasn't accepted, while the OP paper was.
 this is not really the norm in computer science. it’s generally expected to cite things on arXiv although it is more of a grey area.
 Well they did right by not citing it, as it was rejected for publication.
 that’s not how things work. lots of well-regarded papers are initially rejected in peer review but are commonly discussed and cited while in the submission process at other venues.
 tl;dr: fit a linear spline to approximate the CDF, then do a recursive bucket sort similar to radix sort.
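To make that summary concrete, here is a minimal toy sketch of the idea (my own code, not the paper's implementation, which recurses and handles model mispredictions): fit a piecewise-linear approximation of the CDF from a small sample, then scatter elements into buckets by predicted rank and sort each bucket.

```python
import bisect
import random

def learned_sort(xs, num_knots=64):
    """Toy CDF-model bucketing: fit a piecewise-linear empirical CDF
    from a sample, scatter elements into buckets by predicted rank,
    then sort each bucket. A sketch only, not the paper's algorithm."""
    n = len(xs)
    if n <= 1:
        return list(xs)
    # "Training": take a sample and use its quantiles as spline knots.
    sample = sorted(random.sample(xs, min(n, 10 * num_knots)))
    knots = [sample[i * (len(sample) - 1) // (num_knots - 1)]
             for i in range(num_knots)]
    num_buckets = max(1, n // 16)
    buckets = [[] for _ in range(num_buckets)]
    for x in xs:
        # Piecewise-linear CDF estimate in [0, 1], monotone in x.
        j = bisect.bisect_right(knots, x)
        if j == 0:
            cdf = 0.0
        elif j == len(knots):
            cdf = 1.0
        else:
            a, b = knots[j - 1], knots[j]
            frac = (x - a) / (b - a) if b > a else 0.0
            cdf = (j - 1 + frac) / (num_knots - 1)
        buckets[min(int(cdf * num_buckets), num_buckets - 1)].append(x)
    out = []
    for b in buckets:
        b.sort()  # roughly equal-population buckets if the model fits
        out.extend(b)
    return out
```

Because the CDF estimate is monotone in x, the output is correctly sorted even when the model is a poor fit; model quality only affects how evenly elements spread across buckets, and hence the running time.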
