Wait, so the authors' method is only a tiny fraction faster than the parallel version of the more general Rao-Sandelius method, and uses more random bits? But they claim it's "very fast" and "extremely fast"? Am I missing something?
Yeah, the experimental results are pretty poorly presented, but on the other hand the abstract says: "While this algorithm is of lesser practical interest, we believe it may be of theoretical value."
The number of random bits used by both algorithms is within a few percent of the theoretically optimal value (4.6% more for Rao-Sandelius and 5.5% more for MergeShuffle), so I don't think that's a significant concern.
Does anyone know of a practical use for a very fast, parallelizable shuffle algorithm that uses few random bits? All the shuffling I've done has used small enough N that Fisher-Yates was just fine.
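For anyone who hasn't seen it, Fisher-Yates is just a single backwards pass with one uniform draw per position, which is why it's the default for small N. A minimal Python sketch (using the stdlib `random` module, not a crypto-quality source):

```python
import random

def fisher_yates(items):
    """In-place style Fisher-Yates shuffle on a copy.

    Each of the n! permutations is equally likely, assuming
    randrange() draws uniformly.
    """
    a = list(items)
    for i in range(len(a) - 1, 0, -1):
        j = random.randrange(i + 1)  # uniform index in [0, i]
        a[i], a[j] = a[j], a[i]     # swap current position with the draw
    return a
```

Note it's inherently sequential (each swap depends on the previous state), which is exactly what the parallel algorithms in the paper are trying to get around.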
Technically, the lower bound is log2(n!), which by Stirling's approximation is about n * log2(n) - 1.44n — slightly less than n * log2(n).
According to the table of experimental results, the algorithm described in this paper and two of its competitors can all do better than n * log2(n) bits.
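A quick sanity check of that bound in Python — summing log2(k) gives log2(n!) exactly, and you can see it come out below n * log2(n):

```python
import math

def shuffle_bits_lower_bound(n):
    # log2(n!) = sum of log2(k) for k = 2..n: the information-theoretic
    # minimum number of random bits to pick one of n! permutations uniformly
    return sum(math.log2(k) for k in range(2, n + 1))

n = 10_000
lb = shuffle_bits_lower_bound(n)
naive = n * math.log2(n)
# lb is strictly below naive, with the gap growing roughly like 1.44 * n
```

So "beating n * log2(n) bits" isn't beating the actual lower bound, just the bound you'd get from drawing a full log2(n)-bit index per element.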
Just in case the authors are reading the comments: the second paragraph of section 1.1 ends mid-sentence and, even worse, is missing its closing parenthesis!