
Good Knuth, Bad Knuth - vinutheraj
http://dev.netcetera.org/blog/2007/08/24/good-knuth-bad-knuth/
======
fhars
As far as I can tell, the "correct" shuffle in that post is wrong: int(rand()
* $i) is always smaller than $i, so it will never generate a permutation where
an element stays at the same position.

It is obvious from the numbers, too: the loop goes from $length - 1 down to 1,
so it has $length - 1 iterations. In the first iteration there are $length - 1
possible values for $r, in the second $length - 2, and the last iteration
always swaps elements 0 and 1, for a total of (N-1)! different paths through
the program, which is clearly not enough to generate N! different
permutations.
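For a small list this is easy to check by brute force. Here is a sketch in Python (my own illustration, not from the post) that enumerates every possible sequence of values $r can take for a 3-element list:

```python
from itertools import product

def broken_shuffle(a, choices):
    # Mimics the post's loop: i runs from len(a)-1 down to 1,
    # and r = int(rand() * $i) can only be 0 .. i-1.
    a = list(a)
    for i, r in zip(range(len(a) - 1, 0, -1), choices):
        a[i], a[r] = a[r], a[i]
    return tuple(a)

n = 3
# One choice tuple per path: (n-1) * (n-2) * ... * 1 = (n-1)! paths.
paths = product(*(range(i) for i in range(n - 1, 0, -1)))
perms = {broken_shuffle(range(n), p) for p in paths}
print(len(perms))  # 2 = (3-1)!, far short of 3! = 6
```

Only (n-1)! distinct permutations are reachable, exactly as the counting argument says.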

[Edit: I had an off-by-one in my calculations, too. I hope it is correct now.]

[Edit: Actually, there still is an error in the first sentence: the algorithm
will never leave the current element in its place, so the only element that is
guaranteed not to stay at the same position is the last one. Run this:

    for my $i (1..1000000) {
        my @a = (1..10);
        my @b = shuffle(@a);
        if ($b[10] == 10) {
            die "Criticism is wrong $i";
        }
    }
    ]

~~~
fhars
After a good night's sleep I see that my second edit is in fact wrong and my
intuition in the first sentence was right (and the code in my edit fell victim
to the same off-by-one as the code in the article).

Final proof:

    for my $i (1..1000000) {
        my @a = (0..10);
        my @b = shuffle(@a);
        for my $j (0..10) {
            if ($b[$j] == $j) {
                die "Criticism is wrong $i";
            }
        }
    }
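The same check translates directly to Python (my own sketch; `broken_shuffle` reimplements the off-by-one loop under discussion):

```python
import random

def broken_shuffle(a):
    # The shuffle under discussion: r = int(rand() * $i), so r < i
    # and position i can never keep the element it currently holds.
    a = list(a)
    for i in range(len(a) - 1, 0, -1):
        r = random.randrange(i)  # 0 .. i-1, never i itself
        a[i], a[r] = a[r], a[i]
    return a

for trial in range(100000):
    b = broken_shuffle(range(11))
    assert all(b[j] != j for j in range(11)), "Criticism is wrong"
print("no element ever stayed in its original position")
```

(What the loop actually implements is Sattolo's algorithm, which only ever produces full cycles, so no element can end up as a fixed point.)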

------
astine
I can honestly say that that isn't the algorithm that would have occurred to
me. In CL, this is what I would have thought to do:

    (defun butnth (index list)
      (append (subseq list 0 index)
              (subseq list (1+ index))))

    (defun shuffle (list)
      (unless (null list)
        (let ((index (random (list-length list))))
          (cons (nth index list)
                (shuffle (butnth index list))))))

Building a second sequence by randomly selecting nodes from the old sequence
and inserting them into the new sequence seems more intuitive to me. It
certainly works better with lists. If I were working with arrays, the OP's
method would be more efficient.

~~~
crux_
Won't that explode somewhat performance-wise once you have more than a handful
of cards? It looks like something along the lines of O(n^2)...

~~~
astine
As it's written I'll blow the call stack before I see a performance decrease.
:)

Actually, the algorithm is O(n); it's essentially a list reversal with a call
to random inserted. In pseudocode:

    oldlist = [...]
    newlist = []

    while oldlist is not empty:
        x = remove a random element from oldlist
        insert x into newlist

Of course, I'm ignoring the complexity of 'butnth' in that assessment, but
this was just a 'first instinct' sort of deal.
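That pseudocode can be made runnable as a Python sketch (my own translation; `pop` at a random index plays the role of `butnth`):

```python
import random

def shuffle_by_selection(old):
    # Repeatedly remove a random element from the old list
    # and append it to the new one.
    old = list(old)
    new = []
    while old:
        new.append(old.pop(random.randrange(len(old))))
    return new

print(shuffle_by_selection(range(10)))
```

Note that `pop` at an arbitrary index is itself O(n) on an array-backed list, which is the hidden cost behind crux_'s O(n^2) estimate.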

~~~
kmcgivney
I didn't like the butnth. I get an allergic reaction to append. Blame Ken
Tilton.

    (defun shuffle-in-place (l)
      (cond ((null l) nil)
            (t (rotatef (nth 0 l)
                        (nth (random (length l)) l))
               (shuffle-in-place (rest l))))
      l)

    (defun shuffle (l)
      (shuffle-in-place (copy-list l)))

~~~
astine
_Blame Ken Tilton._

I do. Always.

One could use nconc instead of append, but that's not where the processor
cycles are being lost as far as I can tell.

------
rimantas
Some nice graphs: <http://www.codinghorror.com/blog/archives/001015.html>

------
jrockway
The best way to get this right in Perl:

    use List::Util qw(shuffle);
    my @shuffled = shuffle(@list);

~~~
astine
But that misses the point of the exercise. :)

------
songism
This is how Python's shuffle works:

    from random import random

    def shuffle(x):
        for i in reversed(xrange(1, len(x))):
            j = int(random() * (i+1))
            x[i], x[j] = x[j], x[i]

In the first pass the last element in the list 'x' has a chance to be
exchanged with itself or any of the other elements. Then the second-to-last
element has a chance to be exchanged with itself or any of the elements before
it.

And so on, all the way down to the second element, which has a chance to be
exchanged with itself or the first element.

This is exactly how the good Knuth shuffle algorithm works, right?

------
allenbrunson
I just happen to have implemented something like this myself recently, in a
way that seems easier to me. Here it is: generate a large random number for
each card, then sort the list according to the random numbers. Can anybody
here poke holes in that one?
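For concreteness, here is what that approach looks like as a Python sketch (my own rendering of the description, not allenbrunson's actual code):

```python
import random

def shuffle_by_sort(items):
    # Tag each item with a random key, sort by the keys, drop the keys.
    keyed = [(random.random(), x) for x in items]
    keyed.sort(key=lambda pair: pair[0])
    return [x for _, x in keyed]

deck = list(range(52))
shuffled = shuffle_by_sort(deck)
print(sorted(shuffled) == deck)  # still the same 52 cards
```

With distinct keys every ordering is equally likely; the replies below focus on a subtler limit, the PRNG's state space.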

~~~
fhars
It's O(N * log(N)) instead of O(N).

And of course it has the usual problem of generating permutations with a
pseudo-random number generator: even for moderately sized collections, the
number of permutations is far larger than the number of different
initialization states of many common PRNGs, so even if an algorithm looks
correct, it may not be able to generate all permutations. For example, a
63-bit linear congruential generator has 2^63 ≈ 9x10^18 possible sequences,
while there are about 52! ≈ 8x10^67 permutations of 52 cards.

~~~
allenbrunson
i'm not really worried about the complexity. it's plenty fast enough for my
needs. but i _am_ worried about getting decent results. i suck at anything
that smells like math, but i think you're saying that the distribution of
random numbers might not be random enough to get decent results, right?

is there some random number generator in a C-like language i should
investigate, as opposed to using srand() and rand()?

~~~
jrp
You probably have something like 2^32 ways to seed your RNG. But there are N!
possible outcomes of the shuffle, so at most 2^32 of those will appear (one
per seed).

2^32 = 4294967296

100! is about 10^158

So for just 100 items, you're already missing most of your outcome space. You
might switch to a generator with a bigger seed. For this example with 100
items, you need at least

log(100 !) / log(2) = 524.764993

bits in your seed. For N items to be shuffled, you need about N log N bits in
your seed. So if you're trying to make every outcome possible, then no matter
which clever algorithm you use to go from random numbers to a shuffled
sequence, you've already lost.
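Those figures are easy to reproduce (a quick check in Python; `seed_bits` is just a name I made up):

```python
import math

def seed_bits(n):
    # Bits needed to address all n! permutations: log2(n!)
    return math.log2(math.factorial(n))

print(seed_bits(100))  # about 524.76, matching the figure above
print(seed_bits(52))   # even a 52-card deck needs about 226 bits
```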

~~~
allenbrunson
hrm. never thought of it that way before. but assuming I seed the prng only
once at the beginning of a program run, i could hit every possible shuffle on
subsequent deals, couldn't I?

~~~
fhars
No, you couldn't; that's what this subthread is about. A PRNG with 32 bits of
entropy will only ever produce 2^32 different sequences of random numbers, and
so only 2^32 different shuffles. If you want more, you must introduce more
entropy, either by using a larger PRNG or by using a hardware RNG. Calling the
same PRNG from different threads (don't forget your synchronization
primitives!) might also work as a cheap source of entropy, but will likely
have either bad statistical properties or horrible performance: for it to make
a real difference, every single shuffle of a deck must be interspersed with
many calls to the PRNG from other threads. If calls from other threads only
happen between shuffles, all you gain from introducing shared-state
concurrency into your program is that the same 2^32 permutations are generated
in a different order.

But, as I remarked earlier, if this is just for a game played for fun, where
opponents can't cheat you or other players out of large amounts of money by
predicting which permutations may or may not occur, you may get away with
whatever PRNG you have at hand, and only swap it out once many players start
to complain that they had this very hand one or two billion games ago.

------
thisrod
I'm surprised that the median job candidate gets that wrong. It isn't an
especially hard question, and I would have assumed professional programmers
had a pretty good instinct for combinatorics.

------
rflrob
So presumably if the naive interpretation is wrong, then the bias should be
predictable. Any ideas how to go about determining which permutations are more
likely than they ought to be?
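One way to find out is to enumerate every path through the naive version (each position swapped with an index drawn from the whole array) and count the outcomes for a tiny list. A sketch in Python (my own, assuming that naive version is the one meant):

```python
from itertools import product
from collections import Counter

def naive_shuffle(a, choices):
    # The common wrong version: position i is swapped with a random
    # position j over the WHOLE array, giving n^n equally likely
    # paths -- which cannot divide evenly among n! outcomes.
    a = list(a)
    for i, j in enumerate(choices):
        a[i], a[j] = a[j], a[i]
    return tuple(a)

n = 3
counts = Counter(naive_shuffle(range(n), c)
                 for c in product(range(n), repeat=n))
for perm, k in sorted(counts.items()):
    print(perm, k)
# 27 paths over 6 permutations: three outcomes occur 4 times, three occur
# 5 times, so (0, 2, 1), (1, 0, 2) and (1, 2, 0) are overrepresented.
```

For n > 2, n^n is never a multiple of n!, so some bias of this kind is unavoidable no matter how the paths land.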

------
Tichy
It annoyed me when I heard that some guys slapped their name on that algorithm
(having "invented" it independently for a game). It seems like another of
those not-patent-worthy obvious things. Why is it obvious? Because you should
realise that there is a random number generator in your programming language,
so simulating a physical shuffle is silly: you want to assign the cards
directly to slots determined by the random number generator. From there on it
is pretty straightforward. </brag>

~~~
scott_s
<http://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle>

~~~
Tichy
Yes? That is what I described - I left the swapping part as an exercise to the
reader (since it is already described in the article submission). It seems to
be the only logical way (I have not tried to think of another way, though -
but none has been mentioned yet on HN, either).

~~~
scott_s
They didn't "slap their name on it." Other people refer to it as their
algorithm because they came up with it first. It's obvious to you that someone
would want an algorithm for a random shuffle of a sequence. It wasn't obvious
at the time.

~~~
Tichy
What I was trying to say is that the algorithm is the obvious consequence of
wanting to assign a sequence to random places in an array. I wanted it because
I was programming a game. I think the same must have happened to many people.
Anyway, no biggie - of course the world doesn't end because now the algorithm
has a name.

~~~
scott_s
And what you're missing is that they realized people might want to randomly
shuffle a sequence before there were obvious applications for it.

