
Unbiased Randomization with the Fisher-Yates Shuffle - numo16
http://spin.atomicobject.com/2014/08/11/fisher-yates-shuffle-randomization-algorithm/
======
TheLoneWolfling
Fisher-Yates is only unbiased when it is driven by a "true" RNG, as opposed
to a PRNG with a less-than-absurd seed length.

In practice, say you're shuffling a 52-card deck - that's 52! orderings. Even
if you have a 128-bit random seed, that's still nowhere near enough. By the
pigeonhole principle, you need a seed of at least ceil(log2(52!)) = 226 bits
to cover all orderings, and that assumes every seed produces a unique
sequence of numbers.
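
To check the arithmetic above, here is a minimal sketch in Python. The helper
name seed_bits_needed is made up for illustration, and it only counts
orderings; it says nothing about how good any particular PRNG is.

```python
from math import factorial

def seed_bits_needed(cards):
    """Smallest bit count b such that 2**b >= cards! (exact integer math)."""
    # (m - 1).bit_length() is the number of bits needed to label m distinct values.
    return (factorial(cards) - 1).bit_length()

bits = seed_bits_needed(52)
print(bits)                      # seed bits needed to label every 52-card ordering
print(2**128 < factorial(52))    # True: 128 bits cannot reach every ordering
```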

And it's even worse than that. Say you have a 226 bit seed. Well, you _still_
won't have a uniform distribution. Why? Because 52! does not evenly divide
2^226. Even assuming a "perfect" PRNG (again: where every seed produces a
unique sequence of numbers), you'll end up hitting some deck orderings less
often than others. In this case 2^226 / 52! is about 1.34, so some deck
orderings are hit once and others are hit twice - which can make a big
difference! You can mitigate this by either discarding the seed and reseeding
your PRNG whenever the seed is >= 52!, or just choosing a bit length long
enough that the effect becomes minimal.
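
A rough way to see the size of the effect, under the idealized assumption
that seed s simply selects ordering number s mod 52! (that mapping is an
assumption made for illustration, not how any real PRNG works):

```python
from math import factorial

orderings = factorial(52)
seeds = 2**226

# full_passes: how many times every ordering is covered.
# leftover: how many orderings get one extra hit on top of that.
full_passes, leftover = divmod(seeds, orderings)

# Fraction of orderings that are reached by two seeds instead of one.
doubled_fraction = leftover / orderings
print(full_passes, round(doubled_fraction, 3))
```

With rejection (throwing away any seed >= 52! and drawing again), every
ordering would be reached by exactly one accepted seed.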

It's like trying to pick a number from 0 through 44 with a 100 sided die
(faces numbered 0 through 99) - if you just take the number rolled mod 45,
you'll end up with 0 through 9 picked more often than they should be, because
the faces 90 through 99 wrap around onto them. Instead, you have to roll until
you get a number between 0 and 89, inclusive, then take mod 45.
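
The die example can be checked by enumerating every face. This is a small
sketch assuming the faces are numbered 0 through 99; the names SIDES, TARGET
and roll_unbiased are made up for illustration.

```python
import random
from collections import Counter

SIDES = 100    # assumed: die faces are numbered 0..99
TARGET = 45    # we want a uniform value in 0..44

# Count how many faces map to each result if we just take face % TARGET.
naive = Counter(face % TARGET for face in range(SIDES))

# Largest multiple of TARGET that fits on the die (90 here).
limit = SIDES - SIDES % TARGET

# Same count, but only over the faces we would accept (0..89).
fair = Counter(face % TARGET for face in range(limit))

def roll_unbiased(rng=random):
    """Reroll until the face is below `limit`, then reduce mod TARGET."""
    while True:
        face = rng.randrange(SIDES)
        if face < limit:
            return face % TARGET
```

In the naive count, results 0 through 9 are each backed by three faces and
the rest by two; after rejection every result is backed by exactly two.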

(Oh, and as for the response of "just pick a larger seed", say you have a card
game played with 4 decks (for example Shanghai rum with a bunch of players).
Then you need a seed length of >=1307 bits. I cannot think of any PRNGs with a
seed that large.)

~~~
twisted1827
Python's Mersenne Twister was first seeded with 32 bytes, changed to 2500
bytes here:

[http://hg.python.org/cpython/rev/7b5265752942](http://hg.python.org/cpython/rev/7b5265752942)

~~~
TheLoneWolfling
/dev/urandom on Linux only has an internal state of 1024 bits [1]

As Python's Mersenne Twister is seeded with urandom...

[1] [https://pthree.org/2014/07/21/the-linux-random-number-generator/](https://pthree.org/2014/07/21/the-linux-random-number-generator/)

------
nobodysfool2
If you are using Python, you should use the standard 'random.shuffle' which
does use the Fisher-Yates algorithm.
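
For reference, a minimal usage sketch. random.shuffle works in place and uses
the module's Mersenne Twister generator; random.SystemRandom is a standard
library alternative that draws from the OS entropy source instead. Whether
that difference matters for your use case is a separate question.

```python
import random

deck = list(range(52))

# In-place shuffle using the default (Mersenne Twister) generator.
random.shuffle(deck)

# Same call, but backed by the operating system's randomness source.
os_deck = list(range(52))
random.SystemRandom().shuffle(os_deck)
```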

~~~
silentbicycle
There's value in knowing how it's implemented.
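
For anyone who wants to see it spelled out, here is a minimal sketch of the
usual in-place variant. It is an illustration, not a copy of the article's
code or of CPython's implementation, and the function name is made up.

```python
import random

def fisher_yates_shuffle(items, rng=random):
    """Shuffle `items` in place, walking from the last index down to 1."""
    for i in range(len(items) - 1, 0, -1):
        # Pick j uniformly from 0..i (randint includes both endpoints).
        j = rng.randint(0, i)
        items[i], items[j] = items[j], items[i]

deck = list(range(52))
fisher_yates_shuffle(deck, random.Random(2014))  # seeded only for repeatability
```

The key detail is that j is drawn from the not-yet-fixed part of the list
(0..i), not from the whole list.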

~~~
taeric
Further, there is value in knowing what makes the incorrect ways incorrect. :)

Also, even the article notes that you should use a library if your language
provides one.
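
One common incorrect way is to swap every position with an index drawn from
the whole list. A sketch (my own illustration, with a made-up function name)
that enumerates every possible sequence of draws for a tiny list shows the
problem: there are n^n equally likely draw sequences but only n! orderings,
and for n = 3 that is 27 sequences over 6 orderings, which cannot divide
evenly.

```python
from collections import Counter
from itertools import product

def naive_shuffle_outcomes(n):
    """Count how many draw sequences lead to each ordering for the naive shuffle."""
    counts = Counter()
    # Every position i swaps with some j in 0..n-1, so there are n**n sequences.
    for draws in product(range(n), repeat=n):
        items = list(range(n))
        for i, j in enumerate(draws):
            items[i], items[j] = items[j], items[i]
        counts[tuple(items)] += 1
    return counts

counts = naive_shuffle_outcomes(3)
for ordering, hits in sorted(counts.items()):
    print(ordering, hits)
```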

------
kiyoto
On second read, I realized that OP didn't even prove that Fisher-Yates is
unbiased. I blogged about this several weeks back, and here is an excerpt
from it. Some of you might find it useful.

"This is easy to verify for N = 2: You are just flipping a coin to decide if
you swap 1 with 2. For N > 2, you just need to show that each of 1 through N
has an equal chance of landing in the k-th slot, for every k from 1 through N.
In the first step, every number has a 1/N probability of getting into the
first slot. For any other slot: the number has a (1-1/N) chance of not being
picked for the first slot, and then, by induction on the remaining N-1
numbers, a 1/(N-1) chance of landing in that slot, hence
(1-1/N)*1/(N-1) = 1/N. This completes the proof."
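
A slightly stronger property than per-slot uniformity is that every full
ordering is equally likely. For small N that can be checked exhaustively. The
sketch below (my own, with a made-up function name) enumerates every possible
sequence of draws of the forward variant, where position i swaps with some j
in i..N-1, and counts the resulting orderings.

```python
from collections import Counter
from itertools import product
from math import factorial

def fisher_yates_outcomes(n):
    """Count how many draw sequences lead to each ordering for Fisher-Yates."""
    counts = Counter()
    # Position i may swap with any j in i..n-1; the last position has no choice.
    choices_per_step = [range(i, n) for i in range(n - 1)]
    for draws in product(*choices_per_step):
        items = list(range(n))
        for i, j in enumerate(draws):
            items[i], items[j] = items[j], items[i]
        counts[tuple(items)] += 1
    return counts

counts = fisher_yates_outcomes(4)
print(len(counts) == factorial(4), set(counts.values()))
```

There are N! equally likely draw sequences and each one produces a different
ordering, so with a uniform source of draws every ordering has probability
1/N!.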

------
prakashk
List::Util [1], part of the Perl 5 standard library, offers a shuffle
function, which implements Fisher-Yates [2].

[1]: [https://metacpan.org/pod/List::Util#values-shuffle-values](https://metacpan.org/pod/List::Util#values-shuffle-values)

[2]: [http://www.perlmonks.org/?node_id=1869](http://www.perlmonks.org/?node_id=1869)

