

How to shuffle an array correctly - jwilliams
http://adrianquark.blogspot.com/2008/09/how-to-shuffle-array-correctly.html

======
jacobscott
Shuffling a deck of cards is a very simple problem in randomized algorithms
and something that anyone who took a course in the area should know how to do
(maybe not the fastest way, or with a full proof of correctness, but still).

Here's an easy way that mirrors selection sort:

    for i = 0 to a.length-1
        x <- random integer between i and a.length-1 (inclusive)
        swap(a[i], a[x])

Induction gives a proof of correctness.

[note: as sown points out this is known as the Knuth shuffle
<http://en.wikipedia.org/wiki/Knuth_shuffle>]
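
For anyone who wants to run it, here is a minimal Python sketch of the loop above. The function name `knuth_shuffle` and the optional `rng` parameter are my own choices for illustration, not anything from the article.

```python
import random

def knuth_shuffle(a, rng=random):
    """Shuffle the list `a` in place and return it.

    `rng` is any object with a randint(lo, hi) method that is inclusive
    on both ends, e.g. the random module or a random.Random instance.
    """
    n = len(a)
    for i in range(n):
        # Pick from the not-yet-fixed tail i..n-1; position i may stay put.
        x = rng.randint(i, n - 1)
        a[i], a[x] = a[x], a[i]
    return a

deck = list(range(10))
knuth_shuffle(deck, random.Random(2008))  # seeded only to make runs repeatable
print(deck)
```

The last iteration always swaps the final element with itself, so stopping one step earlier would give the same distribution.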

------
sown
<http://en.wikipedia.org/wiki/Knuth_shuffle>

Neat!

------
ars
Maybe someone knows:

Is there any way to generate a list of numbers in random order, containing each
number exactly once?

i.e. use a random number generator to generate a list of numbers from 1 to x,
containing each number between 1 and x exactly once, but in random order?

I originally wanted it for a hard disk burn-in test: read every single block
on the hard disk in random order, with a guarantee that each and every block
is read exactly once.

Storing the entire array in memory first would use far too much memory.

(It does not need to be cryptographically secure at all, just random.)

Any ideas?

~~~
lincolnq
I think number theory might help. I don't know much about it, but the way most
random number generators work (as I understand it) is that they keep a big
number as internal state, and that state completely traverses some large space
(2 to some large power), hitting every value once, while you only observe a
few bits of the state at a time.

That leads me to think that if you used a small state space (close to size x -
maybe the next prime greater than x) you could walk the whole space and hit
every number once.

Example: take two primes p and q (I think they might only have to be
relatively prime), where q >= x. I'll use x=6, p=19 and q=7.

Observe multiples of p: 0,19,38,57,76,95,114

Mod them by q: 0,5,3,1,6,4,2,0...

Take the first q results, but drop the ones >= x: 0,5,3,1,4,2

If you want a different series, just use a different p. You can skip any
number of initial results if you want to start somewhere later in the state
space.

I guess it's worth noting that selection of p and q makes a difference to how
"random" it looks -- if p and q are only 1 apart, it will just count up from
0, etc. I don't have a good intuition for what values produce "random"-looking
results.

And yeah, this method is definitely not cryptographically secure in any way.

(Any number theorists around to tell me this is a stupid idea?)

Edit: this isn't that great, because the pattern is obvious (19 == -2 mod 7)
so it's equivalent to subtracting 2 each time. Maybe if you pick a q three
times the size of your x (15) and still drop the ones greater than x, you
could get it to look more "random". Hmm.
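
A rough Python sketch of the walk described in this comment, in case it helps. The name `stepped_order` is made up for illustration, and the check that p and q are relatively prime is my reading of the "might only have to be relatively prime" remark. It keeps only one integer of state, so nothing the size of the disk has to sit in memory.

```python
from math import gcd

def stepped_order(x, p, q):
    """Yield each of 0..x-1 exactly once by walking multiples of p mod q."""
    if q < x or gcd(p, q) != 1:
        raise ValueError("need q >= x and gcd(p, q) == 1")
    state = 0
    for _ in range(q):
        if state < x:  # drop values that fall outside 0..x-1
            yield state
        state = (state + p) % q

print(list(stepped_order(6, 19, 7)))  # [0, 5, 3, 1, 4, 2]
```

As the edit above says, the output is just a fixed stride around a circle, so this covers every block once but does not look very random.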

------
aston
The moral of this story is: Amazon asks candidates some poor questions. Very
difficult sort of a problem to solve on your feet, especially since "random"
and "stochastic" are so related in people's minds.

~~~
Tichy
I had to come up with such an algorithm a while ago and don't remember it
being very difficult. Later I learned that "my" algorithm is officially
acknowledged; there was an HN post about it a while ago (some other guys stole
my idea and slapped their name on the algorithm, damn it).

Of course my Java-Dev-friend only answered "just use the shuffle method of the
Array collection".

Besides, I'd rather work for a company that asks questions that are too
difficult than ones that are too easy.

~~~
aston
If you get it right the first time, you're probably just lucky. When was the
last time you had to formulate an inductive proof at a career fair? And why
should you be expected to? If you already know about the Knuth shuffle and you
get it right, the question wasn't worth asking in the first place.

~~~
Tichy
No I wasn't lucky. I briefly considered various alternatives and settled for
the correct one. What made me select it was the thought "wait a minute, there
already is a random function in my programming language, why not use it". I
actually find it hard to think of a suitable alternative for that algorithm.

Also, I don't think the question was for an inductive proof.

To me the question seems exactly right, not too easy and not too hard.

~~~
aston
Maybe I'm not being clear. The difference between the right algorithm and the
wrong one here is whether or not items that have already been randomly swapped
can be swapped again. Simply performing n random selections on the list is
wrong, and the easiest way to see why it doesn't work is harder math than you
probably want to ask people for. Meanwhile, proving the other answer correct
does require an inductive argument (pretty similar to the one showing that
selection sort works).

If your intuition led you to the right answer, that's cool, but for those who
guess wrong, it sucks for them. The question doesn't do a good job of
distinguishing between people who are smart and people who aren't because both
the right and wrong answers seem pretty right because they're both stochastic.
Unfortunately, only one is truly random.
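
For small n you can skip the math and just count. The sketch below assumes the "wrong" answer means swapping each position with an index drawn from the whole array, and the "right" one means drawing only from positions i..n-1. The function names are mine, purely for illustration. It enumerates every possible sequence of random picks and tallies the resulting orders.

```python
from collections import Counter
from itertools import product

def naive_outcomes(n):
    """Tally final orders when every position swaps with any index 0..n-1."""
    counts = Counter()
    for picks in product(range(n), repeat=n):
        a = list(range(n))
        for i, x in enumerate(picks):
            a[i], a[x] = a[x], a[i]
        counts[tuple(a)] += 1
    return counts

def knuth_outcomes(n):
    """Tally final orders when position i swaps only with an index in i..n-1."""
    counts = Counter()
    for picks in product(*(range(i, n) for i in range(n))):
        a = list(range(n))
        for i, x in enumerate(picks):
            a[i], a[x] = a[x], a[i]
        counts[tuple(a)] += 1
    return counts

print(sorted(naive_outcomes(3).items()))
print(sorted(knuth_outcomes(3).items()))
```

For n = 3 the first method has 27 equally likely pick sequences spread over 6 orders, and 27 is not divisible by 6, so the tallies cannot all be equal. The second method has exactly 6 pick sequences, one per order.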

~~~
Tichy
To be fair, I don't think I went through as elaborate mathematical
considerations as the writer of the article. But even he mentions that he
immediately had a bad feeling about the wrong solution, simply because it is
doing too much work.

The step for me was to move away from the stochastic problem and say "wait a
minute, there is a random generator built into my computer already, how can I
use it directly". Assuming a horde of mathematicians has already shown that
the built-in random number generator is as good as possible, that way I avoid
all the maths issues. And that is a step that seems reasonable to expect from
a good programmer.

Edit: just read the Knuth shuffle entry on Wikipedia. Happy that I even
reinvented the Knuth optimization myself ;-)

