
Show HN: A visual explanation of Fisher–Yates shuffle - mbostock
http://bost.ocks.org/mike/shuffle/
======
gerggerg
Beautiful. I love the visual demonstrations of the less efficient methods as
well. This is a great learning tool for a super common programming problem.
One general comment/question. Code like this trips me up:

    
    
      t = array[--m];
    

Well I don't know if it trips me up so much as it makes me think too much
about syntax especially when the focus is on the algorithm itself. When
writing educational code I find the following more effective:

    
    
      m -= 1;
      t = array[m];
    

Anyone else have any thoughts?

~~~
Jagat
Most mainstream languages support these operators, and programmers are
encouraged to use them since they make the code more elegant and readable.
Even an average programmer can understand the nuances of pre/post
increment/decrement operators.

~~~
gerggerg
I definitely understand it. It's not a matter of understanding the syntax.
It's a matter of teaching people most effectively. Here's a comparison of the
spoken explanation of the code:

    
    
      't' equals the value of 'array' at index 'm' minus 1 and 'm' equals itself minus 1.
    

vs

    
    
      'm' equals itself minus 1.
      't' equals the value of 'array' at index 'm'.
    

To me one of those seems more concise for writing production code, and the
other more useful for teaching algorithms.
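
For what it's worth, the two spellings really are interchangeable. A tiny sketch (hypothetical helper names, added here for illustration) showing both leave `m`, `t`, and the array in identical states:

```javascript
// Hypothetical helpers showing the two spellings are equivalent:
// both decrement m first, then index with the new value.
function preDecrementForm(array, m) {
  var t = array[--m];  // decrement, then index
  return { m: m, t: t };
}

function twoStepForm(array, m) {
  m -= 1;              // same effect, spelled out in two statements
  var t = array[m];
  return { m: m, t: t };
}
```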

~~~
Jare
That explanation is not how someone who understands the syntax would read the
statement. In general, when an expression contains one pre- (or post-)
operator, the reader will take that action before (or after) the rest:
"decrement then index" in both cases.

It sounds confusing that the pieces of such a statement are not executed in
the order that they are written, but in reality it's not any more strange than
the fact that "a = b;" reads "evaluate b and store in a".

The real problem happens when there are multiple operators like that in the
same statement, or multiple appearances of a post/pre-inc/dec-remented
variable and you need to parse operator precedence in order to figure out the
order of execution. As usual, abusing the syntax leads to confusing code, but
simple and straightforward cases like "a = b[--i];" are cleaner and easier to
read for anyone competent enough to matter.

~~~
gerggerg
"competent enough to matter" is a bit condescending don't you think?

------
benmathes
Pet Peeve of mine with math: Fisher and Yates have come up with something
interesting here, yes. But like most mathematical theorems named after
someone, attaching their names to it doesn't help understanding of what it
does at all.

I don't name my functions benmathes_foo().

~~~
burgerbrain
To be fair, the wikipedia page for the algorithm _is_ called "Fisher–Yates
_shuffle_ ". This is, in my experience, a common way to name an algorithm. I
can't think of a better way to do it.

~~~
benmathes
After putting a few seconds of thought into it, "In place shuffle" explains
what it does better than "Fisher-Yates shuffle".

I'm not complaining about Fisher or Yates, who are just following the example
their/our research community uses.

------
josephmoniz
The final method uses a really common idiom for optimizing the removal of
elements from a std::vector, known as "swap and pop". Swap and pop takes
advantage of two facts: the cheapest element to remove from an array/vector is
the last one (you just `pop` it off), and the order of the given set isn't
important here.

    
    
        // swap and pop in C++11
        int offset = 2; // remove the element at index 2 from the vector
        int end = vec.size() - 1;

        auto tmp = vec[offset];
        vec[offset] = vec[end];
        vec.pop_back();
        // `tmp` now contains the value removed from the std::vector
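
A sketch of the same idiom for plain JavaScript arrays (hypothetical helper name; assumes, as above, that element order doesn't matter):

```javascript
// Swap-and-pop for a JS array: O(1) removal when order doesn't matter.
function swapAndPop(arr, i) {
  var removed = arr[i];
  arr[i] = arr[arr.length - 1]; // overwrite the slot with the last element
  arr.pop();                    // shrink the array by one
  return removed;
}
```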

~~~
psykotic
Yeah, aside from the swap trick, there's very little to the Fisher-Yates-Knuth
shuffle. It's just following the standard recursive recipe for generating
permutations: to generate all permutations on n distinct elements, try every
element in the first place concatenated with every permutation of the n-1
remaining elements.

You can implement this directly with hash sets in O(n) time, assuming you're
willing to treat hash set insertion and deletion as O(1) operations:

    
    
        # O(n) time; choose(s) is assumed to return a uniformly random element of s
        def shuffle(elems):
            elems = set(elems)    # O(n) time
            xs = []
            while elems:
                x = choose(elems) # O(1) time (expected)
                xs.append(x)
                elems.remove(x)   # O(1) time
            return xs
    

This whole approach is obvious when you stop thinking of the problem as
shuffling an array in place and start thinking of it as randomly generating a
permutation of elements drawn from a set. In the final analysis, the optimized
in-place implementation of the algorithm with the swap trick does provide in-
place shuffling; the way the two halves of the array are used to contain
respectively the partial result and an auxiliary set of remaining elements is
very much like heap sort.
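
That reading can be sketched directly: the unvisited prefix `array[0..m)` plays the role of the remaining set, and the suffix accumulates the result (the optional `random` parameter is an assumption added here so the sketch is testable):

```javascript
// Sketch of the in-place shuffle: array[0..m) is the shrinking set of
// remaining elements; array[m..n) accumulates the shuffled result.
function shuffleInPlace(array, random) {
  random = random || Math.random;   // injectable RNG (added for testing)
  var m = array.length, t, i;
  while (m) {
    i = Math.floor(random() * m--); // pick from the remaining prefix...
    t = array[m];                   // ...and swap it into the suffix
    array[m] = array[i];
    array[i] = t;
  }
  return array;
}
```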

Not coincidentally, if you interpret the algorithm as I wrote it above
monadically, then it can be used to generate either a random permutation or
all permutations, depending on whether choose() is interpreted in the
probability monad or the list (non-determinism) monad. This shows conclusively
that the shuffling algorithm is really just the standard combinatorial
generation algorithm for permutations adapted to random sampling. If so, you'd
also expect the swap trick to be useful for generating all permutations in
place. Well, of course:

    
    
        #include <stdio.h>

        static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

        void perms(int *xs, int n, int i)
        {
            if (i == n) {
                for (int j = 0; j < n; j++)
                    printf("%d ", xs[j]);
                puts("");
                return;
            }

            for (int j = i; j < n; j++) {
                swap(xs + i, xs + j);
                perms(xs, n, i + 1);
                swap(xs + i, xs + j);
            }
        }

~~~
longlivedeath
I find it a bit hard to see how to reinterpret your Python code in the list
monad. Can you post a Haskell version of this function?

~~~
psykotic
Hey, sorry for the late reply.

It seems pretty obvious to me, so I'm not sure what is confusing you.

    
    
        class Monad m => ChoiceMonad m where
            choice :: [a] -> m a
    

For the list monad, choice is just the identity function. For the probability
monad, choice picks a random element from the list. (Ideally choice would take
a random access structure like a set as its argument, but I'm doing it the
simplest way here.)

To implement shuffle, you more or less just transliterate the Python code as
you'd expect.

Before you tackle this, make sure you completely understand how the same
situation plays out for the simpler case of subsets:

    
    
        subset        :: ChoiceMonad m => [a] -> m [a]
        subset []     = return []
        subset (x:xs) = do b <- choice [True, False]
                           xs' <- subset xs
                           return (if b then x:xs' else xs')
    

Over the list monad, this generates a list of all subsets. Over the
probability monad, it generates a random subset.
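
The same two readings can be sketched without monads in JavaScript, interpreting `choice` by hand in each case (hypothetical function names, added for illustration):

```javascript
// "List monad" reading: enumerate every outcome of the choices.
function subsetsAll(xs) {
  if (xs.length === 0) return [[]];
  const rest = subsetsAll(xs.slice(1));
  // For each subset of the tail, either include the head or not.
  return rest.flatMap(s => [[xs[0], ...s], s]);
}

// "Probability monad" reading: sample each choice once.
function subsetRandom(xs, random) {
  random = random || Math.random;
  return xs.filter(() => random() < 0.5); // keep each element with p = 1/2
}
```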

~~~
longlivedeath
Thank you, now I see. Here's my version:

    
    
        -- needs: import Data.List (delete)
        shuffle :: (ChoiceMonad m, Eq a) => [a] -> m [a]
        shuffle [] = return []
        shuffle elems = do
          x <- choice elems
          xs <- shuffle (x `delete` elems)
          return (x:xs)

------
jnbiche
I love the d3.js data visualization framework, which is what Mike Bostock (the
creator of d3, as it happens) used to create this page. Well done, Mike.

Does anyone have suggestions for an algorithm for me to use d3.js to
illustrate visually? This is such an effective learning tool that I would
probably benefit from some review. I was thinking of trying a graph or tree
search algorithm, maybe Dijkstra's. Any other suggestions?

~~~
petercooper
Clustering can be visualized quite well and there are several families of
algorithms that could be run over similar data, such as k-means clustering or
DBSCAN.

------
lnanek
This is a pretty common way to use arrays, at least in game programming on
earlier versions of Android where you couldn't garbage collect during a frame
due to the delays. You'd often have big pre-allocated arrays with a marker for
how much is currently used. That or object pools when you needed something more
than arrays.

------
manifold
Great article. This is probably a question for the statisticians: Would
repeatedly sampling from a uniform distribution mean that on average the next
card chosen will come from the middle of the remaining pack (as repeatedly
sampling from a uniform distribution gives a normal distribution), or does the
decreasing sample size somehow cancel that effect out? Then at the same time
the algorithm is moving cards from the end towards the middle. Intuitively it
feels like the shuffle ought to be biased towards selecting the middle-end of
the pack first, but wikipedia indicates it's unbiased. Any help with getting
further intuition?

~~~
klochner
Repeatedly sampling from a uniform distribution gives a uniform distribution -
uniform means the probability is the same for all cards.

~~~
manifold
Yes, but the average value of the sampled values will tend towards a Normal
distribution by the central limit theorem, and I'm probably misapplying it by
wondering whether the next sample tends to be from nearer the middle. I think
skimbrel is right that as the sample being drawn from is decreasing each time
the CLT will be invalidated anyway.

~~~
klochner
Sorry to be blunt, but you're confused.

The _average_ of the sampled values will be normally distributed, but the
values themselves will be uniform.

If you sample every number once from [0,100], the average is 50, but that
doesn't imply that the distribution of the sampled elements is normal - it's
uniform.

His question of whether "on average, does the next card come from the middle"
(which it doesn't) isn't the same as "is the mean of the next sampled card in
the middle" (which it is).

The language was a little misleading - he said "on average" when he intended
"most frequently".
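
A quick simulation makes the distinction concrete (an illustrative sketch; the tiny LCG is just a deterministic stand-in for a uniform random source): each fifth of [0,1) receives about the same number of draws, while the mean of all the draws sits near 0.5.

```javascript
// Tiny deterministic PRNG for the demo (Numerical Recipes LCG constants).
function lcg(seed) {
  var s = seed >>> 0;
  return function () {
    s = (1664525 * s + 1013904223) >>> 0;
    return s / 4294967296; // uniform in [0, 1)
  };
}

var rand = lcg(42);
var buckets = [0, 0, 0, 0, 0];
var sum = 0, n = 100000;
for (var k = 0; k < n; k++) {
  var u = rand();
  buckets[Math.floor(u * 5)] += 1; // which fifth of [0,1) the draw fell in
  sum += u;
}
// Each bucket holds roughly n/5 draws (the values are uniform),
// while sum / n concentrates near 0.5 (the mean is well-behaved).
```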

~~~
manifold
Ok thanks for clearing that up. I hadn't appreciated the difference between
the two.

------
mef
Great post, and extra points for visualizations that work on the iPhone.

~~~
justincormack
It uses d3.js, which pretty much requires a modern browser with SVG. I suspect
you will find it doesn't work on IE8.

------
iamwil
I have a question. Why does the shuffled deck have the cards criss-crossed
with its neighbors? I'm not sure what it's supposed to represent.

~~~
iamwil
Nevermind. I just answered my own question. It's a result of the original deck
being fanned visually. So when he reshuffles the deck, the cards are still
orientated in the same direction as they were in the original deck, so you get
the visual side-effect of having criss-crossed cards. So you can tell
approximately which side of the deck the card was from originally. And by
visual estimation, a properly shuffled deck should have a combination of cards
orientated in all different directions.

~~~
CountHackulus
Which, if you look at the sorting visualization links at the bottom, is
actually a really effective way of showing when things are very out of order,
or close to in order.

------
substack
Thanks for posting this! I just updated my own silly shuffle module
(<https://github.com/substack/node-deck>) to work using the last in-place
algorithm for uniform shuffling and I saw a huge speedup.

------
joechapman
In your last example you could do: [array[m], array[i]] = [array[i], array[m]]

And forget about t entirely.
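
Spelled out as a full loop in modern JavaScript (ES2015 destructuring; hypothetical function name), with the caveat that each swap allocates a short-lived two-element array:

```javascript
// Fisher-Yates with the swap written via destructuring: no temporary
// variable, but each swap allocates a small throwaway array.
function shuffleDestructured(array) {
  let m = array.length;
  while (m) {
    const i = Math.floor(Math.random() * m--);
    [array[m], array[i]] = [array[i], array[m]];
  }
  return array;
}
```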

