Show HN: A visual explanation of Fisher–Yates shuffle (ocks.org)
176 points by mbostock on Jan 14, 2012 | 37 comments



Beautiful. I love the visual demonstrations of the less efficient methods as well. This is a great learning tool for a super common programming problem. One general comment/question. Code like this trips me up:

  t = array[--m];
Well, I don't know if it trips me up so much as it makes me think too much about syntax, especially when the focus is on the algorithm itself. When writing educational code, I find the following more effective:

  m -= 1;
  t = array[m];
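For context, the article's whole inner loop in that expanded style might look something like this (a sketch, not the article's exact code):

  // Fisher-Yates: repeatedly swap a random remaining element into the back
  function shuffle(array) {
    var m = array.length, t, i;
    while (m > 0) {
      i = Math.floor(Math.random() * m); // pick one of the m remaining
      m -= 1;
      t = array[m];                      // swap it with element m
      array[m] = array[i];
      array[i] = t;
    }
    return array;
  }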
Anyone else have any thoughts?


I agree. If the intent is pedagogical (as in this case), the use of pre- and post-increment/decrement operators should probably be avoided. A good rule of thumb: if you find yourself having to say "and" too many times when describing a single line, then it's probably doing too much.

In fact, I was actually thinking more of this example:

  i = Math.floor(Math.random() * n--);
when I stumbled across your comment.
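Expanded the same way (note that the post-decrement uses the old value of n, so the order flips relative to the pre-decrement case):

  i = Math.floor(Math.random() * n);
  n -= 1;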

That's a minor criticism, though; it's a good page overall. In fact, this was one of the interview questions I was asked when I interviewed at Microsoft a long time ago. At the time, I didn't realize the algorithm actually had a name.


Point taken; I'll keep this in mind in future examples. I find it extremely tempting to write JavaScript as compactly as possible, sometimes at the expense of readability. Writing pedagogical code requires a slightly different mindset.


I generally don't use it too much myself because I agree it tends to add a bit of confusion. A tiny bit, but still.

I think that in that particular case, the fact that array[m] is used in the next line makes things not as obvious as they could be visually. I feel that the following requires less reading:

  m -= 1;
  t = array[m];
  array[m] = array[i];
  array[i] = t;
To me, it's faster to parse.


It doesn't bother me to read other people's code that does this, but I never decrement or increment within an assignment statement. I've just made too many off-by-one and index-out-of-bounds errors over the years.


Most mainstream languages support these operators, and programmers are encouraged to use them since they make the code more elegant and readable. Even an average programmer can understand the nuances of pre/post increment/decrement operators.


I definitely understand it. It's not a matter of understanding the syntax. It's a matter of teaching people most effectively. Here's a comparison of the spoken explanation of the code:

  't' equals the value of 'array' at index 'm' minus 1 and 'm' equals itself minus 1.
vs

  'm' equals itself minus 1.
  't' equals the value of 'array' at index 'm'.
To me one of those seems more concise for writing production code, and the other more useful for teaching algorithms.


That explanation is not how someone who understands the syntax would read the statement. In general, when an expression contains one pre- (or post-) operator, the reader will take that action before (or after) the rest: "decrement then index" in both cases.

It sounds confusing that the pieces of such a statement are not executed in the order that they are written, but in reality it's not any more strange than the fact that "a = b;" reads "evaluate b and store in a".

The real problem happens when there are multiple operators like that in the same statement, or multiple appearances of a pre/post-incremented or -decremented variable, and you need to parse operator precedence in order to figure out the order of execution. As usual, abusing the syntax leads to confusing code, but simple and straightforward cases like "a = b[--i];" are cleaner and easier to read for anyone competent enough to matter.


"competent enough to matter" is a bit condescending don't you think?


Pet peeve of mine with math: Fisher and Yates came up with something interesting here, yes. But as with most mathematical theorems named after someone, attaching their names to it doesn't help you understand what it does at all.

I don't name my functions benmathes_foo().


And quicksort doesn't tell me anything about the algorithm either. It's just a name. Things need to be distinctly identified in a set. That's really the only purpose of a name anyway.


Attaching the name doesn't even tell you who came up with it, in general.

Unfortunately, it can be hard to generate names which are a) descriptive, b) distinct, c) short. But we've done pretty okay with sorting algorithms, and I definitely feel that more is possible.


To be fair, the Wikipedia page for the algorithm is called "Fisher–Yates shuffle". In my experience this is a common way to name an algorithm. I can't think of a better way to do it.


After putting a few seconds of thought into it, "in-place shuffle" explains what it does better than "Fisher-Yates shuffle".

I'm not complaining about Fisher or Yates, who are just following the example their/our research community uses.


The final method uses a really common idiom for optimizing the removal of elements from a std::vector, known as "swap and pop". It takes advantage of two facts: the cheapest element to remove from an array/vector is the last one (you just pop it off), and the original order of the elements doesn't need to be preserved.

    // swap and pop in C++11
    int offset = 1;              // remove the second element (index 1)
    int end = vec.size() - 1;

    auto tmp = vec[offset];      // save the value being removed
    vec[offset] = vec[end];      // overwrite it with the last element
    vec.pop_back();              // drop the now-duplicated last element
    // `tmp` now contains the value removed from the std::vector


Yeah, aside from the swap trick, there's very little to the Fisher-Yates-Knuth shuffle. It's just following the standard recursive recipe for generating permutations: to generate all permutations on n distinct elements, try every element in the first place concatenated with every permutation of the n-1 remaining elements.

You can implement this directly with hash sets in O(n) time, assuming you're willing to treat hash set insertion and deletion as O(1) operations:

    # O(n) time overall
    # choose(s) is assumed to return a uniformly random element of s
    # in O(1) expected time (e.g. by random probing of the hash table)
    def shuffle(elems):
        elems = set(elems)    # O(n) time
        xs = []
        while elems:
            x = choose(elems) # O(1) time (expected)
            xs.append(x)
            elems.remove(x)   # O(1) time
        return xs
This whole approach is obvious once you stop thinking of the problem as shuffling an array in place and start thinking of it as randomly generating a permutation of elements drawn from a set. In the final analysis, the optimized in-place implementation with the swap trick does provide in-place shuffling; the way the two halves of the array hold, respectively, the partial result and an auxiliary set of remaining elements is very much like heap sort.

Not coincidentally, if you interpret the algorithm as I wrote it above monadically, then it can be used to generate either a random permutation or all permutations, depending on whether choose() is interpreted in the probability monad or the list (non-determinism) monad. This shows conclusively that the shuffling algorithm is really just the standard combinatorial generation algorithm for permutations adapted to random sampling. If so, you'd also expect the swap trick to be useful for generating all permutations in place. Well, of course:

    #include <stdio.h>

    static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

    /* print every permutation of xs[i..n-1], in place */
    void perms(int *xs, int n, int i)
    {
        if (i == n) {
            for (int j = 0; j < n; j++)
                printf("%d ", xs[j]);
            puts("");
            return;
        }

        for (int j = i; j < n; j++) {
            swap(xs + i, xs + j);
            perms(xs, n, i + 1);
            swap(xs + i, xs + j);
        }
    }


I find it a bit hard to see how to reinterpret your Python code in the list monad. Can you post a Haskell version of this function?


Hey, sorry for the late reply.

It seems pretty obvious to me, so I'm not sure what is confusing you.

    class Monad m => ChoiceMonad m where
        choice :: [a] -> m a
For the list monad, choice is just the identity function. For the probability monad, choice picks a random element from the list. (Ideally choice would take a random access structure like a set as its argument, but I'm doing it the simplest way here.)

To implement shuffle, you more or less just transliterate the Python code as you'd expect.

Before you tackle this, make sure you completely understand how the same situation plays out for the simpler case of subsets:

    subset        :: ChoiceMonad m => [a] -> m [a]
    subset []     = return []
    subset (x:xs) = do b <- choice [True, False]
                       xs' <- subset xs
                       return (if b then x:xs' else xs')
Over the list monad, this generates a list of all subsets (e.g. subset [1,2] yields [[1,2],[1],[2],[]]). Over the probability monad, it generates a random subset.


Thank you, now I see. Here's my version:

    -- requires: import Data.List (delete)
    shuffle :: (ChoiceMonad m, Eq a) => [a] -> m [a]
    shuffle [] = return []
    shuffle elems = do
      x <- choice elems
      xs <- shuffle (x `delete` elems)
      return (x:xs)


I love the d3.js data visualization framework, which is what Mike Bostock (the creator of d3, as it happens) used to create this page. Well done, Mike.

Does anyone have suggestions for an algorithm I could illustrate visually with d3.js? This is such an effective learning tool that I would probably benefit from some review. I was thinking of trying a graph or tree search algorithm, maybe Dijkstra's. Any other suggestions?


Clustering can be visualized quite well and there are several families of algorithms that could be run over similar data, such as k-means clustering or DBSCAN.


This is a pretty common way to use arrays, at least in game programming on earlier versions of Android, where you couldn't garbage collect during a frame due to the delays. You'd often have big pre-allocated arrays with a marker for how much is currently used. That, or object pools when you needed something more than arrays.


Great article. This is probably a question for the statisticians: would repeatedly sampling from a uniform distribution mean that, on average, the next card chosen comes from the middle of the remaining pack (since repeatedly sampling from a uniform distribution gives a normal distribution), or does the decreasing sample size somehow cancel that effect out? At the same time, the algorithm is moving cards from the end towards the middle. Intuitively it feels like the shuffle ought to be biased towards selecting the middle-to-end of the pack first, but Wikipedia indicates it's unbiased. Any help with getting further intuition?


The key here is that you aren't sampling the same uniform distribution each time. You shrink the sample space by one each time you remove a card and append it to the end.


Repeatedly sampling from a uniform distribution gives a uniform distribution - uniform means the probability is the same for all cards.


Yes, but the average of the sampled values will tend towards a normal distribution by the central limit theorem, and I'm probably misapplying it by wondering whether the next sample tends to come from nearer the middle. I think skimbrel is right that since the pool being drawn from shrinks each time, the CLT doesn't apply anyway.


Sorry to be blunt, but you're confused.

The average of the sampled values will be normally distributed, but the values themselves will be uniform.

If you sample every number once from [0,100], the average is 50, but that doesn't imply that the distribution of the sampled elements is normal - it's uniform.

His question of whether "on average, does the next card come from the middle" (which it doesn't) isn't the same as "is the mean of the next sampled card in the middle" (which it is).

The language was a little misleading - he said "on average" when he intended "most frequently".
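If it helps, here's a quick empirical check (a hypothetical sketch, assuming an unbiased shuffle() like the one in the article): tally where one particular card lands over many shuffles, and the counts come out roughly flat rather than peaked in the middle.

  // hypothetical check: where does card 0 land after many shuffles?
  var counts = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0];
  for (var trial = 0; trial < 100000; trial++) {
    var deck = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
    shuffle(deck);                 // assumed correct Fisher-Yates shuffle
    counts[deck.indexOf(0)] += 1;
  }
  console.log(counts); // each entry comes out near 10000: uniform, not normal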


Ok thanks for clearing that up. I hadn't appreciated the difference between the two.


That's not how central tendency works. The average position converges to a normal distribution.


Great post, and extra points for visualizations that work on the iPhone.


It uses d3.js, which pretty much requires a modern browser with SVG. I suspect you will find it doesn't work on IE8.


I have a question: why does the shuffled deck have the cards criss-crossed with their neighbors? I'm not sure what it's supposed to represent.


Never mind. I just answered my own question. It's a result of the original deck being fanned visually. When he reshuffles the deck, the cards are still oriented in the same direction as they were in the original deck, so you get the visual side effect of criss-crossed cards. This means you can tell approximately which side of the deck a card came from originally. And by visual estimation, a properly shuffled deck should have cards oriented in all different directions.


Which, if you look at the sorting visualization links at the bottom, is actually a really effective way of showing whether things are very out of order or close to in order.


The angle of the lines encodes the value of the card. The lowest-value cards lean left, and the highest-value cards lean right. The in-order deck, therefore, neatly fans from left-to-right. In the shuffled deck the values are in random order, so the lines often cross.
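For the curious, that kind of encoding is straightforward in d3 (a hypothetical sketch, not the article's actual code; svg, cards, and the x position scale here are assumed):

  // map card value to a lean angle: low values left, high values right
  var angle = d3.scale.linear()     // d3 v2-era scale API
      .domain([0, cards.length - 1])
      .range([-45, 45]);

  svg.selectAll("line")
      .data(cards)
    .enter().append("line")
      .attr("y2", -20)              // a short stick per card
      .attr("transform", function(d, i) {
        return "translate(" + x(i) + ",0) rotate(" + angle(d) + ")";
      });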


Thanks for posting this! I just updated my own silly shuffle module (https://github.com/substack/node-deck) to use the last in-place algorithm for uniform shuffling and I saw a huge speedup.


In your last example you could do:

  [array[m], array[i]] = [array[i], array[m]]

And forget about t entirely.



