
Darts, Dice, and Coins: Sampling from a Discrete Distribution (2011) - elasticdog
http://www.keithschwarz.com/darts-dice-coins/
======
benfrederickson
Alias tables are pretty cool, I wrote an interactive visualization of how they
get built as part of this post a while ago:
[http://engineering.flipboard.com/2017/02/storyclustering](http://engineering.flipboard.com/2017/02/storyclustering)
. We used alias tables with MCMC and the hastings metropolis test to build a
super fast LDA.

Also worth reading up on are sum-heaps. Alias tables are O(1) to sample from
but O(n) to build/modify. Sum-heaps let you modify in O(log(n)) at the cost of
sampling in O(log(n)) as well. A good writeup is here:
[https://timvieira.github.io/blog/post/2016/11/21/heaps-
for-i...](https://timvieira.github.io/blog/post/2016/11/21/heaps-for-
incremental-computation/)

------
jasonl99
There's a really cool way to do this that I learned about with ruby here
[https://gist.github.com/O-I/3e0654509dd8057b539a](https://gist.github.com/O-I/3e0654509dd8057b539a)

Here's a quick demo class that shows the technique. It's amazingly simple.

class Demo EXAMPLE = { "75%" => 0.75, "15%" => 0.15, "9%" => 0.09, "1%" =>
0.01 }

    
    
      def self.sample(choices = EXAMPLE)
        choices.max_by { |_, weight| rand ** (1.0 / weight) }.first
      end
    
      def self.show(samples: 10000)
        items = samples.times.map {sample}
        items.each_with_object(Hash.new(0)) do |item, counts|
          counts[item]+=1
        end.sort_by(&:last).reverse.to_ h
      end
    

end

# Demo.show # => {"75%"=>7480, "15%"=>1525, "9%"=>901, "1%"=>94}

Edit: Sorry for the bad formatting. I added a gist:

[https://gist.github.com/jasonl99/25d3c922d73f10a75fe228c2de3...](https://gist.github.com/jasonl99/25d3c922d73f10a75fe228c2de38d270)

~~~
gjm11
The code's nice and short, but note that if you are choosing from a large
collection it's really inefficient: in order to generate a single random
element it needs to do some computation for every element in the collection.
That is, its runtime is O(n).

(For choosing from a small number of things this code has the advantage that
it's not doing a lot of complicated stuff _in Ruby_ , and in any case the
speed probably doesn't matter too much. In fact, even when choosing from a
large enough number of things to make this much slower than it need be, the
speed may well not matter at all. But it's worth being aware.)

------
nayuki
The random sampling algorithms described on the page are very closely related
to the technique of
[https://en.wikipedia.org/wiki/Arithmetic_coding](https://en.wikipedia.org/wiki/Arithmetic_coding)
in data compression.

------
gmueckl
This page has helped me a lot. I have used that alias table algorithm various
times with very good results.

