
Efficiently Generating a Number in a Range - pettou
http://www.pcg-random.org/posts/bounded-rands.html
======
dagenix
> Let's move from an over-engineered approach to an under-engineered one.

The article says this to deride C++'s implementation as being too complicated
because it supports ranges such as [-3,17], and then promptly goes on to
discuss how a modulo-based implementation is very biased if the upper end of
the range is above 2^31. It's not really clear why the former use case is
unimportant but the latter isn't.
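
For concreteness (my own worked example, not from the article; rand32 is a
made-up name for a 32-bit generator call): take rand32() % n with
n = 3·2^30. Since 2^32 = n + 2^30,

    \Pr[\text{output} = k] =
      \begin{cases}
        2/2^{32}, & 0 \le k < 2^{30},\\
        1/2^{32}, & 2^{30} \le k < 3\cdot 2^{30},
      \end{cases}

so every value in the bottom third of the range is exactly twice as likely
as a value in the top two thirds.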

It just goes to show that one person's niche use case is another person's main
use case. I wish people would just avoid the judgemental term
"over-engineered" and instead focus on matching appropriate algorithms to
appropriate use cases.

~~~
vinkelhake
The comment there is not about [-3, 17] being an obscure _output_ range from a
distribution. It is that the distribution must be able to handle a random
generator that outputs numbers in that range.

I think there's a small error there in that the output type of
_UniformRandomBitGenerator_ must actually be unsigned. The larger point
still stands though: it is possible to write a conforming
_UniformRandomBitGenerator_ that has an output range of [3, 17], and it
falls on the distribution to handle this.
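
A minimal sketch of such a generator, assuming C++11 or later (TinyGen is a
made-up type for illustration, not from the standard or the article):

    #include <cstdint>
    #include <random>

    // Conforming UniformRandomBitGenerator whose output range is [3, 17].
    // Note the unsigned result_type, per the correction above.
    struct TinyGen {
        using result_type = unsigned int;
        static constexpr result_type min() { return 3; }
        static constexpr result_type max() { return 17; }
        result_type operator()() {
            // Any scrambled deterministic sequence works for a sketch.
            state = state * 6364136223846793005ULL + 1442695040888963407ULL;
            return 3 + static_cast<result_type>((state >> 33) % 15);
        }
        std::uint64_t state = 42;
    };

    int main() {
        TinyGen g;
        // The distribution, not the generator, must cope with this range.
        std::uniform_int_distribution<int> dist(0, 99);
        int x = dist(g);
        (void)x;
    }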

~~~
dagenix
Ah, good call. I did slightly misinterpret what was being said. I think my
overall point still stands, though.

------
rootlocus

        return min + (max - min) / 2
    

Oh, you want a random number?

~~~
emmanuel_1234
Or in O(1):

      return min

Alternatively:

      return max

~~~
dahart
> Or in O(1)

You mean in one instruction? The parent comment is O(1). The methods in the
article are all O(1) too.

------
kazinator
In TXR Lisp, the algorithm I put in place basically finds the tightest
power-of-two bounding box for the modulus, clips the pseudo-random number to
that power-of-two range, and then rejects values outside of the modulus.

Example: suppose we wanted values in the range 0 to 11. The tightest power of
two is 16, so we generate 4-bit pseudo-random numbers in the 0 to 15 range. If
we get a value in the 12 to 15 range, we throw it away and choose another one.

The clipping to the power-of-two bounding box ensures that we reject at most
50% of the raw values.

I don't bother optimizing for small cases. That is, under this 4-bit example,
each generated value that is trimmed to 4 bits will be the full output of the
PRNG, a 32-bit value. The approach pays off for bignums: the PRNG is called
enough times to cover the bits, the result is clipped to the power-of-two box,
then subjected to the rejection test.
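
A rough sketch of that scheme for word-sized values (my own C++ rendering,
not the actual TXR source; bounded is a made-up name):

    #include <cstdint>
    #include <random>

    // Round the modulus up to a power of two, mask the raw PRNG output
    // down to that many bits, and retry when the result lands outside
    // the modulus. At most ~50% of draws are rejected.
    std::uint32_t bounded(std::mt19937 &rng, std::uint32_t modulus) {
        std::uint32_t mask = modulus - 1;    // assumes modulus >= 1
        mask |= mask >> 1;
        mask |= mask >> 2;
        mask |= mask >> 4;
        mask |= mask >> 8;
        mask |= mask >> 16;                  // mask = 2^ceil(lg modulus) - 1
        for (;;) {
            std::uint32_t v = rng() & mask;  // clip to the power-of-two box
            if (v < modulus)                 // e.g. modulus 12: keep 0..11
                return v;
        }
    }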

~~~
jgtrosh
Off the top of my head, for a randomly chosen range of size n, you reject a
throw with probability 1/4, right?

~~~
throwaway080383
It's unintuitive, but I believe the probability ends up being 1-ln(2) if you
think of n as being uniformly random.

~~~
jgtrosh
For any power of two m, and for any range size n (with m/2 < n <= m), the
probability of rejection is (m-n)/m. If every n is equally probable, then
because (m-n)/m is linear in n, the average rejection probability equals the
rejection probability at the average n (= 3m/4): (m/4)/m = 1/4. This is true
for any power of two m. I rest my case!
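
Spelled out (my own sanity check, assuming n uniform on (m/2, m]):

    \mathbb{E}\!\left[\frac{m-n}{m}\right]
      = \frac{m - \mathbb{E}[n]}{m}
      = \frac{m - 3m/4}{m}
      = \frac{1}{4}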

~~~
throwaway080383
Hm, yeah not sure what I was thinking. Maybe expected number of attempts?

------
nerdponx
This is well-timed. I don't know much about different random number
generators, but I do know that we recently had a problem where RNG was a
serious performance bottleneck.

~~~
by
I only skimmed the article, so maybe they said this, but for choosing from a
small range, for example 0..51, you can get several of these from a 32-bit
random number with this algorithm:

[https://stackoverflow.com/questions/6046918/how-to-generate-...](https://stackoverflow.com/questions/6046918/how-to-generate-a-random-integer-in-the-range-0-n-from-a-stream-of-random-bits/10481147#10481147)

You should be able to run a 64-bit PRNG once and pick at least 8 random cards
from a deck.
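
A sketch of that batching trick (my own reading of it, not the linked answer
verbatim; eight_cards is a made-up name): since 52^8 < 2^64, one 64-bit draw
can be peeled into 8 unbiased card indices.

    #include <array>
    #include <cstdint>
    #include <random>

    // 52^8 = 53,459,728,531,456, so a 64-bit draw holds 8 card indices.
    // Rejecting the (astronomically rare) draws at or above the largest
    // multiple of 52^8 keeps every index exactly uniform.
    std::array<int, 8> eight_cards(std::mt19937_64 &rng) {
        const std::uint64_t kPow = 53459728531456ULL;          // 52^8
        const std::uint64_t kLimit = (UINT64_MAX / kPow) * kPow;
        std::uint64_t x;
        do { x = rng(); } while (x >= kLimit);
        std::array<int, 8> cards;
        for (int &c : cards) {
            c = static_cast<int>(x % 52);                      // one card: 0..51
            x /= 52;
        }
        return cards;
    }

Note these are 8 independent draws in 0..51 (with replacement); dealing a
hand with no repeats would still need the usual shuffle bookkeeping on top.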

~~~
ballenf
The article's conclusion was that the PRNG itself is usually not the
bottleneck; the code that turns its raw output into a bounded result is.
Don't know if that applies to the algorithm linked, but the author's point
was that bottlenecks are more likely to arise in the code that surrounds the
PRNG than in the call to the PRNG itself.

------
adrianmonk
I wonder if the "Bitmask with Rejection" method would be more efficient if you
sometimes made the mask one bit larger than strictly necessary.

As it is, if you want a number in the range 0..8, you take 4 bits of
randomness, giving you a number in 0..15. This is great, but 7/16 (43.75%) of
the time you have to try again. This not only means more loop iterations, it
also means you discard 4 bits of randomness, which may have been costly to
generate.

If instead you took 5 bits of randomness, you'd be able to accept anything in
0..26 and would only have to reject 27..31, which means only rejecting 5/32
(15.625%) of the time. (An accepted value folds back down as value mod 9,
which stays unbiased because 27 is an exact multiple of 9.)

0..8 is a particularly bad case, though. If you need numbers in the range
0..14, then it's not worth trying to use 5 bits.
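
A sketch of that variant for the 0..8 case (hypothetical code, not from the
article; draw_0_to_8 is a made-up name):

    #include <cstdint>
    #include <random>

    // Draw 5 bits instead of 4, accept anything below 27 = 3 * 9, and
    // fold with % 9. Each of 0..8 gets exactly three preimages, so the
    // result stays unbiased, and rejection drops from 7/16 to 5/32.
    std::uint32_t draw_0_to_8(std::mt19937 &rng) {
        for (;;) {
            std::uint32_t v = rng() & 0x1F;  // 5 bits: 0..31
            if (v < 27)
                return v % 9;
        }
    }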

------
modeless
It seems crazy to me that there's no way to produce unbiased numbers in an
arbitrary range without rejection sampling and a loop. Is there a proof of
this?

~~~
duckerude
I'd expect it's possible by changing the generator at the lowest level, but it
makes sense to me that you need a loop if you don't control the underlying
generator.

Imagine you want to turn a random number in 1..4 into a random number in 1..3.
The original is your only source of randomness, so the rest should be
deterministic. Then each outcome in 1..4 has to map to exactly one number in
1..3, but four equally likely outcomes can't be split into three equal groups,
so no mapping that accepts all of 1..4 gives each of 1..3 an equal
probability.

~~~
modeless
What if we allow the mapping function to be stateful?

~~~
duckerude
I guess you could save up spare bits across repeated calls, but that can't
guarantee that the very first call completes with a single round of
generation.

~~~
modeless
Could it cap the maximum number of generator calls though? Rejection sampling
is technically O(infinity) because you could reject an unbounded number of
times. This isn't a problem in practice but it sure is theoretically annoying.
With a cap on the maximum number of calls, it would be O(1).

~~~
duckerude
I don't think so.

If you want a number in 1..3, the generator provides numbers in 1..4, and you
cap the number of generator calls at n, then you can model the whole procedure
as a single generator that provides numbers in 1..4^n. That space can never be
split into three equal parts, because 4^n ≡ 1 (mod 3) for every n, so there is
always one leftover outcome.

You always end up with unwanted extra possibilities that you still need to
handle somehow.

------
sdmike1
Personally, I'm a fan of the xoshiro generator [1]; I have found it to be
faster and to give more equiprobable outputs.

[1][http://xoshiro.di.unimi.it](http://xoshiro.di.unimi.it)

~~~
smaddox
xoshiro has flaws: [http://www.pcg-random.org/posts/a-quick-look-at-xoshiro256.h...](http://www.pcg-random.org/posts/a-quick-look-at-xoshiro256.html)

~~~
nightcracker
xoshiro's response:
[http://pcg.di.unimi.it/pcg.php](http://pcg.di.unimi.it/pcg.php)

~~~
modeless
Melissa O'Neill's response back: [http://www.pcg-random.org/posts/on-vignas-pcg-critique.html](http://www.pcg-random.org/posts/on-vignas-pcg-critique.html)

~~~
smaddox
Thanks. This completely allays my concerns about the PCGs I would actually
use.

Edit: And I like that MCG with the 64-bit multiplier that she shows. I might
switch to that for Monte Carlo applications.

