Nearly Divisionless Random Integer Generation on Various Systems

FabHK · on June 8, 2019

The linked overview on Efficiently Generating a Number in a Range by O'Neill (of PCG fame) is insightful. It gives some more background and compares several approaches (including this, Lemire's).

It seems based on her testing that a very simple bit mask & rejection technique ("Apple's method", as Apple implemented it in their modification of arc4random_uniform) is the fastest among the unbiased techniques, particularly for large N, while for mixed N Lemire's technique seems competitive or better. She suggests a further optimisation, which now clearly "wins" except for large N.

Cool stuff - I find random number generation fascinating.

http://www.pcg-random.org/posts/bounded-rands.html

raymondh · on June 8, 2019

If I'm reading this correctly, it is very close to what Python has been doing for over a decade, so I am not sure that using a "rejection method" counts as "new tech". Here are the relevant snippets.

Code for shuffle(x):

    for i in reversed(range(1, len(x))):
        j = randbelow(i+1)
        x[i], x[j] = x[j], x[i]

Code for randbelow(n):

    k = n.bit_length()
    r = getrandbits(k)
    while r >= n:
        r = getrandbits(k)
    return r

Code for getrandbits(k):

    bytes = ((k - 1) / 32 + 1) * 4;
    for (i=0 ; i<bytes ; i+=4, k-=32) {
        r = genrand_int32(self);
        if (k < 32)
            r >>= (32 - k);
        bytearray[i+0] = (unsigned char)r;
        bytearray[i+1] = (unsigned char)(r >> 8);
        bytearray[i+2] = (unsigned char)(r >> 16);
        bytearray[i+3] = (unsigned char)(r >> 24);
    }

dgacmu · on June 8, 2019

Rejection isn't new - what's new is using a variation of his "use the high order bits of the 64x64 multiply by a range to get a random number efficiently" trick (which he published two or three years ago) to substantially improve the efficiency of a rejection sampling method.

The efficient random number in a range trick is really cool. I've used it in a couple of data structures that we've designed - it's substantially faster than using modulus, or even using pre-computed division by reciprocal multiplication.

nullwasamistake · on June 8, 2019

Lemire has some great stuff. It looks like mostly self dicovery so some of his efforts have historical precendent.

AlexCoventry · on June 8, 2019

It's not presented very clearly in the blog post, but it's a novel use of rejection sampling.

DerekL · on June 7, 2019

Dupe: https://news.ycombinator.com/item?id=20120642

dang · on June 8, 2019

There's no discussion there, so a repost is ok. This is in the FAQ: https://news.ycombinator.com/newsfaq.html.