
A fast alternative to the modulo reduction (2016) - bra-ket
https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/
======
bra-ket
An intuitive way to think about the "fast range" technique is scaling: x ranges
over [0,2^32−1]; multiply it by N and divide by 2^32, and the range becomes
[0,N−1].

From "Writing a Damn Fast Hash Table With Tiny Memory Footprints":
[http://www.idryman.org/blog/2017/05/03/writing-a-damn-fast-hash-table-with-tiny-memory-footprints/](http://www.idryman.org/blog/2017/05/03/writing-a-damn-fast-hash-table-with-tiny-memory-footprints/), and previous discussion:
[https://news.ycombinator.com/item?id=14290055](https://news.ycombinator.com/item?id=14290055)

~~~
nullc
This approach comes naturally to people working with fixed-point math: your
random value is a UQ32 fixed-point fractional value; you multiply it by a
UQ32.0 size, resulting in a UQ32.32 value that you shift back down to UQ0. [
[https://en.wikipedia.org/wiki/Q_(number_format)](https://en.wikipedia.org/wiki/Q_\(number_format\))
]

I wonder if the introduction of calculators into school curriculum has made
people less prone to carry around rational values as intermediates. Division
is a pain in the butt in mental arithmetic and it remains slow on computers
(except when compilers are able to strength-reduce it away). There are a lot
of nice optimizations in numerical algorithms that come from keeping
denominators separate (implicitly or explicitly).

~~~
djmips
I was thinking the same thing. Because I'm older and grew up on 8-bit
machines, this sort of thing comes naturally to me. It's a nice article, but
does it deserve to be called the 'Lemire technique'? :)

------
jhncls
An additional argument for multiply-then-shift instead of modulo is that, for
most random number generators, the high bits are more random than the low
bits.

Another approach, if you really want to use modulo, is to execute the division
as a multiplication by the reciprocal followed by a shift. This makes sense
when the divisor is odd and used multiple times. See e.g. [0] about how
compilers generate code to divide by a constant.

[0]
[https://en.wikipedia.org/wiki/Division_algorithm#Division_by_a_constant](https://en.wikipedia.org/wiki/Division_algorithm#Division_by_a_constant)

------
josephg
Be careful using this for hash tables of integers when the table is small.
Applied naively, this method can easily flatten all small integer keys into
bucket 0.

~~~
corysama
I heard a story recently about that being exploited as an attack.

Attackers knew some site used a DB with a specific hash function, where items
that hash to the same value end up in a linked list. So they submitted a ton
of user data that all hashed to the same value, then DOS'd with requests that
walked the whole linked list every time.

~~~
im3w1l
It's called an algorithmic complexity attack, and yeah, it's a known issue
with hashmaps. One solution is to use a parameterized hash function with a
secret key; another is to use a different data structure entirely.

~~~
baby
It’s actually called a hashmap collision attack, fixed by randomizing your
hash function at the program start. For example generating a key at program
start that will be used with siphash.

~~~
rurban
Not fixed by a random seed and not fixed by siphash. Only fixable by not
allowing a linear search through the collisions (counting, or promoting to a
tree).

The random seed can be extracted, exposed, or calculated, and then siphash
doesn't help you at all; it just makes everything 2x slower.

~~~
baby
How do you extract/calculate the random seed?

This is what all (non-vulnerable) programming languages do btw.

PS: saw your other post, interesting take.

[https://news.ycombinator.com/item?id=12401920](https://news.ycombinator.com/item?id=12401920)

I’ll venture a guess: a lot of attacks that are practical in the lab become
completely impractical over a network.

------
ThJ
As far as I can tell, this is basically the 32.32 fixed-point equivalent of
Math.floor(N * Math.random()).

It reminds me of the sort of tricks I used as a teenager back in the 90s to
speed up my MS-DOS graphics code.

Another trick: You can perform rounding by adding fixed-point 1/2 before the
bit shift.

~~~
nullc
> As far as I can tell, this is basically the 32.32 fixed-point

That's exactly how I described this trick in code-comments.

------
kazinator
> _Suppose you want to pick an integer at random in a set of N elements._

Furthermore, suppose the distribution has to be uniform. Then you do not want
the modulo N in any shape or form, unless N divides the size of the range of
the random number source.

What you do is find the smallest power of two, P, such that N <= P. Then you
reduce the input numbers mod P, which is a fast bitwise AND operation against
(P - 1).

Then, you reject values which are >= N: you get another random number R and
try again, until R & (P - 1) lands in the [0, N) range.

> _Suppose you have a hash table with a capacity N. Again, you need to
> transform your hash values (typically 32-bit or 64-bit integers) down to an
> index no larger than N._

I wouldn't have such a thing where N isn't a power of two.

Non-power-of-two moduli in hash tables are only useful in the open addressing
technique, where all entries are stored in the table itself, rather than in
chains emanating from the table. Open addressing resolves collisions by
probing for alternate locations in the hash table. If the modulus is a prime
number and quadratic probing is used:

[https://en.wikipedia.org/wiki/Quadratic_probing](https://en.wikipedia.org/wiki/Quadratic_probing)

it will have the property of visiting all the table locations. That is to say,
the quadratic probes modulo a prime N generate a sequence of unique table
entries before repeating. This guarantees that a place will be found for any
new key even if the table has just one slot left.

------
cousin_it
Let's say you want a random number from 0 to 5. You can generate a random
integer, take the three low order bits (giving you a random number from 0 to
7), and if it's more than 5, try again. I just checked and it seems to work
well ("bitmask with rejection"): [http://www.pcg-random.org/posts/bounded-rands.html](http://www.pcg-random.org/posts/bounded-rands.html)

~~~
dbaupp
That results in many rejections when the inclusive limit is a power of two (or
slightly larger). For instance, for [0, 4], one has to reject 38% of the time,
and this gets closer to 50% as the limit increases. This requires doing more
samples from the (P)RNG.

The "Faster Threshold-Based Discarding" section of that article is the
technique that Lemire actually suggests, to avoid the %, and the benchmarks
indicate Lemire's method is equal or faster in all but the "Large Shuffle"
case (and the "Optimizing Modulo" section reduces that difference
significantly).

~~~
tzs
One way to pick an integer in [0, 4] given a source of uniform random integers
in [0, 7] (which I shall call "the d8") is to divide the interval [0, 1] into
5 equal parts, by splitting it at 1/5, 2/5, 3/5, and 4/5. Number the 5
intervals 0 through 4.

Write those four split positions in base 8:

    
    
      1/5 = 0.1463(1463)
      2/5 = 0.3146(3146)
      3/5 = 0.4631(4631)
      4/5 = 0.6314(6314)
    

The parentheses mark the groups of digits that repeat infinitely.

Use the d8 to generate an octal fraction in [0, 1]. Find which of the 5
intervals it falls in, and output that interval's number.

For example, if the d8 comes up 0, 2, 5, or 7, you would output 0, 1, 3, or 4,
respectively. If you get anything else, you will need at least one more roll.
For example, suppose the first roll was 1. Then you know that the output is
going to be 0 or 1. Roll again. On 0, 1, 2, or 3, output 0. On 5, 6, or 7,
output a 1. On 4, you need to roll a third time. On the third roll, 0-5 => 0,
7 => 1, and 6 means you'll need a 4th roll. The 4th roll goes 0-2 => 0, 4-7 =>
1, and 3 means roll again. The pattern then repeats for potential rolls 5, 6,
7, and 8, and so on.

Another way to work out the same thing, without explicit fractions, is to work
out a state diagram, naming each state with a string that lists each reachable
output, with some outputs repeated so that they appear in the name in
proportion to their probability from that state. That's probably confusing, so
here is an example, where we will use a d2 (i.e., a random bit stream) to
generate integers in [0, 2]. In other words, we are using a d2 to simulate a
d3.

The first state should be able to lead to 0, 1, or 2, and they should be
equally probable, so we name the first state 012. There should be two next
states from that, one for rolling 0 and one for rolling 1.

The way we get the names for those states is by splitting 012 in two. But we
need a name with an even number of characters for that, so let's double all
the characters: 001122. Now we can split it in two to get the two next states:
001
and 122.

Same thing to get the next states for 001 (make it even--000011--and then
split it: 000 and 011) and 122 (=> 112222 => 112 and 222). So far we have this
(drawn root down):

    
    
      000 011  112 222
        \  /     \  /
         001     122
            \    /
              012
    

That says if we roll 00 we output 0, and 11 outputs 2, and 01 and 10 need at
least another roll. If you work out the next states from 011 you get 001 and
111. 001 was 011's parent, so we've got a loop there. Same on the other side.
Putting those in gives:

    
    
         A 111 111 B
          \ /   \ /
      000 011  112 222
        \  /     \  /
        A:001    B:122
            \    /
              012
    

To roll a d3 with a d2, start at the root, use the d2 to traverse the tree,
and when you reach a node whose name is 000, 111, or 222, output 0, 1, or 2,
respectively.

This all easily generalizes to using a dN to simulate a dM.

The base-M fraction approach can be done without actually building up the
table of partition points, and the digits of the one partition point you need
for each simulated dM roll can be worked out on demand as you need them, using
just integers.

For example, if you need a/b in base M, set r = a. To get successive digits,
iterate this:

    
    
      r = r * M
      d = floor(r/b)
      r = r % b
    

The sequence of d values that this generates is the digits of the base-M
expansion of a/b.

~~~
dbaupp
I'm not entirely sure how that relates to my comment...

However, I think that approach is conceptually very similar to "FP Multiply"
(using floating-point arithmetic) or "Integer Multiplication" (using fixed-
point arithmetic) in the GP's linked blog post. Those approaches package the
whole "generate digits" idea into a single step, by handling the "split
positions" directly as native fractions, rather than having to think of them
as sequences in a particular base. These are likely to be significantly more
efficient too, since they do not have to do (multiple) division/modulos, which
are expensive as the main article points out (let alone computing/storing the
fraction representation).

Your approach has the benefit that it is not biased (if the entropy in `r` is
tracked appropriately, so that it can be resampled), but Lemire's "Debiased
Integer Multiplication" version (as well as the optimised versions) are likely
a faster way to remove the bias.

------
the8472
> leaving the values 0, 1, …, (2^32 – 1) mod N with ceil(2^32/N) occurrences,
> and the remaining values with floor(2^32/N) occurrences

> You can also generate random numbers in a range faster, which matters if you
> have a very fast random number generator.

That seems too biased for many RNG uses.

~~~
anderskaseorg
The author addresses that: “You can correct the bias with a rejection, see my
post on [fast shuffle functions](https://lemire.me/blog/2016/06/30/fast-random-shuffling/).”

~~~
lmb
In some of the code snippets, the following is used:

    
    
      threshold = -bound % bound
    

I don't understand how that isn't always zero. What am I missing?

~~~
dbaupp
In N-bit unsigned arithmetic, as we have here, -bound = 2^N - bound (for
64-bit, and similarly for 32-bit) and the unsigned modulo operator "knows"
that. In essence, that is a tricky way to compute 2^N % bound, working around
not being able to represent 2^N directly in an N-bit integer.

------
hinkley
I’m a bit confused. I thought it was pretty common to generate a random
number between 0 and N by getting a random float between 0 and 1 and
multiplying it by N.

~~~
pfortuny
No: floats are not uniformly distributed. That way you get (IIRC) “many” more
numbers near 0 than near 1.

~~~
dbaupp
Generating a random float always takes that into account: for instance, the
samples will be less than 0.01 only 1% of the time. A method that just
interpreted a uniformly random bit pattern (that is, integer) as a float would
only be useful in very specialised situations.

~~~
ben509
So my concept for how it works was you're putting the random bits into the
mantissa while holding the exponent constant, then normalizing the result.

Is that accurate? Is there any concern there for never getting the smallest
possible values? Or is that acceptable because you can't generate values very
close to 1?

~~~
dbaupp
That's close to one approach (generate a float in [1, 2) by setting the
mantissa with a constant exponent, then subtract 1); another is to generate a
32- or 64-bit integer and multiply by 2^-32 or 2^-64. I think both of these
can generate all floats close to 1; it is the floats close to 0 that they're
less good with.

There is some concern with the bias due to missing small values (and missing
low bits, even for moderate values), but this is often ignored, and is usually
good-enough.

------
rurban
Can dang please add a (2016) to the title? This is a very popular post, and it
gets rehashed every once in a while.

