Hacker News new | past | comments | ask | show | jobs | submit login
Fixing Random (ericlippert.com)
44 points by azhenley on Feb 9, 2019 | hide | past | favorite | 21 comments




> An alternative, and possibly better solution, would be to build the double from bits directly, rather than doing an expensive division.

Clever trick:

  uint64 x = 0x3FF0'0000'0000'0000 |
      (rand64()&0xF'FFFF'FFFF'FFFF)
  return *(double*)&x - 1.0
This gives a perfect uniform distribution on [0,1), with the caveat that everything is rounded to a multiple of 2^-52 (≈ 2.22e-16).


So, am I correct that this would give you effectively 52 random bits instead of the maximum 53? Is the problem that the 53rd bit would mean there are possibly invalid values you'd have to throw out, or that the distribution wouldn't be uniform?


The problem is that: a, there are over a thousand random bits to fill, and you'll need a significantly more sophisticated mechanism to fill them; b, if you need more than a dozen significant digits, 64-bits floats probably aren't what you want anyway; and c, clever tricks aren't the place for rigorous correctness when "correct to within rounding errors" will do just as well.

If you really want you could tack on:

  ret += (entropy >> 63) * 0x1.0p-53; // 1.110223024625156540e-16;
  OR
  ret += (entropy >> 52) * 0x1.0p-64; // 5.421010862427522170e-20;
right before the return to pack in some extra bits.


This reply made me even more confused. According to my naive understanding of floating point there are 53 bits for the mantessa (sp?) so in my head, you would specify a exponent of 2^(-53) and fill the mantessa with random bits to get a "pretty good" random number. If you wanted to represent every possible value of a double, then you have to vary the exponent and your distribution becomes non-uniform. I assume I'm missing something fairly obvious to others, but your value seems to be exactly one bit short of the "ideal" (if you call it that, but really it's just my imagination of a "perfect" random double). I see your reply mentions you can get back that bit, but I don't understand why it has to be it's own step. What limitation of doubles are you overcoming when you use the range 1-2 and then subtract one instead of just generating in the range 0-1?


For IEEE floats, the range [1, 2) has uniform absolute precision. Not so for [0, 1), which spans many values of the exponent.

There is uniform absolute precision on each interval among [0.5, 1), [0.25 0.5), [0.125, 0.25), &c., but as you get closer and closer to 0, there are still just as many fraction bits available (same relative precision), so these numbers are much more precise in absolute terms (i.e. relative to 1.0).

You can generate a random number in [1, 2) by just filling in the fraction part with random bits, because the exponent is always the same. Then you can subtract 1 to get a number in the range [0, 1). But for a number in e.g. the range [0.125, 0.25), the last 3 bits of the fraction will always be 0, meaning you are skipping 7/8 of the possible floating point numbers in that range.

The smallest possible denormal number among IEEE doubles is 2^(–1074), but the smallest non-zero value you will get with this method is 2^(–52).

If you want to generate any possible floating point number in the range [0,1) with probability proportional to the difference between it and the next representable number, it gets pretty tricky / computationally expensive.


This was pretty much my reasoning as well, but I missed that you can generate a floating-point integer in [0,2^53) and then scale down to [0,1) with a multiply, which gets you one additional significant bit for free or cheaper, and simplifys the code.

Edit: I think for the general case you want something like:

  uint64 x = rand64() & 0x001F'FFFF'FFFF'FFFF
  double xf = .5 * 1p-52 * x
  // return xf; /* 53 bits of entropy */
  if(xf < .5)
    {
    y = rand64() & 0x000F'FFFF'FFFF'FFFF
    double yf = .5 * 1p-52 * 1p-52 * y
    xf += yf
    if(xf < .5 * 1p-52) ... /* loop about 20 times */
    }


Possibly we could generate a random number at the subnormal range (with 0 exponent) then multiply by 0.99999~ divided by the greatest possible 0 exponent number.

  uint64 x = rand64()&0x000F'FFFF'FFFF'FFFF
  return *(double*)&x * 4.49423283715579e+307
I tweaked the multiplier as high as possible without producing 1, but there might be a fraction of a bit bias introduced by the multiplication compared to the method of subtracting 1 from 1-2 range.

edit: That multiplier seems to be essentially 2^1022, in which case there may be no quantization noise possible in multiplying by it. It may simply cause the exponent to be increased and the result normalised.


> What limitation of doubles are you overcoming when you use the range 1-2 and then subtract one instead of just generating in the range 0-1?

The limitation is that floating point numbers have different exponent depending on where in the range [0,1) they are, while every float in [1,2) has a exponent of 3FF. Generating a number in the latter range and subtracting 1.0 lets the floating point hardware do the work of figuring out the exponent.

On futher testing, though, it looks like you can just do ( rand53() * 1p-53 ), which will be hands down better as long as your floating-point hardware has a good int64->double conversion.


See NextDoubleHighRes() here:

https://github.com/colgreen/Redzen/blob/master/Redzen/Random...

There's an explanation in the comments, and a link to the original fuller explanation and source, here:

https://mumble.net/~campbell/tmp/random_real.c

I.e. I'm pretty sure your approach gives a non uniform distribution, but happy to discuss in more detail if you want.


Among the number it emits, the distribution is "uniform", but it leaves out many possible doubles (half between 0.5 and 1, a quarter between 0.25 and 0.5, etc.), as far as I understand.

Great sources!

(EDIT: replace "omit" by "leave out", because it's too close to "emit")


Yeh that sounds right. Had forgotten the details as it's been a while since I went through all the issues and wrote and tested the redzen code.


> your approach gives a non uniform distribution

Unless I made some significant error, it is a uniform distribution over multiples of 2^-52. It's also a discrete approximation of the real uniform distribution in that for any subdivision of [0,1) into N equal (half-open) intervals, the most-populated interval has at most one more inhabitant the least populated one.

It is however strictly inferior to rand53()*2^-53 (NextDoubleInner at your github link) unless your int64->double or float-multiply operations are quite bad.


I love this trick. It generates a float of the form 1.xxxxx..., filling in the x part with random bits, generating a random number between 1 and 2. Then it subtracts 1.


Note though that the least significant bit of your float will always be zero, if I understand correctly (and the subsequent ones will be zero more often than not). PRNG are tricky, and floats are tricky, and the combination is tricky, unsurprisingly.

See this discussion here:

https://github.com/JuliaLang/julia/issues/16344


Aw, man. Some day we'll have this all figured out, but that day won't be soon.


There are a number of issues with System.Random, all of the known issues to date are described in this open github issue:

https://github.com/dotnet/corefx/issues/23298

The redzen library contains RNGs that address all of the known issues, e.g. see:

https://github.com/colgreen/Redzen/blob/master/Redzen/Random...

https://github.com/colgreen/Redzen/blob/master/Redzen/Random...

That Xoshiro256StarStarRandom source has also been merged into Math.Net.


I personally fell victim to this very mistake he mentions when I first started with C#. It really doesn’t follow the principle of least astonishment and it makes me feel a bit better knowing that it’s apparently a pretty common mistake.


I'm not following his reason for writing this. In part 2 he says he doesn't care about the performance hit from a crypto library that implements a system TRNG, sooooo, use that?

https://developer.mozilla.org/en-US/docs/Web/API/Crypto/getR...

Or even study the thread on entropy pool paranoia from the Stanford Javascript Crypto Library:

https://github.com/bitwiseshiftleft/sjcl/issues/77

/scratches head at FAIC/


Why are you linking to Javascript docs under an article about why the default C# APIs for random are bad? And even if there's already replacement libraries, a discussion of how they can be better than the default is useful?


The reason for writing this is to gently introduce the general topic of improving how we represent stochastic workflows in mainstream programming languages. Entering that complex topic by exploring just how horrid even the most basic aspects of randomness are represented is an accessible and compelling entry point.

This is part three of a very long series that will explore the connections between LINQ, probability distributions, and the difficulties of sampling from arbitrary distributions.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: