

State of the art of random double generation - nkurz
http://www.blahonga.org/2015/04/state-of-art-of-random-double-generation.html

======
SamReidHughes
This 1:2:1 distribution is the right sort of behavior for a double-precision
random number generator (what the standard says notwithstanding). It's what
you get if you throw a uniform random dart at the real line and round it to
the nearest floating point number.
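
For concreteness, here's a minimal sketch of that dart experiment (assuming an implementation whose uniform_real_distribution works like generate_canonical() * (b - a) + a, which is what libstdc++ does):

    #include <cstdio>
    #include <map>
    #include <random>

    int main() {
        std::mt19937_64 rng(42);
        // Only three doubles exist in [2^52, 2^52 + 2]; the reals that round to
        // the middle one form an interval twice as wide as for either endpoint.
        std::uniform_real_distribution<double> dart(0x1p52, 0x1p52 + 2);
        std::map<double, long> counts;
        for (long i = 0; i < 1000000; ++i)
            ++counts[dart(rng)];
        for (const auto& [value, n] : counts)
            std::printf("%.0f: %ld\n", value, n);  // roughly 1 : 2 : 1
    }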

Also, a <= x <= b is really the right way for uniform floating point ranges to
be, with each endpoint selected with half the probability of an interior
point. The API of returning [0, 1) is pretty horrid, because when somebody
uses it to generate [1, 2) by doing rand01() + 1.0, they actually create the
possibility of generating 2, because (1 - 2^-53) + 1 rounds up to 2. (That's
just like the rounding that happens when generating on [2^52, 2^52+2].)
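
You can watch that rounding happen directly (a sketch, assuming rand01()'s largest output is 1 - 2^-53):

    #include <cstdio>

    int main() {
        double r = 1.0 - 0x1p-53;  // largest value randint(2^53)/2^53 can return
        double x = r + 1.0;        // naive shift into [1, 2)
        // The exact sum 2 - 2^-53 is not representable: it sits halfway between
        // 2 - 2^-52 and 2, and round-to-even rounds it up to exactly 2.0.
        std::printf("%d\n", x == 2.0);  // prints 1
    }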

If you want a <= x < b, getting that for practical purposes could be
relatively simple: compute rand01() * (b - a) + a, and if the output is >= b,
then return a. That works wonderfully if rand01() behaves like randint(2^53) /
2.0^53. It's a little suboptimal if rand01() is implemented something like
randint(2^64) / 2.0^64, because it creates a weird little density spike at the
bottom for contrived ranges such as [2.0^-30, 1 + 2.0^-30), but if you're
getting that anal you'd also worry about the round-to-even behavior of
computing rand01() + 1.0.
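
In code, that fix is something like this sketch (uniform_range is a hypothetical helper; rand01() is assumed to behave like randint(2^53) / 2^53):

    #include <cstdint>
    #include <random>

    // Hypothetical helper: map rand01() into [a, b); if the multiply-add rounds
    // up to b, return a instead, so the stray value's probability mass doesn't
    // pile up on one neighboring double.
    double uniform_range(std::mt19937_64& rng, double a, double b) {
        std::uniform_int_distribution<std::uint64_t> bits(0, (1ull << 53) - 1);
        double r01 = bits(rng) * 0x1p-53;  // rand01() as randint(2^53) / 2^53
        double x = r01 * (b - a) + a;
        return x >= b ? a : x;             // "if the output is >= b, return a"
    }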

~~~
Someone
_" It's what you get if you throw a uniform random dart at the real line and
round it to the nearest floating point number."_

Only when you throw the dart, repeat until you are in the target interval, and
then look for the nearest representable number. If you throw the dart, look
for the nearest representable number and repeat until that number is in the
interval, you get something different.

Worse, neither approach gives you a uniform distribution in the general case.
The hypothetical uniform dart is as likely to land between 1 and 2 as it is to
land between 2 and 3, but there are more representable numbers in the first
range than in the second.
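
The density drop is easy to check: the gap between adjacent doubles doubles at each power of two, so [1, 2) holds 2^52 representable numbers while [2, 3) holds only 2^51:

    #include <cmath>
    #include <cstdio>

    int main() {
        // Spacing of adjacent doubles just above 1.0 versus just above 2.0:
        std::printf("%a\n", std::nextafter(1.0, 2.0) - 1.0);  // 0x1p-52
        std::printf("%a\n", std::nextafter(2.0, 3.0) - 2.0);  // 0x1p-51
    }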

I also think your _rand01() + 1.0_ example is flawed. It is just because
floating point is hard that this kind of functionality ends up in a standard
library.

~~~
SamReidHughes
> Only when you throw the dart, repeat until you are in the target interval,
> and then look for the nearest representable number.

Yeah, when you throw your dart at the target interval.

> If you throw the dart, look for the nearest representable number and repeat
> until that number is in the interval, you get something different.

I don't know why you'd want to throw a uniform dart at some other, larger
interval, before culling it down. You can't throw a uniform random dart at the
entire real line. I think what you're describing gives undue representation to
the edges -- you wouldn't use those sampling points with those weightings to
integrate a function, but you might use the trapezoid method.

> The hypothetical uniform dart is as likely to land between 1 and 2 as it is
> to land between 2 and 3, but there are more representable numbers in the
> first range than in the second.

That's a fact of life.

------
evanpw
At first, I thought trying to sample "uniformly" from a discrete non-
homogeneous set of real numbers was just mathematically nonsense, but there's
actually a pretty elegant generalization:

Basically, the probability of picking any particular floating-point value x
should be proportional to the distance from x to the next largest
representable number (with zero probability of choosing the largest floating-
point number). This has the property that for any two floating-point numbers a
and b, the probability of sampling a value in the interval [a, b) is
proportional to b - a, which is the same as the defining property of the usual
uniform distribution on real numbers, just with some additional restrictions.
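
One way to realize this (a sketch, not any particular library's algorithm): lazily draw the binary expansion of a uniform real in [0, 1) and truncate it to double precision, which hands each double exactly the probability mass between it and its successor:

    #include <cmath>
    #include <cstdint>
    #include <random>

    // Draw the expansion 0.b1 b2 b3 ... one 64-bit word at a time. The position
    // of the first 1 bit fixes the binary exponent (geometrically distributed),
    // and the next 52 bits become the mantissa; truncating there rounds the
    // infinite-precision real down to the previous representable double.
    double gap_weighted_uniform(std::mt19937_64& rng) {
        int exponent = 0;
        std::uint64_t word = rng();
        while (word == 0) {                   // each zero bit halves the remaining mass
            exponent += 64;
            if (exponent > 1074) return 0.0;  // below the smallest subnormal
            word = rng();
        }
        exponent += __builtin_clzll(word) + 1;  // GCC/Clang builtin (assumption)
        std::uint64_t mantissa = rng() >> 12;   // 52 fresh mantissa bits (shortcut)
        return std::ldexp(1.0 + mantissa * 0x1p-52, -exponent);
    }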

------
twic
It's no better in Java. A fairly literal translation of the code [1], using
the new Random::doubles method [2], gets counts like so:

    166808
    332743
    500449

If anything, this is even weirder than the C++ results. :rage1:

The doubles method makes each value "as if it's the result of calling the
following method with the origin and bound":

    double nextDouble(double origin, double bound) {
        double r = nextDouble();
        r = r * (bound - origin) + origin;
        if (r >= bound) // correct for rounding
            r = Math.nextDown(bound);
        return r;
    }

Which is the naive approach that Artur has been struggling against.

[1] https://bitbucket.org/snippets/twic/qqEe

[2] https://docs.oracle.com/javase/8/docs/api/java/util/Random.html#doubles-long-double-double-

------
cschwan
I guess this could be related to a question I asked some time ago on SO:
http://stackoverflow.com/questions/25668600/is-1-0-a-valid-output-from-stdgenerate-canonical

------
TheLoneWolfling
A related question:

Is there any efficient way to generate a random number in a range, with
respect to the amount of entropy used? The way I know (generate a random
number; if it's in the top incomplete section, regenerate until it isn't; take
it modulo the width of the range; add the starting value - there are some
fencepost errors here, but you get the idea) is really inefficient when, for
example, you want a number between zero and half the range plus one (so, for
example, an 8-bit RNG and you want numbers between 0 and 128, inclusive).

You end up using, on average, 8*(1/(1-127/256)) ~= 15.9 bits of entropy per
call, whereas you only "need" to use log2(129) ~= 7.0 bits of entropy per
call.
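
For concreteness, a sketch of the method I mean, for the 8-bit example (std::mt19937_64 standing in for the 8-bit source):

    #include <random>

    // Uniform integer in [0, n); n = 129 for the 0..128 example. Each byte draw
    // costs 8 bits of entropy, and for n = 129 the top incomplete section covers
    // 127/256 of the byte's range, so on average this burns 8 * 256/129 ~= 15.9
    // bits per output.
    int uniform_below(std::mt19937_64& rng, int n) {
        std::uniform_int_distribution<int> byte(0, 255);
        const int limit = 256 - 256 % n;  // start of the top incomplete section
        int v;
        do {
            v = byte(rng);
        } while (v >= limit);
        return v % n;
    }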

~~~
mark-r
Sometimes you can generate the random numbers in batches to get better
efficiency.

For your example, you can combine 3 random 8-bit values into a 24-bit value.
You still only get 3 outputs, but you can use all the random values up to 7 *
(129^3) - you only need to regenerate random numbers 10% of the time, rather
than 49%. 60 bits does even better, giving you 8 outputs and requiring
regeneration only 0.2% of the time.
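
A sketch of that batching (three_below_129 is a hypothetical helper):

    #include <array>
    #include <cstdint>
    #include <random>

    // Pack three byte draws into one 24-bit value, accept only values below
    // 7 * 129^3 (the largest multiple of 129^3 that fits in 2^24), then read off
    // three independent base-129 digits. Rejection drops to ~10% per batch of
    // three outputs, versus ~49% per output one byte at a time.
    std::array<int, 3> three_below_129(std::mt19937_64& rng) {
        std::uniform_int_distribution<std::uint32_t> word(0, (1u << 24) - 1);
        const std::uint32_t n3 = 129u * 129u * 129u;  // 129^3 = 2146689
        std::uint32_t v;
        do {
            v = word(rng);
        } while (v >= 7 * n3);                        // 7 * 129^3 = 15026823
        v %= n3;                                      // uniform in [0, 129^3)
        return { int(v % 129), int(v / 129 % 129), int(v / (129 * 129)) };
    }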

~~~
TheLoneWolfling
Nice.

A pity it requires knowing sizes in advance. I wonder... Could you do
something similar "afterwards"?

~~~
mark-r
Sure you can; I used a short program myself to calculate the best-case
outcomes. The question is whether it would be worth the bother. Usually the
range you request is much smaller than the range of your random number
generator, so the number of regenerations required is nowhere near the worst
case - and the cost of generating a random number isn't enough to get worried
about.

------
TwoBit
What good is the standard implementation if the gcc guy responds with WONTFIX?

------
Someone
The root of the problem seems to be that the standard gives pseudocode for
generate_canonical (paragraph 26.5.7.2 in my unofficial version) that
implementations must follow, and that pseudocode, when translated
straightforwardly into C++, gives results that are inconsistent with the
textual description of the function.

See http://stackoverflow.com/a/25669510

~~~
acqq
Language-lawyers-no-oxygen-astronauts (1) strike again.

Producing quality random floating point numbers, fast, is a solved problem
(maybe even since the times of John von Neumann), but the astronauts would
then have to come down to Earth and not just invent the wrong code attempting
to use the latest features of the standard. The direction should be the
opposite: starting from the known and proper solutions, see how they can be
(if they need to be) standardized.

A standard that standardizes "something" without knowing how the problem is
actually solved in practice is dangerous.

1) http://www.joelonsoftware.com/articles/fog0000000018.html

~~~
art4711
Author of the blog post here, following the sudden surge of traffic.

> Producing quality random floating point numbers, fast, is a solved problem

Is it? Do you have any reference I can read?

Everything I've found is obviously flawed on the level of `(rand() %
1000)/1000.0`. I don't actually care about the C++ facilities that I'm
discussing in that post. I just hoped that they would be a good path to follow
to find one good source that discusses the various trade-offs in mapping a set
of random integers (or a stream of bits, or whatever) to floating point. I've
seen serious peer-reviewed papers that just randomize the bits of the exponent
and mantissa and call that random, or papers that do what I've tried to do
(which has its flaws) - count the number of representable, uniformly spaced,
floating point numbers between `from` and `to` - but then generate their
random numbers with `random() % count`, which is like restoring the Mona Lisa
with crayons.
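
To spell out the level of flaw I mean (assuming the common RAND_MAX of 2^31 - 1):

    #include <cstdio>
    #include <cstdlib>

    int main() {
        // (rand() % 1000) / 1000.0 can only ever produce 1000 of the roughly
        // 2^62 doubles in [0, 1), and because 2^31 is not a multiple of 1000,
        // residues 0..647 each get one extra preimage (modulo bias).
        std::printf("%d residues are over-represented\n", RAND_MAX % 1000 + 1);
    }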

So I'm completely serious. I'm desperate for a good source, where can I find
it?

~~~
danieldk
I get the impression that it's indeed not a solved problem, and that even
implementations that do not have to follow standards like C++'s get it wrong.
I recently submitted this to Hacker News; it's a nice treatment of the
problems:

http://mumble.net/~campbell/2014/04/28/uniform-random-float

~~~
art4711
I actually got the link to his code yesterday, and it's eerily close to what I
wrote a few months ago, but I rejected that approach because I didn't like
that each of the numbers in [0.5, 1.0) was so much more likely to be returned
than 0.0. This gives the highest possible entropy, and every floating point
number representable in that range can be returned, but the trade-off is that
certain values will be returned vastly more often than others. It's this
discussion of trade-offs that I'm actually missing.

I don't feel competent enough to write that paper/article/rant about the
trade-offs myself - I barely understand math and randomness - but I'm getting
tired of all the libraries out there that say "uniform[1] random[2] floating
points[3]": [1] not actually uniform, [2] not actually random, [3] terms and
conditions apply, YMMV, not valid where invalid.

~~~
danbruc
What do you mean when you say you don't like that some numbers are returned
more often than others? This seems completely inevitable for a uniform
distribution, because floating point numbers don't have a uniform density
across the real line.

Only if you choose a range like [1, 2], with a uniform density of floating
point numbers, will you see all floating point numbers in the interval -
except both bounds, of course - with equal probability.
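
That special case is also why the classic bit trick works: glue 52 random mantissa bits onto the exponent field of 1.0 and every double in [1, 2) comes out with equal probability. A sketch:

    #include <cstdint>
    #include <cstring>
    #include <random>

    // Doubles in [1, 2) all share the exponent of 1.0 and are spaced 2^-52
    // apart, so a random mantissa picks uniformly among all 2^52 of them.
    double uniform_1_2(std::mt19937_64& rng) {
        std::uint64_t bits = 0x3FF0000000000000ull  // bit pattern of 1.0
                           | (rng() >> 12);         // 52 random mantissa bits
        double d;
        std::memcpy(&d, &bits, sizeof d);           // well-defined type pun
        return d;
    }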

To see all returned floating point numbers in the interval with equal
probability, you would have to never return certain floating point numbers at
all - but that hardly qualifies as a uniform distribution if some perfectly
valid floating point numbers can never occur.

~~~
SamReidHughes
It totally qualifies as a uniform distribution. Is it the case that generating
over [1, 2) gives a uniform distribution, but subtracting 1 from the result
does not?

