random(random(random(random()))) (openprocessing.org)
394 points by Alifatisk on May 15, 2023 | 146 comments



Since `random(x)` returns a random number in [0,x), it always outputs a number less than `x` (the initial value of x is 1 in this case). Therefore, repeatedly applying `random(...)` to its own result shrinks the value toward zero, and the chance that the shrinking effect is small is vanishingly small. It's the result of these repeated applications that is subtracted from the length of the unit vector used to position each pixel, so the pixels naturally accumulate around that unit length. No broken crypto here.

It's sort of like asking, "What would happen if I multiply a bunch of numbers together that are guaranteed to be less than 1 (and even much less than 1 much of the time)?"


In more formal terms, since it is a uniform distribution you can write

  random(x) = random()*x
then you can easily see that

  random(random(random())) = random()*random()*random()
The resulting distribution is not equal to random()^3 (a single draw, cubed): random()^3 is below 0.125 whenever that one draw is below 0.5, which happens half the time, whereas the probability that all 3 independent calls give you a number <0.5 is only 0.125.
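A quick way to see both claims numerically (a rough Python sketch; `nested` is just an illustrative helper and the sample count is arbitrary):

  import random

  def nested(n):
      # random(random(...random()...)) nested n times: each call's result
      # becomes the next call's upper bound
      x = 1.0
      for _ in range(n):
          x = random.uniform(0, x)
      return x

  N = 100_000
  print(sum(nested(3) for _ in range(N)) / N)                    # ~0.125
  print(sum(random.random() * random.random() * random.random()
            for _ in range(N)) / N)                              # ~0.125, same distribution
  print(sum(random.random() ** 3 for _ in range(N)) / N)         # ~0.25, a different distribution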


Oh. Thanks for the explanation, but I thought random was taking a seed, which would have made this much cooler, like this:

https://www.jstatsoft.org/article/download/v007i03/892


It depends on the language; in some languages, like JS, it does not take an argument. So it is language-specific.


Clear and simple explanation


Took me a moment to understand that by random()^3 you meant the output of the random function raised to the power of 3, not some composed function we might name "random^3", since composition is also how we build it here.


Exactly, like Perlin noise, the output is not indicative of the function. Works as intended. The beauty is in the fact that when using pseudorandom number generation and the right output function, you can simulate a lot of things. Star systems, planets, stock market games, aim handicap, and the Itano Circus.


That's how many computer games generate their universes. It's called procedural generation [1].

One of the earliest and most famous examples is Elite [2]:

> The Elite universe contains eight galaxies, each with 256 planets to explore. Due to the limited capabilities of 8-bit computers, these worlds are procedurally generated. A single seed number is run through a fixed algorithm the appropriate number of times and creates a sequence of numbers determining each planet's complete composition (position in the galaxy, prices of commodities, and name and local details; text strings are chosen numerically from a lookup table and assembled to produce unique descriptions, such as a planet with "carnivorous arts graduates").

[1] https://en.wikipedia.org/wiki/Procedural_generation

[2] https://en.wikipedia.org/wiki/Elite_(video_game)


Procedural generation was also used to generate levels in the Pitfall! game for the Atari 2600. A single level was represented with a byte (!), whose bits controlled the number of rolling logs, holes, etc. There were 255 levels, and an LFSR was used to go from one level to another, so the actual level bytes weren't even stored in the ROM.

For more details and screenshots see this great blog post: https://evoniuk.github.io/posts/pitfall.html
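For anyone curious what that looks like, here's a minimal 8-bit LFSR sketch (Python; the tap mask and seed are illustrative, not necessarily Pitfall's actual polynomial — the blog post has the real details):

  def lfsr_levels(seed=0xC1, taps=0xB8, count=8):
      # 8-bit Galois LFSR: shift right each step and, when a 1 bit falls off,
      # XOR in the tap mask. Each state byte would encode one level's layout
      # (logs, holes, treasure, ...) in its bit fields.
      state = seed
      for _ in range(count):
          yield state
          lsb = state & 1
          state >>= 1
          if lsb:
              state ^= taps

  for level in lfsr_levels():
      print(f"{level:3d}  {level:08b}")

Part of the appeal is that the sequence is fully deterministic, so revisiting a level always regenerates exactly the same byte without storing it anywhere.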


Elite: Dangerous and Elite Dangerous: Odyssey - the latest versions of Elite - still use procedural generation in order to generate each one of the 400 billion star systems in their shared galaxy.

The proc gen is also consistent and persistent, based on some initial seed value decided upon at some point in the game's genesis; a CMDR who jumps their ship to some star system never visited before has that star system generated by their game client according to its algorithm - all the system bodies and their types and classes and whatever's on any planets, if anything.

Another CMDR who subsequently visits it will see the same thing the original CMDR saw.

It's a thing of beauty.


I urge anyone who has played the original Elite but not Elite: Dangerous to please go do that. The game is amazing, beautiful and an absolute treat if that type of gameplay is what you want. It's even better in VR but playing in an environment where you can't see the keyboard makes a HOTAS mandatory instead of recommended.

I am not joking when I say that E:D is in my top 3 games of all time.


I know, I wrote a procedural universe simulation in 2006. It’s like playing god.


> No broken crypto here.

Which implementation of random is Processing using? There are a lot of random() implementations that aren't suitable for crypto (like JavaScript's and Java's).

1. https://docs.oracle.com/javase/8/docs/api/java/security/Secu...

2. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...


Good point. I should have said, "No evidence of broken crypto here."


FAQ

1. So what’s happening here?

Conceptually, this goes around the circle in steps of 1 degree, say, and for each ray plots a point at a distance from the circumference given by random(), or random(random()), etc.

So, conceptually,

  For phi = 1..360
    r = 1 - random() // etc.
    x = r*cos(phi)
    y = r*sin(phi)
(It might choose phi randomly uniformly instead, doesn’t matter)

2. What’s random(random())?

As explained in detail elsewhere, since the argument specifies the upper limit of the uniform, it equals random()*random() (ie the product of two independent uniform 0-1 random variables).

3. Why is the first image denser in the center, and the second uniform?

Imagine cutting the disk into 100 rings of equal (small) thickness. The area of a ring is approximately thickness times circumference of the ring, in other words, proportional to the distance from the center.

If we put the same number of points into each ring, the rings near the center (with less area) will have more points per area than the rings at the outside, thus appear darker.

If, however, we put more points into the outer rings, and in particular if we make the number of points proportional to the distance from the center, then we will get a more uniform picture (the sketch after this FAQ shows the usual sqrt trick for doing exactly that).

4. Why 1-random(), rather than random()?

With the latter, we’d just get a series of disks with the bulk of the points concentrated in the centre, increasingly so. Less interesting than what’s here.
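To make the answer to Q3 concrete, here's a minimal sketch (assuming Python with numpy/matplotlib, not the Processing used by the original sketch) comparing a naive uniform radius with the sqrt-corrected radius that gives a uniform disk:

  import numpy as np
  import matplotlib.pyplot as plt

  n = 5000
  theta = np.random.uniform(0, 2 * np.pi, n)
  r_naive = np.random.uniform(0, 1, n)            # clusters near the center
  r_even = np.sqrt(np.random.uniform(0, 1, n))    # density proportional to r -> uniform disk

  fig, axes = plt.subplots(1, 2, figsize=(8, 4))
  for ax, r, title in zip(axes, (r_naive, r_even), ("r = random()", "r = sqrt(random())")):
      ax.scatter(r * np.cos(theta), r * np.sin(theta), s=1)
      ax.set_title(title)
      ax.set_aspect("equal")
  plt.show()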


As a way of illustrating the distribution of a set of random numbers, it's an absolutely terrible visualization.


Lots of math all over the place computing the expected value :)

So, here's a derivation: if rand(x) returns a real uniformly in [0,x) (or, equivalently for the expected value, in [0,x]), then here are the first few iterated expected values, from which you can see the general answer. 1 term

    E(r) = (1/1) * Integral_0^1 x_0 dx_0 = (x_0^2/2) from 0 to 1 = 1/2
2 terms

    E(r(r)) = (1/1) * Int_0^1  (1/x_0) Int_0^x_0 x_1 dx_1 dx_0 = 1/4
3 terms:

    E(r(r(r))) = (1/1) * Int_0^1 (1/x_0) Int_0^x_0 (1/x_1) Int_0^x_1 x_2 dx_2 dx_1 dx_0 = 1/8
And so on. The nth term is exactly (1/2)^n
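A quick numerical sanity check of the (1/2)^n result (a rough Python sketch; the sample count is arbitrary, and for small n the sample mean is still easy to estimate — the subthread further down discusses why very large n gets tricky):

  import random

  def nested(n):
      x = 1.0
      for _ in range(n):
          x = random.uniform(0, x)   # each result becomes the next upper bound
      return x

  samples = 200_000
  for n in range(1, 6):
      mean = sum(nested(n) for _ in range(samples)) / samples
      print(n, round(mean, 4), 0.5 ** n)   # the sample mean tracks (1/2)^n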


I think I understand, but I also saw a comment above that specifically says (having done 3 iterations)

  The resulting distribution is not equal to random()^3 (a single draw, cubed): random()^3 is below 0.125 whenever that one draw is below 0.5, which happens half the time, whereas the probability that all 3 independent calls give you a number <0.5 is only 0.125.
https://news.ycombinator.com/item?id=35954931

Any thoughts? Am I mixing up two unrelated things?


There's a difference between taking a random() and then cubing it, and taking three different random()s and multiplying them together. The EV of the first is 1/4, whereas the EV of the second is 1/8.

You can prove the EVs for yourself, but an intuitive way to think about why the EV of the first is twice as much as the second scenario is to consider their probability density functions. When you cube a single random variable, the resulting PDF is skewed towards lower values, but not as dramatically as when you multiply three separate random variables together.


> not equal to random()^3

Let's clarify what this means, since that's probably the confusion. Here's Python code:

  from random import random

  x = random()
  z = x**3

That's the random()^3 term. Now let's contrast it with this:

  a, b, c = random(), random(), random()
  z = a * b * c

You may notice that the z's don't match in these cases. The key difference is that in the exponent version we draw a single random variable (remember it is >0 and <1) and then multiply that value by itself to cube it (x*x*x). This corresponds to rolling a die once and then cubing the value. But in the other case we have 3 independent random variables. This is like rolling the die 3 times and multiplying the 3 different values. Clearly the chances of landing on a specific number are different in these two situations. What I suggest doing is plotting a histogram of this data (10k samples with 1k bins will be sufficient). Make 3 plots (preferably using subplots). I went ahead and did this for you[0]!

== Worked out ==

Let's actually simplify this a bit[1] and work it out by hand: we'll look at r^2 versus r*r (where each r is an independent uniform random variable), checking a few thresholds.

For r^2: since r^2 < t exactly when r < sqrt(t), 50% of the values are below 0.25 (r < 0.5) and 90% are below 0.81 (r < 0.9). So the three bins <0.25 / 0.25-0.81 / >0.81 hold 50%, 40%, and 10% of the data: it's pushed towards 0 with a long tail.

Now the r*r case. Here the two factors don't have to be the same, and the CDF of the product of two independent uniforms works out to t - t*ln(t). Plugging in the same thresholds: about 60% of the data is below 0.25, about 38% is between 0.25 and 0.81, and only about 2% is above 0.81. So the product is pushed even harder towards 0 overall and has a much thinner tail near 1, but its density near 0 only grows like -ln(t) rather than blowing up like 1/(2*sqrt(t)), so the spike at 0 in the histogram is less extreme. Either way, the two shapes are clearly different.

I hope this helps.

== Quiz to test your understanding: ==

We're going to make a bet and I'm giving you two options. In the first option I'll allow you to roll a die, take the value, and cube it. In the second case I'll allow you to roll 3 dice and multiply all the values. You win if your result is >63. Which option do you choose? What if you win when the result is <8? (I'm sure someone will post an answer)

[0] https://i.imgur.com/gQ5uqED.png

[1] I started writing out the full case but it got long winded so I deleted. I can redo if needed.


Very well explained, that made it click for me, thanks!


Even simpler: every time you sample random you get the same number of bits of entropy. Sampling once and then cubing spreads that same entropy over the interval. Sampling three times and multiplying spreads 3x the entropy over the reachable area. Thus you know this guy is doing no net favor to the world.


If you're wondering why the dots approach the circumference and not the center, the length of the vector is `diameter * (1 - random(random(...)))`

It's interesting that the second circle is a pretty uniform distribution. `1 - random(random())` must be approximately `sqrt(random())`, iirc that's how you correct for the error demonstrated in the first circle (https://youtube.com/watch?v=4y_nmpv-9lI).


... That means each point is at a distance of sqrt(f(x)) from the circumference. And nesting random() n times is similar to random()^n, which explains why the first plot has them amassed in the center and the second has a more uniform distribution.

Why are they similar? For a rough idea, random() is a random number between 0 and 1, which on average is 0.5; random(random()) is a random number between 0 and on average 0.5, which on average is 0.25; and so on.


Oh man if my math teachers taught math with application, I would have been so much better off. It's so much easier to understand all of this after writing code for a while.


A better title would be "1 - Random(Random(Random(Random())))", then it's more understandable why the points lean toward the circumference.


Yes exactly. I was wondering why the center was becoming more and more empty when the result was moving closer and closer to 0.


The cumulative distribution of random(random(...)) repeated k times is 1-(1-x)^k, so 1 - random(random()) should have a cumulative distribution of x^(2).

I think that should even make it exactly uniform, but somehow it doesn't look like it. I may have missed a bit somewhere.


Yeah you're wrong :)

So, random(k) should just be k * rand(), and so random(random()) should just be rand() * rand().

One really fun way to do this is to rewrite using standard exp/ln rules as

    # three exponential random variables each with mean 1
    k1 = -ln(rand())
    k2 = -ln(rand())
    k3 = -ln(rand())
    # rand() * rand() * rand() written cryptically
    exp(-(k1 + k2 + k3))
That these are exponential random variables comes from the inverse-CDF Sampling Theorem.

The characteristic function E[exp(itX)] is a Fourier transform of the PDF and for a sum of independent variables it becomes a product,

E[exp(it(X + Y))] = E[exp(itX) exp(itY)] = E[exp(itX)] E[exp(itY)]

for an exponential distribution you have

E[exp(itX)] = ∫{0 → ∞} dx exp(-x) exp(itx) = 1/(1 - it).

So the Fourier transform of the PDF of a sum of n such exponential random variables is going to be 1/(1 - it)^n, which is the characteristic function of a gamma-distributed random variable,[1] and so the corresponding PDF is actually

f_n(x) = x^{n-1}/(n-1)! exp(-x)

[this is not too hard to see above when you realize that the exponent can be factored out as a repeated derivative of 1/(1 - it), and in Fourier space a derivative corresponds to multiplication by the frequency ... derivatives with respect to t become multiplications by x].

So now that we know the distribution of the product U_n = exp(-G_n), where G_n is a Γ(n, 1)-distributed random variable, we work backwards from the definition:

h_n(u) du = the probability that U_n is between u and u + du

in which case G_n is between -ln(u) and -ln(u + du) = -[ ln(u) + ln(1 + du/u) ] ~= -ln(u) - du/u.

so that's f_n(x) dx where x = -ln(u) and dx = du/u, so that's

h_n(u) = (-ln(u))^{n-1}/(n-1)!, for 0 < u < 1.

and if you wanted the CDF it'd be an incomplete gamma function I think. For n=1 you can see that this recapitulates a uniform distribution but even for n=2 you have -ln(u) being the probability density function which I don't think will give you that sqrt() stuff that you need to get exact uniformity.

[1] https://en.wikipedia.org/wiki/Gamma_distribution .
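If you want to check h_n(u) empirically, here's a rough sketch (assuming Python with numpy and matplotlib; n and the sample count are arbitrary choices):

  import numpy as np
  import matplotlib.pyplot as plt
  from math import factorial

  n = 3
  samples = np.prod(np.random.random((200_000, n)), axis=1)   # product of n uniforms

  u = np.linspace(1e-3, 1, 500)
  pdf = (-np.log(u)) ** (n - 1) / factorial(n - 1)             # h_n(u) derived above

  plt.hist(samples, bins=200, density=True, alpha=0.5, label="empirical")
  plt.plot(u, pdf, label="(-ln u)^(n-1)/(n-1)!")
  plt.legend()
  plt.show()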


Perhaps it looks nonuniform because we're not using literal points (our dots have nonzero area), so when they get too close they blend together and make the distribution look denser than it is in that area


1-random(random()), aka 1-random()*random(), looks slightly skewed toward the circumference, and 1-random()^2 looks slightly skewed toward the center.

If you plot their difference, 1-random(random()) aka 1-random()*random() is empirically skewed slightly larger than 1-random()^2, but I don't know how to analyze it exactly.


Could be, but it is also possible I'm missing something somewhere, especially because I can't explain why min(random(), random()) looks different from random(random()) or random()*random().


An attempt to explain the difference without just calculating the things:

random(random()) has an identical distribution to random()*random() (it may even behave identically for a given rng state), although this is different from Math.pow(random(),2), since in that case there's 100% correlation between the two factors, which makes the expected value of the product bigger.

random(random()) is also distributed equivalently to

    x=random()
    y=random()
    while(y>=x) y=random()
    return min(x,y)
(the last line could also read 'return y') Comparing that to min(random(),random()), we can see that if the second call to random is smaller than the first, they will return the same result; otherwise, the program equivalent to random(random()) will return a smaller value. Therefore the expected value of random(random()) must be lower than that of min(random(),random()).
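A tiny simulation backs this up (a Python sketch; the sample count is arbitrary): E[min(U1, U2)] = 1/3 while E[random(random())] = E[U1*U2] = 1/4.

  import random

  N = 200_000
  print(sum(min(random.random(), random.random()) for _ in range(N)) / N)   # ~0.333 = 1/3
  print(sum(random.uniform(0, random.random()) for _ in range(N)) / N)      # ~0.25  = 1/4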


Nice argument! And to clarify further, that equivalent program can be viewed as a rejection-sampling implementation of random(random()).

Another intuitive rephrasing is that while both min(random(), random()) and random(random()) are guaranteed to be at most your first draw, with random(random()) your second draw is constrained to lie within the interval set by your first, whereas with min(random(), random()) there's no such constraint on the second draw. Thinking about P(x > y) for x, y ~ U should then provide the illumination.


Ah I think I see it now, yes given X and Y with uniform distribution then the distribution of X is uniform given Y and X < Y, but the distribution of min(X,Y) is not uniform given max(X,Y).


According to the internet what you wrote is the distribution of min(random(), random()), but the distribution of the product is different.

Product: https://math.stackexchange.com/questions/659254/product-dist...

Minimum: https://stats.stackexchange.com/questions/220/how-is-the-min...

I can't explain it either, but I also don't think there is a reason they should look the same.


What language? What are the parameters for the random function here (a seed? the upper value of a uniform distribution? a stddev for a normal distribution?) Why? What is this demonstrating?

I can grok a lot of the dimensional effects of randomness but without these things specified, the picture is rather meaningless to me.


1. The language is Processing

2. The argument is the high end of the range (0 to n [https://processing.org/reference/random_.html])

3. I dunno why, it seems like it's just cool. It seems to demonstrate that random(random(...)) collapses to 0, which is exactly what I expected but it's pretty cool.


Q4: What is it plotting? `random()` returns a float, right? Is it plotting... r*cis(theta), where r is the random invocation and theta varies uniformly across the circle, normalized so that all the circles are the same size (where cis(theta) = (cos(theta), sin(theta))?


Nah, it's plotting polar vectors, and it uniformly picks an angle. The random(...) is just the magnitude of the vector.


FYI, r*cis(theta) _is_ a polar vector. So it's the thing I said, but with the angle chosen randomly.


Gotcha. My bad, thanks for clarifying.


I'm a little confused then what the value of this demonstration is.

The expectation marches by half the distance towards the outer edge each time, so it becomes a soap bubble at a rate of 1/2^n

Which is nice and sorta cool but I'm not quite sure what the "whoa" factor here is. It's like one of those zip cinch tops for those bags. Sorta cool how it works on observation, really cool in detail, but also sorta...meh? At the same time?

Maybe I'm a bit jaded here? Admittedly, a continuous animation of this would be an improvement on the cool factor to me, personally.


Well, all I can tell you is that I'm having a good time discussing the nitty gritty of it in the comments here, and it made the gears of my mind turn in a pleasant way. It didn't make me go, "wow," but it did make me go, "hmm...."


Gotcha, yeah, that makes sense to me now.

Maybe the real article was the HN comments section that we made along the way....


> Maybe the real article was the HN comments section that we made along the way....

Nominating this comment for https://news.ycombinator.com/highlights


"Misunderstanding is the foundation of human discourse."—Jacques Lacan


It’s a kind of stone soup situation. With the OP providing the stone.


Speaking personally, a webpage with nothing beyond a cool or interesting sequence of images is already in the 90th percentile of good HN posts.


I think the sequence tends toward the edge at exp(-n), not 2^-n. The distance from the edge is a product of n I.I.D. variables, so the logarithm is a sum of n I.I.D. variables, and the central limit theorem gives us the result.

You can confirm in a python terminal (or anything else) that the product of n `random()`s decreases more rapidly than 2^-n.

So maybe there's some value in it after all :)


>The distance from the edge is a product of n I.I.D. variables

It is not a product of random variables; it is an iterated random variable. The output of one influences those higher in the chain. Redo your python code with rand(rand(rand...))) not rand() * rand() * rand()...

The expectation of composition of functions is not the composition of expectations, so there is some work to do.

For uniform over [0,1] for the innermost item, it becomes an iterated integral, with value (1/2)^n.


I did redo it, and the distributions are identical. It's just a very heavily skewed distribution, so the expected value is not very intuitive. I still think even E[x] decreases faster than 2^-n though. See my other comments.

Edit - Actually, I think E[x] = 2^-n, but showing this numerically for large n is impractical because of just how skewed the distribution is. It is simultaneously true that E[iter_rand(N)] = 2^-N and E[log(iter_rand(N))] = -N. Fun stuff :)


It's not about intuition. I computed it directly from the integral definition for expected value. I posted it elsewhere in the thread. It proves it for all N.

Also, you keep using the new incorrect claim that r() * r() * ... * r() is the same as r(r(...r())..). They are not.


I agree that E[r() * r() * ...] = 2^-n but that's not the final word. E[log(r() * r() * ...)] = -n and Pr(r() * r() * ... > 2^-n) is very small. That's the intuition part.

The two distributions are identical, I checked numerically and they have the same summary statistics. Furthermore, r() * n is a uniform distribution from 0 to n so by induction we can prove the two are identical.


For anyone who cares and read this far (and if so, I'm honestly surprised) I made a colab that demonstrates all this: https://colab.research.google.com/drive/1xCZ24cLPNOqyJD0kUTO...

To wrap it up:

- X == r() * r() * ... == r(r(...))

- E[X] = 2^-n

- median[X] ~ 1/2 exp(1 - n) ?!

Anyway, I thought it was interesting - especially how the median never quite "catches up" to the value one might naively compute from the CLT. I think there's some subtlety I'm missing there.


random() * 5 is equivalent to random(5) (the former being the classic way to implement the latter). So why wouldn't random() * random() be equivalent to random(random())?


Multiplying by a constant commutes with expectation. Multiplying by a random variable does not. It may in some cases, but intuition is not the way to check it.


The claim here is not about expectations. random(X) is identically distributed to random() * X, even if X itself is a random variable, because the randomness in random() is independent from the randomness in X. Indeed, the way you would implement random(X), if you only had random() and X available, would be to compute random() * X.

So in particular, random(random()) is identically distributed to random() * random(). (To be clear, this is different from random()^2, as pointed out elsewhere.)


As a quick sanity check,

  Mathematica 13.2.1 Kernel for Linux ARM (64-bit)
  Copyright 1988-2023 Wolfram Research, Inc.

  In[1]:= DI = Distributed;

  In[2]:= EV = Expectation;

  In[3]:= SS = Subscript;

  In[4]:= UD = UniformDistribution;

  In[5]:= f[1] :=
    EV[x, DI[x, UD[{0, 1}]]]

  In[6]:= f[n_Integer] :=
    EV[x, DI[x, UD[{0, f[n - 1]}]]]

  In[7]:= Block[{n = 42},
   EV[Product[SS[x, k], {k, 1, n}],
     Map[DI[SS[x, #], UD[{0, 1}]] &,
      Range[1, n]]]
    == Product[EV[x, DI[x, UD[{0, 1}]]], {k, 1, n}]
    == f[n]]

  Out[7]= True
where n = 42 ∈ ℕ is arbitrary.

Alas, Mathematica doesn't appear to support actual nested distributions of the form UD[{0, UD[{0, ...}]}], so, e.g.,

  f[2]
  == EV[x_2,
       DI[x_2,
         UD[{0,
           EV[x_1,
             DI[x_1, UD[{0,1}]]]}]]]  
  == EV[x_2,
       DI[x_2,
         UD[{0, 1/2}]]]
  == 1/4
which is trivial.


(As is the expectation of a uniform distribution in the first place. From the PDF for the uniform distribution (1/(b - a) for x ∈ [a,b], zero elsewhere) and the definition of EV as the integral of the product of the identity map and the PDF over the reals,

     EV[x, DI[x, UD[{a,b}]]]
  == Integrate[x * Piecewise[{
         {1/(b - a), a <= x <= b},
         {0, x < a || x > b}}],
       {x, -∞, ∞}]
  == Integrate[x * 1/(b - a), {x, a, b}]
  == ((b^2/2) - (a^2/2)) / (b - a)
  == (a + b)/2
so for a = 0,

  == b/2
no matter how many nested integrals it takes to define the real number b.)


Agreed, in this case they agree, but it's completely a numerical coincidence and not general. So people with a rough feeling that "rand() * 5 works, therefore rand(rand()) = rand() * rand()" should be admonished to check it extremely carefully. And the result is not true at the level of code.

In practice it's not true, unless extreme care is taken to ensure perfect uniformity, which is rare due to nonuniformity of IEEE floats. I've not seen a codebase in the wild that makes all the required pieces work. It's doable, I've written articles on it, but it's costly to do.

Another error in practice is that taking rand() * 5 is not uniform due to numerical representations. So these are not identical in practice.

Thus the best form to analyze, which is trivially analyzable, is the nested form.

It's also why I wrote "may" in the previous comment.


It is true, both in the ideal mathematical sense, and also in the IEEE sense under the totally obvious implementation that most programmers would write without any special care:

    function rand(endpoint = 1) {
        return endpoint * Math.random();
    }
Here rand(rand()) would evaluate to literally the same number as rand() * rand() given the same state of the underlying generator. Someone who follows this chain of reasoning is perfectly correct to consider it intuitively obvious.

And rand() * rand() is a great form for analysis, since it allows us to take advantage of E[XY] = E[X]E[Y] for independent X, Y to compute the mean very quickly.

Clearly you followed a different chain of reasoning, and that’s fine; maybe you have a much more complicated generator in mind that takes more care to ensure ideal rounding at the limits of floating-point precision, and that’s fine too. But let’s not admonish anyone for making correct statements.


>It is true, both in the ideal mathematical sense, and also in the IEEE sense under the totally obvious implementation .....

That implementation is not uniform under IEEE 754. Here [1] is but one of hundreds of papers on the subject. To quote "Drawing a floating-point number uniformly at random from an interval [a, b) is usually performed by a location-scale transformation of some floating-point number drawn uniformly from [0, 1). Due to the weak properties of floating-point arithmetic, such a transformation cannot ensure respect of the bounds, uniformity or spatial equidistributivity.".

This is trivial to understand. Suppose you truly had an IEEE uniform [0,1), in the sense that every possible float representation could occur and any interval [a,b) (for representable a, b) is hit with probability b-a. Then if you multiply, say, by 0.99, by the pigeonhole principle some of the original values coalesce and some do not, so you are no longer uniform: some values are more likely to occur than others. The same happens when you multiply by 5, or 7, or any number, because as the ranges stretch, finite precision means they do not map to ranges of the correct length. The new range [0,5) has a different set of representable floats, and their precision moves in predictable but non-uniform ways, so the result is no longer uniform. So it is not true that rand() * N is uniform for IEEE 754 numbers.

Another simple way to see it: when you multiply by floats > 1, then some of the low bits get lost for some floats, so you cannot begin to hope the results are uniform. You're truncating/rounding, which loses information, so you by necessity only have an approximation.

And note that very few (if any) popular languages even manage to get [0,1) uniform (more details on 15 languages are in [1]). Their conclusion, Table II, to the question of whether the language rng provides uniformity (and spatial equidistributivity, needed to scale as the above), is no for all 15 languages. And this is because most do what you claim works. Even getting the [0,1) uniform as a start is extremely tricky and nuanced. The usual "uniform random int in [0,2^p-1] then divide by 2^p (or 2^p-1)" fails to be uniform most of the time and (as the paper shows) for pretty much every language implementation out there.

This issue is important for places needing super careful accuracy like physics sims, weather sims, many monte carlo algorithms where you don't want bad numerics screwing with your results, and in all of those places, they use custom, properly written methods to create such numbers. The general idea is that for getting a uniform in [a,b) you need to sample enough bits for the range, then carefully bit construct the floating point value - it cannot be transformed (easily) from smaller or larger ranges.

Table V lists various methods to draw floats in [a,b). Your method is listed (as I claimed above) as not being uniform (in fact, it fails all 5 criteria in the table, the only method of the 7 listed to score so poorly).

Here's [2] a paper showing that the common method of using a PRNG and then dividing by some number to get "uniform" in [0,1) is not actually uniform, which is well known. So most implementations don't even start with uniform. This is a well known problem in random number generation. And the rand(rand()) == rand() * rand() argument relies on uniform distributions.

So please stop claiming things until you have checked them carefully. Fuzzy feelings about what you guess is true is no substitute for actually doing the work to check, and doubling down when someone points you on the right path is bizarre to me.

[1] https://hal.science/hal-03282794v4/document

[2] https://hal.science/hal-02427338/document


I’m fully aware of how IEEE 754 works, and that there is—to quote my own comment—“a much more complicated generator that takes more care to ensure ideal rounding at the limits of floating-point precision”. If you’re looking for somebody clueless to argue at, you’ll need to go find someone else.

But these details aren’t relevant to this conversation:

• We’re not simulating the weather, we’re drawing dots in a circle—and anyone who cares about the dots being shifted from true uniformity by tiny fractions of the radius of an electron is going to have objections to any conceivable computerized representation of the numbers involved.

• It’s quite reasonable to implicitly treat this discussion as a question within the domain of true mathematical reals, and quite unreasonable to chastise someone as incorrect and careless for doing so.

• It’s still false that “rand(rand()) == rand() * rand() relies on uniform distributions”. This identity between distributions holds with the naïve multiplicative implementation of rand() and any underlying random generator, even a blatantly skewed one, in either the mathematical or IEEE reals, exactly. (And with a more precise IEEE generator, it will hold in as close of an approximate sense as one can expect any mathematical identity to hold when translated to IEEE, which from my perspective is just fine in this context. My only real interest in even acknowledging IEEE here is to support the point that commenters above were correct to consider the identity intuitively obvious, even when you try to throw in that well-actually.)


> We’re not simulating the weather, we’re drawing dots in a circle

Ah, so we don't care about mathematical rigor, yet we're discussing a proof of the pure mathematical behavior?

> It’s quite reasonable to implicitly treat this discussion as a question within the domain of true mathematical reals,

So we do care about mathematical rigor?

> It’s still false that “rand(rand()) == rand() * rand() relies on uniform distributions”.

Consider a distribution A(r) uniform on [0,r) except it never returns 1/4. This is a perfectly valid distribution, extremely close to being the same (in the mathematical sense) as the uniform distribution itself.

A(A(1)) will never be 1/4. A(1) * A(1) may be 1/4. Thus the claim needing uniform (or a proof for whatever distribution you want) requires careful checking. And if you don't like this distribution, you can fiddle with ever larger sets (in cardinality, then lebesgue measure zero) where you tweak things, and you'll soon discover that without uniform, you cannot make the simplification. And I demonstrated that the actual PRNGs are not uniform, so much care is needed to analyze. (They're not even continuous domain or range)

Yes, if you define the rng circularly, then you can make the assumption, but then you cannot make the claim of 1/2 per term, since they are not uniform. If you define the rng more carefully, then they often don't commute (maybe never?).

For example, defining the rng in another reasonable (yet still naive) fashion to keep more bits:

    float rnd(float max)
        i = int_rand(N) // 0 to N-1, try weird N like N = 12345678
        f = i*max
        return f/N
this does not satisfy rnd(rnd()) == rnd()*rnd() in general, as you can check easily in code.

Thus, to make the claim you can swap them requires for the mathematical part true uniformity (my claim way up above) and for the algorithm sense it requires knowing details about the underlying algorithm. Simply having a rnd(max) blackbox algorithm is not enough.


Of course we care about mathematical rigor.

> Yes, if you define the rng circularly, then you can make the assumption,

We’re agreed then.

> but then you cannot make the claim of 1/2 per term, since they are not uniform.

As mathematicians, we’re allowed to make multi-step arguments. The first step rand(rand()) = rand() * rand() holds for any distribution that scales multiplicatively with the argument, as we’ve agreed. The second step E[rand() * rand()] = E[rand()] * E[rand()] holds for any two independent distributions with finite expectation. The third step E[rand()] = 1/2 holds for this particular distribution. An ideal mathematical rand() satisfies all three of these properties, so this argument that E[rand(rand())] = 1/4 is valid in the mathematical reals. (And it’s approximately valid in the IEEE reals, which is all one typically asks of the IEEE reals.)


How do you mean?

I'm going based on the propagation of expectations through the system.

The expectation of a uniform distribution is half the bound. Unless there's some math shenanigans going on, I believe that the expected expected value of a distribution drawn from distributions should be 1/2 * 1/2 * 1 in this case.

Of course it's not a bound but right now I'm having a hard time determining otherwise.

Is there a mathematical motivation for e^-n here? That seems an odd choice since the uniform distribution is 0-1 bounded generally. However I could understand if it emerges in a few conditions.


e shows up because we're doing an iterated product (of random variables, but still). By the law of large numbers, sum(log(rand())) tends to N * E[log(rand())] for large N (the central limit theorem then describes the spread around that). Well, what's E[log(rand())]? It's -1.

Like I said, it's fairly easy to test this in a python terminal. I encourage anyone who doubts me to try it :)


Alright, I tried it in the terminal and the numbers confirmed my earlier math: the mean goes .5 -> .25 -> .125 as N increases. I chained the python uniform function like the code in the originally linked post does and averaged the results.

The snark to me and the other commenters is a bit unnecessary in either case. I'm not really sure where you're getting a natural log or an iterated product of random variables in this instance.

Could you perhaps show where you're transforming uniform(uniform(uniform(uniform(0, 1)))) into the math you're showing above? I'm trying to follow along here but am having difficulty connecting your line of reasoning to the problem at hand.


Not trying to be snarky by any means. I'm sorry if you interpreted it that way.

I don't think the difference will show up for small N. This is an asymptotes thing. Try it for N = 100, that's what I did. For example:

  >>> np.product(np.random.random(100))
  5.469939152265385e-43

  >>> 1 / np.e**100
  3.7200759760208555e-44

  >>> 1 / 2**100
  7.888609052210118e-31

The underlying thing here is that random(random()) in this case is the same as random() * random(). So random(random(random(...))) is the same as random() * random() * random() and then the analysis goes on. And sure, random() * random() has a mean close to 1/4. But the dynamics change as N becomes large.

Edit - and just in case you doubt whether random() * random() * ... is a valid way of doing this, I also just checked the laborious way of doing it:

  >>> def foo(n):
  ...   result = 1.0
  ...   for _ in range(n):
  ...     result = np.random.uniform(0, result)
  ...   return result
  ...
  >>> foo(100)
  1.4267531652344414e-46
  >>> foo(100)
  7.852496730908136e-49
  >>> foo(100)
  1.3216070221780724e-41


> I'm sorry if you interpreted it that way.

is probably a good marker that it is time for me take my bow in this conversation, however, for an alternative approach I recommend SideQuark's comment on iterative random variables vs chains of multiplied random variables, which have different characteristics when defined as a sequence.


I checked his comment, I think he's incorrect on the approaches being different. Using my function `foo` that does the iterative approach, we can compare the distributions and they're fairly identical.

  >>> np.mean([foo(100) for _ in range(100000)])
  3.258425093913613e-33

  >>> np.mean([np.product(np.random.random(100)) for _ in range(100000)])
  8.814732867008917e-33
(There's quite a bit of variance on a log scale, so 3 vs 8 is not a huge difference. I re-ran this and got varying numbers with both approaches. But the iterated code is very slow...)

Note that the mean is actually quite a bit closer to 2^-100 even though the vast majority of numbers from either distribution fall below 2^-100. Even so, the mean for both is approximately a factor of 100 less than 2^-100. Suspicious! Although I think we've both burned enough time on this.


Yeah, the problem with your sampling experiment (as your suspicious observation makes it sound like you’ve started to realize) is that this distribution has a very narrow tail to 1 that you won’t hit by accident but nonetheless contributes very significantly to the computation of the mean.

You correctly computed that the mean of log(foo(n)) is exactly −n, and the median of foo(n) indeed shrinks like e^−n (more specifically like e^(1/3 − n), I think), but the mean of foo(n) is exactly 2^−n.


If this were stackoverflow, I'd mark this as an answer.

Also answers my question up above a bit as well, just from a different approach.

Many thanks for clarifying here and elsewhere! :)

HN is a lovely, curious, and interesting place.


Except that your logarithms can have any consistent base, including 2.

So I don't think using a logarithm to introduce a special dependence on e is warranted in this case.


Well, E[log2(rand())] is not the same as E[log(rand())], and therein lies the difference. See my sibling comment.

Like I said, you can check this in a python terminal.


So, what PRNG does Random() use? LCG (linear congruential)? LFG (lagged Fibonacci)? MT (Mersenne Twister)? In each case it's a linear function of some global variable, not any kind of random number generator. Treating it as if it's a random number generator is silly. It's a pseudo-random number generator, pure algebra.


This is a drawing app using P5.js

Click the "</>" at the top to read the code.

There's a popular "Let's Code" YouTube channel called The Coding Train that uses it. It's quite accessible.


The documentation suggests random() without arguments is not supported

    random(high)
    random(low, high)
https://processing.org/reference/random_.html


Oooh it’s a high value, I thought it was a seed and that this was demonstrating a flaw in the algorithm.


Same, at first at least.


Tips: the source code can be seen by clicking on '</>'. The argument to random() is the upper limit of the desired uniform distribution (from 0 to x); this shows the effect those iterated random functions have on the distribution.

I think this isn't so intuitive because the polar coordinates already imply greater density near center (which could be corrected with a suitable map).


The reason it's counterintuitive is because the randomly chosen distance variable is 1 minus the nested randoms. Each random call becomes the upper bound of future random calls, so the more times you nest it, the closer to zero you're likely to be. Subtracting that from one puts you close to the unit circle.


right, if the points are distributed in a square then the distribution looks more like what one would expect:

(I'm too lazy to register to make a fork, so here is the code change):

replace:

  let a = random(Math.PI*2);
  let r = d * (1 - random());
  point(cos(a)*r, sin(a)*r);
with:

  let x = random(d);
  let y = d * (1 - random());
  point(x, y);
in every block

you could even simplify it to:

  let x = random(d);
  let y = random(d);
  point(x, y);
and for more fun apply the nested random() call to both axis:

  let x = random(random(d));
  let y = random(random(d));
  point(x, y);


Yup, that's way more intuitive, and removes the artifact of the first circle looking like the points are clustered around the middle:

I forked it: https://openprocessing.org/sketch/1929579


I forked it with similar idea: https://openprocessing.org/sketch/1929687


heh, neat. next up a 7x7 grid with the even distribution at the center :-)



Just whipped this up in Linqpad real quick to see how quickly random(random(...)) tends to converge to 0.

Fun lil script.

https://media.discordapp.net/attachments/519790546601115649/...

EDIT: Realized I cut off the full script. First two lines are:

  var ran = new Random();
  var counts = new Dictionary<int,int>();


In a related vein, there's a Numberphile / 3B1B video about the pitfalls of plotting randomness in a circle:

https://www.youtube.com/watch?v=mZBwsm6B280


It's almost criminal to post things like that without any annotations.

The first picture plots points where radius and angle (in polar coordinates) are uniformly distributed. Of course, this is not uniform in the disc.

I don't have the time right now, but if I do later, I'll type up the math behind what a uniform distribution in the disc looks like, and what's going on in other pictures.


the bigger crime is that this is plotting

    1-Random(Random(Random(Random())))
and not

    Random(Random(Random(Random())))


In Warcraft we call this deathrolling, and it’s a fun way to gamble away all your gold.


can you elaborate on that please? how do you apply this topic to a game?


In World of Warcraft, there are text-based commands that can be executed in the chat interface, similar to command line tools in a computer terminal. The "/played" command, for example, will give you the total time spent playing on a specific character.

One such command is "/roll". When this command is followed by a number (n), it generates a random number between 1 and n. This is akin to a digital dice roll, with the number specifying the number of sides on the dice.

For instance, you might decide to start a game with another player where you both start with a number of 500 and wager 100 gold. You would initiate the game by typing "/roll 500" into the chat, which would then generate a random number between 1 and 500. Let's imagine it gives you 137.

The other player then uses this number as the argument for their roll. So they would type "/roll 137". This process continues with each player using the result of the previous roll as the new argument for the "/roll" command.

The game ends when a player rolls a 1, and that player is deemed the loser.
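For the curious, here's a rough simulation of a deathroll (a minimal Python sketch; the starting value of 500 is just the example above). The collapse of the upper bound is the same nesting effect as in the submission:

  import random

  def deathroll(start=500):
      # each /roll uses the previous result as its upper bound;
      # the game ends when someone rolls a 1
      bound, rolls = start, 0
      while bound > 1:
          bound = random.randint(1, bound)
          rolls += 1
      return rolls

  games = 10_000
  print(sum(deathroll() for _ in range(games)) / games)   # average rolls per game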


[flagged]


Can we please ban this kind of comment on HN? Anyone here who is curious could have used chatgpt for themselves. This doesn't really contribute anything. Since the OP can be assumed to not know what he's talking about (since they had to use chatgpt to explain) we can assume that they also could not fact check chatgpt's answer.


I disagree. The "ChatGPT: " signifies the potential for inaccuracy and I would have appreciated someone answering so I don't have to. I wonder how many repeat lookups happen because of a perceived societal issue with providing AI answers that leads to groups shadow banning them. Imagine if I didn't put "ChatGPT :" how many people here would have upvoted me instead of flagging me. Interesting lessons all around.


I will say, I appreciated the preface.


Love that openprocessing is being featured here! (I checked and this is the first big hit on HN). It is a great open source coding site with great coding artists, who share their creations openly. My personal favorite is Okazu, see an example here: https://openprocessing.org/sketch/1653811


Sorry to be blunt, but what is of interest here?

Aesthetics?

P.S. In case anyone is confused, the arg to random is the upper limit (not the seed), and r = diameter * (1 - value)


I find it cool that this website lets you upload code for visualizations, and then lets any user modify that code and re-run it. It's like if Gist let you render images from code, rather than from just SVG.


Yeah, this was a very fun illustrative view of why this is a bad idea.


Sorry, an illustration of why letting you modify and run code is a bad idea? How does this show it's a bad idea?


I think he means why it's a bad idea to chain random() calls.


Yeah the code under the </> tab is very important, on first blush it looks like random isn't a uniform distribution for some reason. It's just because they plotted it with polar coordinates. It would be much clearer what's actually going on here if an XY plot was used.


No, it's because random(random(random())) is iterating the upper limit, not the seed.


This is the answer. I think the author is confused, as most platforms do have the seed as the lone parameter to their Random function.


It's still the case that the circular plot is pretty but obfuscating.


It is uniform, I'm pretty sure; that's what plotting in polar coordinates looks like when you use a uniform distribution for the length.


Isn't that what I said?


Oh I misunderstood, my bad. Upon rereading I see what you meant.


Most things are interesting.


From what I understand (and creatively assume) the concentration in the middle (on the top left panel) is because of mapping random (uniform distribution) in Cartesian space to polar space.

Not sure I can explain the effect on the other panels since the first iteration is already based on assumptions.


Thanks I was curious about the concentration in the middle!


It's a funny one.

At first look it seems that random is broken; then you realize, no, random(n) is a random value in [0, n), so it's behaving exactly right.

It also shows really nicely how "naively" choosing random points in a non-rectangular shape can easily lead to a non-uniform distribution.


> Provides interestingly titled code example "random(random(...."

> Accidentally(?) shows nonuniform-looking distribution of points from uniform random() due to polar coordinate conversions

> Shows effects of nesting random calls in a way reminiscent of security papers showing design flaws, but it's not really showing that

> Refuses to elaborate further

The more I think about this, the better it is. The imperfection is what makes this little bit of code so interesting to us nerds, while even trolling some people, but we have no idea what the intent was. It's great.


It's still a uniform distribution of points, it's being warped by converting to a non uniform space and back to a uniform space.


It's not a uniform distribution of points after the warping. I'm pretty sure every bounded distribution over a compact space is uniform under some warping.


Fork wrapping the random radius in sqrt() so the first disk is uniformly distributed

https://openprocessing.org/sketch/1929418


This would've been a lot less confusing if they'd used

let radius = d * random(random(random()))

instead of

let radius = d * (1 - random(random(random())))

But, if so, probably so much less confusing that folks wouldn't be talking about it here, to be honest.


Kinda cool. A little feedback:

- random(x) made me think x was the seed. Maybe call it random(0, x) to make it clear to most that you mean min 0 max x.

- The random() probably looks nonuniform to most people. Looking at the code, it's a random angle and length in polar coordinates, with the angle not getting any additional random() wrapping in the other examples. XY coordinates would be clearer.


> `random(x)` made me think x was the seed.

Fair, but that would be bad design in my opinion. In general you want the random() function to be usable multiple times to get the same sequence after setting the seed once, and not have to specify the seed every time.


Yeah, neither assumption makes much sense. Seeding should be a separate func, and range random should take two args. Just for some reason seed is the first thing that comes to mind.

But now I see this page has editable code and whatever language it's in does have random(x) taking one arg. So fair enough.


It’s in Processing. Its random function can take one arg (the max) or two args (the range). So what you suggested would be possible.


Yep, just about what one would expect from randomly cutting the top of your uniform distribution off multiple times!

What's interesting is the transforms used to sample in some new space from e.g. uniform random inputs. For instance, disks [1] or hemispheres [2] or an arbitrary density function [3]. Common stuff in light transport simulation, and easy to get wrong.

[1]: https://stats.stackexchange.com/questions/481543/generating-...

[2]: https://alexanderameye.github.io/notes/sampling-the-hemisphe...

[3]: https://en.wikipedia.org/wiki/Inverse_transform_sampling
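In case [3] is unfamiliar, the idea in one line: if U is uniform on [0,1) and F is an invertible CDF, then F^{-1}(U) is distributed according to F. A rough sketch (Python; the exponential is just a convenient example with a closed-form inverse CDF):

  import numpy as np

  u = np.random.random(100_000)
  x = -np.log(1 - u)               # inverse CDF of Exponential(1): F(x) = 1 - exp(-x)

  print(np.mean(x))                # ~1, the mean of Exponential(1)
  print(np.mean(x < np.log(2)))    # ~0.5, since the median is ln 2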


This would be way clearer as a Cartesian plot with the output of random as X, and sample number as Y.


I think there is a link with the curse of dimensionality here.

As you add more randoms, most of the points lie close to the surface.

This is also the case in high-dimensional spheres: most of the volume lies near the surface of the hypersphere.


This was my immediate thought as well! Can someone more mathematical say if this is an unreasonable thought?


As you add more dimensions, it becomes increasingly likely that at least one of them takes an "extreme" value (i.e. the point is at the edge of the hypercube).

In the random() * random() * random() ... case, once you draw at least one number close to 0 (and the chances of that increase as you add more randoms), your output collapses to a very small number. And in this visualization, 0 is shown as a point at the edge of the circle.


It depends upon what metric you use to measure it.

Of course with Euclidean it will as that seems to match our expectations for low dimensional space.

I'm not a mathematician so there is not much more that I could say on the matter at this juncture.


The volume of an n-ball has been calculated for other non-Euclidean distances.

As p goes to infinity, the unit "ball" becomes a "cube" two units wide per side.

https://en.wikipedia.org/wiki/Volume_of_an_n-ball#Balls_in_L...

https://www.johndcook.com/blog/2010/07/02/volumes-of-general...


Many thanks, I've been wondering about this one for quite a while. <3 :))))


I like using Math.min(nextInt(), nextInt()) for non-uniformly distributed random data.


Wouldn't a square be less confusing than a circle for a uniform distribution?


Where "random" is defined as:

    let a = random(Math.PI*2);
    let r = d * (1 - random());
    point(cos(a)*r, sin(a)*r);
(click "</>" in the submission to view source).


Random is always interesting.

Here is an example I made to use logistic map for music note generation:

https://glicol.org/demo#chaos
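For anyone who wants the flavor without opening the demo, here's a rough sketch of the idea (Python; the r value, scale, and mapping to MIDI notes are my own illustrative choices, not necessarily what the glicol demo does):

  def logistic_notes(r=3.9, x=0.5, count=16, scale=(0, 2, 4, 7, 9)):
      # iterate the logistic map x -> r*x*(1-x) and quantize each value
      # to a pentatonic scale degree around middle C (MIDI 60)
      notes = []
      for _ in range(count):
          x = r * x * (1 - x)
          notes.append(60 + scale[int(x * len(scale))])
      return notes

  print(logistic_notes())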


I see nothing but a light blue screen in Android Firefox.

Can someone describe what this is supposed to do?


I was hoping it was the seed! That it is the max is less interesting


I'm intrigued by it


random(random()) seems to have the most even distribution. Idk why?


It’s because it doesn’t really make sense to plot the results of random() in a circle like this. There’s the same number of points with radius between 0 and 1/2 as between 1/2 and 1, but they have to fit in a space with 1/3 the area as the outer band. So it’s naturally more dense.

The fact that random(random()) /looks/ more even in this visualization shows that the actual distribution is not.


It's an illusion created by the polar plot. The most even distribution is in the first image. In a polar plot, two points that are e.g. 5 degrees apart appear closer together near the center than they do at the edge of the circle, so the center looks denser even though the angular distance between the points is the same.


Wish I could fork on this site without creating an account


just in terms of pure randomness (which is not what this is testing), random(random()) doesn't make more sense than random(). If your random number generator is good, one is enough. If it's not good, multiple times is not going to help.

I'm putting this in the past tense, it randumb enough times for me.


The argument to random() in this case is the upper bound, so `random()` and `random(random())` are different.

See: https://processing.org/reference/random_.html



