
Random with care - dhotson
https://eev.ee/blog/2018/01/02/random-with-care/
======
saosebastiao
Great article, lots of things I've ranted about in the past, and lots of
things I've never considered.

An apocryphal story from a former finance professor who taught me about MCMC:
One of his former colleagues was working for a hedge fund in commodity futures
markets, and he had developed a Monte Carlo model for trading in a very
specific market that was working exceptionally well. Then one day it stopped
working so well: returns were cut to half of their original level. It wasn't a
gradual decline; rather, one day it was at one level and overnight that
changed. He eventually got poached by a competing hedge fund working in the
same exact markets, and he found out that the reason his returns declined was
because one of the mathematicians at his new company had almost perfectly
reverse engineered his model by guessing his RNG seed, conveniently a number
in the name of his former hedge fund.

~~~
gfody
I once swapped out a counter-based distributed routing system where the
counters were not synchronizing quickly enough between nodes, with an RNG-
based one. It worked well for months until one fateful weekend when the client
called in a panic: the distribution was going 100% one way, causing a terrible
catastrophe for them. It turned out the system was working properly, but the
RNG had a very long, extremely unlikely yet technically possible run of low
numbers. To try to recover, they had changed the weights from 20% to 80% and
finally 99%, to no avail; the RNG was dead set on fucking shit up that day and
the day ended in disaster.

Moral of the story: behind every RNG Cthulhu waits, sleeping.

~~~
munificent
[http://www.catb.org/jargon/html/R/Random-Number-God.html](http://www.catb.org/jargon/html/R/Random-Number-God.html)

~~~
AceJohnny2
More recently, in the Hearthstone online card game community: "RNJesus"

~~~
slazaro
More generally, this term is used in basically every game that has a big RNG
component (lots of rogue-likes, for instance).

------
x1798DE
In the example about picking a random 3-d vector, it seems like the "draw
components from a Gaussian distribution" method is the most common, but I
don't really understand why you can't just pick two angles ([0, pi) and [0,
2pi) respectively) from a uniform distribution and interpret those as
spherical coordinates on a unit sphere.

Given that the "draw from a Gaussian and normalize" seems like the hard way to
do it and also is the only one anyone is suggesting, I assume I'm missing
something. Anyone know what the problem is?

~~~
adament
That is actually a really nice question. Essentially it boils down to the same
trouble the article describes with drawing each coordinate between -1 and 1
and normalizing: you get certain preferred directions rather than the uniform
distribution on the sphere. In your case, each of the circles described by
your first angle is equally probable. But when you look at a sphere and trace
them out, they have different lengths, hence they should have different
probabilities. The most extreme case is that the pole (a single direction) has
just as high a probability as the entire equator combined.

For completeness, here is one way of seeing why normalizing n independent
Gaussian draws works: the multivariate normal distribution in n dimensions
with independent, zero-mean, equal-variance coordinates is rotationally
symmetric. One can visualize this in 2 dimensions by plotting the probability
density and seeing that it is invariant under rotations of the plane. This
generalizes to higher dimensions, and it is exactly the property that makes
every direction equally likely.
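
A minimal Python sketch of the difference, for anyone who wants to see the bias rather than take it on faith. The function names and the choice of "cap" (z > 0.9, around the north pole) are illustrative assumptions, not anything from the article.

```python
import math
import random


def random_unit_vector_gaussian(rng):
    """Draw three independent standard normals and normalize the result."""
    while True:
        x, y, z = (rng.gauss(0.0, 1.0) for _ in range(3))
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 0.0:  # guard against the (vanishingly unlikely) zero vector
            return (x / norm, y / norm, z / norm)


def random_unit_vector_naive_angles(rng):
    """Biased: uniform polar angle in [0, pi] and azimuth in [0, 2*pi]."""
    theta = rng.uniform(0.0, math.pi)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    return (
        math.sin(theta) * math.cos(phi),
        math.sin(theta) * math.sin(phi),
        math.cos(theta),
    )


def polar_cap_fraction(sampler, rng, n=100_000, z_min=0.9):
    """Fraction of samples landing in the cap z > z_min around the north pole."""
    hits = sum(1 for _ in range(n) if sampler(rng)[2] > z_min)
    return hits / n


if __name__ == "__main__":
    rng = random.Random(2018)
    print("gaussian:", polar_cap_fraction(random_unit_vector_gaussian, rng))
    print("naive angles:", polar_cap_fraction(random_unit_vector_naive_angles, rng))
```

For a uniform direction the z component is itself uniform on [-1, 1], so about 5% of samples should land in the cap z > 0.9. The naive angle method puts roughly arccos(0.9)/pi, about 14%, of its samples there.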

~~~
tzs
Suppose you are in a spaceship, and you want to pick a random direction to
travel. You want to do so with at most one pitch change, one roll change, and
one yaw change.

One could of course do this by picking the direction first, and then
calculating the necessary pitch/roll/yaw changes to point that way, but I
wonder if there is a good way to do it without selecting the direction first.
In other words, can you do it just by using your random numbers to pick pitch,
roll, and yaw changes, assuming you only have the common random number
generator distributions available?

First thing that comes to mind would be to do a random pitch change in [-pi,
pi) then a random yaw change in [-pi, pi), but I think that is still going to
be non-uniform. It doesn't have the same problem as the lat/long approach
(because both pitch and yaw move on great circles, so you don't have anything
like the variable length latitude line problem), but it still favors some
points.

Including a random roll step in there changes it from two favored points to a
favored great circle (I think...visualizing this is hurting my brain), but I
suspect that no finite sequence of pitch, roll, yaw random steps determined by
independent random numbers with distributions that are not changing based on
prior selections can erase biases that stem from the fact that this approach
is starting with the ship pointing at a particular point and having a
particular initial orientation.

~~~
rzzzt
"Hot spots" of directions appear because the more your spacecraft pitches "up"
or "down", the less contribution yaw changes make on the final result. At the
extremes, the entire range of yaw values is effectively wasted on rolling:
[https://en.wikipedia.org/wiki/Gimbal_lock#In_three_dimension...](https://en.wikipedia.org/wiki/Gimbal_lock#In_three_dimensions)

gattr suggests in another comment to look for clues around global illumination
methods on how to solve this problem; his reference has the equations for
picking polar coordinates from two random numbers generated from a [0..1]
uniform distribution in section IV. B.
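
For reference, the usual construction from two uniform numbers looks roughly like the sketch below. It is the standard inverse-CDF mapping (uniform height, uniform azimuth); treating it as equivalent to the formula in the cited section is an assumption, and the function names are made up for illustration.

```python
import math
import random


def unit_vector_from_uniforms(u1, u2):
    """Map two numbers in [0, 1) to a point on the unit sphere.

    z = 1 - 2*u1 is uniform on (-1, 1], which weights each latitude band by
    its area; phi = 2*pi*u2 is a uniform azimuth.
    """
    z = 1.0 - 2.0 * u1
    r = math.sqrt(max(0.0, 1.0 - z * z))  # radius of the circle at height z
    phi = 2.0 * math.pi * u2
    return (r * math.cos(phi), r * math.sin(phi), z)


def random_unit_vector(rng=random):
    """Draw a direction using two calls to a [0, 1) uniform generator."""
    return unit_vector_from_uniforms(rng.random(), rng.random())
```

The key difference from picking the polar angle uniformly is that here the height z is uniform, not the angle, so the thin circles near the poles get proportionally fewer samples.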

------
JadeNB
As with cryptography, it's probably dangerous to "roll your own random" _if_
the randomness matters. The author already mentions both the point about
cryptography, and the dangers of plausibility arguments about randomness: an
intuitively plausible way of picking a random point from a sphere doesn't give
a uniform distribution.

Playing around with distributions as the author does is surely fine if you
just want to get something that "feels right", but if the applications depend
on precise randomness properties, then, for example, "let's just multiply
these two PDFs" is dangerous (not least because the result is almost
guaranteed not to be a PDF, and may not even be normaliseable to one).

Although it surely won't be applied for random f (no pun intended), the
transformation from P(f(u) ≤ x) to P(u ≤ f^(-1)(x)) relies very much on f
being (strictly, in order to have an inverse) increasing—so it's even
dangerous to use f(x) = x^2 if we don't know that u is non-negative-valued.
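
A small Python sketch of that last point, assuming u is uniform on [-1, 1) and f(u) = u^2. The helper names are hypothetical; the "naive" formula is what you get by pretending f is increasing everywhere.

```python
import random


def empirical_cdf_of_square(x, n=200_000, seed=0):
    """Estimate P(u**2 <= x) for u uniform on [-1, 1) by simulation."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if rng.uniform(-1.0, 1.0) ** 2 <= x)
    return hits / n


def naive_cdf(x):
    """Wrongly treats f(u) = u**2 as increasing: P(u <= sqrt(x))."""
    return (x ** 0.5 + 1.0) / 2.0


def correct_cdf(x):
    """Accounts for both branches: P(-sqrt(x) <= u <= sqrt(x))."""
    return x ** 0.5


if __name__ == "__main__":
    x = 0.25
    print("simulated:", empirical_cdf_of_square(x))
    print("naive:", naive_cdf(x), "correct:", correct_cdf(x))
```

At x = 0.25 the naive inversion gives 0.75 while the correct value is 0.5, and the simulation should land near the latter.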

~~~
mjevans
There's a difference between using a library (hopefully) maintained by those
more expert in a field and in using the output it provides.

Using the output of an RNG still requires some skill, but that's about designing
a "fair" set of rules, not about finding a better way of obtaining truly random,
random enough, or plausibly pseudo-random numbers from a given system.

As the article discussed, there are times when a designer might want to add in
bias to make the response of the system fit within desired constraints.

One such example might be a load balancing daemon removing targets from a
distribution entirely if they are either unresponsive or under sufficiently
greater load (relative to other targets).

------
jhallenworld
It reminds me of this nice link on how to generate Gaussian values from
uniform ones, using the Box-Muller transformation:

[http://www.design.caltech.edu/erik/Misc/Gaussian.html](http://www.design.caltech.edu/erik/Misc/Gaussian.html)
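
For a quick feel of what the transformation does, here is a minimal Python sketch of the basic (trigonometric) form. It is not taken from the linked page, and the function names are illustrative assumptions.

```python
import math
import random


def box_muller(u1, u2):
    """Turn two independent uniforms (u1 in (0, 1], u2 in [0, 1)) into two
    independent standard normal values."""
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


def gaussian_pair(rng=random):
    """Draw one pair of standard normal values from a uniform generator."""
    # 1 - random() lies in (0, 1], which keeps log() away from zero.
    return box_muller(1.0 - rng.random(), rng.random())
```

In practice `random.gauss` or a numerical library would normally be used instead; the sketch is only meant to show how two uniform draws become a radius and an angle.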

------
dredmorbius
I discovered through experimentation recently that srand() in GNU awk (gawk)
takes only signed 32-bit values.

A loop of 10 million iterations of straight rand() output produces unique
values only about 2% of the time -- the other 98% of values are repeated
throughout the sequence. (This may be due to time-of-day as seed.)

    
    
        gawk 'BEGIN {srand(); for (i=0;i<10000000;i++) printf("%s\n", rand())}' | sort | uniq -c | wc -l
    

The srand() feature appears to take in signed 32-bit values only -- that is,
-2147483648 to 2147483647. If you require more than about 4.3 billion distinct
sequences, this might be something to keep in mind.

This information may be well documented, though I find it in neither the gawk
manpage (yes, I'm aware FSF deprecates manpages, an idiotic move), nor the
online gawk manual, linked below.

Again -- if you're just playing around, this may not hurt you, but if you're
fond of gawk and think you can develop high-strength crypto or security code
using it, you're going to need to go beyond the built-ins at the very least.

Earlier:
[https://plus.google.com/104092656004159577193/posts/exhAxhd4...](https://plus.google.com/104092656004159577193/posts/exhAxhd4v2n)

------
adwhit
> [Tetris] simply shuffles a list of all 7 pieces, gives those to you in
> shuffled order, then shuffles them again to make a new list once it’s
> exhausted.

Interesting tidbit! So all the times I've furiously cursed at my Game Boy
because I swear the 'I' is by far the rarest tetrimino and it hasn't given me
one for at least 20 turns... was just classic cognitive bias.
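
The shuffle-bag scheme in the quote can be sketched in a few lines of Python. This is a minimal illustration of the idea as quoted, not the code of any particular Tetris version; the piece letters and the function name are assumptions.

```python
import random

PIECES = "IJLOSTZ"


def bag_randomizer(rng=random):
    """Yield pieces forever, one freshly shuffled bag of all seven at a time."""
    while True:
        bag = list(PIECES)
        rng.shuffle(bag)
        yield from bag


if __name__ == "__main__":
    pieces = bag_randomizer(random.Random(1))
    print("".join(next(pieces) for _ in range(21)))  # three bags' worth
```

With this scheme the longest possible drought is 12 other pieces between two 'I' pieces (first in one bag, last in the next), so a gap of 20 could not happen in a game that uses it.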

~~~
joshuamorton
This is, I believe, implementation specific. Modern, competitive Tetris games
do this, but not all versions do, and I'm not sure it was true
historically/originally.

------
0xdeadbeefbabe
> If your random number generator has fewer than 226 bits of state, it can’t
> even generate every possible shuffling of a deck of cards!

Anyone know why?

~~~
roberto
2^225 < 52!
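
Spelled out: a generator with n bits of state can be in at most 2^n distinct states, so a shuffle driven from a single seed can produce at most 2^n distinct orderings. A quick exact check in Python (variable names are illustrative):

```python
import math

# Number of distinct orderings of a 52-card deck.
deck_orderings = math.factorial(52)

# Smallest n with 2**n >= 52!, i.e. the minimum number of state bits needed.
bits_needed = (deck_orderings - 1).bit_length()

print(2**225 < deck_orderings < 2**226)  # True
print(bits_needed)                        # 226
print(math.log2(deck_orderings))          # roughly 225.58
```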

~~~
reificator
Sounds fairly intuitive but just to be sure I ran the math (by which I mean I
typed them both into Google)

2^255 = 5.7896045e+76

52! = 8.0658175e+67

EDIT: I misread 2^225 as 2^255, thanks for the heads up from rzzzt.

2^225 = 5.3919893e+67, which is indeed smaller.

~~~
rzzzt
You entered that middle digit slightly differently:

2^225 = 5.3919893e+67

2^226 = 1.0783979e+68

~~~
reificator
Whoops, thanks for catching that. Powers of two catch my eye.

------
gattr
As for choosing random 3D directions with various distributions (and more),
the Global Illumination Compendium [1] has a lot of useful formulas.

[1]
[https://people.cs.kuleuven.be/~philip.dutre/GI/](https://people.cs.kuleuven.be/~philip.dutre/GI/)

------
gmiller123456
"Random numbers should not be generated with a method chosen at random" --
Donald Knuth

~~~
nimish
"Random Number Generation Is Too Important to Be Left to Chance" -- Robert
Coveyou

~~~
FabHK

       int getRandomNumber()
       {
          return 4; // chosen by fair dice roll.
                    // guaranteed to be random.
       }
    

-- xkcd ( [https://www.xkcd.com/221/](https://www.xkcd.com/221/) )

------
venuur
The Just Simulate It principle is super practical. I once sat through a talk
by a principal engineer whose talk was based on that premise.

