
Why is quality of pseudorandom number generators important? - DrinkWater
http://superuser.com/questions/712551/why-are-people-so-bothered-about-truly-random-numbers-instead-of-ones-generated
======
chriswarbo
When I see the word "random" I parse it as "unpredictable", since this is
usually what people mean (even if they don't realise it), and the substitution
can clear up many arguments. The concept of 'random' is interesting in some
cases, but the vast majority of the time it's _far_ stronger than what is
required.

In this case, the question is:

> If a number generator is uniformly distributed, why might that not be
> 'random enough'?

If we rephrase this as:

> If a number generator is uniformly distributed, why might that be 'too
> predictable'?

This makes the flaw obvious: there are many ways to choose numbers uniformly
which are completely predictable. For example, choose the smallest value
first, then choose a value which is furthest from any previously-chosen value
(in, say, lexicographic gray-code order).
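
An even simpler instance (a toy of my own, not the gray-code scheme above): a
bare counter is perfectly uniform and perfectly predictable.

```python
from collections import Counter
from itertools import islice

def counter_rng(n=10):
    """Perfectly uniform over 0..n-1, yet the next value is always obvious."""
    i = 0
    while True:
        yield i % n
        i += 1

draws = list(islice(counter_rng(), 1000))
counts = Counter(draws)
# Every value appears exactly 100 times -- a flawless uniform distribution --
# but anyone who has seen a single output can predict all the rest.
```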

The point here is that 'uniformity' is not the property we care about; it is a
_consequence_ of the property we care about (unpredictability). If a
distribution were non-uniform, we would be better able to predict it (by
biasing our predictions to match the distribution), hence the least-
predictable distribution is necessarily a uniform one.

Other times this comes up:

> Are quantum measurements 'truly random'?

That's untestable, but we _can_ test whether they're predictable or not.

> This encryption algorithm requires a source of random bits.

It would work just as well with a source of bits that are merely
unpredictable.

> This strategy can only be defeated by a random opponent.

It can also be defeated by an opponent which you can't predict.

And so on.

~~~
sdevlin
That is a good way to think of it. Indeed, that is how cryptographic
randomness is typically defined.

Simply put: given the first k bits of a random stream, can you predict the
k+1th bit (more than 50% of the time)?

A generator that passes this test will pass any statistical randomness test,
but the converse is not necessarily true. For example, Mersenne Twister is a
good generator from a statistical perspective, but it's actually quite easy to
recover its internal state by observing a small amount of output. (Around 20k
bits, if I recall correctly.)
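
That recollection checks out: MT19937 keeps 624 32-bit words of state
(624 × 32 = 19,968 ≈ 20k bits), and each output word is just the state
tempered by a few invertible shifts and masks. A sketch of the classic
state-cloning attack against Python's `random` module (which uses MT19937):

```python
import random

def untemper(y):
    """Invert MT19937's output tempering to recover a raw state word."""
    y ^= y >> 18                        # undo: y ^= y >> 18
    y ^= (y << 15) & 0xEFC60000         # undo: y ^= (y << 15) & 0xEFC60000
    x = y
    for _ in range(5):                  # undo: y ^= (y << 7) & 0x9D2C5680
        x = y ^ ((x << 7) & 0x9D2C5680)  # (fixed-point, 7 low bits per pass)
    y = x & 0xFFFFFFFF
    y ^= (y >> 11) ^ (y >> 22)          # undo: y ^= y >> 11
    return y & 0xFFFFFFFF

victim = random.Random(1234)
observed = [victim.getrandbits(32) for _ in range(624)]  # ~20k bits of output

# Rebuild the internal state and load it into a fresh generator.
state = tuple(untemper(y) for y in observed) + (624,)
clone = random.Random()
clone.setstate((3, state, None))

# The clone now predicts the victim's future output exactly.
assert [clone.getrandbits(32) for _ in range(10)] == \
       [victim.getrandbits(32) for _ in range(10)]
```

No secrets are involved: 624 consecutive outputs are enough to replay the
generator forever, which is why MT19937 should never be used where
unpredictability matters.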

~~~
dalke
> Simply put: given the first k bits of a random stream, can you predict the
> k+1th bit (more than 50% of the time)?

This definition isn't sufficient. Suppose you have a random stream, and one
predictor which asserts the next bit is "1" and another predictor which says
that next bit is "0".

As k increases, there's a nearly 100% chance that one of the two predictors
will be correct more than 50% of the time.

Even if you pick a single predictor, say, that all-1s predictor, there's an
almost 50% chance that for a given random stream and k that it will have
better than 50% predictive ability.

Just because Guildenstern's coin is heads 92 times in a row doesn't mean that
it's not random. Only that it's very unlikely to be random.
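
A quick simulation of that last point (stream length and trial count are
arbitrary; an odd length avoids exact ties):

```python
import random

rng = random.Random(0)
k = 101        # bits per stream; odd, so "more than 50%" can never tie
trials = 2000
wins = sum(
    1 for _ in range(trials)
    if sum(rng.getrandbits(1) for _ in range(k)) > k / 2
)
# wins / trials hovers near 0.5: the constant all-1s predictor beats
# chance on roughly half of all truly random streams.
```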

~~~
sesqu
> very unlikely to be random.

Speaking of word replacements, "random" does not mean "uniformly distributed".
An unfair coin toss is still random.

------
pygy_

        put on hold as primarily opinion-based by Xavierjazz,
        Olli, Excellll, HackToHell, Tog 3 hours ago
    
        Many good questions generate some degree of opinion
        based on expert experience, but answers to this
        question will tend to be almost entirely based on
        opinions, rather than facts, references, or specific
        expertise.
    
        If this question can be reworded to fit the rules in
        the help center, please edit the question.
    

Stack Exchange is broken.

The question mixes up uniform distribution and randomness, but the answers are
factual. There's little to no opinion involved here.

~~~
df07
Give the system time to work. It's already been taken off hold.

~~~
cruise02
No! A question got closed! The whole thing must be broken!

~~~
pygy_
Hyperbole sometimes helps to get the point across...

The problem with Stack Exchange is that it conflates heavy site usage with
domain experience and maturity.

Avidity for imaginary internet points and maturity are at best orthogonal.

~~~
bronsoja
As a general rule across the web, I'd agree with you. Although I feel like on
Stack Exchange (or at least SO), those with the highest scores typically gave
good answers in their domain, as judged by their community, and would not
commonly have answers outside their domain voted up on reputation alone.

I don't really have any hard data on this; it's an opinion based on regularly
seeing _extremely_ good answers from high-rep people, and not being able to
recall poor answers from high-rep users being voted up when they weren't
actually useful.

------
Udo
I've been running a virtual dice roller site for pen & paper roleplayers for
about ten years. I get these kinds of emails _all the time_. "Your RNG is
faulty, I just rolled three 20s in a row!" and stuff like that.

It's probably not fixable psychologically; we're just seeing patterns
everywhere. This also happens with physical dice, by the way. If a player has
rolled two or more abysmal results in a row, it's common to say "hey, you
should really swap out these dice".

~~~
polymatter
I've toyed with the idea of not using a fair RNG for games for this reason.
People expect die rolls to converge to the average distribution much, much
faster than they actually do. So perhaps you can make an RNG that does exactly
that and gives the user what they expect, even if it's mathematically
ridiculous.

For example, if a 20 is rolled then the probability of 20 being rolled again
is halved (so it has a 1/40 chance) while the probability of its inverse is
doubled (so the probability of 1 is 1/10). This dramatically reduces the
chance of the three-20s-in-a-row scenario and would very quickly regress to
the mean, just like people expect. For added flair, subtly adjust the
probabilities of all possible values (so that after rolling a 20, a 2 becomes
more likely and a 19 less likely).
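
A minimal sketch of such a die, assuming exactly the halving/doubling rule
above (the class name and the reset-then-bias behaviour are my own invention):

```python
import random

class GamblersDie:
    """A d20 that regresses to the mean faster than a fair die:
    after each roll, that face's weight is halved and its 'inverse'
    face (21 - face) is doubled, matching what players expect."""

    def __init__(self, sides=20, seed=None):
        self.sides = sides
        self.weights = [1.0] * sides
        self.rng = random.Random(seed)

    def roll(self):
        face = self.rng.choices(range(1, self.sides + 1),
                                weights=self.weights)[0]
        self.weights = [1.0] * self.sides      # reset the previous bias...
        self.weights[face - 1] = 0.5           # ...halve the repeat chance
        self.weights[self.sides - face] = 2.0  # ...double the inverse face
        return face

die = GamblersDie(seed=42)
rolls = [die.roll() for _ in range(1000)]
# Consecutive repeats now occur at roughly half the fair rate.
```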

I'm sure I can't be the only person to think of this.

~~~
tfgg
I've thought about the same for Settlers of Catan, which I think is too random
if you use dice, disrupting people's ability to do more than basic strategy.

I know there exist "dice" packs of cards, which contain the correct
distribution of rolls. If you shuffle a few of those together, you'll probably
get a nicer game.

~~~
ygra
That sounds like a nice idea, actually. I hate those rounds where 11 or 2
occur more often than 6 or 8. I know it _can_ happen, but it can make the game
rather annoying at times.

~~~
taeric
I actually think that makes the games more fun. Especially so when you have a
larger game so that there are people on those tiles.

For me, it makes the game feel less "solvable" by basic analysis. Sure,
statistically certain observations will hold. For the game you are in, though,
you have to be ready to adjust your strategy based on what has happened.
(Clearly not in anticipation of future rolls, but more based on the resources
you have managed to get, not the ones you wanted.)

~~~
ygra
Admittedly, we already tend to take the number planning a bit out of the game
by initially placing the number tokens face down; you can only reveal the
numbers on hexes you have built at (for the initial round of placing
settlements/streets, they are revealed after everyone has placed both). Makes
for a nice variant where exploration holds surprises, or you can expand toward
hexes that are already known.

~~~
taeric
I've played once with this variation. I liked it, not sure why we didn't let
it stick.

Of course, I haven't played at all in a long while.

------
jbert
PRNGs (pseudo-random number generators) take in a small amount of randomness
(a seed) and produce a long stream of numbers from it.

Anyone using the same PRNG can look at the output of yours and try to put
their PRNG in the same state. If they succeed, the output of theirs will match
yours - now and in the future - unless you re-seed.

Two problems can occur here:

1) you seed with something they can predict. e.g. seconds since 1970 (or
microseconds since 1970). If they have a reasonable sample of numbers from
your system, they can try _lots_ of different seeds and see if they can find
the one which gives the same output as you.

2) PRNGs have "internal state", which is a bunch of numbers they mix together.
Some PRNGs have the property that if you can observe enough numbers in a row
from the PRNG, you can reconstruct the internal state locally, and then you
can do the same thing as if you knew the seed (predict future numbers).

~~~
chilldream
> you seed with something they can predict

One of my favorite examples of this ever: Once in Las Vegas a keno machine
mistakenly used a _fixed_ seed. Meaning once someone figured this out, he
could show up when the game started for the day and predict what the machine
would do with 100% accuracy.

[http://www.americancasinoguide.com/slot-machines/the-worlds-greatest-slot-cheat.html](http://www.americancasinoguide.com/slot-machines/the-worlds-greatest-slot-cheat.html)

------
aestra
Take a look here:
[http://www.cigital.com/papers/download/developer_gambling.php](http://www.cigital.com/papers/download/developer_gambling.php)

They found a flaw in a REAL poker site: it was using a pseudorandom number
generator (and a stupid shuffling algorithm), and they were able to determine
the order of the cards being used in the game!

>The RST exploit itself requires five cards from the deck to be known. Based
on the five known cards, our program searches through the few hundred thousand
possible shuffles and deduces which one is a perfect match. In the case of
Texas Hold'em poker, this means our program takes as input the two cards that
the cheating player is dealt, plus the first three community cards that are
dealt face up (the flop). These five cards are known after the first of four
rounds of betting and are enough for us to determine (in real time, during
play) the exact shuffle. Figure 5 shows the GUI we slapped on our exploit. The
"Site Parameters" box in the upper left is used to synchronize the clocks. The
"Game Parameters" box in the upper right is used to enter the five cards and
initiate the search. Figure 5 is a screen shot taken after all cards have been
determined by our program. We know who holds what cards, what the rest of the
flop looks like, and who is going to win in advance.

------
lutusp
> Just say you code in any language at all to roll some dice (just using dice
> as an example), after 600,000 rolls, I bet each number would have been
> rolled around 100,000 times, which to me, seems exactly what you expect.

Bad example. Because the author specified "some dice", we can assume more than
one die, in which case some numbers have a greater chance to appear than
others, in a series of fair "random" throws.

It's a bad sign that the author of a piece about randomness isn't aware of the
systematic behavior of his chosen example, a behavior biased in favor of
certain outcomes.

[http://www.algebra.com/algebra/homework/NumericFractions/Numeric_Fractions.faq.question.200314.html](http://www.algebra.com/algebra/homework/NumericFractions/Numeric_Fractions.faq.question.200314.html)
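
The bias is easy to see by exhaustive enumeration; for the smallest "some
dice" case, two fair d6:

```python
from collections import Counter
from itertools import product

# All 36 equally likely ordered pairs of two fair six-sided dice.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
# A sum of 7 arises 6 ways (6/36); 2 and 12 arise only 1 way each (1/36),
# so the sums are far from uniformly distributed even though each die is fair.
```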

------
StavrosK
The linked poker vulnerability article is really worth a read.

------
snarfy
The idea of something being truly random bothers me because it goes against
cause and effect. An effect happened without a cause. It violates laws of
conservation, entropy, determinism, and a lot of other things.

~~~
lutusp
> The idea of something being truly random bothers me because it goes against
> cause and effect.

Not at all. The fact that one cannot predict the next number in a random
sequence is irrelevant to the fact that, in the long term, that number has a
predictable relationship with other numbers in the pool of possibilities.

For a random generator of the digits 0-9, can I predict the next number in the
sequence? No. Can I say what the proportion of, say, 7s will be, within a
large list of outcomes? Yes, and the larger the list, the more reliable the
prediction.

If your position had merit, quantum theory, probabilistic on a small scale,
would be seen as violating cause/effect relationships, a rather important part
of physical theory. But, because of the mathematics of quantum probability,
individually unpredictable atoms become very predictable macroscopic objects.
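
Both halves of that claim are easy to check numerically (the digit and the
sample sizes here are arbitrary choices):

```python
import random

rng = random.Random(7)
for n in (100, 10_000, 1_000_000):
    # Count how often the digit 7 appears among n random digits 0-9.
    sevens = sum(1 for _ in range(n) if rng.randrange(10) == 7)
    print(n, sevens / n)   # the proportion drifts toward 0.10 as n grows
```

No individual draw is predictable, yet the long-run proportion converges on
1/10, exactly as described.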

------
mathattack
Eric Lippert's answer is great. It isn't the distribution of the numbers that
matters, it's their predictability. In most cases it's OK if the output merely
looks random, but in some cases it turns out not to be.

------
Aardwolf
Intel's Ivy Bridge and newer CPUs have this new "random" instruction, RDRAND
(I still have a Sandy Bridge CPU though). Is it truly random enough for crypto
purposes? Thanks!

~~~
fhars
It depends on whom you trust: [http://arstechnica.com/security/2013/12/we-cannot-trust-intel-and-vias-chip-based-crypto-freebsd-developers-say/](http://arstechnica.com/security/2013/12/we-cannot-trust-intel-and-vias-chip-based-crypto-freebsd-developers-say/)

------
sturmeh
The issue is that if you run the same program twice without selecting a
suitably random seed you will make the same 60 million rolls.

If you can somehow control, predict or influence the seed you can predict or
influence rolls.

For example if the game uses JUST the system time as a seed, I can shift the
system time back to a specific position and run the application again to get
the same random number generation.
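
That failure mode is trivially demonstrated (the seed here is an invented
stand-in for a system-time value):

```python
import random

def sixty_rolls(seed):
    """Roll a d6 sixty times from a freshly seeded generator."""
    rng = random.Random(seed)
    return [rng.randint(1, 6) for _ in range(60)]

# Replaying the same "system time" seed reproduces the entire stream.
assert sixty_rolls(1_390_000_000) == sixty_rolls(1_390_000_000)
```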

If you know the algorithm (for example, you're playing an open source card
game) and you can determine the time (offset) on the server or the other
player's computer, you could cheat by calculating their random variables.

Now it really isn't an issue in games, but it's incredibly important in
security. Randomness makes up a huge component of encryption.

That's why you have applications that require you to move the mouse a bit
(hopefully randomly); they also take in a whole bunch of other entropy fed in
by the system.

In Unix-like OSes you have /dev/random and /dev/urandom. /dev/random requires
a certain amount of entropy and environmental noise, and it blocks on reads
until enough has accumulated.

/dev/urandom does not block, but it gives pseudo random output. For the
purposes of security, /dev/urandom should NEVER be used. However neither are
truly 'random'.

~~~
sharpneli
But for any purpose other than generating private keys one should NEVER use
/dev/random. If you're not sure, use /dev/urandom.

I've had to painstakingly explain to certain people why, for example, erasing
a HDD from /dev/urandom is all right, and why their program that simulates
some random input should use /dev/urandom.

But no, they babble about true randomness and then complain when they get a
paltry few hundred bytes per second.

Even if you have a server running casino games, just use /dev/urandom. It
requires a total compromise of the server to get the internal state out of it,
and in that case it's easier to just replace urandom with /dev/zero.
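
In code, this is why the non-blocking interface is the sensible default; for
example, Python's `os.urandom` reads from the same kernel pool and returns
immediately however much you ask for:

```python
import os

# Pulls from the kernel CSPRNG (the urandom pool): no blocking, arbitrary
# amounts, fine for simulations, disk wiping, nonces, session ids, etc.
buf = os.urandom(1024 * 1024)   # 1 MiB, effectively instant
```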

~~~
blueskin_
If you need a lot of entropy, it's good to use a hardware RNG to keep the load
down (and it improves quality).

~~~
sharpneli
You should be careful when using a word like "quality" in this context. Some
might think that by quality you mean some kind of statistical property, e.g.
that dice rolls generated by either of them are somehow statistically
different, which they are not.

What does this mean in practice: if I give you ten 10MB files, 5 of them
created with urandom and 5 created using a hardware RNG, there is practically
no chance that you could tell which came from which, barring knowledge of the
urandom entropy pool.

But it is true that a HW RNG can be useful just to keep CPU load down. By now
we know that some of those are backdoored by the NSA, so you should still use
/dev/random as a source for private keys. The HW RNG is there to reduce load,
not to defend against any attack.

This is so important that I'll have to repeat it: the only difference between
urandom and random is that urandom is theoretically susceptible to an attack
which could allow prediction of the output values if the attacker knows the
internal entropy pool state. The statistical properties of both are the same.

~~~
blueskin_
Agreed that some (e.g. RDRAND) are potentially untrustworthy, but others
aren't - for example, Linux has daemons available that can source entropy from
audio or video noise.

