
Understanding "randomness" - giu
http://stackoverflow.com/questions/3956478
======
drblast
The top answer is an excellent demonstration of the central limit theorem and
why the question's assumption that rand() and rand() * rand() are "equally
random" is wrong.

The central limit theorem is one of the most initially surprising and
fascinating things in all of math. Please support the central limit theorem by
learning about it on Wikipedia, or wherever fine theorems are discussed.

Edit to add: If you haven't learned about this enough to know that
implementing systems that depend on randomness, like cryptographic systems, is
really, really difficult and error-prone, please just use a standard library
instead.

It's the most counter-intuitive thing in the world that rand() * rand() is
less secure than rand(); shouldn't it be twice as unpredictable? But it's not.
Cryptography and related fields are rife with counter-intuitive gotchas.
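
A minimal sketch in Python (the 0.25 cutoff is just an arbitrary probe) shows
how far from uniform the product is:

    import random

    N = 1_000_000
    singles  = [random.random() for _ in range(N)]
    products = [random.random() * random.random() for _ in range(N)]

    def frac_below(xs, cutoff=0.25):
        # fraction of samples landing below the cutoff
        return sum(x < cutoff for x in xs) / len(xs)

    print(frac_below(singles))   # ~0.25, as a uniform variable should
    print(frac_below(products))  # ~0.60: the product piles up near zero

About 60% of the products land in the bottom quarter of the range, versus 25%
for plain rand().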

~~~
hvs
My favorite quote from that page:

 _Don't take Statistics and Intuition to the same party ...._

~~~
Eliezer
It's my favorite quote too, but intuitions, unlike probability theory, can be
changed.

------
samd
Anyone interested in understanding randomness should endeavor to understand
what probability actually is. As a Bayesian I think that probability is not a
property of any object but rather a subjective measure of your own confidence,
your own knowledge. To say something is random is just to say that you lack
enough information to predict any one outcome over another. But everything
could be predicted perfectly (barring quantum events) with enough knowledge.
If I knew that a coin was weighted to favor heads, my degree of belief in the
outcome of the toss would become stronger. The probability for me would
become, say, 75% heads, while for an observer without such knowledge it would
remain 50%.

Here's a good article on probability theory for those interested:
<http://plato.stanford.edu/entries/probability-interpret/>

------
iskander
>@Thilo No, it doesn't... ? If a random variable is uniformly distributed in
the range (0,1), and you sample the variable n times, and take the sum, it
will just be uniformly distributed in the range (0,n). – user359996 12 hours
ago

It's so easy to sound certain and be wildly wrong.
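
For the record, the sum of uniform draws is bell-shaped, not uniform; a few
lines of Python (a quick sketch) make that hard to deny:

    import random
    from collections import Counter

    # Sum three draws from U(0,1); if the quoted claim were true, the
    # integer-part buckets 0, 1, 2 would each hold 1/3 of the mass.
    N = 300_000
    sums = (random.random() + random.random() + random.random()
            for _ in range(N))
    buckets = Counter(int(s) for s in sums)
    for b in sorted(buckets):
        print(b, buckets[b] / N)  # ~0.167, ~0.667, ~0.167: not flat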

------
danger
Why has nobody in this thread or the stack overflow thread linked to the
Wikipedia page on entropy?

<http://en.wikipedia.org/wiki/Entropy_(information_theory)>

~~~
cdavidcash
This is because Shannon entropy is basically useless when it comes to proving
anything about randomized algorithms or cryptography.

~~~
danger
How is that related to the SO question? The post is asking about how to tell
if one distribution is "more random" than another, which is what entropy is
all about.

~~~
pjscott
Consider a random number generator that produces integers between 0 and 15.
Here's a really crappy algorithm:

1. Start with a truly random seed between 0 and 15.

2. To generate each new number, increment the previous one modulo 16.

Suppose you start at 11. Your sequence of random numbers will be 11, 12, 13,
14, 15, 0, 1, 2 ....

This is obviously not very random. But look at the entropy of it. All values
between 0 and 15 are equally likely, so this will have the maximum possible
entropy: 4 bits.

The problem here is that, for entropy to be an accurate measurement of
information content, you have to assume that you're measuring independent
identically distributed random variables. The outputs from a pseudo-random
number generator are not independent.
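
Here's that generator sketched in Python (crappy_rng is a made-up name), with
its marginal entropy computed empirically:

    import math
    from collections import Counter

    def crappy_rng(seed, n):
        # the increment-mod-16 "generator" described above
        out, x = [], seed
        for _ in range(n):
            out.append(x)
            x = (x + 1) % 16
        return out

    seq = crappy_rng(11, 16_000)
    probs = [c / len(seq) for c in Counter(seq).values()]
    print(-sum(p * math.log2(p) for p in probs))
    # ~4.0 bits: maximal entropy, yet every output is perfectly predictable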

~~~
danger
If we are interested in a _sequence_ of pseudorandom numbers, then we should
be talking about the _joint_ entropy of the sequence. But information theory
is plenty capable of describing "how random" a sequence is.

The generator you are talking about is far from producing a uniform
distribution (which would have maximum entropy) jointly over the n-dimensional
hypercube representing a sequence of n draws from your generator. It therefore
has less entropy, and is thus "less random", than the distribution you'd get
from n independent draws from a perfect rand().
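
For instance, looking at consecutive pairs already separates the two
generators. A sketch in Python (entropy_bits and pair_up are made-up helpers):

    import math, random
    from collections import Counter

    def entropy_bits(samples):
        n = len(samples)
        return -sum((c / n) * math.log2(c / n)
                    for c in Counter(samples).values())

    N = 160_000
    counter_seq = [(11 + i) % 16 for i in range(N)]     # increment generator
    iid_seq = [random.randrange(16) for _ in range(N)]  # independent draws

    def pair_up(s):
        return list(zip(s, s[1:]))

    print(entropy_bits(pair_up(counter_seq)))  # ~4 bits
    print(entropy_bits(pair_up(iid_seq)))      # ~8 bits: two 4-bit draws

The counter's pairs carry only 4 bits because the second element is a
deterministic function of the first; in fact a whole length-n run still
carries only the 4 bits of the seed, no matter how large n gets.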

~~~
pjscott
You're right (have an upvote); I was warning more against rash misuse of the
concept of entropy, which I see quite a bit.

------
GavinB
More interesting to grasp is that even rand() * rand() / rand() is less evenly
distributed than rand().

It does seem like it should be possible to remap the numbers onto an even
distribution if you knew how uneven to expect it to be. That would probably
lead to some loss of fidelity at the limits of accuracy of the variable,
however.
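
That remapping exists: pushing a variable through its own CDF yields a uniform
result (the probability integral transform). For rand() * rand() the CDF works
out in closed form to F(z) = z - z*ln(z), so a sketch in Python
(uniformized_product is a made-up name):

    import math, random

    def uniformized_product():
        # Probability integral transform: if Z has CDF F, then F(Z) is
        # uniform on (0,1). For Z = U1*U2, F(z) = z - z*ln(z).
        z = random.random() * random.random()
        return z - z * math.log(z) if z > 0 else 0.0

    samples = [uniformized_product() for _ in range(1_000_000)]
    for lo in (0.0, 0.25, 0.5, 0.75):
        frac = sum(lo <= s < lo + 0.25 for s in samples) / len(samples)
        print(lo, frac)  # each quarter of (0,1) gets ~0.25 of the samples

And the fidelity hunch is right: F flattens out near z = 1 (its derivative,
-ln z, goes to 0 there), so distinct inputs collapse to nearly identical
outputs at that end.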

------
chrisaycock
I like the comments about dice. With one die, an outcome of 4 or 2 is equally
likely. With two dice, an outcome of 4 is three times as likely as an outcome
of 2.

So while the one-die and two-dice games are both random (as in unpredictable),
only the one-die game has uniformly distributed outcomes.
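
A brute-force enumeration in Python confirms the 3:1 ratio:

    from itertools import product

    # All 36 equally likely outcomes of rolling two dice
    sums = [a + b for a, b in product(range(1, 7), repeat=2)]
    print(sums.count(4))  # 3 ways: (1,3), (2,2), (3,1)
    print(sums.count(2))  # 1 way:  (1,1)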

