2. The dice roll example does not produce a uniform distribution; I think this is a common pitfall when generating random integers in a range. `randomNumber % 6` has a slight bias towards 0 and 1: since 2^31 % 6 == 2, there are more numbers in [0, 2^31-1] that map to 0 and 1 than map to 2...5. To make it uniform, you should discard the result whenever `randomNumber < 2` and generate another number.
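A minimal sketch of that fix in Python, with `secrets.randbelow(2**31)` standing in for whatever produced randomNumber:

```python
import secrets

def roll_die() -> int:
    """Uniform 1..6 die roll via rejection, assuming a uniform 31-bit source."""
    while True:
        random_number = secrets.randbelow(2**31)  # stand-in for randomNumber
        # 2**31 % 6 == 2, so residues 0 and 1 each have one extra preimage;
        # discarding the two smallest values (0 and 1) removes exactly that surplus.
        if random_number >= 2:
            return random_number % 6 + 1
```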
On first pass, Benford’s Law looks a lot like Zipf’s Law.
What differentiates Benford’s Law from Zipf’s Law?
> It has been argued that Benford's law is a special bounded case of Zipf's law, with the connection between these two laws being explained by their both originating from scale invariant functional relations from statistical physics and critical phenomena. The ratios of probabilities in Benford's law are not constant. The leading digits of data satisfying Zipf's law with s = 1 satisfy Benford's law.
I’ll take some time to try and better understand your post.
(Since we can reasonably assume that randomNumber has a power-of-two range, it would divide evenly over 8 values but not over 6.)
I know that serious radio telescopes cost way more than any random person could afford to pay, but I've certainly seen plans for smaller DIY homemade models.
The idea is to count the number of events (beta particles here) per time interval. Do this twice. If count A > count B, output a 1. If count A < count B, output a 0. If count A = count B, skip that result. Von Neumann came up with that trick.
Don't use the low-order bit of the count. That has a bias.
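For what it's worth, a minimal sketch of that scheme in Python, where count_events() is a hypothetical stand-in for counting decay events in one fixed interval:

```python
def random_bit(count_events):
    """Return 1 or 0 by comparing two interval counts, or None on a tie."""
    a = count_events()  # events seen in interval A
    b = count_events()  # events seen in interval B
    if a > b:
        return 1
    if a < b:
        return 0
    return None  # equal counts: skip this result, per the trick above
```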
> Von Neumann originally proposed the following technique for getting an unbiased result from a biased coin:
> > If independence of successive tosses is assumed, we can reconstruct a 50-50 chance out of even a badly biased coin by tossing twice. If we get heads-heads or tails-tails, we reject the tosses and try again. If we get heads-tails (or tails-heads), we accept the result as heads (or tails).
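A minimal sketch of that pairing trick in Python, assuming the input bits are independent tosses of the same (possibly biased) coin:

```python
def von_neumann(bits):
    """Yield unbiased bits from independent tosses of the same biased coin.

    Bits are consumed in non-overlapping pairs: differing pairs yield the
    first bit of the pair, equal pairs are discarded.
    """
    it = iter(bits)
    for a in it:
        b = next(it, None)
        if b is None:
            return            # odd leftover bit: nothing to pair with
        if a != b:
            yield a           # P(0,1) == P(1,0) == p*(1-p), so 0 and 1 are equally likely

# Example: pairs (1,1) (0,1) (1,0) (0,1) -> drop, 0, 1, 0
print(list(von_neumann([1, 1, 0, 1, 1, 0, 0, 1])))  # [0, 1, 0]
```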
The only practical use for me would be to confound markers trying to reproduce my RNGs in statistics assignments.
Your PRNG that reads from your telescope would need to compensate for this.
As far as using radiation to generate random numbers, check out https://www.fourmilab.ch/hotbits/
It reminds me of a story from Cryptonomicon in which a character describes a secretary grabbing randomly spun number balls from a tumbling device while blindfolded (if I remember the objects right). She doesn't like how the results look when she has to write them down, because they don't seem random enough to her, so she starts peeking and slightly "correcting" them, and thus ruins a number of one-time pads.
The point is that you are going to do some kind of whitening anyway and you then have essentially three choices:
1) Design something which requires only von Neumann-style whitening, where there are still arbitrary parameter choices hidden in the hardware.
2) Design some non-trivial but still simple entropy-extraction/whitening algorithm (e.g. take a 16-bit sample and discard the top 10 bits and the bottom bit).
3) Just take the measurement results and pass them through some kind of CSPRNG or sponge function.
The third variant makes the most sense for most applications, because usually you either don't care about the randomness all that much or you want it for cryptographic purposes. And if you want to do cryptography, then philosophical arguments about cryptography-based whitening not being "truly random" don't make sense, because your application itself rests on the belief that the crypto primitives used are "random enough".
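As a rough sketch of that third variant (the function name and the use of SHA-256 as the conditioner are illustrative choices, and the assumption is that each block of raw samples carries comfortably more than 256 bits of min-entropy):

```python
import hashlib

def whiten_block(raw_samples: bytes) -> bytes:
    # SHA-256 stands in for "some kind of CSPRNG or sponge function".
    # Assuming raw_samples holds well over 256 bits of min-entropy,
    # the digest looks uniform to any downstream consumer.
    return hashlib.sha256(raw_samples).digest()

print(whiten_block(b"\x01\x07\x02\x0c" * 2048).hex())
```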
* Bias in a bit-stream means any deviation from IID Bernoulli trials with p=0.5
* Von Neumann whitening addresses the IID Bernoulli case for p!=0.5 by looking at bit pairs. It takes the first bit when they differ, and no bits when they match. This works because (1,0) and (0,1) both occur with equal probability p(1-p).
* NIST wrote a remarkably accessible document 
From the brochure:
“ Photons - light particles - are sent one by one onto a semi-transparent mirror and detected. The exclusive events (reflection - transmission) are associated to « 0 » - « 1 » bit values.”
Here's a document from 1997 that looks at some hardware RNGs and how they fail: http://www.robertnz.net/true_rng.html
Modern PRNGs can be tested with Dieharder, TestU01 or STS and benchmarked. This article only talks about primitive old LCGs (and not any good ones) or MT.
Even a truncated 128-bit LCG has far better properties.
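For reference, a truncated 128-bit LCG is only a few lines; the constants below are illustrative placeholders, not spectrally tested recommendations:

```python
MASK128 = (1 << 128) - 1

class TruncatedLcg128:
    """128-bit LCG state; output is the high 64 bits (the truncation)."""

    MULT = 0x2360ed051fc65da44385df649fccf645  # illustrative odd multiplier
    INC = 0x5851f42d4c957f2d                   # illustrative odd increment

    def __init__(self, seed: int):
        self.state = seed & MASK128

    def next64(self) -> int:
        self.state = (self.state * self.MULT + self.INC) & MASK128
        return self.state >> 64  # discard the weak low-order bits

rng = TruncatedLcg128(0x853c49e6748fea9b)  # any seed works
print(hex(rng.next64()))
```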
The homepage might come across as a little overzealous (for example, ChaCha quality is listed as good rather than excellent), but it generally makes good points.
They recommend using their xoshiro PRNG.
And the PCG author's response:
For example, for one of his arguments, he specifically chose a generator called pcg32_once_insecure, which the PCG author does not recommend due to its invertible output function!
Personally, I have read both arguments in detail, and I would always use PCG or even a truncated LCG over xoshiro, which in comparison has a larger state, potentially worse statistical properties, and no clear speed gain (faster in some benchmarks and slower in others).
But I am using xoshiro in my projects, because I thought xor was simpler than multiplication.
But there exist some new PRNGs which are both very fast and statistically extremely good, almost TRNG-like, e.g. wyrand.
* Creates a sequence of unique integers
* Uses a prime P that is congruent to 3 (mod 4) (see the sketch after this list).
* A single iteration has noticeable patterns, but applying it twice already gives randomness that is good enough for many use cases.
* It is "embarrassingly parallel", a simple mapping of randomValue = randomize(i). You can calculate unique and deterministic random numbers from input i in parallel threads with no sync between threads.
* Since it is a unique mapping of i to r, you can use it to shuffle data sets virtually instantly. Take the index of a value in the original array, and use it to compute the target index in the shuffled array.
* I've used it to shuffle up to 800 million items per second on a GPU, including the time it took to transfer the data from RAM to the GPU. So without the IO, you could probably shuffle billions of values per second, mostly bound by GPU bandwidth. E.g., at 700 GB/s with 70-byte items, you could perhaps shuffle 10 billion items per second.
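In case it's useful, here is a rough sketch of the kind of index permutation described above (the quadratic-residue construction for a prime P congruent to 3 (mod 4)); the prime, the offset and the function names are illustrative, not the exact parameters used above:

```python
P = 4294967291  # illustrative prime (2**32 - 5) with P % 4 == 3

def permute_qpr(x: int) -> int:
    """Map each x in [0, P) to a unique value in [0, P) via x*x mod P."""
    if x >= P:                 # inputs outside the field map to themselves
        return x
    r = (x * x) % P
    # Because P % 4 == 3, r and P - r can never both be squares mod P, so
    # taking r for the lower half of inputs and P - r for the upper half
    # makes the map a bijection.
    return r if x <= P // 2 else P - r

def randomize(i: int) -> int:
    # A single pass shows visible structure; two passes with an offset in
    # between already look random enough for many uses.
    return permute_qpr((permute_qpr(i) + 0x46790905) % P)
```

For arrays whose length n is smaller than P, one common approach is to re-apply randomize until the result falls below n (cycle-walking), which keeps the index mapping one-to-one.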
The more I read it, the more confused the article appears to be (e.g. the Mersenne Twister is NOT a good example of a modern or high-quality PRNG). For more about secure random numbers in Linux, I'd suggest reading .
Some cryptosystems really do need uniform randomness (ECDSA) rather than just a negligible probability of choosing the same value twice. Other cryptosystems depend on not reusing values, though the values could be predictable. Sometimes there are subtle shifts in these needs based on modes (AES/CBC vs AES/GCM is a good example).
What I was trying to say is that the kernel CSPRNG (the exact mechanism depends on the version) mixes together a bunch of inputs that aren't truly random from an information-theoretic perspective in order to produce uniform random output from the CSPRNG function, and that it doesn't actually matter that those sources aren't information-theoretically random. That'll teach me to comment in haste!
Modern chips ranging from the one in the Raspberry Pi to Intel CPUs have them too.
Example from MATLAB: https://www.mathworks.com/help/stats/generate-random-numbers...