Hacker News new | past | comments | ask | show | jobs | submit login
How Do Computers Generate Random Numbers? (digitalbunker.dev)
65 points by aryamansharda 14 days ago | hide | past | favorite | 50 comments

1. You don't turn PRNG into "true" RNGs simply by picking seeds from environmental randomness. The seed is just the initial state, as long as the output is generated by a deterministic algorithm, by definition it's a PRNG. At the very best you can make a CSPRNG, but not a "true" RNG.

2. The dice roll example is not uniform distribution, I think this is a common pitfall when generating random integers of a range. `randomNumber % 6` results in a slight bias towards 0 and 1, since 2^31 % 6 == 2, there are more numbers in the range [0, 2^31-1] that map to 0 and 1 than those that map to 2...5. To make it uniform, for example, you should always discard if `randomNumber < 2` and regenerate another number for use.

#2 reminds me of Benford's Law, which I recently learned about and find truly fascinating.



On first pass, Benford’s Law looks a lot like Zipf’s Law.

What differentiates Benford’s Law from Zipf’s Law?

From https://en.wikipedia.org/wiki/Zipf%27s_law :

> It has been argued that Benford's law is a special bounded case of Zipf's law,[22] with the connection between these two laws being explained by their both originating from scale invariant functional relations from statistical physics and critical phenomena.[24] The ratios of probabilities in Benford's law are not constant. The leading digits of data satisfying Zipf's law with s = 1 satisfy Benford's law.

Thank you!

I’ll take some time to try and better understand your post.

Re 2.: OTOH, the bias is extremely tiny, there's only 2 out of 2^31 cases which are problematic, or one in a billion. A generic RNG obviously shouldn't do it this way, but for most applications it would be good-enough really.

The article claims that "it is indeed a uniform distribution" so I wanted to point it out.

To make a dice roll uniform, wouldn’t it be easier to do randomNumber % 8 and only use it if it is from 0 to 5, and discard and re-roll otherwise?

(Since we can reasonably assume that randomNumber is a binary number, and thus would be balanced over 8 values instead of 6.)

That depends on the modulus being divisible by 8, which is not always the case. A common example is modulus = 2^31 - 1 [0] which is prime.

[0] https://en.wikipedia.org/wiki/Linear_congruential_generator#...

Ah very true, an often ignored thing, thank you for sharing!

Just out of layman's curiosity, what would be the problem or difficulty of somehow connecting a more compact type of radio telescope that detects some level of cosmic background radiation and hooking that up to a computer. Would this not be a guaranteed way of generating truly random numbers for any need flawlessly?

I know that serious radio telescopes cost way more than any random person could afford to pay but I've certainly seen plans for smaller DIY homemade models.

A small radiation source works well, although the data rate is low.[1]

The idea is to count the number of events (beta particles here) per time interval. Do this twice. If count A > count B, output a 1. If count A < count B, output a 0. If count A = count B, skip that result. Von Neumann came up with that trick.

Don't use the low-order bit of the count. That has a bias.

[1] https://www.fourmilab.ch/hotbits/

The Von Neumann extractor is interesting.

> Von Neumann’s originally proposes the following technique for getting an unbiased result from a biased coin :

> > If independence of successive tosses is assumed, we can reconstruct a 50-50 chance out of even a badly biased coin by tossing twice. If we get heads-heads or tails-tails, we reject the tosses and try again. If we get heads-tails (or tails-heads), we accept the result as heads (or tails).

That is a fantastic resource. It never occured to me that such a thing would exist, but now that I know it does, I find it hard to think of how it wouldn't - since so much of the internet would rely on these kind of tricks to keep working.

Only practical use for me would be to confound markers trying to reproduce my RNG's in statistics assignments.

Any stochastic data source may not (and almost certainly isn't) evenly distributed--it'll probably follow some normal distribution.

Your PRNG that reads from your telescope would need to compensate for this.

As far as using radiation to generate random numbers, check out https://www.fourmilab.ch/hotbits/

Your claim makes me a bit skeptical unless im misunderstanding something here... I'd assume that a data source of pure natural radiation would be genuinely random even if its distribution isn't even, and that using anything in your computer to compensate for it would actually do the opposite: reduce randomness with damaging bias.

It reminds me of a story from Cryptonomicon in which a character mentions a secretary grabbing randomly spun number balls from a tumbling device while blindfolded (if I remember the objects right) and not liking the results when she had to write them down because they didn't look random enough to her, so she starts peeking and slightly "correcting", and thus ruins a number of one-time pads

The signal coming from typical radiation detector is analog pulse train which you have to somehow convert into useful binary data. Simply sampling this analog value at some regular interval will create samples that are hugely biased to small number of values (and distribution between these values has more to do with measurement uncertainty than with the supposedly random phenomena you are trying to measure) so you need some kind of processing step to get useful random numbers. Typical radiation based TRNG works by comparing the analog value from the sensor against some kind of threshold (usually in analog hardware) and producing stream of digital samples from that which is then either directly passed through von Neumann whitening algorithm or converted into stream of pulse times from which only few low-order bits are taken and whitened (in fact, the end result is mostly same as whitening the bit stream directly, but timing the pulse lengths is slightly more efficient). One can argue that "quantum TRNG" based on semi-transparent mirror produces unbiased output, but that is not true in practice because of manufacturing (the mirror will not be perfectly semi-transaparent) and implementation (producing and measuring single photons is hard) constraints.

The point is that you are going to do some kind of whitening anyway and you then have essentially three choices:

1) design something which requires only von Neuman-style whitening where there still are arbitrary parameter choices hidden in the hardware

2) Design some non-trivial, but still simple entropy-extraction/whitening algorithm (ie. take 16b sample and discard top 10 and bottom one bit).

3) just take the measurement results and pass it through some kind of CSPRNG or sponge function.

Third variant is what makes most sense for most applications because mostly you either don't care about the randomness that much or you want to use it for cryptographic purposes. And if you want to do cryptography then philosophical arguments about the cryptography-based whitening not being "truly random" do not make sense, because your application itself is based on belief that the crypto primitives used are "random enough".

Facts I looked up to understand this comment:

* Bias in a bit-stream means any deviation from IID Bernoulli trials with p=0.5

* Von Neumann whitening addresses the IID Bernoulli case for p!=0.5 by looking at bit pairs. It takes the first bit when they differ, and no bits when they match. This works because (1,0) and (0,1) both occur with equal probability p(1-p).

* NIST wrote a remarkably accessible document [1]

[1] https://csrc.nist.gov/csrc/media/publications/sp/800-90b/dra...

If the distribution non-evenness result is, say, 1000 bits of entropy in every 1024 bits of RNG output, but we save circuitry and have simple auditable RNG then it's worth it. I'm not going to lose sleep over my RSA key having effectively 4000 bits instead of 4096 when it was generated by maximally simple and transparent process. As compared to complicated crypto whitening that everyone thinks "must have".

The bias of unwhitened output of almost any TRNG is hugely biased. For almost anything based on detecting some kind of radiation (which at the same time are exactly the kinds of TRNGs that are truly random according to current understanding of physics, not only practically impossible to predict) you are on the order of one bit of entropy per 1024bits of output, not 1000.

That is an overkill. Measuring time between geiger detector events with 555-like counter is okay with me, it's still simple enough to be fully auditable and the bias is not so tragic.

You’re not too far off. There’s for example the Quantis random number generator, which uses photons:


From the brochure:

“ Photons - light particles - are sent one by one onto a semi-transparent mirror and detected. The exclusive events (reflection - transmission) are associated to « 0 » - « 1 » bit values.”

The problems with that approach come from trying to de-skew the data, and then from sampling the data into whatever port you're using.

Here's a document from 1997 that looks at some hardware RNGs and how they fail: http://www.robertnz.net/true_rng.html

For one thing it would not work if the attacker has physical access to your machine.

The chart shows the distribution of "10,000 dice rolls," yet each potential value, 1-6, has between 6000 and 7000 hits, indicating the true number of rolls to be between 36,000 and 42,000.

You're absolutely right. The attached code was for 40K and I just mislabeled the graph. I've updated it - thanks for catching that!

The Mersenne-Twister approach should certainly not be studied anymore, even if some popular old libraries still use it. It fell long out of favor, is too slow, and not good enough.

Modern PRNG's can be tested with Dieharder, TestU01 or STS and benchmarked. This article only talks about primitive old LCG's (not any good one) or MT.

When you say Mersenne-Twister isn't good enough, what are the other shortcomings apart from speed? It seems that even modern versions of Python are continuing to use it...

Its slow, large, and statistically worse than modern PRNGs- and jumping ahead takes longer and a more complicated algorithm.

Even a truncated 128-bit LCG has far better properties.

See https://www.pcg-random.org/index.html

The homepage might come across as a a little overzealous (for example ChaCha quality listed as good rather than excellent), but generally has good points.

However, this page claims PCG is rather bad: http://pcg.di.unimi.it/pcg.php

They recommend to use their xoshiro PRNG.

That author has a history of extreme bias and almost-vindictive personal attacks on the author of PCG. See the reddit comments:


And the PCG author's response: https://www.pcg-random.org/posts/on-vignas-pcg-critique.html

For example, for one of his arguments, he specifically chose a generator called pcg32_once_insecure, which the PCG author does not recommend due to its invertible output function!

Personally, I have read both arguments in detail and I would always use PCG or even a truncated LCG over xoshiro, which has a large size in comparison, potentially worse statistical properties, and no gain- faster in some benchmarks and slower in others.

Well, I had only skimmed the page

But I am using xoshiro in my projects, because I thought xor was simpler than multiplication.

Yeah, xor is simpler than multiplication in terms of hardware complexity- luckily, we have the multiplication circuits built in, so may as well take advantage of them.

Xoshiro ist only fast, nothing else. PCG has much better properties. Fast is not everything.

But there exist some new and very fast PRNG's which are fast and extremely good. Almost TRNG, wyrand eg.

Note, however, that it has an extremely large period which is a need for some specific applications (but I do agree that, for most use cases, it should be avoided).

My favourite is this one: https://preshing.com/20121224/how-to-generate-a-sequence-of-...

* Creates a sequence of unique integers

* Uses prime numbers that are congruent to P = 3 (mod 4).

* A single iteration has noticable patterns but applying it twice already results in randomness that is sufficiently good for many use cases.

* It is "embarrassingly parallel", a simple mapping of randomValue = randomize(i). You can calculate unique and deterministic random numbers from input i in parallel threads with no sync between threads.

* Since it is a unique mapping of i to r, you can use it to shuffle data sets virtually instantly. Take the index of a value in the original array, and use it to compute the target index in the shuffled array.

* I've used it to shuffle up to 800 million items per second on a GPU, including the time it took to transfer the data from RAM to GPU. So without the IO, you could probably shuffle billions of values per second, probably mostly bound by GPU bandwidth. E.g., 700GB/s and each item is 70 bytes -> could perhabs shuffle 10 billion items per second.

Is a hardware random number generator that would use environment or electrical sensors to generate noise expensive or hard to manufacture? I would assume this should be part of any standard motherboard given the importance of cryptography. Or does it create an attack vector?

Why, of course, it's been there for almost 8 years already in Intel processors: https://en.wikipedia.org/wiki/RDRAND

Cryptography doesn't depend on random numbers for its random seed. It depends on unpredictable numbers. These are not the same thing. As long as no-one can predict what the next number will be it doesn't matter how "random" they are. TFA is sufficiently vague it's unclear if the author understands this.

The more I read it, the more confused the article appears to be (e.g. mersenne twister is NOT a good example of a modern or high quality PRNG). For more about secure random numbers in Linux, I'd suggest reading [0].

0: https://buttondown.email/cryptography-dispatches/archive/cry...

I wouldn't be so universal with this statement.

Some cryptosystems really do need uniform randomness (ECDSA) rather than just negligible probability of choosing values. Other cryptosystems depend on not reusing values, though the values could be predictable. Sometimes there are subtle shifts in these needs based on modes (AES/CBC vs AES/GCM is a good example).

Thanks for the comment, yes I should have phrased that very differently.

What I was trying to say is that the kernel CSPRNG (exact mechanism depends on version) mixes together a bunch of things that aren't truly random from an information-theoretical perspective in order to produce uniform random output from the CSPRNG function - and that it doesn't actually matter that those sources aren't information-theoretically random. That'll teach me to comment in haste!

There was an RNG on the i810 chipset ~20 years or so ago and several VIA chips had onboard RNGs as well.

Modern chips ranging from the one in the Raspberry Pi to Intel CPUs have them too.

For those interested, one can get from a uniform distribution to pretty much any defined statistical distribution using just the uniform random number generator and the inverse cumulative distribution function of the desired random distribution. A useful trick for non-standard distributions with no function available in your prefered language.

Example from matlab: https://www.mathworks.com/help/stats/generate-random-numbers...

Roll20 has an interesting way of generating random numbers as well - https://wiki.roll20.net/QuantumRoll

My personal favorite RNG is the logistical map. x_{n+1} = 4x_n(1-x_n). There is no hidden seed beyond the current output. Thus if you have a scientific calculator that lets you refer to the value on screen then you can rig it into an RNG. Seeing a simple, non-programmable machine "misbehave" and act random melts my mind a little.

Beware that while it looks random, this is actually not a great rng. For example, it does not satisfy the central limit theorem.

That's still just a linear congruential PRNG with state sizeof(unsigned int). LCG is simple and has well-known flaws.

No it is not an LCG. When you distribute the terms there is a -4x_n^2. Therefore it is non-linear.

You might find this ultra-simple RNG interesting https://en.wikipedia.org/wiki/Rule_30

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact