Response by the author:
Personally, I use a SIMD Xoshiro256+ when I want a solid and fast RNG (although I mostly use Julia's default -- a MersenneTwister).
My use cases are Monte Carlo, where "passes statistical tests" is probably all that's needed for good results.
Maybe I should just try an mcg.
PCG is also not that good. There do exist proper and fast PRNGs though. DSFMT has the same problems as MT.
I'm just adding some of them to gsl and dieharder btw.
LCGs have an awkward habit of bad projections in 2D.
What's the best argument against PCG?
Preliminary results at https://github.com/rurban/dieharder/blob/pcg/QUALITY.md
Re PCG: statistical weaknesses compared to the good xoshiros or wyrand, and much slower.
I'm still missing SFMT (2x faster, due to SIMD), and the 2 newer PCGs with a good and a fast mixer.
But given the current results, there are a few new PRNGs which are so much better and faster than everything else: wyrand, xoshiro128+, xoroshiro64, xoshiro128++.
One property I anecdotally observed about it is that it is a little sensitive to the values you choose for its seed. Too many zero bits or making two generators with nearly-identical seeds would create visible correlation in the outputs. It worked much better if all values were hashed before using them for seeding.
Hence any cryptographic use of PCG is definitely out of the question.
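A minimal sketch of that seeding advice (my own illustration, not the commenter's code): run whatever raw seed material you have through a mixing function such as splitmix64 before it ever touches the generator's state, so that seeds like 1 and 2 no longer produce nearly identical streams.

    #include <cstdint>

    // splitmix64: a small mixing function commonly used to expand/scramble seeds.
    static uint64_t splitmix64(uint64_t& x) {
        uint64_t z = (x += 0x9E3779B97F4A7C15ull);
        z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ull;
        z = (z ^ (z >> 27)) * 0x94D049BB133111EBull;
        return z ^ (z >> 31);
    }

    // Fill an n-word generator state from one user seed; every word is hashed,
    // so even all-zero or consecutive user seeds end up well scrambled.
    void seed_state(uint64_t user_seed, uint64_t state[], int n) {
        for (int i = 0; i < n; ++i)
            state[i] = splitmix64(user_seed);
    }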
-  https://github.com/mratsim/weave/issues/147
-  https://github.com/mratsim/trace-of-radiance/blob/99f7d85/tr...
- Usage for parallel raytracing: https://github.com/mratsim/trace-of-radiance/blob/99f7d85/tr...
Beyond the JAX paper, the Pedigree-based RNG from Leiserson was interesting as well:
And Haskell's splittable RNG:
> PCG Family: Excellent
> Fewer rounds result in poor statistical performance; ChaCha2 fails statistical tests badly, and ChaCha4 passes TestU01 but sophisticated mathematical analysis has shown it to exhibit some bias. ChaCha8 (and higher) are believed to be good. Nevertheless, ChaCha needs to go to more work to achieve satisfactory statistical quality than many other generators.
* too slow (for a scant few applications)
* too much state & code (for a scant few applications)
* can’t find a nice implementation (there are some, though)
Everything else written there is FUD. And this whole "statistical tests" thing is a red herring anyway. There’s a field that concerns itself with "statistical quality": it’s called cryptography. They have plenty of sophisticated mathematical analysis.
Cryptographic RNGs achieve their strength through large round counts. ChaCha20 runs 20 rounds (80 quarter-rounds) of its algorithm, which means you're basically doing:
randomNumber = hash(hash(hash(...hash(seed)...)))
After 80 iterations of your hash function, you'd HOPE it was random looking!! In effect, cryptographic RNGs are built for safety, and have huge safety margins.
The "natural" approach to making a non-secure (but faster) RNG is to reduce the hash rounds to a single cycle. IE: instead of running 80-quarter rounds of ChaCha, why not just run 1x quarter-round of ChaCha?
Well, it turns out that 1 round of ChaCha20 is utter crap. Complete crap, worse than an LCGRNG.
AES is similarly designed for 8+ rounds. In my experiments, it took 4 or 5 rounds before AES would achieve a decent distribution of bits!!
It took a fair bit of effort for me to turn AES into a random number generator. I analyzed AES's bit-flow, and designed a "counter function" that worked well with AES's single-round weakness. Even then, I never accomplished a 1-round of AES RNG, I had to use 2-rounds of AES to make a decent bit distribution.
In particular: 1 round of AES only shuffles bits across a 32-bit DWORD. You need the 2nd round (or more) to spread the bits across the whole 128-bit group. (That is: the "MixColumns" and "ShiftRows" steps of AES, which only really take effect on the 2nd round or later.)
If anyone could figure out a special "seed" function that can work with AES's single-round weakness, they probably can make a much faster RNG than what I accomplished. Or maybe, people can just be "happy" with 32-bits of well-shuffled bits? A 32-bit RNG (with 128-bit seed) is probably still useful to somebody.
Quoting JP Aumasson’s Too Much Crypto:
The best result on ChaCha is a key recovery attack on its 7-round version, with 2^237.7 time complexity (the exact unit is unclear) using output data from 2^96 instances of ChaCha, that is, 2^105 bytes of data. On 6 rounds, a similar attack can run in time & data 2^116 & 2^116, or 2^127.5 & 2^37.5. On 5 rounds, the attack becomes practical due to the lower diffusion, and runs in 2^16 time and data.
You see what’s happening there? 5 rounds is easily broken, 6 rounds is borderline secure, 7 rounds is secure. Same thing with AES; these algorithms were never meant to be used the way you’re using them.
I agree! PRNGs are surprisingly difficult! The main difference, I argue, is that the statistical tests applied to crypto algorithms are "harder" than the statistical tests applied to PRNGs.
In a standard PRNG, you're happy if you can, say, pass the Chi^2 test. But with a crypto-PRNG, you're aiming to beat differential cryptanalysis.
What is differential cryptanalysis? Well, it's just a pair-wise statistical test across your bits!! It's a far stricter test than Chi^2, but the overall concept is the same: anything that differs from a uniform random distribution fails.
> You see what’s happening there? 5 rounds is easily broken, 6 rounds is borderline secure, 7 rounds is secure. Same thing with AES; these algorithms were never meant to be used the way you’re using them.
Indeed. But this is about those who need "simple" PRNGs for simulation purposes.
Since PRNGs are designed for speed, their statistical tests are far easier than differential cryptanalysis. You're "only" aiming for Chi^2, or other tests (see PractRand or BigCrush).
ChaCha at 5 or 6 rounds is inevitably going to be slower than a single round of xorshift128!!
True, ChaCha "scales" to higher rounds (xorshift128 will probably fail differential cryptoanalysis even with 20+ rounds applied... while ChaCha continues to get "more secure" the more rounds you apply all the way up to 80+ rounds).
But that's a weakness PRNG users are willing to accept. As long as the bits are random enough for a Monte Carlo simulation (for raytracing, or weather simulation, or... AIs), you're happy.
Because PRNG tests are "easier" than crypto tests, you can often make do with a single round alone. (See the LCGRNG: a single round of (seed = seed*K1 + K2) is sufficient to pass Chi^2 tests.)
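As a rough illustration of how low that bar is (my own sketch, not from the thread): one multiply-add per output, with the high byte of the state fed into a plain chi-square test over 256 buckets.

    #include <cstdint>
    #include <cstdio>

    int main() {
        // single-round LCG: seed = seed*K1 + K2 (Knuth's MMIX constants)
        uint64_t seed = 0x12345678u;
        const int N = 1 << 20;
        double counts[256] = {0};
        for (int i = 0; i < N; ++i) {
            seed = seed * 6364136223846793005ull + 1442695040888963407ull;
            counts[seed >> 56] += 1.0;               // bucket by the high byte
        }
        const double expected = double(N) / 256.0;
        double chi2 = 0.0;
        for (int b = 0; b < 256; ++b)
            chi2 += (counts[b] - expected) * (counts[b] - expected) / expected;
        // 255 degrees of freedom: a uniform source gives values near 255
        printf("chi^2 = %.1f over 255 degrees of freedom\n", chi2);
    }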
I've never designed a crypto-algorithm before. But based on my understanding of Crypto... the creation of PRNGs is a surprisingly similar field. GF(2), Bijections, mappings, bit-tests, statistical tests... they have applications to both crypto and PRNGs.
Secondly, ChaCha1–7 aren’t a thing, outside of a cryptanalysis paper anyway. Primitives like ChaCha need a certain number of rounds to do what they do.
Finally, ChaCha8 doesn’t just “pass statistical tests,” it’s cryptographically secure.
The reason nobody uses it like that is that it is slower than most other PRNGs.
- Deep Learning,
- Monte-Carlo simulation (finance, reinforcement learning, game AI, raytracing).
- Load balancing
- Peer selection (if non-adversarial; otherwise use a CSPRNG)
Also, the non-determinism of a CSPRNG (and of floating point) would be a huge issue for debugging machine learning models.
ChaCha8 has a throughput of 2 GB/s and xoshiro256++ has a throughput of 8 GB/s.
For Monte-Carlo, the RNG is definitely the bottleneck.
For load balancing of short tasks the RNG is the bottleneck.
You can eke out another 10% or so if you dial it back to the recommendations of the "Too Much Crypto" paper:
9x AES rounds (versus 10).
What is questionable is the "somewhat secure" argument. Either you don't want adversaries to predict your numbers and you should use a good CSPRNG, or you don't care and predictability is not a property that matters.
As for reproducibility, all PRNGs give a reproducible sequence if you know the internal state, including the secure ones. You have to mix in a source of entropy to make them non-deterministic. The predictability we are considering here is when the attacker doesn't have access to the internal state.
What makes PCG attractive for microcontrollers is really that it's of known good quality, its implementation is very small and very simple, and it ends up generating efficient code for 32-bit processors (e.g., ARM Cortex-Ms). That is not, and can not ever be, true for something like a 128-bit shift register. PCG is great for just tossing in when I'm working on a platform that has no built-in library rand() function, or where it's busted, or where I don't want to bother figuring out if/how badly it's busted. (With how easy PCG is to use, that last one covers every embedded platform ever....)
PCG is not the best RNG out there: it's not the highest quality, it's not the fastest, it's not the least predictable, it's not the strongest theoretically. (It isn't the smallest, either, but it's quite small and the smaller RNGs I know of are either code-golfed, which doesn't count, or truly garbage.) But it is a nice equilibrium between all of those things. And did I mention it's simple? Simple is really, really nice to have :)
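To show what "small and simple" means here: this is roughly the minimal pcg32 step, written from memory of the reference C implementation (the canonical version is on pcg-random.org, so treat this as a sketch).

    #include <cstdint>
    #include <cstdio>

    struct pcg32 {
        uint64_t state;   // RNG state, advances by one LCG step per output
        uint64_t inc;     // stream selector; forced odd below via (inc | 1)
    };

    uint32_t pcg32_next(pcg32& rng) {
        uint64_t oldstate = rng.state;
        rng.state = oldstate * 6364136223846793005ULL + (rng.inc | 1);
        // output function: xorshift the old state, then a data-dependent rotate
        uint32_t xorshifted = (uint32_t)(((oldstate >> 18u) ^ oldstate) >> 27u);
        uint32_t rot = (uint32_t)(oldstate >> 59u);
        return (xorshifted >> rot) | (xorshifted << ((-rot) & 31));
    }

    int main() {
        pcg32 rng{0x853c49e6748fea9bULL, 0xda3e39cb94b95bdbULL};   // arbitrary seed/stream
        for (int i = 0; i < 4; ++i) printf("%08x\n", pcg32_next(rng));
    }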
I'd suspect the majority of random numbers generated in the world don't need crypto, they need speed. The world's supercomputers are not doing crypto, they're doing simulations. Most people are not doing crypto most of the time - and when they're gaming in any manner, there's no need to pay the CSPRNG tax.
The only place one should use a CSPRNG is for adversarial situations.
Use the right tool for the job, correct?
There is a time and place for non-CS RNGs, and it is in randomized methods whose correctness/results do not rely on the RNG.
 "Pattern" section here: http://builder.openhmd.net/blender-hmd-viewport-temp/render/...
http://extremelearning.com.au/unreasonable-effectiveness-of-... (yes, no HTTPS, but incredibly illustrative)
Discussed in 2018: https://news.ycombinator.com/item?id=17873284
A good non-cryptographic random number generator like xoshiro256 can generate at 4 times the speed of a chacha20 random number generator. Depending on how many models you need to compute and the desired precision of those models and how many random samples you need per model, this can easily be the difference between hours and days, or days and weeks.
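For context on where that speed comes from, a xoshiro256++ step is only a handful of adds, shifts, rotates and xors per 64-bit output. The sketch below is written from memory of Blackman and Vigna's public-domain reference code, so check the official source before relying on the exact constants.

    #include <cstdint>

    static inline uint64_t rotl(uint64_t x, int k) {
        return (x << k) | (x >> (64 - k));
    }

    static uint64_t s[4];   // state; must be seeded to something non-zero
                            // (e.g. with splitmix64, as the authors recommend)

    uint64_t xoshiro256pp_next() {
        const uint64_t result = rotl(s[0] + s[3], 23) + s[0];
        const uint64_t t = s[1] << 17;
        s[2] ^= s[0];
        s[3] ^= s[1];
        s[1] ^= s[2];
        s[0] ^= s[3];
        s[2] ^= t;
        s[3] = rotl(s[3], 45);
        return result;
    }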
Whenever a post on PRNGs comes up, I'm reminded of this post from a while back about Chrome:
Near the end is a Monte Carlo estimate of Pi using 10^10 iterations:
  Browser   Estimate        |Error|         Time
  Chrome    3.1418697000    0.0002770464    1301.903s
  Firefox   3.1416998084    0.0001071548    249.595s
  Safari    3.1416692728    0.0000766192    7172.207s
That's around 30x slower than a decent version of PCG, which also has a lot of variants to cover a lot of use cases.
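For reference, the estimator being benchmarked is essentially this (a minimal sketch using the standard library's Mersenne Twister, not whatever the browsers' Math.random implementations do):

    #include <cstdio>
    #include <random>

    int main() {
        std::mt19937_64 gen(42);                        // fixed seed for reproducibility
        std::uniform_real_distribution<double> uni(0.0, 1.0);
        const long long N = 100000000;                  // 1e8 here; the post used 1e10
        long long inside = 0;
        for (long long i = 0; i < N; ++i) {
            double x = uni(gen), y = uni(gen);
            if (x * x + y * y < 1.0) ++inside;          // point falls inside the quarter circle
        }
        printf("pi ~= %.10f\n", 4.0 * double(inside) / double(N));
    }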
For MC path tracing (graphics) this is at least an order of magnitude too slow.
Even fairly trivial scenes can require over 20 random floats per sample and more complex scenes over 100. And you want to actually use those random numbers for calculating reflections, intersections etc. In the renderer I worked on, the PRNG typically took at most 10% of total CPU time.
So assuming the above, those 4 GB/s would translate into less than 5k samples/sec, which would be considered quite slow.
Just to refresh my memory, I ran a quick test with a fairly simple scene. There are 6 samples for the camera, at least 3 per path vertex (up to depth 32) and another 2 per light (8 in this scene). There might be some more I'm forgetting, but let's use that as a lower bound. For this scene that means 6 + 3 * 32 + 2 * 8 = 118 random numbers per sample. On a single core I got 14.04 kS/s, so that works out to at least 1.6M random numbers per second.
And this renderer was focusing more on physics and fidelity rather than speed. For animations and similar you'd need a renderer that's at least an order of magnitude faster than the one I worked on.
For further reading you could try PBRT. It's a great book but not free. Luckily, though, the free chapter is about sampling and reconstruction, so you can get a feel from that for what is needed.
I also recall that we tried MT but found a quite noticeable impact on speed compared to what we had, and on the PCG performance page MT is listed in the tens of GB/s. Perhaps our implementation was considerably worse though, I can't recall us testing it alone like that.
As I mentioned though, that renderer was more about physics and fidelity. Typical render times for even a preview would be a minute, for final single-frame stills tens of hours to several days on a single computer was common.
For animations you don't have that kind of time, so you need orders of magnitude more samples/sec.
Sibling comments are completely correct. Graphics simulations often require quasirandom sequences; the particular sequence is not important, but any correlations in the sequence will be visible in the output as artifacts, so we want a decorrelated sequence.
If this is not enough of a real-world example for you, then Monte Carlo methods also show up in weather modeling, financial planning, and election forecasting. In those predictive systems, there is extreme uncertainty and dependence on initial conditions, which we model with quasirandom sequences and seed numbers respectively. By running many simulations with different random sequences and then examining all of the results statistically, we can get good estimates of future behavior.
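To make "quasirandom" concrete, here is a minimal sketch (my own, not from the thread) of a 2D Halton sequence: each dimension uses the radical inverse in a different prime base, and successive points fill the unit square far more evenly than independent uniform draws.

    #include <cstdio>

    // radical inverse of i in the given base; Halton uses a different prime per dimension
    double radical_inverse(unsigned i, unsigned base) {
        double f = 1.0, r = 0.0;
        while (i > 0) {
            f /= base;
            r += f * (i % base);
            i /= base;
        }
        return r;
    }

    int main() {
        for (unsigned i = 1; i <= 8; ++i)   // first few 2D Halton points (bases 2 and 3)
            printf("(%.4f, %.4f)\n", radical_inverse(i, 2), radical_inverse(i, 3));
    }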
Edit: Oh, you're not arguing in good faith. Okay, cool story. Best of luck with whatever imaginary idea of "unpredictability" you're trying to define, but you might want to return to actual maths at some point.
Every other thing in the PCG website and in this thread attempts to be similar to random (with varying degrees of success). No one here is trying to intentionally be different from random.
> Sibling comments are completely correct.
No sibling comment says anything about quasirandom. The sibling comment mentioning graphics (by Karliss) actually sort of disagrees with you, and says that we want stuff to be as close to random as possible for graphics so that players don't see patterns. Of course there can be multiple types of graphics situations, some where quasirandom would be useful (the blog you linked), some where a real random approximation would be useful (Karliss's situation).
> Wouldn't the second category be more less CSPRNG?
The website says PCG is "Challenging" to predict and ChaCha20 is "Secure". So I think this means that PCG isn't a CSPRNG, but still is harder to predict than Mersenne Twister.
But is this useful at all? I don't know. This thread started by andreareina asking that exact question: "What applications require unpredictability where you wouldn't go full csprng?" People immediately then misunderstood the question because they misunderstood how the word unpredictability is being used.
A player not being able to mentally predict enemies and not seeing visible patterns sounds like it might fall more under the website's definition of "Statistical Quality".
For example, you can increase the quality of an RNG sequence r(n) by adding a multiplicative hash, r'(n) = A r(n) for some good constant A. A consumer treating r' as a black box expects B r'(n) = AB r(n) to be a good random sequence, but if B is close to the inverse of A, the constant AB may no longer be "good".
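A concrete (hypothetical) way to see the failure mode: if the consumer's multiplier B happens to be the modular inverse of A mod 2^64, then A*B = 1 and the "improvement" cancels out exactly, returning the raw sequence.

    #include <cstdint>
    #include <cstdio>

    // modular inverse of an odd 64-bit constant via Newton iteration
    static uint64_t inverse_mod_2_64(uint64_t a) {
        uint64_t x = a;                  // correct to 3 bits for odd a
        for (int i = 0; i < 5; ++i)
            x *= 2 - a * x;              // each step doubles the number of correct bits
        return x;
    }

    int main() {
        const uint64_t A = 0x9E3779B97F4A7C15ull;   // a "good" odd multiplier
        const uint64_t B = inverse_mod_2_64(A);     // an unlucky consumer constant
        uint64_t r = 0x0123456789ABCDEFull;         // some raw generator output r(n)
        uint64_t improved = A * r;                  // r'(n) = A * r(n)
        uint64_t consumed = B * improved;           // (A*B) * r(n) == r(n) again
        printf("%016llx\n", (unsigned long long)consumed);
    }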
There is no need for cryptographic strength, which is costly; pseudo-randomness is enough and much cheaper.
People were later able to use the leaked, alleged RC4 algorithm to decrypt messages encrypted by Notes. That basically confirmed that the algorithms are the same.
I use it to deal tiles in my version of online Azul: https://tiles.kevincox.ca
I also used it to simulate the Red7 game for strategy testing: https://gitlab.com/kevincox/red7-sim
I am not fluent in C or Rust and am time-constrained; otherwise I would have taken a shot at it.
Or better, with an entropy source like rdrand or rdseed?
The main use for rdrand is to mix it with other entropy sources to produce a seed for another rng.
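A sketch of that pattern (assuming an x86 CPU with the RDRAND extension; _rdrand64_step is the intrinsic from immintrin.h, and the second entropy source here is just a timestamp stand-in):

    #include <cstdint>
    #include <chrono>
    #include <immintrin.h>   // _rdrand64_step; build with -mrdrnd

    // Mix hardware randomness with another entropy source to derive a seed
    // for a fast PRNG; RDRAND can fail, so handle the zero-success case.
    uint64_t make_seed() {
        unsigned long long hw = 0;
        if (!_rdrand64_step(&hw))
            hw = 0x9E3779B97F4A7C15ull;              // fallback constant if RDRAND fails
        uint64_t t = (uint64_t)std::chrono::steady_clock::now().time_since_epoch().count();
        return hw ^ (t * 0xBF58476D1CE4E5B9ull);     // cheap mix of the two sources
    }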
They are a neat theoretical construction; however, I believe they have some serious flaws when attempting to split and construct multiple streams.
Honestly though RDRAND is cheap enough that I'd take true randomness over determinism.
but I still need to add more tests to get a better quality comparison against the xoshiros
Can an algorithm really produce a non-reproducible result?
> The BSD implementations (which are the only widely used ones) have several additional issues
> • No facility for a user-provided seed, preventing programs from getting reproducible results
> • Periodically “stirs” the generator using kernel-provided entropy; this code must be removed if reproducible results desired (and is removed from the C++ adaptation I created for testing purposes)
In a more direct sense, yes. If you're allowed to keep secrets (like internal state), it's possible to produce random numbers algorithmically for certain (weak) definitions of random, even with a deterministic machine. A simple example of this would be sequential bits of a normal number. As you strengthen your definition of randomness you pretty quickly get into non-computable territory though.
#1: Have a "counter" of some kind. The simple counter trivially creates a maximum length cycle. "seed++" is one such counter.
#2: Have a "hash" that converts the counter into a random number, using a 1-to-1 bijection. There are a great many 1-to-1 bijections.
So your RNG is basically:
oldSeed = seed;
seed = counterFunction(seed);
return hash(oldSeed);
I used this methodology to make a pretty fast generator myself. The coding was done in a weekend, but it took some weeks to test: https://github.com/dragontamer/AESRand
"counterFunction" is a SIMD 2x64-bit Add over the XMM register. I chose an odd number arbitrarily (which are just the first 16 prime numbers: 0x01030507...), which will trivially cycle after 2^64 numbers.
"hash" is aesenc(aesenc(seed, constant), constant), where constant is an arbitrary (chosen to be the same odd number that was used in the "counterFunction" step). The 2nd parameter to aesend is just an XOR.
I also run a 2nd parallel aesdec(aesenc(seed, constant), constant) to generate a 2nd set of 128-bit random numbers, so 256 bits are produced per hash. Yes, a 128-bit seed "hashes" into a 256-bit random number. Since this passes statistical tests, it's probably fine, and it's a good source of instruction-level parallelism.
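This is not the author's AESRand code, just a rough sketch of the structure described above (SIMD 64-bit adds as the counter, two aesenc rounds as the hash; the parallel aesdec stream is omitted). The increment constant is a placeholder rather than the prime-byte constant mentioned.

    #include <cstdint>
    #include <cstdio>
    #include <immintrin.h>   // AES-NI intrinsics; build with -maes

    static __m128i state = _mm_set_epi64x(0, 1);               // 128-bit counter/seed
    static const __m128i INC = _mm_set_epi64x((long long)0x9E3779B97F4A7C15ull,
                                              (long long)0xD1B54A32D192ED03ull);  // odd placeholder increments

    static __m128i next128() {
        __m128i old = state;
        state = _mm_add_epi64(state, INC);       // counterFunction: 2x 64-bit add
        __m128i h = _mm_aesenc_si128(old, INC);  // hash, round 1 (2nd argument is XORed in)
        return _mm_aesenc_si128(h, INC);         // hash, round 2
    }

    int main() {
        for (int i = 0; i < 4; ++i) {
            alignas(16) uint64_t out[2];
            _mm_store_si128((__m128i*)out, next128());
            printf("%016llx %016llx\n", (unsigned long long)out[0],
                                        (unsigned long long)out[1]);
        }
    }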
All in all, I achieved 30 GB/s of random numbers, or 256 bits of random output every 3.7 cycles, passing BigCrush, TestU01, and PractRand.
AES is a 1-to-1 bijection: that's a necessary condition for AES to be decoded. Though I use AES, this function is NOT cryptographically sound. I'm just using AES because it's the fastest high-quality bijection in the entire modern CPU instruction set. x86, ARM, and POWER9 all implement single-cycle AES instructions.
It is possible to write a portable AESRand across the different CPUs (ARM, x86, and POWER9). I got far enough to prove that it's possible, but then my real work started to ramp up and I had to drop this hobby project. It's very tricky to do: ARM, x86, and POWER9 implement AESENC slightly differently, shuffling the fundamental steps in different ways.
Or in the case of POWER9, it executes in big-endian mode instead of little-endian (like ARM / x86). Redesigning the algorithm to be portable across the shuffled "SubBytes" and "MixColumns" steps between the CPUs is tricky but possible.
Let's take the PCG author's comments one by one:
Re: Predictability. I don't care. Truly. If unpredictability is important, the application must use a cryptographic random number generator. Those are the only known family that is hard to predict. Valuing unpredictability from a non-cryptographic PRNG has always been one of PCG's demerits in the xoshiro-versus-PCG debacle.
Re: seed_seq's problems. The biases she's making out to be catastrophic are really quite small. In actual scientific Monte Carlo simulations, they don't affect things at all. I want to see the difference between two sets of 10^6 Gaussian-distributed floats, one generated with seed_seq seeded optimally and one with it seeded sub-optimally. Because the "motivating scenario" isn't one. You shouldn't ever be using a PRNG if all you need is one random number; you just grab a seed from a strongly-random generator like /dev/random.
The argument about seed_seq failing to be a bijection is completely irrelevant. You just need it to uniquely select one internal state of MT from the one initial value. So long as each sequence of the first 600-some-odd values drawn from it is unique, you've done that.
The demonstrated bias in MT's initial conditions doesn't matter one whit.
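For concreteness, the two seeding strategies being contrasted look roughly like this in standard C++ (a sketch, not anyone's production code):

    #include <cstdio>
    #include <random>

    int main() {
        // expand a little seed material through std::seed_seq; the mt19937
        // constructor uses it to fill the full 624-word internal state
        std::seed_seq seq{2024u, 7u, 42u};
        std::mt19937 mt_from_seq(seq);

        // or, if one strong random value is all you need, grab it from the
        // OS entropy source and skip seed_seq entirely
        std::random_device rd;
        std::mt19937 mt_from_rd(rd());

        printf("%u %u\n", (unsigned)mt_from_seq(), (unsigned)mt_from_rd());
    }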
Concepts and modules (supposedly) fix these issues, and I'm not going to use ranges again until those two are very mature features of C++. Even if you ignore this stuff, it's incredibly "C++ified" in just the worst way, with incredibly obtuse templates and type concepts all over the place.