All credits should go to George Marsaglia (who died in 2011).
He invented the xorshift method (2003), LCGs with good hyperplane behavior (1968), and the modern approach of combining multiple poorly behaving RNGs (via function combination and output mapping) to create strong ones, not to mention the "Diehard" tests he developed (1995), which are still used to evaluate the statistical quality of RNGs.
This "PCG" algorithm, which is just xorshift(LCG), is essentially a typical application of his methods, something he even states explicitly in his papers.
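To make the "xorshift(LCG)" structure concrete, here is a sketch modeled on the minimal public-domain pcg32: a 64-bit LCG advances the state, and the *previous* state is permuted by an xorshift followed by a data-dependent rotate to produce the 32-bit output. (The constants below are the ones used in the minimal pcg32 reference; the struct and function names are mine.)

```c
#include <stdint.h>

typedef struct {
    uint64_t state;  // internal LCG state
    uint64_t inc;    // stream selector; must be odd, hence the | 1
} pcg32_t;

static uint32_t pcg32_next(pcg32_t *rng) {
    uint64_t old = rng->state;
    // Plain 64-bit LCG step
    rng->state = old * 6364136223846793005ULL + (rng->inc | 1);
    // Output permutation: xorshift, then rotate by the top state bits
    uint32_t xorshifted = (uint32_t)(((old >> 18u) ^ old) >> 27u);
    uint32_t rot = (uint32_t)(old >> 59u);
    return (xorshifted >> rot) | (xorshifted << ((-rot) & 31u));
}
```

Note that this is also why the 32-bit generator needs 64-bit arithmetic: the output is a permutation of 64 bits of LCG state.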
Sadly, Marsaglia's home page with all his papers is no longer online, but they can be found scattered around the net.
The speaker mentions[0] that it is possible to get a different statistically random value every time you run the program, without any input to the program, apparently by using some tricks, but she declines to elaborate on camera. Can anyone explain how that works?
Does this have something to do with address space randomization?
> (This last one relies on the operating system placing myRNG at a different address every time the program is run. It's not as strong as the other techniques.)
Linux gives each process a 32-bit random seed, which is probably more useful (as is making a syscall). On an embedded system the address seed may well be useless anyway.
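The address-seed trick the quote refers to can be sketched like this (a hypothetical example; with ASLR the address at which the code is loaded changes on every run, so the derived seed changes too, while on systems without ASLR, such as many embedded targets, it degenerates to a constant):

```c
#include <stdint.h>

// Any function or object will do; its address is what carries the entropy.
static uint32_t my_rng_step(uint32_t x) { return x * 1664525u + 1013904223u; }

// Weak seed derived from where the code was loaded. This varies between
// runs only because of address space layout randomization (ASLR); it is
// far weaker than reading a real entropy source.
static uint64_t aslr_seed(void) {
    return (uint64_t)(uintptr_t)&my_rng_step;
}
```

Within one run the value is stable, so it only works as a per-run seed, not as an entropy stream.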
This is fine for quickly generating random numbers for simulations, just don't use it for cryptographic applications. If you want random numbers for crypto use a CSPRNG.
I'm not really qualified to speak as to its statistical properties, but I've used this generator a couple times.
The biggest downside I ran into is that it requires 64-bit numbers even for the 32-bit RNG, which means its very simple code becomes more complex on a platform with only 32-bit numbers, or in a scripting language. Also, it's not a CSPRNG, if that matters to you.
This is minor though, and I would use it again. Its flexibility for seeding and easy serialization outweigh this IMO (and it's still simple compared to e.g. mtrand).
I must have done something wrong, but I tried the basic C implementation (https://github.com/imneme/pcg-c-basic) on a quick Diehard-like test and it failed. I'm still trying to understand why. Probably not its fault, just that the code is not as forgiving and ready-to-consume as I had expected.
It passed for me. Where did you get the diehard test? If you compiled it directly from http://www.stat.fsu.edu/pub/diehard/ then it is because that source is broken.
I actually tried a modified OCaml implementation of a very simple test. The issue is probably due to bad seeding/initialization: if I use the given static seed, it passes, but if I try to provide any other seed, it fails on at least one test.
I'll try to find a more complete/better tested port, or try to do it myself.
I'm impressed by how fast this algorithm is. I wrote a toy implementation in Go, and it took only ~50 ns/op to generate random numbers. The distribution is very even too, but I'd have to rerun my tests to find the exact figures.
Wow, talk about a coincidence - I just implemented this algorithm today as the RNG for some bootstrapped statistics code. One thing I can say is it's very fast, but the code for seeding is not great on any platform without /dev/random.
A while ago I implemented the Mersenne Twister to generate 2D planets in a galaxy for a game I'm working on. I didn't know about this PCG, and it's an interesting read!
My selection of random number generator had been limited by the reproducibility requirement (so that each planet is stored only as the seed for the generator).
I'll give it a try later on; it'd be interesting to see how a sampling of planets turns out from both RNGs, and whether they are visually different. The RNG is used both for features and for the Perlin noise that generates cloud and land textures, so any artifact should show up easily in the textures.
Interesting. I don't think people realize just how slow rand() can be if it is called frequently in your C/C++ program. Marsaglia's xorshift is the fastest algorithm I know of that also gives OK statistical quality.
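For reference, one of the xorshift generators from Marsaglia's 2003 paper is just three shifts and three XORs per output; this sketch uses the 13/7/17 shift triple and the nonzero seed constant from that paper (the state must never be seeded to zero, or the generator gets stuck at zero):

```c
#include <stdint.h>

static uint64_t xs_state = 88172645463325252ULL;  // any nonzero seed

// Marsaglia-style 64-bit xorshift; period 2^64 - 1 over nonzero states.
static uint64_t xorshift64(void) {
    uint64_t x = xs_state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return xs_state = x;
}
```

It compiles down to a handful of register operations, which is why it so easily outruns a typical libc rand().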
Looks like a very good and fast PRNG. How could it possibly achieve higher statistical quality than a secure PRNG? I guess that depends on how you define quality.
Unless it's secure, I see no motivation to use it.
In their table, they acknowledge Arc4Random and ChaCha20 as secure. The only negatives against ChaCha20 are that it's not 'fast enough' and that it lacks k-dimensional equidistribution. In my experience it has never shown up in my profiling, so I feel it's fast enough; and I'm not sure why I would want k-dimensional equidistribution.
So... I'll keep using ChaCha20 when I need to (it's part of NaCl/libsodium if you don't know where to get it from).
What's your application? When I wrote a ray tracer that did photon mapping, I needed about 300 000 000 random numbers per light to get half-decent results. Admittedly, I could have done more to optimize my use of those numbers, but the point remains that you need huge numbers of random samples.
I used mt19937 for this and it worked ok. I suspect that using pcg might have shaved a few percentage points off the total run time of the program (maybe I should do a benchmark sometime). If so, not a huge win, but noticeable. Especially if you're doing renders on a larger scale than I was. A couple percentage points could trim a few computers from a render farm.
Just as you wouldn't want to use a general purpose PRNG for security applications so too you wouldn't want to use a secure PRNG for general purposes, particularly demanding applications such as Monte Carlo simulations.
Your dismissal strikes me as both ignorant and rude.
What are the random number data rates for those applications?
One of the good features of ChaCha is that it is fast: a modern CPU can churn out gigabits per second, and in hardware you can do it at low clock frequencies. This is the reason we use ChaCha as the CSPRNG in the Cryptech HSM.
> Just as you wouldn't want to use a general purpose PRNG for security applications so too you wouldn't want to use a secure PRNG for general purposes, particularly demanding applications such as Monte Carlo simulations.
Why not?
It's perfectly valid to use a secure RNG for non-secure purposes.
As I mentioned, I've used ChaCha20 for those sort of purposes too. And the random number generation barely shows up as a fraction of the runtime.
i.e. improving 1% of the runtime by 99% isn't worth the overhead of having more than one random number generator.
The reason why is that crypto RNGs are much slower than something less secure. It really depends on the application, but in the code I wrote today a significant fraction of my application's time was spent in the RNG function. By replacing the RNG I managed to speed up my code a bit over 2-fold. I am not doing crypto work, so this is great.
One of the benefits of a PRNG is reproducibility. If you are doing a Monte Carlo simulation, you can get good results while still starting runs from predictable seeds (e.g. [0,1,2,3,4,...]).
This gives you good-quality random numbers during the simulation run, yet lets others reproduce your results and extend them with more runs if required.
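A sketch of that workflow, using a toy 64-bit LCG as a stand-in for whatever PRNG the simulation actually uses (the names here are hypothetical): run k is always seeded with k, so anyone can rerun and reproduce run k exactly, or extend the study by adding runs k+1, k+2, and so on.

```c
#include <stdint.h>

// Toy LCG (Knuth's MMIX constants) standing in for the real PRNG.
static uint64_t lcg_state;
static void lcg_seed(uint64_t s) { lcg_state = s; }
static uint32_t lcg_next(void) {
    lcg_state = lcg_state * 6364136223846793005ULL + 1442695040888963407ULL;
    return (uint32_t)(lcg_state >> 32);  // high bits have the best quality
}

// One Monte Carlo run: estimate pi by sampling points in the unit square.
// Because run k is always seeded with k, it is reproducible bit-for-bit.
static double estimate_pi(uint64_t k, int samples) {
    lcg_seed(k);
    int inside = 0;
    for (int i = 0; i < samples; i++) {
        double x = lcg_next() / 4294967296.0;  // uniform in [0, 1)
        double y = lcg_next() / 4294967296.0;
        if (x * x + y * y < 1.0) inside++;
    }
    return 4.0 * inside / samples;
}
```

Publishing just the seed list is then enough for someone else to regenerate every run.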
I wrote a standalone C++11 (compatible with the standard RNG library) implementation of ChaCha (with the number of rounds as a parameter), available here: https://gist.github.com/orlp/32f5d1b631ab092608b1
This is also the implementation used for the benchmark at the pcg-random.org homepage.
It's essentially the same, except for the API. However, you linked to the reference implementation. My implementation also contains a SIMD implementation, if your CPU supports it. I assume that libsodium has a similar implementation in its codebase somewhere too.
My SIMD implementation can be made to run a bit faster, but that would mean computing more blocks in parallel, and thus increasing the state size of the RNG even more, so I chose to compute only one block at a time. An optimized stream cipher implementation will embed implementations for several different parallel block sizes, to achieve optimal speed for long messages.
---
ChaCha20 is a stronger, more modern variation on Salsa20, but the differences are minor.
But, to be precise, libsodium prefers XSalsa20 over ChaCha20. The key feature of XSalsa20 is that it supports a 192-bit nonce instead of the 64-bit nonce of ChaCha20. That makes it safe to use a randomly generated nonce, whereas randomly generating 64-bit nonces would eventually produce collisions (by the birthday bound, after roughly 2^32 messages). I believe this is the reason they chose XSalsa20.
XSalsa20 is a simple variation on Salsa20 that does a bit of extra initialization to turn the normal 64-bit nonce into a 192-bit one. I assume the same construction could be applied to ChaCha20, but since djb (the author of Salsa, XSalsa, and ChaCha) hasn't done it, I think libsodium went with the safe approach and just used XSalsa20.
'A bit faster' is an understatement. On the benchmarking machine used by the PCG author, you would get a ~2.5x speedup over that implementation, which would put ChaCha20 at roughly 5 amortized cycles per 32-bit word. If you go with ChaCha8, which is still cryptographically secure, that becomes 2-3 amortized cycles per word. The flagship PCG generator from the paper, 'PCG XSL RR RR 128', runs at 1.81 nanoseconds per word on the same machine, which converts to ~6 cycles per word. So I am not inclined to believe the claim, right on the front page, that ChaCha20 is 'fairly slow'.
Unless you're hurting for space (and on a Haswell machine you probably aren't), the case for using a cryptographic generator everywhere is strong. And if you are hurting for space, or are on a less desktop-oriented architecture, you probably won't like 64-bit integer multiplications and variable rotation counts either.
By the way, you're using the aligned _mm_load_si128 and _mm_store_si128 intrinsics to load and store the block: https://gist.github.com/orlp/32f5d1b631ab092608b1#file-chach.... But there's no guarantee that the block is aligned to 16 bytes, so that code may crash with some compiler/platform combinations.
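One portable fix, assuming the buffer really can be unaligned, is to switch to the unaligned intrinsics; on recent x86 cores _mm_loadu_si128 costs essentially nothing extra when the pointer happens to be aligned anyway. A minimal sketch (the function name is mine, not from the gist):

```c
#include <emmintrin.h>  // SSE2 intrinsics
#include <stdint.h>

// XOR one 16-byte block of src into dst without assuming any alignment.
static void xor_block(uint8_t *dst, const uint8_t *src) {
    __m128i a = _mm_loadu_si128((const __m128i *)dst);  // unaligned load
    __m128i b = _mm_loadu_si128((const __m128i *)src);  // unaligned load
    _mm_storeu_si128((__m128i *)dst, _mm_xor_si128(a, b));
}
```

The alternative is to document and enforce a 16-byte alignment requirement on the buffer and keep the aligned intrinsics.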
The fastest ChaCha20 implementation available (from http://bench.cr.yp.to/results-stream.html) on a modern Intel processor is 1.2 cycles / byte (cpb) for ChaCha20 and 0.6 cpb for ChaCha8. ChaCha is very fast.
The drawback, however, is that these implementations only get that fast because ChaCha is embarrassingly parallel. The fastest implementations compute 16-24 blocks in parallel efficiently with AVX2. That's 1.5 kB of random data generated at once. And it's only this fast in a closed loop, where all code and memory are hot.
For the above reasons and simplicity I've chosen to not do a parallel SIMD implementation in my gist.
But I agree that 'fairly slow' is not really fair, at least not in combination with 'Good' statistical quality (rather than Excellent, because the author found faults in ChaCha2). ChaCha8 is blazing fast, and so far no cryptographer has mounted a successful attack against it. Previously, though, ChaCha was described as 'slow', and I attempted to fix that by emailing the author with my implementation.
-
Interestingly, I was working on an AEAD design similar to ChaCha (sadly I never got to finish it before the http://competitions.cr.yp.to/caesar.html deadline) that did 1.5 cpb authenticated encryption on AVX2: http://www.liacs.nl/~opeters/orlein.pdf. SIMD plus embarrassingly parallel ciphers is a strong combination.
-
Thanks for the warning about the alignment. Because I tacked the SIMD implementation on later, I forgot to add the alignment requirement; I've fixed that now.
> The drawback however, is that these implementations only get that fast because ChaCha is embarrassingly parallel.
That is my point exactly. Why would you design, in this day and age, a generator that doesn't vectorize well? It will not take full advantage of the CPU. Even with parallel streams, large integer multiplication and variable-length rotation are SIMD-killers. Regarding the 1.5KB of data, I suspect you can get away with less than that if you specialize to this application, but note that this is still around half the state size of mt19937.
https://www.youtube.com/watch?v=45Oet5qjlms
It's a bit long, but I highly recommend it. I learned a lot about RNGs from it.