All credits should go to George Marsaglia (who died in 2011).
He invented the xorshift method (2003), LCGs with good hyperplane behavior (1968), and the modern approach of combining multiple poorly behaving RNGs (via function combination and output mapping) to create strong ones, not to mention the "Diehard" tests he developed (1995), which are still used to evaluate the statistical quality of RNGs.
This "PCG" algorithm, which is just xorshift(LCG), is essentially a typical application of his methods, something he even states explicitly in his papers.
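To make the "xorshift(LCG)" structure concrete, here is a sketch modeled on the minimal public-domain pcg32: a 64-bit LCG advances the state, and the *previous* state is permuted by an xorshift followed by a data-dependent rotate to produce the 32-bit output. (The constants below are the ones used in the minimal pcg32 reference; the struct and function names are mine.)

```c
#include <stdint.h>

typedef struct {
    uint64_t state;  // internal LCG state
    uint64_t inc;    // stream selector; must be odd, hence the | 1
} pcg32_t;

static uint32_t pcg32_next(pcg32_t *rng) {
    uint64_t old = rng->state;
    // Plain 64-bit LCG step
    rng->state = old * 6364136223846793005ULL + (rng->inc | 1);
    // Output permutation: xorshift, then rotate by the top state bits
    uint32_t xorshifted = (uint32_t)(((old >> 18u) ^ old) >> 27u);
    uint32_t rot = (uint32_t)(old >> 59u);
    return (xorshifted >> rot) | (xorshifted << ((-rot) & 31u));
}
```

Note that this is also why the 32-bit generator needs 64-bit arithmetic: the output is a permutation of 64 bits of LCG state.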
Sadly, Marsaglia's home page with all his papers is no longer online, but they can be found scattered around the net.
The speaker mentions[0] that it is possible to get a different statistically random value every time you run the program, without any input to the program, apparently by using some tricks, but she declines to elaborate on camera. Can anyone explain how that works?
Does this have something to do with address space randomization?
> (This last one relies on the operating system placing myRNG at a different address every time the program is run. It's not as strong as the other techniques.)
Linux gives each process a 32-bit random seed, which is probably more useful (as is making a syscall). On an embedded system the address seed may well be useless anyway.
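The address-seed trick the quote refers to can be sketched like this (a hypothetical example; with ASLR the address at which the code is loaded changes on every run, so the derived seed changes too, while on systems without ASLR, such as many embedded targets, it degenerates to a constant):

```c
#include <stdint.h>

// Any function or object will do; its address is what carries the entropy.
static uint32_t my_rng_step(uint32_t x) { return x * 1664525u + 1013904223u; }

// Weak seed derived from where the code was loaded. This varies between
// runs only because of address space layout randomization (ASLR); it is
// far weaker than reading a real entropy source.
static uint64_t aslr_seed(void) {
    return (uint64_t)(uintptr_t)&my_rng_step;
}
```

Within one run the value is stable, so it only works as a per-run seed, not as an entropy stream.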
This is fine for quickly generating random numbers for simulations, just don't use it for cryptographic applications. If you want random numbers for crypto use a CSPRNG.
I'm not really qualified to speak as to its statistical properties, but I've used this generator a couple times.
The biggest downside I ran into is that it requires 64-bit numbers even for the 32-bit RNG, which means its very simple code becomes more complex on a platform with only 32-bit numbers, or in a scripting language. Also, it's not a CSPRNG, if that matters to you.
This is minor though, and I would use it again. Its flexibility for seeding and easy serialization outweigh this IMO (and it's still simple compared to e.g. mtrand).
I must have done something wrong, but I tried the basic C implementation (https://github.com/imneme/pcg-c-basic) on a quick Diehard-like test and it failed. I'm still trying to understand why. Probably not its fault, just that the code is not as forgiving and ready-to-consume as I had expected.
It passed for me. Where did you get the diehard test? If you compiled it directly from http://www.stat.fsu.edu/pub/diehard/ then it is because that source is broken.
I actually tried a modified OCaml implementation of a very simple test. The issue is probably due to bad seeding/initialization: if I use the given static seed, it passes, but if I try to provide any other seed, it fails on at least one test.
I'll try to find a more complete/better tested port, or try to do it myself.
I'm impressed by how fast this algorithm is. I wrote a toy implementation in Go, and it took only ~50 ns/op to generate random numbers. The distribution is very even too, but I'd have to rerun my tests to find the exact figures.
Wow, talk about a coincidence - I just implemented this algorithm today as the RNG for some bootstrapped statistics code. One thing I can say is it's very fast, but the code for seeding is not great on any platform without /dev/random.
A while ago I implemented the Mersenne Twister to generate 2D planets in a galaxy for a game I'm working on. I didn't know about this PCG, and it's an interesting read!
My selection of random number generator had been limited by the reproducibility requirement (so that each planet is stored only as the seed for the generator).
I'll give it a try later on; it'd be interesting to see how a sampling of planets turns out from both RNGs, and whether they are visually different. The RNG is used both for features and for the Perlin noise that generates cloud and land textures, so any artifact should show up easily in the textures.
Interesting. I don't think people realize just how slow rand() can be if it is called frequently in your C/C++ program. Marsaglia's xorshift is the fastest algorithm I know of that also gives OK statistical quality.
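For reference, one of the xorshift generators from Marsaglia's 2003 paper is just three shifts and three XORs per output; this sketch uses the 13/7/17 shift triple and the nonzero seed constant from that paper (the state must never be seeded to zero, or the generator gets stuck at zero):

```c
#include <stdint.h>

static uint64_t xs_state = 88172645463325252ULL;  // any nonzero seed

// Marsaglia-style 64-bit xorshift; period 2^64 - 1 over nonzero states.
static uint64_t xorshift64(void) {
    uint64_t x = xs_state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return xs_state = x;
}
```

It compiles down to a handful of register operations, which is why it so easily outruns a typical libc rand().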
Looks like a very good and fast PRNG. How could it possibly achieve higher statistical quality than a secure PRNG? I guess that depends on how you define quality.
Unless it's secure, I see no motivation to use it.
In their table, they acknowledge Arc4Random and ChaCha20 as secure. The only negatives against ChaCha20 are that it's not 'fast enough' and that it lacks k-dimensional equidistribution. In my experience it has never shown up in my profiling, so I feel it's fast enough; and I'm not sure why I would want k-dimensional equidistribution.
So... I'll keep using ChaCha20 when I need to (it's part of NaCl/libsodium if you don't know where to get it from).
What's your application? When I wrote a ray tracer that did photon mapping, I needed about 300 000 000 random numbers per light to get half-decent results. Admittedly, I could have done more to optimize my use of those numbers, but the point remains that you need huge numbers of random samples.
I used mt19937 for this and it worked ok. I suspect that using pcg might have shaved a few percentage points off the total run time of the program (maybe I should do a benchmark sometime). If so, not a huge win, but noticeable. Especially if you're doing renders on a larger scale than I was. A couple percentage points could trim a few computers from a render farm.
Just as you wouldn't want to use a general purpose PRNG for security applications so too you wouldn't want to use a secure PRNG for general purposes, particularly demanding applications such as Monte Carlo simulations.
Your dismissal strikes me as both ignorant and rude.
What are the random number data rates for those applications?
One of the good features of ChaCha is that it is fast: a modern CPU can churn out gigabits per second, and in hardware you can do it at low clock frequencies. This is the reason we use ChaCha as the CSPRNG in the Cryptech HSM.
> Just as you wouldn't want to use a general purpose PRNG for security applications so too you wouldn't want to use a secure PRNG for general purposes, particularly demanding applications such as Monte Carlo simulations.
Why not?
It's perfectly valid to use a secure RNG for non-secure purposes.
As I mentioned, I've used ChaCha20 for those sort of purposes too. And the random number generation barely shows up as a fraction of the runtime.
i.e. improving 1% of the runtime by 99% isn't worth the overhead of having more than one random number generator.
The reason why is that crypto RNGs are much slower than something less secure. It really depends on the application, but in the code I wrote today a significant fraction of my application's time was spent in the RNG function. By replacing the RNG I managed to speed up my code a bit over 2-fold. I am not doing crypto work, so this is great.
One of the benefits of a PRNG is reproducibility. If you are doing a Monte Carlo simulation, you can get good results while still starting runs from predictable seeds (e.g. [0,1,2,3,4,...]).
This gives you good-quality random numbers during the simulation run, yet lets others reproduce your results and extend them with more runs if required.
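A sketch of that workflow, using a toy 64-bit LCG as a stand-in for whatever PRNG the simulation actually uses (the names here are hypothetical): run k is always seeded with k, so anyone can rerun and reproduce run k exactly, or extend the study by adding runs k+1, k+2, and so on.

```c
#include <stdint.h>

// Toy LCG (Knuth's MMIX constants) standing in for the real PRNG.
static uint64_t lcg_state;
static void lcg_seed(uint64_t s) { lcg_state = s; }
static uint32_t lcg_next(void) {
    lcg_state = lcg_state * 6364136223846793005ULL + 1442695040888963407ULL;
    return (uint32_t)(lcg_state >> 32);  // high bits have the best quality
}

// One Monte Carlo run: estimate pi by sampling points in the unit square.
// Because run k is always seeded with k, it is reproducible bit-for-bit.
static double estimate_pi(uint64_t k, int samples) {
    lcg_seed(k);
    int inside = 0;
    for (int i = 0; i < samples; i++) {
        double x = lcg_next() / 4294967296.0;  // uniform in [0, 1)
        double y = lcg_next() / 4294967296.0;
        if (x * x + y * y < 1.0) inside++;
    }
    return 4.0 * inside / samples;
}
```

Publishing just the seed list is then enough for someone else to regenerate every run.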
I wrote a standalone C++11 (compatible with the standard RNG library) implementation of ChaCha (with the number of rounds as a parameter), available here: https://gist.github.com/orlp/32f5d1b631ab092608b1
This is also the implementation used for the benchmark at the pcg-random.org homepage.
It's essentially the same, except for the API. However, you linked to the reference implementation. My implementation also contains a SIMD implementation, if your CPU supports it. I assume that libsodium has a similar implementation in its codebase somewhere too.
My SIMD implementation can be made to run a bit faster, but that would mean computing more blocks in parallel, and thus increasing the state size of the RNG even more, so I chose to compute only one block at a time. An optimized stream cipher implementation will embed implementations for several different parallel block sizes, to achieve optimal speed for long messages.
---
ChaCha20 is a stronger, more modern variation on Salsa20, but the differences are minor.
But, to be precise, libsodium prefers XSalsa20 over ChaCha20. The key feature of XSalsa20 is that it supports a 192-bit nonce instead of the 64-bit nonce of ChaCha20. That makes it safe to use a randomly generated nonce, whereas randomly generating 64-bit nonces would eventually produce collisions (by the birthday bound, after roughly 2^32 messages). I believe this is the reason they chose XSalsa20.
XSalsa20 is a simple variation on Salsa20 that does a bit of extra initialization to turn the normal 64-bit nonce into a 192-bit one. I assume the same construction could be applied to ChaCha20, but since djb (the author of Salsa, XSalsa, and ChaCha) hasn't done it, I think libsodium went with the safe approach and just used XSalsa20.
'A bit faster' is an understatement. On the benchmarking machine used by the PCG author, you would get a ~2.5x speedup over that implementation, which would put ChaCha20 at roughly 5 amortized cycles per 32-bit word. If you go with ChaCha8, which is still cryptographically secure, that becomes 2-3 amortized cycles per word. The flagship PCG generator from the paper, 'PCG XSL RR RR 128', runs at 1.81 nanoseconds per word on the same machine, which converts to ~6 cycles per word. So I am not inclined to believe the claim, right on the front page, that ChaCha20 is 'fairly slow'.
Unless you're hurting for space (and on a Haswell machine you probably aren't), the case for using a cryptographic generator everywhere is strong. And if you are hurting for space, or are on a less desktop-oriented architecture, you probably won't like 64-bit integer multiplications and variable rotation counts either.
By the way, you're using the aligned _mm_load_si128 and _mm_store_si128 intrinsics to load and store the block: https://gist.github.com/orlp/32f5d1b631ab092608b1#file-chach.... But there's no guarantee that the block is aligned to 16 bytes, so that code may crash with some compiler/platform combinations.
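One portable fix, assuming the buffer really can be unaligned, is to switch to the unaligned intrinsics; on recent x86 cores _mm_loadu_si128 costs essentially nothing extra when the pointer happens to be aligned anyway. A minimal sketch (the function name is mine, not from the gist):

```c
#include <emmintrin.h>  // SSE2 intrinsics
#include <stdint.h>

// XOR one 16-byte block of src into dst without assuming any alignment.
static void xor_block(uint8_t *dst, const uint8_t *src) {
    __m128i a = _mm_loadu_si128((const __m128i *)dst);  // unaligned load
    __m128i b = _mm_loadu_si128((const __m128i *)src);  // unaligned load
    _mm_storeu_si128((__m128i *)dst, _mm_xor_si128(a, b));
}
```

The alternative is to document and enforce a 16-byte alignment requirement on the buffer and keep the aligned intrinsics.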
The fastest ChaCha20 implementation available (from http://bench.cr.yp.to/results-stream.html) on a modern Intel processor is 1.2 cycles / byte (cpb) for ChaCha20 and 0.6 cpb for ChaCha8. ChaCha is very fast.
The drawback, however, is that these implementations only get that fast because ChaCha is embarrassingly parallel. The fastest implementations compute 16-24 blocks in parallel efficiently with AVX2. That's 1.5 kB of random data generated at once. And it's only this fast in a closed loop, where all code and memory are hot.
For the above reasons and simplicity I've chosen to not do a parallel SIMD implementation in my gist.
But I agree that 'fairly slow' is not really fair, at least not in combination with 'Good' statistical quality (rather than Excellent, because the author found faults in ChaCha2). ChaCha8 is blazing fast, and so far no cryptographer has mounted a successful attack against it. Previously, though, ChaCha was described as 'slow', and I attempted to fix that by emailing the author with my implementation.
-
Interestingly, I was working on an AEAD design similar to ChaCha (sadly I never got to finish it before the http://competitions.cr.yp.to/caesar.html deadline) that did 1.5 cpb authenticated encryption on AVX2: http://www.liacs.nl/~opeters/orlein.pdf. SIMD plus embarrassingly parallel ciphers is a strong combination.
-
Thanks for the warning about the alignment. Because I tacked the SIMD implementation on later, I forgot to add the alignment requirement; I've fixed that now.
> The drawback however, is that these implementations only get that fast because ChaCha is embarrassingly parallel.
That is my point exactly. Why would you design, in this day and age, a generator that doesn't vectorize well? It will not take full advantage of the CPU. Even with parallel streams, large integer multiplication and variable-length rotation are SIMD-killers. Regarding the 1.5KB of data, I suspect you can get away with less than that if you specialize to this application, but note that this is still around half the state size of mt19937.
https://www.youtube.com/watch?v=45Oet5qjlms
It's a bit long, but I highly recommend it. I learned a lot about RNGs from it.