
Specific Problems with Other RNGs - nkurz
http://www.pcg-random.org/other-rngs.html
======
tptacek
I went back to the root page of this site to see what PCG is, and I don't
understand it. What's the point of an RNG that is "more secure" than an
insecure RNG, but less secure than a real CSPRNG? What does "Predictability:
Challenging" actually mean?

~~~
pbsd
It means exactly what it sounds like: the author tried to attack the
generator, failed, and thus concluded that it must be a Very Hard Problem.

Note that the field of pseudorandom number generation for non-cryptographic
purposes is much less rigorous than cryptography. Typically, for a new
generator to be accepted as "OK" all it needs to do is pass a number of fixed
statistical tests, usually one of the TestU01 batteries [1]. This is usually
the only falsifiable claim you get, and if you're familiar with diffusion and
how to achieve it this is easy to work out. Other falsifiable claims include
equidistribution, but that is not a very useful guarantee---simply
incrementing by 1 will achieve that. The notion of indistinguishability
against computationally-bounded adversaries does not exist. All of this
contributes to this being a field where crackpottery abounds and where it is
hard to distinguish good from bad.
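To make the equidistribution point concrete, here is a minimal sketch. The names `Counter8` and `is_equidistributed_over_period` are made up for illustration. An 8-bit counter that increments by 1 produces every value exactly once per period, so it is perfectly equidistributed, yet nobody would call it random.

```cpp
#include <array>
#include <cstdint>

// Hypothetical "generator": nothing but an 8-bit counter incremented by 1.
struct Counter8 {
  uint8_t state = 0;
  uint8_t next() { return state++; }  // wraps around modulo 256
};

// Returns true if every 8-bit value appears exactly once in one full period.
inline bool is_equidistributed_over_period() {
  Counter8 g;
  std::array<unsigned, 256> hits{};
  for (unsigned i = 0; i < 256; ++i) ++hits[g.next()];
  for (unsigned h : hits) {
    if (h != 1) return false;
  }
  return true;
}
```

The check passes, which is exactly why equidistribution alone says little about quality.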

For example, here's a generator that uses 1/4 of an AES round per output word,
and that passes the Crush battery of statistical tests (which uses ~2^35
samples). Is it a good generator? I would not bet on it.

    
    
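      #include <stdint.h>
      #include <immintrin.h>  // AES-NI intrinsics; build with e.g. -maes
    
      struct S {
        static const unsigned kAesRounds = 1;
        // 16 output words, viewed as uint32_t or as four 128-bit blocks.
        union {
          uint32_t state_[4 * 4];
          __m128i  words_[4];
        } u_;
        unsigned counter_;
        const __m128i key_;
    
        // Initializers follow declaration order (counter_ before key_).
        S(uint64_t seed)
        : u_{{0}}, counter_{0}, key_{_mm_set_epi64x(seed, -seed)} {
          // Discard the first 16 outputs so the all-zero state gets mixed.
          for(unsigned i = 0; i < 16; ++i) (void)next();
        }
    
        uint32_t next() {
          const uint32_t output = u_.state_[counter_];
          // Apply kAesRounds AES rounds to the current block and store the
          // result in the next block slot.
          __m128i t = u_.words_[counter_ & 3];
          counter_ = (counter_ + 1) & 15;
          for(unsigned i = 0; i < kAesRounds; ++i)
            t = _mm_aesenc_si128(t, key_);
          u_.words_[counter_ & 3] = t;
          return output;
        }
      };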
    

[1]
[http://simul.iro.umontreal.ca/testu01/tu01.html](http://simul.iro.umontreal.ca/testu01/tu01.html)

~~~
nkurz
Is there a particular reason you would bet against it, or are you just saying
that you wouldn't bet on any implementation without proven theoretical
properties?

(As an aside, it's probably just my lack of familiarity, but considering how
simple the algorithm is I find the syntax surprisingly hard to follow. Is this
modern C++?)

~~~
pbsd
Although this thing is able to fool the 200 or so tests from TestU01, it would
take a short time for an expert to devise a test that would distinguish it
from random, and even recover the key (read: seed). On the other hand, AES-CTR
(conjecturally) fools _all_ statistical tests that take less than 2^128
effort, and on top of that it is still very very fast (0.63 cycles per byte on
Haswell). So what really was gained? Cryptographic primitives have gotten so
fast that the additional speed from these ad hoc generators seems like an
unworthy risk. I concede that it can be very fun to try to make them, though.

Additionally, a generator that passes those 200 tests is not guaranteed to be
high-quality for a particular usage, which might as well be considered nothing
but another statistical test. There is the famous case of the r250 generator
---an additive Fibonacci generator that passed all statistical tests of the
time---which turned out to have characteristics that rendered costly real-
world simulations wrong [1].

That is C++, yes, but I don't think there is much of anything modern about it.
Trying to save on space probably contributed to its unreadability a bit.

[1]
[http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.69....](http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.69.3382)

~~~
nkurz
Here's a paper that includes an analysis of a similar approach that uses 5
rounds of AES under the name ARS-5:
[http://www.thesalmons.org/john/random123/papers/random123sc1...](http://www.thesalmons.org/john/random123/papers/random123sc11.pdf)

~~~
pbsd
I know this paper, and like this approach. Reduced-round primitives let you
take advantage of existing analysis, and keep _some_ guarantees while
improving speed. In the case of AES, we know that the probability of any
differential for 4 rounds is at most 2^-113; similar for linear probabilities
[1]. I'm curious why ARS-4 fails Crush; I wonder if the custom key schedule is
the culprit.

[1] [https://eprint.iacr.org/2005/321](https://eprint.iacr.org/2005/321)

------
pettou
What does "zero will be produced once less often than every other output",
listed among the negative qualities of XorShift and XorShift*, mean? That they
are not able to generate "0"?
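A minimal sketch of how I read that claim, using Marsaglia's 32-bit xorshift with the (13, 17, 5) shifts (the helper `reaches_zero` is hypothetical). The all-zero state maps to itself, and because each step is invertible no nonzero state can ever map to zero. So a generator that outputs its full state never emits 0, and a variant that outputs only part of a larger state would see 0 one time fewer than every other value per period.

```cpp
#include <cstdint>

// Marsaglia's 32-bit xorshift step with the (13, 17, 5) shift triple.
inline uint32_t xorshift32(uint32_t x) {
  x ^= x << 13;
  x ^= x >> 17;
  x ^= x << 5;
  return x;
}

// Hypothetical helper: true if `steps` iterations from `seed` ever hit 0.
inline bool reaches_zero(uint32_t seed, unsigned steps) {
  uint32_t x = seed;
  for (unsigned i = 0; i < steps; ++i) {
    x = xorshift32(x);
    if (x == 0) return true;
  }
  return false;
}
```

Here `xorshift32(0)` is 0, which is why zero must be excluded as a seed, and `reaches_zero` stays false for any nonzero seed.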

Also, does anyone know if PCG is in use somewhere today?

~~~
dminor
I believe Go has switched to it or is considering switching. Someone linked
to a code review a while back.

------
thesz
ChaCha20 is (relatively) slow, but! ChaCha8 has no known attacks and is
2.5 times faster than ChaCha20.
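For context, a rough sketch of why the round count maps so directly to speed. The quarter round follows RFC 8439; `State` and `chacha_block` are names I made up, and the caller is assumed to have already laid out constants, key, counter and nonce in the 16-word state. The work per block is one loop whose trip count is the number of rounds, so 8 rounds instead of 20 is where a figure like 2.5x would come from.

```cpp
#include <array>
#include <cstdint>

inline uint32_t rotl32(uint32_t v, int n) { return (v << n) | (v >> (32 - n)); }

// The ChaCha quarter round on four words of the state.
inline void quarter_round(uint32_t& a, uint32_t& b, uint32_t& c, uint32_t& d) {
  a += b; d ^= a; d = rotl32(d, 16);
  c += d; b ^= c; b = rotl32(b, 12);
  a += b; d ^= a; d = rotl32(d, 8);
  c += d; b ^= c; b = rotl32(b, 7);
}

using State = std::array<uint32_t, 16>;

// Hypothetical helper: one 64-byte block from a prepared input state.
// `rounds` is assumed to be even (8, 12 or 20 in practice).
inline State chacha_block(const State& in, unsigned rounds) {
  State x = in;
  for (unsigned i = 0; i < rounds; i += 2) {
    // Column round.
    quarter_round(x[0], x[4], x[8],  x[12]);
    quarter_round(x[1], x[5], x[9],  x[13]);
    quarter_round(x[2], x[6], x[10], x[14]);
    quarter_round(x[3], x[7], x[11], x[15]);
    // Diagonal round.
    quarter_round(x[0], x[5], x[10], x[15]);
    quarter_round(x[1], x[6], x[11], x[12]);
    quarter_round(x[2], x[7], x[8],  x[13]);
    quarter_round(x[3], x[4], x[9],  x[14]);
  }
  // Feed-forward: add the input state to the permuted state.
  for (unsigned i = 0; i < 16; ++i) x[i] += in[i];
  return x;
}
```

Calling `chacha_block(state, 8)` versus `chacha_block(state, 20)` changes only the loop count, not the structure.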

Not mentioning the speed variability across the ChaCha family is a flaw in
the analysis.

------
carterschonwald
I mentored a GSoC student this summer who worked on RNGs, and the only two
algorithms that passed the BigCrush statistical suite were PCG and
SplitMix.

------
cwmma
The knocks against OpenBSD's arc4random, namely

> No facility for a user-provided seed, preventing programs from getting
> reproducible results

> Periodically “stirs” the generator using kernel-provided entropy; this code
> must be removed if reproducible results desired (in testing its speed, I
> deleted this code)

seem like exactly the kinds of footguns you really want removed from an RNG
you're using for real, live code.

~~~
thrownaway2424
What about unit tests?

~~~
dspillett
Produce a set of random numbers once and store them as part of the test set.
That way your test is independent of the black box which is your RNG. Or make
your own simple RNG - in these cases you are looking for "arbitrary,
definitely not just sequential" rather than cryptographically random (or,
indeed, truly random at all).

If you rely on the order of the output, for reliably consistent test cases in
this instance, avoid using black-box RNGs which could change implementation
under your nose without much or any warning.
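As a sketch of the "make your own simple RNG" option, something like SplitMix64 fits in a few lines and can live inside the test suite, so its output can only change when you change it. The struct name and layout are my own. It is meant for arbitrary-looking test data, not for anything security-sensitive.

```cpp
#include <cstdint>

// Small seedable generator (SplitMix64) kept alongside the tests.
struct SplitMix64 {
  uint64_t state;
  explicit SplitMix64(uint64_t seed) : state(seed) {}

  uint64_t next() {
    // Advance the state by a fixed odd constant, then scramble it.
    uint64_t z = (state += 0x9e3779b97f4a7c15ULL);
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
  }
};
```

Two instances built from the same seed yield the same stream, so a test only needs to record the seed.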

~~~
Veedrac
Cryptographically random isn't the aim here - pseudorandom is.

A seedable black-box RNG that guarantees what seed corresponds to what stream
is way simpler to use than a list of random numbers, IMHO. It's the difference
between copying one number and copying a million.

~~~
dspillett
> Cryptographically random isn't the aim here

We are agreeing here.

> A seedable black-box RNG that guarantees what seed corresponds

If you can control when the black box gets updated, or the black box carries a
guarantee of stable output for any given seed over the whole time of its
existence, yes. But if you are using OS-provided RNGs, for instance, their
behaviour is not fixed (well, it is defined, but those definitions are not
set in stone) and may change as kernel updates happen.

That is why I suggest "making your own simple PRNG". This could be as simple
as picking a known documented algorithm as implemented by a particular library
and using that in your test cases - it doesn't need to be as much as writing
your own function even.
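One example of "a known documented algorithm" with a stability guarantee, assuming a C++11 or later toolchain: the standard pins down the exact output of `std::mt19937`, requiring that the 10000th value from a default-constructed engine is 4123659995. The helper name below is made up for illustration.

```cpp
#include <cstdint>
#include <random>

// Returns the n-th output (1-based) of a default-constructed std::mt19937.
inline uint32_t nth_mt19937_output(unsigned n) {
  std::mt19937 gen;     // default seed, fixed by the standard
  gen.discard(n - 1);   // skip the first n-1 outputs
  return static_cast<uint32_t>(gen());
}
```

The guarantee covers the engine only. Distributions such as `std::uniform_int_distribution` are not specified bit-for-bit, so tests that need identical values across standard libraries should consume the raw engine output.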

~~~
Veedrac
Right... but nobody's suggesting replacing the OS's RNG.

------
justcommenting
Having encountered a number of VPS setups relying on haveged or rng-
tools/virtio-rng, has anyone observed "specific problems" with the
misconfiguration/misuse of haveged on VPS?

------
cpeterso
Has there been any research into generating a PRNG using a genetic algorithm
whose fitness function is the Crush and Diehard test results?

~~~
rurban
Zilong Tan's fast-hash contains a generator which genetically optimizes over a
combination of the cheap ops mul, xorshift left, xorshift right and rotate
right, and also does genetic mutations over the initial magic states.

[https://code.google.com/p/fast-hash/source/browse/trunk/hash...](https://code.google.com/p/fast-hash/source/browse/trunk/hashgen/hashgen.cpp)

The fit function is only the avalanche test, but that's easily exchangeable.
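For readers unfamiliar with the avalanche test, here is a generic sketch of such a fitness measure. It is not taken from hashgen.cpp; `avalanche_score`, `mix64` and `identity64` are illustrative names. It flips each input bit in turn and measures the fraction of output bits that change. A good mixer scores close to 0.5, and a search would mutate the candidate function to push the score toward that.

```cpp
#include <bitset>
#include <cstdint>

// Example candidate mixer (the SplitMix64 finalizer).
inline uint64_t mix64(uint64_t z) {
  z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
  z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
  return z ^ (z >> 31);
}

// Deliberately bad candidate for comparison.
inline uint64_t identity64(uint64_t z) { return z; }

// Mean fraction of the 64 output bits that flip when a single input bit
// is flipped, averaged over `samples` inputs and all 64 bit positions.
template <typename F>
double avalanche_score(F f, unsigned samples) {
  uint64_t x = 0x0123456789abcdefULL;  // arbitrary starting input
  uint64_t flipped = 0;
  for (unsigned s = 0; s < samples; ++s) {
    x += 0x9e3779b97f4a7c15ULL;        // move to another input
    const uint64_t base = f(x);
    for (unsigned bit = 0; bit < 64; ++bit) {
      const uint64_t diff = base ^ f(x ^ (1ULL << bit));
      flipped += std::bitset<64>(diff).count();
    }
  }
  return static_cast<double>(flipped) /
         (static_cast<double>(samples) * 64.0 * 64.0);
}
```

The identity function scores 1/64, since exactly one output bit changes per input flip, while the mixer lands near the ideal 0.5.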

~~~
cpeterso
Thanks!

