
On Melissa O’Neill’s PCG random number generator - ozanonay
http://lemire.me/blog/2017/08/15/on-melissa-oneills-pcg-random-number-generator/
======
FabHK
A few notes:

The author writes "Meanwhile, at least one influential researcher (whose work
I respect) had harsh words publicly for her result", and then quotes some of
these words:

    
    
       Note that (smartly enough) the PCG author avoids
       carefully to compare with xorshift128+ or xorshift1024*.
     

However, the author fails to note that said "influential researcher",
Sebastiano Vigna, is the author of xorshift128+ and related PRNG.

In the linked test [2] by John D. Cook (who uses PractRand, a test similar to
the (obsolete) DIEHARD), xorshift128+ and xoroshiro128+ fail within 3 seconds,
while PCG ran 16 hours producing 2 TB of pseudo-random numbers without any
suspicious p-value detected.

On the other hand, Vigna claims that the xoroshiro family does "pass"
PractRand.

I submitted an answer on StackOverflow a while ago [1], recommending
xoroshiro and PCG, so I'd be concerned if PCG turns out to be flawed. It's
actually quite hard to get academics in the field to give an authoritative
recommendation (I've tried) - their response is typically along the lines of
"It's complicated"...

[1] [https://stackoverflow.com/questions/4720822/best-pseudo-
rand...](https://stackoverflow.com/questions/4720822/best-pseudo-random-
number-generator/38202922#38202922)

[2] [https://www.johndcook.com/blog/2017/08/14/testing-rngs-
with-...](https://www.johndcook.com/blog/2017/08/14/testing-rngs-with-
practrand/)

Edit: removed italics due to asterisk in PRNG name, & added link to John D.
Cook's test.

~~~
MiceWithYaffle
I think Vigna's claim is that if you ignore the PractRand tests that fail, it
passes. (Really!)

O'Neill has instructions on how to test with PractRand and with TestU01 on her
blog ([http://www.pcg-random.org/blog/](http://www.pcg-random.org/blog/)). I
had a go with TestU01 on Vigna's generators, and when you test the low 32 bits
reversed (for 64-bit PRNGs, you have to test the high 32, the low 32, both
forwards and reversed), I found that all Vigna's generators fail.

Given the PractRand results it makes sense, I guess, but I had read that
Vigna's generators were supposed to pass TestU01.

Does anyone else want to have a go at testing so I can know if I screwed up
somehow?

~~~
rockdoe
_I think Vigna's claim is that if you ignore the PractRand tests that fail,
it passes. (Really!)_

The code does explain exactly what the issue is, i.e. that the last bit isn't
random:

    
    
       This generator passes the PractRand test suite
       up to (and included) 16TB, with the exception of binary rank tests,
       which fail due to the lowest bit being an LFSR; all other bits pass all
       tests. We suggest to use a sign test to extract a random Boolean value.
    

But I'm tempted to agree this isn't a desirable property for a generic RNG.
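The suggested sign test amounts to reading the top bit instead of the bottom one; a minimal sketch (hypothetical helper name, not Vigna's code):

```cpp
#include <cstdint>

// Derive a Boolean from a 64-bit xorshift128+ output with a sign test:
// the top bit passes the test suites, while the bottom bit is an LFSR.
bool coin_flip(uint64_t x) {
    return (int64_t)x < 0;  // equivalent to (x >> 63) != 0
}
```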

How many users of JavaScript know about this property? (it's the default RNG
for most browser engines) Or does it not matter because they return 53-bit
floats?

~~~
FabHK
In the comment section to the V8 JavaScript blog post [1], Vigna writes:

    
    
      - Technically, it would be better if you used the upper
      52 bits, rather than the lower 52 bits, to generate a
      double. The lowest bit of a xorshift128+ generator is an
      LSFR, and while people has been happy using LSFR for
      decades, it is slightly inferior in quality to all other
      bits. This is really OCD, as computational errors makes
      the lowest bit almost irrelevant, but now you know.
    

I don't know whether that's been implemented, but the maintainer replied:

    
    
      Thanks for the suggestions! I will definitely revisit the 
      current implementation with your tips in mind.
    
    

[1] [https://v8project.blogspot.nl/2015/12/theres-mathrandom-
and-...](https://v8project.blogspot.nl/2015/12/theres-mathrandom-and-then-
theres.html?showComment=1452592903162#c1549004517443909784)
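The upper-bits-to-double conversion Vigna suggests looks something like this (an illustrative sketch using the common top-53-bits recipe; Vigna's comment says 52, and I don't know what V8 actually implements):

```cpp
#include <cstdint>

// Build a double in [0, 1) from the top 53 bits of a 64-bit PRNG output,
// so the weaker low LFSR bit of xorshift128+ never reaches the result.
double to_double(uint64_t x) {
    return (x >> 11) * (1.0 / 9007199254740992.0);  // 9007199254740992 = 2^53
}
```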

------
Animats
Here's the site for the random number generator.[1] It's basically a simple
linear congruential random number generator (well known, but not very good)
fed into a mixer. The mixer is new.

Most of the analysis is about the LCG or the final output. The suggested mixer
is just

    
    
        output = rotate64(uint64_t(state ^ (state >> 64)), state >> 122);
    

That's simple, and the insight in this paper is that something that simple
helps a lot. I would have thought that you'd want a mixer where changing one
bit of the input changes, on average, half the bits of the output. The mixer
above won't do that. DES as a mixer would probably be better, but it's slower.
The new result here is that something this simple passes many statistical
tests.
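A compilable sketch of that structure (an LCG step plus xor-fold-and-rotate mixer; the seed and constants here are illustrative placeholders, not O'Neill's published parameters):

```cpp
#include <cstdint>

using u128 = unsigned __int128;  // GCC/Clang extension for the 128-bit state

static u128 state = 42;  // illustrative seed

static uint64_t rotr64(uint64_t v, unsigned r) {
    return (v >> (r & 63u)) | (v << ((64u - r) & 63u));
}

uint64_t next() {
    // LCG step on the 128-bit state (placeholder multiplier and increment)
    state = state * 0x9e3779b97f4a7c15ULL + 0xda3e39cb94b95bdbULL;
    // XSL-RR-style mixer: xor-fold the two halves, rotate by the top 6 bits
    uint64_t folded = (uint64_t)(state >> 64) ^ (uint64_t)state;
    return rotr64(folded, (unsigned)(state >> 122));
}
```

The rotation amount coming from the state's own top bits is what makes this a "permuted" congruential generator rather than a plain truncated LCG.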

This isn't crypto-grade; both that mixer and a LCG generator are reversible
with enough work.

[1] [http://www.pcg-random.org/](http://www.pcg-random.org/)

~~~
marze
>linear congruential random number generator (well known, but not very good)

Relevant quotes from the paper:

"But if you began reading the section with the belief that “linear
congruential generators are bad” (a fairly widely-held belief amongst people
who know a little about random number generation), you may have been surprised
by how well they performed. We’ve seen that they are fast, fairly space
efficient, and at larger sizes even make it through statistical tests that
take down other purportedly better generators. And that’s without an improving
step."

and

"Despite their flaws, LCGs have endured as one of the most widely used random-
number generation schemes, with good reason. They are fast, easy to implement,
and fairly space efficient. As we saw in Section 3.3, despite poor performance
at small bit sizes, they continue to improve as we add bits to their state,
and at larger bit sizes, they pass stringent statistical tests (provided that
we discard the low-order bits), actually outperforming many more-complex
generators. And in a surprise upset, they can even rival the Mersenne Twister
at its principle claims to fame, long period and equidistribution."

"Nevertheless, there is much room for improvement. From the empirical evidence
we saw in Section 3.3 (and the much more thorough treatment of L’Ecuyer &
Simard [28], who observe that LCGs are only free of birthday-test issues if n
< 16 p^(1/3), where n is the number of numbers used and p is the period), we can
surmise that we may observe statistical flaws in a 128-bit LCG after reading
fewer than 247 numbers (which is more than BigCrush consumes but nevertheless
isn’t that many—an algorithm could plausibly use one number per nanosecond and
247 nanoseconds is less than two days)."

~~~
lfowles
> 247 nanoseconds is less than two days

Yes, yes it is.

For those just as confused as I was, replace all instances of 247 with 2 __47

~~~
kurthr
Hmmm... is HN removing ^carets?

Replace 247 with '2^47', which is ~1.41x10^14, or ~39 hours in ns.

~~~
lfowles
Hah. I used double asterisks, should have seen that coming in hindsight.
Muphry's Law.

~~~
PhasmaFelis
I've often been frustrated trying to put more than one asterisk (not italic
tag) in an HN comment.

Serious question, what happened to WYSIWYG? The Web has supported it for like
20 years, why have we standardized on clunky in-line markup for rich text
entry, pretty much everywhere?

~~~
a_t48
WYSIWYG is kinda crap on mobile. A checkbox saying "Don't format" would be
nice, though.

~~~
imron
It'd be nice if the formatter would realise that two asterisks in a row aren't
emphasising any text and therefore just leave them as is.

------
sulizilxia
I love O'Neill's work on PCG, and loved the talks by her I watched online.

As a tenured professor I want to say two things about this piece:

1\. I think academic publishing will be forced to change. I'm not sure what
it's going to look like in the end, but traditional journals are starting to
seem really quaint and outdated now.

2\. As far as I can tell from what she's written on the PCG page, the
submission to TOMS is a poor example, because no one I know expects to be done
with one submission. That is, no one I know submits a paper to one journal,
even one reputable journal, and is done. They submit and it gets rejected and
revise it and resubmit it, maybe three or even four times. After the fourth or
fifth time, you might give up, but not necessarily even then.

I have mixed feelings about the PCG paper as an example, because in some ways
it's great: an example of how something very influential has superseded
traditional academic publishing. In other ways, though, it's horrible, because
it's misleading about the typical academic publishing experience. Yes,
academic publishing is full of random nonsense, and corruption, but yes, you
can also get past it (usually) with just a little persistence. In still other
ways, it's a good example of what we might see increasingly, which is a
researcher having a lower threshold for the typical bullshit out there.

------
mjb
> I wonder whether the academic publications are growing ever less relevant to
> practice.

I think there are two topics here. One is whether academic research and work
is becoming less relevant to practice. The other is whether the formalism of
academic-style publishing is becoming less relevant to a modern world with
more and more venues for publishing, rating, and discovering work.

On the former, I believe that academic work is as relevant as ever. There are
some areas (like systems) where I'm doubtful about relevance from the point of
view of a practitioner, but other areas (like hardware and ML) where work
remains extremely relevant. I haven't noticed a trend there over the last
decade, except in some areas of systems where the industrial practice tends to
happen on cluster sizes that are often not approachable for academia.

On the latter, academic publication does indeed seem to be getting less
relevant. There are other (often better) ways to discover work. There are
other ways to tell whether a piece of work is relevant, or credible. There are
other, definitely better, ways to publish and distribute work. In some sense I
think this is a pity: as an academic-turned-practitioner I like academic-style
publications. Still, I think they are going to either change substantially or
die.

This article raises another very good point: sometimes the formalism of
academic publication makes the work harder to understand, less approachable,
or less valuable. That's clear harm, and it seems like this professor was
right to avoid that.

------
starmole
As an engineer who switched to PCG:

\- PCG is not crypto, everybody should understand that. It's for simulation
and rendering.

\- PCG mainly replaces Mersenne Twister which is in c++11. The Twister has a
LOT more state and is a LOT slower for less randomness.

\- In rendering and simulation speed really matters, and PCG excels there.

\- Xorshift is another algorithm in the same class. I would really like to see
an objective comparison. In my cursory engineering look PCG seemed better.

\- Fast PRNG is almost a new field again: It's not crypto, but immensely
useful. How did the Twister get into C++11 while it is so much worse than PCG
or Xorshift? Nobody cared!

\- Maybe PCG should have been a paper at SigGraph.

\- For the style of the paper, I think one contribution is rethinking PRNG
outside crypto. That deserves and requires a lot of exposition.

~~~
MiceWithYaffle
Yeah, it's not for crypto. But I think it's for more than simulation and
rendering. It's meant as a _general purpose_ PRNG. It's just as good for
randomized algorithms like picking the pivot in quicksort, or playing games,
procedural content generation (PCG!), or whatever you want to use it for.

I think that the whole point of the prediction difficulty stuff is that a
library (e.g., C++11's) with general purpose PRNGs can't know how they'll be
used. Maybe some idiot writes code for a gambling machine in C++ and uses
whatever PRNG is to hand. There was a story in the news the other week about
people going around casinos predicting slot machines, so maybe this has
already happened! PCG is trying to make your simulation and rendering code
fast while trying to offer at least some defense against egregious misuse.

Basically PCG is trying to be a good all rounder. As you say, it's meant as a
replacement for the Mersenne Twister.

~~~
starmole
I found the story: [https://www.wired.com/2017/02/russians-engineer-brilliant-
sl...](https://www.wired.com/2017/02/russians-engineer-brilliant-slot-machine-
cheat-casinos-no-fix/)

And yes, PCG is harder to exploit in this way than the Twister, but you still
really should not bet money on it!

------
lowmagnet
I like this because she is a professor at Harvey Mudd. They took steps to make
CS more inclusive, with great results. I appreciate her attitude on
accessibility, which is in keeping with that institution's philosophy.

That the paper wall she ran into doesn't bother her, because she's publishing
openly, is even better.

~~~
thanatropism
There's room for accessible and for abstruse literature. Usually what happens
with novel work is that initial publications are abstruse but met with
excitement by the scholarly community and gradually more accessible works (as
the number of collaborators/coauthors grows too) are published.

That said, even if multi-culti math means that top-line researchers are going
to be spending time with song-and-dance introductions, she should still have
put a grad student onto the task of making the short paper that experts will
actually read.

If the whole thing is a matter of style and not of obfuscation, this would
have given a grad student an easy, cool first publication.

~~~
mathperson
What is multi-culti?

~~~
thanatropism
It's a common sarcastic term hurled at things like "inclusive CS".

For all I know this iteration of "inclusive" could be the Right Stuff --
proper intellectual discipline even as it advances secondary goals of
"inclusiveness". But stiiill... she should have by now had the common courtesy
of producing a short document aimed at experts, possibly prepared by her
underlings. This would also have helped the career of the underlings.

------
bmm6o
It sounds like an interesting result, I look forward to reading the paper more
carefully. That said, it's clearly not written for an academic journal.
Section 2.4.3 is entitled "The Importance of Code Size", and explains why
shorter code is better. I think you can argue that some academic papers are
excessively concise, but this is a 58-page paper about an RNG. It is clearly
not a journal paper and has a ton of extraneous content. I have to sympathize
with the commenter that the author has made a trade-off and written a paper
that's less rigorous than it should be (for peer review). I wonder why she
didn't write 2 versions.

~~~
wadkar
> I wonder why she didn't write [two] versions.

Because the reviewers took over 10 months to respond with a rejection mainly
citing the length of the paper. And more importantly, "By that point, everyone
who might have wanted to read it had almost certainly found it here and done
so, so I saw little merit in drastically shortening the paper."[1]

Last month (2017-07-25), she updated the blog post, which discusses all the
nuanced details of the whole affair [2].

[1] [http://www.pcg-random.org/paper.html](http://www.pcg-random.org/paper.html)

[2] [http://www.pcg-random.org/posts/history-of-the-
pcg-paper.htm...](http://www.pcg-random.org/posts/history-of-the-pcg-paper.html)

~~~
bmm6o
There is no excuse for the journal to take so long to provide a response. At
the same time, it seems to me that their response was entirely predictable. Or
does the journal usually post such long articles directed at a general
audience?

------
lisper
This is an interesting article, but it's more about the changing landscape of
academic publishing than it is about random number generators.

[EDIT] The actual paper is here: [http://www.pcg-random.org/pdf/hmc-
cs-2014-0905.pdf](http://www.pcg-random.org/pdf/hmc-cs-2014-0905.pdf)

~~~
pedrocr
Only if it turns out the work is actually good. If not this is exactly what
should happen in peer review.

~~~
lisper
You don't think the peer review process ever produces false negatives?

~~~
dsacco
The peer review process absolutely produces false negatives, but that doesn't
really change the fact that this paper doesn't need to be nearly 60 pages,
doesn't sufficiently analyze one of its central premises (6.2.2 Security
Considerations) and in general focuses on treatise and levity rather than
rigor.

~~~
User23
Peer review is basically just proof-reading, and taking 60 pages to present 10
pages of findings is exactly the kind of thing a proof-reader should catch.

------
the_stc
As a general comment, I dislike deliberately obtuse writing in papers. In my
current work, I came across a very in-depth survey of our industry (sex work).
Excellent study, very helpful. But some of the sentences seemed to over-
complicate the math. Example: "Consider the set P {p1, p2, ... pN}
representing providers and the set C {c1, c2, ... cN} representing customers".
I am pretty sure this kind of stuff is filler or pretends to make things look
more rigorous than they are.

On the other hand, maybe spending more than a line explaining what the
birthday paradox is should be cut out and put in a backgrounder paper or
appendix so that the paper can focus on the actual novel ideas.

~~~
mathperson
are you joking? That is your example of abstruse mathematical notation? Some
variables with names?!

~~~
the_stc
No. I am saying that giving the definition of a set each time is just extra
verbosity. The whole paper had that extra verbiage, everywhere. Kind of why-
use-one-word-when-ten-will-do feeling.

~~~
mathperson
Could you share the paper? I mean if they were making a mathematical
argument...it is kind of hard to do that without defining things..

~~~
greggyb
"Consider the set P {p1, p2, ... pN} representing providers and the set c {c1,
c2, ... cN} representing customers"

versus

"Consider the set P of providers and the set C of customers"

The former is about 2x the length of the latter. This is the verbosity I read
into the original comment.

~~~
thaumasiotes
The first is required when you want to later refer to an individual element
from the set.

"Consider the set P of providers" means when you eventually refer to p_2,
you'll have to note that you mean an element (the second, in some sense) of P.
That moved the verbosity around rather than eliminating it.

To me it seems like a much bigger error that "Consider the set P {p_1, p_2,
... p_N} representing providers and the set c {c_1, c_2, ... c_N} representing
customers" states outright that the sets are equal in size. I would expect C
to be much larger than P.

~~~
greggyb
If this is a commonly used notation in the paper, then it would make sense to
state once up front "here's how we refer to sets and their members". DRY and
all that.

------
drallison
Melissa O'Neill gave a talk describing the PCG random number generator at
Stanford in EE380.
[https://www.youtube.com/watch?v=45Oet5qjlms](https://www.youtube.com/watch?v=45Oet5qjlms)

------
Houshalter
I once tried to develop my own fast random number generator using nothing but
bitwise operations, on the theory that they were the fastest/simplest. I had a
program generate thousands of random combinations of bitwise functions, and
then used statistical tests to see which ones produced the most "random"-
seeming behavior.

It worked as far as I can tell. But I don't trust the statistical tests. Who
is to say there isn't a very obvious pattern in the numbers that I didn't test
for or notice? How do you prove a random number generator is good?
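A minimal version of that kind of statistical check, applied to a toy bitwise generator (Marsaglia's xorshift32 triple; a sketch of the idea, not the commenter's actual search program):

```cpp
#include <bitset>
#include <cmath>
#include <cstdint>

// A candidate generator built only from bitwise ops (xorshift32).
uint32_t xorshift32(uint32_t &s) {
    s ^= s << 13;
    s ^= s >> 17;
    s ^= s << 5;
    return s;
}

// Monobit test: count one-bits over n outputs and return the z-score.
// A large |z| proves the generator bad; a small |z| proves nothing.
double monobit_z(uint32_t seed, int n) {
    uint32_t s = seed;
    long long ones = 0;
    for (int i = 0; i < n; ++i)
        ones += std::bitset<32>(xorshift32(s)).count();
    double bits = 32.0 * n;
    return (2.0 * ones - bits) / std::sqrt(bits);
}
```

As the comment suspects, a battery of such tests can only ever fail to find a pattern; it can never certify that no untested pattern exists.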

~~~
thaumasiotes
> How do you prove a random number generator is good?

You can't; that's the nature of randomness. You can prove they're bad, though.

------
dsacco
I have a few comments:

1\. The paper itself[1] is _extremely_ readable by the standards of most
cryptography research. On one hand, this is great because I was able to follow
the whole thing in essentially one pass. On the other hand, the paper is
_very_ long for its result (58 pages!), and it could easily do without
passages like this one:

 _Yet because the algorithms that we are concerned with are deterministic,
their behavior is governed by their inputs, thus they will produce the same
stream of “random” numbers from the same initial conditions—we might therefore
say that they are only random to an observer unaware of those initial
conditions or unaware of how the algorithm has iterated its state since that
point. This deterministic behavior is valuable in a number of fields, as it
makes experiments reproducible. As a result, the parameters that set the
initial state of the generator are usually known as the seed. If we want
reproducible results we should pick an arbitrary seed and remember it to
reproduce the same random sequence later, whereas if we want results that
cannot be easily reproduced, we should select the seed in some inscrutable
(and, ideally, nondeterministic) way, and keep it secret. Knowing the seed, we
can predict the output, but for many generators even without the seed it is
possible to infer the current state of the generator from its output. This
property is trivially true for any generator where its output is its entire
internal state—a strategy used by a number of simple random number generators.
For some other generators, such as the Mersenne Twister [35], we have to go to
a little more trouble and invert its tempering function (which is a bijection;
see Section 5), but nevertheless after only 624 outputs, we will have captured
its entire internal state._

That's a lot of setup for what is frankly a very basic idea. A cryptographer
being verbose in their writing might briefly remind the reader of these
properties with the first sentence, but they'd still likely do that with much
more brevity than this. I understand wanting to make your research accessible,
but for people who understand the field this detracts from getting to the
"meat." It might make it harder to get through, but a 10-30 page result is
preferable to a nearly 60-page one that assumes I know nearly nothing about
the field. If I don't know these details very well, how can I properly assess
the author's results?
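The Mersenne Twister state capture described in the quoted passage really is mechanical; a sketch of the standard MT19937 temper/untemper pair (textbook technique, not code from the paper):

```cpp
#include <cstdint>

// MT19937's output tempering: a bijection on 32-bit words.
uint32_t temper(uint32_t y) {
    y ^= y >> 11;
    y ^= (y << 7) & 0x9d2c5680u;
    y ^= (y << 15) & 0xefc60000u;
    y ^= y >> 18;
    return y;
}

// Invert each tempering step in reverse order.
uint32_t untemper(uint32_t y) {
    y ^= y >> 18;                       // one pass inverts the >>18 step
    y ^= (y << 15) & 0xefc60000u;       // likewise for the <<15 step
    uint32_t x = y;                     // fixed-point iteration for <<7:
    for (int i = 0; i < 5; ++i)         // each pass fixes 7 more low bits
        x = y ^ ((x << 7) & 0x9d2c5680u);
    y = x;
    x = y;                              // and for >>11: 11 high bits per pass
    for (int i = 0; i < 3; ++i)
        x = y ^ (x >> 11);
    return x;
}
```

Applying untemper to 624 consecutive outputs hands an observer the generator's entire internal state, as the quoted passage says.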

2\. The author's _tone_ in her writing is something I take issue with. For
example, passages like this one...

 _Suppose that, excited by the idea of permutation functions, you decide to
always improve the random number generators you use with a multiplicative
step. You turn to L’Ecuyer’s excellent paper [25], and without reading it
closely (who has time to read papers these days!), you grab the last 32-bit
constant he lists, 204209821. You are then surprised to discover that your
“improvement” makes things worse! The problem is that you were using XorShift
32/32, a generator that already includes multiplication by 747796405 as an
improving step. Unfortunately, 204209821 is the multiplicative inverse of
747796405 (mod 2^32), so you have just turned it back into the far-
worse–performing XorShift generator! Oops._

...go a bit beyond levity. If you're trying to establish rigorous definitions
and use cases to distinguish between generators, functions and permutations,
this isn't the way to do it. This isn't appropriate because it doesn't go far
enough to _formalize_ the point. It makes it intuitive, sure, and that's a
great educational tool! But it's a poor scenario to use as the basis for a
problem statement - research is not motivated by the failure of an engineer to
properly read and understand existing primitives, it's motivated by novel
results that exhibit superior qualities over existing primitives.
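Incidentally, the multiplicative-inverse trap in that quoted example is easy to reproduce: every odd 32-bit constant has an inverse mod 2^32, computable by a short Newton iteration (a sketch of the standard trick, not code from the paper):

```cpp
#include <cstdint>

// Inverse of an odd constant mod 2^32 by Newton iteration: each pass
// doubles the number of correct low bits (3 -> 6 -> 12 -> 24 -> 48).
uint32_t inv_mod_2_32(uint32_t a) {
    uint32_t x = a;  // a * a == 1 (mod 8) for odd a, so x starts 3 bits correct
    for (int i = 0; i < 4; ++i)
        x *= 2u - a * x;
    return x;
}
```

Applied to 747796405, it yields the 204209821 that the quoted passage warns about.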

3\. The biggest grievance I have with this paper is the way in which it
analyzes its primitives for cryptographic security. For example, this passage
under 6.2.2 Security Considerations:

 _In addition, most of the PCG variations presented in the next section have
an output function that returns only half as many bits as there are in the
generator state. But the mere use of a 2^(b/2)-to-1 function does not
guarantee that an adversary cannot reconstruct generator state from the
output. For example, Frieze et al. [12] showed that if we simply drop the low-
order bits, it is possible for an adversary to discover what they are. Our
output functions are much more complex than mere bit dropping, however, with
each adding at least some element of additional challenge. In addition, one of
the generators, PCG-XSL-RR (described in Section 6.3.3), is explicitly
designed to make any attempt at state reconstruction especially difficult,
using xor folding to minimize the amount of information about internal state
that leaks out. It should be used when a fast general-purpose generator is
needed but enhanced security would also be desirable. It is also the default
generator for 64-bit output._

That's not a rigorous analysis of a primitive's security. It _is_ an informal
explanation of why the primitive _may_ be secure, but it so high level that
there is no proof based on a significant hardness assumption. Compare this
with Dan Boneh's recent paper, "Constrained Keys for Invertible Pseudorandom
Functions"[2]. Appendices A and B after the list of references occupy nearly
20 pages of theorems used to analyze and prove the security of primitives
explored in the paper under various assumptions.

Novel research exploring functions with (pseudo)random properties is
inherently mathematical; it's absolutely insufficient to use a bunch of
statistical tests, then informally assess the security of a primitive based on
the abbreviated references to one or two papers.

_________

1\. [http://www.pcg-random.org/pdf/hmc-cs-2014-0905.pdf](http://www.pcg-
random.org/pdf/hmc-cs-2014-0905.pdf)

2\.
[https://eprint.iacr.org/2017/477.pdf](https://eprint.iacr.org/2017/477.pdf)

~~~
tptacek
Just to be clear: it's not a cryptography paper, is it? Did you figure out
what journal it was submitted to?

~~~
dsacco
She submitted it to _ACM Transactions on Mathematical Software._ I would
personally consider it a cryptography paper, for three reasons:

1\. She purports to introduce a novel result that bridges "medium-grade"
performance characteristics and security characteristics in one primitive. In
fact, if you look at the PCG Random website (pcg-random.org), she very clearly
compares and emphasizes both performance _and security_ characteristics with
functions like xorshift and ChaCha.

2\. We see cryptography papers submitted to all manner of theoretical CS
conferences and journals, for example _Symposium on the Theory of Computing_ ,
which are not uniformly crypto-focused.

3\. She acknowledges herself that she found it hard to categorize her paper
(it could be relevant for simulation, it could be relevant for stream ciphers,
etc) in a blog post about how she chose the venue: [http://www.pcg-
random.org/posts/history-of-the-pcg-paper.htm...](http://www.pcg-
random.org/posts/history-of-the-pcg-paper.html)

As a meta point I read the whole thing, and I actually think it would be a
nice publishable result if it were, say 10 - 20 pages. But 60 is _wild_! It
took me longer to get through this "accessible" paper than it did for me to
get through any of Boneh's papers on constrained and puncturable pseudorandom
functions!

It's definitely _interesting_ , and sure, why not explore "medium-grade
security" that makes explicit tradeoffs with performance and security. But the
presentation seems like it was written by someone writing for a non-academic
audience, and the content of 6.2.2 "Security Considerations" is really light
on provable security.

~~~
tptacek
If you stripped out all the (weird, random) cryptographic stuff from this
paper, it would read pretty much the same and make pretty much the same
points, which tells me: it's not a cryptographic paper.

~~~
dsacco
That's a fair point; it was mostly the presence of the random crypto that
annoyed me :)

~~~
MiceWithYaffle
O'Neill recently mentioned the crypto aspects of PCG in a comment on another
post by John D. Cook. I'll just quote it below. But it looks to me like she
thought that any analysis she did on the prediction difficulty of PCG wouldn't
be well regarded.

Also, I notice that lots of people seem to think her paper was too long, but
they also claim that it doesn't say enough about their favorite topic. That
seems to be happening with you.

Direct quote from O'Neill's comment here
[https://www.johndcook.com/blog/2017/07/07/testing-the-pcg-
ra...](https://www.johndcook.com/blog/2017/07/07/testing-the-pcg-random-
number-generator/)

John, in your post, you said that PCG has “excellent statistical and
cryptographic properties”. I’ve learned it’s best never to say “cryptographic
security” or “cryptographic properties” when trying to place something on a
spectrum of prediction difficulty. Too many misunderstandings. Saying
“prediction difficulty” still causes some crossed wires, but it’s about as
good as we can do.

Dan & Dmitriy, just to be 100% clear, I HAVE NEVER RECOMMENDED PCG FOR
CRYPTOGRAPHY. I do, however, care about prediction difficulty, and I don’t like
trivially predictable generators. Because my viewpoint is often misunderstood,
I have some more blog posts in the pipeline about these issues, but the key
thing is that general-purpose PRNGs get used for almost anything, from using
the low-order bit to toss a coin, to supporting randomized algorithms. If
someone can predict your generator, they can mount an algorithmic complexity
attack on your randomized algorithm, tanking its performance. If predicting
the generator is more costly than the algorithmic complexity attack itself,
people will try their attacks on easier targets. But if the generator spends
too much time trying hard to be unpredictable, we also tank our performance,
thus we have to strike a balance in a different place than we do for
traditional cryptographic applications. We already live in a world
where hash table implementations in scripting languages need to be hardened
because of algorithmic complexity attacks; this is the next logical step.

All that said, there are members of the pcg family (not pcg32 and especially
not pcg32_fast!) that I personally think would be really challenging to
predict. I also find it frustrating that people don’t compare like with like.
If you want to compare the prediction difficulty of PCG against a
cryptographically secure PRNG, you need to compare a PCG variant at least
broadly similar in size. Say, for example, we wanted to contrast PCG against
the ChaCha PRNG from Orson Peters (perhaps with four rounds rather than the
full 20), that PRNG is 104 bytes in size, so it’s fairest to compare it to
pcg64_c8, which is 80 bytes, or pcg64_c16, which is 144 bytes.

Regarding prediction difficulty, I’m very well aware of Bruce Schneier’s law,
“Anyone, from the most clueless amateur to the best cryptographer, can create
an algorithm that he himself can’t break. It’s not even hard. What is hard is
creating an algorithm that no one else can break, even after years of
analysis. And the only way to prove that is to subject the algorithm to years
of analysis by the best cryptographers around.”, and his subsequent
elaboration “Anyone can invent a security system that he himself cannot break.
I’ve said this so often that Cory Doctorow has named it “Schneier’s Law”: When
someone hands you a security system and says, “I believe this is secure,” the
first thing you have to ask is, “Who the hell are you?” Show me what you’ve
broken to demonstrate that your assertion of the system’s security means
something.” Thus my personal thoughts on how difficult it is mean _NOTHING_ in
a cryptography context. I also can’t expect cryptographers to spend their time
on my education, but if someone out there does know a simple and efficient algorithm
that can reliably break pcg64_c8, I really would love to see how. (Also, hey
Bruce, gender-neutral language, it’s a thing.)

I can do at least one thing that adds a tiny tiny tiny bit of credibility in
the eyes of folks like Bruce Schneier: I can show other PRNGs I have broken
that might have seemed hard to predict to a casual observer. Mostly that won’t
actually help though, because a cryptographer would say “Ha! That’s toddler
level stuff!” and a mathematician might say “I can’t understand why you care
about prediction at all”. Sometimes I feel there should be more people trying
to occupy the middle ground. Meh.

But, let’s be clear, DO NOT USE PCG FOR CRYPTOGRAPHY. DO NOT USE PCG FOR
CRYPTOGRAPHY. DO NOT USE PCG FOR CRYPTOGRAPHY. Clear?

~~~
tptacek
I think we've surrendered some of the "clarity" argument with the sentences in
that paper describing which exact instantiation of the non-cryptographic RNG
to select for "sensitive" applications.

Even here, in this comment, it's really hard to follow what you're saying. For
instance, you've taken the time to compare the "prediction difficulty" of the
PCG PRNG to that of a ChaCha20-based DRBG. But ChaCha20 is a stream cipher, a
cryptographic primitive. To be competitive with it at producing uncorrelated
bits is to be yourself a cryptographic primitive; that is, to argue that an
LCG and a trivial mixer is all we ever needed to encrypt data. That would
be... newsworthy?

Also: if you're making an appeal to the cryptographic literature, there are
better people to cite than Bruce Schneier.

------
fwdpropaganda
Physicist here.

Off-topic

> And it is not even entirely clear what “really random” would mean. It is not
> clear that we live in a randomized universe…

At the quantum level it really is clear that we live in a really random
universe. What's the meaning of really random? The outcome of a quantum
process.

On-topic. Yeah, you have to know your audience. As OP mentions, the fact that
the paper wasn't published doesn't prevent anyone from thinking about it and
even building on it. On the other hand, these scientific publications have
styles and target audiences, and maybe she got rejected not due to lack of
relevance or rigor, but because the paper didn't match the journal's non-
scientific criteria for publication.

~~~
eridius
A quantum process is a random process, but isn't it still an open
philosophical question as to whether the "random process" we observe is truly
random, or is instead governed by deterministic hidden state?

~~~
fwdpropaganda
No, all theories of (local) hidden variables have been experimentally ruled
out.

[https://en.wikipedia.org/wiki/Bell%27s_theorem](https://en.wikipedia.org/wiki/Bell%27s_theorem)

> Bell's theorem states that any physical theory that incorporates local
> realism cannot reproduce all the predictions of quantum mechanical theory.
> Because numerous experiments agree with the predictions of quantum
> mechanical theory, and show differences between correlations that could not
> be explained by local hidden variables, the experimental results have been
> taken by many as refuting the concept of local realism as an explanation of
> the physical phenomena under test. For a hidden variable theory, if Bell's
> conditions are correct, the results that agree with quantum mechanical
> theory appear to indicate superluminal effects, in contradiction to the
> principle of locality.

~~~
eridius
Interesting. But it sounds like non-local hidden variables are still possible.
My assumption is that the existence of non-local hidden variables is non-
falsifiable, which is why I said this was a philosophical question.

