
Romu – Fine random number generators - nkurz
http://www.romu-random.org
======
ChrisLomont
I'd stay away from this work. There are numerous math and reasoning errors in
the paper, and the claims are dubious.

One prime example: the Romu algorithm is of the form new_state64 =
ROTATE_LEFT(old_state64, 32) * constant, which defines a permutation of the
space {1,2,...,2^64-1}, and any permutation decomposes into cycles. From a
given start seed, your cycle may be short, and you won't know this until it
bites you.
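To make the form concrete, here's a sketch of that update in C. The constant below is an arbitrary odd value chosen for illustration, not one of the paper's actual constants:

```c
#include <stdint.h>

/* Sketch of the update form described above: rotate the 64-bit state left
 * by 32, then multiply by an odd constant. The constant is an arbitrary odd
 * value for illustration, not one from the paper. Because the multiplier is
 * odd, each step is a bijection on 64-bit values, so the whole map is one
 * fixed permutation (with 0 as a fixed point, hence a permutation of the
 * nonzero states). */
static uint64_t state = 0x123456789abcdefULL;  /* any nonzero seed */

static inline uint64_t rotl64(uint64_t x, int k) {
    return (x << k) | (x >> (64 - k));  /* caller must pass 0 < k < 64 */
}

static inline uint64_t romu_style_next(void) {
    state = rotl64(state, 32) * 0x9e3779b97f4a7c15ULL;  /* odd constant */
    return state;
}
```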

The author incorrectly addresses this using the theory of _random_
permutations, but his is most definitely NOT a random permutation chosen
uniformly from S_n. His analysis thus does not follow. The proofs he cites are
not applicable to his case, since he has a fixed permutation.

Among all cycles, among _all_ permutations, the results hold. They do not hold
for a fixed permutation. For example his results fail for the n-cycle and they
fail for the identity permutation. They similarly fail for many, many other
permutations, and he has no proof that his is not a badly chosen one. Based on
the structure of it (rotate and mult), I suspect from past experience that his
permutation has lots of structure that leads to all sorts of pathologies for
this use. This is a basic math error, and the paper is full of such basic
errors.

As for why you should listen (somewhat) to me, check out my name - I've done a
bit of writing (and a lot of teaching) on PRNGs over the decades. I've
followed the field for quite some time, and am well aware of much of the
research in it.

With all the good choices out there, there is no need to choose a PRNG that
cannot provide a solid proof of cycle length (or at least correct bounds) for
every seed value. This one fails this basic test.

If I get time (and remember) later: I've written some SAT-solver code to
explicitly find holes in PRNGs, and I'll try to dig it out and analyze the
cycles in this fixed permutation. If anyone else wants to try it, the idea is
that you assign variables to bit positions in the state, represent the
permutation as boolean expressions in those bits, and find cycles using SAT
solvers without having to run through steps incrementally. I suspect in this
case it will break this PRNG pretty quickly.

~~~
MarkOverton
Suppose you generated two permutations, one using true random numbers and the
other using pseudo-random numbers. Now you have two _fixed_ permutations. The
cycles in both will obey the probabilities about such cycles, _unless_ the
pseudo-random generator was poor. But Romu generators pass PractRand, so their
high quality numbers and cycles will obey those probabilities. But you brought
up a good point: Each kind of Romu generator relies on _one_ permutation and
thus one set of cycles. The paper needs to discuss that fact. Thanks for
pointing that out.

Important request: Please post every specific error in reasoning or math you
saw in the paper. You said there are several. I expect that you are _not_ like
many internet posters who reply with vagueness or discouragement when pressed
for specifics. I need the specific errors so I can fix all of them before
submitting the paper to a peer-reviewed journal.

Also, a SAT solver that finds cycles in these generators would be very
helpful!

~~~
ChrisLomont
You cannot fix this error. Your proofs do not apply to such a limited set of
random permutations. You have to prove the cycle structure for the specific
ones you chose, no matter how you chose them; otherwise you are playing with
a time bomb. As such, there'd be no reason to choose your method over the many
other PRNGs that do provide good cycle-length guarantees and equidistribution
metrics at the same CPU speeds.

In fact, I think your method is a weaker subset of PCG generators, which are
as fast and have all the good theoretical guarantees.

What do you mean by “true random numbers”?

There are (2^64)! permutations of 64-bit states. All Romu permutations are a
vanishingly small subset of these, with very specific structure. How can you
prove that structure doesn't introduce all sorts of bad cases?

Don’t worry about all the errors. I doubt this one is fixable.

~~~
MarkOverton
> You cannot fix this error. Your proofs do not apply to such limited set of
> random permutations.

Incorrect. You are misunderstanding what a probability says. Suppose a ball
has a 1/3 chance of landing in bin 1 and a 2/3 chance of bin 2. You throw one
ball. But you did not see which bin it landed in. What do you know? You know
it has a 1/3 and 2/3 chance of being in bin 1 or 2. Those probabilities apply
to just _one_ throw. A Romu generator uses one permutation. But you don't know
the cycle-lengths. What do you know? You know the probabilities of various
lengths. Those probabilities apply to just _one_ permutation, just as they do
for one ball-toss.

Bob Jenkins created the JSF generator
([http://burtleburtle.net/bob/rand/smallprng.html](http://burtleburtle.net/bob/rand/smallprng.html)),
which is similar to Romu except it uses additions instead of a multiplication.
He wrote an article on designing PRNGs
([http://burtleburtle.net/bob/rand/talksmall.html#reversible](http://burtleburtle.net/bob/rand/talksmall.html#reversible))
in which he explains theory briefly and clearly, so this article might help
you.
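For reference, here is a sketch of JSF's 32-bit core, written from memory of Jenkins' smallprng page (the rotation constants and seeding procedure should be double-checked against the original before serious use). Note the mix of subtraction, XOR, addition, and rotates, with no multiplication:

```c
#include <stdint.h>

/* Sketch of Bob Jenkins' small noncryptographic PRNG ("JSF"), the
 * two-rotate 32-bit version, from memory of his smallprng page --
 * verify the constants (27, 17) against the original. */
typedef struct { uint32_t a, b, c, d; } jsf_ctx;

static inline uint32_t rot32(uint32_t x, int k) {
    return (x << k) | (x >> (32 - k));
}

static uint32_t jsf_next(jsf_ctx *x) {
    uint32_t e = x->a - rot32(x->b, 27);
    x->a = x->b ^ rot32(x->c, 17);   /* XOR in the middle of adds/rotates */
    x->b = x->c + x->d;
    x->c = x->d + e;
    x->d = e + x->a;
    return x->d;
}

static void jsf_seed(jsf_ctx *x, uint32_t seed) {
    x->a = 0xf1ea5eed;
    x->b = x->c = x->d = seed;
    for (int i = 0; i < 20; i++) (void)jsf_next(x);  /* warm-up rounds */
}
```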

> [(2^64)! possible permutations] All Romu permutations are a vanishingly
> small subset of these

Yes. In fact, only one permutation is used. But that's irrelevant because the
cycles are what matter. And I showed above that the cycles obey the equations
for their probable lengths.

> with very specific structure.

Incorrect. Page 7 of the paper contains graphs of _all_ the cycles in several
generators with 32 bits of state, with measurements of their randomness. The
cycles performed nearly ideally -- the 45-degree line in the graph. So their
structure is not "very specific" (nonrandom); rather, they pass tough tests of
randomness.

> Don’t worry about all the errors. I doubt this one is fixable.

There is no error. Again, read Bob Jenkins' article for a clear discussion of
cycle lengths.

Let's not discuss the topic of permutations and cycle lengths any further; it
won't be productive.

Instead: Please post your list of perceived errors in the paper, and let's
talk about those. It'll only take a few minutes to write, and will benefit
many people because I will clarify unclear parts of the paper.

~~~
ChrisLomont
> You are misunderstanding what a probability says

I have a PhD in math. I've worked on this stuff for decades. Among other work,
I wrote a decently cited article on the Hidden Subgroup Problem for quantum
computation on arxiv, and this problem requires great understanding of the
group structure of the permutation groups as well as a solid grounding in
probability. It's not likely I am the one who misunderstands probability (or,
as I will show you, permutations).

> A Romu generator uses one permutation. But you don't know the cycle-lengths.
> What do you know? You know the probabilities of various lengths.

Take your page 2 generator and switch to a 32-bit state so you can empirically
check everything with code. The largest cycle length is 1499760802, or 34% of
the state space. Check each cycle. The results contradict your "proof" result
(2), that P(|cycle containing x| <= 2^k) = 2^k/(2^s - 1). You simply don't get
the bounds you claim.

To aid your checking, the next two largest cycles are 1137668726 (26%) and
731971696 (17%). They fall off pretty quickly after that.

Maybe we were just unlucky, and the original constant caused the failure. Pick
another multiplicand. Whoops. Same result. Try again. Same result. In fact,
you will fail to meet the bounds of your "proof" for any multiplicand value.

So, go ahead and tell me how to use a 32 bit state version of the page 2
generator for which we can compute exact cycle lengths that also meets your
bounds. I'll wait.....

Thus your "proof" fails to give correct bounds, for precisely the reasons I
stated earlier. And thus your table 2 is incorrect - you don't understand the
structure of your PRNG, or that it does not satisfy the theorems you are
applying to analyze it. Otherwise the "proof" would be a proof. Math doesn't
lie.

And before you next make the error of claiming that quality increases with bit
size: that is usually false. So many people have fallen into this trap in both
PRNG creation and crypto. When each choice in your "proofs" is smaller than
you claim, these deficits often accumulate faster than the space increases,
and you can have catastrophic failure in behavior.

Another way to see it, using your page 2 generator again as an example (the
others have the same flaws): take the state and multiply by your odd constant;
the last bit is still the same. Rotate by half the bits. Now a middle bit of
the next state is _always_ the same as the last bit of the previous state.
This is not random, and your proof requires that each choice in it is allowed
to choose from the entire remaining set of possible state values. But we just
cut that in half. This problem happens at every bit - there are too many
relations between them, due to the simplicity of your method, to use that
proof.
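This correlation is easy to verify in code. A minimal check (the odd constant below is arbitrary, not one of the paper's):

```c
#include <stdint.h>

/* Check of the bit-correlation argument above: multiplying by any odd
 * constant preserves bit 0, and rotating left by 32 then moves that bit
 * to position 32, so bit 32 of each new state always equals bit 0 of the
 * previous state. The constant is an arbitrary odd value. */
static inline uint64_t rotl64(uint64_t x, int k) {
    return (x << k) | (x >> (64 - k));
}

static inline uint64_t step(uint64_t x) {
    return rotl64(x * 0xD1342543DE82EF95ULL, 32);  /* mult by odd, then rotate */
}

/* Returns 1 if the correlation held on every step, 0 if it ever broke. */
static int correlation_holds(uint64_t x, int iters) {
    for (int i = 0; i < iters; i++) {
        uint64_t next = step(x);
        if (((next >> 32) & 1u) != (x & 1u)) return 0;
        x = next;
    }
    return 1;
}
```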

Another way to see it: take 64-bit numbers and consider cycles from some
permutation. There are (2^64)! such permutations, a number which has ~10^20
digits. It's an astounding size. The proofs apply when you choose your
permutation _uniformly_ from this space. The word "uniformly" has a precise
mathematical meaning, and you come nowhere near satisfying it. If you don't
have a clear understanding of what the word means and why you don't meet this
requirement, then you need to learn what it means. Your simple Romu has a
64-bit constant (2^64 choices) and a rotation (a generous 64 choices). The
number of such choices has about 21 digits, off by around 20 orders of
magnitude. You are choosing from an infinitesimally tiny, well-structured
subset of all the permutations. The theorems do not apply. The
"well-structured" part will kill you in practice, just like here.

Another way to see it: your "proof" doesn't mention your specific choices, so
your "proof" should hold for any constant. But it doesn't, which is easy to
check.

Hopefully that's enough different directions on why you cannot conclude what
you did that at least one of them lands. Of course, the code example shows it
without any question.

Finally, your entire method is a weaker version of PCG. Your first step is an
MCG, which the PCG paper uses in the same manner; then you do a fixed
rotation, whereas they use a varying rotation. You cannot prove anything, like
cycle lengths or any equidistribution, but they, using the superior structure
of PCG, can prove both.

You claim yours is faster, but you don't compare against the best PCG methods
- choice or accident? Absent timings, PCG is likely as fast as yours, with
RNGs of demonstrable quality.

And here's the kicker - the main reason people want PRNGs this fast is things
like Monte Carlo, where large cycles and k-equidistribution are of serious
concern, and you cannot provide one and make demonstrably incorrect claims
about the other.

> Bob Jenkins ... which is similar to Romu except it uses additions instead of
> a multiplication

Bob does not make the errors you make; he carefully avoids them. His uses an
XOR in the _middle_ of some add, subtract, and rotates. This is fundamentally
different. He even states this in the description you recommended I read: "For
example, + is linear mod 2^32 and XOR is linear in GF(2^32), but the
combination of + and XOR isn't linear. You combine + and XOR, and linearity
goes away."

You should read and understand it too. That you see a plus in his and claim it
as a defense of yours, without understanding what he did and why, doesn't
create much faith in your analysis.

Your sequence-overlap "proofs" suffer from the same errors - and you can check
them with the 32-bit versions of the code, where you can empirically compute
all cycles and check the odds. Part of this follows from (2) being wrong, as
demonstrated empirically with code, and being used as a basis in section 3.2.

Without proper proofs I'd recommend no one use this for serious work. There is
no real benefit to using it and so many wrong things with it as demonstrated.

> Let's not discuss the topic of permutations and cycle lengths any further;
> it won't be productive.

Agreed. Maybe you can either provide a 32-bit Romu that meets your claimed
bounds in (2), or replace your "proofs" with correct ones that agree with
exhaustive experiment; then we have progress.

~~~
MarkOverton
I have given this question (and my paper) to a PhD who works heavily with
probability and has a track-record of accomplishments. Also, I have another
idea on empirical verification. So hang on...

------
tgsovlerkhgsel
You can easily get over 1 GB/s of cryptographic randomness with modern
CSPRNGs, for example an AES-NI accelerated CTR_DRBG (basically encrypting
zeros with AES-128 in CTR mode).

I don't think there is any reason to use non-cryptographic PRNGs. (CTR_DRBG is
deterministic if seeded with a fixed seed, so that's not an excuse either).

~~~
pps43
In many Monte Carlo algorithms, random number generation is responsible for a
significant chunk of total runtime. Modern PRNGs like Xoshiro256+ implemented
with AVX are much faster than any CSPRNG.

~~~
seanhunter
Yes indeed. In fact people often use things like Sobol Sequences

[https://en.wikipedia.org/wiki/Sobol_sequence](https://en.wikipedia.org/wiki/Sobol_sequence)

~~~
pps43
Not so much because they are fast, though, but because they produce more
evenly distributed (less clumpy) numbers.

~~~
seanhunter
Yes, but this (sometimes/often) means you actually need fewer paths in your MC
simulation, and it therefore makes the simulation faster overall. Obviously it
depends on what you're trying to simulate, what the underlying stochastic
processes are, etc.

I guess my point was there are often legitimate uses for things that are
random-like but far from crypto random.

------
dchest

      // Romu generators, by Mark Overton, 2020-2-7.
      //
      // This code is not copyrighted and comes with no warranty of any kind, so it is as-is.
      // You are free to modify and/or distribute it as you wish. You are only required to give
      // credit where credit is due by:
      // (1) not renaming a generator having an unmodified algorithm and constants;
      // (2) prefixing the name of a generator having a modified algorithm or constants with "Romu";
      // (3) attributing the original invention to Mark Overton.
    

So, is it copyrighted or not? If not, then you can't require credit as
described. Also, it's the first time I've seen a license that dictates how I
should name my variables.

~~~
scrollaway
Schroedinger's license. It's both copyrighted and not, and you won't know
until a court decides.

The more likely outcome, though, is that it's copyrighted, because things just
_are_ copyrighted unless they very clearly state otherwise. CC0 needs three
pages to explain just how "not copyrighted" the associated work is.

~~~
seemslegit
"This code is not copyrighted" sure _sounds_ like it is clearly stating
otherwise

~~~
moreira
But then the author makes demands, which they could only make if it -was-
copyrighted. So it does seem like it’d be up to a court to decide.

------
clarry
Interesting, but I can't think of any situations where I'd take chances with
probabilistic periods in exchange for a slight speed boost over something with
a known period and good output (e.g. a large enough MCG/LCG where you drop the
low bits on output, or PCG -- these are all plenty fast and can pass the same
tests).

~~~
FabHK
For Monte Carlo, e.g. in finance, you'd definitely take the speed improvement.
This thing passes BigCrush and PractRand - it's going to compute the
approximate value of your three-factor Power Reverse Dual note just fine.

~~~
clarry
> This thing passes BigCrush and PractRand

... for the seeds that were tested.

In theory, you can get unlucky and end up in a short cycle, and it will not
pass these tests.

That's why I'm not comfortable with it, when there are generators with a
known, guaranteed period. They simply do not have this failure mode.

~~~
MarkOverton
That's why I pointed out that the probability of a too-short cycle or
sequence-overlap is no higher than that of randomly selecting one snowflake
out of all snowflakes in Earth's history. Also, the paper has a graph and
accompanying discussion of what happens with the shorter cycles.

------
beefhash
This looks pretty interesting and may well be an improvement over PCG[1].

For non-simulation purposes, however, using a fast-key-erasure[2] ChaCha20 or
AES-256-CTR RNG has served me well enough, and doesn't leave me worrying if
maybe someone could abuse the RNG for whatever purposes (e.g. cheating in a
game by using RNG manipulation).

[1] [https://www.pcg-random.org/](https://www.pcg-random.org/)

[2]
[https://blog.cr.yp.to/20170723-random.html](https://blog.cr.yp.to/20170723-random.html)

~~~
SeanLuke
> This looks pretty interesting and may well be an improvement over PCG[1].

Like this website, the PCG website also makes rather bold claims and has no
peer review to back it up. What's the deal with this?

~~~
FabHK
Good question.

I wish, as in the romantic ideal of science, a consensus would emerge on which
PRNGs are "the best" and how to choose among them. Instead, you seem to have
many academics promoting their own (Vigna with the xoroshiro family, O'Neill
with PCG, now this guy Overton with Romu).

And in effect every programming language has to go and make sensible default
choices, while the experts in PRNGs refuse to make unambiguous
recommendations.

(FWIW, I’d advocate to have, by default, a cryptographically secure PRNG, and
offer a choice of faster ones (like the three mentioned above) that have to be
chosen explicitly, maybe naming them InsecureXYZ, for applications where speed
is paramount.)

~~~
clarry
I doubt there's ever going to be consensus. That's because requirements are
generally ambiguous, and you make tradeoffs based on them.

~~~
FabHK
Agreed, but even a simple flowchart/decision diagram is not given (if you
require this, choose that). At its most basic, we have cryptographically
secure & unpredictable vs. fast: those two should be offered in the standard
library of any language, IMHO - not the Mersenne Twister.

------
SeanLuke
No peer review.

There are many bold claims on this website. But you don't screw around with
RNGs: they need to be right. Until this author's claims have been verified, I
would not use this algorithm.

~~~
cipher_314159
Right?

> In effect, Romu generators are infinitely fast when inlined.

What the hell does "infinitely fast" even MEAN in this context? They can be
parallelized easily? The computational cost amortizes well? That it literally
requires zero operations to generate output?

This may be something interesting if it packs a lot of the features claimed.
But there are a LOT of ill-defined claims on the website, and a fair bit of
time spent pointing to the state of current research in OTHER
algorithms/proposals as if open questions are an automatic disqualification
("[H]ow can you know whether such a generator has enough capacity for your
large job? You don’t know.").

I don't think the guy is a crank, per se -- just excited. But this is the sort
of stuff that makes me think he hasn't done the requisite research before
claiming a breakthrough.

~~~
clarry
> What the hell does "infinitely fast" even MEAN in this context?

It's nonsense, but their argument seems to be that instruction level
parallelism allows the RNG's user to keep executing simultaneously while the
RNG pre-computes its next output. So the application doesn't have to delay
execution to wait for the RNG to spit something out.

~~~
blattimwind
Well, that clearly assumes that the user code is not utilizing most of the
issue bandwidth, or rather that the unutilized issue bandwidth is sufficient
to run the RNG _and_ that there are enough execution units left over to
actually run it. In that case you do indeed get operations "for free" (in
terms of wall-clock time), because your extra operations are exactly contained
within the increased IPC.

It doesn't mean that it is infinitely fast. It does, however, mean that in
this particular case adding random number generation adds no latency; an
infinitely fast algorithm would also add no latency for random number
generation. That doesn't mean one implies the other.

~~~
MarkOverton
Yes, the output latency is zero clock cycles. The paper contains ILP tables
that detail what happens in each clock cycle. The generator computes its next
output while the application is running. But you are correct in that the
statement (and ILP table) assumes that the application is not using all
available issue-slots.

------
pps43
What's the advantage compared to this:

    static uint128_t state = 1;  // can be seeded to any odd number

    static inline uint64_t next() {
        return (state *= 0xda942042e4dd58b5ULL) >> 64;
    }

from [http://www.pcg-random.org/posts/on-vignas-pcg-critique.html](http://www.pcg-random.org/posts/on-vignas-pcg-critique.html)?

~~~
dependenttypes
I am not aware of any architecture which allows for 128-bit multiplication.

~~~
shakna
SSE2 introduced native 128-bit types (__m128i), and GCC and Clang both have
their own implementations that may be emulated in software or SSE
instructions, depending on support.

~~~
wahern
Are you sure? AFAIU, __m128i isn't for 128-bit arithmetic, it's simply a
convenience type for packing 8-bit, 32-bit, or 64-bit values for SIMD
operations.

GCC and clang do have a 128-bit integral type, __int128, but arithmetic
operations are synthesized inline, and judging by the generated assembly don't
even make use of SIMD at all.

Note that an unintended consequence of C99's uintmax_t made the introduction
of a standard 128-bit integral type impossible on platforms that cared about
ABI compatibility, an effect somewhat contradictory to the original intention
of improving portability and code integration. A future C standard will loosen
some language and introduce some new facilities to help promote the
introduction of larger integral types. At that point Unix platforms may begin
typedef'ing __int128 to int128_t.

------
bhickey
There aren't many compelling reasons to use insecure RNGs. In very few
applications is the RNG your bottleneck. In many applications there are
negative consequences to your RNG being compromised.

~~~
SeanLuke
> In many applications there are negative consequences to your RNG being
> compromised.

Witness the security researcher's mindset. RNGs aren't just used in security
applications.

A large chunk of RNG needs are in simulation. Simulation has no need for
security guarantees. It _does_ need to be as fast as possible. A great many
secure generators are pokey.

~~~
RL_Quine
There's always the danger that people misuse RNG output, unfortunately; see
math.random() being used for cryptography almost incessantly.

~~~
GTP
Yes, but I don't see this as a good reason to drop all fast but not
cryptographically secure PRNGs, from which simulations can gain an advantage.
Every existing tool can be misused; not providing the tool at all is an
extreme measure that in this specific case I think wouldn't even solve the
problem: if a programmer makes the mistake of using a non-crypto-secure PRNG
for cryptography, it is very likely that they will also make other mistakes,
like reusing nonces, leaking memory, or other less trivial errors. What I
think is important is that whenever somebody proposes a new PRNG, they clearly
state whether or not it's cryptographically secure.

------
edflsafoiewq
re the ILP argument ("app continues running with no delay because it already
has the random number it needs"): can't any RNG take advantage of that by
caching the next value to return? I.e., turning this

    state_t s;
    uint rng_next() {
        X;
        return Y;
    }

into

    state_t s;
    uint next;
    uint rng_next() {
        uint n = next;
        X;
        next = Y;
        return n;
    }

~~~
MarkOverton
I mention this idea in the paper (which is linked at the top of the website).
The problem is that this technique increases register pressure when the
generator is inlined, which in turn increases spills, which reduces
performance.

------
nn3
Nice. RNGs are getting better and better.

Only minor nit: it would be nice if they had ready-to-compile libraries. The
ROTL macro is missing, and useful wrappers are missing too (like correct
limiting to a smaller range). Of course it's trivial to implement, and I like
the simplicity of the algorithm, which makes it easy to copy into programs.
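For anyone copying the code, one common way to fill in the missing ROTL (the function name matches the macro the code expects; GCC and Clang recognize this pattern and compile it to a single rotate instruction):

```c
#include <stdint.h>

/* Rotate-left on 64 bits. The (k & 63) masking keeps both shifts in
 * range, avoiding undefined behavior for k == 0 or k == 64. */
static inline uint64_t ROTL(uint64_t x, unsigned k) {
    return (x << (k & 63)) | (x >> ((64 - k) & 63));
}
```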

~~~
pps43
If you need the speed of this PRNG, you probably don't want the overhead of
calling a library function. Instead you would implement it in place with
AVX-512 or equivalent for the platform you're running on.

~~~
nn3
Function calls are very cheap on modern CPUs. The same ILP argument the author
makes applies in most cases.

And both vectorization and inlining work fine across a function call with
modern toolchains that do LTO.

------
ohazi
For a second I thought this was a "random number generator in your USB port!"
ala TOMU/FOMU...

------
rurban
the wyhash rnd uses the similar trick. LCG are only for old 32 compact, on
64bit larger state and hash-like mixers are needed/much better. Just look at
the graphics.

------
Jahak
Interesting

