Show HN: Shishua – Fast pseudo-random generator (espadrine.github.io)
93 points by espadrine 5 months ago | 35 comments



The design approach of just iterating until it passes a suite of statistical tests is ... not the best.

The issue is that the common statistical test suites don't include every credible statistical test at their level of complexity.

Essentially, as you retry against the tests you expend their utility, producing a result that is guaranteed to pass them as a byproduct of your process ... but which may in fact be extraordinarily weak against yet another reasonable statistical test (and/or broken in some practical application) that no one has yet thought to include in the suite.


Very true. Sadly, this is the main approach the non-cryptographic PRNG community has at the moment.

On the plus side, those tests are fairly extensive and brutal nowadays. They detect anomalies in cryptographic algorithms that were still used a decade ago, such as arc4random().

But one thing I hint at at the end is that I hope, and anticipate, to see more accurate quality measurement tools in the future.


A partial workaround is to reserve one or two of the tests until your PRNG passes the others.

If it passes the remainder on the first try, you might be on to something.

If it fails, you've got a problem: tweaking until it passes isn't going to help you.


One alternative would be to attempt to select an analyzable structure which you could show through analysis should have good randomness properties. Potentially add a limited number of tweaks or parameters with the aid of just one or two general statistical tests (and perhaps a collection of pants-on-head stupid tests just to avoid wasting time on stuff that is utterly broken).

After that, you try the complete suite, and unless the result meets a predetermined passing criterion you declare the design flawed and go back to step 1 rather than continuing to tweak (as the result of continued tweaking will inevitably be a pass even if the design is poor).

Unfortunately, high performance and even 'good random-like behavior' may be at odds with having a construct which is transparent enough to be able to justify its performance from analysis rather than running it. :(


I like your design, simple and elegant.

One thing I did when designing random functions was to run them against DieHarder and PractRand, but then also against SMHasher; you just have to use some construction to fold the key into the state. I think that's a useful, somewhat orthogonal way to test. Have you considered this?
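For concreteness, one possible folding construction (just a sketch: the mixer is splitmix64's finalizer, and the function name and shape are made up, not taken from SMHasher or SHISHUA):

```c
#include <stdint.h>
#include <stddef.h>

// splitmix64's finalizer: a bijective 64-bit mixer.
static uint64_t mix64(uint64_t z) {
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

// Fold a key into the state byte by byte, mixing after each absorbed
// byte, then squeeze one output. This turns a PRNG-style state
// transform into a keyed hash that SMHasher-style suites can exercise.
uint64_t prng_as_hash(const uint8_t *key, size_t len, uint64_t seed) {
    uint64_t s = seed;
    for (size_t i = 0; i < len; i++)
        s = mix64(s ^ key[i]);       // absorb one byte, then mix
    return mix64(s ^ (uint64_t)len); // length-tag the final squeeze
}
```

To test a real PRNG this way you would substitute its state transform for `mix64`.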


On the other hand, some tests could be aligned enough with the needs of a specific application and prove that a PRNG is good, not only probably good.

For example, if a game uses a PRNG to "stretch" a random seed for procedural generation purposes, conditional probabilities are important (to avoid spurious correlations between features) but period is completely irrelevant (even a relatively short period would be many orders of magnitude larger than the output size).
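One way that stretching can look in practice: derive each feature directly from the seed and the feature's coordinates with a stateless mixer, so correlations depend only on the mixer's quality and the period never enters the picture. The mixer below is splitmix64's finalizer; `tile_value` is a hypothetical name for illustration:

```c
#include <stdint.h>

// Stateless mixer (splitmix64's finalizer). With no sequential state,
// each output depends only on (seed, position), so distinct features
// can't correlate through a shared stream.
static uint64_t mix64(uint64_t z) {
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

// Hypothetical per-tile value for procedural generation.
uint64_t tile_value(uint64_t world_seed, uint32_t x, uint32_t y) {
    return mix64(world_seed ^ ((uint64_t)x << 32 | y));
}
```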


Yes, this seems like a relatively fundamental flaw. Even if the confidence of these statistical tests is something like p=0.000005, throwing enough experiments onto them almost ensures you get a false positive.


Why can't we check for overfitting using the usual ML techniques?


We know the generators are overfit to the tests. That's not really a problem we can worry about too much; the tests can't be both perfectly comprehensive and finite. The problem is that we're trying to use the model to tell us something about adversarial input.


I've always wondered, why not just run 10 different random number generators at once and ping pong between them or blend them or something?


That gets you the weaknesses of all of them modulo your selection criterion ... some of the time.


Depends on which method is utilized. “Ping ponging” would have issues as you’d run into all the problems of all of them.

Depending on how the "blending" is done and whether the algorithms are truly independent of each other, combining them can be at least as secure as the strongest of them individually. The Linux kernel lets you do this by writing to /dev/random. The written bytes get added (in part) to the entropy pool.
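To illustrate the blending idea with toy (non-cryptographic) generators: XOR the outputs of two streams with independent state. Assuming the states really are independent, the XOR is at least as uniform as either input alone:

```c
#include <stdint.h>

// Two toy generators with independent state: xorshift64 and a 64-bit LCG.
static uint64_t xorshift64(uint64_t *s) {
    uint64_t x = *s;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *s = x;
}

static uint64_t lcg64(uint64_t *s) {
    return *s = *s * 6364136223846793005ULL + 1442695040888963407ULL;
}

// XOR-blend: if either stream were uniform and independent of the
// other, the combined stream would be uniform too.
static uint64_t blended(uint64_t *s1, uint64_t *s2) {
    return xorshift64(s1) ^ lcg64(s2);
}
```

The "ping pong" variant (alternating outputs) does not get this property: each output inherits the weaknesses of whichever generator produced it.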


"Blending" multiple of them per iteration slows you down!


You can do that, but you will lose the "speed" part, obviously.


> Indeed, AVX2 has a bit of a quirk where regular registers (%rax and the like) cannot directly be transfered to the SIMD ones with a MOV; it must go through RAM (typically the stack), which costs both latency and two CPU instructions (MOV to the stack, VMOV from the stack).

I don't think this is quite correct. You can move efficiently from r64 to XMM/YMM without going through memory with MOVQ/_mm_cvtsi64_si128. I'm not able to look into it more closely right now, but these links should give insight:

https://www.felixcloutier.com/x86/movd:movq

https://software.intel.com/sites/landingpage/IntrinsicsGuide...

My vague recollection is that you might be right that this is clumsy with AVX2. Maybe it's a case where you have to take advantage of the fact that XMM1 and YMM1 are same register, and cast the YMM to XMM? Or just drop to inline assembly.

But thanks for the writeup, and I hope to be able to look at Shishua more closely at some point in the future.


Yes, most instructions that modify only the bottom element(s) didn't get a ymm (256-bit) version since it would serve no purpose as it would produce the same result as the xmm one, and the corresponding intrinsics mostly follow the same pattern. So there is no int64 -> ymm intrinsic.

An intrinsic cast works fine though:

https://godbolt.org/z/M9XWCb

Intel even says about the cast:

> Cast vector of type __m128i to type __m256i; the upper 128 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

That seems a bit beyond their mandate since what the compilers generate is mostly up to them, and in fact it doesn't seem true: at -O0, both gcc and clang generate a few extra instructions for the cast. With optimization on, it's all good though.


If you need something fast and with stronger security guarantees, Google Randen remains a solid choice. https://github.com/google/randen


The article mentions Randen is slower than ChaCha8.


Where? CTRL-F Randen doesn't show anything, and the Randen repo claims it's faster than ChaCha8.


It is not directly in the article, but in a link to a tweet by djb, the creator of ChaCha8. He believes that the cpb listed in the Randen comparison is off:

https://twitter.com/hashbreaker/status/1023965175219728386

He mentions that perhaps the implementation of ChaCha8 for the benchmark is done by hand and unoptimized. And it is true from what I saw that a lot of benchmarks with ChaCha8 are implemented with none of the tweaks that make it fast.

In this instance, it looks like the Randen author didn’t reimplement it from scratch, but they used an SSE implementation, not an AVX2 one, which would have been faster: https://github.com/google/randen/blob/1365a91bafc04ba491ce79...


This looked ugly and so much out of place...

> Or, as Pearson puts it:

> From this it will be more than ever evident how little chance had to do with the results of the Monte Carlo roulette in July 1892.

> (Not sure why his academic paper suddenly becomes so specific; maybe he had a gambling problem on top of being a well-known racist.)


I found it quite funny. And I don't really find calling a well-known eugenics advocate racist to be even remotely provoking. Maybe out of style compared to the rest of the text, but apart from that I didn't find it conspicuous in the least.


> One of the not-ideal aspects of the design is that SHISHUA is not reversible.

Why is irreversibility a bad thing? Is it just the loss of useful state which reduces the internal entropy?

Is irreversibility ever useful? Wouldn't some amount of discarded bits make it harder for an adversary to infer the state?


Intuitively, yes.

If something is irreversible then it's not 1 to 1 and you don't use the whole state space, so you'll likely have short cycles.

Second part, again yes. But you can add that useful irreversible transform in the finalization of the output, and not in the state transform itself.

I'm self taught so my understanding and terminology may not mesh with the people who have been taught this.


CSPRNGs are mentioned a few times in TFA, but cryptographic security doesn't appear to be among the design criteria for SHISHUA. There's no mention of linear or differential cryptanalysis, which is table stakes for a CSPRNG.


I added a warning in the GitHub’s Readme. SHISHUA is not to be used for cryptographic purposes.

You mentioned the lack of cryptanalysis; there is also the lack of rounds (a round structure lets researchers break reduced-round versions to ease their study, and allows setting a security margin).


What is a real world use case where CSPRNGs are too slow on modern hardware?


Shuffling 100 million points per second (probably much more if it weren't both disk-IO and CPU->GPU transfer limited), and shuffling in general if you want to maintain high throughput. e.g.: https://bit.ly/2zcUzJq
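For a sense of why the PRNG dominates here: a Fisher-Yates shuffle spends one PRNG call per element, so shuffle throughput is roughly bounded by generator throughput. A sketch with a toy generator standing in for the fast PRNG (the usual modulo bias is left in for brevity):

```c
#include <stdint.h>
#include <stddef.h>

// Toy generator standing in for a fast PRNG (not SHISHUA itself).
static uint64_t xorshift64(uint64_t *s) {
    uint64_t x = *s;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *s = x;
}

// Fisher-Yates shuffle: exactly one PRNG call per element swapped,
// so at 100M elements/s the generator is in the hot loop.
void shuffle(uint32_t *a, size_t n, uint64_t seed) {
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)(xorshift64(&seed) % (i + 1)); // biased, fine for a sketch
        uint32_t t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}
```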


Monte Carlo simulations can have a significant speed dependency on the PRNG used.

That said, I'm not up to date on the speed of CSPRNGs so maybe they're already there.


This is not a bad thing, and maybe I'm alone in this, but I spent my first minute thinking over whether "Shishua" was some kind of romanized mandarin or mandarin-esque word!

(I speak cantonese, though, so maybe mandarin speakers wouldn't fall in this trap)


It's a "stone brush" 石刷 shíshuā. (Alternatively "It's a number!" 是数啊 shì shù a, but then pinyin orthography requires writing it as "Shishu'a" with an apostrophe in front of the third syllable.)


Will everyone soon be using Shishua then? (I don't know much about PRNGs.)


Good question.

SHISHUA is a strong design, suitable for many use-cases. But the decision of which PRNG is right for you depends on platform, context, comfort, and ease of access.

If you are making a simple video game, it might be fine to just use your standard library; or to write one of the simple ones that you memorized and that don’t have too terrible an output.

If your video game has higher stakes, for instance if it is the backbone of a virtual economy in a massively multiplayer game whose servers consume a large amount of randomness that must be high-quality so that players can't find flaws and abuse the system for their own gain, maybe SHISHUA is right for you.

If you are targeting a platform that does not support 64-bit computation, SHISHUA can’t do it well, and other PRNGs such as sfc32 can be good there.

If you are working on machine learning that feeds on heaps of randomness over months of computation, and that needs quality to ensure unbiased, even learning, SHISHUA can be right for you.

There was a neat article recently that contrasted a large number of PRNGs for use on video games, in particular on consoles: https://rhet.dev/wheel/rng-battle-royale-47-prngs-9-consoles...

It predates the publication of SHISHUA, but the insights there are interesting.


I couldn't find the license. Is this MIT?


CC0 it is! I added the legal information in the GitHub project.



