Hacker News new | past | comments | ask | show | jobs | submit login

PRNGs are one of those things that need to have global state if you want to get decent statistics out of them. You cannot have sets of threads fetching the same value from the PRNG - it will totally destroy how random the PRNG is. Thread-local PRNGs with separate seeds is the way to go if you need parallelism, and it was the author's solution, but I can easily see not knowing that and running into this.



You can't seed each thread with the same value -- that's all you're saying, right?


Result distribution (like averaging to zero) only happens at scale. Even if you use a different seed every time you might not see the proper distribution when looking across different generator instances.


If you're running into contention with multiple threads on a shared PRNG, you're probably at sufficient scale.


Sure. Just saying that, for example, if you spawn threads and use the thread ID as the seed but pull a single value per thread, that's not sufficient.


Correct, you must use different, uncorrelated seeds for each thread if you want this to work. You can seed them all from one CSPRNG, for example, or from the Linux entropy service.


The generator state should either be part of the API or be thread local, but there are probably complex implications regarding the long legacy of libc.


> You cannot have sets of threads fetching the same value from the PRNG - it will totally destroy how random the PRNG is.

This is the rule you need to follow, but I don't understand what you're saying about global state.

The broken version has global state.

Non-broken versions can have global state or not have global state, depending on implementation choice. And that choice is separate from whether they have good performance.


How do you guarantee that you won't get the same value from a PRNG without global state? A PRNG is a state-updating machine that spits out a value based on that state.


Thread local state

You don't need it global

Seed it once per thread from /dev/random and you are done


As I suggested in the first comment here, yes. I believe the GP believed that you could have one PRNG that did not have global state for its consumers.


No, I'm not saying that.

I'm saying that if you have one PRNG, then you have global state no matter how it's designed. This is true whether you write it so that you get decent statistics, or you write it so you get tons of duplicate values.

And many of the fixes remove global state. Per-thread PRNGs are one option, but so are PRNGs that are used by specific batches of objects.

So, the straightforward broken option has global state, and the non-broken options might or might not have global state.

Which means I have no idea what you're talking about when you use the phrase "need to have global state". What is the PRNG API where that need exists, and what does the version without global state look like?


Every PRNG algorithm has state that is shared among all of its consumers. That is basically true by definition. Put another way, a PRNG is an iterative algorithm that compresses a very long random-looking (the mathematical term is "equidistributed") sequence. That iterative algorithm needs to maintain its position in the sequence, and there's no alternative. There is no algorithm for a PRNG that does not have state that is shared among all of its consumers.

The only way to make a PRNG that does not share state between its consumers is to shard per-consumer. The PRNG in libc doesn't do that, it just uses a lock internal to the PRNG to keep things safe.

You could attempt to make a concurrent PRNG that has atomic-sized state and use CAS loops when generating numbers to keep the consumers in sync, but that's probably worse than using a lock.


> Every PRNG algorithm has state that is shared among all of its consumers. That is basically true by definition.

Right, but again why did you say it needs to have global state "if you want to get decent statistics"?

What is the alternative to having global state that doesn't have decent statistics?

That "if" is the part I'm confused about.

The part of your comment directly after that sounds like you're describing the situation where clients are data-racing the PRNG and generating repeat numbers and other problems. But that's a problem specifically because of global state being used incorrectly, not something that happens as an alternative to global state.


The alternative is not being random, and "decent statistics" means "being pseudorandom" - the statistics are inseparable from the randomness. You can make a PRNG that has data races, but from a randomness perspective, that is equivalent to just using /dev/null as your random number generator.


Okay, I think I understand. You're being extremely harsh in how you consider brokenness, so if it's broken it might as well not be generating anything at all, no state needed.

I was considering "fairly broken PRNG" as its own significant group, and those need state too if you treat them as separate from "this literally gives the same number every time". But if those all go in the same bucket, then the comparison you made makes sense.


Algorithms for producing the same stream of random numbers in serial or parallel have been well known for many years. Perhaps the easiest in concept is "encrypt the sequence 1, 2, 3, 4, 5, ...".


The author of the blog wanted each thread to have different random numbers, not the same stream of numbers.


Is there a reason why the RNG function would not be able to simply take the random seed as one of its arguments?


Thanks, you answered the question I came here to ask, i.e. why did it need a lock for a random number, what are the tradeoffs of using the non locking version. Makes sense.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: