Hacker News new | past | comments | ask | show | jobs | submit login
Why is quality of pseudorandom number generators important? (superuser.com)
87 points by DrinkWater on Feb 6, 2014 | hide | past | favorite | 76 comments

When I see the word "random" I parse it as "unpredictable", since this is usually what people mean (even if they don't realise it) and can clear up many arguments. The concept of 'random' is interesting in some cases, but the vast majority of the time it's far stronger than what is required.

In this case, the question is:

> If a number generator is uniformly distributed, why might that not be 'random enough'?

If we rephrase this as:

> If a number generator is uniformly distributed, why might that be 'too predictable'?

This makes the flaw obvious: there are many ways to choose numbers uniformly which are completely predictable. For example, choose the smallest value first, then choose a value which is furthest from any previously-chosen value (in, say, lexicographic gray-code order).

The point here is that 'uniformity' is not the property we care about; it is a consequence of the property we care about (unpredictability). If a distribution were non-uniform, we would be better able to predict it (by biasing out predictions to match the distribution), hence the least-predictable distribution is necessarily a uniform one.

Other times this comes up:

> Are quantum measurements 'truly random'?

That's untestable, but we can test whether they're predictable or not.

> This encryption algorithm requires a source of random bits.

It would work just as well with a source of bits that are merely unpredictable.

> This strategy can only be defeated by a random opponent.

It can also be defeated by an opponent which you can't predict.

And so on.

That is a good way to think of it. Indeed, that is how cryptographic randomness is typically defined.

Simply put: given the first k bits of a random stream, can you predict the k+1th bit (more than 50% of the time)?

A generator that passes this test will pass any statistical randomness test, but the converse is not necessarily true. For example Mersenne Twister is a good generator from a statistical perspective, but it's actually quite easy to recover its internal state by observing a small amount of output. (Around 20k bits, if I recall correctly.)

> Simply put: given the first k bits of a random stream, can you predict the k+1th bit (more than 50% of the time)?

This definition isn't sufficient. Suppose you have a random stream, and one predictor which asserts the next bit is "1" and another predictor which says that next bit is "0".

As k increases, there's a nearly 100% chance that one of the two predictors will be correct more than 50% of the time.

Even if you pick a single predictor, say, that all-1s predictor, there's an almost 50% chance that for a given random stream and k that it will have better than 50% predictive ability.

Just because Guildenstern's coin is heads 92 times in a row doesn't mean that it's not random. Only that it's very unlikely to be random.

> very unlikely to be random.

Speaking of word replacements, "random" does not mean "uniformly distributed". An unfair coin toss is still random.

"[T]he vast majority of the time it's far stronger than what is required."

Oftentimes it's also just not what is actually required. Often you want uniform spacing with some jitter, not points drawn from a uniform distribution.

    put on hold as primarily opinion-based by Xavierjazz,
    Olli, Excellll, HackToHell, Tog 3 hours ago

    Many good questions generate some degree of opinion
    based on expert experience, but answers to this
    question will tend to be almost entirely based on
    opinions, rather than facts, references, or specific

    If this question can be reworded to fit the rules in
    the help center, please edit the question.
Stack exchange is broken.

The question mixes up uniform distribution and randomness, but the answers are factual. There's little to no opinion involved here.

Give the system time to work. It's already been taken off hold.

No! A question got closed! The whole thing must be broken!

Hyperbole sometimes helps to get the point across...

The problem with Stack exchange is that it conflates heavy site usage with domain experience and maturity.

Avidity for imaginary internet points and maturity are at best orthogonal.

As a general rule across the web, I'd agree with you. Although, I feel like on StackExcahge (or at least SO) those with the highest scores typically gave good answers in their domain as judged by their community and would not commonly give any answers outside their domain that could be voted up simply based on user rep alone.

I don't really have any hard data on this, it's opinion based on seeing extremely good answers from high rep people regularly, and not being able to recall seeing poor answers from high rep users voted highly when they aren't actually useful.

Excellent point, and well phrased.

Still, I find the whole stack exchange system to be incredibly useful, including the original post.

(Edit: And it looks like you do, too, since you point out the information is factual.)

I've been running a virtual dice roller site for pen & paper roleplayers for about ten years. I get these kinds of emails all the time. "Your RNG is faulty, I just rolled three 20s in a row!" and stuff like that.

It's probably not fixable psychologically, we're just seeing patterns everywhere. This also happens with physical dice by the way. If a player rolled two or more abysmal results in a row it's common to say "hey, you should really swap out these dice".

I've toyed with the idea of not using a fair RNG for games for this reason. People expect die rolls to converge to the average distribution much much faster than they actually do. So perhaps you can make a RNG program that does exactly that and does what the user expects, even if mathematically ridiculous.

For example, if 20 is rolled then the probability of 20 being rolled again is halved (so it has a 1/40 chance) while the probability of its inverse is doubled (so probability of 1 is 1/10). This dramatically reduces the chance of the three 20s in a row scenario and would very quickly regress to the mean, just like people expect. Added flair for affecting the probabilties of all possible values subtly (so that after rolling a 20, a 2 becomes more likely and a 19 less likely).

I'm sure I can't be the only person to think of this.

"Civilization Revolution" does this.

Say the player loses a 75% roll, then attempts another identical 75% roll. The game will actually give you better odds for the second roll. I think it's silly -- the game basically lies to you about your odds on the second attempt.

Sid Meier talked about it at length in this video: http://www.youtube.com/watch?v=bY7aRJE-oOY. It starts around 18:25.

It may be deceitful, but it makes for a better game. Seeing your five hit attack miss five times at 5% miss chance per attack will induce true frustration. Battle for Wesnoth doesn't care. It will be honest, and you will scream in anger.

  > Seeing your five hit attack miss five times at 5% miss
  > chance per attack will induce true frustration.
I was curious, so I worked out the chances of this happening. For a given five hit attack the chance is 0.00003125%, or 1 in 3.2 million. So yes, I imagine it is very frustrating, but it happens incredibly rarely.

you can link to specific times on youtube http://www.youtube.com/watch?v=bY7aRJE-oOY&t=18m25s

iTunes did that at one time because people complained about the shuffle algorithm for similar reasons.

Shuffling is different from selecting at random. I had an MP3 player that would use random selection instead of shuffling all tracks and it would annoy the hell out of me to hear the same songs twice in a row. I don't know if this is what iTunes used to do but if so then those people were right to complain.

> I don't know if this is what iTunes used to do

It is. iTunes would play a song, and randomly select the next one (it would do random play, rather than shuffle).

Nowadays it generates a shuffled playlist and goes along, and people whine because it doesn't reshuffle automatically (you have to disable and enable shuffle to reshuffle).

An other option is to use random sampling, and to re-sample after you're done with the current set (possibly excluding any song of the previous sample)

iTunes used a shuffle that always resulted in the same order as long as there was the same number of songs on the device. (Or perhaps just shuffling with a tiny PRNG that used the number of songs as seed.)

You're right that would probably lead to better user satisfaction, but I'm not going to implement something like this. I refer you to the answer I've given herokusaki below. Basically, the site is not about straight-up die rolling. It supports a myriad of different rule systems. On top of that, it doesn't keep track of users, so it's not possible to tinker with results based on user state. I have no real intention of changing that either ;)

I've thought about the same for Settlers of Catan, which I think is too random if you use dice, disrupting people's ability to do more than basic strategy.

I know there exist "dice" packs of cards, which contain the correct distribution of rolls. If you shuffle a few of those together, you'll probably get a nicer game.

That sounds like a nice idea, actually. I hate those rounds where 11 or 2 occur more often than 6 or 8. I know it can happen, but it can make the game rather annoying at times.

I actually think that makes the games more fun. Especially so when you have a larger game so that there are people on those tiles.

For me, it makes the game feel less "solvable" by basic analysis. Sure, statistically certain observations will hold. For the game you are in, though, you have to be ready to adjust your strategy based on what has happened. (Clearly not in anticipation of future rolls, but more based on the resources you have managed to get, not the ones you wanted.)

Admittedly, we already tend to take the number planning a bit out of the game by initially turning them on the other side and you can only reveal the numbers to hexes you have built at (for the inital round of placing settlements/streets they are revealed after everyone has placed both). Makes for a nice variant where exploration holds surprises, or you can expand to hexes that are already known.

I've played once with this variation. I liked it, not sure why we didn't let it stick.

Of course, I haven't played at all in a long while.

One of the nice features of the Settler of Catan assistant for the iPhone is that it will roll the dice for you, and it keeps a chart of the results. Thus one has data to back up the complaint that "Your 3 is out producing my 6!"

FWIW Catan gets even worse if you play with 1d12. It was worth a shot - there's got to be _some_ easy way to make that game better.

I've had to do this when training research monkeys in a former life, particularly when there is only one or two parameters being varied. They would learn the 'patterns' far too quickly. After seven heads in a row, they'd stop paying attention and always choose heads for some time afterwards.

The solution we settled on was block randomization. But that's probably not what you'd want for a human game as they could potentially figure out the end of each block (and what value remains). But it worked wonderfully for us.

I don't like the notion of reinforcing bad intuitions.

People are born with bad intuitions about how randomness works -- it's not learned. For whatever reason, our brains are wired to make uniformity feel random and randomness feel ordered.

It's why we see stars in the sky, which are more or less distributed randomly in the star field, yet they don't appear random: they appear to form constellations.

It's basically impossible to correct this bad intuition. People educated in the statistics and mathematics of random numbers have to consciously learn what real randomness looks like and override their intuition.

But try as you might, the night sky will still form constellations. And dice rolls will still, frustratingly and confusingly, fall into streaks more often than you'd expect.

There are a lot of systems in the brain. It's theoretically possible that one of them will be untrainably spitting out the same flawed predictions, but we can certainly learn how to respond appropriately to those sensations.

Can't this be partially solved by just throwing multiple dices? E.g two 6-sided dices would then have a higher chance of 7 than 12.

IMO the problem can be best explained with a classroom experiment, almost like a magic trick. It is a built in human intuition which is at odds with the reality of randomness.

Split your class into 2. Give half A the players dice and a piece of paper. Give other half B just a piece of paper. Half B plays with imaginary dice, Half A with real dice. Everyone records 20 games of a piece of paper. You hand in the papers and the demonstrator (or maybe a new one that hasn't been in the class, for extra theatrics) looks at the papers. He will accurately place them into separate piles.

The trick is that fake randomness and real randomness are different enough that they are distinguishable with a high level of accuracy. Real dice will roll the same number in a row far more often than fake dice. The more games you play, the more accurately they can be distinguished.

You can look at the formulas to figure out the odds of rolling the same number 4 times in 100 rolls, but it doesn't drive home the message that your intuition is broken until you see intuition being beaten consistently and reliably by a better tool. It's fun too.

Last year I wrote a short Python script to easily visualize that kind of effect: https://github.com/ncanceill/alea_jacta_lib

You are referring to the common mistake of confusing randomness and uniformity.

Your players are thinking: "Dice rolls are equiprobable, so I should roll evenly-distributed results." That only starts being true after a great number of rolls.

Random distributions will be uneven and present clumps, strands and voids. For the same reason, during WWII, British intelligence thought that the Germans were bombing specific targets in London, while they were just dropping bombs at random.

Source recently posted on HN: http://www.wired.com/wiredscience/2012/12/what-does-randomne...

If you knew how many times I launched into that randomness vs uniformity lecture myself ;)

My point was that it's not possible to convince certain people with logic. They see random numbers and something happens to their brains, maybe it's religion. They'll always see oracles or at the very least shady manipulators behind random data.

Re: Physical dice. I've gamed a lot over the years, and I don't believe I played with anyone anyone for whom swapping out dice had anything to do with actually effecting randomness/probability, but rather was just a fun way to wish for better luck. And, even then, that move "toward" luck or "away" from bad luck was always tongue-in-cheek.

Yet...we still swapped the dice. :)

Some people solve this by throwing hardware (and dice!) at it: http://gamesbyemail.com/News/DiceOMatic :-)


I wonder if visualizing the dice roll could help. If you don't have ethical objections to it you could also A/B test biased dice (say, biased away from 1 and 20) for complaints (give a different contact email address to players who get biased dice) or instead show the players the distribution of the last 100 rolls.

That's not really possible for a lot of reasons.

First, the roller is absolutely anonymous. There are basically three major functions the site performs: just rolling some dice according to user input, an API taking HTTP requests answering in JSON, and a chat room feature that has a dice roller bot. It's not possible to identify individual players except maybe in the chat rooms (though they're not really authenticated there either).

I do sometimes get requests to look into chat logs of a particular room over allegations that one player somehow tampered with the results. This happens about once a month. Invariably, the accused user isn't even that lucky, they almost always have bog standard average results. It's just that the other players think this person is unreasonably lucky and they keep this belief up even after I tell them about the data (which they could have calculated themselves by looking at their own chat logs).

Second, the core value proposition of the site is its ability to render roleplaying dice codes directly. That means you don't select your dice from a drop down - you enter the code. A simple example would be "3d6+5" for damage or something, but there are lots of different codes for different rules - some of them quite convoluted. It's not feasible to tinker with them in any meaningful way because players use the site in so many different ways.

Clearly, there is something supernatural afoot: http://www.penny-arcade.com/comic/2014/02/05

There's more rigging involved with people who can't throw dice properly than there is with the seed of a RNG on a server miles away with you. :P

Throwing dices in real life is even less random than a "truly" random function.

It should be fun to calculate the chance a RNG is biased based on a die rolling 20 3 times in a roll

Similar to that question about biased coins and throwing 10 heads in sequence it was discussed some time ago here.

It happens at our recent dnd game our mage roiled all 1's on a magic missile 6d4 all 1's

PRNGs (pseudo-random-number-generators) take in a small amount of randonmness (a seed) and produce a long stream of numbers from that.

Anyone using the same PRNG can look at the output of yours and try to put their PRNG in the same state. If they succeed, the output of theirs will match yours - now and in the future - unless you re-seed.

Two problems can occur here:

1) you seed with something they can predict. e.g. seconds since 1970 (or microseconds since 1970). If they have a reasonable sample of numbers from your system, they can try lots of different seeds and see if they can find the one which gives the same output as you.

2) PRNGs have "internal state", which is a bunch of numbers they mix together. Some PRNGs have the property that if you can you can observe enough numbers in a row from the PRNG, you can turn them back into the internal state locally and then you can do the same thing as if you knew the seed (predict future numbers).

> you seed with something they can predict

One of my favorite examples of this ever: Once in Las Vegas a keno machine mistakenly used a fixed seed. Meaning once someone figured this out, he could show up when the game started for the day and predict what the machine would do with 100% accuracy.


Take a look here: http://www.cigital.com/papers/download/developer_gambling.ph...

They found a flaw in a REAL poker site because they were using a pseudo random number generator (and a stupid algorithm) and were able to know the order of the cards being used in the game!

>The RST exploit itself requires five cards from the deck to be known. Based on the five known cards, our program searches through the few hundred thousand possible shuffles and deduces which one is a perfect match. In the case of Texas Hold'em poker, this means our program takes as input the two cards that the cheating player is dealt, plus the first three community cards that are dealt face up (the flop). These five cards are known after the first of four rounds of betting and are enough for us to determine (in real time, during play) the exact shuffle. Figure 5 shows the GUI we slapped on our exploit. The "Site Parameters" box in the upper left is used to synchronize the clocks. The "Game Parameters" box in the upper right is used to enter the five cards and initiate the search. Figure 5 is a screen shot taken after all cards have been determined by our program. We know who holds what cards, what the rest of the flop looks, and who is going to win in advance.

> Just say you code in any language at all to roll some dice (just using dice as an example), after 600,000 rolls, I bet each number would have been rolled around 100,000 times, which to me, seems exactly what you expect.

Bad example. Because the author specified "some dice", we can assume more than one die, in which case some numbers have a greater chance to appear than others, in a series of fair "random" throws.

It's a bad sign that the author of a piece about randomness isn't aware of the systematic behavior of his chosen example, a behavior biased in favor of certain outcomes.


The linked poker vulnerability article is really worth a read.

The idea of something being truly random bothers me because it goes against cause and effect. An effect happened without a cause. It violates laws of conservation, entropy, determinism, and a lot of other things.

> The idea of something being truly random bothers me because it goes against cause and effect.

Not at all. The fact that one cannot predict the next number in a random sequence is irrelevant to the fact that, in the long term, that number has a predictable relationship with other numbers in the pool of possibilities.

For a random generator of the digits 0-9, can I predict the next number in the sequence? No. Can I say what the proportion of, say, 7s will be, within a large list of outcomes? Yes, and the larger the list, the more reliable the prediction.

If your position had merit, quantum theory, probabilistic on a small scale, would be seen as violating cause/effect relationships, a rather important part of physical theory. But, because of the mathematics of quantum probability, individually unpredictable atoms become very predictable macroscopic objects.

If I toss a coin into the air and then it lands on tails, there is a visible cause and effect. The velocity of my arm, the position of the coin in my hand, the moment of release, the air currents in the room all affect the outcome. If I could measure these to a high degree of accuracy and plug the measurements into a detailed enough simulation, I could predict the outcome. However, I can't. So I am unable to predict the outcome (hence the outcome is random).

Eric Lippert's answer is great. It isn't the distribution of answers, it's the predictability. In most cases it's ok if the distribution looks random, but in some it turns out not to be ok.

Ivy Bridge and newer CPU's of Intel have this new "random" instruction (I still have a Sandy Bridge CPU tho). Is that one truly random enough for crypto purposes? Thanks!

No, and it does not matter who you trust. Analog circuitry simply has too much inbound and outbound information leakage to be trustworthy. You must round it out with a mixing algorithm.

The issue is that if you run the same program twice without selecting a suitably random seed you will make the same 60 million rolls.

If you can somehow control, predict or influence the seed you can predict or influence rolls.

For example if the game uses JUST the system time as a seed, I can shift the system time back to a specific position and run the application again to get the same random number generation.

If you know the algorithm, for example you're playing an open source card game, and you can determine the time (offset) on the server or the other players computer you could cheat by calculating their random variables.

Now it really isn't an issue in games, but it's incredibly important in security. Randomness makes up a huge component of encryption.

That's why you have applications that require you to move the mouse a bit, hopefully randomly and they take in a whole bunch of other entropy fed in by the system.

In Unix-like OS's you have /dev/random and /dev/urandom. /dev/random requires a certain amount of entropy and environmental noise, and it blocks on reads until it's satisfied with the output.

/dev/urandom does not block, but it gives pseudo random output. For the purposes of security, /dev/urandom should NEVER be used. However neither are truly 'random'.

No. Cryptographic applications on Linux should use urandom, not random. Your belief that urandom should "NEVER" be used is due to an urban legend perpetuated by a badly written Linux man page.

Before you listen to anyone "explaining" to you that /dev/random is good for cryptographic secrets and /dev/urandom is good for everything else, consider that Adam Langley owns Golang's crypto code and Golang's crypto/random interface just pulls from /dev/urandom; Daniel Bernstein, Tanja Lange, and Peter Schwabe --- all cryptographers, with a pretty serious Unix pedigree --- wrote Nacl, an extraordinarily well-regarded crypto library, and Nacl's CSPRNG... wait for it... just pulls from /dev/urandom.

But for any other purpose than generating private keys one should NEVER use /dev/random. If you're not sure use /dev/urandom.

I've had to painstakingly explain to certain people why, as an example, erasing a HDD from /dev/urandom is allright. And why their program that simulates some random input should use /dev/urandom.

But no, they babble about true randomness and then complain why they get like paltry few hundred bytes per second.

Even if you have a server running casino games just use /dev/urandom/. It requires a total compromise of the server to get the internal state out of that, and in that case it's easier just to change urandom into /dev/zero.

If you need a lot of entropy, it's good to use a hardware RNG to keep the load down (and it improves quality).

You should be careful when using word like quality in this context. Some might think that by quality you mean any kind of statistical property. E.g dice rolls generated by either of them are somehow statistically different, which they are not.

What does this mean in practice: If I give you 10 10MB files. 5 of them created with urandom and 5 of them created using hardware RNG there is practically no chance that you could differentiate which came from which, barring knowledge of the urandom entropy pool.

But it is true that HW RNG could be useful just to keep CPU load down. By now we know that those are backdoored by NSA, so you should still use /dev/random as a source for private keys. So HW RNG is useful just to keep the load down. Not to avoid any attacks.

This is so important that I'll have to repeat it again: The only difference between urandom and random is that urandom is theoretically suspectible to an attack which could allow prediction of the output values if the attacked knows the internal entropy pool state. The statistical properties of both are the same.

Agreed that some (e.g. RDRAND) are potentially untrustworthy, but others aren't - for example, linux has daemons available that can source entropy from audio or video noise.

A growing number of chipsets have them built in and accessible.

Many new Intel chips do (though off the top of my head I can't tell you how easy to access it is).

The SoC used in rPis units has one that can be read from at over half an Mbit per second via /dev/hwrng (once the relevant module is loaded), so you can either use it directly or (better for portability) keep reading from /dev/urandom and use rngtools to feed the entropy into the kernel's pool as needed.

In both the above cases there is a trust issue as exactly how the RNGs work is not publically documented, but unless you are extremely paranoid (by neccesity or "issues"!) I woudl consider them decent sources of entropy.

That's exactly the wrong advice.

You should always use /dev/urandom on Linux for cryptographic use, unless you know exactly why you need /dev/random. Hint: you don't.

(On FreeBSD it doesn't matter because it's all the same)

I've been meaning to write up a coherent summary of that issue with all kinds of sources forever now, and I should really get around to just doing it, because this comes up every few weeks on HN.

I have no idea why you would use /dev/urandom over /dev/random, or vice versa. If you can write something, I would be very grateful.

The short version is as follows: /dev/random is basically the entropy pool. There isn't much data in there and whatever lands in it must be carefully chosen to prevent entropy going down (Debian accidentally broke that part a few years ago). Since there isn't much data you probably shouldn't take too much of it when you can help it because if it runs out, it blocks until there is enough entropy back in the pool.

/dev/urandom on the other hand is a CSPRNG that is regularly re-seeded from /dev/random. The purpose is to stretch the avilable entropy to more (pseudo-)random numbers. Since it's a CSPRNG the sequence is deterministic but impractical to predict, re-seeding helps in preventing others from observing too much output to reverse-engineer the seed.

So advice is generally to only use /dev/random when you're implementing something like /sev/urandom and for SSL keys and the like you just use /dev/urandom.

No, no, no, no.

/dev/random and /dev/urandom are two interfaces to essentially the same CSPRNG.

On Linux, unlike FreeBSD (where there is no difference between the two), there are two differences between random and urandom:

(1) random will block when a kernel entropy estimator indicates that too many bytes have been drawn from the CSPRNG and not enough entropy bytes have been added to it

(2) urandom and random pull from separate output pools (both of which receive the same entropy input, and both of which are managed the same way).

It is not true that /dev/urandom is a PRNG and /dev/random is not; it is not true that /dev/random is a CSPRNG and /dev/urandom is just a PRNG. They are the same thing, with a extremely silly policy difference separating them.

Thanks to both of you for responding. I will need to read more to understand in detail, but you have at least given me some pointers.

I stand corrected, then.

The Debian accident was related to the openssl PRNG, not the kernel's.

Applications are open for YC Winter 2024

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact