Hacker News
What does randomness look like? (empiricalzeal.com)
474 points by cyrusradfar on Dec 21, 2012 | 117 comments

Back in the day I was in a raiding guild in World of Warcraft. And back then, each boss monster in a raid dungeon had a loot table, and when you killed it, your raid got a few pieces of random loot.

And I don't know how many times I, and the other guy in the guild who understood statistics, had to explain to the others that random doesn't mean uniform. If a certain piece of armour has a 1/X chance to drop from a boss, what people think should happen is that if they kill that boss X times, they should see it drop once.

But the reality was of course that loot was very non-uniform. Some pieces we saw lots of times, and other pieces very rarely, despite them having the same drop chance. And the players who wanted those pieces that happened to be rare for our guild, got very, very angry.

We saw the same things on the official message boards: players were furious after having spent a year killing the same raid boss once a week and never seeing a certain piece drop for them. But simple math shows that with millions of players and tens of thousands of raiding guilds, some of those will see very streaky results.

These days in World of Warcraft, boss monsters drop tokens instead, and when you have X tokens, you can exchange them for a piece of armour, or a weapon, guaranteed. And no one complains about the random loot anymore.

This is actually an important lesson for game designers. Real randomness is very frustrating for players! Games should be designed to not be random. For example, there's an expansion to The Settlers of Catan that lets you use cards instead of dice to ensure a nice smooth distribution of resources... I think it's a lot more fun!

Oh yes. Our intuition is that random is fair and uniform, but it's absolutely not.

I was bit by another random quirk in World of Warcraft. There was a long-running achievement called "What a Long, Strange Trip It's Been", which took at least a year of real time to complete: you had to actively play during each of the ten or so in-game festivals and holidays, and do some quests and tasks during each. If you missed a festival, you had to wait another year to do it again, so if you were aiming for the achievement, you really wanted to do it in one go.

During their version of Valentine's, each player got a bag of heart-shaped candy, and the task you had to complete was to pull out at least one each of the eight different heart candies. But you could only pull a piece of candy once every hour or so, and each pull had a 1/8 chance of being any given piece; the holiday was time-limited, so you had about two weeks to complete it.

Sounds easy and fair, right? 1/8 chance, get all eight pieces, two whole weeks, easy! Except that there was one piece that I just never got. The piece that said "I LOVE YOU!". And as the time went by, I got more and more frantic, logged in more often so as not to miss any opportunity to pull a piece of candy, but no luck.

So, I did a quick bit of math. You can pull one piece every hour for two weeks, but sleeping, not playing, missing days etc. meant that I effectively pulled ~100 pieces. The chance of missing a certain piece on one pull is 7/8, so missing a certain piece 100 times in a row is (7/8)^100, which comes to a bit more than one in a million.
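The arithmetic checks out (Python):

```python
p_miss = (7 / 8) ** 100  # miss one specific candy (out of 8) on every one of ~100 pulls
print(f"{p_miss:.2e}")   # ~1.59e-06, a bit more than one in a million
```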

With ten million players all doing the same thing, there's going to be a number of them that will hit that "one in a million" chance, which means that whoever designed that part of the achievement didn't do their homework, didn't do the math. Because intuition tells you that 1/8 chance is plenty! Hundreds of tries, of course everyone will get all pieces! Except proper math tells you otherwise, and I was one of the "lucky" outliers.

(Later, they apologized and retroactively removed that part of the achievement, so I got my purple nether-drake mount without having to wait a year extra.)

So, I did a quick bit of math. You can pull one piece every hour for two weeks, but sleeping, not playing, missing days etc. meant that I effectively pulled ~100 pieces. The chance of missing a certain piece on one pull is 7/8, so missing a certain piece 100 times in a row is (7/8)^100, which comes to a bit more than one in a million.

The chance of failing would be about 8 times that much, because there are 8 pieces you could miss (a union bound over the pieces), which comes to about 12.7 per million.

You couldn't possibly miss all 8 pieces. As soon as you make one attempt you'll get one piece.

Henrik's math calculated the chance of missing the "I love you" piece each time. The chance of missing any one other piece each time is the same.

Yes, but they're not independent. Given that you missed piece X on every pull, the probability that any given pull is piece Y is 1/7, not 1/8, so the chance of missing both X and Y is lower than the product of their individual probabilities.

'missing each of 8 pieces individually' is not 'missing a certain piece'.

This is an example of the Coupon Collector's Problem (https://en.wikipedia.org/wiki/Coupon_collector%27s_problem). Short version is that the expected number of pulls you'll need to collect all n coupons (candies in this case) is n·H_n, roughly n·ln(n), which for n = 8 comes to about 22 pulls. Of course some people are just going to be unlucky (as you demonstrated).
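A quick simulation (Python; the hourly pull cap is ignored) confirms the exact expectation n·H_n ≈ 21.7 for n = 8; n·ln(n) ≈ 16.6 is only its leading term:

```python
import random

def pulls_to_collect(n, rng):
    """Uniform draws needed until all n distinct candies have been seen."""
    seen = set()
    pulls = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        pulls += 1
    return pulls

rng = random.Random(42)
trials = [pulls_to_collect(8, rng) for _ in range(20000)]
print(sum(trials) / len(trials))  # close to 8 * (1 + 1/2 + ... + 1/8) ≈ 21.7
```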

> Oh yes. Our intuition is that random is fair and uniform, but it's absolutely not.

But a uniform distribution is random, isn't it? This sort of phrasing is rampant in this thread, and I'm confused by it. How can a uniform distribution be considered non-random, or "less" random than another distribution?

Randomness is about having a uniform probability of results, but that does not translate into a uniform spread of results, nor can it, because successive random results are independent. Specifically, the chance of getting the same result multiple times is non-zero, and can actually be fairly high with a lot of samples, whereas a perfectly even spread of results would have no such clumps at all.

Read the article for more details and examples.

Randomness is about having a uniform probability of results

Well, you're still assuming the colloquial definition of "random," which implies "uniformly random." Of course, we can have a random process that is not uniformly random.

I can't tell if you're trolling or not. In case you aren't, a uniform distribution isn't "uniform" in the sense of the above quotation. You're right that it's not less random, but someone looking at draws from a uniform distribution would normally describe it as "clumpy"; hence the article.

A distribution is the probability of various values before the random selection takes place. Uniformity is what we do or don't see afterward.

Draws from a uniform distribution, repeated a large number of times, are less evenly spread than intuition predicts.

If you had bothered to read the article, your questions would have been answered.

As a completer of "What a Long, Strange Trip It's Been" ( http://www.wowhead.com/achievement=2144/what-a-long-strange-... ), fuck Halloween.

A little randomness is good, it keeps the experience from being monotonous, but there is a balance between uncontrollable and unpredictable annoyance and quirky interesting outlier behavior. A lot of why WoW did so well was because it often walked the fine line, and it did it everywhere. In that game crafting had percent chances of giving skillups, you had to level weapon skills with a percent chance per hit of getting a point, you had critical % chance, hit % chance, dodge %, parry %, block %, you had talents that gave you % procs for special effects. Tremendous amounts of RNG that most people never even noticed but kept them playing for so long in many respects.

I can think of two specific examples where breaking the rules of probability resulted in better player experience. In TF2, you randomly get weapon drops, up to a limit per week. The initial obvious solution was to roll every X minutes of gameplay for whether you get a new weapon. This led to players getting long unlucky streaks where lots of play yielded no drops at all. They eventually changed it so that when you get a weapon drop, the game rolls how long you have to play until you get the next one. Players can still be unlucky, but it's drastically better.
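The two schemes can be sketched like this (Python; the drop probability and the wait range are invented for illustration, not Valve's actual numbers):

```python
import random

rng = random.Random(0)

def wait_roll_each_interval(p=0.25):
    """Old scheme (sketch): every play interval, roll for a drop.
    The wait is geometric, so the unlucky tail is unbounded."""
    intervals = 1
    while rng.random() >= p:
        intervals += 1
    return intervals

def wait_roll_next_drop(lo=1, hi=8):
    """New scheme (sketch): when a drop happens, roll how long until the
    next one. The worst case is hard-capped at hi intervals."""
    return rng.randint(lo, hi)

old = [wait_roll_each_interval() for _ in range(10000)]
new = [wait_roll_next_drop() for _ in range(10000)]
print(max(old), max(new))  # the old scheme's worst wait far exceeds the new scheme's cap
```

The average wait is comparable in both, but the new scheme trades the unbounded unlucky tail for a hard ceiling.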

The other example was in civilization they showed percentage chances of winning a battle. The problem is that when a player would have three battles with say a 50% chance of winning and lost them all, they would get very frustrated. So they changed it to be much more like the average person perceives statistics to be. This means actually weighting the percentages differently from what the player is told so that if a player has a third of a chance to win each of three battles, it's actually very likely they will win one of them.

The other example was in civilization they showed percentage chances of winning a battle. ... This means actually weighting the percentages differently from what the player is told

This is a (persistent) myth. None of the Civilization games actually do this. Civilization 4 is the only game in the series that displays combat prediction as a percentage chance, and the source code for Civ 4's combat engine is publicly released and well understood. (I am a developer on a Civ 4 modding project and familiar with the code.) The earlier Civ games before 4 don't display any kind of combat odds, and Civ 5 has a totally different system that isn't based on winning percentage. So, unless you're talking about some other game like Galactic Civilizations or a mod for Civilization 4, this isn't true.

Your underlying point is certainly true, though: gameplay often is perceived as a better experience with "randomness" smoothed out to produce more uniformly distributed results.

Ah, this is what made Risk so frustrating at times. Many a good friend was lost to that game.

I was under the impression that a uniform distribution is still considered random, but that there are simply many other random distributions. If you have a flat 1/10 chance of receiving a certain item, then sure, you might do the task 30 times without getting it. But if it were programmed such that you will always receive the item at some point within the first 10 completions of the task (i.e. 10% of players receive it after their first completion, 10% after their second completion, etc.) I would still consider that random, and not really more or less random than the 1/10 scenario. I'm not sure what you mean by "real randomness."

I can give you one simple argument. Ten items shuffled have 10! possible outcomes, while ten independent rolls have 10^10. How is that not far less random?

Because amount of entropy has nothing to do with "amount of randomness."

I'm pretty sure that entropy is pretty much the only sane definition you could give of "amount of randomness".

> entropy is a measure of the uncertainty in a random variable

The distinction is between independent identically distributed random variables, where it's possible to not get the item after 1/p tries, and what you described, in which the events are not identically distributed. In the case you describe, the probability you'll get the item on your second completion (given you missed the first) is 1/9, not 1/10.

This is the distinction that the original article highlighted and gives rise to the Poisson distribution, which can have unintuitive results because it is not a uniform distribution. You are describing a uniform distribution. (A Poisson distribution characterizes the number of hits you would expect after running some number of iid random events.)
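The difference between the two schemes is easy to make concrete (Python; exact arithmetic via fractions):

```python
from fractions import Fraction

# i.i.d. scheme: flat 1/10 every completion; you can still miss after 10 tries
p_miss_10 = Fraction(9, 10) ** 10
print(float(p_miss_10))  # ~0.349: about 35% of players still lack the item after 10 tries

# "guaranteed within 10" scheme: like drawing from a shuffled 10-card deck with
# one winning card, so the conditional chance rises each miss: 1/10, 1/9, 1/8, ...
p_hit = Fraction(0)
remaining = Fraction(1)  # probability of having missed every draw so far
for k in range(10):
    draw = Fraction(1, 10 - k)
    p_hit += remaining * draw
    remaining *= 1 - draw
print(float(p_hit), float(remaining))  # 1.0 0.0: missing all 10 draws is impossible
```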

This is a gross oversimplification. Randomness in games is like a knife. It can be used to make nice things, or to hurt.

Wesnoth is a good example of a game which is frustrating for many people. I think the reason is that it has random outcomes of decisions. Suppose you order a unit to attack another one. There's a base 0.6 chance to hit. The unit has 3 attacks each dealing 12 damage. What is the expected damage output?

  - 0 damage: ~ 6%
  - 12 damage: ~ 28%
  - 24 damage: ~ 43%
  - 36 damage: ~ 21%
...so, if you're planning for the most common event - 24 damage - you're a fool, because it occurs only 43% of the time, less than half! A bit like democracy, which is sometimes called the tyranny of the minority. The example I used is actually simplified: in Wesnoth, each strike of an attack is followed by a counter-strike from the attacked unit. Not only does the attacker's damage vary wildly, you can easily lose the unit without it dealing any damage. And such a random result stings when you made a good decision.
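Those figures are just the binomial distribution for 3 independent strikes at a 0.6 chance each; a quick check (Python):

```python
from math import comb

p, strikes, dmg = 0.6, 3, 12
dist = [comb(strikes, k) * p**k * (1 - p) ** (strikes - k) for k in range(strikes + 1)]
for k, prob in enumerate(dist):
    print(f"{k * dmg:2d} damage: {prob:5.1%}")
# 0: 6.4%, 12: 28.8%, 24: 43.2%, 36: 21.6%
```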

Counterexamples: Neuroshima Hex (board game), Seasons (board game), Mission in Space: Lost Colony (flash game). In all of these games randomness is used to improve variety. In MIS there's actually little randomness and it feels more like a puzzle. The most visible effect of randomness in the game is which alien gets spawned. All 3 aliens have the same HP and movement speed, so it's not a big deal. Seasons is a card and dice game, but an unusual one. Each side of a die gives some options to choose, and there are no good or bad sides, only sides that are good or bad at this very moment. The cards you start with are not really random either, because the game uses a draft mechanic: you take one card from your hand and pass your hand to your neighbor on the left. Neuroshima Hex is essentially a game with hexagonal cards. Randomness only determines what cards you draw, and once you do, you place the hex-cards on the board. Once the battle resolves, it's deterministic.

The bottom line: I noticed that I much more enjoy games where randomness is used not to determine success/failure, but instead options available. That is, as long as the options are balanced relative to each other. Heroes of Might and Magic 3 has randomly selected spells and skills at levelup, but they range from awesome to pathetic. The result is frustration.

All of the games I mentioned (Except Heroes 3) are free. You can actually play both board games online, too.

24 damage happens 64% of the time, not 43%. Given that you can't use your spare attack elsewhere, the 3x damage event includes the pathway of the 2x damage event.

Another classic example of this randomness-pain is earlier versions of the Civ series, where you could lose tanks against pikemen or other ancient unit types. Not likely, but it hurt when it happened.

Blizzard actually did this in Warcraft 3, the chance of getting a Critical Strike or related effect went up every time you failed, and down every time you succeeded.

That's interesting but then we're talking about another game. If there's a pack of cards indicating dice results you can have the equivalent of a 'warm deck' and a 'cold deck'.

Players can then start counting. E.g., if I know '8' came out 7 times and '6' only came out once, it changes my strategy, because there's now a benefit to expanding with a new settlement (or making a "city") on a '6' instead of an '8'.

I'm not against it: it's interesting. But it's not really the Settlers of Catan anymore: it's a game which happens to share a lot of rules with the Settlers of Catan but which is definitely different.

They mitigate that by discarding without looking several of the cards, so it's not a completely even distribution of outcomes. E.g., it's possible that 4 of the '6's won't come out at all that time through the deck.

Real randomness is very frustrating for players! Games should be designed to not be random.

This reminds me of Magic: the Gathering and the misery of mana screw.

I disagree with the uniform "should be", because there are so many different design objectives. Slot machines aren't "good games" by Euro standards, but people spend a lot of money on them. (One can argue about whether that's enjoyment or addiction. I'll skip that for now.) Random payoffs can foster enjoyment, as seen in Skinner Boxes and on slot machines. Random denial makes people unhappy. What you want in most games is some degree of random windfall but no one getting "killed by the dice".

Have you played Ambition, by chance? It's a trick-taking card game designed to remove card luck.

> This reminds me of Magic: the Gathering and the misery of mana screw.

Incidentally, apparently people complain about the online versions of the game providing a land distribution that's "too even".

Also, while mana screw sucks, between the fact that you can decide whether or not to begin with a certain opening hand and the fact that you can design your deck around such circumstances (mana fixing, proper distribution of card costs), it's far more fun than the alternative.

I think you're committing a Texas Sharpshooter Fallacy. Yes, people are designing their decks around mana screw. In other words, they discard all those deck ideas that suffer more from randomness. Also remember mana screw works both ways: later in the game, getting a streak of lands when you no longer need them can be disastrous, while your opponent draws juicy cards instead.

Have you actually tried the alternatives? There's a trivial variant of M:tG where you can play any card face down to act as a land.

You have a point, though - constructing a deck in M:tG is far more fun than playing one.

> Have you actually tried the alternatives? There's a trivial variant of M:tG where you can play any card face down to act as a land.

That's not really eliminating randomness, since you're just evaluating (at runtime) whether or not a given card is more valuable as a mana source or at face value. There's still some randomness in determining how high the opportunity cost of playing a land is.

A better example of nonrandomness - which I have considered playing with my friends - is enforcing an even mana draw by increasing access to mana each turn in a manner proportional to the number of lands in the deck. That's deterministic.

> You have a point, though - constructing a deck in M:tG is far more fun than playing one.

I've been playing Magic: The Gathering for far longer than I should admit, and I'd disagree.

Incidentally, apparently people complain about the online versions of the game providing a land distribution that's "too even".

Online bridge has a similar problem. Four riffle shuffles (which do not fully randomize the deck) are typical in live bridge games, and this means that long suits (and, thus, better hands) are more common. Early online bridge games randomized hands fully and therefore delivered crappier (flatter) hands than people were typically used to.

between the fact that you can decide whether or not to begin with a certain opening hand and the fact that you can design your deck around such circumstances (mana fixing, proper distribution of card costs), it's far more fun than the alternative.

When I played (mid-90s) a lot of those options didn't exist. There weren't mulligans unless you had no land. With one, you had to play it. Also, a lot of the newer mana sources didn't exist. If you drew only 2 lands in your first 10 cards, you were screwed, but you couldn't develop an interesting deck with more than 22 land.

Eh, you'd design your deck around low-mana-cost cards, figuring that most of it had to work with 2-3 lands and the higher-casting-cost cards had to be used sparingly.

One of my favorite decks was an all-common blue/red control deck with 28 land, made so that I could bring it into school and not worry about it getting stolen. It was maddening to play against - I'd be like "Okay, I'll flood 3 of your creatures, block your knight with my clay statue, and counterspell your disenchant." Typically games would run 40 turns with nobody doing any damage, and then all of a sudden I'd be like "...and I'll Lava Burst you for 20. Game over." Or the Storm Shaman would come out around turn 10, by that time it's 5/4 and I've got enough mana to counterspell any attempts to remove it, and the game is over in 4 turns. Land can be devastatingly effective with cards made to take advantage of it.

> Land can be devastatingly effective with cards made to take advantage of it.

One of my favorite decks is loosely based on a 5-color preconstructed deck from the Apocalypse era - it's basically almost all mana fixing, with a few cards that take advantage of multiple colors, and then four "Life/Death"s[1]

All that time you spent doing nothing but playing land-generating spells suddenly pays off when you can attack with twenty 1/1 creatures in a single turn, then declare them unblockable[2] and rinse and repeat the next turn[3].

It's even better because land-destruction spells are much rarer (and more costly), so you're impervious to Wrath-of-God[4]-eque spells.



[3] If, by some miracle, they had enough life to survive, that is.

[4] http://gatherer.wizards.com/Pages/Card/Details.aspx?multiver...

> Early online bridge games randomized hands fully and therefore delivered crappier (flatter) hands than people were typically used to.

Are you saying that all online bridge games today do not fully randomize the shuffle, but simulate riffle shuffles instead?

I don't know exactly how they model shuffling, but I know that some players dislike truly random hands, which is why it's considered improper and impolite to do more than 4 shuffles.

This may actually be somewhat of a myth. I haven't verified it myself. I do know that Bridge protocol is 4 riffle shuffles, and typical riffle shuffles don't have enough entropy for 4 of them to randomize the deck (52! ~ 2^225.6, so you'd need 56.4 bits for each, and riffle shuffles have about 30 bits) but I find it hard to envision why this (possibly slight) lack of randomness would manifest itself in higher frequencies of good hands.
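The deck-entropy arithmetic is easy to verify (Python):

```python
from math import factorial, log2

bits = log2(factorial(52))
print(round(bits, 1), round(bits / 4, 1))  # 225.6 bits total, so 56.4 bits per shuffle if 4 must cover it
```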

The thing about slot machines though, aside from the fact that the pay outs are not uniformly random, is that they're designed to have tons of almost-wins, i.e. "bar bar cherry". You think you're "so close" to having won that you think if you just try a couple times more, you'll really get it. This goes so far as to program the motors for the wheels to have a specific wobble, so they flip over from a winning combo to a losing one at the last second.

On the other hand, sometimes it turns out the players are right when they suspect something is biased against them.

In Everquest when you cast a spell, there was a chance the spell would fizzle.

Occasionally, someone would get a streak of fizzles. Many argued this was just the normal streakiness you expect when you do repeated independent trials. Some insisted that fizzle streaks were more common than you'd expect based on that model.

Finally, a statistician who also played EQ spent some time gathering data, and determined that fizzling was not independent. Your fizzle chance was an increasing function of the length of your current fizzle streak, up to a cap.

Things like that give a game character.

I have another anecdote from WoW regarding randomness.

The raid boss Onyxia (a great big dragon in a cave, which you could only kill once a week as I recall) had a phase where she would take off and fly above the raid. Once in a while she would then do a deep breath, a fire attack which had a high chance of killing you if it hit.

Now, all sorts of tactics emerged on how to handle the fire attacks. One that stuck for a very long time was the idea that the number of DoTs (Damage over Time effects) on the boss would lower the number of deep breaths you got.

I'm sure there were other theories as well, but this one stuck for years, with raid leaders yelling at warlocks, the class with the most DoTs, to add more DoTs, and even stacking the raid with warlocks to get as many DoTs as possible ("we need at least 4 warlocks to take her").

In the end there was a developer (or just Blizzard employee) who admitted that it was in fact random. This story pops back in my head once in a while because I think we do this a lot on larger scales too without realizing it: See some phenomenon caused by randomness, come up with a theory, and because it is not easily refuted (after all it works most of the time right?) it becomes common knowledge that everyone follows.

These days in World of Warcraft, boss monsters drop tokens instead, and when you have X tokens, you can exchange them for a piece of armour, or a weapon, guaranteed. And no one complains about the random loot anymore.

This bit is not correct. All of the most desirable items (except darkmoon trinkets and tier sets) drop directly from raid bosses, and people still complain about the outcomes of random loot.

It's also worth noting that Blizzard recognized this and "fixed" it in their questing system by progressively improving the drop rates of quest items for players, which provides a natural upper limit on "bad streaks" for quest drops, as a means of reducing player frustration without removing all randomness from the system.


Too bad the WoW random number generator was flawed, somewhere in there it had the clause

   if (item == tier2.druid.pants)
       return (item = plate-wearer-loot());
Of this I am certain.

It may be the case that it was coded in one version of a programming language and upgraded later, and the newer version changed the handling of rounding. I believe Python had a difference along those lines between 2.x and 3.x: round() used to round halves away from zero, while Python 3 rounds halves to even, so code that adds 0.5 and rounds will sometimes get a different result. On the surface that may not mean much, but it can add bias into what gets dropped. Yes, there are standards, but they advance and evolve over time, and we all know integers are completely different from floats.

Last thought: if there were a truly random number generator, then why do fruit machines have control systems to monitor the distribution, and with it what gets paid out, to maintain the x% payout required by laws saying x% of money taken has to be paid out? It's mostly 70-80%, but it varies, and while the legal minimum may be, say, 70%, some casinos and the like will generally display a higher payout percentage in a prominent place, just so people think it's even higher.

> Last thought: if there were a truly random number generator, then why do fruit machines have control systems to monitor the distribution, and with it what gets paid out, to maintain the x% payout required by laws saying x% of money taken has to be paid out?

That's easy to answer -- to comply with rules that require a certain outcome some x percentage of the time, one need only take a truly random generator's output and filter it by x:

    outcome = (rn * 100 <= x);

If rn is a float, lies between 0 and 1 and is truly random, then the above trivial filter will produce the required distribution in the long term.
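A quick check of that filter (Python; x = 70 is an arbitrary choice):

```python
import random

rng = random.Random(1)
x = 70           # required win percentage (arbitrary for this sketch)
n = 1_000_000
wins = sum(rng.random() * 100 <= x for _ in range(n))
print(wins / n)  # close to 0.70
```

Note this makes x% of spins winners; matching x% of money paid out additionally requires weighting the prize sizes.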

Nah, the game just used up all its Druid loot luck in Molten Core.

Those frickin' salad shoulders.

That's interesting, because in WarCraft 3, by Blizzard, randomness is designed to be "fair". They use a special word for it but I can't remember it at the moment.

For example, there are certain items that can give you a chance to land a "critical hit"; a critical hit multiplies your damage by X.

Say an item gives you a 30% chance to do 2X damage. However, the randomness has memory and is designed to distribute the critical hits evenly, by gradually increasing the probability on every non-critical hit and resetting it on every critical hit. So the first time you hit, the chance isn't actually 30% but more like 10%. If you miss 3 times, the chance of the fourth hit being a critical is more like 40-50%. The hit after that will be back at 10% probability. (Just picking numbers out of the blue to illustrate the concept; I'm sure there's some more thought-through math behind it.)

As the number of hits goes toward infinity, 30% of them will still be critical, but the chance of getting streaks of non-critical or streaks of critical hits is very low.
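The scheme described above (often called a pseudo-random distribution) is commonly modeled as: the crit chance on the N-th attempt since the last crit is N·C, with the constant C solved numerically so the long-run rate matches the advertised one. A sketch of that model (Python; illustrative, not Blizzard's actual code):

```python
import random

def effective_rate(c):
    """Long-run proc rate of the scheme: chance on the n-th attempt since
    the last proc is min(1, n*c); rate = 1 / E[attempts between procs]."""
    expected = 0.0
    reach_prob = 1.0  # probability of arriving at attempt n with no proc yet
    n = 1
    while reach_prob > 1e-12:
        p_now = min(1.0, n * c)
        expected += n * reach_prob * p_now
        reach_prob *= 1.0 - p_now
        n += 1
    return 1.0 / expected

def solve_c(target):
    """Binary-search the increment so the long-run rate equals the advertised chance."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if effective_rate(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

c = solve_c(0.30)
print(round(c, 4))  # the first-attempt chance, well below the advertised 30%

# Simulate: the long-run crit rate is still ~30%, but dry streaks are capped at ~1/c attempts
rng = random.Random(7)
counter, crits, attempts = 1, 0, 200000
for _ in range(attempts):
    if rng.random() < counter * c:
        crits += 1
        counter = 1
    else:
        counter += 1
print(round(crits / attempts, 3))  # close to 0.3
```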

In practice it works very well, in my experience. It is however not completely exploit-proof: you could, for example, increase your probability of initially landing a critical hit by first making a few hits on NPCs and then going into battle against a harder enemy. But that's more of a hypothetical exploit, as the time and risk of doing this simply isn't worth those few extra percent.

I guess applying this concept on item drops available to every player on a whole MMO-server isn't as simple/fair as remembering the outcome of hits for a single player.

I see most people referring to that as a Pseudorandom Distribution, although this is Dota terminology so I'm not 100% sure it's the same word.

During the first long raid (Molten Core), the loot table had a weird quirk: drops were suspiciously similar depending on the player that created the raid. Blizzard said that was impossible, but in reality guilds would rotate the raid leader until they hit someone who "would", for example, drop the legendary bindings, and then stick with that RL for a while. It worked for many people, and 4 or 5 years later one dev finally answered the big question, "is loot defined when you kill the mob, or when the raid is instanced?" It's the second option; that, plus some weird bug that they might have fixed silently at some point, might have caused the RL-rotation hoax.

A market would also fix this, in effect averaging over the entire population. Given players' anger at pieces not dropping as often as they "should", the market would converge to approximate the true ratio between drop frequencies of different pieces.

Final Fantasy XIV did this very well in my opinion. They designed for the economy right from the start, and everything revolves around that (and the amazing graphics, music, and exploration), rather than spells, grinding, and raids.

Except that the most desirable loot was invariably soulbound: http://www.wowwiki.com/Soulbound


And you know, it is always possible in those situations that there is a broken pseudo-random number generator involved. But of course, that's really hard to tell too because real randomness is so unrandom seeming.

Another Blizzard game, Diablo 2, actually had a random number generator bug that went unnoticed for years.

Essentially, any random chance in the game of the form 1 to n could only ever be 1 to n-1, I think with n-1 being twice as likely.

This went unnoticed since most random rolls were over fairly large ranges and it didn't seem to hurt much, however it did explain why one particular subtype of loot literally never dropped.
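The actual cause isn't stated here, but a bug with exactly that signature (n never rolled, n-1 twice as likely) can arise from a stray off-by-one clamp; a purely hypothetical illustration (Python):

```python
import random
from collections import Counter

rng = random.Random(3)

def buggy_roll(n):
    # Hypothetical off-by-one: a correct uniform 1..n roll clamped to n-1
    return min(rng.randint(1, n), n - 1)

counts = Counter(buggy_roll(8) for _ in range(800000))
print(counts[8], round(counts[7] / counts[1], 2))  # 8 never appears; 7 lands about twice as often as 1
```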

Regarding the bomb plot at the very bottom. Examining the number of bombs dropped in each square and comparing it to the Poisson distribution, it appears that the distribution of the bombs is random.

But looking at the plot on the map, it appears that the higher incidence of bombs is focused on a specific region. The Poisson distribution doesn't account for the fact that a lot of the squares with a high incidence of bombs are adjacent to each other. From my layman's understanding, it appears that the bombs were in fact targeted on a specific area, but that there was a random offset from this area regarding where the bombs actually landed. Because of this, you'd see randomness in the distribution. But the distribution of bombs wasn't really perfectly random.

Is the author deliberately avoiding this point, or is there something I've misunderstood?

If you read the linked post it's clearer what's going on: http://madvis.blogspot.ie/2010/09/flying-bombs-on-london-sum...

The Poisson analysis is only done on a subset of the area that's in the 3D plots: "Clarke's analysis was focused on the central area of higher density here and with finer geographic coordinates. Within that area his analysis found no evidence of clustering that cannot be accounted for by a Poisson process."
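Clarke's test can be reproduced in miniature: scatter hits uniformly over a grid and compare the per-square counts with the Poisson prediction (Python; the 576 squares / 537 hits figures are the ones usually quoted for Clarke's 1946 study, the scatter itself is simulated):

```python
import math
import random
from collections import Counter

rng = random.Random(0)
squares, hits = 576, 537   # the figures usually quoted for Clarke's analysis
lam = hits / squares       # Poisson rate per square

# Scatter the hits uniformly at random over the grid squares
per_square = Counter(rng.randrange(squares) for _ in range(hits))
observed = Counter(per_square.values())
observed[0] = squares - len(per_square)  # squares that received no hits

print(" k  observed  Poisson-expected")
for k in range(5):
    expected = squares * math.exp(-lam) * lam**k / math.factorial(k)
    print(f"{k:2d}  {observed[k]:8d}  {expected:16.1f}")
```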

Yes, it could have been a bit more clear in how they "weren't being aimed". Obviously the Germans did not just point them straight up and light them off, hoping for the best. What this really demonstrated was something more like where the noise floor for the bomb distribution was. They could hit "London" but they couldn't hit any particular place in it. Very important for the military to know that information.

The responses below are correct - the analysis in the article is about a smaller region of London, that essentially falls inside the large peak in the plot. I wasn't deliberately avoiding this point, it's just that the article was getting long and I wasn't sure how much detail to keep in. I've made a correction to the post to alleviate this confusion. And thanks for pointing this out.

I believe the table at the end deals with a subsection of the 3d graph, but I could be wrong.

Here's a weird thing about randomness that has always bothered me:

In quantum mechanics, if you measure two incompatible observables (like position and momentum) of a system, and then repeat that experiment many times, you will get two lists of real numbers. QM says you can predict the distribution of these numbers, but you cannot predict the individual numbers themselves. The popular way of thinking nowadays is that "the universe is just inherently random".

So I posed the question on the Physics Stack Exchange: how do we know these numbers are truly random, and not the result of some as-yet-undiscovered pseudorandom number generator that is nonetheless deterministic? Luboš Motl (Czech string theorist) replied (a bit abrasively I might add) that yes, the numbers are truly random and plenty of experiments have ruled out the loopholes. Now, there's no way to determine if a set of numbers are truly random, so how he made this bold matter-of-fact statement is beyond me.

Einstein initially believed in "hidden variable" theories, undiscovered properties of quantum systems. Most of these have been ruled out by experiment (this is what Lubos mentioned), but really, this doesn't apply at all to my question of whether those numbers are random or not. Superdeterminism seems to still allow non-randomness, but for some reason, most physicists (notably excepting Gerard 't Hooft) have discounted superdeterminism as nonsense.

Motl, abrasive? Surprised?

Maybe the issue is the Einstein-Podolsky-Rosen problem: if the numbers are being generated deterministically, they're somehow being communicated superluminally between entangled particles, which implies that in some relativistic frames of reference they're being communicated into the past? I guess I should learn enough about QM to really understand this stuff instead of guessing.

Yes, that is the thing: violations of Bell's inequality have been verified experimentally, and that rules out local hidden-variable theories (for the usual meaning of 'reality'):


how do you expect to implement the "as-yet-undiscovered pseudorandom number generator" without any state ("hidden variables")?

I'm just an engineer, so I'm not actually trying to challenge accepted thinking.

However, it is well known that any QM system can be simulated using a classical computer, with the penalty of exponential slowdown. Let's say that I have a hypothetical, ultra-powerful classical computer and I want to simulate a gigantic system of particles including aggregates of particles (e.g. people) performing measurements of other particles. When it comes time to determine the particular values for these measurements, I must generate a random number from a Gaussian distribution. So I use something like the Mersenne Twister. From the perspective of the simulated people, their observations would entirely match our own observations in studying a quantum system.

tl;dr: state isn't necessarily a one-particle concept or a local concept. Individual particles have their own properties (spin, charge, etc.), and then maybe a collection of a million particles also has unique properties.

But my proposal is basically superdeterminism, which -- while being a loophole that has yet to be ruled out -- is unpopular. Since I'm not sure why, I guess I would need to get a degree in theoretical physics to find out.

You can try it out yourself in your browser. This scriptlet generates random and non-random dot distributions side by side (refreshing regenerates, at least in Firefox 17):

javascript:"<html><body><canvas id=\"tutorial\" width=\"200\" height=\"200\">foo</canvas><-><canvas id=\"tutorial2\" width=\"200\" height=\"200\">foo</canvas><script>var canvas = document.getElementById('tutorial');var ctx = canvas.getContext('2d');ctx.fillStyle = \"rgb(000,0,0)\";for (var i=0;i<400;i++) {ctx.fillRect (Math.random() * 200,Math.random() * 200, 2, 2); };</script><script>var canvas = document.getElementById('tutorial2');var ctx = canvas.getContext('2d');ctx.fillStyle = \"rgb(000,0,0)\";for (var i=0;i<20;i++) for(var j=0;j<20;j++) for(k=0;k<1;k++) {ctx.fillRect (i * 10 + Math.random() * 10, j * 10 + Math.random() * 10, 2, 2); };</script></body></html>"

(just paste the above into the address bar)

Here’s a web page displaying those dot distributions plus a more readable version of that code: http://bl.ocks.org/4358325.

Thanks, this looks much better. My code was like this because I used a pasted canvas tutorial example + my browser's URL bar as the IDE :)

"Gravity's Rainbow", by Thomas Pynchon, has an extended section about the Poisson distribution as applied to falling bombs, maternity wards, etc.:


(possibly NSFW)

That book is exactly what I thought of when reading the article. Everyday I learn something new that seems to unlock a section of that novel that was previously opaque.

"The one on the left, with the clumps, strands, voids, and filaments is the array that was plotted at random, like stars."

Are stars really plotted at random?

That's a good question. There are obviously more stars in the directions of the galactic plane (what we observe as the Milky Way), but the individual stars that we see in the night sky tend to only be the ones that are really close to us (relatively speaking). So it's a question of which effect dominates the other. I think it'd be interesting to see how well the distribution of stars fits a Poisson distribution.

Imagine you were one of the dots on the plane in the dot illustrations. Would the "constellation" of all the other dots from your position be Poisson distributed?


Up to the largest scales the universe is randomly clumped/stringed/voided.

Well no, if you look at the whole sky then obviously you'd see more stars in the direction of the galactic centre (the milky way) than in the opposite direction, but if you consider only a small sliver of sky the results would be pretty close to being truly random.

I would love to see more like this on HN.

The author is @aatishb (https://twitter.com/aatishb) on Twitter. I've followed him for some time and he's an enjoyable science/math blogger if you're interested in that.

Thank you! I've added him to my reading list.

This is exactly how they determine species distributions in ecology as well - http://en.wikipedia.org/wiki/Species_distribution

Uniform dispersion would suggest some territoriality aspect of the species, and clumped dispersion would suggest a heterogeneity of resources (or any other hypothesis that could then be tested).

Nice! That's a really interesting and relevant example.

Python script I wrote a while back that outputs random images. PIL must be installed to work. Change |file_name| accordingly. JPEG images can be up to 65535 x 65535 in size, which are the max values for |width| and |height|. It's not optimized, so keep the resolution small unless you want to wait a while.

It'll output images that look like this: http://dave-gallagher.net/pics/666x666.png

    from PIL import Image, ImageDraw
    from random import randint
    def random_image():
        width       = 666
        height      = 666
        file_name   = '/Users/Dave/%dx%d' % (width, height)
        path_png    = file_name + '.png'
        path_jpg    = file_name + '.jpg'
        path_bmp    = file_name + '.bmp'
        path_tif    = file_name + '.tif'
        img  = Image.new("RGB", (width, height), "#FFFFFF")
        draw = ImageDraw.Draw(img)
        for height_pixel in range(height):
            if height_pixel % 100 == 0:
                print height_pixel
            for width_pixel in range(width):
                r  = randint(0, 255)
                g  = randint(0, 255)
                b  = randint(0, 255)
                dr = (randint(0, 255) - r) / 300.0
                dg = (randint(0, 255) - g) / 300.0
                db = (randint(0, 255) - b) / 300.0
                r  = r + dr
                g  = g + dg
                b  = b + db
                draw.line((width_pixel, height_pixel, width_pixel, height_pixel), fill=(int(r), int(g), int(b)))
        img.save(fp=path_png, format="PNG")
        img.save(fp=path_jpg, format="JPEG", quality=95, subsampling=0)     # 100 quality is 2x to 3x file size, but you won't see a difference visually.
        img.save(fp=path_bmp, format="BMP")
        img.save(fp=path_tif, format="TIFF")
    if __name__ == "__main__":
        random_image()

Such a great article! - something nice at the HN top after a long time.

Were the bombs actually mostly random?

No, they were aimed but it was a matter of trial and error. To get the data about where they'd landed, some had radio transmitters. They sent a message back to armourers in France (I assume some were deliberately non-explosive).

Meanwhile German agents in England were also observing the success or failure of the V2s, and in particular where they'd landed.

However, in an effort to deceive the Germans, the British started reporting the correct time of successful attacks, while mentioning an incorrect location.

Moreover, a double agent called Eddie Chapman also fed false information back to the Germans.

As a result, the aimers never really got a grip on ranging accurately. The bombs started landing to the south east of London.

There's considerably more detail about this in Most Secret War by R V Jones, who was involved in all sorts of ruses to confuse the enemy. Well worth a read.

Eddie Chapman (Agent ZigZag) was played by Christopher Plummer in the film Triple Cross. It seems to be on YouTube.

does that book discuss whether / how the misinformation was used to select the south east? i suspect that was a poorer area of london and i vaguely remember some kind of scandal about the relative suffering of various parts of london and class, etc. so i wonder to what extent the south east was chosen (by the british) as a target?

Without asking the folks aiming them we can never be sure, but the results of the survey of South London detonations mentioned in the article were shockingly close to the Poisson distribution. So it seems likely.

They were aimed, but the precision was so low that the distribution ended up being random.

True randomness: if you roll a six-sided die 60 times, each side will almost certainly NOT come up exactly 10 times. Some sides will come up more often than others, and some might not come up at all.
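A quick Python 3 simulation backs this up, both for a single batch of 60 rolls and for how rarely a batch comes out perfectly even:

```python
import random
from collections import Counter

rng = random.Random(0)

def sixty_rolls():
    return Counter(rng.randint(1, 6) for _ in range(60))

one_batch = sixty_rolls()
print(sorted(one_batch.items()))  # lopsided counts, not six tens

# How often is a 60-roll batch perfectly even? Hardly ever.
trials = 10_000
perfectly_even = sum(
    1 for _ in range(trials)
    if all(v == 10 for v in sixty_rolls().values())
)
print(perfectly_even, "of", trials)  # a handful at most
```

The exact 10-10-10-10-10-10 split is just one multinomial outcome among a vast number of lopsided ones, so it is the expected average but a very unlikely single result.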

So for all you entrepreneurs, if you fail, don't fall into a depression. Sure you worked just as hard as everyone else, maybe even harder. And yeah it's annoying to see others surpass you even though you've got everything they do. But that's life, you just got a bad batch of rolls.

Here is an article on how to assess whether your data sample is "random enough": http://goo.gl/opl52

Very interesting! So, the basic point being made is that if you know that a set of events are random and independent and you know their mean value, then you can predict their spread? (or aggregation)

edit: Hmm... another question that comes to mind: is the converse true? If the spread of values of these events does not match the Poisson distribution, can we presume them to be nonrandom? Or nonindependent? Or both?

The Poisson distribution is just one random number distribution; there are several others for situations where the events are correlated, or have other properties. Half the fun of probability is figuring out which distribution is the right one to apply to the question at hand. So if your measurements don't match up to Poisson, it doesn't mean they're not random - they could just be interdependent.

So yes, for a Poisson process, the spread (standard deviation) is equal to the square root of the mean; as the number of events gets large, the Poisson distribution approaches the normal distribution, but the relationship between the standard deviation and the mean continues to hold.
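You can check the spread-equals-root-mean property numerically. A sketch in Python 3, using Knuth's classic sampling method (fine for small means like this):

```python
import math
import random
import statistics

def poisson_sample(lam, rng):
    # Knuth's method: multiply uniforms until the product drops below e^-lam;
    # the number of multiplications is Poisson(lam) distributed.
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

rng = random.Random(1)
samples = [poisson_sample(9.0, rng) for _ in range(20_000)]
print(statistics.fmean(samples))   # ≈ 9
print(statistics.pstdev(samples))  # ≈ sqrt(9) = 3
```

With mean 9 the standard deviation comes out right around 3, as claimed.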

If you have enough data to determine the distribution and its characteristics (e.g. mean and s.d. for normal, etc.) then you can use it to predict spread.

As a side note, this is one of those things that several Tetris clones have to deal with: long runs are probable but can frustrate a player, so a lot of the time you want to add extra logic to keep it from being actually random.

A uniform "deck" a couple of times larger than the number of pieces is the usual suggestion: it prevents large runs and makes sure that you see every piece regularly.
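A minimal sketch of that "deck" (bag) randomizer in Python 3; the piece names and bag size are just illustrative:

```python
import random
from collections import Counter

def bag_randomizer(pieces="IJLOSTZ", copies=2, rng=random):
    # Shuffle `copies` of every piece into a "deck", deal it out,
    # and reshuffle when it empties. Every piece is guaranteed to
    # appear exactly `copies` times per bag, so droughts are bounded.
    while True:
        bag = list(pieces) * copies
        rng.shuffle(bag)
        yield from bag

gen = bag_randomizer()
first_28 = [next(gen) for _ in range(28)]  # exactly two full 14-piece bags
counts = Counter(first_28)
print(sorted(counts.items()))  # every piece appears exactly 4 times
```

Within a bag the order is still unpredictable, but a player can never go more than roughly two bags without seeing any given piece.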

Marginally related --

on the other hand, here's "Evil Tetris": http://qntm.org/hatetris

I was happy that the first example with the dots looked like the second one was natural. My eyes felt better. With the student example my eyes struggled so I "reasoned" that the first one had more clumps and was wrong since the first dot example had more clumps and was wrong.

I guess I'm not a good randomness detective.

This article is really interesting. It says that randomness doesn't mean uniformity, and that true randomness can have clusters. So what exactly is randomness? How can we define randomness? Is Math.random() really random? Which is the best random function, and how can we find out if it is purely random?

True randomness is a system that, say, generates a 10-digit number where 1111111111 has the same odds of appearing as any other permutation. But then that's not strictly true, as we all know: any system that monitors the outcome (a good example being fruit machines) to make sure there is an even distribution of all permutations will in essence remove that property we like to think of as truly random.

With that it gets hard to truely say what is random or what is a as yet unknown pattern. This is why many have taken the approach of not having a single source of random numbers but use many and average out from there. There again is that random as the chances with such an approach of getting a high or low value would be biased out.

So with that I postulate one man's random string is another man's non-random string. So with that I define randomness as an as-yet-undetermined sequence of data. The included Dilbert post is, with that, extremely clever and totally true.

"hard to truely say what is random or what is a as yet unknown pattern."

No, it's just saying that it's much easier to get pseudorandomness out of a computer than true randomness.

"This is why many have taken the approach of not having a single source of random numbers but use many and average out from there. There again is that random as the chances with such an approach of getting a high or low value would be biased out."

Technically speaking you wouldn't average; you'd add them together and take the fractional part (modulo 1). That can negate bias as long as at least one of the sources is good, even if you don't know which one.

Of course you can remove bias from a single source by Von Neumann's method although this might be computationally harder than the above:
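A quick sketch of Von Neumann's trick in Python 3 (the 80/20 bias here is just an arbitrary example):

```python
import random

def von_neumann(bits):
    # Read bits in non-overlapping pairs: 01 -> 0, 10 -> 1,
    # and 00/11 are thrown away. If the input bits are independent,
    # P(01) = P(10), so the output is unbiased no matter how
    # biased the underlying coin is.
    it = iter(bits)
    return [a for a, b in zip(it, it) if a != b]

rng = random.Random(0)
biased = [1 if rng.random() < 0.8 else 0 for _ in range(100_000)]
debiased = von_neumann(biased)
print(sum(debiased) / len(debiased))  # ≈ 0.5 despite the 80/20 input
```

The cost is throughput: the more biased the source, the more pairs get discarded (here roughly two-thirds of the input is thrown away).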


There is actually no such thing as a random or non-random number. There are only random and non-random generators. Whether a fruit machine is random depends on how it defines/ensures evenness.

I'm a bit confused. What is this notion of "pure randomness" that this article and many comments seem to be eluding to? Perhaps I'm just not using the same definitions of terms, but I thought a uniform distribution is still considered random.

> Perhaps I'm just not using the same definitions of terms, but I thought a uniform distribution is still considered random.

No, a uniform distribution is not evidence of randomness. Consider the digits 0 - 9 repeated endlessly:

0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 ...

Uniformly distributed? Yes. Random? No.

> What is this notion of "pure randomness" that this article and many comments seem to be eluding [sic] to?

First, s/eluding/alluding/

Second, although the topic is complex, one test of randomness is that an ideal compression method, one able to find and exploit any repetitive pattern, cannot compress a random sequence.
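zlib is far from an ideal compressor, but even it makes the point vividly. A sketch in Python 3, reusing the repeating-digits example from above:

```python
import random
import zlib

rng = random.Random(0)
random_bytes = bytes(rng.randrange(256) for _ in range(100_000))
patterned = b"0123456789" * 10_000  # uniform digit frequencies, zero randomness

# Random data barely shrinks at all; the patterned data, despite having
# perfectly uniform symbol frequencies, collapses to a tiny fraction.
print(len(zlib.compress(random_bytes, 9)))
print(len(zlib.compress(patterned, 9)))
```

Both inputs are 100,000 bytes and both have flat symbol histograms; only the one with no exploitable structure resists compression.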

Third, the term "entropy" as used in information theory is tied to randomness, as explained here:


A quote: "The entropy rate for a [fair coin] toss is one bit per toss. However, if the coin is not fair, then the uncertainty, and hence the entropy rate, is lower."

Based on that, high entropy -> high randomness.

Not to oversimplify a complex topic.

>Uniformly distributed? Yes. Random? No.

I think you're fudging what is supposed to be uniform here. In your example, the unigrams (i.e. single digits) may be uniform, but the n-grams for n ≥ 2 are not.

Actually, the n-grams are periodic and repetitive also. They certainly aren't random.

They may be periodic and repetitive, but they aren't uniform. For example, there are lots of "01"s and no "02"s.

Could you edit that long line somehow? It makes reading harder in my phone.

The concept you're looking for is "independence"


The glow worms are still distributed randomly in the mathematical sense, but knowing the position of one tells you something about the likely positions of others, so they are not independent.

Common parlance doesn't do a great job at talking about the features of random distributions, but when people say "purely random," they often seem to mean "uniformly distributed and independent." Both pictures have uniformly distributed points, but the glow worms are not independent.

The distributions are describing different aspects. Let's take the WoW loot example. A monster could drop 50 different objects with equal probability. Kill it a few million times and you'll get a uniform distribution of kills against item type.

Now if we take that same kill data and instead ask how many times do I get item type 2 in 100 kills you'll see a Poisson distribution. If you didn't see a Poisson distribution when asking that question then the events were probably not independent of each other, and hence weren't actually random (e.g. The monster could just drop the 50 items types in order).
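Both views are easy to see in a simulation. A Python 3 sketch with a made-up 50-item loot table:

```python
import random
import statistics
from collections import Counter

rng = random.Random(7)
ITEMS = 50  # hypothetical loot table: 50 items, equal drop chance
kills = [rng.randrange(ITEMS) for _ in range(200_000)]

# View 1: across all kills, drops are uniform over item types.
by_item = Counter(kills)
print(min(by_item.values()), max(by_item.values()))  # both near 4000

# View 2: drops of item 2 per 100-kill window spread out Poisson-style
# around a mean of 100/50 = 2 (including plenty of 0-drop windows).
windows = [kills[i:i + 100].count(2) for i in range(0, len(kills), 100)]
print(statistics.fmean(windows))  # ≈ 2
```

The same uniform, independent drop process produces both the flat per-item histogram and the clumpy per-window counts; neither view contradicts the other.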

Strange terminology. In mathematical parlance the arrangement of the glowworms is still described as "random", but the positions of the worms simply are not independent of one another.

Can you expand on this? I'm curious. It seems to me that with the glowworms, knowing the position of one worm tells you something about the position of every other worm in the sample. If it was truly random, wouldn't it tell you nothing?

No, in mathematics the eye colour and hair colour of a randomly selected human being are considered random, even though knowing one tells you something about what the other is likely to be. Mathematically, "random" means distributed according to some definite (though perhaps unknown) distribution whereas in common parlance it means something like "not distributed according to anything definite at all" (though it's not clear that this way of defining things is even meaningful).

Common parlance "random" seems to mean something more like "uniformly distributed and independent."

One under-stated but awesome thing about Poisson distributions: fishing has a Poisson distribution. (Poisson is French for "fish".)

I think that would only be true in an infinite lake.
