> Because the monster battles are statistically independent, we can make a rough guess that about 1 in 9 players still won’t have a special item after 20 battles.
From a game design perspective, for some situations this can be a real problem that leads to players getting discouraged and quitting, which leads to loss of revenue.
In the games I've made, I've often used a special trick for overcoming this problem. Rather than using plain random numbers, I conceptually use a shuffled deck of cards, where each player receives a different random seed for the shuffle and I maintain state on how many "cards" they've drawn so far. I put one "win" in the deck and some number of non-wins. This way it's still random, but if people keep at it they'll eventually get the thing that they want. With plain raw probabilities, you'll always have a certain number of customers that keep paying to try and keep failing. The random shuffle is a way better way to approach this from a customer service perspective.
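A minimal sketch of the shuffled-deck idea, assuming one win per 20-draw deck and a per-player seed (the class name and parameters are illustrative, not anyone's production code):

```python
import random

class ShuffledDeckDrops:
    """One guaranteed 'win' somewhere in every deck of `deck_size` draws."""

    def __init__(self, player_seed, deck_size=20):
        self.rng = random.Random(player_seed)  # per-player seed for the shuffle
        self.deck_size = deck_size
        self.deck = []

    def draw(self):
        if not self.deck:
            # Rebuild and reshuffle: one win, the rest misses.
            self.deck = [True] + [False] * (self.deck_size - 1)
            self.rng.shuffle(self.deck)
        return self.deck.pop()

drops = ShuffledDeckDrops(player_seed=42)
results = [drops.draw() for _ in range(20)]  # exactly one win per 20 draws
```

With one win per 20-card deck, the worst case is bounded: a player can see at most 38 consecutive misses (win first in one deck, win last in the next), which is what makes the "no more than X attempts" guarantee possible.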
This makes it easier to reason about from a game design and customer service perspective. I can say that no player will ever have to go through more than X attempts. I say it's a customer service issue because in many of my games the players feel like there's real money on the line, and so it can literally lead to customer service tickets, wailing, and gnashing of teeth.
One of my favourite techniques not listed there is weighting the probabilities by distance, so individual outcomes get more likely the longer you've not seen them (making the gambler's fallacy real, I guess).
which is another way of wording "reliably generates and serves improbable events"
The game had these "Legendary" items of incredible power you could attain that were insanely rare, but had a sort of "pity timer" where you'd get a higher and higher chance of obtaining one based on how long it'd been since your last item.
The problem is that the legendary items were almost inconceivably rare drops through pure random luck, but the escalating chance was really predictable. You were all but guaranteed to never get a legendary item through pure chance alone, but you were basically guaranteed to get one every two weeks if you did all the time-limited content available. (Dungeon raids you can do once per day/once per week, daily chore missions, etc).
For the first few months of the system's availability, after 4 legendary items you'd no longer get the "pity timer" bonus to your loot rolls on legendary items, so your character was effectively capped at 4 legendary items.
Some of the available items could increase your character's combat effectiveness by a relatively huge margin compared to others, and it only took about a week to power-level a new character and get it one or two legendary items, so for a while people were abandoning their max level, legendary-geared characters with 100+ hours invested into them to try again.
Each player got a bag from which you could take a random piece of heart candy once an hour. There were a fixed number of heart candies, the bag worked for a week, and you needed one of each to get the achievement.
One random draw once an hour for at most a week is a limited number of attempts; what are the odds of never getting a certain heart piece? About one in a million or so. How many players did WoW have at that time? Over ten million. So you're pretty much guaranteed that even if you never missed an opportunity to draw a piece of candy, some players would never get one of each, and never get the achievement... Statistics is a bitch.
(They implemented pity timers for every similar achievement afterwards, as far as I know.)
This extends the time to get the achievement from one year to "for 0.0001% of players, two years".
That will annoy those, um, ten people, but "getting the achievement next year" is pretty different from "never getting the achievement".
Anyway, they fixed it retroactively for the first year somehow, so that was nice.
This severely weakens the case that it was a problem that needed to be fixed; if you're getting the Valentine's achievement because you want What A Long, Strange Trip It's Been, the extra waiting time you suffer from not getting Valentine's the first time around is lowered. (Because you would have had to spend a lot of that time waiting anyway, for the other calendar-based achievements that you didn't already have.)
Seems like this is a feature, not a bug. You want to reward users who interact a lot with your game consistently over time.
But this feature meant that through random chance, your alt could become much more powerful than your main. This sucks, because you're now incentivized to spend your time in the game playing a character you don't enjoy as much as your main, and the time you spent on your main making it powerful is simply eradicated.
And that increases the chance of you quitting the game.
Hearthstone used to do this for buying card packs, and d3 used to do it for legendary drops.
Dota2 uses a strategy it calls pseudorandom, where a 25% chance means a much lower chance at first that escalates rapidly to maintain an "average" (whatever that means to them) of 25%. This shrinks the range of possible outcomes while still allowing for chance.
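A sketch of this pseudorandom-distribution idea; the constant C below is an assumed value I picked to land near a 25% long-run rate, not Valve's actual constant:

```python
import random

class PseudoRandom:
    """Each miss raises the success chance by C; a hit resets it."""

    def __init__(self, c, seed=0):
        self.c = c
        self.n = 1  # attempts since the last success
        self.rng = random.Random(seed)

    def roll(self):
        # Success chance grows linearly with the current streak of misses.
        if self.rng.random() < self.c * self.n:
            self.n = 1
            return True
        self.n += 1
        return False

# c = 0.085 is an assumed value that yields roughly a 25% long-run rate.
prd = PseudoRandom(c=0.085)
rate = sum(prd.roll() for _ in range(1_000_000)) / 1_000_000
```

The first roll succeeds only 8.5% of the time, but after eight misses the next roll succeeds over 75% of the time, so both lucky and unlucky streaks get squeezed toward the advertised average.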
I think phantom assassin still can do it with her ult (attack, see if the swing back is the crit animation, if not cancel it), but that may have changed (I haven't played in a few years)
This is preferable to bara getting 5 bashes in a row and killing you due to real rng. Or playing bara and getting what feels like 0 bashes, again due to real rng.
I think such a thing would be hard to find in testing and a shuffled deck or 'pity timer' weighted-RNG approach would ameliorate it but might hide the root issue.
While I'm on the topic, POE tends to use pure RNG in all cases except evasion. Evasion is a pretty useless defensive layer if you get hit 3 times in a row by random chance, ending your run, which makes for a terrible game experience. So they use a weighted entropy system to dial the chances up or down depending on how long it's been since you were hit (or since you last hit the enemy).
I love the amount of randomness that exists in the game. Some of the items that can drop are completely build-defining (Legion jewels and Watcher's Eyes, for example) but also hugely random; if you manage to get your hands on one with interesting rolls, it feels like the possibilities are endless.
At the same time, I think they have realized players like some amount of agency as well. A lot of the newer content they've introduced has semi-deterministic outcomes where the player's actions can influence the rewards: stuff like Syndicate hideouts, the Incursion temple, etc. It gives you some control over what items you will get at the end of the content.
Syndicate is definitely the sweet spot but I'm glad that the simple encounters like Bestiary exist too. We need both types.
It appears to use the gambler's fallacy as a heuristic for how dice _should_ work.
Far as I know that's not the actual case, but if I were to make one of these I'd definitely have both the gambler's fallacy and ML-based greed in there. They would both activate only for about a third of the sessions, so there would be plausible deniability.
Thinking about this more carefully, though: I'm not sure if actually lowering the reward for greed would be a good idea. Seems to me that you would want to reward greed so that it leads to even more greed.
That sounds more like a game business perspective. From a game design perspective, I would say it leads to the user not having fun, "fun" being the goal of the design of the game. Loss of revenue affects the business around the game more than the game itself.
Otherwise I agree with what you're saying.
Yes, certainly. I've made some games from a pure game design perspective without regard for business model, and it didn't work out well for me. That's because I had this weakness where I needed to eat.
When I made my first "real" game, I only thought about the perspective of player engagement and fun. I had a game that was engaging and fun at launch. Then I realized my real challenge was community and discoverability. I needed to have people ready to play it at launch.
When I made my next game, I made sure that community engagement and discoverability was well-covered. So I had something that was fun at launch, and people were ready to play it and pay for it. But I soon realized that my real problem was having a sustainable business model that could keep the team and the game going for a multi-year lifecycle of the game's lifetime.
When making games that let you quit your day job, it's best to think of the business model as one of the key game design components.
Yeah, I guess that's where our perspectives differ, and that's ok! Thank you for sharing your experience with trying to make games for a living.
is bad for both player and business
For example if you're uploading stuff to a bucket, you can compute its hash first to figure out if a duplicate already exists and if so, skip the upload.
Why can you do this? What if it was just a hash collision? Shouldn't you still compare the contents to really make sure they are the same?
Turns out if your hash function is N bits you will need to have 2^(N/2) items before you see two hashed to the same thing by chance. If you choose a 256 bit cryptographic hash function like SHA256, that's 2^128. This probability is so low you have a higher chance of encountering a cosmic ray bit flip!
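As a rough sketch of that bound (the trillion-item figure and the helper below are my own illustrative choices; `math.expm1` keeps the tiny result from underflowing to zero):

```python
import math

def p_collision(n_items, hash_bits):
    """Standard birthday approximation: 1 - exp(-n^2 / 2^(bits+1))."""
    return -math.expm1(-n_items * n_items / 2 ** (hash_bits + 1))

# Even a trillion SHA256-hashed items leaves an astronomically small chance
# of any two colliding by accident.
p = p_collision(10 ** 12, 256)
```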
The hash collision problem is a birthday paradox, I suppose, but the probabilities are so low that that part isn't stressed in your post.
I feel like a better example of the Birthday Paradox would be something like:
I automatically give all my users a 5-character random ID. I figure there are nearly 12 million (26^5) combinations, so I should easily be able to support a hundred thousand users, since (naively) I calculate that the probability of the next user matching any of the 100,000 existing users is less than 1%.
But if you calculate it correctly as the birthday paradox, you'll see that the probability of having a collision among 100,000 users is ~100%.
 1 - (1 - 1/11881376)^100000 ≈ 0.0084 < 0.01
 1 - ((11881376-1)/11881376)^(100000*(100000-1)/2) ≈ 1
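Both calculations are easy to check numerically; this sketch uses the same 26^5 ID space and 100,000 users:

```python
import math

N = 26 ** 5   # 11,881,376 possible 5-character IDs
n = 100_000   # users

# Naive view: chance that ONE new user collides with n existing IDs.
p_one_more = 1 - (1 - 1 / N) ** n

# Birthday view: chance that ANY pair among n users collides,
# using the standard approximation 1 - exp(-n(n-1) / (2N)).
p_any_pair = 1 - math.exp(-n * (n - 1) / (2 * N))

# p_one_more comes out under 1%, while p_any_pair is effectively 1.
```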
No no no. That's not how probability works. You will see that many on average, but you don't "need to" anything before you can see a collision. You could get collisions on the next 100 files. It's just unlikely. The bitspace bounds the denominator of match probability for each file independently, it does not count files. Unlikely events happen all the time somewhere.
Pretty sure GP understands how probability works, maybe take a more charitable interpretation of their comment?
There are many circumstances where I fully agree with charitable interpretation, but this is not one of them. In this case it's tantamount to saying that it's ok to mislead people because they should just know better. (especially in light of their reply)
If you are correct about what they understand, then I'd rather they were more charitable to the other people who don't already understand it by not writing things that are wrong and potentially dangerous. Someone out there is going to learn about this by googling, see 19 out of 20 people get it wrong, implement a system that relies on the majority false information, and then destroy the lives of a bunch of innocent people. It may not be someone you know, and it may not impact your neighbors, but it happens _all_ the time. I care about the harm of spreading misinformation, and I think you should as well.
Your statement was an incredibly common and often repeated misperception of probability and circumstance. Someone looking it up for themselves, as you suggest, is more likely than not to encounter dozens of wrong statements on the subject before they ever find a right one. I think it behooves us to not add more misinformation to the pile.
We commonly conceive of our computers as 100% accurate, but as you observe here, for this and other reasons they aren't. The distribution of errors is exceedingly pathological; the vast majority of errors you will encounter are not independent but are highly correlated, because some particular bit of hardware is flawed in some manner.
Ignore those for a moment, and consider just the purely-random failures, like cosmic or thermal bit flips. The rate on these is still non-zero. Very small, but high enough that we've pretty much all encountered them, usually without realizing it. (While it is true that a single bit flip can bring a program down if it is the correct bit, the average bit flip will have no visible manifestation of any kind.)
This creates a "noise floor" for our computations. Any event which has a probability lower than this "noise floor" for happening can be treated as the same zero probability you treat the possibility of totally random hardware failure.
The probability of two random pieces of content having the same SHA256 hash by random chance is in practice zero, and you may write your code that way. The potential problem that one may wish to defend against is the possibility that two pieces of content have the same SHA256 hash for non-random reasons, which is to say, the possibility that it will be broken. But the defense against that is rather different. There's a lot of nasty possibilities that lie above this noise floor that still need to be dealt with.
I have seen that this concept kinda bothers people sometimes. But you are justified in just ignoring anything below this noise floor. There's basically never a reason to worry about this noise floor, because once you understand what it really means, you will see you lack the tools to deal with it. Given an error, the probability that the error is the result of some non-random systemic issue is much higher than the probability that it was a truly random error. Even if you care about reliability a lot, like in space or medicine, the problem you need to deal with is failing hardware. If you try to write code to deal with the noise floor, it is much more likely to be getting invoked due to systemic, non-random issues... after all, a "low probability" failure on a network link that corrupts one packet a day is still multiple orders of magnitude above this noise floor.
(Of course, you'd need enough logging and debugging to be able to reproduce that.)
But in many programming applications (e.g. hash maps) you wouldn't use a cryptographic hash because it's too slow. Then it becomes important to design a hash function that is "good enough" and, unfortunately, it's quite easy to implement "hashCode" (or whatever your programming language calls it) in a broken way for your types and suddenly you have collisions. I've seen exactly this problem in my own code.
In short, if you're tossing N balls randomly into N bins, you should expect the most heavily loaded bin to get about log N / log log N of the balls.
If instead you "toss" by choosing two random bins and then putting the ball in the one with fewer balls, you should instead expect the most loaded bin to get only log(log N) balls.
In practice, it's often not so hard to modify the first strategy into the second to get dramatically more uniform load distribution among bins.
(You might ask, if two choices are good, are three better? Yes, but only slightly. You get an exponential improvement going from 1 to 2; after that, testing more bins improves by constant factors.)
The intuitive explanation for this is that the largest benefit comes from avoiding the most full bin. Two choices is always sufficient to do that.
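A quick simulation of both strategies (the bin count and seed are arbitrary choices of mine, not from the original result):

```python
import random

def max_load(n_bins, n_balls, choices, seed=0):
    """Toss balls into bins, picking the least-loaded of `choices` candidates."""
    rng = random.Random(seed)
    bins = [0] * n_bins
    for _ in range(n_balls):
        candidates = [rng.randrange(n_bins) for _ in range(choices)]
        best = min(candidates, key=lambda i: bins[i])
        bins[best] += 1
    return max(bins)

n = 100_000
one_choice = max_load(n, n, 1)  # grows like log n / log log n
two_choice = max_load(n, n, 2)  # grows like log log n
```

With these sizes the one-choice max load typically lands around 8 and the two-choice version around 4, illustrating the exponential improvement from the second choice.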
For example, setting up the server "mic2osoft.com" (a one-bit-flip error from microsoft.com) yielded this request:
msgr.dlservice.mic2osoft.com 213.178.224.xxx "GET /download/A/6/1/A616CCD4-B0CA-4A3D-B975-3EDB38081B38/ar/wlsetup-cvr.exe HTTP/1.1" 404 268 "Microsoft BITS/6.6"
"When waiting for a bus that comes on average every 10 minutes, your average waiting time will be 10 minutes." And worse "when the average span between arrivals is N minutes, the average span experienced by riders is 2N minutes."
These realizations about Poisson distributions have a lot of real-world implications such as packet traffic, call center calls, etc.
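The bus version is easy to simulate; this sketch assumes exponentially distributed intervals with a 10-minute mean, which is exactly the Poisson assumption the paradox rests on:

```python
import bisect
import random

rng = random.Random(1)
mean_interval = 10.0

# Bus arrival times from a Poisson process (exponential inter-arrival times).
arrivals, t = [], 0.0
for _ in range(100_000):
    t += rng.expovariate(1 / mean_interval)
    arrivals.append(t)

# Riders show up at uniformly random times and wait for the next bus.
waits = []
for _ in range(100_000):
    r = rng.uniform(0, arrivals[-1])
    nxt = arrivals[bisect.bisect_left(arrivals, r)]
    waits.append(nxt - r)

avg_wait = sum(waits) / len(waits)  # close to 10 minutes, not 5
```

The average wait matches the full mean interval because riders are more likely to land inside a long gap than a short one, which is the inspection paradox in action.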
> Even without a statistical test, it's clear by eye that the actual arrival intervals are definitely not exponentially distributed, which is the basic assumption on which the waiting time paradox rests.
> The average waiting times are perhaps a minute or two longer than half the scheduled interval, but not equal to the scheduled interval as the waiting time paradox implied. In other words, the inspection paradox is confirmed, but the waiting time paradox does not appear to match reality.
> the above analysis shows pretty definitively that the core assumption behind the waiting time paradox — that the arrival of buses follows the statistics of a Poisson process — is not well-founded.
Erhan Çinlar, 'Introduction to Stochastic Processes', ISBN 0-13-498089-1.
An arrival process with stationary (distribution not changing over time), independent (of all past history of the process) increments (arrivals) is necessarily a Poisson process. So there is an arrival rate, and times between arrivals are independent, identically distributed exponential random variables.
Often in practice you can check these assumptions well enough just intuitively.
Then, as in Çinlar, you can quickly derive lots of nice, useful results.
(2) The Renewal Theorem.
Feller, 'An Introduction to Probability Theory and Its Applications'.
Roughly, with meager assumptions and approximately: if the arrivals are from many different independent sources, not necessarily Poisson, then the resulting process, that is, the sum of the many processes, is Poisson.
E.g., using (2), between 1 and 2 PM, the arrivals at a busy Web site, coming from lots of independent Web users, will look Poisson, and from (1) we can say a lot for sizing the server farm, looking for DDOS attacks, security, performance, network, and system management anomalies, etc., e.g., do statistical hypothesis tests.
Similarly for packets on a busy communications network, server failures in a server farm, etc.
For example, attributing even some success to randomness (a reasonable assumption, I would think), at least some fraction of the successful people you see in life are just lucky! If 1000 people each flip a fair coin 10 times, there's a really good chance (60%+) that someone gets 10 heads in a row!
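The arithmetic behind the 60%+ figure:

```python
p_one = 1 / 2 ** 10                  # one person's chance: ~0.098%
p_someone = 1 - (1 - p_one) ** 1000  # at least one of 1000 people, ~0.62
```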
Optimize your existence as you see fit with that information! Generate more attempts, spend more time with your family, try to move the needle on the amount of randomness that contributes to your personal definition of success, etc.
You can make this 100% if it is a tourney-style flip-off. This seems obvious, but more subtle forms of it exist in the world where randomness is involved but some results are determined; i.e. the difference between "someone will see 10 heads in a row" vs. "this particular person will."
That sounds very intriguing, yet I don't really have an aha!-moment. Care to give an example or two from your personal experience?
I quit a job that put me in the top 5% of salaries to do something I loved that gave me more time outside work, because I realized the woman I share my life with was my "ten heads in a row" and not the job that depressed me.
Another example is that rather than sinking my time into one project, one hobby, one organization, I tend to jump around a lot. That isn't to say I am constantly in that state, I've just learned that eventually the coin will flip heads and I'll be working with great people on something really interesting that is worth my time to go deep. "Worth" here, not necessarily being financial. It might be educational, or a cause I'm passionate about. Or fun. This optimizes for N.
In law, I could kick ass, and still lose a case, because of many things that are likely to be depressing to list in public.
In programming, if I kick ass, it deterministically leads to something that works! That is good. Sure there is politics and popularity and all those normal human problems too, but those same problems existed in law, so I'm not losing anything there.
See: let's find Q such that Prob(no successes in M trials) ~= Q. This is:
(1 - 1/N)^M ~= Q
M log(1 - 1/N) ~= log Q
Using the approximation log(1 - x) ~= -x,
-M/N ~= log Q
exp(-M/N) ~= Q
M = N yields the result.
Now, if we slightly change the problem so Q is a probability threshold such that Prob(no successes in M trials)>Q, we get an exact statement: since x>log(1+x) exactly, -M/N > log(1-1/N) > log Q.
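As a numeric sanity check of the M = N case (which converges to 1/e):

```python
import math

N = 1_000_000
q = (1 - 1 / N) ** N  # Prob(no successes in N trials at rate 1/N)
# q is within about 2e-7 of 1/e ≈ 0.367879 at this N
```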
Also it was a great example as to why you should never use “random” as your load balancing algorithm unless you plan to always have 1/3 extra capacity.
Or conversely why you should always have 1/3 extra capacity if you must use random.
Reframing it in terms of capacity cleared that up. If the rate of incoming requests is higher than the total rate your backends can process, your queues will grow infinitely!
So something like an average of 12 incoming requests per second with each backend capable of processing 1 request per second is actually fairly realistic. And I think the math still works out the same there.
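A toy sketch of that claim, with assumed numbers (12 requests per second in total, backends serving 1 request per second each, uniformly random assignment):

```python
import random

def total_backlog(backends, rate=12, seconds=10_000, seed=0):
    """Queued requests left after `seconds` of random load balancing."""
    rng = random.Random(seed)
    queues = [0] * backends
    for _ in range(seconds):
        for _ in range(rate):
            queues[rng.randrange(backends)] += 1  # random assignment
        for i in range(backends):
            queues[i] = max(0, queues[i] - 1)  # each backend serves one per tick
    return sum(queues)
```

With 12 backends the system is at nominal 100% capacity, yet random imbalance alone makes the backlog grow steadily; with 16 backends (1/3 headroom) it stays small.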
where I try to go from the very basic to some useful applications (like Bayes theorem, A/B testing).
You can also subscribe to my newsletter: https://data4sci.com/newsletter where I also announce future webinars, live tutorials and trainings, etc.