Some Useful Probability Facts for Systems Programming 473 points by yarapavan on Jan 30, 2020 | 76 comments

 I want to share a trick with you all.
> Because the monster battles are statistically independent, we can make a rough guess that about 1 in 9 players still won’t have a special item after 20 battles.
From a game design perspective, for some situations this can be a real problem that leads to players getting discouraged and quitting, which leads to loss of revenue. In the games I've made, I've often used a special trick for overcoming this problem. Rather than using plain random numbers, I conceptually use a shuffled deck of cards, where each player receives a different random seed for the shuffle and I maintain state on how many "cards" they've drawn so far. I put one "win" in the deck and some number of non-wins. This way it's still random, but if people keep at it they'll eventually get the thing that they want. With plain raw probabilities, you'll always have a certain number of customers that keep paying to try and keep failing. The random shuffle is a much better way to approach this from a customer service perspective.
This makes it easier to reason about from a game design and customer service perspective. I can say that no player will ever have to go through more than X attempts. I say it's a customer service issue because in many of my games the players feel like there's real money on the line, and so it can literally lead to customer service tickets, wailing, and gnashing of teeth.
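A minimal Python sketch of the shuffled-deck idea described above. The class name `DropDeck`, the seed handling, and the reshuffle-when-exhausted behaviour are illustrative assumptions, not the commenter's actual code:

```python
import random


class DropDeck:
    """Per-player deck holding one win card and (deck_size - 1) non-win cards."""

    def __init__(self, player_seed: int, deck_size: int):
        # A per-player RNG gives every player a different, reproducible shuffle.
        self._rng = random.Random(player_seed)
        self._deck_size = deck_size
        self._reshuffle()

    def _reshuffle(self) -> None:
        self._cards = [True] + [False] * (self._deck_size - 1)
        self._rng.shuffle(self._cards)
        self._drawn = 0  # state: how many cards this player has drawn so far

    def draw(self) -> bool:
        """Return True if this attempt wins the item."""
        if self._drawn == self._deck_size:
            self._reshuffle()  # deck exhausted (assumption: start a fresh deck)
        card = self._cards[self._drawn]
        self._drawn += 1
        return card


deck = DropDeck(player_seed=42, deck_size=20)
attempts = [deck.draw() for _ in range(20)]
print(attempts.count(True))  # exactly one win somewhere in the first 20 attempts
```

With a deck of 20, no player needs more than 20 attempts, which is the "no more than X attempts" guarantee from the comment.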
 There are lots of fun ways to smooth out short-term variation in game randomness. This article shows a few used by various Tetris games, with one of them being your deck-of-cards approach: https://simon.lc/the-history-of-tetris-randomizers
One of my favourite techniques not listed there is weighting the probabilities by distance, so individual outcomes get more likely the longer you've not seen them (making the gambler's fallacy real, I guess).
 In general, it's often a good idea to munge pure probabilities to improve user experience. A very well known example is shuffle play. If you use pure probabilities you'll reliably play tracks back to back occasionally, frustrating users. A paradoxical quirk of true randomness is that it reliably throws out improbable events.
 > reliably throws out improbable events.
which is another way of wording "reliably generates and serves improbable events"
 Don't quite understand the point of your comment, maybe I am missing some context, but yes. Are you objecting to my "throws out"? I suppose it is suboptimal because it could be misinterpreted as meaning "eliminates" rather than the intended "generates".
 An additional consideration is that predictability can lead to unintentional gaming of mechanics. For example, there are WoW addons that keep track of your bonus damage RNG trinkets to tell you when your RNG is hot and when it's cold, so you can optimize when you use some big finisher.
 This got really bad in World of Warcraft: Legion, when people were deleting and re-creating their characters to try and get better loot.
The game had these "Legendary" items of incredible power you could attain that were insanely rare, but had a sort of "pity timer" where you'd get a higher and higher chance of obtaining one based on how long it'd been since your last item.
The problem is that the legendary items were almost inconceivably rare drops through pure random luck, but the escalating chance was really predictable. You were all but guaranteed to never get a legendary item through pure chance alone, but you were basically guaranteed to get one every two weeks if you did all the time-limited content available. (Dungeon raids you can do once per day/once per week, daily chore missions, etc.)
For the first few months of the system's availability, after 4 legendary items you'd no longer get the "pity timer" bonus to your loot rolls on legendary items, so your character was effectively capped at 4 legendary items.
Some of the available items could increase your character's combat effectiveness by a relatively huge margin compared to others, and it only took about a week to power-level a new character and get it one or two legendary items, so for a while people were abandoning their max level, legendary-geared characters with 100+ hours invested into them to try again.
 Speaking of WoW, I got hit by the statistics when they introduced an achievement for the in-game Valentine's holiday.
Each player got a bag from which they could take a random piece of heart candy once an hour. There were a fixed number of different heart candies, the bag worked for a week, and you needed one of each to get the achievement.
One random draw an hour for at most a week is a limited number of attempts; what are the odds of never getting a certain heart piece? About one in a million or so. How many players did WoW have at that time? Over ten million. So you're pretty much guaranteed that even if you never missed an opportunity to draw a piece of candy, some players would never get one of each, and never get the achievement... Statistics is a bitch.
(They implemented pity timers for every similar achievement afterwards, as far as I know.)
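The "pretty much guaranteed" step can be checked with a few lines of Python. This only reuses the two rough figures quoted in the comment (one in a million, ten million players) and assumes players are independent; the function name is made up for illustration:

```python
import math


def prob_at_least_one(p_event: float, trials: int) -> float:
    """P(an event of probability p_event happens at least once in `trials` independent tries)."""
    # log1p/expm1 keep precision when p_event is tiny.
    return -math.expm1(trials * math.log1p(-p_event))


p_unlucky = 1e-6          # "about one in a million or so" (figure from the comment)
players = 10_000_000      # "over ten million" (figure from the comment)

expected_unlucky = p_unlucky * players
print(expected_unlucky)                       # about 10 unlucky players on average
print(prob_at_least_one(p_unlucky, players))  # very close to 1
```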
 > what are the odds of never getting a certain heart piece? About one in a million or so. How many players did WoW have at that time? Over ten million. So you're pretty much guaranteed that even if you never missed an opportunity to draw a piece of candy, some players would never get one of each, and never get the achievement...
This extends the time to get the achievement from one year to "for 0.0001% of players, two years".
That will annoy those, um, ten people, but "getting the achievement next year" is pretty different from "never getting the achievement".
 Yeah, because the achievement was part of the meta-achievement What A Long Strange Trip It's Been, and if you were one of those ten people, like me, it was pretty damn annoying. (There were actually quite a lot more of us: since very few people drew the maximum number of candies from the bag, the realistic odds of never getting it despite your best efforts were maybe one in 100k or something.)
Anyway, they fixed it retroactively for the first year somehow, so that was nice.
 > Yeah, because the achievement was part of the meta-achievement What A Long Strange Trip It's Been
This severely weakens the case that it was a problem that needed to be fixed; if you're getting the Valentine's achievement because you want What A Long, Strange Trip It's Been, the extra waiting time you suffer from not getting Valentine's the first time around is lowered. (Because you would have had to spend a lot of that time waiting anyway, for the other calendar-based achievements that you didn't already have.)
 > You were all but guaranteed to never get a legendary item through pure chance alone, but you were basically guaranteed to get one every two weeks if you did all the time-limited content available.
Seems like this is a feature, not a bug. You want to reward users who interact a lot with your game consistently over time.
 Well, in the case of WoW, most players are attached to their characters because they've invested a lot of time in them. Most people have a "main" character that is the most powerful, the one they most enjoy playing, and they may have one or more "alts" that they typically spend less time with, and those characters are thus less powerful.
But this feature meant that through random chance, your alt could become much more powerful than your main. This sucks, because you're now incentivized to spend your time in the game playing a character you don't enjoy as much as your main, and the time you spent on your main making it powerful is simply eradicated.
And that increases the chance of you quitting the game.
 Huh, it's just like card counting in blackjack!
 AKA pity timers!
Hearthstone used to do this for buying card packs, and d3 used to do it for legendary drops.
Dota2 uses a strategy it calls pseudorandom, where a 25% chance means a much lower chance at first that escalates rapidly to maintain an "average" (whatever that means to them) of 25%. This shrinks the range of possible outcomes while still allowing for chance.
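One way to make "average" concrete, as a hedged Python sketch: assume the chance on the n-th attempt since the last success is `c * n` (a linear escalation; that rule is my assumption, the comment does not specify it), and pick the increment `c` so that the long-run fraction of successful attempts equals the advertised rate. The function names are illustrative.

```python
def long_run_rate(c: float) -> float:
    """Long-run success rate when attempt n since the last success succeeds with probability min(1, c*n)."""
    expected_attempts = 0.0
    p_reach = 1.0  # probability that a failure streak is still alive at attempt n
    n = 1
    while p_reach > 0.0:
        expected_attempts += p_reach
        p_reach *= 1.0 - min(1.0, c * n)
        n += 1
    # One success per streak, so rate = 1 / (expected attempts per success).
    return 1.0 / expected_attempts


def find_increment(target: float, iterations: int = 60) -> float:
    """Bisect for the increment c whose long-run rate equals `target`."""
    lo, hi = 0.0, target  # at c == target the rate is already >= target
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if long_run_rate(mid) < target:
            lo = mid
        else:
            hi = mid
    return hi


c = find_increment(0.25)
print(c, long_run_rate(c))
```

Under this assumed rule, a nominal 25% comes out with a first-attempt chance of roughly 8.5%, and a success becomes certain within about a dozen attempts, which is the "shrinks the range of possible outcomes" effect.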
 Wait a minute? So "bash priming" or more generally "proc priming" is actually a real thing?
 If you're talking about spirit breaker, a while ago it was changed so the swing back doesn't count, it's only when you actually attack. So now, all you can do is "hold" a streak of not bashing.
I think phantom assassin still can do it with her ult (attack, see if the swing back is the crit animation, if not cancel it), but that may have changed (I haven't played in a few years).
 I wasn't talking about waiting to see the proc first, but if you have a passive bash and "prime" it by hitting a creep and it doesn't proc, then the subsequent attack would apparently have a higher chance of hitting due to this mechanic, and not the actual % given. You could prime a crit or a bash before fighting a hero, first hit bash can make a big difference. But as stated by someone else this in practice is not easy to incorporate into a game and you are probably just as often wasting time.
 Yes, technically, but this is very hard to pull off in a real game.
This is preferable to bara getting 5 bashes in a row and killing you due to real rng. Or playing bara and getting what feels like 0 bashes, again due to real rng.
 Interestingly, Path of Exile had a bug back in the Delve (infinite dungeon) league where some characters had an RNG seed that meant they would never see certain valuable items.
I think such a thing would be hard to find in testing, and a shuffled deck or 'pity timer' weighted-RNG approach would ameliorate it but might hide the root issue.
While I'm on the topic, POE tends to use pure RNG in all cases except evasion. Evasion is a pretty useless defensive layer if you get hit 3 times in a row by random chance, ending your run, which gives a terrible game experience. So they use a weighted entropy system to dial the chances up or down depending on how long it has been since you were hit (or since you hit the enemy).
 Chris Wilson talks a little bit about how PoE implements RNG (and procedural generation) in his GDC talk, which is on YouTube. He mentions that one of the design pillars of the game is "multiple overlapping axis of randomness". The whole video is really good.
I love the amount of randomness that exists in the game. Some of the items that can drop are completely build-defining (legion jewels and Watcher's Eyes, for example) but also hugely random; if you manage to get your hands on one with interesting rolls, it feels like the possibilities are endless.
At the same time, I think they have realized players like some amount of agency as well. A lot of the newer content they've introduced has semi-deterministic outcomes where the player's actions can influence the rewards: stuff like Syndicate hideouts, incursion temple, etc. gives you some control over what items you will get at the end of the content.
 Yeah, agreed, but I think they went too far with Synthesis. The managing and placement of decaying tiles (with a limited bag space) was far too fiddly and way too much micromanagement for me.
Syndicate is definitely the sweet spot, but I'm glad that the simple encounters like Bestiary exist too. We need both types.
 This sounds sorta like this "Gambler's dice" project to me: https://github.com/xori/gamblers-dice
It appears to use the gambler's fallacy as a heuristic for how dice _should_ work.
Nice idea!
 In fate/go there's this meme that the devs are tracking 'greed', i.e. how much people want some rare reward, the idea being your chance of getting something you want from the lootbox system is lower if they think you can tolerate greater spends, because you want it more.
Far as I know that's not the actual case, but if I were to make one of these I'd definitely have both the gambler's fallacy and ML-based greed in there. They would both activate only for about a third of the sessions, so there would be plausible deniability.
Thinking about this more carefully, though: I'm not sure if actually lowering the reward for greed would be a good idea. Seems to me that you would want to reward greed so that it leads to even more greed.
 > From a game design perspective, for some situations this can be a real problem that leads to players getting discouraged and quitting, which leads to loss of revenue.
That sounds more like from a game business perspective. From a game design perspective, I would say it leads to the user not having fun. "Fun" being the goal of the design of the game. Loss of revenue affects a business around the game more than the game itself.
Otherwise I agree with what you're saying.
 > That sounds more like from a game business perspective. From a game design perspective...
Yes, certainly. I've made some games from a pure game design perspective without regard for business model, and it didn't work out well for me. That's because I had this weakness where I needed to eat.
When I made my first "real" game, I only thought about the perspective of player engagement and fun. I had a game that was engaging and fun at launch. Then I realized my real challenge was community and discoverability. I needed to have people ready to play it at launch.
When I made my next game, I made sure that community engagement and discoverability were well-covered. So I had something that was fun at launch, and people were ready to play it and pay for it. But I soon realized that my real problem was having a sustainable business model that could keep the team and the game going for a multi-year lifecycle.
When making games that let you quit your day job, it's best to think of the business model as one of the key game design components.
 > When making games that let you quit your day job, it's best to think of the business model as one of the key game design components.
Yeah, I guess that's where our perspectives differ, and that's ok! Thank you for sharing your experience with trying to make games for a living.
 Birthday paradox is way too important to omit from this list!
For example if you're uploading stuff to a bucket, you can compute its hash first to figure out if a duplicate already exists and if so, skip the upload.
Why can you do this? What if it was just a hash collision? Shouldn't you still compare the contents to really make sure they are the same?
Turns out if your hash function is N bits you will need to have 2^(N/2) items before you see two hashed to the same thing by chance. If you choose a 256 bit cryptographic hash function like SHA256, that's 2^128. This probability is so low you have a higher chance of encountering a cosmic ray bit flip!
https://en.wikipedia.org/wiki/Content-addressable_storage
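A toy Python sketch of the hash-then-skip upload described above. `ContentStore` is a hypothetical in-memory stand-in for a bucket, not any real storage API; only `hashlib.sha256` is a real library call:

```python
import hashlib


class ContentStore:
    """In-memory stand-in for a bucket, keyed by the SHA-256 digest of the content."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}
        self.uploads = 0  # counts how many times data was actually stored

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        if key not in self._blobs:
            # Only "upload" when no object with this digest exists yet.
            self._blobs[key] = data
            self.uploads += 1
        return key


store = ContentStore()
first = store.put(b"same bytes")
second = store.put(b"same bytes")   # duplicate: skipped
third = store.put(b"other bytes")
print(first == second, store.uploads)
```

The design relies on treating equal digests as equal content, which is exactly the assumption the comment argues is safe for accidental collisions.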
 I don't quite understand how the birthday paradox relates to your example. Your example is how a hash collision is a very low probability event. But the birthday paradox is typically stated as odds being much higher than you expect.
The hash collision problem is a birthday paradox, I suppose, but the probabilities are so low that that part isn't stressed in your post.
I feel like a better example of the Birthday Paradox would be something like:
I automatically give all my users a 5-character random ID. I figure there are nearly 12 million (26^5) combinations, so I should easily be able to support a hundred thousand users, since (naively) I calculate that the probability of the 100,001st user matching any existing user is less than 1%. [1]
But if you calculate it correctly, as the birthday paradox, you'll see that the probability of having a collision given 100,000 users is ~100%. [2]
[1] 1 - (1-(1/11881376))^100000 < 0.01
[2] 1 - (((11881376-1)/11881376)^((100000 * (100000-1))/2)) ≈ 1
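For anyone who wants to check the 5-character ID numbers, here is a small Python sketch that computes the collision probability exactly rather than via the pairwise approximation in [2]. It assumes IDs are drawn uniformly and independently; the function name is illustrative.

```python
import math


def collision_probability(n_items: int, n_slots: int) -> float:
    """Exact P(at least two of n_items uniform random picks from n_slots coincide)."""
    if n_items > n_slots:
        return 1.0  # pigeonhole: a collision is certain
    # log P(no collision) = sum over i < n_items of log(1 - i / n_slots)
    log_no_collision = sum(math.log1p(-i / n_slots) for i in range(n_items))
    return -math.expm1(log_no_collision)


slots = 26 ** 5  # 11,881,376 possible 5-letter IDs
for users in (1_000, 4_000, 10_000, 100_000):
    print(users, collision_probability(users, slots))
```

Under these assumptions the chance of a collision is already close to even at around four thousand users, far below the hundred thousand the naive estimate suggested.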
 > Turns out if your hash function is N bits you will need to have 2^(N/2) items before you see two hashed to the same thing by chance.
No no no. That's not how probability works. You will see that many on average, but you don't "need to" anything before you can see a collision. You could get collisions on the next 100 files. It's just unlikely. The bitspace bounds the denominator of match probability for each file independently, it does not count files. Unlikely events happen all the time somewhere.
 > No no no. That's not how probability works.
Pretty sure GP understands how probability works, maybe take a more charitable interpretation of their comment?
 > Pretty sure GP understands how probability works, maybe take a more charitable interpretation of their comment?
There are many circumstances where I fully agree with charitable interpretation, but this is not one of them. In this case it's tantamount to saying that it's ok to mislead people because they should just know better (especially in light of their reply).
If you are correct about what they understand, then I'd rather they were more charitable to the other people who don't already understand it by not writing things that are wrong and potentially dangerous. Someone out there is going to learn about this by googling, see 19 out of 20 people get it wrong, implement a system that relies on the majority false information, and then destroy the lives of a bunch of innocent people. It may not be someone you know, and it may not impact your neighbors, but it happens _all_ the time. I care about the harm of spreading misinformation, and I think you should as well.
 The precise statement is that you need to have O(2^(N/2)) items before the probability of finding a hash collision is greater than 50% (or whatever nontrivial percentage). English is hard and I think whoever cares about the details can look it up themselves.
 > I think whoever cares about the details can look it up themselves
Your statement was an incredibly common and often repeated misperception of probability and circumstance. Someone looking it up for themselves, as you suggest, is more likely than not to encounter dozens of wrong statements on the subject before they ever find a right one. I think it behooves us to not add more misinformation to the pile.
 The precise statement should be Omega, not O, since you're stating a lower bound.
 "This probability is so low you have a higher chance of encountering a cosmic ray bit flip!"
We commonly conceive of our computers as 100% accurate, but as you observe here, for this and other reasons they aren't. The distribution of errors is exceedingly pathological; the vast majority of errors you will encounter are not independent but are highly correlated, because some particular bit of hardware is flawed in some manner.
Ignore those for a moment, and consider just the purely-random failures, like cosmic or thermal bit flips. The rate on these is still non-zero. Very small, but high enough that we've pretty much all encountered them, usually without realizing it. (While it is true that a single bit flip can bring a program down if it is the correct bit, the average bit flip will have no visible manifestation of any kind.)
This creates a "noise floor" for our computations. Any event which has a probability lower than this "noise floor" of happening can be treated as having the same zero probability you assign to totally random hardware failure.
The probability of two random pieces of content having the same SHA256 hash by random chance is in practice zero, and you may write your code that way. The potential problem that one may wish to defend against is the possibility that two pieces of content have the same SHA256 hash for non-random reasons, which is to say, the possibility that it will be broken. But the defense against that is rather different. There's a lot of nasty possibilities that lie above this noise floor that still need to be dealt with.
I have seen this concept kinda bother people sometimes. But you are justified in just ignoring anything below this noise floor. There's basically never a reason to worry about this noise floor, because once you understand what it really means, you will see you lack the tools to deal with it. 
Given an error, the probability that the error is the result of some non-random systemic issue is much higher than the probability that it was a truly random error. Even if you care about reliability a lot, like in space or medicine, the problem you need to deal with is failing hardware. If you try to write code to deal with the noise floor, it is much more likely to be getting invoked due to systemic, non-random issues... after all, a "low probability" failure on a network link that corrupts one packet a day is still multiple orders of magnitude above this noise floor.
 Also, if you choose an unbroken cryptographic hash like SHA256, and you stumble on a collision, you can later publish it and become famous.
(Of course, you'd need enough logging and debugging to be able to reproduce that.)
 Flipped bits are not rare at all. Flipped bits are very common and your computer (or datacenter) has so many transistors that I assure you, bits are flipping all over the place.
 In fact, this is exactly why things like ECC memory exist -- https://en.wikipedia.org/wiki/ECC_memory
 If you are hashing k unique elements, N=2lg(k) is a good rule of thumb for sizing your hash functions. A Birthday Attack [1] analysis provides the exact collision probabilities for arbitrary (k, N).
 The most useful formula I’ve seen related to the birthday problem is the approximation p = (n^2)/(2m), where n is the number of random choices made, m is the number of choices available, and p is the probability of at least one collision. You can derive the usual formula for the number of hash bits necessary to avoid a collision with probability 1/2 (given a set of cardinality n) by just substituting 1/2 for p and solving for m. But this formula is much more useful than that special case since it lets you set the collision probability to be arbitrarily small.
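A short Python sketch of using that approximation in both directions: estimating p from n and m, and solving for the hash width given a target p. The function names are illustrative, and the approximation is only meaningful while the resulting p is small.

```python
import math


def approx_collision_probability(n: int, m: int) -> float:
    """Birthday approximation p ~= n^2 / (2m) for n random choices among m options."""
    return n * n / (2 * m)


def bits_needed(n: int, p: float) -> int:
    """Smallest hash width in bits such that n^2 / (2 * 2^bits) <= p."""
    return math.ceil(math.log2(n * n / (2 * p)))


# Substituting p = 1/2 gives m = n^2, i.e. twice as many bits as lg(n).
print(bits_needed(2 ** 20, 0.5))     # 40
# A much stricter target: a billion items, one-in-a-billion collision chance.
print(bits_needed(10 ** 9, 1e-9))    # 89
print(approx_collision_probability(1_000, 10 ** 9))
```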
 Sure, the probability of a collision is exceedingly low if you use SHA256 and take care to hash _the whole content_.
But in many programming applications (e.g. hash maps) you wouldn't use a cryptographic hash because it's too slow. Then it becomes important to design a hash function that is "good enough" and, unfortunately, it's quite easy to implement "hashCode" (or whatever your programming language calls it) in a broken way for your types and suddenly you have collisions. I've seen exactly this problem in my own code.
 A missed opportunity: the balls-and-bins discussion didn't go on to discuss the power of two choices.
In short, if you're tossing N balls randomly into N bins, you should expect the most heavily loaded bin to get roughly log N / log log N balls.
If instead you "toss" by choosing two random bins and then putting the ball in the one with fewer balls, you should instead expect the most loaded bin to get only about log(log N) balls.
In practice, it's often not so hard to modify the first strategy into the second to get dramatically more uniform load distribution among bins.
http://www.eecs.harvard.edu/%7Emichaelm/postscripts/handbook...
(You might ask, if two choices are good, are three better? Yes, but only slightly. You get an exponential improvement going from 1 to 2; after that, testing more bins improves by constant factors.)
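A quick Python simulation of the two strategies, as a sketch rather than a benchmark. The bin count, seed, and function name are arbitrary choices for illustration.

```python
import random


def max_load(n: int, choices: int, rng: random.Random) -> int:
    """Throw n balls into n bins; each ball samples `choices` random bins and goes to the emptiest."""
    bins = [0] * n
    for _ in range(n):
        candidates = [rng.randrange(n) for _ in range(choices)]
        target = min(candidates, key=lambda b: bins[b])
        bins[target] += 1
    return max(bins)


rng = random.Random(0)
n = 20_000
one_choice = max_load(n, 1, rng)
two_choices = max_load(n, 2, rng)
three_choices = max_load(n, 3, rng)
print(one_choice, two_choices, three_choices)
```

On runs like this the fullest bin drops noticeably when going from one choice to two, and only a little (if at all) from two to three, matching the comment's description.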
 > You might ask, if two choices are good, are three better? Yes, but only slightly. You get an exponential improvement going from 1 to 2; after that, testing more bins improves by constant factors.
The intuitive explanation for this is that the largest benefit comes from avoiding the most full bin. Two choices is always sufficient to do that.
 ... and following this thread a little will bump into importance sampling, an interesting topic.
 Another one to add: if N is big enough, even statistically improbable events like single bit flips in memory become likely or even nigh-certain. For example, if you set up a bunch of fake domains that differ from real domains in a single bit, you might get upwards of a dozen hits per day from clients clearly trying to reach the real service - this is called Bitsquatting: http://dinaburg.org/bitsquatting.html
For example, setting up the server "mic2osoft.com" (a one-bit-flip error from microsoft.com) yielded this request:
```
msgr.dlservice.mic2osoft.com 213.178.224.xxx "GET /download/A/6/1/A616CCD4-B0CA-4A3D-B975-3EDB38081B38/ar/wlsetup-cvr.exe HTTP/1.1" 404 268 "Microsoft BITS/6.6"
```
This is a machine trying to download some kind of update package from the wrong server because somewhere in its memory the "r" from "microsoft.com" got flipped to a 2 (0x72 -> 0x32).
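To see which names are one bit flip away from a given label, a small Python sketch follows. The allowed-character set (lowercase letters, digits, hyphen) is a simplifying assumption that ignores rules such as "no leading hyphen", and the function name is illustrative.

```python
import string

# Simplified set of characters permitted in a hostname label (assumption).
ALLOWED = set(string.ascii_lowercase + string.digits + "-")


def one_bit_flips(label: str) -> set[str]:
    """All labels that differ from `label` by exactly one flipped bit and stay within ALLOWED."""
    variants = set()
    for i, ch in enumerate(label):
        for bit in range(8):
            flipped = chr(ord(ch) ^ (1 << bit))
            if flipped in ALLOWED:
                variants.add(label[:i] + flipped + label[i + 1:])
    return variants


candidates = one_bit_flips("microsoft")
print(len(candidates))
print("mic2osoft" in candidates)  # True: 'r' (0x72) and '2' (0x32) differ only in bit 0x40
```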
 Is it possible someone typed that in manually?
 Worth mentioning is the Waiting Time Paradox:
"When waiting for a bus that comes on average every 10 minutes, your average waiting time will be 10 minutes." And worse: "when the average span between arrivals is N minutes, the average span experienced by riders is 2N minutes."
These realizations about Poisson distributions have a lot of real-world implications such as packet traffic, call center calls, etc.
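Both quoted statements can be reproduced with a short Python simulation, under the assumption that gaps between buses are exponentially distributed (i.e. arrivals form a Poisson process; a reply further down questions whether real buses behave this way). The function name and parameters are illustrative.

```python
import bisect
import random


def simulate(mean_gap: float, n_buses: int, n_riders: int, seed: int = 0):
    """Return (average wait, average gap experienced) for riders arriving at uniformly random times."""
    rng = random.Random(seed)
    arrivals = []
    t = 0.0
    for _ in range(n_buses):
        t += rng.expovariate(1.0 / mean_gap)  # exponential gap with the given mean
        arrivals.append(t)

    total_wait = 0.0
    total_gap = 0.0
    for _ in range(n_riders):
        r = rng.uniform(arrivals[0], arrivals[-1])   # rider shows up at a random moment
        i = max(bisect.bisect_left(arrivals, r), 1)  # index of the next bus at or after r
        total_wait += arrivals[i] - r
        total_gap += arrivals[i] - arrivals[i - 1]   # the gap this rider landed in
    return total_wait / n_riders, total_gap / n_riders


avg_wait, avg_gap = simulate(mean_gap=10.0, n_buses=100_000, n_riders=100_000)
print(avg_wait, avg_gap)  # close to 10 and 20 respectively
```

The factor of two appears because a rider arriving at a random time is more likely to land in a long gap than a short one.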
 For some reason this is the post that took me back to a EE numeric analysis and stats practical exercise nearly 40 years ago. A friend encounters a miserable looking me standing on the street with a clipboard and a stopwatch. Friend: "What the hell are you doing?". Me: "Testing the hypothesis that passing traffic is Poisson distributed".
 I'll quote from the conclusion of that article, because the frequent assumptions that bus arrivals are a Poisson process peeve me and this article did a good job of actually testing that assumption:
> Even without a statistical test, it's clear by eye that the actual arrival intervals are definitely not exponentially distributed, which is the basic assumption on which the waiting time paradox rests.
> The average waiting times are perhaps a minute or two longer than half the scheduled interval, but not equal to the scheduled interval as the waiting time paradox implied. In other words, the inspection paradox is confirmed, but the waiting time paradox does not appear to match reality.
> the above analysis shows pretty definitively that the core assumption behind the waiting time paradox, that the arrival of buses follows the statistics of a Poisson process, is not well-founded.
 This also applies to bitcoin blocks
 My favorite of such applications of probability is the Poisson stochastic arrival process (arrivals occur at discrete points in time). There are two amazing, powerful, useful, non-obvious biggie points:
(1) Axiomatic Derivation
As in Erhan Çinlar, 'Introduction to Stochastic Processes', ISBN 0-13-498089-1, an arrival process with stationary (distribution not changing over time) independent (of all past history of the process) increments (arrivals) is necessarily a Poisson process. So there is an arrival rate, and times between arrivals are independent, identically distributed exponential random variables.
Often in practice you can check these assumptions well enough just intuitively.
Then, as in Çinlar, you can quickly derive lots of nice, useful results.
(2) The Renewal Theorem
As in William Feller, 'An Introduction to Probability Theory and Its Applications, Second Edition, Volume II', ISBN 0-471-25709-5, roughly, with meager assumptions and approximately, if the arrivals are from many different independent sources, not necessarily Poisson, then the resulting process, that is, the sum from the many processes, is Poisson.
E.g., using (2), between 1 and 2 PM, the arrivals at a busy Web site, coming from lots of independent Web users, will look Poisson, and from (1) you can say a lot for sizing the server farm, looking for DDOS attacks, security, performance, network, and system management anomalies, etc., e.g., do statistical hypothesis tests.
Similarly for packets on a busy communications network, server failures in a server farm, etc.
 Indeed, Poisson processes are surprisingly common in networking.
 We can all use excellent reminders like this about how random outcomes can be distributed really unevenly.
For example, attributing even some success to randomness (a reasonable assumption, I would think) at least N fraction of successful people you see in life are just lucky! If 1000 people flip a fair coin 10 times, there's a really good chance (60%+) someone gets 10 heads in a row!
Optimize your existence as you see fit with that information! Generate more N, spend more time with your family, try to move the needle on the amount of randomness that contributes to your personal definition of success, etc.
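The 60%+ figure is easy to verify in Python, assuming fair coins and independent people (the function name is just for illustration):

```python
def prob_someone_all_heads(people: int, flips: int) -> float:
    """P(at least one of `people` gets heads on every one of `flips` fair coin flips)."""
    p_single = 0.5 ** flips                 # one person: 1/1024 for 10 flips
    return 1.0 - (1.0 - p_single) ** people


print(prob_someone_all_heads(1000, 10))  # a bit over 0.62
```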
 > there's a really good chance (60%+) someone gets 10 heads in a row!
You can make this 100% if it is a tourney style flip off. This seems obvious but more subtle forms of it exist in the world where randomness is involved but some results are determined; i.e. the difference between someone will see 10 heads in a row vs. this particular person will.
 > Optimize your existence as you see fit with that information!That sounds very intriguing, yet I don't really have an aha!-moment. Care to give an example or two from your personal experience?
 Sure.
I quit a job that put me in the top 5% of salaries to do something I loved that gave me more time outside work, because I realized the woman I share my life with was my "ten heads in a row" and not the job that depressed me.
Another example is that rather than sinking my time into one project, one hobby, one organization, I tend to jump around a lot. That isn't to say I am constantly in that state, I've just learned that eventually the coin will flip heads and I'll be working with great people on something really interesting that is worth my time to go deep. "Worth" here, not necessarily being financial. It might be educational, or a cause I'm passionate about. Or fun. This optimizes for N.
 Also, moving from law to programming, I changed the effect of randomness on my success.
In law, I could kick ass, and still lose a case, because of many things that are likely to be depressing to list in public.
In programming, if I kick ass, it deterministically leads to something that works! That is good. Sure there is politics and popularity and all those normal human problems too, but those same problems existed in law, so I'm not losing anything there.
 However million-to-one chances crop up nine times out of ten. https://wiki.lspace.org/mediawiki/Million-to-one_chance
 My favourite quote from the series. He was such a great author.
 Thank you for posting this, the explanations are intuitive and the real-world examples really help one think about the implications of these probability facts.
 An easier almost-proof of the N trials with 1/N chance bit uses the approximation log(1+x) ~= x for small x.
See: let's find Q such that Prob(no successes in M trials) ~= Q. This is:
(1-1/N)^M ~= Q
therefore
M log(1-1/N) ~= log Q
Using the approximation,
-M/N ~= log Q
that is
exp(-M/N) ~= Q
M=N yields the result.
Now, if we slightly change the problem so Q is a probability threshold such that Prob(no successes in M trials) > Q, we get an exact statement: since x > log(1+x) exactly (for x != 0), -M/N > M log(1-1/N) > log Q.
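A numeric check of the approximation and of the inequality, in Python. The function names are illustrative; the values of N and M are arbitrary.

```python
import math


def exact_no_success(n: int, m: int) -> float:
    """Exact P(no successes in m independent trials, each with chance 1/n)."""
    return (1.0 - 1.0 / n) ** m


def approx_no_success(n: int, m: int) -> float:
    """The exp(-M/N) approximation from the derivation above."""
    return math.exp(-m / n)


for n in (10, 100, 1000):
    exact = exact_no_success(n, n)
    approx = approx_no_success(n, n)
    # exp(-M/N) is always an upper bound, and the gap shrinks as N grows.
    print(n, exact, approx, approx - exact)
```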
 Not exactly related with the content of the article, but I was also reminded of the German tank problem: https://en.wikipedia.org/wiki/German_tank_problem
 This was a great and readable explanation!
Also it was a great example as to why you should never use “random” as your load balancing algorithm unless you plan to always have 1/3 extra capacity.
Or conversely why you should always have 1/3 extra capacity if you must use random.
 I thought the example in the article was a little artificial at first. Like why would you only have 12 requests if you have 12 backends?
Reframing it in terms of capacity cleared that up. If the rate of incoming requests is higher than the total rate your backends can process, your queues will grow without bound!
So something like an average of 12 incoming requests per second with each backend capable of processing 1 request per second is actually fairly realistic. And I think the math still works out the same there.
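A small Python simulation of the setup as the comments describe it: 12 requests, each sent to one of 12 backends uniformly at random. It measures what fraction of backends receive nothing. The function name, trial count, and seed are illustrative assumptions.

```python
import random


def idle_fraction(n_backends: int, n_requests: int, trials: int, seed: int = 0) -> float:
    """Average fraction of backends that receive no request under uniform random assignment."""
    rng = random.Random(seed)
    idle = 0
    for _ in range(trials):
        hit = {rng.randrange(n_backends) for _ in range(n_requests)}
        idle += n_backends - len(hit)
    return idle / (trials * n_backends)


simulated = idle_fraction(n_backends=12, n_requests=12, trials=20_000)
exact = (11 / 12) ** 12  # chance that one particular backend is missed by all 12 requests
print(simulated, exact)  # both around 0.35
```

Roughly a third of the backends sit idle while others are handed more than one request at once, which is presumably where the "1/3 extra capacity" rule of thumb above comes from.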
 That condition is only met if you send N requests to N routers. If you send 1,000,000*N requests to N routers, they will almost always be evenly distributed.
 But then you’re under capacity. The assumption is that it takes N servers to service N requests simultaneously.
 Along the same lines, some of you might be interested in the slides for my Probability webinar: https://drive.google.com/file/d/1qz4wAmwiKadshhrStxcz-8atb0S... where I try to go from the very basic to some useful applications (like Bayes theorem, A/B testing).
You can also subscribe to my newsletter: https://data4sci.com/newsletter where I also announce future webinars, live tutorials and trainings, etc.
 All these examples (in the article and in the comments here) shatter my mind. I don't find it intuitive and even after reading the explanations I feel like there's something I'm not getting. I'd attempt a marathon to be conversational in probability theory.
 Probability is incredibly non-intuitive! If you're willing to run a marathon to become more conversant in it, spend the ~4-5 hours it takes to run a marathon on this course: https://projects.iq.harvard.edu/stat110/home