Visualizing Bayes Theorem (2009)
148 points by Tomte 24 days ago | 21 comments

I recently built an interactive visualisation to explore dependent probabilities in terms of the areas and side lengths of rectangles.

I like that. I started playing by twiddling P(A). Since the probabilities add to one, there are only three degrees of freedom. I got that the other two, P(B|A) and P(B|not A), didn't move; why would they? By contrast, P(A|B) and P(A|not B) in the other column did move, so far so good. But P(B) just sat there. Puzzling. Then I realized that it starts at P(B|A) = P(B|not A), which is a special case: then P(B) = P(A)P(B|A) + P(not A)P(B|not A) = P(B|A), whatever P(A) is. An excellent philosophical toy :-)

I understood OP's visualisation immediately, and I still don't understand yours. Not sure if the fault is in me or in it...

I agree that it could use a little written explanation:

One big square represents the whole outcome space, which has probability 1, i.e. an area of 1 (side length 1x1). The four colors partition the whole space into all 4 combinations:
1. A happens and B happens,
2. A happens but not B,
3. B happens but not A,
4. neither A nor B happens.

Each of the four colors has an area representing the corresponding probability, so naturally all 4 areas sum to 1. The left and right big squares are simply the same areas, arranged differently to create a straight vertical or horizontal split. This makes the side lengths of the areas correspond to the probability of A happening, the probability of B happening, and the conditional probabilities (e.g. A happening given that B happens, or B happening given that A happens, etc.). Given that two areas of the same color are the same size, you can derive Bayes theorem by converting the side lengths of one rectangle into those of the other rectangle of the same color.

For example: the pink rectangle has an area equal to the probability Pr(B∩A). The left pink rectangle is factorized into the side lengths width = Pr(A) and height = Pr(B|A).
So the area is width * height = Pr(A) * Pr(B|A) = Pr(B∩A). The right pink rectangle is factorized into Pr(A|B) and Pr(B) instead, but has the same area, Pr(B∩A).

You can also see that the aligned sides of two rectangles sum to 1. For example, on the left side the height of the pink rectangle plus the height of the orange one always sums to one.

Now, by dragging the sliders to change the probabilities, you can see that there are configurations in which the height of a rectangle on the left differs very starkly from the width of the corresponding rectangle on the right, for example when Pr(B|A) is totally different from Pr(A|B). But there are also configurations where they match.

laszlokorte, 22 days ago: I added some explanation on the page itself now.

Updating priors is hard. So many in my family still haven’t grokked that Covid is endemic, that kids don’t have a 50% chance of being hospitalized after getting Covid, etc. I see their anxiety as false priors caused by too much media consumption, and the media has an economic incentive to promote profitable, fear-driven priors over statistically accurate ones.

I think a more basic problem is that people don't even know how to formulate meaningful probabilistic hypotheticals/counterfactuals to begin with (let alone update them). For Covid stuff, pretty much all an individual should care about is:

P(Bad Stuff | I take action A) > P(Bad Stuff | I take action B)

So you take action B in this scenario (simplifying by ignoring costs; many of these decisions are costless). Yet we get a bunch of meaningless drivel in the news that doesn't help anyone make meaningful probability estimates to inform their decisions. I think the Rogan/Gupta interview is a good example. We get various nonsense comparisons such as:

P(Bad Stuff | Covid, Person 5, No Vaccine) < P(Bad Stuff | Covid, Person 50, Vaccine)
[Rogan to Gupta: why is it OK for a 50-year-old to feel safe but not a young kid without a vaccine?
Irrelevant counterfactual unless someone invents a reverse-aging treatment.]

P(Heart Inflammation | Covid Vaccine, Young Male) > P(Death | Covid, Young Male)
[Rogan saying the side effects for young people outweigh the benefits. This is true, but death is quite a bit worse than the side effects, and this does not consider other Bad Stuff from Covid, like long haulers.]

Knowing Bayes Theorem doesn't help someone figure out which probability statement they should be interested in to begin with.

I prefer Gigerenzer’s idea of natural frequencies over the Venn diagrams. https://www.researchgate.net/figure/Gigerenzer-Hoffrage-Natu...

There's a very cool 3Blue1Brown video that helps give you an intuitive feel for Bayes Theorem.

This is literally how they taught it to us in high school. Unfortunately it breaks down when things get a bit more complex, so it can become a trap that gets in the way of more advanced understanding.

What complex situations are you referring to? I've used it in a robot localization/filtering algorithm, and things can be simplified by assuming the process is stochastic.

I think the problem is that the real application of Bayes theorem is to convert P(B|A) into P(A|B), and the visualisation doesn't map onto that well. You have

P(A|B) = P(B|A) P(A) / P(B)

But there is nothing in the visualisation that helps you see what those ratios are ... let alone what it might mean to multiply P(B|A) and P(A).

Here is my interpretation: a Venn diagram shows the set of outcomes of an experiment. But in many real-world situations that set of outcomes is too large or too difficult to work with. I think the Venn diagram approach is very helpful if you have a finite set of outcomes that you can write down or generate, and then draw and color the sets. It's true that P(B|A)P(A) is not intuitive, but if you replace it with P(B ∩ A) or P(A ∩ B), suddenly you are back to the definition of the probability of an event.

Calculating P(B) using the law of total probability was very illuminating for me.
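The last two comments can be made concrete with a minimal sketch, assuming the visualisation is driven by the three free parameters discussed earlier in the thread: P(A), P(B|A) and P(B|not A). Each colored area is a width times a height, P(B) comes from the law of total probability, and P(A|B) is the same pink area refactored as P(A|B) * P(B). The function names and example numbers are illustrative assumptions, not taken from the linked page.

```python
from fractions import Fraction


def joint_areas(p_a, p_b_given_a, p_b_given_not_a):
    """Areas of the four colored rectangles, i.e. the joint probabilities.

    Left-square factorization: each area is width * height, where the
    width is P(A) or P(not A) and the height is a conditional P(B|.).
    """
    p_not_a = 1 - p_a
    return {
        ("A", "B"): p_a * p_b_given_a,
        ("A", "not B"): p_a * (1 - p_b_given_a),
        ("not A", "B"): p_not_a * p_b_given_not_a,
        ("not A", "not B"): p_not_a * (1 - p_b_given_not_a),
    }


def bayes(p_a, p_b_given_a, p_b_given_not_a):
    """Return (P(B), P(A|B)) via total probability and Bayes theorem."""
    areas = joint_areas(p_a, p_b_given_a, p_b_given_not_a)
    # Law of total probability: P(B) = P(A)P(B|A) + P(not A)P(B|not A)
    p_b = areas[("A", "B")] + areas[("not A", "B")]
    # Same pink area P(A and B), refactored as width P(A|B) * height P(B)
    p_a_given_b = areas[("A", "B")] / p_b
    return p_b, p_a_given_b


areas = joint_areas(Fraction(1, 4), Fraction(3, 5), Fraction(1, 5))
assert sum(areas.values()) == 1  # the four areas fill the unit square
print(bayes(Fraction(1, 4), Fraction(3, 5), Fraction(1, 5)))
# -> (Fraction(3, 10), Fraction(1, 2))
```

Exact Fraction arithmetic is used so the four areas sum to exactly 1 without rounding noise. Passing equal values for p_b_given_a and p_b_given_not_a reproduces the "P(B) just sat there" observation: P(B) then equals that shared value whatever P(A) is.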
Isn't a graphical model the most intuitive and general visualization of Bayesian theory? Not only is it visual and intuitive, you can also use it to solve practical problems. For instance, the famous Monty Hall problem becomes a simple tree that anyone can grasp and use to calculate the final probability of each outcome.

An interactive exploration of virus testing using Bayesian theory: https://observablehq.com/@typehorror/how-to-virus-testing

Personally I always rederive it when necessary as:

P(A|B) * P(B) = P(A ∩ B) = P(B|A) * P(A)
=> P(A|B) = P(B|A) * P(A) / P(B)

I could understand this as someone who just knows algebra. The end part really shows how people shouldn't pretend to understand scientific studies and treat themselves.

Is this making the front page because of the NYTimes article on pregnancy testing? It seems highly relevant to that discussion.

This is a great article, but I wish the author had picked something a little more upbeat as example material.

Author here... I wrote this after reading Yudkowsky's article (linked in mine) and had the example posted there in mind as I wrote. This was many years ago, and I too wish I had picked something more upbeat. :-(

For sure, man, I understand. As I said though, great article! IMO visualizations are important in math and stats, even for people who aren't “visual learners” in particular.
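One commenter suggests that the Monty Hall problem is a simple tree anyone can use to calculate the final probability of each outcome. A minimal sketch of that idea, assuming the standard rules (the car is placed uniformly at random, and the host always opens a goat door other than the contestant's pick, choosing uniformly when two such doors are available), enumerates the tree's leaves with exact fractions and then conditions on which door the host opened. The function names are illustrative, and the sketch is fixed to three doors.

```python
from fractions import Fraction

DOORS = (1, 2, 3)


def monty_hall_tree(pick=1):
    """Enumerate the leaves of the Monty Hall probability tree.

    Returns {(car_door, opened_door): probability of that leaf}.
    """
    leaves = {}
    for car in DOORS:
        p_car = Fraction(1, len(DOORS))  # first branch: where the car is
        # The host may open any door that is neither the pick nor the car.
        options = [d for d in DOORS if d != pick and d != car]
        for opened in options:  # second branch: which door the host opens
            leaves[(car, opened)] = p_car * Fraction(1, len(options))
    return leaves


def p_win_by_switching(pick=1, opened=3):
    """P(car is behind the other closed door | host opened `opened`).

    Assumes `opened` differs from `pick`.
    """
    leaves = monty_hall_tree(pick)
    # Condition on the observed branch: keep leaves where `opened` was opened.
    evidence = {k: p for k, p in leaves.items() if k[1] == opened}
    total = sum(evidence.values())
    (other,) = [d for d in DOORS if d not in (pick, opened)]
    return evidence.get((other, opened), Fraction(0)) / total


print(p_win_by_switching())  # -> 2/3
```

Each leaf probability is just the product of the branch probabilities along its path, which is the same width * height factorization as the rectangles in the visualisation; dividing by the total weight of the observed branch is Bayes theorem.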