
Bayes Theorem: A Framework for Critical Thinking - neilkakkar
https://neilkakkar.com/Bayes-Theorem-Framework-for-Critical-Thinking.html
======
roenxi
The problem that leaps out at me is that this article seems to suggest that
our thinking process is Bayesian. That is unlikely.

We update our beliefs based on new information, but the whole point of
Bayesianism via Bayes' Theorem is updating them by a very specific amount
based on evidence strength. Nobody is even approximately Bayesian in their
thought process, and I doubt most people can be trained to be. Statistics is
a pen & paper exercise for the most part.

In my experience, which is not negligible, the hardest part of statistics is
talking people down from beliefs they have settled on because of something
that looks like statistical evidence but in fact is not.

~~~
neilkakkar
Fair point, the beginning section does seem to suggest that. I have a note in
the middle explaining this is not the case.

I make the connection to demonstrate that Bayes is not some arcane statistical
artifact but qualitatively in line with intuition, albeit miscalibrated. This
opens up avenues to _then_ move towards ideal Bayesian-ness.

~~~
roenxi
Ah, hey, didn't see you were also the submitter. I enjoyed your writing; it is
interesting to run across systems thinkers.

And just to fill up the word count: if you haven't read up on the major
statistical paradoxes, you may well enjoy them.
[https://en.wikipedia.org/wiki/Category:Statistical_paradoxes](https://en.wikipedia.org/wiki/Category:Statistical_paradoxes)
If you are playing with stats for the framework, then the other half of the
fun is delving into the paradoxes; knowing them by heart is a great trick for
interpreting evidence. Simpson's, Berkson's and the Elevator Paradox explain
a lot of life.

~~~
neilkakkar
Ah, yes indeed! Thank you.

I really enjoyed this explanation of Simpson's Paradox:
[http://michaelnielsen.org/reinventing_explanation/](http://michaelnielsen.org/reinventing_explanation/)

~~~
tmoertel
A better, more satisfying explanation for Simpson's Paradox comes from Pearl
et al.'s work in modern causation theory, in which they point out that the
paradox exists only because we fail to consider the causal structure that
generated the apparently paradoxical data. If we take that structure into
account and are clear about the causal question we are trying to answer, the
paradox disappears. For a recent example involving COVID-19, check out this
blog post:

[http://causality.cs.ucla.edu/blog/index.php/2020/07/06/race-...](http://causality.cs.ucla.edu/blog/index.php/2020/07/06/race-covid-mortality-and-simpsons-paradox-by-dana-mackenzie/)

~~~
srean
Hi Tom, I used to enjoy your blog and your posts on G+ a lot. Pray, continue
blogging.

------
Emphere
Others have already raised this point... but let me try to reiterate. The
problem of getting priors is not just one of "acquiring more information". In
many cases it's not even clear what such a probability means. For example, you
believe that Trump is the 45th POTUS... and you assign it a prior of 0.8...
what does the probability mean in this case? In the case of rolling dice it's
clear what each probability means, but not in this case. And in any case, how
much should you update your probabilities for any given piece of evidence?
All of these questions (how to assign priors, how much to update them, etc.)
are the _crux_ of Bayesianism, and Bayesianism itself has little to say about
them. The founders of Bayesianism were aware of these issues. For a more
substantive critique, read the following.

[https://www.jstor.org/stable/20079192?seq=1](https://www.jstor.org/stable/20079192?seq=1)

Bayes is a good _tool_, but to me it's a very small one, and it doesn't and
_cannot_ do most of the heavy lifting of how to live my life. Suppose I want
to decide what I should do next week; Bayes is close to useless for that. And
that is certainly something "critical thinking" should help me with.

Keywords to search for: "small worlds vs large worlds Bayes".

~~~
john-shaffer
The book "Probability Theory" by E.T. Jaynes is absolutely amazing at
explaining how to actually think about Bayesian probabilities, how to assign
priors, and the implications of different priors and posteriors. He explains
it far better than I can, but one of the major points is that the prior
becomes increasingly irrelevant as more data accumulates. If we have a lot of
data, then we can assign just about any reasonable prior and still get
accurate results. If we don't have much data, then the prior has a large
influence. In that case we can't be as confident in the accuracy of our
posterior probability, but we can calculate just how much confidence we can
have in it.

Your example is hard to get into because it just doesn't make sense. The
probability of Trump being the 45th President is 1. Garbage in, garbage out
applies here.

If instead we step back to 2016 and say that we know the next president will
be blue or red, then it's reasonable to assign the maximally uninformative
prior of 0.5 to each outcome. Each additional piece of information modifies
the probability. If we learn that the red candidate has been caught on tape
speaking about groping women, we would calculate a new, lower posterior
probability. How much lower depends on what we can determine about how much
this hurts his election chances. We will get better results if we can estimate
this factor accurately. This won't be much of an issue if we have many other
pieces of data so that this one piece doesn't have a huge influence, but if
this is our only piece of data then its accuracy will have a huge effect on
the accuracy of our conclusion. For the sake of example, let's say that our
best information provides us with a posterior probability of 0.25. This
posterior will be our prior when the next piece of information comes in (i.e.,
it's no longer an uninformative prior because it now contains past
information).

Now suppose we see on TV that Trump has won the election. We don't have any
money riding on the exact percentage, so let's just estimate the probability
of the TV report being correct at 0.99. If we plug this in to Bayes' Theorem
with our prior of 0.25, we get a new posterior of 0.97. We're not very
confident in this posterior because we used an uninformative prior with 2 very
weak updates, but if we really care about accuracy then we can seek out more
and better data and get a far more accurate posterior. Moreover, we can
calculate the confidence we have in the posterior based on the prior and the
data.

Now in 2020 when the election can no longer be challenged, we update our
posterior again, but it's not very interesting. Bayes' Theorem still applies,
but since we know for a certainty who POTUS 45 is, the prior (0.97) factors
out. We get a posterior probability of 1. Yet even though we've only used 3
pieces of data, we can calculate our confidence in our posterior as being
extremely high, since one of our data points was so strong.
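
The sequential update described above can be sketched in a few lines. The
numbers (the 0.25 prior and the 0.99 report reliability) are the ones from
this example; the false-report probability of 0.01 is an assumption that the
report is wrong exactly when it is not correct:

```python
def bayes_update(prior, p_e_given_h, p_e_given_not_h):
    """Posterior P(H|E) via Bayes' theorem:
    P(H|E) = P(E|H)P(H) / [P(E|H)P(H) + P(E|~H)P(~H)]."""
    numerator = p_e_given_h * prior
    return numerator / (numerator + p_e_given_not_h * (1 - prior))

# Prior after the pre-election evidence: P(red candidate wins) = 0.25.
prior = 0.25

# TV reports a red win; assume the report is correct with probability
# 0.99, i.e. it would falsely report a red win with probability 0.01.
posterior = bayes_update(prior, 0.99, 0.01)
print(round(posterior, 2))  # 0.97, matching the figure above
```

Note how a single strong piece of evidence dominates the weak prior, which
is exactly the point about the prior becoming irrelevant with enough data.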

I don't know how much it can help you decide what to do next week, but if
part of that decision involves calculating or reasoning about probabilities,
then you absolutely should understand Bayes' Theorem. It's helpful to know
how to think about it even if you can't assign exact numbers, in the same way
that understanding geometric or logarithmic curves is useful.

------
mathattack
Very interesting topic. Glad the OP took it on. I’ve unsuccessfully tried to
teach non-statisticians to think this way.

Brings to mind the quote attributed to Samuelson and Keynes: “When events
change, I change my mind. What do you do?”

~~~
alexpetralia
I agree with this, but to play devil's advocate (and something I have not
resolved internally yet):

Assume a model states there is a 99% likelihood something will occur.

Now the data changes, and the likelihood drops to 1%.

Was the original model "correct" insofar as there was a 99% likelihood of
something occurring (given the information it had at the time)? Or should it
have "priced in" the fact that data may change substantially, and 99% was far
too overconfident?

How are we supposed to interpret variability in model estimates? Do we throw
up our hands and say "the data changed"? Or do we hold the models somewhat
accountable, saying - no, you weren't "right at the time, given your data". If
your estimates are changing so strongly, you are wrong. A 99% estimate that
drops to 1% is simply, undeniably "unreliable."

In this case, we somewhat care about "model robustness", but how does this
extrapolate to situations where the data changing _should in fact_ impact the
model substantially?

I suspect the answer necessitates a deeper look into the nature of
probability, risk and uncertainty.

~~~
fractionalhare
In general there are rigorous ways to quantify those kinds of model
deficiencies. From an informal perspective: if the change in data is small and
the change in model output is large, your model is poor (and in particular,
possibly overfit).

Formally what you're describing is the bias-variance tradeoff.[1] You can
assess this by looking at the conditioning of your model, which measures how
sensitive it is to changes.[2] Roughly speaking, the condition number of an
estimator (or more generally, a function) measures how large the change
f(x) -> f(y) in the range is relative to the change x -> y in the domain,
where x and y are close.
That will give you variance. If you try to minimize bias too much, you may
overfit your model and it would exhibit high variance in cross validation. If
you try to minimize variance, you may fail to capture relations in your
underlying data, which would exhibit higher bias.
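
The condition number idea from [2] can be illustrated with a rough numerical
sketch (a central finite difference standing in for the derivative; the
choice of test points is mine, not from the comment above):

```python
import math

def relative_condition(f, x, h=1e-6):
    """Estimate the relative condition number kappa(x) = |x * f'(x) / f(x)|
    using a central finite difference for f'(x)."""
    fprime = (f(x + h) - f(x - h)) / (2 * h)
    return abs(x * fprime / f(x))

# exp(x) has kappa(x) = |x|: moderate sensitivity at x = 2.
print(relative_condition(math.exp, 2.0))    # ≈ 2.0
# log(x) has kappa(x) = 1/|ln(x)|: near x = 1 a tiny relative change
# in the input produces a huge relative change in the output.
print(relative_condition(math.log, 1.001))  # ≈ 1000
```

A model whose prediction swings from 99% to 1% on a small data change is,
in this language, badly conditioned at that input.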

Practically speaking, for your specific example: if a relatively small change
in the sample data resulted in the model adjusting its prediction from 99% to
1%, I would assume your model is severely overfit (I can't quantify exactly
how small without more context, but let's agree it's small). From a
meta-Bayesian perspective, it would take quite a lot of further cross
validation for me to drop that belief ;)

_____________________

1. [https://en.m.wikipedia.org/wiki/Bias–variance_tradeoff](https://en.m.wikipedia.org/wiki/Bias–variance_tradeoff)

2. [https://mathworld.wolfram.com/ConditionNumber.html](https://mathworld.wolfram.com/ConditionNumber.html)

~~~
alexpetralia
What a great answer. Thank you so much!

------
lambdatronics
"It is very important to understand the following point. Probability theory
always gives us the estimates that are justified by the information _that was
actually used_ in the calculation. Generally, a person who has more relevant
information will be able to do a different (more complicated) calculation,
leading to better estimates. But of course, this presupposes that the extra
information is actually true. If one puts false information into a probability
calculation, then the probability theory will give optimal estimates based on
false information: these could be very misleading. The onus is always on the
user to tell the truth and nothing but the truth; probability theory has no
safety device to detect falsehoods."

G. L. Bretthorst, "Bayesian Spectrum Analysis and Parameter Estimation,"
Springer-Verlag series 'Lecture Notes in Statistics' #48, 1988, pp. 30-31.

------
new2628
Controversial opinion: Bayes Theorem is overrated. In real life usually we
have no idea about priors, and we have close to zero chance to get any good
estimate of the true probability of something. But we can still get by fine
for the most part, by focusing on limiting possible loss and staying on the
safe side with large margins.

Many of the claimed cognitive biases go away under this view. One textbook
example of Bayes' theorem is how doctors overestimate the probability that a
patient who tests positive actually has the disease. But what are the priors?
Maybe those who visit the doctor did something risky the day before or are
feeling funny. Maybe the cost of a false positive is negligible compared to
the cost of a false negative, etc. People are less stupid than what the TED
talk crowd claims.
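
The textbook example hinges entirely on the prior, which is exactly the point
about who actually shows up at the clinic. A sketch with made-up illustrative
numbers (1% general-population prevalence vs. an assumed 20% among
symptomatic visitors, a 90%-sensitive test with a 9% false-positive rate):

```python
def p_disease_given_positive(prevalence, sensitivity, false_positive_rate):
    """P(disease | positive test) by Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = false_positive_rate * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# With the general-population prior, a positive test is weak evidence.
print(p_disease_given_positive(0.01, 0.90, 0.09))  # ≈ 0.092
# With a prior reflecting symptomatic clinic visitors, the same test
# gives a very different answer.
print(p_disease_given_positive(0.20, 0.90, 0.09))  # ≈ 0.71
```

Whether the doctors in the studies are "wrong" thus depends on which prior
actually describes their patients.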

~~~
xorfish
It is an advantage that priors must be explicitly chosen.

There is always a prior. The question is how aware you are of it.

~~~
edna314
How is making the prior explicit an advantage? If the prior is arbitrary
anyway, you could also stick to your unknown prejudice. This shouldn't change
any results, and if it does you are in trouble anyway, no matter whether you
explicitly state your prejudice. I still suspect that Bayesian statistics is
just a kind of hack to make results look more convincing.

~~~
denzil_correa
> If the prior is arbitrary anyways you could also stick to your unknown
> prejudice

One way to think about a prior is to make your prejudices transparent rather
than unknown.

~~~
edna314
But this might be negative, because you can’t consciously tweak an unknown
prejudice. You can, however, tweak a prior until your results support your
hypothesis. In that sense, Bayesian statistics might be more transparent, but
less honest.

~~~
denzil_correa
Bayesian approaches are more transparent regardless of them being "honest" or
not.

~~~
edna314
True, but the question is if transparency is desirable. I would say it is
dangerous for three reasons. First, you might be tempted to tweak your prior
until your posterior confirms your hypothesis. Second, using Bayesian
reasoning, you make it seem that the first procedure is justified. And third,
if everyone does the tweaking, for example within a scientific community,
nobody would complain, since everyone automatically would confirm their
hypothesis with higher posterior probability.

~~~
srean
I might be tempted to walk to dangerous places, so let's avoid walking: I
would say it is dangerous because it can be abused.

~~~
edna314
I didn't say that you should avoid Bayesian statistics.

------
loftyal
3Blue1Brown came out with a fantastic explanation of Bayes' theorem:
[https://www.youtube.com/watch?v=HZGCoVF3YvM](https://www.youtube.com/watch?v=HZGCoVF3YvM)

------
greyface-
Also, very possibly the basis of our neural function
[https://en.wikipedia.org/wiki/Bayesian_approaches_to_brain_f...](https://en.wikipedia.org/wiki/Bayesian_approaches_to_brain_function)

------
trashtester
This article confirmed my prior beliefs, so I like it.

------
usgroup
I think there is something to be said for when one first discovers Bayes, not
least because it's often presented, as in this article, as a sort of
enlightenment. Perhaps the finest work on the topic is Probability Theory by
Jaynes; if the Bayesian sect were to have a bible, that would be it, and BDA3
would be its practical counterpart. I say this both owning and having read
both, and having invested a lot of time pursuing, learning and applying the
Bayesian perspective.

What Bayes makes possible, from an applied statistics perspective, is a kind
of unification of a large variety of modelling approaches into a single
framework _fitted in the same way_ : simple regression yes, but also
hierarchical models, mixtures, pooled, partially pooled, hurdle, regularised,
horseshoe, and so on. So when you learn Stan or PyMC3 or Nimble or whatever,
you're enabled to go forth and make myriad custom models: this is powerful,
and it is enough to respect Bayes.

Various results show that lots of other models can in principle be expressed
in a Bayesian way given the right prior; hinting in some theoretical sense
that Bayes is a universal modelling approach: the panacea you've been looking
for, young statistician.

However, Bayes has many epistemological problems and for the actually
interested reader see here for a summary:

[https://plato.stanford.edu/entries/epistemology-bayesian/](https://plato.stanford.edu/entries/epistemology-bayesian/)

There are thus lots of reasons to believe that we do not think and _should not
think_ in a Bayesian way.

For other mind-expanding papers, consider Andrew Gelman (very prominent
Bayesian) and Cosma Shalizi (of Three-Toed Sloth fame):

[https://arxiv.org/abs/1006.3868](https://arxiv.org/abs/1006.3868)

and Breiman (of random forest fame):

[https://projecteuclid.org/euclid.ss/1009213726](https://projecteuclid.org/euclid.ss/1009213726)

For those, like myself, that are out there (like Breiman was) trying to
actually solve real world problems: you find yourself quickly limited by the
Bayesian approach. The data generating process of the real world is not
obvious most of the time, but you are forced as a Bayesian to pretend
otherwise. As much as I love problems where Bayes does work well, they are
fairly few and far between for me.

So as a closing comment: let's not "Bayes all the things", as tempting as it
is. It is in many respects the first part of the journey for many avid
evangelists and self-confessed "how I became a Bayesian" converts, but it's
not the be-all and end-all.

~~~
neilkakkar
Author here. I agree with some of your concerns.

From the post:

> The mind doesn’t always work like Bayes Theorem prescribes. There’s lots of
> things Bayes can’t explain. Don’t try to fit everything you see into Bayes
> rule. But wherever beliefs are concerned, it’s a good model to use. Find the
> best beliefs and calibrate them!

The one concern I'm really interested by, and don't understand, is this:

> There are thus lots of reasons to believe that we do not think and should
> not think in a Bayesian way.

Can you give an example that can serve as motivation to read the entire paper
linked?

~~~
usgroup
See here:

[https://plato.stanford.edu/entries/epistemology-bayesian/](https://plato.stanford.edu/entries/epistemology-bayesian/)

Section 6.2 gives a very concise list of problems.

For further reading, have a go at paraconsistent logic for a more total
upheaval of the Jaynesian view of the centrality of predicate logic as the
principal language of science (of which Bayes is a probabilistic relaxation,
according to Jaynes).

------
johndoe42377
Bayesianism is a textbook example of a sect.

This is just an abstract framework superimposed on some aspects of reality and
has nothing to do with its processes.

Let me repeat: a superimposed set of abstractions. Nothing more.

~~~
usgroup
I think you're quite right about that, which is not to say Bayes can't be
very useful, but I do agree that it has been elevated to the level of
religion.

~~~
fractionalhare
Yeah. Further context, for other readers: it's important to distinguish
between Bayesianism the intellectual movement and Bayesian modeling as it
exists in the professional and academic statistical community.

There are communities which pick up Bayes Theorem in isolation (i.e. without
any other statistical education or knowledge) and apply it to every debate and
discussion under the Sun, often pontificating about how rational they're being
about "updating their priors." But using Bayes' theorem qualitatively instead
of quantitatively - and without regard for its pitfalls - often leads to
confirmation bias in which someone lends a veneer of rigorous probability to
something they already unconsciously believed anyway. It's also sometimes used
to dress up pseudoscience with intellectual polish, and it takes work to
unpack those errors.

Bayesian probability is not a panacea, and does not displace frequentist
probability. The two paradigms are fully compatible (you can move from one to
the other via inversion of the estimator function). But they answer different
questions, both of which are important. It is also easy to make statistical
errors in both paradigms.

~~~
usgroup
This.

