
Bayes’ Theorem – What is it and what is it good for? - SimplyUseless
http://crucialconsiderations.org/rationality/bayes-theorem/
======
tjradcliffe
Bayes' Theorem tells us that the quest for certain knowledge, which drove a
great deal of science and philosophy in the pre-Bayesian era (before about
1990, when Bayesian methods started to gain real traction in the scientific
community), is much like the alchemist's quest for the secret of
transmutation: it is simply the wrong goal to have, even though it generated a
lot of interesting and useful results.

One of the most important consequences of this is noted by the article:
"Confirmation and falsification are not fundamentally different, as Popper
argued, but both just special cases of Bayes’ Theorem." There is no certainty,
even in the case of falsification, because there are always alternatives. For
example, the apparently superluminal neutrinos didn't prove special relativity
false, although they did provide some evidence against it. The alternative
hypothesis, that the researchers had made a mistake, turned out to be much
more plausible.

Bayesian reasoning--which is plausibly the only way of reasoning that will
keep our beliefs consistent with the evidence--cannot produce certainty. A
certain belief is one that has a plausibility of exactly 1 or 0, and those
values are only asymptotically approachable by applying Bayes' rule. Such
beliefs would be immune to any further evidence for or against them, no
matter how strong, essentially because Bayesian updating is multiplicative
and anything times zero is still zero.
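
The multiplicative dead-end is easy to see numerically. A toy sketch (mine,
not the article's), with a single binary-hypothesis update:

```python
# One Bayes update on a binary hypothesis H, showing that priors of
# exactly 0 or 1 never move, whatever the evidence.

def update(prior, p_e_given_h, p_e_given_not_h):
    """Posterior P(H | E) from prior P(H) and the two likelihoods."""
    evidence = prior * p_e_given_h + (1 - prior) * p_e_given_not_h
    return prior * p_e_given_h / evidence

# Strong evidence for H (10x more likely under H than under not-H):
print(update(0.5, 0.9, 0.09))   # a 0.5 prior moves to ~0.909
print(update(0.0, 0.9, 0.09))   # a prior of exactly 0 stays 0
print(update(1.0, 0.9, 0.09))   # a prior of exactly 1 stays 1
```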

There is a name for beliefs of this kind, which to a Bayesian are the most
fundamental kind of error: faith.

~~~
davmre
> Bayesian reasoning--which is plausibly the only way of reasoning that will
> keep our beliefs consistent with the evidence--cannot produce certainty.

To nitpick: Bayesian updating _can_ produce certainty, in exactly the way you
suggest: multiplying by zero. If the evidence you observed has zero
probability under a particular hypothesis, then the posterior probability of
that hypothesis will be zero. If the evidence you observe has zero probability
under _all_ hypotheses except one, then the posterior will give probability 1
to that hypothesis (assuming it had nonzero prior probability).
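
A toy numerical sketch of this (made-up numbers): three hypotheses, with the
observed evidence impossible under all but one.

```python
# If the evidence has zero probability under every hypothesis but one,
# the posterior concentrates entirely on the survivor.

priors      = {"H1": 0.2, "H2": 0.5, "H3": 0.3}
likelihoods = {"H1": 0.4, "H2": 0.0, "H3": 0.0}  # impossible under H2, H3

unnorm = {h: priors[h] * likelihoods[h] for h in priors}
z = sum(unnorm.values())
posterior = {h: unnorm[h] / z for h in priors}
print(posterior)  # -> {'H1': 1.0, 'H2': 0.0, 'H3': 0.0}
```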

This won't come up if you stick to densities, like Gaussians, that are
supported everywhere. And it's certainly a good principle of model design to
always allow your beliefs to be changed by new evidence (consistency theorems
for Bayesian inference do depend on assumptions about the support of the prior
and likelihood). But there's nothing formally preventing you from designing
Bayesian models that rule out hypotheses with total certainty. In fact, this
is what allows classical logic to be a special case of Bayesian reasoning.
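
For instance (my sketch, not anything formal from the comment): encoding "H
implies E" as P(E|H) = 1 and then observing not-E recovers modus tollens as a
degenerate Bayes update.

```python
# "If H then E" with full certainty; E turns out false; P(H) drops to
# exactly 0, i.e. modus tollens as a special case of Bayes' rule.

def posterior_h_given_not_e(prior_h, p_e_given_h, p_e_given_not_h):
    p_not_e_given_h = 1 - p_e_given_h
    p_not_e_given_not_h = 1 - p_e_given_not_h
    z = prior_h * p_not_e_given_h + (1 - prior_h) * p_not_e_given_not_h
    return prior_h * p_not_e_given_h / z

print(posterior_h_given_not_e(0.7, 1.0, 0.5))  # -> 0.0
```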

------
jhallenworld
I recently started to think about connections between Bayes' Theorem and fuzzy
logic:

[http://sipi.usc.edu/~kosko/Fuzziness_Vs_Probability.pdf](http://sipi.usc.edu/~kosko/Fuzziness_Vs_Probability.pdf)

Also from Wikipedia on fuzzy logic:

> "Bruno de Finetti argues[citation needed] that only one kind of mathematical
> uncertainty, probability, is needed, and thus fuzzy logic is unnecessary.
> However, Bart Kosko shows in Fuzziness vs. Probability that probability
> theory is a subtheory of fuzzy logic, as questions of degrees of belief in
> mutually-exclusive set membership in probability theory can be represented
> as certain cases of non-mutually-exclusive graded membership in fuzzy
> theory. In that context, he also derives Bayes' theorem from the concept of
> fuzzy subsethood. Lotfi A. Zadeh argues that fuzzy logic is different in
> character from probability, and is not a replacement for it. He fuzzified
> probability to fuzzy probability and also generalized it to possibility
> theory. (cf.[10])"

~~~
darkxanthos
Thank you for sharing this! It stretched my mind a bit. :)

------
shoo
Here are a few tangentially related things that may be of interest:

(i) MacKay's book on Information Theory, Inference, and Learning Algorithms:
[http://www.inference.phy.cam.ac.uk/itila/](http://www.inference.phy.cam.ac.uk/itila/)

(ii) Probability Theory As Extended Logic:
[http://bayes.wustl.edu/](http://bayes.wustl.edu/)

(iii) Causal Calculus:
[http://www.michaelnielsen.org/ddi/if-correlation-doesnt-imply-causation-then-what-does/](http://www.michaelnielsen.org/ddi/if-correlation-doesnt-imply-causation-then-what-does/)

(iv) I recall reading a pretty good blog post a year or two ago that described
how to implement some kind of Bayesian token recognition thing to parse screen
captures from some database (or something roughly like that). The gist of the
approach was like this:

1\. define a model expressing that certain combinations of neighbouring
tokens are more likely to occur than others

2\. approximate the full Bayesian inference problem as MAP inference

3\. the resulting combinatorial optimisation problem could be encoded as a
relatively easy mixed integer program

4\. easy mixed integer programs are very tractable for commercial solvers
such as CPLEX, Gurobi, or sometimes even the open-source COIN-OR CBC
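
Since the original link is lost, here is only a rough sketch of what steps
1-2 might look like (all labels and numbers invented). A real system would
hand the optimisation to a MIP solver, but a short chain of tokens can be
brute-forced:

```python
# MAP inference over a chain of noisy tokens: per-token evidence scores
# plus a bonus for likely neighbour pairs. The argmax over all label
# assignments is the MAP estimate; a MIP would encode the same argmax.
from itertools import product
import math

LABELS = ["digit", "letter"]

# log P(observed glyph | label), per position -- pretend OCR scores
unary = [
    {"digit": math.log(0.6), "letter": math.log(0.4)},
    {"digit": math.log(0.3), "letter": math.log(0.7)},
    {"digit": math.log(0.5), "letter": math.log(0.5)},
]

def pair_score(a, b):
    # neighbouring tokens of the same type are assumed likelier
    return math.log(0.8) if a == b else math.log(0.2)

def map_assignment():
    best, best_score = None, -math.inf
    for labels in product(LABELS, repeat=len(unary)):
        score = sum(u[l] for u, l in zip(unary, labels))
        score += sum(pair_score(a, b) for a, b in zip(labels, labels[1:]))
        if score > best_score:
            best, best_score = labels, score
    return best

print(map_assignment())  # -> ('letter', 'letter', 'letter')
```

Note how the pairwise bonus drags position 3 (a 50/50 token on its own) to
agree with its neighbours.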

At the time I found the idea fascinating as I was working with LPs/MIPs and
had some interest in Bayesian inference, but hadn't figured out that the
former could provide a way to computationally tackle certain approximations of
the latter.

I cannot for the life of me find the link again for this.

------
le0n
“Seeing the world through the lens of Bayes’ Theorem is like seeing The
Matrix. Nothing is the same after you have seen Bayes.”

I'm pretty sure this is an instance of cognitive bias.

~~~
RogerL
Sure. We all have biases. Almost everything I think about in engineering is
passed through two filters: bayes, and nonlinear optimization[1]. It's
enormously useful, and leads me to a lot of insights that others don't come up
with. But it is a bias, and we always have to guard against it leading us down
the wrong path.

[1] By this I mean I always ask myself - am I incorporating all information,
and in a probabilistic (Bayesian) way. If not, have I analytically proven that
I can discard the information (dimensionality reduction). If I haven't proven
it, my 'go to' assumption is that information, no matter how noisy, should be
incorporated until I can prove analytically or empirically that it isn't
needed. In more concrete terms people endlessly hand wave "that isn't
important" when I ask a question, but then I go prove it is important. It's a
cheap trick in some sense, but it sure does work. Don't throw away
information. Likewise, I view everything as a nonlinear
estimation/optimization problem. I think in terms of manifolds and surfaces -
what are my variables, what can I vary, can I vary them smoothly (is the
surface locally smooth and continuous). In concrete terms, maybe you are
trying to figure out what features to add to a product. Lots of choices, lots
of unknowns. Can I iteratively come to an answer in an agile way, do I have to
make some discontinuous jumps, what step size should I use, etc. It's all just
'mathy'. Meaning I don't have analytic equations for these decisions, but
thinking about them as if I did is _usually_ very informative.

So I 100% agree with the quote.

------
dimino
My biggest issue with Bayes' Theorem as a method of making everyday decisions
is that it assumes the ability to accurately assess the underlying likelihoods
of events taking place, especially on-the-fly.

I would even argue that it's actually providing a _false_ sense of precision
because the sig figs are oftentimes not correctly represented.

~~~
tjradcliffe
This is not a problem with Bayes' Theorem. Any alternative method of updating
beliefs will suffer from exactly the same problem of noisy inputs, and have
the additional problem that it cannot maintain consistency with all the
evidence (which only Bayes' rule is capable of doing).

Using Bayes' rule consistently will make you aware of how uncertain the inputs
are, and that is a feature, not a bug.

~~~
dimino
How will it make you aware of uncertainty? At some point you _do_ have to
guess (called "estimating" here), and that will have a compounding effect on
the outcome. Guess a probability wrong somewhere, even by a small amount, and
it multiplies its way through to the result; you're looking at a potentially
huge difference in the resulting probability, which could _easily_ span the
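
A quick sketch of that worry (made-up numbers): a per-observation likelihood
ratio misjudged by 0.1 in the wrong direction flips the conclusion after 20
multiplicative updates.

```python
# Sequential updating in odds form: posterior odds = prior odds times
# the likelihood ratio once per observation. A small per-step error
# compounds across the "will act"/"won't act" boundary.

def posterior(prior, likelihood_ratio, n_observations):
    odds = (prior / (1 - prior)) * likelihood_ratio ** n_observations
    return odds / (1 + odds)

print(posterior(0.5, 0.95, 20))  # true ratio: belief falls to ~0.26
print(posterior(0.5, 1.05, 20))  # misjudged ratio: belief rises to ~0.73
```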

~~~
Houshalter
I think it's worth keeping the principles of Bayesian reasoning in mind: the
idea that you should update hypotheses on evidence, that adding probability
in one place takes a little from everywhere else, that you should keep track
of prior probability, etc.

Not that you should actually do mental calculations on made up probability
estimates. I mean you can do that, and if your estimates are at all decent,
the result might be better. But I don't think anyone actually recommends that.

~~~
dimino
Judging from the content of lesswrong, I actually think literal math is what
many people _do_ recommend, and that's what bothers me.

~~~
TeMPOraL
In some cases you might want to actually do that math, even if you have to
guesstimate the numbers, since it will still beat your intuition.

[http://slatestarcodex.com/2013/05/02/if-its-worth-doing-its-worth-doing-with-made-up-statistics/](http://slatestarcodex.com/2013/05/02/if-its-worth-doing-its-worth-doing-with-made-up-statistics/)

But in general, no one has enough computing power in their heads to go and do
explicit bayesian updates on everything all day long. You have to pick your
battles, and use the right tools for the task at hand.

~~~
davidgerard
That post is like nobody there has ever heard the phrase "don't fall in love
with your model".

Quite a lot of LessWrong posts are of the theme "my model gives this
counterintuitive result" - the trouble is they go on to "AND THIS IS VERY
IMPORTANT AND SIGNIFICANT!!" rather than "hmm, maybe my model needs work."

------
DennisP
> the Standard Model of particle physics explains much, much more than
> thunderstorms, and its rules could be written down in a few pages of
> programming code.

As a programmer who doesn't know advanced math, I'd really like to see that
code, in literate form.

~~~
davidgerard
This is a Yudkowskyism, i.e. don't expect to see the code, or if you do then
expect it to have obvious defects.

------
lucb1e
Hint: changing the font to Arial improves readability a lot and actually
displays italics where the author used them.

------
jrgnsd
I recently needed a Bayes classifier[1] in a couple of projects, so I wrote a
service that exposes one through an API. You can set up your prior set and
then get predictions against that set.

I haven't gone through the trouble of making it suitable for public
consumption yet. Would anyone be interested in consuming such a service?

[1]:
[https://en.wikipedia.org/wiki/Naive_Bayes_classifier](https://en.wikipedia.org/wiki/Naive_Bayes_classifier)

------
gedrap
Speaking of Bayes, there's a great book by Allen B. Downey 'Think Bayes'
[http://www.greenteapress.com/thinkbayes/](http://www.greenteapress.com/thinkbayes/)
available as free PDF or (if you wish to support the author, which I did) a
paperback from Amazon.

It teaches Bayes' theorem accompanied by Python code examples, which I found
really useful.

------
Pamar
This is excellent and finally prompted me to ask how to use Bayes more in my
life:
[https://news.ycombinator.com/item?id=9782767](https://news.ycombinator.com/item?id=9782767)

------
mkramlich
I'm in the middle of designing and building a system which uses Bayesian
models.

One thing that struck me early is that while Bayes itself is rock solid, like
arithmetic, when you go to apply it the results live or die on the quality of
the models, and the relevance/realism of the evidence used to train them.
GIGO.

But once you do have a good, relevant, signal-producing model, then, using it
is a bit like doing a multi-dimensional lookup, or function call. Conceptually
easy to understand, and, in many cases (depending, of course, on the details)
cache-friendly.

~~~
shoo
"all models are wrong, but some are useful" \- Box.

I think the Bayesian approach is a good place to start, and provides a
coherent way to think about things.

Pragmatically, one might end up needing to introduce a few approximations into
the model, to make it computationally tractable, for example, but it is good
to be able to view this in the context of what the gold-plated theoretical
modelling approach would be.

Instead of doing something ad-hoc that appears to work, say.

~~~
RogerL
You can also augment the state to take this into account.

I have a model that says my system does F with Q amount of uncertainty, and my
measurements are Z with R uncertainty. But I have to give precise numbers for
R, when it is just an imprecise model or SWAG. I can add to my state a
parameter for how precise R is, and let the filter estimate it over time. Not
always, and it is noisy, but it can be done.

There are other approaches - use a filter bank, each with a different set of
assumptions. Run 'em all, and either pick one or blend them, depending on your
scenario. 'Depending' being the topic of many a PhD thesis, but again, very
doable in practice for many problems.
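
A toy sketch of the bank idea (invented numbers): run two models that differ
only in their assumed measurement noise R, and weight each by how well it
explains the data stream.

```python
# Bayesian model averaging over a tiny "filter bank": each member makes
# a different SWAG about the measurement noise R, and the data decide
# the posterior weight each member gets (uniform prior over the bank).
import math

def gauss_loglik(xs, mean, var):
    """Log-likelihood of the measurements under a Gaussian noise model."""
    return sum(-0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
               for x in xs)

measurements = [0.1, -0.3, 0.2, 0.4, -0.1]

models = {"R=0.1": 0.1, "R=1.0": 1.0}  # two competing noise assumptions
logliks = {name: gauss_loglik(measurements, 0.0, r) for name, r in models.items()}

# Normalise in log space for numerical stability:
m = max(logliks.values())
weights = {n: math.exp(ll - m) for n, ll in logliks.items()}
z = sum(weights.values())
weights = {n: w / z for n, w in weights.items()}
print(weights)  # R=0.1 gets nearly all the weight on this data
```

From here you can either pick the best member or blend their estimates with
these weights, which is the "pick one or blend" choice above.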

