

Bayesian updating of Probability Distributions - darkxanthos
http://www.databozo.com/2013/09/15/Bayesian_updating_of_probability_distributions.html

======
bluecalm
Once you figure out simple examples you slowly start thinking this way about
the world. It's beautiful. Take people for example:

Someone with an open mind has a prior that assigns at least a slight probability
to unlikely (for them!) hypotheses, while on the other hand very religious
people, for example, have a 0 in their prior for the possibility that
their religion is made up, so they are forced to ignore evidence to the
contrary (Bayesian updating breaks for them because a zero prior stays zero
no matter the evidence, and the mind's way to signal this exception is denial).
In general, someone with a lot of weight on a given hypothesis is "stubborn" or
just very convinced, and someone with a uniform or near-uniform distribution
just doesn't know anything about the given problem.
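A minimal sketch of the "zero prior can never recover" point, with a made-up
three-hypothesis coin example (the hypotheses, prior, and flip counts here are
just for illustration):

```python
import numpy as np

# Three hypotheses about a coin's heads probability; one gets prior 0.
hypotheses = np.array([0.3, 0.5, 0.7])
prior = np.array([0.0, 0.6, 0.4])  # closed mind: h = 0.3 is "impossible"

# Observe 20 flips with 6 heads -- evidence that actually favors h = 0.3.
heads, tails = 6, 14
likelihood = hypotheses**heads * (1 - hypotheses)**tails

# Bayes' rule: posterior is proportional to prior times likelihood.
posterior = prior * likelihood
posterior /= posterior.sum()

print(posterior)  # the h = 0.3 entry is still exactly 0.0
```

No amount of evidence moves the zeroed-out hypothesis, because every update
multiplies the prior rather than adding to it.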

Someone unable to build heavily weighted distributions is a conspiracy
theorist; someone reluctant to, a sceptic; and someone too eager to, a
fanatic. Someone with very bad priors is un- or badly educated (in the given
domain), or biased, or maybe just stupid; someone with good priors is an expert.
It's possible to combine expert with sceptic, or expert with fanatic, or, all
too often, stupid with fanatic (very bad and very heavily weighted priors, with
possible 0's on some options).

Once you start thinking this way you start expressing yourself differently:
you start adding probability qualifiers to your sentences: "I am very
sure it's the way to go", "My intuition tells me this but I am not really
sure", "I am very convinced and it's not worth discussing" (yes, that can be
a rational and good attitude), or "I would do X but I need more evidence to be
reasonably sure".

It's all there in people's minds, language, and interactions; once you start
thinking this way, it's a whole new world of perspective and understanding.

~~~
darkxanthos
Yup, it's true. I can seem like a Bayes nut at work, since I mention updating
my priors during debates and when discussing hypotheses for split tests.

------
spicyj
A question for people who know more about this than I do:

Why was the uniform distribution on [0, 1] chosen initially? Choosing a
different distribution would give a different result. (And it doesn't make
much sense to say, "Always choose the uniform distribution!" because the
choice of variable affects the meaning of the distribution -- if instead we
wonder about the value of p^2 and choose a uniform distribution for it on [0,
1], won't we get a completely different result?)
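A quick Monte Carlo sanity check of the reparameterization point: drawing p
uniformly versus drawing p^2 uniformly (and then taking the square root) gives
visibly different distributions for p. This sketch uses only the standard
library, and the sample size is arbitrary:

```python
import random

random.seed(0)
N = 100_000

# Prior A: p itself is uniform on [0, 1].
p_uniform = [random.random() for _ in range(N)]

# Prior B: p^2 is uniform on [0, 1], so p = sqrt(u).
p_from_sq = [random.random() ** 0.5 for _ in range(N)]

# Under prior B, p is pushed toward 1: its mean is 2/3 instead of 1/2.
print(sum(p_uniform) / N)   # ~0.5
print(sum(p_from_sq) / N)   # ~0.667
```

So "always choose the uniform distribution" really is parameterization-dependent,
which is exactly the concern raised above.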

~~~
bjterry
If you were holding a physical coin in your hand, you would have to be crazy
to select a uniform distribution unless it was shaped like a sphere. This is
sort of a really minor pet peeve of mine when people use coin-flipping as an
example for these things. If it's even vaguely coin-like, even the most
ridiculous distortion (maybe it's made of uranium on one side and aluminum on
the other) probably couldn't bring the true probability past 70% or something.

~~~
spicyj
So what would you use as a reasonable prior here?

~~~
bermanoid
If it's a normal looking coin being flipped in the air and caught before
presentation, the only reasonable prior is that almost all of the probability
mass should be that it's completely fair. Maybe I'd leave 1% to spread amongst
the rest just in case the adversary has some seriously devious tricks up their
sleeve.

Because physics - it's not possible to bias a rigid body so that it rotates
with non-constant angular speed when flipped, as long as air resistance can be
neglected, and that means that a fair flip gives 50/50 odds as to what side
you catch it on. (Edit: clarity)

If other stuff is going on, like you're letting it bounce or something, then
it depends on the particulars. It's rather easy to load a die, for instance.
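A sketch of such a "99% fair" prior over a grid of hypotheses (the 99%/1% split
is just the figure suggested above, and the flip counts are invented):

```python
import numpy as np

# 101 hypotheses for P(heads): 0.00, 0.01, ..., 1.00
hypotheses = np.linspace(0, 1, 101)
prior = np.full(101, 0.01 / 100)  # spread 1% over the 100 non-fair values
prior[50] = 0.99                  # 99% mass on exactly fair (p = 0.5)

# Even after a somewhat lopsided 7 heads in 10 flips,
# the fair hypothesis still dominates the posterior.
heads, tails = 7, 3
likelihood = hypotheses**heads * (1 - hypotheses)**tails
posterior = prior * likelihood
posterior /= posterior.sum()

print(posterior[50])  # still well above 0.9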

------
davidf18
A good (free) explanation of Bayesian Stats using Python: Downey's Think Bayes
[http://www.greenteapress.com/thinkbayes/](http://www.greenteapress.com/thinkbayes/)

~~~
darkxanthos
This helped me the most to get started.

------
tomrod
I love seeing people figure this out for the first time. Bayesian methods just
start to make more sense after working through something like this.

------
skybrian
This code has no memory other than the prior probability distribution. It
seems like if you had previously flipped a coin a thousand times to generate
it, your prior beliefs should be more strongly held than if you had just made
up some numbers. Shouldn't the number of previous trials be accounted for
somehow?

~~~
dthunt
In fact, it is accounted for. You'll notice the negative exponent on the
unlikely hypotheses is getting pretty extreme after a handful of flips.

~~~
bermanoid
In case it's not clear (that wording doesn't really click for me, since there
are no exponents involved), the way this information is encoded is in the
shape of the distribution itself. If there have been very few observations, it
will be very wide, but after many it will narrow to a tiny spike.

In this context, at least, the prior distribution encodes everything - there's
no meaning to the idea that you're more or less confident in the prior,
because the prior already represents your uncertainty about the outcomes. If
you were 50/50 on this prior versus another one, then your actual prior would
be the average of the two.

There is a subfield of statistics that deals with imprecise probabilities, but
that's a whole other can of worms and doesn't really relate to this problem.
That said, it's fascinating stuff, and very useful in some contexts (if you're
uncertain about your priors, it can be useful to do sensitivity analyses to
figure out exactly how the end result depends on your prior).
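One way to see "the shape of the distribution encodes the number of
observations" concretely is with Beta priors, whose two parameters effectively
count (pseudo-)observations. The specific flip counts below are just for
illustration:

```python
import math

def beta_sd(heads, tails):
    """Standard deviation of the Beta(heads + 1, tails + 1) posterior
    you get from a uniform Beta(1, 1) prior after the given flips."""
    a, b = heads + 1, tails + 1
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

# A prior built from 10 flips is much wider (less confident)
# than one built from 1000 flips at the same 50/50 rate.
print(beta_sd(5, 5))      # ~0.14
print(beta_sd(500, 500))  # ~0.016
```

Both distributions are centered at 0.5, but the one built from a thousand flips
is a narrow spike: that narrowness is where the "strength" of the prior lives.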

~~~
dthunt
What I meant was that for h_1%, you're going to wind up with probabilities in
the area of 1.2e-34 very quickly if the coin is a fair one. The update
process adjusts via multiplication, and for a really bad hypothesis,
that's going to bring its probability dramatically close to zero without that
many trials. Even though the absolute difference between a 1e-4 and a 1e-34
hypothesis feels smallish when you look on a linear scale from 0 to 1, a 1e-34
is a lot 'stickier'.

Your explanation has the benefit of being a better explanation; my goal was
just to explain where the inertia was hiding.
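A sketch of that multiplicative collapse, tracking the unnormalized weight of a
hypothetical h = 0.01 hypothesis under a fair-looking sequence of flips (the
sequence itself is made up):

```python
# How fast does the h = 0.01 hypothesis die under a fair coin?
# Each flip multiplies its likelihood by 0.01 (heads) or 0.99 (tails).
h = 0.01
weight = 1.0
flips = "HTHHTTHTHT" * 2  # 20 flips, 10 heads and 10 tails
for f in flips:
    weight *= h if f == "H" else (1 - h)

print(weight)  # ~9e-21: astronomically small after just 20 flips
```

Each head costs the hypothesis a factor of 100, which is why the exponent runs
away so quickly even over a handful of trials.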

------
maaku
A good general introduction to Bayesian updating:

[http://yudkowsky.net/rational/bayes/](http://yudkowsky.net/rational/bayes/)

------
gtani
For folks looking to delve into the subject, Barber's and MacKay's freely
available books are terrific:

[http://web4.cs.ucl.ac.uk/staff/D.Barber/pmwiki/pmwiki.php?n=...](http://web4.cs.ucl.ac.uk/staff/D.Barber/pmwiki/pmwiki.php?n=Brml.HomePage)

[http://www.inference.phy.cam.ac.uk/itila/](http://www.inference.phy.cam.ac.uk/itila/)

~~~
liranz
If you want a lot of Python examples, you should check out this great book:
[http://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/](http://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/)

