
Think Bayes: Bayesian Statistics Made Simple (2012) - mycat
http://www.greenteapress.com/thinkbayes/html/thinkbayes001.html
======
fpoling
For me, the best book so far on Bayesian probability has been "Probability
Theory: The Logic of Science: Principles and Elementary Applications" by E. T. Jaynes.

The book starts from a derivation of Bayes' theorem from the first
principles of logic and shows its applications to a wide range of topics.
There is a thorough discussion of various "paradoxes", and the author sharply
criticizes frequentist statistics. In addition, there are a lot of
historical references.

~~~
CalChris
_Probability Theory_ is available as a PDF.

[http://www.med.mcgill.ca/epidemiology/hanley/bios601/Gaussia...](http://www.med.mcgill.ca/epidemiology/hanley/bios601/GaussianModel/JaynesProbabilityTheory.pdf)

~~~
rtehfm
Thanks. I was about to buy the book on Amazon for ~$90, but at least now I can
give it a read first to see whether it's worth buying, for me at least.

------
boostedsignal
For those unclear on the concrete (rather than philosophical) difference
between Bayesian and frequentist statistics in the first place, I hope it's
not inappropriate for me to share this 5-minute example that I wrote a while
back:
[https://news.ycombinator.com/item?id=11096129](https://news.ycombinator.com/item?id=11096129)

~~~
wodenokoto
You write that the frequentist doesn't answer the question, but it does. It
answers

    
    
        P(H') = (H/(H+T))^H'
    

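In code, with made-up numbers for H, T, and H', that point estimate is just:

```python
# Frequentist sketch: estimate p from observed flips, then predict a
# run of H' heads. H, T, and H_prime are made-up example numbers.
H, T = 7, 3                 # observed heads and tails
H_prime = 2                 # length of the future run of heads
p_hat = H / (H + T)         # point estimate of P(heads)
prob = p_hat ** H_prime     # P(H') = (H/(H+T))^H'
print(prob)                 # ~0.49 for these numbers
```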
You also write that the frequentist solution fails to give an error estimate,
yet you don't show that the Bayesian solution does give one.

If the goal of the article is to show that Bayesian is more correct than
frequentist, then it leaves the reader unconvinced. If the goal is to show
three ways of finding a probability, you should either say each is fine under
its own paradigm or argue why only one paradigm is correct.

~~~
boostedsignal
> You write that the frequentist doesn't answer the question, but it does. It
> answers: P(H') = (H/(H+T))^H'

The question was asking for P(H' | H, T), not P(H').

> You also write that the frequentist solution fails to give an error
> estimate, yet you don't show that the Bayesian solution does give one.

Because there _is_ no error? In the proof I assume P(p) is known, and after
that every step follows from a law of probability. There is no error to be
accounted for in the procedure. The only caveat is that we need to know P(p)
to be able to perform the procedure, a caveat that I point out at least three
times on the page.

~~~
zAy0LfpBZLC8mAC
> The only caveat is that we need to know P(p) to be able to perform the
> procedure

I think this is a very confusing way to put it. P(p) is not an objective value
that you can know or not know; it is rather a model of our subjective
knowledge, and therefore it doesn't really make sense to say "the caveat is
that we need to know what our knowledge is". Yes, we do, but that is always
the case by definition, so it is pointless to bring up.

------
baxtr
I regularly forget how Bayes works. Every time that happens, I go back to this
page: [https://www.bayestheorem.net/](https://www.bayestheorem.net/)

I love the way it’s explained there.

~~~
Gravityloss
How Bayes kinda works, or how I see it.

Conditional probability (with some caveats that someone in the comments can
fill in):

    
    
        P(a,b) = P(b,a)
        P(a|b) * P(b) = P(b|a) * P(a)
        P(a|b) = P(b|a) * P(a) / P(b)
    

a can be the model and b the data, so it becomes

    
    
        P(model | data) =
        P(data | model) * P(model) / P(data)
    

We have or can estimate the things on the right side. We want to ultimately
get the thing on the left side.
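As a minimal sketch (the candidate models, the uniform prior, and the single observed head are all made up), the update for a coin's bias could look like:

```python
# Discrete Bayes update for a coin's bias after observing one head.
models = [0.4, 0.5, 0.6]                        # candidate values of P(heads)
prior = {m: 1 / len(models) for m in models}    # P(model), uniform
likelihood = {m: m for m in models}             # P(head | model)

# P(data) is the normalizing constant: a sum over all models
evidence = sum(likelihood[m] * prior[m] for m in models)

# P(model | data) = P(data | model) * P(model) / P(data)
posterior = {m: likelihood[m] * prior[m] / evidence for m in models}
print(posterior)   # mass shifts toward the higher-bias models
```

After the update, the posterior can serve as the prior for the next observation, which is the sense in which beliefs get "updated".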

~~~
Jach
To extend your first set so that everything is a conditional probability:
Bayes' theorem is just a restatement of the product rule:

    
    
        p(a and b | context c) = p(a|b,c) * p(b|c)
                               = p(b|a,c) * p(a|c)
        or = p(a|c)*p(b|c) = p(b|c)*p(a|c) if a and b are independent of each other
    
        so Bayes only matters when there is dependence:
        p(a|b,c) = p(a|c) * p(b|a,c) / p(b|c)
    
        otherwise it's just p(a|c) = p(a|c)
    

I like to put things in that order because p(a|c) is the "prior belief" and
with some handwaving say things like "updated belief = prior belief and new
evidence about belief".
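A quick numeric check of the product rule and of Bayes' theorem on a toy joint distribution (the numbers are made up; the context c is left implicit):

```python
# Toy joint distribution P(a, b); context c is implicit throughout.
p = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

p_a = lambda a: sum(v for (x, _), v in p.items() if x == a)   # marginal P(a)
p_b = lambda b: sum(v for (_, y), v in p.items() if y == b)   # marginal P(b)
p_a_given_b = lambda a, b: p[(a, b)] / p_b(b)                 # P(a | b)
p_b_given_a = lambda b, a: p[(a, b)] / p_a(a)                 # P(b | a)

# Product rule both ways: P(a,b) = P(a|b)P(b) = P(b|a)P(a)
for (a, b), v in p.items():
    assert abs(p_a_given_b(a, b) * p_b(b) - v) < 1e-12
    assert abs(p_b_given_a(b, a) * p_a(a) - v) < 1e-12

# Bayes' theorem: P(a|b) = P(a) * P(b|a) / P(b)
a, b = 1, 1
print(abs(p_a_given_b(a, b) - p_a(a) * p_b_given_a(b, a) / p_b(b)) < 1e-12)
```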

------
epalmer
My youngest has Allen Downey as a professor this year. She says he is crazy,
and she means this in the best way possible. He is prolific, having written
Think Java in 13 days. He memorized pictures and bios of all 90 students in
the first-year class at Olin College of Engineering.

Edit typo

~~~
jabretti
>He memorized pictures and bios of all 90 students in the first year class at
Olin College of Engineering.

It's impressive not so much that he did that, but that he bothered to try.

Most lecturers (myself included) will try very hard _not_ to learn anything
about their students, because they consider actually dealing with undergrads
(particularly first-years!) on an individual level to be beneath them.

~~~
epalmer
At Olin everyone is an undergraduate. Olin is about reinventing engineering
education. They consider faculty very important to the process, but as guides,
not instructors. Most of the time, the students have to seek out information
and approaches themselves.

------
partycoder
Thanks for posting this. The Jupyter notebooks (and the fact that GitHub has
built-in support for them) really help illustrate the concepts.

The book I've used so far to study is "Probability and Statistics: The Science
of Uncertainty", by Michael J. Evans and Jeffrey S. Rosenthal. This book is no
longer in print and is free in PDF form.

~~~
bhattisatish
The book you mentioned is available at
[http://www.utstat.toronto.edu/mikevans/jeffrosenthal/](http://www.utstat.toronto.edu/mikevans/jeffrosenthal/)

~~~
partycoder
May also want to take a look at: "Introduction to Statistics and Probability
using R"

[https://cran.r-project.org/web/packages/IPSUR/vignettes/IPSU...](https://cran.r-project.org/web/packages/IPSUR/vignettes/IPSUR.pdf)

R has built-in functions for most of your needs. You can get a lot done with
very little code.

------
innocentoldguy
“I broke this rule because I developed some of the code while I was a Visiting
Scientist at Google, so I followed the Google style guide, which deviates from
PEP 8 in a few places. Once I got used to Google style, I found that I liked
it. And at this point, it would be too much trouble to change.”

Why would you write a book that targets the Python community and ignore PEP8
styling, inconveniencing an entire community, simply because it would be too
much trouble for you to change?

“Also on the topic of style, I write “Bayes’s theorem” with an s after the
apostrophe, which is preferred in some style guides and deprecated in others.”

It is deprecated in all modern style guides and should not be used. You’ll get
dinged in college English and writing classes for using this outdated and
redundant style.

I’m sure this book is great, but, as a point of constructive criticism, I
would suggest the author do a better job of adhering to the styles of code and
English expected by his target audience, rather than what is comfortable for
him.

~~~
leephillips
From the PEP8 style guide:

"Many projects have their own coding style guidelines. In the event of any
conflicts, such project-specific guides take precedence for that project."

and

"A Foolish Consistency is the Hobgoblin of Little Minds".

And, throughout, PEP8 makes it clear that it is a set of recommendations, and
that if a project or community already has an established style, it need not
be changed.

~~~
franklin_cobb
Why are you arguing against PEP8? As you mentioned in your final sentence, the
Python community DOES have an established standard. It is called PEP8. The
parent has made a valid point. Why would you criticize or trash his "karma"
for stating it?

~~~
leephillips
You say I'm arguing against PEP8 by quoting PEP8? That's hard to understand.
Whose karma am I "trashing", and how? I certainly didn't downvote him, if
that's what you mean.

------
emerged
My introduction to Bayesian probability was accidentally reinventing it while
trying to invent my own AI system. It naturally followed by constructing a
network of information which could be queried to get back whatever had been
fed into it and perform deduction/induction.

------
nafizh
previous discussion:

[https://news.ycombinator.com/item?id=4634843](https://news.ycombinator.com/item?id=4634843)

~~~
stablemap
With some comments from the author. Hopefully he’ll see this too.

------
platz
I was surprised to learn about the 'Likelihoodist' approach, an interpretation
of Bayes that avoids ambiguities in choosing a prior.

An Introduction to Likelihoodist, Bayesian, and Frequentist Methods

[http://gandenberger.org/2014/07/28/intro-to-statistical-met...](http://gandenberger.org/2014/07/28/intro-to-statistical-methods-2/)

------
folksinger
Let's take a recent election as an example:

A Bayesian pollster began with a certain set of prior probabilities. That the
college educated were more likely to vote in previous elections, for example,
informed the sample population, because it wouldn't make much sense to ask the
opinions of those who would stay home.

Thus, based on priors that were updated with new empirical data, a new set of
probabilities emerged, that gave a certain candidate a high probability of
victory.

Members of the voting public, aware of this high probability, decided that
this meant with certainty that this candidate would win and therefore decided
to stay home on election day.

In reality the Bayesian models were incorrect, as, among other factors, a much
higher number of non-college-educated individuals decided to vote, and to vote
for the other candidate.

As it is with Bayesian intelligence, shared as much by pollsters as machine
learning algorithms:

    
    
      Real-time heads up display
      Keeps the danger away
      But only for the things that already ruined your day.

~~~
clircle
I suppose you aren't talking about Andrew Gelman...
[https://www.nytimes.com/interactive/2016/09/20/upshot/the-er...](https://www.nytimes.com/interactive/2016/09/20/upshot/the-error-the-polling-world-rarely-talks-about.html)

~~~
folksinger
You mean the same Andrew Gelman who did not predict the election of Trump and
took the time to reflect on the issues with polling methodology?

[http://andrewgelman.com/2016/12/08/19-things-learned-2016-el...](http://andrewgelman.com/2016/12/08/19-things-learned-2016-election/)

I'll have to write another poem about pithy rebuttals that cherry-pick a
counter-narrative!

Now, what rhymes with anecdotal...

------
raister
It should also have a companion called "Think Markov Chain Monte Carlo" - even
the simplest references are intractable, and others begin very simply and end
up incomprehensible enough to put one off the subject altogether.

