

May Bayes Theorem Be with You - astrobiased
http://technology.stitchfix.com/blog/2015/02/12/may-bayes-theorem-be-with-you/

======
howlin
This is an ok overview, but it's missing a key point. Classical statistics can
be viewed as treating the statistic of interest as fixed but unknown. However,
it's better thought of this way: the parameter is fixed, but set in whatever
way will be most troublesome for your estimation procedure. Statistical
analysis (when done carefully) represents a conservative view of how likely
you are to be mistaken.

Bayesian statistics abandons this worst-case approach, instead opting for an
average-case analysis. Here, we average over all the possibilities, weighting
their relative merit by the prior. The analysis is always a little bit
conservative (hence the connection to regularization), but it is never
"worst case" in the way that classical statistics is.

Much of the other discussion in this article is not really about classical vs.
Bayesian statistics at all. Both methodologies are perfectly happy working
with more complicated, hierarchical models. Both approaches have plenty of
work dealing with regularization, and both will suffer if you mis-specify your
model. The fact that Bayesian analysis is less likely to "crash" in a case of
model mis-specification can be thought of as just as much of a drawback as it
is a benefit.

~~~
wfunction
Could you explain in what sense classical statistics is actually a _worst-
case_ view? That is, can you link to a page, or explain explicitly (e.g.,
using a min() operation), how classical statistics performs a worst-case
analysis? What scenarios is it actually worst-case with respect to?

It's always seemed to me that the differences I see, such as in ML vs. MAP,
are not due to a question of worst-case versus average-case analyses, but
rather no-prior-knowledge versus yes-prior-knowledge analyses. I've never seen
a proof that classical statistics gives a worst-case bound.
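
A tiny sketch of what I mean by the ML-vs-MAP difference (Python, hypothetical
coin-flip numbers): with a flat prior the two estimates coincide, so what MAP
adds looks like prior knowledge rather than a different worst/average-case
stance.

    # Hypothetical coin-flip data: 7 heads in 10 flips.
    k, n = 7, 10

    p_ml = k / n                              # maximum likelihood: 0.70

    # MAP with a Beta(a, b) prior: the mode of the Beta(k + a, n - k + b) posterior.
    a, b = 2, 2                               # assumed mild prior pulling toward 0.5
    p_map = (k + a - 1) / (n + a + b - 2)     # 8/12 = 0.67

    # With a flat Beta(1, 1) prior, p_map reduces to k / n, i.e. the ML estimate.
    print(p_ml, p_map)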

~~~
howlin
Maximum likelihood is literally the simplest thing classical statistics will
tell you. No serious statistician will regard it as meaningful without at
least some indication of sample size, a confidence interval, or the like.

The worst-case nature of classical/frequentist statistics is very well exposed
in PAC (probably approximately correct) analysis:

[http://en.wikipedia.org/wiki/Probably_approximately_correct_...](http://en.wikipedia.org/wiki/Probably_approximately_correct_learning)
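
As a rough illustration of the worst-case flavor, here is a Hoeffding-style
PAC bound (a Python sketch with illustrative numbers): the sample size it
prescribes guarantees the stated accuracy for _every_ data distribution, i.e.
even the most troublesome one.

    import math

    # PAC-style bound for a finite hypothesis class H: with probability >= 1 - delta,
    # every hypothesis's true error is within eps of its training error, provided
    # m >= ln(2|H| / delta) / (2 * eps**2). No prior over distributions is assumed;
    # the guarantee holds under the worst-case distribution.
    def pac_sample_size(eps, delta, num_hypotheses):
        return math.ceil(math.log(2 * num_hypotheses / delta) / (2 * eps ** 2))

    print(pac_sample_size(eps=0.05, delta=0.01, num_hypotheses=1000))  # 2442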

------
pdonis
I particularly like the way the key difference between frequentist and
Bayesian statistics is described: in the frequentist model, the parameter is
fixed and the data is random, while in the Bayesian model, the data is fixed
and the parameter is random. Since, as the author points out, the data is what
is fixed in real life, the Bayesian model intuitively makes more sense.
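
To make the contrast concrete, a small illustrative sketch (Python, made-up
binomial data): the frequentist interval is a statement about a procedure
under repeated sampling of the data, while the Bayesian interval is a
probability statement about the parameter given the data at hand.

    import numpy as np
    from scipy import stats

    k, n = 37, 100                    # hypothetical data: 37 successes in 100 trials
    p_hat = k / n

    # Frequentist reading: p is fixed, the data are random. A 95% Wald interval is
    # a procedure that covers the true p in ~95% of repeated samples.
    se = np.sqrt(p_hat * (1 - p_hat) / n)
    wald = (p_hat - 1.96 * se, p_hat + 1.96 * se)

    # Bayesian reading: the data are fixed, p is random. With a Beta(1, 1) prior
    # the posterior is Beta(k + 1, n - k + 1); the credible interval is a direct
    # probability statement about p given these data.
    credible = stats.beta(k + 1, n - k + 1).ppf([0.025, 0.975])

    print(wald, credible)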

~~~
quacker
_Since, as the author points out, the data is what is fixed in real life, the
Bayesian model intuitively makes more sense._

In real life, both the parameter value and the data are fixed. If you knew the
heights of all the humans on earth, you could compute the _exact_ mean human
height (at the time). This mean is the _true_ mean; it's the parameter value
we're looking for, and it's absolutely fixed.

~~~
kblarsen4
But you will never know the truth. So it seems more practical to assume that
the parameter follows some distribution and that the data at hand are fixed.
That's just a more logical approach to gauging uncertainty, in my opinion.
But this is not a "Frequentist versus Bayesian" post; I think the key is to
use whatever is most fitting for the analysis at hand.

~~~
quacker
You'll never know all of the data either. You don't need confidence/credible
intervals if you can sample the entire population.

Maybe one is more intuitive than the other, but I'm unswayed by the "one is
fixed in real life" argument.

~~~
pdonis
_> You'll never know all of the data either._

"The data is fixed" refers only to the data we know. (Otherwise, as you point
out, there would be no issue since we would simply calculate the population
statistics directly.) In the frequentist model, we have to pretend that this
fixed data is actually a random distribution in order to calculate the
probability we're interested in. In the Bayesian model, we just combine the
known data with the prior to get the posterior probability; we don't have to
pretend anything.

~~~
quacker
> _" The data is fixed" refers only to the data we know._

Okay, yeah. The _sample_ is fixed. The entire _population_ is random. I'll use
this terminology.

> _In the frequentist model, we have to pretend that this fixed data is
> actually a random distribution_

With confidence intervals, we're modelling the sampling process as random: the
fixed sample's mean and variance are used to estimate the population's mean
and the uncertainty around that estimate. I think this is different from
"pretending the sample is random".

If you just wanted to describe the sample, you would use the sample's mean and
standard deviation. But to draw inferences about the population mean, you use
the sample's standard _error_, which estimates the standard deviation of the
sampling distribution of the mean rather than the spread of the data itself.
This is critical: you have to account for the fact that your sample is only a
small slice of a much larger population.
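
A quick sketch of the distinction (Python, simulated heights): the sample
standard deviation describes the spread of individual observations, while the
standard error is what drives the confidence interval for the mean.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    sample = rng.normal(loc=170, scale=10, size=50)   # hypothetical heights (cm)

    sd = sample.std(ddof=1)               # spread of individual observations
    se = sd / np.sqrt(len(sample))        # uncertainty in the estimated mean

    # A 95% t-interval for the population mean uses the standard *error*.
    t = stats.t.ppf(0.975, df=len(sample) - 1)
    ci = (sample.mean() - t * se, sample.mean() + t * se)
    print(f"sd={sd:.1f}  se={se:.1f}  95% CI for the mean: ({ci[0]:.1f}, {ci[1]:.1f})")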

------
kriro
I will ask a possibly naive question but...can anyone recommend a good "doing
empirical science the Bayesian way" or "Bayesian quantitative methods" type of
book? I'm mostly looking for something that can "replace those p-values for
undergrads". Use of R, Python (or PSPP/SPSS if need be) would be appreciated.

My knowledge is pretty limited; I have heard of BEST but happily t-test away
on a daily basis. Essentially what I'm looking for is an "I know t-tests and
ANOVA and use them regularly... how would I switch all that to a Bayesian
approach" kind of guide.

Does the book I'm looking for even exist, or would my best bet be reading the
BEST paper (and the author's website/YouTube video)? Edit: It looks like the
second edition of "Doing Bayesian Data Analysis" by Kruschke would fit the
bill. It has a dedicated chapter on null hypothesis testing (vs. MCMC).

~~~
martingoodson
"Data Analysis: A Bayesian Tutorial" by Devinderjit Sivia and John Skilling.
I prefer the (shorter) 1st edition.

------
csirac2
"However, all practitioners in data science and statistics would benefit from
integrating Bayesian techniques into their arsenal" \- that this wasn't
already the case is news to me. Perhaps bioinf is less mainstream than I
thought.

------
timfrietas
If one wanted to learn more about the math behind this, especially when they
get into the "Bayesian Regression is a Shrinkage Estimator" section, what's a
good place to start?

~~~
kblarsen4
The book by Peter Rossi et al that is referenced in this post is a good start
if you want the Bayesian approach to shrinkage.
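
In the meantime, a minimal sketch of the connection (Python, simulated data):
with a zero-mean Gaussian prior on the coefficients, the posterior mode of a
Bayesian linear regression is the ridge estimator, so the coefficients are
shrunk toward zero relative to OLS.

    import numpy as np

    rng = np.random.default_rng(1)
    n, p = 50, 5
    X = rng.normal(size=(n, p))
    beta_true = np.array([2.0, 0.0, -1.0, 0.5, 0.0])
    y = X @ beta_true + rng.normal(scale=1.0, size=n)

    # OLS / maximum likelihood
    beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

    # Bayesian regression with a N(0, tau2 * I) prior on the coefficients:
    # the MAP estimate is ridge regression with lambda = sigma2 / tau2.
    sigma2, tau2 = 1.0, 0.5            # assumed noise and prior variances
    lam = sigma2 / tau2
    beta_map = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

    print(np.round(beta_ols, 2))       # unshrunk estimates
    print(np.round(beta_map, 2))       # shrunk toward the prior mean of zero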

------
jrgnsd
The clean and crisp look of the site is stunning. The fact that there's no
busy sidebar cramping up the content makes a huge difference.

Does anyone know if it's an available theme, or if it was custom built?

------
tomphoolery
Are Bayesian classifiers useful with any size data set, or is there a
"threshold" amount of data that you need in order for Bayesian classifiers to
be useful/work/be effective?

~~~
kblarsen4
It is also worth mentioning that Bayesian classifiers, like Naive Bayes, are
different from the type of Bayesian regression models described in this post.

Naive Bayes, for example, is more of a "machine learning" technique where the
goal is to classify people into groups based on features. It is called Naive
because it assumes that all regressors (x_j) are independent given the target
variable (let's call it y and assume it is binary). In other words, the
conditional log odds of y=1 given the x_j variables equal the prior log odds
plus the sum of the log density ratios, where the log density ratio for
variable x_j is ln(f(x_j|y=1)/f(x_j|y=0)).
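
To make that decomposition concrete, here is a toy Gaussian Naive Bayes sketch
(Python, made-up parameters): the posterior log odds of y=1 are the prior log
odds plus one log density ratio per feature.

    import numpy as np
    from scipy import stats

    # Assumed class priors and Gaussian class-conditional densities per feature.
    prior_1, prior_0 = 0.3, 0.7
    means_1, means_0 = np.array([1.0, 2.0]), np.array([0.0, 0.0])
    sds = np.array([1.0, 1.5])                # shared per-feature sds (assumed)

    x = np.array([0.8, 1.1])                  # one observation with two features

    # Naive Bayes: log odds = prior log odds + sum over features of log density ratios.
    log_odds = np.log(prior_1 / prior_0) + np.sum(
        stats.norm.logpdf(x, means_1, sds) - stats.norm.logpdf(x, means_0, sds)
    )
    print("P(y=1 | x) =", 1.0 / (1.0 + np.exp(-log_odds)))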

On the other hand, in the price elasticity example described in the post, we
want to infuse outside knowledge into the model because we don't believe what
it says on its own. This is a situation where interpretation and believability
are an important part of the objective, because we will be running future
pricing scenarios from the model.

If you are building, say, a churn model to predict who is going to cancel
their account, you probably wouldn't infuse your model with outside knowledge,
since cross-validation accuracy is your main goal. You might regularize your
model, however, which can be done in a number of ways (Bayesian or non-
Bayesian). But for a pricing model, a media mix model, and many other cases,
the use case above is very real.

I suggest reading the “Elements of Statistical Learning” by Hastie,
Tibshirani, et al.

------
Yadi
This is so good and useful! Thanks for sharing!

I'm just picking up Machine Learning these days and this is a good Bayesian
intro.

------
GoldenHomer
Oh god, all that studying for the actuarial exams is coming back to me like a
nightmare.

~~~
IndianAstronaut
I never went down the actuarial path, but studying for exam P helped me ace
data analyst interviews I had.

