
Frequentists should more often consider using Bayesian methods - jwb133
http://thestatsgeek.com/2016/11/18/frequentists-should-more-often-consider-using-bayesian-methods/
======
kem
I'm getting frustrated by the Bayesian train at the moment, as its drawbacks
get glossed over. "Oh yeah, there's priors, but they're not important for X, Y
and Z reasons."

In large samples, the frequentist and Bayesian methods are the same, so then
why does it matter? In small samples, the prior becomes significant, so why
use it if it shapes the estimates depending on what you use? You could turn
these arguments for Bayesianism on their head.

I'm actually neither Bayesian nor frequentist, or both, depending on how you
look at it, but I think Bayesianism is being overhyped. Sure, the prior is
part of the model, but if you can estimate something without adding extra
baggage, why not?

Imagine doing a meta-analysis, and now having all the extra heterogeneity due
to priors. Why add that?

My guess is a lot of the appeal of Bayesianism has to do with the success of
the machinery surrounding it, like MCMC, which is sort of automatic and has
certain other appeals. As people realize you can do stochastic optimization
with raw ML inference, some of the appeal will probably dissipate a bit
(although not entirely).

~~~
davmre
If the inferential question you're interested in is, "given data X, what do I
conclude about underlying cause/variable/parameter T?", then you are a
Bayesian, like it or not.* Sure you can define a likelihood function p(X|T),
but that doesn't give you p(T|X) unless you multiply by a prior p(T).

Now certainly p(T|X) is not always the question, but in the _vast majority_ of
cases people do want to use data to draw conclusions, and will misinterpret
likelihood-based confidence intervals as posterior credible intervals because
the latter are what they intuitively wanted. By doing so they implicitly
assume a flat prior, regardless of whether that is reasonable or even
mathematically coherent for the problem in question.

The Bayesian argument is not that you have to use an informative prior (though
often this can be very helpful!), but that since some sort of prior is
mathematically necessary to answer the questions that people intuitively want
statistics to answer, we should make that explicit and try to understand how
the prior affects our conclusions, not just sweep it under the rug.

* If the question you're interested in is, "if I run some method in many repeated trials, how often will it identify the true parameter?" then you are a frequentist. I think it's much rarer for this to _genuinely_ be someone's intuitive question, but it's certainly valid. And of course it's valid to ask both questions at once, in which case you might end up analyzing the frequentist properties of a Bayes-derived method.
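
To make the point concrete, here is a small sketch (my own toy example, not from the comment): with the same binomial likelihood, two different priors over a coin's heads probability T yield different posteriors p(T|X), which is exactly why the likelihood alone cannot answer "what do I conclude about T?".

```python
from math import comb

import numpy as np

def likelihood(theta, heads=8, n=10):
    # p(X | T): binomial probability of 8 heads in 10 flips
    return comb(n, heads) * theta**heads * (1 - theta)**(n - heads)

# Two candidate hypotheses about the coin
thetas = [0.5, 0.9]
lik = np.array([likelihood(t) for t in thetas])

# The likelihood is not a distribution over T; to get p(T | X) we must
# multiply by a prior p(T) and normalize.
flat = np.array([0.5, 0.5])        # implicit assumption of many analyses
skeptical = np.array([0.99, 0.01])  # e.g. "most coins are fair"

post_flat = lik * flat / np.sum(lik * flat)
post_skeptical = lik * skeptical / np.sum(lik * skeptical)
# Under the flat prior the biased coin wins; under the skeptical prior
# the fair coin does, from the same data.
```

Same data, same likelihood, opposite conclusions: the prior is doing real work whether or not it is written down.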

~~~
Retric
No, if you are drawing conclusions from _only_ the data presented, you are
not doing Bayesian inference. Further, there are more than two options.

~~~
j-pb
Have you heard of the universal prior? It allows you to do exactly that with
Bayesian statistics.

Although it is so esoteric that one might argue it belongs more to the field
of algorithmic statistics, which merely employs Bayesian methods.

[https://en.m.wikipedia.org/wiki/Algorithmic_probability](https://en.m.wikipedia.org/wiki/Algorithmic_probability)
[https://en.m.wikipedia.org/wiki/Solomonoff%27s_theory_of_ind...](https://en.m.wikipedia.org/wiki/Solomonoff%27s_theory_of_inductive_inference)

------
dharmon
Maybe it's because my formal math training is not in probability and
statistics, but it's so bizarre to me that in a technical situation people
would let a philosophical position dictate their approach rather than best
tools for the job.

Sometimes I'll solve a math problem analytically, and sometimes it's easier to
do it numerically. But it would be foolish for me to take a hardline stance on
one vs. the other. Rather, I am a more well-rounded, and thus more capable,
technician because I know the benefits and weaknesses of each approach.

If you have small-to-medium sample sizes, then clearly Fisher style statistics
will not work very well. On the other hand, even if you have a small sample
size, if you don't have some general knowledge to guide your priors, you may
very well end up with garbage in a Bayesian approach.

~~~
murbard2
How do you decide which tool is the best for the job?

~~~
kobeya
The one that yields the simpler solution.

~~~
murbard2
How do you know the solution is any good? I'm not going to go all Socrates on
you so I'll jump to my point: when discussing bayesian statistics, we're
touching something very profound about epistemology that can't be swept under
the rug in the name of pragmatism. We're dealing with the core philosophical
underpinnings of what it means to "know" something.

------
johnmyleswhite
This argument would seem to assume that a frequentist is required to use
maximum likelihood estimation for all problems.

If you remove that assumption, the distinction between Bayesian and
frequentist methods becomes murkier. If you ultimately want a point estimate,
you don't particularly care whether a method constructs a posterior
distribution as an intermediate step.

------
SubiculumCode
What are the Bayesian methods I would use if I have multiple related outcomes
and multiple predictors...you know...the situation in which many scientists
find themselves with longitudinal data and typically turn to repeated measures
and random effect general linear models?

~~~
kgwgk
Maybe this helps:
[https://arxiv.org/abs/1506.06201](https://arxiv.org/abs/1506.06201)

~~~
SubiculumCode
Thanks. You're the first ever to reply to that question when I pose it.

------
credit_guy
Many people learn about the philosophy of the Bayesian estimation and fall in
love with it, or at least that happened to me.

I thought that only in the Bayesian formulation of statistics do the estimated
parameters (mean, standard deviation, kurtosis, percentiles, whatever) remain
uncertain after the estimation, and therefore have a distribution; I didn't
know that in the frequentist interpretation what you calculate are actually
estimators, which are random variables and therefore have uncertainty in them.
Silly me.

One day, reading about estimators on stackexchange, I was led to a quote from
the "Elements of Statistical Learning" ([1], p. 272): "we might think of the
bootstrap distribution as a poor man's Bayesian posterior".

So, I put off learning all the details of Bayesian estimation, and I started
using the bootstrap to deal with my problems. I recommend everyone give the
bootstrap method a try before they go all in for Bayesian estimation.

Once you get familiar with the bootstrap, you might feel that actually
Bayesian estimation might be overkill.
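
The bootstrap the comment recommends can be sketched in a few lines (a minimal example with made-up data, not anything from the thread): resample the data with replacement, recompute the statistic each time, and read the estimator's uncertainty off the spread of the replicates.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=50)  # hypothetical sample

# Nonparametric bootstrap: resample with replacement, recompute the
# statistic; the replicates approximate the sampling distribution.
boot_means = np.array([
    rng.choice(data, size=len(data), replace=True).mean()
    for _ in range(5000)
])

# Percentile 95% interval for the mean
ci = np.percentile(boot_means, [2.5, 97.5])
```

This distribution of `boot_means` is what the Elements of Statistical Learning quote above calls a "poor man's Bayesian posterior": you get an uncertainty distribution without specifying a prior.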

But you shouldn't stop there. With a little more contemplation, you might
start doubting the whole Bayesian edifice. Let me tell you why.

Here's a quote from a book [2] "Bayesian Risk Management" that,
unsurprisingly, given the title, extols the virtues of the Bayesian framework:
"If the data are consistent with our prior estimates, the location of the
parameters will be little changed and the variance of the posterior
distribution will shrink. If the data are surprising given our prior
estimates, the variance will increase and the location will migrate" (p. 11).
This is the common view of Bayesian estimation, and it's wrong. To give you an
example: if you do Bayesian estimation for the mean of a random sample assumed
to come from a normal distribution with a given standard deviation, then the
variance of that mean keeps going down with each observation, regardless of
the value of the observation (check the first entry in [3]).
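
You can verify this directly with the conjugate update for a normal mean with known observation variance (the first entry in [3]); the numbers below are illustrative, not from the comment. The posterior variance depends only on the sample size, so "surprising" data shrink it exactly as much as "consistent" data.

```python
import numpy as np

def normal_posterior(prior_mean, prior_var, data, sigma2):
    # Conjugate update for the mean of N(theta, sigma2) with known sigma2
    n = len(data)
    post_var = 1.0 / (1.0 / prior_var + n / sigma2)
    post_mean = post_var * (prior_mean / prior_var + np.sum(data) / sigma2)
    return post_mean, post_var

consistent = np.zeros(10)        # data agreeing with the prior mean of 0
surprising = np.full(10, 100.0)  # data wildly inconsistent with it

m1, v1 = normal_posterior(0.0, 1.0, consistent, sigma2=1.0)
m2, v2 = normal_posterior(0.0, 1.0, surprising, sigma2=1.0)
# v1 == v2: the posterior variance is 1/(1/1 + 10/1) = 1/11 in both
# cases; only the posterior mean moves.
```

The location migrates (m2 is near 91, m1 is 0), but the variance shrinks identically in both cases, contradicting the book's claim that surprising data inflate it.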

But that shouldn't shatter your hope for a better world. Bayesian estimation
is bound to produce the wrong result if you assume the wrong model, regardless
of what prior you use. So you discover that Bayesian estimation doesn't
absolve you of the responsibility of choosing a good model (which is another
commonly held belief). But then, if you do need to carefully choose a model,
why do you need Bayesian methods after all?

I am not sure. Let me kick Bayesian while it's down a bit more, before I start
defending it.

If you ever get curious about Kalman filter estimation, a good book to use is
Durbin and Koopman [4]. In the first 20 pages you will learn that in the
simplest setting (local level model), the Kalman filter gives exactly the same
results in the frequentist and bayesian interpretation. Food for thought.

Here's a quote from Efron & Hastie's "Computer Age Statistical Inference" [5]:
"Computer-age statistical inference at its most successful combines elements
of the two philosophies, as for instance the empirical Bayes methods in
Chapter 6, and the lasso in Chapter 16. There are two arrows in the
statistician's philosophical quiver, and faced, say, with 1000 parameters and
1,000,000 data points, there's no need to go hunting armed with just one of
them."

Now, I did say I'd come to the defense of Bayesian methods. To my knowledge,
the Markov chain Monte Carlo method was developed only in the Bayesian
setting. I do believe a frequentist interpretation is entirely possible, but
so far nobody has offered one. But unless you need to use MCMC, I don't really
see a need to go Bayesian when the bootstrap works perfectly fine.
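
For readers who haven't met MCMC: the classic random-walk Metropolis algorithm is only a few lines. The sketch below (my own minimal example, with a standard normal as the target) shows the key appeal in the Bayesian setting: acceptance depends only on a ratio of densities, so the posterior's intractable normalizing constant never needs to be computed.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_target(theta):
    # Unnormalized log-density; here a standard normal for illustration.
    # In Bayesian use this would be log-prior + log-likelihood.
    return -0.5 * theta**2

# Random-walk Metropolis: propose a jitter around the current point and
# accept with probability min(1, p(proposal) / p(current)).
samples, theta = [], 0.0
for _ in range(20000):
    proposal = theta + rng.normal(scale=1.0)
    if np.log(rng.uniform()) < log_target(proposal) - log_target(theta):
        theta = proposal
    samples.append(theta)

samples = np.array(samples[2000:])  # discard burn-in
# The retained samples approximate draws from the target distribution.
```

Nothing in the algorithm is inherently Bayesian; it samples from any density known up to a constant, which is consistent with the comment's point that a frequentist reading should be possible.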

[1]
[http://statweb.stanford.edu/~tibs/ElemStatLearn/](http://statweb.stanford.edu/~tibs/ElemStatLearn/)
[2]
[http://www.wiley.com/WileyCDA/WileyTitle/productCd-111870860...](http://www.wiley.com/WileyCDA/WileyTitle/productCd-1118708601.html)
[3]
[https://en.wikipedia.org/wiki/Conjugate_prior#Continuous_dis...](https://en.wikipedia.org/wiki/Conjugate_prior#Continuous_distributions)
[4]
[https://books.google.com/books/about/Time_Series_Analysis_by...](https://books.google.com/books/about/Time_Series_Analysis_by_State_Space_Meth.html?id=XRCu5iSz_HwC)
[5]
[https://web.stanford.edu/~hastie/CASI/](https://web.stanford.edu/~hastie/CASI/)

------
tbrownaw
... _why_ does the order you do your math in have to be a part of your
identity?

~~~
davidgerard
tl;dr LessWrong.

~~~
TillE
Yudkowsky has done such a stellar job in making some fairly simple ideas look
like a weird cult.

~~~
loup-vaillant
He explained stuff plainly, using references (sci-fi, Japanese manga and
anime) that are popular among a fairly restricted set of people. It's not
academic, and it's not all-inclusive.

Of course it will sound like a weird cult. I'm not sure why anybody would have
any problem with that.

~~~
vertex-four
Mostly that some of us have had very poor experiences with them while trying
to mind our own business. It's like living next to a cult's compound, and
therefore having more concrete bad experiences with them than someone in the
next city.

In my case, an IRC channel that I previously enjoyed (relationship-related)
was unofficially absorbed into LessWrong's network of channels. This resulted
in people treating it as a meat market despite being asked not to, then a
debate platform despite being asked not to; some denied the existence of
trans people to their faces because they couldn't logic it out; we had one
argue in favour of cheating in a channel that explicitly mentioned openness
and honesty in the topic, they damn near had a riot if we ever dared ban one
of them for being spectacularly awful, and they generally put the value of
other people's emotions at close to 0 in their messed-up logic for interacting
with the world. This carried on for months. Apparently after I left most of
them were finally banned, though not before a good many of the less awful
people in the channel left for good. This was the first major issue the
channel had in four years of its existence.

More generally, many people don't enjoy conflict, and the LessWrong way of
discussing anything seems to be to turn it into a two-sided debate. Even when
you're trying to discuss who you think you are, they turn it into you
defending yourself from attack. The concept of having a constructive
conversation seems to have entirely slipped past them, never mind the idea
that not everything has to be debated right then and there.

Unfortunately, this isn't just my experience - various others I've spoken to
have come across exactly the same sort of thing. Hence the extreme dislike
from some people. I'm not sure whether the community explicitly pushes this
way of interacting with the world, or whether it just attracts assholes and
gives them a "logical" reason to be assholes, though.

~~~
loup-vaillant
Well… I've mainly limited myself with LessWrong top-level posts so far, so I
haven't observed those problems first hand.

I may even have been part of the problem for a while, posting semi-relevant LW
links all over the place. I have since learned to focus my arguments into
something that doesn't require having read the sequences. Maybe no longer
frequenting LW helped.

