
A Fervent Defense of Frequentist Statistics - noelwelsh
http://lesswrong.com/lw/jne/a_fervent_defense_of_frequentist_statistics/
======
czr80
I find this discussion (Bayesian vs Frequentist) somewhat baffling - it's as
if physicists spent time arguing whether Lagrangian, Hamiltonian or Newtonian
mechanics was "correct". In reality all three are equivalent, but different
problems might be more naturally expressed (and easily solved) in one or the
other form.

Similarly, Bayesian and frequentist descriptions are mathematically
equivalent; the only question is which is the more natural description for a
given problem.

Having multiple models like this is incredibly useful - some things that are
intractable in one might be trivial in another - but trying to anoint one
model as "the truth" seems perverse.

~~~
Strilanc
My impression was that frequentists and bayesians do actually disagree in some
cases.

The example I remember is, suppose you have a coin with a uniformly random
bias, except that the bias will not be zero. What is the probability that you
will see heads if you flip it once?

- Bayesian: 50%

- Frequentist: Anything except 50%

Although you'd probably find they agreed on all the "what do I expect to see"
stuff, so I guess that kind of is like interpretations of QM.
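The 50% answer can be checked numerically; a quick Monte Carlo sketch (my own, not from the comment — note a continuous draw of the bias is never exactly 0, so the "bias will not be zero" condition costs nothing):

```python
import random

# Monte Carlo check of the coin example above: draw the bias uniformly
# on (0, 1), flip the coin once with that bias, and repeat.
# Marginalizing the flip over the uniform prior gives P(heads) = 1/2.
random.seed(0)
trials = 100_000
heads = 0
for _ in range(trials):
    bias = random.random()       # uniform bias in [0, 1)
    if random.random() < bias:   # one flip with that bias
        heads += 1
print(heads / trials)  # close to 0.5
```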

~~~
pseut
That's almost a straw-man argument. When you're making statements about pure
probabilities, _everyone_ is a bayesian. Most toy examples make "frequentists"
look like morons, because essentially all of the judgement and discretion is
removed from the problem, so you'd have to be an idiot not to apply Bayes's
rule.

The difference shows up when you actually have an interesting data set to
analyze. Bayesian statistics can disagree with frequentist stats in small
samples because they're (often) using different normalization strategies; and
they can disagree in large samples where the CLT fails. There may be other
settings where they diverge too that I'm not aware of. But neither of those
scenarios is one where insisting "I'm a Bayesian, so the answer is blah" or
"I'm a frequentist, so... blah blah" is likely to be a good strategy. Those
are the settings where it's hard.
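As a deliberately tiny instance of the small-sample disagreement, consider estimating a coin's heads probability after two flips that both came up heads (the numbers here are my own illustration, not from the comment):

```python
# With 2 heads in 2 flips, the frequentist maximum-likelihood estimate
# of the heads probability is 1.0, while a Bayesian with a uniform
# Beta(1, 1) prior reports a posterior mean of 3/4 (Laplace's rule of
# succession).  The gap shrinks as the sample grows.
heads, flips = 2, 2

mle = heads / flips                         # frequentist point estimate
posterior_mean = (heads + 1) / (flips + 2)  # Beta(1,1) prior -> Beta(3,1) posterior

print(mle, posterior_mean)  # 1.0 vs 0.75
```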

~~~
delluminatus
If an example can illustrate a difference between two formal methodologies,
the example should be as simple as possible, right? It doesn't paint one as
superior to the other, but simply highlights the different methods by which
they assign probabilities to events. If you make the example more complicated,
the difference becomes less clear.

~~~
pseut
Right, but my point is that "as simple as possible" in this case is still
"very complicated." It's sort of like trying to use "Hello world" to explain
the difference between static and dynamic typing.

------
tansey
I'm not sure why the author picks Myth #5 as the big pro-frequentist point.
The "online learning" problem he's describing is a simple multi-armed bandit
problem. The magical frequentist algorithm that he's touting is UCB-1. It's a
great algorithm with the finite-time optimality guarantees he mentions.
However, it still has a tuning parameter that can matter quite a bit in
practice. Also, the Bayesian approach to MABs is Thompson sampling, which is
also finite-time optimal. I guess I don't get the big deal on this point.
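For reference, Thompson sampling itself is only a few lines for Bernoulli arms; this is a generic textbook sketch with made-up arm payoffs, not code from the article:

```python
import random

# Thompson sampling for a Bernoulli bandit: keep a Beta posterior per
# arm, sample one value from each posterior, and pull the arm whose
# sample is largest.  The true arm means below are hypothetical.
random.seed(1)
true_means = [0.3, 0.5, 0.7]
wins = [1, 1, 1]     # Beta(1, 1) priors: alpha counts
losses = [1, 1, 1]   # ...and beta counts

pulls = [0, 0, 0]
for _ in range(5000):
    samples = [random.betavariate(w, l) for w, l in zip(wins, losses)]
    arm = samples.index(max(samples))
    reward = 1 if random.random() < true_means[arm] else 0
    wins[arm] += reward
    losses[arm] += 1 - reward
    pulls[arm] += 1

print(pulls)  # the 0.7 arm should dominate
```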

From a machine learning perspective, frequentist methods are great because
they scale. All of these fancy nonparametric Bayes methods coming out these
days take weeks to run and include so many approximations that you've got no
idea whether what you're actually inferring is truly similar to the original
model.

The biggest problem with frequentist methods is that they are conceptually odd
in their approach. Bayesians take the view that you should fix the data and
integrate over the possible parameters. Frequentists take the view that you
should fix the parameters and integrate over the data. That is kind of weird
to me.

So you either start out conceptually pure as a Bayesian and compromise to the
point of meaninglessness or you abandon your morals at the start as a
frequentist and things go smoothly from there.

__Edit__: As noted below, it's actually Exp3 rather than UCB-1. I glossed
over the part about potentially adversarial bandits.

~~~
noelwelsh
Actually it's not UCB-1. UCB-1 solves the stochastic bandit problem, where you
assume the arms have fixed but unknown expected reward. He's talking about the
adversarial setting, where you make _no assumptions_ about the distribution of
the rewards of the arms. They can even be set by an adversary that is trying
to make you perform badly. Remarkably you can derive results here, though the
"catch" is you are measuring performance against choosing a fixed arm for
every play. (It's actually a bit more complex than that, but that's a
reasonable simplification for direct comparison to the stochastic bandit.) The
standard algorithm in the adversarial setting is Exp3:
[http://cseweb.ucsd.edu/~yfreund/papers/bandits.pdf](http://cseweb.ucsd.edu/~yfreund/papers/bandits.pdf)
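A rough sketch of Exp3 (my own condensation; see the linked paper for the exact algorithm and the theory behind the choice of gamma). Rewards are assumed to lie in [0, 1], and a fixed reward table stands in for the adversary purely for illustration:

```python
import math
import random

random.seed(2)
K = 3            # number of arms
gamma = 0.1      # exploration parameter (see the paper for tuning)
weights = [1.0] * K

def exp3_pick():
    """Sample an arm from the Exp3 distribution; return (arm, prob)."""
    total = sum(weights)
    probs = [(1 - gamma) * w / total + gamma / K for w in weights]
    r, cum = random.random(), 0.0
    for arm, p in enumerate(probs):
        cum += p
        if r < cum:
            return arm, p
    return K - 1, probs[-1]

reward_of = [0.2, 0.5, 0.8]  # illustrative "adversary"; real Exp3 makes no such assumption
pulls = [0] * K
for _ in range(5000):
    arm, p = exp3_pick()
    x = reward_of[arm]
    # Importance-weighted reward estimate, then the exponential update.
    weights[arm] *= math.exp(gamma * (x / p) / K)
    m = max(weights)
    weights = [w / m for w in weights]  # renormalize to avoid overflow
    pulls[arm] += 1
print(pulls)  # the highest-reward arm should get most of the pulls
```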

Otherwise I agree for the most part with what you've said.

Edit: In response to the parent edit, I should note that I know of no Bayesian
algorithm for the adversarial setting. Thompson sampling is assuming a
stochastic problem. I think a Bayesian algorithm would be possible, but the
details elude me.

~~~
tansey
Same here. I was thinking after my edit that I don't know of such an algorithm
either, but I don't think in principle that there is anything stopping someone
from coming up with one.

So... who wants a NIPS paper? Noel and I are happy to just be 2nd authors,
since you know... it was our idea and everything. ;)

~~~
noelwelsh
Hell yeah. First author gets to do the poster session. I'll do the skiing. ;-)

------
blueblob
Do people really argue against frequentist statistics? I was under the
impression that in bayesian statistics you basically are just weighting your
distribution by a prior, usually because there is not enough data yet to use
robust frequentist methods. Put another way, I was under the impression that
bayesian statistics are a way of solving the cold start problem[1].
Effectively you are using frequentist statistics in bayesian inference
anyways. Please correct me if I am wrong.

[1]
[http://en.wikipedia.org/wiki/Cold_start](http://en.wikipedia.org/wiki/Cold_start)

------
ajtulloch
Michael Jordan's talk "Are You a Bayesian or a Frequentist?" is one of the
better treatments of this topic IMO.

(talk):
[http://videolectures.net/mlss09uk_jordan_bfway/](http://videolectures.net/mlss09uk_jordan_bfway/)

(slides):
[http://mlg.eng.cam.ac.uk/mlss09/mlss_slides/Jordan_1.pdf](http://mlg.eng.cam.ac.uk/mlss09/mlss_slides/Jordan_1.pdf)

~~~
ArbitraryLimits
I've always been partial to Brad Efron's explanations, see
[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.179.2128](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.179.2128)

IMO he nails the key distinction here: "One definition says that a frequentist
is a Bayesian trying to do well, or at least not too badly, against any
possible prior distribution."

And there's some nice humor also: "The 250-year debate between Bayesians and
frequentists is unusual among philosophical arguments in actually having
important practical consequences."

~~~
ajtulloch
Yes - Efron's paper (also available at [1]) is excellent.

[1]:
[http://statweb.stanford.edu/~ckirby/brad/papers/2005BayesFreqSci.pdf](http://statweb.stanford.edu/~ckirby/brad/papers/2005BayesFreqSci.pdf)

------
mcguire
I'm not a statistician, but I'm afraid I don't get his Myth 5:

" _For some reason it’s assumed that frequentist methods need to make strong
assumptions (such as Gaussianity), whereas Bayesian methods are somehow immune
to this._ "

What I'd thought I'd heard before, from the Bayesian camp, was that both
methods required strong assumptions, but the assumptions had to be explicit in
the Bayesian model.

------
RyanZAG
_> claim that frequentist methods need to make strong modeling assumptions._

 _> Assumption that horse race winners are not completely random and that
there is a strategy_

This is the kind of assumption that people tend to blame frequentist
statistics for making (well, any statistician, I guess). There are nearly
always assumptions made about what the random variables are or aren't. If the
horse race were completely random, your guarantees fall apart: it's a simple
dice roll with no strategy beyond chance. Yet you've used your assumption to
dump money on the table, and now you've probably lost it.

------
mamp
It's been a while since I was into this, but I think the problem with the
bounds discussed in Myth 5 is that while the theory is fine, in practice the
bounds are loose, and Bayesian methods converge faster. The discussion also
doesn't address incremental decision making, where posterior(t-1) -> prior(t).

I note that the discussion avoids the crazy stuff with frequentist stopping
rules which is much more elegantly handled using Bayesian & decision theoretic
methods.
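The stopping-rule issue is easy to demonstrate: under the null, a frequentist who z-tests after every observation and stops at the first "significant" look inflates the type-I error far beyond the nominal 5%. A quick simulation (my own illustration, not from the article):

```python
import math
import random

random.seed(3)

def significant_at_some_look(max_n=200):
    """Run one null experiment, z-testing after every observation."""
    s = 0.0
    for n in range(1, max_n + 1):
        s += random.gauss(0.0, 1.0)  # null data: mean 0, known sd 1
        z = s / math.sqrt(n)
        if abs(z) > 1.96:            # nominal two-sided 5% test
            return True              # stop at the first "significant" look
    return False

sims = 2000
false_pos = sum(significant_at_some_look() for _ in range(sims)) / sims
print(false_pos)  # far above the nominal 0.05
```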

I agree that there are quite a few myths around frequentist methods as
described in the article.

------
mjw
My attempt to summarise the difference in language familiar to computer
scientists is that you can look at the frequentist vs Bayesian debate as
being about when a worst-case analysis is preferable to an average-case
analysis for the unknown parameters of a statistical model.

There's something you don't know (the parameters). Are you looking to make
statements which bound how bad things could be under the worst-case setting of
those parameters? Or do you have some idea upfront about how likely different
parameter settings are, and want to make statements about them in the
"average" case?

Rather like with worst-case vs average-case analysis of algorithms, which is
more appropriate depends what you're trying to do, and sometimes both are
interesting.
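The analogy can be made concrete with a classical example (my own numbers, not from the comment): for estimating a Bernoulli parameter p from n flips under squared-error loss, the sample mean minimizes the average risk under a uniform prior, while the shrunken estimator (X + sqrt(n)/2) / (n + sqrt(n)) is the classical minimax (worst-case optimal) estimator:

```python
import math

n = 100
ps = [i / 1000 for i in range(1, 1000)]  # grid over the open interval (0, 1)

def risk_mean(p):
    # MSE of the sample mean X/n: just its variance, p(1-p)/n.
    return p * (1 - p) / n

def risk_minimax(p):
    # MSE of (X + sqrt(n)/2) / (n + sqrt(n)); the algebra shows it is
    # constant in p, namely n / (4 * (n + sqrt(n))**2).
    c = math.sqrt(n)
    return (n * p * (1 - p) + (c / 2 - c * p) ** 2) / (n + c) ** 2

worst_mean = max(risk_mean(p) for p in ps)
worst_minimax = max(risk_minimax(p) for p in ps)
avg_mean = sum(risk_mean(p) for p in ps) / len(ps)
avg_minimax = sum(risk_minimax(p) for p in ps) / len(ps)
print(worst_mean > worst_minimax)   # the mean loses in the worst case...
print(avg_mean < avg_minimax)       # ...but wins on average over p
```

Which column you care about — the worst-case row or the average-over-the-prior row — is exactly the choice the comment describes.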

------
raverbashing
Blah blah blah

I don't care about the method, I care about a correct prediction.

If one group can't do that, well, tough for them.

But in the end, it's the structure of the solution and the "guesses" that had
to be made that contribute to the success more than the method.

