The Great Statistical Schism

msandford · on Nov 16, 2015

What's amusing is that Bayesian reasoning is what human beings do all the time. Most of the -isms that people can do (racism, sexism, ageism, etc) have to do with taking the prior (what already I know about other similar people) and applying that knowledge to the new person. It's "too hard to understand" and yet all kinds of people are doing it every day.

dalke · on Nov 16, 2015

"what already I know about other similar people"

But it doesn't work that way. Some properties get associated to members of a class, while others do not.

Let's take discrimination on the basis of blood type personality theory in parts of Asia. Many people in Japan believe that one's personality is strongly affected by one's blood type. Quoting from https://en.wikipedia.org/wiki/Blood_type_personality_theory :

> Many people have been discriminated against because of their blood type. Employers ask blood types during interviews despite the warnings they have been given. Children at schools have been split up according to their blood type. The national softball team has customized training to fit each player's blood type. Companies have given work assignments according to their employee's blood type.

Yet all evidence is that there is no basis for this form of discrimination. It's pseudoscience akin to discrimination based on one's astrological sign.

So it's clear that discrimination can rise even without a meaningful prior for a given attribute.

We can also look at left- vs. right-handedness for an example of varying levels of discrimination even when the underlying genetics and thus putative justification for discrimination hasn't changed. To copy a quote from https://en.wikipedia.org/wiki/Bias_against_left-handed_peopl... :

> "I was educated in the USA in Catholic school in the 60s. My left hand was beaten until it was swollen, so I would use my right right [sic] hand" ... "I had a teacher who would smack my left hand with a yardstick every time she caught me writing with my left hand" ... "My fourth grade teacher [...] would force me to use my right hand to perform all of my school work. If she caught me using my left hand, I was hit in the head with a dictionary. It turned out that she believed left handers were connected with Satan."

Since we know that discrimination can arise without a prior, and the level of discrimination can change even the prior does not change, it's not clear that Bayesian reasoning is a good way to understand the issue.

In addition, it doesn't help understand why some factors are used to define 'similar people' while others are not. Men are far more likely to be involved in crime in the US, including domestic terrorism to white collar crime, but aren't universally discriminated against for being a member of that class. Or on the topic of white collar crime, editorials like 'Wall Street Criminals Are Still a Protected Class in America' at http://www.vice.com/read/wall-street-criminals-are-still-a-p... highlight how the Bayesian prior is outright ignored.

Thus, I'm not convinced that humans really use Bayesian reasoning as the basis for -isms.

thaumasiotes · on Nov 16, 2015

> In addition, it doesn't help understand why some factors are used to define 'similar people' while others are not. Men are far more likely to be involved in crime in the US, including domestic terrorism to white collar crime, but aren't universally discriminated against for being a member of that class.

You should check out the sentences women get for criminal convictions compared to men. You might also want to look at car insurance rates for 16-20 year olds. I've seen someone advocating that instead of teaching your small children "if you need help, go to the police", you should teach them "if you need help, go to a woman".

Men are routinely discriminated against for their higher criminality; people don't avoid women on the street or freak out if one shows up at a park without a child in tow. I was asked to leave the public library once because they thought it was weird that I was reading under the stairs.

dalke · on Nov 16, 2015

Certainly. This why I wrote 'universally discriminated', by which I meant in the sense that black people and women have historically been discriminated against. (Or Chinese, or Japanese, or Native American, or ...)

Otherwise it's easy to point out historic biases against hiring men as, for example, nurses. Just like it's easy to find cases in the US where white people were discriminated against on the basis of their skin color ( http://edition.cnn.com/2013/09/08/justice/new-york-hate-crim... ) or discriminated against for being having only US citizenship (http://consumerist.com/2013/06/24/welsh-pub-fined-for-discri... ).

thaumasiotes · on Nov 16, 2015

Well, unwillingness to hire men as nurses is hard to attribute to greater criminality. Much heavier criminal sentences for men seems more on point.

Early male obstetricians (really, doctors who started researching childbirth) were not well thought of by midwives. Could you elaborate on the special sense of "universal discrimination" you're talking about?

dalke · on Nov 16, 2015

Are you really asking me for a history of discrimination in the US, including prohibitions against voting or owning land, the Separate Car Act leading to Plessy v. Ferguson and Jim Crow laws, the American Indian boarding schools, redlining, the Asiatic Barred Zone Act, and Japanese interment camps?

Perhaps 'universal' isn't the precise term, but I lack a more nuanced term, and it captures the idea that many more people don't like Obama simply because he is black than those who don't like him simply because he is a man.

The examples you gave are also less universal. I agree that single males in the US face discrimination for being at or near a playground or around children in general more than women do. But that is not as universal as non-whites forced to leave so-called 'sundown towns' by sunset, or the "Tacoma Method" used to expel Chinese residents from cities in the west coast.

Going back to the OP, the recent prejudice against unaccompanied males near playgrounds is another example of how humans don't really use a Bayesian analysis.

thaumasiotes · on Nov 16, 2015

You brought this up as an argument against the idea that people discriminate based on (roughly) Bayesian estimation of what other people are likely to be like. Specifically, you heavily imply that if people were discriminating based on rough Bayesian estimation, there would be some level of discrimination against men as potential criminals (since they make up roughly all of the criminals) which does not obtain now.

I disagree; I think men suffer a lot of discrimination as potential criminals. I want to know what you think should be happening to men (if people were using a roughly Bayesian approach), and what you think is happening to men, and why you think it's not compatible with the idea that what people are doing can be modeled as a Bayesian approach.

dalke · on Nov 16, 2015

I gave several examples of non-Bayesian reasoning going into discrimination, with blood type being the most prominent. Hence I conclude that thinking about discrimination primarily as humans applying Bayesian reasoning is not a useful framework to think about the -isms. I see no need to introduce a new mathematical framework given the historical evidence says there is no such simple explanation.

We need only look at how humans do risk management. Driving is dangerous, and one of the leading causes of death. Yet people are more scared of airplane crashes, despite how flying is a less dangerous means of transportation. Or, parents are increasingly worried about the safety of their children despite the clear evidence that childhood now is safer than ever. We also see people who gamble on the lottery with the incorrect belief that there are winning streaks or lucky numbers. Thus, human responses are clearly neither Bayesian nor frequentist.

I therefore don't see why you think a Bayesian approach, or something close to a Bayesian approach, helps understand the topic.

FWIW, an analysis of the sex-based skew in the prison population would also need to look to many other factors, such as the huge increase in incarceration rates over the last few decades, the strong racial component, the different social roles of men and women in US culture, and more.

Such a model would also need to explain why, if people thought that men are inherently more dangerous then women and "suffer a lot of discrimination as potential criminals", so many more social roles which require trust or safety traditionally went disproportionately to men. Perhaps there is some confounding variable which is more significant than being a male? For example, if the prior is multi-modal, the is there a reason for the two modes?

As another hypothesis which fits the limited data you presented, supposed that men in US culture are supposed to be more trustworthy than women and as such have more power and control. If men break that trust, then they are punished more harshly then women, and for a longer time. That would explain the disproportionate distribution of men in jail, despite the the normal bias for men in the culture.

While that is certainly a form of discrimination on the basis of sex, it's not the same sort of discrimination which made black people scared for their lives in a sundown town.

thaumasiotes · on Nov 16, 2015

> While that is certainly a form of discrimination on the basis of sex, it's not the same sort of discrimination which made black people scared for their lives in a sundown town.

Why not?

dalke · on Nov 16, 2015

Are you kidding me? If you really know so little about US history as to have to ask that question, then read the Wikipedia articles for the topics I listed ("prohibitions against voting or owning land, the Separate Car Act leading to Plessy v. Ferguson and Jim Crow laws, the American Indian boarding schools, redlining, the Asiatic Barred Zone Act, and Japanese internment camps").

jonahx · on Nov 16, 2015

Bayesian thinking is so pervasive, and in most cases so unconsicous, that you don't even have to look that far.

"Where did my keys go?"

They vanished into thin air is not even considered, because its prior is zero.

I must have put them somewhere else is the instant frontrunner, because I do it so often.

After much unsuccessful searching, hypotheses such as "someone is playing a joke on me" or "my wife accidentally grabbed them" might be considered -- low priors that require additional evidence.

Probably only after noticing something like a broken window would "someone broke in and stole them" be considered -- extraordinary claims require extraordinary evidence.

And so on...

squidfood · on Nov 16, 2015

Practically speaking, modern science applies priors via funding decisions. Only variables that fall within the range what funders consider "worth testing" will be tested.

danieltillett · on Nov 16, 2015

In theory this is the case, but in practice since nobody actually checks on what you spend the money on you can spend a fairly large chunk of resources on whatever idea you want. You just need to ensure you put enough effort into whatever area you can get money for to allow renewal.

streptomycin · on Nov 16, 2015

Frequentist: Here's a couple formulas that kind of make sense. Plug in your data and go.

Bayesian: Here's Bayes' theorem, a single formula that makes perfect sense! See, it's simpler, right? What, you want to use it? Something something beta distribution something something Markov chain Monte Carlo something something - okay, got it?

Good luck teaching that to undergrads. In the real world, intro stats classes are taken mostly by people with little math background, no programming background, little interest in statistics, and little innate talent for any of those things. With the frequentist approach, at least they have a fucking chance.

asgard1024 · on Nov 16, 2015

I don't know about you, but I found Bayesian approach much more intuitive than frequentist one. The only reason to use frequentist approach is to have a procedure that you can do with pencil and paper on real problems (and unlike Bayesian methods only under some often weird assumptions). But with computers and statistical packages, that need is going away.

Turing_Machine · on Nov 16, 2015

I agree that frequentist methods are more amenable to teaching students to apply rote formulas that they do not understand.

I do not agree that this is a feature. Quite the contrary.

streptomycin · on Nov 17, 2015

Whether it's a feature or not is subjective. And I think I mostly agree with you - it's probably bad that we're encouraging the application of bad statistics.

But the alternative isn't just replacing bad with good, it'd be more like telling ~98% of the science majors in the country, "hey, you're too dumb to do real statistics, so please just stop trying". I can't see that realistically happening.

jerven · on Nov 16, 2015

I think it is easier to do frequest statistics "ok", then to asses whether your priors are going to be any good. Especially when looking at systems we don't really understand very well.

pdkl95 · on Nov 16, 2015

http://jakevdp.github.io/blog/2014/03/11/frequentism-and-bay...

I recommend this series of posts on frequentism and bayesianism. Of particular interest is part 3, which has a very good explanation of why frequetism shouldn't be used in science.

    Let me say it directly: you should be suspicious of the use of frequentist
    confidence intervals and p-values in science. In a scientific setting,
    confidence intervals, and closely-related p-values, provide the correct
    answer to the wrong question.

david_ar · on Nov 16, 2015

Also read chapter 37 "Bayesian Inference and Sampling Theory" of http://www.inference.phy.cam.ac.uk/itila/book.html

kriro · on Nov 16, 2015

I think the author is treating the frequentists somewhat unfairly. A good design involves both NHST and equivalence tests + discussion about power/effect size (imo).

Random grab from my methods collection in my literature tool: "A Comparison Of Equivalence Testing In Combination With Hypothesis Testing And Effect Sizes": http://digitalcommons.wayne.edu/jmasm/vol2/iss2/6/

I think the biggest problem is that many fields just churn out people who can click buttons in SPSS but couldn't really tell you what a p-value actually represents. "Unfortunately" it's fairly easy to do the testing with tools. From students, I often get the feeling that tests are conducted for the sake of conduction a test (talking about BA theses etc.).

I'm not sure how wide spread Bayesian thinking is in other fields. Coming from CS it is somewhat natural since it's pretty pervasive in AI/ML. I think our econ students didn't get exposed to it at all (iirc). Would love some feedback from other fields :)

cormullion · on Nov 16, 2015

There's a good book on the subject, "The theory that would not die", which covers the history and sheds some light on the people (Fisher, Laplace, Bayes, Pearson) behind the names.

Reviewed: http://www.ams.org/notices/201205/rtx120500657p.pdf

imh · on Nov 16, 2015

I really never understood the divide. They are all just tools in our toolbox. It feels like arguing screwdrivers versus hammers. They may not do the same thing, but both can be useful!

jonahx · on Nov 16, 2015

The author specifically argues against this view as something that obscures the superiority, in most cases, of the Bayesian approach:

One downside of this ecumenicalism is a reluctance to ask fundamental questions: having a strong opinion on this matter has gone out of fashion. Who’s to say one statistical philosophy is better than another? Aren’t all statistical philosophies equally valid paths to good data analysis? Frequentism is “true for me”. As in religion, so in statistics. If you criticise a colleague for using p-values when posterior probabilities are clearly more appropriate, it will lead to accusations of being a ‘zealot’ [2] who should stop ‘crusading’

and then later:

If Einstein’s theory were easier (as well as being more accurate), teaching Newton’s would be silly. Yet that’s the way most statistics curricula are structured. The only reason statisticians think frequentist ideas are easier is that they are used to them

imh · on Nov 16, 2015

I guess I hate the idea of a "statistical philosophy." It's math for god's sake. Given these axioms, these other things are true.

The point about teaching relativistic versus newtonian mechanics is totally different because they're incompatible. One is more correct than the other, which is not the case here. In the stats case, both tools give correct answers to different questions.

Maybe I just get frustrated feeling like people want a data analysis silver bullet to teach.

kgwgk · on Nov 16, 2015

I'd say the analogy is more valid than you think. Classical (frequentist) methods are valid as long as the results correspond to the relativistic (Bayesian) ones.

lottin · on Nov 16, 2015

As far as I know, Bayes came first and Fisher later, so some may say the analogy is the other way round.

david_ar · on Nov 16, 2015

That's true, but Bayesian methods were handicapped by a lack of computational resources for quite a long time. Modern Bayesian methods, although fundamentally the same, are in practice quite different today compared to a century ago. Frequentist statistics gained popularity because it was more tractable to apply to complex problems until computational technology caught up.

Edit: one can argue that frequentist methods are generally approximations to Bayesian methods, much like Newton vs Einstein

murbard2 · on Nov 16, 2015

No it's not "just math". You need to know what questions you're asking. You can't wave this under "do whatever works best" because evaluating "what works best" requires epistemology, and this is precisely what we're discussing.

There are some consistency properties of Bayesianism which make it a much better candidate for epistemology than frequentism. Specifically, there are important questions in frequentism for which there are no answers. In particular, the concept of "power of a test" sweeps the prior under the rug without a proper explanation.

loup-vaillant · on Nov 16, 2015

Those "tools" often give different answers to the same questions. Clearly, at least one of them is wrong. And one is probably better than the other.

Now if you're tempted to use one or the other in a case by case basis, how are we supposed to choose which is best?

keithwinstein · on Nov 16, 2015

I think these kinds of articles generally overstate the "schism." I wrote about this here: http://blog.keithw.org/2013/02/q-what-is-difference-between-... and here: http://qr.ae/70H3k7

Three points:

(1) I couldn't follow the author's Bayesian analysis or what "standard assumptions" were used. Does anybody know what is meant?

Here's a plausible Bayesian analysis using what are arguably standard assumptions: "assume, a priori, that the success rate of the new drug was drawn uniformly between 0 and 1. Given the outcome of 83/100 i.i.d. successes, the probability that the new drug has a success rate worse than the old drug is Integrate[PDF[BetaDistribution[84,18],x],{x,0,0.7}]. In other words, p(worse|outcome,uniform prior) = 0.0018."

Here's a plausible classical analysis: "Before doing the experiment, let's specify how to calculate the p-value at the end. For whatever outcome we get, we'll calculate the one-sided probability that the old drug produces an outcome that good or better. Now perform the experiment: our outcome is 83/100. Per the method that we pre-specified, the probability that the old drug would have produced an outcome that good or better is Sum[Binomial[100,i] (0.7)^i (1-0.7)^(100-i), {i, 83, 100}]. In other words, p<0.0022."

Here's a plausible takeaway: "What do you know, the p-value from the classical analysis and the posterior probability from the Bayesian analysis are almost the same. There isn't much difference in this case, and contrary to a point emphasized in the article, the posterior probability was actually slightly smaller. The claim that p-values overstate the certainty of findings compared with Bayesian methods is not supported and probably not true -- often the accusation is the reverse, that p-values are too conservative!"

(2) The Bayesian analysis and the classical one are trying to achieve different things. Speaking very generally, classical methods are about designing an experiment that has a cap on the rate of false positives, even in the worst-case input (and then running that experiment). Depending on who you are, this may or may not be what you really want to do. Bayesian methods are (again very generally) about calculating the conditional probability of some event, given a particular observation and well-stated prior assumptions. Again, this may or may not be what you really want to do.

The difference is sort of like the difference between saying that the running time of QuickSort is O(n^2) on adversarial input, compared with saying that the running time is O(n lg n) in expectation, assuming the input order is uniformly distributed. Both of these statements can be useful.

You don't have to pick a side and declare yourself a worstcaster or an expecterian, any more than you have to call yourself a Bayesian or a Frequentist. These are families of mathematically-sound techniques, not religions.

(3) Ultimately, statistics doesn't really matter until somebody starts making a decision based on the results. And once you start putting a cost on bad decisions and designing a decision rule to maximize utility (the domain of decision theory), methods based on p-values and posterior probabilities end up reaching THE SAME DECISIONS. This makes sense since there can really only be one utility-maximizing decision theory.

This was understood in the early 1940s when they had to decide how to balance the cost of mistakenly shooting down an Allied aircraft versus the cost of mistakenly letting a Nazi aircraft off the hook. It was understood in the late 1940s when Shannon and others worked out the mathematical theory of communication. These fields do not have squabbles about a Bayesian vs. frequentist schism.

nonbel · on Nov 16, 2015

>"For example, consider the ‘null’ hypothesis that the new drug has exactly the same effectiveness as the old drug."

I'd encourage people to ask: Why should I consider that null hypothesis when it sounds like a strawman? Wasn't the use of strawman arguments debunked over 1000 years ago?

andrewflnr · on Nov 16, 2015

A strawman argument is a distorted version of someone else's position, formulated to be easy to knock down for the purpose of appearing to win debates. A null hypothesis is just a starting point for reasoning.

loup-vaillant · on Nov 16, 2015

Interesting: how do you chose that starting point in the first place?

Sounds like the null hypothesis is just as arbitrary than any Bayesian "prior" (prior probability distribution, I mean).

evmar · on Nov 16, 2015

The null hypothesis is the hypothesis that your control and experimental groups behave roughly the same* . This is the baseline assumption when group membership is randomly assigned. It is not at all arbitrary.

* see my below comment for elaboration on what "same" means

nonbel · on Nov 16, 2015

Check out this paper: http://arxiv.org/abs/1311.0081

Your interpretation is fine up until we start using a cutoff and talking about significant or not significant results, whether the null hypothesis is true, etc. It is using significance levels that introduces the nonsense.

loup-vaillant · on Nov 16, 2015

Okay, but, it's like expecting the two groups to be the same, which is a bit ridiculous. I'd rather just measure how much evidence the experiment provides either way.

nonbel · on Nov 16, 2015

Exactly, we do not care they are different (usually). We care in what way and how much they are different. Then we care about what possible explanations there may be for that difference.

I would say an exception is studies of ESP, etc where the existence of any effect would be surprising. However, in practice, everyone just assumes paranormal researchers messed up the experiment somehow. Small effects are not taken seriously even if they are "significant" because there are many explanations for such differences.

loup-vaillant · on Nov 16, 2015

> I would say an exception is studies of ESP

Actually, E.T. Jaynes touched that topic a bit, and argued that it's not really an exception: probability theory applies to this phenomenons (or lack thereof) as usual. If I recall correctly, he reached the same conclusion as you just did: if a study looks like it demonstrated an instance of ESP, it probably went wrong.

nonbel · on Nov 16, 2015

And whose position is it that a drug/treatment/whatever has exactly zero effect? Also that there are exactly zero differences between two or more groups at baseline and this continues to be true for the duration of the study (except if due to the treatment)?

Really, who is advocating that hypothesis?

Also, a null hypothesis was originally the hypothesis to be nullified. There may be confusion due to the use of "null" here. Null != nil

evmar · on Nov 16, 2015

From your comment history you appear to have some background in statistics, but what you're written here is curiously missing the fundamentals. It's not exactly clear what you're arguing, but I am definitely curious because I must be missing it!

To answer your last question directly: nobody argues that there are zero differences between groups in an experiment, but rather that the central limit theorem applies, which is a statement about the distribution of means in particular circumstances. https://en.wikipedia.org/wiki/Central_limit_theorem

nonbel · on Nov 16, 2015

>"nobody argues that there are zero differences between groups in an experiment"

The null hypothesis (usually) assumes that the two groups are samples from the same distribution.

Do you agree?

evmar · on Nov 16, 2015

Yes. Isn't that fundamental to experimental design, that group membership is random? (You certainly know this as well, so I'm again confused about what your question is intended to provoke.)

nonbel · on Nov 16, 2015

And so what? Forget the math, say the two groups come from different distributions. Omniscient Jones told us so. What can we learn from that?

nonbel · on Nov 16, 2015

Let us see what Ronald Fisher said towards the end of his life regarding confusion over the null hypothesis:

"We are quite in danger of sending highly trained and highly intelligent young men out into the world with tables of erroneous numbers under their arms, and with a dense fog in the place where their brains ought to be. In this century, of course, they will be working on guided missiles and advising the medical profession on the control of disease, and there is no limit to the extent to which they could impede every sort of national effort."

Fisher, R N (1958). "The Nature of Probability". Centennial Review 2: 261–274

That is you downvoters.

Confusion · on Nov 16, 2015

Because any other null hypothesis has the same problem, but at least this one isn't biased.