

Stethoscope as weapon of mass distraction - orin_hanner
http://andrewgelman.com/2015/01/02/stethoscope-weapon-mass-distraction/

======
ScottBurson
I think this anonymous comment on the blog page is very interesting indeed:

 _Incidentally, the usual hypothesis testing paradigm is inherently
un-Bayesian even if posteriors are used to judge the hypothesis. Given, for
example, the hypotheses "H0: theta less than 0" and "H1: theta greater than
0", the full posterior P(theta|data, background) encapsulates everything the
data + background has to say about theta.

If you gratuitously add another step which determines, say, that H1 is true,
and assume it's true going forward, then you've effectively truncated
P(theta|data, background) to theta greater than zero without having any
further data or other evidence for doing so. It's a violation of the
sum/product rules, in other words, and hence un-Bayesian. In some instances
this truncation will be a valid approximation to the full Bayesian version,
but most of the time it won't.

The Bayesian version of hypothesis testing (decision theory, with loss
functions and all the rest) really only makes sense if you're making final
decisions. For example, if you're programming a computer to process data and
make automatic decisions about things. Otherwise the Bayesian thing to do is
carry the full posterior P(theta | data, background) forward unaltered.
Scientists, too, sometimes need to reach final conclusions, but most of the
time hypothesis testing is used to make piecemeal judgments along the way
(such as removing a parameter from the analysis), in which case you're
effectively truncating distributions without the evidence needed to do so._

So what we should be doing, if I understand this correctly, is not saying
"this hypothesis is supported by the data (p < .05)" but "given such-and-such
a prior, and the data, we conclude that the hypothesis is 62% likely to be
true" or some such.

~~~
tel
Even "62% true" is too weak: it summarizes the posterior by a certain
integral. This drives home why we don't do either, of course: it's far more
expensive.
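
Spelled out in the notation of the quoted comment above, the "62%" collapses
the whole posterior into one number,

    P(H1 | data, background) = \int_0^\infty P(theta | data, background) dtheta

and discards everything else the posterior says about theta.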

------
ludamad
I second the motion in the comments that people should pedantically correct
"statistically significant" to "statistically detectable".

~~~
abecedarius
I dunno. If we're going to change the name and try to be precise, I'd go for
"subjunctively unlikely": a certain random experiment, if run, would probably
not produce results at least this extreme.

It doesn't just foreground what's actually claimed, it's also easier to say
-- 7 syllables versus 9.
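
For what it's worth, the "subjunctive" reading is easy to make literal with a
simulation (the null model, statistic, and numbers here are all invented):

    # p-value as "how often would the null experiment, if run, produce a
    # result at least this extreme?"
    import numpy as np

    rng = np.random.default_rng(0)
    observed_mean = 0.6                # hypothetical observed statistic
    n, n_sims = 20, 100_000

    # Null model: n draws from N(0, 1); statistic: the sample mean.
    null_means = rng.normal(0.0, 1.0, size=(n_sims, n)).mean(axis=1)
    p = np.mean(np.abs(null_means) >= abs(observed_mean))
    print(p)                           # two-sided simulated p-value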

------
tokenadult
Gelman is a very interesting statistician and writer on statistical reasoning,
and this article is well worth a read. Gelman is certainly correct that just
because something is published in a peer-reviewed journal, we shouldn't
necessarily think that it is a true description of the world. Journals are all
hungry for content ("publish or perish" applies to journals too), and journal
editors and reviewers are often surprisingly poorly acquainted with how to
spot errors in statistical reasoning.

------
nmc
> _" the stethoscope_ is _being misused, all the time "_

If I understand correctly, which is not likely at all, this is about the
stethoscope being misused in scientific research, not in the practice of
medicine, right?

Then the claim quoted above would be utterly false, since I am pretty sure
that most of the time someone uses a stethoscope, that person is a doctor,
and is:

a) listening to another person's heart and lungs; and

b) _not_ publishing dubious research findings.

Please enlighten me here, I feel like I am plenty wrong.

~~~
nmc
Thank you knieveltech and chrismcb for enlightening me.

You made me feel very stupid though.

~~~
knieveltech
I don't think anyone thinks you're stupid. Metaphors can be tricky.

------
IndianAstronaut
>I see the problem as being with the entire hypothesis testing framework, with
the idea that we learn by rejecting straw-man (or, as Dave Krantz charmingly
said once, "straw-person") null hypotheses, and with the binary true/false
attitude which leads people to believe that, once a result is judged
statistically significant (by any standard) and published in a good journal,
it deserves the presumption of belief.

That too simply brushes off decades of statistical work. Overreliance on
p-values is a problem, especially when the p-values come from garbage models
that don't properly fit the data, but we can't just throw out the theory
because we don't like it. A properly set up statistical test is still a
powerful tool for experimental data.
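
For instance, here is a minimal sketch of one way to read "setting it up
properly" (invented data; the point is only that the model-fit checks come
before trusting the p-value):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    control = rng.normal(0.0, 1.0, size=30)   # hypothetical measurements
    treated = rng.normal(0.5, 1.0, size=30)

    # A t-test's p-value is only as good as its model, so check the
    # model's assumptions (here, rough normality of each group) first.
    for name, grp in (("control", control), ("treated", treated)):
        stat, p_fit = stats.shapiro(grp)
        print(name, "Shapiro-Wilk p =", p_fit)

    t_stat, p_value = stats.ttest_ind(control, treated)
    print("two-sample t-test p =", p_value)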

