
Confirmationist and falsificationist paradigms of science (2014) - cbkeller
https://statmodeling.stat.columbia.edu/2014/09/05/confirmationist-falsificationist-paradigms-science/
======
outlace
Article summary: Many scientists (particularly social scientists) think
they’re doing Popperian falsification of their hypotheses by using null
hypothesis testing. That is, they posit a hypothesis that e.g. IQ is
correlated with happiness, then set up a null hypothesis that says IQ has
zero correlation with happiness, and reject it if their data do not have zero
correlation. Gelman argues this is confirmation-seeking, not falsificationist,
since the scientist never posited a precise model of how IQ correlates with
happiness that could itself be falsified. They only reject a straw-man null
hypothesis and then claim that this supports their hypothesis. He also notes
that this has nothing to do with frequentist vs. Bayesian statistics, since
null testing can be done using either approach.

My own added view: Gelman never elaborates on what “making a hypothesis
precise” means in this article, but I think this is the key. A lot of social
science is really about estimating model parameters, not positing new
causal explanations (theories) of phenomena. If you hypothesize that IQ
correlates with happiness, that is not a theory; one could say it’s not even a
hypothesis. It’s really asking “how much does IQ correlate with happiness?”
Anything can correlate with anything else in principle, so this “hypothesis”
is not a new causal model of reality, it’s just a parameter estimate. It
doesn’t make sense to use falsificationist reasoning here, since there’s no
theory to falsify, only a parameter to estimate. This is why null hypothesis
significance testing (NHST) is so wrongheaded: 1) most social science
is not about new causal models but about parameter estimation, and 2) when you
do posit a new causal model, you should try to falsify the predictions of your
model, not some straw-man null hypothesis.

------
hirundo
There's a good case for stating the hypothesis, and summarizing the evidence
that could falsify the hypothesis, right in the abstract of a paper. Or at
least a separate section on falsification. Whether the authors' views on the
falsification of their prediction are correct or not, they say a lot about
their standard of evidence and therefore about their level of rigor.

If they state that falsification isn't relevant to this particular paper, or
give an impossible threshold, it's a way to sort it into a category other than
science.

------
User23
Science is a deductive method, which means that it can never "confirm"
anything unless it can disprove the complement. Strictly speaking that is
impossible in real world cases. For example, quantum electrodynamics, one of
the most well tested theories ever, holds up in the lab to some absurd number
of decimal places (I recall the figure 9, but I read that years ago and it's
got to be more now), but we can never be quite sure that the theory won't fall
down with the next improvement in measurement. None of this detracts from QED
being one of the great human intellectual achievements.

From a programming perspective, there is an analogy to software testing.
Testing can only prove the presence of bugs; it can't prove their absence,
except by exhaustion. Since on (most?) modern systems exhaustive testing is
impossible, we're left with an imperfect solution that still works quite well
when applied properly.
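A small illustration of the exhaustion case, using a toy function I made up for the example (not from the thread): for an 8-bit input space, testing every case is entirely feasible.

```python
def sat_add_u8(a: int, b: int) -> int:
    """Add two unsigned 8-bit values, clamping (saturating) at 255."""
    return min(a + b, 255)

# Exhaustive testing IS possible here: the whole input space is only
# 256 * 256 = 65536 cases, so we can prove the absence of bugs.
for a in range(256):
    for b in range(256):
        out = sat_add_u8(a, b)
        assert 0 <= out <= 255
        assert out == a + b or out == 255

# For 64-bit inputs the same loop would need 2**128 iterations, so in
# practice we can only sample -- which shows the presence of bugs but
# never proves their absence.
print("all 65536 cases pass")
```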

~~~
cbkeller
Well, we like to say science is deductive, because induction doesn't give you
perfect certainty -- but in practice induction and parsimony are, and have
been, absolutely critical to scientific progress.

I think QED provides a nice example. I don't know if there are any formal
mathematical constraints on the size of theoryspace, but for practical
purposes we can probably agree that one may conceive of arbitrarily many
hypotheses that fail to be falsified by a given set of observations. If
deduction is all we have at our disposal, how do we choose among all these
options? And should we really not have any more confidence in a theory like
QED, which has been tested many times and "failed to be falsified" to nine
significant digits, versus one that has only "failed to be falsified" at 5%
relative?

In practice, I would also claim that parsimony is (and has been historically)
absolutely critical in culling away the dross of theoryspace, yet this can
hardly be presented as a deductive method?

I think induction has been given an unnecessarily bad name lately. The
falsification-centric paradigm that is popular today was proposed by Karl
Popper as a solution to Hume's "problem of induction" [1] -- but Hume, and the
rest of the Empiricists that laid the foundation for the scientific method,
would have considered this entirely backwards. While Hume did claim that
induction could not be justified by reason alone, Hume's conclusion was not to
reject induction, but rather to _prefer induction over deduction_!

> _It is far better, Hume concludes, to rely on “the ordinary wisdom of
> nature”, which ensures that we form beliefs “by some instinct or mechanical
> tendency”, rather than trusting it to “the fallacious deductions of our
> reason”_ [2]

[1]
[https://en.wikipedia.org/wiki/Hume%27s_problem_of_induction](https://en.wikipedia.org/wiki/Hume%27s_problem_of_induction)

[2]
[https://plato.stanford.edu/entries/hume/](https://plato.stanford.edu/entries/hume/)

~~~
User23
I hope "None of this detracts from QED being one of the great human
intellectual achievements" made it clear that I agree completely. Thank you
for elaborating and providing a nice reference.

------
mxwsn
I recommend [https://aeon.co/essays/a-fetish-for-falsification-and-observation-holds-back-science](https://aeon.co/essays/a-fetish-for-falsification-and-observation-holds-back-science)
for further reading on how falsification has interacted
with science historically and philosophically. In principle, it's absolutely
crucial for good science. In practice, hypotheses that are not directly
falsifiable (because they require additional technological development) have
contributed greatly to science in the long run. This is interesting, since it
was simultaneously reasonable and correct for contemporaries to label these
hypotheses as non-scientific: at the time there was little to no evidence
supporting or refuting them, and no way to test them. In practice, the
distinction between "falsifiable" and "unfalsifiable" is a messy spectrum.

~~~
cbkeller
Thanks, I hadn't seen that one before - some really interesting historical
context. That's about the clearest articulation I've ever seen of the contrast
between the idealized purely-deductive hypothesis-falsification paradigm and
the way science has actually progressed.

------
themodelplumber
> So I think it’s worth emphasizing that, when a researcher is testing a null
> hypothesis that he or she does not believe, in order to supply evidence in
> favor of a preferred hypothesis, that this is confirmationist reasoning. It
> may well be good science (depending on the context) but it’s not
> falsificationist.

A very good point.

------
chrstphrhrt
We need to bring back abductive/retroductive inference (inference to the best
explanation) as its own mode of reasoning to complement deduction and
induction. This is how scientific discoveries are often made, IMHO, but it is
often chalked up to some kind of intuition. There is actually a logical
structure, even though it is hard to formalize and prove.

[https://en.wikipedia.org/wiki/Abductive_reasoning](https://en.wikipedia.org/wiki/Abductive_reasoning)
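A toy sketch of that logical structure, with rules I invented for the example (nothing here comes from the linked article): keep only hypotheses that entail all the observations, then prefer the most parsimonious one.

```python
# hypothesis -> (observations it would explain, number of extra assumptions)
RULES = {
    "it rained":             ({"grass is wet", "street is wet"}, 1),
    "sprinkler ran":         ({"grass is wet"}, 1),
    "street-cleaning truck": ({"street is wet"}, 2),
}

def abduce(observations: set) -> str:
    # Keep hypotheses that explain every observation, then pick the one
    # requiring the fewest extra assumptions (parsimony as a tiebreaker).
    candidates = [(cost, h) for h, (explains, cost) in RULES.items()
                  if observations <= explains]
    if not candidates:
        return "no single hypothesis explains all observations"
    return min(candidates)[1]

print(abduce({"grass is wet", "street is wet"}))  # -> it rained
```

Real abduction engines are far subtler (competing partial explanations, probabilistic scoring), but the shape is the same: reason backwards from effects to the best candidate cause.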

I am working on dialog management stuff as an engineer who sits between data
scientists, NLP, and product people. It is understandable that trained stats
people see everything as a stats problem (induction), but we forgot about
GOFAI & symbolic AI (deductive) systems along the way.

Philosophy has actually been ahead of the curve on this one for over 140
years. Maybe the "3rd wave" of AI will be to synthesize.

[https://www.darpa.mil/attachments/AIFull.pdf](https://www.darpa.mil/attachments/AIFull.pdf)

I personally have not had a chance to read Norvig or dive into LISP or Prolog
much, but am using CLIPS expert system in a project and have toyed with
core.logic in Clojure back in the day. Lots of opportunity to resurrect some
of these older methods that may be superior to imperative/bespoke code given
modern hardware and infrastructure.

