
It's the Effect Size, Stupid: What effect size is and why it is important (2002) - bjourne
https://www.leeds.ac.uk/educol/documents/00002182.htm
======
davros
Yes! My two top peeves with the majority of soft-science research are
underpowered studies (and associated false conclusions) and a seemingly total
neglect of effect size. The binary question 'is an intervention better?' is
not the right question. To discuss any medical, social, policy, etc
intervention you need to know _how much_ benefit it delivers. This can then be
assessed compared to costs and difficulties of the intervention.

Of course, effect size isn't enough either. You still need a good
understanding of the range of outcomes: whether a significant fraction of
participants had poor or negative outcomes, and so on. Details like that can
be hidden inside a single summary metric like effect size.
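
As a minimal sketch of what 'how much' means in practice, here is Cohen's d
(one common standardized effect size) computed for two made-up outcome
samples:

    # Minimal sketch: Cohen's d for two independent samples.
    # All numbers are invented for illustration.
    import numpy as np

    control = np.array([72, 68, 75, 70, 69, 74, 71, 73])
    treated = np.array([78, 74, 80, 76, 75, 79, 77, 81])

    # Pooled standard deviation (ddof=1 gives the sample estimate).
    n1, n2 = len(control), len(treated)
    s1, s2 = control.std(ddof=1), treated.std(ddof=1)
    pooled_sd = np.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

    d = (treated.mean() - control.mean()) / pooled_sd
    print(f"Cohen's d = {d:.2f}")  # ~2.4 here: a very large effect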

------
nabla9
Sometimes a sample size of one can be enough.

If you have just one white lab rat (N=1) and it weighs 50 kg after the
treatment, you can skip all statistical analysis and publish the result. The
importance of the discovery is self-evident from the effect size alone.

The discovery that H. pylori causes peptic ulcers was first submitted to the
Lancet as two papers, a 25-patient study and a 100-patient study. The Lancet
was slow to publish and there was resistance from the medical establishment.
It was hard to find reviewers who would agree on the importance of the papers.
Barry Marshall decided to experiment on himself, and the result was very
convincing. Marshall and Warren received a Nobel Prize for their discovery.

------
glangdale
Very good! I find it aggravating that the word "significant" is thrown around
casually to refer to outcomes that are "statistically significant" yet
"extremely small".

We would encounter this a lot in performance engineering - sometimes you could
show that a new optimization was statistically significantly better, yet only
worth 0.1% in terms of performance improvement (possibly in exchange for a
whole bunch of new code).
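
A quick sketch of how that happens (the numbers are invented): with a big
enough sample, a 0.1% improvement clears the p < 0.05 bar while remaining
practically negligible.

    # Sketch: with enough samples, a 0.1% improvement becomes
    # "statistically significant" while staying practically tiny.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    baseline = rng.normal(loc=100.0, scale=1.0, size=100_000)   # ops/sec
    optimized = rng.normal(loc=100.1, scale=1.0, size=100_000)  # +0.1%

    t, p = stats.ttest_ind(optimized, baseline)
    print(f"p = {p:.2e}")  # far below 0.05: "significant"
    print(f"speedup = {optimized.mean() / baseline.mean() - 1:.4%}")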

~~~
jdietrich
At least in medicine, there's a useful distinction between "statistically
significant" (the math says that this drug works) and "clinically significant"
(this might actually be worth giving to patients).

------
jacques_chester
I'd like to point out that if you are doing performance testing, you should
read a little stats or have someone good at stats lend a hand, because it is
really easy to fool yourself. I've fooled myself and I worry about doing it
again.

Take effect size, for example. Suppose I run my test before and after the
latest commit, and the software now delivers 2% fewer transactions per second
(TPS).

Is that meaningful? It depends. Should I rely on a single run? Almost
certainly not.

Suppose performance improves by 203%. Is that meaningful? Probably. Should I
rely on a single run? Almost certainly not.
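
As a minimal sketch of why a single run isn't enough (all run data below is
invented), collect several runs per build and ask whether the difference
survives the run-to-run variance:

    # Sketch: never trust a single run. Compare distributions of runs,
    # not single numbers. All figures are made up.
    import statistics
    from scipy import stats

    before_tps = [1052, 1049, 1061, 1040, 1055, 1047]  # 6 runs, old commit
    after_tps = [1031, 1058, 1022, 1049, 1035, 1041]   # 6 runs, new commit

    print("before:", statistics.mean(before_tps), "+/-", statistics.stdev(before_tps))
    print("after: ", statistics.mean(after_tps), "+/-", statistics.stdev(after_tps))

    # A ~1-2% drop can easily sit inside run-to-run noise; a t-test (or
    # just eyeballing the spread) shows whether it survives the variance.
    t, p = stats.ttest_ind(before_tps, after_tps)
    print(f"p = {p:.2f}")  # here p ~ 0.09: not convincingly a regression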

Then there's the obnoxious problem of vast numbers of uncontrollable
variables. Folks run buckets of tests on cloud platforms without knowing
whether they're landing on a physical machine they used last time (so their
bits are disk-warm), whether there's a network glitch in central1-a but not
central1-b (and which of the two they're in), or that running tests at
different times of day means different competition for cloud resources due to
diurnal demand ...

tl;dr run more tests and get someone to help you. If you run ab2 three times
and publish a breathless blog post, be that on your soul.

------
twoslide
While effect size is important, it is really only comparable across studies
with randomized, dichotomous treatment. Comparability in non-experimental
studies or in studies with dichotomous/polytomous outcomes is more difficult.

