
You need 16 times the sample size to estimate an interaction than a main effect - luu
http://andrewgelman.com/2018/03/15/need-16-times-sample-size-estimate-interaction-estimate-main-effect/
======
rripken
That 16x number is specific to a set of assumptions in the article. Is this
just the Curse of Dimensionality in action?
[https://en.wikipedia.org/wiki/Curse_of_dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality)

~~~
RA_Fisher
Andrew calls it the blessing of dimensionality:
[http://andrewgelman.com/2004/10/27/the_blessing_of/](http://andrewgelman.com/2004/10/27/the_blessing_of/)

Multi-level modeling's concept of partial pooling is really a killer feature.

~~~
xitrium
Bob Carpenter did a great case study on this topic, for anyone new to it:
[http://mc-stan.org/users/documentation/case-studies/curse-di...](http://mc-stan.org/users/documentation/case-studies/curse-dims.html)

Check out the graphs, especially the one showing how the Euclidean distance
of a random point from the mode grows as the number of dimensions increases.
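
For anyone who wants to see that effect without opening the case study, here is a minimal sketch. It is not the case study's code: it assumes the points are drawn from a standard multivariate normal (so the mode is the origin), and the function name and the number of draws are arbitrary choices for illustration.

```python
import math
import random


def mean_distance_from_mode(dims, draws=500, seed=0):
    """Average Euclidean distance from the origin for standard normal draws.

    Assumption: independent standard normal coordinates, so the mode is the origin.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        # Distance from the mode is just the norm of the random vector.
        total += math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dims)))
    return total / draws


if __name__ == "__main__":
    for dims in (1, 10, 100, 1000):
        # Print the simulated mean distance next to sqrt(dims) for comparison.
        print(dims, round(mean_distance_from_mode(dims), 2), round(math.sqrt(dims), 2))
```

The simulated distance should track sqrt(dims) fairly closely once there are more than a few dimensions, which means a typical draw ends up far from the mode even though the density is highest there.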

------
zetazzed
The most important part of this article is the way that Gelman derives the
value. He writes a simple R script to simulate the outcomes. Many, many
introductory stats classes will teach you a variety of formulas to estimate
the "statistical power" of various experiments, and it's easy to get
intimidated and worry if you know the right one to use. But if you know how to
program, and you have a reasonable set of assumptions you'd like to test, then
you can do a similar analysis in about 10 minutes. These analyses can often
save you months of work when you figure out that your exciting study idea has
no hope of detecting a difference or that you actually don't need 10000
subjects to detect a 10% lift in conversion rate. Thank you, Professor Gelman!
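
As a rough illustration of that kind of 10-minute analysis, here is a sketch in Python. It is not Gelman's R script. It assumes a balanced 2x2 design, factors coded as -0.5/+0.5, a known residual sd (`sigma`), and a simple z-test at the 5% level. The effect sizes in the usage example are made up so that the main effect has about 80% power at n = 400.

```python
import math
import random


def simulate_power(n, main_effect, interaction, sigma=1.0, sims=2000, seed=0):
    """Simulated power for a main effect and an interaction in a balanced 2x2 design.

    Assumptions: equal cell sizes, known sigma, two-sided z-test at the 5% level.
    Returns (power_main, power_interaction).
    """
    rng = random.Random(seed)
    cell_n = n // 4
    # The mean of cell_n normal observations is normal with this sd, so the
    # cell means are drawn directly instead of simulating every observation.
    cell_sd = sigma / math.sqrt(cell_n)
    se_main = cell_sd        # equals 2 * sigma / sqrt(n)
    se_inter = 2 * cell_sd   # equals 4 * sigma / sqrt(n)
    hits_main = hits_inter = 0
    for _ in range(sims):
        m = {}
        for a in (-0.5, 0.5):
            for b in (-0.5, 0.5):
                true_mean = main_effect * a + interaction * a * b
                m[a, b] = rng.gauss(true_mean, cell_sd)
        # Main effect of the first factor, averaged over the second factor.
        est_main = ((m[0.5, 0.5] + m[0.5, -0.5]) - (m[-0.5, 0.5] + m[-0.5, -0.5])) / 2
        # Interaction as a difference in differences between cells.
        est_inter = (m[0.5, 0.5] - m[0.5, -0.5]) - (m[-0.5, 0.5] - m[-0.5, -0.5])
        hits_main += abs(est_main / se_main) > 1.96
        hits_inter += abs(est_inter / se_inter) > 1.96
    return hits_main / sims, hits_inter / sims


if __name__ == "__main__":
    print(simulate_power(400, main_effect=0.28, interaction=0.14))
    print(simulate_power(6400, main_effect=0.28, interaction=0.14))
```

With these made-up numbers, the interaction should be detected only about one time in ten at n = 400, and it takes roughly n = 6400 (16 times as many subjects) to bring its power up to where the main effect started.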

------
tysonzni
_Further suppose that interactions of interest are half the size of main
effects._

Why make this assumption? I'm hoping to see more justification as to why the
main effect should be assumed to be exactly 2x the interaction effect.

Edit: In comments he wrote - _I think it makes sense, where possible, to code
variables in a regression so that the larger comparisons appear as main
effects and the smaller comparisons appear as interactions._

~~~
xyzzyz
This is an exam question, so the author is free to assume whatever he wants.
But, more importantly, an interaction as large as half of the main effect is
already pretty large -- in the real world, many studies look at much smaller
interactions. The point here is that if you need 16 times the sample size to
get enough power to detect an interaction at half the effect size, you will
need an even larger sample size to detect smaller interactions.
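
To put numbers on that, here is a small sketch of the arithmetic. It assumes a balanced 2x2 design with equal variances, where the standard error of the interaction is twice that of the main effect, and it asks how much larger the sample must be for the interaction to reach the same estimate-to-standard-error ratio as the main effect. The function name is made up for illustration.

```python
def sample_size_multiplier(interaction_to_main_ratio):
    """Sample size needed for the interaction, relative to the main effect.

    Assumption: balanced 2x2 design, so se(interaction) = 2 * se(main effect).
    The z-score scales with sqrt(n), so matching the main effect's z-score
    takes (2 / ratio) ** 2 times as many observations.
    """
    if interaction_to_main_ratio <= 0:
        raise ValueError("ratio must be positive")
    return (2.0 / interaction_to_main_ratio) ** 2


if __name__ == "__main__":
    for ratio in (1.0, 0.5, 0.25):
        print(ratio, sample_size_multiplier(ratio))
```

Under these assumptions, an interaction as large as the main effect needs 4 times the sample, one half as large needs 16 times, and one a quarter as large needs 64 times.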

