
> The only factor that did [affect the likelihood of successful reproduction] was the strength of the original effect — that is, the most robust findings tended to remain easily detectable, if not necessarily as strong.

This is probably just regression to the mean. The quoted passage suggests to me that the tendency for the findings in replicated experiments to be weaker does NOT necessarily come from any flaw in the experimental design, but from the criteria for findings to get published in the first place.

You would expect any given effect to show some variation around a mean effect size. My lab and your lab might arrive at slightly different results, varying around some mean/expected result. If your lab's results meet statistical significance, you get to publish. If my lab's don't, I don't get to. So the published results are the studies that, on average, show a stronger effect than you might see if you ran the study 100 times.

> Yet very few of the redone studies contradicted the original ones; their results were simply weaker.

If a third lab replicates the experiment, their results are more likely to be close to the (possibly non-publishable) mean value than the (publishable) outlier value. So on average, repeating an experiment will give you a LESS significant result.

If the strength of the original effect (and thus probably the mean effect strength over many repeated experiments) is larger, the chance of replicated experiments also being statistically significant is higher.

In other words, these new results are very predictable and don't necessarily indicate that anything is wrong.
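To make the argument concrete, here is a quick Monte Carlo sketch (Python with NumPy/SciPy; the true effect size, per-group sample size, and significance cutoff are arbitrary assumptions of mine, not numbers from the article): simulate many labs running the same study, "publish" only the significant ones, then have an independent lab replicate each published result.

  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(0)
  true_d = 0.3      # assumed true standardized effect size
  n = 40            # assumed per-group sample size
  alpha = 0.05      # conventional significance cutoff

  published, replicated = [], []
  for _ in range(10_000):
      # Original study: it gets "published" only if it clears the significance bar.
      a, b = rng.normal(true_d, 1, n), rng.normal(0, 1, n)
      t, p = stats.ttest_ind(a, b)
      if p < alpha and t > 0:
          # Independent replication of the same true effect, with no filter applied.
          a2, b2 = rng.normal(true_d, 1, n), rng.normal(0, 1, n)
          published.append(a.mean() - b.mean())
          replicated.append(a2.mean() - b2.mean())

  print("true effect:             ", true_d)
  print("mean published effect:   ", round(np.mean(published), 3))   # inflated by the filter
  print("mean replication effect: ", round(np.mean(replicated), 3))  # back near the truth

The published studies overestimate the effect because significance selects for lucky draws; the replications, which face no such filter, land back near the true value.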




Yes, however:

My expectation is that this regression to the mean should not apply to strong effects, where by "strong" I mean: the effect is large enough that the significance threshold and publication criteria barely filter anything out. In that case, I would expect about half the replications to come back stronger.

The first result was a random sample, and the second result was a random sample. If there's no outside bias from publication cut-off, there should be a 50% chance that either is higher.
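As a sanity check of that 50% claim, here is the same kind of Monte Carlo sketch as above (assumed effect sizes, sample size, and cutoff, not the article's numbers): when the true effect is far above the significance cutoff, the publication filter removes almost nothing, so the replication is stronger roughly half the time; for a marginal effect, it is well below half.

  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(1)
  n, alpha = 40, 0.05   # assumed per-group sample size and significance cutoff

  def frac_replications_stronger(true_d, trials=20_000):
      stronger = total = 0
      for _ in range(trials):
          a, b = rng.normal(true_d, 1, n), rng.normal(0, 1, n)
          t, p = stats.ttest_ind(a, b)
          if p < alpha and t > 0:          # publication filter on the original study
              a2, b2 = rng.normal(true_d, 1, n), rng.normal(0, 1, n)
              stronger += (a2.mean() - b2.mean()) > (a.mean() - b.mean())
              total += 1
      return stronger / total

  print("strong effect (d = 1.0):", round(frac_replications_stronger(1.0), 3))  # ~0.5
  print("weak effect   (d = 0.3):", round(frac_replications_stronger(0.3), 3))  # well below 0.5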

It's concerning if the strong results consistently re-test weaker. That shows systematic bias.


For an example of this effect in physics, look at the Millikan oil drop experiment:

https://en.wikipedia.org/wiki/Oil_drop_experiment#Millikan.2...


Oh, the Millikan experiment. I had to do it as a four-hour lab experiment at university. It's impossible to gather enough data in that short a time, and your eyes practically fall out from staring into the microscope to measure the droplets' velocities. I can assure you this is the worst experiment one can do as a student.



