
> [...] but it's quite possible to hit real correlations with a pile of plausible hypotheses.

Yes, of course. But the trouble is that if you go on this p-hacking expedition, you are all but guaranteed to find such correlations in pure noise. So if you use a procedure that will find something in noise, you cannot also use it to claim to have found something in your data.
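To make that concrete, here is a rough sketch of the point, not anyone's actual analysis. It correlates many pure-noise predictors with a pure-noise outcome and counts how many clear a nominal 5% threshold. The function names, the sample sizes, and the Fisher z approximation used for the significance test are all my own assumptions for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def count_noise_hits(n_hypotheses=200, n_samples=100, z_crit=1.96, seed=0):
    """Count 'significant' correlations when every variable is pure noise.

    Significance uses the Fisher z approximation (an assumption of this
    sketch): atanh(r) * sqrt(n - 3) is roughly standard normal under the null.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
    hits = 0
    for _ in range(n_hypotheses):
        predictor = [rng.gauss(0, 1) for _ in range(n_samples)]
        r = pearson_r(predictor, outcome)
        z = math.atanh(r) * math.sqrt(n_samples - 3)
        if abs(z) > z_crit:
            hits += 1
    return hits


def family_wise_error(n_hypotheses, alpha=0.05):
    """Chance of at least one false positive across independent null tests."""
    return 1 - (1 - alpha) ** n_hypotheses


if __name__ == "__main__":
    print("noise hits out of 200:", count_noise_hits())
    print("P(at least one hit in 20 tests):", round(family_wise_error(20), 3))
```

With 200 hypotheses you should expect on the order of ten "discoveries" with no signal in the data at all, and with only 20 independent tests the chance of at least one is already about 64%.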

In the words of statistics philosopher Deborah Mayo - "A conjecture passes a test only if a refutation would probably have occurred if it's false". In this case, no refutation would have occurred even if the correlation were false. Hence - the result is equivalent as if no test has actually been performed.

Or, a more simplistic example, imagine if someone observes an asteroid and says "it might be aliens". Some astrophysicists then explain that every observed property of the object is exactly what we would expect of an asteroid. But the person might then reply: "yeah, but it still might have been aliens".

I feel that the same is true for "yeah, but the correlation might still be true".



> In the words of statistics philosopher Deborah Mayo - "A conjecture passes a test only if a refutation would probably have occurred if it's false".

Sure, one weak result out of many doesn't pass. But not passing is a far cry from "almost guaranteed" to be spurious.

> Hence - the result is equivalent as if no test has actually been performed.

A result like that takes a big list of plausible correlations and distills it down. If you think even a handful of the original list items are likely to have merit, then the distilled list is useful for suggesting where you should collect more data.

> Or, a more simplistic example, imagine if someone observes an asteroid and says "it might be aliens".

What fraction of asteroids do you expect to be aliens?

If it's one in a billion, then cutting the list by a factor of 20 is useless. If it's one in a hundred, then cutting the list by a factor of 20 is very helpful.
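Here is the back-of-the-envelope version of that arithmetic, as a sketch under stated assumptions. A filter that cuts the list by a factor of 20 is treated as a test with a 5% false positive rate. The 80% detection rate for real effects (`power`) is a number I am assuming purely for illustration, and the function name is made up.

```python
def fraction_real_after_filter(prior, alpha=0.05, power=0.8):
    """Share of surviving list items that are real, by Bayes' rule.

    prior: fraction of the original list that is real
    alpha: fraction of false items that survive the filter (1 in 20)
    power: fraction of real items that survive (assumed, not from the thread)
    """
    true_hits = power * prior
    false_hits = alpha * (1 - prior)
    return true_hits / (true_hits + false_hits)


if __name__ == "__main__":
    for prior in (1e-9, 1e-2):
        after = fraction_real_after_filter(prior)
        print(f"prior {prior:g} -> after filtering {after:.3g}")
```

Under those assumptions a one-in-a-billion prior stays hopeless after filtering (roughly 1.6 in a hundred million), while a one-in-a-hundred prior rises to roughly 14%, which is enough to tell you where to collect more data.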

> I feel that the same is true for "yeah, but the correlation might still be true".

It depends on the original list being sufficiently plausible. You can't distill tap water into vodka.



