
> Let me try again:

> p=0.05 means that one POSITIVE result in 20 is going to be the result of chance and not causality

No, you still didn't get it. In the example above, a full 100% of positive results, 20 out of every 20, are the result of chance and not causality.

Your follow-up discussion is better, but your statement at the top doesn't work.

(Note also that there is an interaction between p-threshold and sample size which guarantees that, if you're investigating an effect that your sample size is not large enough to detect, any statistically significant results you get will be several times stronger than the actual effect. They're also quite likely to have the wrong sign.)
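A quick simulation makes that parenthetical concrete. Here's a rough sketch in Python; the effect size, noise level, and sample size are made up, chosen so that the study is badly underpowered:

    # Underpowered study: results that clear p < 0.05 exaggerate the true
    # effect (a "Type M" error) and occasionally have the wrong sign
    # (a "Type S" error). All numbers are made up for illustration.
    import random
    import statistics

    true_effect, sigma, n, trials = 0.1, 1.0, 20, 20000
    norm = statistics.NormalDist()
    significant = []
    for _ in range(trials):
        sample = [random.gauss(true_effect, sigma) for _ in range(n)]
        mean = statistics.fmean(sample)
        z = mean / (sigma / n ** 0.5)
        p = 2 * (1 - norm.cdf(abs(z)))   # two-sided z-test against mean 0
        if p < 0.05:
            significant.append(mean)

    exaggeration = statistics.fmean(abs(m) for m in significant) / true_effect
    wrong_sign = sum(m < 0 for m in significant) / len(significant)
    print(f"significant: {len(significant)} of {trials} experiments")
    print(f"average exaggeration of the true effect: {exaggeration:.1f}x")
    print(f"significant results with the wrong sign: {wrong_sign:.1%}")

With these particular numbers only about 7% of runs reach significance, the ones that do overstate the true effect by roughly 5x, and about one in ten points the wrong way.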




> No, you still didn't get it. In the example above, a full 100% of positive results, 20 out of every 20, are the result of chance and not causality.

Yep, you're right. I do think I understand this, but rendering it into words is turning out to be surprisingly challenging.

Let me try this one more time: p=0.05 means that there is a 5% chance that any one particular positive result is due to chance. If you test a false hypothesis repeatedly, or test multiple false hypotheses, then 5% of the time you will get false positives (at p=0.05).

However...

> Imagine a hypothetical scientist that is fundamentally confused about something important, so all hypotheses they generate are false. Yet, using p=0.05, 5% of those hypotheses will be "confirmed experimentally". In that case, it is not 5% of the "experimentally confirmed" hypotheses that are wrong -- it is full 100%.

This is not wrong, but it's a little misleading because you are presuming that all of the hypotheses being tested are false. If we're testing a hypothesis it's generally because we don't know whether or not it's true; we're trying to find out. That's why it's important to think of a positive result not as "confirmed experimentally" but rather as "not ruled out by this particular experimental result". It is only after failing to rule something out by multiple experiments that we can start to call it "confirmed". And nothing is ever 100% confirmed -- at best it is "not ruled out by the evidence so far".
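The quoted scenario is easy to simulate. A minimal sketch (sample size and number of experiments are arbitrary): every hypothesis tested is false, yet roughly 5% of experiments come out "significant", and every one of those positives is wrong.

    # A "scientist" whose hypotheses are all false: the null is always true,
    # yet about 5% of experiments clear p < 0.05, and 100% of those
    # positives are false, because there was never any real effect.
    import random
    import statistics

    n, trials = 30, 10000
    norm = statistics.NormalDist()
    positives = 0
    for _ in range(trials):
        sample = [random.gauss(0.0, 1.0) for _ in range(n)]   # no real effect
        z = statistics.fmean(sample) / (1.0 / n ** 0.5)
        if 2 * (1 - norm.cdf(abs(z))) < 0.05:
            positives += 1
    print(f"'significant': {positives} of {trials} ({positives / trials:.1%}), "
          f"all of them false positives")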


> I do think I understand this, but rendering it into words is turning out to be surprisingly challenging.

A p-value of .05 means that, under the assumption that the null hypothesis you specified is true, you just observed a result which lies in the most extreme 5% of the outcome space, sorted along some metric (usually "extremity of outcome"). That is to say, out of all possible outcomes, only 5% of them are as "extreme" as, or more "extreme" than, the outcome you observed.

It doesn't tell you anything about the odds that any result is due to chance. It tells you how often the null hypothesis gives you a result that is "similar", by some definition, to the result you observed.
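One way to make that operational is to compute a p-value by brute force: simulate the null hypothesis many times and count how often it produces something at least as "extreme" as the observed result. A sketch, with the null being a fair coin and a made-up observation of 61 heads in 100 flips:

    # p-value as "how often the null produces data at least as extreme as ours".
    # Null hypothesis: fair coin. Observed result (made up): 61 heads in 100 flips.
    import random

    flips, observed, trials = 100, 61, 100_000
    extreme = 0
    for _ in range(trials):
        heads = sum(random.random() < 0.5 for _ in range(flips))
        if abs(heads - 50) >= abs(observed - 50):   # two-sided "extremity"
            extreme += 1
    print(f"simulated p-value: {extreme / trials:.3f}")   # comes out near 0.035

The number is the frequency with which a fair coin looks at least this lopsided. It is not the probability that the coin is fair.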


What do you think that "due to chance" means?


That is a very reasonable question, and in this context we might reasonably say that "this [individual] outcome is due to chance" means the same thing as "the null hypothesis we stated in our introduction is platonically correct".

But I don't really see the relevance to this discussion?

Suppose you nail down a null hypothesis, define a similarity metric for data, run an experiment, and get some data. The p-value you calculate theoretically tells you this:

If the above-mentioned hypothesis is true, then X% of all data looks like your data

It doesn't tell you this:

If you have data that looks like your data, then there is an X% chance that the above-mentioned hypothesis is true

Those are two unrelated claims; one is not informative -- at all -- as to the other. The direction of implication is reversed between them.

Imagine that you're considering three hypotheses. You collect your data and make this calculation:

1. Hypothesis A says that data looks like what I collected 20% of the time.

2. Hypothesis B says that data looks like what I collected 45% of the time.

3. Hypothesis C says that data looks like what I collected 100% of the time.

Based only on this information, what are the odds that hypothesis A is correct? What are the odds that hypothesis C is correct? What are the odds that none of the three is correct?
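For what it's worth, the reason those questions can't be answered from the likelihoods alone is that the answer depends on how plausible each hypothesis was before the data came in. A sketch of the Bayes arithmetic, using the likelihoods above plus priors I'm inventing for illustration, and pretending A, B and C are the only candidates (which the question deliberately doesn't grant):

    # The 20% / 45% / 100% figures are likelihoods, P(data | hypothesis).
    # Turning them into P(hypothesis | data) requires priors, which are
    # assumed here purely for illustration.
    likelihood = {"A": 0.20, "B": 0.45, "C": 1.00}

    def posteriors(prior):
        # Bayes' rule, pretending A, B and C exhaust the possibilities.
        evidence = sum(prior[h] * likelihood[h] for h in likelihood)
        return {h: round(prior[h] * likelihood[h] / evidence, 2) for h in likelihood}

    print(posteriors({"A": 1/3, "B": 1/3, "C": 1/3}))
    # equal priors -> C wins: {'A': 0.12, 'B': 0.27, 'C': 0.61}
    print(posteriors({"A": 0.495, "B": 0.495, "C": 0.01}))
    # C judged wildly implausible up front -> B wins: {'A': 0.3, 'B': 0.67, 'C': 0.03}

Same likelihoods, opposite verdicts; the data alone doesn't settle it.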


This is getting deep into the weeds of the philosophy of science. It is crucially important to choose good hypotheses to test. For example:

> Hypothesis C says that data looks like what I collected 100% of the time.

What this tells you depends entirely on what hypothesis C actually is. For example, if C is "There is an invisible pink unicorn in the room, but everyone will deny seeing it because it's invisible" then you learn nothing by observing that everyone denies seeing the unicorn despite the fact that this is exactly what the theory predicts.

On the other hand, if C is a tweak to the Standard Model or GR that explains observations currently attributed to dark matter, that would be a very different situation.


> It is crucially important to choose good hypotheses to test.

But if you were able to do that, you wouldn't need to test the hypotheses. You'd already know they were good.

I'm intrigued as to why you picked those two examples. They differ in aesthetics without differing in implications, but you seem to want to highlight them as being different in an important way!


Seriously? You don't see any substantive difference between an explanation of dark matter and positing invisible pink unicorns? How do I even begin to respond to that?

Well, let's start with the obvious: there is actual evidence for the existence of dark matter -- that's the entire reason that dark matter is discussed at all. There is no evidence for the existence of invisible pink unicorns. Not only is there no evidence for IPUs, the IPU hypothesis is specifically designed so that there cannot possibly be any. The IPU hypothesis is unfalsifiable by design. That's the whole point.


If the invisible pink unicorn hypothesis was true, what about the world would be different?

If the MOND hypothesis was true, what about the world would be different?

The whole reason we have a constant supply of theories attempting to explain observations currently attributed to dark matter in terms other than "dark matter" is that people feel the dark matter theory is stupid. There's nothing else to it. I assume you feel the same way about unicorns. What's the difference supposed to be?

> There is no evidence for the existence of invisible pink unicorns.

You need to be careful here too. The fact that a theory is false does not mean there is no evidence for that theory.


> The whole reason we have a constant supply of theories attempting to explain observations currently attributed to dark matter in terms other than "dark matter" is that people feel the dark matter theory is stupid.

No, that's not true. The reason we have a "constant supply" of dark matter theories is that all of the extant theories have been falsified by observations, including MOND. If this were not the case, dark matter would be a solved problem and would no longer be in the news.

> The fact that a theory is false does not mean there is no evidence for that theory.

What makes you think the IPU theory is false? The whole point of the IPU hypothesis is that it is unfalsifiable.


You can't simply ignore the base rate, even if you don't know it.

In a purely random world, 5% of experiments are false positives, at p=0.05. None are true positives.

In a well ordered world with brilliant hypotheses, there are no false positives.

If more than 5% of experiments show positive results at p=0.05, some of them are probably true, so you can try to replicate them with lower p.

p=0.05 is a filter for "worth trying to replicate" (but even that is modulated by cost of replication vs value of result).

The crisis in science is largely that people confuse "publishable" with "probably true". Anything "probably better than random guessing" is publishable to help other researchers, but that doesn't mean it's probably true.
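To put rough numbers on how much the unknown base rate matters here (all of these are made up):

    # Positive predictive value under assumed numbers. With an honest p < 0.05
    # cutoff, the share of positives that are *true* is driven by the base rate
    # of true hypotheses being tested, not by the 5% threshold itself.
    alpha = 0.05   # chance of a positive when the hypothesis is wrong
    power = 0.80   # chance of a positive when the hypothesis is right (assumed)

    for base_rate in (0.0, 0.01, 0.1, 0.5):
        true_pos = base_rate * power
        false_pos = (1 - base_rate) * alpha
        positives = true_pos + false_pos
        ppv = true_pos / positives
        print(f"base rate {base_rate:4.0%}: {positives:5.1%} positive, "
              f"{ppv:5.1%} of positives true")

At a base rate of zero this is the "purely random world" above (5% positives, none true); at a 10% base rate about 12.5% of experiments come out positive and roughly two thirds of those positives are real -- which is the sense in which "more than 5% positive results" suggests some of them are probably true.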


> p=0.05 is a filter for "worth trying to replicate"

Yes, I think that is an excellent way to put it.

> The crisis in science is largely that people confuse "publishable" with "probably true".

I would put it slightly differently: people conflate "published in a top-tier peer-reviewed journal" with "true beyond reasonable dispute". They also conflate "not published in a top-tier peer-reviewed journal" with "almost certainly false."

But I think we're in substantial agreement here.


Do you know the difference between "if A then B" and "if B then A"?

This is the same thing, but with probabilities: "if A, then 5% chance of B" and "if B, then 5% chance of A". Those are two very different things.

p=0.05 means "if hypothesis is wrong, then 5% chance of published research". It does not mean "if published research, then 5% chance of wrong hypothesis"; but most people believe it does, including probably most scientists.


> if hypothesis is wrong, then 5% chance of published research

I would say "5% chance of positive result nonetheless" but yes, I do get this. I'm just having inordinate trouble rendering it into words.



