
  > We know that, occasionally, a test will generate a
  > false positive due to random chance - we can’t avoid that.
  > By convention we normally fix this probability at 5%. You
  > might have heard this called the significance probability
  > or p-value.

  > If we use a p-value cutoff of 5% we also expect to see 5
  > false positives.
Am I reading this incorrectly, or is the author describing p-values incorrectly?

A p-value is the chance a result at least as strong as the observed result would occur if the null hypothesis is true. You can't "fix" this probability at 5%. You can say "results with a p-value below 5% are good candidates for further testing". The fact that p-values of 0.05 and below are often considered significant in academia tells you nothing about the probability of a false positive occurring in an arbitrary test.
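To make that definition concrete, here is a minimal sketch in Python. The coin-flip setup (100 flips, 60 heads, a null hypothesis of a fair coin) and the function name `one_sided_p_value` are my own illustration and have nothing to do with the paper's data.

```python
from math import comb

def one_sided_p_value(n_flips: int, n_heads: int) -> float:
    """Chance of seeing at least n_heads heads in n_flips flips of a fair coin."""
    # Under the null, every sequence of flips is equally likely, so count the
    # sequences that are at least as extreme as the observed result.
    favourable = sum(comb(n_flips, k) for k in range(n_heads, n_flips + 1))
    return favourable / 2 ** n_flips

# Hypothetical observation: 60 heads out of 100 flips.
p = one_sided_p_value(100, 60)
print(f"p-value for 60 heads in 100 flips: {p:.4f}")
```

Note that the p-value is computed from the observed data after the fact. The 5% cutoff is a separate number that you pick before looking at anything.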



Author of the paper here. You're right, this is incorrect. I corrected this in the final copy, but an earlier draft seems to have been put on the website. There are a few other errors too. I am describing the 'significance level' here, not the 'p-value', as you say.


Is the corrected final version uploaded at the same URL? I'd like to distribute it to some colleagues.


Just to let you know it's been updated.


Yes, there's perhaps a small error, although it might be that he's rounded up in his favour.

In his described scenario there are 90 cases where the null hypothesis is true (he states as a premise: "10 out of our 100 variants will be truly effective").

So strictly, we expect to see 5% of 90 = an average of 4.5 false positives (he says 5 false positives).

[Edited to add: False positive rate is measured as a conditional probability https://en.wikipedia.org/wiki/False_positive#False_positive_...]
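The arithmetic above can be checked with a short sketch. It assumes that the p-value of a test is uniformly distributed on [0, 1] when the null hypothesis is true (which holds for a continuous test statistic), so each of the 90 null variants crosses the 5% cutoff with probability 0.05. The names, trial count, and seed are arbitrary choices for illustration.

```python
import random

ALPHA = 0.05      # significance level (the "p-value cutoff")
TRUE_NULLS = 90   # 100 variants, of which 10 are truly effective

# Expected number of false positives: only the true nulls can produce one.
expected_false_positives = ALPHA * TRUE_NULLS

def simulate_mean_false_positives(trials: int = 20_000, seed: int = 0) -> float:
    """Average false-positive count over many simulated batches of 90 null tests."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # Assumption: under the null, each p-value is a uniform draw on [0, 1].
        total += sum(rng.random() < ALPHA for _ in range(TRUE_NULLS))
    return total / trials

print(f"expected:  {expected_false_positives:.2f}")
print(f"simulated: {simulate_mean_false_positives():.2f}")
```

The simulated average should land close to 4.5 rather than 5, because the 10 truly effective variants cannot contribute false positives.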


I don't follow. Why would we expect 5% of those 90 cases to be false positives, and what relationship does the estimate of 5% have to p-value? I don't understand how p-value could ever be used to predict the number of false positives one would expect to observe in a bundle of arbitrary tests.


A p-value cutoff of 5% says that you have a 5% probability that you're wrong in rejecting the Null Hypothesis.

So if you test 100 times, you'd expect to wrongly reject the Null Hypothesis 5 times.


> A p-value cutoff of 5% says that you have a 5% probability that you're wrong in rejecting the Null Hypothesis.

I don't think this is right. A p-value cutoff of 0.05 doesn't, by itself, indicate anything about the underlying probability of incorrectly rejecting the null hypothesis. It tells you that, in a test that meets your cutoff, if the null hypothesis is true, the chance of seeing results as strong as or stronger than those observed in the test is 5% or less. But that can't tell you the chance you're wrong in rejecting the null hypothesis.

A 1% chance of seeing results as strong as your results if the null hypothesis is true does not mean that there's a 99% chance of the null hypothesis being false.
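A rough worked example of why the two numbers differ, using the paper's premise of 100 variants with 10 truly effective and a 5% cutoff. The 80% power figure (the chance a truly effective variant reaches significance) is my own assumption for illustration; it is not stated in the excerpt quoted here, and the function name is hypothetical.

```python
def false_discovery_proportion(n_tests: int, n_real_effects: int,
                               alpha: float, power: float) -> float:
    """Expected share of significant results for which the null is actually true."""
    true_nulls = n_tests - n_real_effects
    false_positives = alpha * true_nulls      # nulls that cross the cutoff by chance
    true_positives = power * n_real_effects   # real effects that are detected
    return false_positives / (false_positives + true_positives)

# Paper's premise plus an assumed power of 0.80.
share = false_discovery_proportion(n_tests=100, n_real_effects=10,
                                   alpha=0.05, power=0.80)
print(f"share of significant results that are false positives: {share:.0%}")
```

Under these assumptions, roughly a third of the significant results are false positives even though the cutoff is 5%. The chance of being wrong when you reject the null depends on how many real effects exist and on the power of the test, not only on the cutoff.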

Regardless, even putting this disagreement to one side, I still don't see how the original author's point makes sense. He or she seems to be using the cutoff as an indication of the underlying false-positive probability for prospective tests, regardless of whether the results of those tests meet the cutoff or not.


gabemart, you're right; a_bonobo, ronaldx, you guys are wrong. p-values are commonly misunderstood to mean that the result has a "5% chance of being wrong". That's not what a p-value is. Please go ahead and read the 'misunderstandings' section on p-values in Wikipedia.


sigh, no. When the author says "p-value cutoff", this refers to the significance level. I interpreted this correctly.



