
Not Even Scientists Can Easily Explain P-values - ryan_j_naughton
http://fivethirtyeight.com/features/not-even-scientists-can-easily-explain-p-values/
======
elbigbad
P-values can be pretty unintuitive to people unfamiliar with them, but I think
they can readily be explained.

I do think that this book did an excellent job of explaining it:

What Is a P-Value Anyway?
[http://www.pearsonhighered.com/vickers/](http://www.pearsonhighered.com/vickers/)

Excerpt (starts at p.57 in the link above):

[I]f you do nothing else, please try to remember the following sentence: “the
p-value is the probability that the data would be at least as extreme as those
observed, if the null hypothesis were true.” Though I’d prefer that you also
understood it—about which, teeth brushing.

I have three young children. In the evening, before we get to bedtime stories
(bedtime stories being a nice way to end the day), we have to persuade them
all to bathe, use the toilet, clean their teeth, change into pajamas, get
their clothes ready for the next day and then actually get into bed (the
persuading part being a nice way to go crazy). My five-year-old can often be
found sitting on his bed, fully dressed, claiming to have clean teeth. The
give-away is the bone dry toothbrush: he says that he has brushed his teeth, I
tell him that he couldn’t have.

My reasoning here goes like this: the toothbrush is dry; it is unlikely that
the toothbrush would be dry if my son had cleaned his teeth; therefore he
hasn’t cleaned his teeth. Or using statistician-speak: here are the data (a
dry toothbrush); here is a hypothesis (my son has cleaned his teeth); the data
would be unusual if the hypothesis were true, therefore we should reject the
hypothesis.

[...]

So here is what to parrot when we run into each other at a bar and I still
haven’t managed to work out any new party tricks: “The p-value is the
probability that the data would be at least as extreme as those observed, if
the null hypothesis were true.” When I recover from shock, you can explain it
to me in terms of a toothbrush (“The probability of the toothbrush being dry
if you’ve just cleaned your teeth”).
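
In code, the toothbrush version of that definition might look like this toy
sketch (the probabilities are invented):

    # The "p-value" of the observed data (a bone dry toothbrush) under the
    # hypothesis that my son really did brush his teeth:
    p_dry_given_brushed = 0.01  # a just-used toothbrush is almost never dry
    alpha = 0.05                # cutoff for rejecting the hypothesis
    if p_dry_given_brushed < alpha:
        print("reject the hypothesis: he didn't brush his teeth")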

~~~
sbov
This is way too much for the layman and full of mathematical language. What is
a null hypothesis? What is "at least as extreme as"? What is probability?
Never mind that the null hypothesis forces us to think in double negatives,
which isn't the most intuitive.

It also isn't clear to me how you would apply your example given how most
p-values are used. I read an article that says exercising makes you richer
with 95% confidence. So, as a layman, if I attempt to apply your example, it
leads me directly into a common p-value mistake: the probability of me
becoming richer if I exercise. 95%, right? Wrong!

~~~
ewzimm
I'm having a hard time understanding why p-values are so hard to explain
without resorting to jargon... it seems very intuitive to me, and none of
these examples really put it in context.

I'd explain it like this:

It's easy to lie with statistics if you take them out of context. Maybe I have
an idea that drinking fluoridated water causes cancer, so I take a survey and
find that 40% of people who drink fluoridated water develop cancer. That
number is meaningless without knowing how likely it is for people who don't
drink it to develop cancer, which is about 40%, so the numbers match up 1:1,
completely normal. In this case, 1 is the p-value, and it gives you a starting
point for developing an experiment.

~~~
nonbel
>"That number is meaningless without knowing how likely it is for people who
don't drink it to develop cancer, which is about 40%. That's the p-value, and
it gives you a starting point for developing an experiment."

That is the frequency of cancer given not drinking fluoridated water. That is
not a p-value.

~~~
ewzimm
Sorry, I saw how that could be misleading phrased that way, so I clarified
right away... not fast enough, I see. I mean that comparing the observed data
to a baseline gives you a p-value.

~~~
nonbel
This example is still misleading because you have selected a special case
where the frequencies are exactly equal. In any other case the p-value will
depend on the sample size.

~~~
ewzimm
True, but what I'm trying to convey is that if someone just wants to generally
understand what a p-value is from no knowledge, understanding how to read a
specific number isn't important. At first, I wasn't even going to include a
specific number, but I realized not having one would be confusing in its own
way.

Looking at a specific p-value and understanding its significance is only
useful once they're actually looking at a study. But first they just want to
know how it's derived and what it's good for.

------
peatmoss
Lest anyone skimming the headline miss it, the point here isn't that these
scientists couldn't correctly explain a p-value. The scientists here all
could. It's that they couldn't explain it in a way that made intuitive sense
to anyone. So, rather than being a dig at scientists, I think the points here
are:

1. p-values are intrinsically none-too-intuitive.

2. What p-values tell us isn't necessarily that interesting.

~~~
jsprogrammer
> 1. p-values are intrinsically none-too-intuitive

P-values are quite intuitive. They are the ratio of observations made which
agree with the null hypothesis to the total number of observations made.

Edit: If you are going to downmod, at least point out the error you perceive.

~~~
evanpw
It doesn't make sense to say in a binary sense that an observation agrees or
disagrees with the null hypothesis; there are only probabilities. The p-value
is the probability of seeing a set of values as extreme as your actual
observations, assuming that the null hypothesis is true.

~~~
TeMPOraL
FWIW, 'jsprogrammer is smuggling in the basic definition of probability here -
that P(event) = (number of observations in which an event happens) / (total
number of observations).

~~~
germanier
That is only the case if you are working with a discrete uniform
distribution, which you usually aren't.

~~~
jsprogrammer
The axioms are only defined on discrete uniform distributions. All other
distributions must be constructed from the discrete uniform.

~~~
evanpw
This is not true. If you have a discrete uniform distribution, you can only
have finitely many elements in your probability space, so it wouldn't make
sense to have an axiom which refers to countably many disjoint sets. The
axioms are most useful for continuous distributions, and are closely related
to measure theory.

~~~
jsprogrammer
The discrete, uniform nature is obvious in the rule of computation:

Σ P(E_i)

Each "unit" of the computation is weighted equally (though, the value of the
units may differ).

~~~
evanpw
I'll give you the benefit of the doubt and assume that you aren't deliberately
trolling me, but you do seem to be deeply confused. I'll just give you a link
to some lecture notes that go all the way from the axioms of probability
(lecture 1) to p-values (lecture 12) and leave it at that:
[http://www.win.tue.nl/~rmcastro/2DI90/index.php?page=lecture...](http://www.win.tue.nl/~rmcastro/2DI90/index.php?page=lectures).

~~~
jsprogrammer
Thanks for the massive set of pages. I already went through a similar course a
long time ago.

You'll note that the notes don't derive the calculation of the p-value. They
merely give a trivial example (curiously, the sum of two probabilities) with a
fiat interpretation.

It's curious that my last post, merely quoting the third axiom and showing
its properties plainly, makes you think that I am deeply confused, when you
cannot even refute trivial points and must resort to arguments from authority
(poor ones, in that they do not address our issue here).

~~~
germanier
I assume you refer to page 28 in the first slide deck? If you had read
closely, you would have noticed that it says:

> We will restrict ourselves to discrete sample spaces for now, to avoid some
> technical difficulties…

In the later lectures they take a look at sample spaces that are uncountably
infinite. Good luck working with a sum there.

~~~
jsprogrammer
No. I am referring to lecture 12 (i.e., Ch. 9).

What reformulation of the probability axioms are you using to eliminate the
sum in the third axiom?

The axioms may be trivially extended to bounded intervals (a la calculus).

------
goodside
My go-to phrase: "Even if there were no real difference, we would see a result
like this X% of the time anyway."
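
To make that concrete, here is a minimal simulation sketch (my own numbers,
not from the article): generate two groups with no real difference between
them and count how often a t-test reports p < 0.05 anyway.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    trials, false_alarms = 10_000, 0
    for _ in range(trials):
        a = rng.normal(0, 1, 50)  # both groups come from the same
        b = rng.normal(0, 1, 50)  # distribution: no real difference
        if stats.ttest_ind(a, b).pvalue < 0.05:
            false_alarms += 1
    print(false_alarms / trials)  # ~0.05: a result like this 5% of the time anyway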

~~~
llasram
I agree that makes intuitive sense when applied to explicitly constructed
experiments we could attempt to exactly repeat, but the intuition begins to
break down in more complicated circumstances, such as retrospective or
longitudinal studies. Saying something would happen "X% of the time anyway"
becomes a philosophical issue.

~~~
omginternets
Yes, well, we could talk about frequentist vs. Bayesian statistics, but
that's hardly the point of the article.

That _is_ what a p-value means, feasibility aside. There are also more
intuitive ways to think about it, "false-positive rate" being one.

------
jforman
Having been both a scientist and, now, a product manager for a medical device,
I think science would benefit from the work around verification vs.
validation:

[https://en.wikipedia.org/wiki/Verification_and_validation](https://en.wikipedia.org/wiki/Verification_and_validation)

A p-value is a verification tool and nothing more, and yet scientists all too
often take the "hunt for the p-value" as the only goal. Rather than fish
through data for a p-value that ends up defining the scientific narrative in a
paper, p-values need to be integrated into a much larger validation process
driven by pre-defined scientific goals. It's no wonder that so many results
are found to be specious given today's scientific culture.

------
blahblah3
The point of p-values is to limit false positive rates.

If you only accept results with p-value <= 0.05, your false positive rate is
at most 5%. Of course, if you had a priori knowledge that everything was a
"null result", then 100% of the results you accept would be false positives.

By Bayes' theorem, P(H|D) = P(D|H) * P(H) / P(D). Therefore, given a fixed
"prior probability" of a hypothesis P(H), a lower p-value (P(D|H)) implies a
lower probability of the null being true. I think this relationship leads to a
lot of confusion when thinking of p-values.
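
To illustrate that last point, a rough sketch (the prior and the power here
are invented for illustration): even at p < 0.05, the share of accepted
results that are false positives depends heavily on how many of the tested
hypotheses were null to begin with.

    p_null = 0.9  # assumed prior: 90% of tested hypotheses are actually null
    alpha = 0.05  # significance cutoff
    power = 0.8   # assumed chance of detecting a real effect

    p_significant = p_null * alpha + (1 - p_null) * power
    false_positive_share = p_null * alpha / p_significant
    print(false_positive_share)  # ~0.36, far above the naive 5%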

~~~
jcranmer
You are incorrect. False positive rates in practice appear to be about 20-50%,
even with a p-value of 0.05.

~~~
nonbel
The best estimate right now is 20-50% replication rate for fields that rely on
p less than 0.05 to judge results. So I think you meant 50-80% false positive
rate, although these are not equivalent (i.e., is the null hypothesis of no
difference ever really exactly true...):

>"Ninety-seven percent of original studies had statistically significant
results. Thirty-six percent of replications had statistically significant
results; 47% of original effect sizes were in the 95% confidence interval of
the replication effect size; 39% of effects were subjectively rated to have
replicated the original result; [...] In cell biology, two industrial
laboratories reported success replicating the original results of landmark
findings in only 11 and 25% of the attempted cases"
[http://www.ncbi.nlm.nih.gov/pubmed/26315443](http://www.ncbi.nlm.nih.gov/pubmed/26315443)

Edit:

For comparison, Dr. Oz was accused of fraud and called in front of congress
for only being wrong half the time:
[https://www.washingtonpost.com/news/morning-mix/wp/2014/12/1...](https://www.washingtonpost.com/news/morning-mix/wp/2014/12/19/half-of-dr-ozs-medical-advice-is-baseless-or-wrong-study-says/)

------
Jach
P-values are only hard to explain when you try to explain in English. Stick to
the math, where it's just P(X >= x | Ho) and once you understand the math it's
immediately clear in a way an English sentence (or paragraph trying to equate
to the math without math) isn't. My go-to presentation on this is:
[http://www.biostat.jhsph.edu/~cfrangak/cominte/goodmanvalues...](http://www.biostat.jhsph.edu/~cfrangak/cominte/goodmanvalues.pdf)

------
data_hope
In my experience, people in science who are not statisticians have not
completely understood the p-value. Typically people are clueless about
p-values and statistical power and how they relate.
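
A rough simulation sketch of the relationship (effect size and sample sizes
are made up): power is just the fraction of repeated experiments on a real
effect that reach p < 0.05, and it grows with sample size.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    def power(n, effect=0.5, trials=5_000):
        hits = 0
        for _ in range(trials):
            a = rng.normal(0, 1, n)       # control group
            b = rng.normal(effect, 1, n)  # real effect of half a standard deviation
            if stats.ttest_ind(a, b).pvalue < 0.05:
                hits += 1
        return hits / trials

    print(power(20), power(100))  # roughly 0.33 vs 0.94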

~~~
MollyR
This is scarily true. When my lab (bioinformatics) would work/collaborate
with research biologists and their labs, the research biologists' lack of
statistics training would become very clear. Especially when we did data
mining on their qPCR experiments, we constantly had to explain things like
Bonferroni corrections.
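
For readers who haven't met the term, a minimal sketch of the Bonferroni
correction (the p-values are hypothetical): when running m tests at once,
compare each p-value against alpha / m rather than alpha, so the chance of at
least one false positive across all m tests stays below alpha.

    alpha = 0.05
    p_values = [0.001, 0.02, 0.04, 0.3]  # hypothetical results from m = 4 tests
    m = len(p_values)
    print([p < alpha / m for p in p_values])  # [True, False, False, False]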

Edit: This is just my experience at one university with 6-7 biological
research labs. I regret using the generalization, apologies.

------
btilly
My belief is that p-values are popular exactly BECAUSE they can be
misunderstood so easily as the answer to the question people really want
answered. Namely, "What are the odds that this is the right answer?"

I have many years of experience in helping people understand A/B testing. And
I have found that no matter how many times you give a correct explanation,
people will have this exact misunderstanding. Repeatedly.

So if you're A/B testing, here is something to consider. If you successfully
set up a testing culture where you decide at a 95% confidence level, and are
running one A/B test per week, you will make an average of 2-3 mistakes per
year (52 tests at a 5% error rate is about 2.6 bad calls). In the long run,
mistakes are unavoidable. You will make mistakes. The only question is how
often you make mistakes, and how bad those mistakes are.

P-values manage to bound how often you make mistakes with the trick of most of
the time saying you couldn't figure out an answer. THIS IS USELESS FOR A
BUSINESS. A business needs to decide what text to use, even if they're not
confident.

Your A/B tests should always produce a usable business answer. The question of
interest is how often you make _bad_ mistakes and _how bad_ they are. And
p-values are useless for that.

------
nonbel
>"I’ve come to think that the most fundamental problem with p-values is that
no one can really say what they are."

Not surprising. The correct interpretation of p-values was not discovered
until ~2 years ago. If you search around the internet you will find out that,
according to the discoverer, he couldn't find a stats journal to publish it.
The number of people who know the correct definition must be vanishingly
small.

Thank god for arxiv, or I would have never understood p-values:
[http://arxiv.org/abs/1311.0081](http://arxiv.org/abs/1311.0081)

tldr: Calculating the p-value + sample size is a lossless compression
algorithm. The thing being compressed is a likelihood function (way of
describing effect size).

~~~
kgwgk
I think you overestimate the importance of that paper. You can get "a"
likelihood from the p-value in the same way that a sufficient statistic
determines "the" likelihood function. In the best case, if the sufficient
statistic is one-dimensional and you are doing a one-tailed test, both
likelihood functions are the same (because the mapping from the sufficient
statistic to the p-value is bijective). In general you're losing information
(there will be multiple values of the sufficient statistic mapped to the same
p-value).

------
jroitgrund
The coin example makes perfect sense to me and I had no idea what a p-value
was before I read this article.

Here's an intuitive explanation: a p-value is the probability of getting your
experimental results given that your hypothesis is wrong.

~~~
Estragon
> a p-value is the probability of getting your experimental results given that
> your hypothesis is wrong.

That's a common misconception. Actually it's the probability of getting your
experimental results given that your _null_ hypothesis is _right_.

~~~
im2w1l
Well normally, hypothesis is wrong <-> null hypothesis is right.

~~~
iamsohungry
No, if the null hypothesis is right that implies that the hypothesis is wrong,
but the implication relationship _does not_ go the other direction.

Consider an experiment where your hypothesis is that cold temperatures cause
the common cold. This is a good example for a thought experiment because we
"know the answer" in a way (there have been a lot of experiments on this). The
null hypothesis in this case is that cold temperatures are uncorrelated with
incidence of the common cold.

You place people in isolation in cold areas and a control group in warm
areas, and study how many get the common cold. None of the people who didn't
already have colds get the common cold: because they are in isolation and the
common cold is caused by rhinoviruses (which they can't get because they are
in isolation), you get exactly the same results in both groups.

This disproves the hypothesis, but it _does not_ prove the null hypothesis,
that cold temperatures and the common cold are unrelated. Cold temperatures
are, in fact, related to the common cold.

Try a second experiment: you place people in groups of five in cold areas and
in warm areas, and discover a moderately high correlation between cold
temperature and incidence of the common cold. This disproves the null
hypothesis. But the simple hypothesis that cold temperature causes the common
cold has also been disproven by your first experiment.

The reason for this is that the correlation between cold temperature and the
common cold is a dependent correlation: _given that rhinovirus is present in
the system_ cold temperature is correlated with incidence of common cold
(rhinoviruses reproduce ideally at temperatures significantly lower than human
homeostatic temperature).

The null hypothesis is not just a statement that there is no independent
correlation, it's a statement that there is no independent _or_ dependent
correlation. As such, the null hypothesis is an extremely broad hypothesis
which is impossible to practically prove. This is why there's such a focus on
finding correlations rather than finding non-correlations: you aren't going to
prove the null hypothesis.

------
omginternets
From my limited teaching experience, renaming "p-value" to "probability of a
false positive" makes it all very simple.

~~~
nonbel
This is a very common, but dangerously wrong, interpretation of p-values.
Others in this thread have expressed the same misconception, see the paper
linked here:
[https://news.ycombinator.com/item?id=10628860](https://news.ycombinator.com/item?id=10628860)

~~~
omginternets
Duly noted, thanks for the reference!

------
TallGuyShort
This is an excellent talk I attended a while ago that explains it very well
from a programmer's perspective:
[https://www.youtube.com/watch?v=5Dnw46eC-0o&list=PL055Epbe6d...](https://www.youtube.com/watch?v=5Dnw46eC-0o&list=PL055Epbe6d5Y8_iZPo7pH3hOnAtchMCJt&index=8)

------
graycat
How about: The p-value is, in a statistical hypothesis test, the probability
of Type I error, that is the probability of rejecting the null hypothesis when
it is true.

Or, suppose we have some data and want to use it to estimate the value of some
number b. It can be that if we can assume something about b, e.g., that b = 0,
then we can calculate the probability distribution of our estimate of b. This
assumption about b is the _null hypothesis_. Intuitively it is a hypothesis
that there is _no effect_, that what we thought might have happened didn't
happen, that the effect was _null_.

We can make two mistakes. We can reject the null hypothesis when it is true --
this is called _Type I_ error. Or we can accept the null hypothesis when it is
false -- this is called _Type II_ error.

With our null hypothesis, we can derive the distribution of our estimate of b
and see where our actual estimate falls in that distribution. The p-value is
the probability, from the distribution of our estimate, of getting an estimate
as far or farther from our null hypothesis value for b as we did. So, if the
p-value is really small, say, 1%, then we can reject the null hypothesis, that
is, say that it is false, and be wrong only 1% of the time.

E.g., if our null hypothesis is that b = 0 and our estimate of b is 10, and
from the distribution of our estimate the probability of the estimate being as
far as 10 from b = 0 is 1%, then we reject that b = 0, conclude that b is not
zero, and are wrong only 1% of the time. If the probability of our estimate
being greater than or equal to 10 is 1% and we reject the null hypothesis,
then we conclude that b > 0.
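
A sketch of that numerical example (the standard error is chosen just to make
the numbers work out): if under the null b = 0 our estimate is approximately
normal with standard error 3.9, then an estimate of 10 is about 2.56 standard
errors out, giving a two-sided p-value of about 1%.

    from scipy import stats

    b_hat = 10.0
    se = 3.9                       # assumed standard error of the estimate
    z = (b_hat - 0.0) / se         # distance from the null value, in standard errors
    p = 2 * stats.norm.sf(abs(z))  # two-sided tail probability
    print(p)                       # ~0.01, so reject b = 0 at the 1% level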

This is all just hypothesis testing in statistics 101. You will also want to
know about the _power_ of a test, the t-test, the F ratio, the chi-squared
test, and _resampling_ and distribution-free tests.

There is more detail in

[https://en.wikipedia.org/wiki/Type_I_and_type_II_errors](https://en.wikipedia.org/wiki/Type_I_and_type_II_errors)

~~~
nonbel
>"How about: The p-value is, in a statistical hypothesis test, the probability
of Type I error, that is the probability of rejecting the null hypothesis when
it is true."

No, see here: "Of critical importance, as Goodman (1993) has pointed out, is
the extensive failure to recognize the incompatibility of Fisher’s evidential
p value with the Type I error rate, α, of Neyman–Pearson statistical
orthodoxy."

P Values are not Error Probabilities
[http://www.uv.es/sestio/TechRep/tr14-03.pdf](http://www.uv.es/sestio/TechRep/tr14-03.pdf)

~~~
graycat
Okay, I looked at that PDF. I read through several pages and mostly saw lots
of discussion of statements the authors claimed were common and wrong. I can
agree with the wrong but am less convinced about the common.

I never saw a clean definition of p-value or alpha -- probability of Type I
error.

In my study of statistics, p-value and alpha are essentially the same. Maybe
a difference: A p-value is what a researcher has in mind before the hypothesis
test as a 'cutoff value', say, 5%, for alpha, while alpha is whatever the
statistical hypothesis test says, for the data and the test, is the
probability of Type I error, e.g., 3.14% or some such number. So, since the
3.14% is less than the p-value of 5%, the experimenter rejects the null
hypothesis. MAYBE this is the difference, if any, between p-value and alpha,
in which case we're talking a triviality.

The paper you referenced also starts to get wound up over power = 1 - beta.
Okay, but that little equation is the definition of _power_. So, if we have
some data and a p-value or alpha in mind, and have several candidate
statistical hypothesis tests, say, some _parametric_ and the others
_distribution-free_, etc., then we will want to use the test with the
smallest probability of Type II error, that is, the smallest beta and the
largest power. Okay. Now we are done with beta, and no more strain or struggle
needed.

A biggie point is the real role of the null hypothesis -- it lets us calculate
some probabilities that, otherwise, we would not have assumptions enough to
do.

If there was a fight between Fisher and Neyman-Pearson nearly 100 years ago,
then I'm sorry, but now I have no sympathy for whatever the heck, if anything,
they were arguing about then.

~~~
nonbel
>"I never saw a clean definition of p-value or alpha -- probability of Type I
error."

On re-reading this I agree. They do not make a clean comparison. They do
define them though:

p-value: "the probability of the observed and more extreme data given the
null"

alpha: "α is the long-run frequency of Type I errors"

To begin with, you appear to have these reversed. Second, the p-value is
dependent on the data; it will be different from experiment to experiment. The
exact same experiment can give you p=0.001 one time and p=0.32 the next time.
Which is the probability of type I error?
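
A quick sketch of that variability (the effect size and sample size are
invented): run the exact same experiment several times and the p-value jumps
around, easily landing on either side of 0.05.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    for run in range(5):
        a = rng.normal(0.0, 1, 30)
        b = rng.normal(0.5, 1, 30)  # the same real effect in every run
        print(run, stats.ttest_ind(a, b).pvalue)
    # identical experiments, yet p typically ranges from below 0.05 to well above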

~~~
graycat
Each of 0.32 and 0.001 is the _conditional probability_ of type I error
conditioned on the data actually used in finding it. It is common to play fast
and loose about when we are conditioning on something and when not.

This conditional probability is conditioned on the observations we used; since
those are random variables, so is the conditional probability. Then the 0.32
and 0.001 are values of this random variable on the two trials. The
expectation of this random variable will be the probability of Type I error
alpha.

I strained and tried to find a difference between p value and alpha, but in
current usage I see no difference. My guess is that people prefer p-value
because it abbreviates _probability_ while alpha does not.

Maybe some people want to say that

alpha = E[E[p value|given data]]

but this is not very _operational_. Maybe it is what people mean in which
case, right, alpha is the expected value of p-value.

So, for a given experiment and hypothesis test, one can draw a graph with
alpha on the X-axis and beta (probability of Type II error) on the Y-axis.
Then the graph shows beta as a function of alpha. So, typically the graph runs
from (0,1) to (1,0) and is convex. A _more powerful_ test is lower at each
value of alpha. A _perfect_ test is just the point at the origin. A _trivial_
test is the one you get by ignoring the real data and using a random number
generator; here the curve is just the straight line from (0,1) to (1,0), which
is useful for some cases of interpolating when the available values of alpha
are discrete, e.g., in resampling plans for distribution-free statistics.

Right, beta and _power_ = 1 - beta don't have much to do with alpha or
p-value except for the point that for a given statistical test in a specific
context, e.g., distribution of the data and number of data points, there is
that curve I described that relates alpha and beta, and, thus, also 1 - beta,
exactly. As I outlined, a different statistical test, maybe parametric instead
of distribution-free, can have a different curve and more _power_ for a given
alpha.

~~~
nonbel
>"0.001[the p value] is the conditional probability of type I error
conditioned on the data"

From the paper: alpha= probability of type I error

So, there are two equations:

p value = p(alpha|data)

alpha = E[E[p value|given data]]

Substituting we get:

alpha = E[E[(alpha|data)|given data]]

What is the difference between "given data" and "data"? How do you isolate
alpha?

~~~
graycat
alpha is a number. Type I error is an event.

------
nartz
If I flipped a regular coin 1000 times and happened to get 900 heads, you may
think 'hey, something is wrong with this coin'. However, could you be sure
something is up? How would you go about telling whether something is really
up, or whether this is just a fluke chance this one time?

The motivation for the p-value is to tell you how probable a fluke like this
would be.

The way to do better is to run the experiment over and over, flipping this
coin 1000 times again, and again. However, since science experiments are
extremely expensive, we instead run them only once and try to 'make sure this
wasn't some fluke chance'.

------
jowiar
That it's called a "p-value" is rather indicative of its nebulous nature. For
something that is used in a rather applied context, giving it a one-letter
name that has zero context is a bit of a cop-out.

~~~
GFK_of_xmaspast
That's how basically everything in mathematics works tho.

------
lovboat
In order to use p-values properly, that is, to make decisions, you should
spell out what you are going to do with the information that the p-value
provides. Rejecting the null hypothesis when the p-value < 0.05 is a sensible
thing, but making a strong decision when the p-value is < 0.0001 doesn't make
sense in many circumstances. You should have a scheme for the decision before
having the p-value.

------
leephillips
I'm surprised nobody has brought up the backlash against routine use of the
p-value among scientists and journals. At least one journal has actually
banned papers that report p-values:

[http://www.sciencebasedmedicine.org/psychology-journal-bans-...](http://www.sciencebasedmedicine.org/psychology-journal-bans-significance-testing/)

~~~
jsprogrammer
Leave it to psychology to ban pretty much the only scientific tool we have...

------
jdeisenberg
My attempt to describe it for psych research methods students - in visual
novel form: [http://evc-cit.info/psych018/hyptest/index.html](http://evc-cit.info/psych018/hyptest/index.html)

------
myNXTact
Could someone grade the explanation I gave my non-technical boss last week?

In the context of a regression analysis, I said, "P-values indicate the chance
that the apparent effect of the variable is from random fluctuations in the
data instead of the variable itself."

~~~
analog31
While we're at it, let's try this one too: The p-value, _if correctly
computed_, is the probability that data collected under different conditions
actually come from the same distribution. My definition sidesteps the word
"effect."

~~~
myNXTact
My two cents is that once you start talking about "distributions" you add a
layer of obfuscation that makes it a non-intuitive explanation.

~~~
analog31
Thanks. That's a good point. I wonder if there's a better term, or way of
explaining it.

------
nickpsecurity
My takeaway is to avoid depending on or using them wherever possible. If I do
use them, I get some heuristics from a survey of a bunch of statistics experts
claiming to know what it is. And I'll be sure to keep a minimal survey size of
30. ;)

------
cafebeen
This is a surprisingly meta article... It claims there's a general lack of
statistical understanding among experts, and then supports itself by making
sweeping generalizations from a tiny sample size!

------
pvaldes
I'm sure that there are lots of scientists that can easily explain p-values.

"Some scientists can not easily explain p-values" should be a more accurate,
and much better, title.

------
graffitici
From what I understand, p-values are typically used by frequentist method for
verifying that the "prediction" seems correct. On the other hand, the Bayesian
approach is to split the data into training and test sets, and verify how much
of it holds.

Does this make sense? If the p-values are not very good at conveying
"confidence in the prediction," is this another argument in favor of a more
Bayesian approach to statistics? Any thoughts would be appreciated!

~~~
gbrown
Well, neither your description of p-values, nor your understanding of Bayesian
inference is correct, so... no?

~~~
TeMPOraL
It would be helpful to tell how the commenter is wrong, and what is the
correct description.

~~~
gbrown
It would also be helpful if they'd read the article.

~~~
TeMPOraL
I read the article and it's almost completely orthogonal to the questions
asked by 'graffitici. There's nothing suggesting that they didn't read the
article, so now I assert that you have _three_ things to explain - where
they're wrong, what's the right answer to their question and _why on Earth did
you think they didn't read the article_?

~~~
gbrown
From the article: "We want to know if results are right, but a p-value doesn’t
measure that. It can’t tell you the magnitude of an effect, the strength of
the evidence or the probability that the finding was the result of chance."

From the first sentence of the parent comment: "From what I understand,
p-values are typically used by frequentist method for verifying that the
"prediction" seems correct."

If my reply seemed glib, it was because I thought the question was so far off
base that graffitici almost certainly hadn't read the article. To relate to
the topic under discussion, the distribution of possible comments produced by
people who have carefully read the article is extremely unlikely to result in
a post as extremely misinformed as the one above.

If there was a simple statistical misunderstanding, perhaps that would be
worth clearing up. Instead, the parent comment used some statistical words,
but was very nearly incoherent. Perhaps I shouldn't have said anything if I
wasn't willing to write a protracted essay about the differences between
frequentist and Bayesian inference.

~~~
TeMPOraL
> _To relate to the topic under discussion, the distribution of possible
> comments produced by people who have carefully read the article is extremely
> unlikely to result in a post as extremely misinformed as the one above._

I disagree. There's an alternative hypothesis that could explain a comment
like that - the author harbors some confused views about frequentist and
Bayesian approaches to statistics. I'm inclined to believe that this
hypothesis is right, because I recognize that comment as something I'd write
myself back when I was more confused about this topic. Reading the article is
unlikely to affect this particular issue.

> _Perhaps I shouldn't have said anything if I wasn't willing to write a
> protracted essay about the differences between frequentist and Bayesian
> inference._

I think even one sentence explaining the gist of the author's confusion would
be enough. Plenty of essays have been written on the topic, but one needs to
be pointed in their general direction in order to benefit.

~~~
gbrown
That's a fair point, but I'm not sure I understand the gist of the author's
confusion. Any response I would craft would end up being a re-statement of the
usual one sentence definition of p-values and Bayesian probability, both of
which are already under discussion elsewhere in the comments.

------
timvdalen
Is the video broken for anyone else?

