
I’m assuming the question we are trying to answer is not “which H_n is most probable”, but rather “how safe is it to conclude that H_0 is not true”. For example, say we are concerned with whether the difference between two groups is less than a certain amount or greater than a certain amount.


>“how safe is it to conclude that H_0 is not true”

I would just take "very safe" as a matter of principle; there is even the truism that "all models are wrong".

>"the difference between two groups is less than a certain amount or greater than a certain amount"

You are ignoring a lot of the model being tested here (eg, normality, independence of the measurements, etc) and only considering one parameter.


“how safe is it to conclude that H_0 is not true”

What I meant here by H_0 was the hypothesis that the difference between groups is less than some particular threshold. I think if you made the threshold large enough then it would not be safe to conclude that H_0 is not true.

"You are ignoring a lot of the model being tested here (eg, normality, independence of the measurements, etc) and only considering one parameter. "

I said enough so that if you were arguing in good faith you could fill in the gaps yourself.


I am arguing in good faith. There may be some cases in physics where a theoretical distribution has been derived specifically for the problem at hand, and the model is believed to actually be 100% true. Otherwise, the model should be assumed to be only an approximation at best.


Yes, the t-test does assume normality and you can never be sure of perfect normality if that's what you are getting at (although I believe that simulation tests of the robustness of the t-test against deviations from normality generally show that this isn't too much of a practical concern). I wasn't trying to address every potential weakness with the t-test (or p-values in general); I was addressing the one stated in the article.
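
For concreteness, here is the kind of simulation I have in mind, as a rough sketch rather than a definitive study (the exponential distribution, sample size, and number of trials are arbitrary choices of mine):

    import numpy as np
    from scipy import stats

    # Estimate the type-I error rate of a two-sided 1-sample t-test when the
    # data are actually exponential (skewed, clearly non-normal).
    rng = np.random.default_rng(3)
    n, trials, alpha = 30, 5000, 0.05
    rejections = 0
    for _ in range(trials):
        data = rng.exponential(scale=1.0, size=n)  # true mean is exactly 1.0
        if stats.ttest_1samp(data, popmean=1.0).pvalue < alpha:
            rejections += 1
    print(rejections / trials)  # tends to land in the rough vicinity of 0.05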


I'm saying that unless you have bothered to derive a statistical model from your theory, and you believe that theory may actually be correct, you know that you will reject your model if enough time/money is spent on testing it.


Ok, I will assume that by "model" here you mean a probability distribution on the parameters relevant to your experiment. In that case I agree with what you just said: knowing exactly the correct model is impossible, in a similar way to how knowing someone's height to an infinite degree of precision is impossible. But I never said anything to the contrary. The H_0 I gave corresponds to an infinite set of models and not a single one (note that I said the difference is less than a certain threshold, not that the difference is 0; it would still be an infinite set in that case, but with probability 0).


And in anticipation of the rebuttal that the probability is still 0 because it's never exactly normal: what I really meant originally, but didn't write out explicitly for brevity (and because I assumed it would be implicit), is this. When I talk about giving an upper bound on p(null | data) from p(data | null), what I really mean is giving an upper bound on p(null | data, normal) from p(data | null, normal), where normal is the assumption that the distribution of whatever parameter we are looking at is normally distributed, and null is the event that the difference in means between the two groups we are looking at is less than some predetermined positive threshold. Or, for a 1-sample test, that the mean of a single group is within that threshold of some default value.


If you write out the actual calculation you will see normality (which was just one example of an assumption) is actually part of the null model being tested. It is not something different or outside of it.


That is just a trivial semantics issue, and yes I am familiar with the calculation.

Where do you think the flaw is specifically? Say we are doing a 1-sample test.

0. (setup) Suppose we have a real number mu and a positive epsilon. Define the interval I as [mu – epsilon, mu + epsilon]. For each “candidate mean” within this interval, we have a corresponding t-statistic. Let the statistic t_0 be the inf of all these t-statistics. Let T be the event that t_0 is at least as big as the observed value.

1. You can use Student’s t-distribution to compute an upper bound for the probability of T under the assumptions that the observations are iid normal and the mean lies in I. I will call this probability p(T | null, normal, iid), where “null” is the event that the mean exists and is in I. It makes no difference that it is more typical to lump these assumptions together as “null” because in math you can define things however you want as long as you are consistent.

2. We have that p(null | T, normal, iid) = p(T | null, normal, iid) * p(null | normal, iid) / p(T | normal, iid).

3. Therefore, if we have an upper bound for x = p(null | normal, iid) / p(T | normal, iid) then we can get an upper bound for p(null | T, normal, iid). That is my main claim.

Which of the above statements do you object to?
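
And in case it helps, here is the arithmetic of 2 and 3 spelled out as a tiny numeric sketch; the specific numbers are made up purely for illustration, since a p-value alone gives you neither the prior nor the marginal:

    # p(T | null, normal, iid): the p-value from step 1.
    p_T_given_null = 0.01
    # An assumed upper bound on x = p(null | normal, iid) / p(T | normal, iid).
    x_upper = 5.0

    # Step 2 (Bayes): p(null | T, normal, iid)
    #   = p(T | null, normal, iid) * p(null | normal, iid) / p(T | normal, iid)
    #   = p(T | null, normal, iid) * x
    # Step 3: an upper bound on x therefore yields an upper bound on the posterior.
    posterior_upper = p_T_given_null * x_upper
    print(posterior_upper)  # 0.05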


>"You can use Student’s t-distribution to compute an upper bound for the probability of T under the assumptions"

I'm not sure what you are arguing anymore. I am saying you will never test a parameter value in isolation, it is always part of a model with other assumptions. There is simply no such thing as testing a parameter value alone. To define a likelihood you need more than simply a parameter...

You seemed to be disagreeing with that, but are now acknowledging the presence of the other assumptions.


“I'm not sure what you are arguing anymore.”

It’s the claim I make in 3, and then the secondary claim that requiring our upper bound on p(null | T, normal, iid) to be small whenever the p-value (i.e. p(T | null, normal, iid)) is significant could be used as a criterion for whether our threshold for statistical significance is small enough.

“You seemed to be disagreeing with that”

I’m not sure what I said that gave that impression. I didn’t mention anything about the normal / iid assumptions initially not because I thought we weren’t making these assumptions but because I didn’t think these details were essential to my point.


"Let the statistic t_0 be the inf of all these t-statistics. Let T be the event that t_0 is at least as big as the observed value." Oops, I meant the inf of their absolute values, and T is the event that t_0 is at least as big in absolute value as the observed value.


Also, every probability mentioned should also include in the list of conditions that the number of samples observed matches our experiment.


>"The H_0 I gave corresponds to an infinite set of models and not a single one"

How do you calculate a p-value based on this infinite set of models? Normally it is done using just one.


Ok, for simplicity let’s assume this is a 1-sample test. So there is a certain “default” mean, say mu, and we are concerned with whether the mean of some random variable on whatever population we are sampling from is within, say, epsilon of this default value. For every number in [mu - epsilon, mu + epsilon] we can get a p-value giving the probability that we would have observed the data if this was the true mean. In order to get the probability that we would have observed the data given that the mean was somewhere in this interval, we would need some prior distribution on the means in this interval, which we don’t have. However, we can just take the sup of the p-values over all means in this range to get an upper bound. (I think this is also similar to how confidence intervals work, but take that with a grain of salt.)
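
As a rough sketch of that in code (the data, mu, and epsilon below are made-up placeholders, and the grid is just a numerical stand-in for the sup):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    data = rng.normal(loc=0.7, scale=1.0, size=30)  # placeholder sample

    mu, epsilon = 0.0, 0.5  # interval of interest: [mu - epsilon, mu + epsilon]
    candidates = np.linspace(mu - epsilon, mu + epsilon, 1001)

    # Usual point-null t-test p-value at each candidate mean, then take the
    # sup over the interval as an upper bound for the interval null.
    p_values = [stats.ttest_1samp(data, popmean=m).pvalue for m in candidates]
    print(max(p_values))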


Looking closer at this you are describing one model but testing different values of one of the model parameters.


“Looking closer at this you are describing one model but testing different values of one of the model parameters.”

I am inferring from context that this is probably supposed to be a criticism but it doesn’t make much sense to me. Of course we have to consider different values of the mean; the whole point is to get a p-value corresponding to a range of potential different means.

But anyways, I do think that an explanation based on the t_0 statistic I defined in my other post is better.

1. We can define a statistic t_0 that is the infimum of the absolute value of all t-statistics for every candidate mean in the interval.

2. Suppose the mean is some value mu’ in the interval. Whenever the t_0 statistic is at least as big as its observed value, the t-statistic corresponding to mu’ is also at least as big in absolute value as the observed value of t_0, by definition.

3. So we can give an upper bound for the probability of the former event by the probability of the latter event. But the probability of the latter event is the same no matter what mu’ is, and can be computed using Student’s t-distribution.

4. Therefore, we have an upper bound for the probability of t_0 attaining a value at least as big as the observed value, assuming that the mean is somewhere in the specified interval (plus the other standard assumptions). This is the p-value.

Do you agree with those assertions? If not, specifically where is the problem?
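
Here is the same thing as a sketch in code (placeholder data again). Since |t(m')| is smallest at the candidate mean closest to the sample mean, the infimum has a closed form, and the resulting p-value matches the sup of point-null p-values from my other sketch, up to grid resolution:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    data = rng.normal(loc=0.7, scale=1.0, size=30)  # placeholder sample
    mu, epsilon = 0.0, 0.5

    n = len(data)
    xbar, s = data.mean(), data.std(ddof=1)
    se = s / np.sqrt(n)

    # t_0: the inf of |t(m')| over candidate means m' in [mu - epsilon, mu + epsilon].
    if mu - epsilon <= xbar <= mu + epsilon:
        t0 = 0.0  # some candidate mean equals xbar exactly
    else:
        nearest = mu - epsilon if xbar < mu - epsilon else mu + epsilon
        t0 = abs(xbar - nearest) / se

    # Two-sided tail probability under Student's t with n-1 degrees of freedom.
    # For the true mean m' in the interval this is P(|t(m')| >= observed t_0),
    # which in turn bounds P(t_0 >= observed t_0) under the interval null.
    print(t0, 2 * stats.t.sf(t0, df=n - 1))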


“Looking closer at this you are describing one model but testing different values of one of the model parameters.”

Furthermore, this is even the case with the usual 1-sample t-test, because the variance can be anything.


By the way, it's not normally done using just one either. With a 1-sample t-test, for example, the null hypothesis is that the underlying distribution has a certain prescribed mean, but the variance can be anything.
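
A small simulation sketch of that, if it's useful (sample size, seed, and the particular sigmas are arbitrary): under the null the t-statistic has the same Student's t distribution whatever the variance is, which is what lets the test leave it free.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    for sigma in (0.5, 1.0, 10.0):  # arbitrary standard deviations
        t_stats = [
            stats.ttest_1samp(rng.normal(0.0, sigma, size=20), popmean=0.0).statistic
            for _ in range(2000)
        ]
        # The upper 5% point sits near the t(19) quantile regardless of sigma.
        print(sigma, np.quantile(t_stats, 0.95), stats.t.ppf(0.95, df=19))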


Ok, you seem to accept that there is an assumption that the data is generated by a distribution with a mean, so start with that. This is not necessarily true: https://en.wikipedia.org/wiki/Cauchy_distribution

I use that only as an example. If you look closer you will find many other assumptions being made as well that are used to derive the actual calculation (for whatever statistical test you choose to look at).
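
A quick sketch of the Cauchy point, just as an illustration (seed and sample size arbitrary): the running sample mean never settles down, because no population mean exists.

    import numpy as np

    rng = np.random.default_rng(1)
    draws = rng.standard_cauchy(100_000)
    running_mean = np.cumsum(draws) / np.arange(1, draws.size + 1)
    # Running mean after 100, 1k, 10k, and 100k draws: it keeps wandering
    # instead of converging to anything.
    print(running_mean[[99, 999, 9_999, 99_999]])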


I've already addressed that. See "in anticipation of the rebuttal" post.



