Changing stopping rules after seeing the data creates bias/distortions. One does have to set the rules in advance; that's not a mistake.
Consider a data set consisting of a single license plate number. (This example is from Richard Feynman.)
If you set the rules in retrospect, you can go "Wow, what were the odds my one license plate would be XJKDL-2342-KE? One in a million?" But that's wrong.
On the other hand, if you predicted XJKDL-2342-KE in advance, then the same data point would have a different meaning. How did you predict it?
Patterns that you can predict in advance are different from ones you can find in retrospect after looking through whatever results you get. So the same data point -- XJKDL-2342-KE -- can take on a different meaning depending on the original intent and design of the experimenter.
People make this mistake all the time with more mundane examples. They will roll snake eyes three times in a row, calculate the odds of that happening, and then say "wow 1/6^6, there was such a minuscule chance I'd get screwed like this". But they're just wrong. ANY exact ordering of the 6 individual dice rolls has a 1/6^6 chance of happening, and you have to get one of the "unlikely" results.
To help make this more intuitive, consider that they would have been surprised by rolling all 2s, or all 3s, or various other patterns. So you would at least have to figure out how many outcomes they'd deem surprising and what proportion of the possibilities are in that category. And then take into account all the rolls they made when they weren't surprised and didn't record any data...
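A minimal Python sketch of the counting argument above, assuming six fair dice (three rolls of a pair) and a hypothetical "surprise" rule, `is_surprising`, under which only all-same-face results count as surprising. The rule and the names are assumptions for illustration; a real person's list of surprising patterns would likely be longer.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)
N_DICE = 6  # three rolls of a pair of dice


def is_surprising(rolls):
    """Hypothetical rule (an assumption): every die shows the same face."""
    return len(set(rolls)) == 1


# Enumerate every ordered outcome of six dice.
outcomes = list(product(FACES, repeat=N_DICE))
total = len(outcomes)
surprising = sum(is_surprising(o) for o in outcomes)

# Chance of one exact sequence (e.g. six 1s) vs. any "surprising" sequence.
p_exact_sequence = Fraction(1, total)
p_any_surprising = Fraction(surprising, total)

print("one exact sequence:", p_exact_sequence)
print("any surprising sequence:", p_any_surprising)
```

Under this assumed rule, getting some surprising pattern is six times as likely as getting the one specific pattern that actually came up, and widening the rule only increases that factor.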

 Thank you for your comments - there are a lot of issues about this problem that I'm not entirely comfortable with.
With that said, I'm not sure that I see the connection between what you're arguing and the significance problem in the original. What do you think of the example with the two doctors? http://lesswrong.com/lw/mt/beautiful_probability/
> Changing stopping rules after seeing the data creates bias/distortions
We're talking about a fixed stopping rule, which depends on the data.
 It's getting too late for me to think about statistics :) A few points:
> Similarly, imagine a study of coins which had a stopping rule to stop whenever you have at least 60% heads. You'll always be able to get that result and conclude the coin is biased, even if all coins used are fair.
This is not true. Because of the law of large numbers, the probability of ever reaching the 60% decreases with time.
> I do think the one that says "If I get one conclusion, stop. If I get the other, keep trying," has got to be a bad idea!
I understand what you're talking about. I see the potential for a problem. But my understanding is that Bayesian statistics isn't subject to that.
Proper Bayesian result reporting doesn't say "We believe that the coin is biased". We would rather say "The probability that this coin is biased is 60%, subject to our assumptions and model".
My feeling is that this statement is true:
If the model and assumptions are correct, then the Bayesian outcome will be true regardless of the stopping rule.
In this case: 60% of coins for which the Bayesian analyst proclaims P(biased)=0.6 WILL be biased (barring sampling variations). The stopping rule doesn't matter.
I'll try and figure out a solid explanation by tomorrow.
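A rough Python simulation of the coin study discussed above, assuming a fair coin and a check after every flip. Two parameters are hypothetical and not from the thread: `min_flips` (how many flips must happen before the rule is allowed to fire) and `max_flips` (a cap so the simulation terminates). It estimates how often a fair coin ever shows at least 60% heads.

```python
import random


def ever_reaches_threshold(rng, min_flips, max_flips, num=3, den=5):
    """Return True if heads/flips >= num/den at some check point.

    Checks happen after every flip from `min_flips` up to `max_flips`.
    Integer arithmetic avoids floating-point trouble at exact boundaries.
    """
    heads = 0
    for n in range(1, max_flips + 1):
        heads += rng.random() < 0.5  # fair coin (assumption)
        if n >= min_flips and den * heads >= num * n:
            return True
    return False


def hit_rate(min_flips, max_flips=500, runs=2000, seed=0):
    """Fraction of simulated runs that ever hit the 60% threshold."""
    rng = random.Random(seed)
    hits = sum(
        ever_reaches_threshold(rng, min_flips, max_flips) for _ in range(runs)
    )
    return hits / runs


early = hit_rate(min_flips=1)    # rule may fire from the first flip
late = hit_rate(min_flips=100)   # rule may fire only after 100 flips

print("hit rate, checking from flip 1:  ", early)
print("hit rate, checking from flip 100:", late)
```

With these assumed settings, letting the rule fire from the first flip should stop most runs at 60% or more heads, while requiring 100 flips first should make that much rarer. That is the law-of-large-numbers effect mentioned above, and it also shows how much the answer depends on when checking starts.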
 FYI I edited my post to mention the issue about the coins (your first point) shortly after submitting it. I'm guessing you read the non-edited version.
> Proper Bayesian result reporting doesn't say "We believe that the coin is biased". We would rather say "The probability that this coin is biased is 60%, subject to our assumptions and model".
I'm not really sure what you're getting at here. None of the coins are biased, by premise, so they shouldn't be concluding either thing.
If you throw in "if our model and assumptions are right" then you can shift the blame (if they assumed their stopping rule was OK, or came up with a model that says it's OK). But I'm not sure how that substantively helps.
Will check back tomorrow for further comments from you.
 xenophanes,
I've been thinking about the problem a lot today. I'm pretty sure that my point is basically right, if the model is correct, but my ideas are not clear enough to explain it properly. Model correctness in Bayesian statistics is a complicated problem, and as far as I can tell, it's not a completely solved one. Bayesians usually agree about their calculations, but there's heavy debate about the "philosophy".
In any case, maybe you'll find Eliezer's other post insightful: http://lesswrong.com/lw/1gc/frequentist_statistics_are_frequ...
I really hope to figure out model correctness, and this optional stopping problem looks like a good vector of attack.
Thank you for the discussion, and sorry for leaving you hanging!
Cedric
(if there's any Bayesian out there willing to continue the discussion, my email is in my profile)
 I was thinking it through more and I think it's the stopping procedures that might not halt that are the problem. You can have a data dependent stopping procedure if it's guaranteed to halt, which makes sure all data does get counted.
For an example of one that might seem bad, but does halt, and turns out to be OK:
Flip a coin until you have more heads than tails OR reach 500 flips.
This procedure will produce a majority of trials with more heads than tails, but I think the average over many trials will be 50/50. The conceptual reason is that stopping early prevents, on average, just as many heads as tails that would have come up after stopping. I haven't formally proved this but I did a simulation with a million trials with that stopping procedure and got a ratio of 1.0004 heads per tails which seems fine (and after some reruns, I saw a result under 1, so that is possible). Code here: http://pastebin.com/H42qHYbA
With a guaranteed halt, a sequence of 500 tails and 0 heads can be counted. With no guaranteed halt, it's impossible to count a tails heavy sequence, which is not OK because it's basically ignoring data people don't like.
Does that make sense? I think it may satisfy the stuff you/Bayesians/Eliezer are concerned with. It means it's OK to stop collecting data early if you want, but you do need some rules to make sure all your results are reported with no selectivity there.
There's also a further issue that these kinds of stopping procedures are not a very good idea. The reason is that while they are OK with unlimited data, they can be misleading with small data sets. It's like the guy who bets a dollar, and if he loses he bets two dollars, and if he loses again he bets 4 dollars (repeated up to a maximum bet of 1024 dollars). His expectation value in the long run is not changed by his behavior but he does affect his short term odds: he's creating an above 50% chance of a small win and an under 50% chance of a larger loss. If you only do 10 trials of this betting system, they might all come out wins, and you've raised the odds of getting that result despite leaving the long term expectation value alone. Doing essentially the same thing with scientific data is unwise.
BTW/FYI I believe I have no objections to the Bayesian approach to probability but I do think the attempt to make it into an epistemology is mistaken (e.g. because it cannot address purely philosophical issues where there's no data to use, so it fails to solve the general problem in epistemology of how knowledge (of all types) is created.)
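For anyone who wants to try the two procedures described above, here is an independent Python sketch. It is not the pastebin code linked in the comment. It assumes a fair coin and fair even-money bets, uses 20,000 trials rather than a million to keep it quick, and all function and variable names are made up for illustration.

```python
import random
from fractions import Fraction


def run_trial(rng, max_flips=500):
    """Flip until heads lead tails OR max_flips is reached."""
    heads = tails = 0
    while heads <= tails and heads + tails < max_flips:
        if rng.random() < 0.5:  # fair coin (assumption)
            heads += 1
        else:
            tails += 1
    return heads, tails


def simulate(trials=20000, seed=1):
    """Pool all flips from all trials, including the ones that hit the cap."""
    rng = random.Random(seed)
    total_heads = total_tails = heads_ahead = 0
    for _ in range(trials):
        h, t = run_trial(rng)
        total_heads += h
        total_tails += t
        heads_ahead += h > t
    return total_heads / total_tails, heads_ahead / trials


def doubling_bet_outcomes(max_bet=1024):
    """Exact (net result, probability) pairs for the doubling bettor.

    He bets 1, doubles after each loss, stops at the first win,
    and gives up after losing the max_bet wager.
    """
    outcomes = []
    lost, bet, p_reach = 0, 1, Fraction(1)
    while bet <= max_bet:
        outcomes.append((bet - lost, p_reach / 2))  # win this round
        lost += bet
        p_reach /= 2
        bet *= 2
    outcomes.append((-lost, p_reach))  # lost every round
    return outcomes


ratio, frac_heads_ahead = simulate()
print("heads per tails, pooled:", ratio)
print("fraction of trials ending with heads ahead:", frac_heads_ahead)

outcomes = doubling_bet_outcomes()
p_win = sum(p for net, p in outcomes if net > 0)
expected_value = sum(net * p for net, p in outcomes)
print("P(small win):", p_win, " expected value:", expected_value)
```

The pooled ratio should land near 1 even though the large majority of individual trials end with heads ahead, because the rare capped trials carry a lot of tails. The betting calculation is exact under the stated assumptions: a win of 1 is very likely, a loss of 2047 is rare, and the expectation is zero.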
 You're stuck in frequentist thinking. "Bias" is a property of repeated sampling -- the expectation over repeated samples. But we just have one! The relevant question is what is your best guess for p, the probability the coin will be heads.
Under a uniform prior on [0, 1] the posterior mode is the empirical mean. How you sample is of no consequence. The likelihood/posterior f(p|#heads, #tails) is proportional to p^(#heads)(1-p)^(#tails) regardless of how you sample. Differentiate with respect to p and you get p*=heads/total.
It is rather amusing that most statistics professors are happy to have taught their students that the sampling procedures matter while at the same time crushing the natural intuition that your decisions should be based on the data you observe, not on what might have happened in a world that doesn't exist.
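The differentiation step above can be written out as follows, with h and t as shorthand (introduced here) for #heads and #tails. Maximizing gives the posterior mode under a uniform prior; the posterior mean under the same prior is the slightly different (h+1)/(h+t+2), which approaches h/(h+t) as the number of flips grows.

```latex
\begin{align*}
L(p) &\propto p^{h}(1-p)^{t}, \\
\frac{d}{dp}\log L(p) &= \frac{h}{p} - \frac{t}{1-p} = 0
  \quad\Longrightarrow\quad p^{*} = \frac{h}{h+t}, \\
p \mid h, t &\sim \mathrm{Beta}(h+1,\, t+1), \qquad
  \mathbb{E}[p \mid h, t] = \frac{h+1}{h+t+2}.
\end{align*}
```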
 Consider an infinite string of coin flips. Now consider a subset selected by a stopping rule to meet a particular criterion. And a different subset chosen with an N=100 criterion. The first stopping rule creates a bias: you have a non-random sample chosen to meet that criterion. The second stopping rule doesn't do that, it gets what we call a "random sample".
If someone then takes your dataset and assumes it's a random sample -- e.g. just the same as the N=100 doctor trial -- he's wrong. It's not, it's something else, and that something else is less useful.
You say "how you sample is of no consequence". But suppose your sampling method selectively throws out some data that it doesn't like. That is of consequence, right? So sampling methods do matter. Now consider a method which implicitly throws out data because some sample collections are never completed. That matters too.
 Yes, clearly. I stated that too strongly. Sampling procedures can definitely matter enormously, but stopping rules are within a class of ignorable rules. The link above gives a more precise definition.
 I think that you are mostly right about halting (guaranteed) stopping rules.
See my other comment, up a few times then down the other branch, the one with the pastebin code.
However the example with the two doctors was not the halting type.
Can you agree to that? Or do you have a defense of non-halting stopping rules, even though they are incapable of reporting some data sets?
I think I figured this out but would be interested in criticism on this point if not. Is there some way of dealing with non-halting that makes it OK?
The book says if there's a stopping rule then inferences must depend only on the resulting sample, but that assumes there is a resulting sample -- that the procedure halts.
 Off-topic, but what happened to statsia? I was curious to see what you were working on.
 Website is down but the project continues. Email beta@statsia.com if you want to be put on our insider's list.
 Changing stopping rules is perfectly fine in a Bayesian setup. It's the likelihood principle. This is one of the central differences that arises when you condition on the data (Bayesian) rather than the parameter (frequentist).
