Similarly, imagine a study of coins which had a stopping rule to stop whenever you have at least 60% heads. You'll always be able to get that result and conclude the coin is biased, even if all coins used are fair.
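This is not true. Because of the law of large numbers, the proportion of heads settles toward 50% as the flips accumulate, so the chance of being at 60% or more on any given flip shrinks with time. A fair coin is not guaranteed to ever get there.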
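One way to sanity-check this is a quick simulation. The sketch below is only an illustration: the function names, the cap of 1000 flips, and the trial count are my assumptions, and it takes the rule literally (no minimum sample size, so a head on the very first flip already counts as reaching 60%).

```python
import random

def ever_reaches_60_percent(rng, max_flips=1000):
    """Flip a fair coin up to max_flips times.

    Returns True if the running share of heads is ever at least 60%.
    max_flips is an assumed cap so the loop always halts.
    """
    heads = 0
    for flips in range(1, max_flips + 1):
        if rng.random() < 0.5:
            heads += 1
        # Integer form of heads / flips >= 0.6, to avoid float rounding.
        if 5 * heads >= 3 * flips:
            return True
    return False

def estimate_hit_rate(trials=5000, max_flips=1000, seed=0):
    """Fraction of fair-coin trials that ever reach 60% heads."""
    rng = random.Random(seed)
    hits = sum(ever_reaches_60_percent(rng, max_flips) for _ in range(trials))
    return hits / trials

print(estimate_hit_rate())
```

If the stopping rule always succeeded, the printed fraction would be essentially 1. A noticeable share of trials never reaches 60%, and most of the successes come in the first few flips, which is what you'd expect if the chance of crossing the line keeps shrinking.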
I do think the one that says "If I get one conclusion, stop. If I get the other, keep trying," has got to be a bad idea!
I understand what you're talking about. I see the potential for a problem. But my understanding is that Bayesian statistics isn't subject to that.
Proper Bayesian result reporting doesn't say "We believe that the coin is biased". We would rather say "The probability that this coin is biased is 60%, subject to our assumptions and model".
My feeling is that this statement is true:
If the model and assumptions are correct, then the Bayesian outcome will be true regardless of the stopping rule.
In this case: 60% of coins for which the Bayesian analyst proclaims P(biased)=0.6 WILL be biased (barring sampling variations). The stopping rule doesn't matter.
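Here is a rough sketch of how that claim could be checked by simulation. Everything specific in it is an assumption of mine, chosen so that the analyst's model is correct by construction: half the coins really are biased, a biased coin lands heads 75% of the time, and the analyst keeps flipping until P(biased) reaches 0.6 or 200 flips have passed.

```python
import random

def analyze_coin(is_biased, rng, p_heads_biased=0.75, threshold=0.6, max_flips=200):
    """Update P(biased) after every flip and stop once it reaches threshold.

    Returns (posterior, stopped_early). The prior is an assumed 50/50.
    """
    p_heads = p_heads_biased if is_biased else 0.5
    odds = 1.0        # prior odds biased:fair, i.e. P(biased) = 0.5
    posterior = 0.5
    for _ in range(max_flips):
        if rng.random() < p_heads:
            odds *= p_heads_biased / 0.5        # likelihood ratio for heads
        else:
            odds *= (1 - p_heads_biased) / 0.5  # likelihood ratio for tails
        posterior = odds / (1 + odds)
        if posterior >= threshold:
            return posterior, True
    return posterior, False

def calibration_check(trials=20000, seed=0):
    """Among coins where the analyst stopped early, compare the mean
    reported P(biased) with the fraction that really were biased."""
    rng = random.Random(seed)
    reported, truly_biased = [], []
    for _ in range(trials):
        is_biased = rng.random() < 0.5   # assumed: half the coins are biased
        posterior, stopped_early = analyze_coin(is_biased, rng)
        if stopped_early:
            reported.append(posterior)
            truly_biased.append(is_biased)
    return sum(reported) / len(reported), sum(truly_biased) / len(truly_biased)

mean_reported, frac_biased = calibration_check()
print(mean_reported, frac_biased)
```

The two printed numbers should come out close to each other even though the analyst only stops on the "biased" conclusion. Note that the reported posterior is often a bit above 0.6 because the last flip can overshoot the threshold, so the fair comparison is against the mean reported value, not against 0.6 exactly. If the prior is wrong (say, no coins are biased at all), the match breaks, which is the "if the model is correct" caveat.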
I'll try and figure out a solid explanation by tomorrow.
> Proper Bayesian result reporting doesn't say "We believe that the coin is biased". We would rather say "The probability that this coin is biased is 60%, subject to our assumptions and model".
I'm not really sure what you're getting at here. None of the coins are biased, by premise, so they shouldn't be concluding either thing.
If you throw in "if our model and assumptions are right" then you can shift the blame (if they assumed their stopping rule was OK, or came up with a model that says it's OK). But I'm not sure how that substantively helps.
Will check back tomorrow for further comments from you.
I've been thinking about the problem a lot today. I'm pretty sure that my point is basically right, if the model is correct, but my ideas are not clear enough to explain it properly. Model correctness in Bayesian statistics is a complicated problem, and as far as I can tell, it's not a completely solved one. Bayesians usually agree about their calculations, but there's heavy debate about the "philosophy".
In any case, maybe you'll find Eliezer's other post insightful:
I really hope to figure out model correctness, and this optional stopping problem looks like a good vector of attack.
Thank you for the discussion, and sorry for leaving you hanging!
(if there's any Bayesian out there willing to continue the discussion, my email is in my profile)
For an example of one that might seem bad, but does halt, and turns out to be OK:
Flip a coin until you have more heads than tails OR reach 500 flips.
This procedure will produce a majority of trials with more heads than tails, but I think the average over many trials will be 50/50. The conceptual reason is that stopping early sometimes prevents just as many heads as tails that would have come up after stopping. I haven't formally proved this but I did a simulation with a million trials with that stopping procedure and got a ratio of 1.0004 heads per tails which seems fine (and after some reruns, I saw a result under 1, so that is possible). Code here:
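For anyone who wants to try this, a minimal sketch of such a simulation follows. It is not the original code; the function names, the seed, and the default of 20,000 trials (fewer than the million mentioned above, to keep it quick) are assumptions.

```python
import random

def run_trial(rng, max_flips=500):
    """Flip a fair coin until heads > tails OR max_flips is reached."""
    heads = tails = 0
    while heads <= tails and heads + tails < max_flips:
        if rng.random() < 0.5:
            heads += 1
        else:
            tails += 1
    return heads, tails

def simulate(trials=20000, seed=0):
    """Return (total heads / total tails, share of trials ending heads-heavy)."""
    rng = random.Random(seed)
    total_heads = total_tails = heads_majority = 0
    for _ in range(trials):
        h, t = run_trial(rng)
        total_heads += h
        total_tails += t
        if h > t:
            heads_majority += 1
    return total_heads / total_tails, heads_majority / trials

ratio, majority_share = simulate()
print(ratio, majority_share)
```

The first number is the pooled heads-per-tails ratio and should sit close to 1. The second is the share of individual trials that ended with more heads than tails, which should be well above half.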
With a guaranteed halt, a sequence of 500 tails and 0 heads can be counted. With no guaranteed halt, it's impossible to count a tails-heavy sequence, which is not OK because it's basically ignoring data people don't like.
Does that make sense? I think it may satisfy the stuff you/Bayesians/Eliezer are concerned with. It means it's OK to stop collecting data early if you want, but you do need some rules to make sure all your results are reported with no selectivity there.
There's also a further issue that these kinds of stopping procedures are not a very good idea. The reason is that while they are OK with unlimited data, they can be misleading with small data sets. It's like the guy who bets a dollar, and if he loses he bets two dollars, and if he loses again he bets 4 dollars (repeated up to a maximum bet of 1024 dollars). His expectation value in the long run is not changed by his behavior but he does affect his short term odds: he's creating an above 50% chance of a small win and an under 50% chance of a larger loss. If you only do 10 trials of this betting system, they might all come out wins, and you've raised the odds of getting that result despite leaving the long term expectation value alone. Doing essentially the same thing with scientific data is unwise.
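The betting example can be worked out exactly. The sketch below assumes even-money bets on a fair coin and that the bettor stops at his first win or after losing the 1024 dollar bet; the function name is made up for illustration.

```python
from fractions import Fraction

def doubling_bet_outcomes(max_bet=1024):
    """Exact distribution of net winnings for one round of the doubling system.

    Assumes fair even-money bets of 1, 2, 4, ... up to max_bet, stopping at
    the first win. Returns {net_winnings: probability}.
    """
    outcomes = {}
    lost_so_far = 0
    p_reach = Fraction(1)   # probability of getting as far as this bet
    bet = 1
    while bet <= max_bet:
        net_if_win = bet - lost_so_far
        outcomes[net_if_win] = outcomes.get(net_if_win, Fraction(0)) + p_reach / 2
        lost_so_far += bet
        p_reach /= 2
        bet *= 2
    outcomes[-lost_so_far] = p_reach   # lost every single bet
    return outcomes

outcomes = doubling_bet_outcomes()
expected_value = sum(net * p for net, p in outcomes.items())
p_small_win = outcomes[1]
p_ten_wins_in_a_row = float(p_small_win ** 10)

print(outcomes)
print(expected_value, p_ten_wins_in_a_row)
```

Under these assumptions each round is a 2047/2048 chance of winning 1 dollar against a 1/2048 chance of losing 2047 dollars, for an expected value of exactly 0. Ten rounds in a row will almost always all be wins, even though nothing about the long-run expectation has changed.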
BTW/FYI I believe I have no objections to the Bayesian approach to probability but I do think the attempt to make it into an epistemology is mistaken (e.g. because it cannot address purely philosophical issues where there's no data to use, so it fails to solve the general problem in epistemology of how knowledge (of all types) is created.)