
Consider an infinite string of coin flips. Now consider one subset selected by a stopping rule that runs until a particular criterion is met, and a different subset chosen by a fixed N=100 rule. The first stopping rule creates a bias: you have a non-random sample chosen to meet that criterion. The second stopping rule doesn't do that; it gets what we call a "random sample".
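Here is a rough sketch of that comparison, under assumptions of my own: the coin is fair, the "criterion" is "heads outnumber tails", and the run is capped at 100 flips so the code always returns. The function names are made up for illustration. It only compares the average heads proportion under the two rules; it doesn't by itself settle which inferences are valid.

```python
import random

def fixed_n_sample(rng, n=100):
    """Flip a fair coin exactly n times; return (heads, flips)."""
    heads = sum(rng.random() < 0.5 for _ in range(n))
    return heads, n

def stopped_sample(rng, max_flips=100):
    """Flip until heads outnumber tails, giving up after max_flips (assumed cap)."""
    heads = 0
    for flips in range(1, max_flips + 1):
        heads += rng.random() < 0.5
        if 2 * heads > flips:  # criterion met: strictly more heads than tails
            return heads, flips
    return heads, max_flips

def mean_proportion(sampler, trials=2000, seed=0):
    """Average the per-sample heads proportion over many simulated samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        heads, flips = sampler(rng)
        total += heads / flips
    return total / trials

if __name__ == "__main__":
    print("fixed N=100 :", mean_proportion(fixed_n_sample))
    print("stopping rule:", mean_proportion(stopped_sample))
```

The fixed-N average should land near 0.5, while the stopped samples average well above it, because half of them end after a single head.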

If someone then takes your dataset and assumes it's a random sample (e.g. just the same as the N=100 doctor trial), he's wrong. It's not a random sample; it's something else, and that something else is less useful.

You say "how you sample is of no consequence". But suppose your sampling method selectively throws out some data that it doesn't like. That is of consequence, right? So sampling methods do matter. Now consider a method which implicitly throws out data because some sample collections never complete. That matters too.

Yes, clearly. I stated that too strongly. Sampling procedures can definitely matter enormously, but stopping rules are within a class of ignorable rules. The link above gives a more precise definition.

I think you are mostly right about stopping rules that are guaranteed to halt.

See my other comment, up a few times then down the other branch, the one with the pastebin code.

However, the example with the two doctors was not the halting type.

Can you agree to that? Or do you have a defense of non-halting stopping rules, even though they are incapable of reporting some data sets?

I think I have figured this out, but if I haven't, I would be interested in criticism on this point. Is there some way of dealing with non-halting that makes it OK?

The book says that if there's a stopping rule, then inferences must depend only on the resulting sample. But that assumes there is a resulting sample, i.e. that the procedure halts.
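To make the non-halting case concrete, here is a small sketch under assumptions I'm choosing myself: a coin with P(heads)=0.4, the rule "stop once heads outnumber tails", and a cap of 1000 flips standing in for "never halts". The names are hypothetical. Runs that hit the cap report nothing, which is the implicit throwing out of data described above.

```python
import random

def run_until_heads_lead(rng, p_heads=0.4, max_flips=1000):
    """Return (heads, flips) once heads outnumber tails, else None."""
    heads = 0
    for flips in range(1, max_flips + 1):
        heads += rng.random() < p_heads
        if 2 * heads > flips:
            return heads, flips
    return None  # assumed stand-in for "never halted": no dataset is reported

def simulate(trials=2000, seed=0):
    """Return the reported datasets and the fraction of runs that never reported."""
    rng = random.Random(seed)
    results = [run_until_heads_lead(rng) for _ in range(trials)]
    reported = [r for r in results if r is not None]
    return reported, (trials - len(reported)) / trials

if __name__ == "__main__":
    reported, unfinished = simulate()
    print("fraction of runs that never reported:", unfinished)
    print("reported runs with a heads majority:",
          sum(2 * h > n for h, n in reported), "of", len(reported))
```

Every dataset that does get reported shows a heads majority, even though the coin favors tails, and roughly a third of the runs never produce a dataset at all.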
