What matters is how likely it was that the result you got simply arose by chance. This can only be measured relative to some a priori probabilistic model. If the probability of getting your result under the assumption that only chance was at work (the "null hypothesis") is low, then you can confidently "reject the null hypothesis" and conclude that something else must have happened. Of course, the statistics can't tell you what that "something else" was. That requires an explanatory theory.
The problem is that the probabilistic model is also kind of arbitrary. For example, suppose you flip a coin ten times and it comes up heads every time. The odds of that happening by chance are, naively, one in 2^10. But that is only true if you do the experiment once. If you have, say, 2^8 people each flipping a coin ten times, then the odds of at least one of them seeing 10 heads in a row are roughly one in four (about 22%). Parapsychology experiments and stock-trading schemes often fall into this trap.
This is a general problem with "big data". The more data you have, the more likely you are to see things happen by pure chance that are intuitively unlikely.
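The coin-flip numbers above are easy to check. Here is a minimal sketch, assuming fair coins and independent flippers; the function names are made up for illustration.

```python
def p_all_heads(flips: int) -> float:
    """Probability that a fair coin comes up heads on every one of `flips` tosses."""
    return 0.5 ** flips


def p_at_least_one(p_single: float, trials: int) -> float:
    """Probability that an event with per-trial probability `p_single`
    happens at least once in `trials` independent trials."""
    return 1.0 - (1.0 - p_single) ** trials


people = 2 ** 8                      # 256 independent experimenters
p_single = p_all_heads(10)           # one person, ten flips
expected_hits = people * p_single    # expected number of people who see 10 heads
p_any = p_at_least_one(p_single, people)

print(f"one person:        {p_single:.6f}")
print(f"expected hits:     {expected_hits:.2f}")
print(f"at least one hit:  {p_any:.3f}")
```

The expected number of "lucky" flippers is exactly 1/4, and the chance that at least one exists is a bit lower, about 0.22, because the same group can occasionally contain more than one.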
And I disagree that the number of subjects doesn't matter. It matters enormously, precisely because this is a biology experiment.
It's ironic that you say this so definitively when you were the one asking for help interpreting the statistics originally.
Lisper is completely correct. Given a large enough effect size, detecting significant differences in a small population is very possible.
The statistics are sound. The assumption we should be questioning is where the subjects came from and whether they are actually representative of the population that we are extrapolating this result to.
The correct interpretation is that the model, the foundation of the null hypothesis, hasn't failed yet.
Part of this model is backed by more experiments: "The reason that we focused on SWA is that it is the only sleep characteristic that reflects the depth of sleep".
The model doesn't consist of a single variable. 11 people choosing the same 7 numbers out of 49 by chance is rather unlikely. The null hypothesis would include that there are only 11 people picking, that they don't cheat, and that random chance is indeed a thing. If 11 people did in fact all choose the same numbers, the experiment could be repeated, e.g. to show that they are cheating or to increase the significance.
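To put a number on "rather unlikely", here is a rough sketch. It assumes each person picks 7 of 49 numbers uniformly at random and independently of the others, which is exactly the null model being described; real lottery players don't pick uniformly, so treat this as an illustration only. The function name is made up.

```python
import math


def p_all_match(n_people: int, pool: int = 49, picks: int = 7) -> float:
    """Probability that n_people independent, uniformly random pickers
    all end up with the same set of numbers."""
    tickets = math.comb(pool, picks)
    # The first person can pick anything; each of the others must match it.
    return (1.0 / tickets) ** (n_people - 1)


tickets = math.comb(49, 7)
p = p_all_match(11)

print(f"possible tickets:  {tickets:,}")
print(f"P(all 11 match):   {p:.2e}")
```

Under those assumptions the probability is on the order of 10^-80, so if it actually happened, "they coordinated" would be a far better explanation than chance.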
No. If I advance the hypothesis that reciting the Kama Sutra backwards will make you grow a third arm, then a single subject who recites the Kama Sutra backwards and shortly thereafter grows a third arm would be a statistically significant result, because the odds of someone growing a third arm by chance are quite small.
You're asking people with deeper knowledge than you for their help, and then disagreeing with them?
I get it -- you're skeptical because of sample size. But recall that "sample size" isn't necessarily the number of test subjects, but the number of measurements, as the comment about Mercury demonstrates.
Moreover, a strong signal can be detected even from a small sample size; a coin flipping the same way 11 times in a row could be mere chance, but a roulette wheel hitting the same number 11 times in a row should not happen in the entire history of the universe by mere chance. It's not just the number of measurements, but the significance of each one, that matters for statistical confidence.
Sure, you could get an even stronger signal with a larger sample size. But that doesn't mean "a study of 11 people" is necessarily insignificant or too small a sample. It might be too small, or it might be enough to have a very high degree of certainty.
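For a sense of scale on the coin versus roulette comparison, here is a small sketch. It assumes a fair coin and a fair single-zero wheel with 37 pockets (a double-zero wheel has 38); the pocket count and the function name are my assumptions, not something from the thread.

```python
def p_same_outcome_run(outcomes: int, run_length: int) -> float:
    """Probability that `run_length` independent, equally likely draws
    from `outcomes` possibilities all land on the same value.
    The first draw is free; the remaining ones must match it."""
    return (1.0 / outcomes) ** (run_length - 1)


coin = p_same_outcome_run(outcomes=2, run_length=11)
roulette = p_same_outcome_run(outcomes=37, run_length=11)  # 37 pockets assumed

print(f"coin, 11 in a row:      {coin:.2e}")
print(f"roulette, 11 in a row:  {roulette:.2e}")
print(f"ratio:                  {coin / roulette:.2e}")
```

Same number of observations, but the roulette run is about a trillion times less likely than the coin run, which is the point: how much each observation tells you matters as much as how many you have.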
I was talking about this experiment, the one the article is about, the one which I linked in my comment. The one which is apparently attracting modestly widespread media coverage.
The analogies are meant to demonstrate that sample size is (1) not as straightforward as asking "how many subjects" and (2) less important in circumstances where the signal is strong.
In other words, we're trying to give you the tools to evaluate for yourself. The initial article is about "an experiment involving 11 people", and your skepticism seemed to be about whether it's even possible to get valid results with that sample size (answer: yes). Even in this study, yes, you can get a valid result from "only" 11 people.
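As a toy illustration of how 11 subjects can be enough, here is a sketch of an exact two-sided sign test. This is not the analysis the study used, and the counts below are hypothetical; it only assumes that, under the null hypothesis, each subject is equally likely to change in either direction.

```python
import math


def sign_test_p(same_direction: int, n: int) -> float:
    """Exact two-sided sign-test p-value: the probability, under a 50/50 null,
    of a split at least as lopsided as `same_direction` out of `n` subjects."""
    k = max(same_direction, n - same_direction)
    tail = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


n = 11  # hypothetical: 11 subjects, each measured before and after
for k in (11, 10, 9, 8):
    print(f"{k} of {n} in the same direction: p = {sign_test_p(k, n):.4f}")
```

If all 11 subjects move the same way, p is about 0.001; with 10 of 11 it is about 0.012; with 9 of 11 it is already above 0.05. So whether 11 is "enough" depends entirely on how consistent the effect is.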