
When Is A Small Sample Really A Small Sample? - sant0sk1
http://mattmaroon.com/?p=520
======
waldrews
As a practicing statistician, I have to whine about this.

The issue is selection bias; the four observed events are not independent
random samples from the same Bernoulli distribution, but the events about
which we have learned because they were interesting enough to write about.
Unless we are willing to model the process by which people choose to write
about their experiences, which is quite infeasible in this case, we can make
no inferences.
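
To make the size of the effect concrete, here is a minimal simulation sketch. The reporting probabilities are invented purely for illustration (nobody in this thread knows the real ones, which is exactly the problem), and the function name and parameters are hypothetical.

```python
import random

def observed_failure_rate(true_rate, p_write_if_failed, p_write_if_ok,
                          n_owners=100_000, seed=0):
    """Failure rate among the cases that someone bothered to write about.

    All three probabilities are assumptions chosen for illustration,
    not estimates for any real product.
    """
    rng = random.Random(seed)
    posts_failed = 0
    posts_ok = 0
    for _ in range(n_owners):
        failed = rng.random() < true_rate
        # Owners with a dead laptop are assumed far likelier to post.
        p_write = p_write_if_failed if failed else p_write_if_ok
        if rng.random() < p_write:
            if failed:
                posts_failed += 1
            else:
                posts_ok += 1
    return posts_failed / (posts_failed + posts_ok)

rate = observed_failure_rate(true_rate=0.10,
                             p_write_if_failed=0.50,
                             p_write_if_ok=0.01)
print(f"true rate 0.10, rate among written-up cases {rate:.2f}")
```

With these made-up numbers the expected rate among written-up cases is 0.05 / 0.059, or roughly 0.85, even though only 10% of machines fail. Change the two reporting probabilities and the observed rate moves with them, which is why the reporting process has to be modeled before the observed 3 out of 4 can tell us anything.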

~~~
mattmaroon
Well, the whole thing began here. I made a joke about the title of a post,
"Macbook Brick," which admittedly is something a person who has observed a lot
of failures is much likelier to do, and then we devolved into a discussion
about sample sizes.

I was mostly trying to point out that small samples can sometimes establish
things with a high degree of certainty. It's true, though, that how they are
collected matters a great deal.

------
sanj
The issue isn't whether the sample is small. The issue is whether it is
random.

~~~
greatreorx
So if we had 4 purely random Macbooks, and 3 of them failed, then _that_ would
be a meaningful indication that the failure rate is higher than 10%?

The issue is both small sample and lack of randomness.

~~~
mattmaroon
Yes, it would be a meaningful indication. I'll grant the lack of randomness in
the sample, but the math is correct.
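
For anyone who wants to check the arithmetic, here is a minimal sketch. It assumes the 4 Macbooks are independent random draws and that the true failure rate is the 10% figure mentioned above; the function name is made up for illustration.

```python
from math import comb

def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Chance of seeing 3 or more failures among 4 machines
# if each fails independently with probability 0.10.
p_value = binomial_tail(n=4, k=3, p=0.10)
print(f"{p_value:.4f}")  # prints 0.0037
```

That is 4 * 0.1^3 * 0.9 + 0.1^4 = 0.0037, so under a truly random sample 3 failures in 4 would be strong evidence against a 10% rate. The calculation says nothing about whether the sample was random.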

------
icey
How can anyone really take Matt's opinion about anything Apple or Microsoft
related seriously?

Here are his opinions:

Apple: Bad

Microsoft: Good

It's a classic case of noticing that certain things are bad because you
already think they're bad. That doesn't make anything statistically
significant at all. It's just observation bias.

~~~
mattmaroon
It's more like

Apple: over-hyped

Microsoft: under-appreciated

Not about bad or good. Both companies are both.

~~~
olefoo
Bah. Enough with the false dualism; there are plenty of other operating
systems and software vendors out there, and most of them speak standard
protocols.

Here is an experiment you should try: go down to your local public library and
do a quick census of who is running what on their own computers. You may be
surprised at the diversity that is actually out there.

------
mrtron
The first thing you learn about statistics is that you can do all the analysis
in the world, but if your basic data is flawed then all of your conclusions
are useless.

You state this 3/4 number like it is a hard fact, but it comes from a very
biased sample. It involves YOU, who obviously have a problem with your laptop
or you wouldn't be writing this article. So, clearly, the probability that
your laptop is damaged is 1. Then you add in 3 friends, whom you probably
chose because 2 of them have problems.

Let me put it this way. I have a Macbook that isn't broken, and know 3 other
people who all bought Macbooks about the same time as me and none of us have
had problems. Just like your scenario, there is no reasonable conclusion I can
make from that.

However, if I were to ask the first 4 people I meet who own a Mac whether
theirs is currently working or not, that would be a (decently) random way of
collecting data.

Finally, let's throw in user problems. Just because you have friends who have
visited the Apple store does not mean that their devices failed. Factor in
user problems, like accidentally disabling wireless and not knowing how to
turn it back on, and user-caused damage, like dropping the laptop or pouring
water on it. These are all events that could give your friends a reason to lie
to you about the real reason for their visit.

~~~
teej
Basically, he has a SMALL sample, but not a RANDOM sample.

------
helveticaman
"Just out of curiosity (and my own mathematical ineptitude) I wrote a quick
PHP Monte Carlo simulator to goof around with the numbers, and it pretty much
just confirmed Baye’s Theorem exactly. You can snag the source for it here.

And yes, I’m aware I’m an awful programmer."

Did you learn to hack recently? What's the story?

------
greatreorx
"If I put the purchases of hundreds or thousands of them in chronological
order, then found 4 in a row that had failed, it wouldn't mean much (like the
Aces example)."

But this is exactly what happened to your friend. You can't ignore the
thousands of other Macbooks that were sold when you calculate a failure rate,
and then try to apply that rate to those same thousands.

"If I have a sample of only 4, and 3 failed, that means quite a bit more."

That means quite a bit more for that specific sample of 4 Macbooks, but no
one cares about the failure rate for your 4 specific Macbooks. You proved that
the failure rate for those 4 Macbooks was exactly 75%. Nothing more, nothing
less.

~~~
byrneseyeview
_You proved that failure rate for those 4 Macbooks was exactly 75%. Nothing
more, nothing less._

Since you're being pedantic, I should point out that he's also proved that the
_total_ failure rate is at least three per however-many-Macbooks-there-are.

------
hhm
Just a small typo: it's Bayes Theorem, not Baye's Theorem.

~~~
parenthesis
Bayes' Theorem.

~~~
hhm
That's right, I tried to write that on the wrong keyboard.

