

The Pattern-Seeking Fallacy - cwilson
http://blog.asmartbear.com/pattern-seeking-fallacy.html

======
bjplink
As a sports fan, I have a problem with this statement: '...picking the number
"7" implies that number 8 got on base.'

Most of the time, at least when it comes to sportscasting, picking the number
7 implies that's the total number of batters faced up to that point. You don't
say a batter has gone 2-for-3 through 6 innings when they really went 2-for-4.
When a guy is doing baseball play-by-play he doesn't say the "last 5 of 7
batters" and mean there were more than seven total and he's just choosing not
to tell you about them.

Look at the kid who had a hit in every college baseball game he played this
season. His hit streak was 56 games long before his team was eliminated from
the post-season. If his streak ends during the first game of next season, they
won't go on Sportscenter the next day and say he's gotten a hit in 56 of his
last 56 games and try to hide the fact that he didn't get a hit in game #57.

Last night, during the Celtics-Lakers game, they mentioned that Ray Allen was
0-16 on three point attempts since his last make and that he was cold. No one
would sanely argue he wasn't having a tough time shooting, but I'm sure a math
nerd could explain this away with a large enough sample size of shot attempts
over a longer stretch of time than one playoff series.

But the point of a streak is that it's based on a finite set of recent
attempts. No one is saying Ray Allen is statistically a bad shooter (he's
actually one of the best in the history of the game) but you can't argue that
in recent games he's shooting poorly from distance. It doesn't get much worse
than making 0% of your shots.

~~~
dionidium
_No one would sanely argue he wasn't having a tough time shooting but I'm sure
a math nerd could explain this away with a large enough sample size_

That's exactly what I would argue. I think the math nerd is right. Given
consistent shot mechanics, we'd expect occasional "streaks". Unless you're
proposing that circumstances may change the shooter's mechanics, I don't get
what you're saying.

 _You don't say a batter has gone 2-for-3 through 6 innings when they really
went 2-for-4_

You're right that they don't do that sort of thing about a single game, but
they say stuff like "last 5 of 7 batters" to refer to streaks _across_ games
all the time, which makes the example I quoted a straw man.

~~~
bjplink
The math nerd would be right that Ray Allen is a good shooter. You can look
over the course of his career and verify that. But that's not the issue with
streaks in sports. The issue is that Ray Allen isn't a good shooter right now.

The idea that you "expect" something doesn't diminish its importance,
especially when the Celtics need Ray Allen to hit threes to improve their
chances of winning. They don't care that he's hit them in the past because
previous scores don't count in current games.

~~~
dionidium
I think you're missing my point. The question is about _why_ he is missing so
many shots in this game. Is it because he's _cold_ or is it for the same
reason that flipping a coin sometimes results in a series of heads? When heads
comes up ten times in a row, you don't say that the coin isn't a 50/50 coin
_right now_.
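For what it's worth, a quick simulation backs the coin-flip analogy: a fair 50/50 process routinely produces long runs. A minimal sketch in plain Python (the `H`/`M` outcomes and the 1000-shot season are hypothetical, not from any study):

```python
import random

def longest_run(flips):
    """Length of the longest run of identical outcomes in a sequence."""
    if not flips:
        return 0
    best = cur = 1
    for prev, nxt in zip(flips, flips[1:]):
        cur = cur + 1 if nxt == prev else 1
        best = max(best, cur)
    return best

random.seed(0)  # arbitrary seed, just for a repeatable demo
season = [random.choice("HM") for _ in range(1000)]  # 1000 fair 50/50 "shots"
print(longest_run(season))  # almost always a "streak" of 7+ appears
```

Nothing in the generator knows about being "hot" or "cold", yet eyeballing the output you'd swear you see streaks.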

------
util
"The fallacy is that you're searching for a theory in a pile of data, rather
than forming a theory and running an experiment to support or disprove it."
But looking at data is an important step in forming new hypotheses --
<http://en.wikipedia.org/wiki/Exploratory_data_analysis> . You may just want
to then gather more data to independently check your ideas. (And to go back
to the runs example, isn't the problem that the announcers weren't willing to
seriously consider an alternate hypothesis, i.e., they weren't doing enough
simultaneous estimation?)

"Instead of running multiple AdWords variants each against multiple landing
page variants each feeding a different website funnel, run just one experiment
at a time, one variable at a time." I think this is bad advice. If there's an
interaction between your variables, this will lead you to totally miss it.
Even without interactions, it can be a more efficient use of resources to
estimate the effects of multiple variables at a time:
<http://en.wikipedia.org/wiki/Factorial_experiment>
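To illustrate the factorial point with made-up numbers (every conversion rate below is hypothetical): a full 2x2 design lets you estimate both variables' main effects *and* their interaction, which one-variable-at-a-time testing would miss entirely.

```python
from itertools import product

# Hypothetical 2x2 factorial: two ad variants crossed with two landing pages.
ads = ["ad_A", "ad_B"]
pages = ["page_1", "page_2"]

# Made-up conversion rates for each cell of the design.
conv = {
    ("ad_A", "page_1"): 0.020,
    ("ad_A", "page_2"): 0.030,
    ("ad_B", "page_1"): 0.025,
    ("ad_B", "page_2"): 0.060,  # ad_B only shines on page_2
}

# The full design is just the cross product of the factor levels.
design = list(product(ads, pages))
assert len(design) == 4

def mean(vals):
    vals = list(vals)
    return sum(vals) / len(vals)

# Main effect of the ad, averaged over pages:
ad_effect = (mean(conv[("ad_B", p)] for p in pages)
             - mean(conv[("ad_A", p)] for p in pages))

# Interaction: the ad's lift on page_2 minus its lift on page_1.
interaction = ((conv[("ad_B", "page_2")] - conv[("ad_A", "page_2")])
               - (conv[("ad_B", "page_1")] - conv[("ad_A", "page_1")]))

print(ad_effect, interaction)  # one-at-a-time testing never sees `interaction`
```

Testing the ad and the page separately would report ad_B as a modest winner; it would never reveal that nearly all of its advantage comes from one page.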

Also, in this context, it seems worth looking at
[http://www.stat.columbia.edu/~cook/movabletype/archives/2008...](http://www.stat.columbia.edu/~cook/movabletype/archives/2008/03/why_i_dont_usua_1.html)

------
tyn
"If you torture your data long enough they will confess to anything"

~~~
alex_stoddard
Great adage. It is attributed to economist Ronald Coase.

(I also enjoyed seeing it quoted here with correct grammar: the word "data"
is plural, "datum" being the singular. Wikipedia has it grammatically
incorrect, but as Coase never published the phrase, who knows if his grammar
was as good as his maxim.)

~~~
jessriedel
MW says that the singular usage is standard now:

>USAGE. "Data" leads a life of its own quite independent of datum, of which it
was originally the plural. It occurs in two constructions: as a plural noun
(like earnings), taking a plural verb and plural modifiers (as these, many, a
few) but not cardinal numbers, and serving as a referent for plural pronouns;
and as an abstract mass noun (like information), taking a singular verb and
singular modifiers (as this, much, little), and being referred to by a
singular pronoun. Both constructions are standard. The plural construction is
more common in print, perhaps because the house style of some publishers
mandates it.

------
Jun8
"Turns out players are not streaky; simply flipping a coin produces the same
sort of runs of H's and M's. The scientists gleefully explained this result to
basketball pundits; the pundits remained non-plussed and unconvinced.
(Surprised?)"

Yes, I am surprised. The article makes good points but _its_ fallacy is
applying mathematical abstractions to the real world. Now, there are real-
world phenomena that come close to satisfying the independence assumption of
coin-tossing (Bernoulli) trials, e.g. rolling real dice. However, player
performance is complex and _does_ depend on previous history; for example, a
player who did badly in the previous game or missed an important shot will be
under pressure this time, which will affect his performance.

The model in this case is too simplistic; a Markov model may perhaps be a
better approach. However, estimating the Markov order of a given time series
is a hard problem.

~~~
powrtoch
I haven't checked their research myself, but I feel like you're missing the
point.

It doesn't matter that you can argue about different pressures, or about their
confidence level, or anything else.

They ran the numbers, and the tests came back "random". They generated
genuinely random data, and people labeled it "streaky".

The fact that you can come up with reasons why players ought to be more
streaky than random trials is irrelevant. We ran the numbers, and they aren't.

------
jessriedel
This is really not the best way to think about this problem. The issue is not
that it's necessarily bad to test many hypotheses; it's that testing more
hypotheses requires more data. For instance, OP says

>Instead of using a thesaurus to generate 10 ad variants, decide what pain-
points or language you think will grab potential customers and test that
theory specifically.

In other words, he says test one hypothesis rather than ten. But there's
nothing wrong with testing ten variants so long as you have enough data to
ensure that the chance that _any_ of the variants produces a false positive
because of statistical fluctuation is small.

Given a fixed, finite amount of data, you can do some pretty simple statistics
to find out exactly how many hypotheses you can test at a given confidence
level.
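To make the arithmetic concrete (a sketch, not from the OP): if ten variants are each no better than the control, the chance that at least one of them clears a 5% significance bar by luck alone is about 40%. That's exactly why the data requirement grows with the number of hypotheses.

```python
# False-positive risk when testing many variants at a 5% significance level.
alpha, k = 0.05, 10  # significance level, number of independent variants

# Chance that at least one no-better-than-control variant "wins" by luck:
p_any_fluke = 1 - (1 - alpha) ** k
print(round(p_any_fluke, 3))  # 0.401 -- roughly a 40% chance of a fluke winner

# A simple (conservative) remedy is the Bonferroni correction:
alpha_per_test = alpha / k  # test each of the 10 variants at 0.5% instead
```

With the corrected per-test threshold, the overall chance of any fluke stays near the original 5%, at the cost of needing more data per variant.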

------
Xurinos
Boiled down, isn't this just, "Choose the right sample size"? The article does
not point to ways to solve the real problem.

How do I know that 1000 coin flips is the right number? It seems like an
arbitrary round number; how do I derive the right number of coin flips to get
the 50% probability? Or maybe a better way of asking it would be: How many
coin flips do I need in order to guarantee 50% heads with 4% deviation?

It was good to point out the fallacy. I just want the second step of how to
avoid the fallacy.
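For what it's worth, the commenter's question has a standard approximate answer via the normal approximation to the binomial, though it yields a confidence level rather than a guarantee. A minimal sketch:

```python
import math

def flips_needed(margin, z=1.96):
    """Flips needed so the observed heads fraction of a fair coin lands
    within +/-margin of 0.5 at ~95% confidence (normal approximation).
    This is a probabilistic bound, not a guarantee."""
    p = 0.5  # worst-case variance for a proportion
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

print(flips_needed(0.04))  # 601 flips for +/-4% at ~95% confidence
```

Halving the margin of error quadruples the required sample size, which is why "1000 flips" feels arbitrary: the right number depends entirely on the precision and confidence you demand.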

~~~
smartbear
No, this does NOT have to do with sample size.

It's about picking a theory, then testing that theory only, rather than trying
to seek a theory in data that already exists.

At the end of the article it explicitly says this and gives three specific
ways to solve the problem.

~~~
Xurinos
You're not wrong. I get the initial thrust. The problem is in the solutions.

The measuring of those things at the end could very well fall into the realm
of "streaks", too, unless we have good consideration of sample size. To use
one of your examples:

Let's say that I do 10 coin flips to determine bias, and I see those 10 heads
in a row. Now I want to test this situation further. Is my coin biased? I run
the test 10 more times. What happens if I get a "streak" of 10 heads again?
Have I verified my claim of bias? No! I just have not flipped my coin a
sufficient number of times.

From this I see two things: One, an interesting anomaly arose that suggested I
need to look into things a bit more. My theory is that the coin is biased
towards heads.

Two, to solve the problem, I need to narrow down variables where I can (you
mention) and perform the test enough times (you did not mention). We know that
10 is not enough. How do we know this? 1000 is arbitrary. What is the right
number?

In other words, I agree with the fallacy you pointed out. The solutions are
just insufficient. The fallacy can continue with the right probability of
occurrences, even if someone tried to narrow down the variables.

~~~
metellus
What you're talking about is called hypothesis testing. Basically, you can
figure out exactly how unlikely a given event is under a certain set of
assumptions (in this case, how unlikely it would be for a fair coin to show
heads 10 times in a row). If it is sufficiently unlikely, where you and people
reading your work decide what "sufficiently unlikely" means, you can claim
that your initial assumptions were incorrect.

I've been away from statistics for too long to go into specifics, but you can
mathematically determine how large a sample size you need to be X percent
confident that something weird is happening. I'd suggest reading
<http://en.wikipedia.org/wiki/Hypothesis_test> for more info.
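As a concrete sketch of such a test (plain Python, using the binomial tail probability as a one-sided p-value):

```python
from math import comb

def p_at_least(k, n, p=0.5):
    """P(at least k heads in n flips of a p-coin): a one-sided p-value."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# 10 heads in 10 flips of a fair coin:
print(p_at_least(10, 10))  # 0.0009765625 -- well under any usual threshold
# ...whereas 7 heads in 10 flips is unremarkable:
print(p_at_least(7, 10))   # 0.171875
```

So ten heads in a row is already strong evidence against a fair coin at the conventional 5% level; the catch in the parent's scenario is that if you only *started* testing because you happened to notice ten heads, that peeking has to be accounted for too.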

------
tjmaxal
Epidemiology regularly encounters this problem because you can't experiment on
humans (you can't hurt them). Epidemiology relies on mountains of data to test
a hypothesis. The problem is when you are asking the data to tell you about
something exceedingly rare. For example, say you think a rare brain cancer is
caused by high levels of exposure to gamma radiation plus a genetic
predisposition. Because genetic testing is relatively new and high-level
exposures to gamma radiation are rare, you might come up with so much noise
in your data that all results are meaningless.

This is one reason there is very little progress made on rare diseases and
conditions.

Of course the same problem occurs when something is extremely common.

------
igrekel
Not too sure how to place this article. I understand this fallacy, have seen
it widespread, and know that people have a hard time dealing with it even when
they know about it (I used to work for the lottery and casino industry). But I
am not sure I have noticed the kind of site-statistics analysis described.
Sure, there were many A/B testing related articles recently and I have not
read all of them, but is that what he is referring to?

------
LargeWu
The problem with the analysis is that it's treating shooting a basketball as a
stochastic process, but it isn't. Flipping a coin is random, while shooting a
basketball is a skill that requires athletic ability. The analysis treats all
shots the same, when in fact sometimes a player may be off balance, or rushed,
or tired, or subject to any number of other physical and/or mental factors
that cannot be accounted for simply by check marks.

Anybody who has ever played sports in any sort of competitive environment has
had moments when they are "on", whether it's basketball, golf, curling,
whatever. Movement becomes easier. Shooting feels effortless. The game "slows
down". The converse also happens, when everything feels labored and
overwhelming.

Furthermore, the analysis focuses on "streak" in a very literal sense, as in
runs of the same outcome that end when another outcome occurs, but the term
used by sports fans often applies in a broader sense. One typically refers to
streaky players as those who tend to be either very productive or very
unproductive for extended periods of time, such as a game or a half, or maybe
even over the course of several games.

~~~
btilly
Actually, the analysis didn't focus on streaks. It focused on the correlation
between the outcome of one shot and the outcome of the next. Even a slight
tendency toward streaks in your sense should show up in sharp relief. Yet it
didn't.

Yes, I know how it feels. I also know the tricks that memory can play. I also
know how much we tend to see patterns. And I know about statistics.

Of all of those, I trust statistics the most.

~~~
LargeWu
Fair enough. I only had time to glance at the original analysis, and I
definitely saw "test of runs" in there, so that's what I assumed the analysis
was based on.

------
kingcub
Coincidentally, Michael Shermer's Ted talk on a somewhat related topic (He
calls it Patternicity) was posted today:
[http://www.ted.com/talks/michael_shermer_the_pattern_behind_...](http://www.ted.com/talks/michael_shermer_the_pattern_behind_self_deception.html)

------
mattmichielsen
The way I've argued about probability in the past is that 10 heads in a row is
just as likely as any other specific outcome, for instance HTHTHTHTHT or
HHHHHTTTTT. The coin flip events themselves are not related to each other, as
the outcome of one flip doesn't affect the next. So it doesn't really make any
sense to look for trends.

I was actually talking about colors in roulette, but the coin flip is a pretty
close example. This seems extremely logical when you think about it, but a lot
of people don't look at it this way. They always say something like, "Well,
black has been coming up a lot lately, so it's definitely going to be red
soon."
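A tiny sketch of the point: every *specific* sequence of flips is equally likely; only the aggregate event "all one outcome" is rare, because so few sequences satisfy it.

```python
from itertools import product

n = 10
sequences = list(product("HT", repeat=n))
assert len(sequences) == 2 ** n  # 1024 equally likely sequences

# Every specific sequence has the same probability:
p_specific = 0.5 ** n  # identical for HHHHHHHHHH, HTHTHTHTHT, HHHHHTTTTT, ...

# But "all flips identical" covers only 2 of the 1024 sequences:
all_same = [s for s in sequences if len(set(s)) == 1]
print(len(all_same), p_specific)  # 2 0.0009765625
```

That asymmetry is what tricks the roulette players: "mixed-looking" outcomes are common as a *category*, even though each individual mixed sequence is exactly as improbable as all-black.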

------
lotharbot
A broad rule of thumb:

for every 20 possible data sets you cover in your trawl, you should expect to
find about 1 "statistically significant" correlation that's actually a fluke
(at 95% confidence, using naive statistics). If you're digging through a
massive data set looking for correlations, you need to account for that, and
use more advanced statistical methods.
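A rough simulation of that rule of thumb (a sketch using a crude z-test on pure noise; the seed and sample sizes are arbitrary):

```python
import random

random.seed(1)  # arbitrary; any seed shows the same qualitative result

# Trawl 20 "metrics" of pure noise, z-testing each mean against 0 at ~95%.
n, trials = 100, 20
flukes = 0
for _ in range(trials):
    xs = [random.gauss(0, 1) for _ in range(n)]
    z = (sum(xs) / n) / (1 / n ** 0.5)  # standard error with known sigma = 1
    if abs(z) > 1.96:
        flukes += 1
print(flukes)  # typically around 1 of the 20 looks "significant"
```

Even though every metric here is pure noise, the trawl tends to surface about one "statistically significant" result per twenty comparisons, exactly as the rule of thumb predicts.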

Sportscasters often do exactly the opposite -- they quote results that aren't
even particularly statistically significant (say, a 30% shooter making 3-of-4)
and act as though it shows a player is "playing well" and should be trusted to
continue the trend.

------
johnl
My summary of the article would be: relying on the statistics of a
distribution works, but relying on an optimization of the same distribution
doesn't.

------
InclinedPlane
The basic problem comes down to the sample similarity fallacy. Human intuition
is that sampling is "fair" and random samples will be similar. If sports-ball-
player-A has a 40% lifetime average at shooting shots for goal points then
most people expect that out of, say, 10 shots they'll hit 4 and out of 30
shots they'll hit 12, plus or minus some small delta. The fact that statistics
doesn't work this way is quite frustrating to the human mind.
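A quick binomial calculation (a sketch; the 40% figure is just the comment's example) shows how wide the spread really is:

```python
from math import comb

def pmf(k, n, p):
    """Binomial probability of exactly k makes in n shots."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p = 0.40  # the comment's hypothetical lifetime shooting average
# Probability of hitting *exactly* the expected 4 of 10:
p_exact = pmf(4, 10, p)
# Probability of a "streaky-looking" result: 2-or-fewer, or 7-or-more:
p_extreme = sum(pmf(k, 10, p) for k in (0, 1, 2, 7, 8, 9, 10))
print(round(p_exact, 3), round(p_extreme, 3))  # 0.251 0.222
```

The "expected" 4-of-10 line happens only about a quarter of the time, while an extreme-looking stretch shows up in over a fifth of all 10-shot samples, which is exactly the gap between human intuition and how sampling actually behaves.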

