
Simpson's paradox - nhebb
http://en.wikipedia.org/wiki/Simpsons_paradox
======
ddlatham
The most fascinating aspect is that once you recognize the issue, you can't
simply rely on using just the partitioned data or the aggregate. Be sure to
read the "Implications to decision making" section.

[http://en.wikipedia.org/wiki/Simpsons_paradox#Implications_t...](http://en.wikipedia.org/wiki/Simpsons_paradox#Implications_to_decision_making)

 _The practical significance of Simpson's paradox surfaces in decision making
situations where it poses the following dilemma: Which data should we consult
in choosing an action, the aggregated or the partitioned? In the Kidney Stone
example above, it is clear that if one is diagnosed with "Small Stones" or
"Large Stones" the data for the respective subpopulation should be consulted
and Treatment A would be preferred to Treatment B. But what if a patient is
not diagnosed, and the size of the stone is not known; would it be appropriate
to consult the aggregated data and administer Treatment B? This would stand
contrary to common sense; a treatment that is preferred both under one
condition and under its negation should also be preferred when the condition
is unknown.

On the other hand, if the partitioned data is to be preferred a priori, what
prevents one from partitioning the data into arbitrary sub-categories (say
based on eye color or post-treatment pain) artificially constructed to yield
wrong choices of treatments? Pearl[2] shows that, indeed, in many cases it is
the aggregated, not the partitioned data that gives the correct choice of
action. Worse yet, given the same table, one should sometimes follow the
partitioned and sometimes the aggregated data, depending on the story behind
the data; with each story dictating its own choice. Pearl[2] considers this to
be the real paradox behind Simpson's reversal.

As to why and how a story, not data, should dictate choices, the answer is
that it is the story which encodes the causal relationships among the
variables. Once we extract these relationships and represent them in a graph
called a causal Bayesian network we can test algorithmically whether a given
partition, representing confounding variables, gives the correct answer. The
test, called "back-door," requires that we check whether the nodes
corresponding to the confounding variables intercept certain paths in the
graph. This reduces Simpson's Paradox to an exercise in graph theory._
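
To make the quoted dilemma concrete, here is a rough sketch of the aggregate comparison next to a back-door style adjustment. The recovery counts are not in this thread; they are assumed from the widely cited kidney stone study behind the Wikipedia example. Treating stone size as the only confounder is also an assumption taken from the "story", not something the table itself can tell you.

```python
# (recovered, treated) per treatment and stone size.
# ASSUMPTION: counts recalled from the commonly cited kidney stone study,
# not stated anywhere in this thread.
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def aggregate_rate(treatment):
    """Recovery rate ignoring stone size entirely."""
    recovered = sum(r for r, _ in data[treatment].values())
    treated = sum(n for _, n in data[treatment].values())
    return recovered / treated


def adjusted_rate(treatment):
    """Adjustment over stone size z: sum_z P(recovery | treatment, z) * P(z).

    ASSUMPTION: stone size is the only confounder, so adjusting for it is valid.
    """
    total = sum(n for sizes in data.values() for _, n in sizes.values())
    rate = 0.0
    for size, (recovered, treated) in data[treatment].items():
        p_size = sum(data[t][size][1] for t in data) / total  # P(z)
        rate += (recovered / treated) * p_size
    return rate


for t in data:
    print(t, round(aggregate_rate(t), 3), round(adjusted_rate(t), 3))
```

Under these assumptions the aggregate favors B while the size-adjusted rate favors A, which matches the per-group reading. With a different causal story (say, a partition on post-treatment pain) the same arithmetic would be the wrong thing to do.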

~~~
tel
To not stand too long on my soapbox, causal Bayesian networks are, I think,
the most important tool for high level statistics users. I wish Pearl were
more popular.

~~~
tmoertel
Fortunately, Pearl has distilled the highlights of his book _Causality_ into a
50-page paper that makes a great introduction to the modern theory of
causation:

<http://ftp.cs.ucla.edu/pub/stat_ser/r350.pdf>

Through this paper, and others like it, Pearl's ideas are gaining traction.

------
sp332
I think this and "Anscombe's Quartet"
<http://en.wikipedia.org/wiki/Anscombe%27s_quartet> are great examples of why
you should look at your data before trying to analyze it.

------
billybob
The "Berkeley gender bias case" section made it make sense to me. Every single
department was more likely to admit a woman than a man, and yet the school as
a whole was more likely to admit a man than a woman.

This was because women were applying to more competitive departments, on
average, so a lower percentage of women applicants were getting admitted.

~~~
BenOfTomorrow
> Every single department was more likely to admit a woman than a man

Not quite; you can see several exceptions to this in the table in the article.

The key points of the partitioned data were:

No department was _significantly_ biased against women.

 _Most_ departments had a statistically significant bias towards women.

It's interesting to note that the departments that ARE biased towards men have
more female applicants; I wonder if being in a minority group for that
department is an even more important confounding factor.

------
remi
I like how the article uses "Lisa" and "Bart" for people’s names in the
example: <http://en.wikipedia.org/wiki/Simpsons_paradox#Description>

~~~
stinky613
When I saw the post title I had my fingers crossed hoping "please be a paradox
based on something from The Simpsons". Oh well... Interesting nonetheless

~~~
sonnyz
I think that Simpson's Paradox is: "You're damned if you do and you're damned
if you don't."

~~~
dolphenstein
<pedantic-nitpicky-internet-comment> That's not technically a paradox
</pedantic-nitpicky-internet-comment>

------
r00fus
I think the take-home is summarized here:

 _This imagined paradox is caused when the percentage is provided but not the
ratio. In this example, if only the 90% in the first week for Bart was
provided but not the ratio (9:10), it would distort the information causing
the imagined paradox. Even though Bart's percentage is higher for the first
and second week, when two weeks of articles is combined, overall Lisa had
improved a greater proportion, 55% of the 110 total articles. Lisa's
proportional total of articles improved exceeds Bart's total._

So it's important to not rely on percentage data alone, as it is a form of
reduction. Rely instead on ratios, allowing you to see the validity of the
percentage.

On Amazon and review sites, when I search by rating, I always look at the
number of reviews and weight accordingly. I remember that IMDB (before it
became an Amazon property) implemented a Bayesian posterior mean to cull out
the low-review anomalies:
[https://secure.wikimedia.org/wikipedia/en/wiki/Internet_Movi...](https://secure.wikimedia.org/wikipedia/en/wiki/Internet_Movie_Database#Ranking_.28IMDb_Top_250.29)

I wish other places like Amazon would implement similar weighting mechanisms
to really allow a user to navigate by reviewed quality.
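
For anyone curious what that kind of weighting looks like, here is a minimal sketch of a Bayesian-style weighted rating of the form (v/(v+m))·R + (m/(v+m))·C, which is the shape of formula IMDb has published for its Top 250. The prior mean `C` and prior weight `m` below are made-up illustrative values, not anything IMDb or Amazon actually uses.

```python
def weighted_rating(mean_rating, num_votes, prior_mean, prior_weight):
    """Shrink an item's mean rating toward a site-wide prior mean.

    Items with few votes stay close to prior_mean; items with many votes
    stay close to their own mean_rating.
    """
    v, m = num_votes, prior_weight
    return (v / (v + m)) * mean_rating + (m / (v + m)) * prior_mean


# ASSUMPTION: hypothetical site-wide average of 3.5 stars and a prior
# worth 25 votes; both numbers are invented for illustration.
C, M = 3.5, 25

single_five_star = weighted_rating(5.0, 1, C, M)
well_reviewed = weighted_rating(4.4, 100, C, M)

print(f"1 review at 5.0    -> {single_five_star:.2f}")
print(f"100 reviews at 4.4 -> {well_reviewed:.2f}")
```

With these invented parameters the item with a single 5-star review ends up ranked below the item with 100 reviews averaging 4.4, which is the behavior I'd want when sorting by rating.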

~~~
jodrellblank
I glance at the 5 star reviews then head straight for the 1 stars.

Reviews aren't really ratio based - a single counter of the right kind can
completely dissuade me. I really don't want "low-review anomalies" culled.

I watched the Benoit Mandelbrot TED talk earlier; he had a graph of the S&P
stock market index as normal compared to it with the five most anomalous daily trades
removed - it was very different. Stock market models, he said, try to smooth
out the rough bits which are hard to handle, but that's really where the meat
is.

~~~
r00fus
Sorry, when I said low-review anomalies, I'm talking about products that are not
reviewed enough (i.e., 5-star from a single review). They likely give an
inaccurate view of the product.

Comparing two separate products, of which one has 100 reviews and another has
7, I'll take the 100-review rating as more "interesting" than the 7-review
product, even though the 7-review item may be rated higher.

------
jackfoxy
Unintended consequence of reading the article: I need to rethink decades of
conditioning. I thought Republicans opposed the 1964 Civil Rights Act.

~~~
wildwood
<http://en.wikipedia.org/wiki/Southern_Democrats>

There was a time when the Republicans were at least as liberal as the
Democrats.

------
nwhitehead
Simpson's paradox is amazing; it's worth reading all the examples in the
article and really understanding it.

Another good source is Martin Gardner's "Aha! Gotcha" book. He presents the
paradox as a woman trying to find eligible bachelors at a party. In room 1 her
odds are better if she goes for guys with mustaches. In room 2 her odds are
also better if she goes for guys with mustaches. But when everyone goes into
one room, her odds are better for guys _without_ mustaches. It's incredible.

------
Jach
It's not really that interesting, and the paradoxical/unexpected aspects of it
go away once you start looking at everything through the lenses of conditional
probability theory. e.g. <http://uncertainty.stat.cmu.edu/> goes over it as
early as chapter 2.

------
canistr
This happens to students on a fairly consistent basis.

Suppose a project is worth 10% of the student's final mark while a midterm is
worth 30% and the final exam is worth 60%. If the student performs well on the
project, average on the midterm, but poorly on the Final Exam, their mark is
still going to be poor due to the heavier weighting of the exam and midterm
over the project.

EDIT: I know it's not a perfect example, but comparing marks between different
students based on how they are weighted is essentially the idea.

~~~
orangecat
That's not quite right. The "paradoxical" version would be something like:

Alice and Bob are taking a course where their grades are determined by 5
essays and/or presentations, and they can choose how many of each to do. Alice
does one essay where she earns 80%, and 4 presentations where she earns an
average of 90%. Bob does 4 essays earning an average of 85%, and one
presentation where he earns 95%. For each assignment type Bob has a higher
average than Alice, but Alice's overall grade (assuming equal weighting) is
higher than Bob's: Alice gets 88% ((80 + 4 * 90)/5) and Bob gets 87% ((4 * 85
+ 95)/5).
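
A quick way to check the arithmetic, using only the numbers given above (the dictionary layout and the `overall` helper are just illustrative names):

```python
from fractions import Fraction

# (number of assignments, average score in %) per assignment type,
# taken directly from the Alice/Bob example above.
scores = {
    "Alice": {"essay": (1, 80), "presentation": (4, 90)},
    "Bob": {"essay": (4, 85), "presentation": (1, 95)},
}


def overall(parts):
    """Equal-weighted average over all assignments a student did."""
    total_points = sum(count * avg for count, avg in parts.values())
    total_items = sum(count for count, _ in parts.values())
    return Fraction(total_points, total_items)


# Bob has the higher average within each assignment type...
for kind in ("essay", "presentation"):
    assert scores["Bob"][kind][1] > scores["Alice"][kind][1]

# ...but Alice has the higher overall average.
alice = overall(scores["Alice"])  # (80 + 4 * 90) / 5
bob = overall(scores["Bob"])      # (4 * 85 + 95) / 5
print(f"Alice: {float(alice):.1f}%, Bob: {float(bob):.1f}%")
```

The reversal comes entirely from the different mix: Alice did most of her work in the assignment type where both students score higher.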

------
larrik
"A real-life example is the passage of the Civil Rights Act of 1964. Overall,
a higher ratio of Republicans voted in favor of the Act than Democrats.
However, when the congressional delegations from the northern and southern
States are considered separately, a higher ratio of Democrats voted in favor
of the act in both regions. This arose because regional affiliation is a very
strong indicator of how a congressman or senator voted, whereas party
affiliation is a weak indicator."

The chart then shows that the "Northern" House had 316 members vs. the South's
104, and that the "Northern" Senate had 78 members vs. the South's 22.

Uh... the South has Florida, Texas, and California. How can it be represented
by less than a _quarter_ of Congress? I mean, the Senate makes a bit more
sense due to the tiny states of the northeast and such (though it still seems
low), but the House too? Really?

Makes me wonder how exactly they classified South vs. North... Former
Confederate states vs. everyone else? Either way, Geography cannot be a major
factor.

~~~
hugh3
In US geography, "The South" means something very different to the southern
half of the country. Likewise, "The Midwest" is concentrated entirely in the
eastern half of the country. "The South" is really the south-east corner of
the country.

Is it just Confederate states? Well, eleven states seceded, and 22 senators
implies eleven states, so probably. Not sure where Kentucky fits in.

~~~
btcoal
Being from the upper midwest, I always considered "the South" anything south
of the Mason-Dixon Line plus W.Va and Kentucky. Wow this is off-topic.

Little gems of insight like Simpson's paradox are why I studied statistics.

