
The Ecological Fallacy - glaugh
http://blog.statwing.com/the-ecological-fallacy/
======
clarkm
The notion that rich _people_ voted for Romney while rich _states_ voted for
Obama may also be misleading. While you can attempt to prove this by comparing
the incomes of voters at the 100k+ breakpoint, I'm fairly certain the
correlation disappears if you set a breakpoint at 250k+. You can see this
trend illustrated in the 2008 election exit polls[1]. Obama won the lower
income ranges and McCain won the 100k - 200k ranges. However, Obama also won
the 200k+ income level. In other words, if you look at those with incomes
greater than 100k, it appears that McCain won the rich; however, if you look
at those with incomes greater than 250k, the rich seem to favor Obama.

While I suspect something similar holds true for the 2012 election, such
granular breakdowns weren't reported in this cycle's exit poll summaries. The
answer is probably hidden away in Edison Research's database, but the raw data
hasn't been released yet. For now, you can get a good feel for the income
breakdown by looking at Reuters polls[2], and doing the cross-tabs yourself,
but there are quite a few undecideds and the sample-size is small.

And I agree that the 250k+ breakdown is also arbitrary, though slightly less
so, since it's the lower bound for what many politicians define as "rich". But
who knows who those with incomes of 1 million+ voted for? I suspect it was
probably Romney. But how about billionaires? The point is that setting such
broad and arbitrary breakpoints can be misleading. I'm sure this fallacy has a
special name, I just don't know what it is.
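
A rough sketch of how the choice of breakpoint alone can flip the conclusion. The candidates "A" and "B", the bracket bounds, and every count below are invented for illustration; none of this is exit-poll data.

```python
# Hypothetical two-way vote counts per income bracket:
# (lower bound of bracket in dollars, votes for A, votes for B).
# All numbers are made up to illustrate the breakpoint effect.
BRACKETS = [
    (0, 600, 400),        # under 100k: A wins
    (100_000, 140, 160),  # 100k to 200k: B wins
    (200_000, 30, 20),    # 200k and up: A wins
]


def share_for_a(brackets, threshold):
    """A's share of the vote among brackets that start at or above threshold."""
    a = sum(votes_a for low, votes_a, votes_b in brackets if low >= threshold)
    b = sum(votes_b for low, votes_a, votes_b in brackets if low >= threshold)
    return a / (a + b)


for cutoff in (100_000, 200_000):
    print(f"{cutoff}+: A gets {share_for_a(BRACKETS, cutoff):.1%}")
```

With these made-up counts, "the rich" (100k+) favor B, while "the rich" (200k+) favor A, even though nothing about the voters changed.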

[1] <http://www.cnn.com/ELECTION/2008/results/polls/#USP00p1>

[2] <http://elections.reuters.com/#poll>

------
mistercow
It's important when discussing this fallacy to be precise in your terminology,
because phrases like "more likely" have different meanings depending on
context.

>U.S. states with proportionally more immigrants have proportionally more
households with income above $100k.[1] Ergo, immigrants are more likely than
non-immigrants to have household incomes above $100k.

Whether or not that's a fallacy really depends on how you interpret that
statement. If my only information about the world is what is stated above,
then finding out that someone is an immigrant should increase my estimation of
the likelihood that their household income is above $100k. Being an immigrant
is evidence of living in a state where it's more common to have a household
income over $100k. Living in a state where it's more common to have a
household income over $100k is evidence of having a household income over
$100k. When I learn more about the world, my model will change, and I'll stop
being wrong about this particular thing.

The fallacy comes when you say that the group correlation _implies_ a
correlation at the individual level.
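
To see that the group-level pattern really doesn't pin down the individual-level one, here is a minimal sketch with three hypothetical states. All counts are invented; the only point is that such a dataset can exist.

```python
# Invented counts per state:
# (immigrants, immigrants with income over 100k,
#  non-immigrants, non-immigrants with income over 100k)
STATES = {
    "A": (10, 1, 90, 18),
    "B": (30, 6, 70, 28),
    "C": (50, 15, 50, 30),
}


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Group level: share of immigrants vs. share of high-income households per state.
imm_share = [imm / (imm + non) for imm, _, non, _ in STATES.values()]
rich_share = [(ri + rn) / (imm + non) for imm, ri, non, rn in STATES.values()]
state_r = pearson(imm_share, rich_share)

# Individual level: pooled high-income rate within each group of people.
imm_rate = sum(s[1] for s in STATES.values()) / sum(s[0] for s in STATES.values())
non_rate = sum(s[3] for s in STATES.values()) / sum(s[2] for s in STATES.values())

print(f"state-level correlation: {state_r:.2f}")
print(f"high-income rate, immigrants: {imm_rate:.1%}")
print(f"high-income rate, non-immigrants: {non_rate:.1%}")
```

In this toy data the state-level correlation is strongly positive, yet in every state (and pooled across states) immigrants are less likely than non-immigrants to be above $100k.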

------
SoftwareMaven
As my first introduction to the ecological fallacy, I thought it did a good
job concisely stating the fallacy, with good examples to illustrate it (both
intuitive and non-intuitive).

The next question that would inevitably come up is: how do you know? I'm
guessing there isn't a way short of looking at the data for individuals. It
would probably be safe to always assume group data does not imply individual
data.

And, of course, this is another way that people can use statistics to lie to
you. I would not be surprised at all to find people intentionally using this
fallacy to their benefit.

~~~
glaugh
Unfortunately, I'm not aware of any good heuristics for identifying when the
ecological fallacy is or is not an issue. As you suggested, that implies that
one should always consider it to be an issue if the individual-level data
isn't available.

Does anyone know of anything I'm missing here?

~~~
ChuckMcM
Recommended reading "Proofiness: The Dark Arts of Mathematical Deception" by
Seife. Talks about this fallacy and others in depth.

The best approach is multivariate analysis. That is, for each correlation you
find, identify another correlation that would also hold (and is measurable) if
the cause were what you hypothesize it to be. It's a great way to write a
paper too.

You start with "look at this correlation", then "we hypothesize that the cause
is Q", then "now we look at the following correlations to prove or disprove
our hypothesis", ... analysis and graphs ..., and finish with "as you can see,
our hypothesis is thoroughly {proven | disproved} by these correlations".

If the data isn't available to do additional analysis then you are stuck
trying to collect that data somehow. You can end up with inconclusive results
in that case.

------
glomph
One example of this that I recently read about was the measurement of
development in India, which for a long time happened at the family level and
so missed a lot of inequality and the lack of basic capabilities and freedoms
for women. Not only did this mean that the information was wrong, it also led
to it taking longer to acknowledge just how important the empowerment of women
is in fighting poverty.

------
Yhippa
For the chart showing "% Vote for Romney vs. Median Household Income (2010
data)", isn't that actually a positive correlation? It looks like the income
is moving in lockstep with the vote for Romney percentages.

~~~
glaugh
Thanks for the question.

The fact that income moves very tightly with vote for Romney means it's a
_strong_ correlation. Since one variable goes up while the other goes down,
it's a _negative_ correlation.

You can have any combination of strong/weak and negative/positive correlation.
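
A quick sketch of that with toy numbers (all data below is made up). It also shows that the slope of the line is a separate matter from the strength of the correlation.

```python
def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


x = [1, 2, 3, 4, 5, 6]

# Strong and negative: points hug a downward-sloping line.
strong_negative = pearson(x, [12, 10.1, 7.9, 6.2, 3.8, 2])

# Weak and positive: a slight upward drift with lots of scatter.
weak_positive = pearson(x, [3, 1, 4, 2, 5, 2])

# Same strength, different slopes: both are perfect negative correlations.
steep = pearson(x, [-2 * v for v in x])
shallow = pearson(x, [-0.1 * v for v in x])

print(f"strong negative: {strong_negative:.2f}")
print(f"weak positive:   {weak_positive:.2f}")
print(f"steep vs. shallow slope: {steep:.2f} vs. {shallow:.2f}")
```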

The image on Wikipedia is good for wrapping your head around the distinction
between the strength and the direction of the correlation (and the distinction
between strength and the actual slope of the line of best fit).
<http://en.wikipedia.org/wiki/Correlation_and_dependence>

Apologies if I misunderstood your question.

Cheers!

------
glaugh
OP here. Thoughts/questions/comments?

~~~
ph0rque
I would recommend toning down the advertising a bit. Leave it in the
beginning, or at the end, but not both.

~~~
glaugh
Duly noted. Thanks for the feedback.

edit: Deleted the ad at the end, replaced with a link to the front page.

