
How Not to Lie with Statistics: Avoiding Common Mistakes (1986) [pdf] - danso
http://dash.harvard.edu/bitstream/handle/1/4455012/not%20lie%20stat.pdf
======
capnrefsmmat
I agree with most of the points here, although I've never seen anyone
seriously attempt regression on residuals (instead of multiple regression)
outside of an introductory regression course. (Actually, I graded a homework
problem on it just last week.)
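To see why regressing on residuals is not a substitute for multiple regression, here is a minimal simulated sketch in plain Python. The data-generating model, its coefficients, and the helper names `slope`, `residuals`, and `multiple_regression_2` are illustrative assumptions. When the two predictors are correlated, first regressing y on x1 and then regressing the residuals on x2 badly understates x2's effect. Fitting both predictors together recovers it.

```python
import random

def slope(x, y):
    """Least-squares slope of y on x (simple regression with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(x, y):
    """Residuals of y after a simple regression on x."""
    b = slope(x, y)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return [yi - (my + b * (xi - mx)) for xi, yi in zip(x, y)]

def multiple_regression_2(x1, x2, y):
    """Coefficients (b1, b2) of y ~ x1 + x2, solving the 2x2 normal
    equations on centered data with Cramer's rule."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [a - m1 for a in x1]
    c2 = [a - m2 for a in x2]
    cy = [a - my for a in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(a * a for a in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * b for a, b in zip(c1, cy))
    s2y = sum(a * b for a, b in zip(c2, cy))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

random.seed(0)
n = 5000
x1 = [random.gauss(0, 1) for _ in range(n)]
# Hypothetical setup: x2 is strongly correlated with x1 (correlation 0.8).
x2 = [0.8 * a + 0.6 * random.gauss(0, 1) for a in x1]
# Hypothetical true model: y = 1.0*x1 + 1.0*x2 + noise.
y = [a + b + random.gauss(0, 1) for a, b in zip(x1, x2)]

# Two-step approach: remove x1 from y, then regress what is left on raw x2.
two_step_b2 = slope(x2, residuals(x1, y))
b1, b2 = multiple_regression_2(x1, x2, y)
print(f"two-step estimate of x2 effect: {two_step_b2:.2f}")
print(f"multiple regression: b1={b1:.2f}, b2={b2:.2f}")
```

With these settings the two-step estimate should land near 0.36 rather than the true 1.0, because x1 soaks up part of x2's effect in the first step. Regressing the residuals on x2's _own_ residuals (x2 after removing x1) would recover the multiple-regression coefficient; that is the Frisch-Waugh-Lovell result.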

With all the data collected by tech companies these days, I'd be worried about
other problems. It's easy to dig through loads of variables looking for
correlations, and you'll inevitably find false positives. If you dig deep
enough in your data, looking for differences in conversion rates between
Southeast Asian Chrome users and Nordic users of Opera Mini, then you'll also
have poor statistical power and end up with wildly exaggerated results.
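Both effects are easy to reproduce in a simulation. The sketch below is a hypothetical setup: the conversion rates, segment sizes, and the choice of a pooled two-proportion z-test are assumptions made for illustration. It first runs many segment comparisons where there is no real difference and counts how many come out "significant". It then runs small segments with a real but modest lift and averages only the lifts that reach significance.

```python
import math
import random

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_a / n_a - conv_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))

def simulate_segment(rate_a, rate_b, n, rng):
    """Hypothetical segment: n users per arm, each converting independently."""
    conv_a = sum(rng.random() < rate_a for _ in range(n))
    conv_b = sum(rng.random() < rate_b for _ in range(n))
    return conv_a, conv_b

rng = random.Random(1)

# 1) Many segments with no real difference: count the "significant" ones.
segments = 200
false_positives = 0
for _ in range(segments):
    a, b = simulate_segment(0.05, 0.05, 2000, rng)
    if two_proportion_p_value(a, 2000, b, 2000) < 0.05:
        false_positives += 1

# 2) Small segments with a real lift (5% -> 6%, hypothetical): keep only the
#    significant results in the right direction, as a dashboard hunt would.
true_lift = 0.01
significant_lifts = []
for _ in range(2000):
    a, b = simulate_segment(0.05, 0.06, 300, rng)
    if two_proportion_p_value(a, 300, b, 300) < 0.05 and b > a:
        significant_lifts.append((b - a) / 300)

mean_lift = sum(significant_lifts) / len(significant_lifts)
print(f"{false_positives} of {segments} null segments were 'significant'")
print(f"mean significant lift: {mean_lift:.3f} vs true {true_lift}")
```

Expect roughly 5% of the null segments to pass p < 0.05 by chance, and the significant small-segment lifts to average several times the true 1-point lift: with about 300 users per arm, only large observed differences can clear the threshold.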

(I am slightly biased here because I have written an entire book on the
subject:
[http://www.statisticsdonewrong.com/](http://www.statisticsdonewrong.com/))

~~~
jzwinck
Thanks for posting the link to your book--it's interesting reading.

There seems to be a fairly common error in it:

> One 1992 telephone survey estimated that American civilians use guns in
> self-defense up to 2.5 million times every year – that is, about 1% of
> American adults have defended themselves with firearms.

We cannot simply divide the event count by the population count, because a
single person may have used a gun more than once in a year. In fact, someone
who has used a gun during the year is more likely to use one later in the year
than someone who has not yet used one, because some people live in dangerous
areas, are themselves belligerent, or both.
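A quick simulation shows how repeat events inflate the "events divided by population" figure. It assumes a hypothetical population in which a small high-risk group (standing in for people in dangerous areas) has a much higher Poisson event rate than everyone else. All rates and sizes are made up for illustration.

```python
import math
import random

def poisson(lam, rng):
    """Draw a Poisson(lam) count with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

rng = random.Random(42)
population = 100_000
# Hypothetical heterogeneity: 2% of people have a far higher yearly event rate.
rates = [1.5 if rng.random() < 0.02 else 0.002 for _ in range(population)]
events = [poisson(r, rng) for r in rates]

total_events = sum(events)
distinct_people = sum(1 for e in events if e > 0)

naive_share = total_events / population      # what "events / population" implies
actual_share = distinct_people / population  # people with at least one event
print(f"events per person: {naive_share:.2%}")
print(f"share with at least one event: {actual_share:.2%}")
```

With these made-up rates, events per person come out around 3.2% while the share of people with at least one event is around 1.75%, so the naive division nearly doubles the answer. The bias only runs one way: every person with an event contributes at least one event, so events divided by population is an upper bound on the share of people involved.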

~~~
bainsfather
Seconded. A great book (I've read 75% of it so far).

About the error you mention - I think you are right - but the author brings up
the ~1% because he is talking about the 'base rate fallacy' - he wants to say
that the errors from the 99% of the population will swamp the true signal from
the 1%. So his ~1% number is likely qualitatively ok for what he is using it
for. It should still be reworded though - one wants 0 errors in a book about
statistics mistakes :)

------
danso
OT: I've read this paper before, but I hadn't noticed the note in which
Harvard solicits feedback on its Open Access initiative, through which it
releases papers such as this one. Their online feedback form is here:

[https://osc.hul.harvard.edu/dash/open-access-feedback?handle...](https://osc.hul.harvard.edu/dash/open-access-feedback?handle=1%2F4455012&title=How%20Not%20to%20Lie%20with%20Statistics%3A%20Avoiding%20Common%20Mistakes%20in%20Quantitative%0D%0APolitical%20Science)

Hopefully they're getting a positive response to open access and will see
that both the academic community and the public benefit from it.

------
amathstudent
Of course, the problem with things like this, Huff's book, everything by David
Freedman, etc., is that people _want_ to lie with statistics. To put it more
prosaically, people have biases, prejudices, socially-created expectations,
ulterior motives, and usually statistics is a more-or-less subtle technique
for whitewashing those into 'scientific knowledge'. This happens all across
the social & biological sciences, in medical research, and in industry.

------
mattdeboard
Zed Shaw wrote on this topic a while ago as well:

[http://zedshaw.com/essays/programmer_stats.html](http://zedshaw.com/essays/programmer_stats.html)

------
thisjepisje
From an article on HN a week ago:

_In 1900, about 4 percent of the U.S. population was older than 65. Today, 90
percent of all babies born in the developed world will live past that age._
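One reading of why that juxtaposition misleads: it sets a cross-sectional share (the fraction of the population over 65 at a single moment) against a cohort probability (the chance that a newborn reaches 65). A toy stationary population, with constant births and everyone dying at exactly 80 (purely an assumption for illustration; `share_over` is a hypothetical helper), shows the two can diverge sharply even when survival to 65 is 100%.

```python
def share_over(age_threshold, lifespan):
    """Share of a stationary population older than age_threshold, assuming
    constant births and that everyone dies at exactly `lifespan` (toy model),
    so ages are spread uniformly over [0, lifespan)."""
    if lifespan <= age_threshold:
        return 0.0
    return (lifespan - age_threshold) / lifespan

# Every newborn survives past 65 in this model, yet only a minority
# of the population is over 65 at any given moment.
print(f"{share_over(65, 80):.1%} of the population is over 65")
```

Here every newborn lives past 65, yet only 15/80, under a fifth of the population, is over 65 at any moment, so a population share and a survival probability cannot be compared directly.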

