
What Statistics Can and Can’t Tell Us About Ourselves - haltingproblem
https://www.newyorker.com/magazine/2019/09/09/what-statistics-can-and-cant-tell-us-about-ourselves
======
melling
Here are the 3 new books mentioned in the article:

By the author of the article:

Hello World: Being Human in the Age of Algorithms - Hannah Fry
[https://www.amazon.com/Hello-World-Being-Human-Algorithms/dp...](https://www.amazon.com/Hello-World-Being-Human-Algorithms/dp/039363499X)

Do Dice Play God?: The Mathematics of Uncertainty - Ian Stewart
[https://www.amazon.com/Dice-Play-God-Mathematics-Uncertainty...](https://www.amazon.com/Dice-Play-God-Mathematics-Uncertainty/dp/1781259437/ref=sr_1_1?keywords=Do+Dice+Play+God%3F&qid=1567518958&s=books&sr=1-1)

The Art of Statistics: How to Learn from Data - David Spiegelhalter
[https://www.amazon.com/Art-Statistics-How-Learn-Data/dp/1541...](https://www.amazon.com/Art-Statistics-How-Learn-Data/dp/1541618513/ref=sr_1_2?keywords=The+Art+of+Statistics&qid=1567518933&s=books&sr=1-2)

------
ppod
>In the era of Big Data, we’ve come to believe that, with enough information,
human behavior is predictable. But number crunching can lead us perilously
wrong.

The article isn't actually negative about quantitative approaches overall, but
the subheading has to conform to the New Yorker's ideology.

------
batoure
As someone working in the field, I struggle with this type of writing because
it is helpful but also misrepresents the fundamental problem with how
statistics are communicated. Few statisticians would argue that big-data-driven
statistics demonstrates determinism in action, only that certain markers are
more likely than others to predict likely outcomes. None of this explains why
something happened, only that it was likely to happen.

In the first example, of the doctor purposely killing his patients, a
statistician would say this doctor fit the statistically predicted outcome.
But what isn't discussed is that this doctor must have existed as an anomaly
within his career group. He had to have been ABNORMALLY excellent as a doctor,
because that is the only way he could hide his victims within his normal
morbidity rates.

This is one of the reasons anomaly detection is such an interesting field:
it requires a multi-dimensional understanding of the subject.
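The basic idea can be sketched with a toy one-dimensional detector using invented per-doctor death counts (nothing here comes from the article): flag any doctor whose mortality is far above what the group predicts.

```python
# A minimal sketch of score-based anomaly detection, using hypothetical
# annual patient-death counts for a group of doctors (invented data).
import statistics

deaths_per_doctor = [12, 9, 11, 10, 13, 8, 11, 10, 29, 12]

mean = statistics.mean(deaths_per_doctor)
stdev = statistics.stdev(deaths_per_doctor)

# Flag anyone more than 2 standard deviations above the group mean
# (the threshold is an arbitrary choice for this toy example).
anomalies = [d for d in deaths_per_doctor if (d - mean) / stdev > 2]
print(anomalies)  # → [29]
```

A doctor who keeps raw mortality near the group mean would slip past this one-dimensional check entirely, which is exactly why real anomaly detection needs richer features than a single count.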

~~~
mcguire
"Then a biostatistician at Cambridge, Spiegelhalter found that Shipman’s
_excess_ mortality—the number of his older patients who had died in the course
of his career _over the number that would be expected of an average doctor’s_
—was a hundred and seventy-four women and forty-nine men at the time of his
arrest."

~~~
ohbleek
Just to add to your comment: Excess mortality is the number of deaths, or
mortality, caused by a specific disease, condition, or exposure to harmful
circumstances such as radiation, environmental chemicals, or natural
disasters. It's a measure of the deaths which occurred over and above the
regular death rate that would be predicted (in the absence of the defined
negative circumstance) for a given population.

Had the doctor not murdered any patients, he would have had the expected
number of patient deaths, but the Cambridge statistician found that the number
of patients who died under his care almost perfectly matched the expected
number plus the number he murdered, meaning he was a statistically average
doctor when he wasn't murdering his patients.
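The arithmetic here can be spelled out in a toy calculation. The expected baselines below are invented for illustration; only the excess figures (174 women, 49 men) come from the article.

```python
# Toy illustration of excess mortality: observed deaths for a statistically
# average doctor who also murders patients equal the expected deaths plus
# the murder count, so subtracting the baseline recovers the excess.
expected = {"women": 600, "men": 300}   # hypothetical expected deaths
excess   = {"women": 174, "men": 49}    # excess mortality from the article

observed = {g: expected[g] + excess[g] for g in expected}
recovered = {g: observed[g] - expected[g] for g in expected}
print(recovered)  # → {'women': 174, 'men': 49}
```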

------
zby
The article omits problem number one here: the most surprising results get
the most attention, and among the most surprising, many will be false.
Extraordinary claims get extraordinary attention, and this is not cancelled
out by the requirement of extraordinary evidence.

~~~
Tarq0n
Not quite. It's a fact of statistics that sometimes you'll sample
extraordinary evidence for a hypothesis that's false. Check out this paper for
some examples:
[http://www.stat.columbia.edu/~gelman/research/published/PPS5...](http://www.stat.columbia.edu/~gelman/research/published/PPS551642_REV2.pdf).
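That point can be shown with a small null simulation (all parameters here are arbitrary choices): even when every tested effect is exactly zero, a predictable fraction of studies will produce "extraordinary" test statistics by chance.

```python
# Simulate 1000 studies where the true effect is zero, and count how many
# still clear (roughly) the conventional two-sided 5% significance cutoff.
import random
import statistics

random.seed(0)

def null_study(n=30):
    """One null study: sample n points from a zero-mean normal distribution
    and return the t-like statistic mean / (sd / sqrt(n))."""
    xs = [random.gauss(0, 1) for _ in range(n)]
    return statistics.mean(xs) / (statistics.stdev(xs) / n ** 0.5)

tstats = [null_study() for _ in range(1000)]
extreme = sum(abs(t) > 2.05 for t in tstats)
print(extreme)  # typically around 50 of 1000, despite no real effects
```

If journals then select for exactly those extreme results, the published record skews toward findings that were flukes, which is the selection problem zby describes.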

------
sittingnut
Statistics always involves selection and/or generalization of data. Both of
those acts always involve some kind of bias (even random selection does) in
the what, how, and why of the criteria used in the process.

------
autokad
I like how it has an audio option, much easier to consume that way

~~~
jerrya
Sigh, I was hoping to hear Hannah Fry. To this yankee, she has a wonderful
accent (one that sounds, to me, very much like Pearl Mackie's Bill Potts).

The Curious Cases of Rutherford & Fry Series 12 The Horrible Hangover
[https://www.bbc.co.uk/programmes/m0001r7k](https://www.bbc.co.uk/programmes/m0001r7k)

------
justwake
"The dangers of making individual predictions from our collective
characteristics" is a major area of research, especially in adtech, and the
"Great Hack" documentary has shown us how good some have gotten at this for a
slice of the population. A critical question for the future of data science
is: "Can a person really change?" And: "How can we determine whether they
have?"

~~~
autokad
I hate to be one of those "I've been complaining about this all along"
people, but I have been, since my early college years in 2000!

I didn't have quite the sophistication in my arguments then, though they
were essentially the same as today's (1/20 is arbitrary, effect size is very
important but ignored, false positives are bound to happen, you need to
consider how rare the event is, etc.).

I also lamented something that I feel is the core of the problem to begin
with: people treated p-value = science and science = p-value. Once you do
that, it's easy to put blind faith in a single study that can have huge
negative impacts on our freedoms and way of life. Statistical significance is
a tool that can be used in science, but saying so at the time produced huge
defensiveness, especially from statisticians. I am glad this has finally
begun to change.
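The "how rare is the event" point is worth making concrete. A back-of-envelope calculation (all rates below are assumed for illustration) shows that when true effects are rare, a 1-in-20 threshold lets false positives dominate the "discoveries":

```python
# If only a small fraction of tested hypotheses are actually true, then
# even at the conventional alpha = 0.05, most significant results are false.
n_hypotheses = 1000
base_rate = 0.05   # assume only 5% of tested hypotheses are true
alpha = 0.05       # the conventional 1-in-20 significance threshold
power = 0.8        # assumed chance a real effect reaches significance

true_hits = n_hypotheses * base_rate * power          # 40 real discoveries
false_hits = n_hypotheses * (1 - base_rate) * alpha   # 47.5 false positives

false_discovery_rate = false_hits / (true_hits + false_hits)
print(round(false_discovery_rate, 2))  # → 0.54: over half are false
```

Under these assumptions more than half of the "significant" findings are false, which is why treating p < 0.05 as synonymous with science is so dangerous.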

