

Cory Doctorow: Our dangerous statistical ignorance - graywh
http://www.guardian.co.uk/technology/2008/may/20/rare.events

======
niels_olson
As someone who deals in diagnostic tests all day, I can tell you that nobody's
diagnosed on one test. There's a similar problem I worked out on the odds of
getting into medical school. I'm sure this is elementary stats and has
someone's name on it, but I'll call it the acceptance problem: how many
medical schools do you have to apply to in order to have a 90% chance of
getting into medical school? For any school there is an acceptance quotient:
Q = (acceptances sent out)/(number of applications received)

For any given student applying to some schools 1 thru n, the goal is getting
at least one acceptance, and applying to more schools, mathematically, can't
possibly hurt in the closed case (neglecting social engineering, time spent on
applications, etc.), so the chance of acceptance, Ca, approaches 1 with every
new application in the following fashion:

Ca = (1 - (1 - Q1)(1 - Q2)(1 - Q3)...(1 - Qn)) * 100

There's a visual and some worked out examples here:
<http://nielsolson.us/MedSchool/>

Similarly, if the sensitivity of a test is 90%, that means the test identifies
9 of every 10 people with the diagnosis. If I administer n different tests,
each with a sensitivity of S, then the chance of accurately diagnosing the
disease, Cd, _goes up_ with each additional positive but never gets to 1.

Cd = (1 - (1 - S1)(1 - S2)(1 - S3)...(1 - Sn)) * 100%

So let's say you are doing genetic testing, and any one gene is 1% sensitive
for the disease. If you tested 300 genes you could be no more than 95% certain
of the diagnosis.

(1 - (1 - .01)^300) * 100 = 95.09591...%

Now, if your genetic tests were 5% accurate, your panel could be no more
than 95% accurate with 59 tests.

If your test was 50% accurate, your panel could consist of 6 tests and be no
more than 95% accurate.
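
The panel figures can be recomputed with a few lines of Python. This is only a sketch: the name `panel_sensitivity` is made up for illustration, and it assumes every test in the panel is independent and has the same sensitivity, which is what the formula for Cd assumes as well.

```python
def panel_sensitivity(s: float, n: int) -> float:
    """Chance that at least one of n independent tests, each with
    sensitivity s, comes back positive: 1 - (1 - s)^n."""
    return 1.0 - (1.0 - s) ** n


# Panels mentioned in the text: (per-test sensitivity, number of tests)
for s, n in [(0.01, 300), (0.05, 59), (0.50, 6)]:
    print(f"S = {s:.0%}, n = {n}: Cd = {panel_sensitivity(s, n):.2%}")
```

Changing `s` and `n` shows how slowly a panel of weak tests approaches certainty.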

Of course, if some of the tests are negative, things get more complicated. One
of the problems with these data sets is that we have no idea how predictive
they are. You can't even calculate the predictive power of the database. There
simply haven't been enough events. Then we get into surrogate measures (how
many were positive on tests 1 - n and were found to have razor blades in their
homes, etc).

The claim that these databases can't be effective isn't true. They could be. P
might also equal NP. Whether the hypothesis is strictly true or not, the vague
but real set of 'practical concerns' suggests that the truth of the hypothesis
is sufficiently difficult to test as to render the null hypothesis the de
facto assumption until proven otherwise.

~~~
aston
The assumption of independence underlying the math you're doing is probably
outright false (albeit mathematically simple). As one example, imagine the
case where all n schools use the exact same admissions criteria. Unless you're
applying with a random application to each school, your math is shot; you will
either get into every school or none of them. I won't even go into the genetic
independence issue.
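
One way to see the size of this effect is to compare the two extremes. The sketch below is hypothetical: the 10% acceptance rate and the 15 applications are made-up numbers, `independent` and `identical_criteria` are illustrative names, and real schools presumably sit somewhere between the two cases.

```python
def independent(q: float, n: int) -> float:
    """Chance of at least one acceptance if each of n schools decides
    independently with acceptance rate q."""
    return 1.0 - (1.0 - q) ** n


def identical_criteria(q: float, n: int) -> float:
    """Chance of at least one acceptance if all n schools apply the same
    criteria: the applicant gets into all of them or none, so extra
    applications add nothing."""
    return q if n > 0 else 0.0


q, n = 0.10, 15  # made-up acceptance rate and number of applications
print(f"independent schools: {independent(q, n):.1%}")
print(f"identical criteria:  {identical_criteria(q, n):.1%}")
```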

~~~
niels_olson
Every mathematical model is a false representation of reality. The predictive
accuracy must be validated by experiment. And at the link I referenced,
you'll see there's actually some pretty strong data to start from in this
case.

------
tonystubblebine
There's a quote I like that I think came from Marvin Minsky, "the US needs a
Department of Homeland Arithmetic." We're protecting ourselves from all the
wrong risks.

~~~
byrneseyeview
I don't understand why Marvin Minsky hates America so much that he wants to
put the government in charge of assessing such risks. The government is an
incredibly complex device for distilling a tiny amount of individual ignorance
from each voter into a moonshine of batshittery.

~~~
rw
It's presumptuous to say that voters effect any real change in government
policy.

~~~
eru
Perhaps you have more effect by voting (i.e. switching) TV channels. Public
opinion seems to be very important for government policy - and popular
shows can be and are seen as a barometer for it, I guess.

------
jpeterson
"Here's how that works: imagine that you've got a disease that strikes one in
a million people, and a test for the disease that's 99% accurate. You
administer the test to a million people, and it will be positive for around
10,000 of them – because for every hundred people, it will be wrong once
(that's what 99% accurate means). Yet, statistically, we know that there's
only one infected person in the entire sample. That means that your "99%
accurate" test is wrong 9,999 times out of 10,000!"

No, it means that the "99% accurate" test is wrong 9,999 times out of
1,000,000. It would be clear to anyone when stated that way. What's
counterintuitive is the author's statement of the result, not the result
itself.

~~~
tel
A better wording is that your chance of having the disease if given a positive
result from the test is about 1/10000 (0.01%).

That's a huge increase from a 0.0001% chance of having the disease, but it's
still not flat out terrifying. Repeat testing can weed through the false
positives at a speed proportional to its accuracy.
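
A rough sketch of that calculation with Bayes' theorem follows. It makes two assumptions that are not in the article: "99% accurate" is read as 99% sensitivity and 99% specificity, and each repeat test errs independently of the previous ones. The function name `posterior` is illustrative.

```python
def posterior(prior: float, sensitivity: float, specificity: float) -> float:
    """P(disease | positive test) by Bayes' theorem."""
    true_pos = sensitivity * prior
    false_pos = (1.0 - specificity) * (1.0 - prior)
    return true_pos / (true_pos + false_pos)


p = 1e-6  # one in a million, as in the article
for round_no in range(1, 5):
    # Assumption: 99% sensitivity and 99% specificity, independent errors.
    # Each positive result turns the old posterior into the new prior.
    p = posterior(p, 0.99, 0.99)
    print(f"after {round_no} positive result(s): {p:.4%}")
```

If the false positives are not independent between repeats, the later updates overstate the gain.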

~~~
graywh
Not necessarily. If you get the same results every time, repeated testing
provides nothing.

~~~
DougBTX
It depends on what causes the false positives.

If the test is picking up something in the person being tested, then yes,
you'll get the same result every time and repeated testing proves nothing. But
you can still repeat using other tests.

If the test gives false positives purely at random, then repeated testing will
help. Say the test is wrong 50% of the time, and you do the test five times.
If you get the same results every time, then you can be
100 - (50/100)^5 * 100 ≈ 97% sure of the results.

------
breck
"But the fact is that attacks by strangers are so rare as to be practically
nonexistent. If your child is assaulted, the perpetrator is almost certainly a
relative."

Maybe that statement is true in today's world. But for tens of thousands of
years, while our brains evolved, I would guess attacks from strangers were a
lot more common.

~~~
tel
The causation implied here is unlikely at best. More likely might be that
until the last century no one had to deal with numbers large enough to need
these kinds of statistics.

------
JacobAldridge
I believe the worst damage to statistics is specious reasoning. My favourite
chocolate promotion is Mars' '1 in 6 Wins a Free Bar' (currently on here in
Oz). If I buy 6 bars, most people would assume I would win once. In fact, I
have only a 2/3 chance of winning a free one:

1 - (5^6/6^6)

Buy 12 bars, and there's still a more than 10% chance I won't have won
yet...most of the chocolate-buying government-voting lottery-praying public
would be stunned.

~~~
aston
Although the average number of bars you'd need to buy before you win is indeed
six, I think almost no one would put the odds of actually winning one by
buying six at 100%.
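
Both figures in this sub-thread can be reproduced with a short sketch. It assumes each bar wins independently with probability exactly 1/6 (the promotion's real mechanics are not known here), and the function names are illustrative.

```python
P_WIN = 1 / 6  # advertised chance that any single bar wins


def chance_of_win(n: int, p: float = P_WIN) -> float:
    """Chance of at least one winning bar among n independent bars."""
    return 1.0 - (1.0 - p) ** n


def expected_bars_until_win(p: float = P_WIN, max_bars: int = 1000) -> float:
    """Average number of bars bought up to and including the first win,
    summed term by term (the series converges to 1 / p)."""
    return sum(k * (1.0 - p) ** (k - 1) * p for k in range(1, max_bars + 1))


print(f"win within 6 bars:   {chance_of_win(6):.1%}")
print(f"no win in 12 bars:   {1 - chance_of_win(12):.1%}")
print(f"average bars to win: {expected_bars_until_win():.2f}")
```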

~~~
JacobAldridge
"Any 6 bars may not contain a free bar"

You're probably right, and I hope that you are. Still, funny to note that Mars
has even put this disclaimer on the bottom of their promo site!

<http://www.marsfreebars.com.au/>

------
dhimes
If I had an intuitive grasp of statistics, I might not have a start-up!

~~~
SwellJoe
Why not? It's a one-in-four (or one-in-three, depending on who you believe)
chance of being rich in a few years--that means that in the worst case
scenario, if you keep at it, starting over when you fail, in about 10-15 years
you're very nearly guaranteed to be rich. And you almost certainly won't
starve in the process.

Of course, that assumes a reasonable level of intelligence, education, and
drive.

Startups are about the best game going, as far as I can tell--I wouldn't be
playing if the game was rigged against me (more than a little, anyway...sure,
small companies have higher relative regulatory burden, but on the whole the
technology game is actually rigged in favor of new companies, from a growth
perspective).

~~~
iamwil
I don't get how you're guaranteed to be rich in 10-15 years. If it's a 1/4
chance of being rich every x years, it's still just 1/4 chance over the span
of 10-15 years, right?

However, I'm going to guess that you mean every time you try is going to
influence the next time you try for the better (as evidenced by some paper
about higher success rates for 2nd+ time entrepreneurs I remember on here),
which makes sense. You learn from your mistakes, you make contacts, you have a
better view of the market--so it shouldn't stay one in four every time you
try.

~~~
SwellJoe
"If it's a 1/4 chance of being rich every x years, it's still just 1/4 chance
over the span of 10-15 years, right?"

So, no matter how many times you roll a die, you've only got one in six chance
of rolling a 1 in all of the rolls?

Somehow, I think your math is slightly off.

~~~
iamwil
No, just a mismatch with understanding your wording. With your original
wording, it didn't make sense, since each roll would have the same prob, no
matter how many times you rolled. That's why I figured you meant that rolls
weren't independent from each other.

But it seems that you mean, what's the chance that given x number of rolls,
the very last one is a "1" (assuming you only need/want to get rich once). As
the number of rolls increases, the chance of that scenario (a string of non-1s
with the last one being 1) becomes smaller and smaller when taken as a whole.

~~~
SwellJoe
How else could I possibly mean "keep trying for 10-15 years"? One can't take
2-5 year increments of your life in isolation, since, as you've noted, you
only need to get rich once to be rich.

But, I'm glad we're all clear now.

------
andrewparker
Riding on the subway earlier today, a guy got on and started preaching how
"the lord saves" and "you must embrace Jesus." I wish math could create
equivalent evangelism. I'd love to live in a world where a guy gets on the
subway and starts preaching the Pythagorean theorem.

~~~
graywh
That one's not the problem. Perhaps Bayes' theorem?

------
michael_dorfman
I agree with all of his points, but it was a very slim article-- I was kind of
hoping for a more profound analysis from someone like Doctorow.

~~~
spydez
If you want a profound analysis, he wrote a (fiction) book recently on this
subject, and put it out on the internet under a CC license. It's an enjoyable
read; he pretty much distilled a few bits from the book to make that article.

Here tis: <http://craphound.com/littlebrother/Cory_Doctorow_-_Little_Brother.htm>

------
__
Something doesn't seem right about Doctorow's example of attacks on children.
He writes, "But the fact is that attacks by strangers are so rare as to be
practically nonexistent. If your child is assaulted, the perpetrator is almost
certainly a relative (most likely a parent)."

I'm sure that's true, but it doesn't answer the real question. For most
parents, the question is: _given my child's particular environment, what is
the greatest threat to him?_

Doctorow is saying: _given that your child was attacked (and no additional
information), the attacker is more likely than not to be a relative._ That
seems backwards.

~~~
tel
He's stating that P(Was Relative | Child Attacked) is relatively high,
especially compared to P(Was Total Stranger | Child Attacked). This means
that it should, if you use Bayes' Theorem correctly in everyday life, shift
your suspicion away from the random photographer.

Of course, since P(Child Attacked) is so very low it's still not a huge deal.

It's not really backwards. That sort of inversion is exactly part of Bayes'
Theorem.

------
ken
Statistician Peter Donnelly on the same subject:

<http://youtube.com/watch?v=kLmzxmRcUTo>

------
seshadripv
It's not just the probability of an occurrence that leads us to behave in an
unreasonable manner. The other important thing is the impact of that 'rare
occurrence'. Most critics don't seem to take that into account at all...

------
redorb
good case against data mining... and the "if it saved one life, would it be
worth it?" argument doesn't change my mind.

------
henning
a more basic problem is just plain ignorance.

many if not most citizens of the USA do not understand the basic organization
and functions of their government. most of them cannot name a single sitting
supreme court justice, have never read the constitution, do not know how a
bill becomes a law.

and these people vote.

