
Can chance make you a killer? - jonp
http://www.bbc.co.uk/news/magazine-10729380
======
scotty79
100 operations is a really small sample for something that happens once in a
hundred.

If the chance of death is 1/100, then with a sample of 100 you have to see 0
or 1 deaths to stay within the 60% threshold, but with a sample of 200 you can
see 0, 1, 2 or 3 deaths and still fit.

It's just that 60% is a harsh limit for small samples. There should be some
lower bound on the sample size you have to gather, for a given probability,
before you start judging. Either that, or the threshold should depend on
sample size and probability, but a minimum sample size might be easier for the
general public to understand.
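The effect of sample size on the threshold can be checked exactly with the
binomial distribution. A minimal sketch, assuming the rule means "flag any
hospital more than 60% above the expected number of deaths":

```python
from math import comb

def prob_flagged(n, p, excess=0.6):
    """Probability a perfectly average hospital is flagged by chance:
    P(deaths > (1 + excess) * expected) under Binomial(n, p)."""
    cutoff = (1 + excess) * n * p          # e.g. 1.6 deaths for n=100
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1) if k > cutoff)

p = 1 / 100
print(prob_flagged(100, p))   # ~0.26: a quarter of safe hospitals flagged
print(prob_flagged(200, p))   # ~0.14: larger sample, fewer false alarms
```

So with a 1-in-100 risk, roughly a quarter of 100-operation hospitals would
be flagged by luck alone, dropping to about one in seven at 200 operations.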

~~~
dinedal
I wish that journalistic integrity extended to statistical analysis, and that
the bar for understanding basic statistics were raised, so that someone with a
high school diploma would be as irked as you or I by poor excuses for
analysis.

~~~
goodside
Wishing for normal people to understand statistics isn't reasonable. I'd also
like a pony. But the first point is important: journalists shouldn't be
allowed to publish statistical musings without some oversight from a
statistician, which any reputable paper would have on staff anyway to avoid
basic mistakes. Journalists would never be allowed to speculate as freely as
this in deciding questions of climate science or molecular biology, but once
it's a question of statistics everyone is instantly an expert.

~~~
hugh3
Perhaps statistics (of the relatively advanced variety rather than the mean-
mode-median variety) should be inserted into every high school student's
curriculum. It is, after all, _far_ more important to the life of the average
citizen than, say, trigonometry or calculus. And it _ought_ to be easier to
understand.

------
MindTwister
As with everything, there are always multiple factors at play, one of them
being chance, and this article (interactive, no less) seems to highlight that
very well.

I guess the rest of us not in the hospital business can use this to be wary
of statistics and data presented to us, even when they come from a reliable
source. When gathering our own statistics, it's often necessary to do so over
a prolonged period in order to let the data reveal what is and what isn't pure
chance.

------
carbocation
They are describing a basic Monte Carlo simulation. It shows you that, given a
certain distribution, some subset of trials will exceed some arbitrary cutoff.
But, well, of course! This must be topical in Britain - is there some flap
about death rates going on there? Perhaps there is some concern about too-
stringent cutoffs being used there to try to catch bad physicians?

The real take-home here is that if you set your 'warning' threshold to be too
low, you'll get a lot of warnings. Here, they used an arbitrary cutoff of 8 as
their warning threshold. A fairer move would have been to allow some data to
come back, try to infer a distribution and its parameters, and then set
future cutoffs based on those parameters. One could even continuously
recalculate based on incoming data.
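That suggestion could be sketched roughly like this, with hypothetical pooled
figures, assuming deaths per hospital are approximately Binomial(n, p):

```python
from math import comb

def binomial_cutoff(n, p, alpha=0.05):
    """Smallest death count whose chance of being reached by luck
    alone is below alpha, under Binomial(n, p)."""
    tail = 1.0                             # tail == P(X >= k) at each step
    for k in range(n + 1):
        if tail <= alpha:
            return k
        tail -= comb(n, k) * p**k * (1 - p)**(n - k)
    return n + 1

# Infer p from pooled data instead of picking a fixed threshold like 8:
total_deaths, total_ops = 600, 60000       # hypothetical pooled figures
p_hat = total_deaths / total_ops           # 0.01
print(binomial_cutoff(100, p_hat))         # 4 with these numbers
```

Recomputing `p_hat` as new data arrives gives the continuously recalculated
cutoff described above.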

~~~
jonp
I don't think there's anything particularly topical about death rates here in
the UK right now. The author, together with Andrew Dilnot, presented the radio
show "More or Less", which covered lots of topics of this sort, showing how
maths can aid an understanding of the world in general and current affairs in
particular.

Their book "The Tiger That Isn't" ([http://www.amazon.co.uk/Tiger-That-Isnt-
Through-Numbers/dp/1...](http://www.amazon.co.uk/Tiger-That-Isnt-Through-
Numbers/dp/1861978391)) is excellent and includes some other examples around
sample size and regression to mean.

One thing that struck me from the article was how computers and Monte Carlo
simulations have lowered the barrier to entry for statistical understanding.
It seems much simpler to run this sort of simulation than to understand the
intricacies of different probability distributions.

~~~
carbocation
I agree, but understanding probability distributions is actually key to the
simulation they've performed. The results would differ greatly if you ran it
with a uniform vs. normal vs. binomial distribution, or tuned their
parameters.

~~~
jonp
I was thinking that they probably just simulated each individual patient by
picking a random number and seeing if it's less than the death rate.

So the number of deaths in a hospital is just

    
    
      from random import random
      deaths = sum(random() < deathRate for _ in range(numPeople))
    

and they can do the analysis without needing to know what a Bernoulli or
binomial distribution is, even though both feature in the problem.

~~~
carbocation
Again, the "random number" comes from a distribution. You can have a random
number from the binomial, normal, etc distribution, and the output will look
quite different. Also, there are clearly bounds on the distribution. Etc. They
absolutely _do_ need to understand how their pseudorandom number generator
works in order to understand the properties of their simulation.

~~~
jonp
I can't tell if we're disagreeing or just at cross purposes.

In the code above the random number is uniform(0,1), the standard random
number in most languages. So each individual death is a Bernoulli trial,
although you don't need to know the word Bernoulli. This then leads to a
binomial distribution for the hospital as a whole.

But the binomial assumption doesn't need to be known or explicitly built into
the model. It's effectively an emergent property of the individual deaths.
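That emergence is easy to demonstrate: simulate hospitals as independent
per-patient coin flips and the death counts line up with the binomial pmf,
even though the model never mentions it. A minimal sketch with made-up
numbers:

```python
import random
from math import comb
from collections import Counter

random.seed(42)
death_rate, num_people, trials = 0.01, 100, 50_000

# Each hospital is just per-patient uniform draws; no distribution is named.
counts = Counter(sum(random.random() < death_rate for _ in range(num_people))
                 for _ in range(trials))

# Compare the simulated frequencies against the binomial pmf they converge to.
for k in range(5):
    simulated = counts[k] / trials
    analytic = comb(num_people, k) * death_rate**k \
               * (1 - death_rate)**(num_people - k)
    print(k, round(simulated, 3), round(analytic, 3))
```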

------
known
"You are a product of your environment." --W. Clement Stone

