
My favorite probability theory problem is related to this article.

You have a test for a disease that is 99% accurate. This means that 99% of the time the test gives a correct result. You test positive for the disease and it is known that 1% of the population has the disease. What is the probability you have the disease?

The answer is not at all the one most people think at first when given this problem. This problem is why getting two tests is always a good thing to do when testing positive for a disease.

EDIT: I updated the statement of the problem to be one that can be answered!



>The answer is not at all the one most people think at first when given this problem.

The answer depends on the disease!

If it's the common cold, it's probably close to 99%. If it's Huntington's disease, the likelihood is much lower. (When asked, this question is normally posed as "you are given a test, which is 99% accurate, for some rare and deadly disease"; the "rare" part is important.)


Sorry, you are correct. I'll update the problem.


I'll follow up by mentioning that even as you have it worded now, it's not precise enough to answer. You need to explicitly describe the false positive and negative rates separately. As is, a test that is just "return false" (100% false-negatives, 0% false-positives) will be 99% accurate, but gives no information, whereas a test with 1% false positive rate can have a 1% false negative rate and still be correct 99% of the time, and will provide much more information.

Statistics is weird and unintuitive.


I guess I should say that everyone takes the test. I'm not a statistician, I'm a mathematician. The way I understand the phrase "the test is 99% accurate" is that, assuming everyone were to be tested, 99% of the time you get an accurate result. Thus 0.99 × 1% = 0.99% of the population will correctly test positive and 0.01 × 99% = 0.99% of the population will incorrectly test positive.


The problem still stands then.


I'm not sure you phrased the problem correctly. If we follow your explanation, then the probability of having the disease is indeed 99%.

If you want to show the implication of Bayes' Theorem then you need to be more precise: say you have 1% false positive and false negative rates (99% reliability) and 1% of the population is sick. If you test positive, then the probability of being sick is much less than 99%.
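As a sketch of that calculation (the function name and parameters are mine, not from the thread; numbers are the 1% base rate and 1% error rates above):

```python
def posterior(prior, sensitivity, specificity):
    """P(sick | positive) via Bayes' theorem.

    prior:        P(sick), the base rate
    sensitivity:  P(positive | sick)     = 1 - false negative rate
    specificity:  P(negative | healthy)  = 1 - false positive rate
    """
    true_pos = sensitivity * prior              # P(positive and sick)
    false_pos = (1 - specificity) * (1 - prior) # P(positive and healthy)
    return true_pos / (true_pos + false_pos)

# 1% base rate, 1% false positive and false negative rates:
p = posterior(prior=0.01, sensitivity=0.99, specificity=0.99)
print(round(p, 2))  # 0.5 -- far below 99%
```

With a rarer disease the effect is even more dramatic: at a 0.1% base rate the same test leaves you with only about a 9% chance of being sick.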


> If we follow your explanation, then the probability of having the disease is indeed 99%.

This is not correct; the probability of having the disease is unknown. He didn't say what he meant by the test being "99% accurate", but that doesn't mean you can just make your own assumption.

Note that in your more precisely specified scenario, when the test has 99% reliability, it is perfectly true that "99% of the time the test gives a correct result", which immediately disproves the claim that, if we follow that definition, the probability of having the disease given a positive test result is 99%.


The problem is that "99% of the time the test gives a correct result" is imprecise.

It can be understood as both:

- p(sick|positive) = 0.99

- p(positive|sick) = 0.99

We get totally different results: the first one is obvious (a 99% chance of being sick), and the second one needs Bayes' Theorem (and is the one we want to use).


I would only interpret "the test gives a correct result 99% of the time" to mean that out of every 100 test results, 99 are correct and one is wrong. Neither of your interpretations matches that. You need all kinds of additional information to say anything more specific. "99% of results are correct" can easily be true while p(sick | positive) and p(positive | sick) each vary anywhere between 0 and 1.


I updated the problem. Sorry for the mistake.


Let's see... Let's say you test 10,000 people, so about 100 actually have the disease. Since the test is only 99% accurate, only 99 of those will test positive. Of the remaining 9,900 actually negative people, 99 will test falsely positive. So if you test positive, you have a 50% chance of actually having the disease?
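The counting argument above, as a sketch (variable names are mine; the numbers are the ones in the comment):

```python
population = 10_000
sick = int(0.01 * population)          # 100 actually have the disease
healthy = population - sick            # 9,900 do not

true_positives = int(0.99 * sick)      # 99 sick people test positive
false_positives = int(0.01 * healthy)  # 99 healthy people also test positive

# Among everyone who tested positive, what fraction is actually sick?
p_sick_given_positive = true_positives / (true_positives + false_positives)
print(p_sick_given_positive)  # 0.5
```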


As I understand the phrase, "99% accurate", this is correct. However, I gather that to statisticians this phrase could mean other things. I think the source of the ambiguity comes from whether or not everyone gets tested or something along those lines.

To me the phrase should mean that when I test people who have the disease then 99% of the time I get a positive result and when I test people who don't have the disease then 99% of the time I get a negative result. That seems the most reasonable interpretation but I'm not a statistician.


Great point. This is the effect of a low base-rate.

https://en.wikipedia.org/wiki/Base_rate_fallacy

Here's a paper on its impact on network intrusion detection.

http://www.raid-symposium.org/raid99/PAPERS/Axelsson.pdf



> This problem is why getting two tests is always a good thing...

It's important to note that the test results should ideally be as uncorrelated as possible. At worst, a repeated test always gives the same result as the first one, so further testing would give zero information.

In practice this means that you probably want tests that are based on completely different mechanisms.
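A sketch of why an independent second test helps so much (my function and numbers, assuming the same 1% base rate and 99% sensitivity/specificity as in the thread, and full independence between the two tests):

```python
def update(prior, sens, spec):
    """One application of Bayes' theorem after a positive result."""
    tp = sens * prior
    fp = (1 - spec) * (1 - prior)
    return tp / (tp + fp)

prior = 0.01
sens, spec = 0.99, 0.99

after_one = update(prior, sens, spec)      # 0.5 after one positive
after_two = update(after_one, sens, spec)  # ~0.99 after a second positive,
                                           # but ONLY if the second test is
                                           # independent of the first
print(round(after_one, 2), round(after_two, 2))
```

If the second test is perfectly correlated with the first, the update is illusory: it adds no information and the probability stays at 0.5.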


You also need those tests to actually be testing the same thing, which is often tricky given your "two different mechanisms" requirement. My field is currently struggling to get a handle on what one should do in a particular circumstance when one test is negative and the other positive (for those interested, PCR + and toxin assay - for C. difficile).



