
My favorite probability theory problem is related to this article.

You have a test for a disease that is 99% accurate. This means that 99% of the time the test gives a correct result. You test positive for the disease and it is known that 1% of the population has the disease. What is the probability you have the disease?

The answer is not at all the one most people think at first when given this problem. This problem is why getting two tests is always a good thing to do when testing positive for a disease.

EDIT: I updated the statement of the problem to be one that can be answered!



>The answer is not at all the one most people think at first when given this problem.

The answer depends on the disease!

If it's the common cold, it's probably close to 99%. If it's Huntington's disease, the likelihood is much lower. (When asked, this question is normally posed as "you are given a test, which is 99% accurate, for some rare and deadly disease"; the "rare" part is important.)


Sorry, you are correct. I'll update the problem.


I'll follow up by mentioning that even as you have it worded now, it's not precise enough to answer. You need to explicitly describe the false positive and negative rates separately. As is, a test that is just "return false" (100% false-negatives, 0% false-positives) will be 99% accurate, but gives no information, whereas a test with 1% false positive rate can have a 1% false negative rate and still be correct 99% of the time, and will provide much more information.

Statistics is weird and unintuitive.


I guess I should say that everyone takes the test. I'm not a statistician, I'm a mathematician. The way I understand the phrase "the test is 99% accurate" is that, assuming everyone were to be tested, 99% of the time you get an accurate result. Thus 0.99 × 1% = 0.99% of the population will correctly test positive and 0.01 × 99% = 0.99% of the population will incorrectly test positive.


The problem still stands then.


I'm not sure you phrased the problem correctly. If we follow your explanation, then the probability of having the disease is indeed 99%.

If you want to show the implication of Bayes' Theorem then you need to be more precise: say you have 1% false positive and false negative rates (99% reliability) and 1% of the population is sick. If you test positive, then the probability of being sick is much less than 99%.
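As a sketch of that calculation (the function name and parameters are mine, not from the thread; numbers are the 1% base rate and 1% error rates above):

```python
def posterior(prior, sensitivity, specificity):
    """P(sick | positive) via Bayes' theorem.

    prior:        P(sick), the base rate
    sensitivity:  P(positive | sick)     = 1 - false negative rate
    specificity:  P(negative | healthy)  = 1 - false positive rate
    """
    true_pos = sensitivity * prior              # P(positive and sick)
    false_pos = (1 - specificity) * (1 - prior) # P(positive and healthy)
    return true_pos / (true_pos + false_pos)

# 1% base rate, 1% false positive and false negative rates:
p = posterior(prior=0.01, sensitivity=0.99, specificity=0.99)
print(round(p, 2))  # 0.5 -- far below 99%
```

With a rarer disease the effect is even more dramatic: at a 0.1% base rate the same test leaves you with only about a 9% chance of being sick.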


> If we follow your explanation, then the probability of having the disease is indeed 99%.

This is not correct; the probability of having the disease is unknown. He didn't say what he meant by the test being "99% accurate", but that doesn't mean you can just make your own assumption.

Note that in your more precisely specified scenario, when the test has 99% reliability, it is perfectly true that "99% of the time the test gives a correct result", which immediately disproves the claim that, if we follow that definition, the probability of having the disease given a positive test result is 99%.


The problem is that "99% of the time the test gives a correct result" is imprecise.

It can be understood as both:

- p(sick|positive) = 0.99

- p(positive|sick) = 0.99

We get totally different results: the first one is obvious (a 99% chance of being sick), and the second one needs Bayes' Theorem (and is the one we want to use).


I would only interpret "the test gives a correct result 99% of the time" to mean that out of every 100 test results, 99 are correct and one is wrong. Neither of your interpretations matches that. You need all kinds of additional information to say anything more specific. "99% of results are correct" can easily be true while p(sick | positive) and p(positive | sick) each vary anywhere between 0 and 1.


I updated the problem. Sorry for the mistake.


Let's see... Let's say you test 10,000 people, so about 100 actually have the disease. Since the test is only 99% accurate, only 99 of those will test positive. Of the remaining 9,900 actually negative people, 99 will test falsely positive. So if you test positive, you have a 50% chance of actually having the disease?
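The counting argument above, as a sketch (variable names are mine; the numbers are the ones in the comment):

```python
population = 10_000
sick = int(0.01 * population)          # 100 actually have the disease
healthy = population - sick            # 9,900 do not

true_positives = int(0.99 * sick)      # 99 sick people test positive
false_positives = int(0.01 * healthy)  # 99 healthy people also test positive

# Among everyone who tested positive, what fraction is actually sick?
p_sick_given_positive = true_positives / (true_positives + false_positives)
print(p_sick_given_positive)  # 0.5
```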


As I understand the phrase, "99% accurate", this is correct. However, I gather that to statisticians this phrase could mean other things. I think the source of the ambiguity comes from whether or not everyone gets tested or something along those lines.

To me the phrase should mean that when I test people who have the disease then 99% of the time I get a positive result and when I test people who don't have the disease then 99% of the time I get a negative result. That seems the most reasonable interpretation but I'm not a statistician.


Great point. This is the effect of a low base-rate.

https://en.wikipedia.org/wiki/Base_rate_fallacy

Here's a paper on its impact on network intrusion detection.

http://www.raid-symposium.org/raid99/PAPERS/Axelsson.pdf



> This problem is why getting two tests is always a good thing...

It's important to note that the test results should ideally be as uncorrelated as possible. At worst, a repeated test always gives the same result as the first one, so further testing would give zero information.

In practice this means that you probably want tests that are based on completely different mechanisms.
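A sketch of why an independent second test helps so much (my function and numbers, assuming the same 1% base rate and 99% sensitivity/specificity as in the thread, and full independence between the two tests):

```python
def update(prior, sens, spec):
    """One application of Bayes' theorem after a positive result."""
    tp = sens * prior
    fp = (1 - spec) * (1 - prior)
    return tp / (tp + fp)

prior = 0.01
sens, spec = 0.99, 0.99

after_one = update(prior, sens, spec)      # 0.5 after one positive
after_two = update(after_one, sens, spec)  # ~0.99 after a second positive,
                                           # but ONLY if the second test is
                                           # independent of the first
print(round(after_one, 2), round(after_two, 2))
```

If the second test is perfectly correlated with the first, the update is illusory: it adds no information and the probability stays at 0.5.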


You also need those tests to actually be testing the same thing, which is often tricky given your "two different mechanisms" requirement. My field is currently struggling to get a handle on what one should do in a particular circumstance when one test is negative and the other positive (for those interested, PCR + and toxin assay - for C. difficile).



