
Medicine’s Uncomfortable Relationship with Math (2014) - todd8
http://archinte.jamanetwork.com/article.aspx?articleid=1861033
======
leemailll
The problem is more deep-rooted in most biology and medicine education. In
many fields of biology and medicine, people from faculty down to undergraduates
lack a systematic education in statistics and probability, and this problem
keeps being reflected in the research and publications. I have collaborated
with many people in these fields, and surprisingly the statistical analysis is
often just clicking a few buttons in a software package and copy-pasting the
alpha value. Another reason this problem is so rampant in these fields, I
think, is that graduate and postgraduate training still resembles an
apprenticeship: you learn mostly from what senior members do in the lab, what
the classical papers in the field did, and what your paper's reviewers ask.

~~~
smrtinsert
They are required courses for those degrees, so you might have a bad sample of
people, or the field is simply bad at it entirely.

~~~
leemailll
Since this is a question about statistics and probability, I think enough
samples are required for a conclusion. ;)

You can read an example
([http://science.sciencemag.org/content/351/6277/1037.2.long](http://science.sciencemag.org/content/351/6277/1037.2.long),
[http://www.be-md.ncbi.nlm.nih.gov/pubmed/26941311/#comments](http://www.be-md.ncbi.nlm.nih.gov/pubmed/26941311/#comments))
that I linked in another comment
([https://news.ycombinator.com/item?id=11325254](https://news.ycombinator.com/item?id=11325254)).
There are quite a few blog posts on this as well, and if you allow me to
shorten the story to one sentence: a comment in Science arguing that a
previous Science study did its statistical analysis wrong is itself under
criticism for doing its statistical analysis wrong.

And recently the life sciences have been caught up in a "reproducibility"
controversy that basically arises from how this topic is taught and applied.

~~~
niels_olson
I'm a physician, with an undergrad in physics. My research is very applied and
very math heavy. When applying to med schools, I asked a couple of very senior
mathematicians at my college whether I should take a stats class. They assured
me I would not need one.

The first time I got a biostats lecture in medical school, I genuinely
questioned their advice. By the third class, I was convinced I knew nothing
about math. Luckily our biochem prof was a physics major and assured me I
didn't need to worry about the stats.

Somewhere in residency, I realized I really had been getting all the stats
questions right. But not because I did anything like the other residents. They
have all these crazy stories and mnemonics to keep track of what kind of
problem they're looking at. They seriously try to classify their way out of
basic math. They classify. Everything.

I now agree with my profs. I didn't need a stats class. But the stats
education in medical school seems designed to convince physicians that they
need statisticians. Which is probably a coup for biostatisticians, but it's a
damn shame for physicians trying to break into research outside major academic
centers.

~~~
ska
Irrelevant to your main point - I am surprised you didn't have at least an
intro to probability and statistics, and later a statistical mechanics course
as part of a physics undergrad curriculum.

Relevant to your main point - I have worked with many clinicians. Most of them
have had a fairly tenuous grasp of the statistics they were working with,
which matches your experience in training I think. A couple had much better
understanding, but had driven that education themselves.

~~~
niels_olson
Yes, we took a statistical mechanics course senior year, deriving Gibbs free
energy from buckets of 1s and 0s, etc. Brutal class.

~~~
ska
Ah, good. Had me worried there for a moment!

It is true that depending on how statistical mechanics is taught, it can be
difficult to connect to other statistical reasoning (the distributions are
"nice", you have a gazillion measurements, and your population sampling
doesn't tend to have bias issues, etc.)

------
orting
As with a lot of "simple" math, the trick is to actually write it down and
calculate it, because our intuition (at least for some of us) is often not the
best when it comes to this kind of calculation.

In this case we write down the contingency table. Assuming that the test
perfectly detects what we are looking for, we find:

True positives: 1

False positives: 5% of 1000 = 50

True negatives: 949

False negatives: 0

Chance of disease given a positive result = 1/51 ≈ 1.96%
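
The arithmetic above can be sketched directly (assuming the classic setup
implied here: a population of 1000, prevalence 1/1000, a 5% false positive
rate, and perfect sensitivity):

```python
# Classic diagnostic-test problem: 1 in 1000 has the disease, and the test
# has a 5% false positive rate and never misses a true case.
population = 1000
true_positives = 1                                       # the one sick person tests positive
false_positives = 0.05 * (population - true_positives)   # ~50 healthy people also test positive
ppv = true_positives / (true_positives + false_positives)
print(round(ppv * 100, 2))  # chance of disease given a positive result: 1.96
```

There is a slight rounding difference between taking 5% of 999 or of 1000, but
either way the answer is about 2%, not 95%.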

~~~
SeanDav
It really depends on how you define "False Positive Rate". There are at least
2 different ways of looking at this:

- Out of 1000 tests, 51 will be positive, 50 of which are incorrect.

- Out of 1000 _positive_ tests, 50 will be wrong, 950 will be correct.

The 2 interpretations give vastly different results.

~~~
felisml
"False positive" has a precise meaning in statistical analysis. The second
thing you're talking about is useful to know, but calling it the false
positive rate is just wrong.

"Out of 1000 positive tests, 50 were false positives, 950 were true
positives" - a valid statement.

"The false positive rate was 50 out of 1000" - this abuses a common technical
term in a way that sounds valid on the face of it, but which is potentially
VERY misleading.

We can't even calculate a valid false positive rate from the above data, since
that requires taking the ratio against all condition-negative cases, not just
positive tests.
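
The distinction can be made concrete with the contingency counts from the
example upthread (1 true positive, 50 false positives, 949 true negatives,
0 false negatives):

```python
# Contingency counts from the example upthread
tp, fp, tn, fn = 1, 50, 949, 0

false_positive_rate = fp / (fp + tn)   # fraction of disease-free people who test positive
false_discovery_rate = fp / (fp + tp)  # fraction of positive tests that are wrong

print(round(false_positive_rate, 2))   # 0.05
print(round(false_discovery_rate, 2))  # 0.98
```

The second quantity - the fraction of positive tests that are wrong - is
usually called the false discovery rate, which is a different thing with a
different denominator.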

~~~
SeanDav
Here is a definition of False Positive from Wikipedia: _"In medical testing,
and more generally in binary classification, a false positive is an error in
data reporting in which a test result improperly indicates presence of a
condition, such as a disease (the result is positive), when in reality it is
not"_

I don't see anything here that definitively clarifies which of the 2 scenarios
above it can exclusively be applied to.

~~~
felisml
There are many ways in which you can incorrectly interpret statements on
Wikipedia, which is why specialized textbooks and so forth still have use.

------
jacobolus
Exhibit B: “A Mathematical Model for the Determination of Total Area Under
Glucose Tolerance and Other Metabolic Curves.”
[http://care.diabetesjournals.org/content/17/2/152.abstract](http://care.diabetesjournals.org/content/17/2/152.abstract)

Cited by 272(!!):
[https://scholar.google.com/scholar?cites=1812909520721081729...](https://scholar.google.com/scholar?cites=18129095207210817294)

~~~
matheist
Context: In this 1994 paper, the researcher "invented" the trapezoidal rule
for computing integrals[+]. This is a standard method for approximating
integrals, and can probably be found in your local high school calculus
class.

[+]
[https://en.wikipedia.org/wiki/Trapezoidal_rule](https://en.wikipedia.org/wiki/Trapezoidal_rule)
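
For reference, the rule the paper "discovered" is a few lines of code; a
minimal sketch:

```python
def trapezoid_area(times, values):
    """Approximate the area under a sampled curve by treating each
    interval between consecutive samples as a trapezoid."""
    points = list(zip(times, values))
    return sum((t1 - t0) * (v0 + v1) / 2
               for (t0, v0), (t1, v1) in zip(points, points[1:]))

# Area under y = x on [0, 4], sampled at integer points (exact for lines)
print(trapezoid_area([0, 1, 2, 3, 4], [0, 1, 2, 3, 4]))  # 8.0
```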

~~~
agumonkey
Oh that is alarmingly sad.

~~~
amelius
At least they got the correct result.

------
pak
For what it's worth, these exact same questions do show up on board exams, so
it's not like medical students don't ever learn this information.

What is a bit sad is that data analysis and quantitative skills continue to be
de-emphasized in the pre-clinical curriculum for the sake of rote memorization
of huge lists and tables, which would seem to be less and less important as
instant access to references on smartphones becomes commonplace.

~~~
usrusr
I agree, people rarely enter medicine because they are particularly good at
maths. All that memorization just happens to draw a different kind of
personality. But it's hard to blame medicine for that: in the practical act of
applying medical knowledge to heal people, the memorization is much more
important than mathematical understanding.

Where it breaks down is medical science, which relies so much more on subtle
mathematical interpretation than most other empirical sciences. But that is
just what happens when there is so little opportunity for deliberate
experimentation and such a large dataset of "naturally occurring" experiments
to observe. (And it surely is helped along by this millennia-old tradition of
nonempirical medicine; I guess all that bloodletting based on beautiful but
hilariously wrong theories is to this day giving the theory-building part of
the scientific method a bad reputation in medicine, making it work completely
in "stupid mode empiricism", where every part of thinking that is not
strictly looking at numbers is heavily frowned upon.)

Maybe there is a need for a stronger differentiation between academic paths
leading to applied medicine and those leading to medical science. Biotech
comes close, but it certainly isn't about teasing more reliable insights from
clinical datasets. Other sciences take the easy road and simply train
everybody for science, leaving practical application to learning on the job,
but we surely would not want to be treated by that kind of MD.

------
kyouens
The issue of positive predictive value versus specificity comes up almost
every day in my work as a pathologist. There is widespread misunderstanding,
and in my own anecdotal experience, it very frequently results in unnecessary
lab testing and misinterpretation of test results by clinicians.

When I was a pre-med student, for some reason the prerequisites required a
year of calculus. That succeeded in weeding out people who can't make an A in
freshman calculus, but I'm not sure what else it accomplished. Calculus has
little to do with the daily practice of medicine, unless you're a radiation
oncologist or doing some hardcore research.

A year of statistics would have served me and my patients much better. That
goes double, given the current firehose of data that is part and parcel of the
personalized medicine revolution.

~~~
niels_olson
heh, pathologist here as well. See my other comment in this thread (1). I did
physics, so I don't know what the general pre-med curriculum is like, but I'm
not so sure a stats class would necessarily help more than two semesters of
calculus when you're up against big data. Understanding integration and
continuity is more valuable in the big picture.

My experience has been that big data is more about linear algebra, which is
usually several classes beyond entry level calc or stats. You have to be able
to reason about arbitrarily large collections of partial differential
equations (albeit reduced to difference equations).

For example, if you want to talk about genomics, Durbin's _Biological Sequence
Analysis_ is probably the most foundational text available. It introduces
Bayes's theorem on page 8, has stuff that looks suspiciously like calculus on
page 40, and is into Markov chains by page 48. They hold off on a formal
treatment of entropy until about halfway through the book.

And for phylogenetics, the equivalent book is Felsenstein's _Inferring
Phylogenies_. He introduces linear algebra _before_ integration.

My favorite quote from Felsenstein, particularly germane for pathologists
(surely the taxonomists of medicine):

"Knowing exactly how many tree topologies ... is not particularly important.
The point is that there are very large numbers of them. ... one use for the
numbers was "to frighten taxonomists."

1.
[https://news.ycombinator.com/item?id=11326178](https://news.ycombinator.com/item?id=11326178)

------
RA_Fisher
Yet the authors don't offer the data and statistical code.

~~~
paulio
This is a common and significant issue, in my opinion: papers built and
published without their data and source code.

------
geomark
Reminds me of Gerd Gigerenzer's quizzes of over 1,000 gynaecologists on
interpreting mammogram results - only 21% got it right. [1]

[1]
[http://www.bbc.com/news/magazine-28166019](http://www.bbc.com/news/magazine-28166019)

------
turaw
Did anyone else learn how to compute probabilities by drilling through the
algebra? Like, if asked how to convert a plain English probabilistic query
(e.g. "what is the chance of picking two red candies from a mixed bowl") into
a formula, I would focus on the 'and', mentally recite something to the effect
of 'conjunctions multiply probabilities', then write p(red)*p(red). There's no
intuition or understanding, just rote memorization.

I've been trying to recognize and account for this deficiency by drawing
mental decision trees enumerating probabilities instead ... anyone else doing
anything similar?

~~~
grayclhn
Not to be an asshole, but unless you replace the first candy after taking it
out, your formula's wrong.

If you start w/ 50 red candies and 50 non-red candies, the chance the first
one is red is 1/2 (from 50/100), but the chance the second one is red after
first drawing a red candy is now 49/99: after removing one red candy, there
are 49 red candies and 99 total candies left.

So, P(red) * P(red | removed 1 red) = 49/198
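
Both cases are easy to check with exact arithmetic (50 red and 50 non-red
candies):

```python
from fractions import Fraction

# Two reds without replacement: the second draw sees 49 reds out of 99
without_replacement = Fraction(50, 100) * Fraction(49, 99)
# Two reds with replacement: the draws are independent
with_replacement = Fraction(50, 100) ** 2

print(without_replacement)  # 49/198
print(with_replacement)     # 1/4
```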

~~~
turaw
XD okay, fair enough. I was thinking of the ideal candy bowl, which is self-
refilling. Do you grade undergraduate stats quizzes, by the way?

------
youngjh
I just wanted to point out that I think it is crucial that the authors
included the statement "Assuming a perfectly sensitive test,..." Otherwise, if
using only the information presented in the question, there would be no way to
calculate the true positive rate, or P(test+ | disease), in Bayes' Theorem.

~~~
xaa
True, but any nonzero false negative rate would only push the answer even
_lower_ than 2%. The problem they were focusing on was the subset of people
who overestimated.
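
This is easy to verify with Bayes' theorem, plugging in the thread's numbers
(prevalence 1/1000, 5% false positive rate) and a few hypothetical
sensitivities:

```python
def ppv(sensitivity, prevalence=0.001, fpr=0.05):
    """P(disease | positive test) via Bayes' theorem."""
    p_positive = sensitivity * prevalence + fpr * (1 - prevalence)
    return sensitivity * prevalence / p_positive

# Lowering sensitivity only pushes the answer further below the ~2% ceiling
for s in (1.0, 0.9, 0.5):
    print(round(100 * ppv(s), 2))  # 1.96, 1.77, 0.99
```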

