
AI Detects Heart Failure from One Heartbeat: Study - rusht
https://www.forbes.com/sites/nicholasfearn/2019/09/12/artificial-intelligence-detects-heart-failure-from-one-heartbeat-with-100-accuracy/
======
corodra
Wasn't a similar claim made about an AI detecting skin cancer from moles? Once
the AI was deployed in the real world, it failed miserably. I think it produced
a ton of false positives because it was trained on images where the cancerous
moles all had rulers next to them and the benign ones didn't. So it just
picked up on the ruler as a cancer indicator.
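That failure mode (a model latching onto a labeling artifact) is easy to reproduce with toy data. A minimal sketch in Python, where the "ruler" feature, the numbers, and the dataset are all invented for illustration:

```python
import random

random.seed(0)

def make_case(cancer, artifact):
    # "irregularity": the genuinely (but weakly) predictive feature
    irregularity = random.gauss(0.7 if cancer else 0.4, 0.2)
    if artifact:
        # training-set artifact: cancerous moles were photographed
        # next to a ruler, benign ones were not (toy reconstruction)
        ruler = 1.0 if cancer else 0.0
    else:
        # deployment: nobody puts a ruler in the photo
        ruler = 0.0
    return {"irregularity": irregularity, "ruler": ruler, "label": cancer}

train = [make_case(c, artifact=True) for c in [True] * 100 + [False] * 100]
deploy = [make_case(c, artifact=False) for c in [True] * 100 + [False] * 100]

def best_stump(data, feat):
    """Pick the single threshold on `feat` that maximizes training accuracy."""
    best = (-1.0, 0.0)
    for t in sorted({x[feat] for x in data}):
        acc = sum((x[feat] > t) == x["label"] for x in data) / len(data)
        best = max(best, (acc, t))
    return best

ruler_acc, ruler_t = best_stump(train, "ruler")   # 1.0 -- looks perfect!
deploy_acc = sum((x["ruler"] > ruler_t) == x["label"]
                 for x in deploy) / len(deploy)   # 0.5 -- a coin flip
```

On the training set the spurious "ruler" feature beats the real one outright, and the resulting model is worthless once rulers disappear.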

~~~
Sanzig
Would you happen to have a source for that story? My workplace has really
swallowed the AI Kool-Aid lately, so I would like to have some cautionary
counterexamples to demonstrate potential pitfalls of the technology.

It's got a lot of interesting applications for our field which I am excited
about, but there seems to be a tendency among non-experts to consider it a
magic bullet that can solve any sort of problem. In particular, I am concerned
about applications where conventional approaches have already converged on an
optimal solution that's used operationally, but somebody wants to throw AI at
it because they thought it might be cool without first understanding the
implications.

~~~
3b18
Check this link out
[http://132.206.230.229/e706/gaming.examples.in.AI.html](http://132.206.230.229/e706/gaming.examples.in.AI.html)

There was a hn discussion about it a while back:
[https://news.ycombinator.com/item?id=18415031](https://news.ycombinator.com/item?id=18415031)

~~~
carbocation
What a great resource! The first example is perfect:

> Aircraft landing

> Evolved algorithm for landing aircraft _exploited overflow errors in the
> physics simulator_ by creating large forces that were estimated to be zero,
> resulting in a perfect score

------
Cass
As a doctor rather than an AI researcher, I find so many of the choices this
study makes baffling.

First of all, why just one heartbeat? You never capture just one heartbeat on
an ECG anyway, and "Is the next heartbeat identical to the first one?" is such
an important source of information, it seems completely irrational to exclude
it. At least pick TWO heartbeats. If you're gonna pick one random heartbeat,
how do you know you didn't pick an extrasystole by accident? (Extrasystoles
look different, and often less healthy, than "normal" heartbeats, as they
originate from different regions of the heart.)

Secondly, why heart failure and not a heart attack? One definition of heart
failure is "the heart is unable to pump sufficiently to maintain blood flow to
meet the body's needs," which can be caused by all sorts of factors, many of
them external to the actual function of the heart - do we even know for sure
that there are ANY ECG changes definitely tied to heart failure? Why not
instead try to detect heart attacks, which cause well-defined and
well-researched ECG changes?

(I realize AIs that claim to be able to detect heart attacks already exist.
None of the ones I've personally worked with have ever been usable. The false
positive rate is ridiculously high. I suppose maybe some research hospital
somewhere has a working one?)

~~~
Cass
To add to this, looking at figure 4, why is their "average" heartbeat so
messed up? That's not what a normal average heartbeat looks like. P is too
flat, Q is too big, R is blunted, and there's an extra wave between S and T
that's not supposed to be there at all. If their "healthy patient" ECGs were
bad enough to produce this mess on average, it's no surprise their AI had no
trouble telling the data sets apart.

(For comparison, the "CHF beat" looks a lot more like a healthy heartbeat.)

------
et2o
I'm going to sound like a skeptical jerk here, but 490,000 heartbeats is how
many patients? From what I recall, these public ECG datasets are like 20
patients who underwent longitudinal ECGs. 500k heartbeats is like 5 person-
days of ECG recordings.

Ninja Edit: N = ~30 patients. For something like ECGs, which are readily
available, they really should have tried to get more patients. A single clinic
anywhere does more than 30 EKGs per day. Suggesting this is clinically
applicable is ridiculous. It's way too easy to overfit. Chopping up a time
series from one patient into 1000 pieces doesn't give you 1000x the patients.

I actually think this approach will probably work; it's very reasonable given
recent work from Geisinger and Mayo. But why are ML people doing press
releases about such underwhelming studies?

~~~
joker3
A lot of machine learning people don't really understand study design or power
or things like that. It's gotten a little better over the past decade or so,
but this is an area where the field has a lot of room to improve.

~~~
Camillo
"A lot" of ML people perhaps, but also the overwhelming majority of clinical
scientists and the near totality of doctors.

~~~
braindeath
There are plenty of academic and academic-center-trained physicians who
understand study design and are competent in research. They aren't typically
primary care/general practitioners, so you just don't encounter them as much.
And yes, they are the minority. But it's not the totality.

Clinical research that isn’t making ridiculous claims tends to get much less
press.

Furthermore, of all places to lap that crap up... this Hacker News site is
frankly one of the worst.

I mean, look at this submission. Yes, it's true people are pillorying it here
(including some doctors), but I don't recall much interesting, well-designed
medical research being discussed here (though arguably this isn't the place
for it).

------
brilee
So the first clarification is that heart failure != heart attack. Heart
failure is a chronic condition in which the heart is unable to pump hard
enough to keep blood flowing through the body. It typically results in blood
pooling in the legs, shortness of breath, etc.

The study avoids the obvious pitfall, which is to put different slices of one
patient's data into both training and test. The press also reports the
training accuracy (100%) when the test accuracy/sensitivity/precision metrics
are all at around 98%.
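That patient-level split matters more than it sounds. A minimal sketch of the wrong and right way to split beat data (patient counts and beat counts here are hypothetical):

```python
import random

random.seed(42)

# hypothetical corpus: 33 patients, 100 recorded beats each
beats = [{"patient": pid, "beat": i} for pid in range(33) for i in range(100)]

# WRONG: shuffling individual beats puts the same patient's heart on
# both sides of the split -- the model can just memorize patients
shuffled = beats[:]
random.shuffle(shuffled)
naive_train, naive_test = shuffled[:2500], shuffled[2500:]
leaked = ({b["patient"] for b in naive_train}
          & {b["patient"] for b in naive_test})

# RIGHT: split at the patient level, then gather each patient's beats
patients = list(range(33))
random.shuffle(patients)
train_ids, test_ids = set(patients[:26]), set(patients[26:])
train = [b for b in beats if b["patient"] in train_ids]
test = [b for b in beats if b["patient"] in test_ids]
```

With the naive split essentially every patient leaks into both sets; with the grouped split, no patient appears on both sides.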

Another encouraging sign is that when you dig into the 2% error rate, a
majority of those errors turned out to be mislabeled data.

The study also acknowledges the following:

"Our study must also be seen in light of its limitations... First the CHF
subjects used in this study suffer from severe CHF only...could yield less
accurate results for milder CHF."

I think this is a good proof of concept, but the severe CHF and tiny sample
size (33 patients) mean that we're a long way from clinical usage.

~~~
carbocation
The study looks at 33 patients total, and the cases and controls come from
entirely different data sets, with data coming from different devices that
recorded the signal at different sampling frequencies.

There is nothing to see here.

~~~
brilee
Apologies - I didn't catch that the HF / healthy patients are from two
different datasets. Agreed that this essentially invalidates the result.

The missing experiment is to have a third dataset from yet another machine,
with both positive/negative examples, and use it as the test dataset. Then
transferability questions are at least somewhat addressed.

~~~
carbocation
No apologies needed, we're all in agreement on the broad strokes! Your
proposal is good: I would be quite surprised if it generalized, but that is
definitely the way to find out.

------
qiqitori
Pff, I can detect heart failure with no heartbeats at all.

~~~
manmal
Care to explain?

~~~
jacquesm
It sort of spoils the joke, but if there are heartbeats, it hasn't failed yet.
So no heartbeats = failure.

~~~
manmal
Oh I really didn't get it. Thanks for explaining. I still don't find it funny.

------
binalpatel
Discussion of this on /r/machinelearning:

[https://www.reddit.com/r/MachineLearning/comments/dj5psh/n_n...](https://www.reddit.com/r/MachineLearning/comments/dj5psh/n_new_ai_neural_network_approach_detects_heart/)

~~~
phonebucket
The Reddit thread is worth a read. There's a healthy dose of scepticism about
the paper there.

~~~
forgot-my-pw
Claiming 100% accuracy from a single heartbeat is just hard to believe.

~~~
RosanaAnaDana
And it's also just clearly wrong on its face.

------
fencepost
This is interesting, but more because it indicates that there's adequate data
in a single heartbeat to make such a diagnosis. In practical terms it's
probably not nearly so relevant, because it sounds like they were working with
the raw data, not tracings. By the time you have a patient hooked up to the
proper equipment for this diagnosis, you're going to be getting adequate data
anyway.

The main impact might be that, if this holds up, people could be tested with a
short hook-up in an office instead of 24-hour monitoring where they have to
bring back a Holter device the next day. Of course, that 24-hour dataset may
have independent value of its own for further diagnostics beyond just whether
the patient has CHF.

~~~
VHRanger
The study is not worth paying attention to.

The datasets for positive cases and negative cases come from different
databases. n=30 patients, on top of it.

All this does is recognize the patient/ECG technician who recorded the data.
It's basically certain it doesn't generalize.

------
ryanschneider
IMO, the important part is Section 3.3 of the
[paper](https://www.sciencedirect.com/science/article/pii/S1746809419301776),
particularly the image at [https://ars.els-cdn.com/content/image/1-s2.0-S1746809419301776-gr4_lrg.jpg](https://ars.els-cdn.com/content/image/1-s2.0-S1746809419301776-gr4_lrg.jpg).
To my eye, the difference in shape of the orange and green signals could also
be found through more traditional signal processing/statistical means than
machine learning.

In a past job I did a combination of manual and machine-learning-based
analysis of cardiac signals. We didn't have ECG, but did have PPG (blood flow)
and PCG (sound) signals, and a pretty large study group. I recall there being
one study participant whose signals were very clearly indicative of heart
failure, enough that we raised the issue with our medical advisor about
whether the subject should be deanonymized and contacted. In the paper they
state that "the CHF subjects used in this study suffer from severe CHF only";
my suspicion is that a simpler, "hand rolled" model based on the features of
the ECG could compete very well with this CNN approach for finding the same
level of pathology in the ECG signal, without the "black box" of a CNN casting
doubt on the technique.
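As a rough illustration of that "hand rolled" alternative: a template-correlation classifier on synthetic beats. The waveforms, widths, and threshold below are entirely invented, not real ECG morphology, but they show the classical idea of comparing each beat against an averaged healthy template.

```python
import math
import random

random.seed(1)
N = 60  # samples per beat (arbitrary)

def beat(width, noise=0.02):
    # crude synthetic beat: a single spike whose width differs by class
    return [math.exp(-((i - 30) / width) ** 2) + random.gauss(0, noise)
            for i in range(N)]

healthy = [beat(width=3.0) for _ in range(50)]
chf = [beat(width=8.0) for _ in range(50)]  # hypothetical "wider" pathology

# template = the average of some healthy training beats
template = [sum(b[i] for b in healthy[:25]) / 25 for i in range(N)]

def corr(a, b):
    # Pearson correlation between two equal-length signals
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    da = math.sqrt(sum((x - ma) ** 2 for x in a))
    db = math.sqrt(sum((y - mb) ** 2 for y in b))
    return num / (da * db)

def is_abnormal(b, thresh=0.9):
    # flag beats that correlate poorly with the healthy template
    return corr(b, template) < thresh

correct = (sum(not is_abnormal(b) for b in healthy[25:])
           + sum(is_abnormal(b) for b in chf))
accuracy = correct / (len(healthy[25:]) + len(chf))
```

On data this cleanly separated, a single correlation threshold is enough, which is the point: when the pathology is severe, you may not need a CNN to find it.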

------
nradov
Congestive heart failure can also be detected fairly reliably from a sudden
increase in weight, since it causes fluid retention. There are several
programs underway to give internet-connected scales to high-risk patients,
which report their weight every day.
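The kind of rule such a program might run is trivial to implement. A sketch with an invented window and threshold (illustrative only, not clinical guidance):

```python
def rapid_gain(weights, window=3, threshold=2.0):
    """Flag any span of `window` days over which weight rises by more
    than `threshold` kg. Both parameters are made up for illustration."""
    return any(weights[i + window] - weights[i] > threshold
               for i in range(len(weights) - window))

# hypothetical daily weights (kg) reported by a connected scale
stable = [80.0, 80.2, 79.9, 80.1, 80.3, 80.0]
retaining = [80.0, 80.4, 81.5, 82.8, 83.6, 84.1]
```

Here `rapid_gain(stable)` stays quiet while `rapid_gain(retaining)` fires on the rapid fluid-retention-like climb.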

~~~
pkaye
It could also be kidney failure. My weight ballooned by 30 lbs when I was
progressing towards it. Even the doctor was saying the usual "you need to
exercise and eat healthy" until he got my blood test results.

------
nycbenf
Heart failure patient here. This is kinda cool but tempered a bit by the fact
that I've seen multiple cardiologists make a diagnosis by just glancing at a
12 lead ECG sheet. There are some pretty recognizable hallmarks.

------
mikece
Perhaps the title/premise might best be summarized as: _based on everything we
know_, we can detect heart failure by monitoring one's body for one heartbeat.

Along with doing a lot of good and making a lot of early catches, I suspect
that relying on AI to do medical analysis is going to bring into sharp relief
just how much medical science DOESN'T know about the human body and its
mysteries. I think we're a long, long way from handing medical science over to
AI, and the real fun of AI-guided exploration is about to begin.

------
kbody
Clickbait title aside, I find that the ethical issues around AI raised by Musk
etc. shouldn't be about AI taking over the planet, but about overfitted or
otherwise unrealistic models being pushed for PR or whatever, irresponsibly
playing with people's health and hopes.

------
querious
One of our simplest “screening” questions for DS roles at my company is: “Your
model is 100% accurate. How do you feel?” If the answer is anything other than
deep skepticism (data leakage, trivial dataset, etc.), it's a big red flag.

------
wil421
How long until society becomes Gattaca? "Sorry, citizen, our 'AI' has detected
genetic anomalies; you will be a disposable factory worker." The rich would
surely pay for their children to be genetically altered.

~~~
fermenflo
What exactly guarantees that reality? Why can't these tools be used for good
moving forward? e.g. detecting heart problems. Seems a little arbitrary to
spin technological advancements as progress towards some inevitable dystopian
AI-driven future.

~~~
mdorazio
I'd say it's fairly certain given a capitalist society and human nature.
Remember pre-existing conditions and insurance denials pre-Obamacare? Yeah,
insurance companies would _love_ to get their hands on your genetic data and
tailor rates to your likelihood of future healthcare cost. At the same time,
rich parents pretty much always pursue any advantage they can for their
children - that's why private schools and homes in good school districts are
so expensive. Add in the fact that we _already_ have massive inequality and
dropping social mobility in the US, and a Gattaca-like future starts looking
pretty damn likely.

------
logicbombr
Last year we started cloud-recording obstetric ultrasound videos. We add more
than 17,000 ultrasound exams to our platform each month. It's probably the
largest dataset of obstetric ultrasound videos in the world (~300,000 exams).
Reading news like this makes me think about how we could explore our dataset
using ML/AI to help produce better diagnoses. I have no idea how (we're not an
AI company).

If someone here wants to start a project with AI on top of ultrasounds, I'm
all in.

Let me know at hn at angra.ltd and I can give more details.

~~~
smt88
I'm not sure that dataset will mean anything without human-drawn conclusions
about the patient (diagnoses, abnormalities, etc.).

~~~
logicbombr
Obstetric ultrasound is very standardized and easy to evaluate with Hadlock
([https://www.sciencedirect.com/science/article/abs/pii/000293...](https://www.sciencedirect.com/science/article/abs/pii/0002937885902984))

We, however, have access to the report too.

------
godelzilla
I guess adding "Congestive" to the title would've ruined the clickbait.

Also, how can the detection of a progressive disease be 100% accurate? I guess
details ruin the clickbait too.

------
ryanmcbride
I don't see any mention in the article of how many false positives they had,
so... yeah, we'll see how effective this actually is.

------
conjectures
Guy detects bullshit from one headline.

------
m3kw9
Or zero heart beat

------
counterpig
If it doesn't beat, the heart has failed.

------
cat199
Dear Elizabeth Holmes,

I found a great startup opportunity for you.

\- Recruiter

