
Apple Watch can detect arrhythmia with 97% accuracy, study says - brandonb
https://techcrunch.com/2017/05/11/apples-watch-can-detect-an-abnormal-heart-rhythm-with-97-accuracy-ucsf-study-says/
======
robbiep
With due respect to the Cardiogram developers (hi guys), as a doctor I really
can't see a huge amount of value in this. My patients who present to
emergency departments with paroxysmal AF are all anticoagulated or on rate
control, and there has been only one instance in the last 3 years and many
thousand patients when someone has presented to emergency and we have had to
run through the full spectrum of echo -> anticoagulate -> cardiovert.

To the founders: what do you see as being the end game here? Are you just
looking for validation (of the concept in itself, not the app- I use it on my
watch)? Is the US market so different that this is particularly useful and
cost effective for detection? Do you see this as eventually displaying early
warning in the instance of an early warning?

Thanks and I don't mean to denigrate your efforts, but I do see lots of
consumer med tech as solving a problem that really isn't creating value (i.e.
a proliferation of devices, wearables and algorithms that proclaim the ability to
help with X but are really marginally helpful at best), and I'm wondering if
I'm missing something about the actual medical benefit, or whether what I feel
is true: that they aren't after being a medical device at all but instead
are chasing the consumer dollars by making medical claims.

~~~
brandonb
Circulation published a nice review this week on the rationale for atrial
fibrillation screening:
circ.ahajournals.org/content/135/19/1851.full?ijkey=StzSPk8eljGaP2G&keytype=ref

The main reason is that 10% of strokes are associated with undiagnosed atrial
fibrillation. The patients who present to the emergency room are a pretty
biased sample--for one, they're experiencing symptoms. To prevent strokes, we
need to have a way to catch AF in asymptomatic people.

Part of the challenge here is that episodes of atrial fibrillation can be
infrequent--in CRYSTAL-AF, for example, it took 84 days from randomization to
first episode--and existing monitoring devices like Holters or Zio patches are
only worn 24 hours to 2 weeks. The great thing about Apple Watch and other
consumer wearables is that they're worn for months or years. That means if we
can prove the algorithm is accurate, we can get higher time-coverage than a
traditional medical device, catch more AF early, and prevent those 10% of
strokes described in the Circulation article.

~~~
pbhjpbhj
Hmm, this makes me think personal insurance will at some point in the not too
distant future give hefty premium hikes to those who refuse to wear a 24-7
monitoring device; as vehicle insurance in the UK is doing with monitoring
devices (cameras/accelerometer-type devices).

~~~
mattybrennan
Pretty sure Obamacare wouldn't allow that, but they could start subsidizing
devices.

------
brandonb
(Cardiogram Co-Founder here)

Let me know if any of you have questions on the study, app, or deep learning
algorithm. My colleague Avesh wrote a post with a little more technical detail
here: [https://blog.cardiogr.am/applying-artificial-intelligence-
in...](https://blog.cardiogr.am/applying-artificial-intelligence-in-medicine-
our-early-results-78bfe7605d32)

~~~
cheath
This is really exciting! Quick question for you.

Have you experimented at all with using the Apple Watch to measure blood
pressure?

I have done some reading that suggests the optical sensor could measure blood
pressure with some accuracy, but that Apple is hesitant to release it as a
feature due to regulatory and accuracy concerns. It's my #1 wished for
feature.

~~~
JshWright
Measuring blood pressure (non-invasively) requires applying pressure to the
artery and determining at what point that pressure obstructs blood flow. That
would be a) challenging to do with a watch, and b) really annoying to the
user.

~~~
captainmuon
Not entirely true, there is a clever method that works without (pulse transit
time measurement). You measure the electrical signal of the heartbeat, and the
acoustic signal, and take advantage of the fact that the acoustic signal
propagates slower in blood than the electrical signal propagates through
nerves. This difference is directly related to blood pressure! I don't know if
you can get away with one measurement on the wrist + some clever calibration,
or if you need two points (e.g. one close to the heart). If you take two
measurements, you can apparently also calculate the aortic pressure, which is
different from the pressure in your arm, and hard to obtain non-invasively.
(I'm not an expert so I'm sorry if I got something slightly wrong :-) .)

There was a startup that worked on using this, but they failed (due to
financial but also regulatory reasons), and then remotely bricked all their
sold devices...

I really wish someone would develop this further. Even if it is not as
accurate as a normal measurement, the implications would be huge. There are so
many people running around with hypertension who have no idea. I also don't
see a risk in false positives in this case, since in principle everybody is
recommended to have their blood pressure checked - false positives who go to
the doctor are then just like people who read an article and go to the doctor,
and are weeded out there. False negatives might be a problem - but if you
don't advertise it as a blood pressure measurement tool, but just implement it
as an additional warning in a smart watch, you'd reduce the false sense of
security people would get if it didn't work properly.

~~~
_up
Newer Chinese "Fitbits" promise that. But they only use an optical system and
it doesn't work for me even after entering a baseline.

------
nonbel
Accuracy is a useless metric for something like this. If you have binary data
filled with 97% zeros (ie most of the time it is not arrhythmia) you can use
the sophisticated machine learning technique of:

    
    
    # Always predict 0 ("no arrhythmia"), whatever the input
    classify <- function(x) if (TRUE) return(0)
    

This will give you 97% accuracy.

EDIT:

I just read the headline earlier. Now after checking:

> _" The study involved 6,158 participants recruited through the Cardiogram
> app on Apple Watch. Most of the participants in the UCSF Health eHeart study
> had normal EKG readings. However, 200 of them had been diagnosed with
> paroxysmal atrial fibrillation (an abnormal heartbeat). Engineers then
> trained a deep neural network to identify these abnormal heart rhythms from
> Apple Watch heart rate data."_

So 1 - 200/6158 = 0.9675219. My method performs just as well as theirs if we
round to the nearest percent. This is ridiculous.
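
A minimal sketch of that arithmetic, assuming the cohort sizes quoted from the article (6,158 participants, 200 with diagnosed paroxysmal AF). The function name `always_negative_accuracy` is made up for illustration; it scores a classifier that answers "no AF" for everyone.

```python
def always_negative_accuracy(n_total: int, n_positive: int) -> float:
    """Accuracy of a classifier that labels every participant as negative."""
    true_negatives = n_total - n_positive  # every real negative counts as correct
    return true_negatives / n_total


# Cohort sizes taken from the quoted article text.
baseline = always_negative_accuracy(n_total=6158, n_positive=200)
print(f"{baseline:.4f}")  # same quantity as 1 - 200/6158
```

Plain accuracy cannot tell this do-nothing rule apart from a useful detector on a cohort this unbalanced, which is why the metric behind the headline number matters.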

~~~
downandout
From the comments on the article itself:

> Cardiogram engineer here. 97% accuracy refers to a c-statistic (area under
> the ROC curve) of 0.9740. An example operating point would be 98% sensitivity
> with 90% specificity.
>
> These important details are often lost in the news. You can find some more
> details on our findings in our blog post:

[https://blog.cardiogr.am/applying-artificial-intelligence-
in...](https://blog.cardiogr.am/applying-artificial-intelligence-in-medicine-
our-early-results-78bfe7605d32)

~~~
mulletboy
Just wondering, with such an unbalanced dataset (5,958 negatives, 200
positives), wouldn't it have been fairer to use average precision (area under
the precision-recall curve) instead of ROC-AUC?
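
To make the difference concrete, here is a toy sketch with invented scores (not data from the study). It computes both metrics by hand: ROC-AUC as the fraction of correctly ordered positive/negative pairs, and average precision as the mean precision at the rank of each positive. The helper `toy_dataset` and its scores are purely hypothetical.

```python
def roc_auc(scored):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in scored if y == 1]
    neg = [s for s, y in scored if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scored):
    """Mean of the precision values measured at the rank of each positive."""
    ranked = sorted(scored, key=lambda item: item[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions)


def toy_dataset(negatives_factor):
    """Two positives plus 8 * negatives_factor negatives (invented scores)."""
    positives = [(0.9, 1), (0.6, 1)]
    negatives = ([(0.7, 0)] * (2 * negatives_factor)
                 + [(0.3, 0)] * (6 * negatives_factor))
    return positives + negatives


# Same score distribution, ten times more negatives in the second run.
for factor in (1, 10):
    data = toy_dataset(factor)
    print(factor, round(roc_auc(data), 3), round(average_precision(data), 3))
```

In this toy setup the ROC-AUC is identical in both runs, while average precision falls as negatives are added, which is the sensitivity to class imbalance the question is getting at.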

------
pak
We need to see the full, published study and its methods (particularly around
recruitment and exclusion criteria) before we can judge it properly. Until
then, the presented statistics about accuracy, sensitivity, and specificity
potentially bear no relation to real world usage, if the cohort and data
quality were tightly controlled, as you'd expect for an initial study
involving the makers of the algorithm. A few other thoughts:

1\. Even at 98% sensitivity and 90% specificity [0], which I don't think would
hold up with real world usage in casual, healthy users, if AFib has a
prevalence of roughly 2-3% [1] then by a quick back of the envelope
calculation a positive test result is still 5× more likely to be a false
positive than a true positive. With those odds, I don't think many
cardiologists are going to answer the phone. You'd still need an EKG to
diagnose AFib.

2\. There is huge variance among people's real world use of wearable sensors,
and also among the quality of the sensors. (Imagine people that wear the watch
looser, sweat more, have different skin, move it around a lot, etc.) You'd
likely need to do an open, third-party validation study of the accuracy of the
sensors in the Apple Watch before you can expect doctors to use the data. My
understanding is that the Apple Watch sensors are actually pretty good
compared to other wearable sensors, but I don't know of any rigorous study
that compares them to an EKG.

3\. Obviously, this is only for AFib. AFib is a sweet corner case in terms of
extrapolating from heart rate to arrhythmia, because it's a rapid & irregular
rhythm that probably contains some subpatterns in beats that are hard for
humans to appreciate. As others—including Cardiogram themselves [2]—have
pointed out previously, many serious arrhythmias are not possible to detect
with only an optical heart rate sensor.

[0]: [https://blog.cardiogr.am/applying-artificial-intelligence-
in...](https://blog.cardiogr.am/applying-artificial-intelligence-in-medicine-
our-early-results-78bfe7605d32)

[1]:
[https://www.ncbi.nlm.nih.gov/pubmed/24966695](https://www.ncbi.nlm.nih.gov/pubmed/24966695)

[2]: [https://blog.cardiogr.am/what-do-normal-and-abnormal-
heart-r...](https://blog.cardiogr.am/what-do-normal-and-abnormal-heart-
rhythms-look-like-on-apple-watch-7b33b4a8ecfa#fabf)
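
A rough sketch of the back-of-the-envelope calculation in point 1, assuming the 98% sensitivity and 90% specificity from the blog post and a prevalence of 2% or 3%. The function name is made up for illustration.

```python
def false_to_true_positive_ratio(sensitivity, specificity, prevalence):
    """Expected number of false positives per true positive."""
    true_pos = sensitivity * prevalence                    # P(test+ and AF)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # P(test+ and no AF)
    return false_pos / true_pos


for prevalence in (0.02, 0.03):
    ratio = false_to_true_positive_ratio(0.98, 0.90, prevalence)
    ppv = 1.0 / (1.0 + ratio)  # positive predictive value
    print(f"prevalence={prevalence:.0%}  FP per TP={ratio:.1f}  PPV={ppv:.1%}")
```

At 2% prevalence the ratio is about 5 false positives per true positive; at 3% it is closer to 3.3, so the conclusion is sensitive to the prevalence assumed.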

~~~
brandonb
Full journal publication is coming--as you likely know, the system doesn't
always move as fast as we'd like.

> quick back of the envelope calculation a positive test result is still 5×
> more likely to be a false positive than a true positive.

For what it's worth, about 10% of people who come in to the cardiology clinic
experiencing symptoms are diagnosed with an abnormal heart rhythm. So even a
20% positive predictive value would be an improvement over the status quo.

As mentioned below, you can use other risk factors (like CHA2DS2-Vasc, or even
simply age) to raise the pre-test probability, and thereby control the false
positive rate.
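
As a sketch of how pre-test probability drives this, the snippet below inverts Bayes' rule to ask what pre-test probability is needed to reach a given positive predictive value. It assumes the 98% sensitivity / 90% specificity operating point quoted elsewhere in the thread; the function names and the 20% and 50% targets are illustrative only.

```python
def ppv(sensitivity, specificity, pretest):
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * pretest
    false_pos = (1.0 - specificity) * (1.0 - pretest)
    return true_pos / (true_pos + false_pos)


def pretest_needed(sensitivity, specificity, target_ppv):
    """Pre-test probability at which ppv(...) equals target_ppv."""
    fp_rate = 1.0 - specificity
    return (target_ppv * fp_rate
            / (sensitivity * (1.0 - target_ppv) + target_ppv * fp_rate))


for target in (0.20, 0.50):
    needed = pretest_needed(0.98, 0.90, target)
    print(f"PPV {target:.0%} needs a pre-test probability of {needed:.1%}")
```

Under these assumptions a 20% PPV needs a pre-test probability of roughly 2.5%, and a 50% PPV needs a bit over 9%, which is the kind of shift that restricting screening to higher-risk groups would have to deliver.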

As a meta-point, I do think we let the perfect be the enemy of the good in
medicine, and that potentially scares people away who could otherwise make
positive contributions. For example, many of the most common screening methods
in use today are simple, linear models with c-statistics below 0.8. You can
build a far-from-perfect system, and still improve dramatically over how
people receive healthcare today.

My overall message to machine learning practitioners sitting on the sidelines
would be: please join our field. The status quo in medicine is much more
primitive than we have been led to believe, and your skills can very literally
save lives.

~~~
pak
Thanks for replying! I'll certainly be looking forward to the publication.

> about 10% of people who come in to the cardiology clinic experiencing
> symptoms are diagnosed with an abnormal heart rhythm

OK, but I'd be more careful about staying apples to apples in your
comparisons; your app is about asymptomatic AFib. So how many of those people
going to the cardiology clinic had undiagnosed AFib; for how many of those
would a new diagnosis of AFib have changed the plan of care; etc. Kind of like
robbiep was saying, I would be interested in actual added value from the
larger perspective.

Totally appreciate your point about perfect being the enemy of the good. The
danger is that these semi-medical wearables currently straddle a strange zone
between medical and consumer use. The inevitable marketing strategy is to co-
opt the positive reputation of medical products while acknowledging none of
the pitfalls of consumer products. Most of the screening methods you bring up
are used by a doctor on symptomatic patients with a suggestive history, and
only as a partial component of clinical judgement. The way Cardiogram seems to
make the most money, on the other hand, is to sell the product to
asymptomatic, casual users. (Furthermore, CHA2DS2-Vasc costs 30 seconds of
talking or reading a medical record, not $700 in Apple products.) So you're
inevitably running up against some doubts among physicians [0].

And finally, I agree that more machine learning practitioners should join
medical research. I hope the field works to set more reasonable expectations,
however, as in: ML will solve very specific subtasks in clinical reasoning (as
in the diabetic retinopathy study [1]). Instead, the headlines usually ratchet
that up to "AI will replace radiology/cardiology/$specialty in X years." That
tends to hurt the people currently in the trenches, since their contribution
in bringing about practical, incremental change is diminished. The top answer
of this Quora thread [2] has a good discussion of the many dimensions of the
problem.

[0]:
[https://twitter.com/Abraham_Jacob/status/860119573915287552](https://twitter.com/Abraham_Jacob/status/860119573915287552)

[1]:
[http://jamanetwork.com/journals/jama/fullarticle/2588763](http://jamanetwork.com/journals/jama/fullarticle/2588763)

[2]: [https://www.quora.com/Why-is-machine-learning-not-more-
widel...](https://www.quora.com/Why-is-machine-learning-not-more-widely-used-
for-medical-diagnosis)

~~~
epmaybe
The diabetic retinopathy study (and the somewhat recent Stanford dermatology
study) were the first ML studies I had read about that blew me away in terms
of their sensitivities and specificities, as compared to real doctors. Your
comment on specific subtasks is perfect, and I try and use these examples when
discussing ML with fellow medical students.

However, like you said, the medical field is very slow, and has quite a lot of
inertia to maintain the status quo. Unless insurance companies refuse to
compensate practitioners that don't use these tools, I fear that few, if any,
in the healthcare field will opt to use such techniques.

And finally: How should someone with both a medical and computer science
background get into ML?

~~~
pak
I found the Statistical Learning self paced course on Stanford's site to be a
great formal intro to ML algorithms implemented in R, and it is taught by the
inimitable Hastie and Tibshirani:
[http://statlearning.class.stanford.edu](http://statlearning.class.stanford.edu)

This post on ML in medicine is a pretty good overview of everything that has
been going on recently and the nuances often lost in the current hype:
[https://lukeoakdenrayner.wordpress.com/2016/11/27/do-
compute...](https://lukeoakdenrayner.wordpress.com/2016/11/27/do-computers-
already-outperform-doctors/)

------
peterjlee
For those who are going to mention Bayes' Theorem regarding medical tests,
here's a link to save you a Google search:

[https://en.wikipedia.org/wiki/Bayes%27_theorem#Drug_testing](https://en.wikipedia.org/wiki/Bayes%27_theorem#Drug_testing)

------
chris_va
If I am reading this correctly, there were 6,158 participants with ~200 true
positives, so approximately 3% of the population. 98.04% sensitivity (recall)
and 90.2% specificity (%true negatives) leads to...

~3 false positives for each true positive.

That isn't bad, all things considered, but still a long way to go.
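
For reference, a small sketch that turns those figures into expected confusion-matrix counts, assuming the sensitivity and specificity apply uniformly to the 200 positives and 5,958 negatives in the cohort. The function name `expected_counts` is hypothetical.

```python
def expected_counts(n_total, n_positive, sensitivity, specificity):
    """Expected confusion-matrix cells for a cohort of the given size."""
    n_negative = n_total - n_positive
    tp = sensitivity * n_positive   # true positives
    fn = n_positive - tp            # missed cases
    tn = specificity * n_negative   # true negatives
    fp = n_negative - tn            # false alarms
    return {"tp": tp, "fn": fn, "tn": tn, "fp": fp}


counts = expected_counts(6158, 200, sensitivity=0.9804, specificity=0.902)
fp_per_tp = counts["fp"] / counts["tp"]
print({name: round(value) for name, value in counts.items()})
print(f"false positives per true positive: {fp_per_tp:.2f}")
```

With these inputs the expected counts are roughly 196 true positives against 584 false positives, so about three false alarms per real case.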

------
openasocket
They aren't clear what they mean by "97% accuracy". Does that mean 97% of
people with arrhythmia are correctly diagnosed, or 97% of people are correctly
diagnosed or not-diagnosed? If it's the latter, it's not very helpful at all.
The number of people in the general population with arrhythmia is
significantly less than 3%, so if this Apple Watch test says you have
arrhythmia it is far more likely to be a false positive than a true positive.

~~~
bluGill
No, accuracy doesn't tell us anything about usefulness. If the test is cheap
and it brings in people for a better test and thus saves lives it is useful.
The only real danger is if people decide that, since the watch says there is
nothing wrong, they can ignore other symptoms.

It is something like chest pain: most of the time chest pain is not a symptom
of a heart attack, but it is the best indicator we have, so you go to the emergency
room when you have chest pain. Doctors there can evaluate your situation.

~~~
nraynaud
On a related note, I had a very rare stroke last week; the only symptom was a
headache for 3 days. The doctor sent me for a CT scan without any idea of what
to look for. The only clue: it was not a migraine, because I lacked any
neurological anomaly.

On the other hand, stroke prevalence is low (1.3% for men), strokes generally
don't look like mine, and my kind is 2% of all strokes, so it would have been
completely weird to suspect it with such a low probability and such
non-specific evidence.

Long story short, I almost got sent home with an aspirin while I had an
extensive cerebral venous thrombosis.

~~~
TheOtherHobbes
Having spent a couple of days in a stroke ward over Christmas, the low
prevalence of strokes should _never_ be used as a reason not to test where
possible.

I suspect a lot of people don't understand how serious and crippling -
mentally and physically - a bad stroke can be.

If you're lucky a bad stroke kills you. If you're not lucky you lose a good
part of your brain and motor function.

In practice this means you can be left unable to move some or all of your
limbs, unable to talk, unable to hear, unable to understand what's happening
around you, and perhaps unable to see.

It's _no exaggeration_ to say that it can turn life into a nightmare.

Anything that makes this less common and less likely is a good thing.

------
zobzu
I've arrhythmia and I can see it on my Garmin watch by looking at the data, as
well as .. my phone.

It doesn't help much though, because I don't know if it's good or bad (well,
actually I know but not because of the watch data). Doctors are still needed
for this, and generally that includes a bunch of controlled tests and people
listening to your heart while also gathering data (similarly to the watch
albeit with a more precise apparatus)

I guess it can help to tell people they might wanna see a doctor if they
haven't though.

------
jasonmp85
Probably want to update the title, which still refers to the more general
'arrhythmia'. It sounds like Cardiogram's work is mostly focused on afib?

~~~
brandonb
First validation study is on atrial fibrillation, although we've had users who
have discovered other arrhythmias through the app. Here's one example of a
person who discovered supraventricular tachycardia:
[https://blog.cardiogr.am/my-apple-watch-saved-my-
life-a61256...](https://blog.cardiogr.am/my-apple-watch-saved-my-
life-a6125663b2bb)

------
paul7986
I'm seeing more and more people wearing Apple watches and more similar PR
stories like this.

I saw one story that talked about a guy whose car flipped and he was unable to
reach his phone but thanks to his watch he was able to call for help.

------
dennyabraham
Despite all the caveats around this work, early detection is one of the
reasons HR trackers are a great investment. You can't manage what you can't
monitor

------
caycep
the one question I have is - how much better is the DNN vs. just simple rhythm
analysis, i.e. periodicity, etc.
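
One possible reading of "simple rhythm analysis" is a single irregularity statistic over beat-to-beat intervals. The sketch below is an assumption about what such a baseline could look like, not something Cardiogram has described; the interval values are invented, and the watch may not expose beat-to-beat intervals in this form.

```python
from statistics import mean, pstdev


def rr_irregularity(rr_intervals_ms):
    """Coefficient of variation of beat-to-beat (RR) intervals."""
    return pstdev(rr_intervals_ms) / mean(rr_intervals_ms)


# Invented interval series in milliseconds, for illustration only.
steady = [800, 810, 795, 805, 800, 790]     # fairly regular rhythm
erratic = [620, 980, 710, 1100, 540, 860]   # visibly irregular rhythm

print(f"steady:  {rr_irregularity(steady):.3f}")
print(f"erratic: {rr_irregularity(erratic):.3f}")
```

A baseline like this would give the neural network something concrete to beat; how the two actually compare on real data is exactly the open question.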

------
revelation
Always great to have percentages reported for a sample size <100.

~~~
matt4077
? I don't get that logic... And neither do half of my friends. The other 50%
actually don't get it, either. But I did only ask <100 people.

------
deepsun
Am I the only one who thinks that 97% accuracy (1/30 chance of an error) is
very bad in medical diagnosis?

At least I'd expect something like 99.9% accuracy (1/1000 chance of an error)
when someone gives me my own heart diagnosis.

~~~
brandonb
You'll find most medical algorithms pretty disappointing. :)

For example, the algorithm in implantable cardioverter defibrillators
generates unnecessary shocks in 1 in 6 patients. Its accuracy is getting worse
over time: [http://www.reuters.com/article/us-untimely-jolts-
idUSTRE70O7...](http://www.reuters.com/article/us-untimely-jolts-
idUSTRE70O7IP20110125)

------
threeseed
This is really great news but I wish it was actually available. The Apple
Watch right now is largely useless.

It captures all of these health metrics but then does absolutely nothing with
them. It really is desperate for some actual killer health use cases.

~~~
brandonb
Thanks! You can use the Cardiogram app today to understand your heart rate
data:
[https://itunes.apple.com/us/app/cardiogram/id1000017994?ls=1...](https://itunes.apple.com/us/app/cardiogram/id1000017994?ls=1&mt=8)

We'll be incorporating these results into the app itself over time.

But as with anything in medicine... it's ready, aim, aim, aim, aim, aim...
fire! :)

