
Scalable and accurate deep learning for electronic health records - pagade
https://arxiv.org/abs/1801.07860
======
dqpb
As one of the apparently few remaining people on HN who like Google and are
excited about the technology they're producing, I applaud this effort. I've
had multiple doctors in my life who were worse for my health than seeing no
doctor at all. Two of them outright looked up my symptoms (in a book in one
case, and some shitty intranet in the other) to tell me what I already knew
from googling it myself. We can do better than that. This is the way forward.

~~~
throwaway2048
Until you realize it's going to be entities like insurance companies using this
data to deny coverage, and other uses aimed solely at profiting off the people
it's harvested from.

How much benefit does the average person get from being profiled like crazy
across the web? Besides the (dubious) benefit of free services, there isn't
much positive you can point to from such activity, and a shitload of really
scummy, exploitative crap.

I don't see how such medical analysis is going to pan out much differently at
scale.

~~~
Numberwang
Here in socialized Europe it will be very beneficial. Our health system
focuses on solving problems, not selling us stuff. So this will be another
great tool assisting with making us aware of what options we have in improving
health outcomes.

~~~
throwaway2048
Plenty of private, for-profit providers in Europe too.

~~~
PeterisP
Yes, but since the mainstay of the _financing_ isn't really based on private
insurance coverage, the main objection doesn't apply. Really, denying
insurance is pretty much the only legal-but-bad use case for such information.
All other legitimate uses are good for the patient, balanced by an increased
risk of crime and a loss of confidentiality that may be embarrassing but isn't
profitable for anyone (well, except in a blackmail case).

~~~
throwaway2048
"all other legitimate uses are good" is a tautology

~~~
PeterisP
No, saying that they're good _for the patient_ is not a tautology, as there
could certainly exist legitimate uses that are good for someone else at the
patient's expense.

------
Thriptic
I have very little hope for this succeeding, although I hope it does.

I say this because this problem is vastly more complicated than people believe
for the following reasons:

* A substantial amount of data about patients is encapsulated in clinical notes which are free text. There is no standard format for notes; notes wildly vary in terms of quality, detail, and accuracy; and notes are full of non-standard abbreviations and clinical shorthand.

* A large amount of the data in an EHR is incomplete or incorrect. Information may be out of date or recorded incorrectly. Patient medication lists are a great example of this; physicians frequently cannot trust them even during the course of a single hospital stay (medications are started or discontinued without being properly recorded). Then there is the issue of non-adherence: patients frequently claim to be adherent to treatment regimens that they are not actually following, which distorts outcome data. In fact, the majority of people are non-adherent to many different treatment regimens. More generally, diagnoses may have been incorrectly determined, and data may be incorrectly labeled or missing.

* Data is scattered even within a single EHR, let alone across facilities. I myself have been to at least 5-10 hospitals and more outpatient clinics over my life, none of which share data with each other. No one entity, not insurers, not specific care teams, not the hospital systems this data came from, has all the relevant data about a patient.

* Data placed into an EHR may be intentionally falsified for simplicity or to get insurance coverage. It is not uncommon for physicians to intentionally misdiagnose a patient so that insurance will cover a diagnostic test, treatment, or procedure, or to input a less specific diagnosis code to save time when a more specific one is available.

Trying to train with data sets this fucked up is an enormous challenge.
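
To make the free-text problem concrete, here's a toy sketch (every abbreviation and expansion below is made up for illustration, not a real clinical vocabulary): even trivial shorthand is ambiguous without context.

```python
# Toy illustration, not a real clinical NLP pipeline: a naive
# dictionary-based expander hits ambiguity immediately, because the
# same shorthand maps to several unrelated expansions.
ABBREVIATIONS = {
    "pt": ["patient", "physical therapy", "prothrombin time"],
    "ms": ["multiple sclerosis", "mitral stenosis", "morphine sulfate"],
    "sob": ["shortness of breath"],
}

def expand(token):
    """Return every known expansion; a real system must disambiguate."""
    return ABBREVIATIONS.get(token.lower(), [token])

note = "Pt reports SOB, hx of MS"
for word in note.replace(",", "").split():
    print(word, "->", expand(word))
```

Real notes stack misspellings, negation ("denies sob"), and copy-pasted templates on top of this, which is why the free text is the hard part.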

~~~
codefined
I feel like Google has the resources to, at the very least, mine the free-text
data for information. Possibly they even have enough sway to nudge doctors and
other individuals toward a format that is far less random, or at least to
provide a platform for sharing information between patients. Note that NLP
seems to have come a long way in five years; who knows where we'll be in a
decade or more.

In terms of incorrect records, I'm not sure this would be a major problem over
a large dataset. I'm speculating here, but I imagine errors are fairly random
across all fields, and with enough data you can still identify that one
approach saves lives more often than another, even when some diagnoses are
incorrect.
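
The intuition that random errors wash out can be checked with a quick simulation (the success rates and flip probability here are invented for illustration): under symmetric label noise, the better treatment still looks better once you have enough records.

```python
import random

random.seed(0)

def observed_success_rate(true_rate, flip_prob, n):
    """Simulate n recorded outcomes where each label is randomly
    flipped with probability flip_prob (a recording error)."""
    hits = 0
    for _ in range(n):
        outcome = random.random() < true_rate
        if random.random() < flip_prob:  # symmetric, random noise
            outcome = not outcome
        hits += outcome
    return hits / n

a = observed_success_rate(0.60, 0.10, 100_000)  # truly better treatment
b = observed_success_rate(0.50, 0.10, 100_000)  # truly worse treatment
print(a, b, a > b)  # noise shrinks the gap (0.60 -> ~0.58) but keeps the ranking
```

The catch, per the parent comment, is that EHR errors are often systematic rather than random (e.g. miscoding to satisfy insurers), and systematic noise does not average out this way.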

Although I totally agree with you that this is going to be an enormous
challenge for Google, I feel like it would be underestimating them to assume
this has very little hope of succeeding.

~~~
Thriptic
I agree that if Google throws enough money and talent at the problem then they
can certainly make improvements, but I seriously doubt that they are going to
create some sort of revolutionary clinical decision support system which is
going to massively improve outcomes.

------
dang
This is a case where discussion will likely be better if HN points to the
paper rather than an information-poor popular treatment.

We changed the URL from [https://qz.com/1189730/google-is-using-46-billion-data-point...](https://qz.com/1189730/google-is-using-46-billion-data-points-to-predict-the-medical-outcomes-of-hospital-patients/).

------
jijji
If this is for personalized recommendations about the Diagnosis or Plan for a
particular patient, based on that patient's past medical record history, it
might be something that a hospital would want integrated into their current
environment.

If this, however, is something that takes multiple patients' records into
consideration, it would most likely never see the light of day. Most
hospitals don't want anyone knowing that, for instance, their city has a
higher rate of leukemia-related deaths per capita than any other city in
the US.

~~~
ci5er
Doesn't the CDC know that anyway?

------
tremendulo
Electronic bureaucracy has been developed to replace 1 or 2 brains plus paper
records. However, put simply, it doesn't work.

(e.g. I had to give my address _three times_ to the same outfit last month
just to get a flu jab.)

But we can't go back to paper records so we'll try building a giant brain (AI)
to try to decipher and manage millions of electronic records. Trouble is that
that brain still won't have _written_ the records or _talked_ with the people
they're about. So I doubt it will work.

------
siculars
The veracity of any analysis is directly correlated with the quality of the
input. I have no doubt that Google's methods are state of the art. My concern
is with the input those methods were fed. The sample size needs to be much,
much larger on a patient-population basis.

------
tzm
> Google obtained de-identified data of 216,221 adults...

How much did this data cost?

~~~
kprybol
Having been privy to some of the contractual details of deals that Google has
made with other medical centers, I’m betting that they probably got it for
“free”, as in they didn’t directly pay a set fee to the universities. Google
most likely provided funding in the form of donations (tax write off), free
cloud compute resources and/or cloud storage (write off), and the opportunity
for university researchers to co-author high impact publications (everybody
wins).

Also, 200k patients is actually kind of small. Granted, this dataset is far
more granular/robust than what you’d typically find in commercially available
healthcare datasets, but to give you some frame of reference, the healthcare
datasets I work with contain > 20 million individuals (albeit with orders of
magnitude fewer features).

------
fwdpropaganda
How did they get this data?

~~~
oh_sigh
From the third sentence in the article:

> To conduct the study, Google obtained de-identified data of 216,221 adults,
> with more than 46 billion data points between them. The data span 11
> combined years at two hospitals, University of California San Francisco
> Medical Center (from 2012-2016) and University of Chicago Medicine
> (2009-2016).

I know I'm not supposed to say 'Did you read the article?', so I won't say
that.

~~~
mtgx
That de-identification is probably the bare minimum they have to do to
achieve compliance, but don't mistake that for the data actually being
anonymous. Many studies have shown that 90%+ of people can be re-identified
from such datasets.

Unless Google starts using fully homomorphic encryption or something similar,
you should consider that data non-anonymous. And until that happens, Google
and national healthcare agencies should also require consent from the
patients before giving the data to Google or any other company for such
studies. Otherwise, I hope class-action lawsuits are brought against both.
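
The re-identification risk described above is usually a linkage attack: quasi-identifiers that survive de-identification get joined against some public dataset. A minimal sketch, with entirely fabricated records and field names:

```python
# Minimal linkage-attack sketch; all records below are fabricated.
# If the quasi-identifiers left in a "de-identified" row are unique,
# a join against any public dataset names the person.
QUASI_IDENTIFIERS = ("birth_year", "zip3", "sex")

deidentified = [
    {"birth_year": 1971, "zip3": "606", "sex": "F", "diagnosis": "asthma"},
    {"birth_year": 1985, "zip3": "941", "sex": "M", "diagnosis": "gout"},
]
public = [  # e.g. a voter roll or social-media profile dump
    {"name": "Alice", "birth_year": 1971, "zip3": "606", "sex": "F"},
    {"name": "Bob", "birth_year": 1990, "zip3": "606", "sex": "M"},
]

def reidentify(record, public_rows):
    """Return the unique name whose quasi-identifiers match the record."""
    matches = [row for row in public_rows
               if all(row[k] == record[k] for k in QUASI_IDENTIFIERS)]
    return matches[0]["name"] if len(matches) == 1 else None

print(reidentify(deidentified[0], public))  # unique match: Alice
print(reidentify(deidentified[1], public))  # no match: None
```

Defenses like k-anonymity coarsen the quasi-identifiers until no row is unique, at the cost of analytic precision.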

~~~
lalaland1125
Homomorphic encryption doesn't currently exist in any practically useful
form, so you are asking for the impossible. If we followed your advice,
medical-record research would be impossible. All research comes with risks.
Society is all about balancing those risks against their potential benefits.

Come back with your criticisms when you have actual evidence of Google
mishandling data or misleading the IRBs.

~~~
jnbiche
> All research comes with risks.

And the core foundation of medical ethics is that the patient gets to decide
if/how much of that risk to take.

