Deep Patient Predicts Patients Future from Health Records (nature.com)
144 points by jacquesm 10 months ago | 29 comments

This was a collaboratively written review paper, published on GitHub with submissions encouraged in the form of pull requests, with co-author credit given to anyone whose contributions met the ICMJE standards of authorship. Very cool.

I'm a bit disappointed as I only found out about this a month or so ago. I would have loved to contribute to it, if only because of how the paper was written. Just fantastic.

I found this comment at least as interesting as the paper, but I can't find the GitHub repo for it. Do you have a link?

Oops, it's a different paper. When I was googling Deep Patient a few months ago I must have got confused because they mention the Deep Patient paper in their issue tracker:


Same here, especially since I am working on the same problem and in fact also plan to try out denoising autoencoders to reduce the sparse EMR data into more learnable features. I am also using NLP on the medical notes to generate additional features. It is good to see promising results from these approaches. I will read this paper closely tomorrow but from what I have skimmed it looks well written.
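For anyone who wants to see the idea concretely, here is a minimal sketch of a denoising autoencoder in plain NumPy. It is not the paper's architecture or anyone's production code: it assumes a single hidden layer, masking noise, and a binary patient-by-feature matrix, and every name in it (`train_denoising_autoencoder`, `encode`, the hyperparameter values) is made up for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def reconstruction_loss(X, params):
    """Cross-entropy between clean X and its reconstruction, averaged over patients."""
    W1, b1, W2, b2 = params
    X_hat = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
    eps = 1e-9  # avoids log(0)
    ce = X * np.log(X_hat + eps) + (1 - X) * np.log(1 - X_hat + eps)
    return -np.sum(ce) / X.shape[0]


def train_denoising_autoencoder(X, n_hidden=8, corruption=0.2,
                                lr=0.5, epochs=200, seed=0):
    """Train a one-hidden-layer denoising autoencoder with masking noise.

    All hyperparameter defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    W1 = rng.normal(0.0, 0.1, (n_features, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, n_features))
    b2 = np.zeros(n_features)

    for _ in range(epochs):
        # Corrupt the input: randomly zero out a fraction of the entries.
        keep = rng.random(X.shape) >= corruption
        X_noisy = X * keep

        # Forward pass on the corrupted input.
        H = sigmoid(X_noisy @ W1 + b1)
        X_hat = sigmoid(H @ W2 + b2)

        # Backward pass: the target is the CLEAN input, not the noisy one.
        dZ2 = (X_hat - X) / n_samples
        dW2 = H.T @ dZ2
        db2 = dZ2.sum(axis=0)
        dZ1 = (dZ2 @ W2.T) * H * (1 - H)
        dW1 = X_noisy.T @ dZ1
        db1 = dZ1.sum(axis=0)

        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2

    return W1, b1, W2, b2


def encode(X, params):
    """Map (uncorrupted) records to the learned dense features."""
    W1, b1, _, _ = params
    return sigmoid(X @ W1 + b1)


# Toy stand-in for sparse EMR data: 200 patients, 30 binary features.
rng = np.random.default_rng(1)
X = (rng.random((200, 30)) < 0.1).astype(float)
params = train_denoising_autoencoder(X)
features = encode(X, params)  # dense 200 x 8 matrix for a downstream model
```

The point of the corruption step is that the network has to reconstruct entries it never saw, so it cannot just copy its input. The `features` matrix is what you would hand to a downstream classifier.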

How do we know if this is really predicting vs. identifying undiagnosed diseases?

For each patient we considered only the prediction of novel diseases, discarding the re-diagnosis of a disease
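As a rough sketch of what that filtering step could look like (my reading of the quoted sentence, not the authors' code; the function name and the plain-string diagnosis labels are hypothetical):

```python
def novel_diagnoses(history, future):
    """Return diagnoses in `future` that never appear in `history`.

    Order of first appearance is preserved and repeats are dropped.
    """
    seen = set(history)
    novel = []
    for code in future:
        if code not in seen:
            novel.append(code)
            seen.add(code)
    return novel


# Only the diagnosis that is new to this patient would count as a target.
targets = novel_diagnoses(
    history=["diabetes", "hypertension"],
    future=["diabetes", "sepsis", "sepsis"],
)
```

Note that this only removes re-diagnoses that were already recorded. A disease that was present but never coded would still pass through as "novel".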

It seems to me you need a longer time series (patients with 3, 4, or 5 years of medical history) to be able to predict a future disease state. And I don't believe I saw that in the article (I admit I read through it quickly).

If it turns out this is just identifying undiagnosed diseases, I wonder if this research will have any legal ramifications.

The paper fails to discuss the fact that the data for a given patient is biased by the preliminary (likely unrecorded) diagnoses of the doctor. The doctors are trained to systematically order certain sets of tests depending on their preliminary diagnoses. If some doctors have a pretty good idea of the final diagnosis, the deep learning classifier is simply learning what doctors do when they suspect a particular disease. It is not learning to predict a disease. A truer test of the deep learning method would also require that the deep learning algorithm orders the tests, or that we randomly select tests for a patient -- both of these possibilities are highly unethical.

I think calling it prediction is a bit of a strong claim. It would probably be more accurate to say that Deep Patient comes to a diagnosis before the physician does, rather than necessarily predicting the future disease state. To properly claim prediction of the future disease state you would need to show that the patient does not have the disease at the time of the prediction.

From what I have skimmed they do not do this in the paper, although to be fair this would probably not be practical because it would require more tests to have been done in the past which in many cases were not done, e.g. imaging to confirm absence of a cancer at the time of prediction.

That said, I have worked on this same problem and also lazily call it prediction. Also, there is still immense value in earlier diagnosis in cases where early treatment leads to better outcomes, like in sepsis or cancer.

I rather think the incompleteness and inaccuracy of the data makes having more of it less valuable than it might seem at first glance. Medical history information is a mess (and that's being kind) in general.

Why would there be legal ramifications?

I'd say (at least) three ways:

1) Docs using it to double check themselves before diagnosing patients, thereby, hopefully, limiting their liability.

2) Patients getting second opinions from Deep Patient and suing the previous doc for not catching a disease.

3) A patient gets diagnosed at a very late stage of an illness, and the patient and family think earlier docs should've caught the disease. They hire a lawyer, the lawyer runs a Deep Patient simulation, Deep Patient identifies the illness, and that identification is used against the earlier docs. I guess kinda like 2.

Does your point 2 exist today with human doctors? Can you sue after getting a second opinion?

And couldn't it also make a wrong diagnosis? Who do you sue in that case?

How would you know if it made a wrong prediction? It would take a lot of sampling and time to show it.

This is quite interesting. The obvious business case is for missed-diagnoses reimbursement. U.S. hospital reimbursement is based on a monstrous coding system (e.g. ICD-10). A subset of these codes are Hierarchical Condition Categories (HCCs). Hospitals can bill based on the constellation of HCCs that apply to a patient (e.g. morbid obesity, rheumatoid arthritis, COPD, etc.). Their system could identify all HCCs for each patient and submit missed HCCs for reimbursement.

Edit: This is not a new idea, there are lots of companies already doing this using natural language processing.

I am working for one. We are called Apixio, and we are hiring if anybody is interested.

We have access to one of the coolest data sets ever, the health records of millions of patients, and we too are aiming to predict conditions (among other things) and save lives.

Our corp website is crap, but we are doing interesting things out there.

I can't wait to receive a letter in the mail saying, "If you don't start eating more grapes, you'll develop testicular cancer in six years."

I'll be happy to send it if that's what it takes to get you to eat more grapes.

Grapes are high sugar fruit. There are others you might want to eat more of, but limit the sweet stuff.

There's a great new slogan for the grape industry in there somewhere. . .

It's an interesting result, but it's important to note there are likely further developments in this field of study (the paper was published in May 2016). Because the paper has been out for a while, there are some really good Reddit threads [0][1]!

[0], (2016): https://www.reddit.com/r/MachineLearning/comments/4jtfgh/dee...

[1], (3 days ago): https://www.reddit.com/r/MachineLearning/comments/65aoos/p_d...


It seems that you need more, and more complete information to do deep learning on records, but it also seems this is a first small step into centralizing and using health records for public health.

It would be interesting if people were already starting to gather as much data about themselves as possible, data that will be relevant in a few years when similar tech is widely available. For example, taking a photo of each of your meals might prove to be a good investment of your time down the line.

We need something like this but with natural diet recommendations for how to stave off diseases in each individual case. So instead of guessing, people can be more informed about what to eat.

Having said that, how do you train it when you can't run billions of trials internally like you can with chess?

I like the paper's emphasis on pre-processing the data. Once the data is properly organized, I have no doubt that a variety of ML algorithms can be applied to reveal different insights.

We all know what this will be used for in Trump's America -- to disqualify people from health care coverage and even employment.

Ok, perhaps this is needlessly inflammatory, but this comment actually highlights an important issue. Be careful what you wish for. These approaches require data sharing and centralisation to work well. It could be used to sell data centralisation to health care providers, to get access to advanced algorithms.

But then the government starts using those algorithms to ration healthcare. Not contributing your data means you don't get anything. There won't be any going back then.

Interesting research, but terrible name for the project.

So this is patient2vec?

predicts but refuses to explain anything
