Great anecdote from Jeff Dean's Twitter [1] about how they thought to try this experiment, given that doctors didn't know such a thing was possible:
"Funny story. We had a new team member joining, and @lhpeng suggested an orientation project for them of "Why don't you just try predicting age and gender from the images?" to get them familiar with our software setup, thinking age might be accurate within a couple of decades,and gender would be no better than chance. They went away and worked on this and came back with results that were much more accurate than expected, leading to more investigation about what else could be predicted."
Something to keep in mind the next time someone assures you that 'gaydar' is pseudoscience, totally impossible, and that any such paper must be data mining: everything is correlated. We are routinely surprised by what can be extracted from data, and anyone telling you a priori what cannot be done is pushing their politics.
Just want to point out a few things that didn't get covered in the other thread:
- The standard CV risk calculators that we actually use in medicine don't do a tremendous job of discriminating risk. The AUC for the pooled risk equations tends to be around 0.74. As a refresher, AUC can be interpreted as the probability that someone with CVD has a higher risk score than someone without (a sketch making this concrete follows the list).
- This deep learning model had an AUC of about 0.70, similar to the pooled risk equation's roughly 0.72.
- Adding this deep learning model to the pooled risk equation did not change the AUC (0.72 -> 0.72). There may have been reclassification of individual patients, but we can't tell from the paper as published.
- The pooled risk equations in this study did not have the second most important component: the lipid profile. (Age is the most important risk factor.) That absence hamstrings the baseline model compared to the deep learning model. We aren't seeing the standard pooled risk equations here.
- As my PI argues on Twitter, you could see this deep learning model as a very good age predictor.
- The fact that the model is paying "attention" to blood vessels and other structures that are relevant to humans is a promising sign.
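To make that pairwise interpretation concrete, here's a minimal sketch (simulated scores, not data from the paper) showing that the area under the ROC curve equals the probability that a randomly chosen case outscores a randomly chosen control:

```python
# AUC as a pairwise ranking probability; the scores are simulated.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
cases = rng.normal(1.0, 1.0, 500)     # risk scores for people with CVD
controls = rng.normal(0.0, 1.0, 500)  # risk scores for people without

# Fraction of (case, control) pairs where the case's score is higher.
pairwise = (cases[:, None] > controls[None, :]).mean()

# The ROC-curve computation gives the same number.
y_true = np.concatenate([np.ones(500), np.zeros(500)])
scores = np.concatenate([cases, controls])
print(pairwise, roc_auc_score(y_true, scores))  # both ~0.76 here
```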
> paying "attention" to blood vessels and other structures that are relevant to humans
Why is that necessarily a good thing? The point of machine learning is to unveil relationships that we may or may not have thought of. For all we know there's a structure we've never looked at that correlates better with disease, and if a model pays attention to that, it would be an equally promising sign.
I understand that in medical research it would be best to have a physiologically relevant correlate that is being recognized by the model, but it doesn't mean that we should completely disregard the other pixels that a model could find important either.
I agree with you. It's a double-edged sword. It's good because, if you want to put something like this into clinical use, physicians are going to be skeptical if it's not interpretable. It's bad because, if humans have already identified the relevant structural relationships, there is less likely to be a big breakthrough. Nevertheless, one could imagine a model identifying features that humans wouldn't consider even within the structures we already know about.
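For what it's worth, checking what a model "looks at" can be done with plain gradient-based saliency. A minimal PyTorch sketch, where `model` and the image tensor are placeholders, and this is vanilla gradient saliency rather than the learned soft-attention mechanism the paper describes:

```python
# Minimal gradient-saliency sketch; `model` is a placeholder for a trained
# network mapping a fundus image to a scalar risk score.
import torch

def saliency_map(model: torch.nn.Module, image: torch.Tensor) -> torch.Tensor:
    """Per-pixel |d(score)/d(pixel)| for one (C, H, W) image tensor."""
    model.eval()
    x = image.unsqueeze(0).requires_grad_(True)  # add batch dim, track grads
    score = model(x).squeeze()                   # scalar risk prediction
    score.backward()                             # gradients w.r.t. the input
    return x.grad.detach().abs().max(dim=1).values.squeeze(0)  # (H, W)

# Bright pixels are the ones whose perturbation most changes the prediction;
# if those trace the vasculature, the model is "attending" to vessels.
```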
People who still smoke nowadays aren't even deterred by the very graphic images of internal organs affected by smoking-related diseases that are printed on cigarette packs in many countries (next to very clear wording that smoking will harm you and people around you). So I doubt that this would make much of a difference in that regard.
Probably not. They already know the outcomes and don't seem to care.
However, 71% seems like a very low number... Does that mean that in 7 out of 10 cases the system correctly guesses whether the patient is a smoker or not? That's not really impressive, IMHO. What am I missing?
I seriously doubt that, just reflecting on how smokers so often remain unswayed even by far more brutal prognostications about the habit's effect on their longevity.
It feels like we're approaching the time where a home diagnostic kit could be economically employed to check health status and screen for a host of diseases. You'd receive sample containers at home for blood, saliva, urine, breath. Premium package includes cheap stethoscope, retinoscope, EEG attachments for your smartphone. Samples are couriered to the test lab and the data analysed by AI software with the results available online both to you and your GP if desired.
I would be interested to know to what extent the prediction of cardiovascular risk goes beyond a linear combination of the other factors such as hypertension, diabetes, and age. That is, if we first predict those variables with the network and then run a linear regression on them, is the result less accurate than the end-to-end model? That would give some indication of whether the retinal imaging carries additional independent information; a rough sketch of the comparison follows.
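Something like this, with placeholder file names and a plain logistic regression standing in for the linear model (none of these arrays come from the paper):

```python
# Does a linear model on the network's predicted risk factors match the
# end-to-end image model? All file names below are hypothetical placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X = np.load("predicted_factors.npy")   # (n_patients, n_factors): age, SBP, ...
y = np.load("cv_event_labels.npy")     # (n_patients,): 5-year major CV event
p = np.load("image_model_risk.npy")    # (n_patients,): end-to-end risk score

X_tr, X_te, y_tr, y_te, _, p_te = train_test_split(X, y, p, test_size=0.3,
                                                   random_state=0)

lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
auc_linear = roc_auc_score(y_te, lr.predict_proba(X_te)[:, 1])
auc_image = roc_auc_score(y_te, p_te)

# A gap in favor of the image model would suggest independent information
# in the retina beyond a linear combination of the predicted factors.
print(f"linear on factors: {auc_linear:.3f}  end-to-end image: {auc_image:.3f}")
```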
> Given the retinal image of one patient who (up to 5 years) later experienced a major CV event (such as a heart attack) and the image of another patient who did not, our algorithm could pick out the patient who had the CV event 70% of the time.
To put that in perspective, in this study, a random choice would be right 50% of the time.
So this is interesting, but would it be a useful diagnostic tool?
The major goal here would be to screen for people who should go on to a more advanced test, like an MRI. But even for screening, an AUC of 0.70 falls well short of the 90-95% range you'd want in a predictor; a back-of-envelope illustration follows.
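To put numbers on that gap: under a simple binormal assumption (Gaussian scores with equal variances in both groups; my simplification, not anything reported in the paper), an AUC fixes the sensitivity you can achieve at any given specificity:

```python
# Sensitivity at fixed specificity implied by an AUC, assuming a binormal
# score model with equal variances (illustrative only).
from scipy.stats import norm

def sensitivity_at_specificity(auc: float, specificity: float) -> float:
    d = 2 ** 0.5 * norm.ppf(auc)       # class separation implied by the AUC
    t = norm.ppf(specificity)          # cutoff on the control distribution
    return float(norm.cdf(d - t))

for auc in (0.70, 0.90, 0.95):
    print(auc, round(sensitivity_at_specificity(auc, 0.90), 2))
# At 90% specificity: ~0.29 sensitivity for AUC 0.70, ~0.70 for 0.90,
# ~0.85 for 0.95, so a 0.70 screener would miss most true cases here.
```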
"Funny story. We had a new team member joining, and @lhpeng suggested an orientation project for them of "Why don't you just try predicting age and gender from the images?" to get them familiar with our software setup, thinking age might be accurate within a couple of decades,and gender would be no better than chance. They went away and worked on this and came back with results that were much more accurate than expected, leading to more investigation about what else could be predicted."
[1]: https://twitter.com/JeffDean/status/965720435290791936