
DeepVariant: Highly Accurate Genomes with Deep Neural Networks - tsaprailis
https://research.googleblog.com/2017/12/deepvariant-highly-accurate-genomes.html
======
inciampati
I implemented a similar model based around the amazing out-of-core linear
learner Vowpal Wabbit. It did pretty well in the precisionFDA challenges
despite being developed in two person-months. It has the benefit of using
vastly less compute to train than something like DeepVariant.
([https://github.com/ekg/hhga](https://github.com/ekg/hhga))
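A linear learner of this kind is small at its core. Below is a minimal sketch of the general idea, not hhga's actual code or feature encoding and not Vowpal Wabbit itself: an online logistic-regression model that uses the hashing trick to keep memory fixed and sees each training example once, so the training set never has to fit in memory. The feature names (`alt_fraction`, `bias`), the 18-bit table size (VW's default for its `-b` option), and the learning rate are hypothetical choices for illustration; VW's real update rule is more sophisticated than plain SGD.

```python
import math
import zlib
from typing import Dict, Iterable, Iterator, Tuple

NUM_BITS = 18  # hypothetical: 2**18 weight slots, in the spirit of VW's -b option
MASK = (1 << NUM_BITS) - 1

Example = Tuple[Dict[str, float], int]  # (named features, label in {0, 1})


def slot(name: str) -> int:
    """Hashing trick: map a feature name to a fixed-size weight index."""
    return zlib.crc32(name.encode("utf-8")) & MASK


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineLogistic:
    """Linear model trained one example at a time; memory is bounded by the
    number of weight slots, not by the size of the training set."""

    def __init__(self, learning_rate: float = 0.5) -> None:
        self.lr = learning_rate
        self.w: Dict[int, float] = {}  # sparse: only slots that were updated

    def predict(self, features: Dict[str, float]) -> float:
        z = sum(self.w.get(slot(k), 0.0) * v for k, v in features.items())
        return sigmoid(z)

    def learn(self, features: Dict[str, float], label: int) -> None:
        # One SGD step on the logistic loss for this single example.
        grad = self.predict(features) - label
        for k, v in features.items():
            i = slot(k)
            self.w[i] = self.w.get(i, 0.0) - self.lr * grad * v


def train(model: OnlineLogistic, stream: Iterable[Example]) -> None:
    # Single pass over a stream that never has to be held in memory.
    for features, label in stream:
        model.learn(features, label)


def toy_stream(n: int) -> Iterator[Example]:
    # Hypothetical features: fraction of reads supporting the alt allele, plus a bias.
    for _ in range(n):
        yield {"alt_fraction": 0.5, "bias": 1.0}, 1    # looks like a real het variant
        yield {"alt_fraction": 0.02, "bias": 1.0}, 0   # looks like sequencing noise


model = OnlineLogistic()
train(model, toy_stream(2000))
p_variant = model.predict({"alt_fraction": 0.5, "bias": 1.0})
p_noise = model.predict({"alt_fraction": 0.02, "bias": 1.0})
print(f"{p_variant:.3f} {p_noise:.3f}")
```

Because the stream is consumed lazily, the same loop works whether the examples come from a list or are synthesized on the fly from alignments.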

The approach is the right one for small genetic variants. But it will be hard
to handle more complex kinds of variation without also adapting the alignments
that the training examples are synthesized from.

I think the field should cool it on calling the results of something like
DeepVariant "genomes". These are genotypes, not fully sequenced and
reconstructed genomes. The evaluations are typically on easy regions, and we
have no reason to believe that those are the only ones that are important. One
important tool to dig into this is syndip, a synthetic diploid where the full
haplotypes are known. It is a mixture of two haploid human genomes that were
sequenced with PacBio technology and assembled de novo.
([https://www.biorxiv.org/content/early/2017/11/22/223297](https://www.biorxiv.org/content/early/2017/11/22/223297)).
For the curious, these haploid human genomes only exist in molar pregnancies,
so even this isn't ideal, but it is maybe the best resource we have at present.
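To make the "easy regions" point concrete, here is a minimal sketch of how a call set is scored against a truth set such as syndip, and how restricting the evaluation to a subset of the genome changes the numbers. It assumes variants are plain `(chrom, pos, ref, alt)` tuples matched exactly and regions are half-open `(chrom, start, end)` intervals; the toy variants and the "easy" region are hypothetical. Real benchmarking tools such as hap.py or RTG vcfeval compare haplotypes rather than exact records, because the same variant can be written in several ways.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

Variant = Tuple[str, int, str, str]   # (chrom, pos, ref, alt)
Region = Tuple[str, int, int]         # (chrom, start, end), half-open [start, end)


def in_regions(v: Variant, regions: Iterable[Region]) -> bool:
    chrom, pos = v[0], v[1]
    return any(c == chrom and start <= pos < end for c, start, end in regions)


def evaluate(calls: Set[Variant], truth: Set[Variant],
             regions: Optional[Iterable[Region]] = None) -> Dict[str, float]:
    """Precision/recall/F1 of calls vs truth, optionally only inside regions."""
    if regions is not None:
        regions = list(regions)  # allow the regions to be scanned repeatedly
        calls = {v for v in calls if in_regions(v, regions)}
        truth = {v for v in truth if in_regions(v, regions)}
    tp = len(calls & truth)  # exact record matches only
    fp = len(calls - truth)
    fn = len(truth - calls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy data: the two hard sites lie outside the "easy" region.
truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
         ("chr1", 5000, "G", "GA"), ("chr1", 6000, "T", "C")}
calls = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
         ("chr1", 5000, "G", "GAA")}
easy = [("chr1", 1, 1000)]

everywhere = evaluate(calls, truth)
easy_only = evaluate(calls, truth, easy)
print(everywhere)
print(easy_only)
```

On this toy input the caller looks perfect inside the easy region and noticeably worse genome-wide, which is exactly why the choice of evaluation regions matters.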

------
dcdanko
The figures in this paper use pretty deceptive scales. To be clear,
DeepVariant is 0.5% better than a tool built in ~2010 (GATK), on DeepVariant's
best test.
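"0.5% better" can be read in more than one way, and which reading a chart emphasizes depends on what its axis plots. A quick sketch with hypothetical accuracies (99.0% vs 99.5%, not figures taken from the DeepVariant paper) shows the three usual quantities side by side.

```python
from typing import Dict


def compare(acc_old: float, acc_new: float) -> Dict[str, float]:
    """Compare two accuracies given as fractions in [0, 1]."""
    err_old, err_new = 1.0 - acc_old, 1.0 - acc_new
    return {
        "points": 100 * (acc_new - acc_old),               # absolute gain, percentage points
        "relative_gain": (acc_new - acc_old) / acc_old,    # gain relative to the old accuracy
        "error_reduction": (err_old - err_new) / err_old,  # fraction of old errors removed
    }


# Hypothetical accuracies for illustration only.
result = compare(0.990, 0.995)
print(result)
```

Here the same change is 0.5 percentage points, about 0.5% relative to the old accuracy, and a 50% cut in errors. A y-axis that starts near 99% makes the first look like the last.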

GATK is still the standard, not because better variant callers don't exist,
but because it's more important that everyone uses the same tool for
comparisons between studies.

~~~
alexlikeits1999
That first paragraph is pretty deceptive. They are not comparing against the
results from GATK 1.0.

~~~
jghn
I didn't see them mention which version they were using, presumably GATK3. I'm
curious to see what it'd look like against GATK4 which is being released in a
month.

------
j7ake
Very nice, but do you think neural networks will also be able to interpret the
function of these genotypes?

~~~
nl
Yes. LIME works really well for this type of problem.

~~~
danblick
What is LIME?

~~~
nl
LIME: Local Interpretable Model-Agnostic Explanations;
[https://www.oreilly.com/learning/introduction-to-local-interpretable-model-agnostic-explanations-lime](https://www.oreilly.com/learning/introduction-to-local-interpretable-model-agnostic-explanations-lime)

"“Why Should I Trust You?” Explaining the Predictions of Any Classifier":
[https://arxiv.org/pdf/1602.04938.pdf](https://arxiv.org/pdf/1602.04938.pdf)

[https://homes.cs.washington.edu/~marcotcr/blog/lime/](https://homes.cs.washington.edu/~marcotcr/blog/lime/)

[https://github.com/marcotcr/lime](https://github.com/marcotcr/lime)
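The core idea behind LIME fits in a few lines: perturb the input around the instance being explained, query the black-box model on those perturbations, weight each sample by how close it is to the original, and fit a simple weighted linear model whose coefficients serve as the local explanation. The sketch below is a from-scratch illustration of that idea under stated assumptions, not the `lime` package's API: perturbations switch features between the instance's value and a hypothetical baseline, the distance is the fraction of features switched off, and the kernel width and ridge penalty are arbitrary choices. The real library does more, for example sampling from training-data statistics for tabular data.

```python
import math
import random
from typing import Callable, List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense linear system by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivoting
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def lime_explain(predict: Callable[[List[float]], float],
                 x: Sequence[float], baseline: Sequence[float],
                 num_samples: int = 500, kernel_width: float = 0.75,
                 ridge: float = 1e-6, seed: int = 0) -> List[float]:
    """Return one local importance weight per feature of x."""
    rng = random.Random(seed)
    d = len(x)
    rows, ys, ws = [], [], []
    for i in range(num_samples):
        # Interpretable representation: 1 = keep x's value, 0 = use the baseline.
        mask = [1] * d if i == 0 else [rng.randint(0, 1) for _ in range(d)]
        z = [xi if keep else bi for xi, bi, keep in zip(x, baseline, mask)]
        dist = (d - sum(mask)) / d                            # fraction switched off
        ws.append(math.exp(-dist ** 2 / kernel_width ** 2))  # proximity kernel
        rows.append([1.0] + [float(keep) for keep in mask])  # intercept + mask
        ys.append(predict(z))                                 # query the black box
    # Weighted ridge regression: (R^T W R + ridge * I) beta = R^T W y
    p = d + 1
    a = [[sum(w * r[i] * r[j] for r, w in zip(rows, ws)) for j in range(p)]
         for i in range(p)]
    for i in range(1, p):  # leave the intercept unpenalized
        a[i][i] += ridge
    b = [sum(w * r[i] * y for r, w, y in zip(rows, ws, ys)) for i in range(p)]
    return solve(a, b)[1:]  # drop the intercept


def black_box(z: List[float]) -> float:
    # Stand-in for a trained network; secretly linear here so the answer is known.
    return 3.0 * z[0] - 2.0 * z[1] + 0.0 * z[2]


weights = lime_explain(black_box, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
print([round(w, 3) for w in weights])
```

With a genuinely nonlinear model the recovered weights only describe behavior near `x`, which is the "local" in LIME.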

Anytime anyone makes snide HN comments like "oh you can't understand why
neural networks make predictions", the correct response should always be "why
doesn't LIME work in your specific case?"

LIME is being used within the EU to explain credit decisions and fraud
detection flagging on neural-network-based models, which is quite a high
regulatory bar to pass.

~~~
danblick
Hadn't heard of that before, thanks!

In this case, I understood the question to be "will deep learning do a good
job predicting the function of, or the phenotype emerging from, individual
SNPs?", and I don't think model interpretation would help (for starters, the
model is trained to call genotypes from read data and doesn't deal with data
related to phenotypes).

~~~
nl
I believe that comment was edited.

Of course the NN won't interpret the results; it will just provide better
results.

