
Geneticists pan paper that claims to predict a person's face from their DNA - biridir
http://www.nature.com/news/geneticists-pan-paper-that-claims-to-predict-a-person-s-face-from-their-dna-1.22580
======
Real_S
Interesting conflict between these researchers.

From Erlich [0]:

"The take home message should be that identifying someone in a group of ten
people requires very little effort. Anyone with access to even low dimensional
data, such as basic demographic, can do that. This is not very surprising."

"To summarize this point, the title says: “Identification of individuals by
trait prediction using whole-genome sequencing data” but most of the trait
predictions is carried by ethnicity of the individual (genomic PCs) rather
than the trait specific SNPs."

Summarizing: DNA can estimate ethnicity, which can be used to predict a range
of traits, including facial structure. These results would be far more
impressive if they were able to predict faces in data composed of a single
ethnicity. HLI may have overstated their results, but nevertheless they raise
important points about re-id.

[0]
[http://www.biorxiv.org/content/early/2017/09/07/185330.1](http://www.biorxiv.org/content/early/2017/09/07/185330.1)

------
carbocation
The fact that identical twins look similar suggests that most of the variation
in facial identity is going to be driven by a heritable element. If you could
fully execute the program encoded by a person's DNA, I'd expect to see a
reproduction of their face.

However, that depends on correctly mapping the program to its output. I'm
skeptical that we're at the point where we can model this correctly. I think
this is why you see that all of the predicted faces in the Venter paper look
like a generic/averaged face.

~~~
Ygg2
Yeah, but they also grew in the same uterine environment.

~~~
mbreese
They also probably grow up together, have similar diets, speak the same
language... there are lots of correlated environmental factors with twins
aside from their DNA.

A better comparison would be separated twins (widely separated). But I'm not
sure if anyone has looked at the relative differences between twins raised
together vs apart.

~~~
Mayzie
> A better comparison would be separated twins (widely separated).

The two people in The Parent Trap are twins and were widely separated, yet
they still looked identical, so much so that their parents could not even tell
them apart.

------
daughart
Anonymous participants in genome research have been re-identified in the past
[0]. In one of these examples, people were re-identified merely using zip
code, date of birth and gender. Everyone should assume that they can be
identified once they have revealed even a few personal datapoints. The genome
contains millions of datapoints.

[0]
[https://www.forbes.com/sites/adamtanner/2013/04/25/harvard-p...](https://www.forbes.com/sites/adamtanner/2013/04/25/harvard-
professor-re-identifies-anonymous-volunteers-in-dna-study/#7d5413c792c9)
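
To make the zip code / date of birth / gender point concrete, here is a minimal sketch of how quickly a handful of fields can single people out. The records and the `unique_fraction` helper are made up for illustration; they are not from the study linked above.

```python
from collections import Counter

# Hypothetical records, invented purely for illustration.
records = [
    {"zip": "02138", "dob": "1970-01-01", "sex": "F"},
    {"zip": "02138", "dob": "1970-01-01", "sex": "M"},
    {"zip": "02138", "dob": "1982-05-17", "sex": "F"},
    {"zip": "02139", "dob": "1982-05-17", "sex": "F"},
]

def unique_fraction(rows, keys):
    """Fraction of rows whose combination of `keys` appears exactly once."""
    combos = Counter(tuple(row[k] for k in keys) for row in rows)
    unique = sum(1 for row in rows if combos[tuple(row[k] for k in keys)] == 1)
    return unique / len(rows)

if __name__ == "__main__":
    for keys in (["sex"], ["zip", "sex"], ["zip", "dob", "sex"]):
        print(keys, unique_fraction(records, keys))
```

Each added field splits the group further, so the share of people who are the only match for their combination can only go up. A genome offers vastly more fields than three.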

------
mrfusion
I'd really like to hear from the resident Hacker News skeptic/curmudgeon on
this. This really shouldn't be possible at our current level of technology.

For example, we have no idea what genes control the length of someone's nose
or forehead.

~~~
evolve2017
On mobile, so I may not get to reply fully!

We actually do have some idea about necks and I believe noses - the important
elements are DNA enhancers. There was a presentation at the Society for
Developmental Biology in 2012 on this topic, though I can't recall the
scientist who discussed this...

------
aaron695
Of course it's bunk; we are not even close to this level with DNA.

We don't even know if whole races are physically different at certain levels.
Yet we can get complex individual data like a face from DNA?

[https://medium.com/words-escape-us/are-japanese-
intestines-l...](https://medium.com/words-escape-us/are-japanese-intestines-
longer-8a41ca3e7d89)

Why would Nature publish a rebuttal? What next, proof that flat earthers (aka
funny trolls) are wrong?

But when we look closer we see it's people claiming privacy is still intact, a
much more dangerous push than the rubbish in the original article.

------
andy_ppp
Could you just throw loads of genomes (about 1.5 GB of data) and photos of
people's faces at a deep learning model and train it? It would probably do
okay, unless of course environmental factors are more important than genetics
here. Judging by how different siblings tend to look, I’d say genetics isn’t
the whole story.

~~~
allenz
In theory, yes. In reality, we don't have enough data for that to work. A
model that takes the whole genome as input is excessively expressive and would
overfit, finding spurious correlations everywhere. In the near term, we still
need to preprocess the genome to extract lower-dimensional features of
interest.
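
A rough sketch of the "spurious correlations everywhere" problem, using only random numbers: with few people and many variants, some variant will correlate strongly with a trait by pure chance. Everything here is an assumption for illustration, including the uniform 0/1/2 genotype coding, the sample sizes, and the function names.

```python
import random

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def max_spurious_correlation(n_people, n_variants, rng):
    """Largest |correlation| between a random trait and random genotypes.

    Trait and genotypes are generated independently, so any correlation
    found here is noise by construction.
    """
    trait = [rng.gauss(0, 1) for _ in range(n_people)]
    best = 0.0
    for _ in range(n_variants):
        # Assumed coding: 0, 1 or 2 copies of an allele, drawn uniformly.
        genotype = [rng.randint(0, 2) for _ in range(n_people)]
        best = max(best, abs(pearson(genotype, trait)))
    return best

if __name__ == "__main__":
    rng = random.Random(0)
    for n_variants in (5, 500, 5000):
        print(n_variants, round(max_spurious_correlation(20, n_variants, rng), 2))
```

A flexible model fed the raw genome can latch onto exactly these chance associations, which is one reason to reduce the input to a smaller set of features first.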

~~~
posterboy
You could use Monte Carlo to search for an effective compression of the input
data. That's Singular Value Decomposition if I'm not mistaken. Dimensionality
reduction is a hot topic in coding theory. Ideally, an understanding of the
biological process involved would certainly help here. DNA is thought to be
highly compressed and self-modifying, so a smaller encoding is unlikely.
Therefore, separation of the DNA sequences involved in the effect under
scrutiny might fail on the possibly highly random inputs without good
techniques and heuristics. Effectively, pre-processing could involve
_recompilation_ and all the techniques used in software analysis, only then
_live debugging_ is to be taken literally.

Reduction to spectra can be used to achieve sparseness that covers the error
margin with probabilistic precision, including invariants and the mentioned
external factors, maybe also data acquisition errors from cost-effective
(read: cheap) methods.

------
userbinator
Reminds me of this satirical paper:
[http://languagelog.ldc.upenn.edu/nll/?p=18315](http://languagelog.ldc.upenn.edu/nll/?p=18315)

------
joshfraser
The technology may not be quite there yet, but there's little doubt in my mind
that we'll get there within the next decade or two.

~~~
dogruck
And this is just a relatively uncontroversial prediction. Wait until they
move to SAT scores.

