The talk is absolutely amazing, probably the most captivating talk I have ever watched (thanks to the HN commenter who shared it a while ago). But anyway, Sotos specifically tells his audience that, as a security measure, they shouldn't release their DNA to heritage services, because it isn't vital.
Anyway, great talk, highly recommend it, although Sotos mentions DNA testing for heritage for a few seconds only, so slightly off topic :)
Mostly for one reason alone. Genetic association testing.
To get statistical power necessary to associate variants with complex genetic traits, we need very large sample sizes (hundreds of thousands are good, millions of individuals are better).
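To put rough numbers on the sample-size point, here is a minimal sketch of a power calculation for a single variant in a case/control comparison. It uses a normal approximation for a two-sided test on allele frequencies. The function name, the example frequencies (20% vs 21%) and the default threshold of 5e-8 (the conventional genome-wide significance level) are my own assumptions for illustration, not anything from a specific study.

```python
from statistics import NormalDist


def association_power(n_cases, n_controls, freq_controls, freq_cases, alpha=5e-8):
    """Approximate power to detect an allele-frequency difference.

    Hypothetical helper: two-sided z-test on allele frequencies, where each
    person contributes two alleles. alpha defaults to an assumed
    genome-wide significance threshold.
    """
    nd = NormalDist()
    # Standard error of the difference in allele frequencies (2 alleles per person).
    se = (
        freq_cases * (1 - freq_cases) / (2 * n_cases)
        + freq_controls * (1 - freq_controls) / (2 * n_controls)
    ) ** 0.5
    z_effect = abs(freq_cases - freq_controls) / se
    z_crit = nd.inv_cdf(1 - alpha / 2)
    # Probability that the test statistic clears the significance threshold.
    return nd.cdf(z_effect - z_crit)


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        power = association_power(n, n, freq_controls=0.20, freq_cases=0.21)
        print(f"{n:>7} cases + {n:>7} controls: power ~ {power:.4f}")
```

With a one-percentage-point difference in allele frequency, a thousand cases and controls gives essentially no chance of a genome-wide significant hit under these assumptions, while a hundred thousand of each makes it very likely.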
If everyone clings to their genomes in fear of what bad actors might discover, we will never discover anything at all. Cures included.
DNA analysis of your microbiota can indicate not just what you are or who you are, but also your current health status, where you have been, and who you associate with. It would also be one way to distinguish between identical twins.
When this technology is available, attackers will be able to perform biological exploits targeting individuals, families, ethnic groups or entire species.
Many examples are given of conditions that are controlled by genetics and why an attacker would want to induce them. Animal rights activists making everybody allergic to meat, radical Islamists rendering women unable to expose their skin to sunlight, a state ruining an enemy by inducing learning disabilities in children, an activist embarrassing his political opponent by giving him a fishy smell, etc.
Attacks can be immediate, delayed like a time bomb or even target the germ line.
One recurring recommendation is to keep your genome secret unless you have a medical reason.
Besides, the same knowledge that makes attacks possible also makes defense possible.
The best defense against the current threat landscape (privacy violations related to insurance, employment, and the law) is to define your genome-related queries as narrowly as possible and type/sequence only those relevant loci/regions.
I get how their statistical models come up with those numbers, but I have a very hard time understanding how any reasonable person would put this much stock in the interpretation. "3% Scandinavian?" "Flatly wrong?" Where does the author think Scandinavia is, if not north-western Europe? Does the author think the borders of "Scandinavia" were closed in 1000 CE, with no interaction with the rest of Europe?
Um, north-eastern Europe perhaps?
Not really. I don’t presume to claim to know what “most” people think, but placing Scandinavia in north-western Europe is hardly eccentric [1, 2]. Furthermore, its history, politics and certainly its genetics are fairly intertwined with north-western Europe so it’s a sensible categorisation.
Quite simply, one cannot look at a DNA sequence and say definitively it is one race or another—there is no official Rosetta Stone for translating DNA to race or heritage.
I'm just guessing here, but I think the problem is they are trying to do something more complicated than just find your nearest genetic cluster. They want to find exactly what mix you are between different clusters. It's clear on the graph that some people have more African ancestry than others. But what would be an algorithm for working out the exact percentage "African" someone is? Is it Euclidean distance to the median African? Or the furthest bottom left African? Or is it just the first principal component alone?
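To make the question concrete, here is one possible answer sketched in code: take the centroid of each reference cluster in PCA space and measure how far along the line between the two centroids a sample falls. This is purely an illustration of the idea; the function names and the toy coordinates are made up, and I am not claiming any testing company computes percentages this way.

```python
def centroid(points):
    """Mean position of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def admixture_fraction(sample, ref_a, ref_b):
    """Fraction of the way from cluster A's centroid to cluster B's centroid.

    Hypothetical estimator: orthogonal projection of the sample onto the
    centroid-to-centroid axis, clamped to [0, 1]. 0 means "at A", 1 means "at B".
    """
    ca, cb = centroid(ref_a), centroid(ref_b)
    axis = [b - a for a, b in zip(ca, cb)]
    rel = [s - a for s, a in zip(sample, ca)]
    t = sum(r * x for r, x in zip(rel, axis)) / sum(x * x for x in axis)
    return min(1.0, max(0.0, t))


if __name__ == "__main__":
    # Toy 2D "principal component" coordinates, invented for the example.
    cluster_a = [(0.0, 0.0), (0.0, 2.0)]
    cluster_b = [(10.0, 0.0), (10.0, 2.0)]
    print(admixture_fraction((2.5, 1.0), cluster_a, cluster_b))
```

Even this toy version shows where the arbitrariness creeps in: pick the median instead of the mean, or a different pair of reference populations, and the percentage changes.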
The second problem is they don't just want 7 big clusters. People already know they are European, they are more interested in exactly what countries/regions they are from. And making a bunch more clusters makes the problem more complicated.
UCL has a nice page collecting the scientific facts:
Also consider magic potions, for example.
My dad has taken several tests. One said his DNA is 10% East Asian, another said he had no East Asian DNA but that he was ~8% Middle Eastern and ~1% African. He's currently waiting for his 23andMe results, but mine said I was only 0.5% Native American and that the remainder was European, so I'm guessing it'll be similar for him.
Now imagine she has a bunch of children and let's ignore cross-overs to simplify things. The 50% of genes that she contributes to each child can vary from 0% A and 50% G all the way to 50% A and 0% G. So having a child with 16% G would be perfectly normal (not even counting any contribution from the man, which would make it even more complicated).
So the idea of dividing by 2 in each generation (great grandparent 100% X, grandparent 50% X, parent 25% X and so me 12.5% X) is an extremely crude approximation and having these genetic tests not match that doesn't necessarily mean that they are wrong.
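A quick simulation shows how wide the spread is. It is only a sketch under the same simplification (no cross-overs), plus two assumptions of my own: the mother carries one G chromosome and one A chromosome in each of 23 pairs, and all chromosomes count as equal in size. The function names are invented for the example.

```python
import random


def child_g_fraction(rng, n_pairs=23):
    """Share of a child's whole genome that is the mother's G ancestry.

    Assumes no cross-overs, one G and one A chromosome in every maternal
    pair, and equally sized chromosomes (all simplifications).
    """
    # For each pair the child receives either the G or the A chromosome.
    g_chromosomes = sum(rng.random() < 0.5 for _ in range(n_pairs))
    # The child has 2 * n_pairs chromosomes in total (half from the father).
    return g_chromosomes / (2 * n_pairs)


def simulate(n_children, seed=0):
    """G fractions for many simulated children, reproducible via the seed."""
    rng = random.Random(seed)
    return [child_g_fraction(rng) for _ in range(n_children)]


if __name__ == "__main__":
    fractions = simulate(20_000)
    mean = sum(fractions) / len(fractions)
    low = sum(f <= 0.16 for f in fractions) / len(fractions)
    print(f"mean G share: {mean:.3f}")
    print(f"children at or below 16% G: {low:.1%}")
    print(f"range seen: {min(fractions):.1%} to {max(fractions):.1%}")
```

The average comes out near the "expected" 25%, but individual children scatter well away from it, which is the point: a test result that doesn't match the halving rule is not automatically wrong.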
By contrast, 23andMe delivers high accuracy for genetic variants. Meaning, for those variants in their reference set they can tell with very high accuracy whether a given sample has a given allele. Interpretability of this information unfortunately varies greatly because genetics is hard, and a lot of the connections between genotypes and disease predisposition are complex and still being worked out. It’s for this specific reason that the FDA told 23andMe off for providing disease predispositions; not because the genetic test itself is inaccurate (it isn’t).
For simple (Mendelian) disease predispositions, 23andMe is essentially as accurate as any other form of genetic testing (which would at any rate also use genotyping).
On a side note, most of 23andme's reference set is limited to Europe, so it's even less accurate for a lot of people. Maybe things have changed?
The fact that DNA ancestry estimates are inaccurate for some people is largely unrelated to the technology (microarray vs sequencing); it comes down to the fact that our reference set for ancestry is too sparse (especially outside of Europe).
But doesn’t AncestryDNA, in particular, have exactly the data one would need to overcome that problem?
What's useful is using the "DNA match" features of various websites to track down genetic relatives and use their research to flesh out your family tree beyond the point they intersect.
What's not useful is to use them to say "I'm 30% ____". It just doesn't work that way. Even the Ancestry results that say I'm something like 30% "Ireland/Scotland/Wales" show a whisker plot when you click in, and it gives a margin of error of +/- 35%.