DNA Testing Botched My Family's Heritage (gizmodo.com)
106 points by anarbadalov 35 days ago | 47 comments



I was always kinda curious about DNA heritage testing, until I saw John Sotos' talk at Def Con 25 [0]. He highly recommends that you keep your genome private, and that you only give it away if it is vital (aka you're really sick).

The talk is absolutely amazing, probably the most captivating talk I have ever watched (thanks to the HN commenter who shared it a while ago). But anyway, Sotos specifically tells his audience that, as a security measure, they shouldn't release their DNA to heritage services, because it isn't vital.

Anyway, great talk, highly recommend it, although Sotos mentions DNA testing for heritage for a few seconds only, so slightly off topic :)

[0] https://www.youtube.com/watch?v=HKQDSgBHPfY


It is yet again disturbing that, two months after having watched the talk, the part I took away, besides never giving away your personal data, was this: https://youtu.be/HKQDSgBHPfY?t=2371 or, to be more precise, this: https://youtu.be/HKQDSgBHPfY?t=2568


I disagree.

Mostly for one reason alone. Genetic association testing.

To get statistical power necessary to associate variants with complex genetic traits, we need very large sample sizes (hundreds of thousands are good, millions of individuals are better).

If everyone clings to their genomes in fear of what bad-actors might discover, we will never discover anything at all. Cures included.
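To put rough numbers on the sample-size point above, here is a minimal sketch using the standard normal approximation for a single-variant association test. The genome-wide significance threshold of 5e-8, the 80% power target, and the function name `required_sample_size` are assumptions chosen for illustration, not anything stated in the thread.

```python
from statistics import NormalDist


def required_sample_size(variance_explained, alpha=5e-8, power=0.8):
    """Approximate N needed to detect one variant with a two-sided test.

    variance_explained: fraction of trait variance explained by the variant.
    alpha: significance threshold (5e-8 is a conventional genome-wide value).
    power: desired probability of detecting the effect.
    """
    if not 0 < variance_explained < 1:
        raise ValueError("variance_explained must be between 0 and 1")
    normal = NormalDist()
    z_alpha = normal.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = normal.inv_cdf(power)          # quantile for the power target
    # Non-centrality of the test grows as N * r2 / (1 - r2); solve for N.
    return (z_alpha + z_power) ** 2 * (1 - variance_explained) / variance_explained


if __name__ == "__main__":
    for r2 in (1e-2, 1e-3, 1e-4):
        n = required_sample_size(r2)
        print(f"variance explained {r2:.2%}: roughly {n:,.0f} individuals")
```

Under these assumptions, a variant explaining a hundredth of a percent of trait variance already calls for a cohort in the hundreds of thousands, which is the scale the comment describes.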


You’re right. I also find it telling that John Sotos doesn’t give a single argument for his recommendation. He simply “proclaims” that you shall not share your genome “without good reason”. He says that everything in life is a cost–benefit balance but that’s an empty phrase. What are the costs in this scenario? I’m not saying they are nil, but they’re probably negligible compared to the benefits you mention, especially considering that theft of genomic material would be trivial for a vested actor. No need for you to share anything willingly.


I also disagree. If a bad-actor wants your genetic information they just need to go somewhere you've been. You're constantly shedding DNA.


The DNA you leave behind when you touch things can actually be more useful than just the cellular DNA isolated from a genome test and put into a database.

DNA analysis of your microbiota can indicate more than just what you are or who you are, but also your current health status, where you have been, and who you associate with. It would also be one way to distinguish between identical twins.


Identical twins have the same DNA.


Sure, but they have different microbiomes.


The problem is, the bad actors are stealing your data from the DNA companies, not from your house...


There's a reason databases exist. Consolidated and normalized data is light years easier to obtain and less time consuming than seeking out individual pieces of disparate data.


If a bad actor wants to rob you, they can break into your house. But my door still has a lock on it.


Any tl;dr's or good insights for a lazy non-video watcher?


Cancer research will soon allow us to modify DNA in people in a very selective manner, using If-then-else logic.

When this technology is available, attackers will be able to perform biological exploits targeting individuals, families, ethnic groups or entire species.

Many examples are given of conditions that are controlled by genetics and why an attacker would want to induce them: animal rights activists making everybody allergic to meat, radical Islamists rendering women unable to expose their skin to sunlight, a state ruining an enemy by inducing learning disabilities in children, an activist embarrassing his political opponent by giving him a fishy smell, etc.

Attacks can be immediate, delayed like a time bomb or even target the germ line.

One recurring recommendation is to keep your genome secret unless you have a medical reason.


Only works if you keep all your relatives' genomes secret too. After all, any attack that works on them has a high chance of working on you too.


There's no way you can keep your genome private in the long term. Even today anyone can get DNA samples from you without your knowledge or consent, but in the near future it will be trivial to collect and sequence anyone's DNA using an iPhone.

Besides, the same knowledge that makes attacks possible also makes defense possible.


Surreptitious DNA collection requires some level of targeted effort: someone must have a prior interest in you specifically, which is unlikely for most persons. Uploading your DNA to a database exposes you (and your relatives) to all sorts of Birthday-Problem-like mischief from persons and organizations trawling the database.

The best defense to the current threat landscape (privacy violations related to insurance, employment, and the law) is to define your genome-related queries as narrowly as possible and type/sequence only those relevant loci/regions.


Ooh, I am currently reading a recent paper of his: Biotechnology and the lifetime of technical civilizations

https://arxiv.org/abs/1709.01149


This is a really fascinating talk, thank you for linking to it.


"23andMe’s ancestry results were the most confounding of all. It found that I was only 3 percent Scandanavian, a number that, based on my recent family history, I know is flatly wrong. It also found I was only 5.5 percent Middle Eastern and a whopping 62.6 percent Northwestern European."

I get how their statistical models come up with those numbers, but I have a very hard time understanding how any reasonable person would put this much stock in the interpretation. "3% Scandinavian?" "Flatly wrong?" Where does the author think Scandinavia is, if not north-western Europe? Does the author think the borders of "Scandinavia" were closed in 1000CE, with no interaction with the rest of Europe?


> Where does the author think Scandinavia is, if not north-western Europe?

Um, north-eastern Europe perhaps?


If you look at the image in the article, 23andMe includes Scandinavia in northwestern Europe.


A surprising categorization by 23andMe certainly, but not one which detracts from my post: very few people would place Scandinavia in north-western Europe since that category implies that north-eastern Europe is an option.


> very few people would place Scandinavia in north-western Europe

Not really. I don’t presume to claim to know what “most” people think, but placing Scandinavia in north-western Europe is hardly eccentric [1, 2]. Furthermore, its history, politics and certainly its genetics are fairly intertwined with north-western Europe so it’s a sensible categorisation.

[1]: https://en.oxforddictionaries.com/definition/Scandinavia [2]: https://en.wikipedia.org/wiki/Northwestern_Europe


I see, I wasn't aware that it is normal. Agree about history, politics and human genetics (not Finland though!). I was thinking more about geography and non-human biogeography, which support north-eastern.


TL;DR - Genetic tests don't give good regional ancestry results if you are from a region that is underrepresented by genetic testing.


That's only partially true. The real issue is that there is no official representation of race or heritage found in DNA. Modern genetic testing is a best-effort exercise for classification whose 'accuracy' varies by sample size, race coverage, and group(s) sponsoring testing/analysis.

Quite simply, one cannot look at a DNA sequence and say definitively it is one race or another—there is no official Rosetta Stone for translating DNA to race or heritage.


That's not really true. If you look at just the basic PCA of genetic data, several races emerge very clearly and distinctly. E.g. look at this image: https://i.imgur.com/J2Xl2FK.png

I'm just guessing here, but I think the problem is they are trying to do something more complicated than just find your nearest genetic cluster. They want to find exactly what mix you are between different clusters. It's clear on the graph that some people have more African ancestry than others. But what would be an algorithm for working out the exact percentage "African" someone is? Is it Euclidean distance to the median African? Or to the furthest bottom-left African? Or is it just the first principal component alone?

The second problem is they don't just want 7 big clusters. People already know they are European, they are more interested in exactly what countries/regions they are from. And making a bunch more clusters makes the problem more complicated.
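One naive answer to the "what would be an algorithm" question above is to project a sample onto the line joining two cluster centroids in principal-component space and read off how far along it falls. The sketch below assumes the PCA coordinates are already computed; the function name and the two-cluster setup are hypothetical, and nothing here is claimed to be what any testing company actually does.

```python
def admixture_fraction(sample, centroid_a, centroid_b):
    """Return how far `sample` lies from centroid_a toward centroid_b.

    All arguments are coordinates in the same PCA space. The result is
    clamped to [0, 1]: 0 means "at cluster A", 1 means "at cluster B".
    """
    axis = [b - a for a, b in zip(centroid_a, centroid_b)]
    offset = [s - a for s, a in zip(sample, centroid_a)]
    length_squared = sum(x * x for x in axis)
    if length_squared == 0:
        raise ValueError("centroids must be distinct")
    # Scalar projection of the sample onto the A-to-B axis.
    t = sum(o * x for o, x in zip(offset, axis)) / length_squared
    return min(1.0, max(0.0, t))


if __name__ == "__main__":
    cluster_a = (0.0, 0.0)   # hypothetical centroid of reference cluster A
    cluster_b = (4.0, 2.0)   # hypothetical centroid of reference cluster B
    person = (1.0, 0.5)      # hypothetical sample between the two
    print(f"estimated share of B: {admixture_fraction(person, cluster_a, cluster_b):.0%}")
```

This only handles two reference groups, and the answer moves whenever the centroids move, which illustrates why adding many fine-grained regional clusters makes the percentages so much shakier.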


That, and that gene tests are simply correlating against gene distributions from when the samples were collected, so present day. If there was a major population movement in the past fifty years (Syria to Mediterranean Europe in the author's case), the results won't reflect an American-centric view of what it means to be from Syria or Italy or wherever.


It's also the fact that the author's family is heavily admixed. That makes assigning them precisely to any one group a very difficult task with large margins of error.


DNA heritage testing is often referred to as genetic astrology: http://www.bbc.co.uk/news/science-environment-21687013

UCL has a nice page collecting the scientific facts: http://www.ucl.ac.uk/mace-lab/debunking


Oh, but there is more to that. It's the kind of mysticism that the educated people wouldn't be ashamed to engage in. The demand was there all along, and still is, strong as ever.

Also consider magic potions, for example.


Please note that the UCL page specifically distinguishes DNA ancestry testing from genetic astrology: DNA ancestry testing is in principle scientifically valid, albeit still fairly imprecise due to sparse data.


I suspect those tests have no way to work for people who come from urban commerce centers where ethnic groups have mixed for centuries, which includes most port cities on the Mediterranean Sea.


I was thinking the same. People from southern Spain, Italy, Portugal and Greece will have a lot of genetic overlap with people from the Middle East and North Africa.


My dad had a similar experience. He's taken several different DNA tests, and all of them give him different results. His great-grandfather apparently had an affair with a Native American woman, who had a baby but didn't want it, so his great-grandparents took the child in and raised it as their own, only revealing the truth to him later in life.

My dad has taken several tests. One said his DNA is 10% East Asian, another said he had no East Asian DNA but that he was ~8% Middle Eastern and ~1% African. He's currently waiting for his 23andMe results, but mine said I was only 0.5% Native American and that the remainder was European, so I'm guessing it'll be similar for him.


If you did 23andMe, you should be able to add him as a parent, and it will show you what you got from him in terms of your traits/DNA.


Either my understanding of genetics or most people's understanding is wrong. Suppose you have a woman with roughly 100% African genes and a man with roughly 100% German genes and they have a daughter. She will have 23 chromosome pairs, each with one African (A) and one German (G) chromosome.

Now imagine she has a bunch of children and let's ignore cross-overs to simplify things. The 50% of genes that she contributes to each child can vary from 0% A and 50% G all the way to 50% A and 0% G. So having a child with 16% G would be perfectly normal (not even counting any contribution from the man, which would make it even more complicated).

So the idea of dividing by 2 in each generation (great grandparent 100% X, grandparent 50% X, parent 25% X and so me 12.5% X) is an extremely crude approximation and having these genetic tests not match that doesn't necessarily mean that they are wrong.
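The argument above can be checked with a small calculation. This is a minimal sketch under the comment's own simplification (no cross-overs), plus one further assumption that all 23 chromosomes are the same size; the function names are made up for illustration. For each of the 23 chromosomes the daughter passes on, it is a coin flip whether the A or the G copy goes to the child, so the number of G chromosomes follows a binomial distribution.

```python
from math import comb

N_CHROMOSOMES = 23  # chromosomes the mother passes on to each child


def share_distribution(n=N_CHROMOSOMES):
    """Map each possible genome share from one maternal grandparent to its probability.

    Assumes no cross-overs and equally sized chromosomes, so the child
    receives k of n maternal chromosomes from that grandparent with
    probability C(n, k) / 2**n, for a genome share of k / (2 * n).
    """
    total = 2 ** n
    return {k / (2 * n): comb(n, k) / total for k in range(n + 1)}


def prob_share_at_most(limit, n=N_CHROMOSOMES):
    """Probability that the grandparent's share of the child's genome is <= limit."""
    return sum(p for share, p in share_distribution(n).items() if share <= limit)


if __name__ == "__main__":
    dist = share_distribution()
    mean = sum(share * p for share, p in dist.items())
    print(f"mean share: {mean:.3f}")
    print(f"P(share <= 16%): {prob_share_at_most(0.16):.3f}")
```

The mean comes out at the textbook 25%, but the spread around it is wide under these assumptions. Real recombination narrows that spread, so this should be read as an upper bound on the variation, not a realistic estimate.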


One thing that's good to know is that 23andMe and Ancestry don't currently do genetic testing. They do genotype testing instead. Genotype testing is more like a population comparison and it's not accurate, or only as accurate as the genotype database you're being compared against.


I don’t know where you get this idea because genotyping is (a form of) genetic testing. And as for accuracy, human reference genome databases are highly accurate. Sure, they’re not error-free — but then nothing is. Ancestry in particular is a tricky problem due to the highly variable representation of different ancestries in the reference set.

By contrast, 23andme delivers high accuracy for genetic variants. Meaning, for those variants in their reference set they can tell with very high accuracy whether a given sample has a given allele. Interpretability of this information unfortunately varies greatly because genetics is hard, and a lot of the connections between genotypes and disease predisposition are complex and still being fully discovered. It’s for this specific reason that the FDA told 23andme off for providing disease predispositions; not because the genetic test itself is inaccurate (it isn’t).

For simple (Mendelian) disease predispositions, 23andme is essentially as accurate as any other form of genetic testing (which would at any rate also use genotyping).


Ok - genotype testing is much less accurate than genome testing. A potential consequence is the common experience described in the article. Genotype testing's main advantage is cost.

On a side note, most of 23andme's reference set is limited to Europe, so it's even less accurate for a lot of people. Maybe things have changed?


I’m still not happy with this characterisation. By “genome testing” I suspect you mean whole-genome sequencing (WGS) [1]. First off, this is also a form of genotyping. And, depending on which definition of accuracy you’re using, it’s no more accurate. It just provides more signal (because it looks at much larger parts of the genome than microarray genotyping [2] does). The technology used by 23andme and DNA ancestry services (microarrays) will miss some information, compared to WGS (= lower recall [3]). But importantly its precision is no lower than WGS (in fact, until very recently it was higher).

The fact that DNA ancestry estimates are inaccurate for some people is largely unrelated to the technology (microarray vs sequencing), and rather to the fact that our reference set for ancestry is too sparse (especially outside of Europe).

[1] https://en.wikipedia.org/wiki/Whole_genome_sequencing [2] https://en.wikipedia.org/wiki/DNA_microarray [3] https://en.wikipedia.org/wiki/Precision_and_recall
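The precision versus recall distinction drawn above can be made concrete with a toy example. The variant names and call sets below are entirely invented for illustration: a microarray-style test that only probes a few sites but gets them right, and a sequencing-style test that covers more sites with one false call.

```python
def precision_recall(called, truth):
    """Return (precision, recall) for a set of called variants against the truth set."""
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical data: ten variants the sample really carries.
truth = {f"var{i}" for i in range(1, 11)}

# Microarray-style: probes only four sites, all called correctly.
array_calls = {"var1", "var2", "var3", "var4"}

# Sequencing-style: finds nine of the ten, plus one spurious call.
wgs_calls = {f"var{i}" for i in range(1, 10)} | {"false1"}

if __name__ == "__main__":
    for name, calls in (("microarray", array_calls), ("WGS", wgs_calls)):
        precision, recall = precision_recall(calls, truth)
        print(f"{name}: precision={precision:.2f} recall={recall:.2f}")
```

In this made-up case the array misses most of the variants (low recall) while every call it does make is correct (high precision), which is the sense in which it can be "no less accurate" while still providing less signal.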


> “They’re not telling you where your DNA comes from in the past,” he told me, “They’re telling you where on Earth your DNA is from today.”

But doesn’t AncestryDNA, in particular, have exactly the data one would need to overcome that problem?


I'd still like to see what mine contains. I know my ancestors back to the late 18th century with one exception - my father's grandfather appears to have been an imposter, as he does not exist in any record he should (Prussia), or at least his background was not known to anyone I ever talked with who knew him or knew someone who knew him. Genealogy is fascinating, but it is hard to find data that may have been destroyed in wars or country upheavals. At least a little DNA information might provide an alternate connection, but of course that might itself be confusing. The past is often nebulous.


"Botched" seems too strong of a word for what the article is describing.


I am not a scientist but in my limited research I have been led to believe that these things are not entirely accurate.


Well, they're accurate but not precise.

What's useful is using the "DNA match" features of various websites to track down genetic relatives and use their research to flesh out your family tree beyond the point they intersect.

What's not useful is to use them to say "I'm 30% ____". It just doesn't work that way. Even the Ancestry results that show I'm like 30% "Ireland/Scotland/Wales" show a whisker plot when you click in that gives a margin of error of +/- 35%.


Hmm, my understanding leads me to expect the opposite: these companies are more likely to deliver repeatedly similar results (precise) upon multiple analyses of the same tested specimen. It's the accuracy of the conclusions drawn from those results that is more difficult to offer... Your thoughts?



