Thanks for your kind words; I feel the same way re: your posts as well.
> So for those of us following along at home, the crucial idea is that most of the 'information' that 23andMe provides paying clients has not been validated. Not only has it not been validated as to correctness of the genome analysis software (the industry scientist's observation), it has even less been validated as a clue to clinically significant disease risk for the majority of diseases that afflict people in developed countries.
In general, what you say here is going to be true. I can't comment specifically on what 23andMe claim to demonstrate, because I haven't seen their actual output, but it's usually very difficult to go from genetic data to individual risk prediction. In fact, trying to do so is low-yield enough that I don't even expect individual risk prediction to be interesting for most diseases (at least, not for the next several years). So heuristically I will assert that any claims about individual risk prediction, for most diseases, are unlikely to be clinically important. The obvious exceptions are Mendelian genetic conditions. If they find that you are homozygous for CFTR ∆F508, you're almost certainly going to develop cystic fibrosis (well, you'd probably already have clinical symptoms by the time you get the test).
But a good number of disease conditions aren't (typically) Mendelian. Will you develop heart disease? If you have certain high-impact mutations in the LDL receptor, we might be able to say with reasonable certainty that you will develop heart disease by a certain age. But high-impact mutations are (usually) rare for most diseases.
The majority of what we have been discovering over the past few years are common variants of modest impact (powerful statistical associations but with odds ratios barely differing from 1). When we try to answer questions like, "Why do black Americans have more heart disease in the US?" we don't get smoking-gun mono- or oligogenic answers.[1] Even combining these markers (that are known to be robustly associated with a phenotype) into a single score doesn't do a whole lot more than just knowing the biomarkers that we already measure.[2]
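To make "odds ratios barely differing from 1" concrete, here is a rough sketch of how little a modest odds ratio moves an individual's absolute risk, and how a simple multi-marker score combines such variants. All of the numbers are made up for illustration (they are not taken from the papers cited here), the function names are my own, and the score assumes the variants act independently and additively on the log-odds scale, which is a simplification.

```python
import math


def risk_with_odds_ratio(baseline_risk, odds_ratio):
    """Convert a baseline absolute risk and an odds ratio into the carrier's absolute risk."""
    baseline_odds = baseline_risk / (1.0 - baseline_risk)
    carrier_odds = baseline_odds * odds_ratio
    return carrier_odds / (1.0 + carrier_odds)


def combined_odds_ratio(per_allele_ors, allele_counts):
    """Combine per-allele odds ratios into one score.

    Assumes independent variants whose effects add on the log-odds scale.
    allele_counts holds 0, 1, or 2 risk alleles per variant.
    """
    log_or = sum(
        count * math.log(per_allele_or)
        for per_allele_or, count in zip(per_allele_ors, allele_counts)
    )
    return math.exp(log_or)


if __name__ == "__main__":
    # Hypothetical inputs: 10% baseline risk, one variant with OR = 1.1.
    single = risk_with_odds_ratio(0.10, 1.1)
    print(f"one modest variant: {single:.3f}")

    # Hypothetical score over three variants with 2, 1, and 0 risk alleles.
    score_or = combined_odds_ratio([1.1, 1.2, 1.05], [2, 1, 0])
    print(f"combined OR: {score_or:.3f}")
    print(f"risk given score: {risk_with_odds_ratio(0.10, score_or):.3f}")
```

With these illustrative inputs, a single OR of 1.1 moves a 10% baseline risk by less than one percentage point, which is the sense in which a statistically solid association can still tell an individual very little.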
Finally, we have to wrestle with the issue of causality. For example, increased HDL-cholesterol is epidemiologically associated with decreased risk of heart disease. So people say, "HDL is good cholesterol." OK, maybe. My colleagues tested that hypothesis with one single nucleotide polymorphism in a gene that appears only to modify HDL-C levels, LIPG. (Most genes that modify HDL-C also modify LDL-C or triglycerides, and pleiotropic effects ruin your ability to assess a single biomarker.) They said, "Epidemiologically, an X% increase in HDL-C associates with a Y% decrease in risk of heart attack. Also, this LIPG SNP associates with a J% increase in HDL-C. Therefore, based on that J% increase in HDL-C, we expect a K% decrease in heart disease if HDL-C is a causal biomarker." What was the result? The LIPG SNP had the expected effect on HDL-C levels, but had absolutely zero association with your risk for heart attack (the OR was like 0.99 with a CI that easily included 1.0). In contrast, an LDL-C SNP score had a robust association in the expected direction (more genetic variants that are known to raise LDL-C? more heart disease risk).[3]
In other words, is HDL a causal protective factor? It appears that, for at least one cause of high HDL-C, it is not. (This doesn't give me license to dismiss the entire HDL hypothesis and I wouldn't intend to do that without exhaustive scientific work, of course.)
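The logic of that comparison can be sketched in a few lines. This is only a toy version of the reasoning, not the analysis from the paper: the numbers below are placeholders I picked for illustration, the function names are hypothetical, and it assumes the biomarker's effect on risk is log-linear.

```python
import math


def expected_or_if_causal(or_per_unit_biomarker, snp_effect_on_biomarker):
    """Odds ratio we would expect for the SNP if the biomarker were causal.

    or_per_unit_biomarker: epidemiological OR per one unit of the biomarker.
    snp_effect_on_biomarker: change in the biomarker (same units) per SNP allele.
    Assumes a log-linear relationship between biomarker and risk.
    """
    return math.exp(math.log(or_per_unit_biomarker) * snp_effect_on_biomarker)


def interval_contains(low, high, value):
    """True if value lies inside the closed interval [low, high]."""
    return low <= value <= high


if __name__ == "__main__":
    # Placeholder inputs, NOT the published estimates.
    epi_or_per_sd = 0.60        # OR for heart attack per 1 SD higher HDL-C
    snp_effect_sd = 0.25        # SNP raises HDL-C by 0.25 SD per allele
    observed_or = 0.99          # observed SNP-to-disease OR
    ci_low, ci_high = 0.90, 1.10

    expected = expected_or_if_causal(epi_or_per_sd, snp_effect_sd)
    print(f"expected OR if causal: {expected:.2f}")
    print(f"observed OR: {observed_or:.2f} (CI {ci_low:.2f} to {ci_high:.2f})")
    print("CI includes no effect (1.0):", interval_contains(ci_low, ci_high, 1.0))
    print("CI includes expected OR:", interval_contains(ci_low, ci_high, expected))
```

If the observed interval covers 1.0 but not the expected value, the data are more consistent with "no causal effect via this variant" than with the epidemiological prediction, which is the pattern described for LIPG.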
How much of this is out there? Probably tons. It's probably only known by domain experts, or (more commonly) nobody.
So if you're still following this minimally coherent post, we're discovering common variants with very weak effects that often impact biomarkers which may or may not be causal for the diseases of interest, and our view is being revised all of the time.
Is it interesting? From a research perspective, yes, it's awesome. We're finding all of these landmarks in this enormous genomic map. This gives us insight into the architecture of diseases. It gives us smarter therapeutic targets. It helps us evaluate potential therapeutic targets via tools such as Mendelian Randomization to save years of time and billions of dollars avoiding clinical trials that are unlikely to bear fruit (one example of which I discussed above with the LIPG paragraph).
But to an individual alive today wondering about the direct clinical utility of this information? Not today, not right now, not in my opinion. If it's any indicator, I haven't done 23andMe, and if I did, it would be for entertainment purposes.
Will this stuff be in clinics in a few years? Yes, probably. Will it be useful? My guess is that it will be most useful, even then, for research. Eventually there will be clinical significance, but I suspect that the clinical significance will largely go hand-in-hand with the development of therapeutics that have specific genetic targets (e.g., perhaps you have some predisposition to developing cancer, but with your genotype we know that if we give you these 3 tyrosine kinase inhibitors you're very unlikely to develop cancer). I could see that developing.
> Pay your money for the service at the new lower price if you like, but prepared to see your personal genome results repackaged and reinterpreted for years to come before you learn anything from them that will help you improve your health.
Yes, this is absolutely true; there is much more work to be done, and if 23andMe's users are lucky they'll have to keep re-downloading their genomic data as they are updated with new information. If they end up stuck with what we've got now, well, that's stable but incredibly boring.
> But that they are running ahead of their ability, based on current science, to deliver actionable information to the clients who pay for their services. There is still an astounding lack of replicability and of large effect sizes in almost any genome study related to common human diseases or to socially meaningful human behaviors.
I think that there is actually remarkably good reproducibility of the genetic associations with common human diseases, but these are mostly of small effect. A lot of the associations in the candidate-gene/pre-human-genome era do appear to be spurious, however.
1 = http://www.ncbi.nlm.nih.gov/pubmed/21347282?dopt=Abstract
2 = http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2845522/
3 = http://www.thelancet.com/journals/lancet/article/PIIS0140-67...