
Is there any way to just have your entire genome sequenced and get all the data in a software-friendly format? At that point there could/should be some open source software for analyzing it and finding common or well understood things like this. That way the software could be updated and people could re-run their analysis to look for newly discovered stuff.

I think this would be an awesome amount of fun. I for one would be interested in looking for certain gene variants that are not mentioned at all over at 23andMe.



Check out this series of articles from 2016 by Carl Zimmer [1]. He gets his genome sequenced by Illumina ($3100) and joins a medical study so that they'll give him the raw data. He gets the 70GB "BAM file" (Binary Alignment/Map) and passes it around to experts and they dig into it. Multiple weeks of computer time plus expert analysis: this is not a simple thing yet.

[1] https://www.statnews.com/feature/game-of-genomes/season-one/

[2] Supplementary materials to [1], https://zimmerome.gersteinlab.org/


Have you already done a 23andMe analysis? If so, you can check out https://promethease.com/. It's exactly what you're looking for as they have constant updates that make it worth your while to rescan every year or so.


Promethease is awesome. I uploaded my 23andMe data to it and got back the kind of data I'd been hoping for in the first place.

Fair warning: the UI is very geeky. I think any HN reader should be able to find their way around without trouble, but I wouldn't recommend it to my non-technical friends or family.


Are you saying 23&Me gives you a file with the full list of your chromosomes' ACGT data? I've always wanted that.

Also, is Promethease an open source analyzer?


23&Me will let you download a text file with the ACGT data, but only for the SNPs that it has. 23&Me does not sequence your full genome, so the SNPs available are a small subset of your DNA.

Promethease is not open-source (I think), but all it does is read various files with DNA data (like the 23&Me export), match them up with the information in SNPedia (a Wikipedia-like open repository of what we know about certain SNPs), and then export the result as a pretty HTML/JS web report that you can download and save.
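
To make the "read the export, match it against a database" step concrete, here is a minimal sketch. It assumes the export is a tab-separated text file with the columns rsid, chromosome, position, genotype and '#' comment lines. The annotation table, the rsids, and the notes are made-up placeholders; this is not Promethease's code or SNPedia's actual data model.

```python
import io

# Hypothetical annotation table standing in for a SNPedia-style database.
# The rsids and notes are placeholders, not real findings.
ANNOTATIONS = {
    ("rs0000001", "AG"): "placeholder note for genotype AG",
    ("rs0000002", "CC"): "placeholder note for genotype CC",
}


def parse_raw_export(handle):
    """Return {rsid: genotype} from a tab-separated export.

    Assumed line layout: rsid, chromosome, position, genotype.
    Lines starting with '#' are treated as comments.
    """
    genotypes = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # skip rows that do not match the assumed layout
        rsid, _chrom, _pos, genotype = fields[:4]
        genotypes[rsid] = genotype
    return genotypes


def annotate(genotypes, annotations):
    """Return (rsid, genotype, note) for every genotype found in the table."""
    hits = []
    for rsid, genotype in sorted(genotypes.items()):
        note = annotations.get((rsid, genotype))
        if note is not None:
            hits.append((rsid, genotype, note))
    return hits


SAMPLE = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t12345\tAG\n"
    "rs0000002\t2\t67890\tCT\n"
    "rs0000003\tX\t13579\t--\n"
)

genotypes = parse_raw_export(io.StringIO(SAMPLE))
report = annotate(genotypes, ANNOTATIONS)
for rsid, genotype, note in report:
    print(rsid, genotype, note)
```

Re-running the same script against a newer annotation table is the "rescan" idea: the genotypes stay the same and only the lookup table changes.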


Is it possible to do client side? I'd rather download the db and match it locally than ship my genome to them.


Yes. The author pushes people to the web version since it's more up to date, easier to maintain, etc., but there's a local version, see:

http://snpedia.com/index.php/Promethease/Desktop

http://snpedia.com/index.php/Promethease/privacy


Why would you need to be rescanned? Does your DNA change that often?


Not your DNA, but your Promethease analysis can be redone every year or so as they're constantly adding new analyses.


Makes sense, thanks!


They mean rescan with the Promethease software. Our knowledge around genetic variation is evolving rapidly.


IIRC this is what 23andMe used to do before their FDA smackdown.


Thanks! This is way more interesting than the data 23andMe provides.


I did Illumina UYG. As part of that I got a 1TB hard drive with the nearly-raw files (BAM format with raw reads, VCF with variants).

Lots of people say "I for one would be interested in looking for certain gene variants that are not mentioned at all over at 23andMe." but they either never do anything with the data, or they look into it and realize that SNP analysis of gene variants is still a charlatan's game.


For those interested in doing this, I will second parent. I have my full genome sequenced, and learned basically nothing that was all that interesting or actionable. It is very early days for DNA analysis.


So what did you do with all the data?


I archived it to cloud storage because I've decided that this raw data has no utility except to waste my time.


Then why pay so much money for the exhaustive test in the first place when other tests are on the market? (Ownership over the results maybe?)


I didn't pay, personally.


I'm a bioinformatician, but haven't really pondered doing this on my own DNA too much. Wouldn't be terribly complicated to sequence and analyse your entire genome though.

Provided you could purify your DNA, sequencing wouldn't be an issue - just send it off to someone like BGI (Beijing Genomics Institute) and download the seq files when they're done. Purified DNA is stable and inert, so no special conditions required for posting it.

Sequence files are just text (if they're in FASTQ format), and all the common tools are open-source. No doubt someone somewhere has put together a Docker image with software for the entire workflow (FASTQ file processing --> read alignment --> variant calling), so processing isn't a big issue. As there's no de novo genome assembly or anything like that, the whole thing can be done on a run-of-the-mill PC, and would take a few days, depending on the depth of sequencing.
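
As a small illustration of "sequence files are just text", here is a sketch of a FASTQ reader. It assumes the common four-line record layout (an '@' header, the bases, a '+' separator, a quality string) and Phred+33 quality encoding; real files are usually gzipped and far too large to hold in a list, and the alignment and variant-calling steps would be handled by existing tools rather than code like this.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, quality) from four-line FASTQ records."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline()
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:].strip(), sequence, quality


def mean_quality(quality):
    """Mean Phred score of a quality string, assuming Phred+33 encoding."""
    return sum(ord(char) - 33 for char in quality) / len(quality)


SAMPLE = "@read1\nACGT\n+\nIIII\n@read2\nGGCCA\n+\nIIII#\n"

reads = list(read_fastq(io.StringIO(SAMPLE)))
total_bases = sum(len(sequence) for _id, sequence, _quality in reads)

for read_id, sequence, quality in reads:
    print(read_id, len(sequence), mean_quality(quality))
```

Dividing the total number of sequenced bases by the genome size gives a rough figure for the sequencing depth mentioned above.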

My guess is cost would be approaching US$1000 now.


Totally agree with everything you said, although I believe the price is closer to $500 than $1,000. I worked at a lab last year which did methylation analysis on rat genomes, and the price for sequencing was nowhere near $1,000. Although the analysis was slightly different since they pulled out all the non-methylated DNA, we still ended up with >50GB of 50 bp reads that had decent coverage of the genome. I'm certain that whole genome sequencing would be easier than what I described.


I wrote some tools for transforming and analyzing your own DNA - http://genomejs.com (Code here: https://github.com/genomejs)

I couldn't find any format that was neutral between vendors, so I wrote something (dna2json) that converts these vendor specific ones into a flat JSON file you can query easily.
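
A rough sketch of the general idea, for readers who want to see what "vendor-specific text to flat JSON" can look like. The two column layouts, the layout names, and the output field names are assumptions made up for illustration; this is not dna2json's actual code or output schema.

```python
import json


def normalize(line, layout):
    """Convert one tab-separated row into a vendor-neutral dict.

    Two hypothetical layouts are handled:
      single_genotype_column: rsid, chromosome, position, genotype
      two_allele_columns:     rsid, chromosome, position, allele1, allele2
    """
    fields = line.rstrip("\n").split("\t")
    if layout == "single_genotype_column":
        rsid, chrom, pos, genotype = fields[:4]
    elif layout == "two_allele_columns":
        rsid, chrom, pos, allele1, allele2 = fields[:5]
        genotype = allele1 + allele2
    else:
        raise ValueError(f"unknown layout: {layout}")
    return {
        "rsid": rsid,
        "chromosome": chrom,
        "position": int(pos),
        "genotype": genotype,
    }


record_a = normalize("rs0000001\t1\t12345\tAG\n", "single_genotype_column")
record_b = normalize("rs0000001\t1\t12345\tA\tG\n", "two_allele_columns")

# Both vendors' rows end up as the same flat record.
flat_json = json.dumps([record_a], indent=2)
print(flat_json)

# Querying the flat file is then an ordinary list filter.
loaded = json.loads(flat_json)
on_chromosome_1 = [row for row in loaded if row["chromosome"] == "1"]
```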


If you are willing to let your DNA be publicly available online, check out George Church's Harvard-affiliated Personal Genome Project: http://www.personalgenomes.org/

The goal is to have a dataset of free and open genomic data so that scientists can analyze freely and avoid commercial silos of data.

They will sequence your entire genome for free, subject to a backlog caused by funding shortages.

I think you can pay $1,000 to jump to the head of the line. You may also be able to jump to the head of the line if you meet certain "interesting" criteria, like being willing to have multiple folks in your family get sequenced. Haven't looked into this in a while, so you'll need to check and verify this paragraph.


OpenSNP is for analysing your own SNP data, which you can download from services like 23andme: https://opensnp.org/

However, sequencing your entire genome is generally not available commercially. If you can find it, expect to pay at least a few thousand dollars for the raw data, and that's just sequencing reads that will need a lot of work to get to anything like a genome. Your best bet might be to try to join a genome sequencing research study and pre-agree to have access to your own data.


Very few companies even scan your entire DNA. 23andMe, for example, analyzes maybe less than 1 or 2%.


It's way less than that. They scan SNPs, of which there are about 10 million in total. So only 0.3% of the human genome varies between all of us. I think they only do the 602,000[1] most common SNPs, which is only 0.02% of the genome, though they might do a few more.

A SNP is a Single Nucleotide Polymorphism, i.e. a place in the genome that varies from the reference human genome by a change of one base pair.

[1] https://www.snpedia.com/index.php/23andMe_v4_differences
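
The percentages above can be checked with a few lines of arithmetic. The genome size of roughly 3 billion base pairs is an assumption not stated in the comment; the SNP counts are the ones quoted there.

```python
GENOME_BASE_PAIRS = 3_000_000_000  # assumed approximate human genome size
KNOWN_SNPS = 10_000_000            # "about 10 million in total"
CHIP_SNPS = 602_000                # SNPs on the chip, per the comment

snp_share = KNOWN_SNPS / GENOME_BASE_PAIRS * 100
chip_share = CHIP_SNPS / GENOME_BASE_PAIRS * 100

print(f"all known SNPs: {snp_share:.2f}% of the genome")   # 0.33%
print(f"SNPs on the chip: {chip_share:.3f}% of the genome")  # 0.020%
```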


Presumably most (98%, isn't it?) of our DNA is the same though, right?

About 98% of our DNA just makes an ordinary human body with normal systems.

So we're only interested in the 2% that can vary.

Or whatever the actual numbers are.


A single error in the very large part of DNA that shouldn't vary per individual but "makes an ordinary human body with normal systems" means that you don't get an ordinary human body with normal systems.

Many such errors cause non-viable embryos, but if you have survived up to this point, then such a difference is still quite likely to have a meaningful impact on your health and is precisely the part that you'd want to have scanned and verified.

For adult DNA scanning we're not really interested in all the genes that vary between all people and code for the color of your eyes, the melanin content of your skin, the shape of your nose or your height - but we are very much interested in, for example, scanning the genes that encode the CFTR protein to check if you (or your kids!) will have issues with cystic fibrosis.

It's possible that you don't really have (or your kids are likely to not have) an "ordinary human body with normal systems" - that's what you'd need to find out.


>A single error in the very large part of DNA that shouldn't vary per individual

However true, that is irrelevant to genetic diagnostics as they exist today. We have no idea how a random error might impact health aside from very limited known mutations that are sufficiently frequent in the population to enable statistical correlation. We are probably decades away from being able to say, for a random mutation, 'this will lead to a deficiency in the synthesis of protein A which impacts the development or working of organ B'. We can't even agree on the proportion of junk DNA.


This is helpful if you have rare symptoms with no currently available explanation - if you get a list of the "unusual" mutations that you have, and correlate it with the same data from the few people worldwide that have the same issue, you get a possibility to improve that condition.

I recall seeing cases of rare genetic disorders that have been diagnosed that way, by online communities sharing data.

http://matt.might.net/articles/my-sons-killer/ is one story that counters "this will lead to a deficiency in the synthesis of protein A which impacts the development or working of organ B". For many parts of DNA we do know what protein they make. For many proteins/enzymes/etc. we have some idea about their function in the body - and if we have a test subject missing that protein, then the symptoms will be even more indicative of this, even if the population is tiny (1 in this example!) and doesn't allow for any statistical inference.

This means that if we really want to, we can try to find out the likely effect and possible workaround of a particular mutation, even if we currently don't have a ready-made answer for it.
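
The "compare lists of unusual mutations" idea from the first paragraph boils down to a set intersection. A toy sketch, where the patient names and variant IDs are invented placeholders and the filtering is far cruder than anything a real study would use:

```python
# Hypothetical per-patient sets of variants that differ from the reference.
patients = {
    "patient_a": {"var_001", "var_002", "var_003", "var_007"},
    "patient_b": {"var_002", "var_003", "var_004"},
    "patient_c": {"var_002", "var_003", "var_009"},
}

# Hypothetical set of variants that are common in healthy people too,
# and therefore unlikely to explain a rare condition.
common_in_population = {"var_002"}

# Variants that every affected patient carries.
shared = set.intersection(*patients.values())

# Of those, the ones that are not simply common in everyone.
candidates = sorted(shared - common_in_population)
print(candidates)
```

With only a handful of patients the surviving candidates are leads to investigate, not a diagnosis.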


That's right. I'm suggesting the average diagnostic test need only concern itself with those areas of the genome that are known to contain mutations that result in pathologies.

I think you said the same thing but in a clearer way.


I think the problem is that aside from a handful, they are not known. It's like saying you're only going to copy the parts of a program where bit errors are known to cause problems.


It's not that only 2% _can_ vary, it's that each person has about 2% different from the reference genome (and that 2% is different for every single person).


And only a small subset of that 2% is able to cause pathology.

So any diagnostic or risk-predictive tests need only check those areas known to result in or increase the risk of pathology.


That's not at all true. For rare, undiagnosed disease we have to sequence the entire genome in order to look for the causative variants. For well understood (common) genetic disease we have small panels, but to say that only a small portion of the genome is informative is not correct. Additionally, there is no way to know a priori which loci will have the variation without sequencing the entire genome.


I think 23andme's genotyping is about 1% coverage compared to whole-exome sequencing, which itself sequences the ~1% of your DNA that codes for proteins.


One option is to take raw data from a service like 23andme and use imputation[1] to generate the missing data. This isn't as accurate as properly sequencing everything, but it will get you more data to play with for free...

1: https://en.wikipedia.org/wiki/Imputation_(genetics)



