

Ask HN: Does anyone have experience comparing genomes?  - stuntgoat

What kinds of complexity are involved in computationally comparing two genomes to find over-expressed genes?<p>There has been news of success in comparing the genome sequences of cancer cells and healthy cells. I would like to hear any expert opinions on the complexity.<p>Some background:<p>The goal is to sequence both the patient's genome and the cancer's genome, then compute candidate genes to target with drug therapy based on the difference between the two (I know it's more complicated than that). It took researchers a month to process the data and diff the normal and malignant cells. They then found a likely candidate gene that was over-expressed in the cancer cells, and coincidentally a drug already existed that targeted that gene, with apparent success for the patient.<p>http://www.charlierose.com/view/interview/12455
http://www.nytimes.com/2012/07/08/health/in-gene-sequencing-treatment-for-leukemia-glimpses-of-the-future.html
======
tom_b
Sure. There are teams doing this every day right now, with clinical patients.
It's still research, but we are pushing variant calls back into patients'
medical records to help make treatment decisions.
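
To make "comparing tumor and normal" concrete, here is a minimal sketch of the
core idea: keep the variants called in the tumor that are absent from the
matched normal sample. The `Variant` type, the `somatic_candidates` function,
and the example calls are all made up for illustration. Real somatic callers
are statistical and also have to account for things like sequencing error and
tumor purity, so treat this as a toy set difference, not a pipeline.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """A single variant call (hypothetical minimal representation)."""
    chrom: str
    pos: int   # 1-based position on the chromosome
    ref: str   # reference allele
    alt: str   # alternate allele


def somatic_candidates(tumor_calls, normal_calls):
    """Return variants seen in the tumor but not in the matched normal."""
    return sorted(set(tumor_calls) - set(normal_calls))


# Made-up calls for illustration only.
normal = [Variant("chr1", 1000, "A", "G"), Variant("chr2", 500, "C", "T")]
tumor = normal + [Variant("chr7", 2500, "T", "A")]

print(somatic_candidates(tumor, normal))
```

Variants shared by both samples are inherited (germline) and drop out, which
leaves only the tumor-specific candidates.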

From a complexity perspective, not nearly as much as you would think: there
are _many_ software tools that already do this pretty efficiently. The
day-to-day problems are much simpler (and more standard IT) than you might
expect. For example, you can safely assume that finding variants is done. The
next question is: where is a source listing the drugs that are in clinical
trials or in approved treatments for a specific mutation? And can you get that
in front of pathologists quickly?
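
That "standard IT" part is mostly a lookup and reporting problem. A rough
sketch, assuming you already have some curated table of mutation-to-drug
associations: the `KNOWLEDGE_BASE` dict, the gene and drug names, and the
`build_report` function below are all hypothetical placeholders, not real
clinical data or any existing tool's API.

```python
# Hypothetical knowledge base: (gene, protein change) -> list of (drug, status).
# Every entry here is a placeholder, not real clinical data.
KNOWLEDGE_BASE = {
    ("GENE_A", "p.V100E"): [("drug_x", "approved"), ("drug_y", "clinical trial")],
    ("GENE_B", "p.L200R"): [("drug_z", "clinical trial")],
}


def build_report(patient_variants, knowledge_base=KNOWLEDGE_BASE):
    """Map each patient variant with a known drug association to its drugs."""
    report = {}
    for key in patient_variants:
        if key in knowledge_base:
            report[key] = knowledge_base[key]
    return report


# Made-up patient calls: one has a known association, one does not.
patient = [("GENE_A", "p.V100E"), ("GENE_C", "p.Q300K")]

for (gene, change), drugs in build_report(patient).items():
    for drug, status in drugs:
        print(f"{gene} {change}: {drug} ({status})")
```

The hard part in practice is keeping that table current and getting the report
into the pathologist's workflow, not the lookup itself.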

You probably want to take a look at The Cancer Genome Atlas (TCGA) project.
They are sequencing normal and tumor tissue from patients across a large
number of cancer types and making the resulting sequence data available for
research.

Edit (additional info): It does NOT take a month to do this sequencing, and it
is getting quicker all the time. The MiSeq from Illumina can pump out the
FASTQ file for a normal or tumor sample in 24 hours (and it can do something
like 96 samples at a time, but I don't know if people push that in
production).

------
pyrogyn
I'm no expert, but it's an incredibly complex process. Cancer cells mutate at
an amazingly fast rate because they are constantly dividing, so it can be
difficult to target a single gene that is responsible for the transformation
from a normal cell to a cancerous one. Unfortunately, even when we can tell
that a particular gene has been mutated in a way that leads to overexpression
of X, the story is often not just the overexpression of one gene: often a
mutated enzyme blocks another protein that would otherwise act as a repressor
of some other gene. There are immense bodies of research, but there's a ton
that we simply don't know yet.

