Your Genome Structure, Not Genetic Mutations, Makes You Different (wired.com)
24 points by evo_9 on July 30, 2011 | 12 comments



This article really made me question my understanding of gene sequencing. Is it saying we can't really sequence a genome? We can't tell how many times a gene is repeated?


Depends on what you mean by "sequence a genome". There's nothing out there that can take a single chromosome and sit and spit out its sequence from one end to the other (even from organisms like yeast that have much smaller chromosomes). Instead, with high-throughput sequencing you sequence between 30 and 150 base pairs at a time, and then stitch several million of those reads together. This computationally difficult problem can be made easier if you know the approximate spacing between two given reads, so you can connect islands together. Illumina (and possibly SOLiD, but I'm not sure) lets you do paired-end sequencing, where you get both ends of a particular DNA molecule, and depending on how you set up your samples, you can have mate pairs separated by 3,000-5,000 base pairs.

In order to get a really good de novo assembly of a genome, you typically want on the order of 50-100x average coverage of the genome. A single sequencing lane (which costs ~$1300) can get you about 5x coverage of a genome.
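To make the coverage arithmetic concrete, here is a rough back-of-the-envelope sketch. The function names are made up for illustration, the per-lane figures are just the ones quoted above (not current prices), and "average coverage" is taken to mean total sequenced bases divided by genome length.

```python
import math

def average_coverage(num_reads, read_length, genome_size):
    """Average coverage = total bases sequenced / genome length."""
    return num_reads * read_length / genome_size

def lanes_needed(target_coverage, coverage_per_lane=5.0):
    """Whole lanes needed to reach a target average coverage (~5x per lane, as quoted)."""
    return math.ceil(target_coverage / coverage_per_lane)

def sequencing_cost(target_coverage, coverage_per_lane=5.0, cost_per_lane=1300):
    """Approximate cost in dollars, using the ~$1300 per lane figure quoted above."""
    return lanes_needed(target_coverage, coverage_per_lane) * cost_per_lane

# The 50-100x range mentioned for a good de novo assembly.
for target in (50, 100):
    print(target, lanes_needed(target), sequencing_cost(target))
```

Under those assumptions, 50x works out to 10 lanes (about $13,000) and 100x to 20 lanes (about $26,000).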

It's often possible to tell how many times a gene is repeated because, statistically, you should get fairly uniform coverage of the genome (although there are lots of artifacts in the sequencing process that make this less true). If you get significantly more reads from a particular locus, it's likely that it's been duplicated.
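A toy sketch of that read-depth idea, assuming you already have a mean depth per locus. The function name, the locus names, and the depth values are all invented for illustration, and it ignores the sequencing artifacts mentioned above, which real tools have to correct for.

```python
from statistics import median

def estimate_copy_number(depth_by_locus, baseline_copies=1):
    """Estimate relative copy number of each locus from its read depth.

    depth_by_locus maps a locus name to its mean read depth. The median
    depth across loci is treated as the depth of a region present in
    `baseline_copies` copies.
    """
    typical = median(depth_by_locus.values())
    return {
        locus: round(baseline_copies * depth / typical)
        for locus, depth in depth_by_locus.items()
    }

# Made-up depths: geneC has roughly twice the reads of everything else.
depths = {"geneA": 29.0, "geneB": 31.0, "geneC": 62.0, "geneD": 30.0, "geneE": 30.5}
print(estimate_copy_number(depths))
```

Here geneC comes out as 2 copies relative to the others at 1, which is the "significantly more reads from a particular locus" signal.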

Glancing at the figures in the article, more than 65% of this "structural variation" seems to be insertions or deletions of less than 10 bp, with only 2% coming from elements greater than 1,000 bp.


Thanks so much for the detailed answer! Is there anywhere to learn more about this besides Wikipedia? Every time I try to learn this I end up there and it confuses the bejesus out of me.

What are mate pairs? How does that help with sequencing?


Sometimes, the Illumina website descriptions aren't too bad[1-2]. Unfortunately, I don't know of any great resources targeted to a lay audience (which is not to say they do or don't exist). I work in a lab that does a lot of high-throughput sequencing, so when I have questions about particular assays, usually someone in the lab knows, and if not, someone else in the building does.

Mate pairs are essentially reads taken from opposite ends of a single fragment of DNA. This is useful for de novo sequencing, where they act almost like a single, very long read. The problem with random sequencing (both shotgun sequencing and current "next-generation" sequencing technologies) is that it's not obvious how a given read relates to the other reads you have.

Using the jigsaw puzzle analogy, mate-pair sequencing is roughly equivalent to being told that a given piece is 8-10 pieces away from another piece in a given direction. You still won't necessarily know exactly where to put either of the pieces, but it narrows the search space hugely, especially when you have one in place.
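A minimal sketch of how that distance constraint narrows the search space. It assumes one read of the pair is already placed and its mate matches several candidate positions (say, copies of a repeat). The function name and the coordinates are hypothetical; the 3,000-5,000 bp window is the mate-pair spacing quoted earlier in the thread, and real libraries also constrain read orientation, which is ignored here.

```python
def consistent_placements(positions_a, positions_b, min_gap=3000, max_gap=5000):
    """Keep (a, b) placements whose spacing fits the expected mate-pair distance."""
    return [
        (a, b)
        for a in positions_a
        for b in positions_b
        if min_gap <= b - a <= max_gap
    ]

# Read A is already placed at 10,000; its mate matches three candidate spots.
placed = [10_000]
mate_candidates = [2_500, 14_200, 90_000]
print(consistent_placements(placed, mate_candidates))
```

Only the candidate at 14,200 survives, since it is the only one 3,000-5,000 bp downstream of the placed read.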

[1] Mate Pair libraries: http://www.illumina.com/technology/mate_pair_sequencing_assa...

[2] Paired-end sequencing: http://www.illumina.com/technology/paired_end_sequencing_ass...


Feel free to ping me after August 5 if you still have questions.


What happens then?


Unfortunately, you are correct: we are not really sequencing the genome. We are simply reading short segments of DNA, and comparing that to a reference sequence to find point mutations (e.g., changes from G to C). We cannot easily determine whether there are large-scale duplications with the present technology. Ironically, the older, more expensive tech was better at this.
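For illustration, a toy version of "compare short reads to a reference and look for point mutations". It assumes the reads are already aligned without gaps; the function name, thresholds, and sequences are invented, and it is not how any particular variant caller works.

```python
from collections import Counter, defaultdict

def call_point_mutations(reference, aligned_reads, min_depth=3, min_fraction=0.8):
    """Report positions where most reads disagree with the reference.

    aligned_reads is a list of (start, sequence) pairs, aligned without gaps.
    Returns a list of (position, reference_base, observed_base) tuples.
    """
    # Pile up the bases observed at each reference position.
    pileup = defaultdict(Counter)
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pileup[start + offset][base] += 1

    calls = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        base, n = counts.most_common(1)[0]
        depth = sum(counts.values())
        if depth >= min_depth and base != reference[pos] and n / depth >= min_fraction:
            calls.append((pos, reference[pos], base))
    return calls

reference = "ACGTACGTAC"
# Short reads from a sample carrying a G to C change at position 6.
reads = [(0, "ACGTAC"), (1, "CGTACC"), (2, "GTACCT"), (3, "TACCTA"), (4, "ACCTAC")]
print(call_point_mutations(reference, reads))
```

Nothing in this per-position comparison looks at how far apart reads are or how many copies of a region exist, which is the limitation being described here.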

There is so much variation missed by "genome sequencing" that it is no surprise to me that we can only explain a tiny percentage of heritable human variation with genetics.

See: http://www.nature.com/nature/journal/v461/n7265/full/nature0... for more.


I believe Halcyon Molecular is doing long reads using an electron microscope probe - which should be able to read longer sequences.

Also, if we identify specific mutations using the HM approach, I'd think that a test could be developed and run in parallel with a specific long base-pair probe that could be detected - correct me if I'm wrong.


This isn't all that surprising. I've spent the summer working in a bioinformatics lab, and just the other day we were talking about the importance of structural variation.


When should we expect full genome sequencing to become affordable?


Depends on "affordable". Current bulk prices are about $5000 per genome, though the coverage can vary.

Here's the price per genome for the last decade: http://www.genome.gov/sequencingcosts/

and an explanation: http://singularityhub.com/2011/03/05/costs-of-dna-sequencing...


This blog claims 2014, although it reads as extremely optimistic, so we should probably assume 2020 to be safe.

http://blog.genomequest.com/2010/07/implications-of-exponent...



