
Your Genome Structure, Not Genetic Mutations, Makes You Different - evo_9
http://www.wired.com/wiredscience/2011/07/genome-structure/
======
tocomment
This article really made me question my understanding of gene sequencing. Is
it saying we can't really sequence a genome? We can't tell how many times a
gene is repeated?

~~~
rflrob
Depends on what you mean by "sequence a genome". There's nothing out there
that can take a single chromosome and sit and spit out its sequence from one
end to the other (even from organisms like yeast that have much smaller
chromosomes). Instead, with high throughput sequencing you sequence between 30
and 150 base pairs at a time, and then stitch several million of those reads
together. This computationally difficult problem can be made easier if you
know the approximate spacing between two given reads, so you can connect
islands together. Illumina (and possibly SOLiD, but I'm not sure) lets you do
paired-end sequencing, where you get both ends of a particular DNA molecule,
and depending on how you set up your samples, you can have mate pairs
separated by 3,000-5,000 base pairs. In order to get a really good _de novo_
assembly of a genome, you typically want on the order of 50-100x average
coverage of the genome. A single sequencing lane (which costs ~$1300) can get
you about 5x coverage of a genome.
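
As a rough back-of-the-envelope sketch of the coverage arithmetic above: average coverage is just (number of reads × read length) / genome size, and the lane count follows from the target coverage. The function names here are made up for illustration, and the per-lane cost and coverage are only the approximate figures quoted above, not vendor numbers.

```python
import math

# Approximate figures from the comment above; treat them as assumptions.
COST_PER_LANE_USD = 1300
COVERAGE_PER_LANE = 5


def average_coverage(num_reads, read_length, genome_size):
    """Average depth: total sequenced bases divided by genome length."""
    return num_reads * read_length / genome_size


def lanes_needed(target_coverage, coverage_per_lane=COVERAGE_PER_LANE):
    """Whole lanes required to reach the target average coverage."""
    return math.ceil(target_coverage / coverage_per_lane)


def sequencing_cost(target_coverage,
                    coverage_per_lane=COVERAGE_PER_LANE,
                    cost_per_lane=COST_PER_LANE_USD):
    """Lane cost only; ignores sample prep, labor, and compute."""
    return lanes_needed(target_coverage, coverage_per_lane) * cost_per_lane


if __name__ == "__main__":
    for target in (50, 100):
        print(f"{target}x: {lanes_needed(target)} lanes, "
              f"about ${sequencing_cost(target):,}")
```

With those numbers, a 50-100x _de novo_ assembly works out to roughly 10-20 lanes, so somewhere around $13,000-$26,000 in lane costs alone.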

It's often possible to tell how many times a gene is repeated because,
statistically, you should get fairly uniform coverage of the genome (although
there are lots of artifacts in the sequencing process that make this less
true). If you get significantly more reads from a particular locus, it's
likely that it's been duplicated.
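
A minimal sketch of that read-depth idea, assuming you already have a read count per locus: compare each locus to the median depth and flag anything well above it. The locus names, the toy counts, and the 1.5x threshold are all invented for illustration; real tools also have to correct for GC bias, mappability, and the other artifacts mentioned above.

```python
from statistics import median


def copy_ratios(depths):
    """Ratio of each locus's read depth to the median depth."""
    baseline = median(depths.values())
    if baseline == 0:
        raise ValueError("median depth is zero; cannot normalize")
    return {locus: depth / baseline for locus, depth in depths.items()}


def likely_duplicated(depths, threshold=1.5):
    """Loci whose depth is at least `threshold` times the median.

    The threshold is an arbitrary illustrative cutoff, not a standard.
    """
    return [locus for locus, ratio in copy_ratios(depths).items()
            if ratio >= threshold]


# Made-up read counts: four loci near 50x and one at about twice that.
toy_depths = {
    "locus_a": 48,
    "locus_b": 52,
    "locus_c": 101,
    "locus_d": 50,
    "locus_e": 47,
}

if __name__ == "__main__":
    print(likely_duplicated(toy_depths))
```

Here `locus_c` has about twice the median depth, which is what you would expect from a locus present in two copies where the reference has one.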

Glancing at the figures in the article, more than 65% of this "structural
variation" seems to be insertions or deletions of less than 10 bp, with only
2% coming from elements greater than 1,000bp.

~~~
tocomment
Thanks so much for the detailed answer! Is there anywhere to learn more about
this besides Wikipedia? Every time I try to learn this I end up there and it
confuses the bejesus out of me.

What are mate pairs? How does that help with sequencing?

~~~
jsarch
Feel free to ping me after August 5 if you still have questions.

~~~
tocomment
What happens then?

------
turing
This isn't all that surprising. I've spent the summer working in a
bioinformatics lab, and just the other day we were talking about the
importance of structural variation.

------
ristretto
When should we expect full genome sequencing to become affordable?

~~~
jsarch
Depends on "affordable". Current bulk prices are about $5000 per genome, though
the coverage can vary.

Here's the price per genome for the last decade:
<http://www.genome.gov/sequencingcosts/>

and an explanation:
[http://singularityhub.com/2011/03/05/costs-of-dna-sequencing...](http://singularityhub.com/2011/03/05/costs-of-dna-sequencing-falling-fast-look-at-these-graphs/)

