

Genomics: ENCODE explained - carbocation
http://www.nature.com/nature/journal/v489/n7414/full/489052a.html

======
east2west
I am an indirect member of the modENCODE project: I am paid by the project but
work on computation and data analysis, and am therefore not wed to it. I feel I
should clear up a few things.

First, that 80% figure is a guesstimate, far from definitive.

Second, those "new" discoveries are "associated" by statistical tests. All
popular reporting has sidestepped the issue by not explaining the quotes around
the word "associated". Basically, it means they occur together above a certain
statistical threshold. For the vast majority of noncoding regions we know
nothing about their functions, and this situation will remain for the
foreseeable future. I don't have to tell people here that correlation is not
causation. Recall all the reported health benefits of wine drinking; it is the
same degree of certainty.
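
To illustrate the "associated is not causal" point, here is a toy simulation in Python. It is only a sketch under made-up assumptions, not anything from the ENCODE analysis: the names `hidden`, `mark`, and `expression` are hypothetical, and the numbers are arbitrary. A hidden factor drives both a mark and an expression level, so the two correlate strongly, yet forcing the mark to random values leaves expression unrelated to it.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient, standard library only."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n=5000, seed=0):
    """Return (observed correlation, correlation after intervening on mark)."""
    rng = random.Random(seed)
    # Hypothetical hidden factor that drives both measured quantities.
    hidden = [rng.gauss(0, 1) for _ in range(n)]
    mark = [h + rng.gauss(0, 0.5) for h in hidden]
    expression = [h + rng.gauss(0, 0.5) for h in hidden]
    # Intervention: set the mark independently of the hidden factor.
    # Expression never depended on the mark, so it is left unchanged.
    forced_mark = [rng.gauss(0, 1) for _ in range(n)]
    return pearson(mark, expression), pearson(forced_mark, expression)


if __name__ == "__main__":
    observed, intervened = simulate()
    print("observed correlation:   %.2f" % observed)
    print("after forcing the mark: %.2f" % intervened)
```

The observed correlation would pass any reasonable significance threshold, but it says nothing about whether changing the mark changes expression.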

Third, new technologies inevitably have teething problems, which have not been
fully worked out. This means all reported conclusions comply with accepted
notions within respective scientific communities. The data are not accurate
and reliable enough to overturn any received wisdom.

Fourth, yes, this means the main reported scientific results are neither new
nor that surprising to people in the field. But this is the way popular science
and specialists interact.

By and large these papers represent a lot of hard work, but in terms of the
amount of data and research topics they are already being overtaken, as they
should be.

------
ChuckMcM
This is great work. I read the NYT article on the train home and then picked up
a couple of the papers. The cool thing for me was the realization that when
DNA is coiled, parts of the strand that are linearly very far apart end up
adjacent in the coil. And further, that adjacency affects how the gene
expresses proteins. That means that you not only need a decoded genome, you
need to understand the relationship of the segments in 3D. For me at least it
feels a lot more reasonable than the original numbers of genes. At least in
terms of describing humans, as diverse and complex as we are.

Someone of course will do the same for chimps and bonobos, and then we'll know
which of the switches are important as well as the genes. Very cool result and
I am confident that several Nobel prizes will come out of it.

~~~
jostmey
A lot more information is needed to understand the genome than just the linear
sequence or 3D structure. There is phosphorylation and histone tags and RNA
splicing and more (and that's just what has been discovered). All this
information is needed to understand how the genome works. It is basic
genetics.

~~~
ChuckMcM
_"It is basic genetics."_

According to the NYT [1] it's 'new' basic genetics. Basically all the 'junk'
DNA isn't junk, and the reason the function of that DNA was not appreciated
before the ENCODE effort is that these bits appeared 'far away' from the genes
which it turns out they affect.

[1] "There is another sort of hairball as well: the complex three-dimensional
structure of DNA. Human DNA is such a long strand — about 10 feet of DNA
stuffed into a microscopic nucleus of a cell — that it fits only because it is
tightly wound and coiled around itself. When they looked at the three-
dimensional structure — the hairball — Encode researchers discovered that
small segments of dark-matter DNA are often quite close to genes they control.
In the past, when they analyzed only the uncoiled length of DNA, those
controlling regions appeared to be far from the genes they affect."
<http://www.nytimes.com/2012/09/06/science/far-from-junk-dna-dark-matter-proves-crucial-to-health.html?_r=1&pagewanted=all>

~~~
east2west
If you knew how they found the elements together, you would be far less
sanguine about the significance. They started with random sampling of genomic
positions in close proximity, plus some basic structural info on how chromatin
(coiled DNA molecules) is organized. Then it is a nonlinear optimization with a
1e-2 convergence criterion.

Several things lay people are never told:

1. Even for fixed data, the results are variable, i.e., you will get a
different answer every time you run the nonlinear programming code.
2. There are different ways of getting to 3D positions of genomes and none is
ever verified.
3. Tuning parameters greatly impact results. Yet one set of tuning parameters
is just as valid as another.
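
To make point 1 concrete, here is a toy sketch in Python. It is not the code from the paper and does not reconstruct any 3D structure: the objective f(x) = (x^2 - 1)^2, the step size, and the function names are all made up for illustration. It shows only the general effect that a nonconvex fit started from a random point and stopped at a loose 1e-2 tolerance can land on different, equally "valid" answers from run to run.

```python
import random


def objective_gradient(x):
    # Gradient of the toy objective f(x) = (x**2 - 1)**2, which has two
    # equally good minima, at x = -1 and x = +1.
    return 4.0 * x * (x * x - 1.0)


def descend(x0, step=0.05, tol=1e-2, max_iter=10_000):
    """Plain gradient descent, stopped once |gradient| drops below tol."""
    x = x0
    for _ in range(max_iter):
        g = objective_gradient(x)
        if abs(g) < tol:
            break
        x -= step * g
    return x


def run_once(seed):
    """One 'analysis run': random starting point, then descend."""
    rng = random.Random(seed)
    return descend(rng.uniform(-2.0, 2.0))


if __name__ == "__main__":
    # Same objective, same settings; only the random start differs.
    for seed in range(5):
        print(seed, round(run_once(seed), 3))
```

Which minimum a run reports depends only on the sign of the random starting point, and the loose tolerance means even runs that agree on the minimum agree only to about three decimal places.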

I had the misfortune of helping someone extend the code for this stuff, which
made it into a Science paper, and I can tell you right now it is the worst C++
program I have ever seen.

------
jostmey
When the first human genome was sequenced many people thought that all the
hard work had been done. Most people thought that the genetic code, perhaps
like computer code, was well organized and well laid out. But organization
arises in complex systems that are designed by intelligent beings, namely
ourselves. The genetic code, shaped by the slow process of evolution, is not
organized. It is like a tangled mess of spaghetti code. When scientists
announced that there were only 25 000 genes in the human genome, they had
identified only the most organized parts of the genome, comprising a mere 1%
of our DNA. Determining what the remaining 99% of the genome does will be
challenging because there appear to be few organizing principles to the DNA's
layout to make research easier.

~~~
bigiain
"Most people thought that the genetic code, perhaps like computer code, was
well organized and well laid out. "

Oh my god! We've sequenced and decoded the entire human genome and… we're
written in _Perl!_

(And, given my gag and the backend of this site, of course I have to post:
<http://xkcd.com/224/> )

------
turok2step
Is the quest for immortality the only thing that drives research like this?
The PIs interviewed in the promo video for ENCODE (
<http://www.youtube.com/watch?v=PsV_sEDSE2o> ) can only point to biomedical
implications to justify their continued funding.

~~~
east2west
This is the reality of biomedical research today. With NIH funding levels
flatlining in recent years and Congress leery of more spending on anything
other than the military, medical utility is the mantra now within NIH. The
irony is that most of the ENCODE people are not medical researchers and are
less qualified and less experienced than human genetics researchers. This is all
about money. Scientists need to eat too. And reputation, and power. It is a
very high-stakes game.

~~~
jostmey
Agreed. While I think what ENCODE is doing is important, I feel that they
oversold the value of their work. In my opinion, they obviously wanted to
ensure continued funding.

It is not a bad thing that every once in a recession less fruitful areas of
research are trimmed from funding to make room for more important work. Too
often scientists lose sight of why their research is important, and forget
about trying to solve real problems. That said, too much money is poured into
military research and not enough into basic science, which is what pays off in
the long run.

~~~
cwhittle
This is the basic science that pays off in the long run. You can't just turn
biomedical science projects on and off. It takes time and investment to
develop the techniques and technologies to do this work, to gather the
samples, and ensure data quality across the project labs. During the time that
the ENCODE project was funded, the technology for doing the types of
experiments to get this kind of data advanced many times. We are now talking
realistically about personal genomics and the $1000 genome; at the project
start we were still celebrating the 3-billion-dollar genome sequence.

