
Scientists discover second code hiding in DNA - bananacurve
http://www.washington.edu/news/2013/12/12/scientists-discover-double-meaning-in-genetic-code/
======
skosuri
I looked over the paper pretty quickly, but on a quick read it looks really
nice. The data are really impressive, and they seem to have pulled out
really interesting trends for how transcription factors affect coding
sequence. Furthermore, they map variants found across the various cell
lines they used (81!) to show that these variants actually are causal for
transcription factor binding changes. For those saying this is not a big deal,
or that it is obvious: it is, and it is not (respectively).

It's been pretty clear for some time that codon choice is not random (hence
codon optimization is useful when taking a gene from one organism to another),
and over the last few years it's also been clear that codon bias can be
evolutionarily constrained. For a while, it was mostly thought to be based on tRNA
levels (basically anticodons). However, it's been increasingly clear that
there are other constraints to protein coding sequences than just the decoded
protein sequence (aka the genetic code).

For example, we published a paper a few weeks back [1] showing that in
bacteria (and probably higher organisms), the N-terminus of genes has a lot of
rare codons and this is due to other constraints such as relieving mRNA
structure to allow better translation of proteins. I think in the coming
years, we will find that other regulatory elements also shape this code,
including sequences that control splicing, small RNAs, mRNA degradation and
transport, et cetera.

Anyways, it's a pretty fun time in biology. The tools we have now make studies
that were ridiculously impossible just a few years ago, a reality for an
individual lab. I can't wait to see what the next few years bring.

[1]
[http://arep.med.harvard.edu/pdf/Goodman_Sci_13.pdf](http://arep.med.harvard.edu/pdf/Goodman_Sci_13.pdf)

EDIT: Since my comment has hijacked the most useful comment linking to the
original study, I'll link to the comment here:

[https://news.ycombinator.com/item?id=6897122](https://news.ycombinator.com/item?id=6897122)

~~~
sounds
So I admit I haven't skimmed the paper yet.

Is it as comprehensive a map of DNA-encoded transcription factors as can be
had? I ask because I can understand the way DNA codes for proteins–as a
software-head–but the article definitely gets me with this "second language"
and I want to think more about how the overlapping meanings might have
developed (and how that might apply to digital code).

Other than "start," "stop," "intron," "exon," and "suppress" (aka disable),
what do the transcription factors do in terms of actual cell proteins?

I admit that a really detailed description of cell protein processes will be
beyond my depth. :)

~~~
waterlesscloud
I know next to nothing about this field, but I just sort of brute-forced my
way through the OP paper with the help of wikipedia. It's a painful process.
:-)

But it has piqued my curiosity about the field. Are there any good
introductory books/websites for the curious layman to get a grounding?

~~~
skosuri
Depends on what level. Wikipedia is actually quite good for this sort of
thing. The classic book as a basic overview of molecular biology in general is
the Molecular Biology of the Cell [1]. A really cool website to look at on
cutting edge studies such as these is the ENCODE consortium's publications in
Nature [2]. They simultaneously published 30 papers last year, and this is a
really cool meta page that explains the ENCODE project and what they found.

[1].
[http://www.ncbi.nlm.nih.gov/books/NBK21054/](http://www.ncbi.nlm.nih.gov/books/NBK21054/)

[2]. [http://www.nature.com/encode/](http://www.nature.com/encode/)

------
tokenadult
It's a university press release.[1] I'm sending the link to the published
paper from _Science_ kindly shared in an earlier comment here to some
geneticists I know locally, who have just been in a seminar together this
afternoon, I'm pretty sure. The editors of _Science_ evidently agreed that
there is something interesting here, but they have been wrong before.[2]

[1]
[http://www.phdcomics.com/comics.php?f=1174](http://www.phdcomics.com/comics.php?f=1174)

[2]
[http://www.slate.com/articles/health_and_science/science/201...](http://www.slate.com/articles/health_and_science/science/2010/12/this_paper_should_not_have_been_published.html)

[https://www.sciencenews.org/node/5635](https://www.sciencenews.org/node/5635)

[http://www.nature.com/news/arsenic-life-bacterium-prefers-ph...](http://www.nature.com/news/arsenic-life-bacterium-prefers-phosphorus-after-all-1.11520)

AFTER EDIT: I heard back from one of my local geneticist friends, a
mathematician turned psychologist by higher education who largely does
statistical analysis as part of a team of researchers on behavior genetics. He
writes, from the perspective of behavior genetics research, "That is
fascinating, but if duons are also tagged by SNPs, and especially if they are
in the exomic DNA, we've already been studying them and finding very little.
In other words, this is huge for molecular genetics and physiology, but I'm
not so sure it changes what we do in genotype-phenotype association research."
So I take that to say that this could be quite a big deal for molecular
genetics and physiology, if this finding is confirmed in follow-up research.

------
bede
TL;DR for biologists: synonymous codon variation is heavily constrained by the
need to maintain transcription factor binding motifs.

While I found the article very interesting as a new postgraduate student, it
seems like such a straightforward deduction that I find it difficult to
believe no one has ever put it into words until now.

~~~
dl8
After talking to someone who knows quite a bit about this stuff, apparently
most people in the genomics/bioinformatics community kinda already knew about
this, just that the researchers who wrote the paper were the first ones to
officially categorize them.

~~~
Agathos
My first thought on seeing the headline was, "What? Only 2?" Off the top of my
head I can think of:

1\. Amino acid coding

2\. Transcription factor binding and transcriptional regulation

3\. Post-transcriptional regulation and RNA degradation

4\. Intron/Exon Splice sites

5\. Chromosome structure and methylation

6\. Origins of replication

All of these have been shown to be controlled at least in part by the DNA's
sequence. This story seems to be about a (very interesting) new wrinkle in (2)
above.

~~~
dkural
In this paper I explore 1, 3, 4, and 6 (3 = microRNA binding):
[http://genomebiology.com/content/10/11/R133](http://genomebiology.com/content/10/11/R133)

------
bananacurve
Can someone with an AAAS membership please post the full article?

[https://www.sciencemag.org/content/342/6164/1367.full.pdf](https://www.sciencemag.org/content/342/6164/1367.full.pdf)

$20 for one day access! Fucking obscene.

I should join AAAS, it is only $50 for the year, but I can't swing it just
now.

~~~
bede
[http://bede.im/Science-2013-Stergachis-1367-72.pdf](http://bede.im/Science-2013-Stergachis-1367-72.pdf)

~~~
bananacurve
Thank you.

~~~
bede
Thank my university :) ...And perhaps my server, which definitely isn't
accustomed to HN traffic.

------
luketych
Funny how discoveries like this don't surprise me at all anymore. After
embracing concepts such as fractal geometry, recursion and the demand for
efficiency in nature, it makes sense that in nature everything has evolved in
a way in which every layer is a part of the next layer above it (quarks ->
electrons, protons -> elements -> compounds -> amino acids -> proteins ->
information -> abstract information -> ..., or binary -> hexadecimal ->
assembly -> low-level code -> high-level code -> abstractions -> ...), and
everything has value in more than one way. DNA, proteins, muscles, neural
clusters, even programming are all based around this principle. If you were
building two programs that were identical in many ways why wouldn't you
abstract out the commonalities, instead of building them twice?

The brain is no exception. I think musicians, for example, are better at math
because at the heart of music and at the heart of math, many of the same brain
circuits are involved. It is more efficient to have one copy of these circuits
and just apply them to both music and math, rather than having two copies of
nearly identical circuits. Proteins have more than one use in our bodies,
because if each had only one use we would need an inefficient number of
them, possibly more than what exists. Same for neurotransmitters. When nature
is limited by constraints it usually finds a way to fold into a higher
dimension around that constraint (such as the neocortex wrinkling to increase
surface area, or DNA containing information on more than a single level, or
even how grass grows over a fallen log instead of "choosing" to just grow
somewhere else, or if you are into String Theory, how the universe/multiverse
has folded into 11 dimensions, possibly because that was the most efficient
way for our universe to exist.)

It reminds me of a river when it is initially forming down the side of a
mountain. The water takes the most efficient path at any given instance from
the top of the mountain to the bottom. The process is like a greedy algorithm.
The water cannot foresee where it will end up, it just flows. The water won't
always find the most efficient solution, but give it enough time and it will find
an efficient-enough solution.

~~~
fivethree
This is sort of a misunderstanding of evolution. It's not building to the most
efficient or the greatest form of anything. It's spontaneous change that may
or may not survive with no real reference to "the best."

~~~
gizmo686
This also seems to misunderstand evolution. You can think of evolution as a
hill climbing optimization algorithm, with a complicated and constantly
changing fitness metric. While it is true that any given change is random, and
even 'good' changes still only succeed with some probability, over a long time
period we see a series of incremental improvements. If there is an opportunity
to increase efficiency it is likely that evolution would have found it, given
how long the process has been running. Of course, this becomes less likely when
we are discussing new features (the thumb is less optimized than DNA because
it is so recent), and as the benefit of these optimizations becomes less. They
also become less likely if there are fewer evolutionary paths that might lead
to them.
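
To make the hill-climbing picture concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the bit-string genome, the count-the-ones `fitness` function, and the `accept_prob` parameter are toy stand-ins, and the fitness metric is held fixed here rather than constantly changing.

```python
import random


def fitness(genome):
    # Toy fitness (assumption): the number of 1 bits in the genome.
    return sum(genome)


def evolve(genome, generations, accept_prob=0.5, rng=None):
    """Stochastic hill climbing: random point mutations, kept only if they
    improve fitness, and even then only with probability accept_prob."""
    rng = rng or random.Random()
    genome = list(genome)
    history = [fitness(genome)]
    for _ in range(generations):
        candidate = genome.copy()
        i = rng.randrange(len(candidate))
        candidate[i] ^= 1  # any given change is random
        improved = fitness(candidate) > fitness(genome)
        # A 'good' change still only succeeds with some probability.
        if improved and rng.random() < accept_prob:
            genome = candidate
        history.append(fitness(genome))
    return genome, history


rng = random.Random(0)  # fixed seed so the run is repeatable
start = [0] * 20
final, history = evolve(start, generations=500, rng=rng)
print(history[0], history[-1])
```

Fitness in `history` never goes down, because harmful and neutral mutations are simply discarded; a real population would also drift, and the landscape itself would move.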

~~~
Double_Cast
That's not how it works. There's no single "global optimum" that lifeforms are
evolving towards; there's no final hilltop; there's no intent. This notion of
"Evolutionary Teleology" is how we get things like X-Men (which I enjoy for
the record, it's not like X-Men is taught in biology class). If there _were_
some global optimum, we'd all have eyesight like Legolas and the color palette
of a Mantis-Shrimp, instead of a blind spot in our retina.

Evolution just happens, and it is what it is. E.g. Natural Selection didn't
"optimize" giraffes so they could reach leaves. Natural Selection is simply a
way of explaining that some giraffes had long necks, some had short, and the
shorter just happened to die out. Similarly, NBA players didn't "evolve" to
play basketball. Some people are tall, and some are short, and the short
people just didn't make the cut. "Select" != "Optimize". Evolution is better
thought of as a historical accident. We're talking monkeys on a flying rock.

------
bananacurve
Here is a decent analogy I read:

>Suppose we have a computer program which makes books. All of the commands
which tell your computer program how to write a book are stored as sequences
of 0's and 1's. Also, all of the letters, punctuation marks, and formatting
symbols (line break, indent etc) are stored as specific sequences of 0's and
1's.

The commands which control the computer program and the actual text of the
book are both stored in the same file. Up until now, we thought that these
were placed side by side in the file, so you would have a segment saying "at
6am every day, print the following text in 12 pt font, and bind it in a hard
red cover", followed by a text containing segment, and then another command
segment saying "when you finish, print a copy of the book 134 positions
further along in the file".

We thought this was the case because we have known the sequences of 0's and
1's that stand for each letter for a long time now. When we looked at a file
and attempted to interpret it as if it all stood for letters which made up
words, we would see something like "fffffgfgy6- fsjjjjjj the quick brown fox
jumped over the lazy dog.bttt68-%jjjjjjfffffffff". After a while, we learned
that the 'gibberish' segments were actually full of meaningful statements, but
they were in a different language and contained the commands for the program.

Now getting to today's article, this group has found that inside those
sections which code for the actual text of the book, there are some commands
that run the program. This is accomplished because there is more than one way
to write most of the letters. As an example, you could have one segment of
text which says "It was the best of times, it was the worst of times", and
another segment which says "It was the best of times, it was the worst of
times" and also tells the book program to bind the book in gold leaf. And the
only difference between the two is that the first one wrote the 's' in 'best'
as 00010110, and the second one wrote it as 00010111.
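
The same idea in actual DNA terms, as a rough Python sketch. The two sequences and the `MOTIF` string are toy values picked for illustration, not taken from the paper; only the codon table is the standard genetic code.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; '*' is a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    # Read the sequence three letters at a time, ignoring any trailing partial codon.
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


MOTIF = "GATAA"        # toy binding motif (assumption for illustration)
seq_a = "CTGATAAGC"    # one way to spell Leu-Ile-Ser
seq_b = "CTCATCAGC"    # a synonymous spelling of the same three amino acids

for name, seq in (("seq_a", seq_a), ("seq_b", seq_b)):
    print(name, translate(seq), "motif present:", MOTIF in seq)
```

Both sequences translate to the same protein text, but only `seq_a` carries the extra "command". Note that the motif straddles codon boundaries, so it depends on the codon choices at two neighbouring positions.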

------
dnautics
This is very similar to earlier discoveries that the choice of codon can be
intimately tied in with protein domains folding; allowing proteins to
partially fold while the remainder is "stalled" on the ribosome waiting for
the rare codon. It's interesting, but I would hardly call it a "second code".

------
ddebernardy
Honest question... How is this different from what was discovered around 2007?

I ask, because insofar as I'm aware, the moment geneticists parsed the human
genome was the moment it occurred to them that most (90%?) of the information
was missing. (As in, information to make the remaining proteins that we build
in our body.)

I vaguely recollect 2007 being the year when they discovered and/or
described how RNA can influence how DNA is read and turn bits and pieces of
information on and off to use the same code to build new proteins. And another
team the same year described how injecting RNA could allow existing DNA to be
used to potentially build proteins that aren't synthesized in the body.

So... How is this different exactly? :-|

------
forgottenpaswrd
It looks like there is software in DNA, not just firmware. We are just
starting to understand that gene expression is very important too: code that
is expressed (executed) or not based on the environment, and that junk DNA is
probably "not understood code".

~~~
kissickas
Any source on that bit about junk DNA?

I just completed a reasonably advanced course on computational biology and I
was still taught that although we have found noncoding DNA performing
functions, the majority of our DNA is both noncoding and not used for anything
(i.e. evolutionarily obsolete, transposons, etc).

~~~
jonathansizz
You are correct. Most non-coding DNA is junk, which amounts to around 90% of
the genome. Dead transposons alone make up just under half of the human
genome.

~~~
ConceptJunkie
You know, if we really wanted to be sure about that, we could just check the
commit comments.

------
xerophtye
I am confused. I have absolutely zero biology background so please, bear with
me.

So genes only code how to make proteins? That's IT? What about all the other
stuff like what you look like, what diseases you may or may not get, some
special functions of your body, your biological strengths and weaknesses, etc
etc. Or does defining protein generation actually define all of that? (That
would be so fascinating.)

And 90% of the DNA is labeled as "non-coding"?!?!? Seriously? Well, it can't
be "junk", can it? Or maybe it's like a long bitcoin chain with only the latest
commit relevant....

~~~
Myrmornis
Yes, specifying which proteins are made, and how much of each and how that
varies with cell type is most of what determines an organism's phenotype. This
is largely because enzymes are proteins, and enzymes control all the
biochemical reactions in a cell. But proteins can also be transcription
factors, which control which other proteins are produced, and they can be
important for their mechanical properties, etc. So, not some boring category
of biomolecule that gets listed in the nutritional information section on food
packaging.

------
tgb
Like this one? [http://dresdencodak.com/2009/07/12/fabulous-prizes/](http://dresdencodak.com/2009/07/12/fabulous-prizes/)

------
tsenkov
I am not a biologist, but this discovery seems ground-braking to me.

What strikes me as an interesting question (I hope one of you can answer)
is: How long will it be, before physicians/gene specialists/biologists/etc.
will use the new way of DNA interpretation in cases for regular people? How
long before this discovery can be used in mainstream medicine?

My (uneducated) guess is: there has to be a huge reevaluation of the
accumulated data (and assumptions) about gene connections with diseases.

~~~
schiffern
>How long before this discovery can be used in mainstream medicine?

Almost immediately. While it won't have therapeutic applications for some
time, the first benefit will be helping scientists better understand the cause
of certain diseases.

>ground-braking

That's like aerobraking, right? ;)

~~~
dragonwriter
> > ground-braking

> That's like aerobraking, right? ;)

More like lithobraking [1].

[1]
[http://en.wikipedia.org/wiki/Lithobraking](http://en.wikipedia.org/wiki/Lithobraking)

------
wbhart
The wording of the news is quite similar to a discovery reported in 2006:
[http://www.nytimes.com/2006/07/25/science/25dna.html?_r=0](http://www.nytimes.com/2006/07/25/science/25dna.html?_r=0)

So is this a third code? A rediscovery of the same thing? Or a refinement of
that earlier work?

------
mseepgood
64 letter alphabet? Base64 encoded?

~~~
chmike
A protein is a chain of amino acids. There are 20 standard amino acids, so
the alphabet of proteins is made of 20 different letters.

DNA strings are sequences of nucleotides (DNA stands for deoxyribonucleic
acid). There are only 4 of them, named A, T, C, G. The alphabet of DNA is thus
made of 4 letters.

One needs at least 3 DNA letters to encode one of the 20 amino acids, since 2
letters give only 4^2 = 16 combinations. These 3 DNA letters are called a
_codon_. With 3 DNA letters there exist 4^3 = 64 different codons. This is
similar to a Base64 encoding only in the number of letters in the codon
alphabet.

Since 64 is much more than 20, many different codons encode the same amino
acid. This provides a degree of freedom which is apparently used to encode
something else. As this research result suggests, the secondary encoding
would sort of encode the predicate specifying when and maybe how the DNA
sequence encoding a protein should be decoded.
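
A quick way to see this redundancy is to build the codon table and count. The sketch below (Python) uses the standard genetic code written as a 64-character string in TCAG order; the variable names are my own. Note that the table has 20 amino acids plus a stop signal, so 21 distinct meanings in total.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; '*' is a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map every 3-letter codon to the amino acid (or stop) it encodes.
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}

# How many different codons spell each amino acid.
degeneracy = Counter(CODON_TABLE.values())
amino_acids = sorted(set(CODON_TABLE.values()) - {"*"})

print(len(CODON_TABLE))   # 64 codons
print(len(amino_acids))   # 20 amino acids
for aa, n in sorted(degeneracy.items(), key=lambda kv: -kv[1]):
    print(aa, n)
```

Leucine, serine and arginine each have six spellings, while methionine and tryptophan have only one, so the room for a secondary message varies a lot from one amino acid to the next.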

------
vincie
The Trekkies were right all along:
[http://en.wikipedia.org/wiki/The_Chase_(Star_Trek:_The_Next_...](http://en.wikipedia.org/wiki/The_Chase_\(Star_Trek:_The_Next_Generation\))

------
coldtea
So, there was a second code hiding in DNA, with different types of
instructions, and those scientists discovered it.

That's nice, that's how science progresses.

Now, let's go to how business works. What about all those "scientists" (nay,
engineers doing whatever for a payroll) and all those corporations, cocky and
convinced that they had all the knowledge they needed, working as amateur
magicians with DNA to create synthetic food, organs, new drugs, etc?

They essentially said "fuck" to a better and timely scientific understanding,
in order to greedily rush to market some half baked results -- and
consequences (medical or otherwise) be damned.

------
qrybam
This is very interesting. Makes me imagine that there might be even higher
levels of control/abstraction over the top of this.

------
brutopia
Seems like scientists haven't discovered yet how not to include unencrypted
resources on https sites.

------
mikecane
So, can someone tie this into 23andMe? Wouldn't it change what they looked
for?

~~~
kissickas
It might indeed change our understanding of specific point mutations, for
example, but it won't negate any of their research, as far as I can tell. On
the contrary, it will help them moving forward as we now have some basis on
which to understand what is going on a little better in the steps between
genotype and phenotype.

------
atmosx
That's a Nobel prize imho. Good going :-)

------
RokStdy
The second code is:

    
    
      Up Up Down Down Left Right Left Right B A

~~~
AnthonBerg
Comments like this encourage me to write more quality posts so I can get
downvote rights. Sorry.

~~~
RokStdy
I figured I might get downvoted, but when I read the headline the konami code
stuck in my head and I couldn't resist. It made me laugh, and I thought it
might make others laugh. Which to me has (a little) value.

But now, you are pot and I am kettle (or vice versa). Your comment adds no
_value_ or, I'd argue, less value, because at least I was trying to be funny.
</derail>

~~~
biot
Instead of having a debate between the pot and kettle, perhaps you'd listen to
the stove?

[http://www.paulgraham.com/hackernews.html](http://www.paulgraham.com/hackernews.html)

    
    
      "The most dangerous form of stupid comment is not the
       long but mistaken argument, but the dumb joke. [...]
    
       Bad comments are like kudzu: they take over rapidly.
       Comments have much more effect on new comments than
       submissions have on new submissions. If someone submits
       a lame article, the other submissions don't all become
       lame. But if someone posts a stupid comment on a thread,
       that sets the tone for the region around it. People 
       reply to dumb jokes with dumb jokes."
    

By writing a dumb joke, you are signaling to new members that it's okay to
rehash old, lame memes for nothing more than the lulz.

~~~
asveikau
Right, because what constitutes a "dumb joke" can be universally agreed upon.

~~~
coldtea
No, because any joke is a "dumb joke" in the context of a HN discussion.
There's no debate about the smartness of them.

The "dumb" part is about it being a joke, as opposed to a serious
contribution, argument or commentary. Not about the joke's quality as a joke.
That is, even if you're Woody Allen, HN does not want you to write jokes in
here -- and especially one liners without any other content.

~~~
mtdewcmu
>> HN does not want you to write jokes in here -- and especially one liners
without any other content.

But humor is insight, and brevity is the soul of wit. I saw it as commentary
on the headline itself, which was ambiguous and vague. I thought it meant
scientists had found an easter egg or the Enigma code hidden in DNA.

~~~
coldtea
> _But humor is insight_

Not really. Most of the time is just the mental analogue of fart jokes.

> _and brevity is the soul of wit_

No, that's just a cliche.

~~~
mtdewcmu
_> Not really. Most of the time is just the mental analogue of fart jokes._

I didn't say all insights were equal. If this was just a fart joke, I wouldn't
have thought it was funny. But you can't draw a line between insight and
humor, can you?

 _> No, that's just a cliche._

It's a cliche because it's Shakespeare. That doesn't make it less true.

