
DNA seen through the eyes of a coder - xuki
http://ds9a.nl/amazing-dna?
======
stiff
Biology is completely different from computer science, and metaphors between
the fields build no understanding and can only mislead. Every time I hear
someone compare DNA to a computer program I fall to pieces. I recommend
"Molecular Biology for Computer Scientists" instead for those willing to learn
some actual biology:

[http://www.biostat.wisc.edu/~craven/hunter.pdf](http://www.biostat.wisc.edu/~craven/hunter.pdf)

I think it's the first chapter of this book:

[http://mitpress.mit.edu/books/processes-
life](http://mitpress.mit.edu/books/processes-life)

I once considered going into bioinformatics, and did an intense three-week
sprint trying to learn some molecular biology, ending in a seminar
presentation explaining the basics to other people. I used this book back
then, which I also strongly recommend to those interested:

[http://www.amazon.com/Bioinformatics-Molecular-Evolution-
Pau...](http://www.amazon.com/Bioinformatics-Molecular-Evolution-Paul-
Higgs/dp/1405106832/ref=sr_1_1?ie=UTF8&qid=1387141666&sr=8-1&keywords=bioinformatics+and+molecular+evolution)

It covers all the basics of molecular biology very understandably, and at the
same time the scientific/computational content is interesting even for a
computer scientist. Still, learning this stuff takes hard work: you have to
rehash some relevant chemistry first or you get nowhere, then biologists use a
lot of both chemical and biological lingo which you have to understand, and
only then does the actual biological content become clear. Once you do
understand it, however, it's beautiful, beautiful stuff, one of the most
beautiful things one can learn in general, I think. Unfortunately you won't
get a sense of that from reading this article, or in general from trying to
understand it through sloppy metaphors. Do yourself a favour and try to
understand this for real.

~~~
ye
DNA is not just a sequence that encodes proteins. We just found out there's a
secondary higher level code in it:

[https://news.ycombinator.com/item?id=6896779](https://news.ycombinator.com/item?id=6896779)

And then there's RNA, which can fold and form structures, sort of like
proteins.

Anybody who has ever seriously dealt with things produced by evolution knows
that evolution "cheats" and optimizes when constrained by resources, producing
structure upon structure upon structure, crazy indirection and non-obvious
side-effects, as long as it helps the goal (survival and reproduction
usually).

That's why I don't think human brains will be understood any time soon. We
might figure out all the layers, but to figure out all the hidden structures
would take a superhuman AI.

EDIT:

Example of circuit design by evolutionary algorithms:

[http://hforsten.com/evolutionary-algorithms-and-analog-
elect...](http://hforsten.com/evolutionary-algorithms-and-analog-electronic-
circuits.html)

~~~
timr
_"DNA is not just a sequence that encodes proteins. We just found out there's
a secondary higher level code in it"_

So, I don't want to rain on anyone's parade, but we didn't "just" find this
out. We've known for quite a long time about secondary (and even tertiary!)
"codes" in DNA. That HN article was the result of a press release about
something that was interesting, but certainly not an earth-shattering new theory.
The reason it was in Science was because they did an extremely large-scale
test of a whole bunch of different codons on thousands of different gene
promoters, and _directly quantified_ the impact of rare codons. That was an
impressive way to settle an open debate.

Anyway, it's good to know that there are higher-order interactions encoded in
DNA than just DNA -> RNA -> Protein, but you should realize that this is an
old/deep area of research. This is pretty much what bioinformatics is _about_,
actually: deducing the higher-order structures in DNA, RNA and proteins.
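
As a rough illustration of the baseline DNA -> RNA -> Protein flow mentioned
above, here is a minimal Python sketch. The codon table is deliberately
partial, and the function names (`transcribe`, `translate`) and the example
sequence are made up for illustration; real tools handle all 64 codons,
reading frames, and much more.

```python
# Minimal sketch of DNA -> RNA -> protein. The codon table is intentionally
# partial (an assumption for brevity); unknown codons are reported as "?".
CODON_TABLE = {
    "AUG": "M",                                      # start codon (methionine)
    "UUU": "F", "UUC": "F",                          # phenylalanine
    "GGU": "G", "GGC": "G", "GGA": "G", "GGG": "G",  # glycine (synonymous codons)
    "AAA": "K", "AAG": "K",                          # lysine
    "UAA": "*", "UAG": "*", "UGA": "*",              # stop codons
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding strand (T is replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate from the first AUG up to the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "?")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("ATGTTTGGCAAATAA")
    print(mrna, "->", translate(mrna))
```

In this toy model the synonymous glycine codons GGU and GGC produce the same
protein, so it cannot represent the codon-usage effects discussed above. That
is roughly the sense in which those effects sit on top of the basic code.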

------
atratus
It's important to be wary of the term 'junk DNA'... just because a segment of a
chromosome is noncoding does not mean it has no role in the genome's function.
Assembly of a functional structure, e.g. a replisome, requires formation of
elaborate secondary and tertiary 3-D conformations that support the primary
replication machinery. This is facilitated by topoisomerases, binding proteins,
a whole soup of RNAs, and spans of "junk" which allow the necessary
conformations. In other cases, the 'junk' can serve to insulate highly
conserved genes. "Junk" is a terrible characterization.

This is one of those instances where the press/pop media can be a bit behind.
Some bchem textbooks from even a few years ago are obsolete. Research into
DNA-DNA interaction really has become hotter in only the last few years as
we've begun pinning down protein roles. There is a whole layer of interaction
between epigenetics, differential RNA splicing, and DNA-DNA feedback that is
just mind-boggling.

~~~
jonathansizz
Describing functional non-coding DNA as junk is obviously incorrect, but the
fact remains that most non-coding DNA is indeed non-functional, and therefore
junk.

~~~
dekhn
hahahah you're so naive.

can you provide some evidence there is any non-functional DNA in the genome?
Try hard, now, you're talking to somebody who's studied this for 20+ years.

~~~
jonathansizz
Despite your patronizing tone, I fear it may be you who is naive. I don't know
what field it is that you've been in for the last 20 years, but I suggest you
start reading up on genetics and biochemistry, as you've got a lot of catching
up to do. In the meantime, here are a few questions for you to think about.

You are already aware, of course, that 90% of the human genome is unconserved,
that 50% of the genome consists of dead transposons and viruses, and that mice
have been generated that are homozygous for megabase-scale deletions with no
discernible effects? Perhaps you can come up with some hypotheses that could
explain these facts that are consistent with your claim of functionality?

You've heard of pseudogenes, right? Why do they look exactly like broken
genes? What do they do?

Why is there great variability in the sequences of repetitive DNA in many
species, including humans? Why do some individuals have many more copies of
these tandem repeats than others? Are the extra repeats functional? If so, why
does the number of copies change rapidly and stochastically from generation to
generation? Do you know anyone who thinks satellite DNA is functional?

The links below will help you get started on the basics of genome biology.
Good luck.

[http://www.genomicron.evolverzone.com/2007/04/word-about-
jun...](http://www.genomicron.evolverzone.com/2007/04/word-about-junk-dna/)

[http://sandwalk.blogspot.com/2008/02/theme-genomes-junk-
dna....](http://sandwalk.blogspot.com/2008/02/theme-genomes-junk-dna.html)

~~~
dekhn
Regarding mice, you are likely referring to Eddy Rubin's paper (I was familiar
with the work in Eddy's team at Berkeley when I was a postdoc in functional
genomics in Steven Brenner's lab).

[http://www.ncbi.nlm.nih.gov/pubmed/15496924](http://www.ncbi.nlm.nih.gov/pubmed/15496924)

"Some of the deleted sequences might encode for functions unidentified in our
screen; nonetheless, these studies further support the existence of
potentially 'disposable DNA' in the genomes of mammals."

Note their qualifications. I'm asking for proof. Making a viable mouse that
has a lot of deletions is in no way evidence that the regions are
nonfunctional. For example, maybe they deleted a conserved element which has
cold-shock response potential, but the mouse was raised in a room temperature
environment.

Many of those "worthless transposons" actually seem to play a role in
evolution
[http://www.sciencedirect.com/science/article/pii/S1369526612...](http://www.sciencedirect.com/science/article/pii/S1369526612001094)
and others show that even "dead" transposons probably play a role in the
evolution of tail exons:
[http://genomebiology.com/content/11/6/r59](http://genomebiology.com/content/11/6/r59)

basically, I know where you're coming from. I used to even believe the dogma.
Then I spent some deep time looking at genomes, functional evolution, and
biophysics, and came to the conclusion that papers like this:
[http://www.nature.com/nature/journal/v284/n5757/abs/284604a0...](http://www.nature.com/nature/journal/v284/n5757/abs/284604a0.html)
which influenced an entire generation of scientists, are just wrong.
Meaningless speculation in the absence of data! Those of us who have spent a
lot of time digging into ENCODE and trying to find the really valuable nuggets
are starting to come to the conclusion that vast "deserts" of the genome are
in fact filled with rich regulatory elements and other functional elements
(including as-yet uncharacterized ones), such as RNA genes, that classic
methods of measuring evolutionary constraint on DNA, like Jukes-Cantor, are
unable to detect.
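
For readers who have not met it, the Jukes-Cantor model referred to above
corrects the observed fraction p of differing sites between two aligned
sequences for multiple substitutions at the same site, giving an estimated
distance d = -(3/4) ln(1 - (4/3) p). A minimal Python sketch, assuming
equal-length, gap-free, already-aligned sequences; the function names are
illustrative and not taken from any particular library.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned sites at which the two sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


def jukes_cantor_distance(seq_a: str, seq_b: str) -> float:
    """Estimated substitutions per site under the Jukes-Cantor (JC69) model."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        # The correction is undefined at or beyond saturation.
        return math.inf
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


if __name__ == "__main__":
    # One difference in ten sites: p = 0.1
    print(jukes_cantor_distance("ACGTACGTAC", "ACGTACGTAA"))
```

A distance like this only reflects how many substitutions separate two
sequences; whether a given value implies function or its absence is exactly
what is being disputed in this thread.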

~~~
jonathansizz
Why won't you answer my questions? It's not up to me to prove a negative, if
such a thing were possible.

I wasn't referring to active or co-opted transposons, which make up less than
1% of total transposon sequence in the genome. I'm talking about dead, non-
functional transposons. These were indeed active and functional in the distant
past, but are long since dead. If you have an hypothesis that such sequences
in fact have a function, then by all means let's hear it. Likewise with dead
viruses, satellites, etc.

It's interesting you mention 'meaningless speculation in the absence of data',
since your friends at ENCODE recently embarrassed themselves in just such a
way with their pervasive transcription nonsense, as do those who claim
function for repetitive, unconserved regions of the genome without any
evidence or even an hypothesis, when we have every reason to believe that
these are the remnants of formerly-active elements that were subsequently
inactivated.

As I said above, it's clear that a small fraction of non-coding DNA has a
regulatory role. We've known about this for decades. Recent discoveries don't
account for much; RNA genes make up about 4% of the genome.

Sorry if I've been overly harsh in my replies, but from my experience many
bioinformaticians have shown themselves to be ignorant about even basic
biology, so I took the precaution of going over a few basics.

~~~
dekhn
I'm not a bioinformatician. I'm a biophysicist (BS in Molecular Biology and
Biochemistry) who works in computer science. That said, as I have worked
closely with most of the world's genome experts (and sparred with them on this
very issue), I am an authority in this area.

Anyway, I don't mind that you're harsh. It's pretty hard for people to unlearn
their years of misunderstanding the genome! There is a lot of misinformation.
Basically, you're just insisting on this line of reasoning:
[http://selab.janelia.org/people/eddys/blog/?p=683](http://selab.janelia.org/people/eddys/blog/?p=683)
"ENCODE says What?"

Regarding the ENCODE project's "embarrassment", a couple of things happened:
1) the press attributed a number of overly aggressive claims to ENCODE. If you
read the papers you'll notice they used very hedged statements. 2) ENCODE
actually observed some very interesting things. We took their data, and worked
on it some more, and we found that a number of their measured transcription
tracks actually did represent functional biology. In particular, I can
_assure_ you, based on _very solid evidence_ that your estimate that RNA genes
make up only 4% of the genome is _vastly_ lower than reality.

------
Jun8
This is fantastic! It would be awesome if there were workshops, say, of 3
months' duration, where people from totally unrelated disciplines with no
prior knowledge are put together to see if anything useful comes out of it.
Most of the time nothing may come of it, but every now and then
spectacular advances may come about, I'm sure (for an example, see Adleman's
development of DNA Computing,
[http://en.wikipedia.org/wiki/DNA_computing#History](http://en.wikipedia.org/wiki/DNA_computing#History)).

The problem is that decades of work in a narrow field, although it makes you
an expert, also dulls the novel look an outsider brings to the subject and
leaves you with numerous explicit and, more dangerously, implicit
dogmas/assumptions.

~~~
toufka
As someone who has a DNA synthesizer in lab, who works on manipulating DNA
daily to make various synthetic proteins, who lives and works in the bay area,
and who knows enough coding to have understood the language of the OP - I'd
love to help bridge some of those boundaries. There is a _desperate_ need for
some programming types to think about these issues. The fields of systems &
synthetic biology are just starting to gain traction, but there is so much we
can learn from cross-discipline interaction. The one issue is that we pay
terribly...

~~~
rch
Could you add some contact information to your profile perhaps? And feel free
to contact me as well.

------
dekhn
Ah, this reminds me of my childhood. No, seriously: when I was in high school
almost 25 years ago I thought this way. My interest was more in the similarity
between the C preprocessor and intron splicing, and I even dabbled with the
similarity between the ribosome and the compiler (except that the ribosome is
simultaneously far simpler than a compiler, yet infinitely richer in
complexity).

It's useful to have these analogies, and to some extent they really do
represent true universals. In particular, in reading the history of Crick, I
realized that he was a huge fan of information theory, and it helped guide his
thinking about how DNA sequences are interpreted and converted to protein
sequences.

However, it can be dangerous to fall down this path. In particular, biology is
hotter, wetter and messier than computing. It requires scientists to have
extraordinarily flexible brains; I would say, after many years, that the
people I met in MIT Biology are smarter than the people in MIT CS: their
ability to reason over ambiguous data and come up with predictive conclusions
is downright amazing.

If you're a computer person who wants to learn more about this, I have a
couple suggestions: 1) buy Molecular Biology of the Cell 2) read the whole
goddamn thing, slowing down to understand every concept rather than skimming.

------
thethirdwheel
My background is in bioinformatics, so this naturally caught my eye. I came
away disappointed. The mappings are no easier to understand than simplistic
descriptions in biology textbooks. The only thing they add is the mistaken
impression of intent in the genetic code, and the expectation the analogy will
continue to hold outside the scope of the enumerated mappings. Kind of ironic
to run into that issue with so many Dawkins references at the end...

~~~
benched
Surely mappings onto _programming_ concepts can provide a different kind of
understanding to _programmers_ than biology textbooks. I'm kind of appalled by
the anti-cross-disciplinary attitude displayed on HN sometimes. (Like the
recent 'This guy's a physicist. He should shut up about a.i.' in regards to
David Deutsch.)

------
cristianpascu
It's beyond my understanding, as a physicist and programmer, how someone can
write a full comparison between DNA and a programming language or source code
as written by intelligent beings, and at the end recommend a work on 'evidence
that there is no designer' of life.

The very definition of intelligence is not 'being smart', but having the
ability to select one option out of a set of possible options. That is what
we, programmers, do. We don't just throw lines of code randomly. We select
specific ones for a specific purpose. That's how we build software, mechanisms
of information put in motion by the computer. We put our logic into a
decisional mechanism which mimics our decisional ability.

However, life does more than that. Life is more than a mechanism driven by a
source code. Consciousness goes beyond rules of decision found in programmable
machines. But even if you're a physicalist, the abilities that simple beings
have such as recognizing objects, paths, building nests, traveling long
distances, using tools, are amazing in their own right.

And yet, these all are strong evidence there is no designer beyond it all.
It's mere chance, bits on a string selected by nature.

~~~
skyraider
Not to go all [0][1][2][3] on one word in your thoughtful comment, but for me,
it is always worthwhile to ponder the fact that the theory of evolution is not
about chance per se.

You take the intelligent-design approach - pattern recognition on designed
things. I hear you. But what about the compelling inductive evidence we have
that deterministic, if chaotic, natural systems are perfectly capable of
establishing a feedback loop and modifying themselves?

IMHO, accepting evolution is all about wrapping one's head around the notion
that the result of this modification, when stretched over unimaginably large
timescales, is significant change.

~~~
kingkawn
The time scales are not unimaginable at all.

[http://en.wikipedia.org/wiki/Domesticated_silver_fox](http://en.wikipedia.org/wiki/Domesticated_silver_fox)

By selecting for behavior rather than physical byproduct, they have induced
evolution at a rate not thought possible in vertebrates.

Not to mention:
[http://en.wikipedia.org/wiki/Evolution_of_influenza](http://en.wikipedia.org/wiki/Evolution_of_influenza)

Edit: I think the parent comment was about the difficulty imagining the
process of evolution which we assume takes billions of years because we cannot
observe its impact directly in ourselves except to interpret the results. Yet,
it is both observable and inducible in things around us. Not so hard to
imagine.

~~~
nitrogen
I believe the parent comment was referring to the difficulty of imagining
billions of years.

~~~
skyraider
Right. My point is that the evidence is there but there's a difficulty in
developing or accepting an intuition for evolution given the huge timescales
involved in unicellular-to-human evolution, even though it's well-supported by
inductive and deductive evidence.

------
kamakazizuru
this guy really needs to go speak to a bioinformatician. Having studied the
same myself - I can safely say that he is at best drawing vague analogies -
the goal of this exercise, however, is very unclear (especially ending with all
the Dawkins b.s.). I take it as him trying to say "oh look it may seem like a
programming language - but it's not - so that means we were not designed by
some intelligent being". But that's based on the flawed assumption of DNA
being like a programming language. It's not - it's a mapping - there's no
point comparing an orange to an apple - and saying - here's why it could be
that an orange is an apple - but in reality it isn't. In fact - drawing such
analogies is what limits our understanding of DNA in the first place (which is
why an increasing amount of research is going into looking at it from more
multidisciplinary perspectives). As a simple example - researchers at the
University of Washington recently discovered a second meaning of some genetic
sequences [1].
Essentially - this article is taking something man-made (programming languages
& software engineering approaches) - which are often influenced by natural
designs - and then comparing them to a natural design - that has a different
purpose.

[1] [http://www.washington.edu/news/2013/12/12/scientists-
discove...](http://www.washington.edu/news/2013/12/12/scientists-discover-
double-meaning-in-genetic-code/)

~~~
ahubert
author here. Turns out I am a bioinformatician these days, ten years after
writing this page. I work in an actual lab, with biologists, getting results.
[http://bertusbeaumontlab.tudelft.nl/](http://bertusbeaumontlab.tudelft.nl/)
is what I do. DNA is not a programming language, and the page itself says so.
But it is possible to admire the features that DNA _does_ have through the
eyes of a software developer!

Secondly, within the world of practicing DNA scientists, the link you provided
was 'laughed out of court' as being the stuff of PR dreams, but not actually
true. For example, see [http://www.bacteriatobonobos.com/1/post/2013/12/go-
home-pr-t...](http://www.bacteriatobonobos.com/1/post/2013/12/go-home-pr-team-
youre-drunk.html)

~~~
gravedave
Your article was written quite a while ago. Could you please update it to
conform to more modern ways of the web (e.g. fewer tables, more floats and
divs/sections) if it's not much trouble? I like to style pages client-side,
and the tr td stuff gets a bit in the way.

------
nabla9
Using coding examples like conditional compilation is not the right
abstraction for programmers to understand how genes compute.

GRN is.

[https://en.wikipedia.org/wiki/Gene_regulatory_network](https://en.wikipedia.org/wiki/Gene_regulatory_network)

[https://en.wikipedia.org/wiki/Gene_regulatory_network#Modell...](https://en.wikipedia.org/wiki/Gene_regulatory_network#Modelling)

A GRN can be modeled at different levels of abstraction and accuracy: as a
boolean network, a recurrent neural network, or a stochastic gene network.

In other words, GRNs are capable of complex computations, but the
computational model looks more like a neural network or a stochastic network.
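
To make the boolean-network level of abstraction concrete, here is a minimal
Python sketch. The three genes A, B and C and their update rules are invented
for illustration and do not describe any real organism. Each gene is either on
or off, and all genes update synchronously from the previous state.

```python
# Toy synchronous boolean network with three hypothetical genes.
# Rules (assumptions for illustration only):
#   C represses A; A activates B; A and B together activate C.
RULES = {
    "A": lambda s: not s["C"],
    "B": lambda s: s["A"],
    "C": lambda s: s["A"] and s["B"],
}


def step(state: dict) -> dict:
    """Compute the next state; every gene reads the same previous state."""
    return {gene: bool(rule(state)) for gene, rule in RULES.items()}


def run(state: dict, n_steps: int) -> list:
    """Return the list of states visited, starting with the initial one."""
    states = [state]
    for _ in range(n_steps):
        state = step(state)
        states.append(state)
    return states


if __name__ == "__main__":
    initial = {"A": True, "B": False, "C": False}
    for s in run(initial, 5):
        print("".join("1" if s[g] else "0" for g in "ABC"))
```

Starting with only A switched on, this toy network passes through five
distinct states and then returns to where it began. Repeating cycles like this
(attractors) are the kind of behaviour such models are typically examined for.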

~~~
mattdeboard
Well, speaking as someone who has no idea what the computational model of a
neural or stochastic network is, I strongly disagree with you.

------
Tycho
_Now, DNA is not like a computer programming language. It really isn't. But
there are some whopping analogies. We can view each cell as a CPU, running its
own kernel. Each cell has a copy of the entire kernel, but chooses to activate
only the relevant parts. Which modules or drivers it loads, so to speak._

I wonder if we turned this back around, would it suggest some novel designs
for computer systems?

------
grownseed
This is wonderful, I've always seen programming as the application of a given
mindset (as opposed to the other way around) and for years since I was a kid,
I thought biology, and in particular DNA, applied to the concept very well.
It's not until watching the show Regenesis that I realized there was a field
for it, Bioinformatics!

After years of being a senior dev and such in some web shop, I'm actually
starting a job in bioinformatics in a few weeks, it's beyond exciting. It's
articles like this that remind me why being a programmer can be interesting
beyond the code. We live in very interesting times.

------
mikelemmon
Perhaps the programming analogy is more similar to a language such as Logo or
G-code used in CNC machines, that is used more to provide instructions to
build something rather than computation and logical operations.

[http://en.wikipedia.org/wiki/Logo_(programming_language)](http://en.wikipedia.org/wiki/Logo_\(programming_language\))
[http://en.wikipedia.org/wiki/G-code](http://en.wikipedia.org/wiki/G-code)

------
coin
"Coder" - boy do I hate that term. It implies that all the person does is
code - no design, no collaboration, no releasing, no testing. It's like
calling a roofer a hammerer.

As a software engineer/developer/programmer, coding is just one aspect of what
I do.

~~~
err4nt
Perhaps _encoder_ would fit your use of the term better. A craftsman who takes
all of these disparate considerations (design, functionality, speed,
sustainability) and measures them up against the known limitations of
programming languages and the existing hardware available, and intentionally
encodes that experience/expertise into machine language so that knowledge can
be easily communicated from human to human.

------
Aardwolf
What I always think would be a cool device (science fiction of course), would
be one which you can give DNA code (be it copied from an existing creature,
modified by someone, or computer generated), and then the machine produces the
organism from that DNA code.

~~~
toufka
At this point, it's just a matter of how tediously you want to do the work to
become famous:
[http://en.wikipedia.org/wiki/Mycoplasma_laboratorium](http://en.wikipedia.org/wiki/Mycoplasma_laboratorium)

Venter's doing it right now.

~~~
dekhn
I admire Venter's ability to withstand years of toil and criticism to prove
something that is absolutely fucking obvious.

Given enough time and energy, you can build an organism from scratch. Of
course, Venter's still pretty far away from that goal. But I suspect he will
succeed; we all know there are none but _technical_ barriers.

------
altras
Hey, you should check out [https://github.com/VarnaLab/node-
organic](https://github.com/VarnaLab/node-organic) - organic development with
NodeJS :D It has implementations in Java & PHP too :)

------
rakesh111989
There are many people who think that DNA code should not be compared to
computer code. But I think it is actually a Holy Code. People argue that
because DNA is more complex than computer code. But this complexity can be
explained in the following way. When the first organisms came they only had
amino acids for doing all the biological processes, so the holy code was very
simple. Then with evolution there was a need for more complex code to execute
more complex biological processes, so RNA came into existence. These new
living things had only RNA as genetic material, like RNA viruses. Then more
evolution, and we got a new version of the Holy Code: DNA. It has happened
over billions of years, so I think you can now understand the reason for the
complexity. Another reason why it's complex is that DNA is a code, but we
cannot understand computer code either unless we know the language.

------
alcari
Here's a relevant, interesting talk [0] from 24C3 about engineering organisms.

[0]
[https://www.youtube.com/watch?v=gadBNBJRPr0](https://www.youtube.com/watch?v=gadBNBJRPr0)

