
DNA seen through the eyes of a coder (2017) - sundarurfriend
https://ds9a.nl/amazing-dna/
======
jfarlow
As someone building software to design genetic tools, this is actually a
pretty good overview. Its analogies are pretty solid, even if they are just
analogies. It's a well-described snapshot of the early phase of education
where this helps build a grand intuition and relations between disciplines.
Just be careful to remember that they're just analogies, and in general can't
be relied on to actually discover, invent, design or conclude.

The fun part for everyone here is that the science has improved so far so
quickly that at this point we _are_ starting to be able to improve,
implement and otherwise make good on those coding-style capabilities. We can
start to write, patch, update, and affect that code - and do so in intelligent
and rational ways, rather than the screens set up in the past century.

The next section I would add would be one that actually started to talk about
the compiled programs - the proteins themselves. All that code, all those
rules, all those heuristics are there to be run through an atomic 3D-printer
to produce proteins. And in an ensemble, those proteins form a closed
community within a cell; cells then form communities in a tissue, and tissues
form a community called an organism. Those compiled proteins actually do the
work at the lowest levels but often get a bit of short shrift while we all
stare at their source-code in amazement. The source is elegant, but the
programs are even more amazing, in my humble opinion.

Here are some 'profiles' of the kinds of economically and socially interesting
programs we've started to make based on the above source-code. An 'app-store'
of compiled apps if you will:
[https://serotiny.bio/notes/proteins/](https://serotiny.bio/notes/proteins/)

~~~
westoncb
> The source is elegant, but the programs are even more amazing, in my humble
> opinion.

Makes me curious about the 'runtime' which makes their impressive execution
possible—granting that the analogy has definite limitations and could be
misleading. The aspect of a 'runtime' which I see carrying over is that once
the 'source' is 'compiled' into proteins, _something_ makes the interesting
behavior of those proteins what it is. I think where the analogy could be
misleading is that the architecture of programming language runtimes tends to
be a single coherent system with definite separation from programs executing
within; whereas in the case of proteins, a large part of what provides for
their biologically interesting behavior is just their general chemical
properties (which are sort of contained within the 'programs' themselves,
rather than in a separate external system). But there are probably aspects of
the environment into which the proteins are released that systematically
influence their behavior in a way that could be abstracted and studied (at
which point it might have some loose resemblance in role to language
runtimes). Maybe?

~~~
mygo
Your 'runtime' is physics.

A protein is a chain of concatenated amino acids.

The chain will "ball up" into a 3-dimensional shape.

Its shape = its function.

Why does the balled-up chain function the way that it functions? Because other
balled-up protein chains bump up against it in a certain way and suddenly an
iron molecule can be contained within the confines of this mega-structure that
has formed, and you now have hemoglobin.

Why does the chain ball up the way it does? Can we predict how a protein chain
will ball up? Well, protein folding is NP-hard, at least in simplified models
such as the HP lattice model. Solve it efficiently and you will get _at least_
$1M USD, change the world, etc.

\- Software Engineer with a B.S. Degree in Biological Science w/ a focus on
genetics + minor in Chem.

~~~
westoncb
But physics is the 'runtime' for everything at some level of interpretation. I
point out this other hypothetical system because it's possible for
'higher-level' runtimes to emerge from lower-level physical rules, and this looks
like it might be one of those cases. (To clarify, by 'emerge', I mean another
consistent set of behaviors appears which are capable of formal description
and which can be viewed as consequences of a lower-level system without
actually resembling that system.)

~~~
chillacy
I had a discussion about this a few days ago. If the current state of DNA
programming is done at the physical level, that's like writing your code by
doing photolithography on silicon: not very productive. A level up, maybe the
Von Neumann architecture for genetic engineering is the world of DNA and the
various proteins like transcriptase. When will we come up with the equivalent
of Structured Programming? Or Object Orientation? Or Operating Systems?

Each represents an abstraction over the others, and increases productivity.

~~~
nicwilson
The level up from DNA is proteins, the molecules that actually do stuff. These
are relatively structured: Primary structure is the amino acid sequence.

Secondary is the structures that form out of the primary sequence: Helices:
coils, like DNA except only a single helix, not double (double and triple
helices, see keratin and collagen respectively, are more like winding 2 or 3
single helices around each other), and Sheets: two or more strands that bond to
each other either parallel or antiparallel.

Then it starts to get interesting. These secondary structures form domains
which are the functional subunit of proteins that actually do (or are in the
case of non-enzymes) stuff. These are the equivalent of structured
programming. E.g. join an antibody variable domain (the bit that sticks to
stuff) to an enzyme, inject it into the bloodstream and you get the enzyme's
activity wherever the antibody happens to bind (e.g. to a cancer cell).

(Tertiary and quaternary structure refer to the complete protein and to
proteins that form a functional unit with other proteins respectively.)

The OO analogy is like the OO analogy for HDLs, the protein _is_ an object. OS
is whole organism level.

~~~
gilleain
Although it doesn't alter the analogy much, there is also the even higher
level of 'quinary structure' \- although it is disputed:

[https://www.ncbi.nlm.nih.gov/pubmed/23943406](https://www.ncbi.nlm.nih.gov/pubmed/23943406)

------
visarga
You know what blew my mind about DNA? The "Gene Regulatory Networks".
[https://en.wikipedia.org/wiki/Gene_regulatory_network](https://en.wikipedia.org/wiki/Gene_regulatory_network)

They work analogously to neural networks, where genes act as signals to
other genes in a complex graph of dependency. Isn't that amazing? Each cell
has a "gene based brain" (an RNN) which can do computation - it can map states
to actions. Each neuron in the brain is actually a whole neural network. GRNs
are probably the most energy efficient neural nets that exist in the universe,
and they're also self replicators.
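
To make the "genes signalling other genes" idea concrete, here is a toy sketch of a recurrent update over gene on/off states. It is not a biological model: the three gene names, the weight matrix and the threshold rule are all invented for illustration.

```python
# Toy gene regulatory network: each gene's next state depends on the
# current state of the genes that regulate it (a recurrent update).
# Gene names and weights are hypothetical, chosen only for illustration.

GENES = ["A", "B", "C"]

# WEIGHTS[i][j] is the effect of gene j on gene i:
# positive = activation, negative = repression.
WEIGHTS = [
    [0, 0, -1],  # A is repressed by C
    [1, 0, 0],   # B is activated by A
    [0, 1, 0],   # C is activated by B
]
BIAS = [1, 0, 0]  # A is on by default unless repressed


def step(state):
    """Return the next on/off state (1 or 0) of every gene."""
    nxt = []
    for i in range(len(GENES)):
        total = BIAS[i] + sum(w * s for w, s in zip(WEIGHTS[i], state))
        nxt.append(1 if total > 0 else 0)
    return nxt


def run(state, steps):
    """Collect the trajectory of states over a number of update steps."""
    history = [state]
    for _ in range(steps):
        state = step(state)
        history.append(state)
    return history
```

Starting from all genes off, `run([0, 0, 0], 6)` walks through a cycle and returns to the starting state, so even this tiny feedback loop behaves like an oscillator rather than a straight-line program.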

------
bmsran
"Is here. This not a joke. We can wonder about the license though. Maybe we
should ask the walking product of this source: Craig Venter."

The reference build of the human genome (provided here by Ensembl) is almost
entirely derived from the public human genome sequencing project, not the
private project led by Venter.

~~~
chrisamiller
And the public reference genome is about 2/3 from a single African American
individual from Buffalo, New York!

------
colemannugent
Definitely an interesting take on the "coding" of DNA.

Something that has always interested me is similarities in natural and
unnatural mechanisms, like how modern cameras mimic eyes right down to the
tiny motors that drive the auto-focus mechanisms.

The long sequence of nucleotides reminds me of a strange sort of Turing
machine capable of building humans.

~~~
dekhn
The big difference between modern cameras and eyes is that a huge amount of
neural processing is done in the retina of the eye. There aren't any cameras
(that I know of) that perform such sophisticated processing (edge
detecting, contrast detecting, motion detecting) within the sensor package.

~~~
kevin_thibedeau
There are CMOS sensors with in-array edge detection. More flexibly, all
optical mice have integrated sensors that do motion processing in the same
package if not the same die.

------
dandare
This is amazing and I was looking for such an analysis for a long time. I only
wish it included more information about the actual programming language of DNA.

>Now, DNA is not like a computer programming language. It really isn't.

Maybe it would be interesting to talk about how it differs and what
programming paradigms do not have analogies in the genetic code.

------
tehsauce
A very important aspect of DNA as code is the heavy use of macros. The code
itself heavily influences the expression of other parts of the code.

~~~
scalio
How so?

------
teekert
Very nice piece, as a biologist turned programmer I think it nicely sums
things up. Some remarks though...

"Similarly, as an embryo develops in the mother's womb, its DNA is edited
substantially to reduce its growth rate, and the size of the placenta. In such
a way, the competing interests of the father ('large strong children') and the
mother ('survive pregnancy') are balanced. Such 'imprinting' can only happen
within the mother, since the father's genome doesn't know anything about the
size of the mother."

This is strangely worded, I think there is indeed a trade-off between size of
the child and the mother surviving but the father also benefits from the
mother surviving (better have breast milk available and love and care, right?
Father genome?). Moreover, the father's genome is half inherited from a woman
and is mingled with a woman's DNA every generation (most of the genome is not
specifically male or female). I don't believe there is evidence that the Y
chromosome solely drives this push for a larger child in the uterus, but
please correct me if I'm wrong.

I also think it is theoretically possible to produce a female child from two
fathers by merging their genomes and supplying the child with two X
chromosomes and no Y chromosome. So the statement that "the father's genome
doesn't know anything about the size of the mother" is also wrong (for the
used definition of "knowing"), the male genome "knows" just as much as the
mother.

~~~
ajuc
> So the statement that "the father's genome doesn't know anything about the
> size of the mother" is also wrong (for the used definition of "knowing"),
> the male genome "knows" just as much as the mother.

It knows something about a possible future female child, but not about the
particular mother who will have to give birth?

------
Razengan
As a layman, I’ve always wondered about something but I’m not sure how exactly
to ask/frame the question:

How did biological _“software”_ “evolve?”

As in, the basic features that are common to many macroscopic lifeforms, like
knowing one’s position within 3D space, as well as the related-but-distinct
sense of proprioception [0].

[0] [https://en.wikipedia.org/wiki/Proprioception](https://en.wikipedia.org/wiki/Proprioception)

~~~
chrisamiller
The short answer is that it was useful! The parameter space of things that
have been tried in uncountable organisms over _billions_ of years is vast.

And proprioception is incredibly useful, even in small doses. Imagine you're a
single-celled organism:

At the simplest level, some vague sensory input about your surroundings so
that you can avoid the predator/find the food gives you a massive advantage
over other organisms that can't. Every random mutation that improves this
ability, even a tiny bit, is rewarded, as you can grow faster/die less and
ultimately, reproduce more. (this is what we call "fitness" \- how many of
your genes get passed on).

If you get a mutation that hurts this ability, you're going to be massively
penalized - eat less, grow slower, reproduce less.

In pretty short order, the organisms with higher fitness are going to take
over.

Mutations occur all the time. In humans, the rate is very low - about 3 new
mutations during every cell division. In many bacteria, the fidelity is
much lower, and the reproduction rate is much higher, and so they are
_constantly_ exploring new combinations of parameters. Most are neutral, many
are disastrous, a few give a slight advantage.

Pile up these small advantages over millions or billions of years, and
gradually, in fits and starts, very complex abilities evolve.
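
The mutate-then-select loop described above can be sketched in a few lines. This is a deliberately crude simulation, not a population-genetics model: the single "sensing ability" number, the population size, the Gaussian mutation and the "fitter half survives" rule are all assumptions made for the example.

```python
import random


def evolve(generations, pop_size=100, seed=0):
    """Return the best 'sensing ability' in the population after each generation."""
    rng = random.Random(seed)
    population = [0.0] * pop_size  # everyone starts with no ability
    best_per_generation = []
    for _ in range(generations):
        # Selection: the fitter half survives to reproduce.
        population.sort(reverse=True)
        survivors = population[: pop_size // 2]
        # Reproduction: each survivor leaves one mutated copy.
        # Mean below zero: most mutations hurt a little, a few help.
        offspring = [parent + rng.gauss(-0.05, 0.1) for parent in survivors]
        population = survivors + offspring
        best_per_generation.append(max(population))
    return best_per_generation
```

Because survivors are kept unchanged, the best value in the population can never go down from one generation to the next. The rare helpful mutations are retained and built upon while the harmful ones are discarded, which is the "pile up small advantages" effect in miniature.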

------
searine
This is a fun parallel, but I feel like it fails to go the other direction.

How is DNA different from programming?

This is important to understand because those differences are the foundation
of our intuition about how DNA operates. We can't let ourselves fall into the
misunderstanding that cells are like computers. In particular, the ideas of
random mutation and populations are inherently different from software.

Imagine a piece of code whose bits slowly decay over time. Where functions
compete with functions in every other program on the filesystem to see who is
most efficient. Where scripts need to constantly copy themselves to other
folders simply to maintain their integrity.

It's this kind of stochastic and unreliable environment that I think a lot of
people forget about when talking about DNA. Yes it is a consistent heritable
genetic library, but it is also in a constant state of change. Genes are
strange little islands in a sea of noise.

~~~
Robotbeat
> Imagine... Where functions compete with functions in every other program on
> the filesystem to see who is most efficient. Where scripts need to
> constantly copy themselves to other folders simply to maintain their
> integrity.

That sounds a lot like computer viruses.

~~~
tzahola
Good analogy!

Funny, but in fact there’s even a whole class of self-replicating biomachines
named after computer viruses!
[https://en.m.wikipedia.org/wiki/Introduction_to_viruses](https://en.m.wikipedia.org/wiki/Introduction_to_viruses)

The similarities are _eerie_.

------
tvelichkov
Great article, so no unit testing? I knew it!

------
callesgg
From what I know genes don’t exist as distinct parts of the genome, genes are
just a way that we humans grouped the DNA.

The 97% is not junk, it is just not coding for proteins. The machinery can
jump into “commented” genes. We don’t know exactly how non-protein-coding DNA
works at the lowest level. A lot of it is probably useless but you probably
can’t remove it without destroying stuff.

Compared to a software project the DNA code has the worst code quality you
will ever see in a working system.

Almost everything is dependent on everything else. It is essentially a
software project made by a toddler cutting and pasting assembler code over
millions of years.

~~~
thriftwy
Calling it junk didn't make sense to me 10 years ago, and today it is plain
false.

The DNA before a protein-coding sequence is used for binding modifiers. This
means that every protein has a huge if() { before it, and many different kinds
of stuff can bind there in order to either suppress production of this protein
or increase it. This is how all this stuff works.

How would it work otherwise, I always wondered. Would all proteins in DNA be
produced at the same rate, as naive models would imply? It turns out they
aren't, and this is regulated by areas in 'junk' DNA.

Complaining about junk in DNA is like complaining about dispatch in a computer
program. All kinds of ifs and whiles and fors. Everybody knows our programs
are 97% dispatch and 3% business logic computations after all. Or worse.
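
As a rough sketch of the "huge if() { before every protein" picture: the class below models a gene whose production rate depends on which factors are bound to its regulatory region. The `Gene` class, the factor names and the "each activator doubles the rate" rule are hypothetical, picked only to show the shape of the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Gene:
    name: str
    base_rate: float  # production rate with nothing bound
    activators: set = field(default_factory=set)
    repressors: set = field(default_factory=set)

    def expression_rate(self, bound_factors):
        """The 'if () {' in front of the protein-coding sequence."""
        if self.repressors & bound_factors:
            return 0.0  # any bound repressor shuts production off
        rate = self.base_rate
        for _ in self.activators & bound_factors:
            rate *= 2.0  # made-up rule: each bound activator doubles the rate
        return rate
```

For example, `Gene("x", 1.0, {"act"}, {"rep"}).expression_rate({"act"})` gives `2.0`, while binding `"rep"` as well drops it to `0.0`. The protein-coding part never changes; only the regulatory "dispatch" in front of it decides how much gets made.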

------
ozy
I might have missed it, but key is that DNA is not executed linearly. Instead
all instructions are executed all the time. Much like the difference between
normal programming and programming an FPGA, but then probabilistic.

~~~
dsnuh
>Furthermore, 97% of your DNA is commented out. DNA is linear and read from
start to end.

Looks like he is saying the opposite?

~~~
Obi_Juan_Kenobi
That statement is only true in a very narrow context.

Much of your DNA is silenced by being formed into dense chromatin - DNA that
is tightly packed onto a protein scaffold. However, the regions that are
silenced change throughout development, and these regions are not strictly
inaccessible. This is seen with CNSes, conserved non-coding sequences. As the
name implies, these are not traditional genes that make protein products, but
are usually small regulatory sequences that often affect chromatin state, and
sometimes at a great distance. I've had colleagues attempt for years to
identify a mutation, only to find that the causative change occurred many
kilobases away from the relevant gene. This is partly the reason why DNA _is
not_ 'plug and play', as the genomic context of a particular sequence often
matters.

Which is why DNA isn't really linear, at least on a genomic scale. The
processes of transcription and translation (DNA -> RNA -> Protein) are linear
to be sure, but the regulatory networks that determine gene expression are
happening all at once on a massively parallel scale. These molecules are also
3-dimensional and can fold back on themselves and cause modifications.
Essentially every fundamental process in gene expression is able to be
regulated, whether it's enhancers and repressors affecting transcription, RNA-
mediated silencing, RNA splicing, RNA modifications and stability, histone
modifications, protein modifications, protein stability, phosphorylation or any
of the dozens of other post-translational regulatory mechanisms, and on and on and on..
And there are thousands of genes which can impact all manner of other genes at
any of those levels, either directly or otherwise. It's a wonder that we're
able to tease anything out of it that makes sense.

Generally speaking, I caution against computer analogies for biology. Biology
is messy! It rarely works how you want it to. Even relatively common and basic
techniques in molecular biology require a great deal of troubleshooting, and
even in the best labs with loads of experience, sometimes things simply refuse
to work how you'd like them to. You don't hear too much about synthetic
biology anymore (it's still going, just less hype) because the premise was
incredibly naive; you were never going to get bits and pieces of DNA to behave
in a predictable manner. Just about everyone with wet lab experience suspected
this.

For context, the current hype about CRISPR is almost entirely based on _how
well it works_ , not what it does. We've had a few different techniques for
modifying DNA for a while now, but none of them worked quite well enough to
practically accomplish all the interesting things that CRISPR is now enabling.

------
tw1010
More like (2002)

~~~
WhitneyLand
What’s wrong with it? Seems like a fantastic micro introduction to DNA for
those with a computer science background.

Of course it’s not perfect. I’d bet most readers like myself were enjoying it,
while thinking of analogies that might be more apt or illustrative. But the
power of it is how quickly it can establish a frame of reference, and if
desired a jump off point for further understanding.

If there are parts so fundamentally flawed or outdated as to negate the value,
please, point them out. Or if you know of a better article that illuminates as
much given only a 10 minute time investment, by all means please share it.

~~~
austinprete
I took his comment to reference the fact that this was originally written in
2002 with minor additions over time, as evidenced by the “Updates” section.

~~~
tw1010
This was the intent of my comment.

------
fjfaase
I think that his talks at SHA2017 were the best. Not just with respect to the
contents but also for the manner in which he presented them. A definite watch for
anyone who wants to know a little more about DNA.

------
bitwize
Since this was written, we've discovered CRISPR, which started off as a
bacterial analogue to antivirus software that works against actual physical
viruses.

~~~
sundarurfriend
The author did a talk on this with updated content, apparently including "A
little bit on CRISPR". Blog post with links to the talk videos:
[https://medium.com/@bert.hubert/dna-the-code-of-life-12db4a1...](https://medium.com/@bert.hubert/dna-the-code-of-life-12db4a17c66d)

------
tritium

      What happens if you copy paste 
      the 'legs selector' part of a 
      mouse HOX gene into the fruitfly 
      Homeobox:
    
        'In fact, when the mouse Hox-B6 
         gene is inserted in Drosophila, 
         it can substitute for Antennapedia 
         and produce legs in place of 
         antennae'	
    

Mother of God, is that ever strange.

------
rumcajz
On a similar topic: [http://250bpm.com/blog:89](http://250bpm.com/blog:89)

------
Maultasche
Interesting, that was a really useful analogy.

So what I learned is that our body is like an Actor model with lots of
asynchronous processes sending messages to each other. I'm sure that's a leaky
analogy, but it makes things simpler in my mind. I have the sudden desire to
go emulate it with Elixir.

------
ajuc
My internal model of DNA programming:

\- make 1 million sed scripts acting on directory X

\- put all of them in directory X

\- run all of them in parallel in infinite loops
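
A scaled-down sketch of that mental model, under loud assumptions: the "directory" is a single string, the "sed scripts" are regex substitution rules, and the parallel infinite loops are approximated by picking one rule at random per tick. The function name `run_rules` and its parameters are invented for this example.

```python
import random
import re


def run_rules(soup, rules, ticks, seed=0):
    """Apply randomly chosen rewrite rules to a shared string, one per tick.

    `rules` is a list of (pattern, replacement) pairs, standing in for
    the sed scripts; `soup` stands in for the shared directory.
    """
    rng = random.Random(seed)
    for _ in range(ticks):
        pattern, replacement = rng.choice(rules)
        # Each tick rewrites at most one match, like one script firing once.
        soup = re.sub(pattern, replacement, soup, count=1)
    return soup
```

With a single rule, `run_rules("aaa", [("a", "b")], 3)` yields `"bbb"`. With several rules that feed each other's output, the result depends on the order in which they happen to fire, which is the unpredictable part of the analogy.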

------
tzahola
Ah, the good ol’ _”imagine gravity like aaaa... sheet of rubber”_. Except this
time for biology!

~~~
dandare
So you prefer the _"imagine gravity like aaaa curvature of space and time"_?

We explain our science in analogies. We think in analogies. And DNA like a
programming language is a useful analogy.

~~~
chrisamiller
It's definitely a good starting place for people from a CS background, but if
you really want to learn biology, don't get too attached! As you get deeper,
you'll quickly discover the limitations of that analogy.

