

DNA is object code - maxwell
http://www.eyeondna.com/2008/07/16/what-does-dna-mean-to-you-14/

======
etal
Easier said than done, as I'm sure Andres Yates knows. Biologists have DNA
libraries, they can mark expressed genes for debugging, they can link and run
this object code (with varying results) -- but there's no general compiler to
generate this code, and it's nontrivial to write one without fully
understanding the machine architecture.

There are about 30,000 human genes, spread over 3 billion base pairs, coding
for tens of millions of different proteins. Given how proteins and RNA
interact with DNA, and DNA interacts with itself, the language for describing
molecular biology usually isn't discrete math, it's statistics.

I think this is why Yates is accusing biologists of treating DNA as the
abstraction that represents proteins, when really, it makes more sense the
other way around. It seems like in the past couple of years the best
fundamental research has been in proteomics, rather than genomics, probably
for this reason -- bioinformaticians are recognizing that analyzing the
expressed proteins, done efficiently, can give better information than raw DNA
about what's actually going on in a cell.

------
jaytee_clone
This article somewhat touches upon the issue of "bio-programming" but it
doesn't give credit where it is due.

People have been studying "bio-compiler/machine code" for decades. The
difficulty is that there are many different compilers at work. Just to name a
few: protein folding, chemical gradient regulation, inter/intra-cellular
signaling, etc. All of these "compile" chemical composition into actual
physical function. But each of these processes alone is so complex that it
takes years to uncover a small portion of the black box, not to mention that
most of these "compilers" are interdependent. So it is close to impossible to
use the limited knowledge we have uncovered to predict other novel code ->
function compilation. I worked in a protein folding lab before. Just to be
blunt here, no one really has a clue.

------
ced
I disagree completely.

For it to be object code, it would have to be compiled from something. If
anything, DNA is the thing compiled (and _modified_ ) several times from the
ACGT form down to the protein form. Nature doesn't operate on the "hidden
abstraction layer", it modifies the DNA directly, which is a very very strong
clue that the meaning is _there_ , not elsewhere. Why does cross-over mix
large sequential swaths of chromosomes? Because genes (the meaningful units)
are sequential, and are relatively unlikely to be messed up by such an
operation. The requirements of evolution impose much structure on DNA.
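
A rough way to put a number on "relatively unlikely": the sketch below assumes a single cut placed uniformly at random between adjacent bases, which is a deliberate simplification of real recombination. The function name and the sizes used are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def disruption_probability(gene_length: int, chromosome_length: int) -> Fraction:
    """Chance that one uniformly random cut lands strictly inside a contiguous gene.

    Assumption (not from the thread): exactly one cut, equally likely at any
    of the chromosome_length - 1 positions between adjacent bases.
    """
    if chromosome_length < 2 or not 1 <= gene_length <= chromosome_length:
        raise ValueError("need 1 <= gene_length <= chromosome_length and length >= 2")
    internal_cuts = gene_length - 1        # cut points that split the gene
    possible_cuts = chromosome_length - 1  # all cut points on the chromosome
    return Fraction(internal_cuts, possible_cuts)


# Hypothetical sizes: a 10 kb gene on a 100 Mb chromosome.
p = disruption_probability(10_000, 100_000_000)
print(f"chance the gene is split: {float(p):.6f}")
```

Under that assumption a contiguous gene is rarely split, whereas a "gene" whose parts were scattered along the whole chromosome would be separated by almost any cut.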

I wouldn't call DNA source code either.

Furthermore, while the idea of building higher abstractions sounds nice in
theory, it fails, because it's _not an engineering problem_. It's a science
problem, and anyone who has some experience with physics or chemistry knows
that models have to remain simple for them to work at all. Meteorological
models suck. Climate change models suck. Modelling a single cell is super
hard. I wish we could get better models for biology, but it just seems really
unlikely. The current approach to discovering gene function seems very
reasonable to me.

~~~
anamax
> For it to be object code, it would have to be compiled from something.

Nope. In fact, object code doesn't even have to be assembled from something.

It just has to be made available to an execution engine.

And yes, folks modify object code.

And, even if there was some source somewhere else, that doesn't imply that the
meaning isn't in the object code. (Meaning can be in multiple places.)

~~~
ced
... right, but the point of the top post is that we should be looking for "the
source"... Maybe that wasn't formulated properly.

The point comes down to science vs engineering, induction vs building from
parts. Biologists are already building models (model != programming
language!). Models are the holy grail. They are just damn hard to get right.

------
ntoshev
This assumes that there is a simple, coherent way to describe life - the
source code. There doesn't have to be such a description. Maybe we were just
lucky with physics and the ability to describe the universe with simple
equations (until we got to quantum mechanics and relativity theory and things
became messy again). With genomics, things can very well stay messy: after
all, life evolved at DNA level.

------
Herring
_There’s a reason why Window’s object code is everywhere, but the source code
is top secret._

Might be a bad example. You couldn't use the source even if it wasn't secret.
That probably protects Windows' copyrights more than the secrecy does.

------
MaysonL
No - DNA is not object code; it's much closer to Lisp.

Some of it is code, some of it is data, some of it is _macros_.

And it's all the self-modifying output of random genetic algorithms.

~~~
etal
DNA is trillions of monkeys typing on typewriters for 4.5 billion years and
throwing away every page that doesn't contain any readable words.

And sometimes a page happens to turn into another monkey, typewriter or page.
Actually, I can't think of an analogy that comes close to describing the
hairiness of biology. It's the origin of all hair; it eclipses all else.

------
jsmcgd
Spaghetti code.

------
newt0311
Amazing article. Instead of limiting it to genomics, I would apply this
sentiment to nearly all parts of biology. The field needs to grow up and start
using the powerful principles of building abstractions and leveraging advanced
mathematics like physics did around Newton and before.

~~~
sungam
I disagree. Biology has built complex abstractions, it is just that (outside
of specific bioinformatic domains) mathematics has consistently proven not to
be the appropriate language for describing complex, messy biological systems.

I like the DNA-as-code analogy. Getting the 'source code' is not the stumbling
block - this is becoming far easier with high throughput sequencing
methodologies. The real difficulty is writing the code. I have spent the last
3 months constructing 15kb of DNA by classical molecular biology techniques -
and this was largely achieved by cutting and pasting from existing DNA sequences.
The cost of synthesizing large DNA fragments is currently >1 dollar / base.
When this falls to trivial levels I think we will really start to see DNA
programming taking off.
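
For scale, a back-of-the-envelope sketch of that arithmetic. The 15kb length and the 1 dollar / base floor come from the comment above; the lower prices are hypothetical, included only to show how the total would move.

```python
def synthesis_cost(length_bases: int, dollars_per_base: float) -> float:
    """Cost in dollars of synthesizing a fragment at a flat per-base price."""
    return length_bases * dollars_per_base


FRAGMENT_BASES = 15_000  # the 15kb construct mentioned above

# 1.0 is the quoted lower bound; 0.1 and 0.01 are hypothetical future prices.
for price in (1.0, 0.1, 0.01):
    total = synthesis_cost(FRAGMENT_BASES, price)
    print(f"{price:>5.2f} dollars/base -> {total:>9,.0f} dollars")
```

At the quoted floor the 15kb construct alone would cost at least 15,000 dollars to synthesize outright, which is why cutting and pasting existing sequences is still the practical route.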

~~~
maxwell
> Getting the 'source code' is not the stumbling block - this is becoming far
> easier with high throughput sequencing methodologies.

I don't really know anything about biology, but as a programmer, the article's
suggestion that DNA is "byte code" and not source makes sense. To get actual
high-level human comprehensible source, I think we'd need to build
abstractions on top of DNA object code. Not that I have any idea what these
abstractions might look like.

The article compares genomes to Windows, and I don't really know anything
about biology, but it might actually be easier to reverse engineer ourselves
than Microsoft's operating system, since (as far as I know) our "binary" is
much smaller :)

~~~
sungam
Ultimately it comes down to semantics but I think of DNA as source code. DNA is
already human-comprehensible. The organisation of genes within the sequence is
actually fairly straightforward and probably not that different from how you
would invent it from first principles given the constraints of the storage
medium. We can routinely mix and match existing sequence components with
predictable consequences. There is no need to invoke a higher level of
abstraction than this.

I would say that the object code is the set of RNAs present in the cell and
the program is the state of all of the macromolecules in the cell - proteins,
lipids, carbohydrates etc. Essentially the program is running on the organic
chemistry virtual machine.
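
As a toy illustration of that layering (DNA as source, RNA as object code, protein as what actually runs), here is a minimal sketch. It assumes the input is the coding strand, ignores splicing and regulation entirely, and uses only a small subset of the standard codon table. The function names and the example sequence are hypothetical.

```python
# Subset of the standard genetic code; "*" marks a stop codon.
CODON_TABLE = {
    "AUG": "M",  # methionine, also the start codon
    "GCU": "A",  # alanine
    "UGG": "W",  # tryptophan
    "AAA": "K",  # lysine
    "GGC": "G",  # glycine
    "UUU": "F",  # phenylalanine
    "UAA": "*", "UAG": "*", "UGA": "*",
}


def transcribe(coding_strand: str) -> str:
    """DNA coding strand -> mRNA: same sequence with T replaced by U."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons from the first AUG until a stop codon or the end."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing is made
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "?")  # "?" = not in the subset
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGCTTGGAAATAA"        # hypothetical 15-base "gene"
mrna = transcribe(dna)         # the "object code"
print(mrna, translate(mrna))   # the protein sequence is what the cell runs
```

The real pipeline has far more stages than this, which is where the complexities described next come in.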

When it comes to 'writing' artificial DNA, the complexities are, firstly, that
we do not understand the intricacies of how genes are turned on and off,
although there is no reason that we should not come to understand this. This
is the area that I am working in. The second major problem is that we cannot
'invent' new proteins, as we do not know how to predict the 3D shape and
chemical behaviour ('folding') from the protein sequence. With massive
increases in computational power and clever algorithms it is possible that
this difficulty will be overcome.

~~~
maxwell
Very interesting, thanks.

------
lst
We all are a mystery to ourselves, and will keep it till the end of our
days...

(Poor humans, they will never be able to stop God from smiling about their
relatively poor scientific investigations...)

;)

