Is Most of Our DNA Garbage? (nytimes.com)
42 points by r721 on March 5, 2015 | 67 comments



As programmers we're in a unique position to get a sense of this, actually. It's very very old spaghetti code. Some of it is essential, there are even some cleverly re-used common routines, some of it is just ballast, some of it can't be taken out but isn't getting executed either, some of it now serves a different purpose, and none of it comes with annotations that allow us to easily find out which is which.


This is exactly how I like to think of DNA: a twisted mass of legacy code. Rife with Heisenbugs, the slightest change can have profound side-effects in seemingly unrelated subsystems. Even sections that are literal "junk" must be retained since everything uses GOTOs with hard-coded line numbers.


Well, let's refine that analogy a bit. Legacy code being used by BILLIONS of users, where even a single critical error is viciously excised. Non working code = death. So while it isn't clean, it is functional and relatively bug free. I do stress the relatively part.


Evolution by natural selection is the original genetic algorithm.


In addition, sometimes it isn't even "data". The physical structure and chemistry of a particular sequence should not be ignored. A simple example is tRNA[1] that has both data and chemical roles where one side matches the data on the mRNA while the other side binds to an amino acid. Many other examples are known, from purely structural features to numerous types[2] of messaging methods and gene activation/suppression structures.

DNA is not merely "data"; to make everything even more complicated, some of it is probably both data and structure... maybe even both at the same time.

[1] http://en.wikipedia.org/wiki/Transfer_RNA#Structure

[2] http://en.wikipedia.org/wiki/Non-coding_RNA


A good but limited analogy. It's important to remember one critical detail: DNA is "executed" in a massively parallel and stochastic fashion. Everything happens at once, all the state is analog, and everything interacts liberally.

That being said I think the basic gist of your comment is correct. It's very old spaghetti code... just hyper dimensional massively parallel spaghetti code.


Does that make each of us - or rather, natural selection - one big unit test?


I agree and have often thought that code is an interesting analogy to study to get insight into biological life. Since the widespread adoption of the internet we have an ecosystem in which different species emerge (commercial and open source software) which try to solve problems and then react to stresses within the environment and try to evolve. Projects are abandoned. New functions are piled on to existing code bases. Code is commented out.

I'd love to start a project to study this.


Very cool analogy.

At least we're getting to the point where we can flip switches and monitor for changes. Debug print!


Why do you think it's an analogy?


Because our DNA isn't a literal computer program written by humans in the past 50 years.


That comparison was never drawn, though. It was just observed that DNA is old (true in fact) spaghetti code (true in fact) with clever code reuse (true in fact), dead weight (conjectured but plausible), stuff with no apparent purpose that nevertheless can't be removed without destroying functionality (true in fact), stuff that used to serve one purpose but currently serves a completely different purpose (true in fact)... all of these phenomena can be directly observed in a working cell. Granted, "spaghetti code" is a value judgment, but I challenge you to find anyone who thinks it doesn't apply to DNA.


I wouldn't bet much on the idea that nature allows for that much garbage.


When you open a binary file in a text editor it looks like garbage.


I heard about this "98% of DNA is junk" claim a while ago. http://www.livescience.com/31939-junk-dna-mystery-solved.htm...

Right, because a single cell transforms into a whole animal (horse, giraffe, whatever) and that takes zero information. Really? Somehow I doubt it.

A multicellular organism is a really big, complicated state machine and the "junk" DNA is what's needed in order to go from a single cell to many billions of cells organized in the particular fashion that they are.

To suppose that everything that doesn't directly code for a protein is "junk" is incredibly arrogant; just because you don't currently understand something doesn't mean that it cannot be.

As an engineer I'm guilty of this all the time; something I don't know how to do is impossible but once someone explains to me how to do it, it's trivial. I've had this happen enough times that I can now recognize that my "impossible" isn't necessarily actually impossible, just that it might be impossible to me right now.


Don't underestimate the creative capacity of small (to encode) algorithms. Perhaps development uses something like simulated annealing to create structures which match required properties. If it works like this, an exact template is not required to be encoded in the genes. In other words, the ultimate source of the design is thermal randomness, but guided by constraints encoded in the genes.


Well we're starting to see Turing vindicated on the way that a lot of biological stuff happens. http://genomebiology.com/2013/14/1/101

Sure it doesn't have an exact template of _precisely_ how to do everything, but there's a lot of information there. Maybe it's not huge relative to the amount of theoretical information DNA contains, but I strongly suspect that the 98% is junk claim will eventually be found to be laughably wrong.


At the same time Pine tree genomes are 20 times the size of our own, but have a similar number of coding genes. Do we really think all of the extra DNA is needed?


Doesn't mean all the code that was used to compile that binary is used though.


Binary files didn't evolve.


The current state of things seems more like opening ASCII source code in a hex editor to me.

We can sequence DNA, but that raw data isn't something that we can mentally digest very easily at all.


Flowering plants have huge amounts of DNA, 1-2 orders of magnitude more than mammals. It's like they haven't evolved subroutines or macros or something comparable.


Plants are an interesting case. It's a pretty simple theory - plants need more DNA because they can't change their location. If conditions change around them, they can't just get up and move like an animal. Instead, they have to have dormant DNA that doesn't get activated unless the environment changes. They can't change their locations, so they have to change themselves.


Plants tend to be polyploid[1], sometimes to a rather ridiculous degree. 6x or 8x (or more) full copies of the genome are common, compared to the two copies ("diploid") in most animals. This has been going on for a long time[2], so some of the older duplicate copies diverged over time, leaving "near duplicates" in the genome that now serve new purposes.

Maintaining these "extra copies" of the genome costs energy (lowering survivability), but you get something very loosely like RAID mirroring allowing the genome to survive more errors. Having more than a single backup copy also allows a single plant species to have a lot more variation, as alternative genotypes can be saved across generations in the "backup copies".

[1] http://en.wikipedia.org/wiki/Polyploid

[2] http://en.wikipedia.org/wiki/Paleopolyploidy


The idea that nature may have evolved coding design patterns to make "cleaner more efficient" code is amazing.

Are flowering plants just written in a language that doesn't have "for loops" so everything is just written out line by line?


No. The compactness of the DNA "program" is an evolutionary trade-off, just like in engineering. For a species to have significantly more compact code now, there must have been some evolutionary benefit to compactness among its ancestors.

http://en.wikipedia.org/wiki/Genome_size#Genome_reduction_in...


Interesting, and from the same page, Drake's rule says that similarly to programming, the larger the "codebase", the slower the mutation (new feature launch) rate.

http://en.wikipedia.org/wiki/Genome_size#Drake.27s_rule


I'm excited to see the headline "Nature's usage of macros in insect DNA" appear on hacker news next week.


You might get a kick out of insect patterning based upon morphogen protein concentration gradients then...

http://en.wikipedia.org/wiki/Drosophila_embryogenesis


There's an interesting recent manuscript that subdivides the obsolete "junk DNA" classification into things like "garbage DNA", "rubbish DNA", "Lazarus DNA", "zombie DNA", etc.

http://gbe.oxfordjournals.org/content/early/2015/01/28/gbe.e...


This reminds me of a C program I once saw. Certain functions in the code were dependent on the stack order. If you moved anything around the (10mm+ LoC) app would fail to compile. Therefore any time anyone wanted to change these functions they would just append more code to the bottom, never touching any of the existing code. Most of it was garbage, but if you touched it at all, everything would break.


While there might be some signal-to-noise ratio, I highly doubt we have any junk DNA. I imagine a good modern analogy here would be evolvable hardware: http://en.wikipedia.org/wiki/Evolvable_hardware and http://www.damninteresting.com/on-the-origin-of-circuits/

In the two articles I mentioned, the "offspring" circuit worked in an extremely baffling (complex) way, relying on physics well beyond human design (for example, the disconnected logic gates were still necessary for it to function). I think that gives insight into some of the possible complex interactions in DNA. They look like unnecessary garbage until you try to remove them and break a ton of stuff.


According to the article:

    On average, each baby is born with roughly 100 new mutations.
    If every piece of the genome were essential, then many of those
    mutations would lead to significant birth defects, with the defects
    only multiplying over the course of generations; in less than a
    century, the species would become extinct.
The point this article is making is that new techniques are finding that some portions of DNA assumed to be junk might have a use, and removing or changing them certainly does have an effect. But that doesn't mean we should jump ahead too far and say that none of the DNA is junk, because there are obviously some things that have no effect when changed.


I wouldn't be surprised if some of it really has no effect and that's the whole point. As you mentioned if there's 100 new mutations per baby then if the coding is as dense as possible that means each mutation will cause some significant change. If it's got a lot of coding that doesn't do anything, by probabilities it'll likely not change anything important and cause no ill effects. Having that extra unused DNA could pose an evolutionary advantage in the face of an imperfect copying mechanism.


Assuming the mutation probability per base doesn't depend on the length of the copied DNA, copying 40 MB will probably introduce 10 times more mutations than copying just 4 MB. Only 10% of them will on average change the important code, so the end result is the same.

So I don't see how NOP padding helps.
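
A quick back-of-the-envelope check of that arithmetic, as a sketch only. The per-base rate and the sizes below are made-up illustration values, not measurements; the one assumption carried over from the comment is that every base mutates independently with the same probability.

```python
def expected_hits(genome_bases, functional_bases, rate_per_base):
    """Expected mutations overall, and expected mutations landing in functional code.

    Assumes every base mutates independently with the same probability.
    """
    total = genome_bases * rate_per_base
    # Fraction of all mutations that fall inside the functional part.
    functional_fraction = functional_bases / genome_bases
    return total, total * functional_fraction

RATE = 1e-6             # hypothetical per-base mutation probability per copy
FUNCTIONAL = 4_000_000  # hypothetical amount of "important" sequence

compact_total, compact_hits = expected_hits(4_000_000, FUNCTIONAL, RATE)
padded_total, padded_hits = expected_hits(40_000_000, FUNCTIONAL, RATE)

print(compact_total, compact_hits)
print(padded_total, padded_hits)
```

Under these assumptions the padded genome collects ten times as many mutations in total, but the expected number that land in the functional part comes out the same in both cases.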


I see. That's a good point; it's difficult to say that DNA has no junk. So I'm going to avoid the categorical issue of junk, non-junk, essential, or maybe redundant (but useful in a replication failure).

My thoughts are this: DNA is not used just for an individual's survival but also as a mechanism to allow adaptation at some higher level. It's like how some deep learning AIs learn to play games by making random moves and getting feedback from the score changing. For DNA, we get random mutations/moves in the form of transcription errors, replication errors, or meiosis, and the feedback comes from successful reproduction (surviving to play the game again).

So what we call junk DNA, might actually play a large role in this sort of "learning" mechanism.

I hope I explained that decently. A lot of my thoughts come from this book http://en.wikipedia.org/wiki/The_Selfish_Gene


I don't get it; the choice isn't between everything being essential and lots of the genome being junk. Redundancy mechanisms like multiple copies of genes, which we know are there, reduce the risk of harm from mutations. Why can't most of the stuff in the genome have the label "useful, but we have backups", rather than "essential" or "junk"?


What if we have wild mutations AND exceptionally good exception handling?


We do. We have two mechanisms for this.

First mechanism - to go into production you have to pass unit tests (race to the egg). It's only 1 cell, so either it works or not, and if you were the fastest you probably didn't have most of the really bad mutations.

Then after you are in production you run millions of copies of the code, and to be safe against mutations there's something like Erlang exception handling - if something's wrong, kill the process and start a new one.

The problem is - sometimes the process gets so wrong it doesn't respond to kill signal.


>if you were the fastest you probably didn't have most of the really bad mutations

How did they arrive at this conclusion?


The mutations that make your cell die will prevent you from being the first to the egg.

The mutations that make you immobile - the same.

The mutations that mess with your instincts and reactions to chemical enzymes - same (you won't go the right way).

And so on.

It's a little like unit testing - it checks all the basic routines.


> I highly doubt we have any junk dna

Telomeres (the HLT HLT HLT HLT HLT... sequence at each end of a chromosome) are pretty well documented. They can be reasonably analogized to those 404 pages with a comment along the lines of

    <!-- Your browser won't show you our custom 404 page 
         unless it beats a size threshold, so just to be sure,
         we've put the text of War and Peace in here.
         Have a nice day!
         [...]
         -->
They're totally necessary if you don't want to quickly suffer a horrible death, but there's definitely also a sense in which they're junk.


It seems to me, that if there was a large amount of "junk DNA" in our chromosomes, it would be advantageous for much of it to be at the ends, preceding those telomeres. That way, when you start to run low on telomeres, you would still have quite some time before you start losing "important" DNA and suffer that horrible death.

It's also possible that I am misunderstanding how telomeres work.


I'm not sure of that analogy. Somehow telomeres are thought to be connected to aging. http://www.progeriaresearch.org/cbs-monday-night.html Cancer cells can extend their telomeres, and are also immortal. I imagine them to be some sort of internal counter that other components of the system listen to.

I guess I also have a hard time figuring out what people are trying to say when they call parts of the genome "junk".

Perhaps what's "junk" is contextually dependent? One human's trash is another human's treasure?


> Somehow telomeres are thought to [be] connected to aging

The mechanism here isn't a mystery. As I originally pointed out, you'll die when your telomeres are gone from loss of DNA. Every time a cell divides, a certain amount of DNA is lost from the ends of each chromosome. Imagine if every time you copied a file 4 kilobytes were lopped off the beginning and end of both copies. Telomeres are padding (again, they code HLT HLT HLT HLT HLT HLT HLT HLT HLT...) that can be lost without affecting the functioning of the cell.
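
The file-copy picture above can be sketched in a few lines. This is a toy model only: the string chromosome, the `loss` of 4 units per end, and the function names are all hypothetical illustration choices, not biology.

```python
def divide(chromosome, loss=4):
    """Return a copy of the chromosome with `loss` units trimmed from each end."""
    return chromosome[loss:len(chromosome) - loss]

def divisions_until_damage(telomere_len, loss=4):
    """Count the divisions that can happen before the payload itself gets clipped."""
    payload = "GENE" * 5  # stand-in for the sequence that actually matters
    chromosome = "T" * telomere_len + payload + "T" * telomere_len
    count = 0
    # Keep dividing as long as the next copy would still contain the full payload.
    while payload in divide(chromosome, loss):
        chromosome = divide(chromosome, loss)
        count += 1
    return count

print(divisions_until_damage(telomere_len=40))
print(divisions_until_damage(telomere_len=8))
```

In this toy version the number of safe divisions is just the padding length divided by the loss per copy, which is the sense in which the padding can be lost without touching anything functional.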


I do agree with you; that is the general scientific consensus. I made that statement vague because new data is emerging about the telomere network, so I'm not sure what the current correct way to think about telomeres is.

Some new research papers are emerging that question the existing model we have about telomeres. http://journal.frontiersin.org/article/10.3389/fgene.2015.00...

Quote from that paper "summarizing recent research findings, Trusina (2014) conclude that recently obtained knowledge ‘shift the telomere paradigm from a simple clock counting cell divisions to a more complex process recording the history of stress exposure within a cell lineage.’ "

So I find it reasonable to think that telomeres could be more than just padding, like a clock or history of stress exposure in a cell lineage.


This isn't totally unreasonable, but bear in mind that the information-theoretic information content of telomeres is almost zero (quoting from wikipedia, "For vertebrates, the sequence of nucleotides in telomeres is TTAGGG"). There's very little room for information to be recorded.
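
One rough way to see the "almost zero information" point is to compare how well a pure repeat compresses against a random sequence of the same length. This is only a sketch: compressed size from `zlib` is a crude proxy for information content, and the sequence length and seed are arbitrary choices.

```python
import random
import zlib

def compressed_ratio(sequence):
    """Compressed size divided by raw size; a rough proxy for information content."""
    raw = sequence.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)

random.seed(0)  # fixed seed so the run is repeatable
n = 60_000
telomere_like = "TTAGGG" * (n // 6)  # the vertebrate repeat quoted above
shuffled = "".join(random.choice("ACGT") for _ in range(n))

print(f"repeat: {compressed_ratio(telomere_like):.4f}")
print(f"random: {compressed_ratio(shuffled):.4f}")
```

The repeat should shrink to a tiny fraction of its size while the random sequence cannot go much below 2 bits per base. What this does not capture is the length of the repeat run, which is the one quantity that could plausibly act as a counter.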


Couldn't agree more. Evolution as a process has no regard for simplicity or elegance. The solutions that Natural Selection "comes" up with are random. I guess you could say that all organisms have a design, but that the design is irrational.

A lot of people expect that at the molecular level life is organized and sensible. But why should that be the case? On the macroscopic level, living organisms are a mess. It is my belief that at the biochemical level the chemical reactions are as organized as the branches on a tree---that the chaotic design of life at the microscopic level mirrors its design at the macroscopic level.


It's best to throw out all these value judgements.

Evolution isn't purely random. There's nothing random about what works and what doesn't. Over time, differential selection and other effects (e.g. sexual selection) work to transfer information about the environment into the genome. Evolution is a learning process. It's just that the representation that it generates is difficult for us linear thinkers to interpret.

It only appears irrational from an anthropocentric point of view. It's not rational to us because we, having brains that work in a certain way, are biased toward seeing things as linear chains of cause and effect. That's how we think and that's how we like to build stuff, but obviously that's not the only way stuff can be built.

For all we know elsewhere in the universe there are beings that think in massively parallel super-holistic causality-matrix terms. To them evolution's designs would seem perfectly rational, while a UML block diagram would seem insane. "Nothing has only one cause or one effect," they would mutter... in a language in which every word implies everything to varying degrees and each syllable of a sentence must be parsed in parallel with all others.


I think parent meant that new mutations are random. Or at least look random.


> Couldn't agree more. Evolution as a process has no regard for simplicity or elegance. The solutions that Natural Selection "comes" up with are random. I guess you could say that all organisms have a design, but that the design is irrational.

Except of course that evolution comes up with solutions that are so clever and so intricate that we, with all our scientific brains, have barely scratched the surface of understanding it. Now if it acts intelligently, walks intelligently and quacks intelligently, then maybe we should just call it intelligent?


I spent a long time playing with genetic programming systems a number of years ago, and started to get an intuition for evolution and how it "thinks."

It can be thought of as an alien form of intelligence. It "learns" and "thinks" but not in a human-like way. It does not design things the way we do.

Argumentum ad Lovecraftium: http://lesswrong.com/lw/kr/an_alien_god/


Sounds like the god galaxy from Futurama that Bender talks with when he is lost in space.


Evolutionary algorithms make use of randomness, but evolutionary solutions are neither random, nor irrational.

Non-deterministic, yes. Unpredictable, sure. Chaotic... you betcha.

Random... certainly not.

And not irrational - the process is driven to optimize for survival, and selections occur accordingly. It meets the economic and game theory definitions for rational behaviour.

The rationality is not driven by a sentient intelligence, but the result could arguably be called a limited form of intelligence, and we know it can certainly give rise to expressions of intelligent systems.
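
A toy illustration of "uses randomness, but the result isn't random", as a sketch rather than a model of real evolution. It is a minimal mutate-and-select loop on the classic OneMax problem (fitness is the number of 1 bits); the genome length, mutation rate, generation count, and seed are arbitrary assumptions.

```python
import random

def fitness(genome):
    """Toy fitness: number of 1 bits (the 'OneMax' problem)."""
    return sum(genome)

def mutate(genome, rng, rate):
    """Flip each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in genome]

def evolve(length=32, generations=2000, seed=1, select=True):
    """Run one lineage and return its final fitness."""
    rng = random.Random(seed)
    parent = [rng.randint(0, 1) for _ in range(length)]
    for _ in range(generations):
        child = mutate(parent, rng, rate=1 / length)
        # With selection, a child replaces the parent only if it is at least as fit.
        # Without selection, every child replaces the parent (pure drift).
        if not select or fitness(child) >= fitness(parent):
            parent = child
    return fitness(parent)

print("with selection:", evolve(select=True))
print("drift only:    ", evolve(select=False))
```

Both runs draw on the same kind of random mutations. The only difference is the selection step, and that is what pushes the first lineage toward the maximum while the second just wanders.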


I highly recommend reading "DNA seen through the eyes of a coder" at http://ds9a.nl/amazing-dna/

It is a very nice explanation, shows many similarities, while also making Intercal seem sane.


The way I see it is DNA is much like a very long running program meant to run for millions to billions of years. Our current life is an instance of some code in the DNA. Due to the longevity of the purpose of DNA, and our short span in an instance, it is very, very difficult to ascertain what some parts may be about, or what it's for. Not to mention the high compressibility of information in DNA, some aspects might never really reveal themselves unless met with an instance in an environment it is "meant to run in".


Didn't they recently realize that huge swaths of "junk" DNA are actually used to regulate RNA expression? So, uh, not junk?

Not the original article I'm thinking of, but related:

https://www.uam.es/personal_pdi/ciencias/genhum/bibliogenoma...


You state it as fact but it remains a matter of heated debate. If all the DNA were important then why are large swaths of DNA allowed to undergo genetic drift?


Why then do pine trees, with a similar number of genes to our own, have a genome nearly 20 times larger? Are you assuming they are more regulated?


Perhaps when the right conditions arise the "junk" turns out to be useful after all?


I thought the introns were not well preserved between generations and that this is strong evidence that they really are junk.


I am surprised at the volume of agnostic comments here. Before I begin, let me explain my position. I believe in God, and believe in evolution as coming from God himself. Second, I distrust any behavior of arrogance or pride that comes from an I-know-it-all person. Third, the very people who often do create, or innovate, or discover are far more humble and more curious than the previous people I described.

Having said that, there is some fanboy culture and attachment around science and DNA tinkering. Some "discoveries" and claims sometimes come from such fanboys. People on HN are no different. In fact, the majority of the comments here read as coming from fanboys.

Now to the topic of DNA: DNA, if a product of evolution, could not be mostly composed of garbage because nature has its own "garbage collector." Much of the DNA is important and significant in some shape or form. It won't be apparent in a simple experiment, but it may be apparent over a lifetime or maybe over generations.

There is evidence to back this up: a recent NPR show, I forget which, claimed that your grandfather's hunger at the age of 9 influenced your chances of heart attack. This brings up DNA's influence across lifetimes. And the complications are so much deeper.

Our fight to try to understand DNA is incredible. We are not there completely. And maybe this is the longtail part of science. But we will understand it more and more, and imagine the information we extract then. I bet much of it will debunk everything we so arrogantly claim today.


> nature has its own "garbage collector."

Ah, no, it doesn't. If you think "survival of the fittest" will somehow magically cull non-functional, non-deleterious DNA out of the genome (except by accident), then you don't understand evolution.

It's kind of hypocritical to call others "arrogant" when you're opining on a topic you clearly don't grasp yourself.


Nope. Not being a hypocrite here. DNA that is not needed will wither away. And I do understand evolution. Quite well actually, simply because I have debated it far more rationally than those who have outright rejected it and those who have unquestioningly accepted it. And there happen to be a lot fewer mysteries with my explanation.

Think of DNA as computer code. Think of each creature as a robot with computer code built in. Now think of DNA as also having the code to recreate the creature. Overloaded DNA will collapse. Or incredibly sophisticated biology will be able to handle overloaded DNA. Either way, too much complexity breaks. You may call it "Survival of the fittest" implying chaos theory but I call it harmonious morphology implying benign force.


> DNA that is not needed will wither away.

You are 100% wrong about this. DNA that is not needed does not "wither" away. If it's benign, it'll most likely stick around.

> Overloaded DNA will collapse.

This sentence doesn't make any sense. How do you "overload" DNA? What does "collapsing" look like?

> Either way, too much complexity breaks.

What is "complexity"? Can you measure it? How much is too much? What "breaks", and how?

> You may call it "Survival of the fittest" implying chaos theory

What does chaos theory (roughly, the idea that small changes in initial conditions have dramatic effects) have to do with anything we're discussing?

> but I call it harmonious morphology...

Please define "harmonious" and describe how to measure it.

> ...implying benign force

I suspect you started with benign force and worked your way backwards, ignoring any inconvenient evidence.


> You are 100% wrong about this. DNA that is not needed does not "wither" away. If it's benign, it'll most likely stick around.

Why would it stick around if it is not needed? Why wouldn't it be free to morph if it did not matter? Why wouldn't it change if it did not affect the being? Evidence exists that DNA changes due to stresses in environments, so hereditary (e.g. the parent has it, so the child must) is not sufficient to explain why garbage DNA is still transferring over generations later.

> This sentence doesn't make any sense. How do you "overload" DNA? What does "collapsing" look like? > What is "complexity"?

If a creation of an object relies solely on DNA, then there must be some sort of ignore protocol built into the creation factory for the DNA part that is meaningless. Overloaded DNA has excessive meaningless data. An enormous amount of complexity would be required within the creation factory to handle this overload.

> Chaos theory ...

The idea that random changes occur and the only changes that survive are the ones that can still adapt to their environment despite or in spite of the change.

> Please define "harmonious"?

Harmonious would mean the very stable and smooth transitions of creatures to change from one to another without there being some sort of "noise" e.g. rapid fluctuations in genes. If creatures can transition uniformly to another creature, why not several different creatures across several different environments (sort of like man-made innovations in isolated environments). Failures and successes would both appear on graphs.

> how to measure harmonious? how to measure complexity?

Measuring harmonious is akin to measuring cold energy. You cannot measure something that does not exist, but instead you measure it by the lack of its opposite. (you measure cold by the lack of heat). Complexity is only apparent when things break. If you don't see things breaking, you cannot witness complexity.

> you started with benign force and worked your way backwards, ignoring any inconvenient evidence.

Actually, it's the other way around. A researcher must constantly battle the concept of unknown/benign/hidden/dark energy/force/matter to figure out why things work the way they do. A researcher is always digging, always denying, the concept of an unknown force, in finality to achieve the phrase "That's how!" I call it peace at a certain point. But for a person with an agenda, the inconvenient fact may be that there is an unknown force. Sometimes, these things are cloaked as laws of physics, that some events other than this event just cannot happen. Other times, they are called theories, until they are met with an enigma, where then it is broken.


Your thinking on this matter is incredibly messy and not guided by rigorous investigation of the facts.

> Why would it stick around if it is not needed?

WHY WOULDN'T IT??? It's not expensive to carry junk DNA around - it just sits there, not doing anything, not creating proteins, not affecting anything. You're proposing that there's some sort of janitor going through and cleaning up the nonsense and it's just not true. If there is such a mechanism, point to it. Name it. Tell me what it is. Tell me what the evidence of it is. Just link to a wiki page!

I think the problem is that you're stuck on thinking about DNA as procedural program code instead of on its own terms, how it actually operates.

I'll try to explain a few things, in a last ditch attempt to appeal to your intellect:

DNA gets transcribed to mRNA based on START and STOP codons (codons being sequences of three bases). Then the codons in-between START and STOP on the mRNA strand get translated into amino acid chains, also known as proteins. The "ignore protocol", such as it is, is then twofold: 1) if a sequence doesn't occur between START and STOP, it's highly unlikely to ever get transcribed into mRNA, and 2) if an mRNA contains nonsense codons, or its number of base pairs is not divisible by three, then the resulting protein is most likely benign, or at least not harmful enough to prevent the organism from producing offspring and therefore propagating the harmful sequence. (If it is harmful enough to affect reproductive viability, then it doesn't tend to stick around very long.)
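
The START/STOP reading rule can be sketched as a tiny translator. Treat this as an illustration under assumptions: in the textbook picture START and STOP codons delimit the translation of mRNA into protein, which is the step modeled here; the codon table is a small subset of the standard genetic code, and the input strings are invented.

```python
# Small subset of the standard genetic code, enough for the demo below.
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly",
    "GCU": "Ala", "AAA": "Lys", "UGG": "Trp",
}
STOP_CODONS = {"UAA", "UAG", "UGA"}

def translate(mrna):
    """Translate from the first AUG up to the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon: nothing is translated
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        # Codons missing from the demo table are marked rather than guessed.
        peptide.append(CODON_TABLE.get(codon, "???"))
    return peptide

print(translate("CCCAUGUUUGGCUAAGCUAAA"))  # → ['Met', 'Phe', 'Gly']
print(translate("CCCUUUGGC"))              # → []
```

Everything before the first AUG and after the stop codon is simply never read by this function, which is the "ignore protocol" in miniature.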

Not all DNA does anything. Plenty of it is just hanging on for the ride, not harming anyone, and so not selected against by natural selection.

> I call it peace at a certain point.

I call it giving up the search. Just because you don't understand something doesn't mean you give up trying to understand.



