I remember several years ago reading about "junk DNA" or "useless DNA" in sequences. Even then, I was certain that it probably wasn't "junk"; we just had yet to understand it. I wish we'd take that attitude a bit more with science journalism: "It doesn't make sense... YET".
I'm not sure where I encountered this hypothesis but I find it compelling. As noted by many, junk DNA, acquired from viruses and mutations and genome shuffling, is quite a puzzle. Why does it persist? It takes energy to copy, and misreading it can cause fatal or maladaptive mutations. From that perspective, it shouldn't persist (with slowly accumulating drift) for billions of years, as some shared junk sequences have across species. But it does.
Obviously, because it isn't junk; it is of value to the organism, even if it's not of any use right now, even if it's completely biologically inactive at present. It is still extremely high-entropy information: remnants of solutions other living systems once used, at some point, to solve the problem of staying alive.
If I were going to try and exploit genetic mutation to produce novel solutions to biological problems, I would start from an existing genome. In fact, I'd start with as much data, from as many organisms, as I could get my hands on and store. Perhaps we carry junk DNA because mutations in existing coded sequences, even mutated, currently useless ones, are far more likely to be functional, and so potentially a useful adaptation, than literal randomness. It's life's portfolio of solutions, badly photocopied little snippets accumulated over the years, and we all carry it around for future generations that might live in an environment where it's useful.
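A toy way to see the "badly photocopied beats literal randomness" point is sketched below. It assumes a hypothetical `target` sequence standing in for "something functional" and uses Hamming distance as a crude proxy for how many point mutations away a sequence is; real function is nowhere near this simple, and the lengths and substitution count are made-up illustration values.

```python
import random

BASES = "ACGT"

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def degrade(seq: str, n_subs: int, rng: random.Random) -> str:
    """Return a copy of seq with exactly n_subs distinct sites substituted."""
    out = list(seq)
    for i in rng.sample(range(len(seq)), n_subs):
        # Always pick a different base so every chosen site really changes.
        out[i] = rng.choice([b for b in BASES if b != out[i]])
    return "".join(out)

rng = random.Random(0)  # seeded only so the sketch is repeatable

# Hypothetical "once useful" sequence (an assumption, not real data).
target = "".join(rng.choice(BASES) for _ in range(300))

relic = degrade(target, 30, rng)                         # decayed old copy
noise = "".join(rng.choice(BASES) for _ in range(300))   # literal randomness

print("relic is", hamming(target, relic), "substitutions from the target")
print("noise is", hamming(target, noise), "substitutions from the target")
```

The relic is 30 substitutions away by construction, while a random string differs at roughly three quarters of its sites on average, so far fewer mutations are needed to get from the relic back to something target-like.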
You must have selective pressure on genome size for organisms to evolve mechanisms to reduce the "junk". The metabolic cost of carrying around the junk is small. The cost of cleaning up the junk comes from much more frequent accidental deletions/truncations of important sequences. Upending that equation requires massive selective pressure for a smaller genome - maybe something like a tardigrade that gets desiccated regularly? In any case no chance any vertebrate species would have that kind of pressure. You'd need insane offspring counts and short generation cycles to afford the selective pressure price.
The fact that we can tap junk at some future point is probably just an accidental side-effect... though there is another theory that claims having lots of junk provides some protection against environmentally-induced damage because most of the time it is a junk section that gets damaged. How's that for the next error protection algorithm: pad the message with mostly zeros so occasional bit corruption doesn't matter. Take that, Shannon!
If you want a specific example of this mechanism working: primate 3-color vision. In our two-color, blue-yellow-seeing ancestors the yellow pigment sequence got duplicated, then eventually slightly mutated. That's why the red and green receptors overlap so much yet blue is standing way off by itself. It is highly likely this started as a useless duplication and was carried around for a long time before one of the duplicates got mutated.
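The "pad the message with zeros" joke about junk absorbing damage can be put in rough numbers. This is only a sketch under a loud assumption: a fixed number of damage events per genome, placed uniformly at random, regardless of genome length. If damage instead scales with length (a per-base rate), padding buys nothing for the functional bases. The function names and base-pair counts are hypothetical.

```python
def functional_hit_fraction(functional_bp: int, junk_bp: int) -> float:
    """Chance that one uniformly placed hit lands on a functional base."""
    return functional_bp / (functional_bp + junk_bp)

def p_any_functional_hit(hits: int, functional_bp: int, junk_bp: int) -> float:
    """Chance that at least one of `hits` independent hits lands on function.

    Assumes the number of hits is fixed per genome and does not grow with
    genome size (an assumption of this sketch, not an established fact).
    """
    f = functional_hit_fraction(functional_bp, junk_bp)
    return 1.0 - (1.0 - f) ** hits

# Made-up sizes: same functional content, with and without padding.
for junk in (0, 90, 990):
    p = p_any_functional_hit(hits=5, functional_bp=10, junk_bp=junk)
    print(f"junk={junk:4d} bp  P(functional damage) = {p:.3f}")
```

Under that assumption, more padding means a smaller chance that any given batch of hits touches something that matters, at the cost of a message that is mostly filler.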
> It takes energy to copy, and misreading it can cause fatal or maladaptive mutations
Can maladaptive mutations really be caused by copying DNA that's not used much (as far as we can tell, like the DNA for endogenous retroviruses in our genome)?
From the perspective of the gene it makes sense - genes that are more successful at making offspring (aka getting copied) should be expected to prosper through natural selection.
There are parts that are almost certainly not under functional selection and provide no benefit whatsoever, with Alu sequences being the best candidate. Even in the case of Alu, they do seem to have some vague effect on regulation of transcription... although they're not what we would call "genes" or "regulatory regions".
In other cases, there are just lots and lots of duplicates of the same genes over and over. Other parts appear to be forges of gene creation- either through gene duplication and divergent evolution, or through some other mysterious mechanism we don't know yet.
Certainly, we've had parts that looked like they were nothing at all and ended up being very important, and other parts that looked like they were incredibly important, but were really just the side effect of some effective parasite.
It's sort of not even an interesting debate any more, as most of the initial positions everybody held changed once we interrogated more and better data.
There are also fairly strict limits, given human mutation and reproductive rates, on the amount of information that can be preserved in the genome. Most of the genome is therefore meaningless (although not necessarily useless). As this article points out, these regions allow for random creation of novel proteins.
Even for the “no benefit whatsoever” parts, is it not possible that they influence (and are possibly crucial to) the rest of the system just by providing spacing between other more-apparently-functional parts?
I’m thinking by analogy of executable programs that have runs of zeros. The zeros don’t necessarily do anything, but remove them and everything else is out of alignment.
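A minimal sketch of that analogy, assuming a made-up fixed-width layout where each record sits in an 8-byte slot padded with zero bytes (the record contents and slot size are purely illustrative):

```python
SLOT = 8  # hypothetical fixed slot width in bytes

def build_image(records: list[bytes], slot: int = SLOT) -> bytes:
    """Lay records out in fixed-width slots, padding each with zero bytes."""
    return b"".join(r.ljust(slot, b"\x00") for r in records)

def read_slot(image: bytes, index: int, slot: int = SLOT) -> bytes:
    """Read the record at a slot index, trusting the fixed-width layout."""
    return image[index * slot:(index + 1) * slot].rstrip(b"\x00")

image = build_image([b"ATG", b"GATTACA", b"TAA"])
stripped = image.replace(b"\x00", b"")  # "clean up" the useless zeros

print(read_slot(image, 1))     # b'GATTACA'
print(read_slot(stripped, 1))  # b'CATAA' - same reader, now misaligned
```

The zeros carry no content of their own, yet every reader that depends on position breaks once they are gone.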
I am open to the idea that "boring duplicated regions" perform some vague function through spacing. Some folks have proposed doing experiments where the spacers are removed, or replaced with other sequences, but they are extremely hard experiments to do properly (in a way that convinces the field).
We already know that enhancers "work at a distance" and it's not clear what "distance" exactly means, and it gets into complicated 3D structure of the genome inside a cell; see https://en.wikipedia.org/wiki/Enhancer_(genetics)
Personally I think that the best way to think about the genome is to unlearn most of the preconceptions you learned in genetics and instead think about it in terms of biophysics and development and machine learning: you'll never really be able to understand the true function of every little bit, but you can probably create an approximate model that explains the vast majority of biology with relatively few variables, and some deep models that contain all the necessary statistics to model these systems accurately.
It sounds like because there is a very complex 3D structure that the 'spacing' function could actually be extremely important. Far more so than zeros in machine code.
This link dives quite deep into what an Alu is, for those interested.
Alu elements: know the SINEs [short interspersed elements]
Alu elements are primate-specific repeats and comprise 11% of the human genome. They have wide-ranging influences on gene expression. Their contribution to genome evolution, gene regulation and disease is reviewed.
I love the analogy. Many times I think about the genome as a bunch of machine code it's my job to reverse engineer. That was a good part of my career- probably 20 years- before I realized the problem was that it's much too hard to actually "prove" anything about systems like genomes.
That object code has been heavily modified during runtime for billions of years. We have no access to the original source code for any of the patches, though at this point it would be incomprehensible anyway.
For folks interested in understanding the subject of junk DNA a bit better, there's an upcoming book [1] that might be worth checking out. The author's blog also seems to be interesting on this and related subjects.
Non-coding sequences have been understood as having some functions at least since the early 1990s. Because genome expression is dynamic, tracking the exact mechanisms of action of these sequences is challenging.
"Junk DNA" brought to you by the same geniuses that brought you "we only use 10% of our brain cells" and "the heart is where the spirit resides, the brain is just useless grey goo."
That was born of the widely held metaphysical position that evolution is purposeless. Given that assumption one would expect to find plenty of “junk” DNA.
Suppose the primary goal is survival, based primarily on efficient use of energy. A lot of evolution is about organisms becoming more efficient by adapting to their environment. Keeping unnecessary junk around is then inefficient, and we would expect organisms that lose it to benefit and outbreed the others.
Having our optic nerve run right through our retina producing a blind spot in order to capture an upside down and backwards image is pretty inefficient too. Evolution doesn't maximize efficiency, it maximizes good-enough-to-reproduce-ity.
Better adapted organisms are just that - better. Not perfect, or free of inefficiencies. And even a perfectly adapted organism might not be as good at adapting to changes in the very long term compared to one with "junk" DNA. Also, does unused junk in the DNA really hurt energy efficiency?