Akaogi et al. (2006) Role of non-protein amino acid L-canavanine in autoimmunity. Autoimmun Rev. 5(6):429-35. https://www.ncbi.nlm.nih.gov/pubmed/16890899
Nunn et al. (2010) Toxicity of non-protein amino acids to humans and domestic animals. Nat Prod Commun. 5(3):485-504. https://www.ncbi.nlm.nih.gov/pubmed/20420333
Perhaps "non-proteinogenic" is a poor choice, or not exactly correct, but readers generally understand what is meant.
It's a bit like responding to an article about "why do all pendulums keep the same frequency as they lose energy" with "but they are neglecting relativity!". It just isn't really relevant to the research question, which is "why are some amino acids present in the DNA code, and why is there consistency across species".
That is, what's in the DNA is codons, and what amino acids get incorporated depends on tRNAs etc, plus what amino acids are around (synthesized or ingested).
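To make that concrete, here's a minimal Python sketch of the codon-to-amino-acid lookup that tRNAs implement in vivo. The table below is a toy containing only a handful of codons, not the full 64-codon standard code:

```python
# Toy codon table: the DNA/mRNA stores codons; which amino acid each codon
# yields is determined by the decoding machinery (tRNAs + synthetases).
# Only a few entries here, for illustration.
CODON_TABLE = {
    "AUG": "Met", "UGG": "Trp",
    "AAA": "Lys", "AAG": "Lys",
    "UAA": None, "UAG": None, "UGA": None,  # stop codons
}

def translate(mrna):
    """Translate an mRNA string codon by codon until a stop (or unknown) codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE.get(mrna[i:i+3])
        if residue is None:  # stop codon, or a codon missing from this toy table
            break
        peptide.append(residue)
    return peptide

print(translate("AUGAAAUGGUAA"))  # → ['Met', 'Lys', 'Trp']
```

The point of the sketch is that the mapping lives in the lookup table, not in the message itself — which is exactly the codon/tRNA distinction above.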
Anyway, I get that I was being a little pedantic. But in the paper referenced by TFA, I see that they studied three nonproteinaceous amino acids: ornithine, 2,4-diaminobutyric acid, and 2,3-diaminopropionic acid. Canavanine is much closer to arginine, with a methylene replaced with an ether bridge. I wonder how well it would have polymerized.
Edit: Upon reflection, "there are just 20 proteinaceous amino acids" and "AS/400 is a database" are ~similar simplifications.
The title, of course, refers to the 20 proteinogenic amino acids.
Selenocysteine (https://en.wikipedia.org/wiki/Selenocysteine) occurs in proteins and is quite widespread. "Selenocysteine: the 21st amino acid." (https://www.ncbi.nlm.nih.gov/pubmed/1828528) It shows up in humans too (https://www.ncbi.nlm.nih.gov/pubmed/24194593).
Pyrrolysine (https://en.wikipedia.org/wiki/Pyrrolysine) occurs in proteins in archaea and bacteria. "A new UAG-encoded residue in the structure of a methanogen methyltransferase." (https://www.ncbi.nlm.nih.gov/pubmed/12029132)
So in that context the fact they are so dominant is the key part of the question, not that they are the only ones capable?
This is a tautological statement: the ones that still exist by definition are the ones which outcompeted the alternatives, because that's what outcompeting means.
The question being addressed by the study is roughly "by what selection criterion did the survivors outcompete the others?"
I guess it's just nitpicking the headline and is not particularly relevant to the article or the research question (as the OP admitted).
But TFA isn't about biology. At least, primarily. It's about emulating chemistry that led to the first life forms.
But upon reflection, it's arguable that the paper would have been more interesting if they had included L-canavanine, in addition to L-arginine and three analogs with shorter cationic side chains. Because we know that it can be incorporated during protein synthesis, in place of L-arginine. But none of their analogs can.
So if L-canavanine had been incorporated into peptides in their tests, more or less as well as L-arginine, that would arguably have ruled out their hypothesis.
Which then screw up animals that eat them. Because they, in turn, also incorporate them into proteins. And those tweaked proteins elicit immune responses, because they weren't present during immune system development. And some of that immune response cross reacts with the normal host proteins. So you get autoimmune disease.
It seems these amino acids aren't typically incorporated, but rather appear to serve as defense (https://www.ncbi.nlm.nih.gov/pubmed/21529857) and signaling molecules (https://www.ncbi.nlm.nih.gov/pubmed/28218981). Apparently humans do that sometimes as well (https://www.ncbi.nlm.nih.gov/pubmed/18828673). Sometimes the insects even end up using the toxic compound (in the linked example the non-protein amino acid L-DOPA) for their own purposes (https://www.ncbi.nlm.nih.gov/pubmed/27006098), which I find quite amusing.
More than just being eaten, apparently L-DOPA is released into the environment at an impressive rate in some cases (https://www.ncbi.nlm.nih.gov/pubmed/24598311).
Edit: I see your earlier comment has some links. Thanks.
This is just a fact. All life that we know of has the same origin. Common descent is a core tenet, and one of the best established facts, of modern biology (https://doi.org/10.1038%2Fnature09014). For there to have been multiple distinct ancestors, we would need to see some relevant differences in the core machinery of life. And the fact is that, despite minor variations, we simply don’t see any. The odds of there being multiple distinct ancestors are astronomically small, and such a finding would fundamentally change our understanding of modern biology.
> and b) no organism has drifted in the millions of years of evolution
Organisms have drifted. That’s what evolution is. But evolution plays by rules, it can’t just change things willy-nilly. Changes to the core machinery would presumably either break it outright, or be so strongly detrimental as to be purified away almost immediately. Of course many genetic changes are deleterious to some extent but most effects can either be buffered for a short time because they are minor, or they confer some other advantage. In the core machinery of the cell this is much (!) less tolerated because even tiny chemical inefficiencies would immediately be amplified millionfold. Odds are, the DNA/RNA core machinery of all life sits in a steep local optimum. It’s not necessarily a global optimum but it’s essentially impossible to evolve out of because any individual change, or even a handful of coincidental changes, would leave the organism a lot worse off.
It's taken that (known) modern organisms descend from a common set of ancestors, but the tree of life isn't a tree. Organisms diverged and merged multiple times along the way to the modern world.
It says nothing about how that/those points of common descent came to be.
What originally constituted living things probably weren't very good at living by modern standards. They probably leaked like sieves, and they probably traded RNA, polypeptides, and other small molecules back and forth.
When was this soup alive, and when wasn't it? I suspect it's just a continuum, and that complicated soup probably went back and forth across that grey zone of living-nonliving many times.
That’s correct but I don’t see what this has to do with my comment. It’s still a fact that all modern life, at some point, came through the same individual organism (EDIT: this should be population) which, furthermore, already possessed the fundamental machinery of DNA replication, RNA transcription and protein synthesis (amongst other things).
EDIT: I misunderstood. Yes, you’re right: due to lateral gene/molecule transfer, it’s not certain that the last universal common ancestor was an individual cell, and at that time the label “individual” probably didn’t make much sense (although the paper I linked argues strongly that it was in fact one single cell).
This is a contentious statement that I don't think is established - a fact that the grandparent comment is trying to raise.
but that doesn't mean it's been proven through experimentation, does it? isn't it just an assumption?
what if there's something akin to mathematical uniqueness in the mechanics of the core machinery?
I mean it in the sense that it's either the way it is, or it's not really viable over a long enough period of time? such that it could have multiple origins which all nonetheless converge on the same core mechanism?
It’s not an assumption, it’s backed up by excellent evidence — see the paper I linked.
> what if there's something akin to mathematical uniqueness in the mechanics of the core machinery?
Well that’s clearly not the case, we can trivially (…) design self-replicating machines that have completely different mechanics, as a thought experiment. More to the point, we can change parts of the machinery. For instance, we can take the universal genetic code and, with effort, change it into something completely different (by just swapping all codons around). The result is just as viable, but doesn’t exist in nature. In fact, the observed universality of the genetic code, in itself, is already seen as sufficient evidence for common descent (and then some).
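As a toy illustration of that codon-swapping thought experiment, here's a Python sketch (tiny made-up table, not the real 64-codon code) that permutes the assignments and checks the result is still a complete, internally consistent code:

```python
import random

# "Swap all codons around": permute the codon -> amino-acid assignments.
# The permuted code is internally consistent, so translation under it is
# just as well-defined, even though no organism uses it.
table = {"AUG": "Met", "UGG": "Trp", "AAA": "Lys", "GGC": "Gly"}

rng = random.Random(0)  # fixed seed so the shuffle is reproducible
codons = list(table)
shuffled = codons[:]
rng.shuffle(shuffled)
permuted = {old: table[new] for old, new in zip(codons, shuffled)}

print(permuted)
# Same codons, same amino acids, different mapping:
assert set(permuted) == set(table)
assert sorted(permuted.values()) == sorted(table.values())
```

Nothing about the chemistry of the lookup forces one particular assignment, which is why the universality of the observed assignment is such strong evidence for common descent.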
Or, some would argue, evidence of a common designer.
The question seems the same as, could an AI in a virtual world deduce or prove its artificialness or the reality outside the virtual?
To me that’s more interesting than the simple credulity implied by moon cheese.
I don't think there are any known to be radically different. Note that it's possible that life arose several times but only the lineage of that one cell survived, perhaps? In any case, it's not like there's been no variation.
Now, here's a mystery (AFAIK): Just where is the genetic code stored? I seem to recall reading an article a while back about how while this seems like an easy question it's actually not known. I can't seem to find it at the moment, though. Anyone know more about this?
The synthetases are also encoded in DNA, so the fundamental point, that the code is encoded in DNA, stands.
Discussions of potential errors in gene expression rarely consider the genetic code itself as a source of error.
It's just difficult to study for various technical reasons related to how we sequence it.
Sometimes the mistranslation is even intentional! (https://www.ncbi.nlm.nih.gov/pubmed/25220850)
As you mentioned though, there are also quite a few safeguards.
* tRNA synthetase example: https://www.ncbi.nlm.nih.gov/pubmed/27226603
* trans-editing factor example: https://www.ncbi.nlm.nih.gov/pubmed/28737471
Damaged tRNA is even repaired (a bit, sometimes). (https://www.ncbi.nlm.nih.gov/pubmed/28901837)
From an interesting (2018) review of tRNA in general (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6103721/):
> Surprisingly, a perfect proteome is not a pre-requisite for cellular viability even in the context of human cells. Lant et al. demonstrated that a single tRNA mutant can lead to significant mistranslation in human cells. This was accomplished by expressing an Ala accepting tRNAPro G3:U70 variant in HEK 293 cells. The authors visualized a rate of ~ 3% mistranslation using a novel green fluorescent protein (D129P) reporter that fluoresces in response to mistranslation at proline codons. In contrast to previous studies in yeast, human cells in culture did not mount a detectable heat-shock response and tolerated the mistranslation without apparent impact on cell viability.
Translation errors also exist, and some people hypothesise that there is selective pressure on protein-coding genes to reduce this source of errors by selecting codons in a way that reduces the error rate (potentially by slowing down the polymerase). This results in something known as “codon bias” but so far there is no good evidence that codon bias has an actual effect on error rate (it does have an effect on correct protein folding), or is selected for (http://dx.doi.org/10.7554/eLife.27344, http://dx.doi.org/10.1371/journal.pgen.1006024).
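For the curious, codon bias is easy to quantify in a rough way. Here's a Python sketch (made-up sequence; leucine codons only, as one synonymous family out of many) of computing relative usage within a synonymous codon family:

```python
from collections import Counter

# Rough sketch of measuring codon bias: for one set of synonymous codons,
# compute each codon's share of the family's total usage. With no bias,
# each of the six leucine codons would sit near 1/6.
LEU_CODONS = ["UUA", "UUG", "CUU", "CUC", "CUA", "CUG"]

def leu_usage(seq):
    """Relative usage of each leucine codon in an in-frame RNA sequence."""
    counts = Counter(seq[i:i+3] for i in range(0, len(seq) - 2, 3))
    total = sum(counts[c] for c in LEU_CODONS)
    return {c: counts[c] / total for c in LEU_CODONS} if total else {}

seq = "CUGCUGCUGUUACUC"  # five leucine codons: CUG x3, UUA, CUC
usage = leu_usage(seq)
print(usage["CUG"])  # → 0.6
```

Real codon-bias statistics (e.g. RSCU) are normalized across all families, but the idea is the same: deviation from uniform usage among synonymous codons.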
However, I believe my third link is about environmental stress triggering intentional mistranslation (mRNA to protein). From that paper's figure 3:
> Proteins arising from “statistical proteomes” have various folding and binding properties, resulting in phenotypic diversity in the host organism.
I haven't bothered to pull up the related references (5 obvious ones) to assess their strength though.
The papers you linked seem to be claiming that codon encoding preferences (which vary by gene category) are in fact due to (or merely correlated with?) GC content in mammalian genomes (as opposed to a number of other previously proposed mechanisms). This is surprising because individual tRNA abundance varies by cell state and type, so that would have been the obvious (but apparently wrong) explanation. It's doubly surprising because a number of single celled organisms utilize the mismatch between codon preference and tRNA availability in order to regulate protein translation, but these papers are claiming that's not a significant factor in mammals.
tRNA gene expression varies by cell state, but isoacceptor abundance is in fact very stable (at least in mammals). Meaning, if you have a set of tRNA genes which all code for, say, Ala_AGC, the sum of the gene expression of all these genes is relatively stable, even if their individual expression varies (http://dx.doi.org/10.1101/gr.176784.114; full disclosure: I’m an author on this and one of the previously linked papers).
Why individual tRNA gene expression varies, and how the cell regulates the overall stability, is unclear (my personal pet theory is that secondary tRNA function as regulatory RNA, in the form of tRNA-derived fragments, causes the need to regulate tRNA genes, see e.g. http://dx.doi.org/10.1016/j.cell.2017.06.013).
> It's doubly surprising because a number of single celled organisms utilize the mismatch between codon preference and tRNA availability in order to regulate protein translation
It’s not that surprising: gene regulation happens fundamentally differently in eukaryotes and prokaryotes, and even differently in different classes of eukaryotes. The effective population size (= evolvability) and genome complexity seems to play a role here. Simply put, higher animals have much more powerful and precise ways of controlling gene expression (enhancers and histone control). Regulation at the translation level is comparatively slow and wasteful (it’s several steps further down the line of the gene->protein production process).
On the other hand, once a certain layer of functionality exists, and starts being used to build on top of, that layer will resist change, because any variation will throw off so many higher level processes it's unlikely to be viable.
Life happens rarely enough; even in the 'optimal' conditions of the primordial soup, it's still a pretty miraculous occurrence for a cell to spontaneously form. Then consider this:
- The first cell was probably not 'good' at surviving, even in those conditions. It probably sucked at it, and was just barely good enough at doing it to reproduce a little and evolve a little.
- So, a cell which is actually good at surviving is pretty much guaranteed not to form spontaneously. Meanwhile, the lineage of the first (shitty) one has evolved to be pretty damn good at it.
- If a new cell tries to form, even if it's better at surviving than the Original cell, it'll probably get instantly outcompeted by its progeny.
Voila, single ancestor cell.
> b) no organism has drifted in the millions of years of evolution.
Why should they? There's no pressure to.
This would answer why most are using it now. If other cells use a different, less optimal mapping, they'll have a harder time reproducing, and thus lose out over time.
Methionine is one of only two amino acids encoded by a single codon (AUG) in the standard genetic code (tryptophan, encoded by UGG, is the other). Reflecting the evolutionary origin of its codon, the other AUN codons encode isoleucine, which is also a hydrophobic amino acid. In the mitochondrial genome of several organisms, including metazoa and yeast, the codon AUA also encodes methionine. In the standard genetic code AUA codes for isoleucine, and the respective tRNA (ileX in Escherichia coli) uses the unusual base lysidine (bacteria) or agmatine (archaea) to discriminate against AUG.
The methionine codon AUG is also the most common start codon. A start codon is a signal to the ribosome to initiate protein translation from mRNA; in eukaryotes, AUG is recognized as a start codon when it sits within a Kozak consensus sequence. As a consequence, methionine is often incorporated into the N-terminal position of proteins in eukaryotes and archaea during translation, although it can be removed by post-translational modification. In bacteria, the derivative N-formylmethionine is used as the initial amino acid.
This is not a necessity. It is entirely possible that life on Earth formed multiple times, completely independently, and our lineage simply out-competed the other sort(s) of life; and somehow we lost the evidence of it, or haven't found it yet. As other comments suggest, it's a well-established fact that all the life we know of shares a single origin (this organism(s?) is called the "Last universal common ancestor", or LUCA); but that doesn't mean there has only ever been a single sort of life our lineage interacted with, it just means our lineage was the only good-enough life for the conditions on Earth for most of the geological periods.
Genetic Code Variants
For most organisms the "stop codons" are "UAA", "UAG", and "UGA". In vertebrate mitochondria "AGA" and "AGG" are also stop codons, but not "UGA", which codes for tryptophan instead. "AUA" codes for isoleucine in most organisms but for methionine in vertebrate mitochondrial mRNA.
wikipedia cites to https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?
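The reassignments above are small enough to write down directly. A Python sketch, using only the differences mentioned in that list (standard code vs. vertebrate mitochondria):

```python
# Only the codons that differ between the standard genetic code and the
# vertebrate mitochondrial code, per the variants listed above.
STANDARD  = {"UGA": "Stop", "AGA": "Arg",  "AGG": "Arg",  "AUA": "Ile"}
VERT_MITO = {"UGA": "Trp",  "AGA": "Stop", "AGG": "Stop", "AUA": "Met"}

diffs = {c: (STANDARD[c], VERT_MITO[c]) for c in STANDARD}
for codon, (std, mito) in sorted(diffs.items()):
    print(f"{codon}: standard={std}  vertebrate-mito={mito}")
```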
edit: read the paper carefully. They came to the same conclusion. But... Is this new?
Looking at the structures, I would not expect similar reactivity:
They differ in both size and shape. More importantly, they differ in the length of the tether to the positively charged group which could easily play a role in carbonyl activation by this unit, either in the forward direction (peptide formation) or reverse (peptide hydrolysis).
The paper doesn't mention autocatalysis (catalysis of external amino acids or short peptides themselves), but this is also a possibility. There's a large body of synthetic chemistry in which amino acids and short peptides show remarkable catalytic activity.
But the main problem with this study is that peptides are made biologically through catalysis. What we observe in isolated system reactivity has no reason to translate into what's seen in nature because enzymes offer lower-energy transition states.
From a comment in a previous post:
In the research paper they analyzed a few amino acids like lysine. Lysine is an amino acid that has the usual amino group and the usual acid group on one side, and an additional amino group on the other side. In the study they compared lysine with lysine-like amino acids that are shorter, so the additional amino group sits closer to the usual amino group and the usual acid group. For example https://en.wikipedia.org/wiki/Ornithine
They found that the usual amino acids like lysine are better at spontaneously forming protein-like chains than the shorter versions, when in a solution that gets dried. I'm not sure this is enough to explain why lysine is used in proteins, but it's an interesting result anyway.
In the more optimistic case, the research article "explains" why the 3 usual amino acids they used are better than the 3 shorter variants they used. It doesn't "explain" why the other 17 amino acids were selected.
In particular, they used amino acids with an additional amino group, so the polymerization can get "confused" and use the other group instead of the usual amino group; then, instead of a nice chain, you get some other structure. The other 17 usual amino acids don't have an additional amino group, so the polymerization process can't get confused.
There has probably been a lot of evolution, maybe one or two billion years since the last common ancestor of all known life on Earth. And that LCA had that system. It's a bit like ASCII.
Again no disrespect to the talk here, but this is a Stack Exchange kind of question, and lo there is an SE answer: https://biology.stackexchange.com/questions/653/why-20-amino...
Answer: because it's built on stuff that was built on stuff that...
If so, is it known over time to any extent?
(I'm going to presume that to a meaningful extent it is not as any potential samples are going to be largely recent -- past few thousand years, maybe tens of thousands given frozen samples. Amber possibly excluded.)
What does that distribution look like? Zipf? Normal? Other?
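The modern version of that tally is straightforward, at least. A Python sketch of the rank-frequency count (made-up toy sequences; a real analysis would pull a proteome from a database such as UniProt):

```python
from collections import Counter

# Tally amino acid frequencies across a set of protein sequences and look
# at the rank-frequency shape. The sequences below are made up.
proteins = ["MKTAYIAKQR", "MLLAVLYCLL", "MAEGEITTFT"]

counts = Counter("".join(proteins))
ranked = counts.most_common()
for residue, n in ranked[:3]:
    print(residue, n)
```

On real proteomes the distribution is decidedly non-uniform (leucine and alanine common, tryptophan and cysteine rare), which is part of what makes the historical question interesting.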
And to follow on to that how could life get started without being able to produce amino acids? It doesn’t seem like something early life could evolve gradually. Are they then naturally occurring?
As for how early life started, presumably with a strongly reduced set of amino acids. Some of these are simple chemicals that form spontaneously given the right conditions (technically all of them can, but probably at very low rates). This was famously established by the Miller–Urey experiment (https://en.wikipedia.org/wiki/Miller–Urey_experiment).
If I remember correctly, there are over a hundred different amino acids. Humans use only 20.
This is true of all group 14 elements.
Carbon is the lightest and thus most-common group 14 element. Carbon and silicon are the only two group 14 elements lighter than iron, which means they're the only ones produced by stellar burning. (Everything heavier than iron is produced exotically, e.g. by supernovae and neutron star collisions.)
Most (chemical) life is thus probably carbon based, with a minority running on silicon. If germanium-based life exists, it's near a supermassive black hole.
Sci-fi novel based on the Sepoy Mutiny. Life on Uller is based on silicon.
I mean, if your greatest creation is something made in your own image, how does that demonstrate imagination?
I wonder what it means for exobiology, is this true only in Earth conditions or in general, i.e. can we expect life elsewhere to use the same 20 amino acids? Also, if not, how toxic/dangerous would that life be for humans?
About life in other planets, nobody knows, so it's guessing time:
The 20 amino acids are among the ones that form spontaneously under prebiotic conditions, so it's probable that they exist on other planets. In some proteins, the amino acids are modified after the protein is produced, so a few additional amino acids may be a nice feature. Some amino acids in proteins are quite similar, and perhaps we could live without them. So, will independent life on other planets:
* use the same 20 amino acids: Probably not. Unofficially: no way.
* use exactly 20 amino acids: Probably not. It's a hard question; there are some papers that try to justify the number 20 or a similar number, but I'm not convinced. I'd guess something between 15 and 30 amino acids.
* use a set of amino acids similar to our 20: Probably. I guess yes.
* Use the same L variants as us, or use the specular D variants: Let's say 50%.
* Use proteins at all: Another hard guess, I'd say yes. Proteins are very versatile and easy to build in different configurations. It's hard to guess a replacement.
‘We discovered that there are purely chemical factors, based on higher polymerisation reactivity and fewer side reactions, that might have contributed to this selection process.’
Evolution can progress very rapidly, but the core machinery which drives life is very conservative.
A good analogy would be to look at other codes. In the computer world, consider ASCII. Each number maps to control code, letter, number or symbol. This encoding is entrenched. Imagine how hard it would be to change a single letter of ASCII to mean something else. Most of the hardware and software on the planet would require updating. It's not just that interoperability is important. It's that every piece of software on a single computer system would require updating, in lockstep, to transition from the old to the new encoding. This would be an almost impossible feat.
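The lockstep problem is easy to demonstrate. A Python sketch with a hypothetical "updated" convention that swaps two letters; any decoder still using the old convention silently corrupts the data:

```python
# Why remapping an entrenched code breaks things: an "updated" encoder
# that swaps two letters, paired with a decoder still using the old
# convention, silently corrupts every message that crosses the boundary.
SWAP = str.maketrans("AB", "BA")  # hypothetical new convention: A <-> B

def new_encode(text):
    return text.translate(SWAP).encode("ascii")

def old_decode(data):
    return data.decode("ascii")  # unaware of the new convention

roundtrip = old_decode(new_encode("ABBA"))
print(roundtrip)  # → BAAB, not ABBA: every decoder must update in lockstep
```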
The same constraint applies to DNA encoding of protein (and other) sequences. There are multiple pieces of the machinery which would require changing in synchrony for the result to work and result in a viable living organism. A triplet coding system change would require almost all instances of that coding triplet changing to retain existing structure and function in every protein using it, new enzymes to synthesise the new amino acid and tRNA, along with all of the associated regulatory and control systems. Only at that point could you start using the new amino acid triplet sequence in a new or modified protein. Evolution works by single small changes and natural selection. Making several big changes is extraordinarily unlikely.
It's easier to make such a fundamental change in simpler organisms where the scope of the change is limited. And this is likely why the small number of variations we see in genetic encodings are both in the lowest forms of life, and are largely superficial. The current encoding is entrenched as a result.