Why does all life use the same 20 amino acids? (chemistryworld.com)
272 points by respinal 63 days ago | 104 comments

As is often the case with titles like this, the answer is that all life doesn't use the same 20 amino acids.

For example:

Akaogi et al. (2006) Role of non-protein amino acid L-canavanine in autoimmunity. Autoimmun Rev. 5(6):429-35. https://www.ncbi.nlm.nih.gov/pubmed/16890899

Nunn et al. (2010) Toxicity of non-protein amino acids to humans and domestic animals. Nat Prod Commun. 5(3):485-504. https://www.ncbi.nlm.nih.gov/pubmed/20420333

Super interesting example! Nevertheless, even the organisms referenced in these papers don't have these amino acids in their DNA structure, and the meaning of the linked research really isn't affected by the point you are raising.

Perhaps "non-proteinogenic" is a poor choice, or not exactly correct, but readers generally understand what is meant.

It's a bit like responding to an article about "why do all pendulums keep the same frequency as they lose energy" with "but they are neglecting relativity!". It just isn't really relevant to the research question, which is "why are some amino acids present in the DNA code, and why is there consistency across species".

I'm not sure what you mean by "don't have these amino acids in their DNA structure". There are 64 codons that code for amino acids and start/stop signals. But there are only 20 common amino acids, so multiple codons typically code for the same amino acid. However, just exactly what happens depends on the set of tRNAs that are being expressed. Plus the enzymes that catalyze attachment of amino acids to tRNAs.

That is, what's in the DNA is codons, and what amino acids get incorporated depends on tRNAs etc, plus what amino acids are around (synthesized or ingested).
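To make the 64-codons-versus-20-amino-acids point concrete, here is a minimal sketch that builds the standard codon table and counts how many codons map to each residue. The one-letter string is the usual compact encoding of the standard code (with `*` for stop); the variable names are mine, and it deliberately ignores the tRNA and synthetase layer described above.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, in UCAG order; '*' marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 triplets to its residue (the last base varies fastest).
STANDARD_CODE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

codons_per_residue = Counter(STANDARD_CODE.values())
sense = {aa: n for aa, n in codons_per_residue.items() if aa != "*"}
single_codon = sorted(aa for aa, n in sense.items() if n == 1)

print(len(STANDARD_CODE))        # 64 codons in total
print(len(sense))                # 20 amino acids
print(codons_per_residue["*"])   # 3 stop codons
print(single_codon)              # ['M', 'W']
```

Only methionine and tryptophan get a single codon; everything else is redundant to some degree, which is where the tRNA repertoire starts to matter.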

Anyway, I get that I was being a little pedantic. But in the paper referenced by TFA, I see that they studied three nonproteinaceous amino acids: ornithine, 2,4-diaminobutyric acid, and 2,3-diaminopropionic acid. Canavanine is much closer to arginine, with a methylene replaced with an ether bridge. I wonder how well it would have polymerized.

0) https://www.pnas.org/content/116/33/16338

Edit: Upon reflection, "there are just 20 proteinaceous amino acids" and "AS/400 is a database" are ~similar simplifications.

> non-protein

The title, of course, refers to the 20 proteinogenic amino acids.

Even then it's an oversimplification.

Selenocysteine (https://en.wikipedia.org/wiki/Selenocysteine) occurs in proteins and is quite widespread. "Selenocysteine: the 21st amino acid." (https://www.ncbi.nlm.nih.gov/pubmed/1828528) It shows up in humans too (https://www.ncbi.nlm.nih.gov/pubmed/24194593).

Pyrrolysine (https://en.wikipedia.org/wiki/Pyrrolysine) occurs in proteins in archaea and bacteria. "A new UAG-encoded residue in the structure of a methanogen methyltransferase." (https://www.ncbi.nlm.nih.gov/pubmed/12029132)

Also wasn't the answer always going to be they simply out-competed the other amino acids for that particular role? The research question is why they beat out similarly structured acids to become the consistently dominant ones in life.

So in that context the fact they are so dominant is the key part of the question, not that they are the only ones capable?

> wasn't the answer always going to be they simply out competed the other amino acids

This is a tautological statement: the ones that still exist by definition are the ones which outcompeted the alternatives, because that's what outcompeting means.

The question being addressed by the study is roughly "by what selection criterion did the survivors outcompete the others?"

Sure, my point is that the fact that others are still capable of competing can always be assumed in a competitive environment. So in that context, where the question is inherently going to be why these 20 amino acids out-competed the others, why bring up the fact that others exist as a counterpoint if they aren't as dominant as the 20 primary ones? What's interesting is their dominance, not that they can play the game.

I guess it's just nitpicking the headline and is not particularly relevant to the article or the research question (as the OP admitted).

My girlfriend is a biologist and this kind of misunderstanding is IMHO the fundamental difference between a biologist and a programmer. Biology is way fuzzier than computers and that's always the default stance.

Agreed, I took an online "great courses" in chemistry and I had so many "but why?" questions I had to stop. I was spending far too long going down Wikipedia rabbit holes. You need to have a particular tolerance for not understanding it all and work with a sort of bayesian estimation of understanding as you move forward.

That's actually why I switched from physical chemistry to biology. I'm not good with abstract math, and I didn't want to be stuck relying on theoretical stuff that I didn't understand completely.

But TFA isn't about biology. At least, primarily. It's about emulating chemistry that led to the first life forms.

I did admit to being a little pedantic.

But upon reflection, it's arguable that the paper would have been more interesting if they had included L-canavanine, in addition to L-arginine and three analogs with shorter cationic side chains. Because we know that it can be incorporated during protein synthesis, in place of L-arginine. But none of their analogs can.

So if L-canavanine had been incorporated into peptides in their tests, more or less as well as L-arginine, that would arguably have ruled out their hypothesis.

I'll defer to you on that, you seem more informed on the subject than me. Makes me want to revisit my biology reading list.

Except that some sorts of life, such as some sprouting plants, do incorporate those "non-proteinogenic" amino acids into proteins.

Which then screw up animals that eat them. Because they, in turn, also incorporate them into proteins. And those tweaked proteins elicit immune responses, because they weren't present during immune system development. And some of that immune response cross reacts with the normal host proteins. So you get autoimmune disease.

This turned out to be an incredibly interesting (and educational) diversion!

It seems these amino acids aren't typically incorporated, but rather appear to serve as defense (https://www.ncbi.nlm.nih.gov/pubmed/21529857) and signaling molecules (https://www.ncbi.nlm.nih.gov/pubmed/28218981). Apparently humans do that sometimes as well (https://www.ncbi.nlm.nih.gov/pubmed/18828673). Sometimes the insects even end up using the toxic compound (in the linked example the non-protein amino acid L-DOPA) for their own purposes (https://www.ncbi.nlm.nih.gov/pubmed/27006098), which I find quite amusing.

More than just being eaten, apparently L-DOPA is released into the environment at an impressive rate in some cases (https://www.ncbi.nlm.nih.gov/pubmed/24598311).

Yes, I was wrong about that. Alfalfa seeds and sprouts don't themselves use canavanine in their proteins. And yes, plants do seriously get into chemical warfare.

While somewhat true, the question of the 20 amino acids essentially still holds, and trying to answer it is still meaningful (in the same way the article addresses it). Consider this: how old are these plant insecticides (or whatever they are)? Maybe some 150 million years old? (I'm only assuming, of course, but it seems like a plausible assumption.) Well, life on Earth itself is about 3.5 billion years old, and as far as we know it mostly utilizes the same set of 20 proteinogenic amino acids it did back then. It is only natural that, given time, constantly mutating and evolving organisms would eventually learn to produce whatever amino acids are possible, if that is useful (and being an insecticide is obviously useful if you are a plant, and for the reasons you mention yourself it is natural to use as an insecticide some compound that is not normally utilized in the other organism). The question is how it came about that these amino acids weren't present during immune system development. Why was it always the same 20?

Yup, and those shared articles don’t make a good counterargument or point. They are off topic. Better articles would have addressed the point of the number of amino acids...

Are you talking about human autoimmune disease? If so, do you have any examples and/or links to papers? I've read a lot about autoimmune diseases, and this hypothesis is not something I've ever heard of.

Edit: I see your earlier comment has some links. Thanks.

Yeah, some sprouts are ~dangerous. That is, if you're unlucky to have an immune system that got self-deselected in the "wrong" way.

That makes lots of sense. Do you have any examples of sprouting plants that do this? I'm guessing ones we consider poisonous?

woah that sounds interesting. Do you have some links to videos or articles which talk more about this?

articles like this are really talking about a standard that covers 99.9% of cases, not some sort of universal absolute. We know of many special cases which violate the standard (to the extent that 'life' has 'standards') but it's not clear how valuable it is to point out the exceptions here.

I don’t see your point. Could you elaborate some more?

Please see my reply to Sharlin.

An even bigger question: why is DNA/RNA always expressed in the same way? I.e. the codon->aminoacid table is constant for all living things. I always found this very puzzling. Of course, the easiest explanation is that life really did start from a single cell. But it sounds unlikely that a) there really was that single origin cell, and not a few that popped up around the same time (on a geological scale), and b) no organism has drifted in the millions of years of evolution.

> But it sounds unlikely that a) there really was that single origin cell

This is just a fact. All life that we know of has the same origin. Common descent is a core tenet, and one of the best established facts, of modern biology (https://doi.org/10.1038%2Fnature09014). For there to have been multiple distinct ancestors, we would need to see some relevant differences in the core machinery of life. And the fact is that, despite minor variations, we simply don’t see any. The odds against there being multiple distinct ancestors are astronomical, and such a finding would fundamentally impact our understanding of modern biology.

> and b) no organism has drifted in the millions of years of evolution

Organisms have drifted. That’s what evolution is. But evolution plays by rules, it can’t just change things willy-nilly. Changes to the core machinery would presumably either break it outright, or be so strongly detrimental as to be purified away almost immediately. Of course many genetic changes are deleterious to some extent but most effects can either be buffered for a short time because they are minor, or they confer some other advantage. In the core machinery of the cell this is much (!) less tolerated because even tiny chemical inefficiencies would immediately be amplified millionfold. Odds are, the DNA/RNA core machinery of all life sits in a steep local optimum. It’s not necessarily a global optimum but it’s essentially impossible to evolve out of because any individual change, or even a handful of coincidental changes, would leave the organism a lot worse off.

> Common descent is a core tenet...

It's taken that (known) modern organisms descend from a common set of ancestors, but the tree of life isn't a tree. Organisms diverged and merged multiple times along the way to the modern world.

It says nothing about how that/those points of common descent came to be.

What originally constituted living things probably weren't very good at living by modern standards. They probably leaked like sieves, and they probably traded RNA, polypeptides, and other small molecules back and forth.

When was this soup alive, and when wasn't it? I suspect it's just a continuum, and that complicated soup probably went back and forth across that grey zone of living-nonliving many times.

> but the tree of life isn't a tree

That’s correct but I don’t see what this has to do with my comment. It’s still a fact that all modern life, at some point, came through the same individual organism (EDIT: this should be population) which, furthermore, already possessed the fundamental machinery of DNA replication, RNA transcription and protein synthesis (amongst other things).

EDIT: I misunderstood. Yes, you’re right: due to lateral gene/molecule transfer, it’s not certain that the last universal common ancestor was an individual cell, and at that time the label “individual” probably didn’t make much sense (although the paper I linked argues strongly that it was in fact one single cell).

> That’s correct but I don’t see what this has to do with my comment. It’s still a fact that all modern life, at some point, came through the same individual organism which, furthermore, already possessed the fundamental machinery of DNA replication, RNA transcription and protein synthesis (amongst other things)

This is a contentious statement that I don't think is established - a fact that the grandparent comment is trying to raise.

I misunderstood both your and the grandparent comment. You’re right. I’ve amended my other comment.

> Common descent is a core tenet

but that doesn't mean it's been proven through experimentation, does it? isn't it just an assumption?

what if there's something akin to mathematical uniqueness in the mechanics of the core machinery?

I mean it in the sense that it's either the way it is, or it's not really viable over a long enough period of time, such that it could have multiple origins which all nonetheless converge on the same core mechanism?

> but that doesn't mean it's been proven through experimentation, does it? isn't it just an assumption?

It’s not an assumption, it’s backed up by excellent evidence — see the paper I linked.

> what if there's something akin to mathematical uniqueness in the mechanics of the core machinery?

Well that’s clearly not the case, we can trivially (…) design self-replicating machines that have completely different mechanics, as a thought experiment. More to the point, we can change parts of the machinery. For instance, we can take the universal genetic code and, with effort, change it into something completely different (by just swapping all codons around). The result is just as viable, but doesn’t exist in nature. In fact, the observed universality of the genetic code, in itself, is already seen as sufficient evidence for common descent (and then some).

> We can take the universal genetic code and with effort change it into something completely different ... just as viable, but doesn’t exist in nature. In fact, observed universality ... is sufficient evidence for common descent.

Or, some would argue, evidence of a common designer.

Some people argue that the moon is made of cheese. I don’t think that merits a mention either.

It’s baffling to me: these findings get closer and closer to software, some papers even use the term ‘design’, still other materials postulate our universe could be a simulation, and yet it’s mockable to consider whether there’s something cleverer than ourselves outside our universe that caused this to be so.

The question seems the same as, could an AI in a virtual world deduce or prove its artificialness or the reality outside the virtual?

To me that’s more interesting than the simple credulity implied by moon cheese.

Variants on the genetic code are known to exist: https://en.wikipedia.org/wiki/Genetic_code#Alternative_genet...

I don't think there are any known to be radically different. Note that it's possible that life arose several times but only the lineage of that one cell survived, perhaps? In any case, it's not like there's been no variation.

Now, here's a mystery (AFAIK): Just where is the genetic code stored? I seem to recall reading an article a while back about how while this seems like an easy question it's actually not known. I can't seem to find it at the moment, though. Anyone know more about this?

The mapping between base triplets and amino acids is encoded in the tRNAs. And the tRNAs themselves are of course again encoded in the DNA sequences used to produce them.

I'd say it's encoded in the combination of the tRNAs and the aminoacyl-tRNA synthetases, which load the appropriate amino acids on to the tRNAs. There's nothing in the tRNAs themselves which picks out a specific amino acid - that happens because the synthetases recognise specific amino acids and specific tRNAs.

The synthetases are also encoded in DNA, so the fundamental point, that the code is encoded in DNA, stands.
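A toy sketch of that two-part lookup, in case it helps: the anticodon decides which codon a tRNA reads, and the synthetase decides which amino acid it carries. Everything here is a simplification I am assuming for illustration: strict Watson-Crick pairing with no wobble, a made-up `identity` label standing in for the features a synthetase recognises, and a synthetase that ignores the anticodon entirely (which is not true for every synthetase).

```python
from dataclasses import dataclass

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codon_for(anticodon: str) -> str:
    """Codon (5'->3') read by an anticodon (5'->3'), strict pairing, no wobble."""
    return "".join(COMPLEMENT[base] for base in reversed(anticodon))


@dataclass(frozen=True)
class TRNA:
    identity: str   # hypothetical label for what the synthetase recognises
    anticodon: str  # written 5'->3'


# Toy synthetase layer: tRNA identity -> amino acid loaded onto it.
SYNTHETASES = {"tRNA-Ala": "Ala", "tRNA-Pro": "Pro"}


def effective_code(trnas):
    """The codon -> amino acid table that emerges from a given tRNA pool."""
    return {codon_for(t.anticodon): SYNTHETASES[t.identity] for t in trnas}


wild_type = [TRNA("tRNA-Ala", "AGC"), TRNA("tRNA-Pro", "AGG")]
# Same alanine identity, but carrying a proline anticodon.
mutant = [TRNA("tRNA-Ala", "AGG")]

print(effective_code(wild_type))  # {'GCU': 'Ala', 'CCU': 'Pro'}
print(effective_code(mutant))     # {'CCU': 'Ala'}
```

In this model neither table alone is "the code": change the anticodon and the codon assignment moves, change the synthetase table and the amino acid moves.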

Aha! Thanks, hopefully with that search term I can maybe find what I was thinking of. :)

I mean, that's the obvious answer, but I recall reading a thing about why it wasn't actually that simple, how swapping out the tRNAs didn't actually have the effect you would expect. Like, obviously the anticodons match the codons, no problem there, but there was some complicated confusing thing about how the rest of the tRNA matched the amino acid. Trying to find it. Hoping someone knows what I'm talking about.

Presumably, biology has a lot of safeguarding to prevent modified transfer-RNA from being functional.

In terms of potential errors in gene expression, the genetic code is rarely discussed as a potential source of error.

Not to be "that guy", but actually, it is. (https://elifesciences.org/articles/09945) (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5026258/)

It's just difficult to study for various technical reasons related to how we sequence it.

Sometimes the mistranslation is even intentional! (https://www.ncbi.nlm.nih.gov/pubmed/25220850)

As you mentioned though, there are also quite a few safeguards.

* tRNA synthetase example: https://www.ncbi.nlm.nih.gov/pubmed/27226603

* trans-editing factor example: https://www.ncbi.nlm.nih.gov/pubmed/28737471

Damaged tRNA is even repaired (a bit, sometimes). (https://www.ncbi.nlm.nih.gov/pubmed/28901837)

From an interesting (2018) review of tRNA in general (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6103721/):

> Surprisingly, a perfect proteome is not a pre-requisite for cellular viability even in the context of human cells. Lant et al. demonstrated that a single tRNA mutant can lead to significant mistranslation in human cells [17]. This was accomplished by expressing an Ala accepting tRNAPro G3:U70 variant in HEK 293 cells. The authors visualized a rate of ~ 3% mistranslation using a novel green fluorescent protein (D129P) reporter that fluoresces in response to mistranslation at proline codons. In contrast to previous studies in yeast [18], human cells in culture did not mount a detectable heat-shock response and tolerated the mistranslation without apparent impact on cell viability.
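For a rough feel of what ~3% means, here is a back-of-the-envelope sketch. It rests on two assumptions of mine that the quoted review does not state: that the ~3% applies independently at every proline codon, and that nothing else goes wrong. Under those assumptions, the fraction of protein copies with every proline correct is (1 - 0.03)^n for a protein with n proline codons.

```python
def fraction_error_free(per_codon_error: float, n_codons: int) -> float:
    """Fraction of protein copies with no error at any of n independent codons."""
    return (1.0 - per_codon_error) ** n_codons


# Assumed per-codon rate, taken from the ~3% figure in the quote above.
ERROR_RATE = 0.03

for n_prolines in (1, 10, 50):
    share = fraction_error_free(ERROR_RATE, n_prolines)
    print(f"{n_prolines:>2} proline codons: {share:.1%} of copies fully correct")
```

So for a proline-rich protein most copies would carry at least one substitution, which makes the lack of a detectable stress response in those cells all the more striking.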

The eLife paper you link to discusses errors in RNA synthesis (transcription), not in tRNA-dependent protein synthesis (translation). The abstract mentions “translation”, but only as an amplifier of transcription errors: each mRNA transcript is translated into proteins thousandfold, so every error in a single mRNA molecule will be present in thousands of protein molecules.

Translation errors also exist, and some people hypothesise that there is selective pressure on protein-coding genes to reduce this source of errors by selecting codons in a way that reduces the error rate (potentially by slowing down the ribosome). This results in something known as “codon bias” but so far there is no good evidence that codon bias has an actual effect on error rate (it does have an effect on correct protein folding), or is selected for (http://dx.doi.org/10.7554/eLife.27344, http://dx.doi.org/10.1371/journal.pgen.1006024).

My first two links are indeed about transcription! Rereading my comment in context now I'm realizing it was a bit unclear. I was responding to "the genetic code is rarely discussed as a potential source of error".

However, I believe my third link is about environmental stress triggering intentional mistranslation (mRNA to protein). From that paper's figure 3:

> Proteins arising from “statistical proteomes” have various folding and binding properties, resulting in phenotypic diversity in the host organism.

I haven't bothered to pull up the related references (5 obvious ones) to assess their strength though.

The papers you linked seem to be claiming that codon encoding preferences (which vary by gene category) are in fact due to (or merely correlated with?) GC content in mammalian genomes (as opposed to a number of other previously proposed mechanisms). This is surprising because individual tRNA abundance varies by cell state and type, so that would have been the obvious (but apparently wrong) explanation. It's doubly surprising because a number of single celled organisms utilize the mismatch between codon preference and tRNA availability in order to regulate protein translation, but these papers are claiming that's not a significant factor in mammals.

> This is surprising because individual tRNA abundance varies by cell state and type, so that would have been the obvious (but apparently wrong) explanation.

tRNA gene expression varies by cell state, but isoacceptor abundance is in fact very stable (at least in mammals). Meaning, if you have a set of tRNA genes which all code for, say, Ala_AGC, the sum of the gene expression of all these genes is relatively stable, even if their individual expression varies (http://dx.doi.org/10.1101/gr.176784.114; full disclosure: I’m an author on this and one of the previously linked papers).

Why individual tRNA gene expression varies, and how the cell regulates the overall stability, is unclear (my personal pet theory is that secondary tRNA function as regulatory RNA, in the form of tRNA-derived fragments, causes the need to regulate tRNA genes, see e.g. http://dx.doi.org/10.1016/j.cell.2017.06.013).

> It's doubly surprising because a number of single celled organisms utilize the mismatch between codon preference and tRNA availability in order to regulate protein translation

It’s not that surprising: gene regulation happens fundamentally differently in eukaryotes and prokaryotes, and even differently in different classes of eukaryotes. The effective population size (= evolvability) and genome complexity seem to play a role here. Simply put, higher animals have much more powerful and precise ways of controlling gene expression (enhancers and histone control). Regulation at the translation level is comparatively slow and wasteful (it’s several steps further down the line of the gene->protein production process).

It isn't: https://en.wikipedia.org/wiki/List_of_genetic_codes

On the other hand, once a certain layer of functionality exists, and starts being used to build on top of, that layer will resist change, because any variation will throw off so many higher level processes it's unlikely to be viable.

The ultimate technical debt if there ever was one for our species. An "oopsie" at this layer would be pretty insanely difficult to architect and engineer around.

Selenocysteine is a really strange one.

> it sounds unlikely that a) there really was that single origin cell, and not a few that popped up around the same time (on a geological scale)


Life happens rarely enough; even in the 'optimal' conditions of the primordial soup, it's still a pretty miraculous occurrence for a cell to spontaneously form. Then consider this:

- The first cell was probably not 'good' at surviving, even in those conditions. It probably sucked at it, and was just barely good enough at doing it to reproduce a little and evolve a little.

- So, a cell which is actually good at surviving is pretty much guaranteed not to form spontaneously. Meanwhile, the lineage of the first (shitty) one has evolved to be pretty damn good at it.

- If a new cell tries to form, even if it's better at surviving than the original cell, it'll probably get instantly outcompeted by the original's progeny.

Voila, single ancestor cell.

> b) no organism has drifted in the millions of years of evolution.

Why should they? There's no pressure to.

I read long ago about someone doing computer simulations, finding that the current table is near optimal when it comes to error resilience (against mutations etc) as well as encoding efficiency IIRC.

This would answer why most are using it now. If other cells use a different mapping that's less optimal, they'll have a harder time reproducing, and thus lose out over time.
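I can't reproduce the simulations recalled above, but a much cruder stand-in is easy to sketch: count what share of single-base substitutions leave the encoded residue unchanged, and compare the standard table against randomly scrambled ones. The scrambling scheme (shuffling all 64 assignments) and the scoring function are my own assumptions; published analyses typically weigh how chemically similar the swapped amino acids are, which this ignores.

```python
import random
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, in UCAG order; '*' marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(t) for t in product(BASES, repeat=3)]


def synonymous_fraction(code: dict) -> float:
    """Share of all single-base substitutions that do not change the residue."""
    same = total = 0
    for codon in CODONS:
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                total += 1
                same += code[codon] == code[mutant]
    return same / total


def shuffled_code(rng: random.Random) -> dict:
    """Same residues and multiplicities, randomly reassigned to codons."""
    residues = list(AMINO_ACIDS)
    rng.shuffle(residues)
    return dict(zip(CODONS, residues))


standard = dict(zip(CODONS, AMINO_ACIDS))
rng = random.Random(0)  # fixed seed so the comparison is repeatable

standard_score = synonymous_fraction(standard)
random_scores = [synonymous_fraction(shuffled_code(rng)) for _ in range(200)]

print(f"standard code: {standard_score:.3f}")
print(f"best of 200 scrambled codes: {max(random_scores):.3f}")
```

The standard table wins comfortably on this measure simply because synonymous codons are clustered, mostly at the third position. That alone says nothing about why these particular amino acids were chosen.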

That would be it. Thanks for digging it up!

Yeah, I think the answer to why there's one is pretty clear. It was slightly better than the others in some ways, like error resistance (important on a young, i.e. recently formed, planet, where there's no atmosphere to protect from UV and other cosmic radiation), and then it just outcompeted them. It's easy to forget just how long it took to go from single-celled life (4000 MYA) to multi-celled life (~2100 MYA, first fossil evidence at least). That's almost 2 billion years for different versions of DNA/RNA/amino acids to be tried and tested with almost no chance that we'd find fossil evidence of it. (Not even sure if you could tell from a fossil at all, honestly.)

IIRC the standard argument against "not a few that popped up around the same time (on a geological scale)" is that geological scale is so much, much longer than biological scale, that the time between two events that would be "around the same time (on a geological scale)" is sufficient for that single cell to multiply in immense amounts, colonize the whole of the world's oceans, evolve significantly, and (most importantly) eat up all the easily available resources that allowed it to form. If we look at the 'primordial soup' theory, it's about concentrations of amino acids that can accumulate only if life is not present - once the first life spreads through the oceans, it'll colonize and consume every location that's sufficiently rich for new life to spawn.
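The arithmetic behind that argument is easy to sketch. Both numbers below are hypothetical round figures I picked for illustration, not measurements: a target of 1e30 cells and a deliberately sluggish one doubling per thousand years. The sketch also assumes unchecked exponential growth, which real populations only manage until resources run out.

```python
import math

# Hypothetical round numbers, chosen only to illustrate the scale.
TARGET_CELLS = 1e30          # assumed population needed to fill the oceans
YEARS_PER_DOUBLING = 1000.0  # assumed, and absurdly slow for a microbe


def doublings_needed(target_cells: float) -> int:
    """Number of doublings for one cell to reach at least target_cells."""
    return math.ceil(math.log2(target_cells))


doublings = doublings_needed(TARGET_CELLS)
years = doublings * YEARS_PER_DOUBLING

print(doublings)  # 100
print(years)      # 100000.0
```

Even at that pace the whole takeover fits in about a hundred thousand years, which is next to nothing on a geological scale.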

This small snippet from Wikipedia claims that DNA/RNA is not always "expressed in the same way" and that bacteria and eukaryotes have a different (albeit related) amino acid for AUG (methionine vs. N-formylmethionine). So those claims here and in the article (even the title) are not completely true.


Methionine is one of only two amino acids encoded by a single codon (AUG) in the standard genetic code (tryptophan, encoded by UGG, is the other). In reflection to the evolutionary origin of its codon, the other AUN codons encode isoleucine, which is also a hydrophobic amino acid. In the mitochondrial genome of several organisms, including metazoa and yeast, the codon AUA also encodes for methionine. In the standard genetic code AUA codes for isoleucine and the respective tRNA (ileX in Escherichia coli) uses the unusual base lysidine (bacteria) or agmatine (archaea) to discriminate against AUG.[15][16]

The methionine codon AUG is also the most common start codon. A "Start" codon is message for a ribosome that signals the initiation of protein translation from mRNA when the AUG codon is in a Kozak consensus sequence. As a consequence, methionine is often incorporated into the N-terminal position of proteins in eukaryotes and archaea during translation, although it can be removed by post-translational modification. In bacteria, the derivative N-formylmethionine is used as the initial amino acid. [0]

[0] https://en.wikipedia.org/wiki/Methionine

> But it sounds unlikely that a) there really was that single origin cell, and not a few that popped up around the same time (on a geological scale)

This is not a necessity. It is entirely possible that life on Earth formed multiple times, completely independently, and that our lineage simply out-competed the other sort(s) of life; and somehow we lost the evidence of it, or haven't found it yet. As other comments suggest, it's a well-established fact that the life we know of shares a single origin (this organism(s?) is called the "Last universal common ancestor" or LUCA [1]); but that doesn't mean there has been only a single sort of life our lineage ever interacted with, it just means our lineage was the only good-enough life for the conditions on Earth for most of the geological periods.

[1] https://en.wikipedia.org/wiki/Last_universal_common_ancestor

If I had to relate this to an HN related topic, I'd say this is closest to the theory of "technical debt". The further your code base moves away from the origin, the more difficult it is to address issues in the foundational code. Fudging around with the basic building blocks of life is extremely risky in an evolutionary perspective, gains can be made but they are going to be smaller (less advantageous) than gains possible by fudging around with top level code. By making base level changes, you effectively halt top level change until everything is proved out. By the time a base level change would have stabilized and the gains realized, the more nimble organisms making top level changes will have long since outcompeted the base changer into oblivion.

The codon->amino acid table is not always expressed in the same way within human cells. Mitochondria utilize different codons for the same amino acids compared to the nuclear genome.

Fascinating. Multiple different DNA based instruction set architectures.


Genetic Code Variants

For most organisms the "stop codons" are "UAA", "UAG", and "UGA". In vertebrate mitochondria "AGA" and "AGG" are also stop codons, but not "UGA", which codes for tryptophan instead. "AUA" codes for isoleucine in most organisms but for methionine in vertebrate mitochondrial mRNA.

wikipedia cites to https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?
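Here is a small sketch of what those four reassignments do in practice: the same mRNA translated with the standard table and with a vertebrate mitochondrial table built from exactly the differences listed above. The example sequence is made up, and the `translate` helper is a simplification that starts at the first base and stops at the first stop codon.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, in UCAG order; '*' marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

STANDARD = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Vertebrate mitochondrial code: only the four differences quoted above.
VERT_MITO = {**STANDARD, "AGA": "*", "AGG": "*", "UGA": "W", "AUA": "M"}


def translate(mrna: str, code: dict) -> str:
    """Translate from the first base, stopping at the first stop codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        residue = code[mrna[i:i + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


# Made-up sequence: AUG AUA UGA GCU AGA UUU
mrna = "AUGAUAUGAGCUAGAUUU"

print(translate(mrna, STANDARD))   # MI   (UGA is a stop)
print(translate(mrna, VERT_MITO))  # MMWA (AUA = Met, UGA = Trp, AGA is a stop)
```

Same message, different protein, depending on which compartment reads it.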

Perhaps there's a first mover advantage such that any new gene system just can't compete with established life.

It's not so much first mover advantage I think as established life having had tons of time to optimize. Tierra[1] illustrates how any new system has to be much better than the established to have any chance of surviving.

[1]: https://en.wikipedia.org/wiki/Tierra_(computer_simulation)

Pardon my ignorance, but hasn't it been trivially known why ornithine and DAB don't make proteins? This was a test question in an undergrad bioorganic chemistry class 15 years ago. Ornithine can self-cyclize to make a six-membered lactam, and DAB will self-cyclize to make a five-membered lactam, both of which are energetically favorable, and thus chain termination competes with chain-extension polymerization, making them unsuitable for proteinogenesis.

edit: read the paper carefully. They came to the same conclusion. But... Is this new?

> ‘We thought that, in general, all of these amino acids would react similarly because they are structurally similar,’ says Leman. But while almost all the experiments did produce oligomers, the three proteinogenic amino acids reacted more efficiently and produced fewer side products compared with their non-proteinogenic counterparts. ‘That came as a real surprise. We thought “Is this for real?”,’ Leman says.

Looking at the structures, I would not expect similar reactivity:


They differ in both size and shape. More importantly, they differ in the length of the tether to the positively charged group which could easily play a role in carbonyl activation by this unit, either in the forward direction (peptide formation) or reverse (peptide hydrolysis).

The paper doesn't mention autocatalysis (catalysis of external amino acids or short peptides themselves), but this is also a possibility. There's a large body of synthetic chemistry in which amino acids and short peptides show remarkable catalytic activity.

But the main problem with this study is that peptides are made biologically through catalysis. What we observe in isolated system reactivity has no reason to translate into what's seen in nature because enzymes offer lower-energy transition states.

The translation system's universality is the best proof that there is a LUCA.


LUCA = Last Universal Common Ancestor

Note that the paper gives a hint about why 3 of the 20 amino acids were selected. It is not clear that the same rule can be applied to the other 17 amino acids, or that it explains why the number is 20 (or approximately 20).

From a comment in a previous post:

In the research paper they analyzed a few amino acids like Lysine. Lysine is an amino acid that has the usual amino group and the usual acid group on one side, and it has an additional amino group on the other side. In the study they compared Lysine with Lysine-like amino acids that are shorter, so the additional amino group is closer to the usual amino group and the usual acid group. For example https://en.wikipedia.org/wiki/Ornithine

They found that the usual amino acids like Lysine are better at spontaneously forming protein-like chains than the shorter versions when they are in a solution that gets dried. I'm not sure if this is enough to explain why Lysine is used in proteins but it's an interesting result anyway.

In the most optimistic case, the research article "explains" why the 3 usual amino acids that they used are better than the 3 shorter variants that they used. It doesn't "explain" why the other 17 amino acids were selected.

In particular they used amino acids with an additional amino group, so the polymerization can get "confused" and instead of using the usual amino group use the other group, so instead of a nice chain, you get some other structure. There are 17 of the other usual amino acids that don't have an additional amino group, so the polymerization process can't get confused.

Never change a running system!

There has probably been a lot of evolution, maybe one or two billion years, since the last common ancestor of all known life on Earth. And that LCA had that system. It's a bit like ASCII.

Another possibility: life came to Earth already well formed (panspermia), and that single ancestral introduction rapidly filled the terrestrial biosphere so no further introductions could take hold.

Assuming that abiogenesis wouldn't happen very frequently (given the right set of conditions, a mean time to happen on the order of thousands to billions of years), a single life-forming event would have the same effect, no? After a foundation is set, the self-replicator starts exploiting existing energy gradients, making any other abiogenesis-event much less likely.

Panspermia is such a plausible theory. We know that life spreads like crazy if you let it -- try to keep something sterile and life always finds a way in, even into our sealed jars and cans, volcanoes, the bottom of the ocean, etc. I wouldn't be surprised if one day we find that Mars is teeming with bacteria that hitched a ride on one of our rovers and adapted.

Almost certainly Mars would have been seeded billions of years ago, with bacteria hitching rides on ejected debris from asteroid impacts, when conditions were much more hospitable there. And of course it could have gone in the other direction also. Even autopanspermia -- where ejecta from a planet-sterilizing megaimpact later comes back to the cooled off world -- is a possibility.

No offense to the good discussions here, but this article is no more than a press release. It doesn't even directly address the question in the title!

Again no disrespect to the talk here, but this is a Stack Exchange kind of question, and lo there is an SE answer: https://biology.stackexchange.com/questions/653/why-20-amino...

Why is almost everything written in C if you go down enough levels? Why does everything run on Unix-heritage OSes?

Answer: because it's built on stuff that was built on stuff that...

Is the frequency / preponderance distribution of amino acids known?

If so, is it known over time to any extent?

(I'm going to presume that to a meaningful extent it is not as any potential samples are going to be largely recent -- past few thousand years, maybe tens of thousands given frozen samples. Amber possibly excluded.)

What does that distribution look like? Zipf? Normal? Other?

This isn't an important distinction presently, but keep in mind this is for Earth life. If we someday encounter alien life, they may be different, which could lead to some trouble if people are not careful [1].

[1] https://www.youtube.com/watch?v=eEeLhqNLhZw

So where are the essential amino acids actually produced?

And to follow on from that, how could life get started without being able to produce amino acids? It doesn’t seem like something early life could evolve gradually. Are they then naturally occurring?

“Essential” amino acids are those that humans can’t synthesise. Other life forms can, which is why we ingest them with our food.

As for how early life started, presumably with a strongly reduced set of amino acids. Some of these are simple enough chemicals that they form spontaneously given the right conditions (technically all of them can, but probably at very low rates). This was famously established by the Miller–Urey experiment (https://en.wikipedia.org/wiki/Miller–Urey_experiment).

We are carbon based life forms, but researchers have hypothesized that there are probably sulfur based lifeforms out in the universe.

If I remember correctly, there's over a hundred various amino acids. For humans, we have only 20.

Have heard that Silicon and Germanium have 4 bonds like Carbon does. Follow the periodic table down.

> Have heard that Silicon and Germanium have 4 bonds like Carbon does

This is true of all group 14 elements.

Carbon is the lightest and thus most-common group 14 element. Carbon and silicon are the only two group 14 elements lighter than iron, which means they're the only ones produced by stellar burning. (Everything heavier than iron is produced exotically, e.g. by supernovae and neutron star collisions.)
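As a quick sanity check of the "lighter than iron" claim, here is a small sketch comparing atomic numbers. The dictionary and variable names are just for illustration, and atomic number is used as a stand-in for "lighter".

```python
# Atomic numbers of the group 14 elements.
GROUP_14 = {"C": 6, "Si": 14, "Ge": 32, "Sn": 50, "Pb": 82, "Fl": 114}
IRON_ATOMIC_NUMBER = 26

# Keep only the group 14 elements that sit below iron in atomic number.
lighter_than_iron = [
    symbol for symbol, z in GROUP_14.items() if z < IRON_ATOMIC_NUMBER
]

print(lighter_than_iron)
```

Only carbon and silicon pass the filter. Germanium, at 32, is already past iron.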

Most (chemical) life is thus probably carbon based, with a minority running on silicon. If germanium-based life exists, it's near a supermassive black hole.

Uller Uprising, H. Beam Piper

Sci-fi novel based on the Sepoy Mutiny. Life on Uller is based on silicon.

I thought it was sulfur, but it may have been silicon that the researchers had used.

Perhaps because it was designed by someone with a great deal of intelligence but not a whole lot of imagination?

I mean, if your greatest creation is something made in your own image, how does that demonstrate imagination?

I am far more amused than I should be by your implication that God is made of protein, and probably edible.

So God created man in his own image, in the image of God he created him; male and female he created them. (Genesis 1:27)

But was it a jpeg? A gif? What type of image file?

Obviously a GIF, but with its name pronounced correctly.

Delicious, he created them.

that explains communion

so the flying spaghetti monster?

So, tl;dr answer is that it's because those amino acids are more eager to react/polymerize.

I wonder what it means for exobiology: is this true only under Earth conditions or in general, i.e. can we expect life elsewhere to use the same 20 amino acids? Also, if not, how toxic/dangerous would that life be for humans?

They only analyzed 3 of the 20 amino acids, and they found a reason that perhaps explains why they are used. It's only a (good?) reason, not a definitive proof. The conditions are quite simple, something like a small pond that can dry out and collect some water again later. When the "soup" is almost desiccated, it is easier for the amino acids to form small protein-like chains.

About life on other planets, nobody knows, so it's guessing time:

The 20 amino acids are some of the ones that form spontaneously under prebiotic conditions, so it's probable that they exist on other planets. In some proteins, the amino acids are modified after the protein is produced, so a few additional amino acids may be a nice feature. Some of the amino acids in proteins are quite similar, and perhaps we could live without them. So will independent life on other planets:

* use the same 20 amino acids: Probably no. Unofficially: No way.

* use exactly 20 amino acids: Probably no. It's a hard question; there are some papers that try to justify the number 20 or a similar number, but I'm not convinced. I'd guess something between 15 and 30 amino acids.

* Use some set of amino acids similar to our 20 amino acids: Probably. I guess yes.

* Use the same L variants as us, or use the mirror-image D variants: Let's say 50%.

* Use proteins at all: Another hard guess, I'd say yes. Proteins are very versatile and easy to build in different configurations. It's hard to guess a replacement.

That's the real question! It might be far-fetched, but to me it seems that if carbon life exists elsewhere, it probably uses the same biochemistry that we do.

Tldr: Those 20 work more efficiently than the others.

Because they work :D

Yes. At least, they work a lot better than the others.

‘We discovered that there are purely chemical factors, based on higher polymerisation reactivity and fewer side reactions, that might have contributed to this selection process.’

while others??

Don't fit the job

but isn't it surprising that such a strict subset is maintained across all life?

Not at all. It would be surprising if it wasn't maintained.

Evolution can progress very rapidly, but the core machinery which drives life is very conservative.

A good analogy would be to look at other codes. In the computer world, consider ASCII. Each number maps to a control code, letter, digit or symbol. This encoding is entrenched. Imagine how hard it would be to change a single letter of ASCII to mean something else. Most of the hardware and software on the planet would require updating. It's not just that interoperability is important. It's that every piece of software on a single computer system would require updating, in lockstep, to transition from the old to the new encoding. This would be an almost impossible feat.
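To make the ASCII point concrete, here is a toy sketch of what happens when one reader changes the meaning of a single code point while the stored bytes stay the same. The helper `decode_with_remap` and the remapping of 65 to "Z" are hypothetical, purely for illustration.

```python
def decode_with_remap(raw, remap):
    """Decode ASCII bytes, letting `remap` override the meaning of some code points."""
    return "".join(remap.get(byte, chr(byte)) for byte in raw)


# Data written by existing software using standard ASCII.
data = "CAB".encode("ascii")

# A reader that follows the standard, and one where 65 now means "Z" (hypothetical).
standard = decode_with_remap(data, {})
changed = decode_with_remap(data, {65: "Z"})

print(standard, changed)
```

The bytes on disk never changed, yet every existing document containing that code point now reads differently. That is the lockstep problem in miniature.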

The same constraint applies to DNA encoding of protein (and other) sequences. There are multiple pieces of the machinery which would require changing in synchrony for the result to work and result in a viable living organism. A triplet coding system change would require almost all instances of that coding triplet changing to retain existing structure and function in every protein using it, new enzymes to synthesise the new amino acid and tRNA, along with all of the associated regulatory and control systems. Only at that point could you start using the new amino acid triplet sequence in a new or modified protein. Evolution works by single small changes and natural selection. Making several big changes is extraordinarily unlikely.
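The same thing can be sketched for codons. This is a toy model only: the table is a tiny subset of the standard genetic code, the three "genes" are invented sequences, and "X" stands in for a hypothetical new amino acid assigned to the codon AAA.

```python
# Small subset of the standard genetic code ("*" marks a stop codon).
STANDARD = {"ATG": "M", "AAA": "K", "AAG": "K", "GGC": "G", "TGG": "W", "TAA": "*"}


def translate(dna, table):
    """Translate a DNA string codon by codon until a stop codon is reached."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = table[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Invented example genes, not real sequences.
genome = {
    "gene_a": "ATGAAAGGCTAA",
    "gene_b": "ATGTGGAAGTGGTAA",
    "gene_c": "ATGGGCAAATGGAAATAA",
}

# Hypothetical reassignment: AAA now codes for a new amino acid "X".
reassigned = dict(STANDARD, AAA="X")

changed = [
    name for name, dna in genome.items()
    if translate(dna, STANDARD) != translate(dna, reassigned)
]
print(changed)
```

Every gene that happens to use the reassigned codon gets a different protein at once, while a gene using the synonymous codon AAG is untouched. An organism would need all the affected proteins to still work, which is why such a change is so unlikely.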

It's easier to make such a fundamental change in simpler organisms where the scope of the change is limited. And this is likely why the small number of variations we see in genetic encodings are both in the lowest forms of life, and are largely superficial. The current encoding is entrenched as a result.

No others were available?

at a certain evolutionary point in time?
