
New Letters Added to the Genetic Alphabet - treefire86
https://www.quantamagazine.org/20150710-genetic-alphabet/
======
veddox
So these guys come along, casually expand a well-established code and test it
under precisely one small condition. And when this one test gives them some
nice data, they say: "Hey, our stuff is better than everything all of nature
has ever done!"

A very intellectually stimulating endeavour no doubt, but I expect some more
tests before I would call this good science. Claiming that "the new additions
appear to improve the alphabet" is simply extrapolation to the nth degree. [1]

Oh and by the way, when the article claims that

> "the three-biopolymer system may have drawbacks, since information flows
> only one way, from DNA to RNA to proteins"

that is not correct either. For more information, read up on epigenetics.

[1] Note that this quote comes from the article, not the original paper. The
original paper is not quite as cocky (at least not in the abstract, but I
don't have full access).

~~~
ken_e
I can't help but notice the irony in your comment. So this guy comes along and
having read one article, says "I can't call this good science."

A very basic summary of molecular biology of the cell:

DNA - library of blueprints, basically instructions on how to build proteins

RNA - copies of blueprints you take out of the library to build proteins so
you don't expose DNA to unnecessary hazards

protein - catalyzes reactions so the cell can do stuff, including make new DNA
when replicating.

A fundamental conundrum exists when it comes to evolution of this mechanism.
DNA is needed to build proteins, but proteins are needed to catalyze the
reactions necessary to build DNA. It's a chicken and egg problem... what came
first?

When it was discovered that RNA can catalyze certain reactions, presumably
because of its slightly higher chemical reactivity, it suggested a way out
of this conundrum. What if life originated with RNA only, where RNA acted as
both the hereditary and catalyzing machinery?

The problem with this RNA World hypothesis is that the set of reactions that
current RNA can catalyze is very limited. But what if, at the origin of life, when
nature could experiment, the genetic alphabet that RNA could play with was
bigger, potentially leading to expanded capability? Scientists like Benner
have worked for three decades to try to answer this question.

So, your characterization of "comes along, casually expand" a well-established
code and claims it's better is grossly unfair.

~~~
veddox
I am well aware of the basic molecular biology of the cell, as well as the RNA
world hypothesis.

All due respect to Benner for his work - my comment was rather too pointed,
I'll concede that. Nonetheless, I am always wary of too much theory being
induced from too little data.

Benner's experiment shows that an expanded genetic code can form molecules
that show greater chemical functionality in a given situation than that of
natural DNA molecules. Now a quote from the abstract:

> This suggests that this system explored much of the sequence space available
> to this genetic system and that GACTZP libraries are richer reservoirs of
> functionality than standard libraries.

Already he is starting to extrapolate when he starts talking about the
extended libraries in general. The Quantamagazine article then goes on to say:

> In other words, the new additions appear to improve the alphabet, at least
> under these conditions.

That is true, but for a rather narrow definition of "improve", and a very
narrow set of conditions. The result is that the superficial reader goes away
thinking "they've made a better DNA".

------
dibujante
There are actually 84 possible combinations with 4 base pairs if you accept
sequences of length <= 3.

However, if you assume all sequences are length 3, you still get 64
combinations.

We only use 20 out of that space. And if you look at how base pairs encode to
amino acids, for half of them, only the first two base pairs even matter -
since it's prefix-free you can guess the amino acid if you see those two and
even ignore the third.

Given how underutilized this space is, I'm not convinced that increasing the
domain to 216 will lead to much more than the ability to express our current
amino acid space with only two base pairs.
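
To make the counting concrete, here is a minimal Python sketch. The function
names are hypothetical, and it only counts sequences; it says nothing about how
a ribosome would actually read them.

```python
def count_words(alphabet_size, length):
    """Number of distinct sequences of exactly `length` bases."""
    return alphabet_size ** length


def count_words_up_to(alphabet_size, max_length):
    """Number of distinct sequences of length 1 through `max_length`."""
    return sum(alphabet_size ** n for n in range(1, max_length + 1))


four_up_to_three = count_words_up_to(4, 3)  # lengths 1, 2 and 3 with GACT
four_triplets = count_words(4, 3)           # standard codons
six_triplets = count_words(6, 3)            # triplets over GACTZP
six_pairs = count_words(6, 2)               # two-base words over GACTZP

print(four_up_to_three, four_triplets, six_triplets, six_pairs)
```

With six letters, two-base words already give 36 combinations, which is more
than the 20 amino acids in use. That is the point about expressing the current
amino acid space with only two base pairs.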

~~~
Terribledactyl
Minor >> Underutilized may not be the best way to think about it. There is a
lot of redundancy that affords protection against mutations.

~~~
veddox
And actually, even the redundancy is not complete. A recent paper showed that
seemingly "redundant" variations of triplets led to a slightly different 3D
folding structure of the DNA, with effects on the physiology of the cell.

~~~
sciencerobot
Did they have an effect on the translated amino acid sequence or translation?
Do you have a reference?

~~~
veddox
[http://journal.frontiersin.org/article/10.3389/fgene.2014.00...](http://journal.frontiersin.org/article/10.3389/fgene.2014.00140/abstract)

Sorry, made a mistake in the comment above. It's not the DNA's structure that
is changed, it's the resulting protein's. (Different codons slightly alter the
rate of translation, leading to a different folding of the protein.)

~~~
sciencerobot
Thanks for replying with the reference. That is very interesting.

------
shiggerino
It would have been nice if the author had at least acknowledged that in
reality they are nucleobases and not tiny, tiny letters curled up in our cell
nuclei. Sure, 6-amino-5-nitro-2(1H)-pyridone and
2-amino-imidazo[1,2-a]-1,3,5-triazin-4(8H)-one don't say much to us laymen,
but just saying letters and not mentioning once what they stand for is really
poor reporting.

~~~
Roodgorf
I suppose the heading and maybe the opening paragraph could be seen to give
that impression, but the article notes numerous times that these are new
nucleotides. I found that pretty sufficient for showing that there are not in
fact actual letters in DNA.

~~~
Dylan16807
It still goes the entire article without naming the chemicals, which is really
annoying.

And a picture of the three pairs wouldn't hurt.

------
pavel_lishin
Nitpick: it wouldn't be a potential 216. Some three-"letter" sequences code
for the same amino acids, so instead of 4^3 (64) possible amino acids, only 20
are generated. Adding new letters doesn't change what these _old_ words
create, so I think there would only be a possible maximum of 172.

(I think I did my math right, but maybe not.)

(edit: thanks duaneb, had my basic bio facts wrong - codons code for amino
acids, not proteins.)
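
The 172 figure can be checked with a short Python sketch. It assumes, as the
comment does, that the 64 standard codons keep their current 20 amino acids and
that every codon containing Z or P could map to a distinct new amino acid. Stop
codons are ignored, and the variable names are illustrative.

```python
from itertools import product

EXPANDED = "GACTZP"   # standard bases plus the two new ones
NEW_BASES = set("ZP")

# Every three-base word over the six-letter alphabet.
all_codons = ["".join(c) for c in product(EXPANDED, repeat=3)]

# Codons that contain at least one new base; these are the only "new words".
new_codons = [c for c in all_codons if NEW_BASES & set(c)]

# Old codons still yield 20 amino acids; assume each new codon adds one more.
max_amino_acids = 20 + len(new_codons)

print(len(all_codons), len(new_codons), max_amino_acids)
```

This is only an upper bound under those assumptions, not a claim about what a
real translation system would do with the extra codons.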

~~~
duaneb
Nitpick two, codons don't encode for proteins but rather components of
proteins (amino acids) that the RNA/ribosome "interprets".

At least that's what my high school bio taught me.

------
MaxScheiber
I'm not convinced that this is necessarily a good idea biologically,
especially after talking to a couple of my friends that are researchers in
this space. However, this seems quite interesting for non-biological
applications. Take cold storage, for example--with a third base pairing, we
can obviously develop an even denser data storage format than with regular
DNA.

------
jey
Neat, but extending amino acids would be even cooler. DNA is mostly "just" a
string encoding for information, like binary or hexadecimal. Proteins on the
other hand are the actual machines whose blueprints are written in DNA, and
they're built out of amino acids. Extending the set of amino acids could
extend the set of basic building blocks available to create biomolecular
machines.

Of course, teaching ribosomes to handle them and etc will take a lot of
additional work, but identifying promising new amino acids would be a nice and
major first step.

~~~
phkahler
>> Of course, teaching ribosomes to handle them and etc will take a lot of
additional work, but identifying promising new amino acids would be a nice and
major first step.

There are a couple other amino acids in the tree of life. The mapping from
base pairs to aminos is not completely static. And then there's Selenocysteine
(Sec) which is coded in a very unusual way.

I've often thought the redundancy in the encoding allows mutations to have no
effect, so a protein that is well established and important could have a more
stable encoding and new things still in flux could be more prone to evolving
(less stable encoding). But I have no real data on this.

------
logfromblammo
This sounds a lot like a story from 20 years ago, that was probably in
Discover Magazine or Scientific American. The new nucleotides at that time
were labeled kappa and chi.

And as a point of fact, three-base segments of DNA do not have a one-to-one
mapping to amino acids. I also believe that a non-standard use of one of the
three stop codons can change an encoded methionine to selenomethionine, with
similar special cases for other proteins using rare amino acids.

Furthermore, 6^3=216, but that doesn't mean that adding a new base pair can
code for that many amino acids. The original set of 4, with 64 possible
codons, usually encode for 20 amino acids (excepting special cases as with
selenomethionine). mRNA also employs uracil and tRNA adds hypoxanthine. These
lead to " _wobble pairs_ " which in turn allow a single tRNA to match several
different-but-synonymous codons.

As it stands now, every codon without a matching tRNA would be a different
variety of stop codon.

Now, what would be interesting to me is if the P-Z pairs could match some tRNA
anticodons that translate stereoisomers of the standard 20 amino acids (or
actually just the 19 that are chiral). That way, the D-(KLAKLAK)2 apoptosis
promoter sequence could be synthesized directly by the ordinary transcription-
translation mechanics of a cell.

------
NoMoreNicksLeft
This article is ignorant.

>Why nature stuck with four letters is one of biology’s fundamental questions.
Computers, after all, use a binary system with just two “letters” — 0s and 1s.
Yet two letters probably aren’t enough to create the array of biological
molecules that make up life. “If you have a two-letter code, you limit the
number of combinations you get,” said Ramanarayanan Krishnamurthy, a chemist
at the Scripps Research Institute in La Jolla, Calif.

This simply isn't true. Even with regular DNA, the word size is 3 nucleotides
long... giving you 64 instructions. If I remember my high school biology, only
some of these are even used, the rest are duplicates or unused.

Binary would work too, assuming ribosomes and mRNA could expand the word
size... you only need 6 bits to do the same as natural DNA.

Is there something I don't know that fixes word size at 3 nucleotides?
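
For what it's worth, the word-size arithmetic can be sketched in Python.
`min_word_length` is a hypothetical helper that finds the shortest fixed-length
word covering a given number of meanings. The figure of 21 meanings (20 amino
acids plus a stop signal) is a simplifying assumption.

```python
def min_word_length(alphabet_size, meanings):
    """Smallest fixed word length whose word count covers `meanings`."""
    length = 1
    while alphabet_size ** length < meanings:
        length += 1
    return length


# Matching all 64 standard codons with a two-letter alphabet.
binary_for_64 = min_word_length(2, 64)

# Covering 20 amino acids plus a stop signal (assumed: 21 meanings).
needed = {size: min_word_length(size, 21) for size in (2, 4, 6)}

print(binary_for_64, needed)
```

Integer exponentiation is used instead of logarithms to avoid floating-point
rounding at exact powers. This only shows that a binary code is combinatorially
sufficient; it does not address whether such a system would be chemically
workable.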

------
apalmer
Not sure I understand the benefit, it's denser, on the other hand from what I
understand DNA generally doesn't have much in the way of size constraints. If
I remember correctly, large swathes of DNA are inactive and there isn't
selective pressure to clean up this wasted space. Couple that with the fact
that it is apparently more error prone, and it seems to show why evolution
didn't go down this path.

Probably will be very useful for synthetic purposes where there isn't too
much concern about fidelity after 10 million years of copying.

~~~
gherkin0
> Not sure I understand the benefit, it's denser, on the other hand from what
> I understand DNA generally doesn't have much in the way of size constraints.

I'm a layman, but they could use the new base pairs to code for unusual amino
acids allowing for proteins with novel chemistry.

Also, I think DNA is pretty much only used to encode information, but RNA has
important chemical roles (e.g. ribozymes), and the new base pairs open up
similar possibilities with that.

------
DDickson
Sequel to GATTACA(PZ)?

~~~
kenj0418
18 years ago, and I just now realized the title was chosen because they are
all DNA letters.

------
trestletech
Oh, good. Bioinformatics data wasn't big enough with two bits per nucleotide.
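
For scale, a rough Python sketch of the information content per base, assuming
every base is equally likely. `bits_per_base` is a made-up helper name.

```python
import math


def bits_per_base(alphabet_size):
    """Information per base, in bits, if all bases are equally likely."""
    return math.log2(alphabet_size)


standard = bits_per_base(4)   # GACT
expanded = bits_per_base(6)   # GACTZP

# A fixed-width binary encoding needs a whole number of bits per base.
fixed_width_bits = math.ceil(expanded)

print(standard, round(expanded, 3), fixed_width_bits)
```

So a six-letter alphabet carries roughly 2.585 bits per base in theory, but a
simple fixed-width file format would need 3 bits per base unless several bases
are packed together.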

------
mjfl
Very interesting concept. One thing I noticed after developing several genetic
algorithms on my own is that they tend to give a good creative _hint_ at what
the solution to the problem should be, which the human mind can then interpret
and produce what the genetic algorithm was "trying" to approach. I wonder if
the same could be true with biological evolution, that there are better ways
of storing genetic information than DNA and all that, but that DNA is a good
guideline to what should be done.

------
mbq
Even with PZ, DNA would have major and minor grooves rather than being a
symmetrical double helix beloved by virtually all illustrators, sadly also
those of pop sci articles...

------
gherkin0
IIRC, E.T. (from the movie) had DNA with six nucleotides.

~~~
sciencerobot
and humans have 40 DNA memo groups (5th element)

------
jacob019
The "enhanced" DNA escapes into the wild where a new pathogen spreads over
earth. All life is defenseless against the bizarre genetic alphabet...

~~~
dnautics
And since p and z are synthesized in the lab how would this escaped organism
eat?

~~~
sciencerobot
Life finds a way.

~~~
dnautics
Why hasn't life found a better way to fix nitrogen yet? Nitrogenase is a
godawful enzyme... Wastes three hydrogen gas molecules for each turn of the
crank.

