Adding new DNA letters makes novel proteins possible (economist.com)
161 points by pseudolus on Jan 17, 2019 | 35 comments

This is cool scientifically, and also a cool example of how amazing things can happen with interdisciplinary collaboration in biotech.

When you have a new technology with so many potential applications (adding 152 extra codons on top of the normal 64), it can be overwhelming to figure out where to start. Getting from "we can add letters to the DNA alphabet" to "we can build new amino acids into IL-2 to facilitate PEG binding to stop IL-2 binding to the alpha unit of the IL-2 receptor while still binding the beta and gamma units, which will preserve anti-cancer effects while sparing vascular damage" seems like a leap that would require a cross-disciplinary team to make.
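For anyone checking the "152" figure: it follows from counting triplets over a six-letter alphabet. A minimal Python sketch, assuming the two added letters are labelled X and Y and that codons stay three letters long (the function name `codon_count` is only for illustration):

```python
def codon_count(alphabet: str, codon_length: int = 3) -> int:
    """Number of distinct codons of the given length over an alphabet."""
    return len(set(alphabet)) ** codon_length


NATURAL = "ACGT"            # the four natural DNA letters
EXPANDED = NATURAL + "XY"   # assumption: two added letters, labelled X and Y

natural_codons = codon_count(NATURAL)      # 4**3
expanded_codons = codon_count(EXPANDED)    # 6**3
extra_codons = expanded_codons - natural_codons

print(natural_codons, expanded_codons, extra_codons)  # 64 216 152
```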

IL-2 is a molecule that is effective at treating a variety of diseases but has nasty side effects that make it a poor drug. The article discusses sometimes fatal side effects of high dose IL-2 in treating cancer; low-dose IL-2 is actually an effective treatment for autoimmune disease, although it is difficult to titrate the dose in such a way that you don't accidentally dose too high and make the autoimmunity worse.

Another startup, Delinia, engineered an agonist selective for the alpha-beta-gamma subtype of the IL-2 receptor to treat autoimmune disease [0] (where Synthorx is hitting only the beta-gamma part of the IL-2 receptor to treat cancer). Delinia was acquired by Celgene last year for $775M, 3 months after its Series A. Synthorx went public this year and has a $450M market cap. All of that from making different versions of a single protein.

[0] https://lifescivc.com/2016/09/re-balancing-immunity-via-regu...

eDNA is one of the creepier things you can think of in a dystopian future, e.g. you could engineer humans to require a specific synthetic amino acid that doesn't appear in nature. This is essentially a potential biological DRM.

While applying it to humans directly is unlikely, using it as some sort of DRM for drugs, to prevent generics from working, is quite a realistic possibility.

Ketracel white?

I want more life father/fucker.

Some people have not watched Blade Runner.

Combine this with some sort of blockchain straight on your DNA, and you could have a permission system to grant or deny access to things, one that would work not only for your biological system but also with binding contracts in the real world. Like bad credit, or denying access to your bank account: you could kill someone by making their system 'forget' how to process a given amino acid, or make them dependent on some drug to live, so a government could just wait for a "traitor" to come to them in order to keep on living.

People speculate there may have been a wider variety of DNA coding in the past. But natural selection plus perhaps some reaction energetics versus complexity settled on the current system.

There was probably a simpler two-nucleotide encoding before the current three-nucleotide one. About half of the amino acids only use the first two nucleotides and ignore the third.

That seems unlikely, because shifting your recognition domain count from 2 to 3 means that you basically lose all the evolved information from before and have to rely on chance "correct encodings" everywhere.

The idea is that the initial tRNA was not specific enough: it only cared about the first two letters of each codon and ignored the third. So, for example, Proline was determined by the first two letters CC? and was associated with the four codons CCU, CCC, CCA and CCG. Actually, this is the current mapping.

Other blocks of four codons were split for some reason. We can imagine that originally Isoleucine was determined by AU? so initially AUU, AUC, AUA and AUG encoded Isoleucine, but now only the first three encode Isoleucine and the last one encodes Methionine instead.

This is somewhat based on the blocks of four codons that follow this pattern, where the first two bases determine 16 blocks that are sometimes split https://en.wikipedia.org/wiki/Genetic_code and on the fact that the third base in the tRNA is strange https://en.wikipedia.org/wiki/Wobble_base_pair

Anyway, IIRC this is a reasonable speculation but it's not confirmed. So don't take this explanation too literally.

With this idea, the initial DNA could evolve for a few (zillion) years as a list like


and then make the "whatever" letters also important with an almost backward-compatible code, so in most cases it still doesn't matter, but in a few cases it is important.

[Note: The official letter for whatever is "N" instead of "?"]
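The block structure described above can be checked against the standard genetic code. A small Python sketch, not a model of how the code evolved: the table is the standard code written with RNA letters, and the helper names `block` and `unsplit_prefixes` are made up for illustration.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order ('*' = stop).
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def block(prefix):
    """The four codons sharing a two-letter prefix, with their meanings."""
    return {prefix + third: CODE[prefix + third] for third in BASES}


def unsplit_prefixes():
    """Prefixes whose four codons all mean the same thing (third base ignored)."""
    prefixes = ("".join(p) for p in product(BASES, repeat=2))
    return [p for p in prefixes if len(set(block(p).values())) == 1]


print(block("CC"))  # all four codons map to P (proline)
print(block("AU"))  # AUU, AUC, AUA map to I; AUG maps to M
print(len(unsplit_prefixes()), "of 16 blocks are unsplit")
```

In the standard table, 8 of the 16 two-letter blocks are unsplit, which is the sense in which the third letter "doesn't matter" for those codons.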

That's a great explanation! To add a cool point, the wobble position is frequently modified by highly specific enzymes to make it matter more. It's like some random protein mutated to do this modification and all of a sudden the organism got more RAM, thus increasing its fitness.

Yes, that is generally the accepted idea, but a two letter codon is not the same thing as a three letter codon with an absolutely ignored wobble.


You start with a two-letter code, then something evolves that puts an (initially) rare third letter at a few locations on the tape. All the old "gear" that reads two-letter code can still read most of the tape.

It is difficult to imagine that as a possibility. The spacing of three mRNA nucleotides is pretty important structurally in the process of translation from mRNA to protein via the ribosome. It is difficult for me to imagine a ribosome that could operate arbitrarily between codons of two and three nucleotides, unless I am misunderstanding your comment.

A translational reading frame consists of non-overlapping codons of three nucleotides. If one nucleotide is skipped, the entire downstream message is thus garbled. So how would the translational machinery operate if each codon arbitrarily consisted of two or three nucleotides?
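The frameshift problem described above is easy to see on a toy sequence. A minimal Python sketch; the mRNA string and the helper names `codons` and `delete_base` are invented for illustration, and nothing here models the ribosome itself:

```python
def codons(mrna, frame=0):
    """Split an mRNA string into non-overlapping triplets from a given frame.

    Trailing bases that do not fill a whole codon are dropped.
    """
    return [mrna[i:i + 3] for i in range(frame, len(mrna) - 2, 3)]


def delete_base(mrna, index):
    """Return the sequence with one nucleotide removed."""
    return mrna[:index] + mrna[index + 1:]


original = "AUGCCCAUUGGA"            # hypothetical message
shifted = delete_base(original, 3)   # lose one C from the second codon

print(codons(original))  # ['AUG', 'CCC', 'AUU', 'GGA']
print(codons(shifted))   # ['AUG', 'CCA', 'UUG']
```

Everything downstream of the deletion is read in a different frame, so the later codons no longer match the original message.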

I see what you are getting at, but I think what the other comments are saying is that for the third nucleotide position of a given codon, it does not matter which nucleotide this is. The amino acid to be used would only depend on the first two nucleotides, while the third nucleotide can be any of A, U, C or G.

Nah, I misunderstood completely.

I was thinking of how a hypothetical code might, in the abstract, have evolved from binary, through ternary, up to the current base-4.

I haven't got enough biochem knowledge to speculate how two nucleotides per amino acid can evolve to have three.

They have done this with a four letter expanded codon. The fitness of the bacteria tanks.

>> About half of the amino acids only use the first two nucleotides and ignore the third.

I've often thought some of that redundancy in the code could be a feature. Important (more sensitive) sequences could evolve to a coding that is more robust against mutations, while things that are less important could be more brittle in their encoding. This seems hard to prove though.

It also allows a particular triplet to have more neighbors, meaning you can go from one amino acid to more options without going through intermediates.
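To make the "neighbors" idea concrete: every triplet has 9 single-letter substitutions, and the standard code decides how many of those are silent and how many reach a different amino acid. A rough Python sketch using the standard table with RNA letters; `neighbors` and `reachable` are illustrative names, not anything from the article:

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order ('*' = stop).
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def neighbors(codon):
    """All codons that differ from `codon` at exactly one position."""
    return [codon[:i] + b + codon[i + 1:]
            for i in range(3) for b in BASES if b != codon[i]]


def reachable(codon):
    """Different meanings reachable from `codon` by one substitution."""
    return {CODE[n] for n in neighbors(codon)} - {CODE[codon]}


for codon in ("CCU", "AUG"):
    silent = sum(CODE[n] == CODE[codon] for n in neighbors(codon))
    print(codon, CODE[codon], "silent:", silent,
          "reachable:", sorted(reachable(codon)))
```

For the proline codon CCU, 3 of the 9 substitutions are silent, while for AUG (methionine, a single-codon amino acid) none are.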

I might be missing something but I don't see how adding DNA nucleotides can lead to novel proteins. You have more letters to potentially map to amino acids, but unless you've also expanded your set of amino acids, how does this lead to new proteins? And did they also design new tRNAs (the things that map RNA to amino acids)?

My interpretation, though it doesn't go into it in the article, is that ribosomes are capable of translating abnormal tRNA bases to abnormal amino acids; they just normally don't do it because there aren't tRNA strands floating around with abnormal bases on them.

They happened to find an abnormal DNA/RNA base pair, and it happened to be mapped by the ribosome to a PEG amino acid (I think the implication is NH2-PEG-COOH?), and realized if they swapped out an existing amino acid in IL-2 with that PEG-based amino acid, the protein would be less toxic?

Another thought: it might not be strictly a RNA-base-X maps to amino-acid-x type of operation; RNA-base-X might map to either amino-acid-x or amino-acid-y based on concentrations of AA-x and AA-y, or even based on neighborhood structure of the protein that's under construction, and they just got lucky with IL-2 or they figured they could put tons of PEG into a cell and get most IL-2 produced with that PEG-based amino acid?

They have indeed expanded their set of amino acids. I assume they must have added tRNAs as well but the article didn't go into this detail.

Romesberg got mRNAs up about 3-4 years ago? I presume the tRNAs are made from processed DNA, not by adding them (adding tRNAs exogenously to bacteria would be tough, and not economical at all).

I think there are theoretically an infinite number of amino acids out there. An amino acid has a backbone structure and two side chains, but in principle the side chains can be anything.

Floyd was Pete Schultz's postdoc, and one of Pete Schultz's claims to fame is really nailing down the unnatural AA stuff. IIRC, the PEG UAA has been around for quite a while (at least a decade).

Floyd does very nice work, but this particular justification -- at least as expressed in this article -- for a 4-base code is completely flawed. The article claims that the reason to augment the codon table is because only two stop codons could possibly be rewired to code for artificial amino acids. But in fact 43 out of 64 codons could in principle be recoded. That's because of the extraordinary redundancy built into the codon table of 4^3 = 64 codons: there are only 20 coded amino acids, plus one necessary stop codon.

So that leaves 43 codons, not 2.
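The count of 43 can be reproduced directly from the standard codon table. A short Python sketch, assuming (as the comment does) that one codon per amino acid plus a single stop codon is the bare minimum; it says nothing about whether recoding any particular codon would be tolerated:

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order ('*' = stop).
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

amino_acids = set(CODE.values()) - {"*"}   # the 20 coded amino acids
minimum_needed = len(amino_acids) + 1      # one codon each, plus one stop
spare_codons = len(CODE) - minimum_needed

print(len(CODE), len(amino_acids), spare_codons)  # 64 20 43
```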

It's not that simple. Even when two codons map to the same amino acid, it doesn't mean they have the same implications.

One factor is that different codons translate at different speeds, and these can affect how the protein folds.

Another is that base pairs in DNA may have functions or implications apart from their function as part of a codon in a gene.

Simply remapping a codon to a new amino acid and re-writing all the genes in a cell's DNA to avoid that codon will cause many things to "break" in a cell's function. Life is a messy, inter-connected system without the clean modularity we like to have in software.

"can," "may," ...

1) My argument was simply that the limit on recodable codons is much higher than two, as it was claimed in the article. That's hard to argue with. I didn't say the result in every case would be neutral mutations.

2) The consequences of recoding the stop codons are also not neutral. For example, when Isaacs (eventually) did it, there were severe growth phenotypes in the resulting strain.

So I agree that "it's not that simple," which is frankly why I'm not hugely optimistic about this class of endeavors. But the point stands that there is no need in principle for a 4-base code.

This would require rewriting all the existing DNA, though, to collapse all existing code for those 23 dual-coded amino acids to whichever code we decide is canonical.

Not really. It's just a question of codon frequency. Organismic codon bias ensures that some codons are actually used quite rarely in the genome already. Specifically, the frequency of TAG stop codons is ~0.03% in E. coli, and the frequency of TGA is 0.1%. By comparison, the frequency of CTA leucine codons is ~0.3%. So it's only 10-fold more effort than the easiest codon, and ~3X more effort than the next easiest. And it may really not be necessary to recode all of them.
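The "10-fold" and "~3X" figures are just ratios of the quoted frequencies. A back-of-the-envelope Python sketch, assuming recoding effort scales linearly with how often a codon occurs (the frequencies are the ones quoted in the comment, and `relative_effort` is an illustrative name):

```python
# Codon frequencies in E. coli as quoted in the comment above, in percent.
FREQ_PERCENT = {"TAG": 0.03, "TGA": 0.1, "CTA": 0.3}


def relative_effort(codon, baseline):
    """How many times more occurrences `codon` has than `baseline`."""
    return FREQ_PERCENT[codon] / FREQ_PERCENT[baseline]


print(round(relative_effort("CTA", "TAG"), 1))  # 10.0
print(round(relative_effort("CTA", "TGA"), 1))  # 3.0
```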

The article talked about directly re-using stop codons like some exotic organisms do to integrate pyrrolysine, but there is another more nuanced way which humans use for selenocysteine: https://en.wikipedia.org/wiki/SECIS_element

So... like... what happened, exactly?

It sounds astonishing, but I can't make heads or tails of it. Romesberg and his team managed to add new letters ("letters") to DNA encoding, and that allows them to make new proteins... how? Did they make new kinds of acid bases for these proteins, the kinds that don't exist in nature? Is that not an even more astonishing achievement?

I'm way out of my depth here, but I'm also intensely curious about anything related to genetic engineering. Could someone explain this to me?

> So... like... what happened, exactly?

A group added a new DNA base pair ("X" and "Y") to a strain of E. coli.

In genes, codons are triplets of DNA letters that transcription and translation machinery converts into specific amino acids. The mechanisms by which this happens are complex, but well understood.

The new DNA letters were used to make new codons that could be translated to particular, novel, amino acids.

So, did they effectively invent scalable natural-mechanism synthetic amino acid production? Is it at all as cool and biopunk as I imagine?

I wonder, out of both curiosity and concern: how and why have the new DNA letters not already evolved, and why do they not already feature in the present range of living species? Should a firm understanding of this, however far off, not be a prerequisite to aspirations of mass-producing novel proteins and new lifeforms featuring them?

Novel DNA and proteins - after billions of years of natural CI... we'd better be sure to have a damn good incident response capability! But do we?

> Adding new DNA letters make novel proteins possible

> Adding new DNA letters make

> Adding make

OK, updated over here.
