
Why does the Wuhan coronavirus genome end in 33 A’s? - jamiesonbecker
https://bioinformatics.stackexchange.com/questions/11227/why-does-the-wuhan-coronavirus-genome-end-in-aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
======
epistasis
It's times like these that we come to depend heavily on basic science that, up
until now, few would have realized is really important. There's a long chain of
publications cited in the top answer, all representing years of person-hours
spent trying to discover these behaviors and then share them with the world.

Politicians sometimes ridicule basic research in an attempt to claim that
government is irresponsible, and during recent elections it would have been a
simple thing to ridicule the idea of studying "how many As are at the end of a
viral genome" as an egregious waste of tax funds. Yet here we are.

Basic science without any known applications is how the really big
discoveries come about. Sometimes we can guess that an area will be fruitful,
but it's not always the case.

Next time somebody attempts to ridicule science as pointless, think back to
why anybody might care about a strange pattern of nucleotides at the end of a
virus.

~~~
GarrisonPrime
For every 1 such study that ultimately proves useful, I wouldn’t be surprised
if 100 (or even 1,000) were an utter waste of time.

Your argument seems to imply a shopaholic hoarder isn't wasting their life,
just so long as at least sometimes they actually use a random rubber band or a
'70s-themed valentine card.

~~~
asdff
If that's the hitrate, then so be it. How many hours poured into code end up
being useful?

~~~
mbar84
> If that's the hitrate, then so be it

Pardon? You're being awfully generous with other people's money. If
that's the hit-rate, can you imagine what people could do with the money if it
had not been taken from them in taxes? For starters, you might simply ask how
many face masks could have been purchased.

Basic research should not be an ivory tower that is isolated from questions of
cost vs. benefits.

------
sarosh
Apparently: "poly(A) tails at the 3' end of RNA are not an unusual feature of
viruses. Eukaryotic mRNA almost always contains poly(A) tails, which are added
post-transcriptionally in a process known as polyadenylation. It should not
therefore be surprising that positive-strand RNA viruses would have poly(A)
tails as well. In eukaryotic mRNA, the central sequence motif for identifying
a polyadenylation region is AAUAAA, identified way back in the 1970s, with
more recent research confirming its ubiquity. Proudfoot 2011 is a nice review
article on poly(A) signals in eukaryotic mRNA."

Edit: A more recent (2017) study than Proudfoot 2011 offers the perspective
that the process may be more tissue specific. From Tian B., Manley J.L.
Alternative polyadenylation of mRNA precursors. Nat. Rev. Mol. Cell Biol.
2016; 18:18–30:

Alternative polyadenylation (APA) is an RNA-processing mechanism that
generates distinct 3′ termini on mRNAs and other RNA polymerase II
transcripts. It is widespread across all eukaryotic species and is recognized
as a major mechanism of gene regulation. APA exhibits tissue specificity and
is important for cell proliferation and differentiation. In this Review, we
discuss the roles of APA in diverse cellular processes, including mRNA
metabolism, protein diversification and protein localization, and more
generally in gene regulation. We also discuss the molecular mechanisms
underlying APA, such as variation in the concentration of core processing
factors and RNA-binding proteins, as well as transcription-based regulation.

~~~
popcorncolonel
What's the point of this comment? Isn't this pretty much exactly what's in the
answer linked?

~~~
Yetanfou
The generally accepted term for this type of comment is "karma whoring", an
attempt to garner upvotes for the erudite and enlightening explanation of the
phenomenon by quoting the article itself. This works for those who suffer from
'TL,DR' but it fails on those who actually read before they comment.

------
TrueDuality
It blows my mind that we're at a stage of scientific exchange where this kind
of discourse and genetic information is so readily available at people's
fingertips.

~~~
blazespin
Quite frightening, really, given CRISPR. It's like discussing optimal ways to
build a thermonuclear bomb in your kitchen casually in an open, public forum.

~~~
ars
Any lab capable of building a dangerous virus already had that capability
before this open info.

Your thermonuclear bomb example was pretty apt: It's actually pretty easy to
find out how, it's hard to actually do it.

~~~
blazespin
Hard to do in a reliable, controlled manner, yes. It's surprisingly easy
however to do it messily. To push the analogy further, think about the early
atomic piles.

Also note that nuclear requires a certain amount of collection of radioactive
material, kind of like gold but much more expensive. Viruses can just be drawn
from anyone's blood stream / mucus.

People don't quite realize the catastrophic risk our overeducated society can
be as we learn more and more about the building blocks of our universe,
intelligence, organic makeup or otherwise.

Vernor Vinge discusses it a bit in his latest novels.

What is the scientific requirement that the universe can not be easily meddled
with by highly intelligent / knowledgeable agents? The human being did not
evolve under circumstances where we had such capability.

Likely our best bet is to spread out to different planets where there would at
least be some buffer.

~~~
kfrzcode
Like a dystopian, civilization-ending Dunning-Kruger effect.

------
voldacar
The very first nCoV genome sequenced earlier this month[0] ends with 33 As,
while the genome of the virus found in the first U.S. patient[1] ends with
only 12 As.

Is it normal for there to be variation here? Do these repeated A sequences
actually code for a protein or are they just a marker to let the ribosome know
it's reached the end of the genome, and so the length of repeating As doesn't
really matter past a certain point?

(I'm obviously not a biologist)

[0] https://www.ncbi.nlm.nih.gov/nuccore/MN908947

[1] https://www.ncbi.nlm.nih.gov/nuccore/MN985325
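
For comparing tail lengths like the 33 vs. 12 mentioned above, here is a
minimal sketch that counts the run of A's at the 3' end of a sequence string.
The function name and the toy sequences are hypothetical, made up for
illustration; they are not the GenBank records linked above.

```python
def poly_a_tail_length(sequence: str) -> int:
    """Count the consecutive A's at the 3' (right-hand) end of a sequence."""
    seq = sequence.strip().upper()
    # Stripping trailing A's and comparing lengths gives the run length.
    return len(seq) - len(seq.rstrip("A"))


if __name__ == "__main__":
    # Toy sequences (hypothetical, not real genome data).
    first = "acgtgactg" + "a" * 33
    second = "acgtgactg" + "a" * 12
    print(poly_a_tail_length(first), poly_a_tail_length(second))
```

Note that a reported tail length also depends on how the sample was sequenced
and assembled, so a difference between two records is not necessarily a
difference between the viruses themselves.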

~~~
CrazyStat
One of the papers mentioned in the linked discussion shows that the number of
As can vary over time within a single patient. There seems to be some evidence
that different lengths might give an advantage or disadvantage in replication,
but it seems like variation is normal.

(also not a biologist, though I work with some)

------
userbinator
It's surprisingly reminiscent of the "leader" and "trailer" of tape and film,
and according to the quotes in the link, it is required for the transcription
process that "reads" the genome. The analogous functionality of reading a
genome, and that of a human-made device reading tape or film, is fascinating.

~~~
0x8BADF00D
It is interesting. DNA itself could be analogous to the tape that a Turing
machine reads one cell at a time. The alphabet in each cell is finite (ATCG).

~~~
cliqueiq
Not a biologist but the whole thing reminded me of a NOP sled

~~~
ksaj
If you used your nop sled in a steganographic fashion (i.e., each type of NOP can
be decoded as some piece of information) you would be pretty close when it
comes to RNA virus replication. For that matter, some computer viruses
purposely mutate the nop sled to avoid creating scan strings, so there are a
number of similarities to be had as long as each mutation still performs the
necessary function.

------
eat_veggies
PKCS#7 padding
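
For readers who have not met it, a minimal sketch of the PKCS#7 padding scheme
the comparison refers to: the message is extended to a multiple of the block
size with N bytes, each of value N. The function names and the 16-byte default
are assumptions chosen for illustration, not taken from any particular library.

```python
def pkcs7_pad(data: bytes, block_size: int = 16) -> bytes:
    """Append N bytes of value N so the length is a multiple of block_size."""
    if not 1 <= block_size <= 255:
        raise ValueError("block_size must be between 1 and 255")
    # A full block of padding is added when data is already aligned.
    pad_len = block_size - (len(data) % block_size)
    return data + bytes([pad_len]) * pad_len


def pkcs7_unpad(padded: bytes, block_size: int = 16) -> bytes:
    """Validate and remove PKCS#7 padding."""
    if not padded or len(padded) % block_size != 0:
        raise ValueError("input length is not a positive multiple of block_size")
    pad_len = padded[-1]
    if pad_len < 1 or pad_len > block_size:
        raise ValueError("invalid padding length byte")
    if padded[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("padding bytes are inconsistent")
    return padded[:-pad_len]


if __name__ == "__main__":
    print(pkcs7_pad(b"YELLOW SUBMARINE", 20))
```

The resemblance is loose: the padding is a run of one repeated symbol at the
end of the message, but unlike a poly(A) tail its length is encoded in the
symbol itself.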

~~~
jmspring
upvoted, but I hate you for the PTSD the PKCS standards trigger :)

------
40acres
While it's amazing that folks can simply discuss this on the internet, I would
love an ELI5 style answer so us simple folk could understand.

~~~
userbinator
It's essentially a "padding" that makes it easier for the biological machinery
to read the genome.

~~~
0xdada
So like a NOP sled?

~~~
anfilt
That's actually a good point lol.

Also DNA encoding proteins basically have a function epilogue and prologue.

------
rambojazz
> I don't think that's just random

Why would he expect aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa to be less random than
for example acatgagacgtctaatgttagacatgcatgac?

~~~
eat_veggies
I would bet that 99+% of arrangements of 33 letters "look random" -- that is,
there is no discernible pattern. Despite each arrangement having an equally
tiny probability of occurring, if you randomly generate an arrangement, you're
all but guaranteed to get something that looks random.

For aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa to appear, it is either an amazing
coincidence, or there must be some non-random mechanism that creates it.

~~~
zippoxer
If I had to guess, the chance for a sequence of 33 identical DNA letters would
be (1/4 ^ 33) * 33

Pardon my math.
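
A quick check of that guess, as a sketch under a strong simplifying
assumption: each of the 33 positions is drawn independently and uniformly from
the four letters, which real genomes do not do. Under that assumption the
chance of one specific run (all A's) is (1/4)^33, and the chance of 33
identical letters of any kind multiplies that by the alphabet size, 4, rather
than by the length, 33. The function names are hypothetical.

```python
from fractions import Fraction


def prob_specific_sequence(length: int, alphabet_size: int = 4) -> Fraction:
    """Probability of one particular sequence of i.i.d. uniform letters."""
    return Fraction(1, alphabet_size ** length)


def prob_all_identical(length: int, alphabet_size: int = 4) -> Fraction:
    """Probability that all letters match, whichever letter it is."""
    # One specific sequence per letter of the alphabet.
    return alphabet_size * prob_specific_sequence(length, alphabet_size)


if __name__ == "__main__":
    print(float(prob_specific_sequence(33)))  # all A's
    print(float(prob_all_identical(33)))      # any single repeated letter
```

This only covers a fixed window of 33 letters; the chance of such a run
appearing somewhere in a genome of roughly 30,000 letters is larger by about
that factor, and still negligible.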

------
sunstone
It has a halting problem.

------
seibelj
No one has proven that this virus is any more deadly than a bad flu. 3-4% of
hospitalized cases is not absurd compared to the yearly flu. Very interesting,
all of this

~~~
ghostpepper
The CDC has a page up right now showing a 7% mortality rate for influenza on a
weekly basis.

https://www.cdc.gov/flu/weekly/index.htm

~~~
feral
That says 7 percent of deaths are from influenza, not that 7 percent of people
with influenza die.

~~~
ghostpepper
Ah, I see my mistake now. I'm not sure that statistic is as useful as the one I
thought it was, though.

