
Novel coronavirus complete genome from the Wuhan outbreak available in GenBank - mikhailfranco
https://ncbiinsights.ncbi.nlm.nih.gov/2020/01/13/novel-coronavirus/
======
nestorD
I found out a few days ago, when an interesting related question was posted
on bioinformatics.stackexchange.com: Why does the Wuhan coronavirus genome
end in aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa (33 a's)? [0]

[0]: [https://bioinformatics.stackexchange.com/questions/11227/why...](https://bioinformatics.stackexchange.com/questions/11227/why-does-the-wuhan-coronavirus-genome-end-in-aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa)

~~~
sumgocko
(This is baseless speculation by someone who has only taken Genetics 101)

Human cell nuclei produce mRNA from DNA, which then makes its way to the
ribosome to be expressed as a chain of amino acids, or a protein. On its way
to the ribosome, it needs protection from the cell's RNA cleanup enzymes, so
it leaves the nucleus with a 5' cap and a poly-A tail.

Therefore, it makes sense that a virus that wishes to hijack human ribosomes
would end in a poly-A tail (a bunch of As) in order to not get destroyed by
human cells.
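
To see the tail for yourself, the run of trailing a's can be counted directly. This is only a minimal sketch: the function name `trailing_poly_a_length` and the toy sequence are made up for illustration, and the sequence is not the real genome.

```python
def trailing_poly_a_length(sequence: str) -> int:
    """Return the number of consecutive 'a' bases at the 3' end of a sequence."""
    seq = sequence.strip().lower()
    # Stripping the trailing a's and comparing lengths gives the run length.
    return len(seq) - len(seq.rstrip("a"))


# Hypothetical toy sequence: 20 arbitrary bases followed by 33 a's.
toy = "attaaaggtttataccttcc" + "a" * 33
print(trailing_poly_a_length(toy))
```

Running the same function on the sequence line of a downloaded FASTA record (with the header and line breaks removed) should report the length of its poly-A run.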

------
gexla
More interesting to me is the process the medical industry has to go through
to deal with the testing kits. Apparently they are in short supply. Lots of
good nuggets in this thread. Yes, it's Twitter, but this can be cross-checked.

[https://twitter.com/luchenhist/status/1220497118755987456](https://twitter.com/luchenhist/status/1220497118755987456)

* Testing is a long process
* Kit distribution is supply chain constrained due to holiday
* Only severe cases are being tested
* Hospitals are overwhelmed
* People not getting tested are being turned back
* Government offering compensation for testing but only if it's actually Coronavirus

The author has other interesting Tweets as well. Might be worth a follow if
this is of interest.

------
Smerity
Does anyone know of an easy way to download the nucleotide sequences for the
entire set of pneumonia / coronavirus viruses? I've looked at the FTP mirror
but can't connect it with the nucleotide locus I find on the site itself.

I work in language modeling and want to see about using those unsupervised /
self-supervised methods for genome annotation / phylogenetic tree
construction.

Even if it's more of a curiosity than a useful tool, I have experience with
small datasets, most recently focused on character-level language modeling on
~90MB of training data. So if I can get (90MB / (29 kilobases * 2 bits per
base) =) approximately 12,000 related samples, I should be able to at least
make a dataset out of it.
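
The back-of-the-envelope number can be reproduced as follows. It is a rough sketch that assumes 1 MB = 10^6 bytes and ignores FASTA headers and ambiguous bases; `genomes_needed` is a made-up helper name.

```python
def genomes_needed(corpus_bytes: int, genome_bases: int, bits_per_base: int = 2) -> float:
    """How many genomes of a given length fit in a corpus of a given size."""
    genome_bits = genome_bases * bits_per_base
    return corpus_bytes * 8 / genome_bits


# 90 MB of training data, ~29 kilobases per genome, 2 bits per base.
n = genomes_needed(90 * 10**6, 29_000)
print(round(n))
```

Stored as plain text (one byte per base), the same 90 MB would hold only about a quarter as many genomes.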

~~~
BioGeek
Using Biopython. Note that the search query I am using currently returns
70605 results, so you might want to tweak it to fit your needs.

    
    
        from Bio import Entrez
        import time
        from urllib.error import HTTPError
        
        DB = 'nucleotide'
        QUERY = '("pneumoviridae"[Organism] OR "Coronaviridae"[Organism])'
        
        Entrez.email = 'your.email@provider.com'
        
        # First search: only used to learn how many records match the query.
        handle = Entrez.esearch(db=DB, term=QUERY)
        record = Entrez.read(handle)
        handle.close()
        count = int(record['Count'])
        
        # Second search: request the IDs of all matching records.
        handle = Entrez.esearch(db=DB, term=QUERY, retmax=count)
        record = Entrez.read(handle)
        handle.close()
        
        # Post the IDs to the history server so they can be fetched in batches.
        id_list = record['IdList']
        post_xml = Entrez.epost(DB, id=",".join(id_list))
        search_results = Entrez.read(post_xml)
        
        webenv = search_results['WebEnv']
        query_key = search_results['QueryKey']
        
        batch_size = 200
        with open('viruses.fasta', 'w') as out_handle:
            for start in range(0, count, batch_size):
                end = min(count, start+batch_size)
                print(f"Going to download record {start+1} to {end}")
                attempt = 0
                success = False
                while attempt < 3 and not success:
                    attempt += 1
                    try:
                        fetch_handle = Entrez.efetch(db=DB, rettype='fasta',
                                                     retstart=start, retmax=batch_size,
                                                     webenv=webenv, query_key=query_key)
                        success = True
                    except HTTPError as err:
                        # Retry on server errors; give up after the third attempt.
                        if 500 <= err.code <= 599 and attempt < 3:
                            print(f"Received error from server {err}")
                            print(f"Attempt {attempt} of 3")
                            time.sleep(15)
                        else:
                            raise
                data = fetch_handle.read()
                fetch_handle.close()
                out_handle.write(data)

~~~
Smerity
Champion! Thank you =]

~~~
BioGeek
There is a small error in the code. The variable `count` should be defined on
line 11 like:

    
    
        count = int(record["Count"])
    

Then the appearance on line 15 should be removed.

------
ALittleLight
What kind of things can be done with this? Are there tools that consume
genomes and tell you something about them? Is biology advanced enough that it
would be possible to create such tools?

I see they have several different strains. I could imagine trying to find
similarities or organize them into a tree, if that's not already done.

~~~
asdff
In fact that is exactly what is done. What you are describing is genome
annotation and homology based phylogenetic analysis.
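
As a toy illustration of the "find similarities" step, sequences can be compared by the short substrings (k-mers) they share. This is only a sketch: real phylogenetic analysis relies on sequence alignment and proper tree-building methods, and the function names and toy sequences below are invented for the example.

```python
from itertools import combinations


def kmers(seq: str, k: int = 3) -> set:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a: str, b: str, k: int = 3) -> float:
    """1 minus the fraction of k-mers shared between two sequences."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:
        return 0.0
    return 1 - len(ka & kb) / len(union)


def closest_pair(named_seqs: dict, k: int = 3) -> tuple:
    """Return the pair of names whose sequences have the smallest distance."""
    return min(
        combinations(sorted(named_seqs), 2),
        key=lambda pair: jaccard_distance(named_seqs[pair[0]], named_seqs[pair[1]], k),
    )


# Hypothetical toy "strains": a and b differ by one base, c is unrelated.
seqs = {
    "strain_a": "acgtacgtacgttgca",
    "strain_b": "acgtacgtacgatgca",
    "strain_c": "ttttggggccccaaaa",
}
print(closest_pair(seqs))
```

Repeatedly merging the closest pair is the basic idea behind simple distance-based clustering of sequences into a tree.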

------
fjfaase
This has already been available for some time. I found it through a link on
the Wikipedia page. I had not seen the tree before. Because it is close to
coronaviruses found in bats, one could conclude that bats are the animal
reservoir for this type of virus (as was the case with SARS as well). I
presume that China will forbid the trade and consumption of bats.

~~~
fspeech
You don't need to actually consume bats to have the viruses transmitted across
species. Bat droppings could contaminate/infect some intermediary animal.

~~~
hanniabu
Could you get sick from the air if you were in a cave full of bats with the
virus?

~~~
krajzeg
From what I read, the viruses live in the intestine in bats. So no, only
through consumption of bats themselves or contact with their excretions.

~~~
rjsw
If their excretions create an aerosol then you could breathe it in.

~~~
tyfon
If I were to guess I'd say it is from collecting guano to use as fertiliser.
This practice is quite common and used to be so common that there is an
expression in Norwegian for something selling very well, "Det selger som hakka
møkk" [1], which means "It is selling like chopped shit".

[1] [https://www.sprakradet.no/svardatabase/sporsmal-og-svar/som-...](https://www.sprakradet.no/svardatabase/sporsmal-og-svar/som-hakka-mokk/) (Norwegian)

------
tgvaughan
There are already 27 genomes on GISAID. Very little diversity at this stage
though, so phylodynamic analyses are more noise than signal.
[https://www.gisaid.org/epiflu-applications/next-betacov-app/](https://www.gisaid.org/epiflu-applications/next-betacov-app/)

------
pnathan
Super awesome. As a point of curiosity, when did this kind of complete
sequencing capability get to the point where it could be turned around in
under two months?

~~~
distant_hat
It's been a while. You can do real time sequencing now. E.g., here is a
sequencing demo happening in a coffee shop in Africa.
[https://twitter.com/virology_chitra/status/12200656783272550...](https://twitter.com/virology_chitra/status/1220065678327255040)
The device connected to the laptop via USB is the sequencer.

~~~
wiggler00m
The company that makes that is Oxford Nanopore
[https://nanoporetech.com/products](https://nanoporetech.com/products)

~~~
ALittleLight
This is amazing. I'm having to restrain the part of myself that loves buying
gadgets. These not only look cool, from watching their videos and reading
about them, they have an awesome function.

Suppose I had some of this equipment and a sick family member. Would it be
possible for me to somehow isolate a virus or bacteria from the sick person's
saliva or blood and then sequence the DNA/RNA and then use a catalog of
genetic information to identify a likely match?

------
camdenlock
29,903 bases at 2 bits each (four possible values), so that's 59,806 bits, or
about 7.5 KB of information total. A devious little self-replicator in 7.5 KB
which runs on our bio substrate!
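
The size estimate can be sanity-checked by actually packing a sequence. A minimal sketch, assuming the sequence contains only a, c, g and t, so each base fits in 2 bits; the names `pack`, `unpack` and `packed_size_bytes` are made up for this example.

```python
CODES = {"a": 0, "c": 1, "g": 2, "t": 3}
BASES = "acgt"


def packed_size_bytes(n_bases: int) -> int:
    """Bytes needed to store n_bases at 2 bits per base, rounded up."""
    return (2 * n_bases + 7) // 8


def pack(seq: str) -> bytes:
    """Pack a sequence of a/c/g/t into bytes, 2 bits per base."""
    value = 0
    for base in seq.lower():
        value = (value << 2) | CODES[base]
    return value.to_bytes(packed_size_bytes(len(seq)), "big")


def unpack(data: bytes, n_bases: int) -> str:
    """Reverse pack(); the number of bases must be supplied by the caller."""
    value = int.from_bytes(data, "big")
    return "".join(
        BASES[(value >> (2 * (n_bases - 1 - i))) & 3] for i in range(n_bases)
    )


packed = pack("acgt")
print(packed, unpack(packed, 4))
print(packed_size_bytes(29_903))
```

At 2 bits per base, 29,903 bases need 7,476 bytes; a 4-bit encoding, which also leaves room for ambiguity codes, would need roughly twice that.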

~~~
kijin
I wonder how many of those base pairs are devoted to evading the host's immune
system, as opposed to the primary function of being a fork bomb.

~~~
majewsky
> opposed to the primary function of being a fork bomb

/me realizes that every organism is a fork bomb

------
mikhailfranco
Looks like the Chinese military had the same _Bat SARS-like Coronavirus_ in
their possession 2 years ago.

Here is the NIH page showing Identical Protein Matches of the envelope protein
from Wuhan 2019-nCoV:

[https://www.ncbi.nlm.nih.gov/ipg/QHO62113.1](https://www.ncbi.nlm.nih.gov/ipg/QHO62113.1)

Notice there are two entries not from the recent _Wuhan seafood market_
outbreak:

[https://www.ncbi.nlm.nih.gov/protein/AVP78033.1](https://www.ncbi.nlm.nih.gov/protein/AVP78033.1)

[https://www.ncbi.nlm.nih.gov/protein/AVP78044.1](https://www.ncbi.nlm.nih.gov/protein/AVP78044.1)

Both sequences were provided on 05-JAN-2018 by:

    
    
      Institute of Military Medicine Nanjing Command, 
      NO. 293 East Zhongshan Road, Nanjing, JangSu 210002, China

------
dharma1
I wonder how realistic it is to expect a vaccine in 3 months, given that we
don't have a vaccine for MERS 8 years later (and not really one for SARS
either)? Is it just a question of allocating resources and shifting
priorities?

~~~
hanniabu
I read they have a few SARS vaccines, but none that have gone through human
trials yet.

Side note: I'm always concerned when we're pressured to put out a vaccine
quickly, because it opens the door for a less-than-optimal solution to become
a mainstay. For example, in an emergency you may put forward a solution, have
it go through human trials, find some side effect, but decide that under the
circumstances it's "good enough" and approve it. I understand that this may be
justified when the need is urgent, but my understanding is that once the
urgency is over, a solution that might not have been approved under regular
circumstances remains approved.

~~~
mrfusion
Why should a given vaccine have more side effects than existing vaccines? It’s
just broken-up pieces of the virus. If produced properly and rendered inert,
why should it trigger major side effects?

~~~
vikramkr
Because biology is complicated and it's never that simple. You could trigger
too strong an immune response, no response at all, or you could end up with a
situation like with dengue, where the vaccine makes the disease worse
([https://www.statnews.com/2019/05/01/fda-dengue-vaccine-restr...](https://www.statnews.com/2019/05/01/fda-dengue-vaccine-restrictions/))

------
alpb
I hate saying this, but I didn't expect a default-theme WordPress setup
when I clicked on a link from nih.gov, with the comments section, post rating,
and pingbacks enabled.

~~~
cltsang
Think of it this way:

They launch with near absolute MVP. Design is not the priority here; the
information is. When it comes to dealing with a pandemic, every minute counts.

The default theme is a sign that someone who knows software and project
management is working on it.

~~~
alpb
Except this blog has been up since 2013, though. :) I think .gov sites have
some style guide.

~~~
kijin
.gov sites usually have some style guide, but it often doesn't extend as far
as fifth-level subdomains delegated to small teams.

Also, the default theme is very clean and readable. It's hard to do better
than that, so I'm glad they left it as it is.

------
peter303
Kudos to the scientists who developed this so fast. I remember it took four
YEARS to identify the HIV virus and another two years for a test.

~~~
vikramkr
It's really amazing how much more powerful biological tools have become since
then!

------
nartz
Any way anyone else can 'contribute' here to reaching a cure? I'm thinking -
somehow bootstrapping a node to join a distributed blockchain network of
protein folding simulations ...

~~~
DrAwdeOccarim
Join a company working in the space. There is a need at all of them for good
UI, automation, and database engineers.

~~~
hanniabu
Happen to know of any off the top of your head?

------
29athrowaway
The virus propagation was said to have started in a market that sells, among
other products, snakes.

2019-nCoV can go from bats to snakes to humans. Eating snakes or bats seems
like a bad idea.

~~~
vasco
Previous viruses came from pigs, chickens, and cows. Unless you also advise
people to stop eating those, this just seems a bit of a stretch.

~~~
29athrowaway
There is an enormous effort put into ensuring the safety and quality of the
meat products you have mentioned.

~~~
M2Ys4U
Hah.

The US poultry industry is so bad at looking after their animals they have to
wash carcasses in chlorine to even _approach_ safe levels of pathogens like
salmonella and campylobacter.

