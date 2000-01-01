[1]: http://www.genomenewsnetwork.org/resources/sequenced_genomes...
[2]: https://www.ncbi.nlm.nih.gov/genome/browse/
[3]: https://en.wikipedia.org/wiki/List_of_sequenced_bacterial_ge...
[4]: https://en.wikipedia.org/wiki/List_of_sequenced_eukaryotic_g...
Most human pathogens, such as Staphylococcus aureus, Streptococcus pneumoniae, Escherichia coli, Salmonella enterica, and Mycobacterium tuberculosis, have several thousand assemblies each.
[1] https://www.ncbi.nlm.nih.gov/genome/browse/
Sanger sequencing was one of the first sequencing methods, and works by chain termination: synthesising strands of increasing length, each ending at a known nucleotide. With the Human Genome Project under way, Celera instead took the shotgun approach: fragment the whole genome, amplify the fragments, sequence them, and match them back together using bioinformatics. The complexity here is that much of the DNA is not particularly unique (microsatellites and other repeats). A short 20-nucleotide sequence may therefore be present in many parts of the genome, so it is often hard to generate a 100% complete, connected genome.
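To make the uniqueness problem concrete, here is a minimal sketch that counts how often each 20-mer occurs in a sequence. The toy genome is made up for illustration (three unique stretches separated by two copies of a CA microsatellite), and the function names are my own, not from any assembler.

```python
from collections import Counter


def kmer_counts(genome: str, k: int) -> Counter:
    """Count every overlapping k-mer in the genome."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def fraction_unique(genome: str, k: int) -> float:
    """Fraction of k-mer positions whose k-mer occurs exactly once."""
    counts = kmer_counts(genome, k)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    unique = sum(1 for c in counts.values() if c == 1)
    return unique / total


if __name__ == "__main__":
    # Made-up toy genome: unique flanks around two (CA)n microsatellites.
    toy = ("ATGGCGTACGTTAGC" + "CA" * 20 +
           "GGATCCTTAGACGTA" + "CA" * 20 +
           "TTGACCGATAGGCTA")
    k = 20
    counts = kmer_counts(toy, k)
    probe = "CA" * 10  # a 20-mer lying entirely inside the repeat
    print(f"{probe} occurs {counts[probe]} times")
    print(f"unique {k}-mer positions: {fraction_unique(toy, k):.0%}")
```

A read that falls entirely inside the repeat carries no information about which copy it came from, which is where the gaps in an assembly tend to appear.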
Today, Illumina sequencing is the major sequencing platform. It relies on fragmenting the DNA into fragments of ~300 bp. By synthesising the complementary strand of each fragment with fluorescently labelled nucleotides, we can sequence each fragment. Here we have the same complexity as with shotgun sequencing: a given fragment's sequence may occur in multiple parts of the full DNA sequence.
To remedy this, more error-prone long-read sequencing methods such as PacBio or Oxford Nanopore may be employed to generate long reads. These long reads may then act as a map for stitching together the more precise short reads.
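A rough sketch of the "long read as a map" idea: place each accurate contig on a noisy long read and sort by position. This assumes substitution-only errors and ungapped placement, which is a simplification (real long-read errors include many insertions and deletions, and real tools use proper alignment). The names and the 20% mismatch threshold are hypothetical.

```python
from __future__ import annotations


def best_hit(contig: str, long_read: str) -> tuple[int, int]:
    """Return (position, mismatches) of the best ungapped placement.

    Position is -1 if the contig is longer than the long read.
    """
    best_pos, best_mm = -1, len(contig) + 1
    for pos in range(len(long_read) - len(contig) + 1):
        window = long_read[pos:pos + len(contig)]
        mm = sum(a != b for a, b in zip(contig, window))
        if mm < best_mm:
            best_pos, best_mm = pos, mm
    return best_pos, best_mm


def order_contigs(contigs: dict[str, str], long_read: str,
                  max_mismatch_rate: float = 0.2) -> list[str]:
    """Order contig names by where they land on the long read.

    Contigs whose best placement exceeds the (assumed) mismatch
    threshold are left out rather than guessed at.
    """
    placed = []
    for name, seq in contigs.items():
        pos, mm = best_hit(seq, long_read)
        if pos >= 0 and mm <= max_mismatch_rate * len(seq):
            placed.append((pos, name))
    return [name for _, name in sorted(placed)]


if __name__ == "__main__":
    # Toy long read with one substitution error (position 3, A -> T).
    noisy_long_read = "AAATCCCCGGGGTTTT"
    short_read_contigs = {"b": "GGGGTT", "a": "AAAACC"}
    print(order_contigs(short_read_contigs, noisy_long_read))
```

The long read only supplies order and rough position; the bases in the final sequence would still come from the more accurate short-read contigs.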
Other sequencing methods, such as pyrosequencing, have the inherent problem of not being able to reliably count long runs (about 5 or more) of the same nucleotide in a row.
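The homopolymer issue follows from how the signal is read out: nucleotides are flowed one type at a time, and the light signal for a flow scales with how many bases were incorporated. Below is an idealised sketch, assuming a noise-free signal exactly proportional to run length and a "TACG" flow order (both assumptions for illustration). The input is the strand being synthesised, not the template it is copied from.

```python
from __future__ import annotations


def ideal_flowgram(synthesised: str, flow_order: str = "TACG",
                   max_flows: int = 1000) -> list[tuple[str, int]]:
    """Noise-free signal per flow: bases incorporated for each nucleotide flowed."""
    flows = []
    i = 0  # next base of the strand still to be incorporated
    n = 0  # number of flows performed so far
    while i < len(synthesised) and n < max_flows:
        base = flow_order[n % len(flow_order)]
        run = 0
        # The whole homopolymer run is incorporated in a single flow.
        while i < len(synthesised) and synthesised[i] == base:
            run += 1
            i += 1
        flows.append((base, run))
        n += 1
    return flows


def relative_step(run_length: int) -> float:
    """Relative signal change between a run of n and a run of n + 1 bases,
    under the proportional-signal assumption."""
    return 1.0 / run_length


if __name__ == "__main__":
    print(ideal_flowgram("TTACCCCCG"))
    for n in (1, 2, 5, 8):
        print(n, f"{relative_step(n):.0%}")
```

Going from 1 to 2 bases doubles the signal, but going from 5 to 6 only raises it by a fifth, so with any realistic noise the run length gets hard to call.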
If the genome in question contains a lot of regions that are similar to each other, the assembly algorithms cannot tell which copy a read came from, so the assembly becomes ambiguous or fragmented.
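One way to see where the assembler gets stuck is a de Bruijn graph, a structure many short-read assemblers are built around. This is a toy sketch, assuming error-free k-mers taken straight from a known sequence rather than from reads. Any node with more than one successor is a point where the assembler has to choose between continuations.

```python
from __future__ import annotations

from collections import defaultdict


def de_bruijn(seq: str, k: int) -> dict[str, set[str]]:
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are k-mers."""
    graph: dict[str, set[str]] = defaultdict(set)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def branching_nodes(graph: dict[str, set[str]]) -> list[str]:
    """Nodes with more than one possible continuation."""
    return sorted(node for node, succ in graph.items() if len(succ) > 1)


if __name__ == "__main__":
    # "TGC" occurs twice with different followers, so the path forks there.
    repetitive = "ATGCGTGCA"
    print(branching_nodes(de_bruijn(repetitive, 4)))
    # A sequence with no repeated 3-mer gives a single unbranched path.
    print(branching_nodes(de_bruijn("ATGCA", 4)))
```

Increasing k resolves repeats shorter than k, which is why longer reads help, but repeats longer than the read length still leave forks in the graph.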
And you need good population coverage: what counts as a normal variant? Newer methods propose aligning to a graph of known variants instead of just trying to build a single-sequence reference genome.
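As a toy illustration of the graph idea: a reference with one known SNP can be stored as a small graph with a "bubble", and a read carrying the alternate allele then matches the graph even though it does not match the single linear reference. The node layout and function names are made up, and enumerating every path like this only works for tiny graphs (real graph aligners do not do it this way).

```python
from __future__ import annotations

from typing import Iterator


def walk(edges: dict[int, list[int]], node: int, end: int) -> Iterator[list[int]]:
    """Yield every path of node ids from `node` to `end` (acyclic graphs only)."""
    if node == end:
        yield [node]
        return
    for nxt in edges[node]:
        for rest in walk(edges, nxt, end):
            yield [node] + rest


def spelled_sequences(nodes: dict[int, str], edges: dict[int, list[int]],
                      start: int, end: int) -> list[str]:
    """Sequences spelled by each start-to-end path through the graph."""
    return ["".join(nodes[n] for n in path) for path in walk(edges, start, end)]


def matches_graph(read: str, nodes: dict[int, str], edges: dict[int, list[int]],
                  start: int, end: int) -> bool:
    """True if the read occurs exactly in at least one path's sequence."""
    return any(read in seq for seq in spelled_sequences(nodes, edges, start, end))


if __name__ == "__main__":
    # Hypothetical graph: ACGT, then A or G (a SNP), then TTCA.
    nodes = {1: "ACGT", 2: "A", 3: "G", 4: "TTCA"}
    edges = {1: [2, 3], 2: [4], 3: [4], 4: []}
    linear_reference = "ACGTATTCA"
    read = "GTGTT"  # carries the G allele
    print("matches linear reference:", read in linear_reference)
    print("matches graph:", matches_graph(read, nodes, edges, 1, 4))
```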
