Pretty cool to see the success of PacBio (Pacific Biosciences) technology for long-read sequencing. PacBio is one of the few successful sequencing technologies using nature’s canonical high-fidelity DNA reading tool (DNA polymerase). Other successful sequencing tech uses approaches that are (ingeniously) different from how DNA is read in a living cell.
PacBio circular consensus sequencing (used here) is a clever way of performing extremely accurate single-molecule reads: the target linear DNA is joined into a circle, which is read over and over again, enabling high accuracy of each base by consensus.
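To make the consensus idea concrete, here is a toy sketch in Python. It is not PacBio's actual CCS algorithm: it assumes substitution errors only (no insertions or deletions), passes that are already aligned and of equal length, and a simple per-position majority vote. The function names and the 10% per-pass error rate are made up for illustration.

```python
from collections import Counter
import random


def simulate_pass(template, error_rate, rng):
    """Return one noisy read of the template (substitution errors only, an assumption)."""
    read = []
    for base in template:
        if rng.random() < error_rate:
            # Replace the base with one of the three other bases.
            read.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            read.append(base)
    return "".join(read)


def consensus(passes):
    """Majority vote at each position across equal-length, pre-aligned passes."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))


def error_fraction(read, template):
    """Fraction of positions where the read disagrees with the template."""
    return sum(x != y for x, y in zip(read, template)) / len(template)


if __name__ == "__main__":
    rng = random.Random(0)
    template = "".join(rng.choice("ACGT") for _ in range(2000))
    # Nine passes around the same circular molecule, each with 10% errors (illustrative).
    passes = [simulate_pass(template, 0.10, rng) for _ in range(9)]
    print("single pass error:", error_fraction(passes[0], template))
    print("consensus error:  ", error_fraction(consensus(passes), template))
```

Because errors in different passes land at mostly different positions, the vote at each position is usually won by the true base, so the consensus error should come out far lower than the error of any single pass.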
And it's specifically the low error rate of HiFi sequencing that enabled this advance. The string graph collapses not just when reads are too short, but also when the error rate is high enough that repeat copies can't be distinguished from each other. The 15kb reads used in this paper have just enough length and accuracy to span almost all repeats in the genome (rDNA and some centromeres excluded).
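A back-of-the-envelope way to see why error rate matters as much as length: compare the number of sequencing errors expected in one read with the number of real differences between two repeat copies over the same stretch. The numbers below (a 10% error rate for a noisy read, 0.1% for an accurate one, and repeat copies that are 99% identical) are illustrative assumptions, not figures from the paper.

```python
def expected_errors(read_length, error_rate):
    """Expected number of sequencing errors in a read of the given length."""
    return read_length * error_rate


def true_differences(read_length, copy_identity):
    """Expected number of real differences between two repeat copies over the read."""
    return read_length * (1 - copy_identity)


if __name__ == "__main__":
    read_length = 15_000    # 15kb read, as in the comment above
    copy_identity = 0.99    # assumed identity between two repeat copies
    signal = true_differences(read_length, copy_identity)
    for label, rate in [("noisy read (10% error, assumed)", 0.10),
                        ("accurate read (0.1% error, assumed)", 0.001)]:
        noise = expected_errors(read_length, rate)
        print(f"{label}: about {round(noise)} errors vs about {round(signal)} real differences")
```

When the expected errors outnumber the real differences, a read from one repeat copy looks about as close to the other copy as to its own, and the graph can't keep them apart. When errors are rare, the real differences stand out.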
rDNA is repeated many, many times and is located at several places in the genome. Presumably we need so many ribosomes that the cell needs many copies of it, or else transcribing it into rRNA would be the bottleneck in translating all mRNA. Can't have any proteins without rRNA!