
Efficiently grouping similar DNA sequences to remove duplicates (2019) - c0deb0t
https://blog.liudaniel.com/n-grams-BK-trees
======
c0deb0t
I am a high school student and this blog post is a summary of a research
project I did. The full published paper can be viewed here:
[https://peerj.com/articles/8275/](https://peerj.com/articles/8275/).

This is not really explained in the blog post, but the "naive" method is an
O(N^2) brute-force search that compares every pair of UMIs, and the "combos"
method recursively enumerates all combinations of UMIs within a certain edit
distance. Some other variants are also evaluated.
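
For readers who want something concrete, here is a rough sketch of the two ideas. It assumes equal-length UMIs over the alphabet ACGT and uses Hamming distance (substitutions only) as the edit distance. The function names are made up for illustration and are not taken from the paper's code.

```python
from itertools import combinations

ALPHABET = "ACGT"  # assumption: UMIs contain only these four bases


def hamming(a, b):
    """Number of positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def naive_pairs(umis, k):
    """'Naive' idea: compare every pair of UMIs, O(N^2) comparisons."""
    return {(a, b) for a, b in combinations(umis, 2) if hamming(a, b) <= k}


def combos_neighbors(query, umi_set, k):
    """'Combos' idea: recursively generate every string within k
    substitutions of `query` and keep the ones present in `umi_set`.
    The query itself (distance 0) is included if it is in the set."""
    found = set()

    def recurse(chars, start, edits_left):
        candidate = "".join(chars)
        if candidate in umi_set:
            found.add(candidate)
        if edits_left == 0:
            return
        # Only edit positions at or after `start`, so each combination
        # of positions is visited once.
        for i in range(start, len(chars)):
            original = chars[i]
            for c in ALPHABET:
                if c != original:
                    chars[i] = c
                    recurse(chars, i + 1, edits_left - 1)
            chars[i] = original  # undo the substitution

    recurse(list(query), 0, k)
    return found
```

The trade-off in this sketch: the naive search costs grow with the square of the number of UMIs, while the combos search costs per query grow with the number of possible strings within k substitutions, independent of how many UMIs are stored.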

If you have any questions, feel free to ask me!

