
Show HN: My Published HS Research – Deduplicating Unique Molecular Identifiers - c0deb0t
https://peerj.com/articles/8275/
======
c0deb0t
I am a high school student, and this is a published paper that I wrote. If you
want a shorter version of my work, please take a look at my blog post:
[https://blog.liudaniel.com/n-grams-BK-trees](https://blog.liudaniel.com/n-grams-BK-trees).
I enjoy working on
bringing computer science algorithms to other fields like biology. I have also
worked on algorithms for machine learning security.

The general problem that this paper addressed is grouping similar DNA/RNA
sequences based on something known as a Unique Molecular Identifier, and then
collapsing those groups into consensus sequences. This helps estimate the
number of unique sequences while efficiently accounting for substitution
errors in sequencing or PCR amplification.
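
To make the problem concrete, here is a minimal sketch in Python. It is not the paper's algorithm, only a naive baseline under stated assumptions: all UMIs have the same length, all sequences have the same length, errors are substitutions only, and two UMIs belong to the same group if a chain of pairs within `max_dist` mismatches connects them. The names `hamming`, `group_umis`, `consensus`, and `deduplicate` are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def group_umis(umis, max_dist=1):
    """Group UMIs linked by chains of pairs within max_dist substitutions.

    Naive all-pairs comparison, quadratic in the number of unique UMIs.
    """
    unique = sorted(set(umis))
    parent = {u: u for u in unique}  # union-find forest

    def find(u):
        # Walk to the root, halving the path along the way.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for a, b in combinations(unique, 2):
        if hamming(a, b) <= max_dist:
            parent[find(a)] = find(b)

    groups = {}
    for u in unique:
        groups.setdefault(find(u), []).append(u)
    return sorted(groups.values())


def consensus(seqs):
    """Most common base at each position (ties go to the first base seen)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def deduplicate(reads, max_dist=1):
    """Collapse (umi, sequence) pairs into one consensus sequence per UMI group."""
    by_umi = {}
    for umi, seq in reads:
        by_umi.setdefault(umi, []).append(seq)
    result = []
    for group in group_umis(list(by_umi), max_dist):
        seqs = [s for u in group for s in by_umi[u]]
        result.append(consensus(seqs))
    return result


reads = [
    ("AAAA", "ACGT"),
    ("AAAT", "ACGT"),  # UMI one substitution away from AAAA
    ("AAAA", "ACGA"),  # same UMI, one substitution error in the read
    ("GGGG", "TTTT"),
]
unique_sequences = deduplicate(reads, max_dist=1)
print(len(unique_sequences), unique_sequences)
```

The all-pairs loop is the part that gets expensive as the number of unique UMIs grows, which is why a faster way to find UMIs within a few substitutions of each other matters.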

If you have any questions, feel free to ask me!

