I think one of my favorite examples of avoiding decompression for great profit (really: health) is PLINK. If you’re not aware, it’s a tool for doing statistical genetics (things like quality control but also inferring population structure, computing relatedness between individuals, and fitting linear models). The code is mind bending to read but it’s blazing fast thanks to “compressing” genotypes into two bits (homozygous reference, heterozygous, homozygous alternate, N/A) and implementing all its operations on this 2-bit representation.
It’s exciting to see more work on compressed arrays make it out into the world. It would have been nice if a lot of PLINK’s bitfiddling functionality was in an accessible native library rather than wrapped up with the rest of that tool.
It’s exciting to see more work on compressed arrays make it out into the world. It would have been nice if a lot of PLINK’s bitfiddling functionality was in an accessible native library rather than wrapped up with the rest of that tool.
https://www.cog-genomics.org/plink/2.0/
reply