I'd argue that while open source is a desirable feature in most software, it's a necessary feature in scientific software. When our results depend on running through software, that software can't be a proprietary, closed-source black box.
And yes, we'd be happy to discuss your use case and get you set up with something. Just shoot me a note at nick at helix dot io and we can follow up off thread.
One of the core things we're working on is a data structure for more efficient, constant-time sequence lookups, and we've definitely thought of some non-bioinformatics use cases as well. Happy to answer any more specific questions by email too (in my profile)!
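To make "constant-time sequence lookups" a bit more concrete, here is a minimal sketch of the general idea using an ordinary hash map keyed on k-mers (fixed-length substrings). This is an assumption for illustration only, not the data structure the project actually uses; `build_kmer_index`, `lookup`, and the toy reference sequences are all hypothetical names and data.

```python
from collections import defaultdict


def build_kmer_index(references, k=5):
    """Map every k-mer to the set of reference names that contain it."""
    index = defaultdict(set)
    for name, seq in references.items():
        # Slide a window of width k across the sequence.
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def lookup(index, kmer):
    """Average-case O(1) dictionary lookup; returns an empty set on a miss."""
    return index.get(kmer, set())


# Toy reference sequences (made up for this example).
references = {
    "ref_a": "ACGTACGTGA",
    "ref_b": "TTGACGTACC",
}
index = build_kmer_index(references, k=5)
hits = lookup(index, "ACGTA")
```

Building the index costs time proportional to the total reference length, but each query afterwards is a single hash lookup regardless of how many references were indexed. A plain dictionary like this would use far too much memory at genome scale, which is presumably where a more specialized structure comes in.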
Could we use our own corpus for this?
Ah, I see you have some problems with species identification... are you using old NCBI datasets?
(P.S. If you'd like to talk, you can email me at the address in my profile.)
But... what's the use case for this? Are there people with genomic sequences just lying around with no idea of what they represent? Or is it a resource for students?
Say, for example, you've just finished sequencing a brand-new sugarcane genome.
And you are interested in aluminium resistance genes, so that sugarcane can be planted in aluminium-rich soils that would normally kill the plant.
What you do is throw parts of the genome into a tool like this (there are others) and see if you can find a match for an aluminium resistance gene from sorghum (or wheat, corn...) that has already been researched and tested. The more closely related the species, the better.
Now you have some clue about where to start your studies, which genes to look for in your new varieties, or which gene to try inserting in your next Agrobacterium or gene bombardment test.
PS: I left the bioinformatics field a few years ago (best job I ever had, BTW), so correct me if I'm wrong.
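The workflow described above (throwing a genome fragment at a set of already-characterized genes and seeing which one matches best) can be sketched roughly as follows. This is a toy version that scores candidates by the fraction of shared k-mers; real tools typically use proper sequence alignment and statistical scoring instead. The gene names and sequences are invented for the example, and `kmers` and `rank_matches` are hypothetical helpers.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_matches(query, known_genes, k=4):
    """Rank known genes by the fraction of the query's k-mers they share."""
    query_kmers = kmers(query, k)
    scores = []
    for name, seq in known_genes.items():
        shared = len(query_kmers & kmers(seq, k))
        # Guard against queries shorter than k (no k-mers at all).
        score = shared / len(query_kmers) if query_kmers else 0.0
        scores.append((name, score))
    # Best match first.
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Made-up "known" genes; not real sequences.
known_genes = {
    "sorghum_gene_x": "ATGGCTAGCTAGGATCC",
    "unrelated_gene": "TTTTTTTTCCCCCCCCC",
}
# A fragment from the newly sequenced genome (also made up).
fragment = "GGCTAGCTAGGA"
ranking = rank_matches(fragment, known_genes)
```

Here the fragment is an exact substring of the first gene, so it ranks at the top with a score of 1.0, while the unrelated gene shares no k-mers and scores 0.0. Sequences from related species will not match exactly, which is why real searches tolerate mismatches and gaps rather than relying on exact k-mer overlap alone.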
(This is nkrumm's and my project)
Thanks for the explanation and good luck.
These complex “metagenomic” samples can be processed using our tool (we've set up a demo on the submitted link), in order to understand a sample's composition, or to see if there are relevant pathogens in the sample.