From the title, I initially thought this was a biomed LLM, which would have been really exciting. But after looking at the project, it's a prompt and an OpenAI API call.
Not saying that it's not useful, just not as exciting.
The repo more or less just constructs a prompt explaining the available APIs and how to use them, wrapped up in CLIs. They include some evaluation tasks, which I guess they wanted to be repeatable, but I'm not sure this warrants a paper -- this should be a GPT in the OpenAI marketplace.
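For anyone curious, the general pattern is roughly the following. This is a minimal sketch of my own, not the repo's actual code: the prompt, model name, and CALL/ANSWER convention are made up for illustration, though the NCBI E-utilities endpoint is real.

```python
# Hypothetical sketch of the "prompt that documents an API" pattern:
# the model emits a tool call, a loop executes it and feeds the result back.
import re
import requests
from openai import OpenAI  # assumes the openai-python v1 client

client = OpenAI()  # reads OPENAI_API_KEY from the environment

TOOL_PROMPT = """You can answer biomedical questions by calling NCBI E-utilities.
To search a database, emit a single line of the form:
CALL https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=gene&term=<query>
After the result is shown to you, give the final answer prefixed with ANSWER:"""

def ask(question: str, max_rounds: int = 3) -> str:
    messages = [{"role": "system", "content": TOOL_PROMPT},
                {"role": "user", "content": question}]
    for _ in range(max_rounds):
        reply = client.chat.completions.create(
            model="gpt-4o-mini", messages=messages
        ).choices[0].message.content
        m = re.search(r"CALL (\S+)", reply)
        if not m:  # no tool call -> treat the reply as the final answer
            return reply
        result = requests.get(m.group(1), timeout=30).text[:2000]
        messages += [{"role": "assistant", "content": reply},
                     {"role": "user", "content": f"RESULT:\n{result}"}]
    return reply
```

The "paper" part is basically the prompt engineering plus the evaluation harness around a loop like this.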
The project was posted on arXiv last April. Back then not a lot of people were doing tool augmentation, let alone domain-specific tools. I guess the whole idea was to prove that LLMs can be augmented with domain tools to do useful stuff, which now seems obvious...
Has anyone trained a transformer model yet on genotype-phenotype mapping? Like feed in loads of human, animal, etc. sequences and get it to predict phenotypes as accurately as possible.
This seems like it could be the key to genetic engineering. You could probably build something analogous to a diffusion model for organisms that renders genomes or fine-tunes them from phenotypic traits.
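For concreteness, here's a toy sketch of what a "sequences in, phenotype out" model might look like. This is entirely my own illustration in PyTorch with made-up sizes; positional encodings and everything else a real model would need are omitted.

```python
# Toy sketch: a small transformer encoder over DNA tokens with a
# regression head for one continuous trait. Illustrative only.
import torch
import torch.nn as nn

class GenotypeToPhenotype(nn.Module):
    def __init__(self, vocab_size=5, d_model=128, n_heads=4, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)  # A,C,G,T,N -> vectors
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 1)  # predict one continuous phenotype

    def forward(self, tokens):           # tokens: (batch, seq_len) int64
        h = self.encoder(self.embed(tokens))
        return self.head(h.mean(dim=1))  # mean-pool over positions

model = GenotypeToPhenotype()
fake_batch = torch.randint(0, 5, (2, 512))  # two random 512-bp "sequences"
print(model(fake_batch).shape)              # torch.Size([2, 1])
```

The hard parts aren't the architecture, as the replies below point out: they're the data and the biology.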
No. In general there are many orders of magnitude more variations throughout the genome than there are observable phenotypes. We don’t know which genes most cis-regulatory elements regulate, and we don’t know what most genes do in which contexts. Hell, we don’t even know how many different types of cells there are in the human body.
The problem is data scarcity. You're stuck with a few million rows of data, which just isn't enough given how noisy and subtle the genotype->phenotype relationships are. Neural nets work well on language, chess, Go, etc., because we have gigantic datasets where the subtle patterns can be teased out without overfitting.
I tried to do this ages ago, back in 2018, by adapting OpenAI's flow-based architecture, and it sort of seemed to work (it was at least promising). With today's models, which have a significantly more disentangled latent space, it should be much easier to do! I saw a transformer trained on the UK Biobank recently; excited for this space!
I was also under the impression that this was a new biomed LLM initially. But it's a perfect example to illustrate that context matters most in many cases.
If I understand correctly, this is like a custom GPT with access to 2 external APIs. With the right context/data provided, it outperforms an LLM + Bing search (not surprising).
Yes. It answers information-seeking questions that require using bioinformatics tools (database utilities, BLAST, etc.). The evaluation metric is mainly exact match (EM).
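For reference, EM just means the prediction counts only if it equals the gold answer after normalization. Here's a sketch; the normalization details are my assumption, not the paper's exact code.

```python
# Minimal exact-match (EM) scorer: lowercase, trim, collapse whitespace.
def exact_match(prediction: str, gold: str) -> bool:
    norm = lambda s: " ".join(s.lower().strip().split())
    return norm(prediction) == norm(gold)

preds = ["BRCA1", "chr17"]
golds = ["brca1", "chr 17"]
score = sum(exact_match(p, g) for p, g in zip(preds, golds)) / len(golds)
print(f"EM = {score:.2f}")  # EM = 0.50
```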