Hacker News
Show HN: Undermind – Deep scientific paper search with adaptive LLMs (undermind.ai)
49 points by tomhartke 22 days ago | 14 comments
We can now build drastically higher quality search because we can use LLMs in algorithms that mimic a human's systematic research process, instead of just roughly recommending results based on semantic embeddings or term frequency.

We built a deep search LLM pipeline that takes a few minutes to carefully search all the scientific literature. You describe your complex goal, as you would to a colleague. Then, we carefully search 200M+ papers. We classify the preliminary results with GPT-4. We then adapt the search goals based on relevant/irrelevant papers uncovered and continue searching, repeating in a scalable, structured exploration. Because of the classification accuracy, we can track this process statistically to predict what fraction of relevant papers have been discovered so far at any point, and know when the search is complete. There's more explanation of techniques/benchmarks in the whitepaper on our homepage.
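The retrieve, classify, adapt, repeat loop described above can be sketched roughly as follows. This is only a toy illustration, not the actual Undermind pipeline: the corpus, the word-overlap `retrieve`, the `RELEVANT` lookup that stands in for the GPT-4 classification step, and every function name here are assumptions made up for the example.

```python
from dataclasses import dataclass, field

# Toy corpus of titles standing in for the real paper database (hypothetical data).
CORPUS = [
    "slow wave sleep improves reaction time",
    "caffeine improves reaction time",
    "acoustic stimulation enhances slow wave sleep",
    "psychomotor vigilance after slow wave sleep enhancement",
    "protein inhibitors in cancer",
]

# Stand-in for the LLM relevance judgment: a fixed set of "relevant" titles.
RELEVANT = {CORPUS[0], CORPUS[1], CORPUS[3]}


@dataclass
class SearchState:
    relevant: list = field(default_factory=list)
    seen: set = field(default_factory=set)


def retrieve(query_terms, seen):
    """Rank unseen titles by word overlap with the current query terms."""
    scored = [(len(query_terms & set(p.split())), p) for p in CORPUS if p not in seen]
    scored = [(s, p) for s, p in scored if s > 0]
    scored.sort(key=lambda sp: -sp[0])  # stable sort: ties keep corpus order
    return [p for _, p in scored]


def classify(goal, paper):
    """Placeholder for an LLM call that decides whether a paper matches the goal."""
    return paper in RELEVANT


def adaptive_search(goal, rounds=3, batch_size=5):
    state = SearchState()
    query_terms = set(goal.lower().split())
    for _ in range(rounds):
        batch = retrieve(query_terms, state.seen)[:batch_size]
        if not batch:
            break  # nothing left that matches the adapted query
        for paper in batch:
            state.seen.add(paper)
            if classify(goal, paper):
                state.relevant.append(paper)
                # Adapt: relevant papers contribute new query terms.
                query_terms |= set(paper.split())
    return state


result = adaptive_search("reaction time")
for title in result.relevant:
    print(title)
```

In this toy run, the "psychomotor vigilance" title shares no word with the original goal and is only reached in the second round, after the first round's relevant hits add "slow wave sleep" to the query.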

We want to optimize the workflow for researchers in ML, biotech, medicine, etc., and would love critical feedback and suggestions. One major challenge is getting users to accurately describe their search goal, including everything that is only implicit in their head (instead of keyword phrasing). Another is how to differentiate what's happening behind the scenes, and manage expectations on timing (it's ~3-6 minutes). Also, of course, how to optimally present the results.

This is great! But it's also the third platform I've seen in the research space in the last few weeks. I still haven't even gotten around to trying Elicit.com.

You say "try it now", but then link to a sign-up, and you don't have any social sign-in, so I can't just click a button and go.

If you look at elicit.com, look at their branding, the quality of their design, then look at your competing site. You need to up your game to get trust.

I'm assuming the reason you don't want to just have an open search is due to the cost of running searches, but what's the cost of nobody using it? How can you provide examples at least that showcase what you can do?

WRT the name of your product, the first thing that came to mind is "undermine", which is not a positive connection to research.

I hope you can take this as constructive feedback. Like I said, I haven't even tried Elicit yet, and I can't remember what the other competitor in this space was.

But also, here's a bonus. Emmett Shear just posted on Twitter looking for quality research on reaction time. I know of at least one paper on slow-wave enhancement for deep sleep (CLAS, PTAS) and a secondary finding was on slow-wave sleep. I said I'd get back to him with the link, but maybe you can do even better and show us what your product can do. What's the best research into reaction time? Is there something other than Clare Anderson's paper on slow-wave sleep and reaction time?

Here are the results for Emmett's request on research using interventions to improve reaction time: https://www.undermind.ai/query_app/display_one_search/e1bcab...

These results look really useful at first glance! After seeing them, I will make an account to test this with my own questions. (The homepage alone didn't convince me to invest that much time.)

Maybe the competitor you saw was https://www.lumina-chat.com/ ?

Here are some others focused at least partly on search:
Elicit: https://elicit.com/
Lumina-chat: https://www.lumina-chat.com/
Consensus: https://consensus.app/
OpenEvidence (medical focused): https://www.openevidence.com/
SciSpace: https://typeset.io/
Epsilon-ai: https://www.epsilon-ai.com/
scholarAI: https://scholarai.io/

Consensus app was the other one. Thanks! Wow, there are a lot.

These are the best results that I've gotten from an AI research assistant.

I really don't mind the long latency. In fact, I think it's a fundamentally better way of interacting with this kind of LLM-based tool.

The latency is necessary for the LLM to actually interact with the content, rather than just doing a Bing- or Perplexity-style RAG+summarization workflow that delivers very uneven results.

I also really like the use of longer prompts, as it encourages a full description of your topic, rather than keyword fiddling trying to make the RAG system pick up the right signifiers.

The "Discovery Progress and Exhaustiveness" section is a bit confusing as a user. Like, ok we have 23.6% of the relevant papers? Why not 100%? What am I supposed to do with that information? Can you give me any information about the missing papers?

Overall, very nice work, I'll be using this in the future.

On the discovery progress, we can't look at all 200M papers with the LLM, so we prioritize some of them (the first 100 most promising) for deep analysis. Within those, we find a few that are relevant. But the rate at which we discover these tells us roughly what will happen if we read the next 100 (if we're discovering new relevant papers all the time, we will likely continue). We need a better explanation on the website, but we can statistically model this to quantitatively predict how many papers we would find if we exhaustively searched the whole database.

More info here: https://www.undermind.ai/static/Undermind_whitepaper.pdf
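As a rough illustration of the idea above (the discovery rate so far predicts what further reading would turn up), here is a minimal sketch. It is not the model from the whitepaper. It assumes, purely for illustration, that the number of relevant papers found per equally sized batch decays geometrically, and the function name and inputs are hypothetical.

```python
def estimate_discovery(hits_per_batch):
    """Estimate total relevant papers and the fraction found so far.

    hits_per_batch: relevant papers found in each successive, equally
    sized batch of papers read (most promising batch first).
    Assumption: hits decay geometrically from one batch to the next.
    Returns (estimated_total, fraction_found), or None when the counts
    show no sign of saturating and no finite estimate exists.
    """
    if len(hits_per_batch) < 2 or sum(hits_per_batch[:-1]) == 0:
        return None  # not enough signal to estimate a decay ratio
    found = sum(hits_per_batch)
    # For a geometric sequence a, a*r, a*r^2, ... this ratio equals r.
    ratio = sum(hits_per_batch[1:]) / sum(hits_per_batch[:-1])
    if ratio >= 1.0:
        return None  # still discovering at a steady or growing rate
    # Expected hits in all unread batches: last * (r + r^2 + ...) = last * r / (1 - r)
    remaining = hits_per_batch[-1] * ratio / (1.0 - ratio)
    total = found + remaining
    return total, found / total


print(estimate_discovery([8, 4, 2]))  # (16.0, 0.875)
```

With 8, 4, and 2 relevant papers in three successive batches, the sketch estimates 16 relevant papers in total, of which 14 (87.5%) have been found. A ratio at or above 1 means new relevant papers are still turning up at a steady rate, so the search should continue.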

That makes sense, I figured it was related to some of the statistics work in ecology estimating species count from a limited sample.

Hmm, does this just use the traditional term frequency search bars (Scholar, arXiv, etc.) under the hood with query expansion for the preliminary search?

Without chunking the papers I'm skeptical the prelim search would be all that useful.

Also, using GPT-4 as a cross-encoder seems really wasteful, both in terms of compute and latency.

Might try it anyway, but damn, 3-6 minutes is brutal. Traditionally, research has shown that low latency is more important than relevance for search because it allows users to reformulate queries.

Maybe this approach is worth it though.

While the time/cost of using GPT-4 is not ideal, GPT-4-level classification is absolutely crucial for the entire adaptation process to succeed. With 3.5 guiding the adaptation, we find that errors quickly accumulate. It can't identify complex ideas correctly.

3-6 minutes for results takes getting used to, but we've found most people don't complain if it solves a problem that is actually impossible to solve without hours of digging, i.e., if you use it on something truly hard. Low latency is more crucial for public search engines like Google (0.5s delay -> 20% traffic loss) where there are convenient, fast alternatives.

Preliminary search is a blend of semantic embeddings on 100M+ papers and keyword search, citation links, etc. Reasonably accurate, but full of noise for complex queries.
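To make "a blend of semantic embeddings and keyword search" concrete, here is a minimal sketch of one common way to combine the two signals: a weighted sum of cosine similarity and keyword overlap. The weighting, the tiny 2-d "embeddings", and the function names are assumptions for illustration only; the real preliminary search also uses citation links, which this sketch leaves out.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_score(query, text):
    """Fraction of query words that appear in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def hybrid_rank(query, query_vec, docs, alpha=0.5):
    """Rank (text, embedding) pairs by a blended score.

    alpha weights the embedding similarity; (1 - alpha) weights the
    keyword overlap. Both the blend and alpha are illustrative choices.
    """
    scored = [
        (alpha * cosine(query_vec, vec) + (1 - alpha) * keyword_score(query, text), text)
        for text, vec in docs
    ]
    scored.sort(key=lambda st: st[0], reverse=True)
    return [text for _, text in scored]


# Hypothetical documents with made-up 2-d embeddings.
docs = [
    ("protein inhibitors in cancer", [0.0, 1.0]),
    ("sleep and reaction time", [1.0, 0.0]),
]
ranking = hybrid_rank("reaction time", [1.0, 0.0], docs)
print(ranking[0])
```

A ranking like this is cheap to run over a large corpus, which is why it makes sense as a noisy first pass before a slower, more accurate classifier.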

Hmm, I assume the pun on "undermined" is intentional, though since that has somewhat negative connotations I am not sure it's entirely a good idea...

I just tested the platform out. I am really impressed by the results of a search for papers regarding protein inhibitors for the treatment of cancer. I was actually able to find at least 4 new informative papers that gave good insights into my research.

I am impressed because I have spent roughly 3 months researching this topic on Science Direct and PubMed and I did not expect your engine to turn up anything new. To this point, in less than six minutes, your search engine was able to give me more relevant search results than probably a week of searching Science Direct.

Great work!

(Oh yeah, I know people are commenting about the interface, but I actually like the clean look of the search results. It is the home page that leaves something to be desired.)
