Show HN: Deep search of all ML papers (undermind.ai)
109 points by tomhartke 10 months ago | 25 comments
Built an automated system that runs a deep search of arXiv and carefully finds precisely the papers that exist on a complex topic.

It's different from simple RAG because it searches, classifies, and adapts based on the relevant papers it uncovers, then continues until it finds every paper on a topic (trying to mimic the human research process). Benchmarked at 10x higher accuracy and total retrieval than Google Scholar for a median search (whitepaper on the website). It also knows when it's complete, and misses virtually nothing (under ~3% of relevant papers once it has converged).
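
Rough toy sketch of that loop, for anyone curious (illustrative only; the corpus, keywords, and helper functions here are placeholders, not the real pipeline):

    # Toy sketch of a search -> classify -> adapt -> repeat-until-converged loop.
    # Everything below is made-up stand-in data, not the actual implementation.
    from typing import List, Set

    # stand-in corpus: paper id -> keywords (the real system works over arXiv)
    CORPUS = {
        "paper_a": {"tokenization-free", "byte-level language models"},
        "paper_b": {"tokenization-free", "character-level transformers"},
        "paper_c": {"image segmentation"},
        "paper_d": {"byte-level language models", "compute/accuracy tradeoffs"},
    }

    def keyword_search(query: str) -> List[str]:
        """Stand-in for a single arXiv search call."""
        return [pid for pid, kws in CORPUS.items() if query in kws]

    def classify_relevance(paper_id: str, topic_keywords: Set[str]) -> bool:
        """Stand-in for the LLM judgment: is this paper on-topic?"""
        return bool(CORPUS[paper_id] & topic_keywords)

    def expand_queries(new_papers: Set[str]) -> Set[str]:
        """Use what newly found relevant papers mention to decide where to look next."""
        return {kw for pid in new_papers for kw in CORPUS[pid]}

    def deep_search(topic_keywords: Set[str], seed_query: str,
                    max_rounds: int = 10) -> Set[str]:
        relevant: Set[str] = set()
        queries: Set[str] = {seed_query}
        for _ in range(max_rounds):
            hits = {pid for q in queries for pid in keyword_search(q)}
            new = {pid for pid in hits - relevant
                   if classify_relevance(pid, topic_keywords)}
            if not new:                      # a full pass found nothing new -> converged
                break
            relevant |= new
            queries |= expand_queries(new)   # adapt: search where the new papers point
        return relevant

    # finds paper_d even though the seed query alone would have missed it
    print(deep_search({"tokenization-free", "byte-level language models"},
                      seed_query="tokenization-free"))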

Website has a free trial and a bunch of example search reports. Want feedback and suggestions.




Here's an example report on tokenization-free large language model architectures (which have been shown to achieve compute/accuracy tradeoffs comparable to or better than traditional token-based models): https://app.undermind.ai/query_app/display_one_search/05f0b8...


"Our AI agent finds precisely what you ask for, 10-50x better than Google Scholar"

I was curious how this was measured since benchmarking accuracy for LLMs is tough. Found this in the paper: "This classification accuracy was benchmarked by manually analyzing over 400 papers across a range of representative searches, and comparing the human evaluation to the language model’s judgment"

I'm skeptical that their dataset of 400 papers with 3 classification labels (highly relevant, closely related, or ignorable) is large enough to represent the diversity of queries they're going to get from users. To be clear, I don't think this undermines (haha) the value of what they've built; it's still very cool.
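
(For reference, the comparison they describe presumably amounts to per-paper agreement between human and model labels, roughly like this toy sketch; this is my guess, not their actual evaluation code:)

    # Toy sketch of how human-vs-LLM label agreement over the ~400 annotated
    # papers might be computed. The annotations below are invented examples.
    from collections import Counter

    LABELS = ("highly relevant", "closely related", "ignorable")

    # made-up example annotations, one pair per paper
    human = ["highly relevant", "ignorable", "closely related", "ignorable"]
    model = ["highly relevant", "ignorable", "ignorable", "ignorable"]
    assert set(human) | set(model) <= set(LABELS)

    accuracy = sum(h == m for h, m in zip(human, model)) / len(human)
    confusion = Counter(zip(human, model))  # (human label, model label) -> count

    print(f"agreement: {accuracy:.0%}")
    for (h, m), n in sorted(confusion.items()):
        print(f"human={h!r:20} model={m!r:20} n={n}")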


It is certainly interesting, and I would love to try it for my hobbyist use-cases. I don’t do much research at work, but a fair bit on the weekends.

Are you filtering users, though? I can't sign up in a personal capacity with a Gmail address. The page raises this error: “Please use a valid institutional or company email address.”


I would change the main CTA to "Try it now" and then use a different style for "Read the stats". It currently looks like there are two equally important CTAs.

If you can find a way to make the results closer to real-time, this will be a really popular product.


Appreciate the advice. Re: timing, it's bottlenecked by the sequential nature of the search. To be comprehensive, we discover a few papers and use that info to choose where to look more closely next.


How does this compare to a system like Elicit? Seems to be very similar at first glance.


How does it compare to ArXiv Sanity Preserver?


FYI, the system is a bit delayed because of traffic levels. It may take a bit longer than usual to generate results at the moment (it usually takes ~10 min).


I tested this out on a topic I'd been discussing with some fellow researchers, and it pulled in the papers we'd chatted about plus a bunch of related ones that look very relevant and interesting. Congrats on a cool project!


It's a shame the research publishing industry is a bunch of walled gardens.

Because of that, this only supports arXiv, and not paper repositories from other fields.


Maybe the model will get smart enough to go to SciHub and Libgen? If IP holders and distributors come after me with evidence, I'll just pull out my belt and tell them I gotta go teach some naughty GPUs another lesson.


What is the pricing?


I don’t want to be negative, but you asked for feedback, so I’ll give you a few impressions, including some superficial and quite subjective ones, fwiw:

1. The hyperbolic claims are going to be off-putting to some. You’ve “solved” ML search? 50x better than Google Scholar on a metric no one’s been benchmarking against? Consider your audience and what they would find credible.

2. The UX needs work. To give one aesthetic example, in the results there are large, brightly colored red and green circles that are used inconsistently and clash with the palette. This stuff can affect how sticky your service is.

3. Don’t restrict signup by email domain. This is nuts. Never add friction to gaining customer relationships. If you’re capacity-constrained, limit the trial. If you’re trying to segment the market, there are better ways.

4. The name “Undermind” is not working to my ear. It’s worth changing. At least find a product person whose opinion you respect and ask for their take.

5. I think a lot of people here would be willing to give you useful technical feedback on the architecture and approach if more information were shared about how the service works, but I didn’t see that anywhere.


This is very effective criticism.

If OP wants to grow their project, those four points have to be the first things to target.


undermine: "lessen the effectiveness, power, or ability of, especially gradually or insidiously"

hm.


There's something disagreeable about charging a subscription to search freely-available scientific papers.

Yeah, I get that you're technically paying for the "advanced" search, but it still leaves a bad taste in the mouth because this service's entire existence depends on openly available knowledge.

P.S. hiding pricing behind registration isn't cool


I think full-text search over long documents is already fairly expensive server-side. Users are paying specifically for that, not for a better UI or the simple direct-match search available in free-to-use projects. I'd say it's very reasonable to charge for the costs incurred here.


IMO if you're going to profit off open research, you should at least make your own work available for other researchers. The white paper has 10 pages of performance benchmarks but 5 sentences on methodology.


Immediately bailed once it required I provide my email address.


Nice but way too pricey.

https://chat.openai.com/g/g-dGz4aw9iA-research-refiner - the free version (just a ChatGPT Plus subscription needed)


The goal is to be systematic and handle complex topics. ChatGPT + keyword search can't handle complex topics at all, and isn't systematic either.


It's still early days. Many folks think that GPT-4 + simple RAG works well. It doesn't. Building a good tool like this is hard.


Free, just needs a subscription?


I wonder how many such services ChatGPT already undercuts?


Awesome, I had no idea this existed. Very useful, thanks!



