
This looks awesome, and really useful.

A few weeks ago I asked on Hacker News, "I'm in the middle of a graduate degree and am reading lots of papers. How could I get ChatGPT to use my whole library as context when answering questions?"

And I was told, basically, "It's really easy! First you just extract all of the text from the PDFs into arxiv, parse it to separate content from style, then store that in a DuckDB database with zstd compression, then just use some encoder model to process all of these texts into a Qdrant database. Then use Vicuna or Guanaco 30b GPTQ, with langchain, and....."

I was like, ok... guess I won't be asking ChatGPT where I can find which paper talked about which thing after all.
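For anyone in the same position: one common first step in a pipeline like the one described above, before any encoder model sees the text, is splitting each paper's extracted text into overlapping chunks so each piece fits a model's input and a search can point to a specific passage. The sketch below assumes plain extracted text and a hypothetical helper `chunk_words` that counts words rather than model tokens; real tools often split on tokens, sentences, or sections instead.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words
    with the previous window. Hypothetical helper, word-based for simplicity."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once a window reaches the end, so no tiny trailing chunk is emitted.
        if start + size >= len(words):
            break
    return chunks
```

For example, `chunk_words("0 1 2 3 4 5 6 7 8 9", size=4, overlap=1)` returns `["0 1 2 3", "3 4 5 6", "6 7 8 9"]`. The overlap keeps a sentence that straddles a boundary visible in at least one chunk.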




I don't know why you need the "ask ChatGPT" piece. Why not just do semantic search on the documents?

What is the value-add of generative output?


I think the value is "Hey, I remember a paper talking about topic X with sentiment Y; it also mentioned data from <vague source>. Which paper was that?"

If you're dealing with hundreds of papers, a front end that can handle vague queries would be a huge benefit.


You could just write "X topic with Y sentiment similar to foo/<vague-source>" into a search bar.

Then plain old vector distance on your data would find the relevant chunks. No need for generative AI.

Citation to show this works: chat.arguflow.ai / search.arguflow.ai
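A minimal sketch of the "plain old vector distance" idea: embed the query and every chunk, score each chunk by cosine similarity to the query, and return the top k. The `embed` function here is a toy stand-in (a bag-of-words count vector) for a real encoder model, and the in-memory list stands in for a vector database such as Qdrant; both substitutions are assumptions for illustration, and `embed`, `cosine`, and `search` are hypothetical names.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for an encoder model: lowercase word counts.
    # A real setup would call an embedding model and get a dense float vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: str, chunks: list[str], k: int = 3) -> list[tuple[float, str]]:
    # Score every chunk against the query and keep the k most similar.
    q = embed(query)
    scored = [(cosine(q, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

For example, searching `["transformers scale with data", "protein folding with deep learning", "survey of data scaling laws"]` for `"scaling laws for data"` ranks "survey of data scaling laws" first, because it shares the most words with the query. A real encoder would also match paraphrases that share no words, which is what makes vague queries workable.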


The simple answer is because I don't know how to do a semantic search on a bunch of documents, nor do "plain old vector distances." It's not my field.

The longer answer is that I think it will also be useful to have the background knowledge (however prone to hallucination) that ChatGPT has. I'd like to be able to have a conversation specifically grounded in the papers, and if I ask about a topic that the articles don't mention in those exact words, I'd like it to be able to say "none of the articles talk about X, but they do mention Y, which is related in this way." I'm not sure if this is expecting too much of it.
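The usual way to get that behavior is retrieval-augmented generation: retrieve the most relevant passages first, then hand them to the model with instructions to answer from them and to say when they don't cover the question. A minimal sketch of the prompt-assembly step follows, assuming the passages were already retrieved as `(source, text)` pairs; `build_grounded_prompt` is a hypothetical helper, the instruction wording is illustrative, and the resulting string would be sent to whatever chat model you use (no model API is called here).

```python
def build_grounded_prompt(question: str, passages: list[tuple[str, str]]) -> str:
    """Assemble a prompt asking the model to answer only from the numbered excerpts.
    Hypothetical helper; `passages` are (source, text) pairs from a prior search."""
    lines = [
        "Answer the question using only the excerpts below.",
        "Cite excerpts by their bracketed number.",
        "If none of the excerpts address the question, say so, "
        "and mention any related topic they do cover.",
        "",
    ]
    # Number each excerpt so the model can cite it and you can trace it back to a paper.
    for i, (source, text) in enumerate(passages, start=1):
        lines.append(f"[{i}] ({source}) {text}")
    lines.append("")
    lines.append(f"Question: {question}")
    return "\n".join(lines)
```

Numbering the excerpts and keeping the source filename next to each one is what lets an answer point back to "which paper talked about which thing." Whether the model actually refrains from drawing on outside knowledge still depends on the model.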


https://github.com/whitead/paper-qa

>This is a minimal package for doing question and answering from PDFs or text files (which can be raw HTML). It strives to give very good answers, with no hallucinations, by grounding responses with in-text citations.


We built https://github.com/trypromptly/LLMStack to serve exactly this persona. A low-code platform to quickly build RAG pipelines and other LLM applications.
