Hacker News
Understanding What Matters for LLM Ingestion and Preprocessing (unstructured.io)
60 points by mooreds 8 months ago | 1 comment



Does anyone have non-toy, fully open source examples (both data and code) of open-weights LLMs fine-tuned on non-instruct-style data where the resulting instruct-style queries actually work? Are there any fully open source examples of working RAG that are, to the end user, obviously superior to the most sophisticated open source full-text search, or even to Google indexing?

In creative art, there is a thriving use for fine-tuning. Much of it is reproducible, and there are specific guides with specific results. But where is the guide for the “corporate knowledge base” case? I get the feeling that “it,” meaning fine-tuning an LLM or using RAG, is inferior to sophisticated open source full-text search, but that many people are invested in pretending otherwise because of all the dollar signs.



