vissidarte_choi's comments

Modern RAG system:

1. Quality-guaranteed data extraction.

2. End-to-end workflow based on a graph.

3. Hybrid search for retrieval.

4. Reflection mechanism based on the retrieval results.


a penetrating analysis


There are numerous strategies and methods available to enhance RAG performance, particularly when it comes to improving performance in parsing vast amounts of unstructured data. Additionally, various scenarios call for different parsing techniques. I would suggest exploring a RAG project that excels in document parsing: https://github.com/infiniflow/ragflow


This is because PDF comes in so many different versions that no single third-party tool such as pdfplumber fits them all. For example, pdfplumber raises exceptions on some PDFs, and fitz sometimes works on files that pdfplumber cannot handle well. It looks a bit complicated, but RAGFlow uses multiple parsing tools to handle different types of PDFs.
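To illustrate the idea of trying more than one parser, here is a minimal sketch of a fallback chain. It is not RAGFlow's actual code: the function names `parse_with_pdfplumber`, `parse_with_fitz`, and `extract_text` are hypothetical, and the order of the parsers is an assumption. The two libraries are imported lazily, so each one is only needed if its parser is actually called.

```python
from typing import Callable, Sequence


def parse_with_pdfplumber(path: str) -> str:
    """Extract text with pdfplumber (hypothetical helper for this sketch)."""
    import pdfplumber  # imported lazily so the sketch loads without it

    with pdfplumber.open(path) as pdf:
        # extract_text() may return None for pages without a text layer
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def parse_with_fitz(path: str) -> str:
    """Extract text with fitz, i.e. PyMuPDF (hypothetical helper)."""
    import fitz  # imported lazily so the sketch loads without it

    with fitz.open(path) as doc:
        return "\n".join(page.get_text() for page in doc)


def extract_text(
    path: str,
    parsers: Sequence[Callable[[str], str]] = (parse_with_pdfplumber, parse_with_fitz),
) -> str:
    """Try each parser in order and return the first non-empty result."""
    errors = []
    for parser in parsers:
        try:
            text = parser(path)
        except Exception as exc:  # a parser may raise on PDFs it cannot handle
            errors.append(f"{parser.__name__}: {exc!r}")
            continue
        if text.strip():
            return text
        errors.append(f"{parser.__name__}: no text extracted")
    raise RuntimeError("all parsers failed: " + "; ".join(errors))
```

Calling `extract_text("report.pdf")` would try pdfplumber first and fall back to fitz only if pdfplumber raises or returns nothing. A real pipeline would probably also need OCR for scanned PDFs, which this sketch does not cover.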


To be honest, RAGFlow already supports this, but we have not documented the local deployment process yet because we are still working on simplifying it. We will release this feature soon. Please stay tuned!


RAGFlow will support more LLMs, including locally deployed LLMs.


RAGFlow uses YOLOv8 for its OCR, layout recognition, and TSR (table structure recognition). RAGFlow also uses a large amount of private data to train these models so that they perform well in some specialized scenarios.


I'm not quite sure what you mean. Could you be more specific? RAGFlow does not have its own LLM or the source code for one. RAGFlow supports API calls to third-party large language model providers, as well as local deployment of these large models. RAGFlow has already open-sourced the code for both parts.



Hi bschmidt1, this is a good feature, and we do plan to support it soon. Please stay tuned. If you have further suggestions, you are welcome to file an issue with us.



Popularity can be translated to cash flows.


And marketing campaigns can even be saved.

