Towards accurate and efficient document analytics with large language models

demilich · 2024-05-14T04:04:20

RAGFlow (github.com/infiniflow/ragflow) uses OCR, layout recognition, and TSR (table structure recognition) to understand document structure and context. Is there any difference between RAGFlow and ZenDB?

yingfeng · 2024-05-14T04:24:27

I read the paper and there are some similarities between ZenDB and RAGFlow, but also many differences.

The goal of RAGFlow is to use computer vision models to recognize the structure of a document, including diagrams and tables, and then slice these structures into appropriate formats. For example, table contents are combined with the table definitions and rendered as text. The result is sent to the RAG system for retrieval and question answering.
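To make the table case concrete, here is a minimal sketch of combining table contents with the table definitions (caption and column headers) so that each row becomes a self-describing text chunk. This is an illustration only, not RAGFlow's actual code; `table_to_text` and the sample data are hypothetical.

```python
def table_to_text(caption, headers, rows):
    """Render each row as standalone text that carries its own column names.

    Hypothetical helper for illustration; not part of RAGFlow.
    """
    chunks = []
    for row in rows:
        # Pair every cell with its header so the chunk makes sense in isolation.
        pairs = "; ".join(f"{h}: {v}" for h, v in zip(headers, row))
        chunks.append(f"{caption} | {pairs}")
    return chunks


# Made-up sample table.
chunks = table_to_text(
    "Quarterly revenue",
    ["Quarter", "Revenue (USD M)"],
    [["Q1", "12.5"], ["Q2", "14.1"]],
)
for chunk in chunks:
    print(chunk)
```

Each chunk can then be embedded and indexed on its own, since it no longer depends on the rest of the table to be understood.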

ZenDB also uses computer vision models to understand documents, but mainly to recover their semantic structure, such as headings and phrases, which also involves semantic-based text clustering. ZenDB also defines a query language specifically for querying these semantics. It is quite useful for querying and summarizing long text.

I think some combination of RAGFlow and ZenDB for processing unstructured document data could be interesting to work on.

ajcp · 2024-05-13T23:16:24

This is an ad for ZenDB.

EDIT: Having reread the paper in full I don't hold this view anymore. Leaving up for reply posterity.

sprobertson · 2024-05-14T00:16:00

What? This is a paper describing a system/technique that they happen to call ZenDB.

ajcp · 2024-05-14T01:00:25

Apologies; having read the full paper you are correct.

This paper is describing a technique to query Semantic Hierarchical Trees (SHTs) constructed from documents. I would say that the data itself *is* structured, it just exists in an unstructured medium, but now I'm just arguing...semantics.
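For anyone wondering what a tree like that might look like, here is a toy sketch. It assumes headings have already been detected and assigned a nesting level, which is the hard part the paper actually addresses, so this is not the paper's algorithm. `Node`, `build_tree`, and `find_path` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    title: str
    level: int
    text: list[str] = field(default_factory=list)
    children: list["Node"] = field(default_factory=list)


def build_tree(items):
    """Build a heading tree from (level, content) pairs.

    Assumption: level 0 means body text, level >= 1 means a heading
    at that nesting depth.
    """
    root = Node("ROOT", 0)
    stack = [root]  # path from the root to the most recent heading
    for level, content in items:
        if level == 0:
            stack[-1].text.append(content)  # attach text to the nearest heading
            continue
        # Pop back up to the closest ancestor with a shallower level.
        while len(stack) > 1 and stack[-1].level >= level:
            stack.pop()
        node = Node(content, level)
        stack[-1].children.append(node)
        stack.append(node)
    return root


def find_path(node, title, path=()):
    """Depth-first search returning the chain of headings down to `title`."""
    path = path + (node.title,)
    if node.title == title:
        return path
    for child in node.children:
        found = find_path(child, title, path)
        if found:
            return found
    return None


# Made-up document outline.
items = [
    (1, "Introduction"),
    (0, "We study document analytics."),
    (1, "Method"),
    (2, "Tree construction"),
    (0, "Headings become nodes."),
    (2, "Query execution"),
    (1, "Results"),
]
tree = build_tree(items)
print(find_path(tree, "Query execution"))
```

A query can then be scoped to one subtree (say, everything under "Method") instead of the whole document, which is roughly the appeal of treating the headings as structure.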

That being said I suspect they didn't think ShtDB would catch on very well, and so went with ZenDB. Shame really.

PaulHoule · 2024-05-14T00:38:45

The authors are all academics…