Your finding #2 is the most important — hallucinations are a retrieval problem, not a generation problem. We hit the same wall building legal document retrieval at Brainfish. Swapping embedding models gave incremental gains, but the real jump came from preserving document hierarchy (articles, sections, clauses) during retrieval instead of flattening everything into chunks.
Curious: did you evaluate any structure-aware retrieval approaches on this benchmark, or was it purely embedding-based?
Hi HN, I built ReasonDB (open source) after spending three years building a knowledge intelligence layer at Brainfish (my company). We stitched together vector DBs, graph DBs, and custom RAG pipelines, and kept hitting the same problem: when search returns the wrong documents, your AI gives wrong answers confidently. Debugging why embeddings didn't surface the right chunk is a black box; we'd fix one case and break three others.
ReasonDB takes a different approach. Instead of shredding documents into flat chunks and hoping cosine similarity finds the answer, it preserves document hierarchy as a tree and lets the LLM navigate through it – like a human scanning a table of contents and drilling into the right section.
How it works:
- Documents are ingested as hierarchical trees (headings, sections, subsections) with LLM-generated summaries at each node
- When you query, a 4-phase pipeline kicks in: BM25 narrows candidates → tree-grep filters by structure → LLM ranks by summary → parallel beam-search traversal extracts answers
- The LLM visits ~25 nodes out of millions instead of searching a flat vector space
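The traversal step above can be sketched in miniature. This is my own illustrative Python, not ReasonDB's Rust internals: a beam search over a summary-annotated document tree, with a naive term-overlap score standing in for the actual LLM summary-ranking call.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    summary: str                 # LLM-generated summary of this subtree
    text: str = ""               # leaf content (empty for internal nodes)
    children: list = field(default_factory=list)

def score(query, summary):
    # Stand-in for the LLM relevance judgment: crude term overlap.
    return len(set(query.lower().split()) & set(summary.lower().split()))

def beam_search(root, query, beam_width=2, max_visits=25):
    """Expand only the best-scoring nodes at each level, so a handful
    of nodes are visited instead of every chunk in the corpus."""
    frontier, visited, leaves = [root], 0, []
    while frontier and visited < max_visits:
        frontier.sort(key=lambda n: score(query, n.summary), reverse=True)
        next_frontier = []
        for node in frontier[:beam_width]:
            visited += 1
            if node.children:
                next_frontier.extend(node.children)
            else:
                leaves.append(node)        # candidate answer node
        frontier = next_frontier
    return leaves

doc = Node("contract", children=[
    Node("payment terms", "Invoices are due within 30 days."),
    Node("termination", children=[
        Node("termination for cause", "Either party may terminate on material breach."),
        Node("termination notice", "60 days written notice is required."),
    ]),
])

answers = beam_search(doc, "termination notice conditions", beam_width=1)
print([n.text for n in answers])  # → ['60 days written notice is required.']
```

The point of the design: cost scales with tree depth times beam width, not with corpus size, which is why a few dozen node visits can cover millions of nodes.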
It also has RQL, a SQL-like query language with SEARCH (BM25) and REASON (LLM) clauses:
SELECT * FROM contracts REASON 'What are the termination conditions?'
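Since the example only demonstrates REASON, here is a hypothetical query combining both clauses; the exact composition syntax is my assumption, not documented ReasonDB behavior:

```sql
-- Assumed composition: SEARCH (BM25) narrows candidates,
-- then REASON (LLM) answers over the surviving subtrees.
SELECT * FROM contracts
SEARCH 'termination'
REASON 'What notice period applies to termination for convenience?'
```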
Built in Rust (redb, tantivy, axum, tokio). Single binary. ACID-compliant. Supports OpenAI, Anthropic, Gemini, Cohere, and open-source models. Runs with one Docker command.