MiRAGE: Open-source framework for multimodal RAG evaluation
1 point by mmhetric 25 days ago
Code: https://github.com/ChandanKSahu/MiRAGE

Hi HN, we are the authors of MiRAGE.

We built this because standard RAG benchmarks (like Natural Questions) rely on text-only Wikipedia-like data, which doesn't reflect the reality of enterprise RAG. In the real world, "truth" is often locked in a chart, a complex table, or a diagram deep inside a PDF.

MiRAGE is an open-source framework that uses a swarm of specialized agents to reverse-engineer evaluation datasets from your own documents.

How it works:

1. Ingest: It uses vision models to describe charts/tables and "semantically chunk" the PDF.

2. Generate: An agent swarm (Generator, Retriever, Persona-Injector) creates multi-hop questions.

3. Verify: An adversarial "Verifier Agent" fact-checks the answers against the source to prevent hallucinated ground truth.
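The generate-then-verify loop above can be sketched in plain Python. This is an illustrative sketch only, not MiRAGE's actual API: the agent functions are stubs standing in for LLM calls, and all names (generator_agent, verifier_agent, build_dataset) are assumptions.

```python
# Stub "agents" standing in for LLM calls (hypothetical; not MiRAGE's API).
def generator_agent(chunk):
    # Propose a QA pair grounded in a document chunk.
    return {"question": f"What does the document say about {chunk['topic']}?",
            "answer": chunk["fact"],
            "source": chunk["id"]}

def verifier_agent(qa, chunk):
    # Adversarial check: accept only if the answer is supported by the
    # source text. A real verifier would be an LLM judge, not substring match.
    return qa["answer"] in chunk["text"]

def build_dataset(chunks, max_retries=2):
    # Keep only QA pairs the verifier confirms; retry rejected generations.
    dataset = []
    for chunk in chunks:
        for _ in range(max_retries + 1):
            qa = generator_agent(chunk)
            if verifier_agent(qa, chunk):
                dataset.append(qa)
                break
    return dataset

chunks = [
    {"id": 1, "topic": "revenue", "fact": "revenue rose 12%",
     "text": "The chart shows revenue rose 12% year over year."},
    {"id": 2, "topic": "margin", "fact": "margin fell",
     "text": "Operating costs grew faster than sales."},  # fact unsupported
]
print(build_dataset(chunks))  # only the supported pair survives
```

The second chunk's candidate answer is never confirmed by the source, so it is dropped rather than becoming hallucinated ground truth, which is the point of the verifier stage.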

Key Finding: In our ablation studies, removing the adversarial verifier dropped the faithfulness of the generated dataset from 97% to 74%. Synthetic data needs self-verification.
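One simplified reading of that faithfulness number (an assumption on our part, not a quote of the paper's exact metric) is the fraction of generated QA pairs whose answers an independent check confirms against the source:

```python
def faithfulness(verdicts):
    # Fraction of generated QA pairs confirmed against their source documents.
    return sum(verdicts) / len(verdicts)

# Illustrative counts matching the ablation numbers in the post:
with_verifier = [True] * 97 + [False] * 3      # 97/100 confirmed
without_verifier = [True] * 74 + [False] * 26  # 74/100 confirmed
print(faithfulness(with_verifier), faithfulness(without_verifier))
```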

Resources:

- Paper (arXiv): https://arxiv.org/abs/2601.15487

- Install: pip install mirage-benchmark

- Demo: see the terminal video in the repo

We’d like your feedback, especially on the "Visual Grounding" challenge; it’s still the hardest part of multimodal RAG. Happy to answer any questions!


