You could be the first if you were to develop an eval (preferably automated, with an LLM as judge) and compare local deep research with Perplexity's, OpenAI's, and DeepSeek's implementations on high-information questions.
Given a benchmark corpus, the evaluation criteria could be:
- Facts extracted: the number of relevant facts extracted from the corpus
- Interpretations: based on those facts, the % of correct interpretations made
- Correct predictions: based on the above, the % of correct extrapolations / interpolations / predictions made
The ground truth could be a JSON file per example.
(If the solution you want to benchmark uses a graph DB, you could compare these aspects with an LLM as judge.)
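To make this concrete, here's a minimal sketch of what the judge step could look like. It assumes the per-example ground-truth JSON has `facts`, `interpretations`, and `predictions` keys, and uses a placeholder judge model name and prompt; all of that is my own guess at a schema, not a fixed design.

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set; any capable judge model works

JUDGE_PROMPT = """You are grading a research report against ground truth.
Ground-truth facts: {facts}
Ground-truth interpretations: {interpretations}
Ground-truth predictions: {predictions}

Report:
{report}

Return JSON with three fields:
- facts_found: ground-truth facts the report states
- correct_interpretations: ground-truth interpretations the report makes
- correct_predictions: ground-truth predictions the report makes
"""

def judge_report(report_text: str, ground_truth_path: str) -> dict:
    """Score one generated report against its per-example ground-truth JSON."""
    with open(ground_truth_path) as f:
        gt = json.load(f)  # assumed keys: facts, interpretations, predictions

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder judge model
        response_format={"type": "json_object"},
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            facts=gt["facts"],
            interpretations=gt["interpretations"],
            predictions=gt["predictions"],
            report=report_text,
        )}],
    )
    result = json.loads(response.choices[0].message.content)

    return {
        "fact_recall": len(result["facts_found"]) / len(gt["facts"]),
        "interpretation_accuracy": len(result["correct_interpretations"]) / len(gt["interpretations"]),
        "prediction_accuracy": len(result["correct_predictions"]) / len(gt["predictions"]),
    }
```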
---
The actual writing is more a matter of formal/business/academic style, which I find less relevant for a benchmark.
However, I would find it crucial to run a "reverse RAG" pass over the generated report to ensure each claim has a source. [0]
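A rough sketch of what I mean by that, again with a placeholder judge model and prompts of my own invention: split the report into atomic claims, then ask whether each claim is supported by the corpus. (In practice you'd retrieve the top-k chunks per claim rather than stuffing the whole corpus into one prompt.)

```python
import json
from openai import OpenAI

client = OpenAI()

def verify_claims(report_text: str, corpus_chunks: list[str]) -> list[dict]:
    """'Reverse RAG': extract the report's claims, then look for a supporting source for each."""
    # 1. Break the report into atomic factual claims.
    claims_resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder
        response_format={"type": "json_object"},
        messages=[{"role": "user", "content":
            'List every factual claim in this report as JSON {"claims": [...]}:\n\n'
            + report_text}],
    )
    claims = json.loads(claims_resp.choices[0].message.content)["claims"]

    # 2. For each claim, ask whether any corpus chunk supports it.
    results = []
    for claim in claims:
        verdict_resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content":
                f"Claim: {claim}\n\nSources:\n" + "\n---\n".join(corpus_chunks)
                + "\n\nAnswer 'supported' or 'unsupported'."}],
        )
        verdict = verdict_resp.choices[0].message.content.strip().lower()
        results.append({"claim": claim, "supported": verdict.startswith("supported")})
    return results
```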