
TastyLamps on Nov 6, 2023 \| on: Chatbots May 'Hallucinate' More Often Than Many Re...

I think that, in addition to all the benchmarks used right now for LLM evaluation (HumanEval and the like), it would be interesting to have a 'hallucination benchmark' built on a summarization-based hallucination dataset.