Claude does have more of a hallucination problem than GPT-4, and a less robust knowledge base.

That said, Claude is much better at critical thinking tasks and prose.

Don't mistake benchmarks for real-world performance across actual use cases. There's a bit of Goodhart's Law going on with LLM evaluation and optimization: once a benchmark becomes the target, it stops being a good measure.