
Claude does have more of a hallucination problem than GPT-4, and a less robust knowledge base.

It's much better at critical thinking tasks and prose.

Don't mistake benchmark scores for real-world performance across actual use cases. There's a bit of Goodhart's Law going on with LLM evaluation and optimization: once a benchmark becomes the target, it stops being a good measure.
