Hacker News

Both Claude 4 Sonnet and Opus fail this one, even with extended thinking enabled, and even with a follow-up request to double-check their answers:

“What is heavier, 20 pounds of lead or 20 feathers?”




Can humans answer this correctly? It is ambiguous.


ChatGPT (whatever fast model they use) passed that after I told it to "read my question again".



