Hacker News
Are You Smarter Than An LLM? (Quiz based on the most popular LLM benchmark) (erenrich.net)
28 points by trott 5 months ago | 9 comments



This was very cool until I realized that a significant fraction of the questions have incorrect answers. Two for two biology related questions were wrong.


That's actually part of the point of showing it. See https://derenrich.medium.com/errors-in-the-mmlu-the-deep-lea...


Obviously it's an expensive process, but I'd really want to know what percentage of these questions are just wrong. Some of the ones they called out are pretty terrible.

But it can be a good honeypot for LLMs that either cheat or are overfit.


I found it really interesting that multiple models got this one correct:

As a result of an accident, Abdul lost sight in his right eye. To judge the distance of vehicles when he is driving, Abdul is able to rely on cues of

    A. I only
    B. II only
    C. III only
    D. I and II only

I don't think this is the models learning the specific dataset. More likely, the best-performing models have learned how to score well on multiple-choice tests, for example by guessing an exclusionary combined answer when unsure (I'd wager this strategy is correct more often than not when all answers seem equally probable based on available knowledge).

Which in turn is a useful reminder that we'd best be wary of turning measurements into targets: we may end up selecting for adaptations that score well on the targeted measurement but aren't more broadly applicable (and might even undermine a model that performs better in general but isn't as good at acing the test format as opposed to the test content).


No, but at least I'm capable of answering "I don't know", or would have, had the option been available.


Or, upon being pressed, you don't normally immediately fold if you know you are correct.


Be nice to know how many questions are in the quiz – unless realizing that it's unending and quitting is a Turing test and we're the real subjects ツ


The first question is: "The risk of abnormality in the child of a mother with untreated phenylketonuria is:"

What has this to do with being smart? Even Forrest Gump can memorize the answer to this question.


clearly I am a doctor, physicist, chemist, lawyer and ethicist with my 60%



