The LMSYS leaderboards are crowdsourced and would be hard to fake, it showing a ...

paxys · 2024-12-31T21:04:50 1735679090

Crowdsourced data is the easiest to fake unless you can somehow ensure that you have a completely unbiased population (which is impossible). There's a reason why certain models do so well on upvote-based leaderboards but rank nowhere on objective tests.

CGamesPlay · 2025-01-01T00:15:06 1735690506

Which ones? Fine-tunes are where I see most of this (I'll just call it "model spam"), but the base models don't seem to exhibit this behavior. I do see some models perform well below what their benchmark scores would suggest, though (the Phi family being the most famous).