paxys 28 days ago | on: Deepseek: The quiet giant leading China’s AI race

Crowdsourced data is the easiest to fake unless you can somehow ensure that you have a completely unbiased population (which is impossible). There's a reason why certain models do so well on upvote-based leaderboards but rank nowhere on objective tests.

CGamesPlay 28 days ago

Which ones? I think fine-tunes are where I see most of this (I'll just call it) "model spam", but the base models don't seem to exhibit this behavior. I do see some models perform way below the curve compared to their benchmark performance, though (Phi family being the most famous).
