
Perplexity score against a corpus such as Wikipedia? Basically, how well the model predicts the next word.
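For anyone unfamiliar with the metric, here is a minimal sketch of the arithmetic, assuming you already have the probability the model assigned to each actual next token. The `perplexity` function and the sample probabilities are made up for illustration and do not come from any particular library or model.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-probability per token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

# Hypothetical probabilities assigned to the correct next token at each step.
confident = [0.9, 0.8, 0.95, 0.85]
unsure = [0.1, 0.2, 0.05, 0.15]

# Lower perplexity means the model was less "surprised" by the text.
print(perplexity(confident))
print(perplexity(unsure))
```

A model that guesses uniformly among N choices at every step scores a perplexity of about N, which is a handy way to read the number.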


This is a good start, but given the breadth of applications this would hardly give us enough to compare, as the goal of these models isn't to simply recite Wikipedia articles. What about language translation? Content summarization? Code generation? Turing test performance?


Both models were trained on Wikipedia, so that's a particularly bad choice. But yes, in practice this is what people tend to do. Take the results with a very large grain of salt, though, as the domain of the prompts you feed it makes a huge difference.



