
One thing I like about this effort is the attempt to factor out the caching of prior answers that comes from having been asked a similar question before. Given the nearly eidetic memoization ability of LLMs, no cognitive benchmark can be meaningful unless the LLM's question history can somehow be voided after each query. I think this is especially true when measuring reasoning, which will surely benefit greatly from caching answers to earlier questions into a working set that enhances associations on future similar questions -- which only looks like reasoning.
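To make the concern concrete, here is a minimal toy sketch (my own illustration, not anything from the benchmark itself): a responder that keys a cache by question *similarity*, so a correct answer to a slightly reworded question can be a cache hit rather than fresh reasoning. The class and threshold are hypothetical; only the stdlib `difflib` matcher is real.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude string similarity in [0, 1] via difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


class MemoizingAnswerer:
    """Toy model of the concern: answers to previously seen, similar
    questions are served from a cache, so success on a reworded
    question need not involve any new reasoning."""

    def __init__(self, threshold: float = 0.8):
        self.cache = {}          # question -> answer
        self.threshold = threshold

    def answer(self, question, solve):
        """Return (answer, was_cache_hit)."""
        # A near-duplicate of an earlier question short-circuits here.
        for seen, ans in self.cache.items():
            if similarity(question, seen) >= self.threshold:
                return ans, True   # looks like reasoning, is recall
        ans = solve(question)      # genuine computation
        self.cache[question] = ans
        return ans, False


m = MemoizingAnswerer()
a1, hit1 = m.answer("What is 2+2?", lambda q: "4")
a2, hit2 = m.answer("What is 2 + 2?", lambda q: "recomputed")
# The reworded question is answered from the cache, never recomputed.
```

A benchmark that can't void this history between queries ends up measuring the cache, not the model.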





This talk touches on similar points; I haven't finished listening to it yet, though: https://youtu.be/s7_NlkBwdj8


