At some point, with limited constraints, the best way to predict the next token is to have a deep understanding of the world that produced those tokens. This is, as I understand it, a key part of the scaling hypothesis. Very interesting to see people testing parts of that hypothesis.