At some point, with limited constraints, the best way to predict the next token ...

At some point, with limited constraints, the best way to predict the next token is to have a deep understanding of the world that produced those tokens. This is, as I understand it, a key part of the scaling hypothesis. Very interesting to see people testing parts of that hypothesis.