The point is that the simplest way to excel at next-token prediction in the way humans consider correct - which is judged by how well people feel the predictor mimics human understanding - is to actually have a world model and the other components of human understanding.
Understanding and compression are the same thing. LLMs are fed a huge chunk of the totality of human knowledge, and optimized to compress it well. They for sure aren't doing it by Huffman-encoding a multidimensional lookup table.
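For what it's worth, the prediction-compression equivalence isn't just a metaphor, it's basic Shannon coding: a token with probability p costs -log2(p) bits under an ideal entropy coder, so a model's log-loss is exactly the size it could compress the text to. A toy sketch (made-up corpus and models, purely to illustrate) - the model that captures more structure predicts the next token better and therefore compresses better:

```python
import math
from collections import Counter, defaultdict

text = "the cat sat on the mat the cat ate the rat"
tokens = text.split()
vocab = set(tokens)

# Unigram model: P(token), ignoring context entirely.
unigram = Counter(tokens)
total = sum(unigram.values())

# Bigram model: P(token | previous token), with add-one smoothing.
bigram = defaultdict(Counter)
for prev, cur in zip(tokens, tokens[1:]):
    bigram[prev][cur] += 1

def unigram_prob(tok):
    return unigram[tok] / total

def bigram_prob(prev, tok):
    counts = bigram[prev]
    return (counts[tok] + 1) / (sum(counts.values()) + len(vocab))

# Shannon: -log2(p) bits per token, so total log-loss == achievable compressed size.
def bits(probs):
    return sum(-math.log2(p) for p in probs)

uni_bits = bits(unigram_prob(t) for t in tokens[1:])
bi_bits = bits(bigram_prob(p, t) for p, t in zip(tokens, tokens[1:]))

print(f"unigram model: {uni_bits:.1f} bits to encode the sequence")
print(f"bigram  model: {bi_bits:.1f} bits to encode the sequence")
# The model with more structure assigns higher probabilities to what
# actually comes next, i.e. it compresses the same text into fewer bits.
```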
> The point is that the simplest way to excel at next-token prediction in the way humans consider correct - which is judged by how well people feel the predictor mimics human understanding - is to actually have a world model and the other components of human understanding.
This is a speculative theory for why a next-token predictor might sound like it knows what it's talking about. Not something we actually know.
> Understanding and compression are the same thing. LLMs are fed a huge chunk of the totality of human knowledge, and optimized to compress it well. They for sure aren't doing it by Huffman-encoding a multidimensional lookup table.