But in this case, why would you prefer an approximation over an answer?

sfink · 2024-03-27T18:08:37 1711562917

I would prefer my car to fly through the air, but that's not what it does.

My point is that LLMs are not magical, they're limited by their architecture and reality. They are not symbolic rule processors, even though they can fake it somewhat convincingly. In order for a symbolic rule processor to produce accurate answers, it must have some form of iteration (or fixed point computation, if you prefer). A finite number of layers imposes a fundamental limit on how far the rules' effects can be propagated, without feeding some state back in and iterating. You can augment or modify an LLM to internally do just that, but then it's a different architecture and most likely no longer trainable in a massively parallel fashion. Asking for a chain of thought gives a weak form of iteration restricted to passing state via the response so far, and apparently that chain of thought is compatible enough with the way the LLM works that it doesn't matter that the training did not explicitly involve that iteration.
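To make the depth limit concrete, here is a toy sketch in Python. It is only an analogy, not how a transformer actually computes: the single rule, the chain of facts, and all function names are made up for illustration. A fixed number of "layers" can only push a rule's effects a fixed distance, while feeding the output back in (roughly what a chain of thought does through the response so far) eventually reaches the fixed point.

```python
def apply_rule_once(known, edges):
    """One application of the rule: if src is known and src -> dst, then dst is known."""
    return known | {dst for src, dst in edges if src in known}


def fixed_depth_pass(known, edges, num_layers):
    """Hypothetical stand-in for a single forward pass with a fixed layer count."""
    for _ in range(num_layers):
        known = apply_rule_once(known, edges)
    return known


def iterate_to_fixed_point(known, edges, num_layers):
    """Feed each pass's output back in as state until nothing changes."""
    while True:
        updated = fixed_depth_pass(known, edges, num_layers)
        if updated == known:
            return known
        known = updated


# A chain of implications: 0 -> 1 -> 2 -> ... -> 10
chain = [(i, i + 1) for i in range(10)]

shallow = fixed_depth_pass({0}, chain, num_layers=4)
full = iterate_to_fixed_point({0}, chain, num_layers=4)

print(sorted(shallow))  # facts reached by one 4-layer pass
print(sorted(full))     # facts reached once the state is fed back in
```

A single 4-layer pass stops four steps down the chain no matter how long the chain is; only the version that feeds its own output back in gets to the end.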

In short, demanding accurate answers means moving back in the direction of traditional AI, which has its own strengths and weaknesses but has never achieved the level of apparent magic we're seeing from these relatively dumb collections of weights extracted from enormous piles of data.

The Secret Formula turned out to be "feed a huge amount of data to a big but dumb model", because less dumb (that is, less simple) models would take too long to feed the huge amount of data to, and the benefits of model complexity are massively outweighed by the competing benefits of learning big sets of weights from massive data. The trick was to find just the right form of "dumb" (though now it sounds like multiple forms of dumb work OK as long as you have the massive pile of data to feed them, and you don't go so dumb as to lose the attention mechanism).