>A statistical model may "capture" explanatory relations, but can it use them?
That's the beauty of autoregressive training: the model is rewarded for capturing and utilizing explanatory relations because they have an outsized effect on prediction. It's the difference between frequency counting over the past context taken as an opaque unit vs. decomposing that context and leveraging the relevant tokens for generation while ignoring the irrelevant ones. Self-attention does this by searching over all pairs of tokens in the context window for relevant associations. Induction heads[1] are a fully worked-out example of this and help explain in-context learning in LLMs.
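For concreteness, here is a minimal sketch of what "searching over all pairs of tokens" cashes out to: plain scaled dot-product attention with a causal mask over a toy context. The shapes and random values are made up for illustration; there are no trained weights and no induction heads here, just the pairwise-scoring mechanism itself.

    # Toy scaled dot-product self-attention: every token scores every other
    # token it is allowed to see, so associations are found over token pairs.
    # Shapes and values are made up; nothing here is trained.
    import numpy as np

    rng = np.random.default_rng(0)
    seq_len, d_model = 6, 8                  # 6 context tokens, 8-dim embeddings
    x = rng.normal(size=(seq_len, d_model))  # stand-in token representations

    W_q = rng.normal(size=(d_model, d_model))
    W_k = rng.normal(size=(d_model, d_model))
    W_v = rng.normal(size=(d_model, d_model))

    Q, K, V = x @ W_q, x @ W_k, x @ W_v
    scores = Q @ K.T / np.sqrt(d_model)      # (seq_len, seq_len): all token pairs

    # Causal mask: a token may only attend to itself and earlier tokens.
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores[mask] = -np.inf

    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the context
    out = weights @ V   # each position: weighted mix of the tokens it found relevant

    print(weights.round(2))  # row i: how much token i attends to each earlier token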
>Anyway I find it very hard to think of language models as explanatory models. They're predictive models, they are black boxes, they model language, but what do they explain? And to whom?
The model encodes explanatory relationships among phenomena in the world and uses these relationships to successfully generalize its generation out-of-distribution. Basically, these models genuinely understand some things about the world. LLMs exhibit linguistic competence in that they engage with the subject matter to accurately respond to unseen variations in prompts about that subject matter. At least in some cases. I argue this point in some detail here[2].
>how can we say which model we got a hold of?
More sophisticated tests, ideally ones that can isolate exactly what was in the training data compared to what was generated. I think the wide variety of poetry these models generate should strongly raise one's credence that they capture a sufficiently accurate model of poetry. I go into detail on this example in the link I mentioned. Aside from that, various ways of testing in-context learning can do a lot of work here[3].
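As a sketch of the kind of in-context test I mean: invent a mapping on the spot (so the answer cannot have been in any training set), show the model a few worked examples, and check whether it applies the rule to a held-out case. The transform and the complete() call below are placeholders I made up, not any particular API.

    # Sketch of an in-context learning probe: the mapping is invented at test
    # time (reverse the word and upper-case vowels), so the right answer can't
    # have been memorized from the training corpus; the model has to pick up
    # the rule from the few-shot examples alone.
    def transform(word: str) -> str:
        return "".join(c.upper() if c in "aeiou" else c for c in reversed(word))

    def build_prompt(train_words, query_word):
        lines = [f"{w} -> {transform(w)}" for w in train_words]
        lines.append(f"{query_word} -> ")
        return "\n".join(lines)

    prompt = build_prompt(["planet", "copper", "silent"], "garden")
    expected = transform("garden")   # "nEdrAg"

    # complete() is a stand-in for whatever model interface you are probing.
    # answer = complete(prompt)
    # print(answer.strip() == expected)
    print(prompt)
    print("expected:", expected)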
>> That's the beauty of autoregressive training: the model is rewarded for capturing and utilizing explanatory relations because they have an outsized effect on prediction.
That sentence should be decorated with the word "allegedly", or perhaps "conjecture"! In practical terms, I believe you are pointing out that language models of the GPT family are trained on a context surrounding, not just preceding, a predicted token. That's right (and it gets fudged in discussions about predicting the next token in a sequence), but we could already do that with skip-gram models, context-sensitive grammars, and dependency grammars many years ago, and I don't remember anyone saying those were especially capable of capturing explanatory relations [1]. Although for grammars the claim could be made, since they are generally based on explanatory models of human language (but not because of context-sensitivity).
Anyway, I thought you were arguing that explanations are arbitrary, "explanatory posits", and wouldn't that mean that an explanation doesn't improve prediction? This is not to catch you in a contradiction; I'm genuinely unsure about this myself. My understanding is that explanatory hypotheses improve predictions in the long run [2], but that's not to say that a predictive model will improve given explanations; rather, explanatory models eventually replace strictly predictive ones.
Are you saying that including explanations in training data can improve prediction? That would make sense, but this is very hard to do when training a predictive model on text. In that case, the explanations are at best hidden variables and language models are just not the right kind of model to model hidden variables.
Sorry, writing too much today. And I got work to do. So I won't bitch about "in-context learning" (what we used to call sampling from a model back in the day, three years ago before the GPT-3 paper :).
______________
[1] My Master's thesis was a bunch of language models trained on Howard Phillips Lovecraft's complete works, and separately on a corpus of Magic: the Gathering cards. One of those models was a probabilistic Context-Free Grammar, and despite its context-freedom, and because it was a Definite Clause Grammar, I could sample from it with input strings like "The X in the darkness with the Y in the Z of the S" and it would dutifully fill in the blanks with tokens that maximised the probability of the sentence (a toy stand-in for the idea is sketched after these footnotes). So even my puny PCFG could represent bi-directional context, after a fashion. Yet I wouldn't ever accuse it of being explanatory. Although I would say it was quite mad, given the corpus.
[2] I mention in another comment my favourite example of the theory of epicycles compared to Kepler's laws of planetary motion.
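Here is a toy stand-in for that fill-in-the-blank idea, nothing like the actual thesis code: a smoothed bigram model scores each candidate filler by the words on both sides of the blank and keeps the one that maximises the probability of the sentence.

    # Toy stand-in for the fill-in-the-blank sampling in [1] (not the DCG from
    # the thesis): a bigram model scores each candidate word by the context on
    # *both* sides of the blank, and the highest-probability filler wins.
    from collections import Counter, defaultdict

    corpus = ("the thing in the darkness stirred and the darkness answered "
              "the shadow in the house of the hound waited").split()

    bigrams = defaultdict(Counter)
    for a, b in zip(corpus, corpus[1:]):
        bigrams[a][b] += 1

    vocab = set(corpus)

    def p(b, a):  # add-one smoothed estimate of P(b | a)
        return (bigrams[a][b] + 1) / (sum(bigrams[a].values()) + len(vocab))

    def fill(template):
        words = template.lower().split()
        for i, w in enumerate(words):
            if w == "_":
                prev, nxt = words[i - 1], words[i + 1]
                words[i] = max(vocab, key=lambda c: p(c, prev) * p(nxt, c))
        return " ".join(words)

    print(fill("the _ in the _ of the _ waited"))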
>Anyway, I thought you were arguing that explanations are arbitrary, "explanatory posits", and wouldn't that mean that an explanation doesn't improve prediction?
I don't mean to say that explanations are arbitrary, rather that causes are not observed, only inferred. But we infer causes because of the explanatory work they do. This isn't arbitrary; it is strongly constrained by predictive value as well as, I'm not sure what to call it, epistemic coherence and intelligibility maybe? Explanatory models are satisfying because they allow us to derive many phenomena from fewer assumptions. Good explanatory models are mutually reinforcing and have a high level of coherence among their assumptions ("epistemic coherence"). They also require the fewest assumptions taken as brute without further justification ("intelligibility").
Why think explanatory models are better at prediction? Because the mutual coherence among assumptions and the explanatory power of the whole (the ability to predict much from few assumptions) suggest the explanatory model is getting at the productive features of the phenomena that result in the observed behavior. Essentially, the fewer posits, the fewer ways there are to "bake" the data into the model. If we were to cast this as a computational problem, i.e. finding a program that reproduces the data, shorter programs are necessarily more explanatory. There's no other way to explain the coincidence of a program picked out of a small space generating data picked out of a very large space without there being an explanatory relation between the two. Further, our credence in an explanation increases as the ratio of the respective spaces diverges.
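A crude way to make the "shorter programs are more explanatory" point concrete: compare the description length of a program that memorizes a sequence outright with that of a short rule generating the same sequence. The byte counts below are a rough stand-in for real description length (Kolmogorov complexity is uncomputable), and the example sequence is arbitrary.

    # Rough sketch of the "shorter programs are more explanatory" intuition:
    # compare the cost of memorizing a sequence outright with the cost of a
    # short rule that generates it. Byte counts are a crude proxy only.
    data = [n * n for n in range(1, 51)]                 # 1, 4, 9, ..., 2500

    memorizer = f"data = {data}"                          # "program" that bakes in the data
    rule      = "data = [n * n for n in range(1, 51)]"    # short generating rule

    print(len(memorizer), "bytes to memorize")
    print(len(rule), "bytes for the rule")
    # The rule is picked out of a far smaller space of short programs, yet it
    # reproduces data drawn from a vast space of possible 50-element lists;
    # hard to explain unless the rule tracks the structure that produced the data.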
This is really the problem of machine learning in a nutshell. Is the ratio of data to parameter count over some threshold such that training is biased towards explanatory relations? Is the model biased in the right way to discover these relations faster than it can memorize the data? LLMs seem to have crossed this threshold because of the massive amount of data they are trained on, seemingly much more than can comfortably be memorized, and because the inductive biases of Transformers steer the search over the space of models towards extracting explanatory relations.
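A back-of-envelope version of that threshold question, with purely hypothetical round numbers (not any particular model's actual parameter count, token count, or entropy estimate):

    # Back-of-envelope data-vs-parameters check. All figures are hypothetical
    # round numbers chosen for illustration.
    params         = 70e9      # assumed parameter count
    bits_per_param = 16        # 16-bit weights
    tokens         = 10e12     # assumed training tokens
    bits_per_tok   = 3         # crude guess at irreducible bits per token

    capacity = params * bits_per_param   # most the weights could store verbatim
    content  = tokens * bits_per_tok     # rough information in the training set

    print(f"data/capacity ratio ~ {content / capacity:.1f}x")
    # If the ratio is well above 1, the model cannot simply memorize the corpus;
    # pressure shifts toward compressed, generalizing (explanatory) structure.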
>Are you saying that including explanations in training data can improve prediction? That would make sense, but this is very hard to do when training a predictive model on text. In that case, the explanations are at best hidden variables and language models are just not the right kind of model to model hidden variables.
I agree with this, and I think these explanatory relations are implicit in human text. I gave the example in another comment that I say things like "I picked my cup off the floor" rather than "I picked my cup off the ceiling" because causal relations in the real world influence the text we write. The relation of "things fall down" is widely explanatory. But it seems to me that LLMs are very much general modelers of hidden variables, given the wide applicability of LLMs in areas that aren't strictly related to natural language. But then again, any structured data is a language in a broad sense. And the grammar can be arbitrarily complex and so can encode deep relationships among data in any domain. Personally, I'm not so surprised that a "language model" has such wide applicability.
>> Why think explanatory models are better at prediction? Because the mutual coherence among assumptions and the explanatory power of the whole (the ability to predict much from few assumptions) suggest the explanatory model is getting at the productive features of the phenomena that result in the observed behavior. Essentially, the fewer posits, the fewer ways there are to "bake" the data into the model. If we were to cast this as a computational problem, i.e. finding a program that reproduces the data, shorter programs are necessarily more explanatory. There's no other way to explain the coincidence of a program picked out of a small space generating data picked out of a very large space without there being an explanatory relation between the two. Further, our credence in an explanation increases as the ratio of the respective spaces diverges.
Like you say, that's the problem of machine learning. There's a huge space of hypotheses, many of which can fit the data, but how do we choose one that also fits unseen data? Explanatory models are easier to trust, and to trust to generalise better, because we can "see" why they would.
But the problem with LLMs is that they remain black boxes. If those black boxes are explanatory models, then to whom is the explanation explained? Who is there to look at the explanation and trust the predictions? This is what I can't see, and I think it turns into a "turtles all the way down" kind of situation. Unless there is a human mind, somewhere in the process, that can look at the explanatory model and use the explanation to explain some observation, I don't see how the model can really be said to be explanatory. Explanatory to whom?
>> But it seems to me that LLMs are very much general modelers of hidden variables, given the wide applicability of LLMs in areas that aren't strictly related to natural language.
Well, I don't know. Maybe we'll find that's the case. For the time being I'm trying to keep an open mind, despite all the noise.
[1] https://transformer-circuits.pub/2022/in-context-learning-an...
[2] https://www.reddit.com/r/naturalism/comments/1236vzf/
[3] https://twitter.com/leopoldasch/status/1638848881558704129