Domain here means a context: the type of data a model was trained on. Code vs. text. English vs. Russian. Python vs. APL.
For instance, OpenAI found that GPT-4 is much better at reasoning in some human languages than others. It reasons best in English but struggles in lower-resource languages.
There is clearly some context-independent reasoning going on (i.e., generalization); otherwise the model would not be able to reason at all in languages in which it hasn’t seen a particular problem. But there also appears to be a large context-dependent factor.