I still find it unclear what you mean by “interpolate” in this context. If your NN takes in several numbers and assigns logits to each class based on those numbers, then, considering the n-dimensional space of possible inputs, if a new input is in the convex hull of the inputs that appear in the training samples, the meaning of “interpolate” is fairly clear.
But when the inputs are sequences of tokens…
Granted, each token gets embedded as some vector, and you can concatenate those vectors to represent the sequence of tokens as one big vector. But are the vectors for novel strings in the convex hull of the corresponding vectors for the strings in the training set?
The answer is kind of right there in the start of your last sentence. From the transformer model's perspective, the input is just a time series of vectors. It ultimately isn't any different from any other time series of vectors.
Way back in the day when I was working with latent Dirichlet allocation models, I had a minor enlightenment moment when I realized that the models really weren't capturing any semantically meaningful relationships. They were only capturing meaningless statistical correlations to which I would then assign semantic value so effortlessly and automatically that I didn't even realize it was always me doing it, never the model.
I'm pretty sure LLMs exist on that same continuum. And if you travel down it in the other direction, you get to simple truisms such as "correlation does not equal causation."
The part about “is it in the convex hull?” was an important part of the question.
It seems to me that if it isn’t in the convex hull, it would be more fitting to describe it as extrapolation rather than interpolation?
In general, my question does apply to the task of predicting how a time series of vectors continues: Given a dataset of time series, where the dimension of each vector in the series is such and such, the length of each series is yea long, and there are N series in the training set, should we expect series in the test set or validation set to be in the convex hull of the ones in the training set?
I would think that the number of series in the training set, N, while large, might not be all that large compared to the dimensionality of a whole series?
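Dimension counting backs this hunch up: the convex hull of N points sits inside their affine hull, which has dimension at most N − 1. Whenever N − 1 < d, the hull has zero volume in the ambient space, so a generic new vector falls outside it with probability 1. A quick sketch with NumPy (the 100-series / 1000-dimension sizes are made up for illustration):

```python
import numpy as np

# Hypothetical sizes: N = 100 training series, each flattened to d = 1000 dims.
rng = np.random.default_rng(0)
N, d = 100, 1000
series = rng.standard_normal((N, d))

# The affine hull of N points has dimension rank(P - p_0), which is at most N - 1.
affine_dim = np.linalg.matrix_rank(series - series[0])
print(affine_dim)  # 99 -- far below d, so the hull has zero volume in R^d
```

With random Gaussian data the rank is exactly N − 1, so the hull occupies a 99-dimensional slice of a 1000-dimensional space, and a fresh sample essentially never lands in it.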
Hm, are there efficient techniques for evaluating whether a high dimensional vector is in the convex hull of a large number of other high dimensional vectors?
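One standard technique is to phrase membership as a linear-programming feasibility problem: x lies in the convex hull of p_1, …, p_N iff there exist weights λ_i ≥ 0 with Σλ_i = 1 and Σλ_i p_i = x. A minimal sketch using SciPy's `linprog` (the data and sizes here are illustrative, not from the discussion):

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(x, points):
    """True iff x lies in the convex hull of the rows of `points`.

    Feasibility LP: find lambda >= 0 with sum(lambda) = 1 and
    points.T @ lambda = x. The objective is zero, so any feasible
    point certifies membership.
    """
    n, _ = points.shape
    A_eq = np.vstack([points.T, np.ones((1, n))])  # stack both equality constraints
    b_eq = np.concatenate([x, [1.0]])
    res = linprog(c=np.zeros(n), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0.0, None)] * n, method="highs")
    return res.success

# Sanity check on the unit square: the centre is inside, (2, 2) is not.
square = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
print(in_convex_hull(np.array([0.5, 0.5]), square))  # True
print(in_convex_hull(np.array([2.0, 2.0]), square))  # False

# High-dimensional case from the question: 100 "series" in 1000 dims.
rng = np.random.default_rng(0)
train = rng.standard_normal((100, 1000))
fresh = rng.standard_normal(1000)
print(in_convex_hull(fresh, train))  # False: the hull is measure-zero here
```

Each query is one LP in N variables, which modern solvers typically handle comfortably even for large N; the cost is in the solve, not in any explicit hull construction, so it stays tractable in high dimensions where computing the hull itself would not be.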
Just shooting from the hip, LLMs operate out on a frontier where the curse of dimensionality removes a large chunk of the practical value from the concept of a convex hull. Especially in a case like this where the vector embedding process places hard limits on the range of possible magnitudes and directions for any single vector.
Outside of the context of a convex hull, I don’t know how to make a distinction between interpolation and extrapolation. This is the core of my question.
What precisely is it that you mean when you say that it is interpolating rather than extrapolating? In the only definition that I know, the one based on convex hulls, I believe it would be extrapolating rather than interpolating. But people often say it is interpolating rather than extrapolating, and I don’t know what they mean.
I doubt they're really thinking about it in a mathematical sense when they say that. I'm guessing, for example, that "extrapolate" is meant in the more colloquial sense, which is maybe closer to "deduce" in practice.