> I think the OP's argument is that ChatGPT can only average inputs into an output, as opposed to a human mind that can extrapolate.
Well, it doesn't. It determines where the input should sit in an absurdly high-dimensional vector space, goes there, looks around at what else is nearby, then picks one of the closest things and returns it as output.
This is not averaging input. If anything, it's averaging training data. But it's not working in the space of all things ever written in the training data - it's working in a much larger space of all things people could have written, given the conceptual relationships learned from everything they wrote that ended up in the training set.
I take it as the structure being anchored in the training data. But while current models may not extrapolate beyond the boundaries of the training data[0], my understanding is that the training data itself defines points in the latent space, those points cluster together over the course of training (which is the point of doing this in the first place), and the rest of the space is quite sparse.
The latent space doesn't represent the set of things in the training data, but rather a (sub)set of things that are possible to express using bits of the training data. That's a very large space, full of areas corresponding to thoughts never thought or expressed by humans, yet still addressable - still reachable by interpolation. Now, there's a saying that all creativity is just a novel way of mashing up things that came before. To the extent that's true, an ML model exploring the latent space is being creative.
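To make "exploring the latent space" concrete, here's a toy Python sketch of the geometry. The tiny two-dimensional "embedding table" is invented for illustration - a real model's space is learned and vastly bigger, and this is not how a transformer actually generates text:

    # Toy sketch: interpolate between two known points in a latent space,
    # then snap back to the nearest known thing. The embedding table is
    # made up purely for the demo.
    import numpy as np

    vocab = {
        "hot":  np.array([1.0, 0.0]),
        "warm": np.array([0.6, 0.4]),
        "cold": np.array([0.0, 1.0]),
    }

    def nearest(point):
        # whichever vocabulary vector lies closest to the query point
        return min(vocab, key=lambda w: np.linalg.norm(vocab[w] - point))

    a, b = vocab["hot"], vocab["cold"]
    midpoint = a + 0.5 * (b - a)   # a point no training example sits on exactly
    print(nearest(midpoint))       # "warm" - a novel point resolved to the closest known thing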
So I guess what I'm saying is, in your "2x" example, you can ask the AI what is between 2 and 3, and it will interpolate f(2.5)=5 for you, and that was not in the training data, and that is creativity, because almost all human inventiveness boils down to poking at the fractions between 1 and 3, and only rarely does someone manage to expand those boundaries.
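To spell the toy example out in code (assuming, as I read it, that the "training data" only contains f(2)=4 and f(3)=6):

    # Linear interpolation on the f(x) = 2x toy: the "model" has only ever
    # seen f(2) = 4 and f(3) = 6, yet can still produce a value for x = 2.5.
    def lerp(y0, y1, t):
        return y0 + t * (y1 - y0)   # t in [0, 1] keeps us between the known points

    print(lerp(4, 6, 0.5))          # 5.0, i.e. f(2.5) - never present in the "training data"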
It may not sound impressive, but that's because the example is a bunch of numbers on a number line. Current LLMs are dealing with points in a space with thousands of dimensions, where pretty much any idea, any semantic meaning you could identify in the training data, is represented as point proximity along some of those dimensions. Interpolating values in this space is pretty much guaranteed to yield novelty; the problem is that most of the points in that space are, by definition, useless nonsense, so you can't just pick points at random.
--
[0] - Is it really impossible? In the pedestrian-level math I'm used to, the difference between interpolation and extrapolation boils down to a parameter taking arbitrary values instead of being confined to the [0...1] range.
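Concretely, reusing the same lerp as above, the only thing separating the two is whether that parameter stays inside [0, 1]:

    # Same lerp; letting t wander outside [0, 1] turns interpolation
    # into extrapolation.
    def lerp(y0, y1, t):
        return y0 + t * (y1 - y0)

    print(lerp(4, 6, 0.5))   # 5.0 -> f(2.5), interpolation (t inside [0, 1])
    print(lerp(4, 6, 2.0))   # 8.0 -> f(4),   extrapolation (t outside [0, 1])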