> I think the OP's argument is that ChatGPT can only average inputs into an output, as opposed to a human mind that can extrapolate.
Well, it doesn't. It determines where the input should sit in an absurdly high-dimensional vector space, goes there, looks around at what else is nearby, then picks one of the closest things and returns it as output.
This is not averaging input. If anything, it's averaging training data. But it's not working in the space of all things ever written in the training data - it's working in a much larger space of all things people could have written, given the conceptual relationships learned from everything they wrote that ended up in the training set.
I take it as the structure being anchored in the training data. But while current models may not extrapolate beyond the boundaries of the training data[0], my understanding is that the training data itself defines points in the latent space, those points cluster together over the course of training (which is the point of doing this in the first place), and the rest of the space is quite sparse.
The latent space doesn't represent the set of things in the training data, but rather a (sub)set of things that are possible to express using bits of the training data. That's a very large space, full of areas corresponding to thoughts never thought or expressed by humans, yet still addressable - still reachable by interpolation. Now, there's a saying that all creativity is just a novel way of mashing up things that came before. To the extent that's true, an ML model exploring the latent space is being creative.
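To make "exploring the latent space" concrete, here's a toy Python sketch of the geometry. The tiny two-dimensional "embedding table" is invented for illustration - a real model's space is learned and vastly bigger, and this is not how a transformer actually generates text:

    # Toy sketch: interpolate between two known points in a latent space,
    # then snap back to the nearest known thing. The embedding table is
    # made up purely for the demo.
    import numpy as np

    vocab = {
        "hot":  np.array([1.0, 0.0]),
        "warm": np.array([0.6, 0.4]),
        "cold": np.array([0.0, 1.0]),
    }

    def nearest(point):
        # whichever vocabulary vector lies closest to the query point
        return min(vocab, key=lambda w: np.linalg.norm(vocab[w] - point))

    a, b = vocab["hot"], vocab["cold"]
    midpoint = a + 0.5 * (b - a)   # a point no training example sits on exactly
    print(nearest(midpoint))       # "warm" - a novel point resolved to the closest known thing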
So I guess what I'm saying is, in your "2x" example, you can ask the AI what is between 2 and 3, and it will interpolate f(2.5)=5 for you, and that was not in the training data, and that is creativity, because almost all human inventiveness boils down to poking at the fractions between 1 and 3, and only rarely does someone manage to expand those boundaries.
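To spell the toy example out in code (assuming, as I read it, that the "training data" only contains f(2)=4 and f(3)=6):

    # Linear interpolation on the f(x) = 2x toy: the "model" has only ever
    # seen f(2) = 4 and f(3) = 6, yet can still produce a value for x = 2.5.
    def lerp(y0, y1, t):
        return y0 + t * (y1 - y0)   # t in [0, 1] keeps us between the known points

    print(lerp(4, 6, 0.5))          # 5.0, i.e. f(2.5) - never present in the "training data"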
It may not sound impressive, but that's because the example is a bunch of numbers on a number line. Current LLMs are dealing with points in a space with thousands of dimensions, where pretty much any idea, any semantic meaning you could identify in the training data, is represented as point proximity along some of those dimensions. Interpolating values in this space is pretty much guaranteed to yield novelty; the problem is that most of the points in that space are, by definition, useless nonsense, so you can't just pick points at random.
--
[0] - Is it really impossible? In the pedestrian-level math I'm used to, the difference between interpolation and extrapolation boils down to a parameter taking arbitrary values instead of being confined to the [0...1] range.
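Concretely, reusing the same lerp as above, the only thing separating the two is whether that parameter stays inside [0, 1]:

    # Same lerp; letting t wander outside [0, 1] turns interpolation
    # into extrapolation.
    def lerp(y0, y1, t):
        return y0 + t * (y1 - y0)

    print(lerp(4, 6, 0.5))   # 5.0 -> f(2.5), interpolation (t inside [0, 1])
    print(lerp(4, 6, 2.0))   # 8.0 -> f(4),   extrapolation (t outside [0, 1])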