Here’s my understanding. It is intimidating to write a response to you, because you have an exceptionally clear writing style. I hope the more knowledgeable HN crowd will correct any errors of fact or presentation below.
Old school word embedding models (like Word2Vec) learn embeddings by predicting a word from its surrounding words (or vice versa), so each word ends up with a single static vector. You can embed a whole sentence by taking the average of all the word embeddings in the sentence.
There are many scenarios where this average fails to distinguish between multiple meanings of a word. For example “fine weather” and “fine hair” both contain “fine” but mean different things, yet a static model gives “fine” the exact same vector in both.
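To make that concrete, here’s a tiny sketch with made-up 4-dimensional vectors (not real Word2Vec weights): the static model hands “fine” the same vector in both sentences, so the two averages can never tell the senses apart.

    import numpy as np

    # hypothetical static word vectors (a real Word2Vec model would have ~300 dims)
    word_vectors = {
        "fine":    np.array([0.9, 0.1, 0.0, 0.3]),
        "weather": np.array([0.2, 0.8, 0.1, 0.0]),
        "hair":    np.array([0.0, 0.1, 0.9, 0.2]),
    }

    def sentence_embedding(tokens):
        # the sentence embedding is just the mean of the static word vectors
        return np.mean([word_vectors[t] for t in tokens if t in word_vectors], axis=0)

    # "fine" contributes exactly the same vector to both averages,
    # so the two senses are never distinguished
    print(sentence_embedding(["fine", "weather"]))
    print(sentence_embedding(["fine", "hair"]))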
Transformers produce better embeddings by considering context: they use the rest of the sentence to build a better representation of each word. BERT is a great model for this.
The problem is that if you want to use BERT by itself to compute relevance, you need to perform a lot of compute per query, because you have to concatenate the query text and the document text into one long sequence that is then run through BERT (see Figure 2c in the ColBERT paper [1]).
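Roughly, the Figure 2c setup looks like this in code. I’m using the sentence-transformers CrossEncoder wrapper with an example re-ranking model purely to illustrate the pattern; the point is that every (query, document) pair needs its own full BERT forward pass at query time.

    from sentence_transformers import CrossEncoder

    # example BERT-style re-ranker; any cross-encoder works the same way
    model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

    query = "which children have fine hair"
    documents = [
        "Sarah has fine blond hair",
        "we're having fine weather today",
    ]

    # one full forward pass per (query, document) pair -- expensive at query time
    scores = model.predict([(query, doc) for doc in documents])
    print(scores)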
What ColBERT does is use the fact that BERT, through its attention heads, can draw on context from the entire sentence to produce a more nuanced representation of every token in its input. It does this once, offline, for all documents in its index. So for example (assuming “fine” was a token) it would embed the “fine” in “we’re having fine weather today” to a different vector than the “fine” in “Sarah has fine blond hair”. In ColBERT the output embeddings are usually much smaller (e.g. 128 dimensions) than BERT’s hidden size of 768 or 1024.
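Something like this is my mental model of the document-side encoding (the 128-dimensional projection below is a random stand-in for ColBERT’s learned linear layer, and the model name is just an example):

    import torch
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    bert = AutoModel.from_pretrained("bert-base-uncased")
    # stand-in for ColBERT's learned projection: 768 -> 128, then L2-normalise
    project = torch.nn.Linear(bert.config.hidden_size, 128, bias=False)

    def encode_tokens(text):
        inputs = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            hidden = bert(**inputs).last_hidden_state   # (1, seq_len, 768), contextual
        emb = project(hidden)                           # (1, seq_len, 128)
        return torch.nn.functional.normalize(emb, dim=-1).squeeze(0)

    # "fine" gets a different 128-d vector in each sentence because its context differs
    doc1 = encode_tokens("we're having fine weather today")
    doc2 = encode_tokens("Sarah has fine blond hair")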
Now, if you have a query, you can do the same and produce token level embeddings for all the tokens in the query.
Once you have these two sets of contextualised embeddings, you can check for the presence of a particular meaning of a word in the document using the dot product. For example the query “which children have fine hair” matches the document “Sarah has fine blond hair” because the token “fine” is used in a very similar sense in both the query and the document, so their dot product is high and gets picked up by the MaxSim operation.
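As I understand MaxSim: for each query token you take the maximum dot product against all document token embeddings, then sum those maxima over the query tokens. Reusing the encode_tokens sketch from above:

    def maxsim_score(query_emb, doc_emb):
        # query_emb: (num_query_tokens, 128), doc_emb: (num_doc_tokens, 128)
        sims = query_emb @ doc_emb.T           # all pairwise dot products
        return sims.max(dim=1).values.sum()    # best document match per query token, summed

    query_emb = encode_tokens("which children have fine hair")
    for doc in ["Sarah has fine blond hair", "we're having fine weather today"]:
        print(doc, "->", maxsim_score(query_emb, encode_tokens(doc)).item())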
If this is true, my understanding of vanilla token vector embeddings is wrong. My understanding was that the vector embedding was the geometric coordinates of the token in the latent space with respect to the prior distribution. So adding another dimension to make it a "multivector" doesn't (in my mind) seem like it would add much. What am I missing?
I think the important thing is that the first approach to converting complete sentences into an embedding was to average all the token embeddings in the sentence. What ColBERT does instead is store the embeddings of all the tokens, and then use dot products at query time to identify the document tokens most relevant to each query token. Another comment in this thread says the same thing in a different way. Feels funny to post a stack exchange reference, but this is a great answer!
[1] https://arxiv.org/pdf/2004.12832