Well, you've fooled yourself into thinking you understand something when you don't. I say this as someone with a PhD in the topic, who has taught many students, and published dozens of papers in the space.
The operation of adding BoW vectors together has nothing to do with the operation of adding together word embeddings. Well, aside from both nominally being addition.
It's like saying you understand what's happening because you can add velocity vectors, and then going on to add the bit vectors that represent two binary programs and expecting the result to be a program with the average behavior of both. Obviously that doesn't happen; you get a nonsense binary.
They may both be arrays of numbers but mathematically there's no relationship between the two. Thinking that there's a relationship between them leads to countless nonsense conclusions: the idea that you can keep adding word embeddings to create document embeddings like you keep adding BoWs, the notion that average BoWs mean the same thing as average word embeddings, the notion that normalizing BoWs is the same as normalizing word embeddings and will lead to the same kind of search results, etc. The errors you get with BoWs are totally different from the errors you get with word or sentence or document embeddings. And how you fix those errors is totally different.
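To make the contrast concrete, here's a toy sketch. The vocabulary and the 4-d "embeddings" are invented for illustration: the point is that adding BoW count vectors has an exact interpretation (term counts accumulate), while adding dense embedding vectors carries no such guarantee.

```python
# Toy contrast between adding BoW count vectors and adding dense embeddings.
from collections import Counter

import numpy as np

vocab = ["cat", "dog", "sat", "ran"]

def bow(tokens):
    """Bag-of-words count vector over the toy vocabulary."""
    counts = Counter(tokens)
    return np.array([counts[w] for w in vocab], dtype=float)

doc_a = bow(["cat", "sat"])
doc_b = bow(["dog", "ran", "ran"])

# Adding BoW vectors has an exact meaning: it is the BoW of the
# concatenated documents, since term counts simply accumulate.
combined = doc_a + doc_b
assert np.array_equal(combined, bow(["cat", "sat", "dog", "ran", "ran"]))

# Adding dense word embeddings has no such identity. These 4-d vectors are
# made up for illustration; real ones would come from a trained model.
emb_cat = np.array([0.2, -0.7, 0.1, 0.5])
emb_dog = np.array([0.3, -0.6, 0.2, 0.4])
summed = emb_cat + emb_dog
# `summed` is just another point in the dense space; whether it behaves like
# a useful "cat + dog" representation is an empirical question, not an
# identity the way the BoW sum is.
print(combined, summed)
```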
No. Nothing at all about word embeddings makes sense from the point of view of BoW.
Also, yes BoW is a total dead end. They have been completely supplanted. There's never any case where someone should use them.
> as someone with a PhD in the topic, who has taught many students, and published dozens of papers
:joy: How about a repo I can run that proves you know what you're talking about? :D Just put together examples and you won't have to throw around authority fallacies.
> if you add 2 vectors together nothing happens and its meaningless
In game dev you move a character in 3D space by taking their current [x, y, z] position vector and adding another [x, y, z] vector to it. Even though left/right has nothing to do with up/down, the addition acts on every axis at once, because each axis shares the same basic notion of increase/decrease relative to a [0, 0, 0] origin, and the result is meaningful.
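A minimal sketch of that idea, using numpy arrays for the [x, y, z] vectors:

```python
# Moving a point in 3-D by vector addition.
import numpy as np

position = np.array([10.0, 0.0, 5.0])   # current [x, y, z]
velocity = np.array([1.0, 0.0, -0.5])   # per-frame displacement [x, y, z]

position = position + velocity          # new position after one step
print(position)                         # roughly [11.0, 0.0, 4.5]
```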
Take that same basic idea and apply it to text-embedding-ada-002 with its 1536-dimensional embedding arrays, and you can similarly "navigate words" by doing similar math on different vectors. That's what is meant by the king − man + woman ≈ queen type of concept.
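A toy sketch of that kind of arithmetic. The 4-d vectors and the word list below are made up for illustration, not output from text-embedding-ada-002; the point is only the mechanics: subtract and add vectors, then look for the nearest remaining word by cosine similarity.

```python
# "Navigating" an embedding space with vector arithmetic (toy vectors).
import numpy as np

vectors = {
    "king":  np.array([0.9, 0.8, 0.1, 0.0]),
    "queen": np.array([0.9, 0.1, 0.8, 0.0]),
    "man":   np.array([0.1, 0.9, 0.0, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9, 0.1]),
    "apple": np.array([0.0, 0.0, 0.1, 0.9]),
}

def nearest(target, exclude):
    """Word whose vector has the highest cosine similarity to `target`."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    candidates = {w: v for w, v in vectors.items() if w not in exclude}
    return max(candidates, key=lambda w: cos(vectors[w], target))

# king - man + woman lands closest to queen (by construction in this toy space).
result = vectors["king"] - vectors["man"] + vectors["woman"]
print(nearest(result, exclude={"king", "man", "woman"}))  # queen
```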
I think what the person before you meant about it not making sense is that it's strange that something like a dot product gives you similarity. To your point (I think), it only does if the vectors encode meaning in the first place: do math on nonsense vectors and you get nonsense results, but if they are modeled appropriately it's a useful way to find items with certain qualities.
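A small sketch of that caveat: cosine similarity is just arithmetic, so it returns a score no matter what the vectors contain. The random vectors below are stand-ins for real embeddings; with them, the ranking is noise, and it only becomes meaningful when an embedding model has placed related texts near each other.

```python
# Dot product / cosine similarity as a retrieval score (random stand-in vectors).
import numpy as np

rng = np.random.default_rng(0)
query = rng.normal(size=8)          # pretend query embedding
docs = rng.normal(size=(5, 8))      # pretend embeddings for 5 documents

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

scores = [cosine(query, d) for d in docs]
best = int(np.argmax(scores))
print(best, scores[best])
# The score is only informative if the embedding model gave the vectors
# meaning; on random vectors like these the "best match" is arbitrary.
```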