I'd give a bit more of a nuanced view here -- we can choose any number of properties (dimensions) to represent words, all of which are learned from a corpus. 300 dimensions is a pretty popular choice. These dimensions aren't (generally) interpretable: they represent latent properties. In other words, it's not possible to say which property each dimension represents; it's simply something your word embedding algorithm has picked up in the data. Generally speaking, feature importance is hard to define for the same reason.
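For a concrete sense of what that looks like, here's a minimal sketch using gensim and the pretrained Google News vectors (just an illustration, not tied to any particular paper or setup):

    import gensim.downloader as api

    # Pretrained 300-dimensional word2vec vectors trained on Google News.
    model = api.load("word2vec-google-news-300")

    vec = model["king"]
    print(vec.shape)   # (300,)
    print(vec[:5])     # first few coordinates -- no single dimension "means" anything by itself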
Model interpretability goes out the window because the techniques we used for vectorization kinda suck. NLP is unnecessarily obsessed with self-supervision when it should be innovating in dimensionality reduction techniques.
It boggles my mind that I haven't seen anyone implement my idea.
Word2vec's popularity is the result of people valuing performance (i.e. accuracy) more than interpretability.
What's mind-boggling to me is that I haven't seen anyone else come up with the idea independently.
You need to start from a word list of 100% unambiguous, clearly defined words, and even then you're no step closer to working with real language, because while superficially similar, your word list is actually a highly specialised DSL.
Of course, in many cases this DSL approximation of the target language is good enough for certain tasks, but the entire process is inherently flawed.
BERT embeddings, after training, change with context. In other words, if you feed in a paragraph about bank robbers and look at the encoding for "bank", it will be meaningfully different from the encoding of the same word produced from a paragraph (or sentence) about river banks.
We use BERT at the startup I work at, and one of our tests was the sentence "the bank robbers robbed the bank and then rested by the river bank". BERT was able to generate three different, semantically meaningful encodings for the word "bank" in this sentence. The first two instances were much closer to each other in vector space (Euclidean distance) than either was to the last.
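Something along these lines reproduces the effect with the off-the-shelf HuggingFace transformers library and bert-base-uncased (a rough sketch, not our actual setup):

    import torch
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    sentence = "the bank robbers robbed the bank and then rested by the river bank"
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]   # one contextual vector per token

    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    bank_vecs = [hidden[i] for i, tok in enumerate(tokens) if tok == "bank"]

    # The two "financial" banks should come out closer to each other than either is to the river bank.
    print(torch.dist(bank_vecs[0], bank_vecs[1]))
    print(torch.dist(bank_vecs[0], bank_vecs[2]))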
This is huge, because it is arguably the first step in building an AI that can perform basic reasoning about information encoded in text. For example, if you average the encodings of a paragraph's words, you can create an "encoding" that captures a summary meaning or topic. Simple vector math becomes a powerful reasoning tool.
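As a toy sketch of that averaging idea (with a made-up array standing in for real per-token encodings, e.g. the BERT hidden states from the sketch above):

    import numpy as np

    token_vecs = np.random.rand(50, 768)      # placeholder: one 768-d encoding per token
    paragraph_vec = token_vecs.mean(axis=0)   # crude "summary"/topic embedding

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    # Paragraphs about similar topics should then land near each other:
    # cosine(paragraph_vec_a, paragraph_vec_b) > cosine(paragraph_vec_a, unrelated_vec)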
The future is here.
Well, except for the many many decades of previous work on NLP using symbolic methods that are quite capable. Although DNNs are en vogue and have some amazing properties, we shouldn't forget that symbolic AI/NLP using explicitly semantic representations is powerful and has a rich history, and complements DNNs quite well -- such as being easily explainable, for one.
That is pretty much the definition of a "model" :)
I recently went through the "TensorFlow in Practice" specialization on Coursera and it was illuminating. The thing about ML models, whether CNNs for images, or word2vec+RNN, or whatever else, is that they really don't have any rigorous scientific basis for why they work. You're doing, say, Stochastic Gradient Descent to optimize the neuron weights across your dataset. Out the other side of the training, you have a mostly meaningless set of coefficients that happen to classify unseen data well.
I dual-majored in CS and EE, and I leaned towards the "science" side of things, where systems get modelled mathematically and analyzed, accepting that the model is likely incomplete but still useful. The thing that drives me nuts with ML is that there's no explanation of what the terms in the ML model actually mean (because the process that produced them doesn't investigate meaning; it just optimizes the terms). But... I've accepted that even though the models are pretty much semantically meaningless, they work.
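To make that concrete, here's a toy version in plain numpy: a tiny two-layer net trained with stochastic gradient descent on XOR. Whatever comes out, nothing in the procedure gives any individual weight a meaning; the numbers are just whatever the optimisation settled on (everything here is made up purely for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0.0, 1.0, 1.0, 0.0])

    W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # hidden layer weights
    W2, b2 = rng.normal(size=4), 0.0                # output layer weights

    def sigmoid(z):
        return 1 / (1 + np.exp(-z))

    for _ in range(20000):
        i = rng.integers(4)                  # one sample at a time: "stochastic"
        h = sigmoid(X[i] @ W1 + b1)          # hidden activations
        p = sigmoid(h @ W2 + b2)             # prediction
        # backpropagate squared error and take a small gradient step
        d_out = (p - y[i]) * p * (1 - p)
        d_hid = d_out * W2 * h * (1 - h)
        W2 -= 0.5 * d_out * h
        b2 -= 0.5 * d_out
        W1 -= 0.5 * np.outer(X[i], d_hid)
        b1 -= 0.5 * d_hid

    print(W1)   # the learned first-layer weights
    print(W2)   # they fit the data, but no individual number has an interpretable meaning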
Until they don't. Which can easily happen the first time you deploy a model.
My personal view is that (at this moment) ML is mostly correlation detection and pattern recognition, but has little to do with intelligence.
I think you object to the lack of "mathematical beauty", but my point is "who cares?". I'm not sure why reality should conform to some mental model we find "appealing" for whatever reason. Deep Learning is similar to experimental physics.
Explainable AI is an emerging field; I hear about this need especially in NLP and law. We expect to understand how a decision was reached, and we'll never accept a computer-generated decision if it isn't explained how each logical step was taken. And just handing over the millions of weights of each neuron won't give us that, because we won't be able to retrace the decision from those parameters alone.
We know that AI is a bunch of probabilities, weights, and relations in n dimensions. Our rational brain can know that too, but can't feel it.
An example of this would be here: https://github.com/Hellisotherpeople/Active-Explainable-Clas...
The ontological approach described in the article doesn't really work all that well with real world data.
The raw ML approach works well enough but has a multitude of problems (e.g. learned biases, like "black" acting as a negative sentiment signal when talking about people, because of the texts the model was initially fed).
But given how hard it is to "solve" these problems, I'm not convinced ML alone will ever progress beyond the 80% "good enough" solution it is now, without being replaced with something completely different.
This is what makes me skeptical of all the tall tales about strong AI and "the singularity". While the specialised applications (e.g. deepfakes) are certainly impressive and a lot of the more generalised applications can go a long enough way to get a decent amount of funding despite unfixable flaws (e.g. sentiment analysis), getting from "here" to "there" seems to require more than just more incremental refinement.
Computational linguistics courses have been teaching ontological, "scientifically sound" approaches that yielded no real-world applications, while Google has been eating their lunch with dumb statistical models. The dumb models have since become infinitely more intricate and improved from "barely usable" to "good enough", but seem to be inching ever closer to an insurmountable wall, whereas the "scientific" models still seem to be chasing their own tail, describing spherical cows in a vacuum.
That's also one of the reasons deep NNs returned the spark to ML. ML was so deep into proven models and math that the lack of a trial-and-error component slowed down progress.
king - manliness
For your chess example, if you trained a word2vec model using only a large corpus of text about chess, you very likely wouldn't get the "king - manliness" vector being anything meaningful at all, but you would likely see word associations that are meaningful and also potentially unexpected.
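For reference, the usual analogy arithmetic with gensim and general-purpose pretrained vectors looks like this (on a chess-only corpus the nearest neighbours would presumably be chess terms instead, if anything):

    import gensim.downloader as api

    model = api.load("word2vec-google-news-300")   # general-purpose pretrained vectors
    # king - man + woman: "queen" is typically among the nearest neighbours
    print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=3))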
However, I just marveled at word2vec when I stumbled upon it. Encoding meaning as vector dimensions was mind-expanding for me.
I know of easy-to-use object recognition models, but for general clustering, metric learning, etc., it would be useful to have an abstract embedding.
Here's a nice paper. But yes, almost any CNN works.
VGG-16 or VGG-19 seem to be used the most.
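If it helps, a minimal sketch of pulling an abstract embedding out of a pretrained VGG-16 with torchvision: drop the classification head and keep the 4096-d penultimate activations (the file name is just a placeholder):

    import torch
    from torchvision import models
    from PIL import Image

    weights = models.VGG16_Weights.IMAGENET1K_V1
    vgg = models.vgg16(weights=weights)
    vgg.classifier = torch.nn.Sequential(*list(vgg.classifier.children())[:-1])  # drop final FC layer
    vgg.eval()

    img = Image.open("example.jpg")   # any RGB image
    with torch.no_grad():
        embedding = vgg(weights.transforms()(img).unsqueeze(0))[0]   # shape: (4096,)

    # `embedding` can now feed k-means, a metric-learning loss, nearest-neighbour search, etc.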