
Why do we use word embeddings in NLP? - data_nat
https://medium.com/@natasha.latysheva/why-do-we-use-embeddings-in-nlp-2f20e1b632d2
======
macando
It finally all clicked for me after watching Andrew Ng's course on NLP.
Basically, every word can be described numerically with up to 300 properties,
some of them more important than others. So every word is a 300-dimensional
vector and vector algebra can be applied. So if you take the word vector for
'king' and subtract a 'manliness' direction vector from it, the result is
close to the word vector for 'queen'. Likewise, if you subtract 'queen' from
'king', the resulting difference vector roughly encodes 'manliness'. Awesome!
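
Here's roughly that idea in code (a quick sketch, assuming gensim and its
downloadable GloVe vectors; untested, and the analogy only works approximately):

```python
# Sketch: word-vector arithmetic with pretrained 300-d GloVe vectors via gensim.
# Assumes gensim is installed and can download "glove-wiki-gigaword-300" (~400MB).
import numpy as np
import gensim.downloader as api

model = api.load("glove-wiki-gigaword-300")  # 300-dimensional word vectors

# king - man + woman lands near queen
print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# The difference king - queen points in roughly the same direction as man - woman
d1 = model["king"] - model["queen"]
d2 = model["man"] - model["woman"]
print(np.dot(d1, d2) / (np.linalg.norm(d1) * np.linalg.norm(d2)))  # fairly high cosine similarity
```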

~~~
pluma
Except this doesn't work consistently, because words in real language are
ambiguous and not every combination maps to something meaningful in real-world
language.

You'd need to start from a word list of 100% unambiguous, clearly defined
words, and even then you're no closer to working with real language, because
while superficially similar, your word list is actually a highly specialised
DSL.

Of course in many cases this DSL approximation of the target language is good
enough for certain tasks but the entire process is inherently flawed.

~~~
tonyarkles
> approximation of the target language is good enough for certain tasks but
> the entire process is inherently flawed

That is pretty much the definition of a "model" :)

I recently went through the "Tensorflow in Practice" specialization on
Coursera and it was illuminating. The thing about ML models, whether CNNs for
images, or word2vec+RNN for text, or whatever else, is that there's no
rigorous theoretical basis for why they work. You're doing, say, Stochastic
Gradient Descent to optimize the neuron weights across your dataset. Out the
other side of training, you have a mostly meaningless set of coefficients
that _work well_ at classifying other, unseen data.
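
To make that concrete, here's a toy illustration (plain numpy, made-up data,
not anyone's actual model): SGD just nudges the weights to reduce the loss one
sample at a time, and what falls out is a vector of numbers with no built-in
meaning.

```python
# Toy SGD for logistic regression: optimize weights, get opaque coefficients.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                                       # made-up features
y = (X @ np.array([1.5, -2.0, 0.0, 0.7, 0.0]) > 0).astype(float)    # made-up labels

w = np.zeros(5)   # the "neuron weights"
lr = 0.1
for epoch in range(50):
    for i in rng.permutation(len(X)):          # stochastic: one sample at a time
        p = 1.0 / (1.0 + np.exp(-X[i] @ w))    # sigmoid prediction
        w -= lr * (p - y[i]) * X[i]            # gradient step on the log loss

print(w)  # coefficients that classify well, but "mean" nothing by themselves
```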

I dual-majored in CS and EE, and I leaned towards the "science" side of
things, where systems get modelled mathematically and analyzed, accepting that
the model is likely incomplete but still useful. The thing that drives me nuts
with ML is that there's no explanation of what the terms in the ML model
actually mean (because the process that produced them doesn't actually
investigate meaning, it just optimizes the terms). But... I've accepted that
even though the models are pretty much semantically meaningless, they _work_.

~~~
amelius
> I've accepted that even though the models are pretty much semantically
> meaningless, they work.

Until they don't. Which can easily happen the first time you deploy a model
in the real world.

My personal view is that (at this moment) ML is mostly correlation detection
and pattern recognition, but has little to do with intelligence.

~~~
bitL
The point is that we don't have the mental capacity to understand this stuff.
Nobody has any clue how to interpret millions of dimensions, or some
non-linear manifold living in them, or how to translate that into something
humans are capable of understanding. These things might be done automatically
by our brains on a subconscious level in a similar fashion (or not), but on a
conscious level we are completely clueless and basically throw darts to see
which ones land somewhere useful.

I think you object to the lack of "mathematical beauty", but my point is "who
cares?". Not sure why reality should conform to some mental model we find
"appealing" for whatever reason. Deep Learning is similar to experimental
physics.

~~~
woliveirajr
This.

Explainable AI is an emerging field; I hear about this need especially in NLP
and law. We expect to understand how a decision was reached, and we'll never
accept a computer-generated decision if it isn't explained how each logical
step was taken. And just handing over the millions of weights of each neuron
won't give us that, because we won't be able to reach the same decision from
just those parameters.

We know that AI is a bunch of probabilities, weights and relations in n
dimensions. Our rational brain can know that too, but it can't feel it.

~~~
Der_Einzige
That's why you use interpretability tools like LIME.

Example of this would be here:
https://github.com/Hellisotherpeople/Active-Explainable-Classification
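
For flavour, a minimal LIME-on-text sketch (assumes the `lime` and
scikit-learn packages; the 20-newsgroups data and the two categories are just
stand-ins, not necessarily what that repo uses):

```python
# Sketch: explain one prediction of a simple text classifier with LIME.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from lime.lime_text import LimeTextExplainer

cats = ["sci.space", "rec.autos"]                       # stand-in categories
train = fetch_20newsgroups(subset="train", categories=cats)

pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
pipeline.fit(train.data, train.target)

explainer = LimeTextExplainer(class_names=cats)
exp = explainer.explain_instance(train.data[0], pipeline.predict_proba, num_features=6)
print(exp.as_list())  # words with their weights for/against the predicted class
```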

------
thanatropism
Are there "recommended", pretrained image featurizers -- maybe intermediate
layers from CNNs and all that?

I know easy-to-use object recognition models, but for general clustering,
metric learning, etc. tasks it would be useful to have an abstract embedding.

~~~
superpermutat0r
https://www.cv-foundation.org/openaccess/content_cvpr_workshops_2014/W15/html/Razavian_CNN_Features_Off-the-Shelf_2014_CVPR_paper.html

Here's a nice paper. But yes, almost any CNN works.

VGG-16 or VGG-19 seem to be used the most.

~~~
thanatropism
Yeah, but if I understood it correctly, the VGG-16 that comes with Keras is
trained to give a probability distribution over 1000 interpretable labels. I
was hoping for something that, like word2vec, embeds image data in a low-ish
dimension.
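
Maybe dropping the classifier head already gets close to that? Something like
this, if I'm reading the Keras application docs right (untested sketch; the
image path is a placeholder):

```python
# Sketch: use VGG16 without its 1000-way softmax head as a generic image featurizer.
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing import image

# include_top=False drops the classifier; pooling="avg" should give a 512-d vector per image
model = VGG16(weights="imagenet", include_top=False, pooling="avg")

img = image.load_img("some_image.jpg", target_size=(224, 224))   # placeholder path
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
embedding = model.predict(x)                                       # shape (1, 512)
```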

