
Universal Sentence Encoder - andrewg
https://arxiv.org/abs/1803.11175
======
nl
Interesting. There's a big need for better vector representations of things
in between words (for which Word2Vec/GloVe/FastText work well) and documents
(which to me seems impossible. Yes, I know about Doc2Vec etc., but really...
it only works OK for paragraphs).

Facebook's InferSent[1] has worked reasonably well for me for a variety of
sentence level tasks, but I don't have anything I can point to to say that it
is really substantially better than averaging word embeddings.
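
(For the curious, the word-averaging baseline is roughly the following: a
minimal sketch, assuming gensim and some pre-trained word2vec-format vectors
on disk; the file path is just a placeholder.)

```python
import numpy as np
from gensim.models import KeyedVectors

# Pre-trained vectors in word2vec format; "vectors.bin" is a placeholder path
kv = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

def avg_sentence_vector(sentence):
    # Average the embeddings of whatever tokens are in the vocabulary
    tokens = [w for w in sentence.lower().split() if w in kv]
    if not tokens:
        return np.zeros(kv.vector_size)
    return np.mean([kv[w] for w in tokens], axis=0)
```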

More options are good.

(Also, is Kurzweil part of Google Brain or separate? He doesn't really have
any background in NLP, does he?)

[1]
[https://github.com/facebookresearch/InferSent](https://github.com/facebookresearch/InferSent)

~~~
slashcom
For the record, good old-fashioned bag-of-words representations (tf-idf, LDA,
LSA) still provide useful document representations. Obviously we hope to do
better, but lately people act as if there's no way of turning a document into
a vector.
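
Something like this still gets you a perfectly usable document vector (a
rough sketch with scikit-learn; the toy corpus and dimensions are made up):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "the quick brown fox jumps over the lazy dog",
    "a fast auburn fox leaped over a sleepy hound",
    "stock markets fell sharply on weak earnings reports",
]

# Sparse tf-idf document vectors
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)

# Dense LSA-style vectors via truncated SVD (dimension chosen arbitrarily)
lsa = TruncatedSVD(n_components=2)
X_dense = lsa.fit_transform(X)
print(X_dense.shape)  # (3, 2)
```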

~~~
nl
Bag of word representations work fine for some applications.

The reason people want better representations is for the applications where
they don’t. For example, Bag of words doesn’t capture agreement or disagree
well, whereas better representations can.
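
A quick sketch of the failure mode (scikit-learn, made-up sentences): two
sentences with opposite meanings end up with nearly identical bag-of-words
vectors.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sentences = [
    "the reviewers agreed with the proposal",
    "the reviewers disagreed with the proposal",
]

vec = CountVectorizer()
X = vec.fit_transform(sentences)

# The two count vectors differ only in the agreed/disagreed column, so their
# cosine similarity is high (~0.88) despite the opposite meanings.
print(cosine_similarity(X[0], X[1]))
```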

------
JustFinishedBSG
1\. This is more Technical Report-worthy than paper-worthy...

2\. "by Ray Kurzweil's Team": although accurate, I find that fetishization of
certain stars to be pretty insulting to the other authors. We already have a
convention, and it's "Cer et al. (2018)".

~~~
PaulHoule
At least Ray has the decency to be listed last on the author list!

Personally I think the idea of this paper is pretty good, but the evaluation
is weak.

~~~
wolfgke
> At least Ray has the decency to be listed last on the author list!

Just do it like in mathematics: Authors in alphabetical order.

~~~
josephjrobison
Usually the actual lead author is first, the assistant authors follow, and the
advisor is listed last.

At least that’s how it is in (psychology and other?) PhD programs.

So Ray may only be supervising or contributing a small portion and is likely
listed on all papers his team publishes.

~~~
l1n
Same in Biology

------
igravious
“We present models for encoding sentences into embedding vectors that
specifically target transfer learning to other NLP tasks. The models are
efficient and result in accurate performance on diverse transfer tasks. Two
variants of the encoding models allow for trade-offs between accuracy and
compute resources. For both variants, we investigate and report the
relationship between model complexity, resource consumption, the availability
of transfer task training data, and task performance. Comparisons are made
with baselines that use word level transfer learning via pretrained word
embeddings as well as baselines that do not use any transfer learning. We
find that
transfer learning using sentence embeddings tends to outperform word level
transfer. With transfer learning via sentence embeddings, we observe
surprisingly good performance with minimal amounts of supervised training data
for a transfer task. We obtain encouraging results on Word Embedding
Association Tests (WEAT) targeted at detecting model bias. Our pre-trained
sentence encoding models are made freely available for download and on TF
Hub.”

Awesome. Now what does all that mean in English?

~~~
rahimnathwani
They made a way to take any sentence, and output a small array of numbers that
represent its essence. You can use their model to find the essence of your own
sentences. And then use it either directly (e.g. compare the essence of two
sentences to see if they're saying roughly the same thing) or use it as a
starting point for the model you need (e.g. if you're building a system to
convert English sentences into French, your neural network might generate the
essence of the English sentence as part of its work. By using the pre-trained
model, you have a better starting point for that part of the network than just
random numbers, so your training time will be greatly reduced).
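
A rough sketch of the "compare directly" case, using the TF Hub module linked
elsewhere in this thread (old TF 1.x session-style API; treat it as a sketch,
not the official example):

```python
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

# Load the pre-trained encoder from TF Hub (TF 1.x graph/session style)
embed = hub.Module("https://tfhub.dev/google/universal-sentence-encoder/1")

sentences = [
    "How old are you?",
    "What is your age?",
    "The weather is nice today.",
]

with tf.Session() as sess:
    sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
    vectors = sess.run(embed(sentences))  # shape (3, 512)

# The embeddings are approximately unit-length, so an inner product behaves
# like cosine similarity.
print(np.inner(vectors[0], vectors[1]))  # high: same meaning
print(np.inner(vectors[0], vectors[2]))  # lower: unrelated sentence
```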

~~~
laboo
What do you mean by "its essence"? Is this a semantic essence?

~~~
tree_of_item
The array of numbers represents some opaque statistical property of the
sentence with respect to the others in the corpus the model was trained from.
The hope is that this property will correlate with what we believe to be the
sentence's meaning.

------
mlevental
>Our pre-trained sentence encoding models are made freely available for
download and on TF Hub.

What is TF Hub? I assume it stands for TensorFlow Hub, but what is that?

~~~
eruditepanda
It looks like an internal site; this is the link it's referring to:
[https://tfhub.dev/google/universal-sentence-encoder/1](https://tfhub.dev/google/universal-sentence-encoder/1)

~~~
sp821543
[https://www.tensorflow.org/hub/modules/google/universal-sentence-encoder/1](https://www.tensorflow.org/hub/modules/google/universal-sentence-encoder/1)

~~~
eruditepanda
It looks like there is a link to a Colab notebook (Google's hosted Jupyter
notebook environment):
[https://colab.research.google.com/github/tensorflow/hub/blob...](https://colab.research.google.com/github/tensorflow/hub/blob/r0.1/examples/colab/semantic_similarity_with_tf_hub_universal_encoder.ipynb)

------
golergka
As someone who has done an ML course and built a primitive Word2Vec, but
doesn't really follow the field all that closely: how important is this, and
how does it compare to what came before?

------
pcf
"..transfer learning to other NLP tasks" – NLP as in neuro-linguistic
programming?

If so, can someone explain how this project is related to NLP? Thanks!

~~~
girvo
Natural language processing/parsing

