

Google Is Working on a New Type of Algorithm Called “Thought Vectors” - msoad
https://wtvox.com/robotics/google-is-working-on-a-new-algorithm-thought-vectors/?utm_source=dlvr.it&utm_medium=twitter

======
Tobu
Such a fluffy article.

Here is what he's publishing:
[http://arxiv.org/find/cs/1/au:+Hinton_G/0/1/0/all/0/1](http://arxiv.org/find/cs/1/au:+Hinton_G/0/1/0/all/0/1)

This seems to be a good introduction to the topic:
[http://arxiv.org/pdf/1310.4546v1.pdf](http://arxiv.org/pdf/1310.4546v1.pdf)

This is about paragraph vectors:
[http://arxiv.org/abs/1507.07998](http://arxiv.org/abs/1507.07998)
[http://arxiv.org/abs/1405.4053](http://arxiv.org/abs/1405.4053)

~~~
rspeer
I have doubts about these results if they depend on paragraph vectors.

The results of the first paragraph vector paper (Le and Mikolov 2014) were
irreproducible [1]. Not even Mikolov could reproduce them.

Paragraph vectors also fundamentally involve training on your test set: the
paper stresses the importance of all the vectors being learned jointly,
without addressing why this is a problem for evaluation.

The developers of gensim have been working on a version of doc2vec that can be
applied to documents it was not trained on (of course it doesn't perform as
well). They seem content to clean up after Google's messy publications, but in
a fair world, _they_ would be the ones getting the citations if they succeed
at this.

[1] [http://stats.stackexchange.com/questions/123562/has-the-reported-state-of-the-art-performance-of-using-paragraph-vectors-for-sen](http://stats.stackexchange.com/questions/123562/has-the-reported-state-of-the-art-performance-of-using-paragraph-vectors-for-sen)
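
For what it's worth, the gensim inference step looks roughly like this (a
minimal sketch assuming a recent gensim release; parameter names vary across
versions, and the toy corpus is made up):

    # Minimal sketch of doc2vec inference on an unseen document.
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument
    
    # Toy corpus: each training document gets a tag whose vector is learned
    # jointly with the word vectors.
    train = [
        TaggedDocument(words=['paragraph', 'vectors', 'encode', 'documents'],
                       tags=['doc0']),
        TaggedDocument(words=['word', 'vectors', 'encode', 'words'],
                       tags=['doc1']),
    ]
    model = Doc2Vec(train, vector_size=50, min_count=1, epochs=40)
    
    # infer_vector() runs extra gradient steps for the held-out document
    # while the trained weights stay frozen, so the test document never
    # leaks into training.
    vec = model.infer_vector(['an', 'unseen', 'document', 'about', 'vectors'])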

~~~
varelse
Is it possible that the reason Quoc Le never published the code is that it got
tangled up in the word2vec patent? At GTC 2015, Ren Wu stated (paraphrased)
that the reason DNNs are advancing so quickly is that all the practitioners
are sharing code, data, and techniques too quickly for the lawyers to gum up
the works.

The lawyers have since stepped up their game IMO, doing their best to protect
us from that imminent Robot Apocalypse Elon Musk keeps warning us is right
around the corner.

~~~
rspeer
That would be a pretty plausible explanation of why Le has gone completely
silent about that paper.

I assumed the worst when I read your comment; I figured that neural net
semantics was going to stall for 17 years the same way that basic morphology
did when Xerox patented FSTs. But it looks like word2vec is Apache-licensed,
meaning the patent can only be used defensively. Phew.

------
panarky
The article conflates thought vectors and word vectors, but it really is quite
startling what's possible.

Take a word vector for "Paris", add the vector for "Germany", subtract the
vector for "France", and the result is "Berlin".

    
    
      # danielfrg/word2vec Python wrapper; `model` is a pre-trained model.
      indexes, metrics = model.analogy(pos=['paris', 'germany'], neg=['france'], n=10)
      # Nearest result:
      # (u'berlin', 0.32333651414395953, 20)
    

Source:
[http://nbviewer.ipython.org/github/danielfrg/word2vec/blob/m...](http://nbviewer.ipython.org/github/danielfrg/word2vec/blob/master/examples/word2vec.ipynb)

~~~
siegecraft
Is that really that surprising, though? Assuming you train it on encyclopedias
and reference material, you'd get a ton of "X is the capital of Y" sentences.

~~~
danieldk
In my experience it also works fairly well when training on e.g. newspaper
text.

It's not really that surprising when you think about _how it works_. Similar
words cluster together along some dimensions of the vector space: e.g. 'Paris'
and 'Berlin' will both have capital-ish contexts. However, they also differ in
some ways: Paris will have France-ish contexts and Berlin Germany-ish
contexts.

The 2-dimensional PCA projection in figure 2 of one of Mikolov's papers [1]
gives an intuition for why subtraction/addition generally works.

[1] [http://arxiv.org/pdf/1310.4546.pdf](http://arxiv.org/pdf/1310.4546.pdf)
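
To make the intuition concrete, here's a toy example with made-up
3-dimensional vectors (real models use hundreds of dimensions; the numbers
below are purely illustrative):

    # Made-up vectors; dimensions stand roughly for
    # [capital-ness, France-ness, Germany-ness].
    import numpy as np
    
    vecs = {
        'paris':   np.array([0.9, 0.9, 0.1]),
        'berlin':  np.array([0.9, 0.1, 0.9]),
        'france':  np.array([0.1, 0.9, 0.1]),
        'germany': np.array([0.1, 0.1, 0.9]),
        'cheese':  np.array([0.1, 0.5, 0.2]),
    }
    
    def cosine(a, b):
        return a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))
    
    # paris - france + germany cancels the France-ish component and adds a
    # Germany-ish one, landing near berlin.
    query = vecs['paris'] - vecs['france'] + vecs['germany']
    candidates = [w for w in vecs if w not in ('paris', 'france', 'germany')]
    print(max(candidates, key=lambda w: cosine(query, vecs[w])))  # berlin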

------
abeppu
Of course, there's not really enough info in this article to tell what he's
talking about.

A Guardian article from May [1] has a bit more about what he meant by
"meaning space":

> Hinton said that the idea that language can be deconstructed with almost
> mathematical precision is surprising, but true. “If you take the vector for
> Paris and subtract the vector for France and add Italy, you get Rome,” he
> said. “It’s quite remarkable.”

... which makes it sound like this builds off of the "word2vec" [2,3] work
that came out in 2013. But there must be something else new (maybe at the
sentence level?) to get from there to logic and natural conversation.

[1] [http://www.theguardian.com/science/2015/may/21/google-a-step-closer-to-developing-machines-with-human-like-intelligence](http://www.theguardian.com/science/2015/may/21/google-a-step-closer-to-developing-machines-with-human-like-intelligence)

[2] [https://code.google.com/p/word2vec/](https://code.google.com/p/word2vec/)

[3] [http://arxiv.org/abs/1310.4546](http://arxiv.org/abs/1310.4546)

------
waterlesscloud
Reminds me of this attempt to create a "philosophical language" in the 17th
century. I learned about it from Neal Stephenson's Baroque Cycle; it turned
out to be a real thing.

[https://en.wikipedia.org/wiki/An_Essay_towards_a_Real_Charac...](https://en.wikipedia.org/wiki/An_Essay_towards_a_Real_Character_and_a_Philosophical_Language)

~~~
tpeo
There have been many such attempts since Raymond Lull, and even Descartes
toyed with the notion for a while, but I'd more readily think of Leibniz's
_Characteristica Universalis_, whose intent was to outright reduce language
processing to calculation.

------
egillie
You can play around with something similar here:
[https://code.google.com/p/word2vec/](https://code.google.com/p/word2vec/)

~~~
personjerry
Hm, strange that this is the very bottom comment, given how hacker-oriented it
is.

------
joeblau
I'm working on a similar idea, but as a way to predict personality
temperament. The idea is to use a simple SVM and encode multiple contextual
inputs into the feature vector; the resulting labels correspond to a user's
personality.

An example would be that Friday (5) at 10 am (36000) in San Francisco (94158),
etc., maps to a personality temperament. My work is still in the preliminary
stages, but I have a working prototype on iOS that captures over 200
contextual inputs across 14 sensor categories, which get factored into the
algorithm. Right now the contextual vectors are used to predict someone's
personality temperament, but I never thought about mapping them to thoughts.
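
A minimal sketch of what that could look like, assuming scikit-learn (the
actual prototype is on iOS, and the features and labels here are made up):

    # Hypothetical contextual-feature SVM.
    # Feature vector: [day_of_week, seconds_since_midnight, zip_code].
    from sklearn import svm
    
    X = [
        [5, 36000, 94158],  # Friday, 10 am, San Francisco
        [1, 79200, 10001],  # Monday, 10 pm, New York
    ]
    y = ['artisan', 'guardian']  # made-up temperament labels
    
    clf = svm.SVC()
    clf.fit(X, y)
    print(clf.predict([[5, 37800, 94158]]))  # Friday, 10:30 am, SF

In practice the 200+ inputs would need scaling and encoding (a raw zip code
isn't meaningful as a magnitude), but the shape of the problem is the same.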

