
Decoding the Thought Vector - kebinappies
http://gabgoh.github.io/ThoughtVectors/
======
numinary1
When I was working on a recommender for television shows, I ran SVD on a large
User/Item matrix to create a low-rank approximation, essentially reducing
thousands of user features (TV show preferences) to user vectors representing
twenty or thirty abstract "features". Then I looked at the actual item
preferences of users who expressed each feature at the greatest and least
magnitude. The features, in some cases, mapped to recognizable constructs.
There were distinct masculine and feminine features, several obvious Hispanic
/ Latino elements, and strong liberal versus conservative indicators. Others
were less explainable using common labels.

It struck me at the time that the qualities that were expressed most strongly
were the ones that ended up having names in our language. But there were
others for which I would say to myself, there is something about this group
(e.g. those with the greatest expressed value of F124) that I recognize, but
can't quite put my finger on.

Of course, I was looking at people through a keyhole, their TV viewing
preferences being the only information I had.

Also, I noticed that these "came into focus" most clearly at a certain level
of compression (rank).
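
In case it helps, here's a minimal sketch of the kind of decomposition I'm
describing (assuming numpy; the shapes, rank, and feature index are all made
up):

```python
import numpy as np

# Toy stand-in for the User/Item matrix: rows are users, columns are TV
# shows, entries are preference scores. (Random data, for illustration.)
R = np.random.rand(500, 2000)

# Low-rank approximation via truncated SVD, keeping k abstract features.
k = 30
U, s, Vt = np.linalg.svd(R, full_matrices=False)
user_vectors = U[:, :k] * s[:k]   # each user as k abstract "features"
item_vectors = Vt[:k, :]          # each show's loading on those features

# Inspect a feature the way I describe above: find the users who express
# feature f at the greatest and least magnitude, then go look at their
# raw item preferences.
f = 7
order = np.argsort(user_vectors[:, f])
weakest, strongest = order[:10], order[-10:]
```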

FWIW

~~~
sweetdreamerit
A question: is this much better / different than a principal component
analysis (or a factor analysis)?

~~~
antognini
Comparing SVD to PCA is a bit apples-to-oranges. SVD is a numerical
technique, whereas PCA is a method for analyzing a dataset. You can use SVD
to perform PCA (although there are other ways to perform PCA without
explicitly computing an SVD). I'm guessing that the GP performed PCA using SVD.
There's a good Stack Exchange answer to exactly this question here:

[http://stats.stackexchange.com/questions/121162/is-there-any-advantage-of-svd-over-pca](http://stats.stackexchange.com/questions/121162/is-there-any-advantage-of-svd-over-pca)
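
Roughly, PCA via SVD looks like this (a sketch, assuming numpy; the data is
random, just for illustration):

```python
import numpy as np

X = np.random.rand(200, 50)          # rows: samples, columns: features

Xc = X - X.mean(axis=0)              # PCA first centers each feature
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

components = Vt                      # principal axes, as PCA would give
scores = U * s                       # samples projected onto those axes
explained_var = s**2 / (len(X) - 1)  # variance captured by each axis
```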

------
SolarNet
> Rather curiously, it turns "airplanes" into "knives". I do not understand
> why this happens.

I thought it was pretty obvious. The atoms are complected (Rich Hickey of
Clojure defines "complect" as, basically, braiding multiple interdependent
concepts into one semantic unit, the way variables complect state, values,
and names). In fact that's the conceit of the whole idea: the thought vector
is being extracted from the sparse matrix, and sometimes that sparse matrix
isn't so sparse, so you get complected concepts. It was obvious in the
earlier pics. One atom might carry a piece of information needed by a hat:
combined with one set of atoms it makes a hat, but combined with a different
set it makes a headband.

For example, if you look, the knives atom is shared with scissors: it's one
atom describing roughly "handheld sharp objects". The airplane atom + the
many-items atom is actually a special mutation meaning many knives, whereas
the many-items atom + the sharp-objects atom is likely scissors, and the
sharp-objects + airplane atoms are probably a knife. They are all complected
and interdependent.

Sure, an atom may generally mean a specific concept, but sometimes it falls
back to a combination-specific mutation. For example, there often aren't many
planes in one image, and many planes look rather like many knives. And
there's probably never more than one pair of scissors, and one pair of
scissors looks rather like two knives. It's a way to describe three things
and their number (knives, scissors, planes) using two things: an existing
counting mutator and the fact that scissors and planes are usually singular.
It's a form of semantic compression, quite interesting, and I would imagine
domain-dependent.
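
A toy version of what I mean, with entirely made-up vectors (not the
article's actual atoms):

```python
import numpy as np

# Made-up atoms, purely to illustrate the hypothesis.
sharp    = np.array([1.0, 0.0, 0.0])  # roughly "handheld sharp object"
many     = np.array([0.0, 1.0, 0.0])  # roughly "many items"
airplane = np.array([0.0, 0.0, 1.0])

# Three concepts plus their number, described with only two-atom sums:
knife       = sharp + airplane  # sharp + airplane -> knife
scissors    = sharp + many      # many + sharp     -> scissors (one pair)
many_knives = many + airplane   # many + airplane  -> many knives

# Each sum only decodes unambiguously because scissors and planes are
# usually singular; the atoms stay interdependent, i.e. complected.
```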

That's my hypothesis anyway.

~~~
jawns
This explanation reminds me of a concept in ASL (American Sign Language)
called Classifiers:

[https://seattlecentral.edu/faculty/baron/Summer%20Courses/ASL%20223/ASL223ASLClassifiers.htm](https://seattlecentral.edu/faculty/baron/Summer%20Courses/ASL%20223/ASL223ASLClassifiers.htm)

They are "class" modifiers that modify different nouns in different ways.

------
ap22213
This is incredible and infinitely useful. For many years I've had a hunch
that human symbols and abstractions have an algebraic quality. And I have
always wanted to replace tags, categories, labels, and attributes with
global, permanent indices, mainly because the 'view' of a thing is mutable:
e.g. the word for a concept changes over time even when the conceptual
meaning remains constant. I can't wait to get home and play around with this.

------
earthly10x
Much of AI and machine learning boils down to vectors and vector spaces,
along with how well the features are engineered, constructed, scored, and
ranked.

------
EvanMiller
Nice interactive examples, but I'm afraid the basic setup here doesn't make
sense to me. The "atom" is defined as the average encoding of inputs with the
feature ("faces with a smile"), but I'd think the proper definition should
subtract off inputs without the feature (i.e. "smile" = "faces with a smile"
minus "faces without a smile"). The way it's defined you end up adding an
extra "average face" along with the feature of interest, which is clearly seen
in "The Geometry of Thought Vectors" example -- the non-smiling woman isn't so
much forced to smile as to have her face merged with that of a generic smiling
woman.
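
Concretely, the difference between the two definitions (a sketch; the
encodings are random stand-ins for a hypothetical encoder's outputs,
assuming numpy):

```python
import numpy as np

# Random stand-ins for encoder outputs z = encode(x) of individual faces.
z_smiling = np.random.randn(100, 32)  # encodings of faces with a smile
z_neutral = np.random.randn(100, 32)  # encodings of faces without one

# The post's definition: average encoding of inputs with the feature.
# This bakes in a generic "average face" along with the smile.
atom_as_defined = z_smiling.mean(axis=0)

# What I'd expect: subtract off the inputs without the feature, so the
# "average face" cancels and only the "smile" direction remains.
atom_smile = z_smiling.mean(axis=0) - z_neutral.mean(axis=0)
```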

------
ximeng
Is there any way to read these responsive sites on mobile when they are cut
off by the column margins?

~~~
dharma1
Request desktop site

------
choxi
The "atom" terminology is a bit confusing to me. Isn't an "atom" just another
thought vector? If "sunglasses" \+ "smiles" = "smiling-while-wearing-
sunglasses", then "sunglasses" and "smiling-while-wearing-sunglasses" are both
just vectors. Is there a reason for the distinction?

Also, if all thoughts can be described as vectors and linear combinations of
vectors in "thought-space", I wonder what the axis represent and how many
dimensions there are. Are all thoughts just a combination of 100 "unit
thoughts"?

Really interesting post!

------
mooneater
I find it counter-intuitive that thought vectors should have "Linear
Structure" in multilevel autoencoders.

After all, the whole appeal of neural networks is that they can model
non-linear functions. Why would the autoencoder end up with an encoding that
is essentially linear?

~~~
antognini
The goal of a neural network is to take a complicated manifold and, through
each of its layers, flatten it out into progressively more linear manifolds.
If you are building a classifier, the inputs to the last layer necessarily
have to be linearly separable, because the final layer is essentially linear
--- the softmax operation just transforms the pre-activation values from
logits to probabilities.

So if the NN is well trained, the second-to-last layer will have linearly
separated the different classes (as much as possible, anyway). Earlier layers
may not have completely linearly separated their inputs, but their outputs
probably lie along simpler manifolds than the inputs to still earlier layers.

There's a good blog post on this here:
[http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/](http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/)
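
To make the "final layer is essentially linear" point concrete (a sketch,
assuming numpy):

```python
import numpy as np

def final_layer(h, W, b):
    """Typical classifier head: an affine map followed by softmax."""
    logits = h @ W + b                            # linear in h
    e = np.exp(logits - logits.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

# Softmax is monotone, so it never changes which class wins; the decision
# boundaries live in h-space and are hyperplanes (where two logits tie).
# That's why the penultimate activations h must end up (approximately)
# linearly separable in a well-trained classifier.
h = np.random.randn(4, 16)
W, b = np.random.randn(16, 3), np.random.randn(3)
p = final_layer(h, W, b)
assert (p.argmax(-1) == (h @ W + b).argmax(-1)).all()
```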

~~~
mooneater
Yes, I understand that conceptually. But the middle autoencoder layer is not
really an output layer, so what would constrain it to linear representations?
I assume normal output layers are constrained to linear representations by
the supervised training process.

------
webmaven
From the post:

 _> Rather curiously, it turns "airplanes" into "knives". I do not understand
why this happens._

I would venture to guess that this happens because of an ambiguous plural
"thought" bridging the _"airplane"_ and _"knives"_ concept vectors, along the
lines of _airplanes -> propellers -> blades -> knives_, and the _"a group
of"_ vector is causing the system to jump over that semantic ambiguity.

