

DeepWalk: Online Learning of Social Representations - ilyaeck
http://arxiv.org/abs/1403.6652

======
juxtaposicion
In both this paper and word2vec, the key concept is to represent a
high-dimensional, sparse dataset as a dense, low-dimensional continuous
vector. Interestingly, the same skip-gram algorithm is used for both, even
though it's applied to datasets as disparate as a social graph and natural-
language text. There's a bit of cleverness here: the authors equate a sequence
of social-network graph visits (a random walk in DeepWalk) with a sequence of
words (a sentence in word2vec). In both cases the resulting representation is
dense while still preserving many relevant properties of a social group, which
makes it useful as input to other ML algorithms. Incredibly interesting.

I wonder if there's a simple but powerful example of this technique (like
king - man + woman = queen for word2vec).
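
To make the walks-as-sentences equivalence concrete, here is a minimal sketch
of the idea (not the authors' code; it assumes networkx and gensim 4.x, and
every parameter value here is an arbitrary choice of mine):

    import random
    import networkx as nx
    from gensim.models import Word2Vec

    def random_walks(G, num_walks=10, walk_length=40):
        """Generate truncated random walks; each walk is a 'sentence' of node ids."""
        walks = []
        nodes = list(G.nodes())
        for _ in range(num_walks):
            random.shuffle(nodes)
            for start in nodes:
                walk = [start]
                while len(walk) < walk_length:
                    neighbors = list(G.neighbors(walk[-1]))
                    if not neighbors:
                        break
                    walk.append(random.choice(neighbors))
                walks.append([str(n) for n in walk])  # gensim expects string tokens
        return walks

    G = nx.karate_club_graph()
    walks = random_walks(G)
    # sg=1 selects skip-gram, the same objective word2vec trains on real sentences
    model = Word2Vec(walks, vector_size=64, window=5, min_count=1, sg=1)
    print(model.wv.most_similar("0"))  # nodes near node 0 should rank highly

The punchline is that nothing after the walk sampler knows it is looking at a
graph rather than a corpus.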

~~~
kastnerkyle
Both of these ideas (word2vec and DeepWalk) are largely based on the idea of
low-dimensional embeddings, as you mention. A great paper on learning
embeddings for sparse representations is Yoshua Bengio et al.'s "A Neural
Probabilistic Language Model" [1]; Chris Olah gives a great introduction and
summary in [2]. If a problem can be reformulated as words or as a random walk
on a graph, these methods provide a powerful, direct way to feed it into a
neural network, and also to visualize and interpret the learned
representation. It is pretty cool!

Since these are random walks, it seems unlikely that something as simple as
vector math in the embedding space will emerge. However, I wonder if these
embedding spaces could contain much richer relationships than the simple
eigenvector-based methods typically used to characterize graph-Laplacian-type
matrices, since the embedding is a non-linear decomposition of that matrix.
Smarter PageRank, anyone? Maybe even graph translation (translating the
connections in one large social graph to condition the information in another,
smaller network, or vice versa), though I am still fuzzy on formulating the
details.
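
For contrast, the eigenvector baseline I mean is only a few lines of linear
algebra. A sketch with numpy (the normalized Laplacian is standard; the
8-dimensional cut-off is arbitrary):

    import numpy as np
    import networkx as nx

    G = nx.karate_club_graph()
    A = nx.to_numpy_array(G)
    d = A.sum(axis=1)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}
    L = np.eye(len(d)) - A / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    # Skip the trivial bottom eigenvector; the next few form the embedding
    spectral_embedding = vecs[:, 1:9]

Anything the skip-gram embedding captures beyond this would have to come from
the non-linearity of its objective.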

[1] http://machinelearning.wustl.edu/mlpapers/paper_files/BengioDVJ03.pdf

[2] http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/

~~~
thashim
The connection between the random walk and embeddings has been characterized
in [1] (disclosure: I am an author).

It turns out that embedding in this setting is exactly the smarter PageRank
you suggested, since PageRank recovers the underlying density structure of the
embedding.

I'm not hopeful that it is much more powerful than a matrix representation,
though: if [2] is true, it's a matrix factorization of the random-walk
marginal. That is probably more scalable than the explicit linear-algebra
approach, but not more powerful theoretically or representationally.
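
If that view is right, the whole pipeline can be caricatured as explicit
linear algebra. A toy sketch (my own construction, not the procedure from [1]
or [2]; `walks` is a list of token lists, e.g. from a DeepWalk-style sampler
like the one upthread, and the rank-8 cut-off is arbitrary):

    import numpy as np
    from collections import Counter

    def cooccurrence(walks, n_nodes, window=5):
        """Count node/context co-occurrences within a window over the walks."""
        counts = Counter()
        for walk in walks:
            for i, u in enumerate(walk):
                for v in walk[max(0, i - window):i + window + 1]:
                    if u != v:
                        counts[(int(u), int(v))] += 1
        C = np.zeros((n_nodes, n_nodes))
        for (u, v), c in counts.items():
            C[u, v] = c
        return C

    C = cooccurrence(walks, n_nodes=34)
    # Pointwise mutual information of the walk co-occurrence, then a low-rank SVD
    P = C / C.sum()
    pmi = np.log(np.maximum(P / (np.outer(P.sum(1), P.sum(0)) + 1e-12), 1e-12))
    U, S, Vt = np.linalg.svd(pmi)
    embedding = U[:, :8] * np.sqrt(S[:8])

The skip-gram version then reads as a scalable, sampled route to roughly this
factorization rather than something representationally new.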

One interesting connection left unexplored in the paper is between this
approach and kernel PCA with diffusion kernels. Again, if [2] is true, then
this procedure should behave like kernel PCA under a diffusion kernel in the
large-data limit.
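
Concretely, the comparison I have in mind looks like this (a sketch of
textbook kernel PCA under a diffusion kernel, nothing from the paper;
beta = 0.5 and the 8 dimensions are arbitrary choices):

    import numpy as np
    import networkx as nx
    from scipy.linalg import expm

    G = nx.karate_club_graph()
    A = nx.to_numpy_array(G)
    d = A.sum(axis=1)
    L = np.eye(len(d)) - A / np.sqrt(np.outer(d, d))  # normalized Laplacian

    K = expm(-0.5 * L)  # diffusion kernel: K = exp(-beta * L)

    # Kernel PCA: double-center K, then project onto the top eigenvectors
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H
    vals, vecs = np.linalg.eigh(Kc)
    kpca = vecs[:, ::-1][:, :8] * np.sqrt(np.maximum(vals[::-1][:8], 0.0))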

[1] http://arxiv.org/abs/1411.5720

[2] http://arxiv.org/abs/1501.00358

~~~
kastnerkyle
Very interesting! It is unfortunate that you can't expect more
representational power theoretically, but perhaps personalizing with
conditional information is easier in this framework than in the standard
linear-algebra approach.

I have hopes for "network translation" specifically via encoder-decoder LSTM
networks, or for learning a generative model of a given network with an
unsupervised encoder-decoder and then looking for connections which _should_
be there but are not. That may work for recommendations and/or suggesting
"friend" connections, if I am thinking about it correctly. It would be
architecturally similar to the recent video work in [1].

Thanks for the link to your paper, I plan on reading it in more detail soon.

[1] http://arxiv.org/abs/1502.04681v1

~~~
thashim
What you describe in that last sentence is known as 'link prediction' in the
networks literature; [1] is a pretty comprehensive survey from the networks
perspective. In many cases the problem reduces to learning pairwise distances
between vertices.
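
The reduction is simple enough to sketch: score every absent edge with a
pairwise similarity and rank. Here with cosine similarity over an embedding
matrix (a toy illustration of mine, not the survey's methods; common neighbors
or hitting times would slot in the same way):

    import numpy as np
    import networkx as nx

    def predict_links(G, emb, top_k=10):
        """Rank non-edges by cosine similarity of their endpoint embeddings.
        Assumes nodes are labeled 0..n-1 and emb is an (n, d) array."""
        unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
        scores = [(unit[u] @ unit[v], u, v) for u, v in nx.non_edges(G)]
        return sorted(scores, reverse=True)[:top_k]

Any (n, d) embedding, e.g. the word2vec-style vectors upthread, plugs straight
in.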

The network-translation view is interesting, though one caution: you can build
two graphs on identical points with different graph-construction techniques
(say, k-nearest-neighbor vs. a fixed eps-ball graph), and it is unclear
whether you should call the two graphs the same (since they share the same
latent coordinates) or different (since their edge sets differ).
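
(A tiny illustration of that caution, assuming scikit-learn; both graphs are
built on the same points, with arbitrary thresholds:)

    import numpy as np
    from sklearn.neighbors import kneighbors_graph, radius_neighbors_graph

    X = np.random.RandomState(0).randn(100, 2)     # identical latent coordinates
    G_knn = kneighbors_graph(X, n_neighbors=5)     # k-nearest-neighbor graph
    G_eps = radius_neighbors_graph(X, radius=0.5)  # fixed eps-ball graph
    print((G_knn != G_eps).nnz)  # generally nonzero: the edge sets differ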

The generative-model approach seems like a pretty nice way to go, since it
lets you evaluate against simulation and ground truth. The only problem, in my
view, is that network models may be quite limited here, since the relevant
models for these problems rely on exchangeability of some type.

We have a paper under review on the link-prediction problem, using random-walk
hitting times as a similarity measure. We don't have a preprint on arXiv,
since that would de-anonymize us, but I'd be happy to send a copy if it's
relevant to your work.
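
For anyone following along, the vanilla hitting time itself (just the textbook
definition, not our estimator) drops out of one linear solve:

    import numpy as np

    def hitting_times_to(A, target):
        """Expected steps for a simple random walk on adjacency matrix A to
        first reach `target`; solves h_i = 1 + sum_k P_ik h_k with h_target = 0."""
        P = A / A.sum(axis=1, keepdims=True)
        idx = [i for i in range(len(A)) if i != target]
        Q = P[np.ix_(idx, idx)]  # walk restricted to the non-target states
        h = np.linalg.solve(np.eye(len(idx)) - Q, np.ones(len(idx)))
        times = np.zeros(len(A))
        times[idx] = h
        return times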

[1] http://arxiv.org/abs/1010.0725

