
Visualizing Large-Scale and High-Dimensional Data - blopeur
https://arxiv.org/abs/1602.00370v2
======
cs702
I've just skimmed the paper, and this new algorithm looks very nice -- the
authors call it "LargeVis."

According to the paper, LargeVis improves on Barnes-Hut t-SNE in two ways:
first, it uses the idea that "the neighbors of my neighbors are likely my
neighbors too" to construct an approximate nearest-neighbor graph in the
high-dimensional space in a manner that is computationally much more
efficient than the method used by t-SNE. Second, the authors appear to have
found a clever way to use SGD to map this graph to two (or three) dimensions
at a computational cost _linear_ in the number of nodes.
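For intuition, here is a minimal sketch of the "neighbors of my neighbors" refinement step described above. This is not the authors' code: the function name, the dict-based graph representation, and the brute-force distance computation are all assumptions for illustration; the real algorithm seeds the candidate graph with random-projection trees and is far more efficient.

```python
import numpy as np

def refine_knn(points, candidate_knn, k):
    """One round of neighbor exploration: for each point, treat its current
    neighbors' neighbors as candidates and keep the k closest.
    `candidate_knn` maps point index -> list of neighbor indices from a
    rough initial guess (e.g. random-projection trees in the paper)."""
    refined = {}
    for i, neighbors in candidate_knn.items():
        # Candidate pool: current neighbors plus neighbors-of-neighbors.
        candidates = set(neighbors)
        for j in neighbors:
            candidates.update(candidate_knn[j])
        candidates.discard(i)
        # Keep the k candidates closest to point i in the original space.
        ranked = sorted(candidates,
                        key=lambda j: np.linalg.norm(points[i] - points[j]))
        refined[i] = ranked[:k]
    return refined
```

Repeating this step a few times converges quickly to an accurate k-NN graph because close points tend to share neighbors.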

If the authors release an open-source implementation, LargeVis looks likely to
supplant t-SNE as the go-to algorithm for visualizing high-dimensional data.

~~~
vladimirralev
I wonder why they are positioning this as visualisation technique rather than
dimensionality reduction which would be a much bigger deal. Is there something
to suggest the technique doesn't work well for reduction in arbitrary
dimensions?

~~~
eximius
Disclaimer: I haven't read it, but I have read the summaries here.

It sounds like it is very fast but not very rigorous. This lets you get a feel
for the data, but it doesn't give you the same guarantees other dimensionality
reductions do.

~~~
cs702
That's my sense from skimming the paper.

------
mikeskim
I wish academics would publish pure python implementations of their "new"
algorithms. Standard python with Pypy is enough for speed of development and
runtime.

The biggest thing about t-SNE is that it's been used successfully in
competitive machine learning for quite a long time by many different people,
because it's available in R via CRAN and in Python via sklearn. LargeVis has
potential, but it could also turn out to be not very useful, like the vast
majority of academic work.

~~~
visarga
After a paper like this comes out, someone usually adds it to their library or
releases a public implementation; often there is more than one. Maybe the
authors aren't the best at writing a clean version of it.

~~~
zo1
Any version is better than none, even if it is just a basic/crude reference
implementation that library authors can use to add the algorithm/functionality
to their libraries.

------
thaw13579
This seems like a nice and useful method, but the conclusions of the paper
seem problematic, specifically that the resulting visualizations are more
"effective" and higher "quality" than t-SNE (or anything else for that
matter). Can that aspect of a visualization method be judged by only showing a
few examples, without any tests of how people actually use or benefit from it?

------
jerryhuang100
[https://sites.google.com/site/pkujiantang/big-data-visualization](https://sites.google.com/site/pkujiantang/big-data-visualization)

------
haddr
As far as I can see, the authors haven't published their implementation of the
algorithm. That's a real pity, given that it's a really useful one.

~~~
wcrichton
For what it's worth, one of the algorithms they compare against, t-SNE, is
open source: [https://lvdmaaten.github.io/tsne/](https://lvdmaaten.github.io/tsne/)
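As a concrete illustration of how accessible t-SNE already is through the wrappers mentioned upthread, here is a minimal sketch using scikit-learn's `TSNE` on toy data; the data itself is made up, and parameter choices like `perplexity` are just reasonable defaults, not recommendations from the paper.

```python
import numpy as np
from sklearn.manifold import TSNE

# Toy high-dimensional data: 100 points in 50 dimensions.
rng = np.random.RandomState(0)
X = rng.randn(100, 50)

# Map to 2-D for visualization; perplexity must be < n_samples.
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
embedding = tsne.fit_transform(X)
print(embedding.shape)  # (100, 2)
```

This kind of one-call API, available from CRAN and sklearn alike, is a big part of why t-SNE saw such wide adoption.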

