
Topological methods for unsupervised learning problems [video] - lmcinnes
https://slideslive.com/38913519/topological-approaches-for-unsupervised-learning
======
nabla9
Uniform Manifold Approximation and Projection

[https://arxiv.org/abs/1802.03426](https://arxiv.org/abs/1802.03426)

[https://github.com/lmcinnes/umap](https://github.com/lmcinnes/umap)

~~~
throwawaymath
Context: the speaker in this talk (Leland McInnes) is one of the authors of
the UMAP paper and corresponding code repo.

------
ham_sandwich
I’ve been interested in learning about topological data analysis. I haven’t dug
in too deeply yet, but it definitely looks like an interesting direction to zig
in while the field at large zags toward ever larger deep learning architectures.

UMAP has already proven itself as a tool in any data scientist’s belt. Ayasdi
and Gunnar Carlsson’s work is certainly interesting, but I’m unsure how much
business value it can actually unlock. There also seems to be an opportunity to
draw inspiration from the applied category theory crew (Spivak, Fong, etc.) and
use some CT tools to approach data science from a fresh perspective.

Some of the research coming out is interesting, but as a practitioner I’m more
interested in seeing how TDA can add differentiated value in a business
context. Interested to hear where people see the field moving next.

~~~
kaitai
I've been playing with topological methods for data analysis recently, and I
think there are some fruitful things happening. It seems like some ideas are
moving from theory to practice that might be useful (Betti curves,
multiparameter persistence, etc.), but they're not quite there yet.
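As a rough illustration of what a Betti curve is, restricted here to dimension
0: the sketch below counts connected components of the graph that joins points
at distance at most eps (the 1-skeleton of a Vietoris-Rips complex) for a range
of scales. `betti0_curve` is a hypothetical helper written for illustration;
higher-dimensional Betti curves need an actual persistent homology library.

```python
import numpy as np

def betti0_curve(X, scales):
    """Betti-0 (number of connected components) of the eps-neighbourhood graph
    for each eps in `scales`, which is assumed to be sorted ascending."""
    n = len(X)
    parent = list(range(n))

    def find(a):
        # union-find root lookup with path halving
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    # all pairwise edges sorted by length (brute force, O(n^2) memory)
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    i, j = np.triu_indices(n, k=1)
    order = np.argsort(d[i, j])
    edges = [(d[i[k], j[k]], i[k], j[k]) for k in order]

    counts, components, e = [], n, 0
    for eps in scales:
        # add every edge no longer than eps; each merge kills one component
        while e < len(edges) and edges[e][0] <= eps:
            _, a, b = edges[e]
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb
                components -= 1
            e += 1
        counts.append(components)
    return counts

# two tight pairs of points, far apart
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [5.1, 0.0]])
print(betti0_curve(X, [0.05, 0.5, 10.0]))  # [4, 2, 1]
```

Reading the curve from small to large scales, the long plateau at 2 is the
"two clusters" signal; noise tends to show up as components that die quickly.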

Another idea that's been intriguing me lately is applied sheaf theory.
Robinson, Ghrist, and Curry are the only people I see working on this, but I
don't know what I'm not seeing. The "big idea" is taking local data and seeing
whether it patches together into a globally coherent whole. Sometimes it doesn't
(old Russian example: arbitrage in currency exchange networks). Other times
it's about using interpolation to fill in missing data when you know a global
function exists (temperatures across ocean surfaces), or about providing a
probability distribution for the missing parts. Category theory has something
to offer here as well.
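The currency example can be made concrete with a small toy: treat each exchange
rate as local data on an edge and ask whether one global log-price per currency
agrees with all of them. If no such assignment exists, some cycle of rates
multiplies to something other than 1, which is exactly an arbitrage. This is a
graph-cohomology-flavoured sketch, not the sheaf machinery of the authors named
above; `global_log_prices`, the currencies, and the rates are all made up for
illustration.

```python
import math
from collections import deque

def global_log_prices(rates, tol=1e-9):
    """rates maps (a, b) -> units of b per unit of a.
    Look for log-prices p with p[a] - p[b] == log(rate) on every edge.
    Returns (prices, None) if the local data glue together,
    or (None, edge) for an edge that contradicts the rest."""
    adj = {}
    for (a, b), r in rates.items():
        w = math.log(r)
        adj.setdefault(a, []).append((b, w))   # p[b] = p[a] - w
        adj.setdefault(b, []).append((a, -w))  # p[a] = p[b] + w

    # propagate prices along a spanning tree of each connected component
    p = {}
    for start in adj:
        if start in p:
            continue
        p[start] = 0.0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, w in adj[u]:
                if v not in p:
                    p[v] = p[u] - w
                    queue.append(v)

    # every edge, including non-tree edges, must agree with the global prices
    for (a, b), r in rates.items():
        if abs(p[a] - p[b] - math.log(r)) > tol:
            return None, (a, b)
    return p, None

consistent = {("USD", "EUR"): 0.9, ("EUR", "GBP"): 0.8, ("USD", "GBP"): 0.72}
arbitrage = {("USD", "EUR"): 0.9, ("EUR", "GBP"): 0.8, ("USD", "GBP"): 0.75}
print(global_log_prices(consistent)[1])  # None: the local rates glue together
print(global_log_prices(arbitrage)[0])   # None: no global price assignment exists
```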

Anyone know more about any of these things?

~~~
nimithryn
Look into OpenCog; there's some sheaf-theoretic NLP work going on there.
There's a recent high-level overview by Linas Vepstas you can find on the
arXiv somewhere. There's also a project called SheafSystem, which is a
sheaf-based database for scientific computing (I've never used it). I'm also
working on some ideas in this area (not affiliated with these parties in any
way, and the ideas are not ready to share yet, unfortunately).

What's your background, out of curiosity?

------
notthingnill
After five minutes of reading
[https://johnhw.github.io/umap_primes/index.md.html](https://johnhw.github.io/umap_primes/index.md.html)

Without using any category theory, topology, or sheaf theory, this is what I
believe is in this paper:

(1) The prior hypothesis is that data points in R^n are a sample from a
uniform distribution on a Riemannian manifold.

(2) Try to define a Riemannian metric such that the number of sample points in
any ball B is proportional to the volume of B.

(3) Since (2) doesn't yield a single global Riemannian metric (each point gets
its own local one), they define a fuzzy membership relation. I suppose the role
of the fuzzy tool is that local distance information is weighted according to
the variance of the local distance estimates.

Disclaimer: I could be completely wrong.
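For comparison, the local construction in the UMAP paper can be sketched
roughly like this: each point i gets its own distance scale from its k nearest
neighbours, with rho_i the distance to its nearest neighbour and sigma_i chosen
so that its membership strengths sum to log2(k); the directed memberships are
then combined with the fuzzy union a + b - a*b. This is a brute-force sketch
under those assumptions, with hypothetical function names; it is not the
umap-learn implementation, which uses approximate nearest-neighbour search and
then optimizes a low-dimensional layout.

```python
import numpy as np

def smooth_knn(X, k=5, n_iter=64):
    """Directed fuzzy memberships from each point to its k nearest neighbours."""
    n = len(X)
    # brute-force pairwise Euclidean distances (fine for small n)
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    knn = np.argsort(d, axis=1)[:, 1:k + 1]       # skip the point itself
    knn_d = np.take_along_axis(d, knn, axis=1)
    rho = knn_d[:, 0]                             # distance to nearest neighbour
    target = np.log2(k)
    W = np.zeros((n, n))
    for i in range(n):
        gap = np.maximum(knn_d[i] - rho[i], 0.0)
        lo, hi = 1e-6, 1e6
        for _ in range(n_iter):
            # geometric bisection on sigma_i; the sum increases with sigma
            sigma = np.sqrt(lo * hi)
            if np.exp(-gap / sigma).sum() > target:
                hi = sigma
            else:
                lo = sigma
        W[i, knn[i]] = np.exp(-gap / np.sqrt(lo * hi))
    return W

def fuzzy_union(W):
    """Symmetrize with the probabilistic t-conorm a + b - a*b."""
    return W + W.T - W * W.T

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
G = fuzzy_union(smooth_knn(X, k=5))
```

Because every point's nearest neighbour gets membership exactly 1, no point is
left isolated however sparse its neighbourhood is, which is one way of reading
the "locally varying metric" in step (2).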

------
starchild_3001
I do a lot of clustering professionally, yet TDA feels very academic. Does
finding locally connected components have practical value with "noisy" data
sets? If what you're after is locally connected components, why can't you use
density clustering? Also, my general feeling: if you have such weird shapes in
R^n, maybe you should try to develop a better distance metric (rather than
finding connected components)? Just saying.
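For comparison, a minimal DBSCAN-style density clustering, written from scratch
as a sketch (the `dbscan` function, `eps`, `min_pts`, and the toy data are
illustrative assumptions, not any library's API), is itself a
connected-components computation: clusters are connected components of "core"
points, those with at least `min_pts` points within `eps`, plus the border
points they reach.

```python
import numpy as np

def dbscan(X, eps, min_pts):
    """Label each point with a cluster id, or -1 for noise."""
    n = len(X)
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    neighbours = [np.flatnonzero(d[i] <= eps) for i in range(n)]  # includes i
    core = np.array([len(nb) >= min_pts for nb in neighbours])
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if not core[i] or labels[i] != -1:
            continue
        # grow one connected component; border points join but do not expand it
        labels[i] = cluster
        stack = [i]
        while stack:
            u = stack.pop()
            for v in neighbours[u]:
                if labels[v] == -1:
                    labels[v] = cluster
                    if core[v]:
                        stack.append(v)
        cluster += 1
    return labels

X = np.array([[0, 0], [0, 0.1], [0.1, 0], [5, 5], [5, 5.1], [5.1, 5], [10, -10]])
print(dbscan(X, eps=0.5, min_pts=3).tolist())  # [0, 0, 0, 1, 1, 1, -1]
```

The single fixed `eps` is the main difference from the persistence view, which
tracks how components appear and merge across all scales instead of picking one.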

------
mikhailfranco
Great introductory talk. I have a physics and 3D graphics background, with
just enough topology and sketchy CT to understand all the words, so now I have
some idea about TDA. Thanks.

