

Big Data and the Topologist (2012) - kdavis
http://ldtopology.wordpress.com/2012/04/11/big-data-and-the-topologist/

======
tlarkworthy
I tried analysing data using persistent homology. What is not obvious,
although they do admit it in one line of every paper, is that it is
susceptible to noise :( So it has to go in the bin even though I really want
to know what my manifolds look like!

~~~
Topolomancer
Well, the thing about noise, from a topological point of view, is that
persistent homology simply _cannot_ decide whether something is noise or not
--- at least not right away.

To be more precise: A lot of what PH does is actually some sort of multi-scale
Betti number calculation. The Betti numbers count the number of k-dimensional
"holes" in a data set. Their calculation is usually done by something that is
called "simplicial homology" in algebraic topology. Topologists like
simplicial homology because it "only" requires that your input data is given
in the form of a simplicial complex. A simplicial complex is a sort of
generalized graph. Think of your input data as being described by vertices,
edges, triangles, tetrahedra, and their higher-dimensional counterparts. Got
it? Good.
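
To make the Betti number idea concrete, here is a minimal sketch that computes them for a small simplicial complex given as an explicit list of simplices. It works over GF(2) (coefficients mod 2), which is an assumption made for simplicity, and the function names `rank_gf2` and `betti_numbers` are made up for this illustration rather than taken from any library.

```python
from itertools import combinations


def rank_gf2(rows):
    """Rank over GF(2) of a matrix whose rows are given as integer bitmasks."""
    pivots = {}  # leading bit position -> reduced row with that leading bit
    for row in rows:
        while row:
            high = row.bit_length() - 1
            if high not in pivots:
                pivots[high] = row
                break
            row ^= pivots[high]  # cancel the leading bit and keep reducing
    return len(pivots)


def betti_numbers(simplices):
    """Betti numbers over GF(2) of a simplicial complex.

    `simplices` must list every simplex, including all faces, as tuples of
    vertex labels, e.g. (0,), (0, 1), (0, 1, 2).
    """
    by_dim = {}
    for s in simplices:
        by_dim.setdefault(len(s) - 1, []).append(tuple(sorted(s)))
    top = max(by_dim)

    # ranks[k] = rank of the boundary map from k-simplices to (k-1)-simplices
    ranks = {}
    for k in range(1, top + 1):
        index = {s: i for i, s in enumerate(by_dim.get(k - 1, []))}
        rows = []
        for s in by_dim.get(k, []):
            mask = 0
            for face in combinations(s, k):  # the (k-1)-dimensional faces
                mask |= 1 << index[face]
            rows.append(mask)
        ranks[k] = rank_gf2(rows)

    # b_k = (number of k-simplices) - rank(d_k) - rank(d_{k+1})
    return [
        len(by_dim.get(k, [])) - ranks.get(k, 0) - ranks.get(k + 1, 0)
        for k in range(top + 1)
    ]


hollow_triangle = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]
filled_triangle = hollow_triangle + [(0, 1, 2)]
print(betti_numbers(hollow_triangle))  # one component, one 1-dimensional hole
print(betti_numbers(filled_triangle))  # filling the triangle closes the hole
```

The hollow triangle has one connected component and one loop; adding the 2-simplex fills the loop, so the second Betti number list has no 1-dimensional hole left.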

Now, the problem with real-world data is that it does not come in the form of
a simplicial complex. So, this sort of structure needs to be _approximated_
somehow. And it is precisely through this approximation that the noise begins
to creep in. PH tries to deal with the noise by assigning a weight to each
feature it detects, sort of like a "scale" on which the feature "lives" (note
that I am talking about a feature in the sense of a "hole" here). Features
that live "long" are considered important. Features that don't live long are
not considered important.
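
As a rough illustration of the "scale on which a feature lives", the sketch below tracks only the simplest kind of feature, connected components (dimension 0), of a point cloud. It assumes a Vietoris-Rips style construction in which edges are added in order of increasing length; that choice of approximation, the function name `h0_persistence`, and the toy points are all assumptions for this example, not something specified above.

```python
from itertools import combinations
from math import dist


def h0_persistence(points):
    """Return (birth, death) pairs for connected components of a point cloud.

    Every point starts as its own component at scale 0. Edges are added in
    order of increasing length; a component "dies" at the edge length that
    merges it into another component.
    """
    if not points:
        return []
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components, so one of them dies
            parent[ri] = rj
            pairs.append((0.0, length))
    pairs.append((0.0, float("inf")))  # one component survives at every scale
    return pairs


# Two tight clusters that are far apart from each other.
cloud = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
for birth, death in h0_persistence(cloud):
    print(f"component born at {birth}, dies at {death}")
```

In this toy cloud two components die at scale 1 (points merging within a cluster), one dies at scale 10 (the clusters merging), and one never dies. The long-lived ones are what PH would flag as important.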

So, coming finally back to your comment: _You_ , as a user of PH, have to
(sort of) decide what to consider as noise and what not. If you take a look at
the seminal applications in PH, you will find that the extraction of features
works quite well, although some sort of preprocessing might be required.
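
The "you decide what is noise" step often boils down to picking a lifetime threshold. A minimal sketch, assuming features are given as (birth, death) pairs; the helper name `split_by_persistence` and the sample numbers are invented for illustration:

```python
def split_by_persistence(pairs, threshold):
    """Split (birth, death) pairs into (kept, discarded) by lifetime."""
    kept, discarded = [], []
    for birth, death in pairs:
        lifetime = death - birth
        (kept if lifetime >= threshold else discarded).append((birth, death))
    return kept, discarded


# Hypothetical features: two short-lived, one long-lived, one that never dies.
features = [(0.0, 0.2), (0.1, 0.3), (0.0, 5.0), (0.0, float("inf"))]
kept, discarded = split_by_persistence(features, threshold=1.0)
print("kept:", kept)
print("treated as noise:", discarded)
```

The threshold itself is the user's call; nothing in the method picks it for you.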

Anyway, if you are interested in using PH to analyse some data, I would be
happy to discuss some stuff with you :)

Disclaimer: I am working with PH from the point of view of visualization. I try to
make the topological structure of a data set visible.

~~~
carterschonwald
Well said. I've been digging into writing some data viz tools that
leverage PH recently, and everything you say is exactly true.

~~~
Topolomancer
Thank you. You got me interested... would you like to say a bit more about
your work?

~~~
carterschonwald
Oh, well, that's part of a general subproject of "what are good low-dimensional
computational geometry algorithms/problems/techniques that can help do basic
data vis well?!". I'm spending a wee bit of time right now evaluating which
suite of primitives/algorithms/data structures I want to have for designing
an EDSL for declarative data vis/plotting.

------
CurtMonash
KXEN does some interesting things with assuming that what the data
approximates is (I think) the solution set to a quadratic equation, rather
than a linear one.

