
Topology looks for the patterns inside big data - ColinWright
https://theconversation.com/topology-looks-for-the-patterns-inside-big-data-39554
======
bite_my_shiny_m
As soon as I see the phrases "topology" and "big data" in the same title, I
know it's an Ayasdi plug.

~~~
boxy310
My work in statistical learning is always about multicollinearity, variable
cardinality, and model selection. Understanding the topology of the data space
is critical, and day-to-day concerns of data cleaning had made me lose sight
of that. To that degree, this article was a fantastic reminder.

------
chestervonwinch
I recently saw a talk on this, and I didn't understand the barcode charts then
either. For example, I understand that the zeroth-order Betti number gives the
number of connected components. So why does the barcode chart show multiple
numbers for each radius value? Shouldn't it look more like a plot of some
non-increasing function where, for each radius, there is a single zeroth-order
Betti number on the y-axis (since the number of connected components is
non-increasing as a function of the radius)?

~~~
wmsiler
The barcode usually isn't showing the zeroth Betti number. It's usually
showing the first Betti number, which intuitively counts the 1-dimensional
holes in the space. For each radius, there could be lots of those (imagine
several circles that all have a single point in common; the first Betti
number is then equal to the number of circles). At a given radius, there will
be one bar over it for each 1-dimensional hole. If a bar is really long, i.e.
the same hole exists for many radii, then we assume that it must represent
actual structure in the data, rather than just noise.

You could do a barcode for the holes of any fixed dimension, but as you point
out, the 0-th dimension case is relatively uninteresting, and as you get to
higher dimensions, it's harder to visualize and interpret what is going on. So
dimension 1 is most common.
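
To make the "long bar" heuristic concrete, here is a minimal sketch. The bars
and the cutoff are made up for illustration, and `persistent_bars` is a
hypothetical helper, not part of any TDA library. Each bar is a
(birth radius, death radius) pair for one 1-dimensional hole.

```python
# Hypothetical barcode: one (birth_radius, death_radius) pair per 1-dimensional hole.
bars = [(0.2, 0.3), (0.25, 1.9), (0.4, 0.45), (0.5, 2.2)]


def persistent_bars(bars, min_lifetime):
    """Keep only the bars whose lifetime (death - birth) is at least min_lifetime."""
    return [(birth, death) for birth, death in bars if death - birth >= min_lifetime]


# The cutoff of 1.0 is an arbitrary choice for this example.
long_bars = persistent_bars(bars, min_lifetime=1.0)
print(long_bars)
```

The short bars get dropped as probable noise, and the two long ones are kept
as candidates for actual structure. How to pick the cutoff is a judgment call
that this sketch does not address.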

~~~
chestervonwinch
So, let me try to understand: Let's take the order 1 case. If I pick a value
on the y-axis and hold it constant, this refers to a particular "1D hole". As
I move left to right, increasing the radius, the graph is colored black if
this particular hole is present for the given radius and not colored
otherwise. Is it not misleading, then, to label the y-axis as the Betti number
since this is a single, global number?

> the 0-th dimension case is relatively uninteresting

It was explained to me that the zeroth order Betti numbers have applications
for clustering.

~~~
wmsiler
Your description is correct. And to call the y-axis the Betti number is a bit
misleading. If you are looking at a barcode for 1-dimensional holes, then at a
given radius, the number of bars over that radius is the first Betti number
(at that radius). So the Betti number counts the number of holes, but the
barcode graph is keeping track of each hole's "lifetime" as the radius changes.
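
As a rough sketch of how to read a Betti number off a barcode: count the bars
lying over the chosen radius. The intervals below are invented, `betti_at` is a
hypothetical helper, and treating each bar as half-open (born at `birth`, gone
at `death`) is an assumption about convention.

```python
# Hypothetical barcode for 1-dimensional holes: (birth_radius, death_radius) pairs.
bars = [(0.1, 0.6), (0.2, 2.0), (0.5, 0.7), (1.0, 2.5)]


def betti_at(bars, radius):
    """Count the bars that cover the given radius (birth <= radius < death)."""
    return sum(1 for birth, death in bars if birth <= radius < death)


# Sample the Betti number at a few radii.
for r in (0.55, 1.5, 3.0):
    print(r, betti_at(bars, r))
```

So the barcode carries more information than the Betti number alone: the count
at each radius can be recovered from the bars, but not the other way around.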

> It was explained to me that the zeroth order Betti numbers have applications
> for clustering.

That is correct, so perhaps "uninteresting" was too strong :) The 0-th Betti
number counts the number of connected components of the space. So if we are at
radius, say, 1 and the 0-th Betti number is 3, then we know the data points
can be put into 3 "dense" clusters. By dense, I mean that for every two data
points A and B in the cluster, there is a sequence of data points that you
could step on going from A to B where each step has distance at most 1. I
don't know if that explanation made any sense.
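
If it helps, here is a small sketch of that "stepping" idea, assuming plain
Euclidean distance and a made-up set of 2D points. `betti0` is a hypothetical
helper: it merges any two points within the given radius of each other and
counts the groups that remain.

```python
import math


def betti0(points, radius):
    """Count connected components when points at distance <= radius are linked."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= radius:
                parent[find(i)] = find(j)  # merge the two components

    return len({find(i) for i in range(len(points))})


# Made-up data: a chain of three points, a pair, and one isolated point.
points = [(0, 0), (0.8, 0), (1.6, 0), (5, 5), (5.5, 5.5), (10, 0)]
counts = [betti0(points, r) for r in (0.5, 1, 20)]
print(counts)
```

Note that (0, 0) and (1.6, 0) are more than 1 apart but still land in the same
cluster at radius 1, because you can step through (0.8, 0). The count can only
stay the same or drop as the radius grows.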

------
umutisik
Also very cool is the recognition of branching in the data by the computation
of a persistent Borel-Moore homology. This is the method that was used in
their cancer study.

------
aswanson
Can anyone recommend a good introductory text for topology?

~~~
icen
Munkres is very good. If you're after algebraic topology, Hatcher is also good
(and freely available online).

~~~
compactmani
I second Hatcher. His AT book is a go-to for me. He also has introductory
notes on point-set topology on his site, which are appropriate for someone
without a math background.

