

Machine Learning: Curse of Dimensionality - brendamorgan
http://www.edupristine.com/blog/curse-dimensionality

======
deepnet
As a counterpoint, the Blessing of Dimensionality is that local minima are
mostly as good as the global minima.

In David Silver's RLDM 2015 talk (around 20 minutes in)

[http://videolectures.net/rldm2015_silver_reinforcement_learn...](http://videolectures.net/rldm2015_silver_reinforcement_learning/)

he recounts this as his experience.

The paper _The Loss Surfaces of Multilayer Networks_ goes into depth:
[http://arxiv.org/abs/1412.0233](http://arxiv.org/abs/1412.0233)

[edit] and the paper _Qualitatively Characterizing Neural Network Optimization Problems_:
[http://arxiv.org/abs/1412.6544v4](http://arxiv.org/abs/1412.6544v4)

~~~
Fede_V
Local minima are only nearly as good as global minima in very very deep
networks.

~~~
deepnet
And yet Goodfellow et al.'s paper uses only 2 hidden layers and 1 softmax
output layer (on MNIST), which is not so very deep, but the effect is still
present.

( _Qualitatively Characterizing Neural Network Optimization Problems_, ICLR 2015)
[http://arxiv.org/abs/1412.6544v4](http://arxiv.org/abs/1412.6544v4)

On p. 6, Goodfellow et al. compare the neural nets to linear solutions by
removing the non-linearities, so that a single linear function is learned,
and find for deep linear networks that:

" All minima are global minima, and are linked to each other in a continuous
manifold. "

Which is pretty exciting, because the received wisdom was that local minima
are a really big problem.
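
As a toy illustration of that claim (my own sketch, not from the paper): take
the smallest possible "deep linear network", two scalar weights fitting y = x.
The squared loss is L(w1, w2) = (w1*w2 - 1)^2, and every point on the curve
w1*w2 = 1 is a global minimum, forming a continuous manifold of minima:

```python
# Toy sketch (not from the paper): a two-layer scalar linear "network"
# f(x) = w2 * w1 * x fitted to the target y = x has squared loss
# L(w1, w2) = (w1 * w2 - 1)^2. Every point on the curve w1 * w2 = 1
# is a minimum, all of them global, and they form a continuous
# manifold -- a miniature version of the deep-linear-network result.
import numpy as np

def loss(w1, w2):
    return (w1 * w2 - 1.0) ** 2

w1 = np.linspace(0.1, 10.0, 1000)
w2 = 1.0 / w1                    # points on the manifold w1 * w2 = 1

print(loss(w1, w2).max())        # 0.0: every point on the curve is a global minimum
print(loss(w1, w2 + 0.1).min())  # > 0: the loss rises as soon as you step off it
```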

------
DenisM
Another problem worth mentioning is that when computing, e.g., Euclidean
distance, you end up adding quantities of completely different types, and
their relative weights are often entirely arbitrary, e.g. adding a distance
to a weight.

Imagine three points on a two-dimensional plane: A(1,1), B(5,10), C(2,100).
Clustered like this, A and B end up in one cluster, while C is in another.
Now let's scale down the second axis by a factor of 100: we get A(1,0.01),
B(5,0.1), C(2,1), so A and C are clustered together, while B forms the second
cluster. And that's just linear scaling; imagine how much havoc I could wreak
with a logarithmic scale?
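
A quick numpy check of those numbers (a sketch; the points are exactly the
ones above):

```python
# Sketch of the example above: rescaling one axis flips which points
# are nearest neighbours under Euclidean distance.
import numpy as np

pts = np.array([[1.0, 1.0],    # A
                [5.0, 10.0],   # B
                [2.0, 100.0]]) # C

def pairwise(p):
    # Euclidean distance between every pair of rows.
    return np.linalg.norm(p[:, None, :] - p[None, :, :], axis=-1)

print(pairwise(pts).round(1))
# d(A,B) ~  9.8  <-- A and B cluster together
# d(A,C) ~ 99.0
# d(B,C) ~ 90.1

scaled = pts * np.array([1.0, 0.01])  # shrink the second axis by 100x
print(pairwise(scaled).round(2))
# d(A,B) ~ 4.00
# d(A,C) ~ 1.41  <-- now A and C cluster together
# d(B,C) ~ 3.13
```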

I haven't done anything to research this problem; it's just something I came
across that made me uneasy about clustering as a whole.

~~~
NhanH
You're supposed to normalize the values along all dimensions so that they
span the same numerical magnitude/range (as the simplest normalization).

The issue is definitely not specific to clustering: without normalization you
would have the same problem of one dimension dominating the weights even in
simple regression.
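
For example (a minimal sketch): z-score standardization puts every column on a
comparable scale before any distances are computed (scikit-learn's
StandardScaler does the same thing):

```python
# Minimal sketch: z-score standardization gives each column zero mean
# and unit variance, so no single dimension dominates a distance.
import numpy as np

X = np.array([[1.0, 1.0],
              [5.0, 10.0],
              [2.0, 100.0]])

X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_std.mean(axis=0).round(6))  # [0. 0.]
print(X_std.std(axis=0).round(6))   # [1. 1.]
```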

------
beagle3
The curse of dimensionality is real and problematic, but it only applies to
the real dimension of the problem, not the measured/perceived one.

Let's say you have samples with 5 dimensions. That's still not (usually)
considered within the realm of the curse.

Now, let's assume that, because of the way they are measured, you actually
have 10,000,000 measured dimensions for each sample. That starts to be a
problem, especially if you don't know how that expansion happens: it might be
linear, nonlinear, noisy, etc.

Even though we are now working in 10,000,000-dimensional space, and it looks
like the curse of dimensionality applies, it does not. You just have to find a
reasonable way to compact everything back down to the _real_ problem
dimension, or somewhere close enough to it.

The most popular tool for this is called Random Projections, which follows
from the work of Johnson and Lindenstrauss in 1984. Compressive Sensing is a
complementary field.
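
A minimal sketch of the idea (numbers shrunk so it runs quickly; scikit-learn
ships this as sklearn.random_projection.GaussianRandomProjection): project the
data through a random Gaussian matrix and check that pairwise distances
survive, which is the Johnson-Lindenstrauss guarantee.

```python
# Sketch of a Gaussian random projection: compress points from 10,000
# measured dimensions down to 500 while roughly preserving pairwise
# distances (the Johnson-Lindenstrauss guarantee).
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
n, d, k = 50, 10_000, 500

X = rng.normal(size=(n, d))               # high-dimensional samples
R = rng.normal(size=(d, k)) / np.sqrt(k)  # random projection matrix
Y = X @ R                                 # projected samples

ratios = pdist(Y) / pdist(X)              # projected / original pairwise distances
print(ratios.min(), ratios.max())         # both close to 1: distances are preserved
```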

~~~
codingbinary
What you want to get down to is the intrinsic dimensionality. Go below that
and you start discarding useful information.
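
One common way to eyeball that intrinsic dimensionality (a sketch, assuming
the embedding is roughly linear) is PCA/SVD: data that truly lives on a
5-dimensional subspace shows only 5 significant singular values, however many
dimensions it was measured in.

```python
# Sketch: estimating intrinsic dimensionality with SVD. 5-dimensional
# data linearly embedded in 1,000 measured dimensions (plus a little
# noise) yields 5 large singular values, then a sharp drop.
import numpy as np

rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 5))   # the "real" 5-dimensional samples
embed = rng.normal(size=(5, 1000))   # an arbitrary linear measurement process
X = latent @ embed + 0.01 * rng.normal(size=(500, 1000))

s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
print(s[:8].round(1))   # 5 big values, then the noise floor
```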

------
Jun8
Relevant HN post from some time ago:

[https://news.ycombinator.com/item?id=3995615](https://news.ycombinator.com/item?id=3995615)

The problem with the super simple explanation at the beginning (searching for
the penny) is that it doesn't really expose the error of our insight in higher
dimensions: everybody knows intuitively that as the number of dimensions goes
up there's more "space". Insight fails, I think, in trying to understand what
this statement really means and what its effects are on point distributions,
a.k.a. objects.
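
One concrete version of "more space" (my sketch, not from the linked post): in
a unit hypercube, almost all uniformly sampled points end up in a thin shell
near the boundary as the dimension grows.

```python
# Sketch: almost all of a high-dimensional unit hypercube's volume sits
# near its boundary, so uniform samples crowd into a thin outer shell.
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    X = rng.uniform(size=(10_000, d))
    # "Near the boundary" = any coordinate within 0.05 of 0 or 1,
    # i.e. the point falls outside the inner cube [0.05, 0.95]^d.
    near_edge = ((X < 0.05) | (X > 0.95)).any(axis=1).mean()
    print(d, round(near_edge, 3))
# Expected fractions: 1 - 0.9^d, i.e. ~0.19, ~0.65, ~1.0, ~1.0
```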

------
brendamorgan
Curse of Dimensionality refers to non-intuitive properties of data observed
when working in high-dimensional space, specifically related to usability and
interpretation of distances and volumes.
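
A quick sketch of one of those non-intuitive distance properties: for i.i.d.
points, the contrast between the farthest and nearest neighbour shrinks as the
dimension grows, which is what makes distance-based methods brittle up there.

```python
# Sketch: distance concentration. The relative contrast
# (d_max - d_min) / d_min between the farthest and nearest point from
# a query shrinks with dimension, eroding the meaning of "nearest".
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000, 10_000):
    X = rng.uniform(size=(500, d))
    dists = np.linalg.norm(X[1:] - X[0], axis=1)  # distances from one query point
    print(d, round((dists.max() - dists.min()) / dists.min(), 3))
# The ratio falls toward 0 as d grows.
```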

~~~
truncate
I remember someone telling me that they often prefer cosine similarity over
Euclidean distance. I was looking into the reason why and found an interesting
paper, "On the Surprising Behavior of Distance Metrics in High Dimensional
Space" [1], which at some point argues that fractional metrics perform better
than the more widely used distance metrics in high-dimensional spaces.

[1]
[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.23....](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.23.7409&rep=rep1&type=pdf)
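
A sketch of the quantity the paper studies: relative contrast under Minkowski
norms, including a fractional p (computed directly, since p < 1 is not a true
metric):

```python
# Sketch following the paper's setup: relative contrast
# (d_max - d_min) / d_min under Minkowski norms with p = 0.5, 1, 2
# on uniform data. Smaller p tends to preserve more contrast.
import numpy as np

def minkowski(X, q, p):
    # Also works for fractional p (where it is no longer a true metric).
    return (np.abs(X - q) ** p).sum(axis=1) ** (1.0 / p)

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 1000))   # 500 points in 1,000 dimensions
q = X[0]                            # query point

for p in (0.5, 1.0, 2.0):
    d = minkowski(X[1:], q, p)
    print(p, round((d.max() - d.min()) / d.min(), 3))
# Contrast is typically largest at p = 0.5 and smallest at p = 2.
```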

~~~
tdaltonc
There is a cornucopia of distances to choose from. It's not as simple as "just
use the best one." Cosine similarity assumes a certain projection of the data
(it ignores how far points are from the origin). It performs well whenever
that's a good projection. I think it's best to think of it as one way to
reduce dimensionality, to be used when you think that's the right way to
reduce dimensionality.
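
A two-line sketch of that point: cosine similarity is just the dot product of
L2-normalized vectors, so it throws away magnitude entirely.

```python
# Sketch: cosine similarity discards magnitude -- any two points on the
# same ray from the origin look identical to it.
import numpy as np

def cosine_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
b = 100.0 * a                  # same direction, very different magnitude

print(cosine_sim(a, b))        # 1.0: identical under cosine similarity
print(np.linalg.norm(a - b))   # ~370.4: far apart under Euclidean distance
```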

