
Introduction to Mean Shift Algorithm - DLion
https://saravananthirumuruganathan.wordpress.com/2010/04/01/introduction-to-mean-shift-algorithm/
======
rcthompson
For all those struggling to understand the mean shift algorithm, it's much
easier to understand in pictures. Here's an article that explains it with some
diagrams: [http://eric-yuan.me/continuously-adaptive-shift/](http://eric-yuan.me/continuously-adaptive-shift/)

Basically, it goes like this:

1. Select a data point of interest.

2. Draw a circle of a specified radius around the point of interest.

3. Collect all data points within the circle and compute their mean.

4. Move the center of the circle to the mean.

5. Repeat 3 & 4 until convergence. Each iteration will move "uphill" on the
density gradient of the data distribution until it reaches the top of the hill
(a local maximum).

6. Repeat 1-5 for all data points. Points that converge to the same local
maximum are members of the same cluster. The number of clusters is the number
of local maxima.

For higher dimensions, replace "circle" with sphere (3-D) or hypersphere (4 &
higher dimensions). Obviously, this algorithm depends on a choice of radius,
which determines the granularity of the search for local maxima.
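
In code, a minimal sketch of steps 1-6 might look like this (NumPy, flat
kernel, i.e. every point inside the circle weighted equally; the function
name and tolerances are made up, not from the article):

```python
import numpy as np

def mean_shift(points, radius, tol=1e-5, max_iter=300):
    """Shift every data point uphill to its local density maximum."""
    points = np.asarray(points, dtype=float)
    modes = np.empty_like(points)
    for i, x in enumerate(points):  # step 1: each point of interest in turn
        center = x.copy()
        for _ in range(max_iter):
            # steps 2-3: collect the points inside the circle, take their mean
            dists = np.linalg.norm(points - center, axis=1)
            new_center = points[dists <= radius].mean(axis=0)
            # step 5: converged once the window stops moving
            if np.linalg.norm(new_center - center) < tol:
                break
            center = new_center  # step 4: recenter the circle on the mean
        modes[i] = new_center
    # step 6: points whose modes (nearly) coincide share a cluster
    return modes
```

Clustering is then just a matter of merging modes that land within, say,
`radius` of one another.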

~~~
thomasahle
Do I understand correctly that you run the converging process on every single
point?

Couldn't you have a situation where points really far from a cluster would
still converge to that cluster by this method? Something like a long string of
points slowly getting closer and closer?

Does the mean shift algorithm have any guarantees on running time and/or the
quality of the clustering it finds?

~~~
rcthompson
> Do I understand correctly that you run the converging process on every
> single point?

I don't think you have to run it on every point in the data set. If your data
set is very large, you could run it on a random sample of points, or define a
regular grid of starting positions at the resolution that you require.
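
For what it's worth, scikit-learn's implementation (linked elsewhere in this
thread) supports both ideas: you can pass your own `seeds`, or set
`bin_seeding=True` to start the search from a coarse grid of occupied bins.
A toy example (the data and bandwidth are made up):

```python
import numpy as np
from sklearn.cluster import MeanShift

X = np.random.randn(10_000, 2)  # made-up data

# bin_seeding=True seeds the hill-climbing from a grid of bins
# (bin width = bandwidth) instead of from all 10,000 points.
ms = MeanShift(bandwidth=0.8, bin_seeding=True)
labels = ms.fit_predict(X)
print(len(ms.cluster_centers_), "clusters")
```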

> Couldn't you have a situation where points really far from a cluster would
> still converge to that cluster by this method? Something like a long string
> of points slowly getting closer and closer?

Yes, that's the idea. For each point, you're basically asking "If I start here
and keep walking up the density gradient from here until I hit a maximum,
where do I end up?" If the shape of the probability density function has a
very long ridge, you could end up walking the entire length of the ridge until
you hit the highest point. This means that you can have arbitrarily shaped
clusters, within the smoothness bounds imposed by your chosen radius. This
feature is considered a potential advantage over k-means clustering, which can
only produce convex clusters.

> Does the mean shift algorithm have any guarantees on running time and/or the
> quality of the clustering it finds?

I haven't actually used it in practice, so I don't know.

------
kough
Cool writeup. This is part of sklearn, which has a great visualization of how
it performs on some different clustering tasks, including comparisons to other
algorithms: [http://scikit-learn.org/stable/auto_examples/cluster/plot_cl...](http://scikit-learn.org/stable/auto_examples/cluster/plot_cluster_comparison.html#example-cluster-plot-cluster-comparison-py)

~~~
Wookai
Based on this plot, my naive reaction would be to decide that DBSCAN is the
best algorithm, as it recovers the (qualitatively) best clusters in each case.
Does anybody have experience with DBSCAN? What are the downsides?

~~~
kough
The Wiki page has a nice writeup of a few downsides:
[https://en.wikipedia.org/wiki/DBSCAN#Disadvantages](https://en.wikipedia.org/wiki/DBSCAN#Disadvantages)
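
The ones that usually bite are the parameters: DBSCAN needs a neighborhood
radius `eps` and a minimum neighborhood size, and a single global `eps` copes
badly when cluster densities differ a lot. A toy scikit-learn call (parameter
values are made up):

```python
import numpy as np
from sklearn.cluster import DBSCAN

X = np.random.randn(1_000, 2)  # made-up data

# eps and min_samples are the knobs the linked section is about: eps is
# hard to pick without studying the data, and one global value handles
# clusters of very different densities poorly.
db = DBSCAN(eps=0.3, min_samples=10).fit(X)
print(np.unique(db.labels_))  # label -1 marks noise points
```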

------
web007
This sounds similar to k-means - pick some points "at random" and iteratively
move them to the average location of all nearest neighbors until they
stabilize. But mean shift goes in the opposite direction, giving each point a
window and grouping points on overlaps. Can someone who understands both
confirm this is the case, or correct my understanding?

~~~
RossBencina
The article includes a comparison with k-means. From the article, mean-shift:

> does not assume anything about number of clusters

> can handle arbitrarily shaped clusters

> is fairly robust to initializations

> not very sensitive to outliers

> time complexity mean-shift: O(Tn^2), k-means: O(knT). (k: number of
> clusters, n: number of points, T: number of iterations).

[Edit: formatting]
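
To see the first point (no preset number of clusters) concretely in
scikit-learn (the bandwidth and k below are arbitrary choices, not from the
article):

```python
import numpy as np
from sklearn.cluster import KMeans, MeanShift

X = np.random.randn(500, 2)  # made-up data

# k-means needs k up front; mean shift takes a bandwidth instead and
# discovers the number of clusters on its own.
km = KMeans(n_clusters=3, n_init=10).fit(X)
ms = MeanShift(bandwidth=1.0).fit(X)
print("k-means centers:", len(km.cluster_centers_))
print("mean-shift modes:", len(ms.cluster_centers_))
```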

------
nojvek
I was expecting a more intro-ey layman explanation. This was pretty hard to
understand.

~~~
Cyph0n
I agree, these are basically the lecture notes for the course.

------
m_y-n_a_m_e
Mean Shift in R
[https://cran.r-project.org/web/packages/MeanShift/MeanShift....](https://cran.r-project.org/web/packages/MeanShift/MeanShift.pdf)

------
thecourier
I love this blog. Whenever I want to learn something cool, I open Chrome on my
Android and type: saravananthirumuruganathan.wordpress.com

------
brianberns
> For each data point, Mean shift defines a window around it and computes the
> mean of the data point.

This makes no sense. A clearer explanation would go a long way.

~~~
daveguy
For each data point, Mean shift defines a window around it and computes the
mean of the data points within the window.

Better?

~~~
bazzargh
Not really.

_Given an estimate of the mean_; for each data point, Mean shift defines a
window around it and computes a new estimated mean, _weighting each point by
the probability density at the previous estimated mean calculated using the
window_.

The (weighted) 'mean of the data points within the window' makes sense if you
use the other perspective of looking at the window _around the current
estimated mean_ - you'll get the same answer, and to me this explanation is
easier to grasp: the PDF only depends on the distance between the point and
the estimated mean, so you can think of either as the 'center'. But saying you
calculate the mean of the data points within the window _for each data point_
mixes up two perspectives and makes no sense.
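
As a sketch of that weighted form (Gaussian kernel; the names are made up,
not from the thread), one update of the estimated mean looks like:

```python
import numpy as np

def weighted_shift(points, center, bandwidth):
    """One mean-shift update: weight each point by the kernel evaluated
    at its distance from the current estimated mean, then average."""
    d2 = np.sum((points - center) ** 2, axis=1)
    weights = np.exp(-d2 / (2 * bandwidth ** 2))  # Gaussian weights
    return weights @ points / weights.sum()
```

Iterating `center = weighted_shift(points, center, h)` until it stops moving
yields one mode; with a flat kernel the weights are 0/1 and this reduces to
daveguy's unweighted mean of the points inside the window.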

~~~
ebalit
daveguy's explanation is correct. It's the simplest form of the Mean Shift
algorithm.

~~~
bazzargh
Nope. It would be correct if it said 'For each estimate of the mean, Mean
shift defines a window around it and computes the mean of the data points
within the window, then repeats the process with this new estimate.'

Suppose we had 500 data points. Daveguy's process calculates a separate mean
for a window around _each data point_. Now we have 500 means...and?

~~~
daveguy
I was clarifying the sentence which the poster had pointed out as confusing,
not the entire description of the algorithm.

------
biomcgary
I just got out of a genome talk that uses the Mean Shift algorithm to find
gene clusters, which speaks to the broad applicability of the algorithm.

------
garbage_stain
mlpack has a nice command-line-accessible implementation:
[http://mlpack.org/docs/mlpack-2.0.3/man/mlpack_mean_shift.ht...](http://mlpack.org/docs/mlpack-2.0.3/man/mlpack_mean_shift.html)

------
nerdponx
This sounds an awful lot like Expectation-Maximization

~~~
contravariant
I don't think so, at least I can't figure out what would correspond to the
expectation step.

It seems to be a gradient ascent on a smoothed density function, so there's
only a maximization step; no expectation step is involved.
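
To make that concrete, the usual analysis (standard material, not spelled out
in this thread) writes the density estimate and the shift vector, up to
constants, as

\[
\hat f(x) = \frac{1}{n h^d} \sum_{i=1}^{n} k\!\left( \left\lVert \frac{x - x_i}{h} \right\rVert^2 \right),
\qquad
m(x) = \frac{\sum_i g_i\, x_i}{\sum_i g_i} - x,
\quad g_i = g\!\left( \left\lVert \frac{x - x_i}{h} \right\rVert^2 \right),
\]

with \(g = -k'\). A little algebra gives
\(m(x) \propto \nabla \hat f(x) / \hat f_g(x)\), where \(\hat f_g\) is the
density estimate built from \(g\): each shift is a gradient-ascent step on the
smoothed density, with a step size that adapts to the local density (large in
flat regions, small near a mode).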

------
tmaly
This algorithm looks interesting.

Any chance you could move it to GitHub?

~~~
DLion
This article is not mine; I found it while looking for articles about Mean
Shift for my thesis. But you can find a C++ implementation in the OpenCV
libraries:
[https://github.com/opencv/opencv/blob/master/modules/video/s...](https://github.com/opencv/opencv/blob/master/modules/video/src/camshift.cpp)

------
sixhobbits
(2010)

