
Clear animation on how the SVM "kernel trick" works - jashmenn
http://www.youtube.com/watch?v=3liCbRZPrZA
======
lliiffee
Technically, what is visualized here isn't "the kernel trick". This is the
general idea of how nonlinearly projecting some points into a higher-
dimensional feature space makes linear classifiers more powerful. You can do
this without SVMs: just compute the high-dimensional features corresponding
to your data, then use logistic regression or whatever. Trouble is, if the
higher-dimensional space is really big, this can be expensive. The "kernel
trick" is a _computational_ trick that SVMs use to compute the inner product
between the high-dimensional features corresponding to two points without
explicitly computing the high-dimensional features. (This works for certain
special feature spaces.)

But this is definitely a cool visualization of the value of feature spaces!

~~~
moultano
Can you explain more about how the kernel trick specifically works?

~~~
lliiffee
Basically, the idea of feature spaces is to blow up the data into high
dimensions. So, we use

x' = f(x)

as our data, instead of x. It turns out that in lots of machine learning
algorithms (notably SVMs), you end up only needing _inner products_ between
different data elements. That is, we need to compute x^T y for two data
elements x and y. In feature space, we could compute this by doing f(x)^T
f(y). However, it turns out that for certain feature spaces (like
polynomials), one can compute the number f(x)^T f(y) quite quickly without
ever explicitly forming the big vectors x' or y'.
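
To make that concrete, here's a tiny sketch in Python (numpy assumed, and
the names are my own) for the quadratic case: if f(x) lists all pairwise
products x_i x_j, then f(x)^T f(y) = (x^T y)^2, so one dot product in the
original space replaces a pass over d^2 features.

    import numpy as np

    # Explicit quadratic feature map: all pairwise products x_i * x_j,
    # blowing a d-dimensional vector up to d^2 features.
    def quad_features(x):
        return np.outer(x, x).ravel()

    x = np.array([1.0, 2.0, 3.0])
    y = np.array([0.5, -1.0, 2.0])

    explicit = quad_features(x) @ quad_features(y)  # d^2 multiplications
    kernel = (x @ y) ** 2                           # d multiplications, then square

    print(explicit, kernel)  # both print 20.25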

~~~
moultano
You stopped _just_ short of the explanation I was hoping for. :)

~~~
lliiffee
Try section 7 of these notes:

<http://see.stanford.edu/materials/aimlcs229/cs229-notes3.pdf>

------
bprater
Cool video. I have no idea what it's describing though. Can anyone generalize
it?

~~~
smanek
It's been a few years, but I'll take a shot.

1. SVMs are useful for categorizing items.

2. A basic SVM can only separate 'linearly separable' sets. That is, if you
were to plot the items from the two sets on a plane, it must be possible to
distinguish them by drawing a line on the graph, so that everything above
the line is in one set and everything below is in the other.

3. The kernel trick is adding dimensions to a problem, so even though the
problem isn't linearly separable in two dimensions, it might be in three if
you set the right values in your extra dimension (see the sketch below).
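
A quick illustration of point 3 in Python (numpy assumed; the setup is my
own): points inside vs. outside a circle aren't separable by a line in two
dimensions, but adding a third coordinate z = x^2 + y^2 lets a flat plane
split them.

    import numpy as np

    rng = np.random.default_rng(0)
    pts = rng.uniform(-2, 2, size=(200, 2))
    labels = (pts ** 2).sum(axis=1) > 1.0  # outside the unit circle?

    # Add one dimension: z = x^2 + y^2.
    lifted = np.column_stack([pts, (pts ** 2).sum(axis=1)])

    # In 3D, the plane z = 1 separates the two classes perfectly.
    predictions = lifted[:, 2] > 1.0
    print((predictions == labels).all())  # True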

------
tlipcon
Neat visualization, but I wouldn't say this is the kernel trick. The kernel
trick is a trick because it replaces the higher-dimensional dot products
with a kernel function evaluated in the lower dimension. The neat thing is
specifically that a kernel function on low-dimensional points can actually
correspond to a dot product in an _infinite_-dimensional space. In fact this
is the case for RBF kernels like the Gaussian, which is probably the most
popular SVM kernel choice.
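
To see the infinite-dimensional claim concretely, here's a sketch in Python
(numpy assumed; the truncation is my own device) for the 1-D Gaussian kernel
k(x, y) = exp(-(x - y)^2): truncating its Taylor feature expansion gives
finite vectors whose dot product only converges to the kernel value as the
number of features grows without bound.

    import numpy as np
    from math import exp, factorial

    # Truncated feature expansion of the 1-D Gaussian kernel:
    #   phi_k(x) = exp(-x^2) * (sqrt(2) * x)^k / sqrt(k!),  k = 0, 1, 2, ...
    # so that phi(x) . phi(y) -> exp(-(x - y)^2) as the number of terms grows.
    def phi(x, n_terms):
        return np.array([exp(-x * x) * (np.sqrt(2) * x) ** k / np.sqrt(factorial(k))
                         for k in range(n_terms)])

    x, y = 0.8, -0.3
    exact = exp(-(x - y) ** 2)
    for n in (1, 2, 5, 10, 20):
        print(n, phi(x, n) @ phi(y, n))  # approaches `exact` as n grows
    print("exact:", exact)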

------
jules
Can SVM fit a circle with variable origin and variable radius to a set?

~~~
ajj
It can in principle, and this is exactly where the kernel trick is useful.

The standard SVM formulation can only give _linear_ classifiers. But, if you
project your data into feature space (a higher-, possibly infinite-
dimensional space), a linear separator in that space can be a circle in your
original space. Since you cannot do explicit computations in an infinite-
dimensional space, the kernel trick lets you get away without doing them at
all. You can thus get the inner product of two infinite-dimensional vectors
using a kernel function. So classifiers that only require inner product
values and never the explicit vectors can exploit the kernel trick, e.g.,
SVMs, logistic regression, etc.

That being said, choosing the appropriate kernel function for your data is
not always straightforward.
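
For a concrete finite-dimensional version of that, here's a sketch in Python
(numpy assumed; the weights are my own picks): with the feature map
(x1, x2) -> (x1, x2, x1^2 + x2^2), a linear decision function in feature
space traces out a circle with variable center and radius in the original
space.

    import numpy as np

    # Feature map (x1, x2) -> (x1, x2, x1^2 + x2^2). The linear decision
    # function w . f(x) + b expands to
    #   w1*x1 + w2*x2 + w3*(x1^2 + x2^2) + b,
    # which (for w3 != 0) vanishes exactly on a circle with center
    # (-w1/(2*w3), -w2/(2*w3)) and squared radius (w1^2 + w2^2)/(4*w3^2) - b/w3.
    def f(x):
        return np.array([x[0], x[1], x[0] ** 2 + x[1] ** 2])

    w, b = np.array([-2.0, 4.0, 1.0]), 1.0
    center = -w[:2] / (2 * w[2])
    radius = np.sqrt((w[:2] @ w[:2]) / (4 * w[2] ** 2) - b / w[2])

    # Every point on that circle sits on the feature-space hyperplane.
    theta = np.linspace(0, 2 * np.pi, 5)
    circle = np.column_stack([center[0] + radius * np.cos(theta),
                              center[1] + radius * np.sin(theta)])
    for p in circle:
        print(w @ f(p) + b)  # ~0 for every point on the circle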

~~~
jules
Can you give a kernel function that would let you fit a circle with variable
position and radius? The examples always use a fixed position.

~~~
ajj
The Radial Basis Function kernel K(x,y) = exp(-||x-y||^2/gamma) is a very
general kernel function that can get you this.

Whether you actually get such a classifier depends on your data, and as I
mentioned, setting the parameters (like gamma) is not very straightforward.
Typically you have to try various values and see what works for your data.

Train your SVM (or any kernel method) and see what classifier it gives (run
the classifier on data points near the boundary you want to detect to see
where the exact boundary lies). This would only be to see what the
classification function looks like.
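
For completeness, a sketch of that trial-and-error search using scikit-learn
(the library choice, the toy data, and the gamma grid are my own
assumptions; note that scikit-learn's gamma multiplies the squared distance
rather than dividing it):

    import numpy as np
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    # Toy data: the label says whether a point falls inside a circle of
    # radius 1.5 centered at (1, -2), i.e. variable origin and radius.
    rng = np.random.default_rng(0)
    X = rng.uniform(-4, 4, size=(400, 2))
    y = ((X - np.array([1.0, -2.0])) ** 2).sum(axis=1) < 1.5 ** 2

    # Try several gamma values and keep the one that cross-validates best.
    search = GridSearchCV(SVC(kernel="rbf"), {"gamma": [0.01, 0.1, 1, 10]}, cv=5)
    search.fit(X, y)
    print(search.best_params_, search.best_score_)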

