
A Quick Look at Support Vector Machines - irpapakons
https://generalabstractnonsense.com/2017/03/A-quick-look-at-Support-Vector-Machines/
======
syntaxing
One thing I discovered recently that surprised me (while taking the Udacity
SDC) is how effective and resilient these "older" ML algorithms can be.
Neural networks were always my go-to method for most of the classification or
regression problems in my small side projects. But I've now learned that with
the minimal data I have (<5K samples), linear regression, SVMs, or decision
trees are the way to go. I got higher accuracy, and it's about 10x faster in
terms of computation time!

~~~
akyu
Yes, SVMs are still great models. The advantage neural nets have over them is
that they can do automatic feature extraction. By the time you get to the last
layer of a neural net, you are basically just doing simple logistic
classification, but the features coming in have been learned by all of the
previous layers.

I've even seen people use pretrained ImageNet classifiers, chop off the last
layer and use an SVM as the actual classifier, and it works very well for some
problems.
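
Roughly, that setup looks like the sketch below (assuming a recent
torchvision and scikit-learn; the images and labels here are just random
placeholders, not a real dataset):

    import torch
    import torchvision.models as models
    from sklearn.svm import SVC

    # Pretrained ImageNet network with the final classification layer removed,
    # so it outputs 512-dimensional feature vectors instead of class scores.
    backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    backbone.fc = torch.nn.Identity()
    backbone.eval()

    def extract_features(images):
        # images: float tensor of shape (N, 3, 224, 224), already normalized
        with torch.no_grad():
            return backbone(images).numpy()

    # Placeholder data -- in practice these would be your real images/labels.
    train_images = torch.randn(32, 3, 224, 224)
    train_labels = [0, 1] * 16

    # The SVM is the actual classifier, trained on the frozen CNN features.
    clf = SVC(kernel="rbf").fit(extract_features(train_images), train_labels)
    print(clf.predict(extract_features(torch.randn(4, 3, 224, 224))))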

~~~
platz
> automatic feature extraction

Hope you have a *ton* of data, otherwise it's not gonna happen.

~~~
trendia
And a lot of tweaking of configuration parameters until it's "automatic".

~~~
akyu
This was true maybe 10 years ago. Not so much anymore.

------
nafizh
Aaah, I was hoping for an explanation of the kernel trick. I think that is the
hardest concept in support vector machines.

~~~
jwr
I think I can help with that.

The article nicely explains the data transformation so that it becomes
linearly separable. But the trick to the kernel trick is not to transform the
data at all.

What you do is use a learning algorithm that doesn't need individual input
vectors, but instead only needs their dot products. You then imagine a magical
high-dimensional space where your data is (you suppose) linearly separable.
The trick is that you never actually transform your data to that magical space
— you don't need the input vectors, remember? You only need their dot
products. So you define a function that, given two vectors in your normal
input space, returns a scalar. Assuming your function behaves in a sane way
(it has to satisfy Mercer's condition; go read about the required properties
if you need to), you can think of this function
as a dot product. In some kind of magical space — you don't actually care
much. You will never transform your data; it might not even be possible to:
the commonly used Gaussian kernel corresponds to an infinite-dimensional
feature space.
But hey, who cares? You take your SVM, give it your kernel function and input
data, and off it goes, working as usual, except your dot products are no
longer computed in your input space, but in your magical infinite-dimensional
space.

It's both really clever and really simple.
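
To make it concrete, here is a small sketch (assuming scikit-learn and numpy,
on a toy dataset): an SVM trained on a precomputed Gaussian kernel matrix
behaves the same as one that computes the RBF kernel itself, and at no point
do we ever build the infinite-dimensional feature vectors.

    import numpy as np
    from sklearn.datasets import make_circles
    from sklearn.svm import SVC

    # Toy data that is not linearly separable in the input space.
    X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

    gamma = 1.0
    def gaussian_kernel(A, B):
        # K(a, b) = exp(-gamma * ||a - b||^2): a "dot product" in a feature
        # space we never construct explicitly.
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)

    # Built-in RBF kernel vs. the same kernel handed over as a matrix of
    # pairwise "dot products" -- the SVM never sees transformed vectors.
    clf_rbf = SVC(kernel="rbf", gamma=gamma).fit(X, y)
    clf_pre = SVC(kernel="precomputed").fit(gaussian_kernel(X, X), y)

    print(clf_rbf.score(X, y))                      # e.g. 1.0
    print(clf_pre.score(gaussian_kernel(X, X), y))  # same result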

------
shas3
Very cool! However, I think the author should have spent a few more words and
figures distinguishing support vector machines from standard perceptrons.
Maximum margin classification and the definition of 'support vectors', in my
experience, help demystify the algorithm.

------
lallysingh
This is great! Any follow-ups describing kernels?

------
rs86
Amazingly well written. Short and to the point, humbly sharing something cool!

------
curiousgal
Many aspects of Machine Learning boil down to optimization problems.

~~~
rs86
Well, all of them do. In ML we are always trying to select the best
description of a dataset, and that involves minimizing some function that
represents some kind of goodness of fit.
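
As a concrete illustration (a minimal numpy sketch, not from the article): the
linear SVM itself is exactly such an optimization problem, minimizing a hinge
loss plus an L2 penalty on the weights with plain (sub)gradient descent.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)    # labels in {-1, +1}

    # Objective: lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w.x_i + b)))
    w, b, lam, lr, n = np.zeros(2), 0.0, 0.01, 0.1, len(X)
    for _ in range(500):
        margins = y * (X @ w + b)
        viol = margins < 1                             # margin violators
        w -= lr * (lam * w - (y[viol, None] * X[viol]).sum(0) / n)
        b -= lr * (-y[viol].sum() / n)

    print((np.sign(X @ w + b) == y).mean())            # training accuracy ~1.0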

------
LeanderK
well, that really was a quick look. Any reading recommendations about kernel
functions? How do they work, and why are they fast?

------
rmchugh
Best name for a blog ever?

