
Introduction to Support Vector Machines in Machine Learning - rangerranvir
https://ranvir.xyz/blog/svm-support-vector-machines-in-machine-learning/
======
rusty-rust
Large parts of this blog are copied straight from "An Introduction to
Statistical Learning" by Gareth James et al.

------
astrophysician
If you're new to ML or data science, I would recommend working to build a
strong basis in Bayesian statistics. It will help you understand how all of
the "canonical" ML methods relate to one another, and will give you a basis
for building on them.

In particular, aspire to learn probabilistic graphical models and the
libraries used to train them (like Pyro, TensorFlow Probability, Edward,
Stan). They have a steep learning curve, especially if you're new to the
game, but the reward is great.

All of these methods have their place. SVMs have their place, but they aren't
great for probability calibration, and non-linear SVMs, like every kernel
method, can scale terribly. Neural networks have their place, sometimes as a
component of a larger statistical model, sometimes as a feature selector,
sometimes in and of themselves. They're also very often the wrong choice for
a problem.
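
As a concrete illustration of the calibration point: SVMs produce margins,
not probabilities, so a common workaround is Platt scaling. A minimal sketch,
assuming scikit-learn and a synthetic toy dataset (not the author's code):

```python
# LinearSVC exposes decision_function but no predict_proba;
# CalibratedClassifierCV fits a sigmoid (Platt scaling) on held-out folds
# to turn margins into calibrated probabilities.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, random_state=0)

clf = CalibratedClassifierCV(LinearSVC(max_iter=10000),
                             method="sigmoid", cv=3)
clf.fit(X, y)
proba = clf.predict_proba(X[:5])  # rows sum to 1
```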

Don't fall into the beginner trap: people sometimes mistake 'what is the
hottest research topic' for 'what is the right solution to my problem given
my constraints (data limitations, time limitations, skill limitations,
etc.)'. Be realistic, don't use magical thinking, and have a strong basis in
statistics to weed out the beautiful non-bullshit from the bullshit that is
frustratingly prevalent (everyone and their mother is an ML expert today).

EDIT: I want to also clarify: I don't mean to suggest the author is new to ML,
I just mean this as general advice for anyone coming here who is new to DS/ML.
The article looks great!

~~~
tesseract2
Thanks for this insight. Could you also suggest a good book for getting
started with Bayesian statistics? I could really use a recommendation for a
first and a second book on the topic.

About probabilistic graphical models, is there a book other than Daphne
Koller's that you would suggest?

~~~
uoaei
Introduction to Statistical Learning

[https://faculty.marshall.usc.edu/gareth-james/ISL/](https://faculty.marshall.usc.edu/gareth-james/ISL/)

Elements of Statistical Learning

[https://web.stanford.edu/~hastie/ElemStatLearn/](https://web.stanford.edu/~hastie/ElemStatLearn/)

Machine Learning: A Probabilistic Perspective

[https://mitpress.mit.edu/books/machine-learning-1](https://mitpress.mit.edu/books/machine-learning-1)

~~~
kmundnic
"Machine Learning: A Probabilistic Perspective" is more an encyclopedia of
algorithms, I would say, and it has lots of typos. I personally would not
recommend it (except for the sheer number of algorithms it covers, many of
which are not found in other books).

~~~
rangerranvir
Thanks for the early warning. I'll have to keep that in mind.

------
smbrian
Stay away, in my opinion. I spent a year supporting an SVM in a production
machine learning application, and it made me wish the ML research community
hadn't been so in love with them for so long.

They're the perfect blend of theoretically elegant and practically
impractical. Training scales as O(n^3), serialized models are heavyweight,
prediction is slow. They're like Gaussian Processes, except warped and without
any principled way of choosing the kernel function. Applying them to
structured data (mix of categorical & continuous features, missing values) is
difficult. The hyperparameters are non-intuitive and tuning them is a black
art.

GBMs/Random Forests are a better default choice, and far more performant. Even
simpler than that, linear models & generalized linear models are my go-to most
of the time. And if you genuinely need the extra predictiveness, deep learning
seems like better bang for your buck right now. Fast.ai is a good resource if
that's interesting to you.

~~~
Der_Einzige
Choosing a kernel function is simple: are you in a high-dimensional space? If
so, choose a linear kernel. Else, choose the most non-linear one you can
(usually a Gaussian/RBF kernel). I suppose quadratic and the other kernels are
useful if what you're modeling looks like that, but in practice that is rare.

Prediction is not that slow with linear SVMs, especially compared to
something like k-NN. The main hyperparameters that matter are the "C" value
and maybe class weights if you have recall or precision requirements. The C
value is something that should be grid-searched, but you might as well be
grid-searching everything that matters on every ML algorithm, and in this
regard SVMs are fast to iterate over (because the C value is all that
matters).
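
The grid-search-over-C workflow described above might look like this in
scikit-learn; a minimal sketch with illustrative parameter values and a stock
dataset, not a tuning recommendation:

```python
# Grid-search C (and the kernel) for an SVC with 5-fold cross-validation.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

param_grid = {
    "C": [0.01, 0.1, 1, 10, 100],  # log-spaced, the usual starting grid
    "kernel": ["linear", "rbf"],   # linear vs. the common RBF default
}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)
# search.best_params_ / search.best_score_ hold the winning combination
```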

Applying categorical and continuous features is not difficult if you do it in
anything more sophisticated than sklearn. Also, pd.get_dummies() exists
(though it may lead to the slow prediction you're concerned about).
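
For the pd.get_dummies() route, a minimal sketch (the column names and values
are invented for illustration):

```python
# One-hot encode the categorical column while leaving the continuous one
# untouched, producing a purely numeric matrix an SVM can consume.
import pandas as pd

df = pd.DataFrame({
    "age": [25, 32, 47],         # continuous feature, kept as-is
    "city": ["NY", "SF", "NY"],  # categorical feature to expand
})
X = pd.get_dummies(df, columns=["city"])
# 'city' becomes city_NY / city_SF indicator columns
```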

You're most likely right about GBMs and Random Forests, though they can have
all sorts of issues with parallelism if you're not on the right kind of
system. You talk about linear models, but SVMs usually use linear kernels
anyway and are a generalization of linear models (including lasso and ridge
regression models).

~~~
smbrian
Agreed -- text processing is the one area where linear SVMs are a natural
fit. All their attributes complement the domain, and linear SVMs also have
desirable performance characteristics.

But at that point, they also have a lot in common with linear models. Those
also seem practical in that domain (though I have less experience here, tbh).
And performant, when using SGD + feature hashing like e.g. vowpal wabbit.

My beef with non-linear kernels and structured data is a longer discussion,
but I find kernel methods for structured data (which is usually
high-dimensional but low-rank -- lots of shared structure between features,
and shared structure in the missingness of features) to be highly
problematic.

------
bitforger
ITT: Whether SVMs are still relevant in the deep learning era. Some junior
researchers will say neural networks are all you need. Industry folks will
talk about how they still use decision trees.

Personally, I'm quite bullish on the resurgence of SVMs as SOTA. What did it
for me was Mikhail Belkin's talk at IAS.[1]

[1] [https://m.youtube.com/watch?index=15&list=PLdDZb3TwJPZ5dqqg_...](https://m.youtube.com/watch?index=15&list=PLdDZb3TwJPZ5dqqg_S-rgJqSFeH4DQqFQ&v=5-Kqb80h9rk)

~~~
stu2b50
I mean, NNs are still quite bad at low-n tabular data (and they may always
be), which is honestly what a lot of real-life data looks like, so there is
clearly a need for something other than a neural network.

I feel like I've seen more tree ensembles in the wild than SVMs, though.

~~~
rangerranvir
Anyway, the idea behind NNs was to work on data that a human brain couldn't
easily make sense of.

For more general tabular data, models like trees, regression, and even
rule-based models are more realistic.

------
starchild_3001
I've been an ML practitioner since 2009. I've used every method imaginable or
popular, I think, with the exception of non-linear SVMs. Linear SVM => all
good, just the hinge loss optimization. Non-linear SVM => a bit of overkill
with basis expansion. Just too slow, or too complex a model?

My impression: SVMs are of more theoretical than practical interest. Yeah,
learn your statistics. Loss functions. Additive models. Neural nets. Linear
models. Decision trees, kNNs, etc. SVM is more of a special interest, imho.
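
For reference, the hinge loss mentioned above, as a minimal numpy sketch
(labels in {-1, +1}, scores are illustrative):

```python
# Hinge loss: L(y, s) = max(0, 1 - y*s). Zero once the example sits
# outside the margin, linear penalty inside it or on the wrong side.
import numpy as np

def hinge_loss(y, scores):
    return np.maximum(0.0, 1.0 - y * scores)

y = np.array([1, -1, 1])
scores = np.array([2.0, -0.5, 0.3])  # last example violates the margin
losses = hinge_loss(y, scores)       # -> [0.0, 0.5, 0.7]
```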

~~~
rangerranvir
We can definitely learn a thing or two from such an experienced practitioner.
Thanks for sharing; I think your intuition matches that of the other
experienced folks in the comments.

------
zetazzed
Interestingly, a top Kaggler (Ahmet) just posted a nice contest solution with
SVMs for the TReNDS Neuroimaging contest:
[https://www.kaggle.com/aerdem4/rapids-svm-on-trends-neuroima...](https://www.kaggle.com/aerdem4/rapids-svm-on-trends-neuroimaging)

~~~
amrrs
Interestingly, they're also popularizing Nvidia's RAPIDS as part of it.
Thanks for sharing. Never imagined SVMs in such a scenario!

------
rangerranvir
Since I am here, I would like to ask for some feedback about the general
structure of the website and how it feels.

If you have a suggestion on how I can improve the user experience, feel free
to hop in and let me know.

