

LIBSVM - A Library for Support Vector Machines - DanielRibeiro
http://www.csie.ntu.edu.tw/~cjlin/libsvm/

======
ajj
I'm a PhD student working in machine learning, and I very highly recommend
this library. I've used it for all sorts of problems within and outside my
research, and it just works great. I've used it in C++, Python, as well as
Matlab.

Their papers are excellent too if anyone is interested in reading about large-
scale optimization problems for SVM.

------
MLnick
Also worth looking at for linear SVMs:

Sofia-ml is a very fast C++ package for linear SVMs and classification. It
supports PEGASOS as well as logistic regression and learning to rank. It has no
bindings for other languages, which is a bit of a downside. Still, a useful
command-line tool.

<http://code.google.com/p/sofia-ml/>

It also includes a package for very fast mini-batch K-Means
(<http://code.google.com/p/sofia-ml/wiki/SofiaKMeans>). Combining these two
approaches one can effectively learn a "kernelized" model while still being
linear and therefore very fast (at least this is the claim, I haven't tried
this).
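
One way to read the "kernelized but still linear" claim, as a rough Python sketch. The assumptions here are mine and not taken from the sofia-ml docs: each point is mapped to its RBF similarity to a few cluster centers, and a plain linear rule is applied to the mapped features. The centers, `gamma`, weights and bias are hand-picked for illustration; in practice the centers would come from k-means and the weights from a linear learner.

```python
import math

def rbf_features(x, centers, gamma=1.0):
    """Map x to its RBF similarity to each center (hypothetical mapping)."""
    return [
        math.exp(-gamma * sum((xi - ci) ** 2 for xi, ci in zip(x, c)))
        for c in centers
    ]

def linear_predict(features, weights, bias):
    """Ordinary linear decision rule applied in the mapped space."""
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if score > 0 else -1

# Hand-picked stand-ins for learned k-means centers and linear weights.
centers = [(0.0, 1.0), (1.0, 0.0)]
weights, bias = [1.0, 1.0], -0.9

# XOR-style data: not linearly separable in the original 2-D input space.
xor_data = [((0.0, 0.0), -1), ((1.0, 1.0), -1), ((0.0, 1.0), 1), ((1.0, 0.0), 1)]
predictions = [
    linear_predict(rbf_features(x, centers), weights, bias) for x, _ in xor_data
]
```

The model stays linear in the mapped features, so training and prediction cost scale with the number of centers rather than with the number of training points.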

I've used both the SVM and k-means packages and they work very well. For sparse
datasets with >500 dimensions and >10 million rows, file IO time was <15 sec and
training time <3 sec. K-means is slower but still orders of magnitude faster
than standard batch k-means.

Finally, Vowpal Wabbit is a very fast package that also uses stochastic
gradient descent as the workhorse. Also has a nice feature-hashing compression
scheme which is being widely adopted (e.g. in Mahout, and also in sofia-ml
above).

<https://github.com/JohnLangford/vowpal_wabbit/wiki>
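
For anyone who hasn't seen feature hashing, here is a minimal Python sketch of the general idea: tokens are hashed straight into a fixed-size vector, so no dictionary of feature names is needed. This is not the code or the hash function that Vowpal Wabbit, Mahout or sofia-ml use. MD5 is chosen only so the result is deterministic across runs, and `hash_features` and `n_buckets` are made-up names.

```python
import hashlib

def hash_features(tokens, n_buckets=16):
    """Hash tokens into a fixed-length vector (signed hashing trick)."""
    vec = [0.0] * n_buckets
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h % n_buckets                   # bucket chosen by the hash
        sign = 1.0 if (h >> 63) == 0 else -1.0  # top bit picks the sign
        vec[idx] += sign
    return vec

# The same token always lands in the same bucket; different tokens may collide.
vec = hash_features(["svm", "kernel", "svm"])
```

The vector length is fixed up front, which is where the compression comes from. The price is occasional collisions between unrelated features.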

------
suraj
SVMs are awesome for pattern matching. I first encountered them on a project
to identify pedestrians from IR images and was blown away by the simplicity
of the underlying math.

~~~
StavrosK
For anyone curious, it basically boils down to roughly "put your data in a
plane and plot a line through them that has the largest possible margin from
each cluster".

~~~
A1kmm
Except that the straight line is in feature space and not input space; the
computations are done using only a kernel function, which takes vectors in the
input space, and computes the dot product in the feature space.

This is a very important distinction because while the method is linear in the
feature space, it can solve non-linear problems in the input space.
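
A small Python check of that point, using the homogeneous degree-2 polynomial kernel on 2-D inputs as an example (my choice of kernel, not something specific to LIBSVM). The kernel is evaluated using only input-space vectors, yet it equals the dot product under an explicit 3-D feature map `phi`.

```python
import math

def poly2_kernel(x, y):
    """Degree-2 polynomial kernel (x . y)^2, computed in input space."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def phi(x):
    """Explicit feature map for the same kernel: 2-D input -> 3-D feature space."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

x, y = (1.0, 2.0), (3.0, -1.0)
k_input = poly2_kernel(x, y)       # never builds the feature vectors
k_feature = dot(phi(x), phi(y))    # same value via the explicit map
```

The two values agree up to floating-point rounding. For kernels like the RBF kernel the feature space is infinite-dimensional, so the kernel form is the only practical way to compute the dot product.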

~~~
StavrosK
It is, but this addendum provides diminishing returns in terms of usefulness
versus sentence length/parsability...

------
dododo
you might also want to have a look at shogun:

<http://www.shogun-toolbox.org/>

it provides a nice wrapper around libsvm, liblinear, and a whole bunch of
other classification libraries. plus it provides things like HDF5 support,
octave, matlab, python and R bindings, more esoteric kernels (e.g., on
strings) as well as one-class and multi-class SVMs.

------
ashish01
The cookbook style guide for beginners is especially helpful.

<http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf>

Scaling the data is a step that many people miss.
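
A minimal Python sketch of the kind of scaling the guide recommends: linearly rescale each feature to a fixed range such as [-1, 1], and reuse the training set's scaling factors on the test set. The function names and toy data are made up for illustration. LIBSVM ships its own `svm-scale` tool for this.

```python
def fit_scaler(rows, lo=-1.0, hi=1.0):
    """Record per-feature min and max from the training rows."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols], lo, hi

def transform(rows, scaler):
    """Linearly map each feature into [lo, hi] using the stored min/max."""
    mins, maxs, lo, hi = scaler
    out = []
    for row in rows:
        scaled = []
        for v, mn, mx in zip(row, mins, maxs):
            if mx == mn:
                scaled.append(lo)  # constant feature: no range to scale by
            else:
                scaled.append(lo + (hi - lo) * (v - mn) / (mx - mn))
        out.append(scaled)
    return out

train = [[1.0, 1000.0], [2.0, 3000.0], [3.0, 2000.0]]
test = [[2.5, 4000.0]]

scaler = fit_scaler(train)              # fit on training data only
train_scaled = transform(train, scaler)
test_scaled = transform(test, scaler)   # test values may fall outside [lo, hi]
```

Without this, the second feature (in the thousands) would dominate any distance or dot product over the first.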

------
jwr
See also LIBLINEAR:

<http://www.csie.ntu.edu.tw/~cjlin/liblinear/>

(and liblinear-java), which might make more sense if you have lots of data.

All of those are BSD-licensed which means they are actually useful in real
life. Good stuff.

------
SammoJ
I have also used libsvm a lot and can heartily recommend it - but only for
non-linear kernels. If you wish to use a linear SVM (which if you aren't
familiar with machine learning you should probably try first) then for your
own sake try libocas:

<http://cmp.felk.cvut.cz/~xfrancv/ocas/html/>

It uses SVM light format and also has a mex wrapper (MATLAB). More importantly
I found that for linear SVMs it was around 100-1000 times faster than libsvm
(I shit ye not).

------
ma2rten
It might also be worthwhile to have a look at WEKA. It's a Java UI and
implementation of all kinds of machine learning algorithms. It makes it really
easy to just test stuff, because most of the time there is no real way to
tell in advance which machine learning algorithm will work best.

------
mahmud
I can vouch for its Lisp binding.

------
joeroot
Does anyone know of a solid Ruby interface for this? When I tried using it for
a recent project I had a lot of problems getting the gems to work on OS X.
Other than that, I've heard a lot of praise for it...

------
msutherl
For pre-packaged realtime use, I think IRCAM's FTM library for Max/MSP uses
libsvm:

<http://ftm.ircam.fr/>

------
T_S_
Useful, and available in R (via the e1071 package).

