
ThunderSVM: A Fast SVM Library on GPUs and CPUs - lainon
https://github.com/zeyiwen/thundersvm
======
AbacusAvenger
Potentially stupid question: What is an SVM? The linked page uses the acronym
almost 30 times but doesn't define it.

~~~
abhgh
Support Vector Machine. A machine learning technique typically used for
classification and regression, but it has also been adapted to novelty
detection, structured prediction, ranking, etc. I wrote up a beginner's
tutorial some time back, if you are interested [1].

[1] [https://blog.statsbot.co/support-vector-machines-tutorial-c1618e635e93](https://blog.statsbot.co/support-vector-machines-tutorial-c1618e635e93)

~~~
roumenguha
Great read, thanks!

------
bobosha
>"Outside of neural networks, GPUs don’t play a large role in machine learning
today, and much larger gains in speed can often be achieved by a careful
choice of algorithms." [1]

Scikit-learn espouses a non-GPU approach. Perhaps the performance gains from
using GPUs aren't that significant. Has anyone tried SVMs (or, for that
matter, other non-DL classifiers) on GPUs?

[1] [http://scikit-learn.org/stable/faq.html](http://scikit-learn.org/stable/faq.html)

~~~
lz400
Does anyone have more insight into why? If I grid-search hyperparameters for
a random forest with scikit-learn, it takes ages on a single machine. I only
ever see GPUs used with neural networks. Is there some acceleration for
non-NN machine learning algos?
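For the grid-search case specifically, the usual speedup is CPU parallelism rather than a GPU: each hyperparameter combination is an independent fit, so scikit-learn's `GridSearchCV` can farm them out across cores with `n_jobs`. A hedged sketch (toy data and an arbitrary parameter grid):

```python
# Parallel grid search for a random forest: every (params, CV-fold)
# pair is an independent fit, so they parallelize trivially across cores.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, random_state=0)

grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [10, 50], "max_depth": [3, None]},
    n_jobs=-1,  # one worker per CPU core
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_)
```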

~~~
T-A
Generally speaking, what GPUs excel at is applying exactly the same operation
to many parallel data streams. Neural networks are like that.

Random forests are the opposite: branching on conditional tests is very
expensive. It's been several years since I last wrote raw CUDA and OpenCL, but
if memory serves, back then the docs essentially said that every if...else
amounted to running both branches on all the data and then deciding what to
keep, effectively halving performance. So a decision tree just a few levels
deep would slow you down by an order of magnitude.
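The execute-both-branches behavior can be illustrated without CUDA. Plain NumPy's `np.where` works the same way a divergent warp does (my analogy, not the CUDA docs'): both branch expressions are evaluated for every element, and a mask then selects which result survives.

```python
# Both-branches-then-select, the way a divergent GPU warp executes if/else.
import numpy as np

x = np.arange(8, dtype=float)

# x * 2.0 AND x + 100.0 are each computed for all 8 elements before the
# mask (x < 4) picks which result to keep -- twice the arithmetic of a
# scalar if/else. Nesting another conditional doubles the work again.
both_branches = np.where(x < 4, x * 2.0, x + 100.0)
print(both_branches)  # [0. 2. 4. 6. 104. 105. 106. 107.]
```

With k nested conditionals this costs up to 2^k times the work, which is the order-of-magnitude slowdown mentioned above for a tree a few levels deep.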

------
jamilbk
Prerequisites

         - Supported Operating Systems: Linux, Windows and MacOS
         - CUDA 7.5 or above | cmake 2.8 or above | gcc 4.8 or above

s/GPU/nVIDIA/

Is it just me or does anyone else get annoyed when a library claiming to run
on GPUs uses proprietary CUDA, making it nvidia-only?

It would seem strange to claim that a library "runs on CPUs" but only supports
Intel CPUs.

~~~
IanCal
It does run on GPUs, just only some of them.

> It would seem strange to claim that a library "runs on CPUs" but only
> supports Intel CPUs.

Most things would "run on CPUs" but often not run on ARM CPUs.

The key distinction is that it runs on some GPUs rather than (just) some CPUs,
so it seems reasonable that they say that.

~~~
jamilbk
Most things that "run on CPUs" run on CPUs by many vendors. In the cases where
software does not run on ARM processors it's almost always due to an
architectural difference with x86, not an artificial proprietary limitation.

If an open-source library chose to rely on a proprietary C compiler with
special language features (e.g. the Borland C compiler) that targeted only
Intel x86 CPUs, I would argue we would not generally claim this software "runs
on CPUs". The software "runs on Intel x86 CPUs" seems more appropriate.

Similarly, especially in the machine learning field, it seems to be more and
more common that "GPU" really means "CUDA-required GPU". In a world where AMD
and Intel GPUs are also common (perhaps more common?), to me it makes sense to
be up-front about this.

I'm not criticizing the authors' choice to use CUDA here. I'm just saying it
would be nice if we stop pretending CUDA is synonymous with GPU programming,
especially in cases where OpenCL would be a very appropriate choice (e.g.
open-source software).

~~~
IanCal
The important point is that it runs on GPUs at all. If you think saying "it
runs on GPUs" is wrong, the alternative sounds even stranger: that it doesn't
run on GPUs.

They could have been more precise, but there's nothing wrong or particularly
misleading about what they said, and it conveyed the most important aspect.

> would argue we would not generally claim this software "runs on CPUs"

"This software does not run on CPUs" sounds wrong for this scenario.

