Just adding this: for learning purposes, libsvm works just fine, but for most real problems I've run into I've had to hand-code my own SVM libs. All the open-source solutions I've run into use libsvm or some derivation thereof, which does not scale well. It runs on a single CPU even though the algorithm can easily be threaded out (or, even better, run on distributed clusters using Hadoop). It also blows through absurd amounts of memory when the number of support vectors and/or the number of vector features is large.
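The easiest part to thread out is hyperparameter search: every (C, gamma) candidate can be cross-validated independently. Below is a minimal sketch, assuming a caller-supplied score(log2_c, log2_g) function (hypothetical here, e.g. CV accuracy of an RBF SVM trained with C=2**log2_c, gamma=2**log2_g). It scores a log2 grid on all cores, then halves the step and recenters on the best point, a coarse-to-fine refinement. The default center (5, -7) is roughly the middle of grid.py's default ranges (log2 C in [-5, 15], log2 gamma in [-15, 3]); treat it as an assumption.

```python
import math
import os
from concurrent.futures import ThreadPoolExecutor


def grid_points(center_c, center_g, step, half_width):
    """Square grid of (log2 C, log2 gamma) pairs around a center point."""
    offsets = [k * step for k in range(-half_width, half_width + 1)]
    return [(center_c + dc, center_g + dg) for dc in offsets for dg in offsets]


def parallel_grid_search(score, center=(5.0, -7.0), step=4.0, rounds=3,
                         half_width=2, workers=None):
    """Coarse-to-fine search that maximizes score(log2_c, log2_g).

    Each round scores every grid point concurrently, keeps the best point
    seen so far, then halves the step and recenters on that point.
    """
    workers = workers or os.cpu_count() or 1
    best_point, best_score = center, -math.inf
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(rounds):
            points = grid_points(best_point[0], best_point[1], step, half_width)
            # pool.map preserves input order, so results line up with points.
            results = pool.map(lambda p: score(*p), points)
            for point, value in zip(points, results):
                if value > best_score:
                    best_point, best_score = point, value
            step /= 2
    return best_point, best_score


if __name__ == "__main__":
    # Toy score with a known peak at log2 C = 3, log2 gamma = -5.
    print(parallel_grid_search(lambda c, g: -((c - 3) ** 2 + (g + 5) ** 2)))
```

Threads only buy real parallelism if score spends its time in native code that releases the GIL (libsvm's ctypes-based Python bindings should, since ctypes drops the GIL around foreign calls). For a pure-Python scorer, swap in ProcessPoolExecutor with a top-level score function.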

The libsvm train function and some of the associated tools, such as grid.py, do a fairly mediocre job of refining the coefficients (as someone else mentioned) when doing nonlinear SVMs. It gets really tricky to select good coefficients then. Also, using the eigen matrix to cut down the number of vector features to the important ones ends up being a pretty big deal if you end up black-boxing SVMs on massive input data sets.
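One reading of "using the eigen matrix" is PCA: take the top eigenvectors of the feature covariance matrix and project the data onto them before training, so the SVM sees far fewer features. The sketch below is a stdlib-only illustration under that assumption, using power iteration with deflation in place of a full eigendecomposition; in practice you would reach for something like numpy.linalg.eigh. It assumes the leading eigenvalues are distinct; ties slow power iteration down.

```python
import math
import random


def mean_center(rows):
    """Subtract the per-feature mean; return centered rows and the means."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows], means


def covariance(centered):
    """Sample covariance matrix (d x d) of mean-centered rows."""
    n, d = len(centered), len(centered[0])
    cov = [[0.0] * d for _ in range(d)]
    for row in centered:
        for i in range(d):
            for j in range(d):
                cov[i][j] += row[i] * row[j]
    return [[v / (n - 1) for v in r] for r in cov]


def top_components(cov, k, iters=500, seed=0):
    """Top-k eigenvectors/eigenvalues of a symmetric PSD matrix.

    Power iteration finds the dominant eigenvector; deflation subtracts it
    out so the next pass finds the next-largest one.
    """
    rng = random.Random(seed)
    d = len(cov)
    m = [r[:] for r in cov]
    comps, values = [], []
    for _ in range(k):
        # Random positive start; almost surely not orthogonal to the target.
        v = [rng.random() + 0.1 for _ in range(d)]
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:  # matrix is exhausted (k exceeds its rank)
                break
            v = [x / norm for x in w]
        # Rayleigh quotient v^T M v gives the eigenvalue for unit v.
        lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(d)) for i in range(d))
        comps.append(v)
        values.append(lam)
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps, values


def project(rows, means, comps):
    """Map each row onto the kept components (len(comps) features per row)."""
    return [[sum((x - mu) * c for x, mu, c in zip(row, means, comp))
             for comp in comps] for row in rows]
```

Train and predict on project(...) output instead of the raw vectors; the same means and components must be reused for test data.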

If you work in a space where your data set isn't a 50/50 mix of classes, finding coefficients that maximize accuracy and cut misclassification becomes a mess.
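Two common mitigations for skewed classes are (a) weighting the penalty C per class and (b) scoring candidates with a metric that plain accuracy can't game. libsvm's svm-train exposes (a) as -wi weight, which sets C for class i to weight*C. The sketch below computes inverse-frequency weights with the n / (n_classes * count) heuristic (the one scikit-learn uses for class_weight="balanced"), formats them as -w flags, and computes balanced accuracy (mean per-class recall). Function names are illustrative, not part of any library.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: n / (n_classes * count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def libsvm_weight_flags(weights):
    """Format weights as svm-train options, e.g. '-w-1 5 -w1 0.555556'."""
    return " ".join(f"-w{cls} {w:g}" for cls, w in sorted(weights.items()))


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall; not dominated by the majority class."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    labels = [1] * 90 + [-1] * 10
    print(libsvm_weight_flags(balanced_class_weights(labels)))
    # Always predicting the majority class: 90% accuracy, 0.5 balanced accuracy.
    print(balanced_accuracy(labels, [1] * 100))
```

Using balanced accuracy (or per-class F1) as the grid-search score keeps the search from settling on coefficients that simply predict the majority class.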




Maybe you can contribute some of your ideas to libsvm?


Working on it :-) It'll be posted on GitHub. I'm about done with a rewrite in Perl (I just like that language, don't judge me :-P) that threads it out to all cores and cuts memory usage in half for large data sets.