
Waffles: command-line tools for machine learning and data mining - rkda
http://waffles.sourceforge.net/
======
megaframe
This is nice, as are libsvm and a few commercial products I've used (JMP), but
with some fairly non-trivial data sets at work, I've found myself implementing
most of these methods by hand to get the results I want. I'm not in an internet
data mining field; most of my data comes from semiconductors. So do these tools
actually work well (as in 85%+ accuracy) on those types of data sets?

~~~
bravura
This is highly problem dependent. libsvm will give some of the best results
off-the-shelf, assuming that your input features are sane. If not, you might
need to do something more sophisticated. You can email me (see profile) if you
want more information.

------
zeratul
At first glance: this could be a complementary tool to WEKA. It has some
features that WEKA does not have. To name one: non-linear feature selection
(e.g., Manifold Sculpting). It has extensive visualization libraries (2D and
3D). It can store a sparse representation of data, which is a huge memory saver
for text mining (NLP). My biggest complaint is that I don't see how this tool
could do feature selection inside the cross-validation loop. It seems that the
authors are unaware that feature selection on the whole data set is prone to
overfitting.
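
To illustrate the cross-validation point above, here is a minimal sketch in plain Python. It is not Waffles or WEKA code; every name in it (`select_top_features`, `predict_nearest_centroid`, `cv_accuracy`) is hypothetical, and the mean-difference ranking and nearest-centroid classifier are stand-ins picked only to keep the example short. The data is pure noise with arbitrary labels, so any accuracy well above chance comes from the evaluation protocol, not from real signal.

```python
import random
import statistics


def select_top_features(X, y, k):
    """Return indices of the k features with the largest gap between class means."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(statistics.fmean(pos) - statistics.fmean(neg)))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def predict_nearest_centroid(X_train, y_train, X_test, features):
    """Classify each test row by the closer class centroid, using only `features`."""
    centroids = {}
    for label in (0, 1):
        rows = [row for row, lab in zip(X_train, y_train) if lab == label]
        centroids[label] = [statistics.fmean([r[j] for r in rows]) for j in features]
    predictions = []
    for row in X_test:
        dist = {
            label: sum((row[j] - c) ** 2 for j, c in zip(features, centroid))
            for label, centroid in centroids.items()
        }
        predictions.append(min(dist, key=dist.get))
    return predictions


def cv_accuracy(X, y, k_features, n_folds, select_inside):
    """k-fold accuracy; select_inside=False reproduces the leaky protocol."""
    n = len(X)
    # Leaky variant: features are chosen once, with every label visible.
    global_features = select_top_features(X, y, k_features)
    correct = 0
    for fold in range(n_folds):
        test_idx = list(range(fold, n, n_folds))
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        if select_inside:
            # Honest variant: selection sees the training fold only.
            features = select_top_features(X_train, y_train, k_features)
        else:
            features = global_features
        X_test = [X[i] for i in test_idx]
        preds = predict_nearest_centroid(X_train, y_train, X_test, features)
        correct += sum(p == y[i] for p, i in zip(preds, test_idx))
    return correct / n


# Assumed toy setup: 40 samples, 1000 noise features, alternating labels.
rng = random.Random(0)
n_samples, n_features = 40, 1000
X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
y = [i % 2 for i in range(n_samples)]

leaky_acc = cv_accuracy(X, y, k_features=10, n_folds=5, select_inside=False)
honest_acc = cv_accuracy(X, y, k_features=10, n_folds=5, select_inside=True)
print(f"selection on all data:     {leaky_acc:.2f}")
print(f"selection inside CV folds: {honest_acc:.2f}")
```

Because the labels carry no information, the estimate with selection inside the folds should hover around chance, while the estimate with selection on the whole data set will typically look much better than it deserves to.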

Note to self: It has mean margin trees but no SVM? Thread safe? Portable to R?
C++ codebase. Why SHA?

------
nwmcsween
'Do one thing and do it well' doesn't necessarily mean command-line programs.
I would much prefer a library instead of a binary, with bindings to various
other languages through that library; the command line is a very sloppy
integration tool compared to programming languages. Shogun-toolbox currently
fits what I want except for the licensing.

~~~
obtu
From the link:

> Waffles apps are thin wrappers around functionality in a well-documented C++
> class library.

~~~
nwmcsween
Thank you, didn't see that.

------
tectonic
This looks really useful. If anyone has used it, can you contrast it with
Weka?

~~~
marshallp
This is much nicer than Weka in my opinion. Less bloat (not Java, and doesn't
have the full range of algorithms), more emphasis on difficult (non-linear)
problems and on practicality (random forests, neural nets, and automating
everything). The main creator is working on practical computer vision, not
directly on machine learning research.

Weka is more suited for teaching than practical work.

~~~
tensor
There is a lot that bothers me about this post. First, a library containing
many different algorithms in a single framework is not a negative, nor is it
_bloat_. Second, there is nothing about random forests or neural nets that
makes them any more practical than maximum entropy or any other learning
algorithm.

The main researchers are both doing university research work, not building
products. I'm not sure where the practical/teaching arguments come in; Weka is
used in many different scenarios, including commercial systems.

A brief look at the documentation suggests that this package is not nearly as
extensive as Weka. Startups might care about the licence being LGPL instead of
GPL. I can't comment on convenience and performance without using it a bit,
but I've found that other command line driven packages are very easy to use
for exploratory type research work.

------
keenerd
AUR package: <https://aur.archlinux.org/packages.php?ID=55333>

------
mylons
Any potential bioinformatics use for this?

