

Milk: Machine Learning Toolkit for Python - adulau
http://luispedro.org/software/milk

======
luispedro
A new version was just released:
<http://freshmeat.net/projects/milk/releases/329513>

(I'm the author of this package).

~~~
bravura
Could you please comment on why a Python machine learning developer should use
your code and not, say, Shogun or scikits.learn? In what use cases would your
code be preferable, or dispreferred?

How does your k-means implementation compare to that of scipy's?

Why not push your code as modules into scikits.learn? Their library is
designed to be many loosely coupled components.

~~~
luispedro
Why would you use my code instead of others? None of the packages covers all
of machine learning, so it depends on what you're looking for. I focus mainly
on supervised learning and k-means. I also want my algorithms to be as
scalable as possible.

Other projects have other priorities/functionality.

I like my interfaces better too, but I might be biased by being so much more
familiar with them.

"""How does your k-means implementation compare to that of scipy's?"""

I think that my code is faster: <http://bit.ly/e8VOXy> and it is probably more
scalable.

Scikits.learn has more functionality in certain areas, so if you need those,
use it. I started milk when there was no scikits.learn. The interfaces are
different and they work together here.

~~~
sp332
That's <http://packages.python.org/milk/benchmarks.html>, if anyone was
wondering.

------
peregrine
I've gotten so used to GitHub-style pages that I expect to scroll down and see
some examples, documentation and links. I actually found myself frustrated
when I scrolled down to find nothing but a copyright notice. Even this site's
GitHub only had a changelog. Not a knock against the site, I just don't see
much in the way of selling me on the library.

~~~
srean
Here you go: <http://packages.python.org/milk/>. The link was right there at
the bottom of the GitHub page. Some of the links from the docs to the source
code do not work, though, but one can always browse the GitHub repository
directly.

@luispedro I did not see your reply, hence the duplication. I see that Milk
isn't doing too well on PCA (11 times slower than the best performer). From
the code it seems you are importing numpy.linalg. I think if you import
scipy.linalg your results will be better without any change to the algorithm
(unless you have built numpy with ATLAS or MKL).

Scipy.linalg links with the underlying BLAS if it's available, whereas the
standard build of numpy implements linalg on its own (but it is possible to
override that). Second point: I think for large data sets you are better off
providing an API for computing a user-specified number of principal components
rather than all of them. Thirdly, if you don't find limiting yourself to gcc
too restrictive, you can stick with STL-style algorithms in place of C++
loops. The advantage is that you will get parallelism for free:
<http://gcc.gnu.org/onlinedocs/libstdc++/manual/parallel_mode.html>
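
To make the second suggestion concrete, here is a minimal sketch of computing only the k leading principal components with scipy.linalg. It is not milk's API: the function name `top_k_pca` and its signature are hypothetical, and it assumes SciPy 1.5 or later, where `eigh` accepts `subset_by_index` (older releases used an `eigvals` argument for the same purpose).

```python
import numpy as np
from scipy import linalg  # scipy.linalg rather than numpy.linalg, as suggested


def top_k_pca(X, k):
    """Hypothetical helper: k leading principal components of X (rows = samples)."""
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    if not 1 <= k <= n_features:
        raise ValueError("k must be between 1 and the number of features")
    Xc = X - X.mean(axis=0)            # center each feature
    cov = Xc.T @ Xc / (n_samples - 1)  # sample covariance matrix
    # eigh sorts eigenvalues in ascending order, so the k largest are the
    # last k indices; request only those instead of the full spectrum.
    evals, evecs = linalg.eigh(
        cov, subset_by_index=[n_features - k, n_features - 1]
    )
    order = np.argsort(evals)[::-1]    # reorder to largest-first
    return evals[order], evecs[:, order]


# Usage on synthetic data: 500 samples, 20 features with decreasing scale.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20)) * np.linspace(5.0, 0.5, 20)
variances, components = top_k_pca(X, 3)
projected = (X - X.mean(axis=0)) @ components  # shape (500, 3)
```

This still builds the full covariance matrix, so it mainly helps when the number of features is moderate and only a few components are needed.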

~~~
luispedro
Yeah, I think we were writing at the same time.

Thanks for the tips. I know that the PCA implementation is very naïve, but I
never put as much work into it as into some of the other parts of the library.

------
zapdos
How good would this be for computer vision applications?

------
noverloop
What is the advantage of this library over MDP?
(<http://mdp-toolkit.sourceforge.net/index.html>)

~~~
wladimir
Or,

- Pybrain (<http://pybrain.org/>)

- Theano/Pylearn (<http://deeplearning.net/software/pylearn/> /
<http://deeplearning.net/software/theano/>)

- Orange (<http://orange.biolab.si/>)

- Pyevolve (<http://pyevolve.sourceforge.net/>)

There are many machine learning libraries for Python these days, all subtly
different in which methods they support. I guess that's good :)

