Milk: Machine Learning Toolkit for Python (luispedro.org)
74 points by adulau on March 15, 2011 | 12 comments



I've gotten so used to Github style pages that I expect to scroll down and see some examples, documentation and links. I actually found myself frustrated when I scrolled down to find nothing but a copyright notice. Even this site's github only had a changelog. Not a knock against the site, I just don't see much in the way of selling me on the library.


Here you go: http://packages.python.org/milk/ The link was right there at the bottom of the GitHub page. Some of the links from the docs to the source code do not work, but one can always browse the GitHub repository directly.

@luispedro I did not see your reply, hence the duplication. I see that Milk isn't doing too well on PCA (11 times slower than the best performer). From the code, it seems you are importing numpy.linalg. I think if you import scipy.linalg instead, your results will be better without any change to the algorithm (unless you have built numpy with ATLAS or MKL).

Scipy.linalg links with the underlying BLAS if it's available, whereas the standard build of numpy implements linalg on its own (though it is possible to override that). Second point: I think for large data sets you are better off providing an API for computing a user-specified number of principal components rather than all of them. Thirdly, if you don't find limiting yourself to gcc too restrictive, you can stick with STL-style algorithms in place of C++ loops. The advantage is that you will get parallelism for free: http://gcc.gnu.org/onlinedocs/libstdc++/manual/parallel_mode...
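
To make the second point concrete, here is a minimal sketch of computing only the leading k principal components with power iteration, instead of a full eigendecomposition. This is not milk's implementation: the function name `top_k_components` and its parameters are hypothetical, and it is written in pure Python only so that it has no dependencies. Real code would more likely call a BLAS/LAPACK-backed routine.

```python
import math
import random


def top_k_components(rows, k, iters=500, seed=0):
    """Return (components, variances) for the k leading principal directions.

    Hypothetical helper for illustration; `rows` is a list of equal-length
    lists of floats. Uses power iteration, orthogonalizing against the
    components already found, so only k directions are ever computed.
    """
    n, d = len(rows), len(rows[0])
    # Center each column on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]

    def cov_times(v):
        return [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]

    rng = random.Random(seed)
    components, variances = [], []
    for _ in range(k):
        # Random unit-length starting vector.
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(a * a for a in v))
        v = [a / norm for a in v]
        for _ in range(iters):
            w = cov_times(v)
            # Remove the part of w lying along components found so far.
            for u in components:
                dot = sum(a * b for a, b in zip(w, u))
                w = [a - dot * b for a, b in zip(w, u)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm < 1e-15:
                break  # no variance left in the remaining subspace
            v = [a / norm for a in w]
        # The Rayleigh quotient gives the variance along v.
        variances.append(sum(a * b for a, b in zip(v, cov_times(v))))
        components.append(v)
    return components, variances
```

Calling `top_k_components(data, 2)` on a data set with many features would return just two directions and their variances. The per-iteration cost is dominated by one matrix-vector product, which is why asking for a few components can be much cheaper than computing all of them.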


Yeah, I think we were writing at the same time.

Thanks for the tips. I know that the PCA implementation is very naïve, but I never put as much work into it as into some of the other parts of the library.


You could have complained to me. I put up some simple examples in the github:

https://github.com/luispedro/milk

The main documentation for the package is actually at:

http://packages.python.org/milk/


Sorry, I wasn't really complaining about your example, just in general. I've become trained by GitHub to expect a certain thing. Keep on making cool stuff!


A new version was just released: http://freshmeat.net/projects/milk/releases/329513

(I'm the author of this package).


Could you please comment on why a Python machine learning developer should use your code and not, say, Shogun or scikits.learn? In what use cases would your code be preferable, or dispreferred?

How does your k-means implementation compare to that of scipy's?

Why not push your code as modules into scikits.learn? Their library is designed to be many loosely coupled components.


Why would you use my code instead of others? None of the packages covers all of machine learning, so it depends on what you're looking for. I focus mainly on supervised learning and k-means. I want my algorithms to be as scalable as possible too.

Other projects have other priorities/functionality.

I like my interfaces better too, but I might be biased by being so much more familiar with them.

"""How does your k-means implementation compare to that of scipy's?"""

I think that my code is faster: http://bit.ly/e8VOXy and it is probably more scalable.

Scikits.learn has more functionality in certain aspects. So, if you need those, use it. I started milk when there was no scikits.learn. The interfaces are different and they work together here.


That's http://packages.python.org/milk/benchmarks.html if anyone was wondering


How good would this be for computer vision applications?


What is the advantage of this library over MDP? (http://mdp-toolkit.sourceforge.net/index.html)


Or,

- Pybrain (http://pybrain.org/)

- Theano/Pylearn (http://deeplearning.net/software/pylearn/ / http://deeplearning.net/software/theano/)

- Orange (http://orange.biolab.si/)

- Pyevolve (http://pyevolve.sourceforge.net/)

There are many machine learning libraries for Python these days, all subtly different in which methods they support. I guess that's good :)



