

Ask HN: How-to Implement Machine Learning? - physcab

Ok, here's the deal.  I'm taking a machine learning class using primarily Bishop's text. Awesome book. Awesome class.

How do you all suggest I implement the material though?  In class we use MATLAB, but I'm slightly hesitant because I want to create some cool web applications.

O'Reilly's book Collective Intelligence uses Python, which I don't know, but I would learn it if it would be worth it.

Would I also have to learn Django?

Any insight would be greatly appreciated.
======
imusicmash
I think Programming Collective Intelligence is going to go down as one of
the most important and influential books of the decade.

I'm a product manager, but was a Java & Perl programmer previously. I didn't
know Python before I read this book. But the concepts are presented so
clearly, the examples use web APIs, and the spark of creativity each chapter
brings is so great, that I could not resist the urge to fire up IPython on my
Mac and play with some of the algorithms.

The book's examples are all command-line driven. But I've also converted a few
of the examples into web projects to help me explore the possibilities more
efficiently and demo/prototype some things for my company. Python works
perfectly well as a CGI web application and you don't need the overhead of
Django for learning and taking examples from this book to the next level.
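
A minimal sketch of the plain-CGI approach, assuming a web server already configured to run Python scripts as CGI. The `values` query parameter and the `build_response` helper are hypothetical names chosen for illustration; they are not taken from the book. The script needs only the standard library:

```python
#!/usr/bin/env python3
"""Tiny CGI script: /mean.py?values=1,2,3 replies with the mean of the numbers."""
import os
import sys
from urllib.parse import parse_qs


def build_response(query_string):
    """Return the complete CGI response (headers, blank line, body) as a string."""
    params = parse_qs(query_string)
    raw = params.get("values", [""])[0]  # hypothetical parameter name
    try:
        values = [float(v) for v in raw.split(",") if v.strip()]
    except ValueError:
        return ("Status: 400 Bad Request\r\n"
                "Content-Type: text/plain\r\n\r\n"
                "values must be numbers\n")
    if not values:
        return ("Status: 400 Bad Request\r\n"
                "Content-Type: text/plain\r\n\r\n"
                "no values given\n")
    mean = sum(values) / len(values)
    return "Content-Type: text/plain\r\n\r\nmean = {:.3f}\n".format(mean)


if __name__ == "__main__":
    # The web server passes the query string through the environment.
    sys.stdout.write(build_response(os.environ.get("QUERY_STRING", "")))
```

Swapping the mean for a call into one of the book's algorithms would follow the same pattern: parse the request, compute, print headers and a body.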

I've since read several data mining and machine learning books, and must say
that none come close to the breadth, programming detail, and hands-on ease
presented in this book.

The book has created a rich opportunity to learn and explore concepts such as
collaborative filtering, clustering, optimization, decision trees, and text
mining with Bayesian classification.
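
As a rough illustration of the first of those concepts, here is a sketch of user-based collaborative filtering with a Pearson similarity score. The ratings are invented for the example and the function names are hypothetical; this is one common formulation, not necessarily the book's exact code:

```python
from math import sqrt

# Hypothetical ratings, invented for illustration: user -> {item: score}.
ratings = {
    "ann": {"A": 5.0, "B": 3.0, "C": 4.0},
    "bob": {"A": 4.0, "B": 2.0, "C": 3.0, "D": 5.0},
    "cat": {"A": 1.0, "B": 5.0, "D": 2.0},
}


def pearson(prefs, u, v):
    """Pearson correlation of two users over the items both have rated."""
    shared = [item for item in prefs[u] if item in prefs[v]]
    n = len(shared)
    if n < 2:
        return 0.0  # too little overlap to say anything
    xs = [prefs[u][i] for i in shared]
    ys = [prefs[v][i] for i in shared]
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a user who rates everything the same has no correlation
    return cov / sqrt(vx * vy)


def most_similar(prefs, user):
    """Other users ranked from most to least similar to `user`."""
    scored = [(pearson(prefs, user, other), other)
              for other in prefs if other != user]
    return sorted(scored, reverse=True)
```

Calling `most_similar(ratings, "ann")` ranks the other users by how closely their tastes track ann's; items rated highly by the top matches are natural recommendation candidates.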

Python is easy to read and learn.

------
aschobel
Mahout looks like a really interesting project which has a bunch of Java
machine learning algorithm implementations built on top of Hadoop.

<http://lucene.apache.org/mahout/>

You can glean some good ideas from there. Hadoop has a bunch of Python
libraries if you want to go that route and write it yourself.

" In class we use MATLAB, but I'm slightly hesitant because I want to create
some cool web applications."

It shouldn't really matter, just write a Thrift service and you can call MATLAB
from whatever your favorite language is.

~~~
physcab
I'll look into those suggestions. I always thought for some reason that MATLAB
was kind of a black box calculator, where you pump in some commands and
magically numbers appear. It's a good computational resource, especially for
engineers, but I never learned how to implement it for anything.

------
aneesh
Machine learning is not a language-specific thing. Maybe Python makes it easy
out of the box (I haven't used Python -- I don't know), but you can do it in
whatever language you want.

Many languages have modules you can just use. And of course you could just
write your own native implementation of an algorithm in your language of choice.

Actually, nowadays some of the databases come with data mining algorithms
built-in -- I know Oracle and SQL Server have this functionality.

So basically, lots of options.

------
RiderOfGiraffes
Learn Python.

~~~
lux
Agreed. If you can program in anything else, you can pick up enough Python to
get started in a day (obviously longer to be really productive).

I've read the O'Reilly Collective Intelligence book, and Python was a nice fit
for expressing the ideas in there. That book would definitely get you started
with practical uses of ML quickly. I'm not sure any of it is that cutting edge
though in the academic sense.

Also just finished On Intelligence by Jeff Hawkins last week. Good read. I
plan to check out his company's ML ideas when I get some free time - they have
free software for download too:

<http://www.numenta.com/for-developers/software.php>

~~~
physcab
Great. Definitely bookmarked. Looks like Python would be a useful tool in the
toolbox!

------
anuraggoel
<http://pyml.sourceforge.net/>

------
rm999
SciPy - it can be used like MATLAB, but it's Python.

At the very least, make sure your solution can easily and efficiently work
with linear algebra, a plotting tool, and some basic statistics functions.
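
A quick smoke test of two of those requirements, assuming NumPy (the array library SciPy builds on) is installed. The data is synthetic and invented for illustration; plotting is left out, though matplotlib is the usual companion for it:

```python
import numpy as np

# Synthetic data, invented for illustration: y = 2x + 1 exactly.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Linear algebra: least-squares fit of y ~ slope * x + intercept.
A = np.column_stack([x, np.ones_like(x)])
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)

# Basic statistics: mean and sample standard deviation.
mean_y = y.mean()
std_y = y.std(ddof=1)

print(slope, intercept, mean_y, std_y)
```

If a candidate environment makes these few lines awkward to write or slow to run, that is a hint it will be painful for the material in a Bishop-style course.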

------
critic
If you are looking for a quick hack, i.e. getting existing code to work on a
free platform, then try Octave. Its source-level compatibility with Matlab is
fairly good.

------
MaysonL
Python + Sage

