
Machine learning toolkit - DanielRibeiro
http://mallet.cs.umass.edu/
======
gtani
Others:

<http://alias-i.com/lingpipe/index.html>

<http://gate.ac.uk/>

<http://rapid-i.com/content/view/181/190/>

<http://elefant.developer.nicta.com.au/>

(tanagra, weka, orange, depending on what you're looking for)

------
apatry
I am currently using this toolkit and I must say that I really like it.

The main advantages of Mallet over Weka (the main Java toolkit used in
academic machine learning) for Natural Language Processing are:

\- No need to map words and features to positions in a feature vector yourself.

\- Instance preprocessing can be defined in pipes that can be saved along with
the models, so there is no need to remember how the data was preprocessed for
each experiment.

\- It contains algorithms for structured learning (CRF, HMM and general
graphical models).
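
To make the first two points concrete, here is a rough sketch in plain Python.
It is not Mallet's actual API: the `Pipeline` class, the `STEPS` table and the
step names are all made up for illustration. It only shows the general idea of
growing the word-to-index mapping automatically and saving the preprocessing
steps together with that mapping.

```python
import json
from collections import Counter

# Hypothetical registry of named preprocessing steps (not part of Mallet).
STEPS = {
    "lowercase": str.lower,
    "tokenize": str.split,
}


class Pipeline:
    """Preprocessing steps plus the word -> index mapping, kept together."""

    def __init__(self, step_names, alphabet=None):
        self.step_names = list(step_names)
        self.alphabet = dict(alphabet or {})

    def featurize(self, text, grow=True):
        """Turn raw text into a sparse {feature index: count} vector."""
        data = text
        for name in self.step_names:
            data = STEPS[name](data)
        counts = Counter()
        for token in data:
            if token not in self.alphabet:
                if not grow:
                    continue  # unseen word at test time: skip it
                self.alphabet[token] = len(self.alphabet)
            counts[self.alphabet[token]] += 1
        return dict(counts)

    def to_json(self):
        """Serialize the steps and the mapping so they can travel with a model."""
        return json.dumps({"steps": self.step_names, "alphabet": self.alphabet})

    @classmethod
    def from_json(cls, payload):
        state = json.loads(payload)
        return cls(state["steps"], state["alphabet"])


pipeline = Pipeline(["lowercase", "tokenize"])
train_vector = pipeline.featurize("The cat saw the dog")

# Later, e.g. in another experiment: reload and reuse the exact same preprocessing.
restored = Pipeline.from_json(pipeline.to_json())
test_vector = restored.featurize("the DOG barked", grow=False)
```

The caller never assigns a feature index by hand, and the reloaded pipeline
applies the same steps and the same mapping to new text.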

On the other hand, Mallet implements fewer algorithms (e.g. no Support Vector
Machines, to my knowledge).

In short, it is a nice toolkit to be aware of if you are planning to do
Natural Language Processing.

~~~
tensor
For anyone wanting to use it in a commercial setting, it's worth noting that
weka is GPL and mallet is CPL.

<http://en.wikipedia.org/wiki/Common_Public_License>

------
Rickasaurus
I worked with this a bit at UMass; it's not bad at all. Also from UMass, be
sure to check out Factorie, the probabilistic factor graph framework in Scala.

<http://code.google.com/p/factorie/>

------
shortlived
Can anyone recommend a good intro to machine learning and NLP?

~~~
JamieEi
Pattern Recognition and Machine Learning, Bishop
<http://www.amazon.com/Pattern-Recognition-Learning-Information-Statistics/dp/0387310738/ref=sr_1_4?ie=UTF8&qid=1296749784&sr=8-4>

~~~
SeanDav
Seems like a good book - bit surprised that it doesn't seem to cover Genetic
Algorithms.

~~~
levesque
Genetic Algorithms are rarely covered in (recent) Machine Learning books. They
are more often treated as optimization algorithms.

------
mahmud
Machine learning toolkits are the new "web framework".

~~~
jimbokun
In what sense?

~~~
mahmud
Their proliferation as half-assed projects that all accomplish the same thing,
but differ in their all-too-similar implementation languages.

I don't mean this _particular_ project, but generally, you can't be in the
field without seeing ten frameworks a day.

~~~
tastybites
The primary difference is that web frameworks are much better documented,
with real-world examples, because they are written by people in industry trying
to make their real jobs easier, not by university grad programs. Examples
written by users never make it onto the search engines, if they are released
at all (see below).

Most ML frameworks out there are stuck in academic-land and assume the users
are experts, when the exact opposite is usually true. They actually attempt
to use the _most opaque language possible_ when describing usage.

ML is still a consulting gold mine because it's so difficult to wade past the
jargon and bullshit to actually do something useful/profitable with these
frameworks.

