

Machine Learning frameworks, libraries and software - misiti3780
https://github.com/josephmisiti/awesome-machine-learning

======
alsocasey
The R portion needs some serious filling - for starters: you include the Julia
wrapper to glmnet, which was originally implemented in R.

glmnet - lasso/ridge/elastic net glm models.

e1071 - SVM classifiers.

randomForest - random forest classifiers.

mixOmics - a good collection of component-based approaches (PCA, ICA, PLS,
etc.), including sparse variants of all of the above if feature selection is
required.

caret - similar to Java's Weka.
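
For context on the glmnet entry above: lasso, ridge and elastic net differ only in the penalty added to the model's loss. A minimal Python sketch of that penalty, using the parameterization from the glmnet documentation (`lam` is the overall strength, `alpha` mixes L1 and L2). The function name `elastic_net_penalty` is made up for illustration and is not part of glmnet or any library:

```python
def elastic_net_penalty(beta, lam, alpha):
    """Elastic net penalty: lam * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2).

    alpha = 1.0 gives the lasso penalty, alpha = 0.0 gives the ridge penalty,
    and anything in between is a blend of the two.
    Hypothetical helper for illustration only.
    """
    l1 = sum(abs(b) for b in beta)    # L1 norm of the coefficients
    l2_sq = sum(b * b for b in beta)  # squared L2 norm of the coefficients
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_sq)


if __name__ == "__main__":
    coefs = [1.0, -2.0]
    print("lasso:      ", elastic_net_penalty(coefs, lam=1.0, alpha=1.0))
    print("ridge:      ", elastic_net_penalty(coefs, lam=1.0, alpha=0.0))
    print("elastic net:", elastic_net_penalty(coefs, lam=1.0, alpha=0.5))
```

The L1 term is what pushes coefficients to exactly zero, which is why the lasso end of the range doubles as feature selection.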

~~~
mbq
Or simply a link to the ML CRAN Task View:
[http://cran.r-project.org/web/views/MachineLearning.html](http://cran.r-project.org/web/views/MachineLearning.html)

------
tomaskazemekas
I suggest adding Torch: [http://torch.ch/](http://torch.ch/). It is a
scientific ML framework written in LuaJIT. It was recently recommended by Yann
LeCun, Director of AI at Facebook.
[http://www.reddit.com/r/MachineLearning/comments/25lnbt/ama_...](http://www.reddit.com/r/MachineLearning/comments/25lnbt/ama_yann_lecun/chisdw1)

------
tbueno
Why do people keep complaining about the content or giving suggestions here?
I'm pretty sure the original list was created on Github exactly to encourage
contributions (pull requests).

------
captainmuon
Some more for C++:

\- TMVA (Toolkit for Multivariate Analysis): Widely used in physics, esp.
particle physics. Has every classifier you can think of and the kitchen sink:
neural nets, BDTs, support vector machines, Fisher discriminants, etc. You
can use it for parameter estimation, classification, discrimination and other
use cases. It is closely integrated with the ROOT framework, which has a few
quirks and gives it a bit of a learning curve, but once you get into it, it's
very easy to set up a multivariate analysis. Also has bindings for Python. -
[http://tmva.sourceforge.net/](http://tmva.sourceforge.net/)

\- NeuroBayes: Heard some good things about it, but haven't tested it. Used in
finance and particle physics. Commercial, but they have special licenses for
research. I heard integration with TMVA is planned. -
[http://neurobayes.phi-t.de/index.php/public-information](http://neurobayes.phi-t.de/index.php/public-information)

~~~
sroecker
There is a NeuroBayes plugin for TMVA. I've put it on Github with some
additional patches:
[https://github.com/sroecker/tmva-neurobayes](https://github.com/sroecker/tmva-neurobayes)

------
jwr
A nice list! But this would be significantly more useful if it included
project licensing information.

In my case, any library licensed under the GPL is automatically excluded from
consideration, so this is a significant factor. I'd rather not spend any time
on those.

~~~
Radim
How do you feel about LGPL?

~~~
jwr
LGPL is a borderline case. On one hand, it doesn't force itself onto all of
your software. On the other hand, it contains the same patent claim landmine
that the GPL does and that landmine is considered to be dangerous by many
lawyers (GPLv2 Section 7, LGPLv2 Section 11).

So, in practical terms, it depends on who my current client/employer/investor
is. Myself, I'd rather not use any LGPLd libraries.

~~~
jwr
I just wanted to point out that downvoting what you disagree with is… well,
let's just say it's not the right way to use HN.

My comment was precise, informative, in reply to a question that was asked of
me, based on a number of legal opinions and more years of experience than many
people here write into the "age" field on forms.

------
pickle27
[http://www.shogun-toolbox.org/](http://www.shogun-toolbox.org/) has a lot of
algorithms, is super fast and has bindings for many languages, check it out!

~~~
misiti3780
added!

------
CSDude
Weka should be there; it includes so many useful tools. I used the Weka API
for a project in a graduate machine learning course and it saved me a lot of
time. Highly recommended.

~~~
misiti3780
added

------
penetrarthur
A well-maintained library for C# lovers:
[http://accord-framework.net/](http://accord-framework.net/)

------
S4M
There is Incanter for Clojure that is missing.

[http://incanter.org/](http://incanter.org/)

------
silentrob
JavaScript is missing a few NLP tools: POS -
[https://github.com/dariusk/pos-js](https://github.com/dariusk/pos-js) and Node Natural -
[https://github.com/NaturalNode/natural](https://github.com/NaturalNode/natural)

~~~
misiti3780
added

------
micro_cam
I have a pretty good decision tree/random forest library for Go:

[https://github.com/ryanbressler/CloudForest](https://github.com/ryanbressler/CloudForest)

------
orasis
For Javascript Bayesian Bandits/Thompson Sampling -
[https://github.com/omphalos/bayesian-bandit.js](https://github.com/omphalos/bayesian-bandit.js)
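
For anyone unfamiliar with Thompson sampling, here is a minimal Python sketch of the Beta-Bernoulli case. It is not the API of bayesian-bandit.js; the names `BetaBernoulliArm`, `choose_arm` and `run` are hypothetical, and a uniform Beta(1, 1) prior on each arm is assumed:

```python
import random


class BetaBernoulliArm:
    """One arm whose unknown reward rate has a Beta(wins + 1, losses + 1) posterior."""

    def __init__(self):
        self.wins = 0
        self.losses = 0

    def sample(self, rng):
        # Draw one plausible reward rate from the current posterior.
        return rng.betavariate(self.wins + 1, self.losses + 1)

    def update(self, reward):
        # Record a binary outcome (truthy = success).
        if reward:
            self.wins += 1
        else:
            self.losses += 1


def choose_arm(arms, rng):
    """Thompson sampling step: sample every posterior, play the largest draw."""
    samples = [arm.sample(rng) for arm in arms]
    return max(range(len(arms)), key=samples.__getitem__)


def run(true_rates, rounds, seed=0):
    """Simulate a bandit with the given (assumed) true success rates."""
    rng = random.Random(seed)
    arms = [BetaBernoulliArm() for _ in true_rates]
    for _ in range(rounds):
        i = choose_arm(arms, rng)
        arms[i].update(rng.random() < true_rates[i])
    return arms


if __name__ == "__main__":
    for idx, arm in enumerate(run([0.05, 0.95], rounds=500)):
        print(idx, "pulls:", arm.wins + arm.losses, "wins:", arm.wins)
```

Arms that look bad still get sampled occasionally while their posterior is wide, so exploration falls out of the sampling step without a separate tuning parameter.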

~~~
misiti3780
added

------
kovrik
Please add Clojure and its libs (see
[http://www.clojure-toolbox.com](http://www.clojure-toolbox.com)).

And sort languages alphabetically, please.

~~~
misiti3780
fixed

~~~
kovrik
cloJure, not cloSure.

Also, I thought you would add all Machine Learning libs, not just link to the
Clojure Toolbox.

~~~
misiti3780
fixed

------
RussianCow
I'm surprised at the lack of tools for Java. Do Java developers really not do
machine learning? Or is this resource missing some libraries?

~~~
misiti3780
it's definitely missing some libraries. with that said, if you are doing any
type of distributed machine learning, you almost have to use Java via Mahout.

------
elliptic
In what sense is this list "curated?" I'm seriously asking. Have you used
all/most of these, and can recommend them?

------
davidy123
Another huge ecosystem of Java tools is
[http://gate.ac.uk/](http://gate.ac.uk/)

------
Rickasaurus
I wish they'd list the license next to each of the projects so I can avoid
those that are work unfriendly.

------
platz
How about [http://h2o.ai/](http://h2o.ai/) ?

~~~
misiti3780
where does it go ?

~~~
platz
It goes to "The Open Source In-Memory Prediction Engine for Big Data Science"

------
bch
typo: Julia -> General Purpose ... Kernal Density - Kernel density estimators
for julia

