

Programming Languages for Machine Learning - ahalan
http://hunch.net/?p=230

======
jey
This is a five-year-old post. These days Python is probably the way to go, with
NumPy, SciPy, Theano, IPython, matplotlib, Cython, etc. I'm also really
looking forward to PyPy's NumPy support -- it should let you write pure
Python code but still get most of the optimization benefits of writing in a
statically typed compiled language like C++.
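
To make that concrete, here's a rough sketch of the gap PyPy is trying to
close: the same dot product as a pure-Python loop and as a vectorized NumPy
call. On CPython the loop is orders of magnitude slower; the NumPy version
runs in compiled C.

    import numpy as np

    def dot_pure(a, b):
        # pure-Python loop: the kind of code PyPy aims to make fast
        total = 0.0
        for x, y in zip(a, b):
            total += x * y
        return total

    a = np.random.rand(1000000)
    b = np.random.rand(1000000)

    fast = np.dot(a, b)    # vectorized: the loop runs in compiled C
    slow = dot_pure(a, b)  # same result, far slower on CPython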

~~~
glimcat
Python and Haskell for development. Weka and R for saving loads of time. Some
competency in Java and C helps.

~~~
danieldk
Did you ever do number crunching in Haskell [1]? Experiments by the developers
of the repa matrix library show that repa (which exploits data parallelism)
does not outperform a single-threaded C++ implementation even when 8 CPU cores
are used:

<http://www.cse.unsw.edu.au/~benl/papers/stencil/stencil-haskell2011-sub.pdf>

The most glaring problem is that GHC currently does not support SIMD vector
instructions. A problem which is now, fortunately, being addressed:

<http://hackage.haskell.org/trac/ghc/wiki/SIMD>

But currently, Haskell is not really an option if you want to write serious ML
software that relies on vector or matrix operations.

[1] I did, for maximum entropy modeling. Believe me, it's nowhere near
competitive with C or C++ yet. But one day it probably will be.

~~~
glimcat
"C is faster" is a tired argument. The ability of this compiler or that to
produce the most efficient possible code is almost never relevant outside of a
few niche cases. Even UHF finance is such that the fractional return in
computation time that you may get from using C isn't worth the development
overhead. There are any number of great reasons to use C, but speed is rather
outdated.

In this case, you're also guilty of putting optimization before the algorithm.
Squeezing out the last clock cycle doesn't matter when you have raw data in
one hand and some high-level objectives in the other. Effective machine
learning is more about finding ways to obtain and reduce data into a useful
form, and finding ways to map between your design goals and workable
algorithms. Speed is the last step, and often the only one which can be
improved by throwing money at it.

Besides which, if you want to have a pissing contest about speed, C is still
going to lose to FPGAs. If it's that important, hire someone who knows
Verilog.

~~~
danieldk
_"C is faster" is a tired argument. The ability of this compiler or that to
produce the most efficient possible code is almost never relevant outside of a
few niche cases._

What niches? ML is exactly one of the niches where highly optimized code helps
tremendously. We use ML, amongst other things, for parse disambiguation and
fluency ranking. Making a modification in, say, the grammar often results in
retraining various components that have a mutual influence (e.g. parse
disambiguation, auxiliary distributions for subcategorization frames, the
part-of-speech tagger). Evaluation is also performed using ten-fold cross
validation.
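
(For anyone unfamiliar with the term: ten-fold cross validation splits the
data into ten folds, trains on nine, and evaluates on the held-out tenth,
rotating through the folds. A minimal sketch in Python, where train and
evaluate are hypothetical stand-ins for the actual components:)

    import numpy as np

    def ten_fold_cv(data, labels, train, evaluate):
        # train and evaluate are placeholders for the real pipeline
        folds = np.array_split(np.random.permutation(len(data)), 10)
        scores = []
        for i in range(10):
            test = folds[i]
            rest = np.concatenate([f for j, f in enumerate(folds) if j != i])
            model = train(data[rest], labels[rest])
            scores.append(evaluate(model, data[test], labels[test]))
        return np.mean(scores)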

Since a modification can also result in regressions, the effects of
modifications are usually checked individually. Being able to validate changes
in one hour rather than one day helps the development of such a system
tremendously.

We developed some software first in other languages (Java, Scala, Python),
usually to make a prototype for getting a grip on a problem. But reimplementing
it in C or C++ paid off tremendously on each occasion, even though there is
enough CPU time available (we have a 3280-core cluster).

Another thing to note is that most machine learning techniques are fairly
generic, so usually there is already a perfectly fine C/C++ implementation
available, sometimes with bindings and all.

_In this case, you're also guilty of putting optimization before the
algorithm._

How do you know? Most data is actually preprocessed using Prolog or Perl. But
the actual machine learning software is written in C or C++.

_Speed is the last step, and often the only one which can be improved by
throwing money at it._

Again, there are niches where it is worth it, and machine learning is often
that niche, because it involves huge data sets, complex problems, or both.

_Besides which, if you want to have a pissing contest about speed, C is still
going to lose to FPGAs._

Yes, but FPGAs are not readily available to our users :). But we are very much
interested in GPU computing.

---

Apart from all of this, it's funny that you point to the cost of implementing
ML software in C. It's not as if there are large numbers of Haskell
programmers available. Writing such software in Haskell has about the same
economic risk as using COBOL ;).

------
cbo
This article doesn't really deliver what the title implies. It states issues
and concerns for several languages, names several that have large ML libraries
written for them already, and even gives the author's own approach for
implementing new ML algorithms, but it doesn't actually give real options for
"Programming Languages for Machine Learning".

I've found that Matlab/Octave is a decent substitute for a "high-level
language" to sketch out new approaches with. They're fast enough, and
sufficiently suited to matrix algebra, that they can give decent results, even
if the code is less than beautiful. Matlab appears to be the language of
choice for AI at the University of Toronto.

Personally, I think the best option would be to roll with a functional
language (or at least, a language with functions as first-class objects),
since a lot of ML algorithms can be reduced to recursion on several matrices,
often using very similar functions. For example, ANNs frequently have very
similar structures and training strategies, but simply use different learning
functions.
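
To sketch what I mean (a toy example, not production code): with functions as
first-class objects, one training loop can serve several network variants, and
only the learning function changes.

    import numpy as np

    def train_network(X, y, learning_fn, epochs=100, lr=0.1):
        # one structure and training strategy; the learning
        # function is passed in as a first-class object
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            for xi, target in zip(X, y):
                w += lr * (target - learning_fn(np.dot(xi, w))) * xi
        return w

    step = lambda z: 1.0 if z >= 0 else 0.0       # perceptron rule
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))  # delta-rule flavor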

Everything can be done in C/C++ though, and while it'd be harder, ML is an
area where the gain in speed and efficiency is so significant that the extra
development time pays boatloads in terms of ROI. Even basic ML examples often
involve dealing with 300x236-dimensional data, so you can imagine how much
that scales up in production environments.

~~~
wladimir
_Everything can be done in C/C++ though, and while it'd be harder, ML is an
area where the gain in speed_

Don't forget machine learning generally involves a _lot_ of experimentation,
and this is easier with higher-level languages. Hand-optimization always makes
assumptions about specific data structures and details, which can be hard to
change later.

You can get very close with code-generating high-level languages, though. See
Theano (<http://deeplearning.net/software/theano/>), for example. It uses a
functional approach to compose the formulas for (c-)ANNs, and has advanced
features such as automatic differentiation. Scalable generated GPU code can
beat the pants off even the fastest hand-written C loops. Sure, hand-optimized
GPU code can perform even faster, but in my experience that is usually not
worth the trouble.
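
A minimal sketch of that workflow (a toy loss rather than an ANN formula, but
the idea is the same): write the formula once, ask for the gradient, and let
Theano generate the optimized code.

    import theano
    import theano.tensor as T

    x = T.dvector('x')
    loss = T.sum(T.sqr(x))    # compose the formula symbolically
    grad = T.grad(loss, x)    # automatic differentiation

    f = theano.function([x], grad)  # compiled to optimized C (or GPU) code
    print(f([1.0, 2.0, 3.0]))       # -> [ 2.  4.  6.]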

------
srean
Given that there are many lisp aficionados here at HN, some may find lush
<http://lush.sourceforge.net/> quite interesting. Lush has been discussed at
HN before but here is a short summary: You get lisp syntax, optimized multi-
dimensional arrays, _extremely_ easy integration with C code, common numerical
optimization libraries, and the option of translating (and then compiling)
lispy code into a C dynamic library.

The language features that the interpreter supports are subtly (and sometimes
not so subtly) different from those of the compiled version, though they share
the same syntax.

In the context of scaling machine learning code, you often hear that one
should/could write most of it in Matlab/Octave and the critical parts in C.
But anyone who has done it knows it is such a butt-hurting nuisance. In
comparison, the C integration in Lush is a pleasure.

------
djacobs
We're using Matlab/Octave in the Stanford ML course, and it's certainly
elegant but not necessarily comfortable for functional programmers. I'm
reimplementing our assignments with Clojure/Incanter and am finding my code
just as performant and even cleaner. In my opinion, the ML domain maps to Lisp
code almost as well as the AI domain as a whole does.

~~~
ericlavigne
The instructor emphasized the performance benefit of Octave's optimized matrix
operations. Did you use Colt's matrix operations, or were you able to match
the performance of vectorized Octave code with non-vectorized Clojure code?

~~~
aheilbut
I think the real magic comes from _understanding the algorithms as matrix
operations_, and then implementing them by more or less just writing down the
algebra.
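
For instance (a toy sketch): ordinary least squares. The algebra says
w = (X^T X)^-1 X^T y, and the NumPy code is essentially that formula written
down.

    import numpy as np

    X = np.random.rand(100, 5)   # toy design matrix
    y = np.random.rand(100)      # toy targets

    # the normal equations, written directly as the algebra
    w = np.linalg.solve(X.T.dot(X), X.T.dot(y))
    # (np.linalg.lstsq is the numerically safer route)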

~~~
billswift
Can you give a source for learning about "algorithms as matrix operations"? I
tried searching on that phrase in Google Scholar but didn't get anything
useful.

~~~
oscilloscope
If you can express the code using the functions and operators of linear
algebra, you can simply write the math and get fast, elegant code. May even
run on a GPU.

Fortran is frequently used for physics simulation. We used Mathematica, Maple
and Matlab even in non-computing classes.

See <http://www.netlib.org/lapack/>

Also <http://en.wikipedia.org/wiki/Array_programming>

~~~
billswift
Your first phrase captures what I assumed he meant; I was wondering if there
is anything available that could help me recognize algorithms that could or
should be expressed that way, and how to do it. Your Wikipedia link was
helpful.
helpful.

I am working on a calculus refresher right now (I really need it) and am
planning on working through an elementary linear algebra text once that is
done, which is why I particularly noticed the comment. I have found that
having an idea of applications helps retention, which is one reason I am
trying to track down something more concrete.

~~~
aheilbut
Most of the textbooks on machine learning present things in that way (it's
really the only way). For example, check out Elements of Statistical Learning
(<http://www-stat.stanford.edu/~tibs/ElemStatLearn/>).

------
purplebear
I highly recommend considering probabilistic languages, which constitute a
natural framework for probabilistic reasoning and modelling. For example, have
a look at this:

<http://research.microsoft.com/en-us/um/cambridge/projects/infernet/>

or at this:

<http://projects.csail.mit.edu/church/wiki/Church>

Have a great day!

------
sramsay
Considering what happened to Lisp when machine learning was called AI, I'm not
sure I'd want my language to win this contest. ;)

~~~
brianobush
but machine learning isn't promising the world. So far ML has a good track
record thus I believe we are safe.

------
cageface
Scala doesn't have great native libraries for this, but otherwise I've found
it a very good language for basic ML stuff. You can express the algorithms
very succinctly and readably but still get good performance.

~~~
mattmiller
Check out Mahout: <http://mahout.apache.org/>

~~~
cageface
Of course the great thing about Scala is that you can pretty transparently use
all the Java stuff out there. Thanks for reminding me about Mahout.

------
the_cat_kittles
scikits.learn, pyMC, scipy, numpy, milk, and more make a pretty formidable
toolkit in Python. R is also excellent (probably more powerful, albeit
awkward). Who needs to read a huge article...
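
E.g., a trained, ten-fold cross-validated classifier in a handful of lines
(using the current sklearn import path; the package was called scikits.learn
at the time):

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = load_iris(return_X_y=True)
    clf = LogisticRegression(max_iter=1000)
    print(cross_val_score(clf, X, y, cv=10).mean())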

