

Online (Machine) learning in Clojure - grosales
http://mark.reid.name/sap/online-learning-in-clojure.html

======
a-priori
Just for fun, I ported this program to Haskell. It appears to generate
identical results, and completes the training set in 11.23 user seconds.

<http://gist.github.com/147988>

~~~
mreid
That's very neat. Would you mind posting a link to your version from the
comments on my article? Thanks.

I learnt Haskell before I tried using Clojure and really like it as a
language. It doesn't surprise me that it runs faster. I'm guessing the feature
value look-ups don't involve the extra unboxing that Java's maps do.

~~~
a-priori
Sure, just cross-posted it.

Types in Haskell are boxed as well, though it's possible to get around this
with some GHC-specific hackery. I haven't done that.

But the biggest problem is the parsing code -- it actually spends around 70%
of the runtime parsing the input! Switching to regular expressions might help
with that, but I gave up trying to find good documentation for Text.Regex.Posix.

------
bravura
"The reported accuracy is simply the cumulative total number of errors divided
by the number of steps."

Use a moving average. I use: m <- m - (2/t) (m - x_t) when estimating the
current training error.

A step size of 1/t would give the exact historical average. 2/t gives more
weight to recent events, which is good when your distribution is non-stationary
(as is the case when your model is changing). With a constant step size
(independent of t) you get an exponential moving average.
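
A minimal Python sketch of that update, to make the three step-size choices concrete. The function name `running_error` and the clamp on the step size are assumptions for illustration, not something from the article; the clamp only matters at t = 1, where 2/t would otherwise overshoot.

```python
def running_error(errors, rate):
    """Apply m <- m - rate(t) * (m - x_t) over a sequence of 0/1 errors.

    `errors` is an iterable of x_t values (1 = mistake, 0 = correct).
    `rate` maps the step number t (starting at 1) to a step size.
    Returns the estimate m after every step.
    """
    m = 0.0
    history = []
    for t, x in enumerate(errors, start=1):
        # Assumption: clamp the step size to 1 so the estimate always stays
        # between the previous estimate and the newest observation.
        step = min(1.0, rate(t))
        m -= step * (m - x)
        history.append(m)
    return history


if __name__ == "__main__":
    # Two early mistakes followed by two correct predictions.
    xs = [1, 1, 0, 0]
    print(running_error(xs, lambda t: 1.0 / t)[-1])  # exact historical average
    print(running_error(xs, lambda t: 2.0 / t)[-1])  # weights recent steps more
    print(running_error(xs, lambda t: 0.1)[-1])      # exponential moving average
```

On this short sequence the 1/t schedule ends at the plain average of 0.5, while 2/t ends lower because the two recent correct predictions count for more.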

------
jimbokun
Considering his comment at the end about using the Colt matrix libraries, I
wonder if he knows about Incanter?

<http://github.com/liebke/incanter/blob/59c13e05e3242e4491f9dbb00abab230acdab03e/README.textile>

~~~
mreid
I'm the author of the blog post and to answer your question, yes, I know about
Incanter. In fact, I found out about Parallel Colt via Incanter. I've played
around with it a little and it looks very cool.

I was initially going to build my algorithm using its libraries as a base but
I thought a simpler first step would be to write it without pulling in too
many extra dependencies.

