(1) Modern classification algorithms like SVMs need some pretty hardcore math routines -- training an SVM means solving a Quadratic Programming problem, which isn't trivial to implement correctly. Do you intend to implement these yourself? If so, that alone might be useful as a separate library, with the ML library built on top of it.
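To give a feel for what's involved: you can actually sidestep the full QP dual with a stochastic subgradient approach (Pegasos-style) for the linear case. This is only a hedged sketch of that idea, not code from any existing library -- the function names and data shapes are made up:

```javascript
// Sketch: training a linear SVM with hinge loss via stochastic
// subgradient descent (Pegasos-style), which avoids solving the
// full QP dual. Hypothetical standalone code for illustration.
function trainLinearSVM(points, labels, lambda, epochs) {
  const dim = points[0].length;
  const w = new Array(dim).fill(0);
  let t = 0;
  for (let epoch = 0; epoch < epochs; epoch++) {
    for (let i = 0; i < points.length; i++) {
      t++;
      const eta = 1 / (lambda * t);          // decaying step size
      const margin = labels[i] * dot(w, points[i]);
      for (let j = 0; j < dim; j++) {
        w[j] *= 1 - eta * lambda;            // regularization shrink
        if (margin < 1) w[j] += eta * labels[i] * points[i][j];
      }
    }
  }
  return w;
}
function dot(a, b) { return a.reduce((s, x, i) => s + x * b[i], 0); }

// Tiny linearly separable example: x > 0 maps to +1, x < 0 to -1
const w = trainLinearSVM([[2], [3], [-2], [-3]], [1, 1, -1, -1], 0.01, 100);
console.log(Math.sign(dot(w, [4])));  // 1
console.log(Math.sign(dot(w, [-4]))); // -1
```

Kernelized SVMs do need the real QP machinery (SMO or similar), which is where the "separate math library" idea starts to pay off.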
(2) I've been thinking about a JS distributed computing library for a long time -- sort of like Folding@Home, but instead of having to download a program, you just visit a website and let JS do the crunching (Ajax will pull and push data chunks). With modern JS engines, this has become more of a reality. So to bring it back to your question -- why not abstract as much of the math and algorithms from as many distributed computing projects as possible, and build a generic JS library for distributed computation? You could use each distributed computing project as a benchmark -- i.e., start by implementing SETI@Home using your library, then move on to Folding@Home. I guarantee you won't be bored... ;)
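The client side of that pull/crunch/push loop could look something like this sketch -- written with the modern fetch API for brevity (classic XHR-based Ajax would work the same way), and with hypothetical `/chunk` and `/result` endpoints and chunk shape:

```javascript
// Sketch of the browser-side loop in a JS distributed computing
// library: pull a work chunk, crunch it, push the result back.
// The endpoints and JSON shapes here are assumptions, not a real API.
async function workLoop(computeFn) {
  while (true) {
    const res = await fetch('/chunk');      // pull a unit of work
    if (res.status === 204) break;          // no work left
    const chunk = await res.json();
    const result = computeFn(chunk.data);   // do the crunching
    await fetch('/result', {                // push the answer back
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ id: chunk.id, result }),
    });
  }
}
```

In a real version you'd run `computeFn` in a Web Worker so the page stays responsive while the crunching happens.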
Good luck, and I'm really glad people are pushing the capabilities of JS these days.
On a lighter note, can you imagine the derisive laughter if you had suggested this 10 years ago? :)
Since I don't know much about machine learning: what do you plan for your JS library to do or support? Is it just for research, or are there ways to use it in a more everyday webapp?
This http://harthur.github.com/brain/ seems very slow to train.
From my experience seeing the performance and the kind of tweaking we've had to do to be able to score 10K documents/s, you need some nitty-gritty C code that I don't think can run in a browser.
Why would this be useful? Machine learning generally needs two things that browsers aren't very good at dealing with:
1) Large amounts of data
2) Fast I/O to process that data.
Why would someone prefer a client library over a remote call to a high-performance server-side library, which will give better results?