

Machine Learning in JavaScript - xd
http://burakkanber.com/blog/machine-learning-in-other-languages-introduction/

======
bkanber
Author here -- thanks for submitting, xd!

Let me know if you have any questions. I do intend to keep up with this
series, although my pace is pretty slow at about one article every three
months or so.

There are already a couple of comments about running ML in JS and how JS and
the browser environment aren't terribly suited for heavy calculations. First:
you're totally correct; second, I chose JS because it's

1) accessible -- whether you're Python or Ruby or PHP on the backend, you're
probably comfortable with JS and

2) it demystifies machine learning -- you have to write your ML from scratch,
without the help of all those wonderful Python libs, and I think this exercise
shows you that it's not so mysterious after all.

Anyway, thanks for reading, and I'll poke in here throughout the day if you
have questions.

~~~
usamec
"2) it demystifies machine learning -- you have to write your ML from scratch,
without the help of all those wonderful Python libs, and I think this exercise
shows you that it's not so mysterious after all." - you can demystify machine
learning in any better language.

JS and PHP are slow, crappy and bug prone. Sane languages (like Python, C++)
have tools to make your job easier (like numpy, blas, eigen and other
libraries). They provide fast and reliable math routines so you don't have to
worry about some eigenvalue decomposition, matrix multiplication and other
problems.

~~~
breuleux
JS isn't particularly slow any more, thanks to the massive efforts invested in
the optimization of the various competing JS engines. It is generally faster
than Python 3
([http://benchmarksgame.alioth.debian.org/u64/benchmark.php?te...](http://benchmarksgame.alioth.debian.org/u64/benchmark.php?test=all&lang=v8&lang2=python3&data=u64))
and not just a little. Just for the hell of it I compared node and python on
naive fibonacci. That is anecdotal, of course, but node is _30 times faster_.
I do agree with you that JS has horrid, bug-prone semantics, but it's
impressively well optimized.
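
A minimal sketch of the kind of naive Fibonacci micro-benchmark mentioned
above. The exact code used for the comparison isn't given, so the input size
and the `timeIt` helper are assumptions, and the timing will vary by machine
and engine.

```javascript
// Naive (exponential-time) recursive Fibonacci, a common micro-benchmark.
function fib(n) {
  return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

// Hypothetical helper: time a single call using wall-clock milliseconds.
function timeIt(fn, arg) {
  const start = Date.now();
  const result = fn(arg);
  return { result, ms: Date.now() - start };
}

// n = 25 is an arbitrary choice; raise it to get a longer-running benchmark.
const run = timeIt(fib, 25);
console.log(`fib(25) = ${run.result} in ${run.ms} ms`);
```

Running the same recursive definition under another runtime gives a rough,
anecdotal comparison of function-call overhead, nothing more.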

I also agree that other languages offer better tools. For instance, Python has
NumPy. However, its core is written in C, not in Python. You can write native
add-ons for Node in C++, so nothing would stop someone from writing a NumPy
equivalent for JS. You might even be able to run it in some browsers through
something like Emscripten with a performance overhead of 2-3x (I think?).

~~~
m1sta_
Let's not forget WebCL. GPU-based matrix multiplication is available in
JavaScript too.

------
kajecounterhack
For those who aren't familiar, Andrej Karpathy has done a lot of cool stuff
with ML in JS. In particular, he has a CNN library -- deep learning comes to JS!

[http://cs.stanford.edu/people/karpathy/convnetjs/](http://cs.stanford.edu/people/karpathy/convnetjs/)

[http://cs.stanford.edu/people/karpathy/svmjs/demo/](http://cs.stanford.edu/people/karpathy/svmjs/demo/)

Heather Arthur (npm libraries brain, classifier) has also done a bunch of cool
stuff!

[https://github.com/harthur](https://github.com/harthur)

------
morganherlocker
To those wondering why someone would want ML in JS, there are loads of
reasons.

For starters, there is node.js, which makes most of the arguments regarding
server/client moot.

Secondly, there are many client side applications for these types of
algorithms as well. K-means clustering, for example, is already used by many
mapping libraries to group together large numbers of points[1].
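
As a rough illustration of the technique (not the code any particular mapping
library uses), here is a minimal k-means sketch in plain JS. Seeding the
centroids with the first `k` points is an assumption made to keep the example
deterministic; real implementations usually pick starting points more
carefully.

```javascript
// Squared Euclidean distance between two points of equal dimension.
function dist2(a, b) {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += (a[i] - b[i]) ** 2;
  return s;
}

// Index of the centroid closest to point p.
function nearest(p, centroids) {
  let best = 0;
  for (let c = 1; c < centroids.length; c++) {
    if (dist2(p, centroids[c]) < dist2(p, centroids[best])) best = c;
  }
  return best;
}

// Lloyd's algorithm: alternate assignment and update until labels settle.
function kmeans(points, k, maxIter = 100) {
  let centroids = points.slice(0, k).map(p => p.slice()); // assumed seeding
  let labels = new Array(points.length).fill(-1);
  for (let iter = 0; iter < maxIter; iter++) {
    // Assignment step: label each point with its nearest centroid.
    const next = points.map(p => nearest(p, centroids));
    const changed = next.some((l, i) => l !== labels[i]);
    labels = next;
    if (!changed) break;
    // Update step: move each centroid to the mean of its members.
    centroids = centroids.map((old, c) => {
      const members = points.filter((_, i) => labels[i] === c);
      if (members.length === 0) return old; // leave empty clusters in place
      return old.map((_, d) =>
        members.reduce((s, p) => s + p[d], 0) / members.length);
    });
  }
  return { centroids, labels };
}

const points = [[1, 1], [9, 9], [1, 2], [2, 1], [8, 9], [9, 8]];
const { centroids, labels } = kmeans(points, 2);
console.log(labels, centroids);
```

On this toy input the points near (1, 1) end up in one cluster and the points
near (9, 9) in the other.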

I personally use neural networks and affinity propagation in many of my
applications for predictive analysis. This does not have to only be
educational, or of a 'toy' nature.

[1]
[http://danzel.github.io/Leaflet.markercluster/example/marker...](http://danzel.github.io/Leaflet.markercluster/example/marker-clustering-realworld.388.html)

~~~
nashequilibrium
Nodejs is for i/o. I know it has workarounds for long-running tasks (thread
pool), but it does not excel at that. Training your model, updating your
model, validation, matrix factorization on large datasets, etc.: I just don't
see how JavaScript helps here. Maybe just taking the HTTP request and dumping
it onto a RabbitMQ queue to classify something, but you still have a whole
host of other stuff to deal with.

~~~
morganherlocker
> Nodejs is for i/o

Node is a general purpose language that can be used for all kinds of things. I
switched from Python to node.js about a year ago for exactly the sort of tasks
you are describing and could not be happier. Right off the bat I had huge
speed improvements.

Also, I/O is one of the biggest issues with web-based data analysis, so it
really should not be underestimated. I can do more with less with node than I
could with Python. This is especially true with long-running tasks, where a
1-minute processing time vs. a 20-minute processing time might mean you need
1/20th the number of servers in a cluster ($$$).

Of course, this could be a pretty good argument for something even
faster/lower level, but for me, node.js struck a good balance between
performance and ease of development/ecosystem. As usual, YMMV.

One last point. The language you choose cannot always be the best language for
every task you need. Typically you choose a stack based on the most
common/important tasks in your infrastructure, then for less common tasks you
just make it work with what the chosen language provides. In this case node.js
does not need to be the best solution for ML, it just needs to check the box
for being possible, so that devs who needed node.js for other reasons now have
the ability to add ML to their toolbox.

~~~
kimmel
> Node is a general purpose language that can be used for all kinds of things.

Node is a _library_ that can be used for all kinds of things. FTFY

~~~
morganherlocker
Fair enough, it is not a language, but I would not call it a library either.
Perhaps a platform?[1] It contains many libraries, but also has a cli tool,
packages a runtime, a debugger, and even includes a package manager. It
contains the sorts of things that would be packaged when you install languages
like Python or Ruby.

[1] [http://nodejs.org/](http://nodejs.org/)

------
Already__Taken
This blog also contains a JavaScript physics series which is very cool.

I do hope this author writes more again; it has been quiet.

~~~
bkanber
Thanks! -- I am intending to write more, but the pace is _sloooowww_.

------
viana007
For neural networks, you can use BrainJS

[https://github.com/harthur/brain](https://github.com/harthur/brain)
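
A rough sketch of the data shape involved, assuming training samples are
plain `{input, output}` objects holding ordinary JS arrays (XOR is used here
purely as an illustration). No library calls are shown; the `pack` helper is
hypothetical and only demonstrates that such data can be flattened into a
typed array for tight numeric loops.

```javascript
// Assumed sample shape: an Array of {input, output} objects whose vectors
// are ordinary JavaScript Arrays.
const trainingData = [
  { input: [0, 0], output: [0] },
  { input: [0, 1], output: [1] },
  { input: [1, 0], output: [1] },
  { input: [1, 1], output: [0] },
];

// Hypothetical helper: pack one field of every sample into a single
// row-major Float64Array so inner loops work on contiguous numbers.
function pack(samples, field) {
  const width = samples[0][field].length;
  const flat = new Float64Array(samples.length * width);
  samples.forEach((s, row) => flat.set(s[field], row * width));
  return { flat, width };
}

const inputs = pack(trainingData, 'input');
console.log(Array.isArray(trainingData[0].input)); // true
console.log(inputs.flat.length, inputs.width);     // 8 2
```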

~~~
nightski
So the input/output pairs are a linked list of objects? Which then contain
vectors comprised of linked lists? I am not very into JavaScript, but that
right there must preclude this from doing anything significant in a reasonable
amount of time?

------
frik
That's great, keep up the good work.

For some reason it's somewhat hard to find C-style science code examples in
some disciplines. Python feels a bit like a plague in this respect. Every time,
I have to wrap my head around it while converting code to a C-like language
(C, C++, PHP, JS).

~~~
tlarkworthy
The distance to convert math to python is so much shorter than math to C or
math to javascript.

You need something like NumPy to make working in JavaScript easier before
there will be a proliferation of ML in JS.

I really love JS for its distribution and some of the visualizations are
amazing. But the low level, numerically stable, matrix math primitives are
sorely lacking.

~~~
Already__Taken
I feel the people doing heavy work with graphics APIs and 3D work with robots
too will push this back into the javascript language given some time.

~~~
tlarkworthy
I work with robots. The leading middleware is ROS. It is multi-language, but
does not support JS out of the box (unlike LISP, Java, C++ and Python), though
there is movement there
[http://brandonalexander.com/rosnodejs/](http://brandonalexander.com/rosnodejs/)

I can see JS being useful for a UI to a robot, but I can't see it replacing
Python for math, or C++ for speed, or LISP for planning systems.

That said, I can imagine node.js being a better async message router than the
current C++ one.

ROS is glued together with XML-RPC, which I think was a mistake (why not
JSON???)

------
Joe8Bit
If you're looking for more on this, or some general-purpose JS NLP, you could
do worse than to check out node-natural[0]

[0]:
[https://github.com/NaturalNode/natural](https://github.com/NaturalNode/natural)

------
e12e
Interesting, but I wonder about this:

> … well, most of the time. There are some things you really can’t do in PHP
> or Javascript, but those are the more advanced algorithms that require heavy
> matrix math.

Leaving out JavaScript (in the browser), it sounds like an odd statement to
make about PHP -- after all, one of PHP's strengths is how easy it is to link
with C libraries (or others with a C FFI). Among other things I quickly found:

[http://www.php.net/manual/en/intro.lapack.php](http://www.php.net/manual/en/intro.lapack.php)

------
code_scrapping
I still view JS as a UI-oriented language, and I really don't know why you
would want to implement processor-heavy algorithms, which need a lot of data
and don't use the network, in a browser environment.

I would still stick to Python. Or Java. Or anything else which has a clear
syntax and can run at a useful speed (I'm not mentioning C++ because of the
coding overhead and dirty tricks which make it a bit unfriendly for learning
an algorithm).

~~~
dangoor
Clarity of syntax is a matter of opinion (personally, I agree that Python is
clearer than JS... Java, not so much.)

Implying that JavaScript can't "run at a useful speed" is wrong, using modern
implementations. This is especially true for code that runs through lots of
repetition as the just-in-time compilers in the JS engines do a remarkable
job.

Not to mention that viewing JS as a UI-oriented language seems a bit out of
date given the 40k or so packages for Node.js that are in npm.

JavaScript of today is pretty different than JS of 2007, and there are more
changes coming with generators, iterators, destructuring, class syntax, arrow
functions, promises, etc.

~~~
Joe8Bit
While I disagree with the comment you're responding to, and agree with yours,
there are some interesting problems doing resource heavy operations in ML/NLP
in an environment like Node that's inherently single threaded.

I'm actually adding multi-threading to classifier training in node-natural as
we speak [0] so it's something I'm recently familiar with. Multi-threading in
JS isn't new or particularly exciting (even less so is multithreading in
ML/NLP applications) but the marriage of the two has led to a few interesting
problems in JS's asynchronous/event based view of the world!

[0]:
[https://github.com/NaturalNode/natural/issues/124](https://github.com/NaturalNode/natural/issues/124)

--

Edited for clarity

~~~
dangoor
That's a good point. Of course, the problems with shared mutable state are
well-documented and I'm glad that JavaScript hasn't headed down that path. But
you're right that Node doesn't have good, mature solutions for that yet (short
of your central data store option)

------
perimo
I kicked around with some JS manifold learning stuff[1] a while back for
essentially the same purpose: practice in writing things from scratch, while
making it easier for other people to play with.

[1]:
[https://github.com/perimosocordiae/js_manifolds](https://github.com/perimosocordiae/js_manifolds)

------
ecesena
Cool project! Have you already tried asm.js and/or measured performance?

~~~
frik
You cannot write asm.js by hand (in a sane way... it uses one big array for
everything). It's meant to be generated by the Emscripten compiler, which is
built on Clang/LLVM, so you can compile C/C++ code to asm.js.

But JavaScript engines like V8 with its JIT are way faster than Python. You
can even use typed arrays that give you almost native speed for such
operations (e.g. matrix math). I am coding a 3D game in WebGL, and JS is as
fast as Java when used in a modern fashion, plus JS runs in every browser.
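
A minimal sketch of what typed-array matrix math can look like, assuming
row-major storage in flat `Float64Array` buffers. The `matmul` name and its
signature are illustrative, not taken from any library, and no performance
figure is claimed for it.

```javascript
// Multiply an (n x m) matrix by an (m x p) matrix, both stored row-major
// in flat Float64Arrays. Returns a new (n x p) Float64Array.
function matmul(a, b, n, m, p) {
  const out = new Float64Array(n * p); // typed arrays start zero-filled
  for (let i = 0; i < n; i++) {
    for (let k = 0; k < m; k++) {
      const aik = a[i * m + k];
      // Innermost loop walks both b and out contiguously.
      for (let j = 0; j < p; j++) {
        out[i * p + j] += aik * b[k * p + j];
      }
    }
  }
  return out;
}

const A = Float64Array.from([1, 2, 3, 4]); // [[1, 2], [3, 4]]
const B = Float64Array.from([5, 6, 7, 8]); // [[5, 6], [7, 8]]
const C = matmul(A, B, 2, 2, 2);
console.log(Array.from(C)); // values: 19, 22, 43, 50
```

The i-k-j loop order is a design choice: it keeps the innermost accesses
sequential in memory, which tends to suit flat buffers better than the
textbook i-j-k order.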

------
TeeWEE
X in JavaScript..... _ugh_

if all you have is a hammer, everything looks like a nail

~~~
bkanber
Have you read the article? I make it pretty apparent that JS is used primarily
for its educational value :)

~~~
TeeWEE
To be honest, I didn't initially. I just read it.

I think it is a noble thing to explain this in JS. But I don't think "because
everybody uses JS" is a good reason to choose JS.

However, your specific use case makes sense. But in a broader sense I see more
and more people fleeing to JS because it's what they know.

~~~
thrush
What alternatives would you recommend for someone new to programming and CS?

I guess fleeing implies that they were using other tools already, but a lot of
new devs are going to JS because it just makes sense to start there (lots of
flexibility, hyperactive community, education value).

~~~
stusmall
If you are at the point of your CS education that you are taking a serious
look at machine learning and understanding the theory then you shouldn't have
a problem translating into whatever your language of choice is. I get why a
teacher would just want to pick a language and say "this is what it is in" but
I don't get people who need CS concepts taught in their language of choice.
The hard part is the theory and not the implementation.

When I took it in university it was taught in language-agnostic pseudocode and
we were free to use any language from a long list for our assignments.

~~~
bkanber
> If you are at the point of your CS education that you are taking a serious
> look at machine learning and understanding the theory then you shouldn't
> have a problem translating into whatever your language of choice is.

This feels like begging the question. Why does that need to be the case? Why
can't someone strive to learn machine learning _without_ learning a new
language? Why can't they get a head start on the concepts early in their
career? Is there some requirement that ML _must_ be an advanced topic, only
accessible to polyglots that I haven't heard about?

~~~
stusmall
Because the nature of the subject requires a fair amount of background. To
truly understand the subject and a lot of the approaches, you need a firm
understanding of statistics, data structures, and even some calculus. Usually,
by the time someone has these subjects down well enough for anything
substantial in ML, they've seen enough different languages to suss out the
general idea of most algorithm sample code.

I'm not saying there isn't room for the easier to understand and easier to
read guides to ML. The more the better; Mitchell was a beast to read through.
It's just the language isn't the hard part of the subject. You are the author
of the link, correct? I read through some of it, and it approaches the theory
and subject matter in a gentle way, which is what matters. The sample code is
easy to read. I've written _maybe_ 100 lines of JS in my life and avoid all
web dev like the plague. Your guide is well written and useful. I am not
dogging it at all and please don't take it that way. I think it's great!

What I'm saying is if someone is saying to themselves "I would be able to
learn machine learning if only there was a guide in X" then they are probably
mistaken. The code is easy; the math and theory are what is hard.

~~~
bkanber
> It's just the language isn't the hard part of the subject.

For you, sure -- but not for everyone.

This series has actually been up for a little over a year now. I get emails
from people who didn't know what machine learning was before they started
reading the articles, and now they're building some of the most creative and
beautiful projects out there. I also get emails from people who need to
implement ML in JS or C-like languages but have had trouble seeing the
algorithms in full relief when translating from Python, for instance.

The point is, your experience is not everyone's experience. My goal is
_purely_ one of accessibility of education. There _are_ smart, talented people
who never played with ML simply because they didn't want to dive into a
different language, different platform, and different environment just to muck
around. There _are_ people who hadn't heard of ML before, but tried it out
because JS was _right there_ for them. There _are_ people who stayed away from
ML because they thought higher math and a CS education were requirements.
Those are facts. This series serves all those people, and it serves them well.

~~~
stusmall
I just want to make sure that it's clear that I think you are hitting that
goal and your posts look great. There is nothing worse than posting something
on the internet only to have some snarky neckbeard from the peanut gallery put
it down for some tangential reason.

------
LambdaAlmighty
Only a question of time before the Machine Learning wave arrived at the
arguably least suited--but certainly the most popular--platform!

There's money to be made with this combination. The field is ripe.

Good write up too.

