
Visualizing popular machine learning algorithms - ingve
http://jsfiddle.net/wybiral/3bdkp5c0/embedded/result/
======
blt
For learners it is confusing to see nonlinear decision boundaries for
linear and logistic regression; IMO a note about the feature expansion should
be added.

~~~
edwinksl
Yeah, that or label the axes....

~~~
sawwit
How would labeling the axes help explain what is going on?

------
mtw
Awesome. Would be great to have execution times.

Also, what is nerdy.js? I saw it was related to "Carl Edward Rasmussen" but
couldn't find any other reference on the net.

~~~
wybiral
It's a JavaScript library I put together a long time ago for dealing with
datasets and machine learning algorithms. It was used for some of my own
personal projects and hasn't been polished for release in the wild (although
I'm considering it now).

The reference to Carl Edward Rasmussen is because I based my minimize function
heavily on this one:
[http://learning.eng.cam.ac.uk/carl/code/minimize/](http://learning.eng.cam.ac.uk/carl/code/minimize/)

~~~
mtw
I'd be interested in the library :D

------
indubitably
Looks like K-nearest neighbor does pretty well.

~~~
obmelvin
Yes, k-NN is theoretically one of the best ML algorithms in the sense that
it finds the closest items in the training set. For classification or
finding similar-looking items it is great. However, it has pretty poor running
times when evaluating unseen data ([http://nlp.stanford.edu/IR-
book/html/htmledition/time-comple...](http://nlp.stanford.edu/IR-
book/html/htmledition/time-complexity-and-optimality-of-knn-1.html)). This is
in contrast to something like neural networks, which take a while to train but
then evaluate very quickly. For real-world use the training times matter to an
extent, but in a web app or real-time application the latency from k-NN is just
impractical.
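
As a rough illustration (not nerdy.js's actual code), a brute-force k-NN query has to scan the entire training set for every prediction, which is where that evaluation cost comes from:

```javascript
// Minimal brute-force k-NN classifier sketch. Every query computes a
// distance to all n training points: O(n) per prediction, with no
// training phase at all.
function knnPredict(train, point, k) {
  // Rank the whole training set by distance to the query point.
  const neighbors = train
    .map(ex => ({ label: ex.label, d: Math.hypot(ex.x - point.x, ex.y - point.y) }))
    .sort((a, b) => a.d - b.d)
    .slice(0, k);
  // Majority vote among the k nearest neighbors.
  const votes = {};
  for (const n of neighbors) votes[n.label] = (votes[n.label] || 0) + 1;
  return Object.keys(votes).reduce((a, b) => (votes[a] >= votes[b] ? a : b));
}

const train = [
  { x: 0, y: 0, label: 'red' },
  { x: 1, y: 0, label: 'red' },
  { x: 5, y: 5, label: 'blue' },
  { x: 6, y: 5, label: 'blue' },
];
console.log(knnPredict(train, { x: 0.5, y: 0.2 }, 3)); // 'red'
```

Spatial indexes (k-d trees, or the Boundary Forest mentioned below) exist precisely to avoid that linear scan at query time.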

~~~
jsyedidia
That's why we developed the "Boundary Forest" algorithm, a nearest-neighbor-
type algorithm with generalization at least as good as k-NN's that can respond
to queries very quickly.

It maintains trees of examples that let it train and respond to test queries
in time logarithmic in the number of stored examples, which can be much less
than the overall number of training samples. It thus retains k-NN's very fast
training time, is an online algorithm, and can be used for regression problems
as well as classification.

See our paper that was presented at AAAI 2015 here:
[http://www.disneyresearch.com/publication/the-boundary-
fores...](http://www.disneyresearch.com/publication/the-boundary-forest-
algorithm-for-online-supervised-and-unsupervised-learning/)

------
maurits
There is also MLDemos [1] which is open source.

[1]: [http://mldemos.epfl.ch/](http://mldemos.epfl.ch/)

------
RockyMcNuts
I'm a little surprised the neural network comes up with a straight line while
linear regression doesn't, which I thought it would do by definition (e.g. on
two normal groups).

Some discussion of the methods, e.g. how many hidden layers/nodes the neural
network uses, would probably help make sense of it.

Random forest could be worth adding.

~~~
pedrosorio
Looking at the code
([http://jsfiddle.net/wybiral/3bdkp5c0/light/](http://jsfiddle.net/wybiral/3bdkp5c0/light/))
it seems they are expanding the features to include all second- and third-order
terms (options.expansion = cubic), which is why linear regression does not come
up with a straight line.
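
For illustration, here is a hedged sketch of what a cubic expansion of a 2-d point might look like (the actual nerdy.js implementation may differ): a model that is linear in these expanded features can trace a cubic curve in the original (x, y) plane.

```javascript
// Expand a 2-d point (x, y) into all monomials of total degree 1..3.
// A "linear" model fit on these 9 features corresponds to a cubic
// decision boundary in the original two dimensions.
function cubicExpand(x, y) {
  const features = [];
  for (let i = 0; i <= 3; i++) {
    for (let j = 0; j <= 3 - i; j++) {
      if (i + j > 0) features.push(Math.pow(x, i) * Math.pow(y, j));
    }
  }
  return features; // [y, y^2, y^3, x, x*y, x*y^2, x^2, x^2*y, x^3]
}

console.log(cubicExpand(2, 3)); // 9 features for a 2-d input
```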

------
lottin
Sounds interesting, but I can't see the results with Firefox 38.2.1.

~~~
narsil
Try this one:
[https://jsfiddle.net/752pqyvp/embedded/result](https://jsfiddle.net/752pqyvp/embedded/result)

It's because of the browser blocking mixed content: The JS libraries are being
loaded over HTTP but the JSFiddle is over HTTPS.

The version above loads the libraries over HTTPS via cdnjs.com.

------
revorad
Can someone please explain this?

~~~
wybiral
It's using the X and Y location of the dots as training data. Each algorithm
is being trained on (x,y)->color in an attempt to build up a rule for
predicting what color an unseen (x,y) pair would be. The hypothesis it builds
is then used to color the background so that you can see the decision
boundary.
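
The idea can be sketched roughly like this (a toy stand-in, not the demo's actual code): sample a grid of points, ask the learned hypothesis for a color at each one, and paint the result.

```javascript
// Sample a grid of (x, y) points and ask the trained model for a color
// at each; in the demo these predictions would be drawn to a canvas,
// making the decision boundary visible as the color change.
function renderBoundary(width, height, predict) {
  const pixels = [];
  for (let y = 0; y < height; y++) {
    const row = [];
    for (let x = 0; x < width; x++) {
      row.push(predict(x / width, y / height)); // normalize to [0, 1)
    }
    pixels.push(row);
  }
  return pixels;
}

// `toyPredict` is a stand-in hypothesis that splits the plane diagonally.
const toyPredict = (x, y) => (x + y < 1 ? 'red' : 'blue');
const grid = renderBoundary(4, 4, toyPredict);
console.log(grid[0][0], grid[3][3]); // 'red' 'blue'
```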

------
andrelaszlo
There's a bug somewhere.

Refresh, choose dataset: curved, algorithm: k means clustering. You get this:

[http://imageshack.com/a/img633/7110/sfteaE.png](http://imageshack.com/a/img633/7110/sfteaE.png)

If you play around and select different algorithms before selecting k means
clustering you can get very different results. :)

~~~
wybiral
I accidentally left k-means in there as an option and it doesn't make much
sense in the context of this example. So, yeah, it's a bit of a bug.
Realistically, linear regression doesn't make sense to include either, but
it still kinda works.

------
throwaway_bob
Any visualization of these algorithms in 2 dimensions (with cubic feature
expansion!) is completely misleading if you intend to work on any real problem
with many dimensions. Also, for those asking for execution times, those would
be horribly misleading as well.

~~~
heinrichhartman
+1!

Are you aware of any reasonable high-dimensional "visualizations"? They can't
be accurate, of course, but capturing the essential features would be nice.

E.g. here is a 4d cube:
[https://commons.wikimedia.org/wiki/File:8-cell.gif](https://commons.wikimedia.org/wiki/File:8-cell.gif)

------
chestervonwinch
How is there no training-time delay? How is training all these classifiers not
putting my CPU into a sweat??

edit: I should also mention: this is very cool :)

~~~
joshvm
The dataset is quite small and you have a fast machine. On my laptop, a
7-year-old Core 2, there's a slight delay when running some of the heavier
algorithms (e.g. running the neural net or SVM on the island dataset).

------
p1esk
Note: you can click on the graph to add datapoints.

------
alexanderb
No source on GitHub, but why? Nerdy.js looks like an interesting component,
but I couldn't find any relevant information about it.

------
0x99
Wow, this is awesome. Can we also play around with classifier parameters
(e.g. k for k-NN)?

~~~
wybiral
Only in the code :) Click "Edit in JSFiddle" and look at line 109 in the
JavaScript section. You'll see: options.k = 5

