
Tinker with a Neural Network in Your Browser - shancarter
http://playground.tensorflow.org/
======
erostrate
The swiss roll problem also nicely illustrates the idea behind deep learning.

Before deep learning, people would manually design all these extra features
(sin(x_1), x_1^2, etc.) because they thought they were necessary to fit this
swiss roll dataset. So they would use a shallow network with all these
features, like this: [http://imgur.com/H1cvt8d](http://imgur.com/H1cvt8d)

Then the deep learning guys realized that you don't have to engineer all these
extra features: you can just use the basic features x_1, x_2 and let the
network learn more complicated transformations in subsequent layers. So they
would use a deep network with only x_1, x_2 as inputs:
[http://imgur.com/XBRjROP](http://imgur.com/XBRjROP)

Both of these approaches work here (loss < 0.01). The difference is that for
the first one you have to manually choose the extra features sin(x_1), x_1^2,
... for each problem, and the more complicated the problem, the harder it is
to design good features. People in the computer vision community spent years
and years trying to design good features for e.g. object recognition. But
finally some people realized that deep networks could learn these features
themselves. And that's the main idea in deep learning.
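
A minimal sketch of the same contrast outside the playground (my own
illustration, not the playground's code; it uses scikit-learn's two-circles
dataset as a stand-in for the swiss roll): the linear model needs the
hand-engineered squared features, while a small deep net gets by on raw
x_1, x_2.

    # Sketch: hand-engineered features vs. a deep net on raw inputs.
    # Two-circles dataset stands in for the swiss roll.
    import numpy as np
    from sklearn.datasets import make_circles
    from sklearn.linear_model import LogisticRegression
    from sklearn.neural_network import MLPClassifier
    
    X, y = make_circles(n_samples=500, noise=0.1, factor=0.4, random_state=0)
    
    # Shallow approach: manually add x1^2 and x2^2, then fit a linear model.
    X_eng = np.hstack([X, X ** 2])
    shallow = LogisticRegression().fit(X_eng, y)
    
    # Deep approach: raw x1, x2 only; hidden layers learn the transformation.
    deep = MLPClassifier(hidden_layer_sizes=(8, 8), max_iter=2000,
                         random_state=0).fit(X, y)
    
    print("shallow + engineered features:", shallow.score(X_eng, y))
    print("deep net on raw features:     ", deep.score(X, y))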

~~~
romaniv
_> Before deep learning people would manually design all these extra features
sin(x_1), x_1^2, etc._

It's probably worth pointing out that this is true for ANNs, but there were
(and are) other "shallow" classifiers that can handle swiss roll problem
without manual parameter encoding. SVMs, for example.

[http://cs.stanford.edu/people/karpathy/svmjs/demo/](http://cs.stanford.edu/people/karpathy/svmjs/demo/)

~~~
conceit
needs another image link for visualization

------
eggy
I started reading about ANNs in the 1980s, just for fun, and had confusion
similar to what I see here. I suggest reading a basic book or an online
introduction that covers the basics [1]. I struggled through $200 textbooks
and jumped from one to the other as an autodidact. I am now studying TWEANNs
(Topology and Weight Evolving Artificial Neural Networks), which are
basically what you see here, except that they are able to change not only
their weights but also their topology, that is, how many neurons and layers
there are and where. ANNs (artificial neural networks, as opposed to
biological ones) can be a lot of fun, and are very relevant to machine
learning and big data nowadays. It was exploratory for me; I used them for
generative art and music programs. Be careful: soon you'll be reading about
genetic algorithms, genetic programming [2], and artificial life ;) Genetic
programming can be used to evolve neural networks as well as to generate
computer programs that solve a problem in a specified domain. Hint: you'll
probably want to use Lisp/Scheme for genetic programming!

    
    
      [1] http://natureofcode.com/book/chapter-10-neural-networks/
      [2] http://www.genetic-programming.com
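
If you want a taste of neuroevolution without committing to a book, here's a
toy sketch (my own illustration in Python, not from either reference; it only
mutates weights, whereas a real TWEANN would also mutate topology, adding and
removing neurons and connections):

    # Toy neuroevolution: evolve the weights of a fixed 2-3-1 net on XOR.
    import numpy as np
    
    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0, 1, 1, 0], dtype=float)              # XOR targets
    
    def forward(w, x):
        # Weights packed into a flat vector of length 13.
        W1, b1 = w[:6].reshape(2, 3), w[6:9]
        W2, b2 = w[9:12], w[12]
        h = np.tanh(x @ W1 + b1)
        return np.tanh(h @ W2 + b2) * 0.5 + 0.5          # squash into (0, 1)
    
    def loss(w):
        return np.mean((forward(w, X) - y) ** 2)
    
    pop = rng.normal(size=(50, 13))                      # random population
    for gen in range(300):
        fitness = np.array([loss(w) for w in pop])
        parents = pop[np.argsort(fitness)[:10]]          # keep the 10 best
        children = np.repeat(parents, 4, axis=0)
        children += rng.normal(scale=0.2, size=children.shape)  # mutate
        pop = np.vstack([parents, children])             # elitism
    print("best loss:", min(loss(w) for w in pop))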

~~~
wjnc
Any thoughts on why genetic programming is not 'in fashion'? Does it have
anything to do with complexity of the calculations?

I can imagine that the advanced models use many, many machines and only
deliver results after a long training time. Genetic programming is not
feasible then, if you cannot get a quick grasp of a model's potential
results.

~~~
DavidSJ
If your program is a neural network with N parameters, or a program tree with
N nodes, then testing against data takes O(N) time. With evolutionary
computation, what you get for your trouble is a single real number -- the
loss: how badly it did. With neural networks, backpropagation gives you N
real numbers: the gradient of the loss with respect to each parameter.

Put another way: with evolution you have to stumble around blindly in
parameter space and rely on selection to keep you moving in the right
direction. With the gradient descent that neural networks use, you get,
essentially for free, knowledge of the (locally) best direction to move in
parameter space.

The bigger the models, the more this matters. Modern neural networks have
millions or even billions of parameters, and that's been crucial to their
expressive power. Good luck learning a program tree with a billion nodes using
evolution. It might take 4.54 billion years.
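
To make the contrast concrete, here's a toy sketch (my own illustration, with
a made-up quadratic loss): one "mutate and select" step yields a single
fitness number and, in high dimensions, usually no progress at all, while one
gradient step improves every parameter at once.

    import numpy as np
    
    rng = np.random.default_rng(0)
    N = 1000
    target = rng.normal(size=N)
    w = np.zeros(N)
    
    def loss(w):
        return np.sum((w - target) ** 2)
    
    # Evolution-style step: perturb blindly, keep the child only if better.
    child = w + rng.normal(scale=0.1, size=N)
    w_evo = child if loss(child) < loss(w) else w
    
    # Gradient step: d(loss)/dw = 2*(w - target), all N components at once.
    grad = 2 * (w - target)
    w_gd = w - 0.1 * grad
    
    print("start loss:            ", loss(w))
    print("evolution after 1 step:", loss(w_evo))
    print("gradient after 1 step: ", loss(w_gd))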

~~~
daveguy
> It might take 4.54 billion years.

And then only if you have a system powerful enough to accurately simulate a
planet full of molecules.

Although I do think there is a balance between GA and structured NN which will
lead to faster and better results than the deep NN alone. We already see some
of the best deep NNs incorporating specific structures.

~~~
eggy
I think neural networks and evolutionary computation will merge, as I have
been writing in my other replies in this thread. TWEANNs incorporate EC into
evolving ANNs. The other article I cited above, on soil mechanics, used GP
and beat out expert systems, ANNs, and statistics. MEP (Multi-Expression
Programming) lets a single chromosome encode more than one candidate solution
without increasing processing time, thereby overcoming the inefficiencies of
1990s-era GP. Here is a recent article using it that is not behind a paywall
or on sci-hub.io [1]. It needs better editing, but there are other references
if you search for Multi-Expression Genetic Programming.

    
    
      [1] http://benthamopen.com/ABSTRACT/TOPEJ-9-21
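
A toy sketch of the core MEP trick as I understand it (my own illustration,
not the paper's code): each gene may only reference earlier genes, so a
single left-to-right pass evaluates the whole chromosome, and every gene
doubles as a candidate solution at no extra cost.

    import operator
    
    OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}
    
    def evaluate(chromosome, x):
        """Value of every gene for input x, in one left-to-right pass."""
        values = []
        for gene in chromosome:
            if gene[0] == 'x':
                values.append(x)
            elif gene[0] == 'const':
                values.append(gene[1])
            else:                      # (op, i, j): operator over earlier genes
                op, i, j = gene
                values.append(OPS[op](values[i], values[j]))
        return values
    
    # Target function: f(x) = x^2 + x.
    chromo = [('x',), ('const', 1.0), ('*', 0, 0), ('+', 2, 0), ('-', 3, 1)]
    xs = [0.0, 1.0, 2.0, 3.0]
    targets = [x * x + x for x in xs]
    
    # Chromosome fitness = error of its best gene; here gene 3 is exact.
    errors = [sum((evaluate(chromo, x)[g] - t) ** 2
                  for x, t in zip(xs, targets))
              for g in range(len(chromo))]
    print("per-gene errors:", errors)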

------
nl
This is great, but I think they should make it clear that this isn't using
TensorFlow.

From the title and domain I thought they had either ported TF to
JavaScript(!) or were connecting to a server.

~~~
sparky_
Wait - what it is using, then? I had assumed it was TF under Emscripten or
similar.

~~~
nl
It appears to be a custom NN implementation[1] in Javascript, somewhat similar
to convnet.js[2]

As far as I can see, the API[3] isn't much like TensorFlow's.

[1]
[https://github.com/tensorflow/playground](https://github.com/tensorflow/playground)

[2]
[http://cs.stanford.edu/people/karpathy/convnetjs/](http://cs.stanford.edu/people/karpathy/convnetjs/)

[3]
[https://github.com/tensorflow/playground/blob/master/nn.ts](https://github.com/tensorflow/playground/blob/master/nn.ts)

------
minimaxir
When it says "right here in your browser," it's not joking. On my desktop
(Safari), the window becomes unresponsive after a few iterations. Does not
happen in Chrome.

On my phone (Safari/iOS 9.3), the default neural nework doesn't converge at
all even after 300 iterations while it does on the desktop, which is legit
weird: [https://i.imgur.com/KNaXeHH.png](https://i.imgur.com/KNaXeHH.png)

~~~
shancarter
I'm sorry you're having problems with Safari. I can't reproduce it on my end,
but if you're still having problems you can file an issue on GitHub with some
information about your system.

~~~
davidgl
Works perfectly for me on Safari 9.1 with no extensions

------
danielvf
In case you are an idiot like me, you have to train your neural network by
pressing "play".

~~~
shancarter
We would've liked to have it constantly training, but didn't want to abuse
your CPU :)

~~~
dingo_bat
It pauses when I switch tabs :(

------
gojomo
While it doesn't involve training, these 'confusion matrix' animations of NNs
classifying images or digits are fun, too:

[http://ml4a.github.io/dev/demos/cifar_confusion.html](http://ml4a.github.io/dev/demos/cifar_confusion.html)
[http://ml4a.github.io/dev/demos/mnist_confusion.html](http://ml4a.github.io/dev/demos/mnist_confusion.html)

Something about the high-speed updating makes me think of WOPR, in 'War
Games', scoring nuclear-war scenarios.

------
timroy
This demonstration goes really well with Michael Nielsen's
[http://neuralnetworksanddeeplearning.com/](http://neuralnetworksanddeeplearning.com/).
At the bottom of the page the author gives a shout out to Nielsen, Bengio, and
others.

For someone (like me) who's done a bit of reading but not much implementation,
this playground is fantastic!

~~~
seansmccullough
Really awesome article!

------
CGamesPlay
Neat stuff, fun to play with. I wasn't able to get a net to classify the
swiss roll. Last time I was playing around with this stuff, I found that the
single biggest factor in success was the optimizer used. Is this just using
simple gradient descent? I would like to see a dropdown for different
optimizers.

~~~
8note
[http://imgur.com/ypBQEWx](http://imgur.com/ypBQEWx)

Add some noise, use all the inputs, and one 8-wide hidden layer.

edit: it works better with a sigmoid activation, but converges more slowly

~~~
andrewtbham
Yeah, you're on the right track. A nice pattern emerges on this one after 160
iterations.

[http://playground.tensorflow.org/#activation=tanh&batchSize=...](http://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=spiral&regDataset=reg-
plane&learningRate=0.03&regularizationRate=0&noise=25&networkShape=8,4&seed=0.38071&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=true&xSquared=true&ySquared=true&cosX=false&sinX=true&cosY=false&sinY=true&collectStats=false&problem=classification)

------
_AllisonMobley
Can somebody explain what I'm watching when I press play?

~~~
terda12
Hopefully this helps (correct me if I'm wrong, I'm still learning about neural
nets):

Think of the whole neural net as a function:

input * weight = output

At each iteration, we feed in the input to the neural net. Then the neural net
compares what output it gets to the correct output.

For example, input1 is 5, and the correct output for input1 should have been
2. But the neural net got 3 as the output. So it then decreases the weights
slightly so it would get 2.75 next time it has input of 5. Repeat thousands of
times. That's the basic idea for machine learning and neural networks.

The algorithm it uses to figure out how much to decrease the weights is
called "backpropagation", which uses gradient descent. To explain gradient
descent, think of a roller coaster track. Imagine the roller coaster starts
at a random location on the track. Gravity takes it down the track until it
ends up at a low point between two hills. This new location is nice because
it has the lowest energy the roller coaster could find, so it stays there.
(We use derivatives to figure out the slope of the curve, which gives us the
direction in which it goes downhill.)

In neural networks, the roller coaster curve is the "cost function", which
basically calculates the amount of difference between the neural net's output
and the actual correct output it should have got. The initial weight is the
roller coaster's initial position. The new weight is the roller coaster's
final position, at the bottom of the cost function curve. This new position
thus gives us the lowest cost.

Note that there may be even lower valleys, but when we roll the roller
coaster it stops at the nearest low valley. This is why we randomize the
weights at the beginning - to put the roller coaster near possibly even
lower valleys.
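
To make the 5 -> 2 example above concrete, here's a one-weight toy in Python
(my own sketch, not the playground's code):

    # One-weight "network": output = input * weight. Target: 5 -> 2.
    # Gradient descent nudges the weight downhill on the cost curve.
    x, target = 5.0, 2.0
    w = 0.6                       # initial weight gives output 3.0
    lr = 0.01                     # learning rate (step size down the track)
    
    for step in range(100):
        output = x * w
        cost = (output - target) ** 2          # squared error
        grad = 2 * (output - target) * x       # d(cost)/dw by the chain rule
        w -= lr * grad                         # roll downhill
    print(w, x * w)               # w -> 0.4, output -> 2.0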

~~~
ryanmonroe
Okay, so it works by minimizing (equiv. maximizing) some function. But that
doesn't say much about how it "learns" the gradient. What function does it
care about? Average squared error (predict_prob-Z_i)^2 ? Average absolute
error? The likelihood function of some assumed distribution? Maximum distance
between the classification border and the closest observed points? If I saw
someone carrying a bag full of blueberries and some bread home from the
grocery store and asked how they chose what to buy, to which they replied "I
had a list of characteristics which I thought were important for groceries to
have on this trip to the store. For each grocery item, I recorded a vector of
degrees to which the item possesses each of those characteristics. Finally, I
chose the group of groceries that had the best combination of degree
vectors", I still wouldn't really know anything about why they bought the
blueberries and bread.

~~~
zodiac
The function it minimizes is called the "loss function", and its values for
the training and test sets are shown in the upper-right area. AFAICT the site
doesn't say how it's computed, but I think it's average squared error. The
gradient is not learned; if you think of the loss function as a real-valued
function of the weights, the gradient is just its partial derivatives with
respect to the weights.
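
A quick numpy sketch of that last point (my own illustration; the tiny
"network" and data here are made up): the gradient is just the vector of
partial derivatives of the loss, which you can even approximate by finite
differences.

    import numpy as np
    
    X = np.array([[0.5, -1.0], [1.5, 2.0], [-0.3, 0.7]])
    y = np.array([1.0, -1.0, 0.5])
    
    def loss(w):
        pred = np.tanh(X @ w)               # tiny one-layer "network"
        return np.mean((pred - y) ** 2)     # average squared error
    
    w = np.array([0.2, -0.1])
    eps = 1e-6
    grad = np.array([
        (loss(w + eps * np.eye(2)[i]) - loss(w - eps * np.eye(2)[i]))
        / (2 * eps)
        for i in range(2)
    ])
    print("partial derivatives of the loss:", grad)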

------
karpathy
this is very nice! I think the reason the swiss roll doesn't work as easily
might be initialization. In 2 dimensions you have to be very careful with
initializing the weights or biases, because small networks get stuck in bad
local minima more easily.

~~~
okigan
In this case you can see that it is a swiss roll, so you could say "pick the
proper initialization".

But that technique would not work when you cannot see that it is a swiss
roll, or in higher dimensions.

~~~
brianchu
I'm pretty sure he wasn't talking about the swiss roll specifically. Big
gains in neural net performance have come from better initialization schemes
(not dataset-specific, just in general; e.g. an initialization scheme might
adapt the initial weight distribution to the number of hidden units in the
next layer), and smaller models are in general more sensitive to
initialization.
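
For example, here's a sketch of Xavier/Glorot-style initialization, one such
scheme (written from memory, not from the playground's code):

    # Xavier/Glorot-style init: scale initial weights by the number of
    # units feeding in and out, so signal variance stays roughly constant
    # from layer to layer.
    import numpy as np
    
    def xavier_init(fan_in, fan_out, rng=np.random.default_rng(0)):
        limit = np.sqrt(6.0 / (fan_in + fan_out))
        return rng.uniform(-limit, limit, size=(fan_in, fan_out))
    
    W_small = xavier_init(2, 8)        # tiny layer: wider spread
    W_big = xavier_init(512, 512)      # huge layer: much tighter spread
    print(W_small.std(), W_big.std())  # spread adapts to layer widths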

------
danblick
Has anyone been able to learn a function for the spiral (Swiss roll) data
that's as good as a human-designed function would be?

~~~
moconnor
For this simple example, just choosing the largest possible fully-connected
network with ReLU and L2 regularization to prevent overfitting quickly
converges to a nice spiral (test loss of 0.001 for me):

[http://playground.tensorflow.org/#activation=relu&regulariza...](http://playground.tensorflow.org/#activation=relu&regularization=L2&batchSize=30&dataset=spiral&regDataset=reg-
plane&learningRate=0.03&regularizationRate=0.003&noise=0&networkShape=8,8,8,8,8,8&seed=0.38911&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=true&xSquared=true&ySquared=true&cosX=false&sinX=true&cosY=false&sinY=true&collectStats=false&problem=classification)

~~~
findthewords
I wouldn't call it quick... spiral in 150 iterations, with sigmoid magic:
[http://playground.tensorflow.org/#activation=sigmoid&regular...](http://playground.tensorflow.org/#activation=sigmoid&regularization=L2&batchSize=10&dataset=spiral&regDataset=reg-
plane&learningRate=0.1&regularizationRate=0.01&noise=0&networkShape=8&seed=0.63555&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=true&xSquared=true&ySquared=true&cosX=false&sinX=true&cosY=false&sinY=true&collectStats=false&problem=classification)

I find the pulsating unsightly.

------
halotrope
You could totally optimise network architecture by crowdsourcing topology
discovery for different problems into a multiplayer game with loss as a score.

------
Your_Creator
So glad ANNs are becoming mainstream

Eventually it will have to be recognized as a new species of life, so I hope
programmers, tinkerers and everyone else keeps that in mind because all life
must be respected

And this particular form will be our responsibility, we can either embrace it
as we continue to merge with our technology, or we can allow ourselves to go
extinct like so many other species already have

For the naysayers - ever notice how attached we are to our phones? Many behave
as if they are missing a limb without it - it's because they are, the brain
adapts rapidly and for many, the brain has adapted to outsourcing our
cognition. It used to be books, day runners, journals, diaries - now we have
devices and soon they'll be implants or prosthetics

The writers at Marvel who came up with the idea of calling Iron Man's suit a
prosthetic were definitely onto something, and suits like that are probably
our best chance of successfully colonizing other planets. We'll need AI to be
our friend out there, working with us
our friend out there, working with us

------
aab0
This is a lot of fun. The default dataset is too easy, though; try out the
Swiss Roll one!

~~~
minimaxir
There is a reason why sin(X) is an input property. :p

~~~
aab0
Using sin(x) or the other input features like x^2 goes back to making it too
easy, though. So far the best I can do is 7 layers of 7 which gets a loss of
0.02. 3x7 is _almost_ cracking the Swiss Roll but can't quite finish it off
and gets stuck at 0.05: [https://imgur.com/Z3f2ECc](https://imgur.com/Z3f2ECc)
... Surprisingly, 2x8 can do it, as long as I have noise or regularization on,
but 8/7 then seriously struggles. Is 16 neurons a critical limit here?

~~~
teraflop
I managed to get to 0.01 loss from only x1/x2, using 3 hidden layers, L1
regularization, a bit of added noise, and some patience:
[http://i.imgur.com/Y3zKpJF.png](http://i.imgur.com/Y3zKpJF.png)

~~~
aab0
Yes, noise & regularization seem to be key here. I've gotten a 2-layer with
7/8 neurons down to 0.06 and dropping but only with noise & l1:
[http://playground.tensorflow.org/#activation=relu&regulariza...](http://playground.tensorflow.org/#activation=relu&regularization=L2&batchSize=6&dataset=spiral&regDataset=reg-
plane&learningRate=0.01&regularizationRate=0.03&noise=10&networkShape=8,7&seed=0.52682&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&noise_hide=false)
Final loss of 0.051. Interestingly, increasing noise from 10 to 15 destroys
performance, loss of 0.47.

------
sparky_
This is a very cool toy. As someone with no experience in ML, this is an
interesting visual approach to the absolute basics.

And great for challenging your friends in an epic battle of convergence!

~~~
trgn
If you like visual demonstrations of ML topics, you may be interested in
[http://ponder.hepburnave.com](http://ponder.hepburnave.com). It is an
interactive demonstration of a self-organizing map, generating a 2D-map from a
spreadsheet with multivariate data. It's an unsupervised learning approach,
good for data exploration tasks, less so for classification tasks
(/shamelessPlug).

------
pkaye
I'm not well versed in neural networks, but a lot of the new neural network
software stacks coming out seem to be quite plug and play. What kind of
expertise will engineers need a few years from now, when the technology is
well developed and doesn't need to be rewritten from scratch every time?

~~~
walrus
I'm not qualified to answer this, but I will anyway.

To "operate" neural networks (as opposed to writing a framework for them), you
need to know the building blocks. There are basic blocks like fully connected
layers, convolutions, and nonlinear activations. Beyond those, there are
higher level building blocks like LSTMs[1], gated recurrent units[2], highway
layers[3], batch normalization[4], and residual blocks[5] that are made up of
simpler blocks. Learning what these do and when it's appropriate to use them
requires following current literature.

Operating neural networks requires some systems engineering skill. It takes a
long time to train a single network and you'll find yourself trying many
different architectures and hyperparameters along the way. Because of this,
you'll want to distribute the training across many different systems and be
able to easily monitor and deploy jobs on those systems.

A solid grasp of mathematics is useful to effectively debug your networks.
You'll frequently find your network doesn't converge or gives totally garbage
results, so you need to know how to dig into the network internals and
understand how everything works. This is especially true if you're
implementing a new building block from a paper.

Finally, know your machine learning and statistics fundamentals. Understand
overfitting, model capacity, cross validation, probability, model ensembles,
information theory, and so on. Know when a simpler model is more appropriate.

[1] ftp://ftp.idsia.ch/pub/juergen/fki-207-95.ps.gz

[2] [http://arxiv.org/abs/1409.1259](http://arxiv.org/abs/1409.1259)

[3] [http://arxiv.org/abs/1505.00387](http://arxiv.org/abs/1505.00387)

[4] [http://arxiv.org/abs/1502.03167](http://arxiv.org/abs/1502.03167)

[5] [http://arxiv.org/abs/1512.03385](http://arxiv.org/abs/1512.03385)
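
To give a flavor of the most basic block: here's what a fully connected layer
plus a nonlinear activation boils down to (a plain numpy sketch, not any
particular framework's API). The higher-level blocks above are compositions
of pieces like this.

    import numpy as np
    
    def dense(x, W, b):
        return x @ W + b                 # fully connected layer
    
    def relu(x):
        return np.maximum(0.0, x)        # nonlinear activation
    
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 16))         # batch of 4, 16 features each
    W = rng.normal(size=(16, 32)) * 0.1
    b = np.zeros(32)
    h = relu(dense(x, W, b))             # one hidden representation
    print(h.shape)                       # (4, 32)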

~~~
pkaye
So you don't think some of these details will be automated away in the near
future, so that operating a neural network no longer requires a specialist?

~~~
vintermann
Already, it's not nearly as hard as this demo makes it look. There's one
recent advance in particular that isn't in this demo, and that is Batch
Normalization.

If you've played around with it a bit, I'm sure you have seen that deeper
layers are hard to train... You see the dashed lines representing signal in
the network become weaker and weaker as the network gets deeper. BatchNorm
works wonders for this. It takes statistics from the minibatch of training
examples and tries to normalize them, so that the next layer gets input more
similar to what it expects even if the previous layer has changed. In
practice you get a much better signal, so the network can learn a lot more
efficiently.

Without BatchNorm, training more than two hidden layers is tedious and
error-prone. With it, you can train 10-12 layers easily. (With another recent
advance, residual nets, you can train hundreds!)

Such advances push the limit of what you can train easily and what still
requires GSD ("graduate student descent": figuring out just the right
parameters to get something to work through intuition, trial and error). You
still have to watch out for overfitting, but the nice thing about that is
that more training data helps.
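
Here's a sketch of the training-time BatchNorm forward pass (my own numpy
illustration; at inference time you'd use running averages of the batch
statistics instead):

    # BatchNorm forward pass: normalize each feature using minibatch
    # statistics, then let learned gamma/beta rescale and shift.
    import numpy as np
    
    def batch_norm(x, gamma, beta, eps=1e-5):
        mean = x.mean(axis=0)                    # per-feature batch mean
        var = x.var(axis=0)                      # per-feature batch variance
        x_hat = (x - mean) / np.sqrt(var + eps)  # normalized activations
        return gamma * x_hat + beta              # learned scale and shift
    
    rng = np.random.default_rng(0)
    x = rng.normal(loc=3.0, scale=10.0, size=(32, 8))   # a wild minibatch
    out = batch_norm(x, gamma=np.ones(8), beta=np.zeros(8))
    print(out.mean(axis=0).round(3), out.std(axis=0).round(3))  # ~0 and ~1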

------
plafl
Beautiful. The next time someone asks what machine learning is about, I'm
going to send them a link to this page.

------
nxzero
"Don’t Worry, You Can’t Break It. We Promise."

(Nice, but it's completely unclear what's going on.)

~~~
visarga
This froze the mouse and keyboard on my MacBook Pro running El Capitan. I had
to do a hard reset.

------
nkozyra
Is a 50/50 training:test split a normal default ratio for an ANN? I expected
to see a higher proportion of training data as the initial setting.

------
hyh1048576
One of the finest data visualizations I've seen.

------
icelancer
This is so great. An easy way to show my friends WTF I do sometimes for
math/CS work. Thank you so much.

------
imaginenore
I wish they had more interesting data sets.

~~~
HappyTypist
Submit a pull request.

