
Show HN: Recurrent neural net plays 'neural slime volleyball' in JavaScript - hardmaru
http://otoro.net/slimevolley
======
hardmaru
As requested, I put up a demo of neural slime volleyball without any pre-
trained agents so you can see them improve gradually as they evolve:

[http://otoro.net/slimevolley/training.html](http://otoro.net/slimevolley/training.html)

Basically, I left the computer running for a day or so, and came back to find
they had become quite good. But a day of training on a MacBook Pro is probably
equivalent to a _few real decades_ of simulation at 30 frames per second, so
to answer someone's question: this evolutionary approach will probably not
work well when training against actual human players in human time. Other
approaches such as deep Q-learning may be more suitable. But you gotta
wonder about the power and possibilities of simulating time at super human
speeds to artificially evolve cool stuff!

~~~
sahilwasan
Hi hardmaru.

That's great stuff. I've always wanted to learn AI and ML, but I don't know
where to start. Can you please suggest where to begin and how you built this? Thanks

~~~
hardmaru
Hi Sahilwasan

If you want to learn, I found it's best to get your hands dirty and code
rather than just watch videos from MOOCs. I didn't know how to do any of this
stuff a few months ago.

I recommend Karpathy's Hacker's Guide to Neural Networks if you like JS

[http://karpathy.github.io/neuralnets/](http://karpathy.github.io/neuralnets/)

Most of my intro to neural nets came from working through the programming
examples in this Stanford tutorial on deep learning:

[http://deeplearning.stanford.edu/wiki/index.php/UFLDL_Tutori...](http://deeplearning.stanford.edu/wiki/index.php/UFLDL_Tutorial)

------
bufordsharkley
Very interesting-- slime volleyball was quite the thing back in my high school
days. (See it here if you can get the Java applets to run in 2015 [1])

I was surprised to see a few months ago that the version we played back in
high school had the AI programmed by Chris Coyne (of SparkNotes, OKCupid, and
now Keybase). I'm curious what was behind the AI in 2004, and furthermore,
impressed by Chris Coyne's seeming ubiquity...

1: [http://slimegames.eu](http://slimegames.eu)

~~~
hiharryhere
Massive blast from the past. Slime games were awesome. I spent a ton of time
in the high school library playing volleyball. I think the guy behind the AI
in the one I played was an Aussie called Daniel Wedge. We used to play on his
university-provided page. Slime Cricket was great too.

------
cowpig
Whoa.... any chance you can write up something on how this works? I've never
seen a working implementation of a recurrent net before. I would be very
grateful!!

Either way, this is very impressive :)

~~~
fpgaminer
His blog post describes how it was created:
[http://blog.otoro.net/2015/03/28/neural-slime-volleyball/](http://blog.otoro.net/2015/03/28/neural-slime-volleyball/)

------
ebbv
This is cute but I think it's just a really hard game to play as a human. Your
little dude can't move fast enough.

~~~
uuave
Yeah, judging the angles and getting the ball to bounce in the right direction
is really hard. But I'm gonna see if my kids can play this. I'm guessing they
probably can.

I like these old-school games.

~~~
Ntrails
I'm finding a real problem with delay on jumping/direction, which seems to be
part of the problem? It's as though I'm expecting movement to behave slightly
differently. Could just be making excuses for myself, though!

------
chillingeffect
This is massively frustrating :) although I started to develop some strategies
once I got used to the controls.

I have a question though: In useful.js, there's a function like this:

    function getHeight() { return $(window).height()-20*0; }

Why is 20*0 subtracted from the window's height?

~~~
squeaky-clean
It looks like it's actually

    
    
        return $(window).height()-20*0;
    

Which is just subtracting zero from the result. Not sure why. The only reason
I know for doing that is to cast a string to a number without the function
overhead of parseInt(). Furthermore, I don't actually see any code using that
function.
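For anyone curious, here's that coercion trick in isolation (just an
illustration of the general pattern, not code from the game):

```javascript
// Subtracting zero coerces a numeric string to a number, skipping parseInt().
const height = "480";        // e.g. a dimension read from the DOM as a string
const asNumber = height - 0; // the arithmetic forces numeric coercion
console.log(typeof height);   // "string"
console.log(typeof asNumber); // "number"
console.log(asNumber);        // 480
```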

~~~
juliangregorian
Probably leftovers from development phase.

~~~
hardmaru
lol - apologies for my messy code, guys. It's just hacked together as a rough
sketch to see if it'll work. As you can see, my web coding skillz still need
work. I put it up on GitHub though (be warned: messy, hackish code):

[https://github.com/hardmaru/neuralslimevolley](https://github.com/hardmaru/neuralslimevolley)

~~~
deepnet
I found the code clear and comprehensible - your genetic library for
convnet.js looks useful. Have you seen Karpathy's reinforce.js?

I notice you left a training flag - I would like to watch the nets train, so
I'm definitely going to have a go.

And thanks for introducing me to CNE; I am reading Faustino Gomez's thesis
right now - very, very interesting, looks to have a big advantage over
Q-learning.

In your blog post you compare the 'DNA' of multiple training episodes; the
similar weights kind of suggest that you have exhaustively searched the
problem space?

~~~
hardmaru
Thanks - I'm thinking of putting up a version that has not been pretrained, so
users can see the training over several generations. I think I have one lying
around - remind me on Twitter if I forget.

I'm also trying to learn 'deep q learning' (q-learning with a neural net as
the q function) and other cool stuff that has recently been developed.

btw, I think your previous post with the 'brain' of 140 numbers is messing up
this thread a bit on my browser for some reason - the comments don't wrap.
Wonder if there's a way to fix it.

~~~
deepnet
Many Thanks for the untrained version - I am working on visualising the
evolving genome based on your RGBA illustration.

I read Sutton and Barto on Q-learning but only 'got' it when I saw it done by
hand with matrix math.
[http://mnemstudio.org/path-finding-q-learning-tutorial.htm](http://mnemstudio.org/path-finding-q-learning-tutorial.htm)

Sutton & Barto suggest eligibility traces are necessary to make Q-learning
fast enough.
[http://webdocs.cs.ualberta.ca/~sutton/book/ebook/node79.html](http://webdocs.cs.ualberta.ca/~sutton/book/ebook/node79.html)
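The tabular update both links build on fits in a couple of lines; here's a
minimal sketch with illustrative names (alpha is the learning rate, gamma the
discount factor - not code from either tutorial):

```javascript
// One step of tabular Q-learning: nudge Q[s][a] toward the observed reward
// plus the discounted best value of the next state.
function qUpdate(Q, s, a, reward, sNext, alpha, gamma) {
  const bestNext = Math.max(...Q[sNext]);
  Q[s][a] += alpha * (reward + gamma * bestNext - Q[s][a]);
}

// Tiny worked example: 2 states, 2 actions, all values start at zero.
const Q = [[0, 0], [0, 0]];
qUpdate(Q, 0, 1, 1.0, 1, 0.5, 0.9); // take action 1 in state 0, get reward 1
console.log(Q[0][1]); // 0.5
```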

Gomez's thesis suggests that genetic methods have an advantage over backprop
for learning sequences of actions.
[http://www.cs.utexas.edu/users/nn/downloads/papers/gomez.phd...](http://www.cs.utexas.edu/users/nn/downloads/papers/gomez.phdtr03.pdf)

Deep Q-learning is brilliant - evolving the Q function would certainly be very
interesting. I'm currently trying to get the ALE to send frames to my convnet :)

mods have helped fix the errant brain code - thanks for the heads up.

~~~
hardmaru
Cool! Let's talk further offline later - I would be interested to know what
you end up doing with it, and to pick your (biological) brain for ideas. Ping
me on Twitter later.

------
thomasfoster96
I'm a little confused - the convnet.js library doesn't do recurrent neural
networks, but the recurrent.js library (by the same author) does. Yet
convnet.js seems to be the one used here, not recurrent.js.

Still, this is pretty awesome.

~~~
hardmaru
Hi Thomas

I manually take the outputs of a feed-forward network and put them back in as
inputs at the next time step, kind of making it a fully connected RNN.

Here are some details of the implementation:

[http://blog.otoro.net/2015/03/28/neural-slime-volleyball/](http://blog.otoro.net/2015/03/28/neural-slime-volleyball/)
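A toy sketch of that feedback loop (illustrative only - the names, sizes and
weights here are made up and don't match the actual demo):

```javascript
// Plain feed-forward layer: one row of input weights per output neuron.
function forward(weights, x) {
  return weights.map(row =>
    Math.tanh(row.reduce((sum, w, j) => sum + w * x[j], 0)));
}

let prevOut = [0, 0];                 // feedback starts at zero
const weights = [[0.5, -0.2, 0.1, 0.3],
                 [-0.4, 0.6, 0.2, -0.1]];
for (let t = 0; t < 3; t++) {
  const sensors = [1, 0.5];           // stand-in for the game-state inputs
  const x = sensors.concat(prevOut);  // outputs fed back in as extra inputs
  prevOut = forward(weights, x);
}
console.log(prevOut.length); // 2
```

The feedback carries state across time steps, which is what gives the network
its "memory" despite each pass being an ordinary feed-forward computation.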

~~~
thomasfoster96
Ah, right. That makes sense (even if it's a little tedious).

------
nine_k
Is the graph at the top a representation of the neural net? That is, are 11
neurons in 2 layers enough to operate the game?

~~~
deepnet
[accompanying blog post](http://blog.otoro.net/2015/03/28/neural-slime-volleyball/)

I too am perplexed as to what the neurons represent; the blog describes the
RNN as 19 inputs connected to 7 outputs.

Examining slimevolley_pro.js (Agent.prototype.drawstate) shows that only the
agent states (x, y, vx, vy, bx, by, bvx, bvy) are depicted onscreen in the top
row, with the actions forward, jump and backward below.

The code is well documented and builds on convnet.js by Karpathy.

The network is trained by genetic algorithm and self play - so this is a
neural net trained with reinforcement learning.

The author's method seems unique and effective.

The blog post's comparison of the resultant 'genomes' of network weights seems
to show that the space has been searched exhaustively.

The game is played with trained networks, but the code contains a training
flag - it will be interesting to watch them train with self-play.

~~~
albertzeyer
The method is not unique. See some of the work of Schmidhuber's group; they
are doing a lot of reinforcement learning for recurrent nets (LSTMs), also via
evolutionary algorithms. E.g. see Evolino.

~~~
deepnet
None of the ideas are new.

[http://people.idsia.ch/~juergen/evolino.html](http://people.idsia.ch/~juergen/evolino.html)

Yet Schmidhuber's nets are much more complex and certainly different.

I still think Otoro's very simple RNN feedback nets are unique - especially
when coupled with training by self-play.

Does Schmidhuber have any game-playing agents?

~~~
darkmighty
I was very impressed with this handwriting demo (and the accompanying paper):

[http://www.cs.toronto.edu/~graves/handwriting.html](http://www.cs.toronto.edu/~graves/handwriting.html)

Using a recurrent architecture.

To be honest I haven't played with NNs, but it puzzles me why the
non-recurrent approach is so prevalent for complex tasks. I mean, it's the
basic combinational-vs-sequential-circuit distinction, and we all know the
latter are much better suited to complex or large outputs. Where's everything
we learned from synchronous logic synthesis?

~~~
deepnet
Graves' handwriting net was trained using backpropagation - whereby the error
between the net's estimate and a training target is sent backward through the
net, so the net's estimates gradually become closer to the targets.

Backpropagation takes longer the deeper the net - recurrent neural nets are
deep in time, so backpropagation can become intractable or unstable.

Otoro's Slimeball demo evolves a Recurrent Net rather than training it - this
appears to be a very efficient method, less likely to get stuck in local
minima.

The slimes evolve through self-play which is a trial and error method and
reinforcement methods seem to do better on control tasks than passive
learning.

~~~
darkmighty
Ah, I see. But as far as training goes, the difference between the two methods
("evolution" and backprop) is a matter of locality, no? Backprop modifies
weights based loosely on the local gradient towards fitness, while evolution
moves in sparse random directions. In this view backprop is indeed vulnerable
to local maxima if your optimization method isn't very good, but isn't it just a matter
of choosing good optimization methods? In other words, combining local
backprop optimization with global evolutionary methods should be the role of
robust optimization algos, no?

------
data_scientist
Please modify the bindings - the directional keys aren't international, and
without a QWERTY keyboard I can't play. And no, I will not change my layout
system-wide for just a JS game.

Nice one nevertheless!

~~~
hardmaru
Oops, I hadn't thought about that!

The arrow keys should work, right?

Is there an easy way to detect non-QWERTY keyboards so I can implement it
elegantly in JS?
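One layout-independent option (an assumption on my part, not something the
demo currently does): instead of detecting layouts, key off
KeyboardEvent.code, which names the physical key position and so behaves the
same on QWERTY, AZERTY, Dvorak, etc. A sketch with a hypothetical mapping:

```javascript
// Hypothetical key mapping using KeyboardEvent.code values, which identify
// the physical key position regardless of the active keyboard layout.
function actionFor(code) {
  switch (code) {
    case 'ArrowUp':
    case 'KeyW': return 'jump';
    case 'ArrowLeft':
    case 'KeyA': return 'left';
    case 'ArrowRight':
    case 'KeyD': return 'right';
    default: return null;
  }
}

// In the browser this would be wired up roughly as:
//   window.addEventListener('keydown', e => act(actionFor(e.code)));
console.log(actionFor('KeyW')); // jump
```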

------
donatj
Does it continue to learn when I take over control, e.g. learn from me, or is
the learning already done by the time I play?

~~~
deepnet
AFAIK the game is not learning when playing against a human.

There is a flag in the JavaScript that controls learning: trainingMode = false;

------
discardorama
The interesting thing, to me, is how the network was trained.

~~~
deepnet
Other networks learn - this network evolved.

By breeding the winners of many self-play tournaments, a very strong player
was born.

Training and learning show examples and mark the student against a correct
answer - the student network tries to minimise its error.

Genetic algorithms are a type of reinforcement learning, i.e. learning by
trial and error.

The agent can direct its own learning and explore.

This is useful for training robots and other embodied agents, such as
opponents.
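In sketch form, one generation of that breed-the-winners loop might look like
this (a generic genetic-algorithm sketch, not the repo's actual CNE code; all
names and parameters here are mine):

```javascript
// One GA generation: rank genomes by fitness, keep the top half,
// and refill the population with mutated copies of the survivors.
function nextGeneration(population, fitness, mutationRate) {
  const ranked = [...population].sort((a, b) => fitness(b) - fitness(a));
  const survivors = ranked.slice(0, Math.floor(ranked.length / 2));
  const children = survivors.map(parent =>
    parent.map(w =>
      Math.random() < mutationRate ? w + (Math.random() * 2 - 1) : w));
  return survivors.concat(children);
}

// Toy fitness function: prefer genomes whose weights sum high.
const fit = g => g.reduce((s, w) => s + w, 0);
let pop = [[0, 0], [1, 1], [2, 2], [3, 3]];
pop = nextGeneration(pop, fit, 0.1);
console.log(pop.length); // 4
```

In the real demo, fitness would come from playing genomes against each other
rather than from a fixed function.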

~~~
discardorama
I am aware of all of those terms. But I'd prefer to hear details about how
_this particular network_ was evolved, as you put it.

~~~
deepnet
Pretty comprehensive details here:
[http://blog.otoro.net/2015/03/28/neural-slime-volleyball/](http://blog.otoro.net/2015/03/28/neural-slime-volleyball/)

~~~
discardorama
Thanks! That's what I was looking for. Very interesting.

------
seanalltogether
I feel like I'm missing something here, can anyone give a brief synopsis on
the significance of this?

~~~
sweezyjeezy
The AI of the computer player is an RNN, which is a type of neural network.
The objective of the game, what the controls do, and strategies were never
hard coded into the AI, it learned them through trial and error, the way a
human would. It's a lightweight implementation too, done in JS. One neat
feature is that you can see the real-time activations of the neurons which are
connected to the control nodes in the background.

~~~
seanalltogether
Is the "learned" behavior stored in the javascript somewhere, or is it coming
from a socket in real time? Is it just a giant data array for handling
possible decisions?

~~~
deepnet
Line 60 of slimevolley_pro.js contains the "learned" behaviour in 140 floats:

    
    
      "gene":{"0":7.5719,"1":4.4285,"2":2.2716,"3":-0.3598,"4":-7.8189,"5":-2.5422,"6":-3.2034,"7":0.3935,"8":-6.7593,"9":-8.0551,"10":1.3679,"11":2.1859,"12":1.2202,"13":-0.49,"14":-0.0316,"15":0.5221,"16":0.7026,"17":0.4179,"18":-2.1689,"19":1.646,"20":-13.3639,"21":1.5151,"22":1.1175,"23":-5.3561,"24":5.0442,"25":0.8451,"26":0.3987,"27":-2.6158,"28":0.4318,"29":-0.7361,"30":0.5715,"31":-2.9501,"32":-3.7811,"33":-5.8994,"34":6.4167,"35":2.5014,"36":7.338,"37":-2.9887,"38":2.4586,"39":13.4191,"40":2.7395,"41":-3.9708,"42":1.6548,"43":-2.7554,"44":-1.5345,"45":-6.4708,"46":-4.4454,"47":-0.6224,"48":-1.0988,"49":4.4501,"50":9.2426,"51":-0.7392,"52":0.4452,"53":1.8828,"54":-2.6277,"55":-10.851,"56":-3.2353,"57":-4.4653,"58":-3.1153,"59":-1.3707,"60":7.318,"61":16.0902,"62":1.4686,"63":7.0391,"64":1.7765,"65":-4.9573,"66":-1.0578,"67":1.3668,"68":-1.4029,"69":-1.155,"70":2.6697,"71":-8.8877,"72":1.1958,"73":-3.2839,"74":-5.4425,"75":1.6809,"76":7.6812,"77":-2.4732,"78":1.738,"79":0.3781,"80":0.8718,"81":2.5886,"82":1.6911,"83":1.2953,"84":-5.5961,"85":2.174,"86":-3.5098,"87":-5.4715,"88":-9.0052,"89":-4.6038,"90":-6.7447,"91":-2.5528,"92":0.4391,"93":-4.9278,"94":-3.6695,"95":-4.8673,"96":-1.6035,"97":1.5011,"98":-5.6124,"99":4.9747,"100":1.8998,"101":3.0359,"102":6.2983,"103":-2.703,"104":1.5025,"105":6.1841,"106":-0.9357,"107":-4.8568,"108":-2.1888,"109":-4.1143,"110":-3.9874,"111":-0.0459,"112":4.7134,"113":2.8952,"114":-9.3627,"115":-4.685,"116":0.3601,"117":-1.3699,"118":9.7294,"119":11.5596,"120":0.1918,"121":3.0783,"122":-6.6828,"123":-5.4398,"124":-5.088,"125":3.6948,"126":0.0329,"127":-0.1362,"128":-0.1188,"129":-0.7579,"130":0.3278,"131":-0.977,"132":-0.9377,"133":2.2935,"134":-2.0353,"135":-1.7786,"136":5.4567,"137":-3.6368,"138":3.4996,"139":-0.0685}
    

These are the weights on the connections between the inputs, neurons and
outputs.

Everything you need to know about how to win Slime Volleyball, in 140 numbers!

~~~
hardmaru
Hi deepnet

Do you mind helping me truncate that message above with the number array? It's
causing issues in some browser sessions - the messages don't word-wrap.

Thanks vm

~~~
deepnet
Very sorry I have messed things up :(

Unfortunately I have not got an edit or delete link so I am powerless to alter
my mistake.

If any mods* can help delete this rogue copypasta I would be appreciative.

[relevant xkcd](https://xkcd.com/327/)

*emailed support

~~~
dang
Sure. Fixed.

For future reference: I fixed it by adding two spaces before the very long
line. That makes it be formatted as code and not wrap.

------
technologia
A blast from the past and I'm still losing to the computer. Nothing has
changed.

------
billconan
very interesting! now you owe me 10min of my life. :)

