Show HN: Recurrent neural net plays 'neural slime volleyball' in JavaScript (otoro.net)
147 points by hardmaru on May 12, 2015 | 74 comments

As requested, I put up a demo of neural slime volleyball without any pre-trained agents so you can see them improve gradually as they evolve:


Basically, I left the computer running for a day or so, and came back to find that they had become quite good. But a day of training on a MacBook Pro is probably equivalent to a _few real decades_ of simulation at 30 frames per second. So, to answer someone's question: the evolutionary approach will probably not work well when training against actual human players in human time. Other approaches such as deep Q-learning may be more suitable. But you gotta wonder about the power and possibilities of simulating time at superhuman speeds to artificially evolve cool stuff!

Hi hardmaru.

That's great stuff. I have always wanted to learn AI and ML, but I do not know where to start. Can you please guide me on where to start and how you did this? Thanks

Hi Sahilwasan

If you want to learn I found it is best to get your hands dirty and code rather than just watch videos from MOOCs. I didn't know how to do any of this stuff a few months ago.

I recommend Karpathy's Hacker's Guide to Neural Networks if you like JS


Most of my intro to neural nets came from going through the programming examples in this Stanford tutorial on deep learning:


Mindblowing that the trained agents are 10s of years old - this puts their skill in perspective.

Very interesting that we humans may be too 'realtime' to train evolving agents.

Do your plank avoiding creatures evolve faster ?


Watching the creatures avoiding planks it is hard to reconcile the deterministic mathematics with the seeming agency and 'beingness' of the simulated life.

The creatures evolved much faster! If you run the training demo at http://otoro.net/planks/training.html I noticed that after 50-100 generations they develop insect like characteristics.

But the slime agents needed much more evolution. I think when I did it I actually had them evolve on the easy version first, and then increased the area size to evolve them on the larger play area. This sort of step-by-step evolution strategy was covered in some neuroevolution papers I read. I also applied it to the pendulum example.

While Deep-Q-Learning is powerful, it uses backprop - your use of CNE genetic evolution rather than backprop may provide a more global search than Mnih's Deep-Q.

The methods seem complementary , I have an inkling that the combination may be greater than the sum of the parts.

Another innovation in the Atari paper, Mnih's deep-Q-agent remembers successful episodes and trains more from memory than experience.

Of course they are not training against a human player but perhaps this memory training could be used with human games to exploit expert knowledge.

The recent Neural Net Go players by Edinburgh's Clarke & Storkey group and Deepmind's Maddison & Huang both train on corpora of expert human play.

But the Go players are passive backprop learners.

Tesauro's TD-Gammon learned to expert level through self play.

I am speculating that a Reinforcement Learner could learn to utilise the benefits of human play with memory, expert corpora and self-play.

This is a fascinating parallel area - two equal players rather than Mnih's single player Atari - perhaps there are good 2-player Atari games that could be added to the ALE benchmark - Joust ?


Teaching Deep Convolutional Neural Networks to Play Go by Clarke & Storkey http://arxiv.org/abs/1412.3409

Move Evaluation in Go Using Deep Convolutional Neural Networks by Maddison , Huang , Silver & Sutskever http://arxiv.org/abs/1412.6564

Tesauro's TD-Gammon http://webdocs.cs.ualberta.ca/~sutton/book/ebook/node108.htm...

Playing Atari with Deep Reinforcement Learning , Mnih et al. https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf

Otoro's CNE Bibliography http://blog.otoro.net/2015/01/27/neuroevolution-algorithms/

Neuro Evolution - http://en.wikipedia.org/wiki/Neuroevolution

Thanks for the references

So you think basically recording human actions (possibly in some feature extracted way) in part of the 'replay memory' for DQN would work well?
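For anyone wondering what that could look like, here is a rough sketch of a replay memory that tags each transition with where it came from. It is not DQN's actual code; the class name, the `source` field, and the toy states are all made up for illustration.

```javascript
// Minimal replay-memory sketch (hypothetical names; not the DQN implementation).
class ReplayMemory {
  constructor(capacity) {
    this.capacity = capacity;
    this.items = [];
    this.next = 0; // slot to overwrite once the buffer is full
  }

  // Store one transition; `source` tags it as 'human' or 'agent' (an assumption).
  add(state, action, reward, nextState, source) {
    const transition = { state, action, reward, nextState, source };
    if (this.items.length < this.capacity) {
      this.items.push(transition);
    } else {
      this.items[this.next] = transition;
    }
    this.next = (this.next + 1) % this.capacity;
  }

  // Draw a uniformly random minibatch (with replacement).
  sample(batchSize, rng = Math.random) {
    const batch = [];
    for (let i = 0; i < batchSize; i++) {
      batch.push(this.items[Math.floor(rng() * this.items.length)]);
    }
    return batch;
  }
}

// Seed the memory with recorded human play, then let the agent add its own.
const memory = new ReplayMemory(1000);
memory.add([0.1, 0.2], 'jump', 0, [0.1, 0.3], 'human');
memory.add([0.1, 0.3], 'forward', 1, [0.2, 0.3], 'agent');
const minibatch = memory.sample(4);
```

Whether mixing human and agent transitions this way actually helps is the open question; the sketch only shows that the bookkeeping is cheap.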

I have also been reading about deepmind's recent survey of combining deep learning methods with actor critic models

What I also want to explore is the possibility to use evolution to evolve q-functions, which shouldn't be so hard to do, rather than evolve policies directly (like in this game).

The possibility of evolving self-learning machines excites me, rather than just evolving machines in a fixed state.

I can also explore whether Darwinian evolution (weights are randomized at birth) is better or worse than Lamarckian evolution, where learned weights are passed on to offspring.

I put some further thoughts and references in my post here


The surviving agents will be born with a better capacity to learn rather than the capacity to do a predefined job. Stay tuned!


This slide from David Silver's ICLR talk hints at Google Deepmind's Gorila Parallel Large Scale Actor Critic Deep Q Architecture

There is some evidence that expert curricula can make learning much faster, although with game agents I don't know of anyone exploring this since Michie and Chambers's 1968 work on tic-tac-toe and pole-balancing, which compared expert training and self-play on these benchmarks.


Collobert, Weston and Bengio have explored evolving efficient curricula


Posted a small write up here with regards to the evolution setup used:


Very interesting-- slime volleyball was quite the thing back in my high school days. (See it here if you can get the Java applets to run in 2015 [1])

I was surprised to see a few months ago that the version we played back in high school had the AI programmed by Chris Coyne (of SparkNotes, OKCupid, and now Keybase). I'm curious what was behind the AI in 2004, and furthermore, impressed by Chris Coyne's seeming ubiquity...

1: http://slimegames.eu

Massive blast from the past. Slime games were awesome. I spent a ton of time in the high school library playing volleyball. I think the guy behind the AI in the one I played was an Aussie guy called Daniel Wedge. We used to play on his university-provided page. Slime Cricket was great too.

Whoa.... any chance you can write up something on how this works? I've never seen a working implementation of a recurrent net before. I would be very grateful!!

Either way, this is very impressive :)

His blog post describes how it was created: http://blog.otoro.net/2015/03/28/neural-slime-volleyball/

The author drops into this reddit thread to answer questions


This is cute but I think it's just a really hard game to play as a human. Your little dude can't move fast enough.

I agree, but I'm not sure I would have said that without first getting clobbered by the AI. The list of really hard things for a human seems to be growing these days.

Yeah, judging the angles and getting the ball to bounce in the right direction is really hard. But I'm gonna see if my kids can play this. I'm guessing they probably can.

I like these old-school games.

i'm finding a real problem with delay on jumping/direction which seems to be part of the problem? As though I'm expecting movement to behave slightly differently. could just be making excuses for myself though!

I should have done a better job for the ball dynamics

Perhaps the easy version is more suitable for the kids


Thanks for the feedback it is very helpful.

This is massively frustrating :) although I started to develop some strategies once I got used to the controls.

I have a question though: In useful.js, there's a function like this:

function getHeight() { return $(window).height()-200; }

Why is 200 subtracted from the window's height?

It looks like it's actually

    return $(window).height()-20*0;
Which is just subtracting zero from the result. Not sure why. The only reason I know for doing that is to cast a string to a number without the function overhead of parseInt(). Furthermore, I don't actually see any code using that function.
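A quick sketch of the two points above, using a plain number as a stand-in for the window height (the value 480 is made up). As far as I know jQuery's `.height()` already returns a number, so the cast would be redundant there, but treat that as an assumption.

```javascript
// Multiplication binds tighter than subtraction: h - 20*0 is h - (20*0), i.e. h - 0.
const h = 480;                    // stand-in for a numeric window height
const same = h - 20 * 0;          // still 480

// Subtracting zero coerces a numeric string to a number...
const coerced = '480' - 0;        // 480 (a number)
// ...whereas + concatenates instead.
const concatenated = '480' + 0;   // '4800' (a string)
```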

Probably leftovers from development phase.

lol - apologies about my messy code guys. It's just hacked together as a rough sketch to see if it'll work. as you can see my web coding skillz still needs work. I put it up on github though (be warned about messy hackish code):


I found the code clear and comprehensible - your genetic library for convnet.js looks useful - have you seen Karpathy's reinforce.js ?

I notice you left a training flag - I would like to watch the nets train so definitely going to have a go.

And thanks for introducing me to CNE, I am reading John Gomez's thesis right now - very very interesting, looks to have a big advantage over Q-learning.

In your blog post you compare the 'DNA' of multiple training episodes, the similar weights kind of suggests that you have exhaustively searched the problem space ?

Thanks - thinking of putting up a version that has not been pretrained, so users can see the training over several generations. I think I have one lying around - remind me on twitter if I forget.

I'm also trying to learn 'deep q learning' (q-learning with a neural net as the q function) and other cool stuff that has recently been developed.

btw, I think your previous post with the 'brain' of 140 numbers is messing up this thread a bit on my browser for some reason and the comments do not wrap. I wonder if there's a way to fix it.

Many Thanks for the untrained version - I am working on visualising the evolving genome based on your RGBA illustration.

I read Sutton and Barto on Q-Learning but only 'got' it when I saw it hand-done with matrix math. http://mnemstudio.org/path-finding-q-learning-tutorial.htm
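In the same hand-worked spirit, here is a minimal sketch of one tabular Q-learning update. The table size, reward, learning rate `alpha` and discount `gamma` are toy values chosen so the arithmetic is easy to follow; they do not come from the tutorial.

```javascript
// Q is a table indexed as Q[state][action]. One update step:
// Q[s][a] += alpha * (reward + gamma * max_a' Q[sNext][a'] - Q[s][a])
function qUpdate(Q, s, a, reward, sNext, alpha, gamma) {
  const bestNext = Math.max(...Q[sNext]);
  Q[s][a] += alpha * (reward + gamma * bestNext - Q[s][a]);
  return Q[s][a];
}

// Two states, two actions; state 1 already has a known good action worth 10.
const Q = [
  [0, 0],
  [0, 10],
];

// Take action 1 in state 0, get reward 1, land in state 1.
// Q[0][1] becomes 0 + 0.5 * (1 + 0.5 * 10 - 0) = 3.
qUpdate(Q, 0, 1, 1, 1, 0.5, 0.5);
```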

Sutton & Barto suggest Eligibility Traces seem necessary to make Q learn fast enough. http://webdocs.cs.ualberta.ca/~sutton/book/ebook/node79.html

Gomez's thesis suggests that genetic methods have an advantage over backprop for learning sequences of actions. http://www.cs.utexas.edu/users/nn/downloads/papers/gomez.phd...

Deep Q Learning is brilliant - evolving the Q function would certainly be very interesting - currently trying to get the ALE sending frames to my convnet :)

mods have helped fix the errant brain code - thanks for the heads up.

cool! Let's talk further offline later- I would be interested to know what you end up doing with it, and pick your (biological) brain for ideas. Ping me on Twitter message later-

Hope it didn't seem like I was trying to criticize. The app is really cool, and thanks for open-sourcing it. I'd be really interested in seeing the version that doesn't start pre-trained.


This post outlines a version that doesn't start pre-trained


I'm a little confused - the convnet.js library doesn't do recurrent neural networks, but the recurrent.js library (by the same author) does. Yet convnet.js seems to be being used for this, not recurrent.js.

Still, this is pretty awesome.

Hi Thomas

I manually take the outputs of a feed forward network and put them back as inputs in the next time step, kind of making it a fully connected RNN.
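A rough sketch of that feedback trick, written without convnet.js so it runs on its own. The function names, the tanh activation, and the tiny 2-input, 2-neuron sizes are assumptions for illustration, not the demo's actual code.

```javascript
// One fully connected layer: out[j] = tanh(bias[j] + sum_i weights[j][i] * input[i]).
function layerForward(weights, biases, input) {
  return weights.map((row, j) =>
    Math.tanh(row.reduce((sum, w, i) => sum + w * input[i], biases[j]))
  );
}

// Wrap the layer so every step sees the game inputs plus its own previous outputs.
function makeRecurrentAgent(weights, biases) {
  let prevOutputs = new Array(biases.length).fill(0);
  return function step(gameInputs) {
    const input = gameInputs.concat(prevOutputs);
    prevOutputs = layerForward(weights, biases, input);
    return prevOutputs;
  };
}

// 2 game inputs + 2 fed-back outputs = 4 weights per neuron.
// Neuron 1 only listens to neuron 0's previous output.
const step = makeRecurrentAgent([[1, 0, 0, 0], [0, 0, 1, 0]], [0, 0]);
const first = step([0.5, 0]);   // neuron 1 sees a zero history
const second = step([0.5, 0]);  // same game inputs, but neuron 1 now sees neuron 0's last output
```

The same game inputs give different outputs on the second step, which is the memory that a plain feed-forward pass lacks.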

Here are some details of the implementation:


Ah, right. That makes sense (even if it's a little tedious).

Is the graph at the top a representation of the neural net? That is, are 11 neurons in 2 layers enough to operate the game?

[accompanying blog post](http://blog.otoro.net/2015/03/28/neural-slime-volleyball/)

I too am perplexed as to what the neurons represent, the blog describes the RNN as 19 inputs connected to 7 outputs.

Examining slimevolley_pro.js (Agent.prototype.drawstate) shows only the agent states are depicted onscreen x,y,vx,vy,bx,by,bvx,bvy in the top row and actions forward, jump and backward below.

The code is well documented and builds on convnet.js by Karpathy.

The network is trained by genetic algorithm and self play - so this is a neural net trained with reinforcement learning.

The author's method seems unique and effective.

The blog post's comparison of the resultant 'genomes' of network weights seems to show that the space has been searched exhaustively.

The game is played with trained networks but the code contains a training flag - it will be interesting to watch them train with self play.

The method is not unique. See some of the work of Schmidhuber's group. They are doing a lot of reinforcement learning for recurrent nets (LSTMs) and also via evolutionary algorithms. Eg see Evolino.

I stand corrected, Schmidhuber in AMA http://www.reddit.com/r/MachineLearning/comments/2xcyrl/i_am... has an evolved Atari agent before Mnih

Otoro states he is using CNE (conventional neuroevolution), but combining it with ideas about recurrence from the Atari paper.


He outlines the evolution of his thinking and slime volleyball in this post which cites John Gomez's thesis as the inception of CNE.

Certainly parallel ideas to Schmidhuber but the implementation details are somewhat different in the U of Texas Neuro-Evolution models.

None of the ideas are new.


Yet Schmidhuber's nets are much more complex and certainly different.

I still think Otoro's very simple RNN feedback nets are unique - especially when coupled with training by self-play.

Does Schmidhuber have any game playing agents ?

I was very impressed with this handwriting demo (and the accompanying paper):


Using a recurrent architecture.

To be honest I haven't played with NNs, but it puzzles me as to why the non-recurrent approach is so prevalent for complex tasks. I mean, it's the basic combinatorial circuit vs sequential circuits, which we all know are much more suited for complex or large outputs. Where's everything we learned from synchronous logic synthesis?

[reply to darkmighty comment below, thread depth limitation]

Indeed this is exactly it, evolution is a global method, learning is local.

I am reading John Gomez's thesis where he compares and combines learning and evolution


Otoro's post on the evolution of his slime volleyball thinking is well worth a read.


I will check those out, thanks. This is indeed a fascinating topic. I guess every scientist wants to understand learning.

The connection comes up in David MacKay's Information Theory book too, a read I definitely recommend, although I haven't been through it properly myself.


MacKay's Information Theory is a brilliant read so far, many thanks.

Having learning couched in Information Theory terms brings it all right back to Claude Shannon's early work on Reinforcement Learning Chess programs and Alan Turing's ideas about evolving efficient machine code by bitmask genetic recombination.

Graves's Handwriting Net was trained using Backpropagation - whereby the error between the net's estimate and a training target is sent backward through the net - so the net's estimates gradually become closer to the targets.

Backpropagation takes longer the deeper the net - Recurrent Neural Nets are deep in time so Back Propagation can become intractable or unstable.

Otoro's Slimeball demo evolves a Recurrent Net rather than training it - this appears to be a very efficient method, less likely to get stuck in local minima.

The slimes evolve through self-play which is a trial and error method and reinforcement methods seem to do better on control tasks than passive learning.

Ah I see. But as far as training goes the difference between the two methods ("evolution" and backprop) is a matter of locality, no? Backprop modifies weights loosely based on the local gradient towards fitness, and evolution goes in sparse random directions. In this view backprop is indeed vulnerable to local maxima if your optimization method isn't very good, but isn't it just a matter of choosing good optimization methods? In other words, combining local backprop optimization with global evolutionary methods should be the role of robust optimization algos, no?

That handwriting demo is awesome! Thanks for the pointers - I want to learn more about how that works.

I wish I stayed at U of T for a few more years...

It's just an abbreviated form of the nets shown in this (http://blog.otoro.net/wp-content/uploads/sites/2/2015/03/sli...) image, the 3 outputs displayed correspond to the "forward", "jump", and "back" on the diagram, and the 8 inputs correspond to the agents' and balls' position vectors (which you can see in the Brain() code in slimevolley_pro.js).

Since RNNs are self-connected, in some sense they have a lot more depth than just 2, so they can learn very non-linear features (but the weights of the connections are shared across the layers).

Please modify the bindings, the directional keys are international, without a qwerty keyboard I can't play - and no, I will not modify it systemwide for just a JS game.

Nice one nevertheless!

Oops I haven't thought about that!

The arrow keys should work right?

Is there an easy way to detect non-qwerty keyboards so I can implement it elegantly in JS?
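One way to sidestep layout detection altogether is to bind on `KeyboardEvent.code`, which reports the physical key position rather than the character it produces. A minimal sketch follows; the binding table and the `handle` callback are hypothetical and are not the demo's real controls.

```javascript
// Map physical key positions to actions, independent of the keyboard layout.
// 'KeyW' is the key in the QWERTY "W" position, whatever letter is printed on it.
const KEY_BINDINGS = new Map([
  ['KeyW', 'jump'], ['KeyA', 'left'], ['KeyD', 'right'],
  ['ArrowUp', 'jump'], ['ArrowLeft', 'left'], ['ArrowRight', 'right'],
]);

// Accepts anything with a `code` property, so it also works with plain objects.
function actionForKey(event) {
  return KEY_BINDINGS.get(event.code) || null;
}

// In a browser (handle is a hypothetical game callback):
// document.addEventListener('keydown', (e) => handle(actionForKey(e)));
```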

use the cursor keys for player 2

Does it continue to learn when I take over control, eg: learn from me, or is the learning done at the time I can play?

AFAIK the game is not learning when playing a human.

There is a flag to control learning in the JavaScript: trainingMode = false;

The interesting thing, to me, is how was the network trained?

Other networks learn - this network evolved.

By breeding the winners of many self-play tournaments a perfect player was born.
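To make "breeding the winners" concrete, here is a toy sketch of a self-play evolution loop. It is not the author's setup: the round-robin format, the keep-the-top-half rule, the mutation size, and the `playMatch` stand-in game are all assumptions picked to keep the example short.

```javascript
// Toy self-play evolution. `playMatch(a, b)` is a hypothetical function that
// returns a positive number if genome a wins, negative if b wins, 0 for a draw.
function evolve(population, playMatch, generations, rng = Math.random) {
  const mutate = (genome) => genome.map((w) => w + (rng() - 0.5) * 0.2);
  let pop = population.map((genome) => genome.slice());
  for (let gen = 0; gen < generations; gen++) {
    const wins = pop.map(() => 0);
    // Round-robin tournament: every genome plays every other genome once.
    for (let i = 0; i < pop.length; i++) {
      for (let j = i + 1; j < pop.length; j++) {
        const result = playMatch(pop[i], pop[j]);
        if (result > 0) wins[i]++;
        else if (result < 0) wins[j]++;
      }
    }
    // Rank by wins, keep the top half, refill with mutated copies of survivors.
    const ranked = pop
      .map((genome, i) => ({ genome, score: wins[i] }))
      .sort((x, y) => y.score - x.score)
      .map((entry) => entry.genome);
    const survivors = ranked.slice(0, Math.ceil(pop.length / 2));
    const children = survivors.slice(0, pop.length - survivors.length).map(mutate);
    pop = survivors.concat(children);
  }
  return pop;
}

// Stand-in "game": the genome with the larger weight sum wins.
const sum = (genome) => genome.reduce((a, b) => a + b, 0);
const finalPop = evolve(
  [[0, 0], [1, 0], [0, 2], [1, 1]],
  (a, b) => sum(a) - sum(b),
  20
);
```

In the real demo the match would be a simulated volleyball game between two networks built from the genomes, which is where all the training time goes.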

Training and learning show examples and mark the student against a correct answer - the student network tries to minimise their error.

Genetic Algorithms are a type of Reinforcement Learning, i.e. learning by trial and error.

The agent can direct its own learning and explore.

Useful for training robots and other embodied agents such as opponents.

I am aware of all of those terms. But I'd prefer to hear details about how this particular network was evolved, as you put it.

Thanks! That's what I was looking for. Very interesting.

I feel like I'm missing something here, can anyone give a brief synopsis on the significance of this?

The AI of the computer player is an RNN, which is a type of neural network. The objective of the game, what the controls do, and strategies were never hard coded into the AI, it learned them through trial and error, the way a human would. It's a lightweight implementation too, done in JS. One neat feature is that you can see the real-time activations of the neurons which are connected to the control nodes in the background.

Is the "learned" behavior stored in the javascript somewhere, or is it coming from a socket in real time? Is it just a giant data array for handling possible decisions?

My understanding is that the weights[1] were produced from a series of trials, and then persisted in the JS.

1: http://en.wikipedia.org/wiki/Synaptic_weight

It would be neat if you could have it know nothing, and learn from you over a series of games.

Line 60 of slimevolley_pro.js contains the "learned" behaviour in 140 floats:

These are the weights on the connections between the inputs, neurons and outputs.

Everything you need to know about how to win Slime Volleyball in 140 numbers !

> Everything you need to know about how to win Slime Volleyball in 140 numbers !

I know kung fu!

haha, I like that.

Yeah, there are a total of 140 weights/biases in the neural network. This stuff is not bleeding edge - just taking techniques popular in academia and industry many years ago to make a fun game mainly for my own educational purpose. Basically recently I saw a lot of work on AI's playing ATARI games and wanted to learn reinforcement learning techniques, and the best way to learn is to build a game!

I put up a description of this sketch here:


Basically, the agent's brain consists of 7 neurons (compared to 250k for an ant), which are connected to all the inputs. The inputs are the game states, like locations of agents and ball, and velocities. The neurons are also connected to each other.

The first 3 neurons control the agent's output (its motor function), like whether it hits the up, forward, or back button - the same actions available to a human player.

The other 4 neurons are hidden neurons responsible for any deeper computation or thought, and are fed back into the input with a time lag.

I also graphed a slice of the neural network in real time on the screen just for effect. More for design effect if anything.

How many numbers do you need? It isn't just a matter of trial and error is it?

The number of weights depends on the network architecture.

In this case fully connected , 1 weight per connection .

16 inputs to 7 neurons + 7 neurons to 3 outputs + 7 biases

16x7 + 7x3 + 7 = 140
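The arithmetic is easy to check in code. Note that the "19 inputs connected to 7 outputs" description elsewhere in this thread also comes to 140 once the 7 biases are counted, so the total alone does not settle which breakdown matches the actual source. The helper name below is made up.

```javascript
// Parameters of one fully connected layer: one weight per connection, one bias per neuron.
function denseParamCount(numInputs, numNeurons) {
  return numInputs * numNeurons + numNeurons;
}

// Breakdown from the comment above: 16x7 + 7x3 + 7.
const asCommented = 16 * 7 + 7 * 3 + 7;      // 112 + 21 + 7 = 140

// Alternative breakdown: a single layer of 7 neurons fed by 19 inputs.
const singleLayer = denseParamCount(19, 7);  // 133 + 7 = 140
```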


The number of neurons is a design decision.

Hi deepnet

Do you mind helping me truncate the length of the message above with the number array? It's causing issues on browser sessions not word wrapping messages

Thanks vm

If anyone else is having this issue, I fixed it like this:

    document.querySelector('style').innerHTML += '.comment p { max-width: 70vw; overflow: scroll; }'

Very sorry I have messed things up :(

Unfortunately I have not got an edit or delete link so I am powerless to alter my mistake.

If any mods* can help delete this rogue copypasta I would be appreciative.

[relevant xkcd](https://xkcd.com/327/)

*emailed support

Sure. Fixed.

For future reference: I fixed it by adding two spaces before the very long line. That makes it be formatted as code and not wrap.

A blast from the past and I'm still losing to the computer. Nothing has changed.

very interesting! now you owe me 10min of my life. :)
