Teaching AI to race using NNs and genetic algorithms 141 points by Halienja on Aug 30, 2016 | hide | past | favorite | 42 comments

 Been staring in awe at this with my little sons (2&3). We actually get enthusiastic when the first signs of understanding arise. Where can I begin learning about this? I've done several sklearn examples, but this application to 'a new world' (like a racing game), how do you build that? With build I mean: how do you structure inputs and objects so that the goal is met? The goal for racing is both steering, braking, throttle...
 I did something very similar to this for an artificial life class back in college. My nets took significantly longer to learn how to race for some reason, but in the end it worked out. I started off as simple as possible, using a circular track so that it's easy to test bounds on the inputs (position greater than the inner radius and less than the outer). My nets would get their sensor inputs at each time step and output a steering angle and speed. The speed was basically just added directly to their current position after pointing them in the chosen direction. The fitness function amounted to seeing which racers got around the track the fastest.

You can get some pretty interesting behavior if you start with a barebones model of the world like that. At one point my racers started teleporting around parts of the track by taking advantage of the fact that I wasn't doing continuous collision detection - they would detect a turn and then output a speed and direction that would cause them to instantly appear on the other side of the corner without navigating the curve on the way. It was interesting to watch them balance the ability to teleport against the range of their sensors.

This sort of thing really lends itself to incremental tinkering, especially with JavaScript as powerful as it is today for prototyping. That gives a lot of intermediate feedback and plenty of chances to find new strange things to investigate. Adding features like tricky track generation is a decision you can make along the way if it seems interesting.
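A minimal sketch of that kind of update loop (all names are mine, not from the class project): the car's new position is validated only at the destination point, which is exactly the discrete collision check that makes the "teleporting" exploit possible.

```javascript
// Minimal circular-track step, assuming an annulus between innerR and outerR.
// Collision is checked only at the destination (no continuous detection),
// so a big enough speed output can jump a car clean across the wall.
const innerR = 50, outerR = 80;

function onTrack(p) {
  const r = Math.hypot(p.x, p.y);
  return r >= innerR && r <= outerR;
}

// One simulation step: the net outputs an absolute heading and a speed.
function step(car, heading, speed) {
  const next = {
    x: car.x + speed * Math.cos(heading),
    y: car.y + speed * Math.sin(heading),
  };
  return onTrack(next) ? next : car; // crash = stay put (one simple choice)
}
```

With this endpoint-only check, a car at (65, 0) that outputs heading π and speed 130 lands at (-65, 0) - still on the annulus - even though the straight-line path passes through the inner circle. That is the teleport the evolved racers learned to exploit.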
 It's a genetic algorithm, which means it has a fitness function for how close the goal is to being met (here: choosing the car that traveled furthest), and a 'gene', which is a sequence of instructions. The game uses a neural network to decide when it should turn, which way it should turn, how it should steer, etc., and this NN is plugged into the genetic algorithm, which effectively generates random genes, follows each of those genes, then chooses the best 'fit' one according to the fitness function. Next, the algorithm restarts, except that its randomness is now slightly based on the best-fit gene from the last epoch, and this repeats until you've got a gene that works.

Imagine having a wall, a certain portion of which is sticky (it's covered in invisible glue), but you don't know which portion, and you certainly don't want to touch it to find out. So you throw spaghetti at the wall and see which portions stick. Next, you throw more spaghetti at the part that seems to stick, and repeat until you've got the entire sticky portion mapped out.

There's an old genetic algorithms demo with smart rockets and path-finding here: http://www.blprnt.com/smartrockets/ [warning: requires Flash].

If you're actually interested in building one of these, Daniel Shiffman wrote The Nature of Code: http://natureofcode.com/, which is an introductory-level book for this kind of thing. He uses Processing in the book itself for modeling, but he also covers the theory pretty well.

If you want some formal literature, I took a brain and behavioral processing class in college, and the professor still has his slides up here: http://www.life.illinois.edu/mcb/419/sessions/index.php, along with links to papers and homework scaffolding.
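The evaluate-select-reseed loop described above can be sketched in a few lines. This is a generic hill-climbing GA under my own assumptions, not the project's actual code: a genome is a flat array of weights, and `fitness` is a caller-supplied score (in the racing game it would be distance traveled).

```javascript
// Barebones GA epoch loop: keep the best genome, reseed the population with
// noisy copies of it ("randomness slightly based on the best fit gene"),
// and repeat. Names and parameters are illustrative placeholders.
function randomGenome(n) {
  return Array.from({ length: n }, () => Math.random() * 2 - 1);
}

function evolve(fitness, { genomeSize = 8, popSize = 20, epochs = 30 } = {}) {
  let best = randomGenome(genomeSize);
  for (let e = 0; e < epochs; e++) {
    // Next generation: the reigning best (elitism) plus perturbed copies.
    const pop = [best].concat(
      Array.from({ length: popSize - 1 }, () =>
        best.map((w) => w + (Math.random() * 2 - 1) * 0.1)
      )
    );
    pop.sort((a, b) => fitness(b) - fitness(a)); // highest fitness first
    best = pop[0];
  }
  return best;
}
```

Because the current best is carried into each generation unchanged, fitness can only improve or stall - which is also why a bad random start sometimes just needs a full restart, as the author notes elsewhere in the thread.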
 From what I can see, the course you mentioned looks very cool; however, it seems a lot of the slides and linked papers are password protected. Do you have copies mirrored, and/or do you know if the professor would be OK with these being shared? (Don't worry if not - as much as I'd love to browse some of what's on there, I don't want to make extra work for you or get anyone in trouble!)
 Hmm, that's strange, I can still access the files despite being in incognito mode on Chrome. I'm fairly certain the professor doesn't mind this being spread around, but I don't know HN's policies on sharing stuff like this, so shoot me an email (it's in my profile) and we can discuss it.
 Source code: https://github.com/TomaszRewak/ML-games
 Thanks for sharing. You can find all of the code there. Everything is written (by me) from scratch, beginning with the game engine and ending with the ML algorithms. It's taken to the level where I even multiply the matrices for graphics transformations myself, operating on pure HTML5 canvas. So everyone should find something interesting in there. The only ready-made part is the editor - the Ace plugin from C9.
 Hi, why did you have to make your own graphics engine? Does canvas not supply everything you need?
 It's more like a game engine than a graphics engine. Canvas enables me to draw basic shapes with a specific transformation. I had to implement something that would give me a convenient way of managing them (like grouping them or animating them - canvas only provides a way of drawing and forgetting them). With my code I can add an object to a scene, add children to it, transform it on the fly, and move the camera. With pure canvas that would also be possible, but not as simply.
 got it. are you aware of/did you ever consider using the game engine Phaser? Probably wouldn't have been as fun as making your own game engine, but it does everything you mentioned.
 Yes, I did, but as you mentioned, where's the fun in that? :D When I create projects of my own I like to face some challenges that help me improve my skills. On the other hand there is work, where I use libraries and other ready-made stuff, as the software usually has to be deployed as fast as possible and with the highest quality. But with a project like this I can take my time and just have fun :) And by fun I mean coming up with an architecture, code optimization, implementing data structures (like the quad tree in this example) and many other things.
 This is really amazing how fast they learn! It seems to make a lot more progress than, for example, that genetic car evolver that at generation 200 is still trying to decide whether to put the wheels on the bottom.
 Actually, when I was designing this, I thought it would take at least 100 generations to learn some simple moves. But NNs are powerful indeed.
 I observed something interesting: when I changed the size of the hidden layer from 100 nodes to 1, almost nothing changed. It took slightly more generations to learn how to race, but not that many more (~80 generations vs ~90 generations). In addition, I thought I observed a small difference in behavior, but nothing I could explicitly measure.
 It might be that in the process of learning the algorithm decided it could pass a constant value to the speed output and use that single node just to steer. But I'm testing this scenario right now, and all I can get are cars going in different curves. Are you sure you restarted the process after making changes to the script?
 After 100 generations they actually have learned to move correctly - that's impressive - but they move very slowly. With 100 nodes in a single hidden layer it takes them only around 10-20 generations to learn for me (not 80), and they race really fast.
 That actually sounds right. For me, they took 80 or so to race what I subjectively thought of as "well", but they only took 10 or 20 - as you said - to roughly figure out what they were doing.And I figured out what the behavior difference was: before they finally "figure it out", they spend most of their time spinning in place or moving in "epicycles", for lack of a better term.Thanks for the project; it's been great fun!
 Generation 50, and none of the cars has learned to pass more than 5-6 corners. Also, I don't think changing the track from one iteration to another helps. The training should be done on the same track until at least one of the cars is able to complete it. Then it can be changed. Sometimes some of the cars get stuck and the round doesn't finish unless you click the "End round" button. There should be some mechanism that detects that the cars are stuck and ends the round automatically.
 Ok, generation 60 and now a huge progress has been made. More than 50% of the cars are now able to finish almost all of the tracks. Very cool!
 Thanks :) I actually had inactivity detection implemented at some point, but I dropped it in favor of a user-defined stop condition. You are right though, I will bring it back. It's actually expected that in some cases it might take many generations to learn, as the population is rather small. In other attempts they can do this in like 5 generations, so please try restarting the whole process. Keeping one track until at least one car is able to finish it is not as good an idea as it seems, due to overfitting. The cars will simply memorize all the moves based on some insignificant features of that track and they won't be able to finish any other track. I have tested that :D
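One simple way to implement the inactivity detection being discussed (a sketch under my own naming, not the project's API): end the round when the best distance achieved by any car has not improved for a fixed number of steps.

```javascript
// Sketch of a stuck-car detector: end the round automatically when no car
// has improved the best distance for `patience` consecutive steps.
function makeInactivityDetector(patience) {
  let bestSoFar = -Infinity;
  let idleSteps = 0;
  // Call once per simulation step with every car's current distance.
  return function roundShouldEnd(distances) {
    const best = Math.max(...distances);
    if (best > bestSoFar) {
      bestSoFar = best;
      idleSteps = 0; // progress was made, reset the idle counter
    } else {
      idleSteps++;
    }
    return idleSteps >= patience;
  };
}
```

Tracking only the best distance (rather than every car's) means the round keeps running as long as at least one car is still making progress, which matches the "end the round when everyone is stuck" request above.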
 However, maybe you will consider the request described here: https://news.ycombinator.com/item?id=12397375 For research purposes :)
 Yeah, sure, why not :) I will have some spare time at the weekend, so I will try to implement everything that has been proposed in the feedback I've got.
 Changing the mutation rate to 0.1 and the crossover rate to 0.3 seems to improve the learning. A 0.5 mutation rate is way too much.
 Yeah, that might be true, and that's what this editor is for :D I didn't want to give users the best solution at the beginning, so they can search for it themselves. It's actually only the third combination of parameters that I've tested. On the other hand, 0.5 is not so large when you look at the mutation procedure. If a specimen is going to be mutated, only 20% of its weights are changed, and only within the range [-0.1, 0.1]. But feel free to experiment :)
 Took me to gen 11, and I only lost 1 car; after that they all completed.
 I disagree about changing the track - it should be changed every iteration. If the track is not changed every iteration, the NN might evolve to work only in specific scenarios (only turning left, for example) and, when run on a different track, might fail miserably.
 On a track with both right and left turns that wouldn't have been the case. While I agree that exposing the NN to diverse tracks each iteration ensures more thorough learning, I was kind of curious how susceptible the algorithm is to exploiting irrelevant particularities of a track instead of finding the general rule. I wish the creator would add the possibility to choose whether the track should be changed each iteration or only after at least one car completes it.
 I clicked End Race in the first few generations when all the cars got stuck in the middle of the road. I think that by the fifth one at least one car was able to complete a lap.
 I was at Generation 47 and all cars (almost, depending on the new genetic modifications) were able to finish the lap.
 "It is not the strongest that survive, but those that adapt to change " -- Charles Darwin
 Another awesome one for the collection! It would be interesting to compare this and the RC car: https://www.youtube.com/watch?v=1AR2-OHCxsQ (via r/WatchMachinesLearn)
 I've found some tricks to get it to learn really well (by gen 20 they all finish really fast):

1. Use a ReLU: value => value >= 0 ? value : 0.0;

2. Bump it up to 30 cars (if you use too many, it slows down and you don't get as good results, because each iteration takes ages).

3. Change the lap condition to c.lap < 5 and the time to 30 s. This lets each race go longer and widens the discrepancy between the best and worst racers, which improves selection.
 Setting the hidden layer activation function to `value => value >= 0 ? value : 0.5 * value;` seems to improve convergence a lot.
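The two activations mentioned in this subthread, side by side. A plain ReLU zeroes negative inputs outright, so weight changes behind a dead neuron become invisible to selection; the leaky variant keeps negative pre-activations informative (a 0.5 slope is much larger than the 0.01 typical for leaky ReLU in gradient training, but these nets are evolved, not backpropagated, so the usual conventions don't necessarily apply - this is a plausible explanation, not one confirmed by the thread).

```javascript
// Plain ReLU: negative inputs are clamped to zero.
const relu = (value) => (value >= 0 ? value : 0.0);

// "Leaky" variant from the comment above: negative inputs keep half
// their magnitude instead of being silenced entirely.
const leakyRelu = (value) => (value >= 0 ? value : 0.5 * value);
```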
 Neat! It should be noted that the genetic algorithm used is not the same as reinforcement learning, in which the entire history of decisions leading up to a success is fed back to the NN as training examples.
 I might be wrong, but isn't it? I think that in fact it is, and what you are talking about is reinforcement learning with tabu search, or something. EDIT: Sorry, I understood you wrong - what you are describing is, of course, not tabu search - but I still believe that this GA is a form of reinforcement learning. EDIT2: After some research I can say that opinions are divided :D
 So is the genetic algorithm used to mutate the neural network weights?
 Yes, that's correct.
 Nice, I love the way the user can change the NN and genetic algorithm parameters!There isn't much documentation though - is the genetic algorithm being used to evolve the NN somehow?
 Since all the NNs have the same architecture, the breeding is done by taking some parameters from one parent NN and some from the other to populate the weights of the child NN (in this case, indices that are 1 mod 2 are taken from the first parent, the others from the second one). In addition there are some random tweaks of the parameters (a 20% chance that any given parameter is changed, with the change drawn uniformly at random from the range [-0.1, 0.1]).
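The breeding scheme described above is small enough to sketch directly. This is my own reconstruction from the description, not the repository's code; parameter names are mine:

```javascript
// Breed a child weight vector: odd indices (1 mod 2) come from parentA,
// even indices from parentB; then each weight has a 20% chance of a
// uniform random tweak in [-0.1, 0.1].
function breed(parentA, parentB, { mutationChance = 0.2, range = 0.1 } = {}) {
  return parentA.map((wA, i) => {
    let w = i % 2 === 1 ? wA : parentB[i];
    if (Math.random() < mutationChance) {
      w += (Math.random() * 2 - 1) * range; // uniform in (-range, range)
    }
    return w;
  });
}
```

Because both parents share one architecture, crossover at matching indices always combines weights that play the same structural role - which is what makes this naive index-interleaving scheme viable at all.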
 Thanks :) Yup, I (as the developer of this thing) haven't created documentation yet, but that's mostly because I deployed this just yesterday and there are still many things I'd like to change. But I will work on it :)
 Edit: The other commenter actually bothered to look at the code. Scratch my comment.
