Amazing that this can be done in 300 lines of perfectly readable JS without any libraries. And the author apparently wrote it in 2 days. Great intro to genetic algorithms and reinforcement learning. Universities should teach like that.
Somebody asked about the big picture of how the thing works:
You use a neural network to decide at each step whether to flap the bird's wings or not. That is the only output. The inputs are only the bird's height (y-position) and the height of the next hole. (https://en.wikipedia.org/wiki/Feedforward_neural_network)
The question now is how to find the correct weights for the net.
That net is not trained in a traditional supervised way like with gradient descent (which would be more complicated). Instead it uses a genetic algorithm to find new weights for the neural networks of future generations. That is: keep some of the best individuals, add some randomly generated ones, and add some created by breeding existing individuals. (https://en.wikipedia.org/wiki/Genetic_algorithm)
And that's basically it in this case.
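For anyone who wants the mechanics spelled out, here's a minimal sketch of that loop in plain JS, assuming a single-neuron genome and fitness = survival time. All names here are hypothetical — the demo's actual game.js uses its own Neuroevolution class with a hidden layer:

```javascript
// Each genome is just the weight vector of a tiny net:
// 2 inputs (bird height, hole height) + bias -> 1 sigmoid output ("flap if > 0.5").
const sigmoid = x => 1 / (1 + Math.exp(-x));

function decide(weights, birdY, holeY) {
  const sum = weights[0] * birdY + weights[1] * holeY + weights[2];
  return sigmoid(sum) > 0.5; // true = flap
}

const randomGenome = () => Array.from({length: 3}, () => Math.random() * 2 - 1);

function mutate(genome, rate = 0.1) {
  return genome.map(w => Math.random() < rate ? w + (Math.random() - 0.5) : w);
}

function crossover(a, b) {
  return a.map((w, i) => Math.random() < 0.5 ? w : b[i]);
}

// One generation: score every genome with a fitness function (e.g. how long
// the bird survived), keep the best 20%, refill mostly with mutated offspring
// of the elite, and top up with fresh random genomes.
function nextGeneration(population, fitness) {
  const ranked = [...population].sort((x, y) => fitness(y) - fitness(x));
  const elite = ranked.slice(0, Math.floor(population.length * 0.2));
  const next = [...elite];
  while (next.length < population.length * 0.8) {
    const a = elite[Math.floor(Math.random() * elite.length)];
    const b = elite[Math.floor(Math.random() * elite.length)];
    next.push(mutate(crossover(a, b)));
  }
  while (next.length < population.length) next.push(randomGenome());
  return next;
}
```

Run the game once per genome to get its fitness, call `nextGeneration`, repeat — that's the whole training loop.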
This is going to turn out to be a rant, but sometimes it feels like there's a culture of solving problems by a philosophy of "let's throw more money at it, more technology, more people, more unnecessary abstraction layers, complexity and bloat - it's bound to be solved eventually". And with enough firepower, it usually is - but at what cost? I'm not sure what the source of the problem is, though, and it's not only with code. Many times I struggle to understand certain concepts via professional literature, and when I finally understand them, I'm puzzled as to why they are taught in a way that, if I didn't know better, I'd assume was a deliberate attempt to confuse the reader instead of making the learning simple, intuitive and fun.
Exactly the same applies to code I've had to refactor and sometimes rewrite over the years. Obviously, some of it has to do with trying to get a product out on crazy schedules, which requires making compromises, but I can tell the difference between compromise and an "I don't really care" attitude, and I'm talking about the latter. It's a pity though - not only for the one who has to sort out and sometimes clean up the mess, but also for the one who creates it. Things are so much more fun when you care and are passionate about what you do that I'm sorry for those who are missing out on it.
End of rant. Not sure it belongs here but had to get it out of my system :-)
It is seductive to want things to be elegant. It is costly to wait for the elegant solution.
As for explaining being a tough skill - what I find in common with the previous point is the lack of stress on keeping things as simple and obvious as possible. If you truly understand something, you should be able to convey that understanding assuming you really want to - it just takes investment, usually by asking yourself why it is obvious to you, then putting yourself in the other party's shoes and then finding the shortest and clearest path to bring them from where they are to where you are. In my experience it's not a tough skill - just requires introspection into your own understanding and not assuming anything about what the other person understands.
The basis is not that brute force is better. The essence is most of us are not seeking an elegant solution to a problem. We are seeking a solution to a problem. Often, just getting that answer is all that matters. Finding a more concise way to get it is something I fully agree that someone should be trying to do. And, in the long term, it is a huge boon if it is found. For most tasks, though, the original solution is all that was needed.
I ran the sim a few times and got wildly different results.
- First time (about average) it stabilised (score >10,000) at Generation 18.
- The quickest stabilisation was at Generation 3.
- Sometimes it got to Generation 50 without stabilising.
Brute force will eventually get there, but I guess with so few parameters it is easy to create a misleading weighting and difficult to un-learn it.
I would have expected at least the horizontal distance to the next pipe as an input as well...
This isn't my field, but given the simplicity of the inputs and network, and the way commenters are seeing the demo achieve perfect play after anything between 2 and 200 generations, it makes me wonder if this isn't more of a brute-force search than actual learning?
That is, it smells like there's a "correct" set of neuron values - where any genome within some tolerance of those values wins forever, and any other genome dies quickly. If that's the case, the system can't really evolve towards a solution, can it? It would just cycle randomly through lots of genomes that die immediately, until by pure chance one lives forever. I only tried the demo a few times but that's what it looked like it was doing.
edit: With sigmoid or hard-threshold activation, this function is really simple to implement. If we want to flap iff the bird is lower than the pipe, we can do that with no hidden layers and a weight vector of [1, -1]. I'd be curious to see someone fork and implement this.
Yes, slightly disappointingly the neural network can be replaced by the line:
if((this.birds[i].y - 70) / this.height > nextHoll) this.birds[i].flap();
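To make the equivalence concrete: with a sigmoid, a single neuron with weight vector [1, -1] and no hidden layer computes exactly that rule, since sigmoid(z) > 0.5 iff z > 0. A sketch (not the demo's actual API; canvas y grows downward, so a larger y means a lower bird):

```javascript
const sigmoid = x => 1 / (1 + Math.exp(-x));

// One neuron, weights [1, -1], no hidden layer and no bias term:
// output > 0.5 exactly when birdY - holeY > 0, i.e. the bird is below the hole.
function shouldFlap(birdY, holeY) {
  return sigmoid(1 * birdY + (-1) * holeY) > 0.5;
}
```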
What does make the distance irrelevant is that the holes are high enough to safely flap whilst inside them, so you never have to make a timed leap through - a luxury not afforded to users of the original game, if I recall correctly.
If the aperture is smaller than the jump height, then you need to do something smart to time your jumps.
It took a few dozen generations to find versions that would make it through a few columns if those columns had gaps without too much vertical distance between them. Somewhere in generation 60-70, I could see versions figuring out how to transition between heights, but failing by overshooting, or not accounting for gravity on the far side of the gap. But once it figured out how to transition between distant heights, it had something that kept working.
GIF of the successful bird: https://imgur.com/a/A0li8
edit: Gen 22 still going strong @ 1 bird, up to 231,000 points. Looking at game.js, it's a fairly simple model; I suspect once you get a bird that makes it past the 1-minute mark it will be able to continue forever.
EDIT - to be more clear: the first generation died in a couple of seconds. A bird from the second generation was immediately able to navigate forever. Either this was my chance to win the national lottery (what a way to waste it) or something's wrong.
It's always different, and you can get to 1000 points if you are lucky with the map. But the one that got to 1000 might fail on a gap that didn't show up in the last map.
I am running a bird from generation 3 right now with a 26k+ score.
How is it possible to get such good specimens after so few generations? I'm guessing part of it is the simplicity of the model and the environment (the game), but still.
For every incredibly complicated algorithm designed to solve some really huge problem and deployed at massive scales, there are another thousand little problems at a far lower scale. I did a project at a school once using a GA to automatically create class rosters based on gender balance, student social networks, and grade distribution. This used to be a two-week long guidance counselor project at the school. Now they press a button.
Everyone in this forum clearly understands the impact that software is having on the world. But nevertheless, I still think we underestimate the impact that a more computationally literate society with more readily available computing power can have. I think examples like this excellently written piece of code highlight that.
Here is one -- admittedly very bright -- person who can, from the comfort of his browser, come up with such a fascinating and ingenious solution to this computational problem. Yes, it is being used for Flappy Bird, but this could easily extend to a whole set of domains and problems that he/his company/his community could be facing. Yes, it is Flappy Bird, but this alone could serve as the motivation for an incredible high-school course dealing with topics from coding to the theory of evolution. "The Evolution of Cooperation" had a similar effect on me -- Axelrod uses nothing but very elementary algebra to make such a strong and insightful argument that it got me thinking like nothing ever had before.
Here's hoping we make this kind of "play", and this sort of thinking, as widespread as possible in society.
It seems there was only machine learning involved the second time around.
While visiting a lab in Paris, I saw some researchers using neuroevolution to teach a bird-like robot how to fly.
some links related to neuroevolution:
In some cases you might lack an easy way to calculate a fitness score from the NN's performance, which is needed to run the GA.
I tried training a simple NN with a stupid hill climber some time ago but quickly hit a roadblock even with very few neurons, because of local minima... or maybe bugs.
I guess for more complicated problems the pure GA training method might just not be "cost effective" (i.e. the time/quality tradeoff).
Different applications, though. Backpropagation in the normal sense needs inputs and expected outputs (e.g. lots of training data), while a GA/EA learns to solve the problem without explicit wrong/correct actions, just from the score at the end.
Are there other good ways of training a neural network in this kind of scenario?
I've added "vertical velocity of next pipe" and "distance until next pipe" as inputs as well as upped the hidden layer neurons to 4 - anyone have any other ideas to improve its performance?
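In case it helps anyone tinkering along the same lines, here's a rough sketch of what the forward pass looks like with those four inputs and four hidden neurons. The weight shapes and names are hypothetical — the demo actually wires this up through its own Neuroevolution config, and I've omitted bias terms for brevity:

```javascript
const sigmoid = x => 1 / (1 + Math.exp(-x));

// 4 inputs -> 4 hidden neurons -> 1 output, no biases.
// inputs: [birdY, holeY, pipeVerticalVelocity, distanceToNextPipe]
function forward(inputs, hiddenWeights, outputWeights) {
  // hiddenWeights: 4 rows of 4 weights; outputWeights: 4 weights.
  const hidden = hiddenWeights.map(row =>
    sigmoid(row.reduce((sum, w, i) => sum + w * inputs[i], 0)));
  const out = sigmoid(outputWeights.reduce((sum, w, i) => sum + w * hidden[i], 0));
  return out > 0.5; // flap?
}
```

The GA then evolves the 4×4 + 4 = 20 weights instead of the original handful, so you'd expect it to need more generations to converge.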
For programmers there are even more striking examples of AI out there, complete with code, documentation and explanations.
Check this out:
This sparked in me a months-long fascination with genetic algorithms solving programming puzzles, given a few sets of 'initial memory states' and 'desired memory states'.
The articles start with very simplistic generated programs, and end with a sophisticated model able to generate guess-the-number games using loops, conditionals and handling input from users.
Edit: Seeing the new comments here, it seems like it only starts learning after you refresh once or twice. Very cool!
It's amazing this fits in so few LoC, but I don't see the total picture... Any books you can recommend?
You can see some of the older NeuroEvolution papers here:
I believe Ken Stanley has gone to become a professor at UCF to continue his research:
Otoro explains neuroevolution and demonstrates a Slime Volleyball opponent evolving
(Naturally I came to the comments because I thought this was BS)
EDIT: actually generation 33 is still happening, even on "5x" mode, 260k points and counting. Looks like it totally got how the game works.
these agents live in a world where they're trying to find and eat food
Will update when/if it dies.
EDIT: How lucky did I just get? Still going at 100k+
It didn't work on FF; maybe it has to do with my plugins/privacy settings.
Edit: generation 25 got 10k points and was continuing onward when I stopped. Beautiful.
But here it is: "should it flap the wings or not"
What's the difference? Is this a better way?
Robot wars, all that!!!
Generation 11, score 1M+. http://imgur.com/GjpAFAz
one lone hero