
Hacking Flappy Bird with Machine Learning - sarvagyavaish
http://sarvagyavaish.github.io/FlappyBirdRL/
======
habosa
I really want to build a Lego Mindstorms robot (or similar) that watches my
phone with a webcam, holds a stylus, and plays Flappy Bird. I have played
enough (high score 194) that I am pretty sure I know the right strategy, I
just want to take human error out of the equation.

Are there open source vision libraries that are low-latency enough to play
this in real time? Assuming the robot could keep up with some regular servos,
it would be a very interesting project.

Edit: I'm thinking I could run an Android emulator and then use MonkeyRunner,
with Python and OpenCV processing my monitor, to do the work. If anyone has
relevant experience, please reply; I'd love to hear whether this is feasible.
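
Rough sketch of the OpenCV half (completely untested; `bird.png` and the
capture region are stand-ins):

    import cv2
    import numpy as np
    from mss import mss                      # fast cross-platform screen capture

    BIRD = cv2.imread("bird.png", cv2.IMREAD_GRAYSCALE)             # cropped bird sprite
    REGION = {"top": 100, "left": 100, "width": 400, "height": 700}  # emulator window

    def find_bird(gray):
        # Template matching on a small region should be well under one frame at 60fps.
        result = cv2.matchTemplate(gray, BIRD, cv2.TM_CCOEFF_NORMED)
        _, max_val, _, max_loc = cv2.minMaxLoc(result)
        return max_loc if max_val > 0.7 else None    # (x, y) of best match, or None

    with mss() as screen:
        while True:
            frame = np.array(screen.grab(REGION))
            gray = cv2.cvtColor(frame, cv2.COLOR_BGRA2GRAY)
            bird = find_bird(gray)
            # ...same idea for the pipes, then decide whether to tap and send
            # the tap to the emulator (MonkeyRunner / adb shell input tap).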

~~~
logicallee
holy shit you sound like a crack addict

(only with flappy bird) - they say it's addictive but you're taking this to
another level. "Man I need to do a Lego Mindstorms robot with a webcam and
stylus and OpenCV - I've gotten to 194 but I _NEED_ more. I need to take the
human element out of this equation...."

~~~
leoc
A worrying amount of serious CS effort was put into the problem of Sudoku
solving after it became a big hit.

~~~
aaronsnoswell
Logical Conclusion: All computer scientists are crack addicts.

------
dwetterau
Neat AI approach to playing it. When this first blew up five or so days ago, I
threw together a simple JavaScript bookmarklet that plays the game at:
[http://www.mrspeaker.net/dev/game/flappy/](http://www.mrspeaker.net/dev/game/flappy/)

It's really poorly formatted and has unused code in it but it was fun to
automate and watch.
[http://pastebin.com/yTmdWgfC](http://pastebin.com/yTmdWgfC)

To use it, just install the bookmarklet, click on it, and then start the game.
Press space to make the bird die.

------
brandonb
Very cool! It's remarkable that you can get a good score with effectively only
two features: vertical distance from the lower pipe and horizontal distance
from the next pair of pipes.

I suspect you could cut the training time by an order of magnitude with a
different optimization algorithm, or just by varying alpha. For example, have
you considered trying something like a line search?
[http://en.wikipedia.org/wiki/Line_search](http://en.wikipedia.org/wiki/Line_search)

~~~
tlarkworthy
Reinforcement learning is not an optimization algorithm, and his training times
are consistent with an off-the-shelf Q-learning approach.

Reinforcement learning tries to find the optimal policy of action from a given
position in the discrete state space.

The policy is roughly a map from state -> action, but it captures the temporal
nature of the world.

It starts knowing nothing, then discovers that hitting a pipe is really bad,
and therefore that states close to hitting a pipe are almost as bad. Over tons
of iterations, the negative expected value of hitting a pipe bleeds across the
expected values of the state space.

With regression and optimization, it's never clear what you are optimizing
against. Obviously hitting a pipe is bad, but what about the states near a
pipe? Are they good or bad? There is no natural metric in the problem that
tells you the distance from the pipe or what the correct action should be.

So that's the major difference between reinforcement and supervised learning.
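
For the unfamiliar, here's a minimal sketch of the tabular Q-learning update
that does that bleeding (the state encoding, rewards, and alpha/gamma values
are stand-ins, not necessarily what the article uses):

    # Tabular Q-learning: the value of crashing leaks one step backward per
    # visit, so after many episodes the states near a pipe look bad too.
    import random
    from collections import defaultdict

    ALPHA, GAMMA = 0.7, 0.9
    ACTIONS = ("flap", "do_nothing")
    Q = defaultdict(float)                  # Q[(state, action)] -> expected return

    def update(state, action, reward, next_state):
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

    def choose_action(state, epsilon=0.1):
        if random.random() < epsilon:       # occasional exploration
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q[(state, a)])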

~~~
avaku
You could set up a cost function that accounts for closeness to being hit
(i.e., a decision-theory approach).

------
potash
How hard would it be to solve this deterministically?

Can someone comment on the game's physics? I am assuming:

    
    
      - constant horizontal velocity
      - gravity
      - flapping implemented via impulse
      - collisions are handled with bounding boxes
    

Maybe someone who knows something about optimal control (ODEs) can say whether
this is analytically solvable? Of course there's still the practical stuff
(numerical integration, I/O lag) to deal with, but I'm optimistic.
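
For concreteness, a toy simulation under exactly those assumptions (every
constant here is made up); the horizontal motion is trivial, so the interesting
part is the vertical ODE punctuated by flap impulses:

    # Toy physics under the assumptions above; all constants are made up.
    # y grows downward (screen coordinates), so gravity is positive and a
    # flap impulse sets a negative (upward) vertical velocity.
    DT = 1.0 / 60.0        # s, one frame
    G = 900.0              # px/s^2
    VX = 120.0             # px/s, constant horizontal velocity
    FLAP_VY = -260.0       # px/s, velocity right after a flap

    def step(x, y, vy, flap):
        vy = FLAP_VY if flap else vy + G * DT
        return x + VX * DT, y + vy * DT, vy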

~~~
kenferry
If you play the game, it's actually quite trivial to program
deterministically. :-) Flapping is not implemented as an impulse. It's more
like jumping in Mario - it instantaneously fires a canned "flap" movement that
is always the same.

Set flap_height = bottom of next pipe + constant.

If bird height < flap_height, flap.

Done!

~~~
userbinator
It also looks like you'd want to avoid flapping into the top of the pipe.

~~~
kenferry
Perhaps I should have said "threshold_height".

    
    
        threshold_height = bottom_of_pipe_height + 10 pixels; // or something
    
        if (current_height < threshold_height)
          flap
        else
          don't flap
    

If you never flap when above the threshold, you will not hit the top.

------
Blahah
By far the most interesting 'flappy bird' post so far.

------
primaryobjects
Neat. You could also probably do this with a genetic algorithm. Thinking out
loud here:

- Have a neural network with 6 inputs for:

    
    
       - player x
       - player y
       - acceleration speed
       - acceleration direction
       - next pipe mid-point x
       - next pipe mid-point y

- Two outputs of 0 or 1 for click or no-click

The fitness score would be how close the player stays to the pipe mid-point.
Hopefully this would cause the bird to stay as close as possible to the gap and
fly between the pipes. The genetic algorithm would select the best neural
network, one that knows when to flap based on the current input state.
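
A bare-bones sketch of that idea (collapsing the two outputs into a single
click/no-click threshold for brevity; `play_game` is a stand-in for hooking the
network up to the game and returning the fitness):

    import numpy as np

    N_IN, N_HIDDEN, POP, GENERATIONS = 6, 8, 50, 200

    def decide(weights, inputs):
        # inputs: the six values listed above, as a length-6 array
        w1 = weights[: N_IN * N_HIDDEN].reshape(N_IN, N_HIDDEN)
        w2 = weights[N_IN * N_HIDDEN :]             # hidden -> output
        return (np.tanh(inputs @ w1) @ w2) > 0      # True -> click

    def fitness(weights):
        # play_game is a stand-in: run one game with this controller and
        # score it by average closeness to the pipe mid-point.
        return play_game(lambda inputs: decide(weights, inputs))

    population = [np.random.randn(N_IN * N_HIDDEN + N_HIDDEN) for _ in range(POP)]
    for _ in range(GENERATIONS):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: POP // 5]                          # keep the fittest 20%
        children = [parents[np.random.randint(len(parents))]
                    + 0.1 * np.random.randn(parents[0].size)  # Gaussian mutation
                    for _ in range(POP - len(parents))]
        population = parents + children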

~~~
radarsat1
I'm curious: I've thought about using genetic algorithms for training neural
nets before, but haven't seen it mentioned much. I imagine PSO could work as
well. What is the general consensus on using algorithms other than
backpropagation for neural net learning? When is it appropriate / not appropriate?

~~~
primaryobjects
Some people frown upon training neural networks with genetic algorithms, just
because genetic algorithms are so random and hard to dissect. I think it's
silly to discard anything that is a potential solution to a problem.

I go by a rule of thumb like this:

- If you are able to collect lots of training data (inputs and valid outputs),
then use back-propagation. It's faster and you might get better results.

- If you don't know the correct outputs for your inputs, or if there are simply
too many possible combinations (such as in the case of a game), then use a
genetic algorithm. It's effectively a search engine that finds the best
solution within the problem space (the solution being the optimal weights for
the neural network).

Using Neural Networks and Genetic Algorithms
[http://primaryobjects.com/CMS/Article105.aspx](http://primaryobjects.com/CMS/Article105.aspx)

------
fogleman
I'd like to see a two-dimensional chart that plots Click vs Do Nothing over the
two input parameters (vertical distance from the lower pipe and horizontal
distance from the next pair of pipes).
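
Something like the sketch below would do it, assuming the learned values are
available as a table keyed by those two (discretized) distances; the table
shape here is a guess, not the article's actual data structure:

    # Scatter the learned action over the 2-D state space.
    # Q is assumed to look like {(dx, dy): {"click": q1, "do_nothing": q2}}.
    import matplotlib.pyplot as plt

    xs, ys, colors = [], [], []
    for (dx, dy), values in Q.items():
        xs.append(dx)
        ys.append(dy)
        colors.append("red" if values["click"] > values["do_nothing"] else "blue")

    plt.scatter(xs, ys, c=colors, s=8)
    plt.xlabel("horizontal distance to next pipes")
    plt.ylabel("vertical distance from lower pipe")
    plt.title("red = click, blue = do nothing")
    plt.show()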

------
pilooch
Reinforcement learning (RL) is the future of real-world AI. Mark my words ;)
RL has been around for a long time; it is a mix of optimal (in the optimization
sense) decision making and machine learning (ML), and it benefits from the most
recent advances in ML. As such, it is likely to power the next batch of
'intelligent' applications and services.

------
fogleman
The GIF shown on this page makes it look like the input space could simply be
pipe height and bird height.

[https://github.com/zachhuff386/flapbot](https://github.com/zachhuff386/flapbot)

------
mrtksn
I was waiting for something like this, but I expected somebody to build a
Flappy Bird-playing robot :)

Do you know of some good literature for machine learning 101? Where should I start?

~~~
MrBuddyCasino
So was I. I remember a post about deep learning on HN where they managed to
have the algorithm play a NES game purely by analyzing the video. So who is up
for the challenge?

~~~
Houshalter
I would try, but how can you get input and output from the Flappy Bird game?

------
ronaldx
The video seems jerky in a way that suggests the engine is taking longer to
make decisions at certain points in the cycle.

Is that plausible? Or just my imagination?

------
bjd
How funny is it that I found this post by searching for monkeyrunner and
opencv, while looking into this for the same silly reason?

------
shurcooL
This reminds me (vaguely) of the AI I made for a worm-in-tunnel game long ago.

[https://dl.dropboxusercontent.com/u/8554242/available-for-2-...](https://dl.dropboxusercontent.com/u/8554242/available-for-2-weeks/wormFX_2003.swf)

(It's there in the background of the main menu.)

------
bamos
Cool post! Minor typo: Monekyrunner -> monkeyrunner

------
judk
If you learn that all actions `a` from state `s_i` have very low reward, does
that propagate backward to the `(s_j, a)` pairs that feed into `s_i`?

------
FlyingLawnmower
Beating the unbeatable with science. I like it.

------
nathan_f77
Had the same idea when I first played the game, but didn't do anything about
it. Well done for doing it!

------
vinchuco
Good job! Now you have to do it with [http://corpsmoderne.itch.io/flappy-space-program](http://corpsmoderne.itch.io/flappy-space-program)
and deal with the complications.

------
kookiekrak
Can you build one for this? [http://kookiekrak.itch.io/flappy-pipes](http://kookiekrak.itch.io/flappy-pipes)

~~~
isaacb
</shameless-plug>

------
geek90
Awesome work!

------
kevonc
impressive work man!

------
dpanah
Very sweet.

