
FlappyBird hack using Deep Q-Learning - DaGardner
https://github.com/yenchenlin1994/DeepLearningFlappyBird
======
zodPod
Finally we're trying to teach algorithms to feel anger! I love it!

EDIT: Just to be clear, this is a joke based on the game being flappy bird.

Seriously, though, this is awesome. I love this kind of stuff!

------
SoonDead
First AlphaGo and now this. Our world has truly come to an end!

------
ThisIBereave
Interesting implementation detail

    while "pigs" != "fly":

~~~
arfar
My favourite when writing C is:

    #define ever (;;)
    ...
    for ever {
        ...
    }

~~~
yenchenlin1994
Hello, I'm the author. This is nice! Please send a PR if there's a Python
version.

------
kotach
An LSTM would converge even faster.

A K-level breadth-first search mimicking the optimal policy, plus a simple
learning-to-search algorithm with a cost-sensitive binary linear classifier,
would work well too.

After training, deciding what to do next would be a constant-time evaluation.
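The K-level search could be sketched like this (a toy illustration, not
from the repo: the physics constants, state variables, and function names
are all made up). It enumerates every flap/no-flap sequence K steps ahead
in a simplified physics model and returns the first action of the
best-scoring rollout:

    from itertools import product

    # Made-up toy physics: y grows downward, flapping gives upward velocity.
    GRAVITY, FLAP_VELOCITY = 1.0, -6.0

    def step(y, vy, action):
        """Advance the toy bird state one tick (action: 1 = flap, 0 = glide)."""
        vy = FLAP_VELOCITY if action else vy + GRAVITY
        return y + vy, vy

    def expert_action(y, vy, gap_center, k=5):
        """Depth-k exhaustive search: return the first action of the
        flap/no-flap plan that leaves the bird closest to the pipe gap
        after k ticks."""
        best_action, best_cost = 0, float("inf")
        for plan in product((0, 1), repeat=k):
            sy, svy = y, vy
            for a in plan:
                sy, svy = step(sy, svy, a)
            cost = abs(sy - gap_center)  # distance from the gap center
            if cost < best_cost:
                best_cost, best_action = cost, plan[0]
        return best_action

The (state, action) pairs such a search produces are the kind of expert
labels a classifier could mimic, so at test time the search is replaced by
a single constant-time model evaluation.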

~~~
yenchenlin1994
Hello, I'm the author. Cool idea! Could you point me to some papers so I can
read up on this?

~~~
kotach
Check out DAgger [2], SEARN [3], and LOLS [1] (LOLS is available in Vowpal
Wabbit's search capabilities). There's a lot of interesting stuff on mimicking
optimal policies, local optimality, joint learning, and similar topics :D

The whole point of playing is making your decisions jointly, dependent on the
previous decisions. If you train your model that way, it'll make its decisions
trying to minimize future regret.

Local optimality is a very nice property. It means that if you play out a
game, not a single change of any of the previous moves could lead you to a
better result. Of course, local optimality is hard, but for some problems it's
pretty easy to achieve if your optimal policy is good and your features are
adequate (which they will be if you use neural networks).

Of course, flappy bird is a pretty local game, and all of this might be
overkill :D

AlphaGo wasn't trained jointly over Go games, so it's lacking in that regard.
But the power of neural networks compensates. Who can imagine what AlphaGo
would be like if they trained their policy networks jointly? :D

A nice introduction to LSTMs:
[http://colah.github.io/posts/2015-08-Understanding-
LSTMs/](http://colah.github.io/posts/2015-08-Understanding-LSTMs/)

[1]:
[http://arxiv.org/pdf/1502.02206.pdf](http://arxiv.org/pdf/1502.02206.pdf)

[2]: [http://arxiv.org/pdf/1011.0686.pdf](http://arxiv.org/pdf/1011.0686.pdf)

[3]: [http://searn.hal3.name](http://searn.hal3.name)

~~~
yenchenlin1994
Thanks a lot. I'll read these exciting materials.

------
filthydumbidiot
Interesting. Here's a similar project from a couple years ago:

[http://sarvagyavaish.github.io/FlappyBirdRL/](http://sarvagyavaish.github.io/FlappyBirdRL/)

------
vessenes
I'd like to see this hooked up to a physical phone and actuator. Anybody here
seen anything done using a realtime physical loop in the learning process?

------
hcrisp
I was sort of hoping that the bird would hit a pipe at the end of the 6-minute
video.

~~~
yenchenlin1994
Hello, I'm the author. Actually, it did hit a pipe at the end of the 6-minute
video; I accidentally trimmed that part, sorry ...

