
OpenAI Retro Contest - gdb
https://contest.openai.com/
======
mustdeparthasty
Two months to advance the state of the art on a complex physics-based game
with branching paths. Current approaches such as DQNs or, god forbid, DRL [1]
barely reach the performance of my three-year-old cousin at Atari game score
maximization, and are mostly non-transferable to new levels... Good luck.

[1] [https://www.alexirpan.com/2018/02/14/rl-hard.html](https://www.alexirpan.com/2018/02/14/rl-hard.html)

~~~
hwoolery
Of course it's a difficult problem, but why try to dissuade people from
attempting it? Nothing novel ever came from somebody who didn't fail first :)

~~~
solarkraft
I don't consider it dissuasion, but a fair warning and a call for proper
preparation. Seriously: good luck!

------
aquova
I'm a big retro Sega fan, and I've always wanted to look into doing something
like this, but this seems... really difficult. Would the best approach be to
jump right in and hope for the best, or are there any sources I should look
into?

~~~
make3
If you want to do OK and understand what's up, watch the awesome free lectures
by Alphabet's DeepMind (the people behind AlphaZero):
[https://www.youtube.com/playlist?list=PLweqsIcZJac7PfiyYMvYiHfOFPg9Um82B](https://www.youtube.com/playlist?list=PLweqsIcZJac7PfiyYMvYiHfOFPg9Um82B)

------
minimaxir
The demo GIFs show Sonic 1 and 3, but the 3 Sonic Genesis games have slightly
different physics and mechanics, which could trip up an AI not trained on all
3 games. Is the challenge just using the Sonic 1 engine?

~~~
nulltype
The challenge is using all 3 Genesis games.

~~~
minimaxir
Yes, just found it in the details:
[https://contest.openai.com/details](https://contest.openai.com/details)

From the downloaded CSV of game states, _all stages_ from the 3 games are
valid training stages, and the custom stages used for testing are derived from
all 3 games. Yikes.

I really hope those custom stages use the same art assets as the original
games.

~~~
oh_sigh
Every test run could theoretically start with a jump or some other behavior
that probes the differing physics among the 3 games, then use a discriminator
network to route to one of three networks individually trained on each game?
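That routing idea fits in a few lines. A minimal sketch: the frame and button
shapes come from the contest docs, but the linear discriminator, the button
index for the probe press, and the zero weights are purely illustrative
stand-ins for something learned:

```python
import numpy as np

# Shapes from the contest: 224x320 RGB frames, 12 Genesis buttons.
OBS_SHAPE = (224, 320, 3)
N_BUTTONS = 12
N_GAMES = 3

def probe_action():
    """A scripted probe press (hypothetical button index 0 = jump)."""
    action = np.zeros(N_BUTTONS, dtype=bool)
    action[0] = True
    return action

def discriminate(obs_after_probe, weights):
    """Toy linear discriminator: flatten the frame, score each game, argmax.
    `weights` (shape [3, 224*320*3]) would be learned in practice."""
    x = obs_after_probe.astype(np.float32).ravel() / 255.0
    return int(np.argmax(weights @ x))

def route(obs_after_probe, weights, policies):
    """Dispatch to the policy trained on whichever game was recognized."""
    return policies[discriminate(obs_after_probe, weights)]
```

A real version would swap the linear scorer for a small convnet and probe for
more than one step, but the control flow is the same.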

~~~
tedivm
The whole point of the OpenAI exercise is to improve transfer learning: a
winning project should be able to work off of a single model.

For an example of a system that does learn new rules using a single model
check out this post from Vicarious[0].

[0] [https://www.vicarious.com/2017/08/07/general-game-playing-with-schema-networks/](https://www.vicarious.com/2017/08/07/general-game-playing-with-schema-networks/)

------
chpmrc
I'm wondering if there's a MOOC that takes you from zero to being able to
build a system that learns how to play such a game, maybe focusing a bit less
on the math (especially the proofs). I took Andrew Ng's course on Coursera but
I feel like the gap between what I know and what's needed for this contest is
huge. Am I wrong?

~~~
falkenb0t
No, you're not wrong. That said, I don't think this is a competition aimed at
people with beginner-level knowledge of machine/deep learning. I don't
believe there is currently any MOOC that is going to hold your hand from 0 to
expert. Given the current state of the field, I believe the only surefire path
towards this level of knowledge is either university education or a /lot/ of
reading.

------
bagrow
Seems like a stretch to call this “transfer learning”. Transfer learning would
be more like training on Sonic and testing on Mario.

Would be cool to see some kind of adversarial competition. You train to, say,
beat a game level but you test to beat someone else’s submission. (Short on
the specifics, I know.)

~~~
vhold
Transfer learning is what makes this contest currently unique. Usually in
these machine learning contests, you hand off your model to the contest
runner, and they just run it and evaluate its performance. It has to be
already good at the task.

In this contest, you're not just submitting a trained model, you're submitting
a Docker environment capable of performing training, which they will run on
their secret levels under specified constraints. So you want to make a highly
trained model whose capabilities can be transferred to secret levels.

------
taeric
Cynical prediction one: Nothing learned here will be readily transferable to
another domain. :(

------
rmellow
I expect training and testing to make use of emulators and ROMs. Wouldn't Sega
potentially have a problem with this, or is it considered fair use?

~~~
minimaxir
The loophole suggested in the instructions is to use the ROMs from legally
purchased copies of the games on Steam.

------
mlboss
Can we move away from video games? I know games provide a closed loop for this
kind of experiment/contest, but I want to see more practical applications of
RL. How about code generation that creates programs from test cases, or an RL
agent that can use 3D design software?

~~~
Asdfbla
The problem is that reinforcement learning is far from solved and doesn't work
all that well yet, so these toy problems are probably what researchers will
stick with for some time to come.

------
hanoz
Couldn't a new level contain some sort of new scenario that would be
completely impossible to navigate based on previous experience without some
_very_ general AI?

------
mrfusion
What’s the input to the agent? Just all the pixels on the screen?

~~~
joshvm
> Each timestep advances the game by 4 frames, and each observation is the
> pixels on the screen for the current frame, a shape [224, 320, 3] array of
> uint8 values. Each action is which buttons to hold down until the next frame
> (a shape [12] array of bool values, one for each button on the Genesis
> controller), where invalid button combinations (Up+Down or accessing the
> start menu) are ignored ... During training you can access a few variables
> from the memory of the game through the info dictionary. During testing,
> these variables are not available.

So yep, the RGB image.
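For a feel of the interface, here's a minimal agent loop against a dummy
stand-in environment. Only the shapes and dtypes come from the quoted docs;
the real environment is gym-retro and needs the Sonic ROMs:

```python
import numpy as np

class DummySonicEnv:
    """Stand-in with the contest's observation/action shapes."""
    def reset(self):
        # An observation is the current frame: 224x320 RGB, uint8.
        return np.zeros((224, 320, 3), dtype=np.uint8)

    def step(self, action):
        # An action says which of the 12 Genesis buttons to hold down.
        assert action.shape == (12,) and action.dtype == bool
        obs = np.zeros((224, 320, 3), dtype=np.uint8)
        return obs, 0.0, False, {}   # obs, reward, done, info

env = DummySonicEnv()
obs = env.reset()
rng = np.random.default_rng(0)
action = rng.random(12) < 0.5        # random button presses
obs, reward, done, info = env.step(action)
```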

------
dnautics
Is this a fair representation of a human playing a game? Usually as a human
player I don't go into a level expecting to clear it never having seen it
before.

~~~
shmageggy
It's fair because you get better at gaming in general when you play games. For
instance, your performance on Mario will be better after playing a lot of
Sonic.

This is in contrast to current models which generally need to be trained up
from scratch on each new game, even if the games are mechanically similar.

~~~
dnautics
I think a reasonable tweak would be to give the AI the ability to store a
limited amount of information, say 4 KB, which it populates on a first run and
draws from on a second run. Second-run scores are compared.
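A sketch of that protocol; the 4 KB budget is from the idea above, while the
serialization format and the stored keys are hypothetical:

```python
import json

MEMORY_BUDGET = 4096  # bytes: the proposed 4 KB cap

def save_memory(notes):
    """First run: serialize what the agent learned, refusing to exceed 4 KB."""
    blob = json.dumps(notes).encode("utf-8")
    if len(blob) > MEMORY_BUDGET:
        raise ValueError(f"memory blob is {len(blob)} bytes, over budget")
    return blob

def load_memory(blob):
    """Second run: read the notes back before acting."""
    return json.loads(blob.decode("utf-8"))

# e.g. remember where hazards were on the first pass
notes = {"spikes_at_x": [512, 1400], "loop_at_x": 2200}
blob = save_memory(notes)
```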

------
Mizza
I couldn't find - high score or fastest time? Totally different skills.

~~~
gpm
Best score

Check out
[https://contest.openai.com/details](https://contest.openai.com/details)

I'm not sure how different those skills are from an AI development point of
view, though. Why do you think they are?

~~~
ikeboy
If I'm reading that right, they're defining score in terms of time, so it's
actually the same?

~~~
gpm
Ugh, you're mostly right, I skimmed too quickly.

> The reward your agent receives is proportional to its progress to the
> predefined horizontal offset within each level, positive for getting closer,
> negative for getting further away. If you reach the offset, the sum of your
> rewards will be 9000. In addition there is a time bonus that starts at 1000
> and decreases linearly to 0 at the end of the time limit, so beating the
> level as quickly as possible is rewarded.

So mostly time, but past a certain threshold just completion. And if you don't
finish, "portion completed".
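Reading the quoted rules as a formula. The 9000/1000 split is from the contest
description; gating the time bonus on completion is my interpretation, not
something the rules state outright:

```python
def episode_score(progress_frac, time_used_frac):
    """Illustrative scoring: up to 9000 for horizontal progress, plus a
    time bonus decaying linearly from 1000 to 0 over the time limit.
    Both arguments are fractions in [0, 1]."""
    progress_reward = 9000 * progress_frac
    time_bonus = 1000 * (1 - time_used_frac) if progress_frac >= 1.0 else 0.0
    return progress_reward + time_bonus

episode_score(1.0, 0.0)   # instant completion: 10000.0
episode_score(1.0, 1.0)   # completion at the buzzer: 9000.0
episode_score(0.5, 1.0)   # timed out halfway: 4500.0
```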

~~~
ikeboy
Well, if you get to the time limit it ends wherever you are, so this is
literally just how long it took to complete, or how far you got if you did not
complete. Completing gives you 9000 points and the max time bonus is 1000, so
it heavily incentivises winning consistently over higher-variance strategies
that may be quicker.
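A quick expected-value check of that point. The 9000/1000 split is from the
contest; the two strategies and their completion odds are invented for
illustration:

```python
def score(completed, time_used_frac, progress_frac):
    """9000 + linear time bonus if you finish, else progress only."""
    if completed:
        return 9000 + 1000 * (1 - time_used_frac)
    return 9000 * progress_frac

# Consistent strategy: always finishes, but slowly (90% of the time limit).
consistent = score(True, 0.9, 1.0)                       # 9100.0

# Risky strategy: finishes fast (20% of the limit) half the time,
# otherwise dies a quarter of the way in.
risky = 0.5 * score(True, 0.2, 1.0) + 0.5 * score(False, 1.0, 0.25)
# 0.5 * 9800 + 0.5 * 2250 = 6025.0
```

Even with a big speed edge, the risky strategy's expected score loses badly to
just finishing every time.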

------
ikeboy
Do you win anything?

~~~
tejasmanohar
Second to last paragraph:

    
    
      The contest will run from April 5 to June 5 (2 months)
      and *winners will receive some pretty cool trophies.*

------
Ancalagon
Was this just released today?

~~~
gdb
Yep!

