
Preserving Outputs Precisely while Adaptively Rescaling Targets - DanielleMolloy
https://deepmind.com/blog/preserving-outputs-precisely-while-adaptively-rescaling-targets/
======
pmulv
Slightly off-topic, but David Silver worked on this project and discusses it
briefly in his really great reinforcement learning course [0]. It's worth
checking out for anyone looking for a well-paced, well-explained introduction
to some of the basic reinforcement learning algorithms and theory.

[0]
http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html

Edit: The slides are fine but the gem is the YouTube lectures playlist!

------
tialaramex
Since the Atari era, video games have changed significantly, insofar as they
no longer tend to be built around trying to "kill" the human player and end
their game. That structure made sense in a video game arcade (you die, you
put in another quarter; the more often this happens without you losing
interest, the more money the game makes), and it got inherited into most
early video games _because_ of the arcades, but it's not actually fun.

In most modern games the human will invariably "win" eventually, so the "get
the most points before dying" approach makes less sense in general. But
_speedrunning_ has become a very popular activity, even in games you may not
think of as being about "speed". I will be interested to see multi-purpose
AIs take on speedrunning, for two reasons:

1\. The evaluation function is easier for both the machine and the human
assessing it to understand, especially for "Any%" runs, where the rule is
basically that you don't care how the game is supposed to be played; you just
want to get from the start to the end in the least time using the controls
(e.g. Mario Kart speed runs are often mostly about convincing the game that
your shorter route counts as a "lap" of the track).

2\. TAS (tool-assisted speedrunning) is already a thing. TAS players don't
compete with live human players; they choose and play back a precise sequence
of inputs to cause the game to play out in a particular way, so they can do
hundreds of frame-precise movements, or even influence hidden internal
variables (with the effect that seemingly "random" behaviour from a game
often proves to be deterministic). An AI can do everything the TAS player
can, moment by moment, but unlike a TAS run it can also react to changes. On
the other hand, TAS players often have deep insight into the implementation
details of the game which would not be available to a general-purpose AI
player.
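
A TAS run, in other words, is just a replayable input script. Here's a
minimal sketch of that idea in Python, assuming the Gymnasium Atari
environments (gymnasium[atari]) are installed; the environment name, seed,
and the hard-coded input list are illustrative, not from any real run:

    import gymnasium as gym

    # Sticky actions off and a frameskip of 1, so playback is
    # frame-precise and the environment behaves deterministically.
    env = gym.make("ALE/Breakout-v5", frameskip=1,
                   repeat_action_probability=0.0)

    tas_inputs = [0, 1, 3, 3, 2, 0, 1]  # a fixed, frame-precise input script

    obs, info = env.reset(seed=42)  # fixed seed: "random" events replay identically
    score = 0.0
    for action in tas_inputs:
        obs, reward, terminated, truncated, info = env.step(action)
        score += reward
        if terminated or truncated:
            break
    env.close()
    print(f"replayed {len(tas_inputs)} inputs, score {score}")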

~~~
MrEldritch
That's actually an interesting point - TAS tools should make it possible to
turn _any_ game into an AI training environment.

The one thing I would say is that speedrunning is a really, really difficult
thing to do with a gradient descent approach: you can get extremely optimised
at one particular strat, but getting substantially faster often requires
_drastically_ different strategies, which can only be discovered with a very
broad knowledge of the game's systems - knowledge of areas you will likely
never see if your only experience with the game is speedrunning one
particular strat exclusively. And exploring the game's space _during_ a
speedrun would kill your time.

~~~
taneq
> speedrunning is a really, really difficult thing to do with a gradient
> descent approach

That's kind of what makes it interesting, no? You could argue that
speedrunning requires not just observing a change in score from a change in
inputs, but understanding, symbolically manipulating, and exploiting game
mechanics.

------
arayh
This makes me wonder whether games that "require" the player to read the
manual first could be played by a similar agent, with or without reading the
manual.

Or perhaps even more difficult are the games that even humans struggle to
play through to the end due to "cryptic gameplay elements" (of which there
are a lot of examples among Atari games).

~~~
ASalazarMX
I seriously doubt this AI can play games like Adventure, Raiders of the Lost
Ark or the Swordquest series, where there is no score tally and a complex
sequence of actions is needed to win the game.

~~~
tialaramex
Like a human player, the AI needs to have some idea of whether it's
succeeding. Actually, an AI needs this slightly less than humans because it's
so purposeful anyway - it's not as though it can get bored and go watch TV.
But the signal needn't be displayed as a "score" number; there's no score in
chess, for example, and that didn't stop DeepMind's chess engine.

In Advent, for example, there are clear sub-goals, and an AI can be
programmed to try to achieve these. You actually get points in Advent (all
versions, I think?) for partial completion, so it's just a matter of exposing
that to the evaluation function - though other evaluation functions might
work as well, or even better.
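
Concretely, that could look like a wrapper that pays out the _change_ in the
point tally each step. A minimal sketch in Gymnasium style - the
SubGoalReward name and the score_fn hook are my own hypothetical
placeholders, since the real hook depends on how a game exposes its points:

    import gymnasium as gym

    class SubGoalReward(gym.Wrapper):
        """Expose an internal point tally as a per-step reward signal."""

        def __init__(self, env, score_fn):
            super().__init__(env)
            self.score_fn = score_fn  # maps the step's info dict to a point total
            self.last_score = 0

        def reset(self, **kwargs):
            obs, info = self.env.reset(**kwargs)
            self.last_score = self.score_fn(info)
            return obs, info

        def step(self, action):
            obs, _, terminated, truncated, info = self.env.step(action)
            total = self.score_fn(info)
            reward = total - self.last_score  # reward the progress, not the raw tally
            self.last_score = total
            return obs, reward, terminated, truncated, info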

~~~
ASalazarMX
Isn't this the general AI that plays Atari games just by watching the screen?
It would have to guess the rules of Adventure.

If it doesn't evolve itself into a corner, it could probably master those
games given enough time - but how much time is enough?

~~~
MrEldritch
The standard "hard game" in that category is Montezuma's Revenge - it gives
essentially no score feedback early on (the first points come only after a
long, precise sequence of actions), it's easy to die right on the first
screen, and even the first clearly visible sign of progress (reaching the
second screen) requires a fairly complex series of actions.

Deep Q Networks (the Atari AI you're thinking of) - and really reinforcement
learning systems in general - usually fail to make it past even the first
screen.

~~~
tialaramex
The PopArt paper does actually include Montezuma's Revenge in the standard
set of Atari games they tested, and yup, their AI was no good at it.
~~~
dane-pgp
That game (and others like it) would probably be much easier if, instead of
optimising for score, the agent optimised for "game states (pixel
arrangements) that the AI hasn't seen before". This metric should correlate
well with moving the player character around each screen, getting to new
screens, and picking up objects.

I believe some researchers have tried this approach and had some success, but
I'd like to see how far DeepMind (for example) could take it.
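
The simplest version of that idea is a count-based novelty bonus. A minimal
sketch, under my own assumptions - the 8x downsampling and the square-root
decay are illustrative choices, not taken from any particular paper:

    from collections import Counter
    import numpy as np

    class NoveltyBonus:
        """Reward pixel arrangements the agent hasn't (often) seen before."""

        def __init__(self, scale=1.0):
            self.counts = Counter()
            self.scale = scale

        def __call__(self, frame: np.ndarray) -> float:
            # Coarsen and quantise the frame so near-identical states
            # hash to the same key.
            coarse = frame[::8, ::8] // 32
            key = coarse.tobytes()
            self.counts[key] += 1
            # 1.0 for a brand-new state, decaying with each revisit.
            return self.scale / np.sqrt(self.counts[key])

Each step's novelty bonus would then be added to (or substituted for) the
game's own score as the training reward.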

