Hacker News
Preserving Outputs Precisely while Adaptively Rescaling Targets (deepmind.com)
126 points by DanielleMolloy 6 months ago | 15 comments

Slightly off-topic, but David Silver did work on this project and discusses it briefly in his really great reinforcement learning course [0]. It's worth checking out if anyone is looking for a well-paced, well-explained introduction to some of the basic reinforcement learning algorithms and theory.

[0] http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html

Edit: The slides are fine but the gem is the YouTube lectures playlist!

Since the Atari era, video games have changed significantly, insofar as they no longer tend to be built around trying to "kill" the human player and end their game. This structure made sense in a video game arcade (you die, you put in another quarter; the more often this happens without you losing interest, the more money the game makes), and it was inherited by most early video games _because_ of the arcades, but it's not actually fun.

In most modern games the human will invariably "win" eventually, so the "try to get the most points before dying" approach makes less sense in general. But _speed running_ has become a very popular activity, even in games you may not think of as being about "speed". I will be interested to see multi-purpose AIs take on speed running, for two reasons:

1. The evaluation function is easier for the machine AND the human assessing it to understand, especially for "Any%" runs in most games, where the rule is basically you don't care how the game is supposed to be played, you just want to go from the start to the end in the least time using the controls (e.g. Mario Kart speed runs are often mostly about convincing the game that your shorter route counts as a "lap" of the track).

2. TAS (Tool-Assisted Speedrunning) is already a thing. TAS players don't compete with live human players; they choose and play back a precise sequence of inputs to make the game play out in a particular way, so they can perform hundreds of frame-precise movements, or even influence hidden internal variables (with the effect that seemingly "random" behaviour in a game often proves to be deterministic). An AI can do everything a TAS player can, moment by moment, but it also reacts to changes, which a TAS run cannot. On the other hand, TAS players often have deep insight into the implementation details of the game, which would not be available to a general-purpose AI player.
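Both points can be illustrated with a toy sketch. Everything here is hypothetical (not any real game or emulator): a deterministic game loop whose "random" enemy spawns come from a cheap PRNG of the kind older games often used, seeded once and advanced by the player's inputs, plus an Any% evaluation function that only looks at whether the end was reached and how many frames it took. The 60 FPS figure, the LCG constants (the multiplier and increment from the sample `rand()` in the C standard) and the movement rules are all assumptions for illustration.

```python
from typing import List, Tuple

FPS = 60  # assumption: a 60 frames-per-second game


def lcg(state: int) -> int:
    # One step of a linear congruential generator, using the multiplier and
    # increment from the sample rand() in the C standard.
    return (state * 1103515245 + 12345) % 2**31


def play(inputs: List[int], seed: int = 42) -> Tuple[bool, int, List[int]]:
    """Toy, fully deterministic game loop (hypothetical, not a real game).

    The PRNG advances once per frame and once more on every button press, so
    the player's inputs steer the 'random' events. The player moves 1 unit per
    frame, or 2 while pressing the button; reaching position 40 ends the game.
    Returns (reached_end, frames_used, enemy_lanes).
    """
    rng, pos, lanes = seed, 0, []
    for frame, pressed in enumerate(inputs, start=1):
        rng = lcg(rng)
        if pressed:
            rng = lcg(rng)  # a button press consumes an extra PRNG call
        pos += 2 if pressed else 1
        if frame % 10 == 0:
            lanes.append(rng % 4)  # which of 4 lanes an enemy spawns in
        if pos >= 40:
            return True, frame, lanes
    return False, len(inputs), lanes


def any_percent_score(reached_end: bool, frames: int) -> float:
    """Any% evaluation: only time-to-finish counts (higher is better).

    How the end was reached (glitches, skips, shortcuts) is deliberately ignored.
    """
    return -frames / FPS if reached_end else float("-inf")


tas_run = [1] * 20          # press the button on every frame
casual_run = [0, 1] * 20    # press on alternate frames

# Replaying the same input sequence reproduces exactly the same "random" events.
assert play(tas_run) == play(tas_run)

for name, run in [("tas", tas_run), ("casual", casual_run)]:
    done, frames, _ = play(run)
    print(name, done, frames, any_percent_score(done, frames))
```

The point of the replay assertion is the TAS observation above: given the same seed and the same inputs, nothing in the run is actually random, which is what lets a TAS author plan frame-by-frame.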

That's actually an interesting point - TAS tools should make it possible to make any game into an AI training environment.
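A rough sketch of that idea, assuming an emulator that exposes savestates, frame advance and memory reads (the `Emulator` method names below are assumptions, not any real TAS tool's API): the savestate gives you `reset()`, frame advance gives you `step()`, and a memory address you pick as "progress" gives you a reward. `FakeEmulator` is a hypothetical stand-in so the sketch runs on its own.

```python
from dataclasses import dataclass, field
from typing import List, Protocol, Tuple


class Emulator(Protocol):
    # Hypothetical interface: method names are illustrative assumptions.
    def load_state(self, state: bytes) -> None: ...
    def save_state(self) -> bytes: ...
    def frame_advance(self, buttons: int) -> None: ...
    def read_memory(self, addr: int) -> int: ...


class TasEnv:
    """Gym-style reset/step wrapper: savestate = reset, frame advance = step."""

    def __init__(self, emu: Emulator, start_state: bytes, progress_addr: int, goal: int):
        self.emu = emu
        self.start_state = start_state
        self.progress_addr = progress_addr  # RAM address assumed to track progress
        self.goal = goal
        self.last = 0

    def reset(self) -> int:
        self.emu.load_state(self.start_state)
        self.last = self.emu.read_memory(self.progress_addr)
        return self.last

    def step(self, buttons: int) -> Tuple[int, int, bool]:
        self.emu.frame_advance(buttons)
        progress = self.emu.read_memory(self.progress_addr)
        reward = progress - self.last  # reward = progress made this frame
        self.last = progress
        return progress, reward, progress >= self.goal


@dataclass
class FakeEmulator:
    """Stand-in emulator: memory[0] is the player's x position."""
    memory: List[int] = field(default_factory=lambda: [0])

    def load_state(self, state: bytes) -> None:
        self.memory = list(state)

    def save_state(self) -> bytes:
        return bytes(self.memory)

    def frame_advance(self, buttons: int) -> None:
        self.memory[0] += 1 if buttons & 1 else 0  # bit 0 = "right" held

    def read_memory(self, addr: int) -> int:
        return self.memory[addr]


emu = FakeEmulator()
env = TasEnv(emu, emu.save_state(), progress_addr=0, goal=3)
obs, total, done = env.reset(), 0, False
while not done:
    obs, reward, done = env.step(1)
    total += reward
print(obs, total)  # 3 3
```

The hard part in practice is choosing the progress address and reward for each game, which is where the TAS community's knowledge of game internals would come in.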

The one thing I would say about that is that speedrunning is a really, really difficult thing to do with a gradient descent approach; you can get really optimized with a particular strat, but getting substantially faster often requires drastically different strategies - which can only be discovered with a very broad knowledge of the game's systems, in areas which you will likely never see if your only experience with the game is speedrunning one particular strat exclusively. And exploration of the game's space during a speedrun would kill your time.

> speedrunning is a really, really difficult thing to do with a gradient descent approach

That's kind of what makes it interesting, no? You could argue that speedrunning requires not just observing a change in score from a change in inputs, but understanding, symbolically manipulating, and exploiting game mechanics.

A lot of human speed-runners exploit glitches to achieve record times and even more so with TAS runs, where you can exploit glitches that require superhuman input sequences. It would be very interesting to see an AI discover and take advantage of such glitches in video games to improve its speedruns.

Actually, just the idea of having an AI discovering glitches at all is an impressive use of AI.

Next minute, American Fuzzy Lop takes first place in a Super Mario 64 speed running competition...

I like the simple idea behind arcade games. Last week I played a free pinball game on a PS4 and enjoyed it. On the other hand, modern games feel like a hybrid bastardization of movies, with near-zero gameplay improvement over games from 10 years ago.

This makes me wonder whether games that "require" the player to read the manual first can be played by a similar agent, with or without reading the manual.

Or perhaps even more difficult are the games that even humans struggle to play through to the end due to "cryptic gameplay elements" (of which there are a lot of examples in Atari games).

I seriously doubt this AI can play games like Adventure, Raiders of the Lost Ark or the Swordquest series, where there is no score tally and a complex sequence of actions is needed to win the game.

Like a human player, the AI needs to have some idea whether it's succeeding. Actually, an AI needs this slightly less than humans do because it's so purposeful anyway - it's not as though it can get bored and go watch TV. But it needn't be displayed as a "score" number; there's no score in chess, for example, and that didn't stop DeepMind's chess engine.

In Advent, for example, there are clear sub-goals, and an AI can be programmed to try to achieve these. You actually get points in Advent (in all versions, I think?) for partial completion, so it's just a matter of exposing that to the evaluation function, though other evaluation functions might work as well, or even better.
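A minimal sketch of exposing partial-completion points to the evaluation function. The sub-goal names and point values are hypothetical placeholders, not Advent's real scoring table; the one design choice worth noting is that only *newly* achieved sub-goals are rewarded, so an agent can't farm points by re-triggering the same one.

```python
from typing import Dict, Set

# Hypothetical sub-goals and point values, for illustration only
# (not the actual Advent scoring table).
SUBGOAL_POINTS: Dict[str, int] = {
    "entered_cave": 25,
    "found_lamp": 10,
    "collected_treasure": 12,
    "escaped_alive": 50,
}


def shaped_reward(achieved_before: Set[str], achieved_now: Set[str]) -> int:
    """Reward only the sub-goals achieved for the first time on this step."""
    newly_achieved = achieved_now - achieved_before
    return sum(SUBGOAL_POINTS.get(goal, 0) for goal in newly_achieved)


before = {"entered_cave"}
after = {"entered_cave", "found_lamp"}
print(shaped_reward(before, after))  # 10
```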

Isn't this the general AI that plays Atari games by watching? It would have to guess the rules of Adventure.

If it doesn't evolve into a corner, it probably could master those games given enough time, but how much is enough time?

The standard "hard game" of that category is Montezuma's Revenge - which has no score, is easy to die right on the first screen, and requires a fairly complex series of actions to make any clearly visible progress (moving to the second screen) at all.

Deep Q-Networks (the Atari AI you're thinking of) and really all reinforcement learning systems generally fail to make it past even the first screen.

The POPART paper actually shows that Montezuma's Revenge is one of the games in the standard set of Atari games they tested, and yup, their AI was no good at it.

That game (and others like it) would probably be much easier if instead of optimising for score they optimised for "game states (pixel arrangements) that the AI hasn't seen before". This metric should correlate well with moving the player character around each screen, and getting to new screens, as well as picking up objects.

I believe some researchers have tried this approach and had some success, but I'd like to see how far DeepMind (for example) could take it.
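One simple way to turn "states the AI hasn't seen before" into a number is a count-based exploration bonus: keep a visit count per (coarsened) screen and pay out a bonus that shrinks as the count grows, here scale / sqrt(count), one common form. This is a sketch of that generic technique, not DeepMind's or the PopArt paper's method; treating the screen as a raw byte buffer and downsampling it by taking every `stride`-th byte are assumptions made to keep the example self-contained.

```python
import math
from collections import Counter


class NoveltyBonus:
    """Count-based exploration bonus: scale / sqrt(times this state was seen).

    States are keyed by a crude downsample of the screen (every `stride`-th
    byte of a raw pixel buffer), so tiny pixel differences don't all count
    as brand-new states.
    """

    def __init__(self, scale: float = 1.0, stride: int = 4):
        self.scale = scale
        self.stride = stride
        self.counts: Counter = Counter()

    def __call__(self, frame: bytes) -> float:
        key = frame[:: self.stride]  # downsampled screen used as the state key
        self.counts[key] += 1
        return self.scale / math.sqrt(self.counts[key])


bonus = NoveltyBonus()
screen_a = bytes([0, 1, 2, 3] * 8)
screen_b = bytes([9, 9, 9, 9] * 8)
print(bonus(screen_a))  # 1.0 (first visit)
print(bonus(screen_a))  # ~0.707 (second visit, less novel)
print(bonus(screen_b))  # 1.0 (a screen not seen before)
```

In use, this bonus would be added to (or replace) the game's own score as the reward, so reaching a new screen or picking up an object that changes the display pays off even when the score doesn't move.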


If you continue to post uncivil and unsubstantive comments to HN, we are going to ban you. We've asked you several times already.

