
AI Uses Less Than Two Minutes of Videogame Footage to Recreate Game Engine - ehllo
http://gvu.gatech.edu/index.php?q=ai-uses-less-two-minutes-videogame-footage-recreate-game-engine
======
dvt
This is so misleading, it's not even funny. Predicting _animation frames_ is
something completely different than recreating a game _engine_. It's the
difference between re-creating a movie frame-by-frame and building a movie
_camera_. Obviously this is impressive in its own right, but we're a long
way away from AI building general-purpose systems.

~~~
yters
The constant overselling of AI is a negative indicator.

~~~
jdietrich
I'd suggest that you review how computing was reported on in decades past. We
made a lot of predictions that seemed utterly fantastical for some decades,
but eventually came good when Moore's Law caught up. Many of these predictions
drastically underestimated the impact of computing, particularly with regards
to networking and mobile.

I sincerely believe that the move towards deep learning is every bit as
radical as the development of the microprocessor. We're starting to find
incredibly elegant solutions to problems that have stymied computer scientists
for decades. We're finding orders-of-magnitude improvements to difficult
problems at an astonishing rate, often with remarkably modest resources.

There's clearly a huge amount of hype happening at the moment, but genuine
technological revolutions are usually preceded by a ludicrous hype bubble.

[https://www.youtube.com/watch?v=HW5Fvk8FNOQ](https://www.youtube.com/watch?v=HW5Fvk8FNOQ)

~~~
YeGoblynQueenne
>> We're starting to find incredibly elegant solutions to problems that have
stymied computer scientists for decades.

I think by "incredibly elegant solutions" you mean the typical black-box
statistical machine learning algorithm that takes in some data as input and
outputs an approximated function that "solves" whatever problem your data was
representing.

If that is indeed what you mean (apologies otherwise), I really struggle to see
that as "elegant", or even as a solution. We start with some problem too complex
to understand and build a complex system we don't quite understand, that
solves the problem in a way we don't quite understand. What did we gain? A
system that solves the problem - maybe, when it feels like it, depending on
the problem, provided enough data, etc. Our understanding of the problem
hasn't changed, and therefore neither has our ability to solve it.

To give an analogy: it's like a kid at school who can't solve some arithmetic
problem they have for homework asking their big sister to solve it for them.
The kid now knows the answer to the problem but still doesn't know how to
solve it on their own. They didn't "find a solution" - they found someone who
knows the solution.

I think, if you look at the progress we've made as a civilisation in producing
knowledge, you'll find that this was always driven by people who solved
problems, themselves. And if you think of the kind of people who invoke some
higher authority that has all the solutions, you're probably looking at
religion.

~~~
thomashop
You're completely overlooking the fact that deep neural networks can
generalize to unseen data. The kid analogy doesn't work at all.

~~~
kthejoker2
Someone didn't watch Jimmy find the warp whistle in The Wizard ...

In all seriousness, rapid adaptation to new environments (including video
games and math homework) is within small children's natural capabilities. But
they can't discover properties they aren't designed to discover.

They're left with scientific method for everything else. Which is just
generalizing to unseen data.

------
LeifCarrotson
Reminds me of the moral of "that alien message":

> Riemann invented his geometries before Einstein had a use for them; the
> physics of our universe is not that complicated in an absolute sense. A
> Bayesian superintelligence, hooked up to a webcam, would invent General
> Relativity as a hypothesis—perhaps not the dominant hypothesis, compared to
> Newtonian mechanics, but still a hypothesis under direct consideration—by
> the time it had seen the third frame of a falling apple. It might guess it
> from the first frame, if it saw the statics of a bent blade of grass.

[http://lesswrong.com/lw/qk/that_alien_message/](http://lesswrong.com/lw/qk/that_alien_message/)

~~~
gear54rus
So we are looking at it from the perspective of an AI. Wow, that was really
cool.

I did not understand the part about the internet, though. What does this mean?

> oh-so-carefully persuade them to give us Internet access, followed by five
> minutes to innocently discover their network protocols, then some trivial
> cracking whose only difficulty was an innocent-looking disguise.

And what about AIs melting? Why is that? I feel dumb, but I enjoyed it :)

~~~
AgentME
Humanity is running in a simulation in the aliens' world. Humanity convinces
the aliens to give humanity access to the aliens' internet, so that humanity
can learn about the aliens' world and be able to talk to other (trickable)
aliens. "Five minutes" refers to five minutes in the aliens' world, which is
thousands of years to humanity.

>and what about AIs melting? why is that?

The author is just lampshading the fact that he doesn't want the story to
involve any (non-human) AI.

The whole story is making an example about the capabilities of AI, so if
humanity within the story themselves developed AI, then it would make the
metaphor unnecessarily recursive.

------
YeGoblynQueenne
The definition of "game engine" in the paper is subtly different to the one
most players and game engine programmers would expect. The paper calls the set
of game mechanics a "game engine". I think the players and programmers would
probably call that "mechanics" and refer to the actual code running the game
as an "engine".

For example, if I think of an AI that "learns a game engine from video" I
would expect to see an AI that can take as input video of, say, myself playing
XCOM2 and output the Unreal engine (or an engine with an interface
indistinguishable from it).

In other words, I'd expect to see an AI that goes from example game output to
an automaton that can reproduce that output _and any other output that the
original automaton could produce_.

Instead, from a quick read of the paper, I understand they learn to reproduce
a set of video frames that match the input- and then train a different AI to
actually simulate the physics (i.e. make sure the player doesn't fall through
walls etc).

While this is remarkable and I'm very glad to see a rule-based, greedy
approach that seems to work pretty damn well, I really don't think it does
what it says on the tin.

Still- this was published in IJCAI. So kudos to the authors for that.

------
npad
There’s a good summary by Two Minute Papers here:
[https://youtu.be/2VyhmbEjs9A](https://youtu.be/2VyhmbEjs9A)

------
jupiter90000
Does this really learn a game engine that a human user could play the game
with? I can't tell from the article or the paper. It sounds like it learns
rules from how the objects are seen to move and tries to replay that from the
learned rules. I'm not sure what the point of that would be, when the video it
learned from already replays without error. If someone has more insight let me
know what I'm missing.

~~~
yorwba
The rules it learns take player input into account, so yes, it should be
possible to play the replica just like the original game (modulo glitches).

~~~
YeGoblynQueenne
The video frames it learns from have no information about player input ("press
X to jump"). So I don't see how it can learn a function from player inputs to
game engine states- without which there's no way to actually play the game,
let alone in a different way than the original.

~~~
yorwba
Maybe I'm misunderstanding the paragraph in the paper that begins with _The
final type of neighbor that the system handles, changes a rule from being
handled normally to being considered a “control” rule. This handles the case
of rules that the player makes decisions upon._

After rereading it I'm no longer sure of my previous interpretation. Maybe
they are generating these rules conditional on the player input without
telling the training process what the input was.

But they evaluate the realism of the learned engine against the original by
training a reinforcement-learning agent to play the game, which means that
they somehow connected those conditional rules to their corresponding input
values.

~~~
YeGoblynQueenne
The way it's reported is confusing, but after a second, more careful read-
through, I still can't see that the "engine" they learn is anything like an
ordinary game engine that a human player can control with inputs.

Specifically, they don't mention anywhere that their engine was played by a
human- to prove its playability. Maybe that's an omission or they didn't
consider it important, but it makes it very hard to figure out if this is at
all possible.

Finally, there's this bit on the 4th page (next to the algorithm box) where
they say:

    "At this point we can guarantee that we have an engine that can
    predict the entire sequence of frames from the initial frame.
    Notably this means that the engine can only reflect changes that
    actually occur in the input sequence of parsed frames."

I take this to mean that, given a specific starting frame, their engine will
always predict the same sequence, which must mean its prediction does not take
into account player input.

But it's true that this point is very hard to figure out.

>> Maybe they are generating these rules conditional on the player input
without telling the training process what the input was.

Basically, yeah- that's my understanding.

------
ehllo
Link to the Paper:

[https://www.cc.gatech.edu/~riedl/pubs/ijcai17.pdf](https://www.cc.gatech.edu/~riedl/pubs/ijcai17.pdf)

------
projectramo
Cool. Now they need to create a tool so that a person can animate a few frames
deterministically to imply the physics they need.

Then the AI extracts the engine, and one can use that to write the game.

I am curious, though, as to what kind of concepts the game discovered or had
to be taught. Was it taught platform concepts (platforms, killer objects etc)
and then simply had to map to what is what on screen?

Or can it really learn almost any videogame?

~~~
yorwba
As is usual in such cases, it pays to look at the paper for all the details
that the hype article leaves out.

They use prior knowledge about the sprites that can appear in the game to
extract them from the video frames. The extraction result is then represented
as a list of facts (sprites present, their absolute and relative locations,
velocities and the camera scroll position). Learned rules are applied to these
facts to derive a representation of the next frame, which is then displayed.

That means that platforms and killer objects are discovered concepts, but the
features that allow their discovery (relative positions) are already
hardcoded. Since most games have collision-based interactions, that is likely
not much of a limitation.
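To make that facts-and-rules representation concrete, here is a toy sketch in Python. Every name and rule below is invented for illustration; the paper learns its rules from video, whereas these are hardcoded, so this shows only the flavor of applying rules to facts to derive the next frame:

```python
# Toy sketch of a "facts + rules" frame predictor. All names and rules are
# invented for illustration; this is not the paper's code or its learner.

def next_frame(facts):
    """Apply simple movement/collision rules to derive the next frame's facts."""
    new_facts = dict(facts)
    x, y = facts["player_pos"]
    vx, vy = facts["player_vel"]
    # Rule: position advances by velocity each frame.
    new_facts["player_pos"] = (x + vx, y + vy)
    # Rule: standing on a platform zeroes vertical velocity.
    if facts["on_platform"]:
        new_facts["player_vel"] = (vx, 0)
    return new_facts

frame = {"player_pos": (10, 5), "player_vel": (2, -1), "on_platform": True}
print(next_frame(frame))
```

Note that, as described above, the state here is exactly what is visible in one frame: there is nowhere to store a hidden timer or off-screen inventory.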

The greater problem will be with hidden information. If e.g. a bomb stays
black for 5 seconds after being placed before it starts blinking red and then
exploding, their method would be unable to learn when the bomb should start to
blink, because their representation of game state has no memory. Same with
objects dropped off-screen, inventory, quest status or any other information
that is relevant to the game, but not always displayed on the screen.

------
yters
If frame prediction counts as a game engine, then I can write a game engine by
recording myself playing Call of Duty.

------
davidmanescu
In the article they've trained the AI to predict what frames the game would
provide given a certain input, but to demonstrate its effectiveness they show
the original game given a certain input, side-by-side with the AI's prediction
given the same input. I sure hope that wasn't part of the training set.
Teaching an AI to recognise something it's already been taught and mirror it
relatively closely isn't an achievement...

------
yters
It'd be interesting if they released the replicated levels for humans to play,
as well as the level source code for examination.

------
jcranberry
I wonder how this works for 3D game engines? It would be positively insane if
AI could automate one of the most complex and high skill requirement positions
in the video game industry.

~~~
yorwba
What would that position be? You realize that this is about copying the
mechanics of an existing game, right?

You could maybe adapt it to learn from hand-animated examples, which would
certainly make some games easier to create. But I doubt that the effort
required to specify all interactions in a 3D game by hand would be much less
than just coding them.

~~~
R_haterade
What if you trained it on footage from the real world? I didn't see anything
in the article that stated the AI needed any sort of feedback from interactive
controls...

~~~
yorwba
It doesn't need _feedback_ from controls, but they have to be present as
inputs during the learning process, otherwise you can't hook them up
correctly. Learning real-world physics from video would be impressive in its
own right, but alone it's not enough to create a game. It's also not
necessary, since we can already simulate most physical phenomena; and much
more efficiently than what a learning process is likely to produce at first.

------
dandermotj
Clickbait, but great results all the same

------
infinity0
The article doesn't explain why it's not overfitting.

Overfitting isn't that impressive.

------
psyc
I recall Eliezer Yudkowsky once writing that a superintelligence could deduce
special relativity from 2 seconds of footage of an apple falling to the
ground.

------
GrumpyNl
There is money in AI, so everybody claims to be in AI right now. Most claims I
have seen so far are smart list pickers.

------
juskrey
For me, gaming was all about secrets. Obviously a pattern detector can't
replicate that...

------
ketsa
very misleading title...

------
herbstephens
wow, impressive

------
ryacko
Soon even artists will be unemployed.

~~~
wpietri
Just the opposite, I think. The promise of computers has always been to take
over boring work so that humans can focus on the more interesting bits.
Artists will always have something to do.

~~~
jdietrich
Entirely plausible hypothetical: what happens if Deepmind train a massive
neural network on the entire Spotify catalog, weighted for Billboard chart
position and Grammy nominations? What if that algorithm turns out to be as
good at songwriting as AlphaGo is at the game of Go? Will anyone listen to
human-written songs if AlphaSong starts producing superhumanly beautiful
music? Will any of us listen to the same music if AlphaSong can analyse our
personal playlists and produce an infinite number of songs that perfectly
match our musical tastes?

~~~
YeGoblynQueenne
In practical terms, the problem with learning songwriting, vs learning Go, is
that the number of board positions in Go is finite whereas the number of songs
that can be written is infinite.

With Go, then, the task of an AI player (like AlphaGo) is to search a finite
space for those board states that allow it to win.

With songwriting- the task is not even clearly defined. Obviously, you're
trying to generate songs people will want to listen to, so you have to search
an infinite space for a set of songs that satisfy some criteria of
"listenability", but what criteria are these? Popular songs range over wildly
different music styles, from Black Metal to Medieval folk revivalist music
through RnB and Arabic belly dancing music. What examples will we train our AI
songwriter on? All music, ever? Specific kinds of music? Specific songs as
exemplars of specific styles?

Or in other words, what exactly will our AI songwriter be trying to learn?
It'd be looking into an infinite haystack filled with almost identical needles
for a particular needle it's never even seen well enough to recognise.

~~~
jdietrich
>In practical terms, the problem with learning songwriting, vs learning Go, is
that the number of board positions in Go is finite whereas the number of songs
that can be written is infinite.

There are about 10^170 legal positions in Go. We think that there are about
10^82 particles in the observable universe. The space isn't infinitely large,
but it's close enough.

The space of _enjoyable_ songs is remarkably small. There are only 12 notes in
the chromatic scale and 8 in the diatonic scale. The human range of hearing
only covers about nine octaves. There is a relatively small pool of rhythms
and harmonies that sound pleasing to most people. Accidental plagiarism is a
very common problem in popular music - it's easy to unintentionally write a
song that's almost identical to another song. The most common defence in music
plagiarism cases is simply to find lots of prior art; the song that you've
allegedly plagiarised is probably extremely similar to a lot of other songs.

Jazz musicians can improvise over unfamiliar chord progressions precisely
because most music isn't particularly original. If you understand music
theory, you can make fairly accurate guesses about what's coming next.
Practically all music follows a similar set of understandable patterns, even
across cultures.

[https://www.theatlantic.com/science/archive/2016/09/music-plagiarism/499985/](https://www.theatlantic.com/science/archive/2016/09/music-plagiarism/499985/)

~~~
YeGoblynQueenne
>> There are about 10^170 legal positions in Go. We think that there are about
10^82 particles in the observable universe. The space isn't infinitely large,
but it's close enough.

I don't understand why the number of atoms in the universe has anything to do
with the tractability of Go, or anything else. I've heard of this idea before,
but, for example, my computer can calculate the factorial of 106, which is
about 1.146281e+170 (1.146281 times 10 to the power of 170), in much less than
a second. 10^170 means nothing on its own. And like I say, the point with Go AI
is to try and not have to examine _all_ possible board positions.
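For what it's worth, the factorial claim is trivial to verify in Python:

```python
import math

# 106! has 171 digits, i.e. it is on the order of 10^170 -
# roughly the number of legal Go positions mentioned above.
f = math.factorial(106)
print(len(str(f)) - 1)  # prints 170
```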

The other thing to keep in mind is that finite is always _infinitely_ smaller
than infinite, no matter how large the finite number is. And infinite is
_always_ intractable, so if a problem is infinite you can't solve it, unless
you can reduce it to a finite problem.

As to music- the number of combinations you're looking at is far larger than
the number of permutations of the set of notes. A musical piece is actually a
string from a language with the musical notes for symbols and some unknown set
of rules that govern what is a well-formed string in the language (i.e. what
is a musical piece and what is just noise). That language is probably context-
free, or above, and certainly not finite. So, no, you can't hope to just
machine-learn how to make a song by training on a few examples.
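To illustrate that formal-language framing, here is a deliberately tiny context-free grammar over note names in Python. This is a toy: the production rules are made up and are obviously not real music theory; the point is only that a recursive grammar generates an infinite language you can sample from but never enumerate:

```python
import random

# A toy context-free grammar whose terminals are note names. The language it
# generates is infinite (PIECE can expand recursively), which is the point:
# you cannot enumerate it, only sample from it.
GRAMMAR = {
    "PIECE": [["PHRASE"], ["PHRASE", "PIECE"]],
    "PHRASE": [["C", "E", "G"], ["D", "F", "A"], ["PHRASE", "PHRASE"]],
}

def generate(symbol="PIECE", depth=0):
    if symbol not in GRAMMAR:  # terminal symbol: a note
        return [symbol]
    # Cap recursion depth so sampling terminates; the language itself is infinite.
    options = GRAMMAR[symbol]
    expansion = options[0] if depth > 5 else random.choice(options)
    return [note for s in expansion for note in generate(s, depth + 1)]

print(generate())  # e.g. ['C', 'E', 'G', 'D', 'F', 'A']
```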

These sorts of infinite problems crop up all the time in human activities. It's
no use trying to approach them without a full understanding of the
tractability issue. I mean, that's why we have complexity theory in the first
place- because some problems can't be solved in a general manner in the
available time in the universe (and again that time has nothing to do with the
atoms in the same universe).

