
Playing hard exploration games by watching YouTube - indescions_2018
https://arxiv.org/abs/1805.11592
======
sleepychu
Neat, but I don't understand what they mean by having embedded a reward video
into the set. Is that a video where copying the behaviour will deliver victory?

~~~
algon33
Yes, they take the state of the video every 16 frames and look at its
embedding. These were made into checkpoints.

The AI is rewarded if, at each checkpoint, the state vector it has produced is
sufficiently aligned with the video's.

I guess that's the initial training to deal with sparse rewards.
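
To make the checkpoint idea above concrete, here is a rough sketch of how such a reward could look. It is not the paper's code: the names `make_checkpoints` and `CheckpointReward` are made up for illustration, the embeddings are assumed to come from some already-trained network, and the similarity threshold and bonus value are placeholder assumptions rather than confirmed settings.

```python
import numpy as np


def make_checkpoints(video_embeddings, stride=16):
    """Keep the embedding of every `stride`-th demo frame as a checkpoint."""
    return video_embeddings[::stride]


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))


class CheckpointReward:
    """Hypothetical helper: pays a bonus when the agent's embedding lines up
    with the next unvisited checkpoint taken from the demo video."""

    def __init__(self, checkpoints, threshold=0.5, bonus=0.5):
        self.checkpoints = checkpoints
        self.threshold = threshold  # assumed value, illustrative only
        self.bonus = bonus          # assumed value, illustrative only
        self.next_idx = 0           # index of the next checkpoint to reach

    def __call__(self, agent_embedding):
        if self.next_idx >= len(self.checkpoints):
            return 0.0  # every checkpoint has already been collected
        target = self.checkpoints[self.next_idx]
        if cosine(agent_embedding, target) > self.threshold:
            self.next_idx += 1  # checkpoints must be reached in order
            return self.bonus
        return 0.0
```

Checkpoints are consumed in order here, so the agent only gets paid for making progress along the demonstration, which is what turns a sparse game reward into a denser one.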

------
eric_h
Here's a video of the agent actually playing (linked in the paper):
[https://www.youtube.com/watch?v=Msy82sIfprI](https://www.youtube.com/watch?v=Msy82sIfprI)

------
jexah
This is really cool. A step in the right direction towards general learning
through observation.

------
erikb
This is actually quite human. I also watch Let's plays if I struggle with a
quest (or game in general).

Also interesting assumption to say "harder = fewer rewards". Probably doesn't
always apply but is a good generalization.

------
jonbaer
Are audio cues also analyzed here? E.g.: "We observe that use of the audio
signal in CMC results in more emphasis being placed on key items and their
location in the inventory"

~~~
maffydub
Yes. Section 3.2 suggests they use audio cues to help them align the video frames
from different videos. (I guess it's easier to correlate audio than video.)

~~~
Cthulhu_
I can imagine video quality on YouTube varies more than audio, or that audio
is easier to hash or make signatures of.

~~~
jonbaer
Hmm, I was more under the impression it was for context. In other words,
creating and executing strategies based on what audio cues were received (like
when a key or coin is acquired) and keeping tabs on what actions the user
performed after that point.

------
navaati
This should probably say "ML" or "AI" or whatever, I was slightly disappointed
to realize it was not a funny paper about… I don't know to be fair.

~~~
zodPod
I can see where you're coming from, the title definitely made me initially
feel like it was going to be about getting satisfaction from watching other
people play a game or something like that.

