
Beating Atari with Natural Language Guided Reinforcement Learning [pdf] - bootload
https://arxiv.org/abs/1704.05539
======
visarga
In other words, the RL system needs to acquire higher-order concepts (such as
"climb the ladder" or "run from the bad guy"), and once equipped with those it
fares much better. The problem with learning abstract concepts is that they
are combinations of combinations of low-level features, and it is exponentially
hard to search for them in raw data. Humans supplying such concepts as
"command phrase + reward if the agent correctly executes the command" cut
through the search space and help the agent learn to operate over the new
concepts.

Imagine how you would classify even a simple concept such as "riding" - a
human could be riding a horse, or a monkey could be riding an elephant, or
there could be tons of other cases. Riding would be much simpler to detect if
there were a classifier that selects "objects used for rides" and "agents that
can ride", plus the spatial relation between them - that would be a higher-order
concept that we can't simply learn from images; we have to have preliminary
abstractions that help in classifying it.
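A minimal sketch of the compositional "riding" idea: instead of learning the concept end-to-end from pixels, compose three simpler predicates. All names and the toy bounding-box data here are illustrative stand-ins, not any real detector API.

```python
def is_rider(obj):
    # stand-in for a learned "agents that can ride" classifier
    return obj["kind"] in {"human", "monkey"}

def is_mount(obj):
    # stand-in for a learned "objects used for rides" classifier
    return obj["kind"] in {"horse", "elephant", "bicycle"}

def is_above(a, b):
    # crude spatial relation: a sits on top of b (smaller y = higher up)
    return a["y"] < b["y"] and abs(a["x"] - b["x"]) < b["width"]

def is_riding(a, b):
    # higher-order concept = composition of lower-level concepts
    return is_rider(a) and is_mount(b) and is_above(a, b)

monkey = {"kind": "monkey", "x": 5, "y": 2}
elephant = {"kind": "elephant", "x": 5, "y": 6, "width": 4}
print(is_riding(monkey, elephant))  # True
```

The point of the composition is that each sub-classifier can be learned (or reused) separately, so "riding" never has to be searched for in raw pixel space.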

I think this is the future of AI - higher-order concepts, built from
compositions of previously known concepts. Both language and the physical
world are made of objects and relations between objects, so it would be
necessary to learn to combine concepts into new concepts, even when training
data is very small. It would solve the problem of sharing knowledge between
tasks. Another benefit would be the ease of inspecting the internal state of
the system - which would be a graph of language-based concepts, unlike the
internal states of neural nets, which are inscrutable. An agent with higher-
order abstractions and an object-relations graph would also be programmable in
plain language and capable of reasoning over facts - that would make AI
accessible to the public at large.

Another way of putting it is that up until now we have used plain vectors, as
if they were untyped data, but now we need to operate over strongly typed
vectors with higher-order operators. We need to bring type theory into neural
nets, to apply type constraints and to convert from one type to another by
applying operations. Such operations are hard to learn directly from labeled
sets of images.
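One hypothetical way to picture "strongly typed vectors": each embedding carries a type tag, and operators enforce a type constraint and perform a type conversion. The `TypedVec` class, the operator table, and the matrix-multiply conversion are all assumptions for illustration, not an existing library.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class TypedVec:
    type: str          # the "type" of this embedding, e.g. "object"
    data: np.ndarray   # the underlying plain vector

def apply_op(op_name, vec, ops):
    # each operator declares an input type, an output type, and a
    # (here: linear) conversion between the two vector spaces
    in_type, out_type, matrix = ops[op_name]
    if vec.type != in_type:                       # type constraint
        raise TypeError(f"{op_name} expects {in_type}, got {vec.type}")
    return TypedVec(out_type, matrix @ vec.data)  # type conversion

rng = np.random.default_rng(0)
ops = {"object_to_relation": ("object", "relation", rng.normal(size=(4, 4)))}
v = TypedVec("object", rng.normal(size=4))
r = apply_op("object_to_relation", v, ops)
print(r.type)  # relation
```

In a learned system the conversion matrices would be trained parameters; the type tags are what make ill-typed compositions detectable instead of silently producing garbage vectors.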

~~~
nojvek
You make a great point. Rather than neural nets operating on vectors, they
need to operate on typed graph structures and compute recursive functions with
memory stacks.

Kind of like how computers work, but learning the code and reasoning about
data structures automatically.

~~~
Houshalter
You can't differentiate through discrete data structures, though. The main
reason NNs work at all is that vectors are continuous: small changes in the
input lead to small changes in the output. Then you can work backwards and
find out exactly which parameters to tweak.
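The point about continuity can be shown numerically: a finite-difference "gradient" of a smooth function tells you which way to nudge a parameter, while a discrete step function (the kind of operation a graph or stack decision involves) gives zero almost everywhere, so there is no learning signal. The functions below are toy examples chosen only to make that contrast visible.

```python
def smooth(w):
    return (3.0 * w - 1.0) ** 2          # differentiable in w

def discrete(w):
    return 1.0 if w > 0.5 else 0.0       # step function, like an argmax

eps = 1e-6
w = 0.2
g_smooth = (smooth(w + eps) - smooth(w - eps)) / (2 * eps)
g_discrete = (discrete(w + eps) - discrete(w - eps)) / (2 * eps)
print(g_smooth)    # ≈ -2.4: a usable signal for gradient descent
print(g_discrete)  # 0.0: no signal at all
```

This is why making discrete structures trainable usually means replacing hard choices with continuous relaxations (soft attention, soft stack operations, and so on).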

------
e19293001
The book "Grokking Deep Learning"[0], which is currently under construction,
promises that you'll learn how to build an A.I. that can play Atari (and even
better than you can). Though progress has been slow, I'm keeping my faith in
it.

[0] [https://www.manning.com/books/grokking-deep-learning](https://www.manning.com/books/grokking-deep-learning)

~~~
Kurtz79
I imagine the focus on Atari is because of the much simpler mechanics and
input data compared to more modern games, correct?

How hard would it be to make the transition to more complex games, like Zelda
or Mario for the NES?

~~~
e19293001
Not sure how hard it is, since I haven't tried to implement it myself. But
I've seen on YouTube that somebody has already implemented an AI to play Mario
and Flappy Bird. You might want to search the web for that, since it's quite
popular and amazing.

------
alexbeloi
Interesting idea and good results, but I don't think there's any natural
language work being done here. The descriptive sentences are only there as
human-readable labels for features extracted from templates, and the actual
performance increase comes from these custom features and from giving the
agent intermediate rewards.

But I still like the idea: on x86, 'INT' is just a human-readable label for
the opcode 0xCD, and we instruct our processor to do a series of tasks. The
difference is that on x86 we specify the instructions and we get the reward
(yay). Here you specify the meta-instructions and the agent is rewarded for
completing them. The Atari game's end-goal reward is sort of irrelevant to the
agent; its real goal is completing each meta-instruction we have given it.
Only the instruction giver knows the end goal and the sequence of instructions
to give.
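The reward scheme being described can be sketched in a few lines: the agent's reward comes from completing each meta-instruction, never from the game score directly. The instruction strings and the completion check below are hypothetical stand-ins for the paper's template-based matching against game state.

```python
instructions = ["climb down the ladder", "jump over the skull", "get the key"]

def completed(instruction, state):
    # stand-in for the template-based check against the game state
    return instruction in state["done"]

def shaped_reward(state, current):
    # +1 each time the agent finishes the instruction it was given;
    # the game's own end-goal reward never reaches the agent directly
    return 1.0 if completed(instructions[current], state) else 0.0

state = {"done": {"climb down the ladder"}}
print(shaped_reward(state, 0))  # 1.0: instruction 0 is complete
print(shaped_reward(state, 1))  # 0.0: instruction 1 is not
```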

Perhaps you could teach a second agent to be the instruction giver and the
first agent to be the instruction performer (rewarded only for completing
subtasks), so that together they solve the problem. The job of the first agent
still seems intractable for a game like Montezuma's Revenge.
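The two-agent split might look something like the loop below: a "giver" picks the next instruction toward the end goal it alone knows, and a "performer" earns reward only for completing instructions. Both policies here are trivial placeholders, assumed purely to show the division of labor.

```python
def giver_policy(plan, step):
    # the giver alone knows the end-goal plan
    return plan[step] if step < len(plan) else None

def performer_step(instruction, state):
    # placeholder: the performer "executes" the instruction and is paid
    # a subtask reward; it never sees the game's end-goal reward
    state["done"].add(instruction)
    return 1.0

plan = ["climb down the ladder", "get the key", "open the door"]
state, total = {"done": set()}, 0.0
for step in range(len(plan)):
    total += performer_step(giver_policy(plan, step), state)
print(total)                        # 3.0: one reward per subtask
print(state["done"] == set(plan))   # True: the whole plan was executed
```

In a real system the giver would have to learn which instruction sequences reach the end goal, which is where the hard exploration problem reappears.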

The real natural language work will come when you can actually use human
descriptive sentences to create the subtasks, without hand-made templates.

------
sharemywin
Seems like if you replaced MONTEZUMA'S REVENGE with Quicken or Excel, you
could replace half the office workers in the world.

~~~
Houshalter
I don't know if you're joking, but this is why everyone is so concerned about
automation causing unemployment. There isn't a huge leap between controlling a
video game character and controlling a robot. Quite possibly the majority of
current jobs could be automated in the near future.

------
zxcb1
[https://en.m.wikipedia.org/wiki/Language-game_(philosophy)](https://en.m.wikipedia.org/wiki/Language-game_\(philosophy\))

