Beating Atari with Natural Language Guided Reinforcement Learning [pdf] (arxiv.org)
82 points by bootload 186 days ago | hide | past | web | 12 comments | favorite

In other words, the RL system needs to acquire higher-order concepts (such as "climb the ladder" or "run from the bad guy"), and equipped with those it fares much better. The problem with learning abstract concepts is that they are combinations of combinations of low-level features, so searching for them in raw data is exponentially hard. Humans supplying such concepts as "command phrase + reward if the agent correctly executes the command" cut through the search space and help the agent learn to operate over the new concepts it has acquired.
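
The "command phrase + reward" idea above is essentially reward shaping; a minimal toy sketch (my own illustration, not the paper's actual implementation — the subtask names, `completed` predicates, and bonus value are all assumptions):

```python
# Hypothetical sketch: shaping an RL reward with language-labelled subtasks.
# Each subtask pairs a command phrase with a completion predicate over state.

def shaped_reward(env_reward, state, subtasks, bonus=1.0):
    """Add a bonus each time the agent satisfies a commanded subtask."""
    total = env_reward
    for command, completed in subtasks:
        if completed(state):
            total += bonus  # human-provided signal that cuts the search space
    return total

# Toy usage with invented state keys:
subtasks = [
    ("climb the ladder", lambda s: s.get("on_ladder_top", False)),
    ("run from the bad guy", lambda s: s.get("distance_to_enemy", 0) > 5),
]

state = {"on_ladder_top": True, "distance_to_enemy": 2}
print(shaped_reward(0.0, state, subtasks))  # 1.0: only the first subtask is done
```

The agent never has to discover "climb the ladder" from pixels; the bonus marks the concept for it.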

Imagine how you would classify even a simple concept, such as "riding" - a human could be riding a horse, a monkey could be riding an elephant, or there could be tons of other cases. Riding would be much simpler to detect if there were a classifier that selects "objects used for rides", one that selects "agents that can ride", and a detector for the spatial relation between them - that would be a higher-order concept that we can't simply learn from images; we need preliminary abstractions that help in classifying it.
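
As a toy illustration of that composition (entirely my own invented example, with set-membership stand-ins for real learned classifiers):

```python
# "Riding" as a higher-order concept composed from simpler abstractions,
# instead of being learned directly from raw images.

def is_rideable(obj):            # stand-in for a learned "objects used for rides" classifier
    return obj in {"horse", "elephant", "bicycle"}

def can_ride(agent):             # stand-in for an "agents that can ride" classifier
    return agent in {"human", "monkey"}

def is_on_top_of(scene, agent, obj):   # stand-in for a spatial-relation detector
    return scene.get((agent, obj)) == "on_top"

def riding(scene, agent, obj):
    """Compose the three abstractions into one higher-order concept."""
    return can_ride(agent) and is_rideable(obj) and is_on_top_of(scene, agent, obj)

scene = {("monkey", "elephant"): "on_top"}
print(riding(scene, "monkey", "elephant"))  # True
print(riding(scene, "human", "horse"))      # False: the spatial relation is absent
```

Each sub-classifier is cheap to learn on its own; the combinatorial concept falls out of the composition.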

I think this is the future of AI - higher-order concepts built from compositions of previously known concepts. Both language and the physical world are made of objects and relations between objects, so it would be necessary to learn to combine concepts into new concepts, even when training data is very small. It would solve the problem of sharing knowledge between tasks. Another benefit would be the ease of inspecting the internal state of the system - a graph of language-based concepts, unlike the inscrutable internal states of neural nets. An agent with higher-order abstractions and an object-relations graph would also be programmable in plain language and capable of reasoning over facts - that would make AI accessible to the public at large.

Another way of putting it is that until now we have used plain vectors, as if they were untyped data, but now we need to operate over strongly typed vectors with higher-order operators. We need to bring type theory into neural nets, to apply type constraints and to convert from one type to another by applying operations. Such operations are hard to learn directly from labeled sets of images.
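
A minimal sketch of what "typed vectors with conversion operators" could mean mechanically (the type tags, operator names, and the identity matrix are all invented for illustration; in practice the matrices would be learned):

```python
import numpy as np

class TypedVector:
    """An embedding carrying a type tag alongside its data."""
    def __init__(self, vtype, data):
        self.vtype = vtype
        self.data = np.asarray(data, dtype=float)

def make_operator(in_type, out_type, matrix):
    """An operator is a linear map that also converts the type tag."""
    def apply(v):
        if v.vtype != in_type:
            raise TypeError(f"expected {in_type}, got {v.vtype}")
        return TypedVector(out_type, matrix @ v.data)
    return apply

# e.g. a (hypothetical) operator converting an "object" embedding
# into an "affordance" embedding:
to_affordance = make_operator("object", "affordance", np.eye(3))

v = TypedVector("object", [1.0, 2.0, 3.0])
w = to_affordance(v)
print(w.vtype)  # "affordance"
```

The type check is what a plain vector pipeline lacks: composing operators with mismatched signatures fails loudly instead of silently producing garbage.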

You'd probably want to start with propositional logic. But I think the vectors would still be numbers; they would just encode some higher-order task, e.g. encode another neural network, or store its output in memory that could be accessed on demand. So you would have a NN that implements "climb ladder". Basically a collection of these hybrid memory neural networks.

You make a great point. Rather than operating on vectors, neural nets need to operate on typed graph structures and compute recursive functions with memory stacks.

Kind of like how computers work, but learning the code and reasoning about data structures automatically.

You can't differentiate through discrete data structures, though. The main reason NNs work at all is that vectors are continuous: small changes in the input lead to small changes in the output. Then you can work backwards and find out exactly which parameters to tweak.
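
A tiny numerical illustration of that point (my own toy example): a continuous function yields a usable gradient, while a hard discrete decision is flat almost everywhere, so gradient descent gets no signal.

```python
def continuous(x):
    return x * x

def discrete(x):
    return 1.0 if x > 0 else 0.0   # a hard, non-differentiable decision

def num_grad(f, x, eps=1e-6):
    """Central-difference numerical gradient."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

print(num_grad(continuous, 3.0))  # ~6.0: tells us which way to tweak x
print(num_grad(discrete, 3.0))    # 0.0: no information about the parameters
```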

The book "Grokking Deep Learning"[0], which is currently under construction, promises that you'll learn how to build an A.I. that can play Atari (and better than you can). Progress has been slow, but I'm keeping my faith in it.

I imagine the focus on Atari is because of the much simpler mechanics and input data compared to more modern games, correct?

How hard would it be to make the transition to more complex games, like Zelda or Mario for the NES?

Not sure how hard it is, since I haven't tried to implement it myself. But I've seen on YouTube that somebody has already implemented an AI to play Mario and Flappy Bird. You might want to search the web for that, since it's quite popular and amazing.

Interesting idea and good results, but I don't think there's any natural language work being done here. The descriptive sentences are only there as human-readable labels for features extracted from templates, and the actual performance increase comes from these custom features and from giving the agent intermediate rewards.

But I still like the idea. On x86, 'INC EAX' is just a human-readable label for the opcode 0x40, and we instruct our processor to do a series of tasks. The difference is that on x86 we specify the instructions and we get the reward (yay). Here, you specify the meta-instructions and the agent is rewarded for completing those. The Atari game's end-goal reward is sort of irrelevant to the agent; its real goal is completing each meta-instruction we have given it. Only the instruction giver knows the end goal and the sequence of instructions to give.

Perhaps you could teach a second agent to be the instruction giver and keep the first agent as the instruction performer (rewarded for completing subtasks only), and together they solve the problem. The job of the first agent still seems intractable for a game like Montezuma's Revenge.
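
The two-agent split could look roughly like this (a hypothetical sketch: the plan, the per-subtask reward, and the always-succeeding performer are all invented stand-ins, not a real hierarchical RL implementation):

```python
# Instruction giver: knows the end goal, emits one subtask at a time.
# Performer: rewarded per completed subtask, never sees the game score.

PLAN = ["climb the ladder", "jump the gap", "grab the key"]  # invented plan

def instruction_giver(progress):
    """Hand out the next subtask until the plan is exhausted."""
    return PLAN[progress] if progress < len(PLAN) else None

def performer(command):
    """Stand-in for a trained policy; here it always 'succeeds'."""
    return True  # in reality: act in the environment until the subtask predicate fires

progress, performer_reward = 0, 0.0
while (cmd := instruction_giver(progress)) is not None:
    if performer(cmd):
        performer_reward += 1.0  # reward per subtask, not per game point
        progress += 1

print(performer_reward)  # 3.0
```

The hard open question the comment raises is training the giver itself, since its reward (the game's end goal) is exactly the sparse signal the scheme was meant to avoid.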

The real natural language work will come when you can actually use human descriptive sentences to create the subtasks without hand-made templates.

Seems like if you replaced Montezuma's Revenge with Quicken or Excel, you could replace half the office workers in the world.

I don't know if you're joking but this is why everyone is so concerned about automation causing unemployment. There isn't a huge leap between controlling a video game character and controlling a robot. Quite possibly the majority of current jobs could be automated in the near future.

this meta-trolling comment made me giggle hard
