Language Understanding for Text-based Games Using Deep Reinforcement Learning [pdf] (arxiv.org)
31 points by enkiv2 on July 1, 2015 | 6 comments



I'm guessing, from the fact that they claim to have beaten a benchmark algorithm, that their learners didn't actually play all that well.


I don't have good intuitions on how hard text-based adventures are, but they test it on the first quest in 'Fantasy World', which is a real game with significant complexity (Table 1) compared to their demo game 'Home'.

Ignoring 'Home' as possibly tweaked to unfairly favor one RL agent, Figure 3 is interesting in trying to compare performance on 'Fantasy'. Both RL agents massively outperform the 'random' agent, which is definitely something you want to see; on Fantasy, the bag-of-words NN representation seems like it might be doing much worse (it looks like it averages an 80% chance of solving the quest by the end of training, while the LSTM NN layer lets the RL agent reach ~100% solve rates almost immediately). Since one would expect an LSTM to do better, this better performance and the apparent ceiling on bag-of-words aren't too surprising.
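
For context, here is a rough sketch of the two state representations being compared; this is my own illustration, not the authors' code, and the layer sizes, tokenization, and PyTorch framing are all made up:

    import torch
    import torch.nn as nn

    class BagOfWordsEncoder(nn.Module):
        """Averages word embeddings, so word order in the room description is lost."""
        def __init__(self, vocab_size, embed_dim=64):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)

        def forward(self, token_ids):                       # token_ids: LongTensor of shape (seq_len,)
            return self.embed(token_ids).mean(dim=0)

    class LSTMEncoder(nn.Module):
        """Reads the description token by token, so word order is preserved."""
        def __init__(self, vocab_size, embed_dim=64, hidden_dim=64):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim)

        def forward(self, token_ids):
            embedded = self.embed(token_ids).unsqueeze(1)   # (seq_len, 1, embed_dim)
            _, (h_n, _) = self.lstm(embedded)
            return h_n.squeeze()                            # final hidden state as the state vector

    # Either encoder's output would then feed a Q-value head over the fixed action set.

The intuition for the gap is that averaging embeddings throws away word order, which the LSTM keeps.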

But the selection of gameplay is really limited, to say the least - this is not the DeepMind Atari player, where excellent performance was demonstrated across dozens of games.

I wonder how well it would play Nethack or other Roguelikes? That seems like it could be an excellent guinea pig for RL agents.


Well, they're not really playing text-based adventures. The fun and challenge of most adventures is that you don't necessarily know all the possible actions, and you have to guess what possible state transitions might exist. Many adventures make this more devilish by purposefully being coy about stuff (e.g., one game had a puzzle solved with "use stick on door"--the stick is a stick of dynamite).

What the paper studied was not that sort of text-based adventure, from what I can tell. Their AIs had a list of all possible actions, which is the sort of thing that could easily be solved by a simple reinforcement learner given a few thousand runs. Instead of making it more difficult by removing the list of actions, they made it more difficult by varying the success condition. In effect, what they've shown is that their algorithm does a better job of figuring out what the success condition is given a textual description, not that it does a better job of figuring out plausible actions.
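
To make that concrete, here is a toy sketch (mine, not from the paper; the env interface, action list, and hyperparameters are all hypothetical) of the kind of simple learner that suffices once the actions are enumerated and the state is observable:

    import random
    from collections import defaultdict

    ACTIONS = ["go north", "go south", "take key", "open chest"]   # fixed, known action list

    def q_learning(env, episodes=5000, alpha=0.1, gamma=0.9, epsilon=0.1):
        """Tabular Q-learning; env is a hypothetical game exposing reset() and step(action)."""
        Q = defaultdict(float)                              # Q[(state, action)] -> value estimate
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                if random.random() < epsilon:               # explore
                    action = random.choice(ACTIONS)
                else:                                       # exploit the best known action
                    action = max(ACTIONS, key=lambda a: Q[(state, a)])
                next_state, reward, done = env.step(action)
                best_next = max(Q[(next_state, a)] for a in ACTIONS)
                Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
                state = next_state
        return Q

With a few thousand episodes of that, a small game with a known action list is basically a lookup-table problem; the hard part the paper tackles is mapping free text onto a useful state, not choosing among unknown actions.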


>Well, they're not really playing text-based adventures. The fun and challenge of most adventures is that you don't necessarily know all the possible actions, and you have to guess what possible state transitions might exist.

Making the environment only partially observable is fairly normal for RL papers. Of course, having a binary success condition encoded as "Did you reach state Accept of this extremely large, partially observable, deterministic (potentially pushdown rather than finite) state machine?" does make it hard to use RL, since, last I heard, RL works best when the reward signal is continuously valued and actually varies as incremental progress towards the goal is made.
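
A minimal sketch (my own illustration, with a made-up quest_progress helper and dict-shaped states) of the difference between that binary success signal and a reward that varies with incremental progress:

    def quest_progress(state):
        """Toy progress measure: count of completed sub-goals (e.g. items collected)."""
        return len(state.get("completed_subgoals", []))

    def sparse_reward(state):
        # Binary terminal reward: every state before Accept looks equally worthless.
        return 1.0 if state.get("accept") else 0.0

    def shaped_reward(prev_state, state):
        # Shaped reward: small intermediate payoffs whenever measurable progress is made.
        if state.get("accept"):
            return 1.0
        return 0.1 * (quest_progress(state) - quest_progress(prev_state))

The shaped version is what makes credit assignment tractable on a huge state machine; with only the sparse version, the agent has to stumble onto Accept by chance before it can learn anything.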


People do that sort of thing on purpose? That was the aspect of text adventures that ultimately drove me away from the whole format!


Would love to see the source code, if they have published it anywhere.



