

Language Understanding for Text-based Games Using Deep Reinforcement Learning [pdf] - enkiv2
http://arxiv.org/pdf/1506.08941v1.pdf

======
eli_gottlieb
I'm guessing, based on the fact that they claim to have beaten a benchmark
algorithm, that their learners didn't actually play all that well.

~~~
gwern
I don't have good intuitions on how hard text-based adventures are but they
test it on the first quest in 'Fantasy World', which is a real game with
significant complexity (Table 1) compared to their demo game 'Home'.

Ignoring 'Home' as possibly tweaked to unfairly favor one RL agent, Figure 3
is interesting in trying to compare performance on 'Fantasy'. Both RL agents
massively outperform the 'random' agent, which is definitely something you
want to see; on Fantasy, the bag-of-words NN representation seems to be doing
much worse (it looks like it averages an ~80% chance of solving the quest by
the end of training, while the LSTM NN layer lets the RL agent reach ~100%
solve rates almost immediately). Since one would expect an LSTM to do better,
its better performance and the apparent ceiling on bag-of-words aren't too
surprising.
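
For the curious, the difference between the two representations is roughly
this (a PyTorch-style sketch reconstructed from the paper's description, not
their code; the layer sizes are made up):

    import torch.nn as nn

    VOCAB, EMBED, HIDDEN, N_ACTIONS = 1000, 20, 50, 40  # made-up sizes

    class BOWQNet(nn.Module):
        """Bag-of-words: average the word embeddings, ignoring order."""
        def __init__(self):
            super().__init__()
            self.embed = nn.Embedding(VOCAB, EMBED)
            self.q = nn.Linear(EMBED, N_ACTIONS)

        def forward(self, tokens):              # tokens: (batch, seq_len)
            state = self.embed(tokens).mean(dim=1)
            return self.q(state)                # one Q-value per action

    class LSTMQNet(nn.Module):
        """LSTM: read the room description in order, then mean-pool."""
        def __init__(self):
            super().__init__()
            self.embed = nn.Embedding(VOCAB, EMBED)
            self.lstm = nn.LSTM(EMBED, HIDDEN, batch_first=True)
            self.q = nn.Linear(HIDDEN, N_ACTIONS)

        def forward(self, tokens):
            out, _ = self.lstm(self.embed(tokens))
            return self.q(out.mean(dim=1))

The bag-of-words net throws away word order entirely, which is a plausible
explanation for the ceiling: "east of the bridge" and "bridge of the east"
look identical to it.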

But the selection of games is really limited - this is not the DeepMind
Atari player, where the excellent performance is seen over dozens of games,
to say the least.

I wonder how well it would play Nethack or other Roguelikes? That seems like
it could be an excellent guinea pig for RL agents.

~~~
jcranmer
Well, they're not really playing text-based adventures. The fun and challenge
of most adventures is that you don't necessarily know all the possible
actions, and you have to guess what possible state transitions might exist.
Many adventures make this more devilish by purposefully being coy about stuff
(e.g., one game had a puzzle solved with "use stick on door" -- the stick is
a stick of dynamite).

What the paper studied was not that sort of text-based adventure, from what I
can tell. Their AIs had a list of all possible actions, which is the sort of
thing that could easily be solved by a simple reinforcement learner given a
few thousand runs. Instead of making it more difficult by removing the list of
actions, they made it more difficult by varying the success condition. In
effect, what they've shown is that their algorithm does a better job of
figuring out what the success condition is given a textual description, not
that their algorithm does a better job of figuring out plausible actions.
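
To put a number on "simple": with a fixed action list and an observable
state, generic tabular Q-learning along these lines would do it in a few
thousand runs (nothing from the paper; the env interface with
reset()/step() is hypothetical):

    import random
    from collections import defaultdict

    ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

    def q_learn(env, actions, episodes=5000):
        # Q maps (state, action) -> expected return; defaults to 0.
        Q = defaultdict(float)
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                if random.random() < EPSILON:       # explore
                    a = random.choice(actions)
                else:                               # exploit
                    a = max(actions, key=lambda x: Q[(s, x)])
                s2, r, done = env.step(a)           # hypothetical interface
                best_next = max(Q[(s2, x)] for x in actions)
                Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
                s = s2
        return Q

The paper's twist is that the state is a paragraph of text and the quest
varies, so the Q-function has to be learned over language rather than kept
as a lookup table.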

~~~
eli_gottlieb
>Well, they're not really playing text-based adventures. The fun and challenge
of most adventures is that you don't necessarily know all the possible
actions, and you have to guess what possible state transitions might exist.

Making the environment only partially observable is fairly normal for RL
papers. Of course, having a binary success condition encoded as, "Did you
reach state Accept of this extremely large, partially-observable,
deterministic (potentially pushdown instead of finite) state machine?" does
make it hard to use RL, since, last I heard, sparse binary rewards are where
RL struggles most: the learner wants a reward signal that actually varies
when incremental progress towards the goal is made, or it has nothing to
climb.
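
The usual workaround is reward shaping; potential-based shaping in the style
of Ng et al. (1999) densifies a binary terminal reward without changing the
optimal policy. A sketch, where distance_to_goal is a hypothetical progress
estimate:

    GAMMA = 0.95

    def potential(state):
        # Hypothetical heuristic: higher when closer to the Accept state.
        return -distance_to_goal(state)

    def shaped_reward(s, r, s2):
        # Potential-based shaping (Ng et al., 1999): r' = r + g*F(s2) - F(s).
        # Rewards incremental progress; provably policy-invariant.
        return r + GAMMA * potential(s2) - potential(s)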

------
ganarajpr
Would love to see the source code, if they've published it anywhere.

