
Towards deep symbolic reinforcement learning - mpweiher
https://blog.acolyer.org/2016/10/12/towards-deep-symbolic-reinforcement-learning/
======
alexbeloi
Based on a quick skim, what they're describing already exists in the form of a
feature extractor + traditional RL algorithm. State -> features/symbols ->
actions as opposed to state -> actions.

So, sure, if you split your decision making into more pieces, you can
specialize those pieces for your specific task, and it will probably perform
better at that task. But the whole point (and success) of deep learning is
that the feature extractor and the decision model are one network, where the
early layers act as feature extractors (the symbolic component) and the later
layers act as the decision makers or value learners.
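
A minimal sketch of that end-to-end view (hypothetical PyTorch, using the
standard DQN layer sizes for an 84x84 Atari frame stack; not the paper's
code):

```python
import torch.nn as nn

class EndToEndDQN(nn.Module):
    """DQN-style network: the conv stack plays the feature-extractor
    ('symbolic') role, the FC head plays the value learner, and both
    are shaped jointly by the same TD loss."""
    def __init__(self, n_actions):
        super().__init__()
        self.features = nn.Sequential(               # implicit feature extractor
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.Sequential(                   # decision maker / value learner
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),
        )

    def forward(self, x):       # state -> Q-values, with no explicit symbol stage
        return self.head(self.features(x))
```

Nothing forces the intermediate activations to be human-readable symbols; the
split into "features" and "head" is only notational.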

In my mind, this partitioning strategy runs counter to the greater goal of
machine learning, which to me is a generic learner that can 'take raw data in'
and 'get smart stuff out'.

~~~
Eridrus
You're definitely right that we've found joint training to be highly useful,
and that having a neural net simply act as a feature extractor is
unsatisfying, but I think that success at unifying neural nets with symbolic
or even non-learning systems would be a big step forward. This isn't it, but
even the AlphaGo work relied on Monte Carlo tree search, so I wouldn't dismiss
this out of hand.

~~~
hacker42
But MCTS works in AlphaGo because the 'world' can be fully represented by a
tiny state vector, one that fits into primary memory a billion times over
these days. The real difficulty of real-world problems is the ill-defined,
mutable, high-dimensional, and partially observable state space.

~~~
Eridrus
I'm merely pointing out that "all neural everything" is not always the best
approach despite joint training being very appealing.

------
aub3bhat
It seems like a really interesting approach, but it needs to be validated on a
larger set of Atari games, considering that even a linear learner outperforms
DQN on several games such as Double Dunk. It's possible that DQN underperforms
here because the chosen game is more amenable to the symbolic representation
used. [0]

[0] Figure 3,
[http://home.uchicago.edu/~arij/journalclub/papers/2015_Mnih_et_al.pdf](http://home.uchicago.edu/~arij/journalclub/papers/2015_Mnih_et_al.pdf)

------
jesuslop
In addition to the intro, which I liked a lot, I find interesting, or even
important, the use of an unsupervised method to generate the visual symbolic
dictionary: here, a convolutional autoencoder.
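
A minimal sketch of that idea (hypothetical PyTorch, illustrative layer sizes
for 84x84 grayscale frames; the paper's exact architecture may differ):

```python
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """Unsupervised feature learner: train purely on reconstruction,
    then harvest the bottleneck activations as candidate entries for
    the visual symbol dictionary. Sizes are illustrative only."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # 84 -> 42
            nn.Conv2d(16, 8, 3, stride=2, padding=1), nn.ReLU(),   # 42 -> 21
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(8, 16, 2, stride=2), nn.ReLU(),     # 21 -> 42
            nn.ConvTranspose2d(16, 1, 2, stride=2), nn.Sigmoid(),  # 42 -> 84
        )

    def forward(self, x):
        z = self.encoder(x)       # salient activations -> symbol candidates
        return self.decoder(z)    # reconstruction drives the training loss
```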

In arXiv:1412.6856, unsupervised object detectors emerge as byproducts in a
CNN trained for another kind of task, and for more interesting, photorealistic
objects.

------
bmh100
I feel that the use of moving objects weakened the case made by the example,
since it requires priors such as object persistence. The paper also did not
report theoretical maximum scores. Instead, I would have liked to see a
planning scenario:

1. Objects are randomly placed such that all circles are reachable without
hitting exes.

2. Objects do not move.

3. The objective function is the total points collected, discounted by time.

In this case, which is essentially the traveling salesman problem [1], it is
possible to compute the optimal strategy analytically using discrete
optimization methods (e.g. dynamic programming [2]). The optimal strategy
gives us an absolute basis for algorithm evaluation. Next, divide the playing
space into groups (maybe via spectral clustering [3]). Use dynamic programming
to determine the order in which to visit the groups, and automated planning
for the sequence of moves to reach each group. After group planning comes step
planning. Build a state predictor that maps the current state and a set of
steps to a new state, where the steps have been determined through automated
planning to reach an object. Again, use something like dynamic programming to
determine the optimal series of plans to visit all the objects within the
group.
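
For the handful of objects in these grid worlds, that "optimal series" step
is just Held-Karp dynamic programming over visit subsets. A minimal sketch
(hypothetical helper, using total travel cost as a proxy for the
time-discounted score; O(n^2 * 2^n), so small n only):

```python
from itertools import combinations

def held_karp(dist):
    """Open-path TSP by Held-Karp DP. dist[i][j] is the travel cost
    between objects i and j; object 0 is the agent's start position.
    Returns the minimum cost of visiting every other object once."""
    n = len(dist)
    # best[(S, j)]: cheapest path starting at 0, visiting set S, ending at j
    best = {(frozenset([j]), j): dist[0][j] for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            S = frozenset(subset)
            for j in subset:
                best[(S, j)] = min(
                    best[(S - {j}, k)] + dist[k][j]
                    for k in subset if k != j
                )
    full = frozenset(range(1, n))
    return min(best[(full, j)] for j in range(1, n))
```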

This would be a hybrid system using NN-derived features to fuel discrete
optimization and automated planning algorithms. Such an approach should learn
faster than DQNs and generate good solutions faster than dynamic programming
alone.

In the more general case of an Atari game, I can see clustering frames into
game states, mapping moves to those states, and using automated planning to
determine the series of states, and therefore moves, to optimize score.
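
For the clustering step, a minimal sketch with scikit-learn's
SpectralClustering [3], assuming frames have already been reduced to feature
vectors (the input file name is hypothetical):

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# frame_features: (n_frames, d) array of NN-derived features, one row per frame
frame_features = np.load("frame_features.npy")  # hypothetical input file

# Group frames into a small set of discrete "game states" for the planner.
clusterer = SpectralClustering(n_clusters=8, affinity="nearest_neighbors")
state_ids = clusterer.fit_predict(frame_features)  # one state id per frame
```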

[1]:
[https://en.wikipedia.org/wiki/Traveling_salesman_problem](https://en.wikipedia.org/wiki/Traveling_salesman_problem)

[2]:
[https://en.wikipedia.org/wiki/Dynamic_programming](https://en.wikipedia.org/wiki/Dynamic_programming)

[3]:
[http://scikit-learn.org/stable/modules/clustering.html#spectral-clustering](http://scikit-learn.org/stable/modules/clustering.html#spectral-clustering)

------
Seanny123
Disclaimer: I work in the lab that created Spaun.

This seems kind of similar to Spaun [0]: you use deep learning to get the
features you want, and then you do symbolic-like operations on those feature
vectors to accomplish reinforcement learning. At least Spaun used neurons for
the rule manipulation too. I also don't believe the transfer learning claim at
all, because the tasks are way too similar. Still, it's a step in the right
direction as far as I'm concerned [1]. That being said, good luck finding a
set of general prior rules and scaling them.

[0]
[http://science.sciencemag.org/content/338/6111/1202](http://science.sciencemag.org/content/338/6111/1202)

[1]
[https://medium.com/@seanaubin/why-does-ai-still-suck-and-can-it-suck-less-9db36be294dc#.akb82s8o9](https://medium.com/@seanaubin/why-does-ai-still-suck-and-can-it-suck-less-9db36be294dc#.akb82s8o9)

------
cmrx64
See also [http://www.neural-symbolic.org/](http://www.neural-symbolic.org/)

------
botw
Isn't this approach similar to, or used in, DeepMind's AlphaGo, where the
policy network corresponds to high-level representational knowledge (a kind of
expert system in traditional AI) and the value network corresponds to the
decision-making part (reinforcement learning)?

------
ninjamayo
Automated planning systems and languages have some interesting ideas you could
look into. I worked on an idea similar to yours for resource allocation a
while ago, even before deep learning, and automated planning had a lot to
offer.

------
nothing123
The symbolic component can be seen as programming, so this is a mixture of
machine learning and classical programming.

~~~
marcosdumay
The name "symbolic learning" also applies things like the Google's Page Rank.

