
Model-Based Reinforcement Learning with Neural Network Dynamics - jonbaer
http://bair.berkeley.edu/blog/2017/11/30/model-based-rl/
======
sytelus
If I understand correctly, there are the novel features of this paper:

\- Use a neural network to learn the dynamics model (state, action) -> next
state, separately from the policy.

\- Use this neural network to predict states over a ~100-step horizon and use
those predictions to choose actions in an RL setting.

This method reduces the sample complexity of RL, though it still doesn't beat
model-free methods given lots of data. I think the interesting part was the
use of micro-insect robots, which unfortunately are not available commercially,
so the results may not be reproducible by others.
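
As I read it, the core loop is: fit a neural dynamics model on logged
transitions, then plan through it with short-horizon random-shooting MPC. A
minimal sketch of that idea in Python (the dynamics_model and reward stubs
below are placeholders I made up, not the paper's trained networks):

    import numpy as np

    # Stand-ins for the learned pieces: a dynamics model predicting the
    # next state from (state, action), and a task reward. Placeholders only.
    def dynamics_model(state, action):
        return state + 0.1 * action

    def reward(state, action):
        return -np.sum(state ** 2)

    def mpc_action(state, horizon=10, n_candidates=1000, action_dim=2):
        """Random-shooting MPC: sample candidate action sequences, roll
        each out through the learned model, and execute the first action
        of the best-scoring sequence."""
        seqs = np.random.uniform(-1.0, 1.0, (n_candidates, horizon, action_dim))
        returns = np.zeros(n_candidates)
        for i in range(n_candidates):
            s = state.copy()
            for a in seqs[i]:
                returns[i] += reward(s, a)
                s = dynamics_model(s, a)  # step forward with the model
        return seqs[np.argmax(returns), 0]

    # e.g. a = mpc_action(np.zeros(2))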

~~~
dekhn
If I can read the picture properly, the microinsect is just a small plastic
part (injection molded? it doesn't look printed), some servos, and a
microcontroller. Any lab that's serious in this area could recreate it.
https://robotics.eecs.berkeley.edu/~ronf/PAPERS/dhaldane-icra15.pdf

Ah, OK. It's a bit more than that. They had to upgrade the bot significantly
with next-gen materials to achieve high speeds:
https://spectrum.ieee.org/automaton/robotics/robotics-hardware/icra-2015-x2velociroach-smashes-speed-record-for-tiny-legged-robots

------
tw1010
Not to sound negative – this is a cool paper – but isn't this in the grey zone
of being actual research? It feels like a fairly trivial application of
techniques that have long been known to work in simulation.

~~~
mike_n
Yes, model-based RL (i.e., learning the physics, predicting future states, and
planning accordingly, rather than just choosing actions that you've learned
will give you rewards) has been in the literature for many years; see Sutton's
Dyna, for example.

But I think the model-based part is still considered relatively hard in real-
world environments due to the large search space of possible
actions/consequences.

And note that here they didn't actually use the model at runtime; they just
used it as an 'expert' (rather than getting a human to provide guidance) to
give a model-free policy a head start (see the DAgger algorithm) and reduce
the number of training samples.
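
The pipeline they describe is roughly imitation learning with the MPC
controller as the teacher. A minimal sketch of that DAgger-style loop,
assuming a gym-style env and hypothetical policy/mpc_expert objects (all
names here are illustrative, not the paper's code):

    import numpy as np

    def dagger_warm_start(env, mpc_expert, policy, n_iters=5, steps=200):
        """DAgger-style warm start: roll out the current policy, label
        every visited state with the MPC expert's action, and retrain
        the policy on the aggregated dataset."""
        states, expert_actions = [], []
        for _ in range(n_iters):
            s = env.reset()
            for _ in range(steps):
                a = policy.act(s)             # the policy picks the action taken
                states.append(s)
                expert_actions.append(mpc_expert(s))  # expert labels the state
                s, _, done, _ = env.step(a)
                if done:
                    break
            # Supervised regression of the policy onto the expert's labels
            policy.fit(np.array(states), np.array(expert_actions))
        return policy

The key DAgger detail is that the policy (not the expert) chooses the actions
actually executed, so the expert's labels cover the states the policy itself
visits.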

~~~
AstralStorm
If they figure out how to turn these models into, say, an NFA representation,
we might have something very interesting...

Best of both worlds?

