
Observational Learning by Reinforcement Learning - guiambros
https://arxiv.org/abs/1706.06617
======
deepnet
Copying behaviour without divining intent can lead to problems such as 'cargo-
culting'.

Imagine observing a man shaking his legs, first one, then the other; then his
whole body convulses and twitches. Is he dancing?

You might think so, absent the knowledge that a wasp has flown up his trouser
leg.

Copying without comprehension may lead to getting stung!

Inverse Reinforcement Learning [1], which reverse-engineers goals from
observed behaviour, will be needed, especially for embodied AI in partially
observed environments, i.e. the real world (as opposed to simulations).
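To make the idea concrete, here is a minimal sketch of feature-matching IRL
(in the Abbeel & Ng style) on a toy 5-state chain MDP. The chain task, the
"expert" policy, and all parameter values are illustrative assumptions of
mine, not taken from the linked slides:

```python
# Feature-matching IRL sketch: recover a linear reward r(s) = w[s]
# whose greedy policy reproduces the expert's feature expectations.
# The 5-state chain MDP and all constants are illustrative assumptions.
import numpy as np

N_STATES, GAMMA, HORIZON = 5, 0.9, 30
MOVES = (-1, +1)  # action 0 = left, action 1 = right, clamped at the ends

def step(s, a):
    return min(max(s + MOVES[a], 0), N_STATES - 1)

def greedy_policy(w, sweeps=50):
    """Value iteration under reward r(s) = w[s]; return the greedy policy."""
    v = np.zeros(N_STATES)
    for _ in range(sweeps):
        v = np.array([w[s] + GAMMA * max(v[step(s, a)] for a in (0, 1))
                      for s in range(N_STATES)])
    return [max((0, 1), key=lambda a: v[step(s, a)]) for s in range(N_STATES)]

def feature_expectations(policy, s0=0):
    """Discounted state-visitation counts (one-hot features) of a rollout."""
    mu, s = np.zeros(N_STATES), s0
    for t in range(HORIZON):
        mu[s] += GAMMA ** t
        s = step(s, policy[s])
    return mu

# "Expert" demonstration: always move right, toward the last state.
mu_expert = feature_expectations([1] * N_STATES)

# Adjust reward weights until the learner's feature expectations match.
w = np.zeros(N_STATES)
for _ in range(20):
    w += 0.1 * (mu_expert - feature_expectations(greedy_policy(w)))

print(greedy_policy(w))  # the learned reward induces the rightward policy
```

The point of the sketch is the inversion: rather than copying the expert's
actions, the learner infers a reward that *explains* them, and could then
re-plan under that reward if the dynamics changed.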

Berkeley's CS294-112 [2] Deep Reinforcement Learning for Robotics provides
good coverage of these methods: behavioural cloning (mirroring), DAgger, deep
Q-learning, iLQR, and IRL.
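Of those, DAgger is the one that directly addresses the copying problem: the
learner drives, but the expert labels the states the learner actually visits,
so errors don't compound. A toy sketch, where the 1-D chase task, the expert
rule, and the 1-nearest-neighbour "policy" are all my own illustrative
assumptions rather than anything from the course:

```python
# DAgger (dataset aggregation) sketch on a 1-D task: step toward a goal.
# The task, expert, and 1-NN policy are illustrative assumptions.
import random

GOAL = 0.0

def expert(s):
    """Expert action: step toward the goal."""
    return 1 if s < GOAL else -1

def learner(dataset, s):
    """1-nearest-neighbour policy over the aggregated (state, action) data."""
    return min(dataset, key=lambda pair: abs(pair[0] - s))[1]

def rollout(policy, s, steps=10):
    states = []
    for _ in range(steps):
        states.append(s)
        s += 0.5 * policy(s)
    return states

random.seed(0)
# Round 0: plain behavioural cloning on a single expert rollout.
dataset = [(s, expert(s)) for s in rollout(expert, random.uniform(-4, 4))]

# DAgger rounds: roll out the *learner*, but label every visited state
# with the *expert's* action, and aggregate into the dataset.
for _ in range(5):
    visited = rollout(lambda s: learner(dataset, s), random.uniform(-4, 4))
    dataset += [(s, expert(s)) for s in visited]
```

Because labels always come from the expert on the learner's own state
distribution, the aggregated policy ends up correct precisely in the states
the learner tends to reach, which is what naive mimicry gets wrong.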

[1] https://people.eecs.berkeley.edu/~pabbeel/cs287-fa12/slides/inverseRL.pdf

[2] https://www.youtube.com/playlist?list=PLkFD6_40KJIwTmSbCv9OVJB3YaO4sFwkX

~~~
jasonkostempski
The man at risk of being stung is actively trying to prevent it. Blindly
mimicking those movements when not at risk of being stung would have no
effect, positive or negative, on the ability to avoid stings. But if the man
slammed his head into a post while trying to avoid the wasp, copying would
obviously be a problem. As long as the mimic is aware of harmful actions,
learned either by experience (which should be preferred) or from given rules
(which carries its own cargo-culting risk), and stops copying the subject
immediately on seeing one (since the subject may have malicious intent or
need emergency assistance), it should be safe - at least as safe as any of us
can hope to be.

