
Evolving Stable Strategies - wei_jok
http://blog.otoro.net/2017/11/12/evolving-stable-strategies/
======
candiodari
I wonder what happens when you simply backprop using experience replay with
either a CNN or a fully connected net. Just run a randomly initialized neural
net, and take "samples" (inputs + outputs) every 1s or so. After 30s, compute
an error, optionally "discount" it over time, and run backprop.
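
A rough sketch of the "discount it over time" step, assuming one sample per second over a 30s episode and a single scalar error at the end. The function name `discounted_errors`, the discount factor `gamma`, and the input/output sizes are all made up for illustration. How the per-sample error would be turned into a gradient for backprop is left open here, since that depends on the network and the task.

```python
import numpy as np

def discounted_errors(final_error, n_samples, gamma=0.95):
    """Spread one end-of-episode error back over the recorded samples.

    The last sample gets the full error; a sample taken k steps before
    the end gets gamma**k of it. (gamma is an assumed hyperparameter.)
    """
    steps_from_end = np.arange(n_samples - 1, -1, -1)
    return final_error * gamma ** steps_from_end

# Hypothetical episode: 30 s, one (input, output) sample per second.
rng = np.random.default_rng(0)
replay = [(rng.normal(size=4), rng.normal(size=2)) for _ in range(30)]

# One error value per stored sample, largest for the most recent one.
per_sample_error = discounted_errors(final_error=1.0, n_samples=len(replay), gamma=0.9)
```

With `gamma=1.0` every sample gets the same error, which would be the "no discount" variant.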

