Very interesting work and an amazing demo!! Btw, a very basic question: does the demo show any curriculum/training process, or is it basically the final policies learned by each of the four competing approaches?
(ps: I co-wrote a short elementary paper on auto curricula design for RL in 2017 [0])
Thanks! The demo just shows the final agents after training (30K gradient updates). Interesting work re the reward-maximizing curricula. I had not seen this before, so thanks for the pointer.
[0]: https://arxiv.org/abs/1703.07853