Question to anyone familiar with this stuff: I can't figure out which agent they're running on these environments to create the plots in the above link. Is it some well-known agent which is supposed to be clear from context?
They use Proximal Policy Optimization (PPO), which is pretty much the go-to RL algorithm at OpenAI. It was developed by John Schulman (the last author of the Procgen paper) et al.
(https://openai.com/blog/openai-baselines-ppo/)
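For anyone unfamiliar: the thing that makes PPO "PPO" is its clipped surrogate objective. Here's a rough NumPy sketch of just that piece, to give the flavor of the update; it's obviously not the actual training code behind their plots, and the toy numbers at the end are made up.

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, clip_eps=0.2):
    """Clipped surrogate objective from the PPO paper (Schulman et al., 2017).

    ratio:     pi_new(a|s) / pi_old(a|s) for each sampled transition
    advantage: advantage estimates for the same transitions
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    # Elementwise minimum, so the objective never rewards pushing the policy
    # far outside the clipping range in a single update.
    return np.mean(np.minimum(unclipped, clipped))

# Toy example: three transitions, one with a large ratio that gets clipped.
ratio = np.array([1.05, 0.9, 1.8])
advantage = np.array([0.5, -0.2, 1.0])
print(ppo_clip_objective(ratio, advantage))
```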
I don't deny this is a cool project, but how is it related to their mission of "building safe and beneficial AGI", backed by a billion dollars in funding? They seem to be having a bit too much fun with side projects (which is totally understandable!).
Reinforcement learning is researched almost exclusively in video-game-like environments. Being able to create a variety of game-like environments that all share the same interface makes it easier to test a single algorithm in many different situations.
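To make that concrete: because every Procgen game exposes the standard Gym interface, the exact same episode loop (and the same agent code) runs on any of them, and only the environment id changes. Minimal sketch below, assuming the "procgen:procgen-<name>-v0" id pattern from the procgen README, with a random policy standing in for a real agent.

```python
import gym

def run_episode(env, policy):
    """One episode via the classic Gym API (reset/step), shared by every Procgen game."""
    obs, done, ep_return = env.reset(), False, 0.0
    while not done:
        obs, reward, done, info = env.step(policy(obs, env.action_space))
        ep_return += reward
    return ep_return

# Placeholder agent; a trained PPO policy would slot in the same way.
random_policy = lambda obs, action_space: action_space.sample()

# Same code, different games -- only the environment id changes.
for name in ["coinrun", "starpilot", "maze"]:
    env = gym.make(f"procgen:procgen-{name}-v0")
    print(name, run_episode(env, random_policy))
```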
Also, it's a lot easier to justify side projects that improve tooling and infrastructure when you have billions in funding rather than just millions. For smaller shops, the usual answer is to try RL on existing procedurally generated games, e.g. Minecraft.
Well, as your funding goes up, the scope of your mission increases. I don’t see this as a “side project” so much as a necessary part of solving the bigger picture. It’s like how building a battery factory is what makes Tesla work. You can buy batteries from Panasonic, and indeed they did, but as they grew they realized they needed their own factory. In the same way, if you want to work on developing novel RL algorithms, you can’t test them all on a single game. You need a way to test them on many different problems through a standard interface. That’s how I see this.