Question to anyone familiar with this stuff: I can't figure out which agent they're running on these environments to create the plots in the above link. Is it some well-known agent which is supposed to be clear from context?
They use Proximal Policy Optimization (PPO), which is pretty much the go-to RL algorithm at OpenAI. It was developed by John Schulman (the last author of the Procgen paper) et al.
(https://openai.com/blog/openai-baselines-ppo/)
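For anyone unfamiliar: the thing that makes PPO "PPO" is its clipped surrogate objective. Here's a rough NumPy sketch of just that piece, to give the flavor of the update; it's obviously not the actual training code behind their plots, and the toy numbers at the end are made up.

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, clip_eps=0.2):
    """Clipped surrogate objective from the PPO paper (Schulman et al., 2017).

    ratio:     pi_new(a|s) / pi_old(a|s) for each sampled transition
    advantage: advantage estimates for the same transitions
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    # Elementwise minimum, so the objective never rewards pushing the policy
    # far outside the clipping range in a single update.
    return np.mean(np.minimum(unclipped, clipped))

# Toy example: three transitions, one with a large ratio that gets clipped.
ratio = np.array([1.05, 0.9, 1.8])
advantage = np.array([0.5, -0.2, 1.0])
print(ppo_clip_objective(ratio, advantage))
```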
I don't deny this is a cool project, but how is it related to their mission of "building safe and beneficial AGI", backed by a billion dollars in funding? They seem to be having a bit too much fun with side projects (which is totally understandable!).
Reinforcement learning is researched almost exclusively in video-game-like environments. Being able to create a variety of game-like environments that all share the same interface makes it easier to test a single algorithm in many different situations.
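To make that concrete: because every Procgen game exposes the standard Gym interface, the exact same episode loop (and the same agent code) runs on any of them, and only the environment id changes. Minimal sketch below, assuming the "procgen:procgen-<name>-v0" id pattern from the procgen README, with a random policy standing in for a real agent.

```python
import gym

def run_episode(env, policy):
    """One episode via the classic Gym API (reset/step), shared by every Procgen game."""
    obs, done, ep_return = env.reset(), False, 0.0
    while not done:
        obs, reward, done, info = env.step(policy(obs, env.action_space))
        ep_return += reward
    return ep_return

# Placeholder agent; a trained PPO policy would slot in the same way.
random_policy = lambda obs, action_space: action_space.sample()

# Same code, different games -- only the environment id changes.
for name in ["coinrun", "starpilot", "maze"]:
    env = gym.make(f"procgen:procgen-{name}-v0")
    print(name, run_episode(env, random_policy))
```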
Also, it's a lot easier to justify side projects that improve tooling and infrastructure when you have billions in funding rather than just millions. For smaller shops, the usual answer is to try RL on existing procedurally generated games, e.g. Minecraft.
Well, as your funding goes up, the scope of your mission increases. I don’t see this as a “side project” so much as a necessary part of solving the bigger picture. It’s like how building a battery factory is what makes Tesla work. You can buy batteries from Panasonic, and indeed they did, but as they grew they realized they needed their own factory. In the same way, if you want to work on developing novel RL algorithms, you can’t test them all on a single game. You need a way to test them on many different problems through a standard interface. That’s how I see this.