
Capture the Flag: the emergence of complex cooperative agents - Impossible
https://deepmind.com/blog/capture-the-flag/
======
SmooL
This is pretty huge. From the paper:

 _The proposed training algorithm stabilises the learning process in partially
observable multi-agent environments by concurrently training a diverse
population of agents who learn by playing with each other, and in addition the
agent population provides a mechanism for metaoptimisation. We solve the
prohibitively hard credit assignment problem of learning from the sparse and
delayed episodic team win/loss signal (optimising thousands of actions based
on a single final reward) by enabling agents to evolve an internal reward
signal that acts as a proxy for winning and provides denser rewards. Finally,
we meet the memory and long-term temporal reasoning requirements of high-
level, strategic CTF play by introducing an agent architecture that features a
multi-timescale representation, reminiscent of what has been observed in
primate cerebral cortex (11), and an external working memory module, broadly
inspired by human episodic memory (22). These three innovations, integrated
within a scalable, massively distributed, asynchronous computational
framework, enables the training of highly skilled CTF agents through solely
multi-agent interaction and single bits of feedback about game outcomes._
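
To make the "internal reward signal" part concrete, here is a minimal sketch
of how I read it, not the paper's actual code. It assumes each agent keeps its
own weights over a handful of in-game events, that the dense reward is the
weighted sum of those events, and that the population improves the weights by
having the weakest agent copy and perturb the strongest agent's weights. The
event names, the `perturb` parameter and the copy rule are all hypothetical,
picked only for illustration.

```python
import random

# Hypothetical event names; the quoted passage does not list the real ones.
EVENTS = ("flag_pickup", "flag_capture", "tagged_opponent", "was_tagged")


def internal_reward(weights, event_counts):
    """Dense proxy reward: weighted sum of the events seen in one step."""
    return sum(weights[e] * event_counts.get(e, 0) for e in EVENTS)


def random_weights(rng):
    """Give each agent its own random starting reward weights."""
    return {e: rng.uniform(-1.0, 1.0) for e in EVENTS}


def evolve(population, win_rates, rng, perturb=0.2):
    """One assumed population step, driven only by the sparse win signal.

    The agent with the lowest win rate copies the weights of the agent with
    the highest win rate, then scales each weight by a random factor in
    [1 - perturb, 1 + perturb]. All other agents are left unchanged.
    """
    best = max(range(len(population)), key=lambda i: win_rates[i])
    worst = min(range(len(population)), key=lambda i: win_rates[i])
    new_population = [dict(w) for w in population]
    if best != worst:
        new_population[worst] = {
            e: w * (1.0 + rng.uniform(-perturb, perturb))
            for e, w in population[best].items()
        }
    return new_population


if __name__ == "__main__":
    rng = random.Random(0)
    population = [random_weights(rng) for _ in range(4)]
    win_rates = [0.1, 0.9, 0.5, 0.3]  # made-up numbers for the demo
    population = evolve(population, win_rates, rng)
    print(internal_reward(population[0], {"flag_capture": 1}))
```

The point of the split is that the win/loss bit only ever touches the slow
outer loop (`evolve`), while the per-step learner would train against
`internal_reward`, which fires far more often than the final game result.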

It's also important to note that the agents learn only from raw pixel data
and output actions to a virtual controller, and that even after the AI's
reaction time and shooting accuracy were artificially limited to human
levels, the agents still performed far better than humans.

