
Emergent Tool Use from Multi-Agent Interaction - gdb
https://openai.com/blog/emergent-tool-use/
======
Inufu
Nice visualizations and explanation!

You might want to make it clearer that the agents don't actually receive any
visual observations, but are instead given the xy positions of all other
agents and objects directly.

This also seems very similar to "Capture the Flag: the emergence of complex
cooperative agents" ([https://deepmind.com/blog/article/capture-the-flag-
science](https://deepmind.com/blog/article/capture-the-flag-science))?

Regarding the conclusion:

> We’ve provided evidence that human-relevant strategies and skills, far more
> complex than the seed game dynamics and environment, can emerge from multi-
> agent competition and standard reinforcement learning algorithms at scale.
> These results inspire confidence that in a more open-ended and diverse
> environment, multi-agent dynamics could lead to extremely complex and human-
> relevant behavior.

This has been well established for a while already, e.g. the DeepMind Capture
the Flag paper above, AlphaGo discovering the history of Go openings and
techniques as it learns from playing itself, AlphaZero doing the same for
chess, etc.

~~~
gdb
Good catch! Will update the post to be explicit that there are many pre-
existing awesome results in this vein.

~~~
cscurmudgeon
Any possibility of releasing the simulation environment? Looks quite cool!

~~~
aray
At the top of the article are two open source repo links:

Environment Generation: [https://github.com/openai/multi-agent-emergence-
environments](https://github.com/openai/multi-agent-emergence-environments)

Worldgen: [https://github.com/openai/mujoco-
worldgen](https://github.com/openai/mujoco-worldgen)

Is this what you were looking for?

------
dooglius
The state space here looks pretty small, it seems to me that with so much
training it's just a case of brute-force search. When I think of "tool use" in
regards to the intelligence of early humans, I imagine something more like [0]
where the state space is enormous and it takes a good deal of reasoning and
planning to get to a desired result.

[0]
[https://www.youtube.com/watch?v=BN-34JfUrHY](https://www.youtube.com/watch?v=BN-34JfUrHY)

~~~
nicklovescode
It's unclear to me that we navigated the state space so discretely. My guess
would be that we used a combination of rock throwing + stick hitting before
eventually deciding that combining the two might be fruitful.

After the idea is polished it looks clever, but it may have been invented
through a series of mostly random steps

------
SmooL
Amazing. Very cool to see this sort of emergent behavior.

I also very much enjoyed this section:

"We propose using a suite of domain-specific intelligence tests that target
capabilities we believe agents may eventually acquire. Transfer performance in
these settings can act as a quantitative measure of representation quality or
skill, and we compare against pretraining with count-based exploration as well
as a trained from scratch baseline."

Along with the videos, I can't help but get a very 'Portal' vibe from it all.
"Thank you for helping us help you help us all." \- GLaDOS

~~~
jcims
Did you see the 'Surprising Behaviors' at the bottom? Pretty funny

[https://openai.com/blog/emergent-tool-
use/#surprisingbehavio...](https://openai.com/blog/emergent-tool-
use/#surprisingbehaviors)

------
haylel
Looks awesome. I tried coding up a multi-agent system for my CS degree and it
was incredibly complicated. I was trying to implement an algorithm I found to
give each agent emotions of fear, anger, happiness and sadness in order to
change their behaviours... it was way more difficult than I expected but you
can read more about it here if you're also interested in this stuff. The 3D
graphics in this example are way cooler than my 2D shapes.

[https://medium.com/@dshields/working-with-emotional-
models-i...](https://medium.com/@dshields/working-with-emotional-models-in-an-
artificial-life-simulation-c6309a586e55)

~~~
teabee89
This is really interesting, thank you for sharing!

------
tlb
The animations are nice, compared to a default visualization with dots and
lines moving around. Was this done just for the public release, or was it
worth it to researchers to have an eye-pleasing visualization while doing the
experiments?

~~~
visarga
The environment was actually an important part of the project. It does physics
simulation. Having such a 'realistic' environment allowed the agents to
discover all sorts of cheats (they appear at the end of the article).

~~~
boardwaalk
They're talking about the visualization, not the physics. The agents aren't
getting visual input. That would make things much, much slower.

~~~
vertoc
Right but to fully understand what's going on you need to also visualize the
physics in a 3D world - just dots and lines and squares wouldn't fully show
what's going on. This may be close to the simplest visualization that made
sense

~~~
breck
The visualizations look great, but wouldn't run on an N64, which had many
physics games. I'm wondering the same thing as the OP--was this advanced level
of graphics used during the research, or was the styling added after the fact
for readers? A low res visualization seems like it would do the job equally
well, but maybe not. Curious what they are finding and whether there are
benefits to having a great looking visualization during the EDA phase.

~~~
benrbray
Researchers have much better graphics tools available to them today than they
did in the N64 era. Basic familiarity with e.g. Unity would be enough to run
these sorts of simulations.

------
corey_moncure
One plausible, perhaps optimal strategy in the second arena is for the hiders
to build a shelter around the seekers and lock them in place, circumventing
the whole cat and mouse over ramps and ramp surfing (which the seekers would
never be able to access). I wonder why this strategy is not arrived at.

~~~
minimaxir
There are multiple seekers, and the seekers may not be placed close together.

~~~
PhasmaFelis
At one point in the video, it looked like a hider moving an object past a
frozen seeker jostled the seeker with it. I wonder if it's possible to use the
objects to push seekers together, then "jail" them.

------
lettergram
Even my work with basic circuits for sea slugs led to “cooperative” behavior:

[https://austingwalters.com/modeling-and-building-robotic-
sea...](https://austingwalters.com/modeling-and-building-robotic-sea-slug/)

I think sometimes we see what we want to see. Not saying it’s not interesting
work, just that it’s less groundbreaking than you may think.

------
sebringj
I'm completely amazed by that. The hint of a simulated world seems so matrix-
like as well, imagine some intelligent thing evolving out of that. Wow.

------
brianpgordon
This is incredible. The various emergent behaviors are fascinating. I remember
being amazed a decade ago by the primitive graphics in artificial life
simulators like Polyworld:

[https://en.wikipedia.org/wiki/Polyworld](https://en.wikipedia.org/wiki/Polyworld)

[https://www.youtube.com/watch?v=_m97_kL4ox0&t=9m43s](https://www.youtube.com/watch?v=_m97_kL4ox0&t=9m43s)

It seems that OpenAI has a great little game simulated for their agents to
play in. The next step to make this even cooler would be to use physical,
robotic agents learning to overcome challenges in real meatspace!

~~~
bryanrasmussen
hmm, yes in the story I'm envisioning the AIs don't wipe out humanity because
they have achieved sentience, but just because it turns out killing all humans
is an optimizing component of solving some other problem.

~~~
ismail
Asimov's 3 rules as the final policy when making decisions should sort this
problem out. This assumes that the rules cannot be changed by the AI.

~~~
Falling3
I've always been very incredulous that there would be any possibility of
taking something sufficiently complex to be considered an AGI and hard-coding
anything like the 3 rules into it.

~~~
smogcutter
By the same token, I’m extremely suspicious of the idea that such a
sufficiently complex AGI could also be dumb enough to optimize for paper clip
production at the expense of all life on earth (or w/e example).

~~~
ludwigschubert
...and many would say that’s because us humans are bad at imagining optimizing
agents without anthropomorphizing them. This is a reasonable, even typical
suspicion that many people share! The best explanation I know of why it’s
unfortunately wrong is by Robert Miles in a video, but if you prefer a more
thorough treatment, you could also read about “instrumental convergence”
directly. If you find a flaw in this idea, I’d be interested to hear about it!
:)

Robert Miles’ video:
[https://youtu.be/ZeecOKBus3Q](https://youtu.be/ZeecOKBus3Q)

Instrumental Convergence:
[https://arbital.com/p/instrumental_convergence/](https://arbital.com/p/instrumental_convergence/)

Now afaik nothing in this argument says that we can’t find a way to control
this in a more complex formalism, but we clearly haven’t done so yet.

~~~
smogcutter
Sorry, just saw this. I think it’s his assumption that an AGI will act
strictly as an agent that’s flawed. It requires imagining an agent that can
make inferences from context, evaluate new and unfamiliar information, form
original plans, execute them with all the complexity implied by interaction
with the real world, reprogram itself, essentially do anything... except
evaluate its own terminal goal. That’s written in stone, gotta make more
paperclips. The argument assumes almost unlimited power and potential on the
one hand, and bizarre, arbitrary constraints on the other.

If you assume an AGI is incapable of asking “why” about its terminal goal, you
have to assume it’s incapable of asking “why” in _any_ context. Miles’ AGI has
no power of metacognition, but is still somehow able to reprogram itself. This
really isn’t compatible with “general intelligence” or the powers that get
ascribed to imaginary AGIs.

I’m certainly no expert, but I expect there will turn out to be something like
the idea of Turing-completeness for AI. Just like any general computing
machine is a computer, any true AGI will be sapient. You can’t just
arbitrarily pluck a part out, like “it can’t reason about its objective”, and
expect it to still function as an AGI, just like you can’t say “it’s Turing
complete, except it can’t do any kind of conditional branching.” EDIT better
example: “it’s Turing complete, but it can’t do bubble sort.”

This intuition may be wrong, but it’s just as much an assumption as Miles’
argument.

I’m also not ascribing morality to it: we have our share of psychopaths, and
intelligence doesn’t imply empathy. AGI may very well be dangerous, just
probably not the “mindlessly make paperclips” kind.

------
YeGoblynQueenne
This is visually very impressive, of course, but what is the significance of
this work? I am not very familiar with intelligent agents research so I don't
understand to what extent learning cooperative tool use in an adversarial
environment (if I understand correctly what is shown) represents an important
advancement of the state of the art in intelligent agents research, or not.

In any case this is a simulation- so it's basically impossible to take the
learned model and use it immediately in a real-world environment with true
physics and arbitrary elements, let alone with unrestricted dimensions (the
agents in the article are for the most part restricted to a limited play
area). So if I understand this correctly the trained model is only good for
the specific simulated environment and would not work as well under even
slightly different conditions.

------
rkagerer
I love how the 3D visualization and game selection make their research
immediately relatable - right down to the cute little avatars!

 _" We’ve shown that agents can learn sophisticated tool use in a high
fidelity physics simulator"_

I always suspected to evolve intelligence you need an environment rich in
complexity. Intelligence we're familiar with (e.g. humans) evolved in a
primordial soup packed with possibilities and building blocks (e.g. elaborate
rules of physics, amino acids, etc). It's great to see this concept being
explored.

It reminds me of Adrian Thompson's experiments in the 90's running
generational genetic algorithms on a real FPGA instead of mere simulations
[1].

After 5000 generations he coaxed out a perfect tone recognizer. He was able to
prune 70% of the circuit (lingering remnants of earlier mutations?) to find it
still worked with only 32 gates - an unimaginable feat! Engineers were baffled
when they reverse-engineered what remained: if I recall correctly, transistors
were run outside of saturation mode, and EM effects were being exploited
between adjacent components. In short, the system took a bunch of components
designed for digital logic but optimized them using the full range of analog
quirks they exhibited.

More recent attempts to recreate his work have reportedly been hampered by
modern FPGA's which make it harder to exploit those effects as they don't
allow reconfiguration at the raw wiring level [2].

In Thompson's own words:

 _" Evolution has been free to explore the full repertoire of behaviours
available from the silicon resources provided, even being able to exploit the
subtle interactions between adjacent components that are not directly
connected.... A 'primordial soup' of reconfigurable electronic components has
been manipulated according to the overall behavior it exhibits"_

\---

[1] Paper:
[http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=669...](http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=6691182CC83AE8577D7C44EB9D847DA1?doi=10.1.1.50.9691&rep=rep1&type=pdf)

Less technical article: [https://www.damninteresting.com/on-the-origin-of-
circuits/](https://www.damninteresting.com/on-the-origin-of-circuits/)

[2]
[https://www.reddit.com/r/MachineLearning/comments/2t5ozk/wha...](https://www.reddit.com/r/MachineLearning/comments/2t5ozk/what_ever_happened_with_the_evolutionary/cnxfg1s?utm_source=share&utm_medium=web2x)

------
breck
What is the size of these “strategies”, measured in weights, bytes, or whatever
measurement you look at?

~~~
tlb
1.6 million parameters. There are some details in section 5 and appendix B.7
of [https://d4mucfpksywv.cloudfront.net/emergent-tool-
use/paper/...](https://d4mucfpksywv.cloudfront.net/emergent-tool-
use/paper/Multi_Agent_Emergence_2019.pdf)
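
To put the parameter count in bytes, here is a rough back-of-the-envelope sketch. Only the 1.6 million figure comes from the comment above; the storage widths are assumptions, since the thread does not say which precision the weights were stored in.

```python
N_PARAMS = 1_600_000  # figure quoted in the comment above

# Assumed storage widths in bytes; the actual precision is not stated in the thread.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def size_mb(n_params, bytes_per_param):
    """Raw weight storage in megabytes (10**6 bytes), ignoring any file overhead."""
    return n_params * bytes_per_param / 1e6


for dtype, width in BYTES_PER_PARAM.items():
    print(f"{dtype}: {size_mb(N_PARAMS, width):.1f} MB")
```

Under the float32 assumption that is about 6.4 MB of raw weights.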

~~~
breck
Thanks! I like that section on how batch size affects convergence. I wonder
how parameter size limits would similarly affect which Stages could be
reached. I would not be surprised if you could hit those stages with 100x or
fewer params.

------
markkat
Has any intelligence arisen without multi-agent interaction?

Probably belongs in our definition of intelligence.

------
The_rationalist
Am I misunderstanding something?

Instead of teaching the "AI" intelligent rules, or rules for creating rules for
maximising their goals, they teach them nothing, which means they have 0
usable high-level knowledge. And the "AI" purely brute-forces its way to the
empirically best solutions for this ridiculously simple universe.

How is that advancing research? This is just a showcase of what modern
hardware can do, and also a showcase of how far we are from teaching
intelligence. My brain understands the semantics of this universe and would
have been able to find most strategies without simulating the game more than
once in my head. So definitely this is a showcase of how far (bruteforce is
like step 0) we (or at least openAI) are from making AGI.

~~~
The_Amp_Walrus
Some AI researchers believe that using learning methods with no built-in prior
knowledge and throwing a bunch of compute at them is the path to building
effective AI. I'm thinking of Richard Sutton in particular:

\- Bitter Lesson essay:
[http://www.incompleteideas.net/IncIdeas/BitterLesson.html](http://www.incompleteideas.net/IncIdeas/BitterLesson.html)

\- A lecture of his on temporal difference learning, which is a "model-free"
method of reinforcement learning:
[https://www.youtube.com/watch?v=LyCpuLikLyQ](https://www.youtube.com/watch?v=LyCpuLikLyQ)
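
For readers who have not seen temporal difference learning, here is a minimal sketch of tabular TD(0) on a 5-state random walk. The environment, step size, and episode count are illustrative assumptions and have nothing to do with the OpenAI setup. It is "model-free" in the sense that the update only uses sampled transitions and never the transition probabilities.

```python
import random

N_STATES = 5   # non-terminal states 0..4; stepping off either end terminates
ALPHA = 0.02   # step size (assumed)
GAMMA = 1.0    # undiscounted episodic task


def td0_random_walk(episodes=5000, seed=0):
    """Estimate state values with TD(0); reward is 1 only when exiting on the right."""
    rng = random.Random(seed)
    values = [0.5] * N_STATES
    for _ in range(episodes):
        s = N_STATES // 2  # every episode starts in the middle state
        while True:
            s_next = s + rng.choice((-1, 1))
            if s_next < 0:
                reward, v_next, done = 0.0, 0.0, True
            elif s_next >= N_STATES:
                reward, v_next, done = 1.0, 0.0, True
            else:
                reward, v_next, done = 0.0, values[s_next], False
            # TD(0) update: move V(s) toward the one-step bootstrapped target.
            values[s] += ALPHA * (reward + GAMMA * v_next - values[s])
            if done:
                break
            s = s_next
    return values


print([round(v, 2) for v in td0_random_walk()])
```

The true values for this walk are 1/6, 2/6, ..., 5/6, so the estimates should land near those.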

I personally don't agree with his emphasis on model-free learning, but it's
not the case that people are building model-free RL agents because they don't
understand the trade off that they're making.

------
mooneater
Finally Autocurricula gets some love! Discussed in some detail in
[https://www.talkrl.com/episodes/natasha-
jaques](https://www.talkrl.com/episodes/natasha-jaques)

~~~
mooneater
Natasha Jaques explains the idea at about 39:50

------
homieg33
I wonder if it’s possible to incorporate a monkey see monkey do aspect to the
learning algorithm that could observe humans playing the game and incorporate
that information into its models?

~~~
visarga
Yes, it's called imitation learning and is a subfield of reinforcement
learning. The problem is that even a small error could gradually accumulate
and cause the sequence of actions to diverge. RL agents learn not just how to
act in a given situation but also to evaluate possible actions, situations and
even to model the environment. That way they can adapt dynamically instead of
diverging from the optimal actions.
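
A toy sketch of the accumulation problem, under heavy assumptions: this is not an imitation learning algorithm, just a 1-D system where one policy replays the demonstrated actions open loop and another reacts to the current state. All names and constants are made up for illustration.

```python
import random


def rollout(policy, steps=200, noise=0.05, seed=0):
    """Run a 1-D system x += action + noise and return the final offset from 0."""
    rng = random.Random(seed)
    x = 0.0
    for t in range(steps):
        x += policy(t, x) + rng.gauss(0.0, noise)
    return x


def replay_policy(t, x):
    # Open loop: repeat the demonstrated action for step t, ignoring the state.
    # In a noise-free demonstration the expert never had to move, so this is 0.
    return 0.0


def feedback_policy(t, x):
    # Closed loop: look at the current state and steer back toward 0.
    return -0.5 * x


def mean_abs_final(policy, trials=200):
    """Average distance from the target at the end of the rollout, over several seeds."""
    return sum(abs(rollout(policy, seed=s)) for s in range(trials)) / trials


print("replay  :", mean_abs_final(replay_policy))
print("feedback:", mean_abs_final(feedback_policy))
```

The replayed run should end noticeably further from 0 on average, because each small disturbance is never corrected and the errors add up as a random walk.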

~~~
homieg33
Interesting, ideally it uses the observed human behaviors to seed/inform its
own attempts as a shortcut to advanced behavior without the many millions of
generations needed.

------
ReDeiPirati
Great viz, design & structure! But for the first time, I had the impression
that you didn't report anything new or different. All the takeaways of this
work were pretty obvious given the last couple of years' research. Am I missing
anything?

------
cr0sh
I have a friend who observed similar emergent behavior in an a-life (gene-
based from what I understand) simulation he created, in an environment of
"tanks in a maze" (or something like that).

The "genes" consisted of a simplified assembler (run on a VM) that could describe
a program the tank would use to control itself - it could sense other tanks
within line-of-sight to a certain degree, it could sense walls, it could fire
its cannon, move in a particular direction, sense when another tank had a
bearing (cannon pointed) on itself, etc.

He set up 100 random tanks (with random "genes"/programs) and let the
simulation run. Top scorers (who had the most kills) would be used to seed the
next "generation", using a form of sexual "mating" and (pseudo-) random
mutation. Then that generation would run.
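
The selection loop described above can be sketched roughly as follows. This is a guess at the general shape, not his code: the genome encoding, the single-point crossover, the mutation rate, the number of parents, and the placeholder score are all hypothetical. Only the population of 100 and the "top scorers seed the next generation" idea come from the story.

```python
import random

GENOME_LEN = 16  # hypothetical: instructions per tank program
POP_SIZE = 100   # from the story: 100 random tanks
N_OPCODES = 8    # hypothetical size of the simplified instruction set


def random_genome(rng):
    return [rng.randrange(N_OPCODES) for _ in range(GENOME_LEN)]


def crossover(a, b, rng):
    """Single-point crossover: prefix from one parent, suffix from the other."""
    cut = rng.randrange(1, GENOME_LEN)
    return a[:cut] + b[cut:]


def mutate(genome, rng, rate=0.05):
    """Replace each instruction with a random opcode with probability `rate`."""
    return [rng.randrange(N_OPCODES) if rng.random() < rate else g for g in genome]


def next_generation(population, scores, rng, n_parents=10):
    """Breed a full new population from the `n_parents` top scorers."""
    ranked = sorted(zip(scores, population), key=lambda pair: pair[0], reverse=True)
    parents = [genome for _, genome in ranked[:n_parents]]
    children = []
    while len(children) < len(population):
        mom, dad = rng.sample(parents, 2)
        children.append(mutate(crossover(mom, dad, rng), rng))
    return children


def placeholder_score(genome):
    # Stand-in for "number of kills"; the real score came from running the maze.
    return sum(genome)


rng = random.Random(0)
population = [random_genome(rng) for _ in range(POP_SIZE)]
for _ in range(20):
    scores = [placeholder_score(g) for g in population]
    population = next_generation(population, scores, rng)
```

In the real system the score would come from simulating the tanks against each other, which is where the unexpected exploits (like the VM bug) had room to appear.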

He said he ran the simulation for days at a time. One day he noticed something
odd. He started to notice that certain tanks had "evolved" the means to
"teleport" from location to location on the map. He didn't design this
possibility in - what had happened was (he later determined) that a bug he had
left in the VM was being exploited to allow the tanks to instantaneously move
within their environment. He thought it was interesting, so he left it as-is
and let the simulation continue.

After a long period of running, my friend then noticed something very odd.
Some tanks were "wiggling" their turrets - other tanks would "wiggle" in a
similar fashion. After a while all he could deduce was that in some manner,
they were communicating with each other, similar to "bee dancing", and
starting to form factions against each other...

...it was at that point he decided things were getting much too strange, and
he stopped the experiment.

Sadly, he no longer has a copy of this software, but I believe his story,
simply because I have seen enough of his other code, and have worked closely
enough with him on various projects since (as an adult), to know that such a
system was well within his capability of creating.

At the time, he was probably only 16 or 17 years old, the computer was a 386,
and this was sometime in the early 1990s. I believe the software was likely a
combination of QuickBasic 4.5 and 8086 assembler running under DOS, as that
was his preferred environment at the time.

I've often considered recreating the experiment, using today's technology,
just to see what would happen (at the time he related this to me, as an adult,
he asked me how difficult it would be to make a more physical version of this
"game"; I'm still not sure if he meant scale model tanks, or full-sized -
knowing him, though, he would have loved to play with the latter).

------
jpetrucc
As always, crazy interesting stuff coming out of OpenAI!

This is the type of stuff that amazes me - I really wish I had more of an
opportunity to play with AI/ML in my day to day work.

~~~
gdb
> I really wish I had more of an opportunity to play with AI/ML in my day to
> day work.

Anyone who feels this way — we're hiring :)!
[https://openai.com/jobs/](https://openai.com/jobs/).

(Also if I can answer any questions about OpenAI, feel free to ping me at
gdb@openai.com.)

~~~
shubidubi
Anything remote friendly? Unfortunately all jobs are in SF only.

------
eiopa
I dig the fine-tuning tests!

Did you end up using this as a way to estimate how "healthy" the agents are,
or was this explored after the system was already working well?

------
fedebehrens
Does anyone know if there are some accessible GitHub projects that can do
something similar to this? Would like to set up a new project with my nephew
:)

------
westurner
I, for one, really appreciate the raytracing in these visualizations. I wish
for more box surfing examples.

------
Leary
Anyone think the hiders will learn to box the seekers in entirely before the
rounds start?

------
adamnemecek
This is just adjoint functors. Pls work out automatic integration. Dual
numbers is where the path starts.

