
Understanding Agent Cooperation - piokuc
https://deepmind.com/blog/understanding-agent-cooperation/
======
JamilD
The AI can minimize loss / maximize fitness either by moving to look for
additional resources or by firing a laser.

Turns out that when resources are scarce, the optimal move is to knock the
opponent away. I think this tells us more about the problem space than the AI
itself; it's just optimizing for the specific problem.

~~~
laretluval
> I think this tells us more about the problem space than the AI itself; it's
> just optimizing for the specific problem.

My reading is that the point of this research is to find out what problem
spaces are conducive to cooperation, not to find out details of how these
particular agents work.

~~~
JamilD
Yeah — after reading this article and the paper, I agree. The previous link
[0] on this submission was much more sensationalized and very misleading.

[0] [http://www.sciencealert.com/google-s-new-ai-has-learned-
to-b...](http://www.sciencealert.com/google-s-new-ai-has-learned-to-become-
highly-aggressive-in-stressful-situations)

------
projektir
I'm rather worried about the wording used here, and about AI being created in
that context. Do we really not realize what we're doing? AI is not magic; it's
not free from fundamental math, and it's not free from corruption. It's just
going to multiply that corruption that much more.

Any AI that has been programmed to highly value winning is not going to be
very cooperative. For it to be cooperative, especially in situations that
simulate survival, it needs to have higher ideals than winning, just like
humans. It needs to be able to see and be aware of the big picture. You don't
need to look at AI for that, you can just look at the world.

Development of AIs of this nature will just lead to a super-powered Moloch.
Cooperative ethics is a highly advanced concept; it's not going to show up on
its own from mere game theory without a lot of time.

~~~
philipov
Cooperative ethics arise immediately in the Prisoner's Dilemma merely by
adding an unknown number of iterations to the game. The most efficient
strategy is a version of tit-for-tat.
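
For concreteness, here's a rough sketch of an iterated prisoner's dilemma (the
payoff values and round count are made up for illustration, not taken from the
article or paper):

    # Payoff to the first player for (my_move, their_move); C = cooperate, D = defect.
    PAYOFFS = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

    def tit_for_tat(my_history, their_history):
        # Cooperate first, then copy whatever the opponent did last round.
        return "C" if not their_history else their_history[-1]

    def always_defect(my_history, their_history):
        return "D"

    def play(strategy_a, strategy_b, rounds=200):
        hist_a, hist_b, score_a, score_b = [], [], 0, 0
        for _ in range(rounds):
            a = strategy_a(hist_a, hist_b)
            b = strategy_b(hist_b, hist_a)
            score_a += PAYOFFS[(a, b)]
            score_b += PAYOFFS[(b, a)]
            hist_a.append(a)
            hist_b.append(b)
        return score_a, score_b

    print(play(tit_for_tat, tit_for_tat))    # (600, 600): mutual cooperation pays
    print(play(tit_for_tat, always_defect))  # (199, 204): one exploited round, then mutual defection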

~~~
projektir
I'm assuming you're referring to something like this:
[https://egtheory.wordpress.com/2015/03/02/ipd/](https://egtheory.wordpress.com/2015/03/02/ipd/)

I think we shouldn't confuse efficient strategies with the chosen strategies.
What causes Moloch is the inability to see the big picture, to see outside of
the self in the collective (maybe Buddhism has a point).

An efficient strategy may very well be something we'd prefer, such as tit-for-
tat. But is that the strategy we choose? Looking at the long history of
evolution, I'd say no.

~~~
Kalium
In the long run, we've built massively complex human societies that develop
intricate technologies. Technologies whose production requires supply chains
involving many thousands of people, and that are so complex that nobody
involved understands all the technologies involved. All so some people can
contend that humanity isn't able to see the collective beyond the self.

I would say we have a demonstrated ability of seeing the big picture, and a
pretty good track record of making it work.

~~~
projektir
> In the long run

Hence I said:

> without a lot of time

An AI spending a lot of time doing effectively the same thing humans have been
doing (read: propagation of immense amounts of suffering) is not really
something I'd want to see repeated. It seems rather obvious that these
conclusions are very difficult and slow to arrive at on any proper scale, so no
AI will have them by default. They'll be aggressive by default, just like your
average animal in evolution. The fact that given their own millions of years
(sped up) they may eventually arrive at the rudimentary level of cooperation
that humans possess does not instill a lot of hope in me.

> I would say we have a demonstrated ability of seeing the big picture, and a
> pretty good track record of making it work.

I'm talking about this: [http://slatestarcodex.com/2014/07/30/meditations-on-
moloch/](http://slatestarcodex.com/2014/07/30/meditations-on-moloch/)

A good example that's going to be hard to ignore is the coming climate change,
due to humans catastrophically failing to see the big picture and focusing on
smaller gains within their sub-groups. It really doesn't have much to do with
complexity, but it has everything to do with the very same behavior you're
seeing the AI execute here.

~~~
lawless123
I can't help but notice that you and someone else here are both promoting this
"Moloch" stuff and that particular website, slatestarcodex:

[https://news.ycombinator.com/reply?id=13636150&goto=threads%...](https://news.ycombinator.com/reply?id=13636150&goto=threads%3Fid%3Dashark%2313636150)

Why?

Frankly, I don't feel it's productive or rational to attach the name of a
biblical villain to new technology.

~~~
projektir
SSC, while fairly controversial (and I strongly disagree with a LOT of what's
on there), is well known to HN. At the moment, it has the best summary of the
concept that I'm aware of.

> Frankly, I don't feel it's productive or rational to attach the name of a
> biblical villain to new technology.

Well, frankly, I disagree. Humans have an inherent blind spot when it comes to
complex systemic forces. We tend to imagine them as weak and irrelevant.
Reframing them as villains seems to be necessary to understand their power and
reach.

~~~
benjaminjackman
Not to totally sidetrack the discussion, but what are some of the things that
you strongly disagree with on SSC? (The Moloch article is one of the most
fascinating ones I have read.)

And, by the way, I had not considered the Moloch article as a direct re-
framing of a problem until you put it that way. I must say, thinking about it
in that light, I find humanizing `complex systemic forces` a rather novel
transformation and quite useful. Even having read the article a few times, I
hadn't thought to describe it as such. But morphing a problem from a fairly
inscrutable set of phenomena into a villain allows us to use a different set
of mental tools to tackle understanding the problem.

Typically I had thought more restrictively about such transformations; for
example, viewing a sound's waveform graphically can be illuminating in a
certain sense (transforming audio-temporal to visual-spatial). The biggest
issue with the toMoloch transform is that the conversion process is obviously
going to be significantly more noisy and provide the author copious amounts of
wiggle-room to steer the reader towards their own conclusions. But just
expressing the facets of the problem and making its existence better known has
a lot of value. Anyhow, thanks for helping me see, in another way, an article
I have gotten quite a bit of insight out of.

~~~
projektir
> Not to totally sidetrack the discussion, but what are some of the things
> that you strongly disagree with on SSC?

Most things I disagree with SSC on seem to be general rationalist beliefs and
may also be found on places like LessWrong. These views are usually expressed
less directly, and sometimes in comments.

For example, SSC and rationalists in general attribute very high value to IQ.
SSC has some posts relating to ability, genetics, and growth mindset that I
find very good:

[http://slatestarcodex.com/2015/01/31/the-parable-of-the-
tale...](http://slatestarcodex.com/2015/01/31/the-parable-of-the-talents/)

[http://slatestarcodex.com/2015/04/08/no-clarity-around-
growt...](http://slatestarcodex.com/2015/04/08/no-clarity-around-growth-
mindset-yet/)

But, while I mostly agree with both of those series, the continual claim that
IQ is the best thing since sliced bread, that it's everything, that it
correlates with everything, and that it's necessary for someone to reach
certain heights, is something that I find to be more dogmatic than rational. I
think the IQ-is-everything model is too simplistic, and rather self-fulfilling,
and if you have a lot of patience, you can extract my position on ability
development from this old post:
[https://news.ycombinator.com/item?id=12617007](https://news.ycombinator.com/item?id=12617007)

> And, by the way, I had not considered the Moloch article as a direct re-
> framing of a problem until you put it as such.

To be fair, I'm not sure if Scott Alexander meant it that way. There was a
related post on the Goddess of Cancer, where I think the reframing part was
mentioned. But I already believe that Moloch is a manifestation of a wider
process, so the issue of explaining to someone how a blind process can have so
much power is not new.

> The biggest issue with the toMoloch transform is that the conversion process
> is obviously going to be significantly more noisy and provide the author
> copious amounts of wiggle-room to steer the reader towards their own
> conclusions.

I don't know that it really introduces any more significant noise than
anything else. We're already surrounded by so much noise (much of it, I would
argue, from the aforementioned process itself) that better means are needed
than hoping a given transformation was accurate anyway. I.e., can we make
predictions from the concept of Moloch? It looks to me that we can.

Generally, information needs to be routed to the right subsystems. Humans have
a few subsystems that are really good at identifying an adversary or assigning
blame. But they don't have any good subsystems to examine the situation itself
unless they're already above it, nor can they assign blame to the situation,
as they perceive it as neutral and inert. I would say the extreme
informational loss from the inability to process effects of systems and
situations is so much larger than the added noise that the transformation
absolutely needs to be done.

------
jerf
Not entirely spawned by this article, but by the whole genre and some other
comments on HN by other users: I wonder if part of the "mystery" of
cooperation in these simulations is that these people keep investigating the
question of cooperation using simulations too simplistic to model any form of
trade. A fundamental of economics 101 is that valuations for things differ
between agents. Trade ceases to exist in a world where everybody values
everything exactly the same, because the only trade that makes any sense is to
trade two things of equal value, and even then, since the outcome is a wash
and neither side obtains any value from it, why bother? I'm not sure the
simulation hasn't been simplified to the point that the phenomena we're trying
to explain simply can't manifest within it.
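
To put toy numbers on that econ-101 point (the goods and valuations here are
entirely made up), here's roughly what I mean:

    # Made-up valuations: Alice prefers apples, Bob prefers bananas.
    alice = {"apple": 3, "banana": 1}
    bob   = {"apple": 1, "banana": 3}

    # Alice starts with a banana, Bob starts with an apple.
    value_before = alice["banana"] + bob["apple"]   # 1 + 1 = 2
    # After swapping, each agent holds the good they value more.
    value_after  = alice["apple"] + bob["banana"]   # 3 + 3 = 6
    print(value_before, value_after)                # 2 6

    # If alice == bob (identical valuations), the swap changes nothing
    # and the trade is a pointless wash.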

I'm not saying that Trade Is The Answer. I would be somewhat surprised if it
doesn't form some of the solution eventually, but that's not the argument I'm
making today. The argument I'm making is that if the simulation can't simulate
trade at all, that's a sign that it may have been too simplified to be useful.
There are probably other things you could say that about, "communication"
being another one. Having iteration as the only mechanism for communication is
questionable too, for instance. Obviously in the real world, most
cooperation doesn't involve human speech, but a lot of ecology can be seen to
involve communication, if for no other reason than you can't have the very
popular strategy of "deception" if you don't have "communication" with which
to deceive.

Which may also explain the in-my-opinion overpopular and excessively studied
"Prisoner's Dilemma", since it has the convenient characteristic of explicitly
writing communication out of it. I fear its popularity may blind us to the
fact that it wasn't ever really meant to be the focus of study of social
science, but more a simplified word problem for game theory. Studying a word
problem over and over may be like trying to understand the real world of train
transportation systems by repeatedly studying "A train leaves from Albuquerque
headed towards Boston at 1pm on Tuesday and a train leaves from Boston headed
towards Albuquerque at 3pm on Wednesday; when do they pass each other?"

(Or to put it _really_ simply in machine learning terms, what's the point of
trying to study cooperation in systems whose bias does not encompass
cooperation behaviors in the first place?)

~~~
titanomachy
Iterated prisoner's dilemma allows a sort of "communication". As the number of
iterations grows, the cost of losing each individual round becomes negligible
in the long run and agents can learn to use their decisions (COOPERATE or
DEFECT) as a binary communication channel. So instead of saying "let's
cooperate" over some side-channel, an agent indicates its intention to
cooperate by simply cooperating.

In the iterated prisoner's dilemma and other similar games, the "API" with which
agents interact with the world is extremely simple. The statement of the
problem is also very simple. The agent itself can be any computable algorithm
for deciding to cooperate or defect based on the past history of game rounds.
I find it interesting to see agents learn recognizable behaviours like
"communication" or "trade" when they aren't explicitly programmed to do those
things.

------
cs702
The folks at DeepMind continue to produce clever original work at an
astounding pace, with no signs of slowing down.

Whenever I think I've finally gotten a handle on the state-of-the-art in AI
research, they come up with something new that looks really interesting.

They're now training deep-reinforcement-learning agents to co-evolve in
increasingly complex settings, to see if, how, and when the agents learn
to cooperate (or not). Should they find that agents learn to behave in ways
that, say, contradict widely accepted economic theory, this line of work could
easily lead to a Nobel prize in Economics.

Very cool.

------
bitwize
Oh great.

It's just a matter of time before it floods the Enrichment Center with deadly
neurotoxin.

------
vanderZwan
You know, rather than being scared by this, I think it's an excellent
opportunity to learn how and when aggression evolves, and maybe learn how we
can set up systems that nudge people to collaborate, perhaps even when
resources are scarce.

------
katzgrau
The article at first suggests that more intelligent versions of AI led to
greed and sabotage.

But I do wonder if an even more intelligent AI (perhaps in a more complex
environment) would take the long view instead and find a reason to co-
habitate.

It's kind of like rock, paper, scissors: you attempt to think several levels
deeper than your opponent and guess which level _they_ stopped at. At some
intelligence level for the AI, cohabitation seems optimal; at the next level,
not so much, and so on.
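
That "which level did they stop at" game can be sketched as level-k reasoning;
a toy version (the level-0 choice of rock is arbitrary, just to show the
flip-flopping):

    # A level-k player best-responds to what a level-(k-1) player would play.
    BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

    def level_k_move(k, level0_move="rock"):
        move = level0_move
        for _ in range(k):
            move = BEATS[move]   # best response to the level below
        return move

    for k in range(5):
        print(k, level_k_move(k))   # rock, paper, scissors, rock, paper, ...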

We're probably going to end up building something so complex that we don't
quite understand it and end up hurting somebody.

------
jonbaer
Why is this done on such a small scale? I would have thought that with the
systems now in place, evolutionary game theory could be simulated on a much
larger scale (say 7bn+ agents) ... if anything, AI systems should be able to
determine whether certain strategies work (like blocking resources, as in
geopolitical theory) to see what cooperation occurs at that level. Still
amazing work, but it should be applied to a larger scale for real meaning. I'm
more eager to see how RL applied to RTS games will explore and develop
strategies than anything else.

------
george_ciobanu
"Scarce resources cause competition" and "Scarce but close to impossible to
catch on own resources cause cooperation". Is that really a discovery worth
publishing?

------
tawpKek
> Self-interested people often work together to achieve great things. Why
> should this be the case, when it is in their best interest to just care about
> their own wellbeing and disregard that of others?

I think this is kind of a strong statement to take as a given, especially as
an opening. This is taking social Darwinism as law, and could use more
scrutiny.

------
falsedan
Is it just me, or is this article extremely light on content? The core of it
seems to be

    > sequential social dilemmas, and us[ing] artificial agents trained by deep multi-agent reinforcement learning to study [them]

But I didn't find out how to recognise a sequential social dilemma, nor their
training method.

~~~
roymurdock
Here's the actual paper: [https://storage.googleapis.com/deepmind-
media/papers/multi-a...](https://storage.googleapis.com/deepmind-
media/papers/multi-agent-rl-in-ssd.pdf)

Don't expect any crazy deep insights, but it's a useful read if you want to
set up a similar experiment or understand the research methodology.

------
d--b
Mmmh, the problem of modeling social behavior is in defining the reward
function, not in implementing optimal strategies to maximize the reward.

In a game where you are given the choice of killing 10,000 people or being
killed yourself, which is the most rewarding outcome?
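
A crude sketch of what I mean (the reward functions and numbers are of course
made up): the optimizer is trivial, and everything interesting lives in how
the reward is defined.

    # Two hypothetical reward functions for the same "kill 10,000 or be killed" choice.
    def reward_selfish(i_survive, others_killed):
        return 1.0 if i_survive else 0.0

    def reward_utilitarian(i_survive, others_killed):
        return (1.0 if i_survive else 0.0) - others_killed

    # The same greedy agent picks whichever action its reward function prefers.
    ACTIONS = {"kill": (True, 10_000), "refuse": (False, 0)}

    def choose(reward):
        return max(ACTIONS, key=lambda a: reward(*ACTIONS[a]))

    print(choose(reward_selfish))      # kill
    print(choose(reward_utilitarian))  # refuse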

------
tmcpro
I wonder how DeepMind will simulate game theory as it advances.

~~~
mtanski
I imagine you'll be able to get answers to more complex games. Things like
combinations of N players, multiple stable states, different optima for
different players, external factors / stimuli. Answers by simulation rather
than proof.

------
c3534l
I know what I'm writing my systems science paper on.

------
bencollier49
What an awful headline. "AI learns to compete in competitive situations"
should be the precis.

Basically, it learned that it didn't need to fight until there was resource
scarcity in a simulation.

~~~
leereeves
I think "aggressive" is a better (more descriptive, narrower) word than
"compete" here.

Two racers are competing to see who runs faster, but if one pulls out a laser
gun and shoots the other, that's aggressive.

~~~
tutts
Not if laser guns are a part of the race. Then it's just competitive.

~~~
wukerplank
If the laser gun is not part of the race, it's cheating (at best). I don't
understand this nitpicking. Of course a gun is an aggressive tool.
"Aggressive" and "competitive" are not mutually exclusive.

~~~
slowmovintarget
It all depends on how the gun is modeled. If, as seems likely, it is as
simple as push-button-receive-bacon, then there are no consequences to pushing
the button.

It's difficult to characterize that as aggression, especially if the system
has no built-in notion of harm or other-like-me.

That is what is actually scarier: Violence as paperwork.

------
saycheese
This reads as click-bait; here's the original blog post and research paper by
DeepMind:

"Understanding Agent Cooperation"
[https://news.ycombinator.com/edit?id=13635218](https://news.ycombinator.com/edit?id=13635218)

~~~
dang
Thanks! We changed to that from [http://www.sciencealert.com/google-s-new-ai-
has-learned-to-b...](http://www.sciencealert.com/google-s-new-ai-has-learned-
to-become-highly-aggressive-in-stressful-situations).

------
doener
[https://news.ycombinator.com/item?id=13620518](https://news.ycombinator.com/item?id=13620518)

------
creo
Bait.

