
An AI learned to play hide-and-seek. Strategies it came up with were astounding - tobltobs
https://www.vox.com/future-perfect/2019/9/20/20872672/ai-learn-play-hide-and-seek
======
cynusx
These discussions of "unintended consequences of AI" always seem to ignore
that those behaviours emerge during the training phase of the models.
Generally you then take a trained model and deploy it into production, where
it doesn't learn any new behaviour.

You train them for a specific ability in a controlled environment that
reflects the deployment environment; after that, the deployed system no
longer has the ability to learn and will simply fail if the environment is
different.

Journalists in particular seem to be blissfully unaware of this distinction
and always cast "artificial intelligence" as following mechanics similar to
those of human brains.
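
The train/deploy distinction can be illustrated with a toy agent (a minimal sketch, not the hide-and-seek system itself; the two-state "button" environment, the Q-table, and all parameter values here are invented for illustration): values are updated only during training, and the deployed policy is a pure lookup that never changes.

```python
import random

random.seed(0)

# Toy environment: in each of two states, one of two buttons pays off.
# This stands in for "training in a controlled environment".
REWARDS = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0}

q = {s: [0.0, 0.0] for s in (0, 1)}  # value table, updated only in training

def act(state, explore):
    if explore and random.random() < 0.1:  # epsilon-greedy exploration
        return random.randrange(2)
    return max((0, 1), key=lambda a: q[state][a])

# --- Training phase: the value table is updated after every step ---
for _ in range(500):
    s = random.randrange(2)
    a = act(s, explore=True)
    q[s][a] += 0.5 * (REWARDS[(s, a)] - q[s][a])  # learning update

# --- Deployment phase: the policy is frozen; nothing is updated ---
def deployed_policy(state):
    return act(state, explore=False)  # pure lookup, no learning

print(deployed_policy(0), deployed_policy(1))  # → 0 1
```

If the deployment environment changes (say, the reward table flips), the frozen policy keeps replaying the old actions and simply fails, which is the point of the comment above.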

~~~
serioussecurity
NLP systems don't "just fail".

[https://www.npr.org/sections/thetwo-way/2017/01/10/508363607...](https://www.npr.org/sections/thetwo-way/2017/01/10/508363607/what-happened-when-dylann-roof-asked-google-for-information-about-race)

~~~
Isinlor
If you look for terms that are used by bad people and not discussed
otherwise, you will find what those bad people write. I don't see how someone
could expect Google to police, for example, topics related to "The Bell
Curve". Sam Harris recently had trouble dealing with that topic.

Should Google suggest "polish concentration camps" when you type "polish
concent"? I have no idea.

~~~
AstralStorm
It will search for what you typed and find that the top hit is "German
concentration camps in occupied Poland", with the second result about the
naming controversy.

That's the predictable behaviour of a simple, debiased statistical algorithm.
PageRank is not really an AI.

------
igammarays
Does anyone know whether these trained AI agents would use the same
strategies to win immediately on another map they have never seen before?
That is, would they get it right the first time on the new map (rather than
blundering about aimlessly and winning after millions of plays)?

~~~
QuadmasterXLII
They were given a new random map every iteration of the game, at least in the
second experiment.

------
thewhitetulip
I remember when the OpenAI team chose not to release a particular model
because they thought it was too dangerous.

Examples like this will force governments to think about how to regulate AI.

~~~
dr_zoidberg
And then (earlier this year) some independent researchers replicated the
model and found that it wasn't as good as OpenAI "feared", making the whole
thing look like a publicity stunt.

------
igammarays
Why don’t we have an AI programmer yet? “Write a sorting algorithm” is a well-
defined problem, and easy to test for correctness to a reasonable degree of
certainty. Why don’t we have machines finding us better sorting
implementations?
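
The idea in the question above ("easy to test for correctness") is the core of enumerative program synthesis: search a space of candidate programs and keep one that passes all the input/output examples. A minimal sketch, where the tiny DSL of list primitives and the descending-sort specification are both invented for illustration:

```python
from itertools import product

# A toy "program synthesizer": enumerate compositions of primitive list
# operations and return the first one consistent with every I/O example.
# Real synthesizers search vastly larger spaces, which is why the approach
# scales so poorly.
PRIMITIVES = {
    "reverse": lambda xs: list(reversed(xs)),
    "drop_first": lambda xs: xs[1:],
    "sort_asc": sorted,
    "double": lambda xs: [x * 2 for x in xs],
}

EXAMPLES = [  # specification: sort descending
    ([3, 1, 2], [3, 2, 1]),
    ([5, 4, 9], [9, 5, 4]),
]

def synthesize(max_len=3):
    for length in range(1, max_len + 1):
        for names in product(PRIMITIVES, repeat=length):
            def run(xs, names=names):
                for n in names:
                    xs = list(PRIMITIVES[n](xs))
                return xs
            if all(run(inp) == out for inp, out in EXAMPLES):
                return names, run
    return None, None

names, program = synthesize()
print(names)  # → ('sort_asc', 'reverse')
```

The search space grows exponentially in program length, which hints at why, as the reply below notes, practical program synthesis tends to work only in very narrow domains.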

~~~
blueboo
As Yann LeCun observed in a recent interview, program synthesis is a genre of
problem that neural nets seem to be horribly inefficient (bad) at. The
successful applications of state-of-the-art program synthesis are laughably
specific, as a recent HN post from Microsoft on the subject showed.

~~~
igammarays
Thanks. I was trying to research this idea at the back of my mind, but I
didn’t even know what to search for. I didn’t know the term “program
synthesis” existed.

------
NKosmatos
Original link with videos playing correctly [https://openai.com/blog/emergent-
tool-use/](https://openai.com/blog/emergent-tool-use/)

------
shultays
More like "the AI learned which buttons to press for a given input". Is it
still AI if all it is is a trained network that replays a set of button
presses optimized for a given input?

------
timdorr
Previous discussion:
[https://news.ycombinator.com/item?id=20996771](https://news.ycombinator.com/item?id=20996771)

------
mark-r
I wonder why the hiders never figured out the strategy of locking the seekers
in a box they couldn't escape from?

~~~
shultays
Probably harder to learn. It requires approaching seekers which ends bad if
you cant block them.

