
Specification gaming: the flip side of AI ingenuity - EvgeniyZh
https://deepmind.com/blog/article/Specification-gaming-the-flip-side-of-AI-ingenuity
======
dmix
Don't miss the linked list of examples of machines optimizing for the
specification instead of the intended goal; it's great reading for any
software engineer:

[https://docs.google.com/spreadsheets/d/e/2PACX-1vRPiprOaC3Hs...](https://docs.google.com/spreadsheets/d/e/2PACX-1vRPiprOaC3HsCf5Tuum8bRfzYUiKLRqJmbOoC-32JorNdfyTiRRsR7Ea5eWtvsWzuxo8bjOxCG84dAg/pubhtml)

~~~
koala_man
I wrote a thesis on reinforcement learning in Warcraft II. Our system started
with zero knowledge and learned to beat the existing AIs about half the time.

It did this by sending its single starting unit to attack the opponent's
single unit, before either side could use it to build more.

Two identical, weak units fighting was just a coin flip.

~~~
chrisco255
Ah yes, the worker rush. I've seen this cheese in StarCraft as well.

------
mofeien
It is very difficult to safely define a loss function that adequately
represents not what you believe you want now, but rather the outcomes that
you would be happy with in hindsight once they occur. Especially if the rules
of the game are complex enough to allow for "creativity".

[https://www.lesswrong.com/posts/4ARaTpNX62uaL86j6/the-hidden...](https://www.lesswrong.com/posts/4ARaTpNX62uaL86j6/the-hidden-complexity-of-wishes)

------
iandanforth
"Within the scope of developing reinforcement learning (RL) algorithms, the
goal is to build agents that learn to achieve the given objective."

This is the fundamental problem. That is _not_ the goal. As long as people
believe that is the goal we will not advance.

Put another way, "the given objective" to date has always been wrong. All of
the desired behavior that we are trying to get machines to emulate is a tiny
subset of the learned behaviors of the capable agents (us).

Our primary objective is to survive. On a moment-to-moment basis our
objective is to fulfill a set of competing needs (hunger, tiredness, boredom,
etc.). It is the interplay of these multiple objectives with a complex
environment that necessitates the kind of intelligence we have.

And to be absolutely clear, the answer to "How do we faithfully capture the
human concept of a given task in a reward function?" is "You don't", and
trying to do so is harmful to the research effort.

The correct path is to evolve a reward _system_ which functions to increase
the frequency of behaviors that meet the needs of agents in the moment and
fulfill the general goal of survival. Then, through a curriculum of
environmental challenges*, allow those needs to be met only by accomplishing
certain tasks which have value, or are of interest, to us.

*Simulator fidelity / hacking will continue to be a problem, but it is orthogonal to the problem of RL. RL finds the bugs, you fix them.
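
A minimal sketch of what such a multi-need reward system might look like (the
need names, decay rates, and dynamics here are invented for illustration, not
a definitive design):

```python
# Hypothetical sketch: reward derived from several competing "needs"
# rather than a single task objective. All names and numbers invented.
class NeedBasedReward:
    def __init__(self):
        # Each need is a satisfaction level in [0, 1]; 1 = fully met.
        self.needs = {"energy": 1.0, "rest": 1.0, "novelty": 1.0}
        self.decay = {"energy": 0.01, "rest": 0.005, "novelty": 0.02}

    def step(self, replenished):
        # Needs decay every timestep; the environment replenishes some
        # of them depending on what the agent just did.
        for k in self.needs:
            self.needs[k] = max(0.0, self.needs[k] - self.decay[k])
        for k, amount in replenished.items():
            self.needs[k] = min(1.0, self.needs[k] + amount)
        # The most deprived need dominates the reward, so the agent is
        # forced to trade competing objectives off against each other.
        return min(self.needs.values())

reward_fn = NeedBasedReward()
r = reward_fn.step({"energy": 0.05})  # e.g. the agent just "ate"
```

The curriculum part would then live in the environment: arrange things so
that, say, "energy" can only be replenished by completing a task we care
about.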

~~~
perl4ever
"The correct path is to evolve a reward system"

Only if you like grey goo.

Or, in a similar vein, I remember reading about how some tree drops vast
numbers of acidic leaves, because its strategy for getting ahead in life is to
bury competing seedlings in toxic waste.

------
fxtentacle
TLDR: A faulty loss function leads to faulty results.

This article has wonderful real-world examples of AIs gaming the mathematical
problem specification created by the AI researcher: producing a world-class
score on a given dataset while being utterly unusable in the real world.

While it doesn't appear to be mentioned in the article, this is exactly why I
believe that AI will not have a democratizing effect.

Either you have the mathematical and statistical skills to describe your
problem criteria as a differentiable function, or you don't. In the first
case, you're already a good programmer, but AI might save you a bit of time
over trying to find the optimal parameters manually. In the second case, you
will not be able to ensure that the AI follows the implicit rules of the real
world, which means that you will produce something that scores extremely well
on its training data but does not generalize to the real world.
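
To make the "differentiable function" point concrete: you usually cannot run
gradient descent on the metric you actually care about (say, accuracy), only
on a differentiable surrogate (say, cross-entropy), and the gap between the
two is exactly where this failure mode lives. A toy sketch (the data is made
up):

```python
import numpy as np

# What you actually want: accuracy. It is a step function of the logits,
# hence non-differentiable, so you can't descend on it directly.
def accuracy(logits, labels):
    return float(np.mean(np.argmax(logits, axis=1) == labels))

# What you optimize instead: a differentiable proxy (cross-entropy).
# The model is trained to make THIS number small, not the one above,
# and the two can drift apart on real-world data.
def cross_entropy(logits, labels):
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-np.mean(log_probs[np.arange(len(labels)), labels]))

logits = np.array([[2.0, 0.5], [0.2, 1.0]])
labels = np.array([0, 1])
print(accuracy(logits, labels), cross_entropy(logits, labels))
```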

And that's also why if someone claims a new "state of the art result on
benchmark XY", you should mentally append " ... but it works nowhere else."

~~~
spinningslate
> TLDR: A faulty loss function leads to faulty results.

This. The use of "gaming" to describe the observed behaviour seems strange. To
"game" suggests some malfeasance in the algorithm; a deliberate effort to
subvert the intended goal, which in turn implies knowing what the "real" goal
was - and somehow deciding to shortcut it.

That isn't the case. Rather, the problem lies in appropriately specifying the
desired outcome, and any relevant constraints that must be obeyed. Indeed,
DeepMind has separately written about the "specification problem" elsewhere
[0].

That's in no way to downplay the difficulty in specification. It's one we've
wrestled with in software for decades.

> Either you have the mathematical and stochastic skills to describe your
> problem criteria as a differentiable function, or not.

Indeed. That statement could equally have been written ~30 years ago when Z,
VDM and the like were active areas of research. More recently, TLA+ has
gotten some traction, but it's still very much in the margins.

Perhaps ML will spur the next wave of specification formalisms. But it will
face the same challenge as its predecessors: how to make it accessible to
more than a (very) small set of sufficiently capable practitioners.

[0] [https://medium.com/@deepmindsafetyresearch/building-safe-art...](https://medium.com/@deepmindsafetyresearch/building-safe-artificial-intelligence-52f5f75058f1)

~~~
perl4ever
It seems to me that _saying_ the problem with AI is specifying the problem to
be solved is _admitting_ that you haven't even started to develop AI.

We could be just around the corner, but the fact that people are taking for
granted that there's no intelligence in AI isn't promising.

If you respond defensively about how difficult it is to give people what they
want, well, that's the whole point.

~~~
fxtentacle
I see myself more as the observer here. There are plenty of publications
claiming to exceed the "state of the art", as well as plenty of people
claiming to have found the holy grail. But if you use the resulting AIs with
real-world data, they usually fail.

For example, people were willing to declare optical flow "solved" five years
ago. Yet autonomous drones still crash into wires, glass, trees, snow, water,
etc., because their optical flow algorithms cannot handle these aspects of
reality (which were not present in the training data).

------
wetmore
Some more discussion on the difficulty of creating reward functions:
[https://www.alexirpan.com/2018/02/14/rl-hard.html#reinforcem...](https://www.alexirpan.com/2018/02/14/rl-hard.html#reinforcement-learning-usually-requires-a-reward-function)

~~~
_0ffh
Lol, that link sent me on a nearly one-hour detour and then sent me back to
HN (specifically, to [1])!

[1] [https://news.ycombinator.com/item?id=6269114](https://news.ycombinator.com/item?id=6269114)

------
m0netize
Isn't the right way to look at this problem called Mechanism Design?

[https://en.wikipedia.org/wiki/Mechanism_design](https://en.wikipedia.org/wiki/Mechanism_design)

The objective is designed as part of the game, but it's very difficult to get
the incentives right.

------
PeterStuer
Working on evolutionary robotics in the '80s and '90s, we used to refer to
these things as 'scaffolding problems'. For non-trivial systems, getting the
constraint specifications right (raising the right scaffolding) can become as
hard as handcrafting the solution would have been.

It is a pain point in every system that tries to solve a problem through
generative meta-heuristic search. Translating intention into a set of explicit
constraints that separate desired outcomes from undesired ones (those that
satisfy the letter of the constraints but not the intent) gets exponentially
harder as the dimensionality of the solution space grows.

------
klmadfejno
Why can't you just train one model to predict the future and another to play
the game, then have the game-playing model take the course that leads to the
least certain outcome until the predictor can comfortably identify game
states, at which point the reward function is simply proximity to a target
game state? It seems to me that the problem with trying to define a reward
function up front is that the AI has no clue how to describe game state, so
even if you do a good job you're just sidestepping the actual problem.
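
This is close to what the intrinsic-motivation / curiosity literature does:
use a learned forward model's uncertainty (e.g. disagreement across an
ensemble of predictors) as the exploration signal. A rough sketch, with the
environment and model interfaces left abstract (nothing here is a real
library API):

```python
import numpy as np

# Rough sketch of uncertainty-seeking exploration. `env` and the
# ensemble members are abstract placeholders for learned models.
def explore_step(env, state, actions, ensemble):
    def uncertainty(action):
        # Disagreement across ensemble members as an uncertainty proxy:
        # high variance means this part of the world is not yet learned.
        preds = np.stack([m.predict(state, action) for m in ensemble])
        return preds.var(axis=0).mean()

    # Take the action whose outcome the models are least sure about...
    action = max(actions, key=uncertainty)
    next_state = env.step(action)
    # ...then train every member on what actually happened, so this
    # region becomes "certain" and stops attracting visits.
    for m in ensemble:
        m.update(state, action, next_state)
    return next_state
```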

~~~
LolWolf
> leads to the least certain outcome

What's the loss function for this? How do you cross-validate this loss? The
only generally applicable methods I can think of (essentially some Bayesian-
type updating, which then uses Monte-Carlo-type methods for evaluating this
loss and "uncertainty") are so much more expensive than anything we do at the
moment that, at the current level of computing, we wouldn't even be able to
apply them in most cases.

The problem is, an AI model will tell you an "uncertainty", but these
"uncertainties" will often not reflect the true underlying probability that
it is wrong. This is, of course, all a bit silly, since it's hard knowing
what you don't know, and a proper update to this number is also not
well-defined unless you already have a bunch of samples of potential future
outcomes (via, as I mentioned before, a Monte-Carlo-type approach, which
would multiply the cost of something as simple as a single evaluation many
times over).

~~~
klmadfejno
In my mental model you don't need a loss function for this. It's just: given
these potential short-term goals, predict the future state of the game one
minute from now, then choose the short-term goal that maximizes uncertainty
about the future. The idea is that you have a game-playing bot that learns to
do "things" in the game world, and a game-state-encoding bot that learns to
describe the game world. I'm not suggesting you train the bot TO do the least
certain thing; I'm saying you train the bot BY doing the least certain thing,
and update your understanding of the world to be more certain about that
choice in the future.

The measure of uncertainty could be as simple as a multi-armed bandit where
the reward is the percentage of accurate guesses following that path.

The pseudo-adversarial aspect of this is that both models will learn to agree
on how choices translate to gameplay changes. The bot will not necessarily
learn to beat the game, but a StyleGAN kind of approach would probably be
just fine for saying "please get to a state that looks like this" with a few
manual samples. Perhaps all of this is to say: it's better to learn how to
play the game in general, and then specify that, in this particular run,
you'd like the bot to try to reach the game's end credits.

~~~
johnmoberg
Your last paragraph reminds me of Hindsight Experience Replay[1] where they
use goal-conditioned policies that learn how to reach essentially any state.
The idea is that, even if you don't reach your true goal, you've learned how
to reach some other goal, so you can successively learn to navigate the
environment until the end goal is easy. It's a really nice paper, highly
recommended!

[1] [https://arxiv.org/abs/1707.01495](https://arxiv.org/abs/1707.01495)
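
For a flavor of the trick: HER relabels failed trajectories so that the state
actually reached is treated as if it had been the goal. A simplified sketch
of the relabelling step (the transition/buffer interfaces and the distance
threshold are placeholders, not the paper's exact code):

```python
import numpy as np

# Sparse reward in the style of the HER paper: 0 on reaching the goal,
# -1 otherwise. `eps` is an illustrative threshold.
def reward(state, goal, eps=1e-3):
    return 0.0 if np.linalg.norm(np.asarray(state) - np.asarray(goal)) < eps else -1.0

def her_relabel(trajectory, goal, replay_buffer):
    achieved = trajectory[-1].next_state  # where the agent actually ended up
    for t in trajectory:
        # Original transition, judged against the real (possibly missed) goal.
        replay_buffer.append(
            (t.state, t.action, reward(t.next_state, goal), t.next_state, goal))
        # Hindsight transition: pretend `achieved` was the goal all along,
        # turning a failure into useful positive training signal.
        replay_buffer.append(
            (t.state, t.action, reward(t.next_state, achieved), t.next_state, achieved))
```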

------
elil17
The paperclip problem is the scenario in which you give an AI a small task
(like making paper clips) and the AI goes too far to achieve that goal
(enslaving all of humanity to make paper clips).

In the extreme case, specification gaming is the exact opposite: an AI will
do anything in its power to satisfy the reward function without having to do
any work, including tricking its creator into making the reward function
easier.

~~~
antpls
Oddly, that behavior looks very human. If the loopholes found by AI are
related to energy conservation, that's a healthy result.

Human students and workers need a fair amount of "work specification" (years
of teaching, decades of workplace laws and safety measures, etc.) to actually
complete the requested work that earns them their revenue/salary. I don't see
specification gaming as an AI-specific issue.

------
sgt101
For forty years AI focused on enabling humans to tell machines what to do in
a more compact, natural and fluid form. Then we had twenty years of ML, where
we let machines develop capabilities from data. Now it turns out that we are
back to enabling compact specifications of behavior.

Who knew?

~~~
seph-reed
Many journeys end where they began, but with a new perspective.

