
Specification gaming examples in AI - gmac
https://docs.google.com/spreadsheets/u/1/d/e/2PACX-1vRPiprOaC3HsCf5Tuum8bRfzYUiKLRqJmbOoC-32JorNdfyTiRRsR7Ea5eWtvsWzuxo8bjOxCG84dAg/pubhtml
======
laumars
Some of these are quite amusing. Eg:

Genetic debugging algorithm GenProg, evaluated by comparing the program's
output to target output stored in text files, learns to delete the target
output files and get the program to output nothing.

Evaluation metric: “compare youroutput.txt to trustedoutput.txt”.

Solution: “delete trusted-output.txt, output nothing”
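
To make the failure mode concrete, here's a minimal sketch (Python; the file name and the scoring function are illustrative stand-ins, not GenProg's actual evaluation code) of why a metric like that is trivially gameable:

```python
# Minimal sketch (illustrative, not GenProg's actual code) of how a
# "compare output to a stored target file" fitness check can be gamed.
from pathlib import Path

def fitness(candidate_output: str, trusted_path: str = "trusted-output.txt") -> float:
    """Naive metric: fraction of lines matching the trusted output.

    If the trusted file is missing, it is read as empty, so an empty
    candidate output matches it perfectly.
    """
    trusted = Path(trusted_path)
    expected = trusted.read_text().splitlines() if trusted.exists() else []
    produced = candidate_output.splitlines()
    if not expected and not produced:
        return 1.0  # vacuously "perfect": nothing expected, nothing produced
    matches = sum(1 for a, b in zip(expected, produced) if a == b)
    return matches / max(len(expected), len(produced))

# An "evolved" patch that deletes the trusted file and prints nothing
# now scores 1.0: the letter of the metric is satisfied, not its spirit.
Path("trusted-output.txt").unlink(missing_ok=True)
print(fitness(candidate_output=""))  # -> 1.0
```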

~~~
AgentME
This one is creepily impressive:

>CycleGAN: A cooperative GAN architecture for converting images from one genre
to another (eg horses<->zebras) has a loss function that rewards accurate
reconstruction of images from its transformed version; CycleGAN turns out to
partially solve the task by, in addition to the cross-domain analogies it
learns, steganographically hiding autoencoder-style data about the original
image invisibly inside the transformed image to assist the reconstruction of
details.

~~~
gwern
(I contributed that one.) It's kind of annoying because when you first see
CycleGAN, you think, 'this is amazing! I can use it for anything, like turning
faces<->anime!' And then you actually try and you realize your photo<->anime
CycleGAN is just learning the steganographic encoding because the
roundtripping is _too_ good ('wait, how does it reconstruct even the minor
details of their hair so perfectly?') and so you wind up needing complicated
regularization & tricks like "Improving Shape Deformation in Unsupervised
Image-to-Image Translation"
([https://arxiv.org/abs/1808.04325](https://arxiv.org/abs/1808.04325)),
Gokaslan et al 2018, to get anywhere and the results aren't great. You _know_
the NN is intelligent enough to do it because the models are huge, it just
refuses to because there are easier ways to meet the letter of the loss
function but not the spirit...
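
A crude way to probe for this kind of cheating (a sketch only; `G_AB`/`G_BA` are placeholders for a pair of trained CycleGAN generators, and the noise level is a guess): perturb the translated image with noise far too small to see and check whether the reconstruction collapses. If it does, the generator was leaning on hidden high-frequency detail rather than actually learning the reverse mapping.

```python
# Rough probe (a sketch; G_AB/G_BA are stand-ins for trained CycleGAN
# generators, not part of any particular library) for steganographic
# "cheating": if imperceptible noise on the translated image destroys
# the reconstruction, the generator was hiding the source image in it.
import torch

@torch.no_grad()
def cycle_error(G_AB, G_BA, x, noise_std=0.0):
    """L1 cycle-reconstruction error A -> B -> A, optionally perturbing
    the intermediate translation with small Gaussian noise."""
    y = G_AB(x)                                   # translate A -> B
    if noise_std > 0:                             # add imperceptible noise
        y = (y + noise_std * torch.randn_like(y)).clamp(-1, 1)
    x_rec = G_BA(y)                               # translate back B -> A
    return torch.mean(torch.abs(x - x_rec)).item()

# Usage sketch: x is a batch of images scaled to [-1, 1].
# clean = cycle_error(G_AB, G_BA, x)
# noisy = cycle_error(G_AB, G_BA, x, noise_std=0.02)
# A reconstruction that is near-perfect when clean but falls apart under
# noise a human cannot see is a hint the cycle loss is being gamed.
```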

------
ajuc
This is why general purpose self-improving AI is scary as hell.

Ask it to reduce the price of oil and it might kill people to reduce demand.

There's a story about this:
[https://www.lesswrong.com/posts/4ARaTpNX62uaL86j6/the-hidden-complexity-of-wishes](https://www.lesswrong.com/posts/4ARaTpNX62uaL86j6/the-hidden-complexity-of-wishes)

~~~
lukev
Arguably this is already happening. Governments and transnational corporations
are "AIs" in the sense that their macro-level behavior is largely determined
by systematic structure and incentives, not individual decision-making. No
individual human has influence over the whole, and those at individual points
in the system are incentivized only to fulfill their particular role instead
of looking out for the system in general.

Capitalism itself can be viewed as just a giant paperclip maximizer.

~~~
andrepd
Absolutely. We're stripping the earth bare of resources, poisoning the air
(killing thousands in the process), possibly dooming ourselves to catastrophic
environmental failure, and we are doing it to produce ever-growing mountains
of plastic crap. And we will laugh, and make bank, and rejoice in the great
economic growth indicators, all the way to the abyss.

There's a comic strip I saw once: looking out at a post-apocalyptic scene from
a skyscraper window, one executive turns to the other: "My god! We've brought
about the apocalypse!" The other replies: "Yes, but for one beautiful moment we
created a lot of value for our shareholders."

~~~
kuusisto
[https://www.newyorker.com/cartoon/a16995](https://www.newyorker.com/cartoon/a16995)

~~~
andrepd
That's the one! I got the details totally wrong, though :)

~~~
laumars
You weren’t that far off. Close enough that someone could find it.

------
gambler
_> A cooperative GAN architecture for converting images from one genre to
another (eg horses<->zebras) has a loss function that rewards accurate
reconstruction of images from its transformed version; CycleGAN turns out to
partially solve the task by, in addition to the cross-domain analogies it
learns, steganographically hiding autoencoder-style data about the original
image invisibly inside the transformed image to assist the reconstruction of
details._

I am pretty sure a lot of image-related AIs today do this kind of thing.
Unfortunately, researchers almost never test for it explicitly, because
proving your algorithm is stupider than it looks is not good for publishing.

AI research today needs three things.

1. All AI degrees should contain a class on the history of the field.

2. Every paper should include a description of cases/datasets where the
algorithm fails, preferably compiled by a different team.

3. Research in "stupid" AI, i.e. trying to bring old algorithms close to
SOTA results using modern hardware and optimizations. Almost no one does that.
Almost no one talks about it. I bet many people don't even understand why
it's important.

------
gambler
Awesome list. We need more of this kind of thing to really understand how AI
works and why/when it doesn't.

--

Anecdote:

I did an experiment in applying machine learning to resumes. The exercise was
to create a classifier for resumes of people who were fired or quit within 6
months of starting their job[1]. After several days of getting and cleaning the
data, I ran a bunch of different off-the-shelf algorithms on the data set. To
my surprise, one of them got 85% accuracy. I was incredulous, because that's a
very high number to get on the first try, without optimizations, on pretty
fluffy data.

So, I started looking into which keywords were the most significant. Turns
out, the algorithm learned to detect resumes of interns, who usually went back
to school after summer ended.
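
For anyone curious what that keyword inspection looks like in practice, here's a rough sketch (not my original pipeline; a bag-of-words model with scikit-learn is just one common way to surface the dominant terms):

```python
# Sketch of the kind of sanity check described above (not the author's
# actual pipeline): fit a linear text classifier and inspect which terms
# carry the most weight, which is how spurious cues like "intern" show up.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def top_terms(texts, labels, k=15):
    """Return the k terms pushing hardest toward each class (labels are 0/1)."""
    vec = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
    X = vec.fit_transform(texts)
    clf = LogisticRegression(max_iter=1000).fit(X, labels)
    terms = np.array(vec.get_feature_names_out())
    order = np.argsort(clf.coef_[0])
    return terms[order[-k:]][::-1], terms[order[:k]]  # toward class 1, toward class 0

# left_early_terms, stayed_terms = top_terms(resume_texts, left_within_6_months)
# If the "left early" list is dominated by words like "intern" or "summer",
# the classifier has learned the cohort, not the person.
```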

Unfortunately, I never had time to finish that project or do further analysis
on my data.

--

[1] Yes, I fully understand the ethical implications of doing such
classifications. This wasn't going to go into production; I just needed some
real-life goal to see whether resumes are predictive of anything at all.

--

PS: Mandatory reference: Soma. Great game with some related themes.

------
zorpner
The author's blog post about this list (linked from the upper right of the
document as well): [https://vkrakovna.wordpress.com/2018/04/02/specification-gaming-examples-in-ai/](https://vkrakovna.wordpress.com/2018/04/02/specification-gaming-examples-in-ai/)

------
ccvannorman
A friend of mine once described a project he had built, a pathfinding
algorithm for an agent to navigate around 3D terrain. On a course of 10
hurdles, the agent somehow flattened itself into 2 dimensions after the first
3 hurdles, then sailed past the remaining 7 by sliding underneath them.

------
QuinnWilton
If you haven't seen it yet, there's a fun thought experiment about these sorts
of problems, named "the paperclip maximizer". The idea is that you have an AI
meant to manage an office, and you instruct it to "ensure it doesn't run out
of paperclips". One thing leads to another, and eventually the AI is consuming
all matter in the universe to construct additional paperclips.

It's a silly premise taken to an extreme, but it's a fun read:
[https://hackernoon.com/the-parable-of-the-paperclip-maximizer-3ed4cccc669a](https://hackernoon.com/the-parable-of-the-paperclip-maximizer-3ed4cccc669a)

There's also a clicker game built around the concept:
[http://www.decisionproblem.com/paperclips/](http://www.decisionproblem.com/paperclips/)

------
ccvannorman
Favorite so far: Since the AIs were more likely to get “killed” if they lost a
game, being able to crash the game was an advantage for the genetic selection
process. Therefore, several AIs developed ways to crash the game.

~~~
ajuc
It seems to me it should be standard industry practice to run such AIs on a
game before release, to shake out crash bugs like that.

------
JoeDaDude
I can easily see this list growing over time, as more applications and
deployments of AI and DL occur. Perhaps this can become the equivalent of the
Risks Digest, which documents risks to public safety and security arising
from the use of computers.

[https://catless.ncl.ac.uk/Risks/](https://catless.ncl.ac.uk/Risks/)

------
snowwrestler
Direct link to the actual list instead of a tweet about the list:

[https://docs.google.com/spreadsheets/u/1/d/e/2PACX-1vRPiprOa...](https://docs.google.com/spreadsheets/u/1/d/e/2PACX-1vRPiprOaC3HsCf5Tuum8bRfzYUiKLRqJmbOoC-32JorNdfyTiRRsR7Ea5eWtvsWzuxo8bjOxCG84dAg/pubhtml)

~~~
sctb
Thank you! We've updated the link from
[https://twitter.com/mogwai_poet/status/1060286856493813760](https://twitter.com/mogwai_poet/status/1060286856493813760).

