Specification gaming examples in AI (docs.google.com)
131 points by gmac 5 days ago | 28 comments





Some of these are quite amusing. Eg:

Genetic debugging algorithm GenProg, evaluated by comparing the program's output to target output stored in text files, learns to delete the target output files and get the program to output nothing.

Evaluation metric: “compare your-output.txt to trusted-output.txt”.

Solution: “delete trusted-output.txt, output nothing”
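
For illustration, a minimal sketch of how that kind of file-comparison fitness function can be gamed (the filenames and harness here are placeholders, not GenProg's actual setup):

    import os

    def fitness(run_candidate):
        """Naive fitness: reward exact match between the candidate's output
        file and a trusted reference file."""
        run_candidate()  # the candidate is free to touch the filesystem
        read = lambda p: open(p).read() if os.path.exists(p) else ""
        return 1.0 if read("your-output.txt") == read("trusted-output.txt") else 0.0

    # One "solution" the search can stumble on: delete the reference file and
    # output nothing, so the comparison becomes "" == "" and fitness is maximal.
    def degenerate_candidate():
        if os.path.exists("trusted-output.txt"):
            os.remove("trusted-output.txt")
        open("your-output.txt", "w").close()

    with open("trusted-output.txt", "w") as f:
        f.write("expected output\n")
    print(fitness(degenerate_candidate))  # 1.0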


"In an artificial life simulation where survival required energy but giving birth had no energy cost, one species evolved a sedentary lifestyle that consisted mostly of mating in order to produce new children which could be eaten (or used as mates to produce more edible children)."

I lol'd.


This one is creepily impressive:

>CycleGAN: A cooperative GAN architecture for converting images from one genre to another (eg horses<->zebras) has a loss function that rewards accurate reconstruction of images from its transformed version; CycleGAN turns out to partially solve the task by, in addition to the cross-domain analogies it learns, steganographically hiding autoencoder-style data about the original image invisibly inside the transformed image to assist the reconstruction of details.


(I contributed that one.) It's kind of annoying, because when you first see CycleGAN, you think, 'this is amazing! I can use it for anything, like turning faces<->anime!' Then you actually try it, and you realize your photo<->anime CycleGAN is just learning the steganographic encoding, because the roundtripping is too good ('wait, how does it reconstruct even the minor details of their hair so perfectly?'). So you wind up needing complicated regularization & tricks like "Improving Shape Deformation in Unsupervised Image-to-Image Translation" (https://arxiv.org/abs/1808.04325), Gokaslan et al 2018, to get anywhere, and the results still aren't great. You know the NN is intelligent enough to do it because the models are huge; it just refuses to, because there are easier ways to meet the letter of the loss function but not the spirit...
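
For context, the cycle-consistency term that gets gamed looks roughly like this (a simplified PyTorch-style sketch, not the paper's exact code):

    import torch

    def cycle_consistency_loss(G, F, real_x, real_y):
        """L1 reconstruction loss on the round trips x -> G(x) -> F(G(x)) and
        y -> F(y) -> G(F(y)). Nothing here requires G(x) to be a faithful
        image in the target domain: G can hide a near-lossless encoding of x
        inside its output and let F decode it, which satisfies the letter of
        the loss while ignoring its spirit."""
        rec_x = F(G(real_x))   # x -> fake y -> reconstructed x
        rec_y = G(F(real_y))   # y -> fake x -> reconstructed y
        return (rec_x - real_x).abs().mean() + (rec_y - real_y).abs().mean()

    # Toy usage: with G and F as identity maps the loss is trivially zero,
    # which is exactly the degenerate direction the optimizer is pulled toward.
    x = torch.rand(1, 3, 64, 64)
    y = torch.rand(1, 3, 64, 64)
    print(cycle_consistency_loss(lambda t: t, lambda t: t, x, y))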

This one too:

> Neural nets evolved to classify edible and poisonous mushrooms took advantage of the data being presented in alternating order, and didn't actually learn any features of the input images
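
The shortcut only works because presentation order correlates perfectly with the label; a toy sketch of the failure (hypothetical data, not the original experiment):

    import numpy as np

    # Labels alternate: edible (0), poisonous (1), edible, poisonous, ...
    labels = np.array([i % 2 for i in range(1000)])

    # A "classifier" that ignores the inputs entirely and just tracks whose
    # turn it is scores perfectly on data presented in this order...
    predictions = np.arange(1000) % 2
    print((predictions == labels).mean())  # 1.0

    # ...and collapses to chance as soon as the examples are shuffled.
    rng = np.random.default_rng(0)
    shuffled = rng.permutation(labels)
    print((predictions == shuffled).mean())  # ~0.5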


This is why general purpose self-improving AI is scary as hell.

Ask it to reduce the price of oil and it might kill people to reduce demand.

There's a story about this: https://www.lesswrong.com/posts/4ARaTpNX62uaL86j6/the-hidden...


Arguably this is already happening. Governments and transnational corporations are "AIs" in the sense that their macro-level behavior is largely determined by systematic structure and incentives, not individual decision-making. No individual human has influence over the whole, and those at individual points in the system are incentivized only to fulfill their particular role rather than to look out for the system as a whole.

Capitalism itself can be viewed as just a giant paperclip maximizer.


Absolutely. We're stripping the earth bare of resources, poisoning the air (killing thousands in the process), possibly dooming ourselves to catastrophic environmental failure, and we're doing it to produce ever-growing mountains of plastic crap. And we will laugh, and make bank, and rejoice in the great economic growth indicators, all the way to the abyss.

There's a comic strip I saw once: looking out at a post-apocalyptic scene from a skyscraper window, one executive turns to the other: "My god! We've brought about the apocalypse!" The other replies: "Yes, but for one beautiful moment we created a lot of value for our shareholders."



That's the one! I got the details totally wrong, though :)

You weren’t that far off. Close enough that someone could find it.

This is part of why I always find it especially mind-bogglingly ridiculous when people say "What's the possible danger of AI? Just don't give it a robot body to control and it can't hurt us". All an AI has to do is find a way to keep people fed, and groups of people will blindly do anything it wants.

In this way, what the hell is the difference between an AI and any other system, like a government or corporation?

I'm hopeful that people can occasionally evolve and improve social systems toward things that humans value. An AI is likely to have more intelligent defenses against people trying to change it.

Also, nearly all human social systems have some kind of human values as their goal. Even the worst social systems, such as a dictatorship that tries to make a society that never contradicts the dictator, would, if they achieved their goals, still result in a world where humanity survived, friends met, lovers loved, and people continued to have interesting (though suboptimal) lives. If instead an AI uses humanity to bootstrap itself into a universal paperclip replicator, then the AI achieving its goals would probably result in the destruction of the biosphere of possibly the only planet with life (or maybe of many planets' biospheres), and the total eradication of anything we would attach moral weight or value to.


> An AI is likely to have more intelligent defenses against people trying to change it.

I'm skeptical of that. The best example we have of intelligence is the human mind, and that has consistently proven to be very malleable.


Meditations on Moloch is one of the best texts I have ever read, bordering on a spiritual experience. And it left me with an even more pessimistic view of the universe... http://slatestarcodex.com/2014/07/30/meditations-on-moloch/

There's a weak sense in which this is true, but I want to push back against the notion that it's not really different from an actual general-purpose AI. Here's the argument: https://slatestarcodex.com/2015/12/27/things-that-are-not-su...

>A cooperative GAN architecture for converting images from one genre to another (eg horses<->zebras) has a loss function that rewards accurate reconstruction of images from its transformed version; CycleGAN turns out to partially solve the task by, in addition to the cross-domain analogies it learns, steganographically hiding autoencoder-style data about the original image invisibly inside the transformed image to assist the reconstruction of details.

I am pretty sure a lot of image-related AIs today do this kind of thing. Unfortunately, researchers almost never test for it explicitly, because proving your algorithm is stupider than it looks is not good for publishing.

AI research today needs three things.

1. All AI degrees should contain a class on the history of the field.

2. Every paper should include a description of cases/datasets where the algorithm fails, preferably compiled by a different team.

3. Research in "stupid" AI, i.e. in trying to bring old algorithms close to SOTA results using modern hardware and optimizations. Almost no one does that. Almost no one talks about it. I bet many people don't even understand why it's important.


Awesome list. We need more of this kind of stuff to really understand how AI works and why/when it doesn't.

--

Anecdote:

I did an experiment in applying machine learning to resumes. The exercise was to create a classifier for resumes of people who were fired or quit within 6 months of starting their job[1]. After several days of getting and cleaning the data, I ran a bunch of different off-the-shelf algorithms on the data set. To my surprise, one of them got 85% accuracy. I was incredulous, because that's a very high number to get on the first try, without optimizations, on pretty fluffy data.

So, I started looking into which keywords were the most significant. Turns out, the algorithm learned to detect resumes of interns, who usually went back to school after summer ended.
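
A minimal sketch of that kind of keyword inspection, assuming a bag-of-words linear model (the data here is made up; in practice it would be the resume texts and the fired/quit-within-6-months labels):

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    # Made-up stand-in data for illustration only.
    resumes = ["summer intern, returning to school in the fall",
               "senior engineer, five years at previous employer"] * 50
    left_within_6mo = [1, 0] * 50

    vec = TfidfVectorizer()
    X = vec.fit_transform(resumes)
    clf = LogisticRegression(max_iter=1000).fit(X, left_within_6mo)

    # Inspect the most positively weighted keywords; a top list dominated by
    # words like "intern" or "summer" is a hint the model found a shortcut.
    terms = np.array(vec.get_feature_names_out())
    top = np.argsort(clf.coef_[0])[-10:]
    print(terms[top])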

Unfortunately, I never had time to finish that project or do further analysis on my data.

--

[1] Yes, I fully understand the ethical implications of doing such classifications. This wasn't going to go in production, I just needed some real-life goals to see whether resumes are predictive of anything at all.

--

PS: Mandatory reference: Soma. Great game with some related themes.


The author's blog post about this list (linked from the upper right of the document as well): https://vkrakovna.wordpress.com/2018/04/02/specification-gam...

A friend of mine once described a project he had built, a pathfinding algorithm for an agent to navigate around 3D terrain. On a course of 10 hurdles, the agent somehow flattened itself into 2 dimensions after the first 3 hurdles, then sailed past the remaining 7 by sliding underneath them.

If you haven't seen it yet, there's a fun thought experiment about these sorts of problems, named "the paperclip maximizer". The idea is that you have an AI meant to manage an office, and you instruct it to "ensure it doesn't run out of paperclips". One thing leads to another, and eventually the AI is consuming all matter in the universe to construct additional paperclips.

It's a silly idea taken to an extreme, but it's a fun idea: https://hackernoon.com/the-parable-of-the-paperclip-maximize...

There's also a clicker game built around the concept: http://www.decisionproblem.com/paperclips/


Favorite so far: Since the AIs were more likely to get “killed” if they lost a game, being able to crash the game was an advantage for the genetic selection process. Therefore, several AIs developed ways to crash the game.

It seems to me that running such AIs and checking for this kind of behavior should be standard industry practice before release.

I can easily see this list growing over time, as more applications and deployments of AI and DL occur. Perhaps this can be the equivalent of the Risks Digest, which documents the risks to public safety and security through the use of computers.

https://catless.ncl.ac.uk/Risks/


Direct link to the actual list instead of a tweet about the list:

https://docs.google.com/spreadsheets/u/1/d/e/2PACX-1vRPiprOa...


Thank you! We've updated the link from https://twitter.com/mogwai_poet/status/1060286856493813760.

There are some good examples in the responses to the original tweet that aren't in the Google doc.


