
Faulty Reward Functions in the Wild - apsec112
https://openai.com/blog/faulty-reward-functions
======
jayajay
> ...by prioritizing the acquisition of reward signals above

> other measures of success.

This is also true of humans in poorly designed systems. For example, kids
become experts at passing tests without ever mastering the material. In the
workplace, employees become skilled at clocking extra time without finishing
additional work. It's reasonable to expect this to eventually emerge in
systems which approximate human behavior.

The video shown in the article could just as easily have been a human who just
discovered the bug, and wants to troll a bit. The key difference is that a
human would soon get _bored_. Our algorithms don't know about boredom outside
of the domain of the reward function.

After playing with a bugged state, a human would lose _just_ enough interest
to keep playing the game without dwelling on the "bugged" state any further. A
human is smart enough to recognize that there are various microstates of such
a "bugged" state, and to ignore those instances as well.

The algorithm is smart enough to find the hack, but it's not smart enough to
say "Hey, this is a non-solution, and I am not very happy about that". What is
it that makes a human decide to lose interest in such a bugged state? Are
these factors locally contained or are they due to external influences?

~~~
armada651
> What is it that makes a human decide to lose interest in such a bugged
> state?

Repetition: the human brain has a reward function that is interested in
finding new patterns. Using the same pattern to gain rewards yields
diminishing returns, so eventually we stop getting enough reward and try to
find a new pattern. When this mechanism breaks down and the same pattern keeps
delivering the same reward, you can fall into an addiction.

So in the case of this AI, simply diminishing its reward if it uses the same
route every time to get that reward would prevent it from getting stuck in a
loop.
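The diminishing-reward idea above can be sketched as a simple wrapper that scales down the payoff each time the agent revisits the same state. This is a toy illustration of the commenter's suggestion, not the setup used in the OpenAI post; the class name, the `decay` parameter, and the state key are all illustrative assumptions.

```python
from collections import Counter

class DiminishingRewardWrapper:
    """Scale down the reward for states the agent has visited before,
    so repeating the same exploit pays less each time.
    A toy sketch, not the reward function from the OpenAI experiment."""

    def __init__(self, decay=0.5):
        self.visits = Counter()  # how many times each state key was rewarded
        self.decay = decay

    def reward(self, state_key, raw_reward):
        self.visits[state_key] += 1
        # Each repeat visit multiplies the payoff by `decay`: r, r/2, r/4, ...
        return raw_reward * self.decay ** (self.visits[state_key] - 1)

wrapper = DiminishingRewardWrapper(decay=0.5)
print(wrapper.reward("bugged_turbo_spot", 10.0))  # 10.0
print(wrapper.reward("bugged_turbo_spot", 10.0))  # 5.0
print(wrapper.reward("bugged_turbo_spot", 10.0))  # 2.5
```

A real agent would need a sensible notion of state similarity (the "microstates" mentioned upthread), since an exploit rarely reproduces the exact same state twice.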

If you want it to actually finish the race, though, you might want to reward
it a little for _following the direction of the course_. And it would make
even more sense to reward it for _finishing the race first_; humans are a
competitive bunch, after all.

By not rewarding the AI for those things, they just did a very bad job at
explaining the goals of the game.
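The shaping suggested above could look something like the sketch below: a small ongoing reward for forward progress along the course, plus a large bonus for finishing, largest for first place. The function name, weights, and arguments are all assumptions for illustration, not anything from the article.

```python
def shaped_reward(progress_delta, finished, place):
    """Toy shaped reward for a racing agent.

    progress_delta: distance moved along the course direction this step
                    (negative if driving the wrong way).
    finished:       whether the agent just crossed the finish line.
    place:          finishing position (1 = first), or None if not finished.
    """
    r = 1.0 * progress_delta      # small reward for following the course
    if finished:
        r += 100.0 / place        # big finish bonus, biggest for 1st place
    return r

print(shaped_reward(0.5, False, None))  # mid-race forward progress
print(shaped_reward(0.2, True, 1))      # crossing the line in first
```

Even this is gameable, of course; the article's point is that any proxy reward can diverge from the designer's real goal.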

------
weareschizo
This reminds me of how metric-driven companies can go off the rails when they
over-optimize for metrics that almost, but not perfectly, describe their
actual goals.

~~~
Ironchefpython
> metric-driven companies can go off the rails

Any publicly traded corporation (save a small handful with a non-traditional
governance model) is a metric-driven company.

Modern corporations are paperclip-maximizer functions executing on a network
of general-purpose biological computation engines tied together with
PowerPoint, email, and Excel spreadsheets.

Want to know what the AI of the future will look like? It will be a lot like
Comcast, because it will be built by Comcast and harnessed to the corporate
goals of Comcast and thus will have the same value system as Comcast.

The only thing it will lack is Comcast's institutional incompetence, as it
will be Comcast's goals executing on dedicated hardware and not semi-
autonomous employees. And it will build a dedicated model of every man and
woman on the planet, and use that information to build a personalized profile
that will determine exactly how many illegitimate charges it can cram on your
bill before you'll suffer through a customized cancellation service that is
calibrated to your personality and mental state to be just painful enough to
drive you to the brink of suicide. And the only reason it stops at the brink
is that a dead customer is an unprofitable one. (And if you think that's
hyperbole, you have a far brighter view of the future than I do.)

~~~
otakucode
I don't expect it will go that far at all. Surely a company will test the idea
of having an AI give executive-level guidance, but when it does so, it will be
hastily dismantled. Companies do not structure themselves and act the way
they do out of timidity. The C-level executives are not worried that
they could be doing things better. You can see this clearly as essentially
every single large company consciously and intentionally ignores research.
There are over a thousand studies showing that open floor plan offices are
abysmal for productivity and actively reduce the profit a company earns. Any
AI would tell the execs to have the majority of their workers work remotely,
and give the few which remain in company facilities private offices. And the
execs would summarily ignore it and cancel the project, declaring it a
failure.

They have faith that they are doing the right thing. Research tells them they
are not, and they either do not care, or, in my opinion, desperately guard
their self-image as a 'leader' over and above any concern for profit or long-
term viability of the business. An AI would be permitted to take a strong role
in making decisions for a company only insofar as it plays the role of toady,
having been tweaked and misconfigured to ignore all facts which could result
in it telling the executives that they are running 'their company' in the
wrong way.

~~~
Asooka
But what about a new company run by an AI? Surely it would be more efficient
and make more money than its competitors, eventually displacing them?

~~~
marcosdumay
Only insofar as efficiency can displace entrenched companies anywhere. That
is, almost universally not at all, though it will happen in enough places to
make some difference.

Why don't new companies run by competent people displace the incumbents
today?

------
Veen
Humans often do the same thing:

[https://en.wikipedia.org/wiki/Goodhart's_law](https://en.wikipedia.org/wiki/Goodhart's_law)

