
Learning from Human Preferences - darwhy
https://blog.openai.com/deep-reinforcement-learning-from-human-preferences/
======
moyix
Lying as an emergent phenomenon:

> Our algorithm’s performance is only as good as the human evaluator’s
> intuition about what behaviors _look_ correct, so if the human doesn’t have
> a good grasp of the task they may not offer as much helpful feedback.
> Relatedly, in some domains our system can result in agents adopting policies
> that trick the evaluators. For example, a robot which was supposed to grasp
> items instead positioned its manipulator in between the camera and the
> object so that it only _appeared_ to be grasping it, as shown below.
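For reference, the reward predictor in the paper is trained purely on those human judgments, via a Bradley-Terry-style cross-entropy loss over pairs of clips. A rough sketch (the network shape and the 16-dim per-timestep features here are made-up assumptions, not the paper's architecture):

    import torch
    import torch.nn as nn

    # Reward predictor: maps a per-timestep feature vector to a scalar
    # reward. Sizes are illustrative, not from the paper.
    reward_model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))

    def preference_loss(seg_a, seg_b, human_prefers_a):
        """seg_a, seg_b: (T, 16) tensors, one row per timestep.
        human_prefers_a: 1.0 if the evaluator picked clip A, else 0.0."""
        return_a = reward_model(seg_a).sum()  # predicted return of clip A
        return_b = reward_model(seg_b).sum()  # predicted return of clip B
        # Bradley-Terry model: P[A preferred] = sigmoid(return_a - return_b)
        logit = return_a - return_b
        return nn.functional.binary_cross_entropy_with_logits(
            logit, torch.tensor(human_prefers_a))

The point being that the loss only ever sees which clip the human picked, so a policy that merely _looks_ like it's grasping is, as far as the reward model is concerned, identical to one that actually grasps.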

~~~
metalliqaz
I think there is a case to be made that it's a feature, not a bug. Humans
carry so much baggage into everything we do. Is there any difference between a
person who has memorized the multiplication tables and one well trained in
using a calculator? In most situations there is not, though we seem to
personally value the memorization method despite its increased cost. The
principle scales.

What you've pointed out is a wonderful demonstration of our need to improve
measurement and testing, which humans are typically horrible at. For a good
example, just look at government under any party: policies are endlessly
argued over, when that time would be better spent developing a good test and
solid measurements for policies and laws.

~~~
euyyn
> developing a good test and solid measurements for policies and laws

Most regulations are closed-loop in that sense: they build in the means to
measure their own effectiveness and to be reevaluated accordingly.

------
nicklo
Super exciting to see on-par performance on RL tasks with dramatically less
supervision.

Really looking forward to a follow-up where they explore 2.2.4 further.
Sampling the examples that provide maximal information gain seems like it
could yield another huge reduction in the amount of human oversight necessary.
I could see an adversarial scheme learning to sample these examples optimally
from the manifold. This kind of thing is powerful in human learning of complex
tasks too: asking for clarification or feedback at specific points of
uncertainty.
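
If I recall correctly, the heuristic 2.2.4 actually describes is ensemble
disagreement: ask the human about the clip pair on which an ensemble of reward
predictors disagrees most. A rough sketch, assuming a hypothetical
reward-model interface that scores a segment per timestep:

    import torch

    def query_priority(seg_a, seg_b, reward_models):
        """Higher = ensemble more uncertain which clip the human prefers."""
        votes = []
        for model in reward_models:
            logit = model(seg_a).sum() - model(seg_b).sum()
            votes.append(torch.sigmoid(logit))  # P[A preferred] per member
        # Variance across members is a proxy for expected information.
        return torch.stack(votes).var(unbiased=False).item()

    def select_query(candidate_pairs, reward_models):
        # Send only the most contested pair to the human evaluator.
        return max(candidate_pairs,
                   key=lambda pair: query_priority(*pair, reward_models))

The paper itself notes that variance is only a proxy for the expected value of
information, which is exactly where a learned or adversarial sampler could
improve things.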

------
RepressedEmu
It's doing a frontflip, not a backflip, but I guess that's secondary to how
efficient this kind of human-AI collaboration might turn out to be. Similar in
vein to the human-AI "centaur" chess teams.

~~~
nategri
Extending the concept of 'front' and 'back' to a vertical pile of hotdogs is a
separate, but relevant, problem.

~~~
RepressedEmu
Good point. I guess I just anthropomorphized the pile of hotdogs the wrong
(mirrored) way compared to the researchers.

