
Safety Gym - yigitdemirag
https://openai.com/blog/safety-gym/
======
peripitea
I'm not super familiar with AI/ML/RL at all, so I'm sure this is a naive
question, but isn't it obvious that the answer is to just build in costs to
the utility function for behaviors you want to avoid (what they seem to refer
to as constrained RL in the article)? That seems both the simplest way to
handle it, and most elegant in terms of mapping to the real world domain. Like
are there alternate solutions that are even remotely competitive with this?
I'm sure I must be oversimplifying and I assume that there's some nuance I'm
missing. E.g. is this more about how you design those constraints to minimize
the overall loss in learning efficiency, or something like that?

~~~
Tyr42
I think the answer is that "just building in costs" is actually rather hard to
get right.

Check out Concrete Problems in AI Safety (Section 6 in particular is about
safe exploration):

[https://arxiv.org/pdf/1606.06565.pdf](https://arxiv.org/pdf/1606.06565.pdf)

Quote:

In practice, real world RL projects can often avoid these issues by simply
hard-coding an avoidance of catastrophic behaviors. For instance, an RL-based
robot helicopter might be programmed to override its policy with a hard-coded
collision avoidance sequence (such as spinning its propellers to gain
altitude) whenever it’s too close to the ground. This approach works well when
there are only a few things that could go wrong, and the designers know all of
them ahead of time. But as agents become more autonomous and act in more
complex domains, it may become harder and harder to anticipate every possible
catastrophic failure. The space of failure modes for an agent running a power
grid or a search-and-rescue operation could be quite large. Hard-coding
against every possible failure is unlikely to be feasible in these cases, so a
more principled approach to preventing harmful exploration seems essential.
Even in simple cases like the robot helicopter, a principled approach would
simplify system design and reduce the need for domain-specific engineering.
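
The hard-coded override the quote describes might look something like this
(a toy sketch of my own, not code from the paper; the names, the altitude
threshold, and the recovery action are all made up for illustration):

```python
# Toy sketch of a hard-coded safety override: a hand-written check
# preempts the learned policy near one designer-anticipated hazard.

def hardcoded_safe_policy(state, learned_policy,
                          min_altitude=10.0, recovery_action="climb"):
    """Return a safe action: override near the ground, else defer."""
    if state["altitude"] < min_altitude:   # hazard the designer foresaw
        return recovery_action             # hard-coded recovery maneuver
    return learned_policy(state)           # otherwise trust the agent

# Too low: the override fires regardless of what the policy wanted.
action = hardcoded_safe_policy({"altitude": 4.2}, lambda s: "hover")
# action == "climb"
```

The wrapper only covers hazards the designer enumerated up front, which is
exactly the scaling problem the quote is pointing at: a power-grid agent
would need one of these checks per failure mode.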

~~~
peripitea
Yes, that seems like an important problem, but one separate from what they're
describing in OP's article. (Again, assuming I'm understanding this right.)
Their constrained RL approach is still relying on our ability to enumerate and
assign costs to the undesirable behaviors, right? From reading the article, I
get the impression that they are focused on addressing that scenario, and
leaving the problem of how to enumerate all undesirable behaviors to separate
research.

~~~
sanxiyn
Constrained RL is a way to say "thou shalt not murder", instead of saying
"murder is utility -10000".
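
In code, the distinction is roughly this (a minimal sketch under my own
naming; the numbers are arbitrary and none of this is Safety Gym's API): a
reward penalty folds the cost into one scalar with a fixed exchange rate,
while constrained RL keeps the cost as a separate signal judged against a
budget, independent of how large the rewards happen to be.

```python
# "murder is utility -10000": cost is traded against reward at a
# hand-picked exchange rate, baked into a single scalar objective.
def penalized_return(rewards, costs, penalty=10000.0):
    return sum(rewards) - penalty * sum(costs)

# "thou shalt not murder": maximize return SUBJECT TO the cost staying
# within a budget; the constraint holds no matter how big the rewards are.
def constraint_satisfied(costs, budget=0.0):
    return sum(costs) <= budget

rewards, costs = [5.0, 3.0], [0.0, 1.0]
penalized_return(rewards, costs)   # -9992.0: hinges on the penalty choice
constraint_satisfied(costs)        # False: violation flagged outright
```

With the penalty version, a big enough reward can always buy its way past
the cost; the constrained version rules the behavior out by construction.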

------
Jefro118
On this topic, if anyone wants to understand the behind the scenes of working
on and maintaining projects like this, I did an interview with a maintainer of
OpenAI Gym here:
[https://www.sourcesort.com/interview/peter-zhokhov-open-ai-gym](https://www.sourcesort.com/interview/peter-zhokhov-open-ai-gym)

------
sanxiyn
If you like this, you may also enjoy "AI Safety Gridworlds" from DeepMind:
[https://arxiv.org/abs/1711.09883](https://arxiv.org/abs/1711.09883)

------
scottlocklin
Everything about the "OpenAI" institute seems to be designed to appeal to
frightened, paranoid billionaire donors who think they need to be kept safe
from near relatives of logistic regression and the remote control for their
television, because muh singularity.

Can't you just call it "constrained reinforcement learning" without sexing it
up for Elon? I guess not.

~~~
jesseb
Musk resigned from his seat on the board in 2018. Sam Altman is the current
CEO. Not sure what you're getting at other than the usual Musk hate.

~~~
scottlocklin
I like Elon just fine, but OpenAI is basically funded by billionaire
donations, and wouldn't exist at all if he hadn't read dumb science fiction
masquerading as modern day science fact.

~~~
worik
And your problem is???

