
Concrete AI Safety Problems - aston
https://openai.com/blog/concrete-ai-safety-problems/
======
lacker
This sort of "internal" approach to AI safety, where you attempt to build
fundamental limits into the AI itself, seems like it is easily thwarted by
someone who intentionally builds an AI without these safety mechanisms. As
long as AI is an open technology there will always be some criminals who just
want to see the world burn.

IMO, a better approach to AI safety research is to focus on securing the first
channels that a malicious AI would be likely to exploit, like spam and
security. Can you make communications spam-resistant? Can you make an
unhackable internet service?

Those seem hard, but more plausible than the "Watch out for paperclip
optimizers" approach to AI safety. It just feels like inventing a way to build
a nuclear weapon that can't actually explode, and then hoping the problem of
nuclear war is solved.

~~~
arcanus
Excellent post, I rather agree.

> As long as AI is an open technology there will always be some criminals who
> just want to see the world burn.

Thankfully we have yet to see terror attacks with nuclear weapons, but the
lower barrier of entry for potentially catastrophic AIs is undoubtedly
alarming.

I often wonder whether analog solutions will be critical. To continue with
your example, a nuke that must be armed with a physical lever would be a
rather great hindrance to any AI. Internet communications, etc. will be
trickier, due to the high-frequency activity, but having meatspace 'firewalls'
on mission-critical activities is a nice short-term kludge.

~~~
daveguy
Low barrier to entry for catastrophic AI? Surely you aren't talking about the
current state of AI. The current barrier to entry for an AGI (which is _at
least_ what it would take for catastrophe) is "does not exist". That's about
as big as barriers get. Even when the first AGIs show up, it will take
dedicated supercomputers running full time to perform like a single human
brain. I don't see that as a particularly low barrier.

~~~
pixl97
Eh. The barrier to entry for an AGI does exist, though it is currently
undefined, since we don't know what it is. The reason I say that is that there
are at least 7 billion general intelligences running around this planet (and
many more if you consider animal intelligences). It is important to define it
that way: not that it is impossible, just that it is unknown how much effort
is needed to create an artificial one.

This distinction is very important when comparing the threat of AI with other
significant threats. Before nuclear bombs were built, we could not tell you
what the difficulty was in creating one. Now that difficulty is well defined,
and we can use that knowledge to prevent them from being built by all but the
most well-funded nations.

If the barrier to entry for AGI (and then ASI) is lower than we expect, then
the threat of AI is significantly different than if AGI/ASI can only be
created by nation-states.

~~~
argonaut
The barrier to entry for an alien invasion does exist, though it is currently
undefined, since we don't know what it is. The reason I say that is that there
is at least 1 bloodthirsty species running around this galaxy (and many more
if you consider the statistical possibility of life on other planets). It is
important to define it that way: not that it is impossible, just that it is
unknown how much time is needed before an alien invasion.

The reason I am framing things this way is we need to be _very_ careful here
because we are starting to turn towards speculation.

~~~
FeepingCreature
You know, you mean for that to sound implausible, but the Great Filter is in
fact an open research problem.

~~~
argonaut
I'm pointing out that this is all speculative and dangerously close to
science-fiction.

~~~
pixl97
You should learn the difference between what is impossible and what just has
not happened yet. Much science-fiction that was in the realm of possibility is
now science-reality. One should not need to be reminded that they are
communicating at the speed of light over a global communications network
capable of reaching billions of people at a time. I'm sure at one point in the
past that was science-fiction; now it is reality. I don't believe you can show
me any science that points out why AI/AGI/ASI cannot be created; we simply are
not yet at that level of sophistication.

~~~
argonaut
Your argument is basically "some science-fiction has sometimes turned out to
be true." That doesn't counter the fact that this is just speculation.

~~~
pixl97
Um, pretty much, no.

Science fiction turns out to be true when physical reality agrees that it can
be true. That, again, is why we have a global communications network and
personal wireless devices connected to it. It is also the reason we do not go
faster than light.

The reason we don't have flying cars is not that they are impossible; they are
completely possible. They are also terribly dangerous, expensive, and a
complete waste of energy.

The reason we don't have AGI is not that it is impossible; again, if nature
can create it, we can recreate it. But since we don't have a good
understanding of the networked nature of emergent intelligence, we cannot
build a power-optimized network that would give us an energy-efficient
version. AGI itself is a complete waste of energy _at this point_. We already
have many types of AI that are energy efficient and used in products now.

~~~
argonaut
> Science fiction turns out to be true when physical reality agrees that it
> can be true

This is a ridiculous argument. Furthermore, even if it were true, it tells us
nothing about the timeline. It could take 10,000 years for all we know.

------
colah3
Paper:
[https://arxiv.org/pdf/1606.06565v1.pdf](https://arxiv.org/pdf/1606.06565v1.pdf)

Google Post: [https://research.googleblog.com/2016/06/bringing-precision-t...](https://research.googleblog.com/2016/06/bringing-precision-to-ai-safety.html)

It was a pleasure for us to work on this with OpenAI and others.
John/Paul/Jacob are good friends, and wonderful colleagues! :)

~~~
leblancfg
First of all, thanks for the wonderful work, and I hope there's much more to
come from your team! In fact I'm really pleased one of the authors came down
to HN to comment.

I think the scariest part of AI security is when the program itself becomes
unfathomable. By that I mean, we can't just look at the source code and go
"Ah! There's your problem". Now, your paper assumes a static reward function,
but we can imagine the benefits of an AI that could dynamically change its
reward function, or even its own source code.

In fact, the most powerful tool I can think of for training a multi-purpose
agent is evolutionary methods and genetic algorithms. Take for example the
bigger ideas behind
[https://arxiv.org/abs/1606.02580](https://arxiv.org/abs/1606.02580)
[Convolution by Evolution: Differentiable Pattern Producing Networks] and
[http://arxiv.org/abs/1302.4519](http://arxiv.org/abs/1302.4519) [A Genetic
Algorithm for Power-Aware Virtual Machine Allocation in Private Cloud], and
determining the fitness of agents by the global accuracy on a large number of
broad ML tasks. But I digress...
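(For concreteness, the kind of evolutionary loop I have in mind looks
something like the toy sketch below; the population, `evaluate_on_task`, and
`mutate` are all hypothetical stand-ins for the real, much harder pieces.)

    import random

    def evolve(population, tasks, evaluate_on_task, mutate, generations=100):
        """Toy evolutionary loop: fitness is mean accuracy across many ML tasks."""
        for _ in range(generations):
            # Score every candidate agent by its average accuracy over all tasks.
            scored = sorted(
                ((sum(evaluate_on_task(agent, t) for t in tasks) / len(tasks), agent)
                 for agent in population),
                key=lambda pair: pair[0], reverse=True)
            # Keep the fitter half; refill with mutated copies of the survivors.
            survivors = [agent for _, agent in scored[:len(scored) // 2]]
            population = survivors + [mutate(random.choice(survivors))
                                      for _ in range(len(scored) - len(survivors))]
        return population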

Given enough computing power and time, these have the possibility of ending in
an "outbreak-style" scenario. [ _This exercise is left to the reader_ ]. And
given how rapidly AI ideas and methods are disseminated and how readily
available they are, it's safe to imagine that it could happen in a relatively
short time span.

Here's my question: I know you're with Google Brain, but do you know if OpenAI
is actively researching these avenues of "self-determined" agents? For their
first security-related article, I was expecting security measures along the
lines of: safety guidelines for AI researchers, containment and exclusion from
the Internet, shutdown protocols for the Internet backbone, etc. I get the
impression some of these issues might rear their ugly heads before our
cleaning robots become cumbersome.

P.S. Looking at your CV, it's funny to see that you once interned at
Environment Canada. I'm working there at present, while perfecting my ML
knowledge to eventually transition careers. Small world...

Edit: Grammar.

------
fiatmoney
These are not asking the right questions, although they kind of hint at them,
and they are not fundamentally questions about AI. Example: "Can we transform
an RL agent's reward function to avoid undesired effects on the environment?"
Trivially, the answer is yes; put a weight on whatever effect you're trying to
mitigate, to the extent you care about trading off potential benefits. They
qualify this by saying essentially "... but without specifying every little
thing". So what you're trying to do is build a rigorous (i.e., specified by
code or data) model of what a human would think is "reasonable" behavior,
while still preserving freedom for Gordian-knot-style solutions that trade off
things you don't care about in unexpected ways.
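(In reward-shaping terms, the "trivial" part really is trivial; a minimal
sketch is below. The hypothetical `impact` function and the weight `lam` are
exactly where all the hard, unspecified philosophy hides.)

    def shaped_reward(base_reward, impact, lam=1.0):
        """Wrap a task reward with a weighted penalty on measured side effects.

        base_reward(state, action) -> float : the task reward
        impact(state, action)      -> float : how much the action disturbs the environment
        lam                                 : how much we care about that disturbance
        """
        def reward(state, action):
            return base_reward(state, action) - lam * impact(state, action)
        return reward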

The hard part is actually figuring out what you care about, particularly in
the context of a truly universal optimizer that can decide to trade off
_anything_ in the pursuit of its objectives.

This has been a core problem of philosophy for 3000 years - that is, putting
some amount of rigorous codification behind human preferences. You could think
of it as a branch of deontology, or maybe aesthetics. It is _extremely
unlikely_ that a group sponsored by Sam Altman, whose brilliant idea was
"let's put the government in charge of it" [1], will make a breakthrough
there.

I don't actually doubt that AIs would have philosophical implications, and
philosophers like Nick Land have actually explored some of that area. But I
severely doubt the ability of AI researchers to do serious philosophy and
simultaneously build an AI that reifies those concepts.

[1] [http://blog.samaltman.com/machine-intelligence-part-2](http://blog.samaltman.com/machine-intelligence-part-2)

~~~
argonaut
You're dismissing the paper for not asking the right questions, but you don't
propose any questions that you think are better.

> The hard part is actually figuring out what you care about, particularly in
> the context of a truly universal optimizer that can decide to trade off
> anything in the pursuit of its objectives.

This seems basically equivalent to what they are saying. A reward function
that rewards "what we actually care about." This might seem vague, but that's
fine because these are only proposed problems.

~~~
akvadrako
I'm not sure what point you are trying to make. It's possible to dismiss an
idea without providing an alternative. Yes, finding a reward function is
equivalent to figuring out what we care about. Both are about as hard as
teaching a bacterium to play the piano.

The goal is avoiding unsafe AI. The reason such pointless effort is spent on
this approach is that we don't have a good alternative. The only thing I can
think of is delaying its creation indefinitely, but that's also a difficult
challenge. For example, in the Dune books, the government outlaws all
computers. That might work for a while.

~~~
argonaut
Let me elaborate. It is _easy easy easy_ to nitpick and find holes in
someone's proposals, someone's problem statements, and someone's goals in
life.

Statements add _noise_ and less than nothing of value if they just consist of
telling people they are working on the wrong thing... and do not go on to say
what they should be working on instead, with clear _positive_ reasons why
(rather than _negative_ reasons why someone should not be working on
something).

Incidentally this is a broader problem with HN discourse.

------
arcanus
In a variety of engineering fields, including but not limited to software, we
have wonderful tools to track down and eliminate 'bugs'. While high standards
are often not upheld, the concepts are largely sound.

In particular, I'm talking about verification and validation testing. I'm
curious why these approaches are generally not being leveraged to ensure the
quality of the output here.

I suspect this is because of the persistent belief that AI will annihilate
humanity with one mishap, but I'm suggesting that we approach this much more
like traditional engineering problems, such as building a bridge or flying a
plane, whereby rigorous standards are continually applied to ensure the
system behaves as designed.

The resulting system will look much more like continuous integration with
robust regression testing and high line coverage than like the sexy research
ideas presented here, but I can't help but think it will be more robust. These
systems are too complicated to treat as anything but a black box, at least
from a quality-assurance standpoint.
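As a toy illustration of what that black-box release gate could look like,
here is a minimal sketch; the model interface and the curated scenario suite
are hypothetical:

    def regression_failures(model, scenarios):
        """Black-box release gate: the candidate model must reproduce the required
        behaviour on every curated, safety-critical scenario before it ships."""
        return [(inputs, required)
                for inputs, required in scenarios
                if model.predict(inputs) != required]  # empty list == gate passes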

~~~
pixl97
Err. Great engineering failures are never one mishap creating a problem. They
are many issues all meeting at a critical point. The problem with intelligence
is unexpected emergence: a higher-order problem arises out of simple parts in
a novel and unpredictable way.

------
Animats
From the article: _Safe exploration. Can reinforcement learning (RL) agents
learn about their environment without executing catastrophic actions? For
example, can an RL agent learn to navigate an environment without ever falling
off a ledge?_

Yes. That's why I was critical of an academic AI effort that attempts
automated driving by training a supervised learning system on observations of
human drivers. That's going to work OK for a while, and then do something
really stupid, because it has no model of catastrophic actions.
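The usual remedy proposed in the safe-exploration literature is to put a hard
gate between the learner and the environment; a minimal sketch follows, where
the environment, the `is_catastrophic` model, and the `safe_action` fallback
are all hypothetical placeholders:

    class SafeExplorationWrapper:
        """Blocks actions a safety model flags as catastrophic before they reach the env."""

        def __init__(self, env, is_catastrophic, safe_action):
            self.env = env
            self.is_catastrophic = is_catastrophic  # (state, action) -> bool
            self.safe_action = safe_action          # state -> fallback action
            self.state = None

        def reset(self):
            self.state = self.env.reset()
            return self.state

        def step(self, action):
            # Override the agent's choice if the safety model predicts a catastrophe.
            if self.is_catastrophic(self.state, action):
                action = self.safe_action(self.state)
            self.state, reward, done, info = self.env.step(action)
            return self.state, reward, done, info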

------
chrisfosterelli
Direct link to the paper:
[https://arxiv.org/pdf/1606.06565v1.pdf](https://arxiv.org/pdf/1606.06565v1.pdf)

------
pizza
Related to the wireheading problem [0], [1]

[0] [http://www.wireheading.com/](http://www.wireheading.com/) - David
Pearce's ideas are... _interesting_... to say the least ;)

[1]
[https://wiki.lesswrong.com/wiki/Wireheading](https://wiki.lesswrong.com/wiki/Wireheading)

------
DrNuke
I may be stupid, and indeed I am, but it is insanely straightforward today
(not tomorrow) to put a gun on a drone and tell it to image-recognize some
targeted 1.3-2.3 m tall biped with an oval head and shoot them down.

~~~
xyience
Is your point that there are unaddressed safety concerns with existing tech?
While true, none of them are really existential threats, whereas something
with greater-than-human intelligence and none of the limitations of a single
biological body to maintain is such a threat.

------
Mendenhall
I always get the feeling AI is going to be like nuclear capability. Great
reason to create it, but then once it's made everyone wants to get rid of it.

~~~
JoshTriplett
There's a huge difference, though. Creating a nuclear weapon encourages others
to do the same. An AGI, if done right, would never allow the creation of a
second one with conflicting values; there should be no _second_ AGI.

~~~
argonaut
You are mistaking an AGI for an artificial superintelligence (I might also add
that the very concept of superintelligence is pure speculation - basic AGI at
least can be grounded in replication of human brains). The first AGI will be
closer to a low-IQ human than a Machiavellian super-optimizer.

~~~
xyience
I don't think there's much reason to be confident that the first AGI will be
like a low-IQ human, at least for very long, even if it starts off as an
emulated human brain. Machines have the huge advantages of faster materials
(neurons are slow), perfect memory, perfect calculation, the ability to scale
horizontally, backups to restore in the event of bad self-modification
experiments, and no need for things like sleep and food to take time away
from learning, self-improvement, and acting upon the world.

~~~
argonaut
You've omitted the _huge, overwhelmingly outweighing disadvantage_ of
intelligence in machines, which is that _they don't work_. Given that research
progress is incremental, there really isn't reason to believe we will jump
from narrow AI to super-smart AGI, instead of going from narrow AI to dumb AGI
to smart AGI to super-smart AGI.

~~~
xyience
There are historical examples of discontinuities, though I don't think the
FOOM debate will be settled soon. It may be quite a while before we even get
to "dumb AGI", but the hardest part of that is the G part. Right now "they
don't work" is indistinguishable from "they don't exist", but if we get that
G, I don't see how you could claim either. From there, even if we suppose it's
another huge leap to get to true super-intelligence instead of a FOOM, the
time to get to merely smart AGI, and indeed smarter-than-human AGI, would be
short, if only for the basic advantages of a silicon machine substrate. Even
if all we had were human brains running on silicon, that would be enough to
quickly reach superhuman general intelligence, even if not the true super- (or
perhaps ultra-, as I. J. Good originally put it) intelligence that we expect
from a Singularity event.

~~~
argonaut
This makes no sense. This is just unsubstantiated speculation right now. There
is no reason to believe that dumb AGI will hardware-scale to smart AGI for
free. In fact most machine learning algorithms have diminishing (logarithmic)
returns with data and compute.

------
glaberficken
How would we program a self-driving car that is faced with something like a
"Trolley problem" [1]? I.e., the car is faced with two probable collisions of
which it can only avoid one, or must choose between running over a pedestrian
and crashing into a tree.

I assume this is probably already worked into the current prototypes. Does
anyone have references to discussions of this in current-gen self-driving car
prototypes?

[1]
[https://en.wikipedia.org/wiki/Trolley_problem](https://en.wikipedia.org/wiki/Trolley_problem)

~~~
glaberficken
Oh! just found a few references in the exact wikipedia article I linked.

Patrick Lin (October 8, 2013). "The Ethics of Autonomous Cars". The Atlantic.
[http://www.theatlantic.com/technology/archive/2013/10/the-et...](http://www.theatlantic.com/technology/archive/2013/10/the-ethics-of-autonomous-cars/280360/)

Tim Worstall (2014-06-18). "When Should Your Driverless Car From Google Be
Allowed To Kill You?". Forbes.
[http://www.forbes.com/sites/timworstall/2014/06/18/when-shou...](http://www.forbes.com/sites/timworstall/2014/06/18/when-should-your-driverless-car-from-google-be-allowed-to-kill-you/)

Jean-François Bonnefon; Azim Shariff; Iyad Rahwan (2015-10-13). "Autonomous
Vehicles Need Experimental Ethics: Are We Ready for Utilitarian Cars?".
arXiv.org. [http://arxiv.org/abs/1510.03346](http://arxiv.org/abs/1510.03346)

Emerging Technology From the arXiv (October 22, 2015). "Why Self-Driving Cars
Must Be Programmed to Kill". MIT Technology Review.
[http://www.technologyreview.com/view/542626/why-self-driving...](http://www.technologyreview.com/view/542626/why-self-driving-cars-must-be-programmed-to-kill/)

------
fitzwatermellow
> Can we transform an RL agent's reward function to avoid undesired effects
> on the environment?

To me this is the toughest nut in the lot. Training a Pac-man agent to avoid
ghosts and eat pellets, in a world of infinite hazards and cautions! Any
strategies?

------
w_t_payne
We have well established techniques for developing systems which are safe and
exhibit high levels of integrity. We just need to make the tools that support
these techniques freely available.

~~~
adrianN
90% of the techniques for making reliable systems are careful requirements
engineering and even more careful testing. There is no secret sauce.

I don't think these techniques transfer easily to the AI field. While I might
be able to prove that the state machine that controls my nuclear power plant
always rams in the control rods in case something bad happens, it's a lot
harder to show that some fuzzy system like a neural network doesn't exhibit
kill-all-humans behaviours.
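For a finite controller the "proof" can be as blunt as enumerating every
reachable state and checking the property, which is exactly what does not
carry over to a fuzzy learned system. A toy sketch, with a made-up controller
and sensor alphabet:

    from itertools import product

    def controller(state, sensor):
        """Toy plant controller: any bad sensor reading forces the rods in."""
        if sensor in ("overheat", "coolant_loss"):
            return "rods_inserted"
        return state

    # Exhaustive check over every (state, sensor) pair -- feasible only because
    # the state space is tiny and explicit, unlike a neural network's.
    for state, sensor in product(["running", "rods_inserted"],
                                 ["ok", "overheat", "coolant_loss"]):
        if sensor != "ok":
            assert controller(state, sensor) == "rods_inserted"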

~~~
w_t_payne
You are right. There is no secret sauce. There are no magic bullets. Careful
requirements engineering and careful testing is absolutely what you need.

However -- many of these techniques _do_ transfer to the AI field -- albeit
with some tweaking and careful thought.

Requirements are still utterly critical. Phrasing the requirements right is
important and requires more than a passing thought -- particularly where
testability is concerned.

A lot of it boils down to the requirements that get placed on the training
and validation data sets, and the statistical tests that need to be passed:
how much data is required, and how you can demonstrate that the test data
provides sufficient coverage of the operating envelope of the system to give
you confidence that you understand how it behaves.
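One concrete (if simplistic) reading of "coverage of the operating envelope"
is to bin the validation inputs over each declared operating range and refuse
to sign off on any thinly populated bin; a sketch under those assumptions,
with a made-up envelope format and threshold:

    import numpy as np

    def envelope_coverage_gaps(samples, envelope, bins=10, min_per_bin=30):
        """Report regions of the declared operating envelope that the validation
        data does not cover adequately.

        samples  : array of shape (n, d) of validation inputs
        envelope : list of (low, high) operating limits, one per input dimension
        Returns (dimension, bin_index) pairs whose sample count is below threshold.
        """
        gaps = []
        for dim, (low, high) in enumerate(envelope):
            counts, _ = np.histogram(samples[:, dim], bins=bins, range=(low, high))
            gaps.extend((dim, i) for i, c in enumerate(counts) if c < min_per_bin)
        return gaps  # empty list == every bin of every dimension is adequately sampled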

The architecture is critical also -- how the problem is decomposed into safe,
testable and understandable subsets -- which has much more to do with how the
system is tested than how it solves the primary problem.

------
kordless
> Avoiding negative side effects.

Oh brother. Avoiding negative side effects is a wasteful proposition. Learning
from those side effects, however, is priceless.

~~~
conradk
Why is avoiding negative side effects a wasteful proposition?

------
mountaineer22
Any recommendations for relevant AI related sci-fi?

~~~
denzil
If you don't mind fanfiction, then Friendship is Optimal is quite an
interesting read: [http://www.fimfiction.net/story/62074/friendship-is-optimal](http://www.fimfiction.net/story/62074/friendship-is-optimal)

Also the related stories that explore this idea:
[http://www.fimfiction.net/group/1857/the-optimalverse](http://www.fimfiction.net/group/1857/the-optimalverse)

------
JoeAltmaier
It's common _today_ to make a robot that kills anybody who comes within a foot
or two of it. Without any image recognition at all; much more damaging than a
gun; and 100M of them already deployed. This conversation is silly and
pointless until we clean up the insane number of land mines deployed around
our planet.

~~~
gjm11
Land mines are awful and we should absolutely do something about them. But it
makes precisely no sense at all to say "no one should bother to think about AI
safety because land mines are awful". You might as well say "no one should
bother to think about land mines because cancer is awful" or "no one should
bother to think about cancer because aging is awful".

One problem isn't nonexistent or irrelevant just because there's another
problem that you regard as worse or more urgent. It's not even like solving
the problems of AI safety requires the same kinds of people or the same kinds
of resources as solving the problems of land mines; if you tell people not to
think about AI safety it's not really going to make them go away and solve the
land mine problem.

~~~
JoeAltmaier
Um, 100 million of them already out there? So locking the barn door after the
horse is gone.

I get it; we don't have land mines in first-world countries, and we _will_
have AIs, so AIs are more interesting to talk about. That's why we continue to
have land mines all over the world I think. Not our problem.

All the issues surrounding implacable AI killers on the loose are only
something to talk about, if you haven't lived with them for generations
already. Want to get real answers to sophomoric questions about robot killers?
Just ask the people who already know.

~~~
gjm11
> locking the barn door after the horse is gone.

It sounds as if that's intended to be an objection to something I wrote, but
I've no inkling what. I certainly didn't mean to deny that there are a hell of
a lot of them out there.

> we don't have land mines in first-world countries

I think the chances of a productive discussion would be greater if you didn't
leap straight to assuming bad faith on the part of the people you're talking
to.

Land mines are a big deal. They're a problem that needs solving. But you're
not merely saying that; you're jumping into a discussion of something else and
saying "you shouldn't be talking about this at all as long as there are land
mines".

Which would be at least somewhat consistent (albeit rude), if that were your
response to every HN discussion of things less important than land mines. But
it isn't. By the advanced technique of clicking on your username, I see that
you've been quite happy to participate in discussions of "table-oriented
programming", mobile phone headphone jacks, and off-by-one errors in audio
programming, and that you work in embedded software development. Are those
things, unlike AI safety, more important than land mines?

I doubt you think that headphone jacks are more important than land mines. So
why do you react to a discussion of headphone jacks by talking about headphone
jacks, and to a discussion of AI safety by saying it's ridiculous and
sophomoric to ask about AI safety when there are millions of land mines out
there killing people?

You're trying to make out that the reason is that land mines are _the same
kind of things_ as hypothetical unsafe AI systems because they are human-made
machines that kill people. But you're an intelligent person and surely you
can't possibly really believe that. To deal with land mines we need treaties
to stop them being deployed, we need ways of finding them that are cheap
enough to deploy in quantity and effective enough to be worth deploying, we
need ways of disarming them with the same qualities, and we need effective
help for people who get blown up by them. None of these bears any resemblance
to anything we might do about AI safety. To an excellent approximation, there
is no overlap between the people who can do useful work on AI safety and the
people who can do useful work on land mines. And the dangers don't arise in
the same way: land mines are dangerous because they are put in place with the
specific intention of killing anyone who passes, whereas in the scenarios AI
safety people worry about no one _intends_ the AI systems to cause trouble.

So that can't really be it, I think.

Why do you object to discussing AI safety but not to discussing mobile phone
headphone jacks, _really_?

------
daveguy
And whatever you do, don't let Randall Munroe teach it:

[http://xkcd.com/1696/](http://xkcd.com/1696/) (the current xkcd)

------
yarou
Seems like hyperparameter optimization to me. These techniques will be useful
in general when selecting your model.

------
logicallee
It seems the authors have retracted their concerns. The site is down now, but
I got this screenshot: [http://imgur.com/eL7GFOr](http://imgur.com/eL7GFOr)

