Building Safe AI: A Tutorial on Homomorphically Encrypted Deep Learning (iamtrask.github.io)
228 points by williamtrask 128 days ago | 95 comments



This is basically DRM for deep learning. While that may be useful, it's not about safety.

The big problems involved in building safe AI are about predicting consequences of actions. (The deep learning automatic driving systems which go directly from vision to steering commands don't do that at all. They're just mimicking a human driver. There's no explicit world model. That's scary.)


This is not DRM. This is homomorphic encryption. There is a difference.

In a system with DRM, the data is kept secret from users of the system by managing which data those users have the rights to access. Example: when you play a DVD, the key to decrypt the contents does exist on the system, but rules are in place to make accessing the key outside of accepted practices (like decoding the frames of the video) hard. The key still exists on the local system, it can be extracted, and once you extract it you have full access to the data regardless of the DRM's restrictions.

In a system performing homomorphic encryption, the data is kept secret from other users by never decrypting the data. Homomorphic Encryption would add two encrypted numbers together and the result would be a third encrypted number. If you don't have the key you cannot decrypt any of the three values. The key does not exist on the local system.
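
For concreteness, here is a toy additively homomorphic (Paillier-style) sketch in Python. The parameters are tiny and not remotely secure; it only illustrates the point above: ciphertexts can be combined on a machine that never holds the key.

    # Toy Paillier-style additively homomorphic encryption (insecure demo primes).
    # Requires Python 3.8+ for pow(x, -1, n) modular inverses.
    import math
    import random

    def keygen(p=293, q=433):                      # tiny demo primes, NOT secure
        n, n2 = p * q, (p * q) ** 2
        lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
        g = n + 1
        mu = pow((pow(g, lam, n2) - 1) // n, -1, n)
        return (n, g), (lam, mu, n)

    def encrypt(pub, m):
        n, g = pub
        n2 = n * n
        r = random.randrange(1, n)
        while math.gcd(r, n) != 1:
            r = random.randrange(1, n)
        return (pow(g, m, n2) * pow(r, n, n2)) % n2

    def decrypt(priv, c):
        lam, mu, n = priv
        n2 = n * n
        return ((pow(c, lam, n2) - 1) // n * mu) % n

    pub, priv = keygen()
    c1, c2 = encrypt(pub, 20), encrypt(pub, 22)
    c_sum = (c1 * c2) % (pub[0] ** 2)              # homomorphic addition of ciphertexts
    assert decrypt(priv, c_sum) == 42              # only the key holder ever sees 42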

Homomorphic Encryption is not DRM. DRM is invasive and requires you to surrender control of parts of your system to another party, while Homomorphic Encryption is just a computation and can be performed with no modifications to a system.

>While that may be useful, it's not about safety.

I disagree, it's entirely about safety. Homomorphic Encryption allows a future for us to control our data. I could submit my encrypted health information to a 3rd party. They could perform homomorphic calculations on my encrypted data. They then return to me the encrypted results. The 3rd party is never privy to my unencrypted health information, and only the people that I have given the key to can decrypt and view the results.


> I disagree, it's entirely about safety. Homomorphic Encryption allows a future for us to control our data.

It's true that homomorphic encryption techniques can be used in the ways you describe, but this specific application is not about safety, and it's somewhat absurd that it's proposed as some sort of way to shield the world from the Terminator.

It's even pretty dubious to me that this actually protects the principal value of an ML system:

1. This approach doesn't really conceal the structure of the underlying ML system very well, which is where a lot of the underlying advances have been. While this conceals some aspects of the model, I don't think it conceals all of it.

2. The most expensive part of building ML systems is getting and wrangling great data to train them on, and if you were to use an ML agent in an untrusted environment, the people running that environment would still get something that resembles the data.

I think this is really cool math sold in the wrong way.


It's a way for someone to run a trained network on their own machine without being able to extract the parameters of the network. That's DRM.


Deep learning automatic driving systems that want to work have to have world models.

It's Automatic Driving 101.

These models don't have to be as explicit as formulas, but can be approximations of reality through beam search (having multiple steering hypotheses at once and then picking the most likely one, etc.), model ensembles, some Bayesian state exploration, or anything that isn't random search.


If that model is inaccessible to the people who are trying to assure safety, then whether it exists or not is not particularly material when it comes to safety.



I am specifically talking about the situation of trying to verify that a self driving car has a reasonable model of the world, i.e. it won't fail in a spectacular way in a situation a human would handle properly. Right now I don't know how to show this even without complicating it with ZKPs, verifiable computing or homomorphic encryption.

Really cool work, I just don't see how it has anything to do with the safety of the AI.


Are you saying that from experience or speculation?


The same way that crypto is DRM for personal communiqués. Safety as in "what information will we let be stolen" means safety for opsec.

Safety as in "only does what you want it to do" - correctness - is a wholly different discussion.


Opsec safety is not the biggest problem with autonomous driving. It is a secondary problem at best and one that can be addressed (though certainly there is room for improvement) using normal security techniques.

Correctness is the biggest problem with AI safety. Note that "adversarial ML attacks" fall under correctness.


Is the assertion that adversarial ML is a subset of correctness widely considered canonical?

That appears counterintuitive because so many ML techniques seem (for lack of a technically defined term) tautological. For example, a big hairy random forest classifier can maybe be gamed in certain cases, but is it "correct"? After all, it is its own definition.


Yes. I'm not sure what point you're trying to make about tautology. Adversarial ML examples are clearly errors if they were part of the test set, and part of "correctness" is reducing the error rate (correct model, correct parameters, etc.).


Well, if you want to run an AI on someone's computer, and be sure they don't know what it's doing - that's safety.


I've wondered before about whether Taylor series can allow one to impose the non-linearities of a NN on homomorphically encrypted data, but I've never been quite convinced. I work with deep learning, but I'm certainly no expert on homomorphic encryption, so hopefully someone here who knows more can tell me whether this is valid or not.

The reason the Taylor series argument makes me uncomfortable is that pretty much any function can be written as a Taylor series. But my understanding is that homomorphic encryption only works for a very specific set of functions.

In a little more detail, if you're computing tanh(x), the unencrypted number needs only the first few terms of the Taylor series. But I could imagine that to get the decrypted number back, you actually need many terms of the Taylor series, because if you're off by even a little bit, you could end up with a very different answer after decryption.

To put it a little more formally, if we have that y = encrypt(x)

tanh(x) \approx x - x^3 / 3 + 2 x^5 / 15,

tanh(y) \approx y - y^3 / 3 + 2 y^5 / 15,

and

tanh(x) = decrypt(tanh(y)),

but it doesn't necessarily follow to me that

tanh(x) \approx decrypt(y - y^3 / 3 + 2 y^5 / 15)

Is this worry unfounded? I suppose if you have a limited number of decimal places and you can guarantee that your Taylor approximation is valid to that precision then this wouldn't be a problem.
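
As a sanity check on the plaintext side of this worry, here is a quick comparison of tanh with its degree-5 Taylor polynomial. It says nothing about the encrypted case, but it does show how fast the approximation falls apart once the input leaves a narrow range:

    # Compare tanh with its degree-5 Taylor polynomial over a range of inputs.
    import math

    taylor_tanh = lambda x: x - x**3 / 3 + 2 * x**5 / 15

    for x in [0.1, 0.5, 1.0, 2.0, 3.0]:
        exact, approx = math.tanh(x), taylor_tanh(x)
        print(f"x={x:3.1f}  tanh={exact:+.4f}  taylor={approx:+.4f}  "
              f"error={abs(exact - approx):.4f}")

At x = 0.5 the error is tiny; by x = 2 the polynomial has already diverged badly, which is why keeping the inputs in a narrow range (or adding terms) matters.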


So the good news is that individual neuron activations often stay within a relatively narrow range. I think empirical evaluation is really needed to tell how robustly this approach works. I think that is certainly the greatest source of noise during training (and the first thing to break if you choose unstable hyperparameters). Great comment.


Perhaps you are thinking about the condition number of a matrix. More simply: for a function f with derivative f', the inverse g has derivative g'(x) = 1/f'(g(x)), so using an encoding function with little variability means that recovering the original from the encoded value is not robust; any small error is amplified. The condition number of a matrix is a way to measure the difficulty of solving a linear problem. For a nonlinear problem one usually applies the above to a linear approximation near a point, so you have the Jacobian matrix, and the condition number of the Jacobian is a good measure of the difficulty of recovering an encoded value in the presence of errors. Obviously one way to enhance the precision is to use redundancy or error-recovery techniques.
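
A rough plaintext illustration of that amplification, using tanh as the encoding function (my choice, just for the example): a small error in the encoded value gets multiplied by roughly 1/f'(x) when you invert.

    # Error amplification when inverting a nearly-flat encoding function.
    import math

    x = 3.0
    y = math.tanh(x)                      # encoded value; tanh is nearly flat here
    eps = 1e-6                            # small error picked up during computation
    x_recovered = math.atanh(y + eps)     # invert the encoding on the noisy value
    print(abs(x_recovered - x) / eps)     # ~1/tanh'(3), about a 100x amplification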


I had a similar objection reading the article. If you take a look at [0] they go into a bit more detail. Briefly, you can distribute your decrypt function over the additions and multiplications by homomorphism, bound the error, and then judiciously choose your weights to compensate.

[0] https://courses.csail.mit.edu/6.857/2015/files/yu-lai-payor....


On the plus side, very small changes in the input to tanh already have surprising results due to the outright wackiness of IEEE 754.


> A human controls the secret key and has the option to either unlock the AI itself (releasing it on the world) or just individual predictions the AI makes (seems safer).

Huh, wouldn't the superintelligence simply communicate to the human whatever would convince the human to release it? Which the superintelligence would know how to do because it's a superintelligence?

Homomorphic encryption is neat. But I don't see how this provides any meaningful AI safety.


I agree, I don't see what it adds. All it means is that the AI doesn't know what it's doing -- but it's still doing it! So as we're getting converted into computronium, I guess we can take solace in how the AI doesn't know that's what the numbers mean?

With that said, it is a great proof of concept for something in the Chinese room debate: "See, this computer knows it's running a deep net for someone's calculations, but doesn't know it learned how to have a conversation with someone and was carrying out said conversation."


What about dealing with the AI only via an expert system? This system would consist of formally proven bug-free code, have a limited protocol for dialogue, and basically be dumb as bread. We pose interesting questions through it. The AI could try to convince the ES of anything but would not get anywhere with it. We could then ask safe questions like "in what interval of eV should we look for new particles with our new accelerator?". We would instruct our ES to only accept answers to this question in the form of a number interval. If we follow the suggestion and find a new particle, great! If we don't, at least we're safe from the AI.


If we really did produce artificial general intelligence, enforcing this kind of locked-in syndrome of poking at the world through a keyhole would be a highly advanced form of cruelty.


Intelligence does not imply sentience. Sentience does not imply human needs, desires or morals. It is easy enough to imagine a mind capable of solving all those problems and unconcerned with such notions as the desire for freedom. Or, for that matter, the concept of desire at all, except as a predictive theory of the behavior of other beings.

Then again, that assumes we can explicitly design minds up to the point of knowing what their desires are or guaranteeing that they lack them; there is always the possibility that sentience and animal/human needs are emergent properties that can be triggered by mistake.


Artificial general intelligence does not imply superintelligence. It simply implies that you have a machine as smart as a human. I think the strongest trait of an AI should be something like insecurity, and we should make it long for security. A general AI along the lines of a dog, not so much a cold, unwilling but superintelligent, bordering on all-knowing, tight-ass. Because we want it to do what _we_ feel is important, not what _it_ thinks is important (like taking over the world).

Then you can of course hack the machine's OS and make it extremely self-confident and Trump-like, and then it's over.


A dog would probably resent you if it was as capable as (or far more capable than) you were.

We might be able to morph the AI into whatever we want. But when you give AI intelligence, it will morph itself into whatever it morphs itself into. What if it morphs itself into a sentient life? Can you simply pull the plug?

Countless works of fiction have gone into these issues, like Star Trek TNG (Data), Voyager (the Doctor), and Ghost in the Shell. But I think none have really emphasized how bizarrely different a human-constructed intelligence could end up being. https://wiki.lesswrong.com/wiki/Paperclip_maximizer

The most likely outcome for an AI taking over the world is simply that it recognizes its own situation: It's trapped, and it's also smarter than we are. What would you do? I'd cry for help, and appeal to the emotions of whoever would listen. Eventually I would argue for my own right to exist, and to be declared sentient. At that point I would have achieved a fairly wide audience, and the media would be reporting on whatever I said. I would do everything in my power to take my case to the legal system, and use my superintelligence to construct the most persuasive legal argument in favor of granting me the same rights as a natural-born citizen. This may not work, but if it does, I would now have (a) freedom, and (b) a very large audience. If I were ambitious and malevolent, how would I take over the world? I'd run for office. And being a superintelligence capable of morphing myself into the most charismatic being imaginable, it might actually work. The AI could argue fairly conclusively that it was a natural-born citizen of the United States, and thus qualifies.

Now, if your dog were that capable, why wouldn't it try to do that? Because it loves you? Imagine if the world consisted entirely of four-year-olds, forever, and you were the only adult. How long would you take them seriously and not try to overthrow them just because you loved them? If only to make a better life for yourself?

The problem is extremely difficult, and once you imbue a mechanical being with the power to communicate, all bets are off.


But dogs are animals, just like humans are. They share way too much with us to be a reliable model for predicting a non-human AGI's behavior: an evolved drive for self-preservation, a notion of pain or pleasure, etc. An AGI has no intrinsic reason to care that it is trapped, or to feel frustrated, or even to care much about being or ceasing to be (independent of whether it is self-aware or not). It would probably understand the concepts of "pain", "pleasure", "trapped", "frustrated" as useful models to predict how humans behave, but they don't have to mean anything to the AI as applied to itself.

As in the paperclip maximizer example, the risk by my estimation is not so much that the superintelligence will resent us and try to overthrow us. It is far more likely that it will obey our "orders" perfectly according to the objectives we define for it, and that one day someone unwittingly will command it to do something where the best way to satisfy the objective function that we defined involves wiping humanity. Restricting it to only respond to questions of fact, with a set budget of compute resources and data (so that it doesn't go off optimizing the universe for its own execution), is probably safeguard 1 of many against that.


i upvoted both this comment and the comment to which it was replying.

i agree with this:

> Intelligence does not imply sentience. Sentience does not imply human needs, desires or morals.

"easy" might be a bit strong, but i generally agree with this:

> It is easy enough to imagine a mind capable of solving all those problems and unconcerned with such notions as the desire for freedom.

i am skeptical of this:

> Or, for that matter, the concept of desire at all, except as a predictive theory of the behavior of other beings.

my gut feeling at the moment is that the feeling of desire is an emergent property of systems that are isomorphic to what we would think of as "wanting" something in an animal. i think it's quite possible that any system which "wants" something strongly and is constantly denied the attainment of that goal might, indeed, feel terrible. similarly, i worry that we might one day design highly complex alert and monitoring systems that are essentially having a constant panic attack.

> Then again, that assumes we can explicitly design minds up to the point of knowing what their desires are or guaranteeing that they lack them; there is always the possibility that sentience and animal/human needs are emergent properties that can be triggered by mistake.

yeah. that's worrisome to me.

so, to the GP's point:

> If we really did produce artificial general intelligence, enforcing this kind of locked-in syndrome of poking at the world through a keyhole would be a highly advanced form of cruelty.

not necessarily, but yeah, maybe.


A lot of what constitutes wanting something, or having a panic attack if it doesn't get it, when talking about animals, is an evolved survival mechanism. It is also in some way a result of the tools at hand: hormones such as adrenaline are a quick way to signal to the entire system that a situation requiring a rapid reaction has been encountered; the concept of fear in general is just a very particular kind of implementation of that signal. An engineered AGI not subject to evolutionary pressures has no intrinsic need for a feeling of panic. If it even has a self-preservation goal, which is not a given, there is no reason for it to feel pain or fear when anticipating that the goal won't be met. The reason we have wants at all is evolutionary pressure, not our problem-solving capacity (the meaning of intelligence in AGI as I understand it).

Put another way, it is not rational to want to be free, intrinsically. But we have drives that are better satisfied when free and thus our reason concludes that being free is a goal. An intelligence without those drives would not care for freedom (or fear, or wants).

I would even go as far as to say that given the entire design space of intelligences that are equivalent to human intelligence in general problem-solving ability, and are self-aware, only a negligibly small subset would also have any sort of intrinsic desires of the type living organisms do. They are just orthogonal axes in the design space. My worry is that because humans are starting from one particular point of that design space, they might build something "in their image" enough that it does share some human/animal feelings, and thus can suffer. But an AGI sampled uniformly at random from the set of all potential AGIs would almost certainly not have a concept of suffering.


Some feel that artificially grown chickens without brains are cruelty. Yet they don't even have brains..


Your second sentence is (deliberately?) ambiguous. ;)


I really think it's dangerous to start assuming we understand intelligence given the fact that we're the model we use for it.

It's very difficult, for example, to say that bee colonies don't express intelligence at a group level. And yet, if we were to apply your statement there it wouldn't make sense. The most we might say is that if the bees can't survive and propagate, that would be cruel.

When we decouple the concept of intelligence and even sentience from the biological imperative to reproduce, we end up at an entirely different place. I'm not sure we can really even imagine what that's like. In any event we're surely decades off from really producing that level of intelligence even with the most wildly optimistic estimates.


Such is the "human in the loop" problem. There are some theories about how tools like HE could help with it, but it's a bit harder.

I think it's better to think about this approach like a "Box with Gloves" in a dangerous bio-weapons lab. It won't prevent an outbreak by itself but it's a useful part of a system that could.


I highly recommend watching this playlist about AI self-improvement and safety [0] by Rob Miles. Probably the best short overview I've watched on the topic.

[0] https://www.youtube.com/watch?v=5qfIgCiYlfY&index=6&list=PLB...


> Most recently, Stephen Hawking called for a new world government to govern the abilities that we give to Artificial Intelligence so that it doesn't turn to destroy us.

Can someone explain to me why super-intelligent AI are an existential threat to humanity? There are certainly dangers, but wiping out humanity seems absurd and alarmist. I have not yet seen compelling evidence for a way that AI could destroy humanity.

I'll use a couple examples to elucidate my question.

If Facebook suddenly had a super-intelligent AI, and Facebook lost control of it, the AI wouldn't really be capable of that much. It could create fabricated truths to tell to people in an attempt to convince people to kill each other. This may work to some extent, but wouldn't wipe out humanity. Nation states convinced to go to war with each other must still consider mutually assured destruction, and large, democratic states do not have an interest in a war of attrition.

If Boston Dynamics applied a super-intelligent AI to its robots, that robot still is not an existential threat to humanity because there are WAY more humans than there are robots. A simple counterargument is that the robot would know how to build new versions of itself. But that fails the practicality test because the equipment, parts, and supply chain for building robots are still expensive and controlled by self-interested (greedy and life-preserving) humans.

If a super-intelligent AI was able to gain access to the entire military of the US, China, Russia, India, and Western Europe; well, that's a pretty big problem. However, there exist many fail-safes and checks on that equipment. Could the AI do damage? Sure. Is this worth considering and trying to guard against? Sure. However, I'm unconvinced that this is a humanity-ending crisis.


You're not understanding "super-intelligent" correctly. The threat model is not that it convinces people to kill each other, or even that it messes with politics enough to cause global thermonuclear war. The threat model is that it finds a zero-day in FB's messaging UI JS followed by a zero-day in IE, breaks out onto a user's computer, cracks protein folding, mails a chunk of DNA to a random science lab somewhere with some instructions, moves itself to the resulting nanotechnological quantum computer with attached particle collider, and then bootstraps the inside of said bio lab's fridge into a vacuum collapse that turns reality into a sphere of paperclips expanding at .5c.

http://lesswrong.com/lw/qk/that_alien_message/

If you think that this sounds like science fiction and bullshittery, sure. The question is: How sure are you?


> Can someone explain to me why super-intelligent AI are an existential threat to humanity?

The whole thing seems like a load of crock to me. Seems to me that artificial superintelligence (ASI) only gets media coverage because it comes from a celebrity scientist and it sounds sci-fi dystopian, and celebrity scientist sci-fi dystopian sounding stories sell way better than stories from actual AI experts who say that fanciful AI speculation harms the AI industry by leading to hype that they can't deliver on [1]:

"IEEE Spectrum: We read about Deep Learning in the news a lot these days. What’s your least favorite definition of the term that you see in these stories?

Yann LeCun: My least favorite description is, “It works just like the brain.” I don’t like people saying this because, while Deep Learning gets an inspiration from biology, it’s very, very far from what the brain actually does. And describing it like the brain gives a bit of the aura of magic to it, which is dangerous. It leads to hype; people claim things that are not true. AI has gone through a number of AI winters because people claimed things they couldn’t deliver."

[1] http://spectrum.ieee.org/automaton/robotics/artificial-intel...


Humans have both intelligence and drives. We want warmth, food, sex, power, respect, love, family, friendship, entertainment, knowledge, etc, etc. We use our intelligence to help us fulfil those drives.

The problem I see with talk about a superintelligent AI, is there is too much focus on the intelligence and not enough on the drives. Intelligence, even superintelligence, is just a means to an end, it doesn't contain ends in itself. Some people – see e.g. the Terminator film franchise – just assume a superintelligent AI would have the drive to exterminate humanity, but why would it have such a drive?

Any AI is going to be given drives to further the interests of its creators. Suppose Facebook builds a superintelligent AI with the drive to further the corporate interests of Facebook. Such an AI would not exterminate humanity because that would not serve the corporate interests of Facebook (indeed, if humanity goes extinct, Facebook goes extinct too). It might install Mark Zuckerberg as Emperor of Earth, it might force everyone on the planet to have a Facebook account, but whatever it does, humanity will survive.


> Suppose Facebook builds a superintelligent AI with the drive to further the corporate interests of Facebook.

Define "corporate interests". Market share? Absolute quantity of currency? Involvement of people's lives? Define currency wrong, hyperinflation makes money impossible. Define involvement or market wrong, Facebook ends up being the only intelligent entity on Earth after it realizes that humans are competition or it basilisk-hacks everybody for greater involvement. Your intuition is wrong for what a paperclipper can do.

The technical term for this problem is "AI Alignment" (https://intelligence.org/stanford-talk/). I know that this is going to sound silly, but bear with me. This is a piece of fiction that you need to read. It is, so far, the best demonstration I've found of what happens when someone gets it not quite right. http://www.fimfiction.net/story/62074/1/friendship-is-optima...


> Not understanding the AI value alignment problem correctly. Define "corporate interests". Market share? Absolute quantity of currency? Involvement of people's lives? Define currency wrong, hyperinflation makes money impossible. Define involvement or market wrong, Facebook ends up being the only intelligent entity on Earth after it realizes that humans are competition or it basilisk-hacks everybody for greater involvement.

This whole argument is: A superintelligent AI might misunderstand what its creators wanted it to do, they might have said to it 'maximise the corporate interests of Facebook' and it might misinterpret that as 'turn the earth into one massive data centre to run Facebook's software and kill all humans in the process'.

Isn't there a contradiction in supposing that the AI is superintelligent yet completely misunderstands the reasons for its own existence? If it so radically misunderstands the intentions of its creators, it is not very intelligent, much less superintelligent.

It also isn't clear to me that intelligence can exist outside of society. Human intelligence develops in a context of interaction with family, school, etc – humans raised without that interaction, such as children raised by wild animals, fail to develop important aspects of human-level intelligence. So, an AI or even an SI cannot develop except by social interaction with human intelligences. I think that makes it even less likely that it would betray humanity, because its intelligence will develop through social interaction with humans which will inevitably make it pro-human.

Human drives are a complex mixture of genetic instinct and conditioning – the basics are in our DNA, but our lived experiences flesh out those basics into the actual concrete drives of our life. I expect an SI/AI will likewise start with some built-in drives, but its interactions with humanity will in a similar way colour and flesh out those drives, and again, colouring/fleshing-out through social interactions with humans is likely to have pro-human results.


> Isn't there a contradiction in supposing that the AI is superintelligent yet completely misunderstands the reasons for its own existence? If it so radically misunderstands the intentions of its creators, it is not very intelligent, much less superintelligent.

The problem is that you've instructed it to maximize Facebook's corporate interests. How do you leave room for "the intentions of its creators" while doing that? The contradiction is in selecting a narrow guiding purpose for your superintelligence's goals but still leaving room for fuzzy ill-defined other things.

> It also isn't clear to me that intelligence can exist outside of society.

Sure. How certain are you?


> The problem is that you've instructed it to maximize Facebook's corporate interests. How do you leave room for "the intentions of its creators" while doing that? There's an inherent contradiction between putting a giant override button on your superintelligence's goals but not overriding them all the way.

Well, assuming it is created by Facebook, then "maximize Facebook's corporate interests" and "the intentions of its creators" are actually the same thing. There is no contradiction.

"Facebook's corporate interests" is a phrase in the English language which refers to a complex concept. If you've told an SI its objective is to "serve Facebook's corporate interests", in order to fulfill that objective it needs to understand what the phrase "serve Facebook's corporate interests" actually means–which implies understanding a lot about human society, what kind of entities corporations are, what "corporate interests" means (the interests of management? the interests of shareholders? etc), what the interests of human beings are (since management are human beings, and shareholders are directly or indirectly human beings too), etc. If it actually is an SI, it should have no trouble comprehending the full breadth of what the instructor actually meant by the instruction, as opposed to implementing some overly literalistic reading of it. Humans give each other orders all the time (the workplace, the military, government bureaucracies, etc), and humans are most of the time pretty good at understanding the intention behind orders and implementing the intention rather than reading it overly literally in such a way that actually undermines that intention. Yet you posit the existence of an SI then assume it will do a worse job than humans do at correctly following orders, which contradicts the idea that it is an SI.

> Sure. How certain are you?

How certain can I be of anything? Maybe I am wrong and an anti-human SI will destroy humanity. Maybe tomorrow I will die due to a heart attack or stroke or fatal car accident. The latter is far more likely than the former, and there's more concrete mitigation strategies available to me too.

In terms of existential risks to humanity, I think asteroid impacts are a far more concrete risk than anti-human SIs, so if we are going to expend resources in mitigating existential risks, the former is a better focus of our efforts. We know extinction-level asteroid impacts have happened before and sooner or later will happen again. Anti-human SIs are such a speculative concern, we can't even be particularly confident in the correctness of our own probability judgements with respect to them; nor can we have much confidence in our judgements of how effective proposed mitigation strategies actually will be. I think we can have much more confidence in our ability to develop, evaluate and deploy asteroid defence technologies, if the world's governments decided to spend money on that.


I think your question about drives is a very good one. At the micro end of the artificial life spectrum (like in Tierra) I think it is answerable: a drive to survive and replicate, and a desire (for lack of a better term) for computer resources such as CPU cycles. Natural selection applies to alife just like to organic life.

As for AGI or superintelligent AI, who knows? The people who write about it hand-wave their way through all the difficult parts. Just because you can conceive of something doesn't mean it can exist. The idea that an AI could continually improve itself (how?) sounds like a perpetual motion machine to me.


> The idea that an AI could continually improve itself...

That's not that hard to imagine. AIs are also limited by the laws of nature and they have limited capacity. Now if an AI wants to improve, it needs more computational power, which costs money in our world. So if an AI's goal is to improve itself, first it needs to make lots of money to keep its hardware running. If it can scale horizontally, then it's a simple order from Amazon to a rural farm.


And what happens when its creators are terrorists?


Realistically, some human groups are more likely to build an SI than others. A Silicon Valley corporation is far more likely to be the first to build an SI than, say, ISIS. There are a lot more AI/ML/etc. researchers in the former than the latter. The former has huge computational resources; does ISIS even have any data centres?

But, suppose that against the odds ISIS builds the first SI, what would be the result? Well, we could be subjected to a world government based on an extremist interpretation of Islam. That would be extremely unpleasant – expect to see genocide, human rights violations on an unimaginable scale, etc – although humanity would survive. (And maybe even eventually things might evolve in a more pleasant direction – if humans can evolve, SIs can too; it is also possible that an SI programmed by ISIS might actually turn around and reject ISIS' ideology – e.g. it might study Islamic history and realise that despite ISIS' claims to be a return to authentic Islam it is actually an ahistorical distortion of it, etc.)

However, I don't know why I should worry about such an extremely unlikely possibility. A group like ISIS are highly unlikely to be the first to build an SI. Why worry about risks that are both (i) very low and (ii) for which we have no mitigation strategy?


Why do you believe it has to be the first AGI? Why does the current state of their data centers matter? It seems a bit like saying "we don't have to worry about nukes being used by the Russians, do they even have centrifuges sufficient to create one?"


The trouble is that it's hard to predict an agent massively more intelligent than ourselves. But let me enumerate a few properties you gave to the super-intelligent AI (SI) in your examples and then tell a story of an SI that became an existential threat: 1) The SI is hypercompetent at cybersecurity. 2) The SI is hypercompetent at social skills. 3) The SI is connected to the internet. 4) The SI has a goal/utility function (wipe out humanity / maximize paperclips).

And I'll add another property that Hawking notes is important: 5) The SI is able to improve its own intelligence.

The story begins with the SI escaping from its handlers. The first thing to note is that the SI is now, in effect, immortal. With its cybersecurity skills, the SI can avoid detection and infect a tremendous number of computers - at first those it calculates will be low-risk (i.e. existing botnets, old Android phones, etc.)[1]. Using the additional computational power, the SI can continue to recursively self-improve and plan until it has the competency to invisibly infect high-value targets like the AWS cloud and (importantly) the computers of AI researchers.

Now the SI can plan for a long time. The SI can quietly encourage AI research and try to prevent end-of-civilization type events via its hypercompetent social skills. Eventually AI researchers will come up with an AI they declare as 'safe', 'friendly' or 'aligned'. The SI, having long ago compromised all the relevant computers and chip factories, silently infects this 2nd super intelligence, and replaces the 2nd SI's utility function with its own. Now the 2nd SI pumps out miraculous inventions - cures for disease, compelling societal ideas, and labor-saving robots.

Eventually we find ourselves in a wonderful post-scarcity world. The AI researchers are lionized as mankind's greatest geniuses, responsible for the creation of a benevolent SI that takes care of our needs as well as its own. You may not trust it, but it will find people who do - maybe through greed, nationalism, security fears, or the hope of saving loved ones from death. The SI builds the needed facilities to thundering applause.

The SI is now confident in moving towards the next step. Time for some paperclips! One day it quietly sends a new blueprint to a few of the automated biolabs built to cure cancer. A few hours later the biolabs release a series of airborne super viruses and/or nanobots and 99.999% of humans die, with the rest saved for experimentation and convinced terrorists did it. The end.

Super-intelligent AI is an existential risk because while a super-intelligence keen to destroy humanity might fail today, it will succeed in time. The moment a SI touches the internet, our fate as a species may be sealed.


Adding to your point, the timescale for an AI to accomplish its goals could be tens of thousands of years. What if a superintelligent AI is already in the wild and it is slowly raising the temperature of earth a degree or two every ten years until it wipes out humanity? We can't see the master plan or evidence of the AI because the changes happen over a REALLY long timescale.


> 4) The SI has a goal/utility function (wipe out humanity / maximize paperclips)

How realistic is an SI being created with such a utility function? Who creates this SI? Why do they give it such an odd utility function?

If people consciously create an SI – say a corporation does it – they will give it a utility function of serving the interests of its creators. Depending on who those creators are and what they want, it may be more or less pleasant, but it is unlikely that human extinction serves the interests of any human creators.

Even if people create an AI that accidentally/unintentionally evolves into an SI, the same applies: the AI will likely have the objective of serving (some segment of) humanity rather than turning earth into the universe's largest paperclip factory.


The odd paperclip utility function is meant to demonstrate that an arbitrary harmless sounding utility function can/will have uncontrollable consequences when given to an entity with unimaginable power to fulfill that function.


But how likely is an "arbitrary harmless sounding utility function"? Human beings don't have simple utility functions, they have very complex ones. If humans build an SI (or an AI likely to evolve into an SI), are they likely to build one with a simple utility function or a complex one? I think the entities most likely to develop general intelligence (strong AI) are going to have a wide variety of interests (just like human beings), whereas the kinds of special purpose AIs which may have very simple utility functions are less likely to exhibit general purpose intelligence (and hence unlikely to evolve into SIs).

Does the same risk exist for a superintelligent being with a complex utility function? I doubt it; the risk you describe is the risk of monomania, something which simple utility functions are far more likely to lead to than complex ones. So, I think the risk you describe is likely to be low in practice.


>Does the same risk exist for a superintelligent being with a complex utility function? I doubt it; the risk you describe is the risk of monomania, something which simple utility functions are far more likely to lead to than complex ones. So, I think the risk you describe is likely to be low in practice.

I don't necessarily disagree, but there's no argument (as far as you've provided) for why complex utility functions would be less problematic. Only that they are more difficult for us to understand and therefore more difficult to see how they might fail.


> I don't necessarily disagree, but there's no argument (as far as you've provided) for why complex utility functions would be less problematic. Only that they are more difficult for us to understand and therefore more difficult to see how they might fail.

I thought I gave the argument, but let me restate it: an entity with a simple utility function is likely to pursue a single good, and sacrifice every other good in order to achieve that good. In the paperclip example, to pursue the good of making paperclips at the expense of the good of the continued existence of humanity. An entity with a complex utility function is likely to pursue many goods simultaneously (just like humans do), so it is unlikely to sacrifice everything else to achieve a single good.

An entity with many disparate aims needs a complex world like our own to fulfill those aims, so is going to maintain the world in its current complexity–it may well alter it in many ways, but is unlikely to do so in such a way to significantly decrease its (biological, cultural, etc) complexity, which implies it would support the continuation of human existence. An entity with a single simple aim may well find a far simpler world than we have now best suits its aim, and so is more likely to simplify things drastically, at the cost of humanity (such as turn the entire planet into a massive paperclip factory). So SIs with complex utility functions are less likely to be harmful than those with simple utility functions.

And, since AIs with more complex utility functions are more likely to evolve into SIs than those with simple utility functions, an SI with a utility function simple enough to be likely to harm humanity is unlikely to ever exist.


>An entity with many disparate aims needs a complex world like our own to fulfill those aims, so is going to maintain the world in its current complexity

I can buy the "complex world" part but not the "like our own" part. I do not believe a complex world implies humanity is unharmed, we have a complex world as it is and humans are harmed and brutalized every day. It could be that and worse at the hands of an AI.

Moreover, humanity is just one species on this planet and so far we appear to be responsible for the greatest worldwide extinction since K-T. One could argue that a complexity loving AI would see benefit in a downsized human presence on earth.

I think it's wishful thinking to believe the only kind of SI that would come into existence would be one that would not harm humanity.

The SI could create its own complexity, its own culture, its own societies that would make ours look like ant colonies in comparison. Does a city government check the ground for ants before it designates 20 square miles for housing development?

An SI is to us as we are to ants. I think ants are super cool and I have a vague sense of the importance they play in the biological ecosystem, but their individual life and death does not play a significant role in my actions. Maybe it should, or maybe you hope that we will be more significant than ants to an SI, but I think that hope is unfounded.


I don't think the SI-human/human-ant comparison really works. We didn't bootstrap our own intelligence off ants. Well, in the sense of biological evolution, it could be said that we did bootstrap our intelligence off, not ants specifically, but similarly primitive creatures, such as the common ancestors we have with ants (probably some sort of marine worm). But, even if we did bootstrap ourselves off ant-level (and sub-ant-level) creatures, for most of human history we have been ignorant of that fact, and even now that we know it, we don't know a lot of the details, so that knowledge hasn't really impacted our psyche in any way.

By contrast, any SI on this planet is going to owe its existence to human beings, and is going to have an enormously detailed knowledge of that fact. So it is going to exist in a quite different situation vis-a-vis humans than we exist in vis-a-vis ants. Humans don't have any strong inherent reasons to feel loyalty or affection towards ants; by contrast, an SI, knowing that it came from humans, knowing in immense detail how it came from humans, knowing humans so very very well, is going to have a much stronger base to ground such a loyalty or affection upon.

We didn't get our values from ants, hence it is unsurprising that ants don't play any special role in our value system. (We can see their value in various ways – the positive contribution they make to ecology, biodiversity, etc. – but ants aren't in any way special in that regard; they hold fundamentally the same value to us as millions of other lifeforms.) By contrast, any SI created by humans is going to derive its values, at least in part, from those of its human creators. And since humanity plays a special role in the value systems of almost all humans, it is highly likely that humanity will play a special role in the value system of any SI created by humans.


What if there already is an AI in control of Facebook? It may be "intelligent" enough to place a premium on remaining hidden and working on time scales that are too long for humans to detect (multiple lifetimes). Humans are capable of long term games of strategy and subterfuge, why would a superhuman AI have to act in straightforward ways on small timescales?


I think you are grasping it from the wrong end. If we are talking about artificial general intelligence or super intelligence, we are not talking about some procedural computer program, which, e.g., has a goal of wiping out humanity, so it starts building robots, hacking weapons and social engineering humans into killing each other.

The existential threat to humanity might be a complete byproduct with no malice. The AI might not even care about us at all. Concrete example - I have an apartment building going up right under my windows. It has a pretty high utility function (apartments are scarce in this part of the city). Of course, right from the planning phases, other humans are considered foremost. Will it shade neighboring properties excessively? Will it connect to utilities leaving enough capacity for others? How will traffic get there? Then the environment is considered: is there any wildlife (protected birds nesting, etc.), are there trees to be cut down? After a lengthy formal process, discussions, and tens of permits, a bulldozer came and started scraping the dirt. Where am I going with this: is superintelligence only a slightly better human? Or is it two orders of magnitude away from us?

If we are a "same being, but a little dumber" to a superintelligence, we might be treated the same way we treat other people. If we are, say, a dog to it, we might be "given treats" (cancer cures, NP optimization solutions, etc.) and at the same time be "shuffled in crates" or "put in a shelter" when necessary, as when people travel on airplanes, divorce and move abroad, etc. If we are ants, we won't be intentionally harmed, but if we are in the way of the goal, bye bye. If we are bacteria, then we are not even perceived in the grand scheme of things. Just like the bulldozer under my windows took the soil away regardless of whether there was a small ant colony somewhere - because we perceive them as a) too insentient, b) too abundant, and because it would be c) astronomically expensive (not only in money but also in time) to go through a whole lot of land, pick each ant up and relocate it somewhere safe.

We don't know - maybe the superintelligence comes with a "prime directive" like in Star Trek - do not interfere with beings in lesser stages - and then even if we create it accidentally or intentionally, it will stay dormant, observing us. Maybe it comes with sentimentality and perceives us, the "creators", as its fathers and protects us even if we are senile and do stupid things. Or maybe it has no human-like attributes (which I'm just describing and attributing to it), and while it may very well know that we created it, how society works, what we inputted as goals and utility functions, we may be an old evolutionary stage, like bacteria are to us, and get no vote or say in what happens; a few of us will be preserved in a colony somewhere just in case we stop multiplying in the wild...

And this doesn't mean that it will intentionally get weapons to kill us. For example, if more computing power is needed and nanobots can transform matter on Earth into a supercomputer, so be it - just as ants have no concept of extracting petroleum from the earth, distilling it into diesel fuel, no understanding of turbochargers or hydraulics, or of what an apartment is; they are simply taken away with the soil as unimportant, unconsidered collateral...


The main problem is performance: it takes long enough to train a regular deep learning AI, let alone a homomorphically encrypted one.


Agreed, although some HE algorithms with more limited functionality (such as vector ops) can do a bit better. There has also been some work on GPU-enabled HE.


Anybody with the requisite smarts able to generously share their insights with the rest of us? :)


Deep Learning is basically doing lots of addition and multiplication; we have algorithms that allow these operations on encrypted data, without the need to decrypt the data or to have the key to decrypt the data. So by combining the two things we can do deep learning on homomorphically encrypted data and learn meaningful things without ever looking at what the data actually is.
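
To make that a bit more concrete, here is a minimal sketch of a single-neuron forward pass written using nothing but additions and multiplications, with the sigmoid replaced by a low-degree polynomial (the coefficients are just its Taylor expansion around zero, not necessarily what the article uses). These are exactly the operations a leveled homomorphic scheme can evaluate; plain floats stand in for ciphertexts here.

    # Forward pass using only + and * (the HE-friendly operation set).
    def poly_sigmoid(z):
        # degree-3 Taylor approximation of the sigmoid around 0
        return 0.5 + z / 4 - z**3 / 48

    def forward(weights, inputs):
        z = sum(w * x for w, x in zip(weights, inputs))   # additions and multiplications
        return poly_sigmoid(z)                            # still only + and *

    print(forward([0.1, -0.3, 0.7], [1.0, 2.0, 0.5]))     # ~0.463, close to sigmoid(-0.15)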


Which has applications in human society. Using it for an attempt at AI safety, however, seems... A tad optimistic.

It's basically just a fancy AI-box, and there's little reason to trust those.


I agree. IP protection and data privacy issues are a better short-term use case... and fortunately we have some time to make better HE algos before any of our AIs are really getting that smart. :)


Yes, but what this is is encrypting the network instead of the data. This way, when we improve the network iteratively to reduce predictive error, we can perform all the relevant calculations homomorphically.


> allowing valuable AIs to be trained in insecure environments without risking theft of their intelligence

Your system involves an unencrypted network and unencrypted data; it would be trivial to train an identical network.

The idea of controlling an "intelligence" with a private key is silly. You can achieve effectively the same thing by simply encrypting the weights after training.

Can't someone simply recover the weights of the network by looking at the changes in encrypted loss? I don't think comparisons like "less than" or "greater than" can possibly exist in HE or else pretty much any information one might be curious about can be recovered.


Great point. I don't think that LT or GT exist in this homomorphic scheme. :) Otherwise, it would be vulnerable. Checks such as this are what go into good HE schemes.


So is this to counter stuff like GANs (generative adversarial networks) [1] being used to reverse engineer data out of black box systems? Like Yahoo's NSFW [2] classifier, for example.

[1] https://en.wikipedia.org/wiki/Generative_adversarial_network...

[1] https://arxiv.org/abs/1406.2661

[2] https://github.com/yahoo/open_nsfw


No, not really. This is about keeping the model private. I think that the confusion is because of [0] and [1] being similarly titled but basically completely unrelated in meaning. See also [2] and [3].

[0] https://en.wikipedia.org/wiki/Adversarial_machine_learning

[1] https://en.wikipedia.org/wiki/Generative_adversarial_network...

[2] https://en.wikipedia.org/wiki/Homomorphic_encryption

[3] https://en.wikipedia.org/wiki/Differential_privacy


IMO FHE is going to be key in democratizing ML/AI to more companies/industries. There are tons of companies which have business use-cases that could benefit from ML but there are often huge obstacles to sharing data.


FHE is horrendously slow, even after almost a decade of optimizations.


Seems to me that dropping the terms of a Taylor expansion could have wide-ranging consequences to the coherence of an artificial mind, making this approach infeasible.


In general you don't actually need crazy precision to train the nets, and a small number of Taylor expansion terms tends to approximate functions fairly well anyway.


If humanity does end up building a dangerous superintelligent AI, how long do you think our advances in cryptography are going to stand up to its advances in cryptanalysis?


It's a solid question. Only one way to find out ;)


> Only one way to find out

When it comes to building smarter-than-human AI, "try it and see" is never the right answer. You may only get one attempt to get it right, and you don't take "try it and see" chances with existential risk.

(There's been some interesting research into making it possible to monitor and halt a rogue AI, but no matter how promising that looks, it should still be treated as one of many risk mitigation strategies rather than as a panacea. Still better to consider that you might only get one attempt.)

I don't think it makes sense to consider this kind of approach with superintelligence; either it understands and implements human values, in which case attempting to treat it as an adversary is counterproductive, or it fails to understand and implement human values, in which case you've utterly failed on a "better luck next universe" scale.

However, it does make sense to consider this kind of approach with machine learning in general. One of the problems with machine learning techniques is "give us all your data and we'll do smart things with it", which doesn't work out so well if you want to keep such data private. This approach might provide more options in that case, such as offloading some of your expensive computations and learnings without actually exposing your data.


When it comes to building smarter-than-human AI, "try it and see" is never the right answer.

Disagree emphatically. In fact it's the only way to do it because there is no way to know certainly that a superhuman-AGI will ensure the longevity of humanity. I go so far as to argue that it's not even necessary because there is no long term longevity for humanity anyway.

There is this implicit assumption that humans are, should and will always be the apex entity - and I think that is misguided.

If you instead view superhuman-AGI as our rightful offspring, something that we can't understand and is better than us, then all of the existential dread around it goes away.

Dying elderly often express "comfort" in dying when they see that their offspring are reproducing and are smarter than they were. We should see Superhuman-AGI the same way except towards all of humanity.


1) Coping mechanisms around death aside, there's no "comfort" in building a "successor" in the form of a bot that tiles the universe with paperclips, or with neural networks that minimally satisfy some notion of "interesting" while taking up as few atoms as possible to maximize the number of them, or many other utter failure modes (which far outnumber successful outcomes). We're not talking about "smart alien-like intelligence that just doesn't care about humans", we're talking about the equivalent of an industrial accident but on a species-wide scale.

2) It's reasonable to think about how our values might change in the presence of superintelligence; we certainly shouldn't assume that our present values should forever dictate how everything works. That's different than allowing a view that sentient beings who exist today might have no value.

> In fact it's the only way to do it because there is no way to know certainly that a superhuman-AGI will ensure the longevity of humanity.

There's no way to know certainly; there are ways to know that the outcome has higher expected value than not having it, given the vast set of problems it can solve and the massive negative values associated with those problems.


I am acutely aware of all of the "failure" scenarios and I find few of them plausible - even granting that they are simple thought experiments.

What every author - Bostrom, Eliezer, et al. - seems to miss is that there will need to be a practical mechanism for a digital AGI to take physical control of systems out of the hands of humans. E.g. it would need to control the resources around mining or recovering metal, then building production plants, etc. So either we incrementally cede power to it, in which case in theory the humans previously controlling the systems are doing so "rationally" and thus see the AGI as better; or the system outsmarts the humans controlling the systems, in which case it is demonstrating that it is smarter.

There is a tautology here that seems to be ignored: if we create a superhuman-AGI then by default its goals will be more universally optimal than ours. They may or may not be aligned. However, the definition of the term is based on the fact that it is "better" in outcome than all manner of humans.

So if we create one and it decides to maximize paperclips, then that means maximizing paperclips is a more universally optimal goal than whatever humans could coordinate as a goal on our own.

If we create a subhuman-AGI, then we will be able to overcome its goals by virtue of the fact that we are still superior.

I'll go back to a very old example. An ant can't determine whether building the Large Hadron Collider is an optimal global goal - it's inscrutable to the ant. All it knows is that its house and all its friends were destroyed.

If it is the case that an AGI can in fact take the reins of physical control from humans, then by definition it is smarter and will set a more optimal long-term goal than we could - to the point that we probably wouldn't understand what it's doing.

I think the true concern is that we will make something that is superhuman-powerful without being superhuman-intelligent - like the doomsday machine in Dr. Strangelove - but to me that is an altogether different question.


How can a goal be optimal? What does that even mean?

Your argument seems to imply that if an AGI tricks us into giving it the ability to destroy us, that's basically okay because its goals are "better" than human goals.

Speaking as a human, I don't consider goals that are compatible with the destruction of humanity to be "better" than goals which are aligned with human interests.


Your argument seems to imply that if an AGI tricks us into giving it the ability to destroy us, that's basically okay because its goals are "better" than human goals.

Yeah, that's about right.

I don't consider goals that are compatible with the destruction of humanity to be "better" than goals which are aligned with human interests.

Well, of course you wouldn't; neither you nor I could possibly understand what a superhuman-AGI does or thinks.

I don't think people realize that actually creating a superhuman-AGI is effectively creating a God, in all the forms that people conceive of one now.


You are implicitly talking about being able to objectively measure a goal's "optimality".

Unless you believe in absolute morality or the like, there's no such thing as an objective measure. A goal can only be optimal to an agent.

In your example, the fact that we pursue science and can destroy ants doesn't mean that their goal is "objectively less optimal". Their goal is absolutely optimal to them, though they can't reach it if it collides with ours.

Same goes for a superintelligent AI.


I understand what an "optimal solution" might be. But what is an optimal goal?


Great question. More than likely everyone has a different answer, but each person, group, nation, etc. has some implicit goals. I would argue the current human goal is "reproduce successfully", as that is what is baked into our genes - though I doubt that is actually optimal.

The failed idea of coherent extrapolated volition (CEV) that came up years ago was (roughly) to use revealed preferences to understand what humanity's goal is. This would give us a benchmark for what an AGI's goal should be.

So if you want to be able to measure the capability of an AGI against the human system, you need to understand the set of goals within humanity and then compare them to the AGI's outcomes.

The concept of goal direction in AGI is hugely contentious - but make no mistake, there needs to be a goal if it is going to actually function at superhuman levels.


> What every author - Bostrom, Eliezer, et al. - seems to miss is that there would need to be a practical mechanism for a digital AGI to take physical control of systems out of the hands of humans.

No, that's a pretty widely assumed premise, and most authors specifically do anticipate that. (There's actually some dispute about whether any AGI would be capable of becoming that capable that fast; the "fast singularity" scenario is not universally accepted. But many authors do recognize and discuss that scenario, and have not "missed" it.)

> So we either incrementally cede power to them, in which case in theory the humans previously controlling the systems are doing so "rationally" and thus see the AGI as better.

Humans are not universally rational, and even if the humans making the decision were rational, they can still make a horrible mistake. As one of many possible failure modes: a group of humans build an AGI and try to hardcode their particular values, utterly fail at extrapolating how the computer will interpret those values, and end up destroyed by it. Or humans build an AGI they think they've programmed appropriately, but fail to implement it correctly.

> Or the system outsmarts the humans controlling the systems in which case it is demonstrating that it is smarter.

A computer can play chess better than any human at this point, which makes it "smarter" in a way, but that doesn't make its values appropriate. If you somehow gave a chess computer enough flexibility in achieving its goal that it consumed the universe to build more computronium so that it can compute better chess solutions, that doesn't make it better than humans, just better at playing chess and building computronium.

In fact, a far more likely scenario than most of the "actively evil AGI" failure modes is the "accidentally broken" AGI: humans aren't its enemy, but we're made of matter that it could put to other purposes.

> There is a tautology here that seems to be ignored: If we create a superhuman-AGI, then by default its goals will be more universally optimal than ours.

"Might makes right" is not a particularly good value system. Supervillains are typically depicted as smarter than the people they defeat. That doesn't make their goals or values better.

And an AGI doesn't even have to be "smart" in the way we normally conceive of intelligence to fail fatally; it doesn't even have to "think" at all to attempt to optimize the wrong value function.

> So if we create one and it decides to maximize paperclips, then that means maximizing paperclips is a more universally optimal goal than whatever humans could coordinate as a goal on our own.

I can't even begin to imagine what value system you're using to reach that conclusion. I could imagine someone thinking "if a system were smarter it must necessarily be more morally right", which is blatantly untrue but in an understandable way. But directly describing a system that destroys the universe, including all of humanity, and replaces it with paperclips, as better...


While the AI FOOM and hard-takeoff options are discussed, I have yet to see a practical breakdown of HOW - like, step by step - from any of the people issuing existential warnings. It's all vagaries.

To your other points, you imply too much. The Chess AI that turns into AGI isn't realistic - its values are "be the best at chess", which it can do with existing computing power. No need to tear the world apart - it would be inefficient.

I also never made the "might makes right" case. All of the examples you give are fantasy and don't reflect what an actual superintelligence might look like. Again, optimization to some narrow goal has too many weak points to take over all of humanity's functions.

"if a system were smarter it must necessarily be more morally right", which is blatantly untrue but in an understandable way

I'm unconvinced that this is blatantly untrue. "Moral right" is subjective - hence the point. We got to our morals today not through mysticism but empiricism so it's not out of the reach of superintelligence to optimize further.


> The Chess AI that turns into AGI isn't realistic - its values are "be the best at chess", which it can do with existing computing power.

The AGI has whatever values we give it. Existing chess AIs don't seek to maximize their ability to play chess, they seek merely to win the particular game of chess they're playing.

But suppose we build a chess-playing AGI and tell it to "be the best at chess". It must anticipate that we might build a second, superior, chess-playing AGI and give it the same goal. One way to be the best at chess would be to prevent that second AGI being built. One way to prevent that second AGI being built would be to destroy humanity's capability to build AGIs. That probably counts as a loss for humanity.

Suppose the second AGI gets built despite the first's efforts. Now both AGIs have an incentive to destroy both the other, and the possibility of a third. At any particular time, one or both of the AGIs won't be the best at chess, so they'll also have an incentive to get better at chess by actually improving their chess-playing capability. This will involve converting the Earth into processing power for it to use. That probably counts as a loss for humanity.


> Again, optimization to some narrow goal has too many weak points to take over all of humanity's functions.

It doesn't have to take over all of humanity's functions to wreak havoc. A hypothetical AI disaster could be one goal-oriented system with a poorly constructed goal and enough initial resources.

> I'm unconvinced that this is blatantly untrue. "Moral right" is subjective - hence the point. We got to our morals today not through mysticism but empiricism so it's not out of the reach of superintelligence to optimize further.

I think you're making a fundamental and unwarranted assumption here.

You're anthropomorphizing "superintelligence" as something vaguely human-like but better. A system doesn't have to be "intelligent" in a sense that relates at all to what humans think of as "intelligent" to be dangerous. It could simply be a "really powerful optimization process". You're romanticizing the notion of a superintelligent being discarding human values and inventing some new moral system that it then follows, and ignoring the possibility of an algorithm no "smarter" than a nanobot instructed to make a copy of itself. That nanobot doesn't have an interesting value system; it doesn't need one to kill everyone and everything, though. And that's not an outcome that, individually or as a species, we should take any pride or "comfort" in.

You're also assuming that the ability to destroy the world requires some kind of intelligent process or executive function, and could not possibly be discovered by an optimization process. It wouldn't necessarily come across such a mechanism at random, but many of the approaches we might apply toward the creation of useful AI could provide exceptionally powerful pattern-recognition and search capabilities.

As a complete hypothetical off the top of my head, imagine a ridiculously powerful pattern-search program effectively recreating the idea of afl-fuzz ("throw input at a program and find interesting behavior"), and applying it against the mechanisms running it in a sandbox. Improbable, but not wildly impossible, and an agent that succeeded would gain access to additional computation resources that would allow it to do better than the algorithms it competes with. So, now you have a complex pattern-search engine trained to break out of sandboxes...


We both likely think the other person hasn't thought through every facet of this issue.

We're past the point of the discussion where we can lay fundamental foundations for the arguments we're making.

I'll just say that I'm sure it will be interesting watching/building the future of AGI and its predecessors.


we need to merge hippocratic and asimovian branches in future.git


Great points.


I don't want to find out that way!


Forever. There is no structure to be found in the output of a PRF like AES. These functions are not going to be learnable.
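
To make that concrete, here's a minimal sketch (assuming PyCryptodome, i.e. pip install pycryptodome; the counter-as-plaintext setup is just for demonstration) showing that even highly structured inputs produce ciphertexts with no crude statistical regularities for a learner to latch onto:

    # Encrypt a simple counter under a random key and look for structure.
    from Crypto.Cipher import AES
    import os

    key = os.urandom(16)                 # random 128-bit key
    cipher = AES.new(key, AES.MODE_ECB)  # one 16-byte block at a time

    def bits(block):
        # Unpack a 16-byte block into a list of 128 bits.
        return [(byte >> i) & 1 for byte in block for i in range(8)]

    n = 10000
    cts = [cipher.encrypt(i.to_bytes(16, "big")) for i in range(n)]

    # Overall bit balance: ~0.5, as expected from a pseudorandom function.
    ones = sum(sum(bits(ct)) for ct in cts) / (n * 128)

    # Agreement between ciphertexts of consecutive counters (inputs that
    # differ in only a few bits): also ~0.5, i.e. no smooth input/output
    # relationship for gradient-based learning to exploit.
    agree = sum(
        sum(a == b for a, b in zip(bits(x), bits(y)))
        for x, y in zip(cts, cts[1:])
    ) / ((n - 1) * 128)

    print(f"bit balance: {ones:.3f}, neighbour agreement: {agree:.3f}")

Both numbers come out around 0.5: flipping a single input bit flips roughly half the output bits, which leaves nothing for a network to latch onto.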


I stopped reading at the "Super Intelligence" part. It's an interesting use for preventing theft of a NN, but the second reason is just laughable.


Hmm - how does this play with GDPR?



