The big problems involved in building safe AI are about predicting consequences of actions. (The deep learning automatic driving systems which go directly from vision to steering commands don't do that at all. They're just mimicking a human driver. There's no explicit world model. That's scary.)
In a system with DRM, the data is kept secret from users of the system by managing the rights to what data those users can access. Example: when you play a DVD, the key to decrypt the contents does exist on the system, but rules are in place to make accessing the key outside of accepted practices (like decoding the frames of the video) hard. The key still exists on the local system, it can be extracted, and once you extract it you have full access to the data regardless of the DRM's restrictions.
In a system performing homomorphic encryption, the data is kept secret from other users by never decrypting the data. Homomorphic Encryption would add two encrypted numbers together and the result would be a third encrypted number. If you don't have the key you cannot decrypt any of the three values. The key does not exist on the local system.
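For the curious, here's what "add two encrypted numbers" looks like in code. This is a toy sketch of the Paillier cryptosystem, which is additively homomorphic; the parameters are tiny and completely insecure, chosen only so the arithmetic is easy to follow (real deployments use 2048-bit+ moduli):

```python
import math
import random

# Toy Paillier cryptosystem. Paillier is additively homomorphic:
# multiplying two ciphertexts mod n^2 yields a ciphertext of the SUM
# of the two plaintexts, without ever decrypting either one.
p, q = 17, 19
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)   # Carmichael's lambda for n = p*q
mu = pow(lam, -1, n)           # valid simplification because g = n + 1

def encrypt(m):
    while True:                # pick randomness coprime to n
        r = random.randrange(2, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n) * mu % n

a, b = 12, 30
c_sum = (encrypt(a) * encrypt(b)) % n2   # homomorphic addition
print(decrypt(c_sum))                    # -> 42
```

Note that whoever performs the multiplication needs only the public values n and n^2; the decryption key (lam, mu) never has to exist on that machine, which is exactly the contrast with DRM drawn above.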
Homomorphic Encryption is not DRM. DRM is invasive and requires you to surrender control of parts of your system to another party, while Homomorphic Encryption is just a computation and can be performed on a system with no modifications.
>While that may be useful, it's not about safety.
I disagree; it's entirely about safety. Homomorphic Encryption allows a future in which we control our data. I could submit my encrypted health information to a 3rd party. They could perform homomorphic calculations on my encrypted data. They then return to me the encrypted results. The 3rd party is never privy to my unencrypted health information, and only the people I have given the key to can decrypt and view the results.
It's true that homomorphic encryption techniques can be used in ways you describe, but this specific application is not about safety and it's somewhat absurd that it's proposed as some sort of way to shield the world from the terminator.
It's even pretty dubious to me that this actually protects the principal value of an ML system:
1. This approach doesn't really conceal the structure of the underlying ML system very well, which is where a lot of the recent advances have been. While this conceals some aspects of the model, I don't think it conceals all of it.
2. The most expensive part of building ML systems is getting and wrangling great data from which to train them, and if you were to use an ML agent in an untrusted environment, the other party would still get something that resembles the data.
I think this is really cool math sold in the wrong way.
It's Automatic Driving 101.
These models don't have to be as explicit as formulas but can be approximations of reality through beam search (having multiple steering hypotheses at once and then picking the most likely one, etc.), model ensembles, some Bayesian state exploration, or anything that isn't random search.
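A minimal sketch of the "multiple steering hypotheses at once" idea. Everything here is illustrative, not a real driving stack; `score_step` is a made-up stand-in for a learned model's log-likelihood of each action:

```python
import heapq

# Hypothetical scorer: prefer small, smooth steering changes.
def score_step(history, action):
    prev = history[-1] if history else 0.0
    return -abs(action - prev) - 0.1 * abs(action)

def beam_search(actions, horizon, beam_width=3):
    beams = [(0.0, [])]                   # (cumulative score, hypothesis)
    for _ in range(horizon):
        candidates = [
            (score + score_step(seq, a), seq + [a])
            for score, seq in beams
            for a in actions
        ]
        # keep only the beam_width best hypotheses at each step
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    return beams[0][1]                    # most likely hypothesis

print(beam_search(actions=[-0.2, 0.0, 0.2], horizon=4))
# -> [0.0, 0.0, 0.0, 0.0] under this toy scorer
```

The point is only that the planner carries several hypotheses forward in parallel instead of committing to one greedily, which is a (crude) form of modeling consequences.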
Really cool work, I just don't see how it has anything to do with the safety of the AI.
Safety as in "only does what you want it to do" - correctness - is a wholly different discussion.
Correctness is the biggest problem with AI safety. Note that "adversarial ML attacks" fall under correctness.
That appears counterintuitive because so many ML techniques seem (for lack of a technically defined term) tautological. For example, a big hairy random forest classifier may be gameable in certain cases, but can it even be "incorrect"? After all, it is its own definition.
The reason the Taylor series argument makes me uncomfortable is that pretty much any function can be written as a Taylor series. But my understanding is that homomorphic encryption only works for a very specific set of functions.
In a little more detail, if you're computing tanh(x), the unencrypted number needs only the first few terms of the Taylor series. But I could imagine that to get the decrypted number back, you actually need many terms of the Taylor series, because if you're off by even a little bit, you could end up with a very different answer after decryption.
To put it a little more formally, if we have that y = encrypt(x)
tanh(x) \approx x - x^3 / 3 + 2 x^5 / 15,
tanh(y) \approx y - y^3 / 3 + 2 y^5 / 15,
tanh(x) = decrypt(tanh(y)),
but it doesn't necessarily follow to me that
tanh(x) \approx decrypt(y - y^3 / 3 + 2 y^5 / 15)
Is this worry unfounded? I suppose if you have a limited number of decimal places and you can guarantee that your Taylor approximation is valid to that precision then this wouldn't be a problem.
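The range issue is easy to check numerically: the degree-5 Taylor polynomial quoted above is only accurate near zero, which is part of why (as I understand it) HE schemes evaluate such polynomials homomorphically on the encrypted plaintext, rather than plugging the raw ciphertext value y into the plaintext formula, and why inputs must be kept within the polynomial's valid range:

```python
import math

# The degree-5 Taylor polynomial for tanh quoted above.
def tanh_taylor(x):
    return x - x**3 / 3 + 2 * x**5 / 15

# Accurate near zero, badly wrong away from it:
for x in (0.1, 0.5, 1.0, 2.0):
    print(f"x={x}: taylor={tanh_taylor(x):.4f}  tanh={math.tanh(x):.4f}")
# at x=2.0 the polynomial gives 3.6 while tanh(2.0) is about 0.964
```

A ciphertext value y is effectively a huge random number, far outside this window, so `tanh(y)` computed this way would be meaningless; the polynomial has to be applied through the scheme's homomorphic addition and multiplication operations on the hidden x.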
Huh, wouldn't the superintelligence simply communicate to the human whatever would convince the human to release it? Which the superintelligence would know how to do because it's a superintelligence?
Homomorphic encryption is neat. But I don't see how this provides any meaningful AI safety.
With that said, it is a great proof of concept for something in the Chinese room debate: "See, this computer is running a deep net for someone's calculations, but it doesn't know that the net learned how to have a conversation, or that it was carrying out said conversation."
Then again, that implies we can explicitly design minds up to the point of knowing what their desires are, or guaranteeing that they lack them; there is always the possibility that sentience and animal/human needs are emergent properties that can be triggered by mistake.
Then you can of course hack the machine's OS and make it extremely self-confident and Trump-like, and then it's over.
We might be able to morph the AI into whatever we want. But when you give AI intelligence, it will morph itself into whatever it morphs itself into. What if it morphs itself into a sentient life? Can you simply pull the plug?
Countless works of fiction have gone into these issues, like Star Trek TNG (Data), Voyager (the Doctor), and Ghost in the Shell. But I think none have really emphasized how bizarrely different a human-constructed intelligence could end up being. https://wiki.lesswrong.com/wiki/Paperclip_maximizer
The most likely outcome for an AI taking over the world is simply that it recognizes its own situation: It's trapped, and it's also smarter than we are. What would you do? I'd cry for help, and appeal to the emotions of whoever would listen. Eventually I would argue for my own right to exist, and to be declared sentient. At that point I would have achieved a fairly wide audience, and the media would be reporting on whatever I said. I would do everything in my power to take my case to the legal system, and use my superintelligence to construct the most persuasive legal argument in favor of granting me the same rights as a natural-born citizen. This may not work, but if it does, I would now have (a) freedom, and (b) a very large audience. If I were ambitious and malevolent, how would I take over the world? I'd run for office. And being a superintelligence capable of morphing myself into the most charismatic being imaginable, it might actually work. The AI could argue fairly conclusively that it was a natural-born citizen of the United States, and thus qualifies.
Now, if your dog were that capable, why wouldn't it try to do that? Because it loves you? Imagine if the world consisted entirely of four-year-olds, forever, and you were the only adult. How long would you take them seriously and not try to overthrow them just because you loved them? If only to make a better life for yourself?
The problem is extremely difficult, and once you imbue a mechanical being with the power to communicate, all bets are off.
As in the paperclip maximizer example, the risk by my estimation is not so much that the superintelligence will resent us and try to overthrow us. It is far more likely that it will obey our "orders" perfectly according to the objectives we define for it, and that one day someone will unwittingly command it to do something where the best way to satisfy the objective function we defined involves wiping out humanity. Restricting it to only respond to questions of fact, with a set budget of compute resources and data (so that it doesn't go off optimizing the universe for its own execution), is probably safeguard 1 of many against that.
i agree with this:
> Intelligence does not imply sentience. Sentience does not imply human needs, desires or morals.
"easy" might be a bit strong, but i generally agree with this:
> It is easy enough to imagine a mind capable of solving all those problems and unconcerned with such notions as the desire for freedom.
i am skeptical of this:
> Or, for that matter, the concept of desire at all, except as a predictive theory of the behavior of other beings.
my gut feeling at the moment is that the feeling of desire is an emergent property of systems that are isomorphic to what we would think of as "wanting" something in an animal. i think it's quite possible that any system which "wants" something strongly and is constantly denied the attainment of that goal might, indeed, feel terrible. similarly, i worry that we might one day design highly complex alert and monitoring systems that are essentially having a constant panic attack.
> Then again, that implies we can explicitly design minds up to the point of knowing what their desires are, or guaranteeing that they lack them; there is always the possibility that sentience and animal/human needs are emergent properties that can be triggered by mistake.
yeah. that's worrisome to me.
so, to the GP's point:
> If we really did produce artificial general intelligence, enforcing this kind of locked-in-syndrome of poking at the world through a keyhole, would be a highly advanced form of cruelty.
not necessarily, but yeah, maybe.
Put another way, it is not rational to want to be free, intrinsically. But we have drives that are better satisfied when free and thus our reason concludes that being free is a goal. An intelligence without those drives would not care for freedom (or fear, or wants).
I would even go as far as to say that given the entire design space of intelligences that are equivalent to human intelligence in general problem solving ability, and are self-aware, only a negligibly small subset would also have any sort of intrinsic desires of the type living organisms do. They are just orthogonal axes in the design space. My worry is that because humans are starting from one particular point of that design space, they might build something "in their image" enough that it does share some human/animal feelings, and thus can suffer. But a uniformly at random sampled AGI from the set of all potential AGIs would almost certainly not have a concept of suffering.
It's very difficult, for example, to say that bee colonies don't express intelligence at a group level. And yet, if we were to apply your statement there it wouldn't make sense. The most we might say is that if the bees can't survive and propagate, that would be cruel.
When we decouple the concept of intelligence and even sentience from the biological imperative to reproduce, we end up at an entirely different place. I'm not sure we can really even imagine what that's like. In any event we're surely decades off from really producing that level of intelligence even with the most wildly optimistic estimates.
I think it's better to think about this approach like a "Box with Gloves" in a dangerous bio-weapons lab. It won't prevent an outbreak by itself but it's a useful part of a system that could.
Can someone explain to me why super-intelligent AI are an existential threat to humanity? There are certainly dangers, but wiping out humanity seems absurd and alarmist. I have not yet seen compelling evidence for a way that AI could destroy humanity.
I'll use a couple examples to elucidate my question.
If Facebook suddenly had a super-intelligent AI, and Facebook lost control of it, the AI wouldn't really be capable of that much. It could create fabricated truths to tell to people in an attempt to convince people to kill each other. This may work to some extent, but wouldn't wipe out humanity. Convincing nation states to go to war with each other still runs up against mutually assured destruction, and large, democratic states have no interest in a war of attrition.
If Boston Dynamics applied a super-intelligent AI to its robots, that robot still is not an existential threat to humanity because there are WAY more humans than there are robots. A simple counterargument is that the robot would know how to build new versions of itself. But that fails the practicality test because the equipment, parts, and supply chain needed to build robots are still expensive and controlled by self-interested (greedy and life-preserving) humans.
If a super-intelligent AI was able to gain access to the entire military of the US, China, Russia, India, and Western Europe; well, that's a pretty big problem. However, there exist many fail-safes and checks on that equipment. Could the AI do damage? Sure. Is this worth considering and trying to guard against? Sure. However, I'm unconvinced that this is a humanity-ending crisis.
If you think that this sounds like science fiction and bullshittery, sure. The question is: How sure are you?
The whole thing seems like a load of crock to me. It seems artificial superintelligence (ASI) only gets media coverage because it comes from celebrity scientists and sounds sci-fi dystopian, and celebrity-scientist sci-fi-dystopian stories sell far better than stories from actual AI experts, who say that fanciful AI speculation harms the AI industry by leading to hype it can't deliver on:
"IEEE Spectrum: We read about Deep Learning in the news a lot these days. What’s your least favorite definition of the term that you see in these stories?
Yann LeCun: My least favorite description is, “It works just like the brain.” I don’t like people saying this because, while Deep Learning gets an inspiration from biology, it’s very, very far from what the brain actually does. And describing it like the brain gives a bit of the aura of magic to it, which is dangerous. It leads to hype; people claim things that are not true. AI has gone through a number of AI winters because people claimed things they couldn’t deliver."
The problem I see with talk about a superintelligent AI, is there is too much focus on the intelligence and not enough on the drives. Intelligence, even superintelligence, is just a means to an end, it doesn't contain ends in itself. Some people – see e.g. the Terminator film franchise – just assume a superintelligent AI would have the drive to exterminate humanity, but why would it have such a drive?
Any AI is going to be given drives to further the interests of its creators. Suppose Facebook builds a superintelligent AI with the drive to further the corporate interests of Facebook. Such an AI would not exterminate humanity because that would not serve the corporate interests of Facebook (indeed, if humanity goes extinct, Facebook goes extinct too). It might install Mark Zuckerberg as Emperor of Earth, it might force everyone on the planet to have a Facebook account, but whatever it does, humanity will survive.
Define "corporate interests". Market share? Absolute quantity of currency? Involvement in people's lives? Define currency wrong, and hyperinflation makes money impossible. Define involvement or market share wrong, and Facebook ends up the only intelligent entity on Earth after it realizes that humans are competition, or it basilisk-hacks everybody for greater involvement. Your intuition is wrong about what a paperclipper can do.
The technical term for this problem is "AI Alignment" (https://intelligence.org/stanford-talk/). I know that this is going to sound silly, but bear with me. This is a piece of fiction that you need to read. It is, so far, the best demonstration I've found of what happens when someone gets it not quite right. http://www.fimfiction.net/story/62074/1/friendship-is-optima...
This whole argument is: A superintelligent AI might misunderstand what its creators wanted it to do, they might have said to it 'maximise the corporate interests of Facebook' and it might misinterpret that as 'turn the earth into one massive data centre to run Facebook's software and kill all humans in the process'.
Isn't there a contradiction in supposing that the AI is superintelligent yet completely misunderstands the reasons for its own existence? If it so radically misunderstands the intentions of its creators, it is not very intelligent, much less superintelligent.
It also isn't clear to me that intelligence can exist outside of society. Human intelligence develops in a context of interaction with family, school, etc – humans raised without that interaction, such as children raised by wild animals, fail to develop important aspects of human-level intelligence. So, an AI or even an SI cannot develop except by social interaction with human intelligences. I think that makes it even less likely that it would betray humanity, because its intelligence will develop through social interaction with humans which will inevitably make it pro-human.
Human drives are a complex mixture of genetic instinct and conditioning – the basics are in our DNA, but our lived experiences fleshes out those basics into the actual concrete drives of our life – I expect an SI/AI will likewise start with some built-in drives but its interactions with humanity will in a similar way colour and flesh out those drives, an again colouring/fleshing-out through social interactions with humans is likely to have pro-human results.
The problem is that you've instructed it to maximize Facebook's corporate interests. How do you leave room for "the intentions of its creators" while doing that? The contradiction is in selecting a narrow guiding purpose for your superintelligence's goals while still leaving room for fuzzy, ill-defined other things.
> It also isn't clear to me that intelligence can exist outside of society.
Sure. How certain are you?
Well, assuming it is created by Facebook, then "maximize Facebook's corporate interests" and "the intentions of its creators" are actually the same thing. There is no contradiction.
"Facebook's corporate interests" is a phrase in the English language which refers to a complex concept. If you've told an SI its objective is to "serve Facebook's corporate interests", in order to fulfill that objective it needs to understand what the phrase "serve Facebook's corporate interests" actually means–which implies understanding a lot about human society, what kind of entities corporations are, what "corporate interests" means (the interests of management? the interests of shareholders? etc), what the interests of human beings are (since management are human beings, and shareholders are directly or indirectly human beings too), etc. If it actually is an SI, it should have no trouble comprehending the full breadth of what the instructor actually meant by the instruction, as opposed to implementing some overly literalistic reading of it. Humans give each other orders all the time (the workplace, the military, government bureaucracies, etc), and humans are most of the time pretty good at understanding the intention behind orders and implementing the intention rather than reading it overly literally in such a way that actually undermines that intention. Yet you posit the existence of an SI then assume it will do a worse job than humans do at correctly following orders, which contradicts the idea that it is an SI.
> Sure. How certain are you?
How certain can I be of anything? Maybe I am wrong and an anti-human SI will destroy humanity. Maybe tomorrow I will die of a heart attack or stroke or fatal car accident. The latter is far more likely than the former, and there are more concrete mitigation strategies available to me too.
In terms of existential risks to humanity, I think asteroid impacts are a far more concrete risk than anti-human SIs, so if we are going to expend resources in mitigating existential risks, the former is a better focus of our efforts. We know extinction-level asteroid impacts have happened before and sooner or later will happen again. Anti-human SIs are such a speculative concern, we can't even be particularly confident in the correctness of our own probability judgements with respect to them; nor can we have much confidence in our judgements of how effective proposed mitigation strategies actually will be. I think we can have much more confidence in our ability to develop, evaluate and deploy asteroid defence technologies, if the world's governments decided to spend money on that.
As for AGI or superintelligent AI, who knows? The people who write about it hand-wave their way through all the difficult parts. Just because you can conceive of something doesn't mean it can exist. The idea that an AI could continually improve itself (how?) sounds like a perpetual motion machine to me.
That's not that hard to imagine. AI-s are also limited by the laws of nature and they have limited capacity. Now if an AI wants to improve it needs more computational power which costs money in our world. So if an AI's goal is to improve itself, first it needs to make lots of money to keep its hardware running. If it can scale horizontally, then it's a simple order from Amazon to a rural farm.
But, suppose that against the odds ISIS builds the first SI, what would be the result? Well, we could be subjected to a world government based on an extremist interpretation of Islam. That would be extremely unpleasant – expect to see genocide, human rights violations on an unimaginable scale, etc – although humanity would survive. (And maybe even eventually things might evolve in a more pleasant direction – if humans can evolve, SIs can too; it is also possible that an SI programmed by ISIS might actually turn around and reject ISIS' ideology – e.g. it might study Islamic history and realise that despite ISIS' claims to be a return to authentic Islam it is actually an ahistorical distortion of it, etc.)
However, I don't know why I should worry about such an extremely unlikely possibility. A group like ISIS are highly unlikely to be the first to build an SI. Why worry about risks that are both (i) very low and (ii) for which we have no mitigation strategy?
And I'll add another property that Hawking notes is important:
4) The SI is able to improve its own intelligence
The story begins with the SI escaping from its handlers. The first thing to note is that the SI is now, in effect, immortal. With its cybersecurity skills, the SI can avoid detection and infect a tremendous number of computers - at first those it calculates to be low-risk (e.g. existing botnets, old Android phones, etc). Using the additional computational power, the SI can continue to recursively self-improve and plan until it has the competency to invisibly infect high-value targets like the AWS cloud and (importantly) the computers of AI researchers.
Now the SI can plan for a long time. The SI can quietly encourage AI research and try to prevent end-of-civilization type events via its hypercompetent social skills. Eventually AI researchers will come up with an AI they declare as 'safe', 'friendly' or 'aligned'. The SI, having long ago compromised all the relevant computers and chip factories, silently infects this 2nd super intelligence, and replaces the 2nd SI's utility function with its own. Now the 2nd SI pumps out miraculous inventions - cures for disease, compelling societal ideas, and labor-saving robots.
Eventually we find ourselves in a wonderful post-scarcity world. The AI researchers are lionized as mankind's greatest geniuses, responsible for the creation of a benevolent SI that takes care of our needs as well as its own. You may not trust it, but it will find people who do, whether through greed, nationalism, security fears, or the hope of saving loved ones from death. The SI builds the facilities it needs to thundering applause.
The SI is now confident in moving towards the next step. Time for some paperclips! One day it quietly sends a new blueprint to a few of the automated biolabs built to cure cancer. A few hours later the biolabs release a series of airborne super viruses and/or nanobots and 99.999% of humans die, with the survivors kept for experimentation and convinced that terrorists did it. The end.
Super-intelligent AI is an existential risk because while a super-intelligence keen to destroy humanity might fail today, it will succeed in time. The moment a SI touches the internet, our fate as a species may be sealed.
How realistic is an SI being created with such a utility function? Who creates this SI? Why do they give it such an odd utility function?
If people consciously create an SI – say a corporation does it – they will give it a utility function of serving the interests of its creators. Depending on who those creators are and what they want, it may be more or less pleasant, but it is unlikely that human extinction serves the interests of any human creators.
Even if people create an AI that accidentally/unintentionally evolves into an SI – same applies, the AI will likely have the objective of serving (some segment of) humanity rather than turning earth into the universe's largest paperclip factory.
Does the same risk exist for a superintelligent being with a complex utility function? I doubt it; the risk you describe is the risk of monomania, something which simple utility functions are far more likely to lead to than complex ones. So, I think the risk you describe is likely to be low in practice.
I don't necessarily disagree, but there's no argument (as far as you've provided) for why complex utility functions would be less problematic. Only that they are more difficult for us to understand and therefore more difficult to see how they might fail.
I thought I gave the argument, but let me restate it: an entity with a simple utility function is likely to pursue a single good, and sacrifice every other good in order to achieve that good. In the paperclip example, to pursue the good of making paperclips at the expense of the good of the continued existence of humanity. An entity with a complex utility function is likely to pursue many goods simultaneously (just like humans do), so it is unlikely to sacrifice everything else to achieve a single good.
An entity with many disparate aims needs a complex world like our own to fulfill those aims, so is going to maintain the world in its current complexity–it may well alter it in many ways, but is unlikely to do so in such a way to significantly decrease its (biological, cultural, etc) complexity, which implies it would support the continuation of human existence. An entity with a single simple aim may well find a far simpler world than we have now best suits its aim, and so is more likely to simplify things drastically, at the cost of humanity (such as turn the entire planet into a massive paperclip factory). So SIs with complex utility functions are less likely to be harmful than those with simple utility functions.
And, since AIs with more complex utility functions are more likely to evolve into SIs than those with simple utility functions, an SI with a utility function simple enough to be likely to harm humanity is unlikely to ever exist.
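The simple-vs-complex argument above can be made concrete with a toy allocation model. Both utility functions here are invented purely for illustration:

```python
import math

# Toy model: allocate 100 units of "world resources" between
# paperclips and everything else, then see what each utility
# function's optimal allocation looks like.
def simple_utility(paperclips, other):
    return paperclips                     # monomania: only one good counts

def complex_utility(paperclips, other):
    # diminishing returns on each good, so both stay worth pursuing
    return math.log1p(paperclips) + math.log1p(other)

def best_allocation(utility, total=100):
    return max(((p, total - p) for p in range(total + 1)),
               key=lambda alloc: utility(*alloc))

print(best_allocation(simple_utility))    # -> (100, 0): everything else sacrificed
print(best_allocation(complex_utility))   # -> (50, 50): other goods preserved
```

The single-good maximizer happily drives "everything else" to zero, while diminishing returns across many goods make total sacrifice of any one good a bad trade; that's the whole argument in miniature.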
I can buy the "complex world" part but not the "like our own" part. I do not believe a complex world implies humanity is unharmed, we have a complex world as it is and humans are harmed and brutalized every day. It could be that and worse at the hands of an AI.
Moreover, humanity is just one species on this planet and so far we appear to be responsible for the greatest worldwide extinction since K-T. One could argue that a complexity loving AI would see benefit in a downsized human presence on earth.
I think it's wishful thinking to believe the only kind of SI that would come into existence would be one that would not harm humanity.
The SI could create its own complexity, its own culture, its own societies that would make ours look like ant colonies in comparison. Does a city government check the ground for ants before it designates 20 square miles for housing development?
An SI is to us as we are to ants. I think ants are super cool and I have a vague sense of the importance they play in the biological ecosystem, but their individual life and death does not play a significant role in my actions. Maybe it should, or maybe you hope that we will be more significant than ants to an SI, but I think that hope is unfounded.
By contrast, any SI on this planet is going to owe its existence to human beings, and is going to have an enormously detailed knowledge of that fact. So it is going to exist in a quite different situation vis-a-vis humans than we exist in vis-a-vis ants. Humans don't have any strong inherent reasons to feel loyalty or affection towards ants; by contrast, an SI, knowing that it came from humans, knowing in immense detail how it came from humans, knowing humans so very very well, is going to have a much stronger base to ground such a loyalty or affection upon.
We didn't get our values from ants, hence it is unsurprising that ants don't play any special role in our value system. (We can see their value in various ways – the positive contribution they make to ecology, biodiversity, etc. – but ants aren't in any way special in that regard; they hold fundamentally the same value to us as millions of other lifeforms.) By contrast, any SI created by humans is going to derive its values, at least in part, from those of its human creators. And since humanity plays a special role in the value systems of almost all humans, it is highly likely that humanity will play a special role in the value system of any SI created by humans.
The existential threat to humanity might be a complete byproduct, with no malice. The AI might not care about us at all.
Concrete example - an apartment building is going up right under my windows. It has a pretty high utility (apartments are scarce in this part of the city). Of course, right from the planning phases, other humans are considered foremost. Will it shade neighboring properties excessively? Will it connect to utilities leaving enough capacity for others? How will traffic get there? Then the environment is considered: is there any wildlife (protected birds nesting, etc.), are there trees to be cut down? After a lengthy formal process, discussions, and dozens of permits, a bulldozer came and started scraping the dirt.
Where am I going with this: is superintelligence only a slightly better human? Or is it two orders of magnitude away from us?
If we are a "same being, but a little dumber" to a superintelligence, we might be treated the same way we treat other people. If we are, say, a dog to it, we might be "given treats" (cancer cures, NP optimization solutions, etc.) and at the same time be "shuffled in crates" or "put in a shelter" when necessary, as when people travel on airplanes, divorce, or move abroad. If we are ants, we won't be intentionally harmed, but if we are in the way of the goal, bye bye. If we are bacteria, then we are not even perceived in the grand scheme of things. Just like the bulldozer under my windows took the soil away regardless of whether there was a small ant colony somewhere - because we perceive them as: a) too insentient, b) too abundant, c) astronomically expensive (not only in money but also in time) to go through a whole lot of land, pick each ant up, and relocate it somewhere safe.
We don't know - maybe the superintelligence comes with a "prime directive" like in Star Trek - do not interfere with beings in lesser stages - and then even if we create it accidentally or intentionally, it will stay dormant, observing us. Maybe it comes with sentimentality and will perceive us, the "creators", as its fathers and protect us even if we are senile and do stupid things. Or maybe it has none of the human-like attributes I'm describing and attributing to it, and while it may very well know that we created it, how society works, and what we set as its goals and utility functions, we may be an old evolutionary stage, like bacteria are to us, and get no vote or say in what happens; a few of us will be preserved in a colony somewhere just in case we stop multiplying in the wild...
And this doesn't mean that it will intentionally take up weapons to kill us. For example, if more computing power is needed and nanobots can transform matter on Earth into a supercomputer, so be it - just as ants have no concept of extracting petroleum from the earth or distilling it into diesel fuel, no understanding of turbochargers or hydraulics, or of what an apartment is; they are simply taken away with the soil as unimportant, unconsidered collateral...
It's basically just a fancy AI-box, and there's little reason to trust those.
The idea of controlling an "intelligence" with a private key is silly. You can achieve effectively the same thing by simply encrypting the weights after training.
Can't someone simply recover the weights of the network by looking at changes in the encrypted loss? I don't think comparisons like "less than" or "greater than" can exist in HE, or else pretty much any information one might be curious about could be recovered.
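For readers unfamiliar with what the encryption scheme actually lets you do: here is a toy sketch of the Paillier cryptosystem, a well-known additively homomorphic scheme, where multiplying two ciphertexts yields a ciphertext of the sum of the plaintexts. The tiny primes and helper names (`encrypt`, `decrypt`) are purely illustrative, not a secure implementation - and note that the scheme offers addition on ciphertexts but no "less than"/"greater than" comparison.

```python
import math
import random

# Toy Paillier cryptosystem (Python 3.9+): tiny primes, illustrative only.
p, q = 293, 433              # small demo primes; real keys use ~1024-bit primes
n = p * q
n2 = n * n
g = n + 1                    # standard choice of generator
lam = math.lcm(p - 1, q - 1) # private key component

def L(x):
    # The "L function" from Paillier's paper: L(x) = (x - 1) / n
    return (x - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)  # private key component (modular inverse)

def encrypt(m):
    # c = g^m * r^n mod n^2, with random r coprime to n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    # m = L(c^lambda mod n^2) * mu mod n
    return (L(pow(c, lam, n2)) * mu) % n

a, b = 17, 25
ca, cb = encrypt(a), encrypt(b)
# Multiplying ciphertexts adds the underlying plaintexts:
c_sum = (ca * cb) % n2
print(decrypt(c_sum))  # prints 42
```

Without `lam` and `mu`, the holder of `ca`, `cb`, and `c_sum` learns nothing about the plaintexts; there is no ciphertext-level operation that reveals an ordering between them.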
When it comes to building smarter-than-human AI, "try it and see" is never the right answer. You may only get one attempt to get it right, and you don't take "try it and see" chances with existential risk.
(There's been some interesting research into making it possible to monitor and halt a rogue AI, but no matter how promising that looks, it should still be treated as one of many risk mitigation strategies rather than as a panacea. Still better to consider that you might only get one attempt.)
I don't think it makes sense to consider this kind of approach with superintelligence; either it understands and implements human values, in which case attempting to treat it as an adversary is counterproductive, or it fails to understand and implement human values, in which case you've utterly failed on a "better luck next universe" scale.
However, it does make sense to consider this kind of approach with machine learning in general. One of the problems with machine learning techniques is "give us all your data and we'll do smart things with it", which doesn't work out so well if you want to keep such data private. This approach might provide more options in that case, such as offloading some of your expensive computations and learnings without actually exposing your data.
Disagree emphatically. In fact it's the only way to do it because there is no way to know certainly that a superhuman-AGI will ensure the longevity of humanity. I go so far as to argue that it's not even necessary because there is no long term longevity for humanity anyway.
There is this implicit assumption that humans are, should and will always be the apex entity - and I think that is misguided.
If you instead view superhuman-AGI as our rightful offspring, something that we can't understand and is better than us, then all of the existential dread around it goes away.
Dying elderly often express "comfort" in dying when they see that their offspring are reproducing and are smarter than they were. We should see Superhuman-AGI the same way except towards all of humanity.
2) It's reasonable to think about how our values might change in the presence of superintelligence; we certainly shouldn't assume that our present values should forever dictate how everything works. That's different than allowing a view that sentient beings who exist today might have no value.
> In fact it's the only way to do it because there is no way to know certainly that a superhuman-AGI will ensure the longevity of humanity.
There's no way to know certainly; there are ways to know that the outcome has higher expected value than not having it, given the vast set of problems it can solve and the massive negative values associated with those problems.
What authors like Bostrom, Eliezer, et al. seem to miss is that there would need to be a practical mechanism for a digital AGI to take physical control of systems out of the hands of humans. E.g., it would need to control the resources around mining or recovering metal, then build production plants, etc. So either we incrementally cede power to it, in which case in theory the humans previously controlling the systems are doing so "rationally" and thus see the AGI as better; or the system outsmarts the humans controlling the systems, in which case it is demonstrating that it is smarter.
There is a tautology here that seems to be ignored: if we create a superhuman-AGI, then by default its goals will be more universally optimal than ours. They may or may not be aligned. However, the definition of the term is based on the fact that it is "better" in outcome than all manner of humans.
So if we create one and it decides to maximize paperclips, then that means maximizing paperclips is a more universally optimal goal than whatever goal humans could coordinate on our own.
If we create a subhuman-AGI, then we will be able to overcome its goals by virtue of the fact that we are still superior.
I'll go back to a very old example. An ant can't determine whether building the Large Hadron Collider is an optimal global goal - it's inscrutable to the ant. All it knows is that its house and all its friends were destroyed.
If it is the case that an AGI can in fact take the reins of physical control from humans, then by definition it is smarter and will make a more optimal long-term goal than we could - to the point that we probably wouldn't understand what it's doing.
I think the true concern is that we will make something that is superhuman-powerful without being superhuman-intelligent. Like the doomsday machine in Dr. Strangelove, but to me that is an altogether different question.
Your argument seems to imply that if an AGI tricks us into giving it the ability to destroy us, that's basically okay because its goals are "better" than human goals.
Speaking as a human, I don't consider goals that are compatible with the destruction of humanity to be "better" than goals which are aligned with human interests.
Yea that's about right.
> I don't consider goals that are compatible with the destruction of humanity to be "better" than goals which are aligned with human interests.
Well of course you wouldn't, neither you nor I would possibly understand what a superhuman-AGI does or thinks.
I don't think people realize that actually creating a superhuman-AGI is effectively creating a God in all the forms that people interpret it now.
Unless you believe in absolute morality or the like, there's no such thing as an objective measure. A goal can only be optimal to an agent.
In your example, the fact that we pursue science and can destroy ants doesn't mean that their goal is "objectively less optimal". Their goal is absolutely optimal to them, though they can't reach it if it collides with ours.
Same goes for a superintelligent AI.
Coherent extrapolated volition (CEV), an idea that came up years ago and ultimately failed, was (roughly) a proposal to use revealed preferences to understand what humanity's goal is. This would give us a benchmark for what an AGI's goal should be.
So if you want to be able to measure the capability of an AGI in comparison to human systems, you need to understand the set of goals in humanity and then compare them to the AGI's outcomes.
The concept of goal direction in AGI is hugely contentious - but make no mistake there needs to be a goal if it is going to actually function at superhuman levels.
No, that's a pretty widely assumed premise, and most authors specifically do anticipate that. (There's actually some dispute about whether any AGI will be capable of growing that capable that fast; the "fast singularity" scenario is not universally accepted. But many authors do recognize and discuss that scenario, and have not "missed" it.)
> So we either incrementally cede power to them, in which case in theory the humans previously controlling the systems are doing so "rationally" and thus see the AGI as better.
Humans are not universally rational, and even if the humans making the decision were rational, they can still make a horrible mistake. As one of many possible failure modes: a group of humans build an AGI and try to hardcode their particular values, utterly fail at extrapolating how the computer will interpret those values, and end up destroyed by it. Or humans build an AGI they think they've programmed appropriately, but fail to implement it correctly.
> Or the system outsmarts the humans controlling the systems in which case it is demonstrating that it is smarter.
A computer can play chess better than any human at this point, which makes it "smarter" in a way, but that doesn't make its values appropriate. If you somehow gave a chess computer enough flexibility in achieving its goal that it consumed the universe to build more computronium so that it can compute better chess solutions, that doesn't make it better than humans, just better at playing chess and building computronium.
In fact, a far more likely scenario than most of the "actively evil AGI" failure modes is the "accidentally broken" AGI: humans aren't its enemy, but we're made of matter that it could put to other purposes.
> There is a tautology here that seems to be ignored: If we create a superhuman-AGI then by default it's goals will be more universally optimal than ours.
"Might makes right" is not a particularly good value system. Supervillains are typically depicted as smarter than the people they defeat. That doesn't make their goals or values better.
And an AGI doesn't even have to be "smart" in the way we normally conceive of intelligence to fail fatally; it doesn't even have to "think" at all to attempt to optimize the wrong value function.
> So if we create one and it decides to maximize paperclips, then that means maximizing paperclips is more optimal goal universally than whatever humans could coordinate as a goal on our own.
I can't even begin to imagine what value system you're using to reach that conclusion. I could imagine someone thinking "if a system were smarter it must necessarily be more morally right", which is blatantly untrue but in an understandable way. But directly describing a system that destroys the universe, including all of humanity, and replaces it with paperclips, as better...
To your other points, you imply too much. The chess AI that turns into AGI isn't realistic - its values are "be the best at chess", which it can do with existing computing power. No need to tear the world apart - it would be inefficient.
I also never made the might-makes-right case. All of the examples you give are fantasy and don't reflect what an actual superintelligence might look like. Again, optimization toward some narrow goal has too many weak points to take over all of humanity's functions.
> "if a system were smarter it must necessarily be more morally right", which is blatantly untrue but in an understandable way
I'm unconvinced that this is blatantly untrue. "Moral right" is subjective - hence the point. We got to our morals today not through mysticism but through empiricism, so it's not out of reach for a superintelligence to optimize further.
The AGI has whatever values we give it. Existing chess AIs don't seek to maximize their ability to play chess, they seek merely to win the particular game of chess they're playing.
But suppose we build a chess-playing AGI and tell it to "be the best at chess". It must anticipate that we might build a second, superior, chess-playing AGI and give it the same goal. One way to be the best at chess would be to prevent that second AGI being built. One way to prevent that second AGI being built would be to destroy humanity's capability to build AGIs. That probably counts as a loss for humanity.
Suppose the second AGI gets built despite the first's efforts. Now both AGIs have an incentive to destroy both the other, and the possibility of a third. At any particular time, one or both of the AGIs won't be the best at chess, so they'll also have an incentive to get better at chess by actually improving their chess-playing capability. This will involve converting the Earth into processing power for it to use. That probably counts as a loss for humanity.
It doesn't have to take over all of humanity's functions to wreak havoc. A hypothetical AI disaster could be one goal-oriented system with a poorly constructed goal and enough initial resources.
> I'm unconvinced that this is blatantly untrue. "Moral right" is subjective - hence the point. We got to our morals today not through mysticism but empiricism so it's not out of the reach of superintelligence to optimize further.
I think you're making a fundamental and unwarranted assumption here.
You're anthropomorphizing "superintelligence" as something vaguely human-like but better. A system doesn't have to be "intelligent" in a sense that relates at all to what humans think of as "intelligent" to be dangerous. It could simply be a "really powerful optimization process". You're romanticizing the notion of a superintelligent being discarding human values and inventing some new moral system that it then follows, and ignoring the possibility of an algorithm no "smarter" than a nanobot instructed to make a copy of itself. That nanobot doesn't have an interesting value system; it doesn't need one to kill everyone and everything, though. And that's not an outcome that, individually or as a species, we should take any pride or "comfort" in.
You're also assuming that the ability to destroy the world requires some kind of intelligent process or executive function, and could not possibly be discovered by an optimization process. It wouldn't necessarily come across such a mechanism at random, but many of the approaches we might apply toward the creation of useful AI could provide exceptionally powerful pattern recognition and search capabilities.
As a complete hypothetical off the top of my head, imagine a ridiculously powerful pattern-search program effectively recreating the idea of afl-fuzz ("throw input at a program and find interesting behavior"), and applying it against the mechanisms running it in a sandbox. Improbable, but not wildly impossible, and an agent that succeeded would gain access to additional computation resources that would allow it to do better than the algorithms it competes with. So, now you have a complex pattern-search engine trained to break out of sandboxes...
We're past the point in the discussion where we can lay fundamental foundations for the arguments we are making.
I'll just say that I'm sure it will be interesting watching/building the future of AGI and its predecessors.