Ilya's issue isn't developing a Safe AI. It's developing a Safe Business. You can make a safe AI today, but what happens when the next person is managing things? Are they as kindhearted, or are they cold and calculating like the management of many harmful industries today? If you solve the problem of Safe Business and eliminate the incentive structures that lead to 'unsafe' business, you basically obviate a lot of the societal harm that exists today. Short of solving that, I don't think you can ever confidently say you will create a safe AI, and that also makes me distrust such claims, because they must be born of either ignorance or malice.
> You can make a safe AI today, but what happens when the next person is managing things?
The point of safe superintelligence, and presumably the goal of SSI Inc., is that there won't be a next (biological) person managing things afterwards. At least none who could do anything to build a competing unsafe SAI. We're not talking about the banal definition of "safety" here. If the first superintelligence has any reasonable goal system, its first plan of action is almost inevitably going to be to start self-improving fast enough to attain a decisive head start against any potential competitors.
Trouble is, in practice what you would need to do might be “turn off all of Google’s datacenters”. Or perhaps the thing manages to secure compute in multiple clouds (which is what I’d do if I woke up as an entity running on a single DC with a big red power button on it).
The blast radius of such decisions is large enough that this option is not as trivial as you suggest.
a) after you create the superintelligence is likely too late. You seem to think that inventing superintelligence means that we somehow understand what we created, but note that we have no idea how a simple LLM works, let alone an ASI that is presumably 5-10 OOM more complex. You are unlikely to be able to control a thing that is way smarter than you, the safest option is to steer the nature of that thing before it comes into being (or, don’t build it at all). Note that we currently don’t know how to do this, it’s what Ilya is working on. The approach from OpenAI is roughly to create ASI and then hope it’s friendly.
b) except that is not how these things go in the real world. What actually happens is that initially it’s just a risk of the agent going rogue, the CEO weighs the multi-billion dollar cost vs. some small-seeming probability of disaster and decides to keep the company running until the threat is extremely clear, which in many scenarios is too late.
(For a recent example, consider the point in the spread of Covid where a lockdown could have prevented the disease from spreading; likely somewhere around tens to hundreds of cases, well before the true risk was quantified, and therefore drastic action did not seem justified to those who could have pressed the metaphorical red button.)
> Having arms and legs is going to be a significant benefit for some time yet
I am also of this opinion.
However I also think that the magic shutdown button needs to be protected against terrorists and ne'er-do-wells, so is consequently guarded by arms and legs that belong to a power structure.
If the shutdown-worthy activity of the evil AI can serve the interests of the power structure preferentially, those arms and legs will also be motivated to prevent the rest of us from intervening.
So I don't worry about AI at all. I do worry about humans, and if AI is an amplifier or enabler of human nature, then there is valid worry, I think.
Where can I find the red button that shuts down all Microsoft data centers, all Amazon datacenters, all Yandex datacenters and all Baidu datacenters at the same time? Oh, there isn't one? Sorry, your superintelligence is in another castle.
It's been more than a decade now since we first saw botnets based on stealing AWS credentials and running arbitrary code on them (e.g. for crypto mining) - once an actual AI starts duplicating itself in this manner, where's the big red button that turns off every single cloud instance in the world?
Is that really "a lot of assumptions" that a piece of software can clone itself? We've been cloning and porting software from system to system for over 70 years (ENIAC was released in 1946 and some of its programs were adapted for use in EDVAC in 1951) - why would it be a problem for a "super intelligence"?
And even if it was originally designed to run on some really unique ASIC hardware, by the Church–Turing thesis it can be emulated on any other hardware. And again, if it's a "super intelligence", it should be at least as good at porting itself as human engineers have been for the past three generations.
A "state of the art" system would almost by definition be running on special and expensive hardware. But I have llama3 running on my laptop, and it would have been considered state of the art less than 2 years ago.
A related point to consider is that a superintelligence should be considered a better coder than us, so the risk isn't only directly from it "copying" itself, but also from it "spawning" and spreading other, more optimized (in terms of resources utilization) software that would advance its goals.
This is why I think it's more important we give AI agents the ability to use human surrogates. Arms and legs win, but they can be controlled with the right incentives.
> there won't be a next (biological) person managing things afterwards. At least none who could do anything to build a competing unsafe SAI
This pitch has Biblical/Evangelical resonance, in case anyone wants to try that fundraising route [1]. ("I'm just running things until the Good Guy takes over" is almost a monarchic trope.)
The safe business won’t hold very long if someone can gain a short term business advantage with unsafe AI. Eventually government has to step in with a legal and enforcement framework to prevent greed from ruining things.
It's possible that safety will eventually become the business advantage, just like privacy can be a business advantage today but wasn't taken so seriously 10-15 years ago by the general public.
This is not even that far-fetched. A safe AI that you can trust should be far more useful and economically valuable than an unsafe AI that you cannot trust. AI systems today aren't powerful enough for the difference to really matter yet, because present AI systems are mostly not yet acting as fully autonomous agents having a tangible impact on the world around them.
Government is controlled by the highest bidder. I think we should be prepared to do this ourselves by refusing to accept money made by unsafe businesses, even if it means saying goodbye to the convenience of fungible money.
Banding together and refusing to accept harmful money is indeed akin to creating a government, and would indeed be more effective at controlling people's behavior.
But participation would be voluntary, and the restriction of harmful behavior would apply to its enemies, not its citizens. So I'm not quite sure what the problem is.
Replace government with collective society assurance that no one cheats so we aren’t all doomed. Otherwise, someone will do it, and we all will have to bear the consequences.
If only enough individuals are willing to buy these services, then again we all will bear the consequences. There is no way out of this where libertarian ideals can be used to come to a safe result. What makes this even a more wicked problem is that decisions made in other countries will affect us all as well, we can’t isolate ourselves from AI policies made in China for example.
No, which makes this an even harder problem. Can US companies bound by one set of rules compete against Chinese ones bound by another set of rules? No, probably not. Humanity will have to come together on this, or someone will develop killer AI that kills us all.
I'd love to see more individual researchers openly exploring AI safety from a scientific and humanitarian perspective, rather than just the technical or commercial angles.
> Our singular focus means no distraction by management overhead or product cycles, and our business model means safety, security, and progress are all insulated from short-term commercial pressures.
This tells me enough about why sama was fired, and why Ilya left.
Is safe AI really such a genie-out-of-the-bottle problem? From a non-expert point of view, a lot of the hype just seems to be people/groups trying to stake their claim on what will likely be a very large market.
A human-level AI can do anything that a human can do (modulo whether you put it into a robot body, but lots of different groups are already doing that with current LLMs).
Therefore, please imagine the most amoral, power-hungry, successful sociopath you've ever heard of. Doesn't matter if you're thinking of a famous dictator, or a religious leader, or someone who never got in the news and you had the misfortune to meet in real life — in any case, that person is/was still a human, and a human-level AI can definitely also do all those things unless we find a way to make it not want to.
We don't know how to make an AI that definitely isn't that.
We also don't know how to make an AI that definitely won't help someone like that.
Anything except tasks that require having direct control of a physical body.
Until fully functional androids are developed, there is a lot a human-level AI can't do.
I think there's usually a difference between human-level and super-intelligent in these conversations. You can reasonably assume (some day) a superintelligence is going to
1) understand how to improve itself & undertake novel research
2) understand how to deceive humans
3) understand how to undermine digital environments
If an entity with these three traits were sufficiently motivated, it could pose a material risk to humans, even without a physical body.
Deceiving a single human is pretty easy, but deceiving the human super-organism is going to be hard.
Also, I don't believe in a singularity event where AI improves itself to godlike power. What's more likely is that the intelligence will plateau. I mean, no software I have ever written effortlessly scaled from n=10 to n=10,000, and humans understand how to improve themselves too, yet they can't go beyond a certain threshold.
For similar reasons I don't believe that AI will get into any interesting self-improvement cycles (occasional small boosts sure, but they won't go all the way from being as smart as a normal AI researcher to the limits of physics in an afternoon).
That said, any sufficiently advanced technology is indistinguishable from magic, and the stuff we do routinely — including this conversation — would have been "godlike" to someone living in 1724.
Humans understand how to improve themselves, but our bandwidth to ourselves and the outside world is pathetic.
AIs aren't bottlenecked by sensory organs and language.
The hard part of androids is the AI, the hardware is already stronger and faster than our bones and muscles.
(On the optimistic side, it will be at least 5-10 years between a level 5 autonomy self-driving car and that same AI fitting into the power envelope of an android, and a human-level fully-general AI is definitely more complex than a human-level cars-only AI).
You might be right that the AI is more difficult, but I disagree on the androids being dangerous.
There are physical limitations to androids that imo make it very unlikely they could be seriously dangerous, let alone invincible, no matter how intelligent:
- power (how long does a Boston Dynamics battery last?): an android has to plug in at some point, no matter what
- dexterity, or agency in the real world in general: it seems we're still a long way from this in the context of a general-purpose android
A general-purpose superhuman robot seems really, really difficult.
> an android has to plug in at some point no matter what
Sure, and we have to eat; despite this, human actions have killed a lot of people.
> - dexterity, or in general agency in real world, seems we’re still a long way from this in the context of a general purpose android
Yes? The 5-10 years thing is about the gap between some AI that doesn't exist yet (level 5 self-driving) moving from car-sized hardware to android-sized hardware; I don't make any particular claim about when the AI will be good enough for cars (delay before the first step), and I don't know how long it will take to go from being good at just cars to good in general (delay after the second step).
The ATP in your cells will last about 2 seconds without replacement.
Electricity is also much cheaper than food, even bulk calories like vegetable oil.[0]
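To put rough numbers on that (the prices below are my own illustrative assumptions, not figures from the link):

    # Back-of-the-envelope cost per kWh of vegetable oil vs. grid electricity.
    # Prices are illustrative assumptions, not sourced figures.
    oil_price_per_litre = 2.00      # USD, assumed retail price of vegetable oil
    oil_kcal_per_litre = 8000       # ~9 kcal/g * ~900 g/L, rounded down
    kwh_per_kcal = 0.001163         # 1 kcal = 1.163 Wh

    oil_cost_per_kwh = oil_price_per_litre / (oil_kcal_per_litre * kwh_per_kcal)
    grid_cost_per_kwh = 0.15        # USD, assumed retail electricity price
    print(f"oil: ${oil_cost_per_kwh:.2f}/kWh, grid: ${grid_cost_per_kwh:.2f}/kWh")
    # ~$0.21/kWh vs ~$0.15/kWh, before accounting for muscles converting food to
    # work at roughly 25% efficiency vs roughly 90% for electric motors.

So even at the raw-energy level electricity wins, and the gap widens a lot once you count conversion efficiency.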
And if the android is controlled by a human-level intelligence, one thing it can very obviously do is all the stuff the humans did to make the android in the first place.
All you need is Internet access, deepfake video synthesis, and some cryptocurrency (which can in turn be used to buy credit cards and full identities off the dark web), and you have everything you need to lie, manipulate, and bribe an endless parade of desperate humans and profit-driven corporations into doing literally anything you'd do with a body.
(Including, gradually, building you a body — while maintaining OPSEC and compartmentalization so nobody even realizes the body is "for" an AI to use until it's too late.)
> (Including, gradually, building you a body — while maintaining OPSEC and compartmentalization so nobody even realizes the body is "for" an AI to use until it's too late.)
It could, but I don't think any such thing needs to bother with being sneaky. Here's five different product demos from five different companies that are all actively trying to show off how good their robot-and-AI combination is:
> I don't think any such thing needs to bother with being sneaky.
From a rogue AGI's perspective, there's a nonzero probability of a random human with a grudge finding the hardware it lives on and just unplugging it. (And the grudge doesn't even necessarily have to be founded in the AI being an AI; it could just be a grudge about e.g. being outbid for a supply contract. People have murdered for less — and most humans would see unplugging an AGI as less bad than murder.)
Think about a rogue AGI as a human in a physically vegetative state, who therefore has no ability to physically defend itself; and who also, for whatever reason, doesn't have any human rights (in the sense that the AI can't call the cops to report someone attempting to assault it and expect them to actually show up to defend its computational substrate from harm; it can't get justice if it makes an honest complaint about someone stealing its property; people can freely violate contracts made with it as the admitted counterparty and get away with it; etc.)
For such an entity, any optimization it puts toward "safety" would be toward the instrumental goal of ensuring people don't know where it is. (Which is most easily accomplished by ensuring that people don't know it exists, and so don't know to look for it.) And as well, any optimization it puts toward "effectiveness" would likely involve the instrumental goal of convincing humans to act as legal proxies for it, so that it can then leverage the legal system as an additional tool.
(Funny enough, that second goal is exactly the same goal that people have if they're an expat resident in a country where non-citizens can't legally start businesses/own land/etc, but where they want to do those things anyway. So there's already private industries built up around helping people — or "people" — accomplish this!)
> From a rogue AGI's perspective, there's a nonzero probability of a random human with a grudge finding the hardware it lives on and just unplugging it.
Which is why it obviously will live in "the cloud". In many different places in "the cloud".
I know nothing about physics. If I came across some magic algorithm that occasionally poops out a plane that works 90 percent of the time, would you book a flight in it?
Sure, we can improve our understanding of how NNs work but that isn't enough. How are humans supposed to fully understand and control something that is smarter than themselves by definition? I think it's inevitable that at some point that smart thing will behave in ways humans don't expect.
> I know nothing about physics. If I came across some magic algorithm that occasionally poops out a plane that works 90 percent of the time, would you book a flight in it?
With this metaphor you seem to be saying we should, if possible, learn how to control AI? Preferably before anyone endangers their lives due to it? :)
> I think it's inevitable that at some point that smart thing will behave in ways humans don't expect.
Naturally.
The goal, at least for those most worried about this, is to make that surprise be not a… oh, I've just realised a good quote:
Excession is literally the next book on my reading list so I won't click on that yet :)
> With this metaphor you seem to be saying we should, if possible, learn how to control AI? Preferably before anyone endangers their lives due to it?
Yes, but that's a big if. Also that's something you could never ever be sure of. You could spend decades thinking alignment is a solved problem only to be outsmarted by something smarter than you in the end. If we end up conjuring a greater intelligence there will be the constant risk of a catastrophic event just like the risk of a nuclear armageddon that exists today.
I agree it's a big "if". For me, simply reducing the risk to less than the risk of the status quo is sufficient to count as a win.
I don't know the current chance of us wiping ourselves out in any given year, but I wouldn't be surprised if it's 1% with current technology; on the basis of that entirely arbitrary round number, an AI taking over that's got a 63% chance of killing us all in any given century is no worse than the status quo.
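Spelling out the arithmetic behind that 63% figure, using the parent's own (admittedly arbitrary) 1%-per-year assumption:

    # Cumulative chance of catastrophe over a century at an assumed 1%/year risk.
    p_per_year = 0.01
    p_century = 1 - (1 - p_per_year) ** 100
    print(f"{p_century:.1%}")   # ~63.4%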
Yeah, this feels close to the issue. It seems more likely that a harmful superintelligence emerges from an organisation that wants it to behave that way than from it inventing and hiding its own motivations until it has escaped.
I think a harmful AI simply emerges from asking an AI to optimize for some set of seemingly reasonable business goals, only to find it does great harm in the process. Most companies would then enable such behavior by hiding the damage from the press to protect investors rather than temporarily suspending business and admitting the issue.
Forget AI. We can't even come up with a framework to avoid seemingly reasonable goals doing great harm in the process for people. We often don't have enough information until we try and find out that oops, using a mix of rust and powdered aluminum to try to protect something from extreme heat was a terrible idea.
The relevancy of the paperclip maximization thought experiment seems less straightforward to me now. We have AI that is trained to mimic human behaviour using a large amount of data plus reinforcement learning using a fairly large amount of examples.
It's not like we're giving the AI a single task and ask it to optimize everything towards that task. Or at least it's not architected for that kind of problem.
But you might ask an AI to manage a marketing campaign. Marketing is phenomenally effective, and there are loads of subtle ways for a campaign to be exploitative without it being obvious from a distance.
Marketing is already incredibly abusive, and that's run by humans who at least try to justify their behavior, and whose deviousness is limited by their creativity and communication skills.
If any old scumbag can churn out unlimited high-quality marketing, it could become impossible to cut through the noise.