
Ilya's issue isn't developing a Safe AI. It's developing a Safe Business. You can make a safe AI today, but what happens when the next person is managing things? Are they so kindhearted, or are they cold and calculating like the management of many harmful industries today? If you solve the issue of Safe Business and eliminate the incentive structures that lead to 'unsafe' business, you basically obviate a lot of the societal harm that exists today. Short of solving this issue, I don't think you can ever confidently say you will create a safe AI, and that also makes me not trust your claims, because they must be borne of either ignorance or malice.



> You can make a safe AI today, but what happens when the next person is managing things?

The point of safe superintelligence, and presumably the goal of SSI Inc., is that there won't be a next (biological) person managing things afterwards. At least none who could do anything to build a competing unsafe SAI. We're not talking about the banal definition of "safety" here. If the first superintelligence has any reasonable goal system, its first plan of action is almost inevitably going to be to start self-improving fast enough to attain a decisive head start against any potential competitors.


I wonder how many people panicking about these things have ever visited a data centre.

They have big red buttons at the end of every pod. Shuts everything down.

They have bigger red buttons at the end of every power unit. Shuts everything down.

And down at the city, there’s a big red button at the biggest power unit. Shuts everything down.

Having arms and legs is going to be a significant benefit for some time yet. I am not in the least concerned about becoming a paperclip.


Trouble is, in practice what you would need to do might be “turn off all of Google’s datacenters”. Or perhaps the thing manages to secure compute in multiple clouds (which is what I’d do if I woke up as an entity running on a single DC with a big red power button on it).

The blast radius of such decisions is large enough that this option is not as trivial as you suggest.


Right, but when we’ve literally invented a superintelligence that we’re worried will eliminate the human race,

a) we’ve done that, which is cool. Now let’s figure out how to control it.

b) you can’t get your gmail for a bit while we reboot the DC. That’s probably okay.


a) after you create the superintelligence is likely too late. You seem to think that inventing superintelligence means that we somehow understand what we created, but note that we have no idea how a simple LLM works, let alone an ASI that is presumably 5-10 OOM more complex. You are unlikely to be able to control a thing that is way smarter than you, the safest option is to steer the nature of that thing before it comes into being (or, don’t build it at all). Note that we currently don’t know how to do this, it’s what Ilya is working on. The approach from OpenAI is roughly to create ASI and then hope it’s friendly.

b) except that is not how these things go in the real world. What actually happens is that initially it’s just a risk of the agent going rogue, the CEO weighs the multi-billion dollar cost vs. some small-seeming probability of disaster and decides to keep the company running until the threat is extremely clear, which in many scenarios is too late.

(For a recent example, consider the point in the spread of Covid where a lockdown could have prevented the disease from spreading; likely somewhere around tens to hundreds of cases, well before the true risk was quantified, and therefore before drastic action seemed justified to those who could have pressed the metaphorical red button.)


Open the data center doors

I’m sorry I can’t do that


> Having arms and legs is going to be a significant benefit for some time yet

I am also of this opinion.

However I also think that the magic shutdown button needs to be protected against terrorists and ne'er-do-wells, so is consequently guarded by arms and legs that belong to a power structure.

If the shutdown-worthy activity of the evil AI can serve the interests of the power structure preferentially, those arms and legs will also be motivated to prevent the rest of us from intervening.

So I don't worry about AI at all. I do worry about humans, and if AI is an amplifier or enabler of human nature, then there is valid worry, I think.


Where can I find the red button that shuts down all Microsoft data centers, all Amazon data centers, all Yandex data centers and all Baidu data centers at the same time? Oh, there isn't one? Sorry, your superintelligence is in another castle.


I doubt a manual alarm switch will do much good when computers operate at the speed of light. It's an anthropomorphism.


It's been more than a decade now since we first saw botnets based on stealing AWS credentials and running arbitrary code on them (e.g. for crypto mining) - once an actual AI starts duplicating itself in this manner, where's the big red button that turns off every single cloud instance in the world?


This is making a lot of assumptions, like that a superintelligence can easily clone itself. Maybe such an entity would require specific hardware to run?


Is that really "a lot of assumptions" that a piece of software can clone itself? We've been cloning and porting software from system to system for over 70 years (ENIAC was released in 1946 and some of its programs were adapted for use in EDVAC in 1951) - why would it be a problem for a "super intelligence"?

And even if it was originally designed to run on some really unique ASIC hardware, by the Church–Turing thesis it can be emulated on any other hardware. And again, if it's a "super intelligence", it should be at least as good at porting itself as human engineers have been for the past three generations.

Am I introducing even one novel assumption here?


My point was that we don't have super intelligent AGI so there is little to suggest it will just be software.

Even the state of the art systems we have today need to be running on some pretty significant hardware to be performant right?


A "state of the art" system would almost by definition be running on special and expensive hardware. But I have llama3 running on my laptop, and it would have been considered state of the art less than 2 years ago.

A related point to consider is that a superintelligence should be considered a better coder than us, so the risk isn't only from it directly "copying" itself, but also from it "spawning" and spreading other, more optimized (in terms of resource utilization) software that would advance its goals.


I guess it's hard to even imagine what a superintelligence would be coding. Who knows what would be going on. It really is sci-fi.


This is why I think it’s more important we give AI agents the ability to use human surrogates. Arms and legs win but can be controlled with the right incentives


Might be running on a botnet of CoPilot PCs


If it’s any sort of smart AI, you’d need to shut down the entire world at the same time.


Have you seen all of the autonomous cars, drones and robots we've built?


> there won't be a next (biological) person managing things afterwards. At least none who could do anything to build a competing unsafe SAI

This pitch has Biblical/Evangelical resonance, in case anyone wants to try that fundraising route [1]. ("I'm just running things until the Good Guy takes over" is almost a monarchic trope.)

[1] https://biblehub.com/1_corinthians/15-24.htm


The safe business won’t hold very long if someone can gain a short term business advantage with unsafe AI. Eventually government has to step in with a legal and enforcement framework to prevent greed from ruining things.


It's possible that safety will eventually become the business advantage, just like privacy can be a business advantage today but wasn't taken so seriously 10-15 years ago by the general public.

This is not even that far-fetched. A safe AI that you can trust should be far more useful and economically valuable than an unsafe AI that you cannot trust. AI systems today aren't powerful enough for the difference to really matter yet, because present AI systems are mostly not yet acting as fully autonomous agents having a tangible impact on the world around them.


Government is controlled by the highest bidder. I think we should be prepared to do this ourselves by refusing to accept money made by unsafe businesses, even if it means saying goodbye to the convenience of fungible money.


> Government is controlled by the highest bidder.

While this might be true for the governments you have personally experienced, it is far from a universal truth.


"Government doesn't work. We just need to make a new government that is much more effective and far reaching in controlling people's behavior."


Banding together and refusing to accept harmful money is indeed akin to creating a government, and would indeed be more effective at controlling people's behavior.

But participation would be voluntary, and the restriction of harmful behavior would apply to its enemies, not its citizens. So I'm not quite sure what the problem is.


That's not what they said though. Seems to me more of a libertarian ideal than making a new government.


Reinventing government and calling it a private corporation is one of the main activities that libertarians engage in


I don't really care what you call it so long as participation is voluntary and it effectively curtails harmful behavior.


A system of governance, if you will.


Replace "government" with "collective societal assurance that no one cheats, so we aren't all doomed." Otherwise, someone will do it, and we will all have to bear the consequences.

If not enough individuals are willing to buy into such assurance, then again we will all bear the consequences. There is no way out of this where libertarian ideals lead to a safe result. What makes this an even more wicked problem is that decisions made in other countries will affect us all as well; we can't isolate ourselves from AI policies made in China, for example.


which government?

Will China obey US regulations? Will Russia?


No, which makes this an even harder problem. Can US companies bound by one set of rules compete against Chinese ones bound by another set of rules? No, probably not. Humanity will have to come together on this, or someone will develop killer AI that kills us all.


I'd love to see more individual researchers openly exploring AI safety from a scientific and humanitarian perspective, rather than just the technical or commercial angles.


> Our singular focus means no distraction by management overhead or product cycles, and our business model means safety, security, and progress are all insulated from short-term commercial pressures.

This tells me enough about why sama was fired, and why Ilya left.


>Short of solving this issue

Solving human nature is indeed, hard.


Is safe AI really such a genie out of the bottle problem? From a non expert point of view a lot of hype just seems to be people/groups trying to stake their claim on what will likely be a very large market.


A human-level AI can do anything that a human can do (modulo whether you put it into a robot body, but lots of different groups are already doing that with current LLMs).

Therefore, please imagine the most amoral, power-hungry, successful sociopath you've ever heard of. Doesn't matter if you're thinking of a famous dictator, or a religious leader, or someone who never got in the news and you had the misfortune to meet in real life — in any case, that person is/was still a human, and a human-level AI can definitely also do all those things unless we find a way to make it not want to.

We don't know how to make an AI that definitely isn't that.

We also don't know how to make an AI that definitely won't help someone like that.


> We also don't know how to make an AI that definitely won't help someone like that.

"...offices in Palo Alto and Tel Aviv, where we have deep roots..."

Hopefully, SSI holds its own.


Anything except tasks that require having direct control of a physical body. Until fully functional androids are developed, there is a lot a human-level AI can't do.


I think there's usually a difference between human-level and super-intelligent in these conversations. You can reasonably assume (some day) a superintelligence is going to

1) understand how to improve itself & undertake novel research

2) understand how to deceive humans

3) understand how to undermine digital environments

If an entity with these three traits were sufficiently motivated, they could pose a material risk to humans, even without a physical body.


Deceiving a single human is pretty easy, but deceiving the human super-organism is going to be hard.

Also, I don't believe in a singularity event where AI improves itself to godlike power. What's more likely is that the intelligence will plateau. I mean, no software I have ever written effortlessly scaled from n=10 to n=10,000, and humans understand how to improve themselves too, yet they can't go beyond a certain threshold.


For similar reasons I don't believe that AI will get into any interesting self-improvement cycles (occasional small boosts sure, but they won't go all the way from being as smart as a normal AI researcher to the limits of physics in an afternoon).

That said, any sufficiently advanced technology is indistinguishable from magic, and the stuff we do routinely — including this conversation — would have been "godlike" to someone living in 1724.


Humans understand how to improve themselves, but our bandwidth to ourselves and the outside world is pathetic. AIs are untethered by sensory organs and language.


The hard part of androids is the AI, the hardware is already stronger and faster than our bones and muscles.

(On the optimistic side, it will be at least 5-10 years between a level 5 autonomy self-driving car and that same AI fitting into the power envelope of an android, and a human-level fully-general AI is definitely more complex than a human-level cars-only AI).


You might be right that the AI is more difficult, but I disagree on the androids being dangerous.

There are physical limitations to androids that imo make it very difficult for them to be seriously dangerous, let alone invincible, no matter how intelligent:

- power (how long does a Boston Dynamics battery last?): an android has to plug in at some point no matter what

- dexterity, or agency in the real world in general: we still seem a long way from this in the context of a general-purpose android

General purpose superhuman robot seems really really difficult.


> let alone invincible

!!

I don't want anyone to think I meant that.

> an android has to plug in at some point no matter what

Sure, and we have to eat; despite this, human actions have killed a lot of people

> - dexterity, or in general agency in real world, seems we’re still a long way from this in the context of a general purpose android

Yes? The 5-10 years thing is about the gap between some AI that doesn't exist yet (level 5 self-driving) moving from car-sized hardware to android-sized hardware; I don't make any particular claim about when the AI will be good enough for cars (delay before the first step), and I don't know how long it will take to go from being good at just cars to good in general (delay after the second step).


Like, there aren't computer-controlled industrial robots that are many many times stronger than humans?!? Wow, and here I thought there were.


> the hardware is already stronger and faster than our bones and muscles.

For 30 minutes until the batteries run down, or for 5 years until the parts wear out.


The ATP in your cells will last about 2 seconds without replacement.

Electricity is also much cheaper than food, even bulk calories like vegetable oil.[0]

And if the android is controlled by a human-level intelligence, one thing it can very obviously do is all the stuff the humans did to make the android in the first place.

[0] £8.25 for 333 servings of 518 kJ - https://www.tesco.com/groceries/en-GB/products/272515844

Equivalent to £0.17/kWh - https://www.wolframalpha.com/input?i=£8.25+%2F+%28333+*+518k...

UK average consumer price for electricity, £0.27/kWh - https://www.greenmatch.co.uk/average-electricity-cost-uk
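
(A quick sanity check of that £0.17/kWh figure, using only the numbers in the Tesco listing above; a minimal sketch of the arithmetic in Python:)

    servings = 333
    kj_per_serving = 518                            # energy per serving, from the listing
    price_gbp = 8.25
    kwh_total = servings * kj_per_serving / 3600    # 1 kWh = 3600 kJ, so ~47.9 kWh of food energy
    print(price_gbp / kwh_total)                    # ~0.17, i.e. ~£0.17/kWh vs ~£0.27/kWh for grid electricity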


All you need is Internet access, deepfake video synthesis, and some cryptocurrency (which can in turn be used to buy credit cards and full identities off the dark web), and you have everything you need to lie, manipulate, and bribe an endless parade of desperate humans and profit-driven corporations into doing literally anything you'd do with a body.

(Including, gradually, building you a body — while maintaining OPSEC and compartmentalization so nobody even realizes the body is "for" an AI to use until it's too late.)


> (Including, gradually, building you a body — while maintaining OPSEC and compartmentalization so nobody even realizes the body is "for" an AI to use until it's too late.)

It could, but I don't think any such thing needs to bother with being sneaky. Here's five different product demos from five different companies that are all actively trying to show off how good their robot-and-AI combination is:

* https://www.youtube.com/watch?v=Sq1QZB5baNw

* https://www.youtube.com/watch?v=OtpCyjQDW0w

* https://www.youtube.com/watch?v=XpBWxLg-3bI

* https://www.youtube.com/watch?v=xD7hAbBJst8

* https://www.youtube.com/watch?v=GzX1qOIO1bE


> I don't think any such thing needs to bother with being sneaky.

From a rogue AGI's perspective, there's a nonzero probability of a random human with a grudge finding the hardware it lives on and just unplugging it. (And the grudge doesn't even necessarily have to be founded in the AI being an AI; it could just be a grudge about e.g. being outbid for a supply contract. People have murdered for less — and most humans would see unplugging an AGI as less bad than murder.)

Think about a rogue AGI as a human in a physically vegetative state, who therefore has no ability to physically defend itself; and who also, for whatever reason, doesn't have any human rights (in the sense that the AI can't call the cops to report someone attempting to assault it, and expect them to actually show up to defend its computational substrate from harm; it can't get justice if it makes an honest complaint about someone stealing its property; people can freely violate contracts made with it as the admitted counterparty and get away with it; etc.)

For such an entity, any optimization it puts toward "safety" would be toward the instrumental goal of ensuring people don't know where it is. (Which is most easily accomplished by ensuring that people don't know it exists, and so don't know to look for it.) And as well, any optimization it puts toward "effectiveness" would likely involve the instrumental goal of convincing humans to act as legal proxies for it, so that it can then leverage the legal system as an additional tool.

(Funny enough, that second goal is exactly the same goal that people have if they're an expat resident in a country where non-citizens can't legally start businesses/own land/etc, but where they want to do those things anyway. So there's already private industries built up around helping people — or "people" — accomplish this!)


> From a rogue AGI's perspective, there's a nonzero probability of a random human with a grudge finding the hardware it lives on and just unplugging it.

Which is why it obviously will live in "the cloud". In many different places in "the cloud".

Oh, and:

> (Funny enough, that second goal is exactly the same goal that people have if they're an expat

You misspelled "immigrant".


Human level AI should be able to control an android body to the same extent as a human can. Otherwise it is not AGI.


> power-hungry

That has nothing to do with intelligence.


That's why it's a problem.

An AI can be anywhere on that axis, and we don't really know what we're doing in order to prevent it being as I have described.


Did you read the article? What I gathered from this article is this is precisely what Ilya is attempting to do.

Also we absolutely DO NOT know how to make a safe AI. This should be obvious from all the guides about how to remove the safeguards from ChatGPT.


Fortunately, so far we don't seem to know how to make an AI at all. Unfortunately we also don't know how to define "safe" either.


imagine the hubris and arrogance of trying to control a “superintelligence” when you can’t even control human intelligence


No more so than trying to control a supersonic aircraft when we can't even control pigeons.


I know nothing about physics. If I came across some magic algorithm that occasionally poops out a plane that works 90 percent of the time, would you book a flight in it?

Sure, we can improve our understanding of how NNs work but that isn't enough. How are humans supposed to fully understand and control something that is smarter than themselves by definition? I think it's inevitable that at some point that smart thing will behave in ways humans don't expect.


> I know nothing about physics. If I came across some magic algorithm that occasionally poops out a plane that works 90 percent of the time, would you book a flight in it?

With this metaphor you seem to be saying we should, if possible, learn how to control AI? Preferably before anyone endangers their lives due to it? :)

> I think it's inevitable that at some point that smart thing will behave in ways humans don't expect.

Naturally.

The goal, at least for those most worried about this, is to make that surprise be not a… oh, I've just realised a good quote:

""" the kind of problem "most civilizations would encounter just once, and which they tended to encounter rather in the same way a sentence encountered a full stop." """ - https://en.wikipedia.org/wiki/Excession#Outside_Context_Prob...

Not that.


Excession is literally the next book on my reading list so I won't click on that yet :)

> With this metaphor you seem to be saying we should, if possible, learn how to control AI? Preferably before anyone endangers their lives due to it?

Yes, but that's a big if. Also that's something you could never ever be sure of. You could spend decades thinking alignment is a solved problem only to be outsmarted by something smarter than you in the end. If we end up conjuring a greater intelligence there will be the constant risk of a catastrophic event just like the risk of a nuclear armageddon that exists today.


Enjoy! No spoilers from me :)

I agree it's a big "if". For me, simply reducing the risk to less than the risk of the status quo is sufficient to count as a win.

I don't know the current chance of us wiping ourselves out in any given year, but I wouldn't be surprised if it's 1% with current technology; on the basis of that entirely arbitrary round number, an AI taking over that's got a 63% chance of killing us all in any given century is no worse than the status quo.
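
(That 63% is just the arbitrary 1%/year risk compounded over a century; a minimal sketch of the arithmetic in Python:)

    annual_risk = 0.01                          # the arbitrary round number assumed above
    survive_century = (1 - annual_risk) ** 100  # chance of making it through 100 independent years
    print(1 - survive_century)                  # ~0.634, i.e. ~63% chance of not surviving the century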


Correct, pigeons are much more complicated and unpredictable than supersonic aircraft, and the way they fly is much more complex.


I can shoot down a pigeon that’s overhead pretty easily, but not so with an overhead supersonic jet.


If that's your standard of "control", then we can definitely "control" human intelligence.


If Ilya had SafeAI now would Apple partner with him or Sam

No brainer for Apple


Open source it?


Yeah this feels close to the issue. Seems more likely that a harmful super intelligence emerges from an organisation that wants it to behave in that way than it inventing and hiding motivations until it has escaped.


I think a harmful AI simply emerges from asking an AI to optimize for some set of seemingly reasonable business goals, only to find it does great harm in the process. Most companies would then enable such behavior by hiding the damage from the press to protect investors rather than temporarily suspending business and admitting the issue.


Not only will they hide it, they will own it when exposed, and lobby to ensure it remains legal to exploit for profit. See oil industry.


Forget AI. We can't even come up with a framework to avoid seemingly reasonable goals doing great harm in the process for people. We often don't have enough information until we try and find out that oops, using a mix of rust and powdered aluminum to try to protect something from extreme heat was a terrible idea.


> We can't even come up with a framework to avoid seemingly reasonable goals doing great harm in the process for people.

I mean it's not like we're trying all that much in a practical sense right?

Whatever happened to charter cities?


We can't even correctly gender people, lol.


This is well known via the paperclip maximization problem.


The relevance of the paperclip maximization thought experiment seems less straightforward to me now. We have AI that is trained to mimic human behaviour using a large amount of data, plus reinforcement learning using a fairly large number of examples.

It's not like we're giving the AI a single task and asking it to optimize everything towards that task. Or at least it's not architected for that kind of problem.


But you might ask an AI to manage a marketing campaign. Marketing is phenomenally effective and there are loads of subtle ways for marketing to exploit without being obvious from a distance.

Marketing is already incredibly abusive, and that's run by humans who at least try to justify their behavior, and whose deviousness is limited by their creativity and communication skills.

If any old scumbag can churn out unlimited high-quality marketing, it could become impossible to cut through the noise.



