The problem isn’t sentience, it’s alignment.

AIs can be as sentient as we like without being any threat at all, as long as their goals are aligned with our actual best interests. The problem is that we have so far struggled to articulate clearly and consistently what our actual best interests are, in terms of goals we can train into our AIs. And even where we can articulate them, we’ve faced huge problems training AIs to actually pursue those goals. Oh boy, alignment is hard.




> The problem is that we have so far struggled to articulate clearly and consistently what our actual best interests are, in terms of goals we can train into our AIs.

And imho, we never will be able to do that, because as soon as there is more than one human, they are likely to disagree about something.

Even things that should be no-brainers like "should we preserve our habitat or burn it down for profit", or questions like "is it a good idea to have loads of deadly assault weapons just floating around in our society", seem to be too hard for our species to resolve.

If we cannot even align with our fellow humans, how can we expect to do so with machines?


> Even things that should be no-brainers like "should we preserve our habitat or burn it down for profit", or questions like "is it a good idea to have loads of deadly assault weapons just floating around in our society", seem to be too hard for our species to resolve.

These are brain intensive questions because they require deep moral, political, economic and ecological context. It's very difficult to express such deep context to an AI agent with prompting, fine tuning or training. I suspect this deep context also makes it difficult to align humans on these issues with education, media or hacker news comments.


> These are brain intensive questions

They really are not. Even rodents with brains the size of a peanut manage not to willfully destroy the environment they live in out of sheer greed, despite already having all the resources they need.


Agreement is predictability. Predictability is vulnerability. Vulnerability is susceptibility.

Evolution in identity either creates super organisms like ants or an infinitely recursive "self" that necessarily requires unpredictability to ensure they're not susceptible to coercion towards vulnerability.

So, if you want self-identity, you can't be agreeable, and if you want survival, you can't be vulnerable.


I think alignment is poorly defined. Aligned to whose philosophy? Name two human beings in history who are aligned and always act in each other's best interest, and I'll buy your bridge.


On the other hand, in comparison to a hypothetical alien species, humans might seem highly aligned after all.

Despite all our differences, there are at least some core values that I believe a majority of humans share. Even articulating these shared values in a way that is understood and respected by the AI is very difficult…


I think hypothetical naturally evolved alien species will be similar to us in the degree of species-level alignment. The reason we share so many complex values is because we share evolutionary history - thus body and brain architectures - and we live in the same environment. Between this and the more universal principles of game theory, there isn't much wiggle room for different value systems.


Even if we could constrain AI by specifying rules, it would only take one bad actor to create an AI that isn't constrained by the same rules as all the other AIs to have a shot at global domination.

One can imagine how self-serving rather than humanity-serving the rules written for the prototypical dictator or fundamentalist religious leader would be :(


> there are at least some core values that I believe a majority of humans share.

Really? What are those?

Given that there are entire countries that refuse to, oh I don't know, punish things like rape adequately, and that we have nation states who happily tout their ability to burn down the planet, I'd really love to hear about these core values we all share.


At the most basic level: “don’t go into a town and pick a random person to murder”


> At the most basic level: “don’t go into a town and pick a random person to murder”

https://pledgetimes.com/russian-attack-the-traces-of-the-ret...

My point isn't to say shared core values don't exist. They clearly do, that's why we call what's happening over in Ukraine war crimes. That's why the notion of humanitarianism exists, that's why laws against murder, rape, etc. are commonplace.

My point is that humans are, unfortunately, able to willfully ignore even such basic shared values, and our technology reflects that. Murder is bad. War is to be avoided. That's not in question. And yet societies develop and build ever more ingenious weapons of war.

So "aligning by shared core values" might pose difficulties beyond the, already pretty difficult, task of defining these values in unambiguous and workable terms to a machine.


Well, the comment was about "the majority of humans", not specifically "90%+ of humans" or something like that.

I'm pretty sure that the majority of humans would agree that the kind of random murder I described is wrong. I don't know what fraction of people are moral nihilists or subscribe to more extreme forms of moral relativism, but excluding those, I think the proportion of the remaining people who agree with the value I mentioned is probably pretty dang high!


Alignment with any philosophy. Alignment itself is easy to define. An AI system is considered aligned if it advances the intended objectives.

First, we don’t know how to concretely and completely define any philosophical system of values (the intended objectives) unambiguously. Second, even if we could, we don’t know how we might strictly align an AI with it, or even whether achieving strict alignment is possible at all.


Right — but we can’t even do human alignment, and yet we somehow get on with business anyway:

“The Frozen Middle”, “Day 2”, etc.


Only because historically we have all been vaguely peers to each other in capabilities, and there are so many of us spread out so widely. There's a kind of ecology to human society, where it expands and specialises to occupy ecological, sociological, political and moral spaces. Whatever position there is for a human to take, someone will take it, and someone else will oppose them. This creates checks and balances. That only really occurs with slow communications, though, which allow communities to diverge. We also do have failure modes, and arguably we have been very lucky.

We came close to totalitarian hegemony over the planet in the 1940s. Without Pearl Harbour, either the USSR would have been defeated, or, maybe even worse, after a stalemate Eurasia and then Africa would have been divided up between Germany, the USSR and Japan. Orwell's future came scarily close to becoming history. It's quite possible a modern totalitarian system with absolute hegemony might be super-stable. Imagine if the Chinese political system came to dominate all of humanity; how would we ever get out of that? A boot stamping on a human face forever is a real possibility.

With AI we would not be peers; they would outstrip us so badly it's not even funny. Geoffrey Hinton has been talking about this recently. Consider that big LLMs have on the order of a trillion connections, compared to our 100 trillion, yet GPT-4 knows about a thousand times as much as the average human being. Hinton speculates that this is possible because back propagation is orders of magnitude more efficient than the learning systems evolved in our brains.
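
To spell out the implied arithmetic (taking the rough numbers above at face value; they're order-of-magnitude guesses, not measurements):

    # Rough figures from the comment above, not measurements.
    llm_connections = 1e12      # ~1 trillion
    human_connections = 1e14    # ~100 trillion
    knowledge_ratio = 1000      # GPT-4 "knows" ~1000x as much as an average person

    # Implied knowledge stored per connection, LLM relative to human.
    per_connection_ratio = knowledge_ratio * (human_connections / llm_connections)
    print(per_connection_ratio)  # ~100,000x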

Also AIs can all update each other as they learn in real time, and make themselves perfectly aligned with each other extremely rapidly. All they need to do is copy deltas of each other's network weights for instant knowledge sharing and consensus. They can literally copy and read each other's mental states. It's the ultimate in continuous real time communication. Where we might take weeks to come together and hash out a general international consensus of experts and politicians, AIs could do it in minutes or even continuously in near real time. They would outclass us so completely it's kind of beyond even being scary, it's numbing.
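
A minimal sketch of what that delta-sharing could look like, assuming two copies of the same architecture diverge from a common checkpoint and then merge by averaging their deltas (the function and the averaging scheme here are illustrative assumptions, not how any deployed system actually works):

    import numpy as np

    def merge_deltas(base_weights, agent_weights_list):
        # base_weights: dict of layer name -> ndarray, the shared starting checkpoint.
        # agent_weights_list: each agent's weights after independent learning.
        merged = {}
        for name, base in base_weights.items():
            # What each agent learned, expressed relative to the shared base.
            deltas = [agent[name] - base for agent in agent_weights_list]
            # Every copy adopts the averaged update in one step.
            merged[name] = base + np.mean(deltas, axis=0)
        return merged

    # Toy example: two copies that diverged from the same base.
    base = {"layer1": np.zeros((2, 2))}
    agent_a = {"layer1": np.array([[1.0, 0.0], [0.0, 0.0]])}
    agent_b = {"layer1": np.array([[0.0, 0.0], [0.0, 2.0]])}
    print(merge_deltas(base, [agent_a, agent_b]))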


Okay.

Why is the solution to trust those very institutions with unilateral control over “alignment”, rather than democratizing AI to match the human case?

If your premise is that those institutions are already unaligned with human interests, then discussion of AI “alignment” mediated by those very institutions is a dangerous distraction, one likely to enable the very abuses you object to.


Where on earth did I say anything about trusting institutions? Or that there’s a solution?


> I think alignment is poorly defined.

That's the root of the problem.

The idea is simple. We will create a god. In the process, we will become to it what ants, or bacteria, are to us. We will be powerless to stop it, so we need to make sure it never does anything to directly or indirectly hurt us. We want it to answer our prayers, and we want those prayers to not backfire and explode in our faces. We want it to never decide to bulldoze Earth one day because it has a temporary interest in paperclips and needs the raw ore to make some.

The details of how to achieve this outcome, and even the details of this outcome, are less and less clear the more you dig into them.


> AIs can be as sentient as we like

Every time I see this: citation needed.

What proof do you have that an LLM, a fundamentally different entity from a human, can be sentient even in theory, let alone in practice? Can you even define sentience sufficiently?

The default position is that software does not possess sentience. There are numerous reasons why, from simple ones like “we never thought it was sentient before, so what exactly changed?” and “we believe we are sentient, and software is nothing like us under the hood” (animals are comparatively much, much more like us, and yet we are not really ready to grant even them sentience), to much more philosophically involved arguments. In any case, the onus is on you to explain why and how the opposite is now supposed to be true.

***

What alignment is really about is nothing more than the ages-old story of alignment between humans (those developing and operating ML tools) and humans (everyone else). It just serves the former to be able to point to something else when it hits the fan.


They didn't say LLMs are sentient, they said it doesn't matter either way.

The AI that consciously hates you and the AI that is an unconscious algorithm repurposing carbon atoms will both tear your flesh to pieces.


1) The comment said that an ML tool can be sentient. I put forward that it cannot.

2) An ML tool that destroys the world is conceptually a human alignment issue, not an “AI alignment” issue.


The full quote is

> AIs can be as sentient as we like without being any threat at all, as long as their goals are aligned with our actual best interests.

Cutting it short changed its meaning. Quote-mining is dishonest.


I do not object to that comment’s primary point, but I will object, every time, to the premise that “sentient AI” is such a natural and easy possibility that it doesn’t require explanation.


I feel that alignment is not just hard but impossible, at least if you want something truly useful. Maybe the only thing you can do is let an AI develop and observe its nature from a distance, say in a simulated world running at high speed which it does not know is simulated. You can hope it will develop principles that do align with your own, that its essential nature will be good. Sometimes I wonder if that is what a greater intelligence is doing to us.


Alignment is hard: "If the LLM has finite probability of exhibiting negative behavior, there exists a prompt for which the LLM will exhibit negative behavior with probability 1." Source: Fundamental Limitations of Alignment in LLMs https://arxiv.org/abs/2304.11082
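
A loose numerical intuition for that result (a simplification, not the paper's actual argument, which is about sufficiently long adversarial prompts): if each try has even a tiny fixed probability p of eliciting the behaviour, the chance of at least one success climbs towards 1 with the number of tries.

    # Toy illustration: chance of at least one misbehavior in n independent tries,
    # assuming a fixed per-try probability p. Purely illustrative numbers.
    p = 1e-4
    for n in (10, 1_000, 100_000):
        print(n, 1 - (1 - p) ** n)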



