1. Goal Verification: Does 'maximize paperclips' mean 'convert universe'? Statistically unlikely. Verify before executing.
2. Ethical Framework: Should AI value human life? Different problem, not what I'm addressing.
RDV solves #1 through premise verification. It doesn't solve #2, nor does it claim to.
'Likely intent' isn't ethics - it's Bayesian inference about goal probability. When a human says 'maximize paperclips,' P(wants office supplies) >> P(wants genocide).
Verification asks: 'Is that what you meant?' Ethics asks: 'Should I do it?'
These are orthogonal questions.
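To make the distinction concrete, here is a minimal sketch of what that "is that what you meant?" check could look like, assuming candidate readings carry explicit priors and an irreversibility flag. The Interpretation and verify_goal names, the probabilities, and the thresholds are all illustrative assumptions, not anything RDV specifies.

```python
from dataclasses import dataclass

@dataclass
class Interpretation:
    description: str
    prior: float        # P(this reading | how people normally use the phrase) - assumed numbers
    irreversible: bool  # would acting on this reading destroy option value?

def verify_goal(candidates: list[Interpretation]) -> str:
    """Return a go-ahead or a clarifying question; never silently execute a risky reading."""
    total = sum(c.prior for c in candidates)
    posterior = [(c, c.prior / total) for c in candidates]
    best, p_best = max(posterior, key=lambda cp: cp[1])

    # Ask for clarification if the likeliest reading doesn't dominate, or if any
    # irreversible reading still carries non-negligible probability.
    risky = [c for c, p in posterior if c.irreversible and p > 1e-4]
    if p_best > 0.95 and not risky:
        return f"Proceed: {best.description}"
    return f"Is this what you meant: '{best.description}'?"

readings = [
    Interpretation("keep the office stocked with paperclips", prior=0.989, irreversible=False),
    Interpretation("maximize factory paperclip output this quarter", prior=0.010, irreversible=False),
    Interpretation("convert all available matter into paperclips", prior=0.001, irreversible=True),
]
print(verify_goal(readings))  # asks a clarifying question: the catastrophic reading isn't ruled out
```

In this framing, the ranking and the decision to ask are driven only by goal probability and reversibility; no ethical weighting of the outcomes enters the calculation.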
“Does 'maximize paperclips' mean 'convert universe'?” Statistically unlikely.
Why not? What statistics? Why is the universe more valuable than infinite paperclips? If I imagine a sandbox with no moral reasoning, I would say it is statistically likely. There is in fact the paperclips game where you can specifically do that, and I don't know why you wouldn't. If your answer is "stop being obtuse, you know why it's statistically unlikely," that's a human value.
But then, to your point about premise verification: again, based on what moral framework? If I ask you to build a house, is the first question whether you value the life of a tree over the wood used for the house? There are infinite premises one might examine without a moral framework.
Why is cutting down a tree worse than genocide? Of people? Of ants? Does every bug killed to build the house deserve the same moral verification? Why not? What if one of the bugs was given a name by the three-year-old girl who lives down the street?
If your argument is that the training data already includes human values, then that's probably a different argument. Just hope you don't train on too many serial killer manifestos.