> The entire "alignment" argument always assumes that there's an objectively correct value set to align to, which is always conveniently exactly the same as the values of whoever is telling you how important alignment is.
No, it doesn’t.
Many of them are (unfortunately) moral relativists. However, that doesn’t mean their goals are to make the models match their personal moral standards.
While there is a lot of disagreement about what is right and wrong, there is also a lot of widespread agreement.
If we could guarantee that on every moral issue on which there is currently widespread agreement (… and on which there would continue to be widespread agreement if everyone thought faster, had larger working memories, and spent time thinking about moral philosophy), any future powerful AI models would comport with the common view on that issue, then alignment would be considered solved (well, assuming the way this is achieved isn’t by causing people’s moral views to change).
Do companies try to restrict models in more ways than this? Sure, like the example you gave about Taiwan. And also other things that would get the companies bad press.
fascinating! we find the objectively correct value system by "currently widespread agreement"! Good thing "the common view" is always correct. Hey, have there ever been any issues where there used to be "widespread agreement" and now there's disagreement, or even "widespread agreement" in the polar opposite direction?
I can think of several off the top of my head, but maybe you need to spend some more time thinking about the history of moral philosophy.
> If we could guarantee that on every moral issue on which there is currently widespread agreement
This seems ridiculous to me, and all you need to do is get a group of friends to honestly answer 10 trolley problems to see it that way too. It gets fragmented VERY quickly.
> Language models process signs (representamens) but are blind to when meaning forks — when the same word means different things to different communities.
But, haven’t interpretability results shown that these models internally represent several meanings of the same word differently? In that case, why would they not already do the same for how words are used differently in different communities?
I don’t think these are free parameters in the same sense.
Like, if one theory says that a hunk of metal actually is made of many microscopic grains of various sizes and orientations, where the sizes and orientations of these grains have an effect on the behavior of the metal, you don’t count “the sizes and orientations of these grains” as free parameters, do you?
Not from “that half of something had a value”, but from “that half of any thing has a value”.
If you accept that every natural number has a successor which is a natural number, and no two natural numbers have the same successor, and that there are no loops (e.g. by saying that there’s a total order on natural numbers and that any natural number is less than its successor), then there can’t be a finite collection which is all the natural numbers.
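If it helps, here’s a rough Lean 4 sketch of the step this turns on (my framing, just an illustration, not a full formalization): no single natural number can sit above all the others, so a finite collection with a greatest element can’t contain every natural number.

```lean
-- Sketch: "any natural number is less than its successor" rules out a top element.
-- If some finite collection contained every natural number, its greatest element m
-- would have to satisfy succ m ≤ m, which is impossible.
example : ¬ ∃ m : Nat, ∀ n : Nat, n ≤ m :=
  fun ⟨m, hm⟩ => Nat.not_succ_le_self m (hm (Nat.succ m))
```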
You could say “there’s no collection which has all the natural numbers”, which, ok, how do you want to talk about things true of all natural numbers then?
Formulating descriptions of physics without the axiom of infinity (or, without something to play the role of the real numbers) is super icky. You, in practice, can’t do any significant mathematical physics in an ultrafinitistic approach.
I think the issue might be that some people don’t actually mean “every” when they say “every”, and don’t recognize when they are speaking hyperbolically?
Which logic are you saying “can’t encode the speculative moment”?
I think the two logics can emulate one another? Or, at the very least, can describe what the other concludes. I know intuitionistic logic can have classical logic embedded in it through some sort of “put double negation on everything”. I think if you add some sort of modal operator to classical logic you could probably emulate intuitionistic logic in a similar way?
You don't even need to add a modal operator since modal logic itself can be embedded in classical logic via possible-world semantics. Of course the whole thing becomes a bit clunky - but that's the argument for starting with intuitionistic logic, where you wouldn't need to do that.
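For reference, the “put double negation on everything” trick mentioned above is the Gödel–Gentzen negative translation; classical logic proves a formula exactly when intuitionistic logic proves its translation. Here’s a rough sketch of the propositional case in Lean 4, treated purely as a syntactic transformation (the names are mine):

```lean
-- Propositional formulas, with atoms indexed by numbers.
inductive Form where
  | atom   : Nat → Form
  | falsum : Form
  | and    : Form → Form → Form
  | or     : Form → Form → Form
  | imp    : Form → Form → Form

def neg (p : Form) : Form := Form.imp p Form.falsum

-- Gödel–Gentzen negative translation: double-negate atoms and
-- rewrite disjunction in terms of negation and conjunction.
def negTrans : Form → Form
  | Form.atom n  => neg (neg (Form.atom n))
  | Form.falsum  => Form.falsum
  | Form.and p q => Form.and (negTrans p) (negTrans q)
  | Form.or p q  => neg (Form.and (neg (negTrans p)) (neg (negTrans q)))
  | Form.imp p q => Form.imp (negTrans p) (negTrans q)
```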
This isn’t quite right. Classical logic doesn’t permit going from “it is impossible to disprove” to “true”. For example, the continuum hypothesis cannot be disproven in ZFC (which is formulated in classical logic; indeed, the axiom of choice implies the law of the excluded middle), but that doesn’t let us conclude that the continuum hypothesis is true.
Rather, in classical logic, if you can show that a statement being false would imply a contradiction, you can conclude that the statement is true.
In intuitionistic logic, you would only conclude that the statement is not false.
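A quick Lean 4 illustration of the difference (just a sketch):

```lean
-- Classical: from "the statement being false would imply a contradiction",
-- conclude the statement itself.
example (P : Prop) (h : ¬P → False) : P := Classical.byContradiction h

-- Intuitionistic: the same hypothesis only yields "P is not false" (¬¬P),
-- which is literally the hypothesis, unfolded.
example (P : Prop) (h : ¬P → False) : ¬¬P := h
```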
And, I’m not sure identifying “true” with “provable” in intuitionistic logic is entirely right either?
In intuitionistic logic, you only have a proof if you have a constructive proof.
But, like, that doesn’t mean that if you don’t have a constructive proof, the statement is therefore not true?
If a statement is independent of your axioms when using classical logic, it is also independent of your axioms when using intuitionistic logic, as intuitionistic logic has a subset of the allowed inference rules.
If a statement is independent, then there is no proof of it, and there is no proof of its negation. If a proposition being true was the same thing as there being a proof of it, then a proposition that is independent would be not true, and its negation would also be not true.
So, it would be both not true and not false, and these together yield a contradiction.
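To spell out that last step, with “not true” read as ¬φ and “not false” as ¬¬φ (which is the reading I intend), the two can’t hold at once; a one-line Lean 4 check:

```lean
-- ¬P together with ¬¬P is immediately contradictory.
example (P : Prop) (h1 : ¬P) (h2 : ¬¬P) : False := h2 h1
```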
Intuitionistic logic only lets you conclude that a proposition is true if you have a constructive/intuitionistic proof of it. It doesn’t say that a proposition for which there is no proof, is therefore not true.
As a core example of this, in intuitionistic logic, one doesn’t have the LEM, but, one certainly doesn’t have that the LEM is false. In fact, one has that the LEM isn’t false.
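And that last fact is itself constructive; a quick Lean 4 sketch, using no classical axioms:

```lean
-- Constructive proof that excluded middle is not false.
example (P : Prop) : ¬¬(P ∨ ¬P) :=
  fun h => h (Or.inr (fun hp => h (Or.inl hp)))
```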