
The basic argument is simple: it is plausible that future systems will achieve superhuman capability; capable systems necessarily have instrumental goals; instrumental goals tend to converge; human preferences are unlikely to survive when other goals are heavily selected for, unless they are intentionally preserved; and we don't know how to make AI systems robustly encode any complex preference.

Robert Miles' videos are among the best-presented arguments I have seen, as a casual introduction, about specific points on this list, primarily on the alignment side rather than the capabilities side.

E.g. this one on instrumental convergence: https://youtube.com/watch?v=ZeecOKBus3Q

E.g. this introduction to the topic: https://youtube.com/watch?v=pYXy-A4siMw

He also has the community-led AI Safety FAQ, https://aisafety.info, which gives brief answers to common questions.

If you have specific questions, I might be able to point you to a more specific argument in greater depth.




Technically, I think it's not that instrumental goals tend to converge, but rather that there are instrumental goals common to many terminal goals, the so-called "convergent instrumental goals".

Some of these goals are ones we would really rather a misaligned superintelligent agent not have. For example:

- self-improvement;

- acquisition of resources;

- acquisition of power;

- avoiding being switched off;

- avoiding having one's terminal goals changed.
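
To make "common to many terminal goals" concrete, here is a minimal toy sketch (my own illustration, not from any of the linked videos): three agents with very different terminal goals, in a world where any goal is ultimately limited by resources, all end up with the same optimal plan, one that starts by acquiring resources. The action names, conversion rates, and horizon are all made up for illustration.

    from itertools import product

    ACTIONS = ["acquire_resources", "pursue_goal"]
    HORIZON = 3

    def rollout(plan, goal_rate):
        """Terminal value of a plan: each 'pursue_goal' step converts the
        agent's current resources into goal-stuff at the goal-specific rate."""
        resources, value = 1, 0
        for action in plan:
            if action == "acquire_resources":
                resources *= 2          # resources compound
            else:
                value += resources * goal_rate
        return value

    # Three very different terminal goals, differing only in conversion rate.
    terminal_goals = {"paperclips": 1.0, "stamps": 0.5, "cake": 3.0}

    for goal, rate in terminal_goals.items():
        best_plan = max(product(ACTIONS, repeat=HORIZON),
                        key=lambda plan: rollout(plan, rate))
        print(goal, "->", best_plan)

    # Every goal's optimal plan front-loads 'acquire_resources': the instrumental
    # step is shared even though the terminal goals are not.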



