When it comes to building smarter-than-human AI, "try it and see" is never the right answer. You may only get one attempt to get it right, and you don't take "try it and see" chances with existential risk.
(There's been some interesting research into making it possible to monitor and halt a rogue AI, but no matter how promising that looks, it should still be treated as one of many risk mitigation strategies rather than as a panacea. Still better to consider that you might only get one attempt.)
I don't think it makes sense to consider this kind of approach with superintelligence; either it understands and implements human values, in which case attempting to treat it as an adversary is counterproductive, or it fails to understand and implement human values, in which case you've utterly failed on a "better luck next universe" scale.
However, it does make sense to consider this kind of approach with machine learning in general. One of the problems with machine learning services is that the deal amounts to "give us all your data and we'll do smart things with it", which doesn't work out so well if you want to keep that data private. This approach might provide more options in that case, such as offloading some of your expensive computation and learning without actually exposing your data.
Disagree emphatically. In fact it's the only way to do it, because there is no way to know for certain that a superhuman-AGI will ensure the longevity of humanity. I'd go so far as to argue that it's not even necessary, because humanity has no long-term future anyway.
There is this implicit assumption that humans are, should be, and always will be the apex entity - and I think that is misguided.
If you instead view superhuman-AGI as our rightful offspring, something that we can't understand and is better than us, then all of the existential dread around it goes away.
The dying elderly often express "comfort" when they see that their offspring are reproducing and are smarter than they were. We should see superhuman-AGI the same way, except on the scale of all of humanity.
2) It's reasonable to think about how our values might change in the presence of superintelligence; we certainly shouldn't assume that our present values should forever dictate how everything works. That's different from accepting the view that sentient beings who exist today might have no value.
There's no way to know for certain, but there are ways to judge that having it has higher expected value than not having it, given the vast set of problems it can solve and the massive negative values associated with those problems.
It's basically just a fancy AI-box, and there's little reason to trust those.
