I think at this stage, nobody is certain. But I tend to side with the view that more information is better, and you filter out what you don't need. It's the same reason sensor fusion algorithms use the gyroscope and accelerometer in concert to measure movement.
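To make the gyroscope and accelerometer point concrete, here is a rough sketch of a complementary filter, one common and simple way to fuse the two. This is only an illustration: the function name, the `alpha` weight of 0.98, the 100 Hz sample rate, and the simulated gyro bias are all assumptions picked for the example, not taken from any particular device or library.

```python
import math

def complementary_filter(angle_prev, gyro_rate, accel_x, accel_z, dt, alpha=0.98):
    """Fuse one gyro sample and one accelerometer sample into a tilt angle (radians).

    Hypothetical helper for illustration; alpha=0.98 is an assumed weight.
    """
    # Integrating the gyro rate is smooth in the short term but drifts over time.
    gyro_angle = angle_prev + gyro_rate * dt
    # Tilt from the direction of gravity is noisy but does not drift.
    accel_angle = math.atan2(accel_x, accel_z)
    # Trust the gyro for fast changes and the accelerometer for the long-term average.
    return alpha * gyro_angle + (1.0 - alpha) * accel_angle

# Simulated stationary device: the gyro has a small constant bias (assumed value),
# while the accelerometer sees gravity pointing straight down the z axis.
dt = 0.01          # assumed 100 Hz sample rate
gyro_bias = 0.01   # rad/s, assumed
fused = 0.0
integrated = 0.0
for _ in range(1000):
    integrated += gyro_bias * dt  # gyro alone: the bias error keeps accumulating
    fused = complementary_filter(fused, gyro_bias, 0.0, 9.81, dt)
```

In this simulation the gyro-only estimate keeps drifting away from the true angle of zero, while the fused estimate stays close to it because the accelerometer keeps pulling it back. Each sensor covers for the other's weakness, which is the "more information, then filter" idea in miniature.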
Yes, but that's what I was getting at. You filter those out from the camera. The reverse is probably true as well, where you need depth because an image alone can't give you enough information.